EE Times-Asia

Overcoming embedded memory bottleneck (Part 1)

Posted: 17 Aug 2012

Keywords: on-chip memory, latency, MOPS, algorithmic

Historically, advances in embedded memories have been limited to maximizing the number of transistors on a chip and cranking up the clock speed. This has been successful up to a point, but as transistors approach atomic dimensions, manufacturers are running into fundamental physical barriers. For this reason, the industry needs to rethink its approach to embedded memory design. As an analogy, increases in processor performance have come not only because of advances in circuitry, but also because of architecture improvements, such as pipelined execution and exploitation of instruction-level parallelism. What if embedded memories could be designed to take advantage of architectural and parallel mechanisms similar to processor architectures to increase memory performance? A new approach called algorithmic memory technology does exactly that.

Parallel architecture performance boosts
Algorithmic memories introduce architectural improvements by adding logic to existing embedded memory macros that enables the memories to operate much more efficiently. Within the memories, algorithms intelligently read, write, and manage data in parallel using a variety of techniques such as buffering, virtualization, pipelining, and data encoding. Woven together, these techniques create a new memory that internally processes memory operations an order of magnitude faster and with guaranteed performance. This increased performance capability is made available to the system through additional memory ports such that many more memory access requests can be processed in parallel within a single clock cycle (figure 2). The concept of using multiport memories as a means of multiplying memory performance mirrors the trend of using multicore processors to increase performance over uniprocessors. In both cases, it is parallel architecture rather than faster clock speeds that drives performance gains.
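As a simplified illustration of one such parallel technique, the Python sketch below (a hypothetical model, not Memoir's actual RTL) emulates a memory with two read ports on top of single-port banks using replication: writes update both copies, so each bank can independently serve one read in the same cycle.

```python
# Hypothetical sketch of multiport emulation via replication.
# The class name and interface are illustrative, not from the article.
class TwoReadPortMemory:
    def __init__(self, depth):
        # Two identical single-port banks; each can perform
        # one access per clock cycle.
        self.banks = [[0] * depth, [0] * depth]

    def write(self, addr, value):
        # A write must update both copies to keep them coherent.
        for bank in self.banks:
            bank[addr] = value

    def read2(self, addr_a, addr_b):
        # Two reads in the same cycle: each bank serves one address,
        # even when both reads target the same location.
        return self.banks[0][addr_a], self.banks[1][addr_b]
```

Replication doubles read bandwidth at the cost of extra storage; the buffering, virtualization, and data-encoding techniques mentioned above are alternative ways to buy additional ports with less area, which is where the algorithmic tradeoffs lie.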

Figure 2: Physical memory (left) can deliver up to 500 million memory operations per second (500 MOPS), while algorithmic memory (right) can deliver 2,000 million (2,000 MOPS).

Algorithmic memory technology is implemented as soft RTL. The resulting solutions appear exactly as standard multiport embedded memories. A system architect can specify the level of memory performance required from a customized algorithmic memory. As will be described later, an algorithmic memory can also significantly lower layout area and reduce memory power in certain instances. Using this approach requires no change to existing memory interfaces or ASIC design flows, and the technology is both process-node and foundry independent. In essence, the approach allows system architects to rapidly and reliably create customized memory solutions optimized for specific applications. The extra area overhead required to implement a 2X MOPS increase is typically around 15% of the total physical memory area. For example, in one implementation of a networking SoC, the performance of a 32Mb (128K deep x 256 bits wide) ultra-high-density SRAM running at 500MHz (500 MOPS) in a 32nm process was increased to 1,000 MOPS with 13% area overhead.
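The arithmetic behind the SoC example above can be spelled out as follows (variable names are illustrative; the figures come from the text):

```python
# Numbers from the networking SoC example in the text.
clock_mhz = 500          # single-port SRAM clock
ops_per_cycle = 1        # a physical memory serves one operation per cycle
base_mops = clock_mhz * ops_per_cycle          # 500 MOPS

# Algorithmic logic exposes a second port, doubling operations
# per cycle at the same clock speed.
algorithmic_mops = clock_mhz * (ops_per_cycle * 2)  # 1000 MOPS

area_overhead = 0.13     # 13% extra area reported for the 32Mb instance
```

The key point is that the 2X gain comes from operations per cycle, not from a faster clock, so it does not run into the same circuit-level scaling limits.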

Insofar as one is prepared to trade off some area, memories can be made significantly faster, and up to a 10X increase in performance is possible. In practice, the majority of applications benefit from up to a 4X increase in memory performance. In some cases, algorithmic memory technology can also be used to lower memory area and power consumption without sacrificing performance.

Developing higher-performance memory using circuits alone imposes a significant area and power penalty. Algorithmic memory technology instead combines a lower-performance memory circuit (which typically has lower area and power requirements) with memory algorithms to synthesize a new memory. This algorithmic memory achieves the same MOPS as a high-performance memory built using circuits alone, but can lower area and power by up to 50%.
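To make the tradeoff concrete, here is a back-of-the-envelope comparison with illustrative numbers (the specific area figures are assumptions, not from the article; only the up-to-50% conclusion is):

```python
# Illustrative, normalized areas for two ways to reach 1,000 MOPS.
HIGH_PERF_CIRCUIT_AREA = 1.0   # aggressive circuit design at 1,000 MOPS
LOW_PERF_CIRCUIT_AREA = 0.4    # assumed: a relaxed 500 MOPS circuit
ALGO_LOGIC_OVERHEAD = 0.15     # assumed: logic to double ops per cycle

# Synthesized algorithmic memory: relaxed circuit plus algorithmic logic.
algorithmic_area = LOW_PERF_CIRCUIT_AREA * (1 + ALGO_LOGIC_OVERHEAD)

# Fractional area saved versus the pure-circuit approach.
savings = 1 - algorithmic_area / HIGH_PERF_CIRCUIT_AREA
```

Under these assumed numbers the synthesized memory lands at roughly half the area of the pure-circuit design while delivering the same MOPS, which is the mechanism behind the up-to-50% claim; the same reasoning applies to power.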

About the author
Sundar Iyer is co-founder and CTO at Memoir Systems, a startup specializing in semiconductor intellectual property (SIP) for algorithmic memories. Previously, Iyer was CTO and co-founder of Nemo ("Network Memory") Systems, acquired by Cisco Systems in 2005. He was a founding member at SwitchOn Networks (acquired by PMC-Sierra in 2000), where he developed algorithms for associative memory and deep packet classification. In 2008, Iyer received the MIT Technology Review TR35 young innovator award for his work on network memory. He received his Ph.D. in Computer Science from Stanford University in 2008.

