
Overcoming embedded memory bottleneck (Part 2)

Posted: 23 Aug 2012

Keywords: MOPS, XOR, algorithmic memory

Part 1 of this series introduced the concept of algorithmic memory and the benefits of memory operations per second (MOPS) as a metric. In this installment, we delve into the specifics of how algorithmic memory works and how it is implemented in embedded systems.

The existence of the processor-memory performance gap is well known in the industry. Up until now, advances in embedded memories have focused on maximizing the number of transistors on a chip and cranking up the clock speed. As transistors approach atomic dimensions, however, manufacturers are running into fundamental physical barriers. For this reason, the industry needs to rethink its approach to embedded memory. What if embedded memories could be designed to take advantage of architectural and parallel mechanisms similar to those used to enhance processor architectures? Algorithmic memory technology provides a way to do exactly that.

Inside an algorithmic memory
Every algorithmic memory consists of a number of individual memory macros, where each of the macros can be accessed in parallel. Each memory macro has its own physical address and data bus; thus, four external accesses that address four different macros can take place in parallel in a single clock cycle. The matter gets more complex when multiple accesses target the same memory macro. In this case, the logic temporarily buffers some of the accesses in an internal cache, or redirects them to other macros within the memory.
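The conflict-handling behavior described above can be sketched in a few lines. This is a simplified behavioral model, not a vendor design: the macro count, depth, and the simple low-bit interleaving are all illustrative assumptions, and the internal write cache here is an ordinary Python list.

```python
NUM_MACROS = 4
MACRO_DEPTH = 256

class AlgorithmicMemoryModel:
    """Behavioral sketch: four macros accessed in parallel per cycle;
    writes that collide on a macro are buffered for a later cycle."""

    def __init__(self):
        self.macros = [[0] * MACRO_DEPTH for _ in range(NUM_MACROS)]
        self.write_cache = []  # buffered (addr, value) pairs awaiting a free cycle

    def macro_of(self, addr):
        # Illustrative mapping: interleave addresses across macros by low bits.
        return addr % NUM_MACROS

    def cycle(self, writes):
        """Accept a list of (addr, value) writes issued in one clock cycle."""
        busy = set()
        # Previously buffered writes compete for macros first.
        pending = self.write_cache + list(writes)
        self.write_cache = []
        for addr, value in pending:
            m = self.macro_of(addr)
            if m in busy:
                # Macro already accessed this cycle: buffer the write.
                self.write_cache.append((addr, value))
            else:
                busy.add(m)
                self.macros[m][addr // NUM_MACROS] = value

    def read(self, addr):
        # Buffered writes shadow the macro contents (most recent wins).
        for a, v in reversed(self.write_cache):
            if a == addr:
                return v
        return self.macros[self.macro_of(addr)][addr // NUM_MACROS]
```

For example, writes to addresses 0 and 4 both map to macro 0 here, so only one lands in the array that cycle; the other is held in the cache, yet remains visible to reads, and drains on the next free cycle.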

The actual addresses of the alternative locations form a kind of virtual addressing: a scratchpad memory records the mapping between each virtual address and the intended address. Since reads and writes can come in rapid succession and in all kinds of combinations, the logic in the algorithmic memory core has to manage every pattern of hot spots and repeated accesses to the same macro intelligently. When there is a free cycle, the algorithm can move data back to its intended location in main memory and clean up the mapping. The logic must also handle the worst case, however, and intelligently rearrange data so that operations continue to be posted. In fact, it is possible to prove mathematically that with the right scheme of data caching, virtualization, and data rearranging, all sequences of write operations can be posted.
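The scratchpad bookkeeping amounts to a small remap table. The sketch below is a minimal illustration under assumed names (`redirect`, `resolve`, `cleanup` are hypothetical, as is the `(macro, offset)` location format); a hardware implementation would be a CAM or similar structure, not a dictionary.

```python
class RemapTable:
    """Sketch of the scratchpad that correlates a redirected (virtual)
    location with the address the access originally intended."""

    def __init__(self):
        # intended address -> actual (macro, offset) where the data landed
        self.table = {}

    def redirect(self, intended_addr, actual_loc):
        # A write could not reach its home macro; record where it went.
        self.table[intended_addr] = actual_loc

    def resolve(self, addr, home_loc):
        # Reads consult the table first; unmapped addresses use their home.
        return self.table.get(addr, home_loc)

    def cleanup(self, addr):
        # In a free cycle, data is moved back home and the entry retired.
        self.table.pop(addr, None)
```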

Memory read operations are a little more complex than writes. In the case of two simultaneous read accesses, for example, if both addresses fall in the same memory macro, they cannot be accessed in parallel. At the same time, accessing them sequentially would hurt performance by introducing latency. To avoid these problems, all of the data stored in the physical memory is encoded using a variety of schemes that allow the algorithm to reconstruct the read data from other macros, so that multiple read accesses can proceed simultaneously.

A key characteristic of algorithmic memory is that the increased performance (MOPS) is completely deterministic; this performance guarantee has been mathematically proven using adversarial analysis models. Algorithmic memory even resolves all row, address, and bank conflicts that may arise due to simultaneous accesses from multiple interfaces. Since algorithmic memory is not subject to memory bank conflicts or memory stalls, system-on-chip (SoC) designs can be greatly simplified because there is no need to deal with the possibility of system backpressure.

Algorithms in action
The inner workings of an algorithmic memory reflect a microcosm of many already familiar mechanisms such as virtualization, caching, encoding, and so on. Erasure-coding algorithms are used in a wide range of applications to prevent data loss. For instance, Reed-Solomon codes are block-based error-correcting codes that are commonly used in redundant array of inexpensive disks (RAID) mass storage systems. In the case of algorithmic memory, erasure coding is not used to encode redundant data for error recovery. Rather, it is used to encode and decode data that cannot be written to or read from its real address because doing so would result in a bank conflict.

Let's take a closer look at how erasure coding works in algorithmic memory. Rather than providing two fully independent representations of every piece of data, it is possible to create memory blocks that store a conventional representation of each piece of data and an encoded version of each data item. The conventional representation of a data value can be stored in a consistent location in a main memory bank. The encoded version of the data is implemented in a manner that efficiently combines multiple data items such that only a small amount of additional memory is required to store the encoded versions of data.

When an algorithmic memory receives two simultaneous read operations requesting data that have their conventional representations in the same bank, then it may retrieve the conventional representation of the first data item from the main memory bank and retrieve the second data item by decoding the encoded version of the second data item. To operate properly, the memory system must always be able to fetch the encoded version of the second data item without the use of the main memory bank that is being accessed to retrieve the first data item.


