EE Times-Asia

Fast cycle scheme breaks bottlenecks

Posted: 15 Apr 2001

Keywords: DRAM, SDRAM, fast-cycle memory, memory I/O, MPU systems

Soon after designers began developing new networking, communications and cellular products, they ran directly into an old and familiar technical challenge. The need for higher levels of speed and performance was hampered by significant gaps between the speed of CPUs and their closely linked memories. Such gaps lead to bottlenecks in high-performance networking, graphics and cellular designs.

While new classes of memories from the mid-1990s, such as Rambus and synchronous DRAM (SDRAM), provide the accelerated data-transfer speeds required in those designs, faster overall performance requires a fundamental change in the memory architecture. SDRAM and Rambus increased peak bandwidth by increasing same-page data-transfer rates. Those DRAM technologies also broke new ground by using multiple banks to ease the random-cycle bottleneck, reducing the time needed between commands that go to different banks. The bottleneck remains within a bank, however: if one address is followed by a new address that falls within the same bank, the controller needs to wait about 70ns before it is allowed to issue a new command.
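The same-bank wait can be sketched as a small timing model. This is only an illustration: the 70ns figure comes from the text, while the function name, the 10ns minimum spacing between commands and the in-order issue policy are assumptions made for the example.

```python
def issue_times_ns(banks, same_bank_wait_ns=70, cmd_interval_ns=10):
    """Return the earliest time (in ns) at which each command can be issued.

    banks: bank number of each access, in arrival order.
    same_bank_wait_ns: minimum spacing between two commands to the same
        bank (the ~70ns figure quoted in the text).
    cmd_interval_ns: minimum spacing between any two commands
        (hypothetical value, not taken from the text).
    Commands are assumed to be issued strictly in order.
    """
    last_issue = {}  # bank -> time of its most recent command
    times = []
    for i, bank in enumerate(banks):
        # A command cannot be issued before the previous one plus the
        # minimum command spacing.
        now = times[-1] + cmd_interval_ns if i > 0 else 0
        # It must also respect the same-bank wait, if the bank was used.
        if bank in last_issue:
            now = max(now, last_issue[bank] + same_bank_wait_ns)
        last_issue[bank] = now
        times.append(now)
    return times


print(issue_times_ns([0, 0, 0, 0]))  # same bank each time: [0, 70, 140, 210]
print(issue_times_ns([0, 1, 2, 3]))  # four different banks: [0, 10, 20, 30]
```

Under these assumptions, four accesses spread over four banks are issued within 30ns, while four accesses to one bank take 210ns.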

Sustained speed required

As fast as this is, both in relative terms and in comparison with older memory architectures, the 60ns to 70ns lag time is still too long for the new generation of networking and communications systems. Designers working on these products want to realize effective bandwidth at speeds that remain close to peak bandwidth data-transfer rates, without reducing the peak bandwidth.

What is required is a fast-cycle architecture that can achieve higher speed performance in a different way, by treating the memory unit as a quasi-multibank configuration. This kind of architecture can realize effective bandwidth at speeds that approach peak bandwidth data-transfer rates, without any measurable reduction in that rate. The result is a significant enhancement in speed, even in systems that already have high levels of random access.

One central feature in the new fast-cycle memory architecture is a non-multiplexed addressing scheme that activates a minimum sub-array block in the column axis (or direction). This significantly reduces the power consumption in the memory array. Only 2,048 sense amps are driven, compared with more than 16,000 sense amps that are driven at one time in a typical 64Mb DRAM. Each sub-array acts as its own driving circuit, which reduces load capacitance and enables high-speed access. An analog amp amplifies the bit-line voltage while a latch circuit restores the charge into the cell capacitor, achieving high-speed data sensing and a stable operation.

Multiple banks coupled with fast random cycles can issue a new command in far less time, more like 20ns to 30ns. Read data streams can be transmitted from the memory systems without any "bubbles" or gaps, even if addresses are coming from the same bank. Another important aspect of this new fast-cycle technology is a segmented memory core that can save some 50 percent of the operating power consumed compared with SDRAMs operating under similar conditions.
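A rough way to see what the shorter random cycle buys is to compute how much of the time the data bus carries data when every access goes to the same bank. The sketch below is a back-of-the-envelope model, not a device specification: the 20ns burst duration is a hypothetical value, and only the 70ns and 20ns to 30ns cycle figures come from the text.

```python
def bus_utilization(random_cycle_ns, burst_ns=20):
    """Fraction of time the data bus is busy for a same-bank access stream.

    random_cycle_ns: minimum time between two commands to the same bank.
    burst_ns: time one access occupies the data bus (hypothetical value).
    A new burst can start only once per max(random cycle, burst time);
    the rest of that period is a "bubble" on the bus.
    """
    period_ns = max(random_cycle_ns, burst_ns)
    return burst_ns / period_ns


for cycle_ns in (70, 30, 20):
    share = bus_utilization(cycle_ns)
    print(f"random cycle {cycle_ns}ns: bus busy {share:.0%} of the time")
```

With these assumed numbers, a 70ns random cycle leaves the bus idle most of the time, while a cycle equal to the burst time keeps it fully occupied, which is the bubble-free case the paragraph above describes.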

Finally, the new memory architecture is built using a pipeline configuration characterized by multistaged operation, in which various stages operate at the same time to perform different tasks. A fast-cycle architecture includes a three-stage pipeline. Stage one consists of command input and decoding. Stage two corresponds to the sensing operations, and the third stage is where queued data is provided to an output terminal. The first stage of a new access begins immediately after the second-stage operation, which means that the cycle time is determined by the length of the second stage. The combination of pipelining and multibank configuration accelerates the entire system efficiently.
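The effect of the second stage on cycle time can be illustrated with a toy pipeline model. The stage durations below (5ns, 25ns, 5ns) are hypothetical, and the model simplifies the hardware by letting decoded commands queue in front of the sensing stage; it is a sketch of the general principle rather than of the actual device timing.

```python
def finish_times_ns(n_accesses, stage_ns=(5, 25, 5)):
    """Return the time at which each access leaves the last pipeline stage.

    stage_ns: durations of (decode, sense, output); hypothetical values.
    Each stage handles one access at a time. An access enters a stage
    once it has left the previous stage and that stage is free.
    """
    stage_free = [0] * len(stage_ns)  # time at which each stage is next free
    finishes = []
    for _ in range(n_accesses):
        t = 0  # all accesses are assumed to be ready at time 0
        for s, duration in enumerate(stage_ns):
            begin = max(t, stage_free[s])
            t = begin + duration
            stage_free[s] = t
        finishes.append(t)
    return finishes


times = finish_times_ns(4)
print(times)                                        # [35, 60, 85, 110]
print([b - a for a, b in zip(times, times[1:])])    # [25, 25, 25]
```

After the first access, results appear every 25ns, which is the assumed length of the sensing stage: the longest stage sets the cycle time, not the sum of all three.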

Extensive RAM

That kind of architecture is excellent for a range of new high-performance products, from graphics ICs to networking to cellular handsets. For example, the emerging wireless markets, which include cellphones and pagers that combine data and voice in two-way, multimedia communications, require a significant amount of RAM storage capacity. This is in contrast to earlier generations of cellphones, which needed only small amounts of RAM for small tables and numbers.

In order to move speech and data efficiently, and to allow the phones to access the Internet quickly, memory densities in these products must move beyond those provided by SRAM devices, which have been used as RAM storage in cellphones. The largest density available, 8Mb, no longer suffices for new standards such as CDMA.

At least 16Mb is needed now, and the optimal design is a 16Mb fast-cycle RAM, which uses a DRAM type of memory core with an I/O structure very similar to SRAM. This design provides the necessary density at a reasonable cost, and allows easy modification of the controllers that are required to interface with the memory system.

Similarly, memory technology with fast-cycle capabilities eliminates performance degradation in servers and high-end computers that run multiple tasks in parallel, and in multiple-MPU systems. Those systems usually have a very high number of random accesses, and traditional DRAM memory with high page-mode-access speeds but low random-access capabilities simply does not do the job well enough. In a typical data-access flow, caches supply data that is used frequently and multiple times, so they perform well. But when a random access is performed in a branch or data retrieval, a main-memory cycle occurs. In most typical systems, main-memory accesses are very random; in multiple-MPU systems, the degree of randomness is even higher.

In a system with four processors, for example, at least four tasks, and often more, run in parallel. That means that four streams of data are generated, each of which is already random because of the cache filtering process. Consequently, a memory system that includes both fast random cycling and multiple internal banks becomes a significant advantage.

Communications applications also benefit from the fast-cycle/multibank combination. Many of these are built with data buffers that have very poor locality of reference, since data is transferred from the network to the buffer and on to the network again. Thus, a data-switching system does not take advantage of page hits in the main DRAM memory, because these relatively small packets (usually 64 bytes) are located randomly throughout the buffer memory.
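The lack of page hits in such a buffer can be checked with a short simulation. The 64-byte packet size is from the text; the 8MB buffer, the 2KB DRAM page, the uniform random placement and the function name are all assumptions chosen for illustration.

```python
import random


def page_hit_rate(n_packets, buffer_bytes=8 * 1024 * 1024,
                  page_bytes=2048, packet_bytes=64, seed=0):
    """Estimate how often consecutive packet accesses fall in the same page.

    Packets are placed at uniformly random packet-aligned offsets in the
    buffer (an assumed placement policy). buffer_bytes and page_bytes are
    hypothetical values; packet_bytes is the 64-byte figure from the text.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    slots = buffer_bytes // packet_bytes
    hits = 0
    prev_page = None
    for _ in range(n_packets):
        address = rng.randrange(slots) * packet_bytes
        page = address // page_bytes
        if page == prev_page:
            hits += 1  # same page as the previous access: a page hit
        prev_page = page
    return hits / n_packets


print(f"page-hit rate: {page_hit_rate(10_000):.4%}")
```

With 4,096 pages in the assumed buffer, the chance that two consecutive random packets share a page is about 1 in 4,096, so page-mode speed contributes almost nothing and the random cycle time dominates.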

Fumio Baba

Vice President

Fujitsu Microelectronics Inc.
