EE Times-Asia > Embedded

Cache vs. DMA: Trade-offs for programmers

Posted: 17 Nov 2003

Keywords: cache, DMA, embedded media processor, memory, MCU

Now that there are embedded media processors available that can handle both MCU and DSP tasks, C programmers who are familiar with the MCU model of application development are transitioning into a new realm, where intelligent management of code and data flow can significantly improve system performance. Careful consideration needs to be given to the high-performance DMA capabilities of the media processor. Recognizing the trade-offs between using cache and DMA in these applications can lead to a better understanding of programming for system optimization.

Today's media processors have hierarchical memory architectures that strive to balance several levels of memory with differing sizes and performance levels. Typically, the memory closest to the core processor (known as "Level 1" or "L1" memory) operates at the full clock rate and usually supports instruction execution in a single cycle.

A quick survey of the embedded media processor market reveals core processor speeds at 600MHz and beyond. While this performance can open the door to many new applications, the maximum speed is only realized when code runs from internal L1 memory. Of course, the ideal embedded processor would have an unlimited amount of L1 memory, but this is not practical. Therefore, programmers must consider several alternatives to take advantage of the L1 memory that exists in the processor, while optimizing memory and data flows for their particular system.

Data memory management

Since there are often multiple data transfers taking place at any one time in a multimedia application, the bus structure must support both core and DMA accesses to all areas of internal and external memory.

To effectively use DMA in a multimedia system, there must be enough DMA channels to support the processor's peripheral set fully, with more than one pair of memory DMA streams. This is an important point, because it recognizes that there are bound to be raw media streams incoming to external memory (via high-speed peripherals), while at the same time data blocks will be moving back and forth between external memory and L1 memory for core processing.

What's more, DMA engines that allow direct data transfer between peripherals and external memory, rather than requiring a "stopover" in L1 memory, can save extra data passes in numerically intensive algorithms.

As data rates and performance demands increase, it becomes critical for designers to have "system performance tuning" controls at their disposal. For example, the DMA controller might be optimized to transfer a data word on every clock cycle. When there are multiple transfers ongoing in the same direction, this is usually the most efficient way to operate the controller because it prevents idle time on the DMA bus.

But in cases involving multiple bidirectional video and audio streams, "traffic control" becomes mandatory to prevent one stream from usurping the bus entirely. In situations where data transfers switch direction on nearly every cycle, the latency associated with turn-around time on the SDRAM bus will lower throughput significantly.

As a result, DMA controllers that have a channel-programmable burst size hold a clear advantage over those with a fixed transfer size. Since each DMA channel can connect a peripheral to either internal or external memory, it is also important to be able to automatically service a peripheral that may issue an urgent request for the bus.

The flexibility of today's DMA controllers is a double-edged sword. When a large C/C++ application is ported between processors, the programmer is sometimes hesitant to integrate DMA functionality into already working code. This is where data cache can be very useful. Typically, the data cache can be used to bring in data to L1 memory for the fastest processing. The data cache is attractive because it acts like a mini-DMA, but with minimal interaction on the programmer's part.

- David Katz & Rick Gentile

Analog Devices Inc.
