EE Times-Asia

Memory-oriented optimisation techniques (Part 2)

Posted: 15 Aug 2013

Keywords: buffers, algorithms, prefetching, allocation, memory

Buffers are vital because they mediate between sub-systems. In many embedded applications, the various stages produce and consume data somewhat irregularly. As a result, we need buffers to make sure that all data makes it from the producer to the consumer.

If the buffer is too small, data is lost. If the buffer is too large, then memory is wasted. Overly large buffers not only add to the manufacturing cost of the system, but they also consume both static and dynamic energy.

A traditional approach is to use dynamic memory management, for example the C library functions malloc() and free(). By allocating buffers of the proper size, we can avoid memory overflows. This approach may be difficult to verify, however, and memory leaks (memory that is allocated but never freed) may escape into production code.

Just as important for embedded systems, the memory management routines themselves cost a great deal in both performance and energy consumption. Because the execution time of memory management routines is highly data-dependent, these routines make it difficult to meet real-time deadlines. A combination of static and dynamic buffer management techniques can make sure that buffers are properly sized at a much smaller runtime cost.

Several groups at IMEC, including De Greef et al. [5], Franssen et al. [6], and Masselos et al. [10], developed methodologies and algorithms for data transfer and storage management. This line of work tries to restructure code to both minimise the required amount of buffer memory and to improve the performance of accesses to that buffer memory.

Data transfer and storage management makes use of many techniques, including loop transformations and control flow analysis. It requires analysing the complete application to understand the dependencies between different sections of code, such as different loop nests, and to balance concerns across the entire program.

Panda et al. [14] give some examples that illustrate how loop transformations can be used to improve buffer utilisation. Consider the following code.

for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        b[i][j] = 0;
for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        for (k = 0; k < L; k++)
            b[i][j] += a[i][j + k];

This code causes buffering problems because both loop nests modify the b array. The first loop nest modifies all elements of b before the second loop nest has a chance to use those locations. If we combine the two loop nests, we can reuse the b elements much more easily.

for (i = 0; i < N; i++)
    for (j = 0; j < N; j++) {
        b[i][j] = 0;
        for (k = 0; k < L; k++)
            b[i][j] += a[i][j + k];
    }

Moving the first definition of b closer to its next use makes it much easier for lower-level optimisations to take advantage of mechanisms such as prefetching. Panda et al. [14] also showed that loop analysis can be used to help analyse buffers more explicitly.

They added signal copies to the code to make data reuse explicit. The buffer a_buf is L words long and buffers the a array; b_buf is one word long and buffers the b values. These buffers are declared in the program but do not need to exist in the final implementation.

int a_buf[L];
int b_buf;
for (i = 0; i < N; i++) {
    initialize a_buf;
    for (j = 0; j < N; j++) {
        b_buf = 0;
        a_buf[(j + L - 1) % L] = a[i][j + L - 1];
        for (k = 0; k < L; k++)
            b_buf += a_buf[(j + k) % L];
        b[i][j] = b_buf;
    }
}

Once the copies have been made explicit, analysis methods can be used to determine the lifetimes of values in those buffers.

Cache- and scratch pad-oriented optimisations
Cache-oriented optimisations take advantage of two different cache effects. Clearly, some optimisations are designed to improve cache hit rates by reducing the number of conflicts caused by memory access patterns. By rearranging data so that important elements do not knock each other out of the cache, we can greatly increase performance as well as reduce energy consumption.

Data can also be rearranged in the cache to take advantage of prefetching. Virtually all caches fetch more than one word at a time. If we can organise data so that a single cache miss pulls in several successive values of interest, then we can greatly reduce the access time for the other elements of the cache line.

Panda et al. [11] developed algorithms to place data in memory to reduce cache conflicts. Their methodology for scalar variables includes four steps:
1. Build a closeness graph that describes the relationship between accesses.
2. Cluster the variables into cache line-sized units.
3. Build a cluster interference graph that describes how clusters interact in the cache.
4. Use the cluster interference graph to optimise cluster placement.

