EE Times-Asia > Memory/Storage

Memory-oriented optimisation techniques (Part 1)

Posted: 07 Aug 2013

Keywords: memory, embedded systems, optimisations, loops, compilers

Memory is a crucial bottleneck in embedded systems. Many embedded computing applications spend a lot of their time accessing memory. The memory system is a prime determinant of not only performance but also energy consumption.

Memory system optimisations can target any stage of the memory hierarchy. A large variety of techniques have been developed by both the general-purpose and embedded communities to optimise cache performance. More recently, optimisation techniques for scratch pad memories have been developed. Optimisations can also target main memories, particularly when they are partitioned.

Memory system optimisations can target either data or instructions. Array optimisations are an important class of data-oriented optimisations. Control flow analysis leads to methods for improving the cache behaviour of instructions. Global memory analysis is particularly important in embedded systems.

Many embedded systems are composed of several sub-systems that pass data among themselves. The buffers between those sub-systems must be carefully sized to avoid both buffer overflows and wasted memory.

Loop transformations
Some optimisations are applied early during compilation, without detailed knowledge of the target hardware. Such transformations try to expose parallelism that later stages can exploit. Loops are a prime candidate for such transformations since they can be a rich source of data parallelism.

Loop transformations have been studied for decades for use in scientific programs and optimising compilers. There is not enough space here to do this topic justice; we only introduce a few concepts that illustrate how this body of work can be used.

An ideal loop can be executed entirely in parallel. The code on the left side of figure 1 has a loop body in which all the arrays are indexed only by i. Therefore, no loop iteration depends on any other iteration. As a result, all the loop bodies could be executed in parallel, in any order, without affecting the result. In the code on the right side of the figure, the ith iteration depends on the result of the (i + 1)th iteration.

Figure 1: Parallelism and constraints in loops.

In this case, we cannot finish iteration i until iteration i + 1 has also finished, so these loop bodies must be done in the order in which they are enumerated by the loop. Data dependencies from one loop iteration to another are known as loop-carried dependencies.

The compiler must schedule operations so that they do not violate the data dependencies; that is, the code does not try to use a value before it has been computed. In general, many possible schedules satisfy the data dependencies, provided that we have a way to enumerate those dependencies. While a single loop may provide some opportunities, a loop nest offers many possibilities for parallelism that require detailed analysis.

As shown in figure 2, a loop nest is a set of loops, one inside the other. A perfect loop nest has no conditionals inside the nest. An imperfect loop nest has conditionals that cause some statements in the nest to not be executed in some cases.

Figure 2: Loop nests.
