Memory-oriented optimisation techniques (Part 1)
Keywords:Memory? embedded systems? Optimisations? loop? Compilers?
Loop and loop nests have many data dependencieseach operation in each iteration generates its own data dependency. However, we also know that the loop provides structure and regularity that we can exploit. Some possible transformations on loops and loop nests include:
???Loop permutation changes the order of the loops in the nest.
???Index rewriting changes the way the loop indexes are expressed.
???Loop unrolling creates several copies of a loop body and modifies the loop indexes appropriately.
???Loop splitting takes a loop with multiple operations and creates a separate loop for each operation; loop fusion performs the opposite.
???Loop tiling splits a loop into a nest of loops, with each inner loop working on a small block of data.
???Loop padding adds data elements to an array to change how the array maps into the memory system structure.
The polytope model [1; 4] is commonly used to represent and manipulate the data dependencies in loop nests. Figure 3 shows a loop nest. We can represent the loop nest as a matrix: each column in the matrix represents the iteration bounds of a loop.
Figure 3: Polytope representation of loop nests. |
The inner loop body modifies the values of c, creating a data dependency from c[i ][j ] to c[i ][j + 1]. We represent each data element in the array as a node in a two-dimensional space, with the loop iteration variables forming the axes of the space. The nodes define a polytope in that space in a triangular shape.
We add edges between the nodes that describe the loop-carried dependency between c[i ][j ] and c[i ][j + 1]. The points in the polytope fully describe all the loop-carried dependencies, but we have not yet reduced the complexity of that representation. We can use a distance vector that describes the direction and distance of a set of vectors. In this case, all the data dependencies are of the form [i j ] = [1 0]. Any set of loops that performs this computation must satisfy these loopcarried dependencies, but many schedules are, in general, possible. We can apply an ordering vector that describes the order in which the loops visit the nodes.
We can use matrix algebra to manipulate the form of the loops. For example, if we want to interchange the loops, putting j in the outer loop and i in the inner loop, we can multiply the loop nest matrix by an interchange matrix, as follows.
This gives the loop nest
for (j = 0; j ???for (i = 0; i ??????c [i + 1][ j ] = a[ i ][j ]*b[j ] ;
Not all loop transformations are legal. Wolf and Lam [15] showed that a loop transformation is legal if all the transformed dependence vectors are lexicographically positivethat is, if they do not point backward in the iteration space.
Not all important loop transformations can be manipulated by matrix algebra. Loop tiling, for example, splits a large array into several smaller arrays to change the order in which the array elements are traversed. This transformation cannot be represented by matrix manipulations, but once a loop is tiled, the new loop nest can be analysed using matrix methods.
Loop permutation and loop fusion [16] can be used to reduce the time required to access matrix elements. Loop permutation reduces latency when it is used to change the order of array accesses to that used in the underlying data structure. Multidimensional arrays are stored by C in row-major format, so we want to access rows first. Figure 4 shows an example of loop permutation.
Figure 4: An example of loop permutation. |
Related Articles | Editor's Choice |
Visit Asia Webinars to learn about the latest in technology and get practical design tips.