EE Times-Asia

Reconfiguring SoC according to Garp

Posted: 01 May 2001


Programmable SoC components, containing a microcontroller and reconfigurable hardware, promise the flexibility of software and the performance of hardware.

Our group designed Garp, a single-issue microprocessor with a rapidly reconfigurable array attached as a coprocessor for loop acceleration. Since our research group is investigating reconfigurable hardware primarily in the context of general-purpose computing, compiler development is an integral part of our architecture research. In contrast, automatic compilation to reconfigurable architectures has been considered less important in the embedded domain because most designers have hardware expertise. However, as embedded applications grow in size, the productivity benefits of automatic compilation also grow.

The Garp compiler's challenge is to exploit the reconfigurable hardware's parallelism starting from a sequential software program. We drew heavily from compilation techniques for VLIW architectures, which also exploit operation-level parallelism.

Garp's coprocessor is a two-dimensional array of configurable logic blocks. Array details dictate that modules run horizontally along a row. Fast, flexible carry chains running along each row enable fast sums and comparisons, while horizontal wire channels between rows enable fast shifts between adjacent modules.

Garp's array has four 32-bit data buses and one 32-bit address bus. While the array is idle, the processor can use the memory buses to load configurations or to transfer data between processor registers and array registers. While the array is active, it is the master of the memory buses and uses them to access memory: the same memory system seen by the microprocessor, including all levels of caching.

Novel targets

The Garp compiler essentially targets loops for acceleration on the reconfigurable array. It uses a straightforward approach in which the single thread of control hops back and forth between the microprocessor and the array. When the program reaches a loop that uses the array, the correct configuration is loaded. Live values are then transferred to the array, the array is activated, and the microprocessor goes to sleep. When the loop exits, the microprocessor wakes up, determines which exit was taken, retrieves live variables from the array as appropriate for that exit, and continues software execution.
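The hand-off described above can be modeled in a few lines of software. The sketch below is illustrative only: the classes Configuration and ReconfigurableArray, the function run_loop_on_array and the example loop find_first_above are hypothetical names, not part of the Garp toolchain, and the "array" simply runs a Python function standing in for the configured hardware. What it shows is the sequence: load the configuration (skipping the reload if it is already resident), copy live values in, let the array run while the processor waits, then read back only the live values that belong to the exit actually taken.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical software model of the processor/array hand-off.
# None of these names come from the Garp toolchain.


@dataclass
class Configuration:
    name: str
    # Runs the loop "on the array" until an exit is taken; returns the exit id.
    kernel: Callable[[Dict[str, int], List[int]], int]
    # Which array registers hold live values after each exit.
    live_outs: Dict[int, Tuple[str, ...]]


class ReconfigurableArray:
    def __init__(self, memory: List[int]) -> None:
        self.memory = memory                  # same memory the processor sees
        self.config: Optional[Configuration] = None
        self.regs: Dict[str, int] = {}
        self.reconfigurations = 0

    def load(self, config: Configuration) -> None:
        if self.config is not config:         # skip reload if already configured
            self.config = config
            self.reconfigurations += 1

    def activate(self) -> int:
        assert self.config is not None
        return self.config.kernel(self.regs, self.memory)


def run_loop_on_array(array: ReconfigurableArray, config: Configuration,
                      live_ins: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
    array.load(config)                        # 1. load the loop's configuration
    array.regs = dict(live_ins)               # 2. copy live values into array registers
    exit_id = array.activate()                # 3. array runs while the processor sleeps
    # 4. processor wakes, sees which exit was taken, reads that exit's live-outs
    return exit_id, {r: array.regs[r] for r in config.live_outs[exit_id]}


def find_first_above(regs: Dict[str, int], mem: List[int]) -> int:
    """Loop with two exits: ran past the end (exit 0) or value found (exit 1)."""
    i = regs["i"]
    while i < regs["n"]:
        if mem[i] > regs["limit"]:
            regs["i"] = i
            return 1
        i += 1
    regs["i"] = i
    return 0


cfg = Configuration("find_first_above", find_first_above, {0: (), 1: ("i",)})

array = ReconfigurableArray([3, 8, 1, 12, 5])
print(run_loop_on_array(array, cfg, {"i": 0, "n": 5, "limit": 10}))  # (1, {'i': 3})
print(run_loop_on_array(array, cfg, {"i": 0, "n": 5, "limit": 20}))  # (0, {})
```

Making the live-out set depend on the exit id mirrors the point that the processor retrieves live variables "as appropriate for that exit": after exit 0 nothing needs to be copied back in this example.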

The Garp-specific part of the compiler begins by recognizing natural loops in the control flow graph. All loops are treated as WHILE loops, that is, loops with data-dependent exits. From each loop we form a hyperblock, which contains only the commonly executed paths in the loop. Only the hyperblock is implemented on the array.
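The article does not say how "commonly executed" is decided. The following is a minimal sketch under an assumed, purely frequency-based rule: starting from the loop header, a block is kept if its profiled execution count is at least some fraction of the header's count and it is reachable through other kept blocks. The function name form_hyperblock, the threshold parameter and the example block names are hypothetical; a real compiler would likely also weigh whether each block can be implemented on the array at all.

```python
from typing import Dict, List, Set


def form_hyperblock(header: str,
                    succs: Dict[str, List[str]],
                    counts: Dict[str, int],
                    threshold: float = 0.1) -> Set[str]:
    """Grow a hyperblock from the loop header, keeping only 'hot' loop blocks.

    `counts` holds profiled execution counts for the blocks of one loop
    (blocks outside the loop have no entry). A block is kept if it runs at
    least `threshold` times as often as the header and is reachable from the
    header through blocks that were already kept.
    """
    min_count = threshold * counts[header]
    selected = {header}
    worklist = [header]
    while worklist:
        block = worklist.pop()
        for nxt in succs.get(block, []):
            if nxt in counts and nxt not in selected and counts[nxt] >= min_count:
                selected.add(nxt)
                worklist.append(nxt)
    return selected


# H: loop header, A: common path, B: rare path (e.g. error handling),
# L: latch with the back edge to H, EXIT: code after the loop.
succs = {"H": ["A", "B"], "A": ["L"], "B": ["L"], "L": ["H", "EXIT"]}
counts = {"H": 1000, "A": 990, "B": 10, "L": 1000}
print(sorted(form_hyperblock("H", succs, counts)))  # ['A', 'H', 'L']
```

In this sketch the rare block B is left out, so reaching it would be treated as one more exit from the array back to software, consistent with only the hyperblock being implemented on the array.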

Next, a data-flow graph is constructed from the basic blocks selected for the hyperblock. As the data-flow graph is built, all control dependence is converted to data dependence through the introduction of predicates. The optimized data-flow graph is then implemented as a direct, fully spatial network of modules on the array. Our fully spatial approach allows groups of adjacent data-flow graph nodes to be merged into optimized modules, simplifying pipelining.
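Converting control dependence into data dependence is the classic if-conversion transformation. The two hypothetical functions below show it on a tiny loop body: the branch on x < 0 becomes a predicate value, both arms are computed unconditionally (on the array they would simply be two modules side by side), and a multiplexer driven by the predicate picks the result. The multiplexer is written arithmetically so that the converted version contains no control flow at all.

```python
def body_branching(x: int, acc: int) -> int:
    # Original loop body: which update runs is control-dependent on (x < 0).
    if x < 0:
        acc = acc - x
    else:
        acc = acc + 2 * x
    return acc


def body_predicated(x: int, acc: int) -> int:
    # If-converted body: the branch condition becomes a predicate value.
    p = int(x < 0)
    # Both arms are evaluated unconditionally (in parallel, in hardware).
    then_val = acc - x
    else_val = acc + 2 * x
    # A multiplexer driven by p selects the result: now a data dependence.
    return p * then_val + (1 - p) * else_val


print(body_branching(-3, 10), body_predicated(-3, 10))  # 13 13
print(body_branching(4, 10), body_predicated(4, 10))    # 18 18
```

Evaluating both arms costs extra hardware, but it removes the branch from the critical path, which is what lets the whole hyperblock be laid out as one spatial data-flow network.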

In Garp, pipelining is accomplished mainly by rescheduling the execution of the modules on the array; this rescheduling is one of the last compilation steps.

Pipelining characteristics

Pipelined computation, whether in software or hardware, characteristically has a prologue: the period from the start of computation until the first iteration completes, at which point the pipeline reaches a steady state.

With VLIW processors, a number of schemes for handling the prologue have been put forward. One category emits the prologue as separate code, with inactive pipeline stages simply eliminated. Another solution, termed "kernel-only," uses the same instructions for both the prologue and the steady-state execution, but uses stage predicates to suppress inactive stages during the prologue. The sequencing technique used by the Garp compiler more closely resembles the latter, since every iteration follows the same schedule, even one started during the prologue.
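The kernel-only idea can be seen in a cycle-by-cycle trace. The sketch below is a generic model of a VLIW kernel-only schedule, not Garp's sequencer: every cycle issues the same kernel containing all stages, stage s works on iteration (cycle - s), and its stage predicate is true only if that iteration exists. The function name kernel_only_trace is hypothetical. In this model the same predicates also suppress stages while the pipeline drains at the end, which is how kernel-only schemes avoid separate epilogue code as well.

```python
from typing import List, Tuple


def kernel_only_trace(num_iters: int, num_stages: int) -> List[List[Tuple[int, int]]]:
    """Per-cycle list of (stage, iteration) pairs that actually do work.

    The same kernel (all stages) is issued every cycle; a stage whose
    predicate is false is suppressed rather than removed from the code.
    """
    trace = []
    for cycle in range(num_iters + num_stages - 1):
        active = []
        for stage in range(num_stages):
            iteration = cycle - stage
            stage_predicate = 0 <= iteration < num_iters
            if stage_predicate:
                active.append((stage, iteration))
        trace.append(active)
    return trace


for cycle, work in enumerate(kernel_only_trace(num_iters=4, num_stages=3)):
    print(cycle, work)
# 0 [(0, 0)]                            <- prologue: stages 1 and 2 suppressed
# 1 [(0, 1), (1, 0)]
# 2 [(0, 2), (1, 1), (2, 0)]            <- steady state
# 3 [(0, 3), (1, 2), (2, 1)]
# 4 [(1, 3), (2, 2)]
# 5 [(2, 3)]
```

Every iteration k executes stage s at cycle k + s, including iterations started during the prologue, which is the "same schedule for every iteration" property the text attributes to Garp's sequencing.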

Software pipelining of counted loops traditionally involves generating an epilogue: special code that finishes the iterations still in progress when the last iteration is started. The Garp compiler instead takes a simpler approach that needs no epilogue. It lets the last iteration complete on the array, then discards the work that has started speculatively on subsequent iterations. This is possible because the Garp architecture supports speculative execution.
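A small simulation makes the "no epilogue" approach concrete. It is a sketch under assumed parameters, not a model of Garp's actual timing: the hypothetical function pipelined_sentinel_sum starts one iteration per cycle of a WHILE loop that sums values up to a 0 sentinel, and each iteration's exit test resolves exit_latency cycles after it starts. When the exit is taken, the younger iterations already in flight are simply dropped. Loads past the end of the data are modeled as squashed (None) rather than faulting, standing in for hardware support for speculative loads.

```python
from typing import List, Optional, Tuple


def pipelined_sentinel_sum(mem: List[int], exit_latency: int = 2) -> Tuple[int, int]:
    """Sum mem[0], mem[1], ... up to (not including) the first 0.

    One iteration starts per cycle; iteration k's exit test resolves at
    cycle k + exit_latency. Returns (sum, number of speculative iterations
    discarded at loop exit).
    """
    in_flight: List[Tuple[int, Optional[int]]] = []  # (start_cycle, loaded value)
    total = 0
    cycle = 0
    while True:
        # Start iteration `cycle` speculatively; a load past the end is
        # squashed (None) instead of faulting.
        value = mem[cycle] if cycle < len(mem) else None
        in_flight.append((cycle, value))
        start, v = in_flight[0]
        if cycle - start == exit_latency:        # oldest iteration's test is done
            in_flight.pop(0)
            if v is None:
                raise ValueError("no sentinel found")
            if v == 0:                           # exit taken
                return total, len(in_flight)     # younger iterations are discarded
            total += v                           # commit non-speculative work
        cycle += 1


print(pipelined_sentinel_sum([4, 5, 6, 0, 9, 9]))                  # (15, 2)
print(pipelined_sentinel_sum([4, 5, 6, 0, 9, 9], exit_latency=0))  # (15, 0)
```

The discarded iterations never commit anything, so the loop result is the same regardless of latency; only the amount of wasted speculative work changes.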

It is not surprising that pipelining is more straightforward with Garp's reconfigurable hardware, since pipelining is essentially a spatial concept.

VLIW challenge

We realized that for a VLIW processor, moving the data through the virtual pipeline is even more of a challenge than processing the data. While reconfigurable hardware handles this naturally, VLIW architectures typically require special support (rotating register files) to sustain the same effective transfer bandwidth.
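To see why rotating register files help, the sketch below models one in software. It is a generic illustration (Intel Itanium is a well-known processor with this feature), and the class name, the size and the choice to decrement the base each iteration are assumptions of this sketch. Logical register r maps to physical register (r + base) mod size, and the base moves by one at every iteration boundary, so a value written to r0 in one iteration is read as r1 in the next and as r2 two iterations later, with no explicit register-to-register copies. On Garp's array the same data movement happens for free on the wires and registers between pipelined modules.

```python
from typing import List, Optional


class RotatingRegisterFile:
    """Hypothetical model: logical register r maps to physical (r + base) % size."""

    def __init__(self, size: int) -> None:
        self.phys: List[Optional[int]] = [None] * size
        self.base = 0

    def _map(self, logical: int) -> int:
        return (logical + self.base) % len(self.phys)

    def read(self, logical: int) -> Optional[int]:
        return self.phys[self._map(logical)]

    def write(self, logical: int, value: int) -> None:
        self.phys[self._map(logical)] = value

    def rotate(self) -> None:
        # Called at each iteration boundary: this iteration's r0 becomes next
        # iteration's r1, r1 becomes r2, and so on.
        self.base = (self.base - 1) % len(self.phys)


rf = RotatingRegisterFile(4)
seen = []
for i in range(5):
    rf.write(0, i)          # value produced by this iteration
    seen.append(rf.read(2)) # value produced two iterations earlier
    rf.rotate()
print(seen)  # [None, None, 0, 1, 2]
```

Without rotation, the compiler would have to insert moves (or unroll the loop and rename registers) to keep values from different iterations apart, which is the transfer overhead the paragraph refers to.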

The compiler techniques were adapted from VLIW compilation, and in many cases simplified in the process. We consider it to be a positive result that our compiler can use simpler techniques to achieve greater parallelism in many loops, targeting what is in fact a simpler architecture than most contemporary VLIW architectures.

Timothy Callahan and John Wawrzynek

University of California
