Improve software through memory-oriented code optimisation
Keywords: software code, compiler, SIMD, StarCore, DSPs
Software pipelining is one optimisation that can increase code size, because additional instructions are inserted before and after the body of the transformed loop. When the compiler or assembly programmer software-pipelines a loop, overlapping iterations of the loop nest are scheduled concurrently, with associated "set up" and "tear down" code inserted before and after the loop body.
These additional instructions inserted in the set up and tear down, or prologue and epilogue as they are often referred to in the compiler community, can result in increased instruction counts and code sizes. Typically a compiler will offer a pragma such as "#pragma noswp" to disable software pipelining for a given loop nest, or given loops within a source code file. Users may want to utilise such a pragma on a loop-by-loop basis to reduce increases in code size associated with select loops that may not be performance-critical or on the dominant run-time paths of the application.
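The effect on code size can be seen in a minimal hand-written sketch. The function names `scale_plain` and `scale_pipelined` are hypothetical and chosen only for illustration, and a real compiler would perform this transformation on the scheduled machine code rather than on the C source. The two-stage split (load, then multiply and store) is likewise an assumption made to keep the example small.

```c
#include <stddef.h>

/* Plain loop: each iteration loads, multiplies, then stores. */
void scale_plain(const int *in, int *out, size_t n, int k)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * k;
    }
}

/* Hand-pipelined sketch: the load for iteration i + 1 overlaps with the
 * multiply and store for iteration i. The extra code before and after
 * the loop is the prologue and epilogue described in the text. */
void scale_pipelined(const int *in, int *out, size_t n, int k)
{
    if (n == 0) {
        return;
    }

    /* Prologue ("set up"): start the first iteration's load. */
    int loaded = in[0];

    /* Kernel: steady state, executed n - 1 times. */
    for (size_t i = 0; i + 1 < n; i++) {
        int next = in[i + 1];   /* stage 1 of iteration i + 1 */
        out[i] = loaded * k;    /* stage 2 of iteration i */
        loaded = next;
    }

    /* Epilogue ("tear down"): finish the last iteration. */
    out[n - 1] = loaded * k;
}
```

Both functions compute the same result, but the pipelined form carries a guard, a prologue and an epilogue in addition to the kernel, which is where the extra instructions come from.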
Loop unrolling is another fundamental compiler loop optimisation that often increases the run-time performance of loop nests. Unrolling a loop so that multiple iterations reside in the loop body exposes additional instruction-level parallelism for the compiler to schedule on the target processor. In addition, fewer branches with branch delay slots must be executed to cover the entire iteration space of the loop nest, which can further improve the performance of the loop.
Because multiple iterations of the loop are cloned and inserted into the loop body by the compiler, however, the body of the loop nest typically grows as a multiple of the unroll factor. Users wishing to maintain a modest code size may wish to selectively disable loop unrolling for certain loops within their production code, at the cost of run-time performance of the compiled code. By selecting those loop nests that are not on the performance-critical path of the application, savings in code size can be achieved without impacting performance along the dominant run-time path of the application.
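As a rough illustration of this growth, the sketch below unrolls a summation loop by hand with an unroll factor of four. The function names `sum_rolled` and `sum_unrolled4` are hypothetical, and a compiler would normally apply the transformation itself rather than require it in the source.

```c
#include <stddef.h>

/* Rolled loop: one add and one branch per element. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Unrolled by four: the loop body holds four copies of the original
 * statement, so it executes roughly a quarter of the branches but the
 * body is about four times larger. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++) {
        total += v[i];
    }
    return total;
}
```

Note that the unrolled version also needs a remainder loop for trip counts that are not a multiple of the unroll factor, which adds further to the code size.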
Typically compilers will support pragmas to control loop unrolling-related behaviour, such as the minimum number of iterations a loop will execute or various unroll factors to pass to the compiler. Pragmas that disable loop unrolling are often of the form "#pragma nounroll". Please refer to your compiler's documentation for the correct syntax for this and related functionality.
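The exact spelling differs between toolchains, so the following is only a sketch for two widely used compilers, not the syntax of any particular DSP toolchain. It assumes Clang's `#pragma nounroll` and GCC's `#pragma GCC unroll 1` (available from GCC 8); the `NO_UNROLL` macro and the function `clear_log_buffer` are hypothetical names introduced for this example.

```c
#include <stddef.h>

/* Select a "do not unroll" pragma for the compiler in use. Clang is
 * tested first because it also defines __GNUC__. On other compilers
 * the macro expands to nothing and the loop is left to the optimiser. */
#if defined(__clang__)
#define NO_UNROLL _Pragma("nounroll")
#elif defined(__GNUC__) && (__GNUC__ >= 8)
#define NO_UNROLL _Pragma("GCC unroll 1")
#else
#define NO_UNROLL
#endif

/* A loop assumed not to be performance-critical, so code size is
 * favoured over speed by asking the compiler not to unroll it. */
void clear_log_buffer(unsigned char *buf, size_t n)
{
    NO_UNROLL
    for (size_t i = 0; i < n; i++) {
        buf[i] = 0;
    }
}
```

Applying the pragma loop by loop in this way leaves unrolling enabled for the hot loops elsewhere in the file.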
Procedure inlining is another optimisation that aims to improve the performance of compiled code at the cost of compiled code size. When a procedure is inlined, the body of the callee procedure is physically copied into the body of the caller procedure at the call site. Consider the example in Figure 4.
Instead of making a call to callee_procedure() every time caller_procedure() is invoked, the compiler may opt to directly substitute the body of callee_procedure into the body of caller_procedure to avoid the overhead associated with the function call. In doing this, the statement a + b will be substituted into the body of caller_procedure in the hope of improving run-time performance by eliminating the function call overhead, and hopefully providing better instruction cache performance. If this inlining is performed for all call sites of callee_procedure within the application, however, one can see how multiple inlinings can quickly lead to an explosion in the size of the application, especially for examples where callee_procedure contains more than a simple addition statement.
Figure 4: Candidate function inlining use case.
As such, users may wish to manually disable function inlining for their entire application or for selected procedures via a compiler-provided pragma. Typical pragmas are of the form "#pragma noinline" and will prevent the tools from inlining the marked procedure at compilation time.
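The inlining scenario discussed above can be sketched as follows. This is not a reproduction of Figure 4: only the procedure names and the a + b statement come from the text, while the signatures, the `caller_procedure_inlined` function and the `NOINLINE` macro are assumptions made for illustration. The macro uses the `noinline` function attribute of GCC and Clang as a stand-in for a pragma of the "#pragma noinline" form; other toolchains will use different syntax.

```c
/* Map NOINLINE to the GCC/Clang function attribute where available;
 * on other compilers it expands to nothing. */
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Callee: marked so that the compiler keeps a single out-of-line copy
 * instead of duplicating the body at every call site. */
NOINLINE int callee_procedure(int a, int b)
{
    return a + b;
}

/* Caller: performs a real function call to callee_procedure. */
int caller_procedure(int a, int b)
{
    return callee_procedure(a, b);
}

/* What the caller would effectively look like if the compiler had
 * inlined the call: the statement a + b is substituted directly. */
int caller_procedure_inlined(int a, int b)
{
    return a + b;
}
```

With a one-line callee the inlined form is no larger than the call, but for a callee of any real size every inlined call site adds another copy of its body to the application.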
Used with permission from Morgan Kaufmann, a division of Elsevier, Copyright 2012, this article was excerpted from Software Engineering for Embedded Systems, written and edited by Robert Oshana and Mark Kraeling.
About the author
Dr. Michael C. Brogioli is principal and founder at Polymathic Consulting as well as an adjunct professor of computer engineering at Rice University, Houston, Texas. Prior to Polymathic, he was senior member of the technical staff and chief architect at Freescale Semiconductor. He also served in several roles at TI's Advanced Architecture and Chip Technology Research and Intel's Advance Microprocessor Research Lab. He holds a PhD/MSc in electrical and computer engineering from Rice as well as a BSc in electrical engineering from Rensselaer Polytechnic Institute.