Global Sources
EE Times-Asia

Improve software through memory-oriented code optimisation

Posted: 24 Nov 2014

Keywords: software code, compiler, SIMD, StarCore, DSPs

This is where custom calling conventions may be used with a given ABI to further improve performance or, as in this example, to reduce code size and increase performance at the same time. Suppose that the user has now altered the calling convention within the ABI used for these two procedures. Let's call this new, user-specified calling convention "user_calling_convention". Rather than passing only the first two parameters from the caller to the callee in registers, with subsequent parameters passed via the stack, user_calling_convention may pass up to eight parameters from the caller to the callee via registers, namely R00-R07.

In doing this, the tools must account for the additional registers used for parameter passing and for the bookkeeping required on both the caller and callee sides. For this user's example code, which passes a large number of parameters from caller to callee, a benefit can nevertheless be gained. Figure 3 illustrates the assembly code the user could expect to be generated with the user_calling_convention customisation.

Referring to Figure 3, the assembly generated by the compiler under user_calling_convention looks quite different from that of the default calling convention. Permitting the build tools to pass additional parameters between caller and callee in registers drastically reduces the number of instructions generated for each procedure. Specifically, the caller_procedure requires far fewer moves to the stack before the call to the callee_procedure. Because additional hardware registers are now available to the calling convention, values from the caller's memory space can simply be loaded into registers before the call, rather than being loaded into registers and then copied onto the stack (possibly with an explicit adjustment of the stack pointer).

Figure 3: Example assembly language based caller procedure (optimised).

Similarly, the callee_procedure has shed a number of instructions compared with the previous example's generated assembly. Once again, this is because parameters are now passed from the caller to the callee via the register file, rather than being pushed onto and pulled off the stack.

As such, the callee no longer needs the extra instructions that copy parameters from the stack into registers for local computation. In this particular example, performance is likely to improve because fewer instructions execute dynamically at run-time, and code size is also reduced because fewer instructions are present statically in the executable.
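The saving can be approximated with a little arithmetic. The following is a minimal sketch in C rather than the article's assembly, under stated assumptions: stack_passed_params() is a hypothetical helper, every parameter is assumed to occupy exactly one register or one stack slot, and the eight-parameter signature and body of callee_procedure are illustrative only. How a real toolchain selects a custom calling convention is tool-specific and is not shown here.

```c
#include <stdio.h>

/* Hypothetical model, not the StarCore ABI: each parameter takes one slot,
 * and the first reg_slots parameters travel in registers. Returns how many
 * parameters are left to be passed via the stack. */
unsigned stack_passed_params(unsigned total_params, unsigned reg_slots)
{
    return total_params > reg_slots ? total_params - reg_slots : 0u;
}

/* Illustrative eight-parameter callee; the body is an assumption. */
int callee_procedure(int a, int b, int c, int d, int e, int f, int g, int h)
{
    return a + b + c + d + e + f + g + h;
}

/* Illustrative caller: under a two-register convention, six of these
 * arguments would be stored to the stack before the call. */
int caller_procedure(void)
{
    return callee_procedure(1, 2, 3, 4, 5, 6, 7, 8);
}

/* Prints the stack traffic per call for both conventions in this model. */
void print_stack_traffic(void)
{
    printf("default convention:      %u stack-passed parameters\n",
           stack_passed_params(8u, 2u));
    printf("user_calling_convention: %u stack-passed parameters\n",
           stack_passed_params(8u, 8u));
}
```

In this simplified model the default convention leaves six parameters to be written to the stack by the caller and read back by the callee on every call, while user_calling_convention leaves none. Real instruction counts also depend on alignment, parameter sizes and register pressure, which the model ignores.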

While this example has shown how custom calling conventions can be used as part of an embedded system's larger ABI to reduce code size and tailor memory optimisation, a number of other concepts may also come into play.

Several related subjects are beyond the scope of this example and are left as further reading: spill code insertion by the compiler; the compiler's ability to compute stack frame sizes so that it can use standard MOVE instructions to and from the stack frame rather than PUSH/POP-style instructions; and SIMD-style move operations to the stack, which increase instruction density and so further improve performance and reduce code size overhead.

Caveat emptor: compiler optimisation orthogonal to code size!
When compiling code for a production release, developers often want to exploit as much compile-time optimisation of their source code as possible in order to achieve the best performance. While building projects with -Os as an option will tune the code for optimal code size, it may also restrict the amount of optimisation the compiler performs, because some optimisations increase code size. As such, a user may want to keep an eye out for errant optimisations, typically performed around loop nests, and selectively disable them case by case rather than disable them for an entire project build. Most compilers support a list of pragmas that can be inserted to control compile-time behaviour. Examples of such pragmas can be found in the documentation for your target processor's build tools.
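As an illustration of disabling one optimisation for a single routine, here is a minimal sketch that uses GCC's option pragmas. This is an assumption made for the sake of a concrete example: the pragmas offered by your target's build tools will be spelled differently, and sum_samples() is a hypothetical routine. The guard makes the pragmas apply only when compiling with GCC itself.

```c
#include <stddef.h>

/* Assumption: GCC-style pragmas stand in for the vendor-specific pragmas
 * the text refers to. Other compilers skip this guarded section. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
/* Ask GCC not to unroll loops in the functions that follow, so this loop
 * stays compact even if the rest of the build enables unrolling. */
#pragma GCC optimize ("no-unroll-loops")
#endif

/* Hypothetical routine whose loop we want to keep small. */
long sum_samples(const int *samples, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; ++i)
        total += samples[i];
    return total;
}

#if defined(__GNUC__) && !defined(__clang__)
/* Restore the project-wide options for the rest of the file. */
#pragma GCC pop_options
#endif
```

Scoping the setting with push_options and pop_options keeps the change local to one function, which matches the advice to disable optimisations case by case rather than for the whole build. Comparing the generated assembly or the map file with and without the pragma is the only reliable way to confirm that it helps on a given target.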
