Improve software through memory-oriented code optimisation
Keywords: software code, compiler, SIMD, StarCore, DSPs
Variable-length instruction encoding is one particular technology that a given target architecture may support, and one that the build tools can exploit effectively to reduce overall code size. In variable-length encoding schemes, certain instructions within the target processor's ISA have what are referred to as "premium encodings", whereby the most commonly used instructions can be represented in a reduced binary footprint. One example might be a 32-bit embedded Power Architecture device on which frequently used instructions, such as integer add, are also represented with a premium 16-bit encoding. When the source application is compiled for size optimisation, the build tools attempt to map as many instructions as possible to their premium encoding counterparts, reducing the overall footprint of the resulting executable.
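To make the mechanics concrete, the following C sketch shows how a tool or simulator might walk such an instruction stream. The width rule used here (a single flag bit in the first halfword) is purely hypothetical and far simpler than the opcode-page rules of a real ISA such as Power Architecture VLE; it illustrates only the general idea of mixing 16-bit and 32-bit encodings in one stream.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical width rule: if the top bit of the first halfword is 0,
 * the instruction is a 16-bit "premium" encoding; otherwise it is a
 * full 32-bit encoding. Assumes a well-formed instruction stream. */
static size_t instruction_bytes(const uint16_t *pc)
{
    return (pc[0] & 0x8000u) ? 4 : 2;
}

/* Walk an instruction stream and count premium versus full encodings,
 * as a size-profiling tool might when reporting encoding density. */
static void count_encodings(const uint16_t *code, size_t halfwords,
                            size_t *premium, size_t *full)
{
    size_t i = 0;
    *premium = 0;
    *full = 0;
    while (i < halfwords) {
        if (code[i] & 0x8000u) {
            (*full)++;
            i += 2;     /* 32-bit instruction: two halfwords */
        } else {
            (*premium)++;
            i += 1;     /* 16-bit premium instruction: one halfword */
        }
    }
}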
Freescale Semiconductor supports this feature in its Power Architecture cores for embedded computing, as well as in its StarCore line of DSPs. Other embedded processor designs, such as those from ARM Limited and Texas Instruments' DSP lines, have also employed variable encoding formats for premium instructions in an effort to curb the resulting executable's code size footprint.
In the case of Freescale's Power Architecture, Freescale states that standard 32-bit code and 16-bit premium-encoded code can be mixed interchangeably within the executable at flash-page granularity. Other architectures may opt to specify the encoding within some form of prefix bits, allowing an even finer level of code intermingling.
It should be mentioned that the reduced-footprint premium encoding of instructions in a variable-length encoding architecture often comes at the cost of reduced functionality. This is a direct consequence of the smaller number of bits afforded in encoding the instruction, often reduced from 32 bits to 16 bits.
An example of a non-premium versus a premium encoding might be an integer arithmetic ADD instruction. In the non-premium-encoded variant of the instruction, the source and destination operands of the ADD may be any of the 32 general-purpose integer registers in the target architecture's register file. In the premium-encoded variant, where only 16 bits of encoding space are afforded, the ADD instruction may be permitted to use only R0-R7 as source and destination registers, reducing the number of bits consumed by the source and destination register encodings. Although it may not be readily apparent to the application programmer, this can result in subtle, albeit minor, performance degradations. These are often due to additional copy instructions required to shuffle source and destination operands between registers in the assembly schedule because of the restrictions placed on the premium-encoded variants.
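The bit budget behind this restriction can be illustrated with a small C sketch. The opcode values and field positions below are invented for illustration rather than taken from any actual Power Architecture encoding; the point is simply that three 5-bit register fields fit comfortably in a 32-bit word, while a 16-bit word leaves room for only 3-bit fields, and hence only registers R0-R7.

#include <stdint.h>
#include <assert.h>

/* Full 32-bit form: a 6-bit opcode plus three 5-bit register fields,
 * so any of r0-r31 may be named (the remaining bits are unused here). */
static uint32_t encode_add32(unsigned rd, unsigned ra, unsigned rb)
{
    assert(rd < 32 && ra < 32 && rb < 32);
    return (0x1Fu << 26) | (rd << 21) | (ra << 16) | (rb << 11);
}

/* Premium 16-bit form: a 7-bit opcode plus three 3-bit register fields.
 * The bit budget only reaches r0-r7; r8-r31 simply cannot be encoded. */
static uint16_t encode_add16(unsigned rd, unsigned ra, unsigned rb)
{
    assert(rd < 8 && ra < 8 && rb < 8);
    return (uint16_t)((0x20u << 9) | (rd << 6) | (ra << 3) | rb);
}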
As evidence of the benefits and potential drawbacks of using variable-length encoding instruction set architectures as a vehicle for code size reduction, benchmarking of typical embedded codes targeting Power Architecture devices has shown VLE (variable-length encoding) code to be approximately 30% smaller in code footprint than standard Power Architecture code, while exhibiting only a 5% reduction in performance. Minor degradations of this kind are typical, owing to the reduced functionality available in the shorter encoding format of an instruction.
Floating-point arithmetic, and its emulation in software, can be another less obvious source of code size explosion. Consider the case in which the user's source code contains loops of intensive floating-point arithmetic while targeting an architecture that lacks native floating-point hardware. To support the floating-point arithmetic functionally, the build tools often need to substitute in code that performs floating-point emulation at program run-time. This typically entails trapping to a floating-point emulation library that provides the required functionality, such as floating-point division, using the non-floating-point instructions natively supported on the target architecture.
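A one-line C function makes the mechanism visible. Compiled for a target without an FPU (for example, with GCC's -msoft-float), the division below does not become a divide instruction; the compiler instead emits a call to a runtime helper (with GCC, the single-precision divide routine __divsf3 from libgcc), and that emulation routine, with everything it drags in, is linked into the final executable.

/* With no floating-point hardware (e.g. built with GCC's -msoft-float),
 * this division compiles to a call into the compiler's runtime support
 * library rather than to a single divide instruction. */
float scale(float numerator, float denominator)
{
    return numerator / denominator;   /* becomes a call such as __divsf3 */
}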
As one might predict, it is not uncommon for a given floating-point emulation routine to require hundreds of target processor clock cycles, executing tens if not hundreds of emulation instructions for a single floating-point operation. In addition to the obvious performance overhead versus code targeting a processor with native floating-point hardware, a significant code size increase results from the inclusion of floating-point emulation libraries or inlined emulation code. By correctly matching the types of arithmetic in the source application to the native hardware support of the target architecture, reductions in the overall executable size can be achieved with some effort.
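One common way of matching the arithmetic to integer-only hardware is to recast fractional computation in fixed point. The following minimal Q15 sketch keeps a scaling operation entirely in the native integer datapath, so no emulation library needs to be pulled into the image; saturation of the -1.0 * -1.0 corner case is omitted for brevity.

#include <stdint.h>

/* Q15 fixed point: a 16-bit integer x represents the fraction x/32768.
 * All arithmetic stays in the integer datapath the target supports
 * natively, so no floating-point emulation library is required. */
typedef int16_t q15_t;

static inline q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* 16x16 -> 32-bit */
    return (q15_t)((product + (1 << 14)) >> 15); /* round, rescale */
}

/* Example: attenuate a sample by 0.75 (0.75 * 32768 = 24576 in Q15). */
static inline q15_t attenuate(q15_t sample)
{
    return q15_mul(sample, (q15_t)24576);
}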
Tuning the ABI for code size
In software engineering, the application binary interface (ABI) is the low-level software interface between a given program and the operating system, system libraries, and even inter-module communication within the program itself. The ABI is a specification of how a given system represents items such as data types, data sizes, alignment of data elements and structures, calling conventions, and related modes of operation. In addition, a given ABI may specify the binary format of object files and program libraries. The calling convention and alignment rules are of particular interest to those wishing to reduce overall code size, for instance by adopting a custom calling convention within their application.
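The alignment side of this is easy to demonstrate. Under a typical 32-bit ABI the compiler pads structure members to their natural alignment; compiler extensions such as GCC's and Clang's packed attribute override the ABI's padding rules, trading data size against potentially slower or larger code for the resulting unaligned accesses. A small C example, assuming a conventional 32-bit ABI:

#include <stdint.h>
#include <stdio.h>

/* Under a typical 32-bit ABI the members below are padded to their
 * natural alignment, giving sizeof == 12 rather than the 8 bytes the
 * members themselves occupy. */
struct natural {
    uint8_t  flag;   /* 1 byte, then 3 bytes of padding */
    uint32_t count;  /* 4 bytes */
    uint8_t  mode;   /* 1 byte, then 1 byte of padding */
    uint16_t id;     /* 2 bytes */
};

/* GCC/Clang extension overriding the ABI's padding: sizeof == 8, at
 * the cost of unaligned member accesses, which some targets handle
 * with extra instructions (code size) or extra cycles (performance). */
struct __attribute__((packed)) squeezed {
    uint8_t  flag;
    uint32_t count;
    uint8_t  mode;
    uint16_t id;
};

int main(void)
{
    printf("natural: %zu bytes, packed: %zu bytes\n",
           sizeof(struct natural), sizeof(struct squeezed));
    return 0;
}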