EE Times-Asia

Improve software through memory layout optimisation

Posted: 25 Nov 2014

Keywords: algorithms, memory optimisation, compiler, SIMD, vectorisation

Read Part 1 of this series here.

To achieve sufficient levels of performance, application developers and software systems architects must not only select the appropriate algorithms to use in their applications, but also the means by which those applications are implemented. Quite often this also crosses the line into data structure design, layout and memory partitioning for optimal system performance.

It is true that senior developers often have insight into both algorithms and their complexity, as well as a tool-box of tips and tricks for memory and data structure optimisation. At the same time, the scope of most embedded software engineering projects prohibits hand optimisation of all code and data due to time, resource and cost constraints.

As such, developers must often rely on the tools as much as possible to optimise the general use cases, only resorting to hand-level tuning and analysis to tweak performance on those performance-critical bottlenecks after the initial round of development.

This last round of optimisation often entails using system profiling metrics to identify performance-critical bottlenecks, and then optimising those portions of the application by hand using proprietary intrinsics or assembly code, and in some cases rewriting performance-critical kernel algorithms and/or related data structures. This article follows on from the topics discussed in Part 1 and details design decisions that may prove useful for embedded system developers concerned with the issues mentioned above.

Overview of memory optimisation
Memory optimisations of various types often benefit the run-time performance, and even the power consumption, of a given embedded application. As mentioned previously, these optimisations can often be performed to varying degrees by the application build tools, such as compilers, assemblers, linkers and profilers. Alternatively, it is often valuable for developers either to tune performance manually or to design with memory system optimisation in mind from the outset, whether to meet given performance targets or to make the software architecture amenable to automated-tool optimisation in later phases of the development cycle.

In tuning a given application, a baseline or "out of box" version is usually developed first. Once functionality is brought online, the development team may choose to profile the application for bottlenecks that require further optimisation. Often these are known without profiling, for example when certain kernels within the application must execute within a given number of clock cycles, as determined by a spreadsheet or pen-and-paper exercise during system definition. Once the key kernels or key data structures are isolated, optimisation typically begins, carried out by experts with knowledge of software optimisation techniques, compiler optimisations, the hardware target and perhaps the details of its instruction set.
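The spreadsheet-style cycle budget described above can be sketched in a few lines of C. This is only an illustration: the function names `cycle_budget` and `needs_optimisation` are hypothetical, and the clock frequency and deadline would come from the system definition of the actual target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cycles available to a kernel that must complete within one deadline.
 * clock_hz:    core clock frequency in Hz (assumed known from the target)
 * deadline_us: time allowed for the kernel, in microseconds */
uint64_t cycle_budget(uint64_t clock_hz, uint64_t deadline_us)
{
    return (clock_hz * deadline_us) / 1000000u;
}

/* True when a measured (or estimated) cycle count exceeds the budget,
 * marking the kernel as a candidate for hand optimisation. */
bool needs_optimisation(uint64_t measured_cycles, uint64_t budget_cycles)
{
    return measured_cycles > budget_cycles;
}
```

As an example under assumed numbers, a 600 MHz core with a 125 microsecond deadline gives `cycle_budget(600000000, 125)` = 75,000 cycles; a kernel measured at 90,000 cycles would then be flagged before any profiling is done.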

Focusing optimisation efforts
Amdahl's law plays an interesting role in the optimisation of full application stacks, however, and is not always appreciated by the software system developer. If only 10% of the dynamic run-time of a given application can benefit from SIMD or instruction-level parallel optimisations, while the remaining 90% must be executed sequentially, then even inordinate amounts of effort spent parallelising the 10% portion will yield only modest performance improvements. Conversely, if 90% of the application's total dynamic run-time is spent in code regions exhibiting large amounts of instruction-level and data-level parallelism, it may be worthwhile to focus engineering effort on parallelising these regions to improve dynamic run-time performance.
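To put numbers on the two cases, Amdahl's law gives the overall speedup as 1 / ((1 - p) + p / s), where p is the fraction of run-time that benefits from the optimisation and s is the speedup achieved on that fraction. The following is a minimal sketch in C; the function names and the 8x figure used afterwards are illustrative assumptions, not values from the article.

```c
#include <assert.h>

/* Overall speedup predicted by Amdahl's law.
 * p: fraction of run-time that benefits from the optimisation (0..1)
 * s: speedup achieved on that fraction (> 0) */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Upper bound on the overall speedup as s grows without limit:
 * 1 / (1 - p). Requires p < 1. */
double amdahl_limit(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

With p = 0.10 and an assumed 8x speedup on that portion, `amdahl_speedup(0.10, 8.0)` is about 1.10, and `amdahl_limit(0.10)` shows that no amount of effort can exceed roughly 1.11x. With p = 0.90 the same 8x speedup yields about 4.7x overall, with an upper bound of 10x.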
