Omniscient C compiler boosts PIC32 RISC performance

Posted: 25 Jul 2008

Keywords: omniscient code generation, PIC32 RISC CPU, ANSI C compiler

At Microchip Technology's MASTERS Conference (July 23), HI-TECH Software took the wraps off an "omniscient" ANSI C compiler for 32bit MCU code that it claims boosts real-time response by more than 25 percent and nearly doubles code density.

The new HI-TECH C PRO compiler for Microchip's PIC32 MCU uses a new technique called omniscient code generation (OCG) to optimize stack and register allocation across all code modules prior to generating the object code. Smaller code generally executes more quickly and requires smaller, less expensive flash memory for storage.

OCG collects comprehensive data on every register, stack, pointer, object and variable declaration across the entire program. It uses this information to optimize register usage, stack allocations and pointers across the whole program. It also ensures consistent variable and object declarations between modules and deletes unused variables and functions.
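The removal of unused functions can be pictured as a reachability walk over the whole program's call graph. The following is a minimal sketch under stated assumptions, not HI-TECH's implementation: the fixed-size call matrix, `MAX_FUNCS` and `mark_reachable` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a tiny program with at most MAX_FUNCS functions. */
enum { MAX_FUNCS = 8 };

/*
 * calls[i][j] is true when function i calls function j.
 * Marks every function reachable from `root` in `used`.
 * Anything left unmarked is a candidate for deletion.
 */
void mark_reachable(bool calls[MAX_FUNCS][MAX_FUNCS], size_t count,
                    size_t root, bool used[MAX_FUNCS])
{
    assert(count <= MAX_FUNCS && root < count);
    if (used[root])
        return;                 /* already visited; also stops on call cycles */
    used[root] = true;
    for (size_t j = 0; j < count; j++) {
        if (calls[root][j])
            mark_reachable(calls, count, j, used);
    }
}
```

In a real embedded program the walk would presumably start from every entry point, including interrupt handlers, not only from `main`. A compiler that sees one module at a time cannot build this graph, which is why the article ties the feature to whole-program analysis.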

According to CEO and company founder Clyde Stubbs, the compiler's performance on the PIC32 bears out the company's belief that the OCG technique should deliver even better performance and code density improvements on 32bit register-based MCUs than it has achieved on the 8bit and 16bit MCUs where the company previously focused its OCG efforts.

Because PIC32 is based on a MIPS Technologies 32bit core, he believes that the performance improvements achieved should be repeatable on most other MIPS architectural derivatives, as well as many other RISC-based designs. "Right now we are being somewhat conservative and are confining ourselves to architectures that have a clear and large following in the embedded systems market."

Next on the company's agenda is the 32bit RISC ARM architecture, with a particular focus on the ARM Cortex-M3, which is targeted specifically at embedded applications. There, as with most other 32bit RISC CPUs, said Stubbs, code is most often generated one module at a time, using variations of GNU Compiler Collection (GCC) techniques.

Because GCC generates code one module at a time, he said, no comprehensive cross-module data is available. "But without knowing how objects are used across the whole program, it is impossible to achieve the same level of optimization as an OCG compiler," said Stubbs.

In code density benchmarks, the company's OCG compiler achieved code that can be as much as 40 percent smaller than that generated using industry-leading GCC-based PIC32 compilers. "The smaller code size can cut device costs by reducing the amount of on-chip flash required," he said.

Stubbs pointed out that GCC-based 32bit compilers are constrained as to which registers can be used to store parameters for called functions. "Whenever a function is called from another code module, the parameters of that function are usually stored in the registers," said Stubbs, referring to the four specific registers reserved for this purpose in GCC-based compilers.

The problem is that if the function has more than four parameters, the additional parameters must be stored on and passed to the called function using the stack (in RAM), a cycle-intensive process that degrades performance and leads to increased RAM usage.
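To make the four-register limit concrete, the sketch below models a calling convention that passes the first four word-sized parameters in registers and the rest on the stack. The names `ARG_REGS`, `stack_params` and `sum6` are illustrative assumptions, not part of GCC or HI-TECH C PRO, and the model ignores the extra rules real conventions have for 64-bit and floating-point arguments.

```c
#include <assert.h>

/* Assumption: four argument registers, as described in the article. */
enum { ARG_REGS = 4 };

/* Number of word-sized parameters that end up on the stack. */
int stack_params(int param_count)
{
    assert(param_count >= 0);
    return param_count > ARG_REGS ? param_count - ARG_REGS : 0;
}

/*
 * A function that exceeds the limit. Under the modeled convention,
 * a..d would travel in registers while e and f go through RAM.
 */
int sum6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}
```

Here `stack_params(6)` evaluates to 2, matching the two parameters of `sum6` that would not fit in the reserved registers.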

Faster interrupt handling
By comparison, he said, interrupt-intensive code generated by omniscient code compilation typically requires 26 percent fewer cycles for the PIC32 to execute than code compiled using a non-OCG compiler.

By reducing the number of CPU cycles spent moving data between the registers and stack, HI-TECH's OCG compiler effectively gives the CPU a 26 percent performance boost. More important, called functions frequently call other functions, which may, in turn, call other functions.

"This is particularly true for interrupt-intensive applications," said Stubbs. "For example, if the code calls a function, which then calls a second function, the parameters for the first function will have to be saved to the stack to make room for the parameters for the second function."

If this second function calls a third function, the parameters for the second function will also have to be saved to the stack to make room for the parameters of the third function.

"Data will have to be shifted continuously between the stack and the registers," he said. "The penalty for this is at least a cycle every time data is moved to or from the stack, or 8 cycles to move the data for a single four-parameter function to the stack and back to the registers."
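The 8-cycle figure follows from simple arithmetic: four parameters, one cycle each to reach the stack and one cycle each to come back. The sketch below works that out and extends it to a chain of nested calls. The function names are hypothetical, and the one-cycle-per-move cost is the minimum quoted above, not a measured value.

```c
#include <assert.h>
#include <stddef.h>

/* Cycles to save `params` values to the stack and restore them later. */
unsigned spill_cycles(unsigned params, unsigned cycles_per_move)
{
    return 2u * params * cycles_per_move;   /* one move out, one move back */
}

/*
 * Total spill cost for a chain of nested calls. params[i] is the
 * parameter count of the i-th function in the chain. Every function
 * except the innermost must preserve its parameters across the call
 * it makes, so the last entry contributes nothing.
 */
unsigned chain_spill_cycles(const unsigned *params, size_t depth,
                            unsigned cycles_per_move)
{
    unsigned total = 0;
    assert(params != NULL || depth == 0);
    for (size_t i = 0; i + 1 < depth; i++)
        total += spill_cycles(params[i], cycles_per_move);
    return total;
}
```

With one cycle per move, `spill_cycles(4, 1)` gives the 8 cycles cited for a single four-parameter function, and a chain of three such functions costs 16 cycles under this simplified model.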

Even if other registers are available, the GCC compiler allocates the extra parameters to the stack once the fixed set of four registers is full. This process wastes both cycles and RAM. It also results in code bloat due to the extra instructions required to save function parameters to the stack.

In contrast, with OCG compilation, said Stubbs, there is perfect knowledge of the register usage of each function. At any point in the program, it knows which registers are available and which registers are not available, and can optimize register usage without any arbitrary constraints.

"When there are two or three deep function calls, it allocates parameters for different functions into non-overlapping register sets, often eliminating the need to store parameters into memory completely," he said.
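One way to picture non-overlapping register sets is a simple first-fit assignment along a call chain. This is only a sketch of the idea under stated assumptions: `assign_disjoint`, the flat register pool and the -1 "use the stack" marker are hypothetical, and a real whole-program allocator would be far more involved.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Gives each function in a call chain its own block of registers.
 * params[i] is the parameter count of function i; base[i] receives
 * the first register index assigned to it, or -1 if the pool is
 * exhausted and the parameters would have to go to the stack.
 * Returns how many functions received registers.
 */
size_t assign_disjoint(const int *params, size_t depth,
                       int pool_size, int *base)
{
    int next = 0;           /* first register not yet handed out */
    size_t placed = 0;

    for (size_t i = 0; i < depth; i++) {
        assert(params[i] >= 0);
        if (next + params[i] <= pool_size) {
            base[i] = next;
            next += params[i];
            placed++;
        } else {
            base[i] = -1;   /* no room left: fall back to the stack */
        }
    }
    return placed;
}
```

For a chain with 4, 3 and 2 parameters and an assumed pool of 12 registers, the three functions receive blocks starting at registers 0, 4 and 7, so no caller has to save its parameters before making the next call.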

"This results in better utilization of the available registers, fewer cycles wasted moving parameters between the stack and the registers, and less RAM usage. It also contributes to smaller code size by reducing or eliminating the need for code to save registers to the stack."

With the use of OCG, the HI-TECH C PRO knows the register usage of every function in the entire program, including interrupts and any functions that are called by the interrupt code.

"It also knows exactly which registers need to be saved and restored for each interrupt routine. The OCG compiler saves only those registers that are necessary, reducing the size of the interrupt context switching code, and decreasing the number of cycles required to execute the interrupt routine."
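The idea of saving only the registers an interrupt routine can actually disturb can be sketched with bitmasks. The code below is an illustration under stated assumptions: each bit stands for one CPU register in a made-up numbering, and `isr_save_mask` and `regs_to_save` are hypothetical helpers, not HI-TECH C PRO internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * clobbers[i] has one bit set for each register written by the i-th
 * function reachable from the interrupt routine (the routine itself
 * included). The union of those masks is the set that must be saved
 * on entry and restored on exit.
 */
uint32_t isr_save_mask(const uint32_t *clobbers, size_t count)
{
    uint32_t mask = 0;
    assert(clobbers != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        mask |= clobbers[i];
    return mask;
}

/* Number of registers in the save set (count of set bits). */
unsigned regs_to_save(uint32_t mask)
{
    unsigned n = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

If an interrupt routine writes registers 2 and 3 and its one callee writes registers 3 and 4, the save set is three registers. Without knowledge of the callee, a compiler would have to save every register the callee might conceivably touch.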

Improving memory optimization
Since the HI-TECH C PRO compiler knows the usage of every instance of every variable in the program, it has the ability to optimize the allocation of every variable between the stack and the registers. The optimization is based on the frequency of use of each variable.

Variables that are used intensively can be allocated permanently to registers, which have no cycle penalty at all. All register and stack allocations are always optimized to elicit the best overall performance for the entire program. This highly refined optimization of memory both boosts performance and minimizes power consumption by keeping frequently used data in locations that have the shortest access time.
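Frequency-based allocation can be illustrated with a greedy ranking: sort variables by use count and give the most-used ones the available registers. The article does not describe the compiler's actual heuristic, so the greedy rule, `struct var_info` and `allocate_by_frequency` below are assumptions made for the sake of the example.

```c
#include <assert.h>
#include <stdlib.h>

struct var_info {
    const char *name;
    unsigned uses;      /* how often the variable is accessed */
    int in_register;    /* 1 = register, 0 = stack */
};

/* qsort comparator: higher use counts sort first. */
static int by_uses_desc(const void *a, const void *b)
{
    const struct var_info *x = a;
    const struct var_info *y = b;
    return (x->uses < y->uses) - (x->uses > y->uses);
}

/*
 * Sorts vars by descending use count and places the first
 * free_regs of them in registers; the rest go to the stack.
 */
void allocate_by_frequency(struct var_info *vars, size_t count,
                           size_t free_regs)
{
    assert(vars != NULL || count == 0);
    if (count == 0)
        return;
    qsort(vars, count, sizeof vars[0], by_uses_desc);
    for (size_t i = 0; i < count; i++)
        vars[i].in_register = (i < free_regs);
}
```

Given three variables with 2, 90 and 40 uses and two free registers, the variables with 90 and 40 uses land in registers and the one with 2 uses stays on the stack.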

HI-TECH C PRO for the PIC32 MCU Family is available now through September 30 for the introductory price of $1,595, after which it will sell for $1,995. A fully functional 45-day trial version can be downloaded, free of charge, at HI-TECH's Web site.

- Bernard Cole
