EE Times-Asia

ARM-wrestling: MCU vs. core

Posted: 14 Jun 2006

Keywords: ARM, ARM9E, STMicroelectronics, Cortex-R4, ARM core

A pair of 32bit ARM-based embedded processors presents designers with a tough choice: take advantage of an off-the-shelf ARM9E-based solution from STMicroelectronics, or design their own high-performance solution around the latest Cortex-R4 core from ARM Ltd.

The STR910F family from STMicroelectronics combines Ethernet connectivity, an ARM9E processor core and large embedded SRAM and flash memories, along with an assortment of I/O support functions. For designers who need much higher performance, ARM's Cortex-R4 processor core leverages an advanced microarchitecture with dual-instruction issue to deliver more than 600 Dhrystone MIPS when implemented in a performance-optimized 90nm process flow (based upon the ARM Artisan Advantage library).

The ARM966E-S core used in ST's STR910F series delivers better performance than previous ARM7-based solutions, said Mark Rootz, marketing manager for ARM9-based products in the Microcontroller Division at STMicroelectronics. The core accesses its instruction and data memories over two separate internal buses, so code and data can be fetched simultaneously, he said.

"Each of these memories is attached to the core through a highly optimized tightly coupled memory (TCM) interface for rapid access. The STR910F exploits this architecture by placing a high-speed burst flash memory on the instruction TCM, and a zero-latency SRAM on the data TCM," Rootz said. "The result is 96 MIPS peak code execution at 96MHz." Rootz called that "the highest peak performance for general-purpose flash ARM-based MCUs."

Efficient data movement

The controller also delivers "extremely efficient data movement between the CPU core and SRAM," Rootz added.

In contrast, the ARM7TDMI CPU core shares a single bus for access to its instruction and data memories, making simultaneous access impossible. Both the ARM7TDMI and ARM966E-S cores execute the standard ARM and Thumb instruction sets.

The STR910F was given large memories to support TCP/IP and real-time operating system stacks, in addition to complex control applications, said Rootz. SRAM sizes range up to 96Kbytes, which ST maintains is the largest SRAM of any general-purpose ARM-based flash MCU on the market today. Such large SRAM blocks can hold larger packet buffers, enabling faster serial communications. The SRAM can be backed up by a battery or supercapacitor connected to the battery input pin; alternatively, for secure applications, the SRAM contents can be automatically destroyed in response to a signal on the STR910F's tamper-detection input pin.

Flash memory sizes range up to 544Kbytes, and the memory is configured as dual banks of read-while-write memory to support robust in-application programming for remote firmware updates, as well as EEPROM emulation. Each of the SRAM and flash memories may be used for either instructions or data.
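
EEPROM emulation on a dual-bank flash is commonly built as an append-only log that is compacted into the spare bank when the active bank fills. The following is a minimal sketch of that generic scheme, not ST's firmware: the class name, record layout and bank size are all hypothetical, and Python lists stand in for flash banks.

```python
ERASED = None  # stands in for an erased flash word


class EmulatedEeprom:
    """Toy model of EEPROM emulation over two flash banks (hypothetical layout)."""

    def __init__(self, records_per_bank=4):
        self.capacity = records_per_bank
        self.banks = [[ERASED] * records_per_bank for _ in range(2)]
        self.active = 0  # bank currently receiving new records
        self.used = 0    # records already programmed in the active bank

    def read(self, key, default=None):
        # The newest record for a key wins, so scan the log backwards.
        for record in reversed(self.banks[self.active][:self.used]):
            if record[0] == key:
                return record[1]
        return default

    def write(self, key, value):
        if self.used == self.capacity:
            self._compact()
        if self.used == self.capacity:
            raise MemoryError("more live keys than one bank can hold")
        # Flash cannot be overwritten without an erase, so append a new record.
        self.banks[self.active][self.used] = (key, value)
        self.used += 1

    def _compact(self):
        # Keep only the latest value of each key.
        live = {}
        for key, value in self.banks[self.active][:self.used]:
            live[key] = value
        # Program the survivors into the spare bank, then erase the old bank.
        spare = 1 - self.active
        for i, item in enumerate(live.items()):
            self.banks[spare][i] = item
        self.banks[self.active] = [ERASED] * self.capacity
        self.active = spare
        self.used = len(live)


eeprom = EmulatedEeprom(records_per_bank=4)
eeprom.write("baud", 9600)
eeprom.write("baud", 19200)
eeprom.write("id", 7)
eeprom.write("baud", 115200)
eeprom.write("id", 8)  # active bank is full, so this write compacts first
```

On real hardware the spare bank can be programmed while code keeps executing from the other bank, which is the point of the read-while-write arrangement described above.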

The STR910F MCU series supports a full set of peripherals in addition to the Ethernet media-access controller and up to 80 5V-tolerant programmable I/O lines.

The peripherals include a USB full-speed port, a CAN interface, three UART/IrDA ports, two SPI ports and two I2C ports. Also on board are an eight-channel 10bit A/D converter, four 16bit timers, a three-phase ac motor control unit, a full-featured real-time clock and an external memory interface. The MCUs also include an ETM9 debug and trace interface and supervisor functions, with low-voltage reset and brownout detect.

Cost, power savings

For its part, ARM's Cortex-R4 core runs the enhanced Thumb-2 instruction set and can trim cost and power consumption for system developers, said John Cornish, the vice president of marketing in the Processor Division at ARM.

"The core occupies less than 1mm² and consumes less than 0.27mW/MHz when fabricated in an area-optimized 90nm process flow," Cornish said. "This latest member of the Cortex processor family gives chip designers a high-performance processor for use in 3G phones, hard-disk drives, imaging and automotive systems, to name a few applications."

To get the higher performance, the Cortex-R4 employs an eight-stage pipeline vs. the five-stage pipeline in the ARM9E. Additionally, the latter stages of the pipeline in the R4 are split into four parallel pipelines. Each parallel portion of the pipeline handles different instruction types so that more can be done in parallel. The four pipelines are Load-Store, Multiply-Accumulate, Arithmetic-and-Logic and Divide. In contrast, the 9E core pipeline can only process a single instruction in each stage.

Thus, designers can either get much higher throughput by optimizing the R4 for performance, or cut operating power by matching the MIPS throughput of the 9E at a lower clock speed.
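
The trade-off can be sketched with simple arithmetic, assuming core power scales linearly with clock frequency. The 96 MIPS target and the 0.27mW/MHz bound come from the figures quoted in this article; the MIPS-per-MHz value for the R4 is an illustrative assumption, and peak MIPS and Dhrystone MIPS are not strictly comparable.

```python
def clock_for_throughput(target_mips, mips_per_mhz):
    """Clock frequency (MHz) a core needs to sustain target_mips."""
    return target_mips / mips_per_mhz


def core_power_mw(clock_mhz, mw_per_mhz):
    """Core power in mW, assuming it scales linearly with clock."""
    return clock_mhz * mw_per_mhz


TARGET_MIPS = 96.0     # STR910F peak figure quoted in the article
R4_MW_PER_MHZ = 0.27   # article's upper bound, area-optimized 90nm flow
R4_MIPS_PER_MHZ = 1.6  # ASSUMPTION for illustration; not stated in the article

r4_clock = clock_for_throughput(TARGET_MIPS, R4_MIPS_PER_MHZ)
r4_power = core_power_mw(r4_clock, R4_MW_PER_MHZ)
print(f"{r4_clock:.1f} MHz, {r4_power:.2f} mW")
```

With these placeholder values the R4 would need roughly 60MHz rather than 96MHz to reach the same throughput. The article gives no power figure for the 9E, so the sketch does not attempt a direct power comparison.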

Other improvements over the ARM9E core include the addition of branch prediction, which avoids having to flush the pipeline when a correctly predicted branch executes. Since the 9E core had no prediction, the pipeline had to be flushed each time a branch was taken, and processing cycles were lost. A direct-memory-access port was also added to the R4 core to improve data transfers over an enhanced version of the TCM interface.
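
A rough cycle-count model shows why prediction matters more as the pipeline gets deeper. This is a toy sketch: the workload, the flush penalties and the misprediction count below are hypothetical values chosen for illustration, not measurements of either core.

```python
def total_cycles(instructions, flushes, flush_penalty):
    """Cycle count for a simple in-order pipeline model.

    Each instruction costs one cycle; each pipeline flush adds
    flush_penalty wasted cycles on top.
    """
    return instructions + flushes * flush_penalty


# Hypothetical workload: 1,000 instructions containing 150 taken branches.
INSTRUCTIONS = 1000
TAKEN_BRANCHES = 150

# No prediction, shallow pipeline: every taken branch flushes (penalty assumed 2).
no_prediction = total_cycles(INSTRUCTIONS, TAKEN_BRANCHES, flush_penalty=2)

# Deep pipeline without prediction: same flushes, larger penalty (assumed 7).
deep_no_prediction = total_cycles(INSTRUCTIONS, TAKEN_BRANCHES, flush_penalty=7)

# Deep pipeline with prediction: assume only 15 of the 150 branches mispredict.
deep_with_prediction = total_cycles(INSTRUCTIONS, 15, flush_penalty=7)
```

Under these assumptions the deeper pipeline would take 2,050 cycles without prediction but 1,105 with it, against 1,300 for the shallow pipeline, which illustrates why a longer pipeline is usually paired with a branch predictor.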

To further reduce cycles, the R4 employs a 64bit Amba 3 AXI memory interface bus (the 32bit Amba AHB interface was used in the 946E-S core). The AXI bus allows the processor to issue multiple outstanding addresses and allows data to be returned out of order. This prevents a slow peripheral from blocking the bus for the duration of its access, so the core can perform additional accesses rather than wait for the slow peripheral to complete its operation. The wider interface also halves the time required for a cache line fill.
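
Both effects can be illustrated with a small idealized model. The cache line size and the access latencies are assumptions (the article states neither), and the model ignores arbitration and address-phase overhead.

```python
def beats_per_line_fill(line_bytes, bus_bits):
    """Number of data transfers (beats) needed to fill one cache line."""
    bus_bytes = bus_bits // 8
    return -(-line_bytes // bus_bytes)  # ceiling division


def completion_times(latencies, outstanding):
    """Cycle at which each access completes, all requested at cycle 0.

    outstanding=False: one access at a time, so latencies accumulate.
    outstanding=True: idealized multiple-outstanding bus with out-of-order
    return, so each access completes after its own latency.
    """
    if outstanding:
        return list(latencies)
    done, now = [], 0
    for latency in latencies:
        now += latency
        done.append(now)
    return done


LINE_BYTES = 32  # ASSUMPTION: line size is not given in the article

ahb_beats = beats_per_line_fill(LINE_BYTES, 32)  # 32bit bus
axi_beats = beats_per_line_fill(LINE_BYTES, 64)  # 64bit bus

# Hypothetical latencies: one slow peripheral access followed by two fast ones.
LATENCIES = [20, 2, 2]
blocking = completion_times(LATENCIES, outstanding=False)
overlapped = completion_times(LATENCIES, outstanding=True)
```

With a 32-byte line, the fill drops from eight beats to four on the wider bus. In the latency model, the two fast accesses finish at cycle 2 instead of cycles 22 and 24 once they no longer wait behind the slow one.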

The Cortex-R4 processor can run programs written with the ARMv7 instruction set, thus making it fully backward compatible with existing ARM code. The core is also optimized for the Thumb-2 instruction set, which together with the ARM RealView Development Suite makes it possible to reduce on-chip memory sizes by up to 30 percent, saving cost in the system.

In addition, it can produce a 40 percent performance improvement over the previous Thumb instruction set running on an ARM946E-S core, according to the company. Since memory is an ever-growing portion of a chip, this capability provides a significant savings in area and cost when designers use the core in SoC solutions.

The ARM Cortex-R4 processor is available for licensing, along with the instruction-set simulator and RealView Development Suite tools environment. The complementary technologies for implementing full SoC solutions such as the Amba 3 AXI Interconnect (PL301), configurable dynamic-memory controller (PL340), static-memory controller family (PL350) and L2 cache (L220) are all available.

The STR910F family includes six devices, all in 80- or 128-lead Pb-free packages. SRAM ranges from 64 to 96Kbytes and flash memory from 288 to 544Kbytes. The core operates at 1.8V ±10 percent, and the I/O ring at 2.7-3.6V, over a temperature range of -40°C to 85°C. Prices start at $6.99 apiece (STR910FM32X6) in quantities of 10,000.

Support from ST and third parties includes starter kits, priced from $199, from Hitex, IAR, Keil and Raisonance. These kits include a compiler and debugger (limited code size), a JTAG debugging and programming cable, code examples and all the hardware needed to begin a design.
