MCU architecture touts optimized peripheral data management

Posted: 12 Jul 2007

Keywords: 32bit MCU architecture, embedded applications, ultralow power standard, ARM Cortex-M3

To meet today's market requirements, MCUs must offer continually increasing performance and embedded functionality. Consequently, technology shrinks, design flows and embedded systems are increasing in complexity to meet the demand for higher performance.

However, most embedded applications today do not need to operate at high frequencies.

Although 32bit MCUs now offer a large choice of peripherals, their architecture is not always suited to managing these peripherals optimally. Internal bus bandwidth is often the system bottleneck, and increasing the number of peripherals or integrating high data rate peripherals increases bus activity. Raising the operating frequency can partially solve this bandwidth limitation, but at a high cost in power consumption. To compensate, technology shrinks, ultralow power standard cells and/or clock gating during synthesis flow are used to reduce dynamic power.

STM32, the new 32bit MCU family introduced by STMicroelectronics, is claimed to address these challenges. By combining optimized peripheral data management at the architecture level with an ARM Cortex-M3 core, which provides high performance (1.25DMIPS/MHz) and improved code density, STM32 allows optimum use of all its embedded resources. Moreover, a tightly coupled Nested Vectored Interrupt Controller reduces the interrupt latency to a maximum of 12 CPU cycles (6 CPU cycles in inter-interrupt management with tail-chaining).
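To put these cycle counts in perspective, the short calculation below converts them to time and throughput. It is illustrative arithmetic only: it assumes the 72MHz clock quoted elsewhere in this article, and the constant and function names are made up for the example.

```python
# Illustrative arithmetic only. The 72 MHz clock is an assumption taken from
# the maximum frequency quoted in the article; all names are hypothetical.
CPU_HZ = 72_000_000
DMIPS_PER_MHZ = 1.25
IRQ_LATENCY_CYCLES = 12   # worst-case interrupt latency
TAIL_CHAIN_CYCLES = 6     # back-to-back interrupts with tail-chaining


def cycles_to_ns(cycles, clock_hz):
    """Convert a number of clock cycles to nanoseconds."""
    return cycles * 1e9 / clock_hz


dmips = DMIPS_PER_MHZ * CPU_HZ / 1e6
irq_ns = cycles_to_ns(IRQ_LATENCY_CYCLES, CPU_HZ)
tail_ns = cycles_to_ns(TAIL_CHAIN_CYCLES, CPU_HZ)

print(f"Peak throughput at 72 MHz: {dmips:.0f} DMIPS")
print(f"Worst-case interrupt latency: {irq_ns:.1f} ns")
print(f"Tail-chained interrupt latency: {tail_ns:.1f} ns")
```

Under these assumptions the worst-case interrupt latency is well under a quarter of a microsecond, and halves when interrupts are tail-chained.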

Intuitive architecture
The STM32F10x embeds on-chip resources that are said to exceed most of today's market requirements for a 32bit MCU with 128Kbytes of flash, 20Kbytes of SRAM and a wide range of peripherals (USB, CAN, USART, ADC, PWM, SPI, I²C, Timers, RTC and DMA). For optimum management of these on-chip resources, direct memory access (DMA) and the Cortex-M3 core are connected to the memories and peripherals using a bus-matrix multilayer architecture that is illustrated in Figure 1.

Figure 1: STM32, the new 32bit MCU family introduced by STMicroelectronics combines optimized peripheral data management at the architecture level with an ARM Cortex-M3 core.

One master layer of the bus matrix is used for the DMA and two layers for the Cortex-M3 core. Memories (flash and SRAM) and peripherals are independently connected to a slave port. If two masters want to access the same slave, arbitration is performed using a round-robin algorithm. The master with the highest priority is served while the lower priority master waits until the end of the on-going transfer to be served. The three-layer bus matrix enables concurrent transfer from one master to a slave at a maximum rate of 288MBps at 72MHz.
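The behavior described above can be sketched as a small simulation. This is a toy model, not ST's arbiter: it assumes a plain per-slave round-robin, one 32bit (4-byte) transfer per cycle per layer, and the master and slave names are hypothetical labels for the three layers.

```python
# Toy model of a multilayer bus matrix with per-slave round-robin arbitration.
# Assumptions (not from the article): master/slave names, the exact rotation
# rule, and one 4-byte transfer per clock cycle per layer.

class BusMatrix:
    def __init__(self, masters, slaves):
        self.masters = list(masters)
        # For each slave, index of the master that gets first pick next time.
        self.next_first = {slave: 0 for slave in slaves}

    def arbitrate(self, requests):
        """requests maps master -> requested slave for one cycle.

        Returns a dict mapping each requested slave to the granted master.
        Masters targeting different slaves are all granted in the same cycle.
        """
        grants = {}
        for slave, start in self.next_first.items():
            contenders = [m for m in self.masters if requests.get(m) == slave]
            if not contenders:
                continue
            order = self.masters[start:] + self.masters[:start]
            winner = next(m for m in order if m in contenders)
            grants[slave] = winner
            # Rotate so the winner has the lowest claim on this slave next time.
            self.next_first[slave] = (self.masters.index(winner) + 1) % len(self.masters)
        return grants


CLOCK_HZ = 72_000_000
BYTES_PER_TRANSFER = 4  # assumed 32bit data bus
peak_mbps_per_layer = CLOCK_HZ * BYTES_PER_TRANSFER // 1_000_000

matrix = BusMatrix(["CPU_A", "CPU_B", "DMA"], ["FLASH", "SRAM", "PERIPH"])
# Three masters, three different slaves: all served concurrently.
print(matrix.arbitrate({"CPU_A": "FLASH", "CPU_B": "PERIPH", "DMA": "SRAM"}))
# Two masters contend for SRAM: one is granted, the other waits a cycle.
print(matrix.arbitrate({"CPU_B": "SRAM", "DMA": "SRAM"}))
print(matrix.arbitrate({"CPU_B": "SRAM", "DMA": "SRAM"}))
print(f"Peak rate per layer: {peak_mbps_per_layer} MBps")
```

With a 4-byte transfer every cycle at 72MHz, the model reproduces the 288MBps figure quoted above for a single master-to-slave path.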

Peripheral management optimization
STM32F10x peripherals have been split between two ARM Advanced Peripheral Buses (APB), for optimum power consumption and efficiency. Peripheral clocks, including system peripherals on the ARM Advanced High-performance Bus (AHB), can be independently and dynamically enabled and pre-scaled.

To save power, slower peripherals can be grouped on the same bus running at low frequency. CPU and peripheral clocks are fully synchronous to remove the latency introduced by the synchronization stage (a few cycles) between peripherals and CPU interrupts/events or DMA requests. As a result, interrupts and DMA requests are sampled on the first cycle and can be served much more efficiently.

Bus bridges are also optimized to reduce latency between AHB and APB buses. Read access is optimized so that the AHB wait state is released on the last cycle of the APB cycle, reducing transfer latency by a few cycles. A write buffer has been inserted in the bridge to release the master while the bridge is managing the transfer. Write transfer is done without wait states on the AHB bus, even if a large pre-scaler ratio is applied between AHB and APB to release the AHB for the next transfer.
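The effect of the write buffer can be illustrated with a rough model. The sketch below is not a description of the actual bridge: it assumes a one-entry buffer, assumes an APB write occupies the bridge for as many AHB cycles as the pre-scaler ratio, and uses a hypothetical function name.

```python
# Rough model of AHB master stalls caused by writes through an AHB-to-APB bridge.
# Assumptions (not from the article): a one-entry write buffer, and an APB write
# that occupies the bridge for `ratio` AHB cycles (the AHB/APB pre-scaler ratio).

def ahb_stall_cycles(write_times, ratio, buffered):
    """Return the total AHB cycles the master spends stalled.

    write_times: sorted AHB cycle numbers at which the master would issue
                 each write if it were never stalled.
    ratio:       AHB cycles needed to complete one APB write.
    buffered:    True if the bridge has a write buffer that releases the
                 master immediately.
    """
    stalls = 0
    bridge_free_at = 0  # first AHB cycle at which the bridge accepts a write
    for t in write_times:
        issue = t + stalls                  # earlier stalls delay the master
        start = max(issue, bridge_free_at)  # wait if the bridge is still busy
        if buffered:
            stalls += start - issue         # released as soon as the data is buffered
        else:
            stalls += (start - issue) + ratio  # held until the APB write completes
        bridge_free_at = start + ratio
    return stalls


writes = [0, 20, 40]  # three well-spaced peripheral writes
print("unbuffered stalls:", ahb_stall_cycles(writes, ratio=8, buffered=False))
print("buffered stalls:  ", ahb_stall_cycles(writes, ratio=8, buffered=True))
```

In this model the buffered bridge stalls the master only when writes arrive faster than the APB can drain them, which matches the article's claim for isolated writes but says nothing about the real bridge's behavior under back-to-back traffic.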

Figure 2: Several tasks are performed by the CPU to manage an SPI in receive mode.

Peripheral configuration is done by the CPU. However, data can be managed by the Cortex-M3 core or by the DMA. Managing data using the CPU through interrupts has several drawbacks, such as latency between the interrupt generation and the data read/write inside the peripheral. Nevertheless, having both the CPU and the DMA available for data management enables concurrent transfers for optimum use of on-chip peripherals. For example, the following are the tasks performed by the CPU to manage an SPI in receive mode:

• SPI generates a receive interrupt
• CPU enters the corresponding interrupt handler
• CPU manages interrupt (clear)
• CPU reads the data from the peripheral
• CPU writes this data in the SRAM
• CPU returns from interrupt handler to the main.

These tasks are represented in system cycles in Figure 2.
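The CPU-driven sequence listed above can be mimicked in a few lines of host-side code. This is a behavioral sketch only: `FakeSpi`, `spi_rx_handler` and the rule that reading the data register clears the receive flag are assumptions made for the example, not the STM32 register interface.

```python
# Host-side simulation of interrupt-driven SPI reception.
# FakeSpi and spi_rx_handler are hypothetical; clearing the flag on read is
# an assumption, not a statement about the real peripheral.

class FakeSpi:
    """Stand-in for the receive side of an SPI peripheral."""

    def __init__(self):
        self.data_register = 0
        self.rx_flag = False  # models the pending receive interrupt

    def receive(self, byte):
        """Hardware side: a byte arrives and raises the receive interrupt."""
        self.data_register = byte
        self.rx_flag = True

    def read_data(self):
        """CPU side: reading the data register clears the interrupt flag."""
        self.rx_flag = False
        return self.data_register


def spi_rx_handler(spi, sram_buffer):
    """Interrupt handler: clear the interrupt, read the data, store it in SRAM."""
    if spi.rx_flag:
        value = spi.read_data()    # manage interrupt and read the peripheral
        sram_buffer.append(value)  # write the data to SRAM
    # returning from this function models the return to the main program


spi = FakeSpi()
sram_buffer = []
for byte in (0x12, 0x34):
    spi.receive(byte)                 # SPI generates a receive interrupt
    spi_rx_handler(spi, sram_buffer)  # CPU enters the handler
print([hex(b) for b in sram_buffer])
```

Every received byte costs the CPU a full handler entry and exit, which is the overhead the DMA path described next is meant to avoid.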

One of the seven DMA channels can be used to manage a peripheral's data, minimizing the latency of data management. For example, the following are the main tasks performed by the DMA once the peripheral and DMA have been initialized by the CPU for management of an SPI in transmit mode:

• DMA samples the request
• DMA performs arbitration between all requests and serves the request with the highest priority
• DMA generates a read access to the memory
• DMA generates a write access to the peripheral and acknowledges the request at the same time.

These tasks are also represented in system cycles in Figure 2.
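The DMA steps listed above can be sketched the same way. Again this is a simplified model under stated assumptions: the class and function names are hypothetical, a larger number means a higher priority, and writing to the peripheral is assumed to acknowledge its request.

```python
# Simplified model of one DMA service step for memory-to-peripheral transfers.
# All names are hypothetical; "larger number = higher priority" and
# "write acknowledges the request" are assumptions for the example.

class FakeTxPeripheral:
    def __init__(self):
        self.request = False  # DMA request line
        self.sent = []        # values written to the transmit register

    def write(self, value):
        self.sent.append(value)
        self.request = False  # the write acknowledges the request


class DmaChannel:
    def __init__(self, priority, memory, peripheral):
        self.priority = priority
        self.memory = memory  # source buffer in SRAM
        self.index = 0        # next element to transfer
        self.peripheral = peripheral

    def pending(self):
        return self.peripheral.request and self.index < len(self.memory)


def dma_step(channels):
    """Serve at most one request; return the served channel or None."""
    pending = [ch for ch in channels if ch.pending()]  # sample the requests
    if not pending:
        return None
    ch = max(pending, key=lambda c: c.priority)        # arbitrate by priority
    value = ch.memory[ch.index]                        # read access to memory
    ch.index += 1
    ch.peripheral.write(value)                         # write access + acknowledge
    return ch


spi_a, spi_b = FakeTxPeripheral(), FakeTxPeripheral()
low = DmaChannel(priority=1, memory=[0xA0, 0xA1], peripheral=spi_a)
high = DmaChannel(priority=3, memory=[0xB0], peripheral=spi_b)

spi_a.request = spi_b.request = True
first = dma_step([low, high])   # both pending: the higher priority wins
second = dma_step([low, high])  # only the lower-priority request remains
print(spi_b.sent, spi_a.sent)
```

No CPU code runs per transferred item in this model; the CPU's only role is the initial setup of the channels and peripherals.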

In a traditional architecture, the CPU manages the first peripheral and then the DMA manages the other, or vice versa. The two cannot be done simultaneously. The multilayer concept is quite different and enables the STM32 to manage both tasks in parallel (Figure 2). As a result, where a traditional architecture requires a minimum of five cycles to manage the two transfers, the STM32F10x architecture reduces it to only three cycles. The CPU is able to perform a read of the peripheral while the DMA is performing a read of SRAM, followed by the CPU performing a write to SRAM while the DMA is performing a write to the peripheral. The number of cycles used (bandwidth) to perform these accesses on SRAM and the peripheral bus is therefore reduced. This enables the STM32F10x to manage more peripherals at the same time, or to increase the performance of a peripheral while keeping system frequency constant. With the support of concurrent data management, it is possible to handle several transfers in parallel: the DMA can manage peripherals while the CPU is able to access peripherals and memories during free cycles on the bus.
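One way to account for the five-cycle and three-cycle figures is to write both schedules out cycle by cycle. The timelines below are an assumption, not a transcription of Figure 2: they suppose the DMA spends one cycle sampling and arbitrating its request before its two bus accesses, and the slave names are generic labels.

```python
# Cycle-by-cycle schedules for one CPU transfer (peripheral -> SRAM) and one
# DMA transfer (SRAM -> peripheral). Assumption: the DMA needs one internal
# cycle (no bus access, shown as None) before its read and write.

SERIAL = [                       # traditional architecture: one master at a time
    {"DMA": None},
    {"DMA": "SRAM"},
    {"DMA": "PERIPH"},
    {"CPU": "PERIPH"},
    {"CPU": "SRAM"},
]

PARALLEL = [                     # multilayer bus matrix
    {"DMA": None},
    {"DMA": "SRAM", "CPU": "PERIPH"},
    {"DMA": "PERIPH", "CPU": "SRAM"},
]


def fits(timeline, multilayer):
    """Check that a schedule is legal for the given bus structure."""
    for cycle in timeline:
        targets = [slave for slave in cycle.values() if slave is not None]
        if len(targets) != len(set(targets)):
            return False         # two masters on the same slave in one cycle
        if not multilayer and len(targets) > 1:
            return False         # a single shared bus carries one access per cycle
    return True


print("serial:", len(SERIAL), "cycles, legal on a shared bus:", fits(SERIAL, False))
print("parallel:", len(PARALLEL), "cycles, legal on a shared bus:", fits(PARALLEL, False))
print("parallel legal on the bus matrix:", fits(PARALLEL, True))
```

The three-cycle schedule is legal only when each master has its own layer and the two masters target different slaves in every cycle, which is the condition the bus matrix is designed to exploit.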

The STM32F10x's concurrent transfer capability allows the optimum use of embedded resources without compromising frequency and power consumption, said ST. The Cortex-M3 core, with its advanced architecture, allows the STM32F10x to offer best-in-class code density (30 percent less than an ARM7) and power consumption (0.5mA/MHz with peripheral clock ON) at 72MHz.

This article was contributed by STMicroelectronics.
