EE Times-Asia

Explore 8-bit, 32-bit MCU options for IoT apps

Posted: 06 Jan 2016


Architecture specifics
We've now painted our basic picture. Assuming there is both an ARM and an 8051-based MCU with the required peripherals, the ARM device will be a better choice for a large system or an application where ease-of-use is an important factor. If low cost/size is the primary requirement, then an 8051 device will be a better choice. Now it's time to look at a more detailed analysis of applications where each architecture excels and where our general guidelines break down.

There is a noticeable difference in interrupt and function-call latency between the two architectures, with 8051 being faster than an ARM Cortex-M core. In addition, having peripherals on the Advanced Peripheral Bus (APB) can also impact latency since data must flow across the bridge between the APB and the AMBA High-Performance Bus (AHB). Finally, many Cortex-M-based MCUs require the APB clock to be divided when high-frequency core clocks are used, which increases peripheral latency.

I created a simple experiment in which an interrupt is triggered by an I/O pin. The ISR does some signaling on pins and updates a flag based on which pin triggered the interrupt. I then measured several parameters, shown in the following table. The 32-bit implementation is listed here.

The 8051 core shows an advantage in Interrupt Service Routine (ISR) entry and exit times. However, as the ISR gets bigger and its execution time increases, those delays will become insignificant. In keeping with the established theme, the larger the system gets, the less the 8051 advantage matters. In addition, the advantage in ISR execution time will swing to the ARM core if the ISR involves a significant amount of data movement or math on integers wider than 8 bits. For example, an ADC ISR that updates a 16- or 32-bit rolling average with a new sample would probably execute faster on the ARM device.
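As a sketch of the kind of ISR math that favors the 32-bit core, consider a 16-bit sample folded into a 32-bit exponential moving average. The function and register-free framing below are illustrative (the hardware ISR wiring is omitted); on a Cortex-M each line compiles to a handful of single-cycle 32-bit operations, while an 8051 must synthesize them from 8-bit steps.

```c
#include <stdint.h>

/* Hypothetical rolling-average state: the accumulator holds the
 * average scaled by 16 (average << 4) for extra precision. */
static uint32_t avg_acc;

/* Body of a (hypothetical) ADC interrupt handler: fold a new 16-bit
 * sample into the average with alpha = 1/16. On a Cortex-M this is
 * one subtract, one shift, and one add, all native 32-bit ops. */
void adc_update_average(uint16_t sample)
{
    avg_acc += (uint32_t)sample - (avg_acc >> 4);
}

/* Current average, scaled back down to sample units. */
uint16_t adc_get_average(void)
{
    return (uint16_t)(avg_acc >> 4);
}
```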

Control vs. processing
The fundamental competency of an 8051 core is control code, where the accesses to variables are spread around and a lot of control logic (if, case, etc.) is used. The 8051 core is also very efficient at processing 8-bit data while an ARM Cortex-M core excels at data processing and 32-bit math. In addition, the 32-bit data path enables efficient copying of large chunks of data since an ARM MCU can move 4 bytes at a time while the 8051 has to move it 1 byte at a time. As a result, applications that primarily stream data from one place to another (UART to CRC or to USB) are better-suited to ARM processor-based systems.
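The 4-bytes-versus-1-byte point can be made concrete with two copy loops. This is a sketch, not either vendor's library code: the word loop assumes word-aligned buffers and a length that is a multiple of 4, which is the case where the 32-bit data path pays off most directly.

```c
#include <stdint.h>
#include <stddef.h>

/* Copy a buffer one 32-bit word at a time, the way a Cortex-M moves
 * data with word loads/stores. Assumes both buffers are word-aligned
 * and len_bytes is a multiple of 4. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t len_bytes)
{
    for (size_t i = 0; i < len_bytes / 4; i++)
        dst[i] = src[i];            /* 4 bytes per iteration */
}

/* The byte-at-a-time loop an 8051 is limited to: four times as many
 * iterations for the same payload. */
void copy_bytes(uint8_t *dst, const uint8_t *src, size_t len_bytes)
{
    for (size_t i = 0; i < len_bytes; i++)
        dst[i] = src[i];            /* 1 byte per iteration */
}
```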

Consider this simple experiment. I compiled the function below on both architectures, varying the operand type among uint8_t, uint16_t, and uint32_t.

uint32_t funcB(uint32_t testA, uint32_t testB)
{
    return (testA * testB) / (testA - testB);
}

| Data type | 32-bit (-O3) | 8-bit    |
| uint8_t   | 20 bytes     | 13 bytes |
| uint16_t  | 20 bytes     | 20 bytes |
| uint32_t  | 16 bytes     | 52 bytes |

As the data size increases, the 8051 core requires more and more code to do the job, eventually surpassing the size of the ARM function. The 16-bit case is pretty much a wash in terms of code size, and slightly favors the 32-bit core in execution speed since equal code generally represents fewer cycles. It's also important to note that this comparison is only valid when compiling the ARM code with optimization. Un-optimized code is several times larger.

This doesn't mean applications with a lot of data movement or 32-bit math shouldn't be done on an 8051 core. In many cases, other considerations will outweigh the efficiency advantage of the ARM core, or that advantage will be irrelevant. Consider the implementation of a UART-to-SPI bridge. This application spends most of its time copying data between the peripherals, a task the ARM core will do much more efficiently. However, it's also a very small application, probably small enough to fit into a 2 KB part.

Even though an 8051 core is less efficient, it still has plenty of processing power to handle high data rates in that application. The extra cycles available to the ARM device are probably going to be spent sitting in an idle loop or a "WFI" (wait for interrupt), waiting for the next piece of data to come in. In this case, the 8051 core still makes the most sense, since the extra CPU cycles are worthless while the smaller flash footprint yields cost savings. If we had something useful to do with the extra cycles, then the extra efficiency would be important, and the scales may tip in favor of the ARM core. This example illustrates how important it is to view each architecture's strengths in the context of what the system being developed cares about. It's a simple but important step to making the best decision.
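The bridge's main loop can be sketched in a few lines. The peripheral accessors below are simulated with plain C arrays so the sketch is self-contained; their names (`uart_rx_ready`, `uart_read`, `spi_write`) are invented for illustration, not taken from any real part's header.

```c
#include <stdint.h>
#include <stddef.h>

/* Simulated peripherals standing in for real register accesses. */
static const uint8_t uart_fifo[] = { 0x10, 0x20, 0x30 };
static size_t uart_head;
static uint8_t spi_log[8];
static size_t spi_count;

static int uart_rx_ready(void)  { return uart_head < sizeof uart_fifo; }
static uint8_t uart_read(void)  { return uart_fifo[uart_head++]; }
static void spi_write(uint8_t b){ spi_log[spi_count++] = b; }

/* Core of the UART-to-SPI bridge: drain whatever the UART has
 * received into the SPI. After this returns, the CPU would sleep --
 * WFI on a Cortex-M, idle mode on an 8051 -- until the next receive
 * interrupt, which is where the ARM core's spare cycles go. */
void bridge_poll(void)
{
    while (uart_rx_ready())
        spi_write(uart_read());
}
```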

8051 devices do not have a unified memory map like ARM devices, and instead have different instructions for accessing code (flash), IDATA (internal RAM) and XDATA (external RAM). To enable efficient code generation, a pointer in 8051 code will declare what space it's pointing to. However, in some cases, we use a generic pointer that can point to any space, and this style of pointer is inefficient to access. For example, consider a function that takes a pointer to a buffer and sends that buffer out the UART. If the pointer is an XDATA pointer, then an XDATA array can be sent out the UART, but an array in code space would first need to be copied into XDATA. A generic pointer would be able to point to both code and XDATA space, but is slower and requires more code to access.
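The pointer trade-off looks like this in Keil C51 syntax, where the memory-space keywords are compiler extensions rather than standard C (the function names here are invented for illustration):

```c
/* Keil C51 sketch -- xdata/code qualifiers are compiler extensions. */
void uart_send_x(unsigned char xdata *buf, unsigned char len); /* XDATA-only: compact 2-byte pointer, fast access */
void uart_send_g(unsigned char *buf, unsigned char len);       /* generic: 3-byte pointer, every access dispatches on space */

unsigned char xdata rambuf[16];          /* lives in XDATA */
unsigned char code  table[] = {1, 2, 3}; /* lives in flash  */

/* uart_send_x(rambuf, 16);  -- fine
 * uart_send_x(table, 3);    -- not allowed; table must be copied into XDATA first
 * uart_send_g(table, 3);    -- works for either space, but slower and larger code */
```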
