EE Times-Asia

Net processes shuffle design priorities

Posted: 02 Dec 2002

Keywords: network processing, RISC processor, Internet-edge processor, consumer device, network protocols

The special demands of network protocols and packet processing are a radical departure from the applications for which general-purpose processors were designed many years ago. Some of the assumptions underlying the design of general-purpose processors, and of the microcontrollers derived from them, are 180 degrees out of phase with the modern realities of network processing.

Fundamental RISC tenets such as 32-bit aligned data and load/store architectures seem outdated in a world of variable-length packets and the need to move torrents of data through a processor with minimal detours in the registers.
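As a rough illustration of the alignment problem, the C sketch below reads a 32-bit big-endian field that starts at an arbitrary byte offset inside a packet buffer. The function name `read_be32` and its parameters are hypothetical and chosen only for this example; the point is that a core restricted to aligned word loads has to assemble such a field from several byte accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian field starting at any
 * byte offset. Assembling the value byte by byte avoids an unaligned
 * word load, which a strict-alignment core cannot perform directly. */
static uint32_t read_be32(const uint8_t *pkt, size_t len, size_t off)
{
    assert(off + 4 <= len);               /* field must lie inside the packet */
    return ((uint32_t)pkt[off]     << 24) |
           ((uint32_t)pkt[off + 1] << 16) |
           ((uint32_t)pkt[off + 2] <<  8) |
            (uint32_t)pkt[off + 3];
}
```

Four byte loads plus shifts and ORs replace what would be a single load if packet fields always fell on 32-bit boundaries, which they generally do not.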

Besides dealing with the deterministic nature of packet processing at the lower levels of network protocols and the real-time nature of stream applications such as MP3, VoIP, and MPEG, another essential issue in the design of Internet-edge processors is which communication protocols to support. There are many network protocols and general-purpose communication protocols: Ethernet, Fast Ethernet, 802.11 variants, Bluetooth, HomePlug, USB, DSL variants, DOCSIS, PC Card, ISDN, GPRS, and more.

Another significant factor that affects the design of Internet-edge processors is cost. Unlike the expensive servers and routers that manage traffic deep inside the Internet, embedded networking systems are often end-user devices and must be as affordable as consumer products.

Embedding connectivity

Those considerations led our engineers to come up with a new CPU architecture specifically for embedding network connectivity in low-cost systems at the Internet edge. We hope to introduce the architecture, code-named Mercury, in a standard-product processor. Fundamentally, the architecture is still a 32-bit RISC processor. It has a Harvard architecture, fixed-length 32-bit instructions, a RISC-like pipeline, and single-cycle throughput. But from there on, it diverges.

While most other CPUs have hundreds of instructions because of their PC/workstation/server heritage, the Mercury ISA is limited to only 39 instructions. It is optimized for Internet-edge packet processing, not for running databases, compilers, word processors, spreadsheets, or games. So we can afford to create an ISA that performs very specific network-edge operations with fewer instructions, which reduces code size and can reduce or eliminate the need for external flash memory.

Perhaps the most interesting divergence from RISC in our design is its memory-to-memory architecture. Several instructions access memory twice--once to load a value from memory, and again to store the value after manipulating it.

Conventional RISC architectures shun multiple memory accesses because off-chip memory latencies have not kept pace with the rising core frequencies of CPUs. Instead, they use separate instructions to load a value from memory into a register, manipulate the value in registers, and then copy the register back to memory. That still requires two memory accesses, but separating the load and store instructions allows a program to perform multiple register-to-register operations on a value before the final store.

While the so-called load/store architecture of RISC is well-suited for the software applications that RISC processors were designed to run in the 1980s and 1990s, it is not as suitable for 21st-century Internet-edge packet processing.

Different approach

Our approach is different in several fundamental ways. First is the matter of memory-to-memory instructions. A packet processor rarely needs to perform multiple operations on a fragment of packet data between loading it from memory and storing it back to memory. So it is redundant to use separate instructions that load the packet data into a register, manipulate the data in a register-to-register fashion, and then store the data back to memory.
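The instruction-count argument can be sketched with a toy model in C. Everything here (`toy_cpu`, the two function names, the mnemonics in the comments) is hypothetical and is not the Mercury instruction set; the model only counts how many instructions each style issues for one read-modify-write of a memory word.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MEM_WORDS 16

/* Hypothetical toy machine, for illustration only. */
typedef struct {
    uint32_t mem[TOY_MEM_WORDS];  /* word-addressed data memory */
    uint32_t reg[4];              /* small register file */
    unsigned issued;              /* instructions issued so far */
} toy_cpu;

/* Load/store style: three instructions for one read-modify-write. */
static void add_imm_load_store(toy_cpu *c, unsigned addr, uint32_t imm)
{
    assert(addr < TOY_MEM_WORDS);
    c->reg[0] = c->mem[addr];     c->issued++;  /* LOAD  r0, [addr]   */
    c->reg[0] = c->reg[0] + imm;  c->issued++;  /* ADD   r0, r0, #imm */
    c->mem[addr] = c->reg[0];     c->issued++;  /* STORE [addr], r0   */
}

/* Memory-to-memory style: one instruction reads, modifies, and writes. */
static void add_imm_mem_to_mem(toy_cpu *c, unsigned addr, uint32_t imm)
{
    assert(addr < TOY_MEM_WORDS);
    c->mem[addr] = c->mem[addr] + imm;  c->issued++;  /* ADD [addr], [addr], #imm */
}
```

Both routines leave the same value in memory, but the load/store version issues three instructions where the memory-to-memory version issues one. When the data is touched only once, the register stop buys nothing.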

Typical packet-processing operations touch the data only once - for instance, to compute a cyclic redundancy check or a TCP checksum over every byte in a packet. Why use multiple instructions that waste CPU cycles and inflate code size when a single instruction can do the job?
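For reference, the TCP checksum is a 16-bit ones' complement sum as described in RFC 1071 rather than a true CRC, and it is a good example of a touch-once operation. The following is a minimal C sketch of that single pass; the function name `internet_checksum` is chosen for this example, and the code makes no claim about how Mercury implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement checksum in the style of RFC 1071.
 * Each byte of the buffer is read exactly once. The 32-bit
 * accumulator is ample for packet-sized buffers. */
static uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);

    /* Add the data as a sequence of big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero low byte. */
    if (i < len)
        sum += (uint32_t)data[i] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The loop body is one load and one add per 16-bit word, with no further use of the loaded value, which is the access pattern the memory-to-memory argument relies on.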

In this architectural approach, a single optimized instruction can load some packet data from memory, perform the necessary operations on the data, and then store the data directly back to memory without stopping at a register along the way. For flexibility, the memory-to-memory instructions have multiple modes for base+index, base+offset, and auto-increment memory addressing.
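The three addressing modes can be sketched as an effective-address calculation in C. The types and names below (`addr_mode`, `operand`, `effective_address`) are hypothetical, and treating auto-increment as a post-increment by a per-operand step is an assumption made for this example; the article does not specify the encoding or the exact semantics.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ADDR_REGS 4

typedef enum { BASE_INDEX, BASE_OFFSET, AUTO_INCREMENT } addr_mode;

/* Hypothetical operand descriptor, for illustration only. */
typedef struct {
    addr_mode mode;
    unsigned  base_reg;   /* register holding the base address */
    unsigned  index_reg;  /* used by BASE_INDEX */
    int32_t   offset;     /* displacement, or step for AUTO_INCREMENT */
} operand;

/* Compute the address an operand refers to. AUTO_INCREMENT is modelled
 * as post-increment (an assumption): the old base is used, then the
 * base register is advanced by the step. */
static uint32_t effective_address(uint32_t regs[], const operand *op)
{
    assert(op->base_reg < NUM_ADDR_REGS && op->index_reg < NUM_ADDR_REGS);
    uint32_t base = regs[op->base_reg];

    switch (op->mode) {
    case BASE_INDEX:
        return base + regs[op->index_reg];
    case BASE_OFFSET:
        return base + (uint32_t)op->offset;
    case AUTO_INCREMENT:
        regs[op->base_reg] = base + (uint32_t)op->offset;
        return base;
    }
    return base;  /* not reached for valid modes */
}
```

Auto-increment is the mode that suits walking a packet buffer: each access both yields the current address and advances the pointer, so no separate pointer-update instruction is needed.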

To implement the architecture, we opted for a modified pipeline and fast local memory. The pipeline has extra stages to calculate memory addresses, read memory, and write back to memory. It is deeper than a minimal RISC pipeline but still supports the single-cycle throughput that is characteristic of modern RISC processors.

- David Fotland

Chief Technology Officer

Ubicom Inc.




