EE Times-Asia

Coding an NPU to maximize throughput

Posted: 17 Oct 2005

Keywords: agere, npu, network, environment, throughput

The networking environment is populated by routers and switches that require highly parallelized operations using NPUs optimized for data flow. They must simultaneously perform specialized classification operations to determine the source, destination, service level and degree of service reliability.

But high data rates are forcing a paradigm shift from a traditional, control-oriented programming methodology to a data-flow paradigm, where units of data are tagged and worked on by independent functional units operating in parallel.

To achieve high throughput rates, NPUs use a combination of replicated and pipelined parallel-processing techniques. In replicated parallel processing, each functional unit is replicated and each processing element (PE) performs all the necessary functions on a packet before going on to the next packet. In pipelined processing, by contrast, each unit performs only a portion of the total work on a packet.
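
The difference between the two organizations can be sketched in C. This is only an illustration: the packet descriptor, the three stage functions and the sequential loops that stand in for parallel PEs are all assumptions made for the example, not part of any NPU toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet descriptor; the fields are illustrative only. */
typedef struct {
    unsigned len;       /* payload length in bytes */
    unsigned flow;      /* filled in by classify_stage */
    unsigned out_port;  /* filled in by forward_stage */
    int      stage;     /* number of stages already applied */
} packet_t;

typedef void (*stage_fn)(packet_t *);

static void parse_stage(packet_t *p)    { p->stage = 1; }
static void classify_stage(packet_t *p) { p->flow = p->len % 4u; p->stage = 2; }
static void forward_stage(packet_t *p)  { p->out_port = p->flow + 1u; p->stage = 3; }

static const stage_fn STAGES[] = { parse_stage, classify_stage, forward_stage };
enum { NUM_STAGES = sizeof STAGES / sizeof STAGES[0] };

/* Replicated model: one PE runs every stage on a packet before taking the next. */
void run_replicated(packet_t *pkts, size_t n)
{
    assert(pkts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (int s = 0; s < NUM_STAGES; s++)
            STAGES[s](&pkts[i]);
}

/* Pipelined model: in time step t, PE s works on the packet that entered the
 * pipeline s steps earlier, so up to NUM_STAGES packets are in flight. */
void run_pipelined(packet_t *pkts, size_t n)
{
    assert(pkts != NULL || n == 0);
    for (size_t t = 0; t < n + NUM_STAGES - 1; t++) {
        for (int s = NUM_STAGES - 1; s >= 0; s--) {
            if (t >= (size_t)s && t - (size_t)s < n)
                STAGES[s](&pkts[t - (size_t)s]);
        }
    }
}
```

Both functions leave every packet in the same final state; what changes is which PE touches which packet at a given moment.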

The NPU must also perform classification operations such as examining each data packet and determining the action to be taken. In the data path, classification is performed many times on virtually all portions of the input data stream: L2, L3 and L4 switching; L3+ routing; priority processing; access control list processing; error detection; filtering; load balancing; flow detection; statistics gathering; and segmentation and reassembly of packets.
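
As a rough illustration of "examining each data packet and determining the action to be taken", the following C sketch pulls a few fields out of an IPv4 header and maps them to an action. The byte offsets follow the standard IPv4 header layout; the action names and the policy itself are assumptions invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical actions; a real NPU would define its own set. */
typedef enum { ACT_DROP, ACT_FORWARD, ACT_TO_HOST } action_t;

/* Fields pulled from an IPv4 header. */
typedef struct {
    uint8_t  protocol;  /* byte 9: 1 = ICMP, 6 = TCP, 17 = UDP */
    uint32_t src;       /* bytes 12-15, host byte order after parsing */
    uint32_t dst;       /* bytes 16-19, host byte order after parsing */
} ipv4_fields_t;

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 and fills *out if buf holds a plausible IPv4 header, else 0. */
int parse_ipv4(const uint8_t *buf, size_t len, ipv4_fields_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 20 || (buf[0] >> 4) != 4 || (buf[0] & 0x0Fu) < 5)
        return 0;
    out->protocol = buf[9];
    out->src = read_be32(buf + 12);
    out->dst = read_be32(buf + 16);
    return 1;
}

/* Illustrative policy: drop malformed input, send ICMP to the host,
 * forward everything else. */
action_t classify(const uint8_t *buf, size_t len)
{
    ipv4_fields_t f;
    if (!parse_ipv4(buf, len, &f))
        return ACT_DROP;
    return f.protocol == 1 ? ACT_TO_HOST : ACT_FORWARD;
}
```

Note how much of this code says how to find the fields rather than what to do with them; that is the burden the article argues an FPL removes.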

In such environments, traditional, sequential procedural programming is inadequate, and a shift to more appropriate functional programming languages (FPLs) is required. But developers need to know the differences between procedural and functional language methods, as well as which environments are most appropriate for each and when to abandon or minimize one and use the other.


Use an FPL for coding the data-plane elements. With an FPL, a programmer need only specify what functions are to be accomplished, without coding the algorithm needed to accomplish the task. Data-plane code can also be written as if it runs on a single thread on a single processor, with the underlying hardware creating and assigning contexts to the packets, running the program and managing the contexts. An FPL does not have to specify the order of execution; thus, it is easier to program and is less prone to errors.

Carefully map the functional programming model to the appropriate underlying functions to be performed. In a classification application, the FPL should include features for the following: definition of incoming data cells or packets to identify important fields; rules that allow linkage of classification outcomes to appropriate actions; an extensible set of actions as outcomes for classification; the ability to combine fields for complex classifications; a wild-card and priority capability for subnets; the ability to express multiple rule sets; and support for deep classification and reassembly.
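
Several of the listed features (combined fields, wild cards, priorities for subnets, and rules linked to actions) can be modeled with a small rule table. The sketch below is written in C purely to make the semantics concrete; it is not FPL syntax, and the rule layout, field names and actions are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification outcomes. */
typedef enum { ACT_DROP, ACT_FORWARD, ACT_POLICE } action_t;

/* One rule: a field matches when (field & mask) == value.
 * A zero mask with a zero value acts as a wild card. */
typedef struct {
    uint32_t dst_value, dst_mask;    /* destination address or subnet */
    uint16_t port_value, port_mask;  /* L4 destination port */
    int      priority;               /* higher wins when several rules match */
    action_t action;
} rule_t;

/* Returns the action of the highest-priority matching rule,
 * or `fallback` when no rule matches. */
action_t match_rules(const rule_t *rules, size_t n,
                     uint32_t dst, uint16_t port, action_t fallback)
{
    assert(rules != NULL || n == 0);
    const rule_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        const rule_t *r = &rules[i];
        int hit = (dst & r->dst_mask) == r->dst_value &&
                  (port & r->port_mask) == r->port_value;
        if (hit && (best == NULL || r->priority > best->priority))
            best = r;
    }
    return best ? best->action : fallback;
}
```

For instance, a low-priority rule for 10.0.0.0/8 on any port and a high-priority rule for 10.1.0.0/16 on port 80 would both match traffic to 10.1.2.3:80, and the second would win.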

Use an FPL designed not only to express classification based on packet inspection, but also to allow classification on other actions and let the host update the rules using APIs.
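
What "let the host update the rules using APIs" might look like from the host side is sketched below in C. The function names (rule_add, rule_delete, rule_lookup), the fixed table size and the handle scheme are all hypothetical; the article does not describe a specific API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_RULES = 8 };  /* arbitrary capacity for the sketch */

typedef struct {
    uint32_t value, mask;  /* match when (key & mask) == value */
    int      action;       /* opaque action code chosen by the host */
    int      in_use;
} rule_t;

typedef struct { rule_t slots[MAX_RULES]; } rule_table_t;

/* Hypothetical host API: install a rule; returns its handle, or -1 if full. */
int rule_add(rule_table_t *t, uint32_t value, uint32_t mask, int action)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_RULES; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i] = (rule_t){ value & mask, mask, action, 1 };
            return i;
        }
    }
    return -1;
}

/* Hypothetical host API: remove a rule by handle; returns 0 on success. */
int rule_delete(rule_table_t *t, int handle)
{
    assert(t != NULL);
    if (handle < 0 || handle >= MAX_RULES || !t->slots[handle].in_use)
        return -1;
    t->slots[handle].in_use = 0;
    return 0;
}

/* Data-path lookup: first matching rule wins; `miss` is returned otherwise. */
int rule_lookup(const rule_table_t *t, uint32_t key, int miss)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_RULES; i++)
        if (t->slots[i].in_use && (key & t->slots[i].mask) == t->slots[i].value)
            return t->slots[i].action;
    return miss;
}
```

The point of the split is that the classification program itself does not change when the host adds or removes a rule; only the table it consults does.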

Include constructs that invoke external functions as part of the classification process so that real-world applications can be implemented.

Use C or another procedural language at those stages in the control plane where the NPU is not directly involved in classification. Once appropriately classified, packets move on to the traffic manager; a procedural language permits manipulation of the NPU's hardware registers.


Do not use NPU architectures in which parallelism is explicitly exposed to the software. Exposing the nuances of the architecture in this way requires code that explicitly allocates tasks to multiple processing elements and manages the interprocess communication between the PEs. Such programming is CPU-dependent and must be rewritten as the number of PEs changes.

Do not use procedural languages for code development in the NPU's data plane. Procedural languages such as C, when used to program multiple parallel PEs, require more code and more effort: first, to specify what functions are to be accomplished; second, to determine how to do them through instructions that provide explicit access to low-level primitives such as semaphores and mutexes; third, to allocate tasks to threads; and lastly, to map these low-level tasks to the functional units in the hardware.

Do not use traditional, single-pass programming techniques with an FPL. Because of the way network interface hardware works, a classification FPL should use a two-part, or two-pass, programming paradigm. This allows the language to deal with cell-oriented protocols such as ATM as well as packet-oriented protocols such as Internet Protocol.
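
The article does not spell out how the work is divided between the two passes. One plausible reading, assumed here, is that the first pass reassembles incoming cells into a complete packet and the second pass classifies the whole packet. The C sketch below models that split; the descriptor types, the buffer limit and the trivial "class" computation are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An ATM cell carries a 48-byte payload; MAX_PDU is an arbitrary limit. */
enum { CELL_PAYLOAD = 48, MAX_PDU = 256 };

/* Hypothetical cell descriptor. */
typedef struct {
    unsigned char data[CELL_PAYLOAD];
    size_t        used;   /* valid bytes in data */
    int           last;   /* nonzero on the final cell of a packet */
} cell_t;

/* Hypothetical reassembly buffer for one packet. */
typedef struct {
    unsigned char data[MAX_PDU];
    size_t        len;
} pdu_t;

/* Pass 1: append one cell to the packet being reassembled.
 * Returns 1 when the packet is complete, 0 if more cells are expected,
 * and -1 (after resetting the buffer) if the packet would overflow. */
int first_pass(pdu_t *pdu, const cell_t *cell)
{
    assert(pdu != NULL && cell != NULL && cell->used <= CELL_PAYLOAD);
    if (pdu->len + cell->used > MAX_PDU) {
        pdu->len = 0;
        return -1;
    }
    memcpy(pdu->data + pdu->len, cell->data, cell->used);
    pdu->len += cell->used;
    return cell->last ? 1 : 0;
}

/* Pass 2: classify the whole packet. The "class" here is just the first
 * byte, standing in for a real header lookup; -1 means an empty packet. */
int second_pass(const pdu_t *pdu)
{
    assert(pdu != NULL);
    return pdu->len > 0 ? pdu->data[0] : -1;
}
```

A packet-oriented interface fits the same structure: the first pass simply completes after a single unit.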

Do not use the full C programming repertoire. Use only a subset of C, optimized for use in NPUs. Since procedural languages have to express how to implement a function, C for an NPU should provide for deterministic processing of packets, with the basic mechanisms implemented in hardware and the policies to be followed expressed in software.

Do not use C features that eat up CPU cycles. Limit yourself to a "pure function" interface, i.e., one that obeys syntactic constraints that produce no side effects. The only effect a pure function should have on the state of a program is to return a result.
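
The distinction can be shown with two small C functions. Both are illustrative choices rather than anything prescribed by the article: the first keeps hidden state and is therefore impure, while the second (a 16-bit one's-complement sum of the kind used in Internet checksums) depends only on its arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Impure: updates hidden state, so the result depends on call history. */
static uint32_t packets_seen;

uint32_t next_sequence_impure(void)
{
    return ++packets_seen;
}

/* Pure: reads only its arguments and affects nothing but its return value.
 * The assert is a debug-only precondition check. */
uint16_t ones_complement_sum(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1u)
        sum += (uint32_t)buf[len - 1] << 8;   /* pad odd trailing byte */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);  /* fold carries back in */
    return (uint16_t)sum;
}
```

Because the pure function gives the same answer for the same input no matter when or where it runs, hardware is free to execute it on any context in any order.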

- David Sonnier

Chief Technology Officer

Agere Systems Inc.
