EE Times-Asia

A technical overview of RapidIO

Posted: 07 Nov 2007

Keywords: RapidIO, high-speed interconnect, embedded processors, Ethernet, Serdes

By Greg Shippen
Freescale Semiconductor

Today's high-speed embedded applications are like networks unto themselves. They possess tremendous processing resources capable of acquiring and analyzing large amounts of data that require a complex internal fabric to facilitate the transfer of data throughout the system. Data often passes through many interconnect layers and protocols as it crosses the fabric, with each layer of interconnect introducing undesirable latency, complexity and cost (Figure 1).

To achieve the performance required for these applications (many of which must produce results in real-time with the minimum latency) developers are seeking ways to consolidate interconnect layers across the system. Not only are they trying to more seamlessly connect chips, boards and chassis, ideally they'd like to collapse the data and control planes into a single fabric.

A system-level interconnect must perform efficiently and offer the mix of functionality appropriate for the data it transports. Developers aiming for the highest performance and reliability understand that selecting the optimal interconnect for a system fabric involves many considerations beyond theoretical maximum throughput: how efficiency is built into the physical, transport and logical layers, and features such as multiple PHY options, deadlock avoidance through priority mechanisms and buffer management, advanced quality of service with short- and long-term flow control mechanisms, and data plane capabilities that enable the interconnect to transport any protocol.

Design by intention
While there are many interconnect technologies available for use as a system interconnect fabric, the RapidIO standard was designed specifically to address the needs of embedded developers. Originally conceived as a next-generation front-side bus for high-speed embedded processors, its designers had the foresight to widen its focus to include embedded in-the-box and chassis control plane applications. In this way, the standard provides reliability with minimal latency, avoids unnecessary software dependence, offers extensibility to serve the widest possible range of applications, and simplifies switch design, all while achieving effective data rates from 667Mbps to 30Gbps.

The origins of the RapidIO interconnect standard date back to 1997, when Motorola began work on a next-generation bus for its Power- and PowerPC-based processors. By 1999, Motorola and Mercury Computer had joined together to complete the initial RapidIO specification. In 2000, these two companies drove the formation of the RapidIO Trade Association, making the RapidIO specification an independent standard and, in 2004, the RapidIO specification became an international standard as ISO/IEC 18372:2004.

It is useful to contrast the development of the RapidIO standard with that of Ethernet. Many embedded developers are quite familiar with Ethernet and, because of their familiarity, attempt to carry Ethernet from the WAN and LAN down into the system-level fabric. Ethernet, however, was originally designed to connect large computers. Its creators assumed that every Ethernet node would have substantial processing resources available to it. Thus, to provide Ethernet with the flexibility to support the widest range of applications, the base specification uses a simple, generalized header with a single transaction type and a large ID field (6 bytes for a MAC address). Best-effort service simplified packet delivery, and implementing most of the protocol in software kept the cost of early deployments down.

While many of these characteristics are advantageous in LAN and WAN environments (there's a reason Ethernet is as ubiquitous as it is in the network) they actually introduce significant inefficiencies when used in a system fabric. Given the high speeds and minimal latency requirements of backplane and box-to-box applications, these inefficiencies are unacceptable.

Figure 1: Today's embedded applications require a complex internal fabric to facilitate the high-speed transfer of data throughout a system, often passing through many interconnect layers and protocols, with each layer of interconnect introducing undesirable latency, complexity and cost.

Because it focuses on chip-to-chip, board-to-board and chassis-to-chassis interconnect, the RapidIO standard is optimized for the types of transactions typical in these applications. For example, while the inefficiencies introduced by large headers are minimized when typical transactions use the maximum packet size, most control plane transactions, as well as chip-to-chip transactions, are relatively small. The efficient and minimal header size of RapidIO packets, therefore, has a substantial impact on effective throughput and efficiency in these applications.
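The effect of per-packet overhead on small transactions can be sketched with a quick calculation. The overhead byte counts below are illustrative assumptions chosen to show the shape of the trade-off, not figures taken from the RapidIO or Ethernet specifications:

```python
# Hedged sketch: wire efficiency of a small transaction under two assumed
# per-packet overheads. Byte counts are illustrative, not spec values.

RAPIDIO_OVERHEAD = 16   # assumed: header + CRC + framing for a small serial packet
ETHERNET_OVERHEAD = 38  # assumed: MAC header (14) + CRC (4) + preamble (8) + gap (12)

def efficiency(payload: int, overhead: int, min_payload: int = 0) -> float:
    """Fraction of link bandwidth carrying useful payload."""
    padded = max(payload, min_payload)   # Ethernet pads frames below 46 bytes
    return payload / (padded + overhead)

# A small 32-byte control-plane transaction:
rio = efficiency(32, RAPIDIO_OVERHEAD)       # 32 / 48  ~ 0.67
eth = efficiency(32, ETHERNET_OVERHEAD, 46)  # 32 / 84  ~ 0.38
```

Even with these rough numbers, the point survives: for small control-plane payloads, header overhead dominates, so a compact header roughly doubles effective throughput.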

Moreover, most of the RapidIO protocol is implemented in hardware. This minimizes the burden placed on a host processor to process packets, which is especially important in high-speed applications where software protocol processing can actually limit effective throughput. For example, a 1- or 10Gbit Ethernet link can overwhelm a host processor, requiring the use of TCP/IP offload engine (TOE) technology. As there is no set TOE standard, such implementations are proprietary and vary significantly from vendor to vendor. By implementing protocol processing in hardware, implementation of RapidIO links is consistent across the industry.

The RapidIO PHY is defined to move packets reliably and efficiently across individual links, while the transport layer defines how packets are identified within a fabric and how they are routed to a particular destination. Specifically, the protocol guarantees packet delivery while managing link utilization, packet transmission, and error recovery. Individual packets participate in the link protocol through physical layer packet header fields, and packet loss is detected because the receiver must positively acknowledge each packet received. Through the use of control symbols, which can be embedded within packets, link operation can be coordinated between link partners while minimizing loop latency. Except for certain rare error scenarios, hardware automatically recovers link errors without software intervention, since an error acknowledgement triggers a packet resend.
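The acknowledge-and-resend behavior described above can be modeled as a toy loop. This is a sketch of the general positive-acknowledgement idea only; the function and receiver names are illustrative and do not reflect the control-symbol formats the RapidIO specification actually defines:

```python
# Toy model of a positively acknowledged link: every packet must be
# acknowledged, and a rejected packet is resent by the sender with no
# software involvement. Names are illustrative, not spec-defined.

def send_reliably(packets, receive, max_retries=4):
    """Deliver packets in order; resend on negative acknowledgement."""
    delivered = []
    for pkt in packets:
        for _attempt in range(max_retries + 1):
            if receive(pkt):          # True models a packet-accepted symbol
                delivered.append(pkt)
                break
        else:
            raise RuntimeError(f"link error on {pkt!r}: retries exhausted")
    return delivered

# Receiver that rejects the first attempt of every packet:
seen = set()
def flaky_receive(pkt):
    if pkt not in seen:
        seen.add(pkt)
        return False                  # first attempt: packet-not-accepted
    return True

print(send_reliably(["A", "B", "C"], flaky_receive))  # ['A', 'B', 'C']
```

Note that recovery happens entirely inside the loop: the caller never sees the rejected attempts, mirroring how hardware retry is invisible to software.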

The logical layer is the highest layer of the RapidIO protocol and defines many operations common to interconnect protocols with additional operations helpful to embedded applications. The base protocol supports messaging and guarantees packet delivery. Ethernet, in contrast, must support these functions as higher layer protocols (RDMA and TCP/IP respectively) which increase both the latency and software processing required. Other supported functions include data streaming, globally shared memory, flow control, and user-defined operations.

Physical Layer Options

The RapidIO protocol supports both a serialized embedded clock signaling interface (Serdes) and a parallel source-synchronous interface, which provide developers with the ability to trade off between latency, data rate, physical channel length, and number of differential pairs used based on the needs of a particular application (Table 1). While the control symbol formats of the two PHYs differ considerably, packet formats are the same except for a few physical layer-specific bits. Additionally, the RapidIO protocol was intentionally designed to match existing PHY standards, where possible, so as to leverage existing ecosystems and economies of scale. For example, the Serial RapidIO specification follows XAUI electricals.

Table 1: The RapidIO protocol defines both serial and parallel PHY interfaces with multiple lane widths, giving developers the ability to trade off between latency, data rate, physical channel length and number of differential pairs used based on the needs of a particular application.

The 1x/4x LP-serial specification offers the lowest pin count and longest channel lengths, making this XAUI-based PHY well suited for backplane fabric applications. The PHY further extends the capabilities of the XAUI specification by defining a reduced voltage swing short-haul specification to accommodate varying channel lengths and limit power dissipation and radiated EMI. Developers also have the option of increasing the overall data rate per port by using either one or four differential pairs/lanes per port.

The 8/16 LP-LVDS specification defines a parallel source-synchronous signaling interface passing either 8 or 16 bits at a time together with a framing bit and clock. Compared to the serial PHY, the parallel PHY requires more signals but offers inherently less latency per clock by eliminating the need for serialization and deserialization of the data stream. This makes the parallel PHY ideal for applications such as processor front-side buses and board-level designs that require very high bandwidth and the lowest possible latency.

As both the serial and parallel PHYs support two width options, the RapidIO specification also provides interoperability between widths: a wider port detects when it is connected to a narrower one and trains down to match it.
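The width fallback amounts to a simple negotiation rule. A minimal sketch, where the width values (1x/4x serial, 8/16 parallel) come from the article but the function itself is an illustration rather than the training sequence defined by the specification:

```python
# Sketch of lane-width interoperability: when ports of different widths
# are connected, the wider port trains down to the narrower width.
# Illustrative only; not the spec's actual training protocol.

SERIAL_WIDTHS = {1, 4}      # 1x/4x LP-serial lane options
PARALLEL_WIDTHS = {8, 16}   # 8/16 LP-LVDS data-bus options

def negotiated_width(a: int, b: int, allowed=SERIAL_WIDTHS) -> int:
    if a not in allowed or b not in allowed:
        raise ValueError("unsupported lane width")
    return min(a, b)   # the wider port falls back to the narrower width

assert negotiated_width(4, 1) == 1
assert negotiated_width(16, 8, PARALLEL_WIDTHS) == 8
```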

Deadlock avoidance
Read transactions introduce the possibility of dependency loops which can lead to deadlock. In a RapidIO system, these loops can arise when the entire loop between two endpoints is filled with read requests, and deadlock avoidance measures are required that ensure that responses to outstanding reads can complete and therefore release their resources.

The RapidIO protocol uses priority mechanisms to avoid deadlock. Since switches and endpoints are required to handle higher-priority packets first, assigning responses a higher priority than the associated request ensures that responses are able to make forward progress through the fabric. By assigning each packet a physical layer priority, logical flows can be identified, ordering rules enforced and deadlocks avoided. Additionally, defining priority at the physical layer greatly simplifies switch operation and design, as there is no need to know a packet's transaction type or its interdependency with other packets when making switching decisions.

For priority mechanisms to be effective, switches must manage their buffers to prevent lower-priority packets from filling a buffer and blocking acceptance of higher-priority packets. This can be achieved in a number of ways, including assigning at least one buffer to each priority. RapidIO systems can also implement virtual channels requiring completely separate buffer pools by dedicating buffers to particular flows. A more common approach is to allow buffers to hold packets of a set priority or higher. In effect, this gives higher-priority packets access to the entire buffer pool.
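The "priority or higher" buffer rule described above can be sketched as a small acceptance check: a packet is admitted only while enough free buffers remain reserved for traffic above its priority. The class shape and threshold values are illustrative assumptions, not values from the specification:

```python
# Sketch of priority-watermark buffer management: lower-priority packets
# may not consume the buffers reserved for higher priorities, so the
# highest priority effectively has access to the entire pool.
# Thresholds are illustrative, not spec-defined.

class PriorityBuffer:
    def __init__(self, total: int, num_priorities: int = 4):
        self.total = total
        self.used = 0
        # Priority p may not consume the last (num_priorities - 1 - p)
        # buffers; priority 3 (highest) reserves nothing.
        self.reserve = {p: num_priorities - 1 - p for p in range(num_priorities)}

    def accept(self, priority: int) -> bool:
        free = self.total - self.used
        if free > self.reserve[priority]:
            self.used += 1
            return True
        return False   # packet must be retried by the sender

buf = PriorityBuffer(total=4)
assert buf.accept(0)        # priority 0 fits while 4 buffers are free
assert buf.accept(3)        # highest priority fits while any buffer is free
assert not buf.accept(0)    # 2 free <= reserve of 3: low priority blocked
```

This is why a low-priority burst can never wedge the fabric: a response packet at the top priority always finds a buffer as long as any space remains.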

Quality of Service
QoS is an inherent part of the RapidIO specification, implemented directly in hardware and enabling traffic to be classified into as many as six prioritized logical flows. While the mechanism for forward progress in the fabric relies upon ordering rules at the physical layer to give responses higher priority, the degree to which prioritization results in lower average latency or jitter for a particular flow is specific to the actual implementation. For example, more aggressive switches might make ordering decisions based upon a flow's priority, source, and destination ID fields while less aggressive designs might only utilize the priority field.

QoS is also affected by specific fabric arbitration policies. While the specification explicitly defines prioritized flows, developers are free to choose the particular arbitration policies put into place to prevent starvation of lower-priority flows, such as the well-known leaky-bucket scheme. As even the least aggressive design must support these mechanisms, higher-priority flows are guaranteed lower average latency.
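A leaky-bucket arbiter of the kind mentioned above can be sketched in a few lines. The capacity parameter and the two-flow model are illustrative assumptions used to show the anti-starvation property, not a mechanism defined by the RapidIO specification:

```python
# Hedged sketch of leaky-bucket arbitration: each high-priority grant
# fills a bucket; once the bucket is full, a lower-priority flow is
# served, draining it. Parameters are illustrative.

def arbitrate(requests, capacity=3):
    """requests: list of (hi_ready, lo_ready) booleans, one per cycle."""
    bucket, grants = 0, []
    for hi_ready, lo_ready in requests:
        if hi_ready and bucket < capacity:
            grants.append("hi")
            bucket += 1                      # high-priority grant fills bucket
        elif lo_ready:
            grants.append("lo")
            bucket = max(0, bucket - 1)      # serving low priority drains it
        else:
            grants.append(None)
            bucket = max(0, bucket - 1)
    return grants

# Under continuous contention, low priority still gets regular service:
print(arbitrate([(True, True)] * 8))
# ['hi', 'hi', 'hi', 'lo', 'hi', 'lo', 'hi', 'lo']
```

The key property is that the high-priority flow cannot monopolize the link indefinitely: the bucket bounds how many consecutive grants it can take.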

For applications requiring even more aggressive and effective QoS, advanced flow control and data plane capabilities are available. The RapidIO protocol defines multiple flow control mechanisms at the physical and logical layers. By managing physical layer flow control at the link layer, short-term congestion events are effectively managed for serial and parallel applications using both receiver- and transmitter-controlled flow control. Longer-term congestion is controlled at the logical layer using XOFF and XON messages which enable the receiver to stop the flow of packets when congestion is detected along a particular flow.

Receiver-only flow control, where the transmitter does not know the state of receiver buffers and the receiver alone determines whether packets are accepted or rejected based on receiver buffer availability, results in packets being resent, creating wasted link bandwidth. Additionally, ordering rules require a switch to send higher-priority packets before resending any packets associated with a retry, aggravating worst-case latency for lower priority packets.

Transmitter-based flow control avoids bandwidth wasting retries by enabling the transmitter to decide whether to transmit a packet based on receiver buffer status. Through receiver buffer status messages sent to the transmitter using normal control symbols, the transmitter is able to limit transmissions within the maximum number of buffers available at the receiver. In general, priority watermarks at the various buffer levels are used to determine when the transmitter can transfer packets with a given priority.
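The watermark check at the heart of transmitter-based flow control is simple enough to sketch directly. The watermark table below is an illustrative assumption; the specification defines the status-reporting mechanism, not these particular values:

```python
# Sketch of transmitter-controlled flow control: the receiver reports its
# free-buffer count (via modeled control symbols), and the transmitter
# sends a packet of a given priority only while the free count exceeds
# that priority's watermark. Watermark values are illustrative.

WATERMARKS = {0: 3, 1: 2, 2: 1, 3: 0}  # buffers that must stay free per priority

def can_transmit(priority: int, free_buffers: int) -> bool:
    return free_buffers > WATERMARKS[priority]

# With 2 free buffers reported, only priority 2 and above may be sent:
assert not can_transmit(0, 2)
assert can_transmit(2, 2)
```

Because the decision is made before transmission, no packet is ever sent into a full buffer, eliminating the retry traffic that receiver-only flow control produces.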

A third link-level mechanism is available within the parallel PHY specification which enables the receiver to throttle the packet transmission rate by requesting that the transmitter insert a selectable number of idle control symbols before resuming transmission of packets.

Revision 1.3 of the RapidIO specification achieves further efficiency and higher throughput through the introduction of data plane extensions. Since data plane fabrics can carry multiple data protocols, these extensions enable the encapsulation of virtually any protocol using a data streaming transaction type with a payload up to 64 Kbytes. Hardware-based SAR support is expected for most implementations, with up to 256 classes of service and 64,000 streams.
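The segmentation-and-reassembly (SAR) step can be sketched as follows. The 64-Kbyte payload limit comes from the article; the segment size and tuple layout are illustrative assumptions, not the data streaming PDU format from the specification:

```python
# Sketch of SAR for data streaming: a payload of up to 64 Kbytes is cut
# into fabric-sized segments tagged with a stream ID and sequence number,
# then reassembled in order at the destination. Segment size and field
# layout are illustrative assumptions.

MAX_PAYLOAD = 64 * 1024
SEGMENT_SIZE = 256           # assumed fabric maximum transmission unit

def segment(stream_id: int, payload: bytes):
    assert len(payload) <= MAX_PAYLOAD
    return [(stream_id, i // SEGMENT_SIZE, payload[i:i + SEGMENT_SIZE])
            for i in range(0, len(payload), SEGMENT_SIZE)]

def reassemble(segments):
    ordered = sorted(segments, key=lambda s: s[1])   # sort by sequence number
    return b"".join(chunk for _, _, chunk in ordered)

data = bytes(range(256)) * 5               # 1280-byte payload
segs = segment(42, data)
assert len(segs) == 5 and reassemble(segs) == data
```

In actual implementations this cutting and stitching is done in hardware, which is what lets the fabric carry encapsulated foreign protocols without burdening the host processor.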

The upcoming 2.0 revision of the specification builds on revision 1.3 capabilities, introducing a new 5.0 Gbaud and 6.25 Gbaud PHY, lane widths up to 16x, 8 virtual channels with either reliable or best-effort delivery policies, enhanced link-layer flow control, and end-to-end traffic management with up to 16 million unique virtual streams between any two endpoints.

The RapidIO protocol is a simple and efficient interconnect designed specifically for high-speed embedded applications and appropriate to serve as a system-level fabric. By implementing protocol processing in hardware, many quality of service and flow control mechanisms are an inherent part of the PHY, maximizing efficiency and throughput while minimizing latency and switch complexity. Backed by new data plane extensions which enable RapidIO switches to encapsulate virtually any data protocol, the RapidIO specification is an ideal interconnect technology, enabling developers to consolidate interconnect layers, as well as both control and data planes, into a single fabric, reducing cost while increasing overall system reliability.

About the Author
Greg Shippen
is system architect at Freescale Semiconductor's digital systems division, NCSG. He is also a member of the RapidIO Trade Association Technical Working Group and Steering Committee.
