EE Times-Asia

Enable DSP co-processing in FPGAs

Posted: 02 Jul 2008

Keywords: SRIO, DSP, FPGA, interconnects, DSP co-processing

Today's ever increasing demand for high-speed communication and super-fast computing in support of "triple-play" applications is creating new challenges for system developers, algorithm developers and hardware engineers alike who need to draw together a multitude of standards, components and networking equipment. At the same time, developers need to keep pace with increasing demands for performance while keeping costs low. These feats can be accomplished by leveraging Serial RapidIO (SRIO)-enabled FPGAs as DSP co-processors.

This article explains the basics of Serial RapidIO, how it compares to other interconnects, and how it enables DSP co-processing in FPGAs. It also looks at the features of Xilinx's Serial RapidIO solution.

Because triple-play applications unite voice, video and data, development and system optimization strategies must be parameterized using newer algorithms. Specific challenges that developers need to address include building scalable and extensible architectures, supporting distributed processing, using standards-based design, and optimizing for performance and cost.

Figure 1: The advent of packet-based processing ushered the trend towards serial connectivity

A closer look at these challenges reveals two themes: connectivity, which is essentially "fast" data movement across devices, boards and systems; and computing power, i.e. the individual processing resources available in those devices, boards and systems to address the needs of the application.

Connect across platforms
Standards-based designs are usually much easier than "roll your own" designs, and are the norm of the day. Parallel connectivity standards (PCI, PCI-X, EMIF, etc) can meet today's demands, but fall short when scalability and extensibility are taken into consideration. With the advent of packet-based processing, the trend is clearly towards high-speed serial connectivity. Figure 1 illustrates this trend.

High speed serial standards like PCIe and GbE/XAUI have been adopted in the desktop and networking industry. Meanwhile, data processing systems in wireless infrastructure have slightly different interconnect requirements:

  • Low pin count

  • Backplane and chip-to-chip connectivity

  • Bandwidth and speed scalability

  • DMA and message passing

  • Support for complex scalable topologies

  • Multicast

  • High reliability

  • Time-of-day synchronization

  • Quality of Service (QoS)

The SRIO protocol standard can easily meet and exceed most of these requirements and has become the dominant interconnect for data-plane connectivity in wireless infrastructure equipment. SRIO networks are built around two basic building blocks: endpoints and switches. Endpoints source and sink packets, while switches pass packets between ports without interpreting them.

Figure 2: SRIO networks are built around the two basic building blocks: endpoints and switches.

Figure 2 shows SRIO network building blocks.
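This split of duties can be sketched in code: a switch forwards each packet on the output port selected by its destination device ID, never touching the payload. Below is a minimal C sketch, assuming a flat routing table for an 8bit device-ID space; the structure and names are illustrative, not a vendor API.

```c
#include <stdint.h>

/* Illustrative SRIO switch routing sketch: a switch selects the output
 * port from the packet's destination device ID alone, without
 * interpreting the logical-layer payload.  Table layout is hypothetical. */

#define SRIO_MAX_DEVICES 256   /* 8bit device-ID space (small systems) */
#define SRIO_PORT_DROP   0xFF  /* no route programmed for this ID      */

typedef struct {
    uint8_t port_for_dest[SRIO_MAX_DEVICES]; /* dest ID -> output port */
} srio_route_table;

/* Device-based routing: one table lookup; the payload stays opaque. */
static inline uint8_t srio_route(const srio_route_table *t, uint8_t dest_id)
{
    return t->port_for_dest[dest_id];
}
```

In a real switch this table is programmed by the system host during discovery and bring-up; the lookup itself is what keeps switch latency low.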

SRIO is specified as a three-layer architectural hierarchy, as illustrated in Figure 3. It has the following elements:

Physical Layer: describes the device-level interface specifics such as packet transport mechanisms, flow control, electrical characteristics, and low-level error management.

Transport Layer: provides routing information for moving packets between endpoints. Switches operate at the transport layer by using device-based routing.

Logical Layer: defines the overall protocol and packet formats. All packets contain 256 payload bytes or less. Transactions use Load, Store, or DMA operations targeting a 34-, 50- or 66bit address space.

Figure 3: SRIO is specified as a three-layer architectural hierarchy.
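The three layers can be pictured as regions of a single packet. The C sketch below shows that decomposition; the field names follow common descriptions of the RapidIO packet (ackID, prio, tt, ftype), but the struct is illustrative, not a bit-exact wire format.

```c
#include <stdint.h>

/* Illustrative decomposition of an SRIO packet into the three layers
 * described above.  Treat this as a sketch of which layer owns which
 * field, not as the on-the-wire encoding. */

typedef struct {
    uint8_t  ackid;       /* physical layer: link-level acknowledge ID   */
    uint8_t  prio;        /* physical layer: packet priority             */
    uint8_t  tt;          /* transport layer: device-ID width selector   */
    uint16_t dest_id;     /* transport layer: destination device ID      */
    uint16_t src_id;      /* transport layer: source device ID           */
    uint8_t  ftype;       /* logical layer: format (transaction) type    */
    uint64_t address;     /* logical layer: 34-/50-/66bit target address */
    uint16_t payload_len; /* logical layer: payload length in bytes      */
    uint8_t  payload[256];
} srio_packet;

/* The logical layer caps every packet's payload at 256 bytes. */
static inline int srio_payload_ok(const srio_packet *p)
{
    return p->payload_len <= 256;
}
```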

A four-lane SRIO link running at 3.125Gbit/s can deliver 10Gbit/s throughput with full data integrity. SRIO is similar to microprocessor buses in that it implements memory and device addressing as well as packet processing in hardware. This allows significantly lower I/O processing overhead, lower latency and increased system bandwidth relative to other bus interfaces. But unlike most other bus interfaces, SRIO has low pin count interfaces and scalable bandwidth based on high-speed serial links, which can scale from 1.25- to 3.125Gbit/s. Figure 4 illustrates the SRIO specification.

Figure 4: Shown is the SRIO specification.
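The 10Gbit/s figure follows from the 8b/10b line coding used on SRIO serial links: only eight of every ten line bits carry data. A small sketch of the arithmetic:

```c
/* Why a four-lane 3.125Gbit/s SRIO link delivers 10Gbit/s of data:
 * 8b/10b encoding means 8 of every 10 line bits are payload.
 * 4 lanes x 3.125Gbit/s x 8/10 = 10Gbit/s. */
static double srio_data_rate_gbps(int lanes, double line_rate_gbps)
{
    return lanes * line_rate_gbps * 8.0 / 10.0;
}
```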

Computing resources
With the availability of configurable processing resources, developers are implementing applications in hardware. For example, data compression and encryption algorithms, even complete firewall and security applications, which were previously implemented in software, are now implemented in hardware. These hardware implementations demand a massively parallel ecosystem of shared bandwidth and processing power. They require shared or distributed processing through CPUs, NPUs, FPGAs, and/or ASICs. Some of the computing resource requirements for building such a system include:

  1. Distributed processing supporting complex topologies

  2. Direct peer-to-peer communication with high reliability

  3. Multiple heterogeneous OSs

  4. Ability to support the communications data plane across multiple heterogeneous OSs

  5. Availability of modular and extendable platforms that have broad ecosystem support

The SRIO protocol was architected and specified to support the disparate requirements of compute devices in the embedded and wireless infrastructure space. SRIO makes it possible to achieve architectural independence, the ability to deploy scalable systems with carrier grade reliability, advanced traffic management, and provisioning for high performance and throughput. In addition, a broad ecosystem of vendors makes it easy to build SRIO systems with off-the-shelf components. SRIO is a packet-based protocol that supports:

  • Data movement using packet-based operations (read, write, message)

  • I/O non-coherent functions and cache coherence functions

  • Efficient interworking and protocol encapsulation through support for data streaming, and SAR functions

  • A traffic management framework enabling millions of streams, 256 traffic classes, and lossy operation

  • Flow control supporting multiple transaction request flows, including provision for QoS

  • Priority support to address bandwidth allocation, transaction ordering, and deadlock avoidance

  • Topology support for standard (trees and meshes) and arbitrary (daisy-chain) hardware topologies through system discovery, configuration and bring-up, including support for multiple hosts

  • Error management and classification (recoverable, notification and fatal)
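As a rough illustration of the packet-based operations listed above, the sketch below enumerates common SRIO transaction types and marks which ones complete with a response packet back to the initiator; the constant names are illustrative, not taken from any vendor header.

```c
/* Hedged sketch of SRIO packet-based operations (names illustrative). */
typedef enum {
    SRIO_OP_NREAD,     /* non-coherent read                    */
    SRIO_OP_NWRITE,    /* non-coherent write, no response      */
    SRIO_OP_NWRITE_R,  /* write acknowledged with a response   */
    SRIO_OP_SWRITE,    /* streaming write, lowest overhead     */
    SRIO_OP_ATOMIC,    /* atomic read-modify-write             */
    SRIO_OP_MESSAGE,   /* message passing via mailboxes        */
    SRIO_OP_DOORBELL   /* short in-band notification           */
} srio_op;

/* NWRITE and SWRITE trade the acknowledgement for lower overhead;
 * the remaining operations complete with a response packet. */
static int srio_op_is_acknowledged(srio_op op)
{
    return !(op == SRIO_OP_NWRITE || op == SRIO_OP_SWRITE);
}
```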

Intellectual property solutions
Vendors like Xilinx offer endpoint IP solutions for Serial RapidIO designed to the latest RapidIO Specification v1.3. The Logical (I/O) and Transport Layer IP supports fully compliant maximum-payload operations for both sourcing and receiving user data through target and initiator interfaces.

The complete Xilinx endpoint IP solution for SRIO is shown in Figure 5. It consists of the following components: LogiCORE RapidIO Logical (I/O) and Transport Layer IP; Buffer Layer Reference Design; LogiCORE Serial RapidIO Physical Layer IP; and Register Manager Reference Design.

Figure 5: The complete Xilinx endpoint IP architecture for SRIO is shown.

The IP
Xilinx provides the Buffer Layer Reference Design as source code that performs automatic packet re-prioritization and queuing. The SRIO Physical Layer IP implements link training and initialization, discovery and management, and error and retry recovery mechanisms. Additionally, high-speed transceivers are instantiated in the Physical Layer IP to support one- and four-lane SRIO links at line rates of 1.25Gbit/s, 2.5Gbit/s and 3.125Gbit/s.
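The re-prioritization idea can be sketched as a set of per-priority queues drained highest-first. The following is a simplified illustration of that concept, assuming four SRIO priorities; it is not the Xilinx reference design's actual interface.

```c
#include <stdint.h>

/* Conceptual sketch of buffer-layer priority queuing: outgoing packets
 * are queued per priority, and the highest-priority non-empty queue is
 * always drained first.  Structures and depths are illustrative. */

#define NPRIO  4   /* SRIO priorities 0 (lowest) to 3 (highest) */
#define QDEPTH 8

typedef struct {
    int head, tail, count;
    uint32_t pkt[QDEPTH];        /* stand-in for packet handles */
} pkt_queue;

typedef struct {
    pkt_queue q[NPRIO];
} buffer_layer;

static int buf_enqueue(buffer_layer *b, int prio, uint32_t pkt)
{
    pkt_queue *q = &b->q[prio];
    if (q->count == QDEPTH) return -1;  /* queue full: back-pressure */
    q->pkt[q->tail] = pkt;
    q->tail = (q->tail + 1) % QDEPTH;
    q->count++;
    return 0;
}

/* Drain the highest-priority pending packet first. */
static int buf_dequeue(buffer_layer *b, uint32_t *pkt)
{
    for (int prio = NPRIO - 1; prio >= 0; prio--) {
        pkt_queue *q = &b->q[prio];
        if (q->count > 0) {
            *pkt = q->pkt[q->head];
            q->head = (q->head + 1) % QDEPTH;
            q->count--;
            return 0;
        }
    }
    return -1;                           /* nothing pending */
}
```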

A Register Manager reference design enables the SRIO host device to configure and maintain endpoint device configuration, link status, control, and time-out mechanisms. In addition, ports are provided on the Register Manager for the user-design to probe the status of the endpoint device.

LogiCORE provides complete endpoint IP. It has been tested by leading SRIO device vendors. LogiCORE is delivered through the Xilinx CoreGen GUI tool, which allows users to configure the baud-rates and endpoints. It supports extended features like flow-control, re-transmit suppression, doorbell and messaging. This enables the user to create a flexible, scalable and customized SRIO endpoint IP optimized to the needs of the application.

Using the varied resources available in most high-performance FPGAs from Xilinx and other vendors, a system designer can easily create and deploy intelligent solutions that capture advantages such as faster time-to-market, scalability and extensibility, and future-proofing. Below we outline some system design examples using SRIO and DSP technologies.

Embedded system example
CPU architectures such as the x86 are optimized for general-purpose applications that do not require extensive use of multiplication. In contrast, DSP architectures are optimized for signal-processing operations including filtering, FFTs, vector multiplication and searching, and image or video analysis.
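The gap comes down to the multiply-accumulate (MAC) kernel at the heart of filtering. The sketch below shows the inner loop of an N-tap FIR filter; a DSP (or an FPGA DSP slice) executes each MAC in a single cycle, while a general-purpose CPU must share its ALUs with everything else.

```c
/* The kernel that DSP architectures optimize for: the
 * multiply-accumulate loop of an N-tap FIR filter.  coeff holds the
 * filter taps, delay the most recent input samples. */
static long fir_sample(const int *coeff, const int *delay, int ntaps)
{
    long acc = 0;
    for (int i = 0; i < ntaps; i++)
        acc += (long)coeff[i] * delay[i];  /* one MAC per tap */
    return acc;
}
```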

Embedded systems that use CPUs and DSPs can easily be architected to take advantage of both general-purpose and signal processing. An example of such a system is outlined in Figure 6. It features FPGAs, CPUs and DSP architectures.

Figure 6: SRIO has become the primary data interconnect for CPU-based high-performance DSP sub-systems.

In the system depicted here, the PCIe system is hosted by the Root Complex chipset. The SRIO system is hosted by a DSP. The 32/64bit PCIe address space (base-address) can be intelligently mapped to the 34/66bit SRIO Address space (base-address). The PCIe application communicates with the Root Complex through Memory or I/O Reads and Writes. These transactions can be easily mapped to SRIO space through I/O operations such as streaming writes, atomic and acknowledged read/write transactions (SWRITEs, ATOMIC, NREADs, NWRITE/NWRITE_Rs).
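The base-address mapping can be sketched as a simple window translation: a PCIe address that falls inside a bridge window is re-based into the SRIO address space. The window structure below is hypothetical, for illustration only; a real bridge would hold several such windows in hardware registers.

```c
#include <stdint.h>

/* Illustrative base-address translation for a PCIe-to-SRIO bridge:
 * addresses inside [pcie_base, pcie_base + size) map linearly onto
 * [srio_base, srio_base + size).  Layout is hypothetical. */

typedef struct {
    uint64_t pcie_base;   /* start of window in PCIe space   */
    uint64_t size;        /* window length in bytes          */
    uint64_t srio_base;   /* corresponding SRIO base address */
} bridge_window;

/* Returns 0 and writes the SRIO address if the PCIe address hits the
 * window, -1 otherwise. */
static int pcie_to_srio(const bridge_window *w, uint64_t pcie_addr,
                        uint64_t *srio_addr)
{
    if (pcie_addr < w->pcie_base || pcie_addr >= w->pcie_base + w->size)
        return -1;
    *srio_addr = w->srio_base + (pcie_addr - w->pcie_base);
    return 0;
}
```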

Designing such bridge functions is easy in Xilinx FPGAs, since the back-end interfaces for the PCIe and SRIO endpoint functional blocks are similar. The Packet Queue block can then perform the crossover from PCIe to SRIO or vice-versa to establish the data flow between these two protocol domains.

DSP processing app
In applications where DSP processing is the primary architectural requirement, the system architecture can be designed as depicted in Figure 7.

Figure 7: In applications where DSP processing is the primary architectural requirement, the system architecture can be designed as shown.

Xilinx Virtex-5 FPGAs can act as co-processors to other DSP devices in the system. The complete DSP system solution can be scaled easily if SRIO is used as the data interconnect. These solutions can be future-proofed, made extensible, and be supported across multiple form-factors.

If the DSP-intensive application additionally requires fast number-crunching or data processing, such processing can be offloaded to x86 CPUs. Xilinx Virtex-5 FPGAs allow the PCIe sub-system and the SRIO architecture to be bridged to enable efficient offloading of functionality.

Baseband processing system
With 3G networks maturing rapidly, OEMs are deploying new form-factors to alleviate capacity and coverage problems. FPGA-based DSP architectures using SRIO are an ideal solution to such challenges. Legacy DSP systems can also be retargeted to such fast, low-power FPGA-based architectures to harness the FPGA's scalability advantage.

Figure 8: FPGA-based DSP architectures using SRIO can alleviate capacity and coverage problems in 3G networks.

In such a system, as depicted in Figure 8, FPGAs can meet the demands of line-rate processing of antenna traffic and also provide connectivity to other system resources through SRIO. Migration of existing legacy DSP applications, which have inherently slow parallel connectivity, is easy thanks to the high speed and bandwidth provided by the SRIO protocol.

Navneet Rao
Connectivity System Architect
Xilinx Inc.
