EE Times-Asia

How to achieve 200-400GE network buffer speeds

Posted: 27 Nov 2014

Keywords: 400GE, DDR4, ASIC, FPGA, transmission protocols

Applying the short CRC to every operation provides ongoing bit-error mitigation and keeps the interface below 1 Failure in Time (FIT). The system-level impact also compares favorably with parallel interconnects: parallel memory interfaces usually protect only the data, leaving addresses and commands potentially vulnerable, whereas the GCI protocol protects all aspects of the interconnect, including commands, addresses, and data. Because the GCI specification includes error counts, error rates can be monitored periodically. In the very unlikely event that the error count exceeds a prudent level, the system may choose to retrain the links. Figure 6 shows the schematic of CRC error handling with positive ACK.


Figure 6: CRC handling with positive ACK.
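To make the positive-ACK scheme concrete, the Python sketch below models it as a simple stop-and-wait loop: the transmitter keeps each frame in a replay buffer until the receiver's CRC check passes, replays the frame otherwise, and flags the link for retraining once the CRC error count exceeds a threshold. The 6-bit generator polynomial x^6 + x + 1, the function names, the fault-injection hook, and the threshold value are all illustrative assumptions, not details taken from the GCI specification.

```python
from collections import deque

# Illustrative assumptions, not the GCI specification: the CRC-6 generator
# polynomial, the stop-and-wait replay model, and the retrain threshold.
CRC6_POLY = 0x03   # x^6 + x + 1, with the implicit x^6 term omitted
CRC6_MASK = 0x3F   # keep the CRC register at 6 bits


def crc6(payload: int, nbits: int = 72) -> int:
    """Bitwise, MSB-first CRC-6 over an nbits-wide payload (initial value 0)."""
    crc = 0
    for i in range(nbits - 1, -1, -1):
        feedback = ((crc >> 5) & 1) ^ ((payload >> i) & 1)
        crc = (crc << 1) & CRC6_MASK
        if feedback:
            crc ^= CRC6_POLY
    return crc


def transfer(payloads, corrupt_attempts, retrain_threshold=4):
    """Deliver 72b payloads in order, replaying any frame whose CRC fails.

    corrupt_attempts is a hypothetical fault-injection hook: a set of
    (sequence number, attempt number) pairs on which the channel flips bit 0
    of the payload. Returns (delivered payloads, CRC error count, retrain flag).
    """
    replay_buffer = deque(enumerate(payloads))  # unacknowledged frames, oldest first
    attempts = {}
    delivered, crc_errors = [], 0
    while replay_buffer:
        seq, payload = replay_buffer[0]
        attempt = attempts.get(seq, 0)
        attempts[seq] = attempt + 1
        sent_crc = crc6(payload)                  # CRC computed by the transmitter
        received = payload ^ 1 if (seq, attempt) in corrupt_attempts else payload
        if crc6(received) == sent_crc:
            delivered.append(received)
            replay_buffer.popleft()               # positive ACK releases the frame
        else:
            crc_errors += 1                       # no ACK: the frame is replayed
    # Retrain only when the error count exceeds the (assumed) prudent level.
    return delivered, crc_errors, crc_errors > retrain_threshold


frames = [0x123456789ABCDEF012, 0x0, 0xFFFFFFFFFFFFFFFFFF]
out, errors, retrain = transfer(frames, corrupt_attempts={(1, 0), (2, 0), (2, 1)})
print(out == frames, errors, retrain)  # True 3 False
```

Because this generator polynomial has a nonzero constant term, every single-bit error changes the check value, so each injected flip is caught and the frame is replayed until it arrives intact. Stop-and-wait was chosen only for brevity; a real serial link keeps many frames in flight.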

How GCI performs in a system
Because the GCI is payload-agnostic, it can carry a variety of commands of varying lengths mapped into 72b frames. Data is returned over a 72b bus. Figure 7 shows that commands which cross frame boundaries are immune to the effects of errors. GCI can carry short transactions suitable for header processing as well as transactions for packet buffering applications. The protocol is efficient for short 9B (72b) transfers compared with Interlaken Look-Aside (ILA) or the Hybrid Memory Cube (HMC).
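As a rough illustration of carrying variable-length commands in fixed 72b frames, the sketch below splits a command's bytes into 9-byte frame payloads and zero-pads the last one. The function name and the padding rule are assumptions for illustration; the sketch ignores the command header, length encoding, and frame-boundary handling, which the text does not describe.

```python
from typing import List

FRAME_PAYLOAD_BYTES = 9  # 72b of payload per frame


def pack_command(command: bytes) -> List[bytes]:
    """Split a command into 9-byte (72b) frame payloads, zero-padding the last one.

    Hypothetical helper: the real GCI command encoding is not modeled here.
    """
    frames = []
    for offset in range(0, len(command), FRAME_PAYLOAD_BYTES):
        chunk = command[offset:offset + FRAME_PAYLOAD_BYTES]
        frames.append(chunk.ljust(FRAME_PAYLOAD_BYTES, b"\x00"))
    return frames


print(len(pack_command(bytes(4))))   # 1 frame for a short 4-byte command
print(len(pack_command(bytes(64))))  # 8 frames (72 payload bytes) for 64 bytes
```

Under these assumptions a short transaction costs a single frame, while a larger transfer wastes at most 8 bytes of padding in its final frame.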

Using GCI in a packet processor
GCI has been used in two generations of the Bandwidth Engine family over the past four years. Other supporters and users, including Altera, Xilinx, Tabula, LSI, and Avago, have also adopted the GCI in ASIC- and FPGA-based systems. Although the GCI is compatible with the standard electrical layer defined by OIF CEI 11 SR, it can also be used with other electrical standards. Figure 7 shows where the GCI protocol sits in a packet processor that includes an intelligent serial memory co-processor.


Figure 7: GCI implemented with packet processor and intelligent serial memory processor.

The GCI achieves high bandwidth density over differential serial links and is lightweight: an 8-lane implementation requires only 100,000 ASIC gates. Current implementations running 8 links at 10 Gbit/s each reach about 1 billion 72b transfers per second at 90% efficiency. The protocol uses PRBS-scrambled encoding and fixed 80b frames, and supports 8 lanes at 10 Gbit/s with 1 ns latency.
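These figures are mutually consistent if each fixed 80b frame carries one 72b transfer, which is an inference from the 72b transfer size and the 90% efficiency figure rather than something the text states outright. The short calculation below checks that arithmetic; the variable names are illustrative.

```python
LANES = 8
LANE_RATE_BPS = 10_000_000_000  # 10 Gbit/s per lane
FRAME_BITS = 80                 # fixed frame size on the wire
PAYLOAD_BITS = 72               # assumed payload per frame (one 72b transfer)

raw_bps = LANES * LANE_RATE_BPS                 # 80 Gbit/s aggregate
frames_per_second = raw_bps // FRAME_BITS       # one 72b transfer per frame
efficiency = PAYLOAD_BITS / FRAME_BITS
payload_bps = frames_per_second * PAYLOAD_BITS

print(f"{frames_per_second:,} transfers/s, {efficiency:.0%} efficiency, "
      f"{payload_bps / 1e9:.0f} Gbit/s of payload")
```

The text does not say how the other 8 bits of each 80b frame are allocated; the 6b per-frame CRC is presumably among them.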

The GCI scales across 1-, 2-, 4-, 8-, and 16-lane configurations and follows the OIF roadmap. By implementing a 6b CRC per frame with a built-in frame replay mechanism, it achieves fewer than 1 undetected error in 10^25 frames transferred. Stated another way, the GCI achieves a reliability of less than 1 FIT, that is, fewer than 1 undetected error in 1 billion hours (about 114,000 years) of operation.
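The FIT statement follows from the frame-error figure with a little arithmetic. Assuming the link runs continuously at 1 billion frames per second, 10^9 hours of operation moves about 3.6 × 10^21 frames, so an undetected-error rate of 1 in 10^25 frames predicts roughly 3.6 × 10^-4 undetected errors in that window, comfortably under 1 FIT. The sketch below reproduces the calculation; continuous full-rate traffic is an assumption, and the constant names are illustrative.

```python
FRAMES_PER_SECOND = 1_000_000_000   # assumed continuous full-rate traffic
UNDETECTED_PER_FRAME = 1e-25        # fewer than 1 undetected error in 10^25 frames
FIT_WINDOW_HOURS = 1_000_000_000    # 1 FIT = 1 failure per 10^9 device-hours
HOURS_PER_YEAR = 24 * 365.25

frames_in_window = FRAMES_PER_SECOND * 3600 * FIT_WINDOW_HOURS
expected_undetected = frames_in_window * UNDETECTED_PER_FRAME
window_years = FIT_WINDOW_HOURS / HOURS_PER_YEAR

print(f"{frames_in_window:.1e} frames per 10^9 hours")
print(f"{expected_undetected:.1e} expected undetected errors "
      f"(below 1 FIT: {expected_undetected < 1})")
print(f"10^9 hours is about {window_years:,.0f} years")
```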

Reliability is critical when transferring commands to co-processors and control memory: any corruption in the data could result in lost state. The ideal interface uses a reliable chip-to-chip transport protocol that supports end-to-end data protection. The protocol must be agnostic to the payload and precisely recoverable should a bit error occur.

These properties are important for supporting transactions with higher levels of abstraction. In the future, higher-level functions such as longest prefix matching and indirect data structure accesses are envisioned. The protocol must also be efficient at transferring small transactions, such as structure pointer values, as well as large transactions for packet buffering. It must provide low latency, because interface latency directly affects total system latency and how much speculative work is done before a transaction can be committed. The GCI protocol meets all these criteria.


About the author
Michael Miller is vice president of technology innovation and systems design at MoSys. Previously he was Chief Technical Officer, Systems Architecture, at Integrated Device Technology, Inc. (IDT), where he served for more than 20 years and held engineering management positions in software, applications, and product definition for networking memory, RISC processors, and communications ICs. He has also managed software teams at two systems companies and held logic design and applications roles at Advanced Micro Devices. He holds a bachelor of science degree in computer science from California Polytechnic State University, San Luis Obispo, and has been awarded 30 patents to date.
