EE Times-Asia

Researchers offer alternative to common crossbar design

Posted: 22 Aug 2002

Keywords: on-chip interconnect, switching fabric in routers, transistor, design

In a technical paper presented at the Hot Chips conference here Monday (Aug. 19), researchers Ting Wu, Chi-Ying Tsui, and Mounir Hamdi from Hong Kong University of Science and Technology (China) offered an alternative, pipelined approach to crossbar design.

Their approach has yielded a 256-by-256 signal switch with a 2GHz input bandwidth, simulated in a 0.25µm, 5-metal process.

The growing importance of crossbar switch matrices, now used for on-chip interconnect as well as for switching fabric in routers, has led to increased study of the best ways to build these parts.

The obvious way to implement a crossbar switch fabric, according to presenter Tsui, is to simply route inputs horizontally and outputs vertically, and then to place a pass transistor at each intersection. Turning on the transistor connects an input line to an output line. The layout is intuitive, and provides easily for multicasting.

But even setting aside the unpleasant characteristics of pass transistors, there are serious disadvantages to this approach. Setting up the switch requires n-squared control bits. More important for high-bandwidth interconnect, the performance of each connection is limited by the on-resistance of the pass transistor and by the capacitance of both the input and output lines, each of which must be long enough to span the entire matrix in a fully populated switch.
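To make the n-squared figure concrete, here is a minimal Python sketch of the pass-transistor layout described above. It is a behavioral illustration only, not the researchers' model; the names `control_bits` and `route` and the 4-by-4 example are assumptions introduced here.

```python
def control_bits(n):
    """One control bit per pass transistor, one transistor per intersection."""
    return n * n


def route(inputs, ctrl):
    """Behavioral model of a pass-transistor crossbar (illustrative only).

    ctrl[i][j] is True when the transistor joining input line i to
    output line j is switched on.
    """
    n = len(inputs)
    outputs = [None] * n  # None marks an output line left undriven
    for j in range(n):
        drivers = [i for i in range(n) if ctrl[i][j]]
        if len(drivers) > 1:
            raise ValueError(f"output {j} is driven by several inputs: {drivers}")
        if drivers:
            outputs[j] = inputs[drivers[0]]
    return outputs


n = 4
ctrl = [[False] * n for _ in range(n)]
ctrl[0][1] = ctrl[0][3] = True  # multicast: input 0 drives outputs 1 and 3
ctrl[2][0] = True               # input 2 drives output 0

print(route(["a", "b", "c", "d"], ctrl))  # ['c', 'a', None, 'a']
print(control_bits(256))                  # 65536
```

Multicasting falls out of the layout for free: turning on more than one transistor along the same input row connects that input to several outputs.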

For this reason most high-performance implementations are done not with pass gates but with MUXs. Each output is driven by a wide MUX that selects one of the input lines.
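A MUX-based fabric can be sketched the same way. In the Python sketch below, each output carries a select value naming the input it passes through. The function names are hypothetical, and the select-bit count assumes binary-encoded selects, which the article does not state.

```python
from math import ceil, log2


def select_bits(n):
    """Total select bits for n outputs, each with an n-to-1 MUX.

    Assumes binary-encoded selects (an assumption, not from the article).
    """
    return n * ceil(log2(n))


def mux_crossbar(inputs, selects):
    """selects[j] is the index of the input that output j's MUX passes through."""
    return [inputs[s] for s in selects]


# Outputs 1 and 3 both select input 0, so multicast is still possible.
print(mux_crossbar(["a", "b", "c", "d"], [2, 0, 1, 0]))  # ['c', 'a', 'b', 'a']
print(select_bits(256))                                  # 2048
```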

The routing is variable and less obvious, but contains many shorter segments, reducing both input capacitance and potential crosstalk hot spots. The output capacitance is comparatively quite small.

But as signal rates approach the capabilities of the process, the MUX architecture runs into problems too. There are still long wires on the input routes, which must span the whole array of MUXs, and wire delays grow as the number of inputs and outputs increases. MUX complexity also rises rapidly with switch width.

The Hong Kong University researchers decided to take a novel approach to this problem by pipelining the MUXs. Thus one 256-to-1 MUX is replaced by several cascaded narrower MUXs separated by registers. This alleviates the problems of MUX-based crossbar design, in exchange for some new issues.

The basic element of the new design is a flip-flop with an embedded MUX. The flip-flop chosen was a semidynamic device attributed to Klass and Stojanovic, favored for its negative set-up time and small transistor overhead.

The designers chose to break the MUXs into a cascade of a 2-to-1 static MUX, a 4-to-1 flip-flop/MUX, an 8-to-1 static MUX and finally another 4-to-1 flip-flop/MUX. This gave the design an effective two-stage pipelined 256-to-1 MUX.
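The stage widths multiply out to 2 x 4 x 8 x 4 = 256, which is why the cascade behaves as a single 256-to-1 MUX. The Python sketch below checks this by splitting a flat input index into one select value per stage. It models selection only, not the registers or timing, and the choice to treat the first stage as the least significant digit is an assumption made here for illustration.

```python
from math import prod

STAGES = [2, 4, 8, 4]  # stage widths from the article, first stage first


def split_select(sel, radices=STAGES):
    """Split a flat input index into one select value per stage.

    Mixed-radix decomposition with the first stage as the least
    significant digit (an assumed ordering).
    """
    digits = []
    for r in radices:
        digits.append(sel % r)
        sel //= r
    return digits


def cascade_mux(inputs, sel, radices=STAGES):
    """Select inputs[sel] by passing the inputs through the MUX cascade."""
    if len(inputs) != prod(radices):
        raise ValueError("input count must equal the product of the stage widths")
    level = list(inputs)
    for r, d in zip(radices, split_select(sel, radices)):
        # Each r-to-1 MUX at this level sees r adjacent signals and passes signal d.
        level = [level[k + d] for k in range(0, len(level), r)]
    return level[0]


print(prod(STAGES))                         # 256
print(split_select(255))                    # [1, 3, 7, 3]
print(cascade_mux(list(range(256)), 200))   # 200
```

The per-stage selects use 1 + 2 + 3 + 2 = 8 bits in total, the same as a flat binary-encoded 256-to-1 select.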

But upon floorplanning and delay estimation it was found that the length of the input lines, reaching all the way across the row of first-level MUXs, was still too great to achieve the necessary 1GHz signal speed. So the researchers added another pipeline stage, this one simply for wire delay. In effect, they broke the crossbar in half with a vertical line of flip-flops. A matching set of flip-flops was placed on the output of the pipelines for the first 128 outputs, so that both the pipelines - those for the left half of the array and those for the right half - had the same number of stages. This technique broke the long input lines in two and produced an acceptable capacitance. It had been calculated that buffer insertion would not have yielded a sufficient bandwidth in this process.
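The stage balancing can be illustrated with a small latency count. The Python sketch below rests on one reading of the description, which the article does not spell out: inputs enter on the side of the first 128 outputs, the far half sees them one register later, and the matching flip-flops add one cycle to the near half. The function name and the cycle counts are assumptions.

```python
MUX_PIPELINE_STAGES = 2  # the two flip-flop/MUX stages in each 256-to-1 cascade


def latency_cycles(output_index, n_outputs=256, balanced=True):
    """Register stages between an input and the given output (assumed model)."""
    near_half = output_index < n_outputs // 2
    # The far half receives its inputs through the mid-array line of flip-flops.
    wire_stage = 0 if near_half else 1
    # The matching flip-flops on the near-half outputs make up the difference.
    matching_stage = 1 if (near_half and balanced) else 0
    return wire_stage + MUX_PIPELINE_STAGES + matching_stage


print(latency_cycles(0), latency_cycles(255))   # 3 3
print(latency_cycles(0, balanced=False))        # 2
```

Without the matching flip-flops, the two halves of the array would deliver data one cycle apart under this model.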

With the addition of control and clocking circuitry, a 256-by-256 crossbar, each line handling 1Gbps, was designed. The design was laid out, extracted and simulated, but has not been fabricated.

While the device meets its performance requirements in simulation, there are some disadvantages to the approach. Obviously, clock distribution becomes a critical issue. Further, because of all the synchronous stages, the device was estimated to consume 40W in operation.

But as an approach to the very real problems of layout and timing in larger crossbars, the design contributes some valuable new ideas to the toolkit.

- Ron Wilson

EE Times
