EE Times-Asia

Fulcrum IC heats asynchronous design debate

Posted: 21 Aug 2002

Keywords: IC design, switch chip, CMOS process, asynchronous chip, verification tools

A coolly tantalizing alternative to the latest approaches to blazing IC performance is expected to cut this week's sweltering heat at the Hot Chips conference.

The design technique is said to virtually eliminate many of the most challenging problems in high-speed system-level chip design and may successfully sidestep an emerging roadblock in conventional design flows.

This breath of cool air will come to the Stanford University campus by way of Fulcrum Microsystems, a design house derived from research work at the California Institute of Technology. Fulcrum officials are among the few North American advocates of asynchronous design. Those officials will tell the Hot Chips conference that they're now putting their silicon where their mouth is, as it were. The company will unveil a fully asynchronous crossbar switch chip that achieves 260Gbps cross-section bandwidth in a standard 0.18µm CMOS process.

But the real impact of the Fulcrum crossbar is not so much its performance as, first, its existence among the few fully asynchronous chip designs of its size, and second, the substantial benefits of asynchronous design that the chip demonstrates.

The chip is by no means the first asynchronous design to be fabricated. Asynchronous techniques were common in the early days of IC design. But most modern digital design is formed from small blocks of combinatorial logic separated by synchronously clocked registers. The longest delay path between registers determines the maximum operating frequency of the circuit. So design teams lavish months, server farms and tool budgets on finding and predicting the critical-path delays.
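The relationship between the longest register-to-register path and the clock rate can be shown in a minimal Python sketch. The path names and delay values are invented for illustration, and the `overhead_ns` parameter (standing in for register setup time and clock skew) is an assumption rather than anything reported by Fulcrum.

```python
# Hypothetical register-to-register path delays in nanoseconds.
# These numbers are illustrative only; they do not describe any real chip.
path_delays_ns = {
    "decode -> issue": 1.8,
    "issue -> execute": 2.5,
    "execute -> writeback": 2.1,
}


def max_clock_mhz(delays_ns, overhead_ns=0.0):
    """Return the critical path name and the highest safe clock rate in MHz.

    The clock period can be no shorter than the slowest path plus any
    fixed overhead, so the slowest path alone sets the frequency.
    """
    critical = max(delays_ns, key=delays_ns.get)
    period_ns = delays_ns[critical] + overhead_ns
    return critical, 1000.0 / period_ns


critical, f_max = max_clock_mhz(path_delays_ns)
print(f"critical path: {critical}, f_max = {f_max:.0f} MHz")
```

Every other path in the sketch finishes early and simply waits for the clock, which is why so much effort goes into finding and shortening the single slowest one.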

But, as Fulcrum founder and chief technical officer Andrew Lines points out, a crisis looms in this design style. It is becoming vastly harder to predict the timing of critical paths. At the same time, the huge surges of clock current necessary in system-level synchronous designs tax the design of power distribution nets, the thermal stability of the die, the design team's ability to control noise and even the integrity of the metal itself.

Just predicting delays, an essential task in synchronous design, has proved increasingly difficult. New delay models must take into account both variations in the drive strength of buffer transistors and three-dimensional models of the interconnect network accurate enough to be turned into a distributed RLC model. Attempts to cope with these problems in 130-nm and 90-nm processes are proving less than completely reassuring.

And Lines warned that the problems are just starting. Added to the interconnect-related issues are whole new kinds of problems. "As geometries get finer, shot noise, charge sharing, thermal effects, supply voltage noise and process variations all create further uncertainty in delay," he said. Eventually, attempting to control the delay between even nearby points on a die becomes a losing battle.

The early impact of this struggle is already visible within the closely guarded inner nets of many high-performance chips. Design teams at Intel Corp. and other high-end houses have been using asynchronous techniques for several generations to avoid timing problems, reduce power, improve performance or control noise. In particular, domino logic, a throwback to design techniques of the 1970s, has re-emerged as a significant tool in high-speed design.

ARM effort

So asynchronous blocks exist, unpublicized, within many high-speed designs. In addition, a few design teams have been investigating fully asynchronous chips. A research team at the University of Manchester, England, several years ago reported fabrication of a fully asynchronous ARM processor. The chip, intended as a proof of concept, achieved similar power and performance to then-current conventional ARM CPUs.

Fulcrum itself recently announced licensing of a MIPS core as a preliminary to developing an asynchronous MIPS CPU offering. And this week the company will unveil the F1 crossbar.

The great challenge of asynchronous design, according to many in the industry, is not the fundamental techniques. Circuits for several approaches to asynchronous design are well-understood. But there exists no infrastructure of design and verification tools for asynchronous circuits comparable to that which exists for register-based synchronous design. Given that the very formalism of RTL assumes clocked registers, there is little compatibility between existing tools and asynchronous techniques.

The work at Manchester was in part directed at alleviating this problem, as was much of the Caltech research that led to the formation of Fulcrum. But the two groups pursued fundamentally different approaches.

The Manchester work, and much of the rest of the development in the industry, is based on externally timed circuits. That is, while the computational element does its work, a separate, parallel circuit creates a timing signal to indicate when the computational element is done. This signal then triggers the inputs to subsequent processes.

Predicated on predictions

Lines argued that this approach does not help with the emerging timing problems in nanometer processes. "You still have to predict, a priori, the delay in the computational element and the delay in delivering both the computational result and the 'done' signal to the next block. If you can't predict those delays accurately, you are nearly back where you started," Lines said.
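Lines' objection can be sketched in a few lines of Python. This is a toy model, not a description of the Manchester circuits: the predicted delay, the 10 percent margin, and the variation factors are all hypothetical values chosen to show how a fixed timing path fails once the real logic delay drifts past the design-time estimate.

```python
def done_signal_is_safe(actual_delay_ns, matched_delay_ns):
    """The 'done' signal is safe only if it arrives no earlier than the data."""
    return matched_delay_ns >= actual_delay_ns


PREDICTED_NS = 2.0   # hypothetical design-time estimate of the logic delay
MARGIN = 1.10        # hypothetical 10% safety margin built into the timing path
matched_ns = PREDICTED_NS * MARGIN

# Hypothetical ratios of real silicon delay to the predicted delay.
outcomes = {}
for variation in (0.95, 1.05, 1.20):
    actual_ns = PREDICTED_NS * variation
    outcomes[variation] = done_signal_is_safe(actual_ns, matched_ns)
    status = "safe" if outcomes[variation] else "done fires before data is ready"
    print(f"actual {actual_ns:.2f} ns vs timing path {matched_ns:.2f} ns: {status}")
```

The timing path is only as good as the prediction behind it. Widening the margin restores safety but gives back the speed advantage.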

Fulcrum designers are working with an alternative approach: self-timed logic. In particular, Fulcrum uses a variation of domino logic with either two-rail or four-rail encoding. In a NAND gate, for example, domino logic would implement the gate. But instead of logic True and False being represented by a high or low voltage on a single line, True might be high levels on both lines in a two-rail pair, and False might be low voltages on both lines. Either condition in which one line was high and the other low would be an invalid state. The logic gate would be designed so that its output would remain invalid until a valid input had propagated through to the output. Since domino logic is inherently state-holding, entirely latchless networks can be designed.
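A small Python sketch can make the idea of a self-announcing signal concrete. It follows the illustrative two-rail assignment given above (both lines high for True, both low for False, mixed levels invalid). Practical dual-rail circuits may assign the codes differently; a common convention dedicates one wire to each logic value and treats both-low as empty. The function names are hypothetical, and the model ignores the precharge and handshake details of real domino logic.

```python
# Two-rail codes as (line0, line1), following the article's illustration.
TRUE = (1, 1)
FALSE = (0, 0)
INVALID = (1, 0)  # one of the two mixed codes; (0, 1) is also invalid


def is_valid(pair):
    """A pair carries data only when both lines agree."""
    return pair in (TRUE, FALSE)


def decode(pair):
    """Convert a valid two-rail code to a Python bool."""
    if not is_valid(pair):
        raise ValueError("no valid data on this pair yet")
    return pair == TRUE


def self_timed_nand(a, b):
    """NAND whose output stays invalid until both inputs are valid.

    The output code itself tells the next stage when the result is ready,
    so no separate timing signal or delay estimate is needed.
    """
    if not (is_valid(a) and is_valid(b)):
        return INVALID
    return FALSE if (decode(a) and decode(b)) else TRUE


print(self_timed_nand(INVALID, TRUE))  # still waiting on the first input
print(self_timed_nand(TRUE, TRUE))     # both inputs valid: result is FALSE
```

A downstream gate only needs to check `is_valid` on its inputs to know when to start, however long the upstream gate happened to take.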

Obviously, the approach avoids the problem of accurately estimating delays. The signal announces its arrival at the next element, which may then set its outputs invalid and start work. Circuits continue to operate at the best speed available, despite variations in temperature, voltage, aging and process. Further, the circuits have no clock signals and thus eliminate all the power dissipation and noise generation of the clock networks.

The design flow for the F1 is as unique as its circuitry. Fulcrum starts with functional models of the blocks in a design, models detailed enough to include all the states of the blocks. Currently, the models are written in Java. The use of Java permits the functional model to be decomposed into message-passing blocks. In an iterative process, each block is defined in an intermediate language called CSP (Communicating Sequential Processes) and broken into between three and six subprocesses. Then the subprocesses are similarly defined and decomposed, until at the final iteration the subprocesses are transistor-level circuits. Thus the design flow does not exactly use either synthesis or libraries.
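The message-passing style can be illustrated with a short Python sketch. It is not Fulcrum's tooling (the company's models are in Java and CSP): the three process names, the `DONE` marker, and the use of threads with single-slot queues as channels are all assumptions made for illustration. The point is only that each block communicates solely through channels, so any block could later be replaced by a finer-grained set of subprocesses without changing its neighbors.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker


def source(out_ch, values):
    """Send each value downstream, then signal completion."""
    for v in values:
        out_ch.put(v)
    out_ch.put(DONE)


def increment(in_ch, out_ch):
    """Receive a value, add one, and pass it on."""
    while True:
        v = in_ch.get()
        if v is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(v + 1)


def sink(in_ch, results):
    """Collect everything that arrives until the stream ends."""
    while True:
        v = in_ch.get()
        if v is DONE:
            return
        results.append(v)


def run_pipeline(values):
    # Single-slot queues stand in for channels: a sender blocks until
    # the receiver has taken the previous message.
    a, b = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    results = []
    threads = [
        threading.Thread(target=source, args=(a, values)),
        threading.Thread(target=increment, args=(a, b)),
        threading.Thread(target=sink, args=(b, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_pipeline([1, 2, 3]))
```

No process in the sketch knows how fast its neighbors run. Each simply waits on its channel, which mirrors the absence of a global clock in the circuits the models describe.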

SoC step

The Fulcrum F1 is certainly a stake in the ground, indicating that a complex SoC element can in fact be implemented using asynchronous techniques. It is also an important tool for SoC designers who can use the design as a black-box solution to on-chip interconnect, without having to be concerned with the fact that it is an asynchronous design.

But whether the F1 proves to be a major step toward the general acceptance of asynchronous design remains to be seen. What can be accomplished by a team of ex-Caltech researchers may be quite different from what may be expected of a more mainstream industry design team. And whether the methodology constructed at Fulcrum can be ported to other teams and other designs is still an open question.

In the past there has been unremitting hostility from the EDA community to anything asynchronous. But Cadence is an investor in Fulcrum and a partner in development of its tools.

- Ron Wilson

EE Times
