
For startup, power savings start in architecture

Posted: 16 Sep 2004

Keywords: media, silicon, mobile, processor, architecture

Most discussion of energy efficiency in fine-geometry processes focuses on the lowest levels of abstraction: materials, transistors and libraries. But researchers at the system level seek energy efficiency in architecture, not detail. Energy squandered in architectural design or software implementation cannot be recovered by even the best low-power libraries or processes; carelessness at the beginning can forfeit savings that might have been available at later stages.

Surprisingly, little has been done in recognition of these facts. CPU and signal-processing architectures are, for the most part, legacies of previous generations, when energy management was at best an afterthought. Thus it has usually been left up to RTL designers to throw in some gated clocks or ad-hoc power-down circuitry.

But fabless chip supplier 3Plus1 Technology Inc. is taking a different tack. The company is entering the market for mobile, connected media-processing silicon with an application-specific processor built around the concept that energy efficiency begins in the architecture.

"Saving dynamic power has two components," said Amir Zarkesh, EVP of engineering at 3Plus1. "The first is utilization. At any moment you want all of the circuitry that is powered up to be actually doing necessary work."

"The second is what we refer to as adiabatic computing. This is not about adiabatic logic circuit design, which is a very interesting area of research. It is simply the fact that, other things being equal, the less charge you move on a given cycle, the less energy you will consume."

Those principles led 3Plus1 into a novel design methodology. The design starts with a detailed signal-flow analysis of the range of applications to be covered. Then comes the art. Architects must extract from this data a set of common functions that are similar enough across the whole range of applications that they can be implemented in single parameterized hardware blocks.

The blocks are then implemented according to the two guiding principles. Latencies are arranged to keep the blocks continually active during processing, using only the speed, or energy, necessary to complete the function by its deadline. And the individual nets that implement the block are organized so that signals with high transition rates are confined to low-capacitance nets.
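The second principle can be illustrated with a minimal sketch. It assumes the common first-order model in which switching energy per cycle is proportional to activity times capacitance times supply voltage squared; the function names, activity factors, capacitances and supply voltage are all hypothetical and do not come from 3Plus1.

```python
# Illustrative only: every number and name here is an assumption,
# not 3Plus1 data. First-order model: E ~ sum(alpha_i * C_i * Vdd^2).

def switching_energy(activities, capacitances, vdd=1.2):
    """Energy per clock cycle (joules) for signals paired with nets in order."""
    return sum(a * c * vdd ** 2 for a, c in zip(activities, capacitances))

def pair_busy_with_small(activities, capacitances):
    """Put the highest-activity signals on the lowest-capacitance nets."""
    return sorted(activities, reverse=True), sorted(capacitances)

# Hypothetical transition probabilities per cycle and net capacitances (farads).
activities = [0.05, 0.50, 0.20]
capacitances = [10e-15, 40e-15, 20e-15]

naive = switching_energy(activities, capacitances)
tuned = switching_energy(*pair_busy_with_small(activities, capacitances))

print(f"naive: {naive:.3e} J/cycle, tuned: {tuned:.3e} J/cycle")
```

With these made-up numbers, the reordered pairing moves less charge per cycle than the arbitrary one, which is the effect the article describes. A real design would be constrained by placement and routing rather than a free reassignment.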

The resulting architecture, Zarkesh said, is highly asymmetric. Engines run specific macro-level functions (midway in complexity between a conventional functional block and a processor) and interconnect to implement specific data flows. As a result, the chip design is efficient only in a narrow range of tasks, and the architecture is best used by statically mapping specific instances of tasks onto specific engines.

In 3Plus1's first chip design, this entails 11 types of engines. The engines are clustered into two types of processors: wide blocks with 8bit to 32bit data paths; and narrow blocks with 1bit to 16bit data paths. Each cluster has up to 51 engines.

The blocks are combined as the application requires (two wide plus two narrow, for example) under the principle of having just enough engines to meet the task deadlines, but not so many that any will sit idle. The cluster of wide and narrow engines is then combined with a generic ARM core, which provides control flow and scheduling. Add local memory, a DDR DRAM interface and peripheral controllers, and you have the heart of a chip that can handle 802.11 baseband processing, video and audio codec functions concurrently. The architecture can handle a total of nine media codec standards.
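The "just enough engines" rule can be sketched as a simple sizing calculation. This is an assumption-laden illustration, not 3Plus1's method: the workload, clock rate, deadline and one-operation-per-cycle engine throughput are hypothetical, and real engines are not interchangeable in this way.

```python
# Illustrative only: workload, clock and deadline are made-up values.

def engines_needed(ops_per_task, ops_per_cycle, clock_hz, deadline_ms):
    """Smallest engine count that finishes ops_per_task before the deadline."""
    cycles_available = clock_hz * deadline_ms // 1000
    ops_per_engine = ops_per_cycle * cycles_available
    return -(-ops_per_task // ops_per_engine)  # integer ceiling division

def utilization(ops_per_task, ops_per_cycle, clock_hz, deadline_ms, engines):
    """Fraction of available engine cycles that do useful work."""
    cycles_available = clock_hz * deadline_ms // 1000
    return ops_per_task / (engines * ops_per_cycle * cycles_available)

# Hypothetical task: 10 million operations due every 40 ms at 100 MHz.
n = engines_needed(10_000_000, 1, 100_000_000, 40)
util = utilization(10_000_000, 1, 100_000_000, 40, n)

print(f"{n} engines, {util:.0%} utilized")
```

Adding a fourth engine in this toy case would still meet the deadline but would lower utilization, which is the idle capacity the design principle tries to avoid.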

The engine clusters are small: 3Plus1 estimates the area of a block comprising two wide plus two narrow clusters at 24mm² on a 130nm process. The company reckons that the entire chip, when actively executing a "scenario" (3Plus1's term for a fixed mapping of multiple concurrent tasks onto the hardware), will come in at about 100mW in the same 130nm low-power process.

A low-power process is feasible in part because 3Plus1's guiding principles result in a very high level of activity per clock cycle. Each layer of the processor will execute up to 50 operations per clock, according to president and CEO Allan Cox.

For now, 3Plus1 is describing simulated data. The company recently delivered the first functional simulation of the chip to a prospective customer. Silicon tapeout is scheduled for 2005 Q1.

- Ron Wilson

EE Times
