EE Times-Asia

Superscalar CPU executes two threads at once

Posted: 25 Nov 2010

Keywords: broadband, video, SoC

Broadcom started a parallel CPU effort in 2000 when it acquired SiByte, a startup that had developed its own high-end MIPS CPU. In 2005, however, the company decided that integrated SoCs were a better investment than standalone processors, so it merged the SiByte CPU team with its in-house efforts. The new team brought in expertise in high-performance CPU design, allowing Broadcom to develop a much more powerful CPU: the BRCM 5000.

Multithread design
When choosing an architecture for the BRCM 5000 CPU, the design team examined several options, according to Ramesh Senthinathan, senior director of engineering, broadband communication group, Broadcom. "Increasing CPU performance through complex out-of-order techniques produces an exponential rise in die area and power with relatively little increase in performance," he said. "Multithreading turns out to be a more efficient way to achieve higher performance."

Multithreading helps fill the empty cycles that occur when the CPU must access the second-level cache for data. In this case, the CPU simply executes instructions from the second thread until the first thread receives its data. Multithreading also allows the BRCM 5000 to emulate the dual-CPU structure of its predecessor, the BRCM 4380. Because the two threads appear to software as separate CPUs, a single BRCM 5000 core can run two operating systems.
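
The latency-hiding effect can be illustrated with a toy model. The sketch below is not Broadcom's design: it assumes a single-issue core, a fixed miss penalty of four cycles, and instruction traces written as strings in which 'a' is an ordinary operation and 'm' is a load that misses the first-level cache. The function name `simulate` and all parameters are hypothetical.

```python
def simulate(threads, miss_penalty=4):
    """Toy single-issue core that switches threads on a cache miss.

    threads: list of trace strings; 'a' = ordinary op, 'm' = load that misses.
    Returns (cycles_elapsed, instructions_issued).
    Simplification: a thread counts as finished once its last instruction
    has issued, even if that instruction was a miss.
    """
    pc = [0] * len(threads)        # next instruction index per thread
    ready_at = [0] * len(threads)  # first cycle at which each thread may issue
    cycle = 0
    issued = 0
    while any(pc[t] < len(threads[t]) for t in range(len(threads))):
        # Issue from the first thread that is ready (thread 0 has priority).
        for t in range(len(threads)):
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                if threads[t][pc[t]] == "m":
                    # The missing thread stalls until its data returns.
                    ready_at[t] = cycle + 1 + miss_penalty
                pc[t] += 1
                issued += 1
                break
        cycle += 1
    return cycle, issued


if __name__ == "__main__":
    print(simulate(["amaa"]))
    print(simulate(["amaa", "amaa"]))
```

Under these assumptions, one "amaa" thread takes 8 cycles, while two such threads sharing the core finish in 10 cycles rather than the 16 needed to run them back to back, because the second thread issues during part of the first thread's stall.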

Depending on the number of cache misses it encounters, a single-issue CPU can fill 60 to 75 percent of its execution slots on many software applications. This situation leaves relatively few slots for the second thread, limiting the performance gain of multithreading. A CPU that can issue two instructions at a time, however, will typically fill about 50 percent of its execution slots, leaving plenty of room for the second thread. According to Senthinathan, this dual-issue, dual-thread design is a "sweet spot" for multithreading, which is why Broadcom chose this approach for the BRCM 5000.
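
The arithmetic behind this "sweet spot" can be sketched with an idealized model. It assumes the second thread is identical to the first and can use any slot the first leaves empty, with no resource conflicts; real gains would be lower. The function names are illustrative only, and the utilization figures are the ones quoted above.

```python
def two_thread_upper_bound(utilization):
    """Best-case combined slot utilization with two identical threads.

    utilization: fraction of issue slots one thread fills on its own.
    The second thread can use only the slots the first leaves empty,
    so the total is capped at 100 percent.
    """
    return min(1.0, 2.0 * utilization)


def max_speedup(utilization):
    """Best-case throughput gain from adding the second thread."""
    return two_thread_upper_bound(utilization) / utilization


if __name__ == "__main__":
    for label, u in [("single-issue, 75% full", 0.75),
                     ("single-issue, 60% full", 0.60),
                     ("dual-issue, 50% full", 0.50)]:
        print(f"{label}: best-case speedup {max_speedup(u):.2f}x")
```

In this model the single-issue core gains at most about 1.33x to 1.67x from a second thread, while the dual-issue core at 50 percent utilization has room for up to 2x.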

To achieve the 1.3GHz clock speed, Broadcom used a combination of custom logic and synthesized logic. For example, the clock tree is hand-designed to minimize clock skew. Critical speed paths use custom domino circuitry. Floor planning is also important, so the major circuit blocks are placed early in the process to minimize wire delays. Broadcom's Central Engineering team provided custom circuits such as high-speed SRAM and register files to achieve the high frequency.

Chips using the BRCM 5000 include a technology that Broadcom calls Adaptive Voltage Scaling (AVS). The chip contains certain test circuits that determine if it is operating near the fast-fast corner or the slow-slow corner. These test circuits contain both analog and digital functions to get a precise reading of the transistor characteristics.

For a chip with fast, leaky transistors, the supply voltage is internally lowered, reducing both leakage and transistor speed, but the fast transistors can still achieve the rated clock speed even at the lower voltage. Conversely, the voltage is increased for chips with slow transistors, boosting their performance. Thus, AVS reduces the rated worst-case power, which only occurs in fast-fast chips, while improving speed yield.
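
The AVS decision described above can be sketched as follows. This is only an illustration of the idea: the article does not give Broadcom's voltages, step sizes, or thresholds, so the nominal voltage, step, tolerance, and the normalized speed reading used here are all hypothetical.

```python
NOMINAL_MV = 1000  # hypothetical nominal supply voltage, in millivolts
STEP_MV = 50       # hypothetical adjustment step, in millivolts


def select_supply_mv(measured_speed, typical_speed=1.0, tolerance=0.05):
    """Pick a supply voltage from an on-chip speed reading.

    measured_speed: transistor speed reported by the test circuits,
    in the same (assumed, normalized) units as typical_speed.
    """
    ratio = measured_speed / typical_speed
    if ratio > 1.0 + tolerance:
        # Fast, leaky silicon: lower the voltage to cut leakage;
        # the chip can still reach its rated clock speed.
        return NOMINAL_MV - STEP_MV
    if ratio < 1.0 - tolerance:
        # Slow silicon: raise the voltage to recover speed.
        return NOMINAL_MV + STEP_MV
    return NOMINAL_MV


if __name__ == "__main__":
    for speed in (1.2, 1.0, 0.8):
        print(speed, select_supply_mv(speed), "mV")
```

A real implementation would presumably use finer voltage steps and calibrated sensor readings; the three-way choice here only shows the direction of each adjustment.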

Comparing with competition
Although 1.3GHz may pale in comparison to the top clock speeds advertised by ARM and MIPS for their licensable cores, these numbers are not directly comparable. Broadcom rates its CPU at its production clock speed, which is based on the worst-case (slow-slow) chips using the standard voltage with 10 percent margin. IP cores are often rated at the typical or even fast-fast corner with overvoltage and no margin. Under such extreme conditions, the BRCM 5000 could easily exceed 2GHz, says Senthinathan.

Intel's Atom CPU is starting to appear in some consumer applications. Like the BRCM 5000, Atom is a dual-issue dual-thread CPU with no out-of-order execution, so its performance per megahertz should be similar. Atom typically ships at 1.6GHz, but its speed must be cranked down to match the power levels of Broadcom's CPU. Furthermore, Intel does not provide Atom in a highly integrated STB or DTV processor, although the company is likely to do so in the future.

Using its new BRCM 5000 CPU, Broadcom is already shipping a variety of SoC products for broadband and video applications. This dual-thread design can match or exceed the performance of common consumer CPUs such as Cortex-A9, MIPS 74K, and Atom, meeting the needs of emerging STBs with high-definition 3D user interfaces and other advanced capabilities. Yet Broadcom has carefully designed its processors to reduce their cost and power, both within the processor itself and at the system level, giving it an edge in this highly competitive market.

- Linley Gwennap
  The Linley Group
