Intel's 80-core chip crunches 1Tflop at 3.2GHz

Posted: 19 Feb 2007

Keywords: processor, multicore, Polaris, Intel, Teraflop Research Chip

A close-up of the Teraflops wafer containing the Intel Teraflops Research Chip.

Intel Corp. demonstrated its Teraflop Research Chip, code-named Polaris, at last week's International Solid-State Circuits Conference in San Francisco.

The 80-core chip crunches 1 trillion flops when running at a 3.2GHz clock speed and consumes 62W, to yield a record 16Gflops/W. And by cranking the clock up to 5.6GHz, the chip bested 1.8Tflops (that's 80 percent faster), albeit by increasing power consumption roughly fourfold to 265W, which works out to about 6.8Gflops/W.
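The efficiency figures come from dividing throughput by power draw. A minimal Python sketch of that arithmetic follows, using only the throughput and power numbers quoted above; the function and variable names are illustrative and do not come from Intel.

```python
# Operating points quoted in the article:
# (clock in GHz, throughput in Gflops, power in W).
operating_points = [
    (3.2, 1000.0, 62.0),
    (5.6, 1800.0, 265.0),
]


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency: throughput divided by power draw."""
    return gflops / watts


efficiencies = [gflops_per_watt(g, w) for _, g, w in operating_points]

for (ghz, gflops, watts), eff in zip(operating_points, efficiencies):
    print(f"{ghz}GHz: {gflops / 1000:.1f}Tflops at {watts:.0f}W = {eff:.1f}Gflops/W")

# Relative change between the two operating points.
speedup = operating_points[1][1] / operating_points[0][1]      # throughput ratio
power_ratio = operating_points[1][2] / operating_points[0][2]  # power ratio
print(f"throughput x{speedup:.2f}, power x{power_ratio:.2f}")
```

Because power rises faster than throughput between the two points, the efficiency at the higher clock is well under half of the efficiency at 3.2GHz.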

"Others are building massively parallel multicore chips, but with this research chip Intel is thinking outside the box," said Jim McGregor, who is principal analyst and research director of the Enabling Technologies Group at In-Stat. "Intel also plans to make the necessary software efforts to fully realize the capabilities of high-core-count chips, including special instructions, new software tool sets, new software development tools and new software compilers."

Last year, Advanced Micro Devices Inc. announced a coherent-processor approach to multiprocessors dubbed Torrenza and based on its proprietary HyperTransport CPU bus. Intel and IBM Corp. countered with Geneseo, a set of extensions to PCIe that manages massively parallel computers using a content-addressable memory. Startups like Ambric Inc. have announced plans for noncoherent multiprocessor research chips. Ambric's Kestrel device will pack 360 RISC processor cores.

"Intel's 80-core chip is basically a mainframe-on-a-chip, literally," said McGregor. "It's the equivalent of 80 blade processors plugged into a high-speed backplane, or 80 separate computers using a high-speed hardware interconnect." The Teraflop Research Chip's hardware does the multitasking coordination "instead of depending on software, which just could not keep up with 80 cores," he said.

Photos reveal that almost a quarter of the area of each of the 80 cores is dedicated to the mesh router, which can simultaneously coordinate communication among any adjacent cores. Also on board is a 3-D vertical path to an SRAM that will be located on a planned separate chip stacked above the processor chip.

"AMD took the first step by integrating a crossbar switch into each core on their X86-based multiprocessor, but Intel's effort goes beyond having just one switch per core," said McGregor. "Intel's router-per-core will aid in performance scaling and power scaling as well as enabling self-repair of chips, since if one of the cores gets damaged they can just disable it and route around it in a manner transparent to software."

Massively parallel apps
All these chips are aimed at massively parallel applications enabling higher-performance computing abilities for scientific simulations, such as for global-warming and weather mapping; data-intensive applications where massive amounts of information have to be processed, such as financial modeling and transaction processing; and security applications, where massive databases need to be scanned in real-time.

"All these multicore processors will initially be aimed mainly at enterprise-level applications, though there are also possibilities for consumer applications in the future, such as creating virtual environments," said McGregor. "As time goes by, more and more applications will be able to use these massively parallel capabilities." He cited, for example, medical applications, "where you have to compare tons and tons of images and view them anywhere in the world."

This board houses Intel's 80-core Teraflops Research Chip.

Made in Ireland
Intel fabricated its 80-core Teraflop Research Chip in its Ireland manufacturing facility, using a state-of-the-art 65nm process. Each core houses two single-cycle floating-point units, which were first described in another ISSCC paper presented two years ago. The 80 cores are arranged in a 10 x 8 two-dimensional mesh network, with each core housing a router with five I/Os: four of its paths going to adjacent processors and one going out vertically to an SRAM chip stacked 3-D style above them.
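The mesh layout can be pictured with a short Python sketch. It is only an illustration of a 10 x 8 grid with a five-port router per core: the row-major core numbering, the port names and the `router_ports` helper are assumptions, and because the article does not say how routers on the edge of the die use their outward-facing ports, the sketch simply leaves those unconnected.

```python
COLS, ROWS = 10, 8  # 10 x 8 mesh, as described in the article


def core_id(x: int, y: int) -> int:
    """Row-major core index (an assumed numbering, for illustration only)."""
    return y * COLS + x


def router_ports(x: int, y: int) -> dict:
    """Return the connected ports of the router at column x, row y.

    Up to four ports link to adjacent cores; the fifth goes
    vertically to the stacked SRAM.
    """
    ports = {"sram": "stacked SRAM"}
    candidates = {
        "west": (x - 1, y),
        "east": (x + 1, y),
        "north": (x, y - 1),
        "south": (x, y + 1),
    }
    for name, (nx, ny) in candidates.items():
        if 0 <= nx < COLS and 0 <= ny < ROWS:
            ports[name] = core_id(nx, ny)
    return ports


total_cores = COLS * ROWS

# Every core-to-core link is seen by two routers, so halve the port count.
mesh_links = sum(
    len(router_ports(x, y)) - 1 for y in range(ROWS) for x in range(COLS)
) // 2

print(total_cores, "cores,", mesh_links, "core-to-core links")
print(router_ports(0, 0))  # corner core: two mesh neighbors plus the SRAM port
print(router_ports(4, 3))  # interior core: all five ports connected
```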

The demonstration, however, did not use the second stacked memory chip, but relied on the 5Kbytes of memory (2Kbytes for data and 3Kbytes for instructions) located inside each core.
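A back-of-the-envelope check ties the per-core figures to the headline number. The Python sketch below assumes that each floating-point unit completes one multiply-accumulate per cycle, counted as two flops; the article says only that the units are single-cycle, so that factor is an assumption, and the constant names are illustrative.

```python
CORES = 80
FPUS_PER_CORE = 2
# ASSUMPTION (not stated in the article): each unit finishes one
# multiply-accumulate per cycle, counted as two floating-point operations.
FLOPS_PER_FPU_PER_CYCLE = 2

LOCAL_KBYTES_PER_CORE = 5  # 2Kbytes data + 3Kbytes instructions


def peak_gflops(clock_ghz: float) -> float:
    """Theoretical peak throughput in Gflops at the given clock."""
    return CORES * FPUS_PER_CORE * FLOPS_PER_FPU_PER_CYCLE * clock_ghz


peak_low = peak_gflops(3.2)
peak_high = peak_gflops(5.6)
total_local_kbytes = CORES * LOCAL_KBYTES_PER_CORE

print(f"peak at 3.2GHz: {peak_low:.0f}Gflops")
print(f"peak at 5.6GHz: {peak_high:.0f}Gflops")
print(f"on-chip memory used in the demonstration: {total_local_kbytes}Kbytes")
```

Under that assumption the peak figures land close to the 1Tflops and 1.8Tflops reported for the two clock speeds.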

"Each of our cores measured 3mm², including its two independent 32bit floating-point processors with single-cycle instruction execution," said Jerry Bautista, director in Intel's Tera-Scale research program. "A separate 2Mbytes of SRAM for each core will be mounted on a second chip vertically above the Teraflop Research Chip, with one of the ports in the five-port router communicating with it vertically."

Although the chip consumed so much power (from 62W to 265W) that it required a special bench-mounted liquid-cooling apparatus, Intel claims that the IC met its design goals and that final production versions will consume less power and require only fans for air cooling.

"We exceeded our design goal, which was 1 teraflops of performance for under 100 W, when in fact we got a teraflops at only 62 W," said Nitin Borkar, the engineering manager of the lab team for the Tera-scale Research Chip.

Using mesh networks
According to Intel chief technology officer Justin Rattner, the architectural goal of the Teraflop Research Chip was to learn how to use hardware to manage multicore processors that exceed the management capabilities of software alone. Intel has already achieved some of those goals, discovering that mesh networks make it possible to relax some of the timing constraints, compared with conventional processors.

"We want to understand how to manage cores at these high counts," said Rattner. "And one thing we have already come to understand about high core counts is that timing does not have to be as uniform as we are used to. For instance, cores communicating to each other don't need all their clocks synched with 3ps accuracy across the whole chip, as would be required if it was one big core."

Besides architectural issues, Intel also fabricated the Teraflop Research Chip to learn how to cope with the inevitable nonuniformities that will be inherent in future processors as they scale down to the atomic level. At 22nm and beyond, "dopants are going to get down in the tens of atoms, where there is no way to get uniformity," said Bautista. "Some units are just going to fail, so we have to plan for it. And with our mesh network, some units can fail and other units can transparently pick up their workload."
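Routing around a failed unit on a mesh can be sketched in a few lines of Python. The article does not describe Intel's routing algorithm, so the breadth-first search below is a stand-in chosen for clarity; the `route` function, the (x, y) core coordinates and the set of failed cores are all hypothetical.

```python
from collections import deque

COLS, ROWS = 10, 8  # mesh dimensions given in the article


def route(src, dst, failed=frozenset()):
    """Breadth-first search for a shortest hop path from src to dst.

    Cores are (x, y) tuples. Any core in `failed` is skipped.
    Returns the path as a list of cores, or None if dst is unreachable.
    """
    if src in failed or dst in failed:
        return None
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to src, then reverse it.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        x, y = node
        for nxt in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            nx, ny = nxt
            if (0 <= nx < COLS and 0 <= ny < ROWS
                    and nxt not in failed and nxt not in previous):
                previous[nxt] = node
                queue.append(nxt)
    return None


healthy = route((0, 0), (3, 0))
detour = route((0, 0), (3, 0), failed={(1, 0), (2, 0)})
print(len(healthy) - 1, "hops with all cores working")
print(len(detour) - 1, "hops with cores (1, 0) and (2, 0) disabled")
```

The detour costs extra hops, but the sender still reaches its destination without needing to know which cores were disabled.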

For future test chips, Intel will try creating even larger core counts, as well as making application-specific models that aim at particular problems. "There's nothing that says all our cores have to be the same in the future," said Bautista. "There are all sorts of specialized cores we could add for specific applications."

Transcontinental effort
Counting the fabrication team in Ireland, the Teraflop Research Chip team was spread across three continents and consumed more than a year of effort. "We had a team of 30 engineers, half of which were in Bangalore, India and half in the United States, which made for some late-night work occasionally over our 18 month development effort," said Borkar.

Now Intel plans to enlist the expertise of its software engineers to begin creating specialized program development and management tools that can handle such high-core-count multiprocessors.

"The biggest hurdle for Intel will be software, because there is no OS today that could take advantage of the power of 80 cores," said In-Stat's McGregor. "Even the smartest programmer on the planet today still could not take advantage of so many cores, except for specialized applications. So Intel will need to create a new generation of software tools and a new generation of software engineers trained to use them."

Intel has already started that effort and pledges to seed as many as 400 universities worldwide with multiprocessor development tools, as well as to assist them in creating curriculums that result in a new generation of multiprocessor software engineers.

- R. Colin Johnson
EE Times
