EE Times-Asia

Roadrunner strives to lead supercomputer race

Posted: 01 Jul 2008

Keywords: supercomputer, Roadrunner, Infiniband network, petaflops barrier

A handful of engineers have assembled what they expect will become, at least for a while, the world's most powerful computer. IBM Corp.'s Roadrunner likely will go down in history as the first computer to consistently crank out 1 petaflops, or a quadrillion floating-point operations per second.

Some 3,240 compute modules are slotted together and linked on a two-tier Infiniband network. Each Roadrunner module consists of two AMD dual-core Opteron processors and four PowerXCell 8i CPUs, which are 65nm versions of the Cell processor developed by IBM, Sony and Toshiba for the PlayStation. The completed system will consume 4MW and fit into a 6,000-square-foot room.
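
The configuration figures above imply a set of component totals. The short Python sketch below only multiplies out the numbers quoted in this article. The constant and function names are illustrative, and the per-module power figure is a plain average that assumes the 4MW is spread evenly across modules, which the article does not state.

```python
# Figures quoted in the article.
MODULES = 3240
OPTERONS_PER_MODULE = 2       # dual-core AMD Opterons
CORES_PER_OPTERON = 2
CELLS_PER_MODULE = 4          # PowerXCell 8i processors
SYSTEM_POWER_WATTS = 4_000_000
FLOOR_AREA_SQFT = 6_000


def roadrunner_totals(modules: int = MODULES) -> dict:
    """Multiply the per-module figures out to whole-system totals."""
    opterons = modules * OPTERONS_PER_MODULE
    return {
        "opteron_chips": opterons,
        "opteron_cores": opterons * CORES_PER_OPTERON,
        "cell_chips": modules * CELLS_PER_MODULE,
        # Assumption: power is averaged evenly over modules.
        "avg_watts_per_module": SYSTEM_POWER_WATTS / modules,
        "watts_per_sqft": SYSTEM_POWER_WATTS / FLOOR_AREA_SQFT,
    }


if __name__ == "__main__":
    for name, value in roadrunner_totals().items():
        print(f"{name}: {value:,.0f}")
```

On these figures the machine holds 6,480 Opteron chips and 12,960 Cell chips, so there is one Cell processor for every Opteron core.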

"We have been manufacturing and populating this system for the last several months, and the last pieces should be in place today; then we will make a run for the roses with the Linpack test," said Don Grice, chief engineer of the Roadrunner project at IBM.

The Linpack test is the standard benchmark used to rank systems on the Top 500 list of the world's most powerful computers. It measures the performance a machine sustains while solving a dense system of linear equations.

The Top 500 list is published in June and November each year around the major supercomputing conferences. Typically, April 15 is the cutoff date for the June rankings, but list administrators have granted an extension to the Roadrunner team.

Once fully tested by IBM, the system will be packed up and shipped to Los Alamos National Laboratory in New Mexico, where it will be used to run classified physics experiments as part of the U.S. nuclear missile program.

IBM puts two of its modified 65nm Cell processors on a 250W board and uses two of the boards in each of the 3,240 nodes of the Roadrunner supercomputer.

Modeling climate change
Breaking the petaflops barrier is "certainly a milestone. What it really accomplishes is giving scientists an ability to do more science than they could before," said Buddy Bland, project director for a major supercomputer center at Oak Ridge National Laboratory, which expects to install its own petaflops system in Q3 08.

The lab has a Cray XT-4 that can deliver 263Tflops, and scientists can use it for unclassified scientific work. Its petaflops system will be a follow-on to the XT-4 that raises the number of AMD Opteron cores from 31,000 to 100,000.

The system has run global climate simulations that fed into the work of Nobel Peace Prize winner Al Gore. It is also being used to help design a fusion reactor being built in France. "Our mission is to deliver the largest systems possible to the unclassified science community to solve some of the most challenging problems facing the world today," said Bland.

He noted that the teraflops systems of a generation ago could model five years of climate change in one day. His current system can model 40 years of change in a day.
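
To put those two rates side by side, the sketch below works out the speedup and the wall-clock time each generation would need for a given simulated span. The two rates come from Bland's figures. The 100-year span is a hypothetical example chosen for illustration, and the names are not from any real climate code.

```python
# Simulated years of climate change modeled per day of machine time,
# as quoted by Bland.
TERAFLOPS_ERA_YEARS_PER_DAY = 5
CURRENT_YEARS_PER_DAY = 40


def wall_clock_days(simulated_years: float, years_per_day: float) -> float:
    """Days of machine time needed to model the given simulated span."""
    return simulated_years / years_per_day


speedup = CURRENT_YEARS_PER_DAY / TERAFLOPS_ERA_YEARS_PER_DAY

if __name__ == "__main__":
    span = 100  # hypothetical simulated span, in years
    print(f"speedup: {speedup:.0f}x")
    print(f"older system: {wall_clock_days(span, TERAFLOPS_ERA_YEARS_PER_DAY)} days")
    print(f"current system: {wall_clock_days(span, CURRENT_YEARS_PER_DAY)} days")
```

That is an eightfold increase in simulated time per day of computing.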

Performance gap
A National Aeronautics and Space Administration (NASA) supercomputer specialist said a deeper problem lurks below the excitement of breaking the petaflops barrier. "Roadrunner is very impressive, but the bottom line is how much real work you can do on the machine," said Bill Thigpen, chief of the engineering branch at NASA's advanced supercomputing division in the Ames Research Center. "One of the challenges is being able to get the available work out of the theoretical performance peak."

Thigpen said he has observed a growing gap between the rate at which benchmark performance is rising and the increases in actual work delivered by new machines. For example, NASA's next-generation supercomputer (a 245Tflops system called Pleiades, based on Intel Corp.'s quad-core Xeon processor) has twice the theoretical performance but handles only 1.5 times the actual work of NASA's current top system, the 89Tflops Columbia, based on Intel's Itanium CPU.
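
The gap Thigpen describes can be expressed as delivered work per unit of theoretical performance, relative to the older machine. The sketch below is a rough illustration using only the figures in the paragraph above, and the names are made up for this example. It computes the ratio twice: once with the rounded "twice the theoretical performance" figure, and once with the ratio implied by the quoted 245Tflops and 89Tflops ratings, which is larger than two.

```python
# Ratios and ratings quoted in the article.
STATED_PEAK_RATIO = 2.0    # "twice the theoretical performance"
ACTUAL_WORK_RATIO = 1.5    # "1.5 times the actual work"
PLEIADES_TFLOPS = 245
COLUMBIA_TFLOPS = 89


def relative_efficiency(work_ratio: float, peak_ratio: float) -> float:
    """Work delivered per unit of peak, relative to the older system (1.0 = no gap)."""
    return work_ratio / peak_ratio


# Using the rounded ratio stated in the text.
eff_stated = relative_efficiency(ACTUAL_WORK_RATIO, STATED_PEAK_RATIO)

# Using the ratio implied by the quoted Tflops ratings.
rated_peak_ratio = PLEIADES_TFLOPS / COLUMBIA_TFLOPS
eff_rated = relative_efficiency(ACTUAL_WORK_RATIO, rated_peak_ratio)

if __name__ == "__main__":
    print(f"stated peak ratio {STATED_PEAK_RATIO:.2f} -> efficiency {eff_stated:.2f}")
    print(f"rated peak ratio {rated_peak_ratio:.2f} -> efficiency {eff_rated:.2f}")
```

Either way the result is below 1.0, meaning each unit of theoretical performance on the newer machine yields less real work than it did on Columbia.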

The good news for NASA is that the more powerful Pleiades system actually costs less than the older Columbia system did, and at about 1MW, it consumes a little less than half the power.

"We're busy trying to meet NASA's requirements for computing to design spacecraft and look at ways to mitigate what humans are doing to the Earth, looking at global warming and ocean and earthquake simulations," Thigpen said. "It requires more and more computing to meet these needs."

"We have people who need 40,000 cores to do their global-climate simulation, and that's something we can't offer them right now," Thigpen said. The system must also support design simulations for the Constellation vehicles that will travel to the moon and perhaps Mars.

"They could use our whole system, but they are just one mission we need to support," he said.

Heterogeneous advantages
What computer scientists learn from the IBM Roadrunner's use of heterogeneous processor cores may be more important than the fact that the system breaks the petaflops barrier.

In addition to its AMD x86 cores, the Roadrunner has Cell processors with eight vector-processing cores and a PowerPC controller. The system looks like a standard message-passing supercomputer made up of x86 cores, but it can also offload application hot spots to the Cell for acceleration by invoking the parallel libraries IBM provides.

IBM is providing software tools and a scaled-down version of the Roadrunner for what it hopes could be dozens of other computer users who want to try a similar approach.

Don Grice expects IBM's Roadrunner to be the first supercomputer to benchmark at a sustained petaflops rate.

There is still a lively debate as to whether the heterogeneous approach is the best one. Grice is quick to admit the industry is still looking at hybrid systems for a standard programming model that would be the equivalent of the message-passing interface used widely in today's homogeneous supercomputers.

"We are trying to find the right way to structure the hardware to make the software easier to program," said Grice. "There is a little bit of extra burden on the programmer to control via software memory flow and caching on the Cell," he added. "We need to continue to come up with good parallel libraries and algorithms to stack together into solutions."

Several other systems are in the race to break the petaflops barrier, but none has reached the stage of actually testing performance. Jack Dongarra, a University of Tennessee computer scientist who helps compile the Top 500 list, said some of the machines might be in a position to be tested in time for the November rankings.

Japan had the world's most powerful supercomputer, the Earth Simulator, for five iterations of the Top 500 list starting in 2002. But in November 2004, IBM's 70Tflops BlueGene/L system at Lawrence Livermore National Laboratory leapfrogged the 35Tflops Earth Simulator. Since then, the BlueGene system has remained the most powerful system in the world. It is now rated at about 478Tflops.

Japan has announced a follow-on project called the Life Simulator, targeted at achieving 10Pflops of sustained performance. But it is not expected to be ready until 2011, Dongarra said.

- Rick Merritt
EE Times
