EE Times-Asia

Multicore design strives for balance

Posted: 01 May 2006

Keywords: multicore architecture

The consensus from the recent Multicore Expo in Santa Clara, California was clear: Putting more than one core on a chip is the best way to elevate performance while keeping power under control. But the advantages of multicore architectures will be lost unless new approaches are developed for programming and debugging multicore systems.

Programming is tough enough for today's multicore systems, which are often two- or four-core homogeneous systems using symmetric multiprocessing (SMP). The wave of the future, many observers believe, is heterogeneous multicore ICs that may mix general-purpose processors, coprocessors and DSPs, with different cores running different operating systems.

And these systems may grow very large. The number of cores will double every 18 months, with 256-core systems becoming common by 2011, predicted Anant Agarwal, professor of engineering and computer science at the Massachusetts Institute of Technology and founder of startup Tilera Corp.

Taking advantage of even simple multicore ICs is likely to require multithreading and parallel processing. These techniques are nothing new, but they haven't been widely used by embedded-systems programmers in the past.

"With multitasking software, you have to parallelize the application," said Robert Craig, senior software engineer at QNX Software Systems. "There's a lot of fear associated with concurrency in the embedded space. There's an education people have to go through with multiple processors to understand how to get performance out of them."
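To make "parallelize the application" concrete, here is a minimal sketch of data parallelism in Python (a language chosen for illustration; the article shows no code). It splits the input into one chunk per worker, processes the chunks concurrently and then combines the partial results. The function names are hypothetical. One caveat: in CPython the global interpreter lock keeps threads from running CPU-bound Python code on several cores at once, so the sketch shows the shape of the decomposition rather than a real speedup. `ProcessPoolExecutor` offers the same `map` interface for true multicore execution, which is why the per-chunk job is a module-level function that can be pickled.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split `data` into n_parts contiguous chunks whose sizes differ by at most 1."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The per-worker job: it touches only its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=None):
    """Partition the data, process chunks concurrently, then reduce."""
    n_workers = n_workers or os.cpu_count() or 1
    chunks = partition(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Chunks are processed concurrently; the partial sums are then
        # combined (a reduction) on the calling thread.
        return sum(pool.map(sum_of_squares, chunks))
```

The partitioning step is where the "education" Craig mentions comes in: this split only works because no chunk needs another chunk's result.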

Rupert Baines, VP of marketing at picoChip Designs Ltd, noted that many multicore devices have failed because they couldn't be programmed. "The software development environment is the elephant at the party," he said. "It's the thing that's often forgotten. But if no one can program the device, it just sits there as an interesting curiosity."

More than half of the development money for embedded systems goes into software, said Michael Uhler, CTO of MIPS Technologies Inc. Hardware optimizations are wasted if software development is too costly, he noted. "It's still not easy to partition software, even if you have SMP."

Many questions
Multicore architectures raise many programming questions, noted Tomas Evensen, CTO of Wind River Systems. Among them: How do you boot the system? Do you run different operating systems on different cores? How do the cores communicate? How are resources, such as memory and caches, shared? How is load balancing done?

Debugging is perhaps the most problematic question of all. "It's really tricky to debug things happening in parallel," Evensen said. The hardest problems, he said, involve interactions between the cores, especially race conditions. The cause of the problem and its effect may be millions of cycles apart.
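A minimal Python sketch of the kind of race Evensen describes follows. It assumes a counter shared by several threads, and the class and function names are hypothetical. `unsafe_increment` splits the update into a separate read and write, so two threads can read the same old value and one increment is lost. Whether that happens on a given run depends on scheduling, which is exactly why such bugs are hard to reproduce. `safe_increment` holds a lock across the whole read-modify-write.

```python
import threading


class Counter:
    """A counter shared between threads (hypothetical example)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read-modify-write with no synchronization: another thread may
        # run between these two lines, and its update is then overwritten.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write atomic with respect to
        # every other thread that takes the same lock.
        with self._lock:
            self.value += 1


def hammer(increment, n_threads=4, per_thread=10_000):
    """Call `increment` per_thread times from each of n_threads threads."""
    def body():
        for _ in range(per_thread):
            increment()

    threads = [threading.Thread(target=body) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    racy, locked = Counter(), Counter()
    hammer(racy.unsafe_increment)
    hammer(locked.safe_increment)
    print("unsynchronized:", racy.value)  # may fall short of 40000
    print("locked:", locked.value)        # always 40000
```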

The old way of debugging by stopping the processor doesn't work when multiple processors are running at once. The key, said Evensen, is to debug dynamically without stopping the cores. Wind River's Workbench debugger, he said, lets users add "sensor points" dynamically while the system is running. He acknowledged, however, that the sensor points can change the timing.
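The article does not describe how Workbench implements sensor points, so the following is only a generic Python sketch of the underlying idea, with hypothetical names. Instead of halting a core at a breakpoint, each thread appends timestamped events to its own buffer, and the buffers are merged into a single timeline after the run. Recording still costs some work per event, so, as Evensen acknowledges, it can shift the timing being observed.

```python
import threading
import time
from heapq import merge


class TraceBuffer:
    """Hypothetical non-stopping 'sensor point' log."""

    def __init__(self):
        self._local = threading.local()  # per-thread event list
        self._buffers = []               # every thread's list, for merging
        self._register_lock = threading.Lock()

    def record(self, label):
        buf = getattr(self._local, "events", None)
        if buf is None:
            # First event from this thread: create and register its list.
            # The lock is taken once per thread, not once per event.
            buf = self._local.events = []
            with self._register_lock:
                self._buffers.append(buf)
        # Append (timestamp_ns, thread_id, label); the thread keeps running.
        buf.append((time.perf_counter_ns(), threading.get_ident(), label))

    def timeline(self):
        """Merge per-thread logs into one time-ordered list.

        Call after the traced threads have finished. Each per-thread list
        is already in time order because perf_counter_ns is monotonic.
        """
        return list(merge(*self._buffers))
```

A thread calls `record("label")` at each point of interest; reading `timeline()` afterward gives a global view of what the threads did and in what order.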

Evensen noted that most people today are using two to four cores. When the number skyrockets, he said, "the solution may be a new language where parallelism is built in. C/C++ will stick around for a while, but when you see 256 cores on a chip, we'll need more than this."

For its part, picoChip is already confronting the problem: Its PC102 picoArray includes 308 processors and 14 coprocessors. The PC102 tool chain includes a VHDL parser, C compiler, cycle-accurate simulator, design partitioner, "place-and-switch" tool, network checker and debugger. C is used to program individual processors, and VHDL is used to connect processors together. Although partitioning the design among multiple chips is manual, the picoChip tools automatically split signals that cross clock boundaries.

To debug a system with more than 300 cores, the trick is the placement of probes on signal lines, said Gajinder Panesar, chief architect at picoChip. These non-intrusive probes gather useful data without impacting the performance of signal-processing blocks. Several types of probes are provided with the debugger, and users can define their own probe types.

What the industry urgently needs is a multicore design flow that supports the mapping of multiple applications onto multiple cores, with runtime load balancing, said Rudy Lauwereins, VP of design technology at Belgian research center IMEC. Lauwereins described a tool flow that IMEC has developed for a tightly coupled one-dimensional and two-dimensional VLIW processor template. Most C code is mapped to the one-dimensional VLIW, but tight loops are mapped to the two-dimensional VLIW, which offers a higher degree of parallelism. A retargetable tool suite includes a compiler, hardware generator, cycle-accurate simulator and instruction-accurate simulator.
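One common way to get runtime load balancing is a shared work queue. The Python sketch below uses that technique under our own assumptions; it is not IMEC's flow, and the names are hypothetical. Rather than assigning tasks to cores in advance, each idle worker pulls the next task, so a worker that draws a long task simply ends up taking fewer of them.

```python
import queue
import threading


def run_balanced(tasks, n_workers=4):
    """Run zero-argument callables on n_workers threads, pulling work
    dynamically. Returns (results, worker_id_per_task)."""
    work = queue.Queue()
    for index, task in enumerate(tasks):
        work.put((index, task))  # the queue is filled before workers start

    results = [None] * len(tasks)
    done_by = [None] * len(tasks)

    def worker(worker_id):
        while True:
            try:
                index, task = work.get_nowait()
            except queue.Empty:
                return  # nothing left: this worker retires
            results[index] = task()  # each index is written by one thread only
            done_by[index] = worker_id

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, done_by
```

With uneven tasks, for example one long `time.sleep` among many short jobs, the `done_by` list typically shows the other workers absorbing most of the short jobs while one worker handles the long one.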

According to Lauwereins, the multicore design flow requires three new steps: platform-independent optimization, which creates an optimized C model; a sequential-to-concurrent model conversion, which produces parallel code; and platform-dependent optimization. "It's not a hardware game; it's a software game," he said. "Most of the time, you should start with software and figure out how the architecture should be built, not the other way around."
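The hard part of a sequential-to-concurrent conversion is establishing which iterations are independent of one another. The Python sketch below is our own illustration, not IMEC's tool, and the function names are hypothetical. Each output of an FIR filter depends only on input samples, so the outputs can be computed in any order and handed to a thread pool. An IIR filter feeds each output back into the next iteration (a loop-carried dependency), so the same naive split would produce wrong results.

```python
from concurrent.futures import ThreadPoolExecutor


def fir_sample(x, taps, n):
    """y[n] = sum_k taps[k] * x[n - k]; reads only the input x."""
    return sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)


def fir_sequential(x, taps):
    return [fir_sample(x, taps, n) for n in range(len(x))]


def fir_concurrent(x, taps, workers=4):
    # No iteration reads another iteration's output, so the loop body can
    # be mapped across a pool; map() still returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: fir_sample(x, taps, n), range(len(x))))


def iir_sequential(x, a):
    # y[n] = a * y[n - 1] + x[n]: each output needs the previous output,
    # so iterations cannot simply be distributed as in fir_concurrent.
    y, prev = [], 0.0
    for sample in x:
        prev = a * prev + sample
        y.append(prev)
    return y
```

For example, `fir_sequential([1, 2, 3, 4, 5], [0.5, 0.5])` returns `[0.5, 1.5, 2.5, 3.5, 4.5]`, and `fir_concurrent` returns the same list.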

Multicore devices may challenge some of the design community's fundamental assumptions about programming. Because multicore communication is cheaper than memory access, said MIT's Agarwal, designers must migrate from memory-oriented computation models to communication-centric models. Traditional cluster-computing programming methods squander the multicore opportunity, he said, because message-passing and shared-memory techniques were designed assuming high-overhead communications.

Agarwal advocates a "stream" programming approach that makes minimal use of memory. Values are read from the network, computed and sent out. This avoids memory-access instructions and synchronization. Thanks to the stream approach, he said, MIT's "Raw" architecture has a three-cycle overhead on the first word.
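Raw moves values over on-chip networks in hardware. The Python sketch below only mimics the dataflow style Agarwal describes, under our own assumptions and with hypothetical names. Each stage runs in its own thread, reads a value from its input link, computes, and sends the result on, with no shared arrays and no explicit locks in the stage code. Small bounded queues stand in for core-to-core links.

```python
import queue
import threading

_END = object()  # end-of-stream marker passed down the pipeline


def stage(fn, inbox, outbox):
    """Read a value, compute, send it on; no state is kept between items."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # forward the marker so the next stage stops too
            return
        outbox.put(fn(item))


def run_pipeline(values, fns, depth=2):
    """Push `values` through one thread per function in `fns`."""
    # links[i] feeds stage i; links[-1] carries the final results.
    links = [queue.Queue(maxsize=depth) for _ in range(len(fns) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, links[i], links[i + 1]))
        for i, fn in enumerate(fns)
    ]

    def feed():
        for v in values:
            links[0].put(v)  # blocks while the first link is full
        links[0].put(_END)

    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()

    results = []
    while (item := links[-1].get()) is not _END:
        results.append(item)

    for t in workers + [feeder]:
        t.join()
    return results


if __name__ == "__main__":
    # Two-stage stream: scale, then offset.
    print(run_pipeline(range(5), [lambda v: v * 2, lambda v: v + 1]))
    # prints [1, 3, 5, 7, 9]
```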

With the advent of new programming technology, many fear widespread incompatibility without standard application programming interfaces and protocols. "Open standards are the only way to go," said John Bruggeman, chief marketing officer of Wind River. "I propose that we as an industry decide and set standards for APIs for multicore, we define operating system requirements, and we as a group define the needs of development tools."

Wind River announced a broad multicore initiative that includes strong backing for standards, including the Eclipse integrated development environment and the Multicore Association's debug API. Wind River is the primary mover behind the Transparent Inter-Process Communication (TIPC) protocol, now under standardization by the association.

- Richard Goering
EE Times
