EE Times-Asia > Controls/MCUs

Intel publishes papers on future programmable multicores

Posted: 17 Aug 2007

Keywords: Intel technical papers, programmable multicores, future architectures

Intel Corp. said it is releasing eight technical papers this week describing key findings from the company's work on future programmable multicore architectures.

The papers will be published in the Intel Technical Journal and will provide details on how the company expects future microprocessors with simplified parallel programming models to evolve. With the commodity server market moving quickly toward increasingly powerful multicore processors, new tools are needed to help programmers develop software that can take full advantage of the platforms.

What's different in developing software for multicore environments is the need for parallel programming: divvying up an application's tasks among multiple processors and having them perform the work simultaneously. The complexity of such an environment requires different development tools than the ones typically used today.
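The split-and-run model described above can be sketched in Python. The chunking scheme and names such as `process_chunk` are illustrative, not from the papers; real workloads would divide their own data and tasks.

```python
# A minimal sketch of parallel programming: an application's work is
# split into independent tasks and handed to a pool of workers that
# run on separate cores simultaneously.
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    # Stand-in for real per-task work, e.g. scoring items in a catalog.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Divvy the data into one chunk per worker...
    chunks = [data[i::workers] for i in range(workers)]
    # ...and run the chunks on multiple processors at once.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1000))))
```

Because the result is a sum, it is the same regardless of which worker finishes first, which is what makes this particular task easy to parallelize.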

'Data center-on-a-chip'
Three of the papers analyze three characteristic future multicore applications, Sean Koehl, technology strategist for Intel, said in the company's blog. One looks at the concept of a "data center-on-a-chip." Researchers are examining the possibility of running an e-commerce data center with 133+ processors on a single system based on a 32-core tera-scale processor. Each core would support four threads via a technique called simultaneous multithreading (SMT), which improves overall efficiency by letting a core execute multiple independent threads at once.

The paper proposes changes in the memory architecture in order to balance all the processing in such a powerful system. The changes include a model for a hierarchy of shared caches; a new, high-bandwidth L4 cache; and a cache quality of service to optimize how multiple threads share cache space.
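The cache quality-of-service idea can be illustrated with a toy model. The mechanism below, a per-thread occupancy quota on a shared LRU cache, is a simplification invented for this sketch, not Intel's design:

```python
# Illustrative sketch of cache QoS: a shared cache that caps how many
# entries each thread may occupy, so one cache-hungry thread cannot
# evict everyone else's working set.
from collections import OrderedDict

class QosSharedCache:
    def __init__(self, capacity, quota):
        self.capacity = capacity      # total entries in the shared cache
        self.quota = quota            # max entries per thread id
        self.entries = OrderedDict()  # (tid, key) -> value, LRU order

    def usage(self, tid):
        return sum(1 for (t, _) in self.entries if t == tid)

    def put(self, tid, key, value):
        if self.usage(tid) >= self.quota:
            # Over quota: evict this thread's own least-recently-used
            # entry rather than someone else's.
            victim = next(k for k in self.entries if k[0] == tid)
            del self.entries[victim]
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # global LRU eviction
        self.entries[(tid, key)] = value

cache = QosSharedCache(capacity=8, quota=2)
for i in range(5):
    cache.put(tid=0, key=i, value=i)   # thread 0 tries to fill the cache
assert cache.usage(0) == 2             # but is capped at its quota
```

A hardware scheme would partition cache ways rather than count entries, but the effect, bounding how much shared space any one thread can claim, is the same.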

The other two papers demonstrate parallel scalability for two model-based applications: realism in games and movies, and home multimedia search and mining. The papers, however, also point to the need for more cache/memory bandwidth, which would be provided by a large L4 cache.

Two other related papers are more hardware focused. One covers packaging and integration of the L4 cache; the other covers on-die integration of many cores. The first discusses how providing high-bandwidth memory would eventually require memory to be built right on top of the die, the integrated circuitry of a chip. "Our Assembly and Test Technology Development division is evaluating possible options to achieve this," Koehl said.

The second paper discusses how Intel might design and integrate caches shared between cores, and also explores the on-die interconnect mesh and other non-core components that would be integrated, such as memory controllers, I/O bridges and graphics engines.

Another paper proposes a specific architectural change that would accelerate applications using many threads. Specifically, Intel is proposing the in-hardware implementation of a function called task scheduling, which is the mapping of work to cores for execution. The software-based methods used today introduce too much overhead for use in highly parallel workloads.
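To see where that software overhead comes from, here is a minimal software task scheduler of the kind the paper argues is too slow for fine-grained work: every task dispatch goes through a shared queue, and every result through a lock. All names are illustrative.

```python
# A minimal software task scheduler: tasks go into a shared queue and
# worker threads map them onto cores. Each dequeue and each result pays
# for synchronization, the per-task overhead that motivates moving
# scheduling into hardware for highly parallel workloads.
import queue
import threading

def run_tasks(tasks, workers=4):
    work = queue.Queue()
    results = []
    lock = threading.Lock()
    for t in tasks:
        work.put(t)

    def worker():
        while True:
            try:
                fn, arg = work.get_nowait()  # the software scheduling step
            except queue.Empty:
                return
            r = fn(arg)
            with lock:                       # more per-task overhead
                results.append(r)
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

out = run_tasks([(lambda x: x + 1, i) for i in range(10)])
assert sorted(out) == list(range(1, 11))
```

When tasks are large, this overhead is negligible; when tasks shrink to a few hundred instructions, the queue and lock traffic can dominate, which is the regime hardware task scheduling targets.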

HW/SW innovations
Finally, the remaining two papers cover new HW/SW innovations in development at Intel to simplify parallel programming. One involves the integration of non-Intel Architecture (IA) accelerator cores, such as media accelerators. Because the accelerators have different instruction sets, they cannot use the compilers, tools and knowledge bases developed for IA programming. The paper outlines architectural extensions, language extensions and a runtime that extend IA to handle accelerators.
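The paper's specific extensions are not detailed in the article. As a hypothetical sketch of the runtime idea, application code can call one operation while a dispatch layer routes it to an accelerator implementation when one is registered, falling back to the host (IA) version otherwise:

```python
# Hypothetical accelerator-dispatch sketch (not Intel's API): a registry
# maps operation names to accelerator implementations; dispatch() routes
# a call to the accelerator if present, else to the host implementation.
_accelerators = {}

def register_accelerator(op, impl):
    _accelerators[op] = impl

def dispatch(op, host_impl, *args):
    return _accelerators.get(op, host_impl)(*args)

def host_scale(pixels, factor):
    # Plain host (IA) implementation of a media kernel.
    return [p * factor for p in pixels]

before = dispatch("scale", host_scale, [1, 2, 3], 2)  # host path
register_accelerator("scale", lambda px, f: [p * f for p in px])
after = dispatch("scale", host_scale, [1, 2, 3], 2)   # accelerator path
print(before, after)
```

The point of such a layer is that application code is written once against the operation, not against a particular instruction set.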

The other paper addresses the tailoring of runtimes to the special environment of tera-scale platforms. "Runtimes designed to enable efficient, low-overhead use of the many cores and threads on a tera-scale processor will be critical for software scalability," Koehl said.

The runtime presented, called McRT, provides support for fine-grained parallelism and new concurrency abstractions that ease parallel programming. "Results show how an application using this runtime stack scales almost linearly to more than 64 hardware threads," Koehl said. "McRT provides a high-performance transactional memory library to ease parallel programming by allowing the programmer to often avoid error-prone and hard-to-scale locking techniques."
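McRT's actual API is not shown in the article, but the transactional idea Koehl describes can be sketched generically: a thread reads a version, computes without holding a lock, and commits only if no conflicting write happened, retrying otherwise. The `TxCell` class below is a hypothetical illustration, not McRT.

```python
# Generic optimistic-transaction sketch: read-compute-commit with retry
# on conflict, instead of holding a lock for the whole update.
import threading

class TxCell:
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit = threading.Lock()  # guards the commit only

    def atomically(self, update):
        while True:                          # retry loop replaces locking
            seen_version = self.version
            snapshot = self.value
            new_value = update(snapshot)     # compute outside any lock
            with self._commit:
                if self.version == seen_version:  # no conflicting write?
                    self.value = new_value
                    self.version += 1
                    return new_value
            # Conflict: another thread committed first; retry on fresh state.

cell = TxCell(0)
threads = [threading.Thread(
               target=lambda: [cell.atomically(lambda v: v + 1)
                               for _ in range(100)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert cell.value == 400   # no increment is lost despite having no big lock
```

A real transactional memory tracks read and write sets across many locations; this single-cell version only shows the commit-or-retry discipline that lets programmers avoid explicit locks.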

- Antone Gonsalves
