Comparing TI C6000 DSP, Motorola AltiVec PowerPC processors
Keywords: dsp chip, dsp algorithm, processor core, risc microprocessor, vliw
Traditional DSP chips with specialized on-board hardware, like multipliers and sophisticated address generators, to facilitate DSP algorithms are now being challenged by a new generation of RISC microprocessors. Specifically, Motorola Inc.'s G4 PowerPC with the new AltiVec engine delivers impressive benchmarks for popular DSP algorithms. At the same time, significant strides have been made in pushing the compute power of true DSP chips, as evidenced by the latest C64X family of devices from Texas Instruments Inc.
Although new processors are being announced continually, we will focus on the devices that are available or sampling at this time. While all of them are targeted at high-performance computing tasks, each is aimed at a significantly different market. The C6000 devices were developed for highly embedded, real-time applications like wireless telecommunications, wideband modems, and real-time image processing. On the other hand, the major market for the PowerPC is Apple Computer's highest-performance G4 workstation, where it must support a full OS, user interfaces and network peripherals.
To meet these needs, both classes of processors must process and move data. Taking best advantage of these inherent capabilities requires a deeper look inside each of the architectures.
The C6000 DSP family utilizes the VelociTI processor core, which is TI's VLIW architecture. The core is organized as eight functional units operating on two register files. Since each of the eight units can execute its own 32-bit instruction every processor clock cycle, the core consumes a 256-bit VLIW instruction every cycle.
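The packet width quoted above follows directly from the unit count and the instruction width. A minimal sketch of that arithmetic in C, where the function name `vliw_packet_bits` is an illustrative assumption and not part of any TI tool:

```c
#include <stdint.h>

/* Bits consumed per cycle when every functional unit issues one
 * instruction. The function name is hypothetical, for illustration. */
uint32_t vliw_packet_bits(uint32_t functional_units, uint32_t instruction_bits)
{
    return functional_units * instruction_bits;
}
```

Calling `vliw_packet_bits(8, 32)` gives the 256-bit figure stated for the VelociTI core.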
The new C64X family of parts includes three initial offerings: C6414, C6415, and C6416. All feature the same processor speed and core architecture resources, differing only in peripheral interfaces.
To take advantage of this powerful VLIW resource, TI's optimizing C compiler and optimizing assembler tools are designed to maximize the number of functional units doing useful operations during each cycle. Helping this process is the orthogonality of the instruction set for each of the eight units, which allows many typical operations to be performed on more than one functional unit. This gives the optimization tools more flexibility in resource allocation. The C6415 extends this orthogonality, increases the number of register data paths, and doubles the depth of the register stack over the earlier C62X and C67X devices.
Motorola first introduced the AltiVec technology to the popular PowerPC product line with the advent of the MPC7400 series. These processors augment the basic MPC750 CPU core with the AltiVec vector engine, which performs fixed- and floating-point functions on 128-bit vectors.
The AltiVec unit can process several data formats including 8-bit, 16-bit and 32-bit signed and unsigned integers, as well as 32-bit IEEE floating-point words. During each processor clock cycle, the unit processes one complete vector, regardless of the data type.
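Because the vector is always 128 bits wide, the number of elements handled per cycle depends only on the element width. A rough sketch of that relationship in C, with the constant `VECTOR_BITS` and the function `elements_per_vector` chosen here purely for illustration:

```c
#include <stdint.h>

#define VECTOR_BITS 128u  /* AltiVec vector width stated in the text */

/* Number of elements of the given width that fit in one vector.
 * Returns 0 for widths that do not divide the vector evenly. */
uint32_t elements_per_vector(uint32_t element_bits)
{
    if (element_bits == 0 || VECTOR_BITS % element_bits != 0) {
        return 0;
    }
    return VECTOR_BITS / element_bits;
}
```

For the formats listed above, this works out to sixteen 8-bit, eight 16-bit, or four 32-bit elements per vector.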
The C6000 family executes up to eight instructions per cycle using its VLIW architecture, while the PowerPC executes up to three, through the use of compound instructions such as multiply-add-store.
Since the C6203 is a fixed-point processor with 32-bit scalar execution units, the number of fixed-point operations per second equals the number of instructions per second. The C6701 can execute fixed- and floating-point instructions, but only six of the execution units are capable of floating-point operation, resulting in a peak rating of 1,000 million floating-point operations per second.
In addition to its native 32-bit operations, six of the eight execution units on the new C6415 have been enhanced to perform 8-bit and 16-bit functions on packed data words, giving it a real edge over the C62X and C67X families for these data types.
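To illustrate what operating on packed data words means, the sketch below emulates a two-lane 16-bit addition inside one 32-bit word using portable C. It is not TI's instruction set or compiler intrinsics; the function name `packed_add16` is an assumption made for this example, and real hardware would perform the same work in a single instruction.

```c
#include <stdint.h>

/* Add two pairs of 16-bit values packed into 32-bit words.
 * Each half wraps on its own, so a carry out of the low half
 * never spills into the high half. */
uint32_t packed_add16(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;          /* low halves, carry discarded */
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;  /* high halves, overflow shifted out */
    return hi | lo;
}
```

With `a = 0x0001FFFF` and `b = 0x00010001`, the result is `0x00020000`, whereas an ordinary 32-bit add would let the low-half carry leak upward and produce `0x00030000`.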
The AltiVec engine also operates on 8-, 16-, and 32-bit data elements packed into its 128-bit vector. For floating-point operations, the 7410 packs four 32-bit IEEE floating-point values in the vector for a rate of 2000 million operations per second. For compound instructions, this rate is proportionally higher.
In summary, the impressive computational numbers shown for the C6000 family are due to its VLIW architecture. The 7410 owes its horsepower to the AltiVec engine, sometimes referred to as a SIMD (single instruction multiple data) architecture. Note that these are two effective, although completely different, ways of boosting performance.
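The floating-point rate above can be reproduced as clock rate times lanes times operations per lane. The sketch below assumes a 500 MHz clock for the 7410, a value inferred from the stated figures rather than given in the text, and counts a compound multiply-add as two operations per lane, which is also an assumption. The function name `peak_mops` is hypothetical.

```c
#include <stdint.h>

/* Peak rate in millions of operations per second.
 * clock_mhz:    processor clock in MHz (assumed, not from the article)
 * lanes:        data elements processed per cycle
 * ops_per_lane: 1 for a simple instruction, 2 if a compound
 *               multiply-add is counted as two operations */
uint32_t peak_mops(uint32_t clock_mhz, uint32_t lanes, uint32_t ops_per_lane)
{
    return clock_mhz * lanes * ops_per_lane;
}
```

Under these assumptions, `peak_mops(500, 4, 1)` matches the 2000 million operations per second quoted for four 32-bit floating-point lanes, and `peak_mops(500, 4, 2)` shows how a compound instruction scales the figure proportionally.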
Processor Memory Resources
Although peak processing rates may be a useful comparison, one of the main factors impacting performance is the size of internal memory. In the case of the C62X and C67X family, the only way to realize the benefit of the VLIW architecture is to have the program code stored in on-chip program memory. Internally, this memory is organized as 256-bit words so that for each clock cycle, all 256 bits can be delivered to the execution units in parallel. If the program code for a critical loop is too large to fit in program memory, these processors must fetch code from external memory through a 32-bit bus. With eight 32-bit fetches per VLIW word, execution speed suffers dramatically. With a six-fold increase in program memory space over the C6201 and C6701, this problem is greatly reduced in the C6203.
However, the new C64X family offers a major improvement towards boosting VLIW performance by incorporating a generous 1024KB of L2 cache right on the chip, supporting two smaller 16KB memories serving as the L1 cache for program and data. The 7410 has relatively little on-chip memory but instead relies on an external L2 cache described in the next section.
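The external-fetch penalty described above for the C62X and C67X comes down to how many bus transfers one instruction word requires. A minimal sketch in C, where `fetches_per_packet` is an illustrative name and wait states and other bus overhead are deliberately ignored:

```c
#include <stdint.h>

/* Bus transfers needed to bring in one instruction packet,
 * rounding up when the packet is not a multiple of the bus width.
 * Returns 0 for a zero-width bus, which is treated as invalid. */
uint32_t fetches_per_packet(uint32_t packet_bits, uint32_t bus_bits)
{
    if (bus_bits == 0) {
        return 0;
    }
    return (packet_bits + bus_bits - 1) / bus_bits;
}
```

A 256-bit packet over a 32-bit bus needs eight transfers, against a single access from the 256-bit-wide internal program memory, so even in this idealized view the core would wait on the bus for most cycles.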
- Rodger H. Hosking, Pentek Inc.