EE Times-Asia

DSP/CPU core steers video/audio formats

Posted: 15 Apr 2001

Keywords: VLIW, DSP, STB, RAM, DCT

Television, as we have known it in the past, will take on completely new design directions and characteristics, thanks to very long instruction word (VLIW) technology. In particular, VLIW media processors are now streamlining and advancing video/audio time-shift applications originally supported by conventional controller processors and several DSP chips.

The TV/set-top box time-shift application is a case in point. Time shifting is defined as the viewer's ability to shift, forward or backward, the time when he or she wants to view a particular video segment. This time shifting of streaming video/audio is performed via a hard disk in a TV set or set-top box. Included in this design is a circular buffer memory that records the video and audio and is then used for video/audio replay.

In this design, the broadcast recording and playback subsystem uses the circular buffer memory to continuously record one or more incoming audio and/or video program signals. The controller processor accesses the memory to read a playback signal from the buffer, and the time-shifted video/audio is then shown to the viewer.

The controller manages system RAM and the hard disk. While recording continues, delayed signals are read from a different memory location that the controller processor selects, which provides the viewer-selected time delay. In the original designs, a group of input-signal processors provides one or more program signals to the memory subsystem in compressed digital form. A separate output-signal processor converts the compressed digital video/audio data read from memory into a form suitable for display on the TV screen.
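The record-and-delayed-read scheme can be illustrated with a minimal C sketch. It is not taken from any actual set-top box design: the names ring_t, ring_record and ring_playback, the eight-frame capacity and the use of one 32-bit word per frame are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u  /* hypothetical capacity in frames; a disk buffer is far larger */

typedef struct {
    uint32_t frames[RING_SIZE]; /* stand-in for compressed video/audio frames */
    size_t   write_pos;         /* next slot the recorder will fill */
    size_t   count;             /* number of valid frames currently held */
} ring_t;

static void ring_init(ring_t *r)
{
    r->write_pos = 0;
    r->count = 0;
}

/* Record one incoming frame, overwriting the oldest frame when full. */
static void ring_record(ring_t *r, uint32_t frame)
{
    r->frames[r->write_pos] = frame;
    r->write_pos = (r->write_pos + 1u) % RING_SIZE;
    if (r->count < RING_SIZE)
        r->count++;
}

/* Read the frame recorded `delay` frames before the newest one.
   delay == 0 returns the most recent frame. */
static uint32_t ring_playback(const ring_t *r, size_t delay)
{
    assert(delay < r->count);  /* cannot go back further than what is stored */
    size_t idx = (r->write_pos + RING_SIZE - 1u - delay) % RING_SIZE;
    return r->frames[idx];
}
```

Recording and playback share the same storage but use different positions in it. The viewer-selected delay is simply the distance between the write position and the read position.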

Real-time applications

Meeting the real-time constraints of the time-shift application with DSPs requires several of them. DSPs are therefore poorly suited to time-shift designs: a multi-chip solution adds design complexity and substantial overhead cost. Ideally, the system designer wants a single VLIW media processor that can handle all video/audio and associated algorithmic operations in considerably fewer clock cycles. For example, TriMedia's VLIW media processor can perform an 8-by-8 discrete cosine transform (DCT) in about 50 clock cycles, quantization in 22, inverse quantization in 23 and inverse DCT in 50. Reducing cycle counts this way lets DCT, quantization and other algorithms run much more efficiently.
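To put these cycle counts in perspective, the following C sketch totals the four kernels over a whole video stream. The per-block figures are the approximate ones quoted above. The function name, the 4:2:0 sampling (six 8-by-8 blocks per 16-by-16 macroblock) and the idea that every block passes through all four stages are assumptions of this sketch, not statements about any particular codec.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate per-block cycle counts quoted in the article. */
enum { CYC_DCT = 50, CYC_QUANT = 22, CYC_IQUANT = 23, CYC_IDCT = 50 };

/* Cycles per second spent on the four kernels.
   Assumes 4:2:0 sampling (six 8x8 blocks per macroblock) and that
   every block runs through all four stages; both are assumptions. */
static uint64_t kernel_cycles_per_second(uint32_t width, uint32_t height,
                                         uint32_t fps)
{
    assert(width % 16u == 0 && height % 16u == 0);
    uint64_t macroblocks = (uint64_t)(width / 16u) * (height / 16u);
    uint64_t blocks = macroblocks * 6u;
    uint64_t per_block = CYC_DCT + CYC_QUANT + CYC_IQUANT + CYC_IDCT; /* 145 */
    return blocks * per_block * fps;
}
```

With a hypothetical 720-by-480 frame at 30 frames per second, kernel_cycles_per_second(720, 480, 30) gives 35,235,000 cycles per second for these four kernels. Motion estimation, entropy coding and audio are not included.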

A 32-bit DSP/CPU core is at the heart of this VLIW media processor. It implements a 32-bit linear address space and 128 general-purpose 32-bit registers. The VLIW instruction length allows five simultaneous operations to be issued every clock cycle. These operations can target any five of the 27 functional units in the DSP/CPU core, including integer and floating-point arithmetic units and data-parallel multimedia operation units.

These multimedia operations significantly accelerate standard video and audio compression and decompression algorithms. As just one of the five operations issued in a single VLIW instruction, a single "custom" or media operation can implement up to 11 traditional microprocessor operations. These custom or multimedia operations, combined with the VLIW architecture, yield considerable throughput for multimedia applications.
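One way to see how a single media operation can replace many conventional ones is to emulate a data-parallel operation in portable C. The sketch below computes the sum of absolute differences of four packed bytes, a common kernel in motion estimation. The function name sad4_packed is hypothetical and is not a TriMedia mnemonic; the code only shows what such an operation would compute, not how the hardware implements it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical emulation of a data-parallel media operation: the sum of
   absolute differences of the four unsigned bytes packed in a and b. */
static uint32_t sad4_packed(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;  /* extract one byte of a */
        uint32_t y = (b >> (8 * lane)) & 0xFFu;  /* extract one byte of b */
        sum += (x > y) ? (x - y) : (y - x);      /* absolute difference */
    }
    assert(sum <= 4u * 255u);  /* four byte-sized differences at most */
    return sum;
}
```

Written with conventional operations, each lane needs shifts, masks, a compare, a subtract and an add. A media operation that performs the whole computation occupies only one of the five issue slots.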

The custom operations are specialized, high-function operations designed to turbo-charge performance in multimedia applications. When they are properly incorporated into application source code, these custom operations enable an application to exploit the highly parallel VLIW processor implementation. Achieving similar performance increases by executing a higher number of traditional microprocessor instructions per cycle is prohibitively expensive for cost-sensitive consumer electronics designs.

But custom operations go beyond simply making the best use of standard resources. Some high-function custom operations eliminate conditional branches. This helps the scheduler to make effective use of all five operational slots in each VLIW processor instruction.
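Clipping a value to a range is a typical case where a conditional branch can be replaced by straight-line code. The C sketch below contrasts the two forms. Both function names are hypothetical, and the mask-based version only illustrates the idea; it is not a description of how any TriMedia operation is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional form: two conditional branches. */
static int32_t clip_branchy(int32_t v, int32_t lo, int32_t hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Straight-line form: each comparison becomes an all-ones or all-zeros
   mask that selects between the bound and the value. */
static int32_t clip_branchless(int32_t v, int32_t lo, int32_t hi)
{
    assert(lo <= hi);
    int32_t below = -(int32_t)(v < lo);   /* -1 if v < lo, else 0 */
    v = (lo & below) | (v & ~below);      /* select lo or v */
    int32_t above = -(int32_t)(v > hi);   /* -1 if v > hi, else 0 */
    v = (hi & above) | (v & ~above);      /* select hi or v */
    return v;
}
```

Code without branches is easier for a VLIW scheduler to pack into instruction slots, because it does not have to plan around a change in control flow. Whether a given compiler actually emits branches for either form depends on the target.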

Custom operations

Custom operations can substantially increase the processing speed in small kernels of applications. Take, for example, transposing a packed 4-by-4 matrix of bytes in memory containing 8-bit pixel values. In this case, custom operations can merge and pack bytes and 16-bit half-words directly and in parallel. Four of these instructions can be applied here to speed up the manipulation of bytes that are packed into words.

First, a sequence of four load-word operations brings the packed words of the input matrix into a group of four registers. Next, a sequence of four merge operations writes intermediate results into a second group of four registers. Then, a sequence of four pack operations produces the transposed matrix, either replacing the original operands or using separate registers if the original matrix operands are still needed. Finally, four store-word operations place the transposed matrix back into memory. The result is 16 operations, or byte-matrix transposition at the rate of one operation per byte.
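The merge and pack steps can be sketched in portable C. The four helper functions emulate byte-interleaving and halfword-packing operations of the kind described; their names and exact semantics are assumptions chosen for this example, not TriMedia mnemonics. Each row of the matrix is one 32-bit word with the first pixel in the most significant byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interleave the two most significant bytes: [x0 y0 x1 y1]. */
static uint32_t merge_msb(uint32_t x, uint32_t y)
{
    return (x & 0xFF000000u) | ((y & 0xFF000000u) >> 8)
         | ((x & 0x00FF0000u) >> 8) | ((y & 0x00FF0000u) >> 16);
}

/* Interleave the two least significant bytes: [x2 y2 x3 y3]. */
static uint32_t merge_lsb(uint32_t x, uint32_t y)
{
    return ((x & 0x0000FF00u) << 16) | ((y & 0x0000FF00u) << 8)
         | ((x & 0x000000FFu) << 8) | (y & 0x000000FFu);
}

/* Pack the upper halfwords of x and y: [x0 x1 y0 y1]. */
static uint32_t pack16_msb(uint32_t x, uint32_t y)
{
    return (x & 0xFFFF0000u) | (y >> 16);
}

/* Pack the lower halfwords of x and y: [x2 x3 y2 y3]. */
static uint32_t pack16_lsb(uint32_t x, uint32_t y)
{
    return (x << 16) | (y & 0x0000FFFFu);
}

/* Transpose a packed 4x4 byte matrix: 4 loads, 4 merges, 4 packs, 4 stores. */
static void transpose4x4(const uint32_t in[4], uint32_t out[4])
{
    assert(in != NULL && out != NULL);
    uint32_t r0 = in[0], r1 = in[1], r2 = in[2], r3 = in[3]; /* four loads */

    uint32_t m0 = merge_msb(r0, r1);  /* [a0 b0 a1 b1] */
    uint32_t m1 = merge_msb(r2, r3);  /* [c0 d0 c1 d1] */
    uint32_t m2 = merge_lsb(r0, r1);  /* [a2 b2 a3 b3] */
    uint32_t m3 = merge_lsb(r2, r3);  /* [c2 d2 c3 d3] */

    out[0] = pack16_msb(m0, m1);      /* [a0 b0 c0 d0] */
    out[1] = pack16_lsb(m0, m1);      /* [a1 b1 c1 d1] */
    out[2] = pack16_msb(m2, m3);      /* [a2 b2 c2 d2] */
    out[3] = pack16_lsb(m2, m3);      /* [a3 b3 c3 d3] */
}
```

The four merges are independent of one another, as are the four packs, so each group is a natural candidate for parallel issue. In portable C each helper costs several shifts and masks; the point of a custom operation is that each becomes a single operation.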

It is worth noting that once the data is loaded into registers, the 4-by-4 transpose completes within two machine cycles. Hence, careful use of custom operations can not only reduce the absolute number of operations needed to perform a computation, but can also help the compilation system produce code that fully exploits the performance potential of the VLIW media processor.

- Mohammad Ayub Khan

Vice President

Software and Systems Engineering

TriMedia Technologies Inc.
