EE Times-Asia

Mix processors for versatile video processing

Posted: 16 Jan 2008

Keywords: versatile scalable video processing solution, HD video, video transcoding, complementary mix of processors

Media compression is used in networks to reduce transport time and storage costs. With the arrival of high-definition (HD) broadcasting and video on cellphones, the quality and accessibility of video communications are greater than ever before. Many new applications, formats and methods of content delivery have been developed and are rapidly entering the market. Opportunity abounds in providing equipment that can combine video streams or convert among the various video formats, but deploying solutions is risky because they can quickly become outdated. There is a need for a video processing architecture that will address the needs of many emerging video communications applications and also provide some longevity.

Format variety
Digital video content can be created and stored in various formats such as MPEG-2, MPEG-4, H.263, H.264, VC-1 and Flash Video. Some of these standards have significant differences that affect frame rate and size as well as content-compression level. Because content can be viewed on screens of radically different sizes while moving or standing still, the network is forced to compensate for these variables by altering the parameters of the original content.

Transcoding is a fundamental network-based video processing operation in which one video format is decompressed to a raw stream, which is then compressed into another format. A second important network-based video processing operation combines video streams or images while the video is uncompressed. Applications such as conferencing and image/text overlay use this stream combination technique.
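The two operations above can be sketched in a few lines. This is an illustrative stand-in, not a real codec library: the `decode`, `overlay` and `encode` functions merely tag data to show where each step sits in the pipeline.

```python
# Minimal sketch of the transcoding pipeline described above: decode the
# source format to raw frames, optionally combine while uncompressed,
# then re-encode. The codec functions are illustrative stand-ins.

def decode(stream, src_format):
    """Decompress an input stream into raw frames (stand-in)."""
    return [("raw", frame) for frame in stream]

def overlay(frames, label):
    """Stream-combination step performed while the video is uncompressed."""
    return [(kind, frame, label) for kind, frame in frames]

def encode(frames, dst_format):
    """Re-compress raw frames into the target format (stand-in)."""
    return [(dst_format, payload) for payload in frames]

def transcode(stream, src_format, dst_format, label=None):
    frames = decode(stream, src_format)
    if label is not None:  # e.g. conferencing mix or image/text overlay
        frames = overlay(frames, label)
    return encode(frames, dst_format)

out = transcode(["f0", "f1"], "MPEG-2", "H.264", label="logo")
print(out[0][0])  # H.264
```

The point of the structure is that the combine step only ever sees raw frames; it is independent of both the source and destination formats.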

Because an end-user device can only display video in a format it understands, the content provider must either store the same video in many different formats or provide a mechanism by which the content can be adjusted to meet the requirements of the end-user device. One mechanism has the network extract a default format from the user-account profile and convert the current format so that it is compatible with the end-user device if necessary.
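The profile-driven mechanism can be sketched as a simple lookup-and-compare. The profile table and function names here are hypothetical, for illustration only:

```python
# Hedged sketch of profile-driven format selection: look up the device's
# default format in the user-account profile and transcode only when the
# stored format differs. PROFILES and deliver() are illustrative names.

PROFILES = {"alice": "H.264", "bob": "H.263"}  # user -> device default format

def deliver(stream_format, user):
    target = PROFILES.get(user, "MPEG-4")      # fall back to a common format
    if stream_format == target:
        return ("passthrough", stream_format)  # no conversion needed
    return ("transcode", target)               # convert for the device

print(deliver("H.264", "alice"))  # ('passthrough', 'H.264')
print(deliver("H.264", "bob"))    # ('transcode', 'H.263')
```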

Flexible solutions
The processing power needed for transcoding or combining an image is a function of the number of bits processed, which depends on the overall image quality required. Compression and decompression of an HD stream in H.264 format will involve as much as 200x more processing power than an H.263 QCIF stream.
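A back-of-the-envelope check makes the 200x figure plausible. The frame rates below are assumed values; pixel throughput alone accounts for most of the gap, with the higher per-pixel complexity of H.264 relative to H.263 supplying the rest:

```python
# Rough check of the ~200x figure: compare raw pixel rates of a 1080-line
# HD stream against a QCIF stream. Frame rates are assumed for illustration.

hd_pixels   = 1920 * 1080   # 1080-line HD frame
qcif_pixels = 176 * 144     # QCIF frame
hd_fps, qcif_fps = 30, 15   # assumed frame rates

pixel_rate_ratio = (hd_pixels * hd_fps) / (qcif_pixels * qcif_fps)
print(round(pixel_rate_ratio))  # ~164x from resolution and frame rate alone
```

H.264's heavier per-pixel toolset (quarter-pel motion estimation, in-loop deblocking, CABAC) pushes the overall ratio past this raw-pixel figure.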

The solution provider that must invest in video transcoding should purchase a solution that will scale easily from an initial service introduction of a few hundred simultaneous channels to a mature deployment with tens of thousands of channels. In addition, compression-format standards will almost certainly change over the lifetime of a service, requiring a flexible transcoding solution.

Thus, equipment design must address these four challenges in devising a solution: scaling over a factor of 100 in channel density; scaling over a factor of 200 in processing-density per channel; field upgradability to accommodate new algorithms; and cost-effectiveness.

Designers must carefully choose the processors and the communications mechanism that will bind the processors together. From these design requirements, we can posit four key design principles for video processing architectures: scalability, versatility, density and programmability.

A transcoding capability must scale in two dimensions: the number of channels and the amount of processing power per channel. Another important consideration is that an encode-compression scheme for video can be two to 10 times as complex as the decode-decompression scheme, depending on the coders used. In fact, when transcoding between formats (such as H.263 and MPEG-4), the asymmetry in processing power can vary an additional five to 10 times.

From a design perspective, a set of separate hardware designs can be developed, each optimized for a different capacity or algorithm, to approximate the best price/performance curve. Unfortunately, this approach requires maintaining numerous designs, perhaps with significantly different components and, worst of all, different code bases.

The challenge is to establish a common code base and, if possible, a common, modular hardware design. A modular design typically leads to processor replication with 1-N of the same processor on some extensible fabric; however, the other implications of this solution must be considered before rushing into design.

New variations on old algorithms, completely new algorithms and varying demand on algorithmic instances (collectively, "algorithm volatility") all require a versatile platform. Such a platform helps a manufacturer maintain a market-leadership position by quickly introducing new algorithms or features that differentiate the product line. But versatility stands in stark contrast to our second requirement: heavy-duty processing power. Designs that attain longevity do so through versatility and thus require some level of general-purpose functionality. Because considerable processing power is needed along with versatility, however, designers must consider a mix of processor types.

Using a balance of general-purpose processors (GPPs) and tightly coupled accelerators provides a better approach. A variety of accelerators, such as DSPs, ASICs or processing arrays, would be appropriate. Important aspects of this kind of design strategy are the overlying software structure, algorithmic partitioning and overall communications fabric.


Software considerations
The overlying software structure runs on a general-purpose CPU, and the structure must abstract the type of algorithm used and the acceleration device that is processing the algorithm. Such an approach allows new algorithms and acceleration technologies to be introduced quickly without affecting the application. The application itself can run on the general-purpose CPU or on a remote server via remote media- and call-control protocols. Local control, management and data routing are handled on the general-purpose CPU, which can also be used for establishing new algorithms.
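One way to sketch such an abstraction is a registry of codec backends behind a common interface: the application opens a channel by format and never sees which device runs the algorithm. All class and registry names below are illustrative assumptions:

```python
# Sketch of the abstraction described above: the application talks to a
# generic Codec interface; concrete backends (software, DSP, ASIC) are
# registered behind it. Names are illustrative, not a real API.

class Codec:
    """Abstract codec: hides both the algorithm and the device running it."""
    def process(self, frame):
        raise NotImplementedError

class SoftwareH263(Codec):
    def process(self, frame):
        return ("H.263", frame)   # runs on the general-purpose CPU

class DspH264(Codec):
    def process(self, frame):
        return ("H.264", frame)   # would be offloaded to a DSP accelerator

REGISTRY = {"H.263": SoftwareH263, "H.264": DspH264}

def open_channel(fmt):
    # New algorithms or acceleration devices are added by extending
    # REGISTRY, without touching the application code calling open_channel().
    return REGISTRY[fmt]()

chan = open_channel("H.264")
print(chan.process("frame0"))  # ('H.264', 'frame0')
```

Because the application depends only on `open_channel` and `process`, a new accelerator can be introduced in the field as a new registry entry.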

The goal of algorithmic partitioning is to assure that older, more stable algorithms and the most processor-intensive operations run on accelerators. Note that the entire algorithm does not need to run on the accelerator.
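A partial offload of this kind can be sketched as a placement map over the encode pipeline. The stage names and the choice of which stages to accelerate are assumptions for illustration:

```python
# Sketch of the partitioning rule above: only stable, processor-intensive
# stages (here, motion estimation and transform/quantization) are offloaded;
# control and bitstream packing stay on the general-purpose CPU.

ACCELERATED = {"motion_estimation", "dct_quant"}  # stable, compute-heavy
PIPELINE = ["motion_estimation", "dct_quant", "entropy_coding", "packetize"]

def place(stage):
    return "accelerator" if stage in ACCELERATED else "cpu"

placement = {stage: place(stage) for stage in PIPELINE}
print(placement)
```

Keeping the volatile stages on the CPU is what lets new algorithm variants ship as software updates while the accelerators keep handling the heavy, stable inner loops.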

Finally, there is a need for an overall communications fabric that is easily scalable yet provides sufficient performance for function-offload partitioning. The fabric must support the bandwidth for HD video channels, and for the routing and switching required to use multiple processors efficiently for a single conference while avoiding unacceptable latency.
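Since stream combination happens on uncompressed video, a rough sizing of what one raw HD channel demands of the fabric is instructive. The 4:2:0 sampling and 30 frames/s below are assumed values:

```python
# Rough fabric-bandwidth sizing for one uncompressed HD channel.
# YUV 4:2:0 sampling (1.5 bytes/pixel) and 30 frames/s are assumptions.

width, height = 1920, 1080
bytes_per_pixel = 1.5   # YUV 4:2:0
fps = 30

bits_per_second = width * height * bytes_per_pixel * fps * 8
print(round(bits_per_second / 1e6))  # ~746 Mbit/s per raw HD channel
```

At nearly three-quarters of a gigabit per second per raw channel, moving uncompressed HD between processors is what drives the fabric bandwidth and switching requirements.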

In addition, the fabric must provide an internal protocol for media-stream routing with very low processing overhead, such as an intelligent time-division multiplexing (TDM) scheme for local chassis communications or a pseudowire protocol for multichassis solutions.

By using the right mix of complementary processors and by intelligently partitioning the processing load, one can create a video processing solution that is both versatile and scalable across any channel density and algorithmic complexity.

- Brian Peebles
Chief Technology Officer
Dialogic Corp.
