EE Times-Asia

Use MSGQ module to optimize sRIO gains

Posted: 01 May 2008

Keywords: DSP, sRIO, complex system topologies

Bandwidth requirements of telecom infrastructure, video infrastructure and imaging applications are growing at rapid rates. These systems need to support video streams with higher resolutions, faster frame rates and better audio quality. At the same time, these systems need to achieve higher channel density and lower power per channel. The market is also demanding increased peripheral and memory integration and reduced board area in order to deliver lower system cost. Developers need flexible, scalable silicon devices and tools to help them keep up with these market trends.

Traditional high-performance I/O interfaces for DSPs have limitations in reliability, bandwidth and scalability. Serial RapidIO (sRIO) overcomes these limitations by providing a high-performance, packet-switched interconnect technology that is well suited to complex DSP topologies. Unlike its predecessors, sRIO does not share its interface with memory, and it can act as either a master or a slave. It also offers long physical reach, hardware-level error detection/correction, status/acknowledgement feedback and in-band interrupts/signaling.

Figure 1: Five DSPs can be connected, each directly to the other, using 1x sRIO links.

Advanced DSPs such as Texas Instruments Inc.'s TMS320C6455 DSP now incorporate sRIO interfaces that are designed to be very efficient. The sRIO interfaces connect directly to the DMA engine in the DSP, using transaction proxy registers for low control overhead. Data can be prioritized for efficient handling by the DMA system, and the interface can queue multiple transactions.

Complex system topologies
It is important to understand the role sRIO plays in complex system topologies and how it offers increased flexibility when implementing a physical system. sRIO provides chip-to-chip and board-to-board communication at performance levels scaling to 20Gbit/s and beyond, with 1.25-, 2.5- or 3.125Gbaud bidirectional links in 1x and 4x widths for throughput of up to 10Gbit/s in each direction.

With sRIO, the designer can determine how to best connect multiple devices. DSPs can be connected directly in mesh, ring and star topologies or multiple DSPs can be connected through a switch with or without local connections to each other. sRIO can also be used to connect DSPs, FPGAs and ASICs together. This flexibility allows designers to arrange the components in any way that suits the application data flow, rather than compromising system design to deal with interface or protocol limitations.

For instance, a simple system can have two DSPs connected via a 4x sRIO link. Another system that requires more computation ability but not any more I/O may consist of five DSPs, each connected directly to the other via a 1x sRIO link as shown in Figure 1. Alternatively, the five DSPs could all have a 4x sRIO link to a central switch for better I/O. Yet a fourth system with still heavier computation requirements may have 12 or more DSPs connected via 4x links to a fabric of one or more switches for the ultimate in computational power and I/O bandwidth.

An sRIO-enabled system can achieve a significant overall performance increase by taking advantage of these features. For example, in wireless infrastructure systems, a total of 3-6Gbit/s of antenna data is typically processed by an ASIC or FPGA, which handles 24 to 48 antenna streams of approximately 123Mbit/s each per base station. User data, on the other hand, is typically processed on a DSP, with approximately 19Mbit/s per user channel brought in over a shared EMIF channel. DSPs with linked sRIO channels allow the user and antenna data to be processed separately. A DSP is not only a fraction of the cost of the FPGAs or ASICs; it can now handle the same antenna data rate used in current systems: 24 to 48 streams of about 123Mbit/s each, for a total of 3-6Gbit/s. On the user data side, the higher-speed core of the latest DSP generation, faster sRIO I/O and freed external memory bandwidth allow density to increase to 128 user channels per DSP of 19Mbit/s each, for a total of about 2.5Gbit/s of user data per DSP.

Message passing
Software developers not only reap the benefits of the increased performance and flexibility provided by an sRIO interface, they can also develop applications using either a low- or a high-level programming approach. In the low-level direct I/O approach, the programmer must specify the target and the address. This approach offers the best performance and is appropriate for applications where the target-buffering scheme is known at design time and the application partitioning is fixed. The downside is that the developer must know the physical memory maps of remote processors, which makes third-party integration harder.

The high-level message-passing approach offers a more abstract way to communicate without having to do a lot of low-level device programming. This approach is optimal for applications where the target-buffering scheme is unknown and the application partitioning is unknown or flexible. In addition, the message-passing interface greatly reduces the time required to scale an application to a greater or smaller number of processors.

Figure 2: In the MSGQ module, the API interface shields the application from transports and allocators.

Several embedded processor vendors provide support for sRIO in the kernel-level software layer. In TI DSPs, for example, message passing is supported by the Message Queue (MSGQ) module of the DSP/BIOS software kernel foundation, which allows application developers to design software applications at a higher level of abstraction.

Message passing allows applications to communicate with other DSPs over the sRIO interconnect more efficiently. Messages sent through this technique travel at a higher priority than data buffers, which is beneficial since it is generally better to prioritize control data. MSGQ can move readers and writers between processors without source-code changes, making it possible to develop on a single processor and easily scale to a multiple-processor system. The writer does not need to know which processor the reader resides on, which eases integration work and makes it straightforward to develop applications such as client/server systems.

MSGQ also enables zero-copy transfer of messages, assuming the underlying physical medium allows it in the inter-processor case. Zero-copy means passing a pointer to a message rather than copying its contents into another message. This works on a single processor and between processors that share memory. The ability to allocate a message from a specific pool makes it easy to provide QoS features, such as faster handling for critical messages.
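The idea of zero-copy can be sketched in a few lines of C. This is an illustrative stand-in (the queue type and function names are hypothetical, not the MSGQ API): the "queue" holds message pointers, so sending transfers ownership of the buffer instead of duplicating its contents.

```c
/* Minimal zero-copy sketch: the queue stores pointers, never payload.
 * Hypothetical names; works only where sender and receiver share memory. */
#define QDEPTH 8

typedef struct {
    void *slots[QDEPTH];
    int head, tail;
} PtrQueue;

void q_put(PtrQueue *q, void *msg) { q->slots[q->tail++ % QDEPTH] = msg; }
void *q_get(PtrQueue *q)           { return q->slots[q->head++ % QDEPTH]; }

/* Returns 1 if the receiver sees the very same buffer the sender filled. */
int zero_copy_demo(void)
{
    static int message = 42;           /* the "allocated" message */
    PtrQueue q = { {0}, 0, 0 };
    q_put(&q, &message);               /* sender passes the pointer... */
    int *received = (int *)q_get(&q);  /* ...receiver gets the same buffer */
    return received == &message && *received == 42;
}
```

Because only the pointer moves, the cost of a send is constant regardless of message size.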

MSGQ module
The MSGQ module comprises the Application Programming Interface (API), allocators and transports, as shown in Figure 2. The API shields the application from the transports, which provide the interface for transporting messages between processors, and the allocators, which supply the interface for allocating messages.

All messages sent via the MSGQ module must first be allocated. Multiple allocators can be used, for example to allocate critical messages from one pool and non-critical messages from another. An example of a simple allocator is the static allocator called STATICPOOL, which manages a static buffer supplied by the application. At initialization, the STATICPOOL allocator takes the address and length of the buffer and the requested message size. The buffer is chopped into chunks of the specified message size, which are placed on a linked list so that messages can be allocated and freed quickly.
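A STATICPOOL-style allocator can be sketched as follows. The names here are hypothetical (this is not the DSP/BIOS implementation): an application-supplied buffer is chopped into fixed-size chunks, and each free chunk doubles as a linked-list node.

```c
#include <stddef.h>

/* Free chunks are overlaid with a list node; no extra memory is needed. */
typedef struct PoolNode { struct PoolNode *next; } PoolNode;

typedef struct {
    PoolNode *freeList;   /* head of the free-message list */
    size_t    msgSize;
} StaticPool;

/* Chop the buffer into msgSize chunks and link them together. */
void pool_init(StaticPool *p, void *buf, size_t len, size_t msgSize)
{
    char *base = (char *)buf;
    size_t i, n = len / msgSize;
    p->freeList = NULL;
    p->msgSize = msgSize;
    for (i = 0; i < n; i++) {
        PoolNode *node = (PoolNode *)(base + i * msgSize);
        node->next = p->freeList;      /* push chunk onto the free list */
        p->freeList = node;
    }
}

void *pool_alloc(StaticPool *p)
{
    PoolNode *node = p->freeList;
    if (node != NULL)
        p->freeList = node->next;      /* pop in O(1) */
    return node;
}

void pool_free(StaticPool *p, void *msg)
{
    PoolNode *node = (PoolNode *)msg;
    node->next = p->freeList;          /* push back in O(1) */
    p->freeList = node;
}

/* Demo: a 256byte buffer chopped into 64byte messages yields 4 messages. */
int pool_demo(void)
{
    static long long storage[32];      /* 256 bytes, suitably aligned */
    StaticPool pool;
    int count = 0;
    pool_init(&pool, storage, sizeof(storage), 64);
    while (pool_alloc(&pool) != NULL)  /* drain the pool */
        count++;
    return count;
}
```

Allocation and freeing are constant-time list operations, which is why a chopped-up static buffer suits deterministic real-time use.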

Next, the transport sends the message across the physical link to the destination message queue on another processor (Figure 3). By having a transport interface, the application can change the underlying communication mechanism without changing the application, except for configuring the transport. This approach hides the technical nuances of the physical link and improves application portability.
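The transport interface idea can be modeled with a table of function pointers. This is a hypothetical sketch of the concept, not the DSP/BIOS transport API: the application sends only through the interface, so swapping sRIO for another link changes configuration, not application code.

```c
#include <string.h>

/* A transport is an interface: a put() function plus private state. */
typedef struct Transport Transport;
struct Transport {
    int (*put)(Transport *t, const void *msg, unsigned size);
    void *state;                       /* transport-private data */
};

/* A trivial "loopback" transport standing in for a real sRIO link:
 * it just copies the message into a local mailbox and counts it. */
typedef struct {
    char mailbox[64];
    int  delivered;
} LoopbackState;

int loopback_put(Transport *t, const void *msg, unsigned size)
{
    LoopbackState *s = (LoopbackState *)t->state;
    if (size > sizeof(s->mailbox))
        return -1;                     /* message too large */
    memcpy(s->mailbox, msg, size);
    s->delivered++;
    return 0;
}

/* The application calls t->put() and never touches the physical link. */
int transport_demo(void)
{
    LoopbackState state = { {0}, 0 };
    Transport t = { loopback_put, &state };
    t.put(&t, "hello", 6);
    t.put(&t, "world", 6);
    return state.delivered;
}
```

Replacing `loopback_put` with an sRIO-backed implementation would leave `transport_demo`'s calling code unchanged, which is the portability point the article makes.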

Message queues have unique system-wide names, and senders locate a message queue by this name. Every message sent via the MSGQ module must have MSGQ_MsgHeader as its first field. This is required because bookkeeping information used internally by the transports and the MSGQ module is stored in this header. When a message is sent to a different processor, the transport handles any word-size or endianness differences in the header portion of the message; the application is responsible for any required conversion of the application-specific portion.
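The layout requirement looks like this in C. The header contents below are a stand-in (the real MSGQ_MsgHeader layout is defined by DSP/BIOS and is opaque to the application); what matters is that the header is the first field, so any application message can be treated as a generic message pointer.

```c
#include <stddef.h>

/* Stand-in for MSGQ_MsgHeader: opaque bookkeeping used by the
 * transports and the MSGQ module, not touched by the application. */
typedef struct {
    unsigned int reserved[4];
} MsgHeader;

/* An application-defined message: header first, payload after. */
typedef struct {
    MsgHeader header;        /* MUST be the first field */
    int       command;       /* application-specific portion... */
    float     samples[8];    /* ...follows the header */
} AppMsg;

/* Because the header sits at offset 0, a pointer to an AppMsg can be
 * safely viewed as a pointer to a generic message by the MSGQ layer. */
int header_is_first(void)
{
    return offsetof(AppMsg, header) == 0;
}
```

The transport only byte-swaps the header portion across endianness boundaries; converting `command` and `samples` remains the application's job, as the text notes.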

Since different processors may use different scheduling models, the MSGQ module allows the application writer to specify the notification mechanism for each message queue. Note, however, that once a message is sent to a reader, the writer loses ownership of the message and can no longer modify or free it, so the message must be complete and correct before it is sent. When the reader receives the message, it must either free or re-use it.

Locating a message queue
MSGQ maintains one message repository for each opened message queue. The reader of a message queue gets messages from the message queue's repository. If a reader or writer thread needs to be moved to another processor, there are no changes needed to the reader or writer code.

There are two ways to locate a message queue: synchronously or asynchronously. With the synchronous (i.e. potentially blocking) function, the MSGQ module queries each transport in turn to find the location of the desired message queue. With the asynchronous function, the call returns immediately; when the queue is found, an asynchronous locate message is sent to a specified reply queue.

Figure 3: With the transport interface, the application can change the underlying communication mechanism without changing the application.

The synchronous approach is easier to implement, but it can block the locating thread until the queue is found. The asynchronous approach does not block, but its implementation is more involved.

Synchronous and asynchronous operations are supported via application-specified notification mechanisms, such as a semaphore or the posting of an interrupt, avoiding the need to follow a particular scheduling model. The sender of a message can embed a reply message queue in the message, and the reader can extract that queue and reply to it.

Data flow sample
The following is an example of a basic data flow from an application designed to move data between two DSPs. The example uses multiple pools to manage the different types of messages, including application messages, transport-internal control messages and error messages. Having different pools is not required, but it makes an application easier to maintain. For instance, managing several small pools is sometimes easier than handling a single large pool. Additionally, if the message sizes differ, a single pool wastes large amounts of memory, since it must support the worst-case size.

The data flow is designed to run on an evaluation module (EVM) such as the TI TMS320C6455 EVM, which features two 1GHz TMS320C6455 DSPs connected via sRIO. The complete code is included with the evaluation board as an example.

Sending a message
The following shows what happens behind the scenes when a message is sent and received on a single processor, from task one to task two. Task two is scheduled by the OS and opens an MSGQ queue, specifying a pend and a post function for that message queue. The pend function is used when there are no messages, while the post function is called when a message is sent to the message queue.

If the MSGQ module finds that there are no pending messages, task two blocks and task one is able to run. Before sending, task one must read the queue identifier and locate the right queue, because it might be on a different processor; queues are normally located during startup, so this has little performance impact. Task one must also allocate a message before it can send it to task two.

Once task one sends the message, it is no longer allowed to touch it: the message is now owned by MSGQ, which delivers it to the right queue. Task two is notified and receives the message. Task two can re-use the message and send it back to task one; for example, if two tasks want to ping-pong a message back and forth, only one message has to be allocated at the beginning. Each time the reader receives the message, it updates the contents and sends it back. When task two has finished processing a message, it frees the message back to its pool and may no longer touch it. The message transfer is then complete.
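The ping-pong pattern above can be sketched in plain C. This is an illustrative stand-in for MSGQ_get/MSGQ_put (hypothetical queue type, no OS scheduling): one message is allocated once, then bounced between two depth-one mailboxes, each side updating the contents before handing ownership back.

```c
/* One message, allocated once, ping-ponged between two "tasks". */
typedef struct { int hops; } PingMsg;

typedef struct { PingMsg *msg; } SimpleQueue;   /* depth-1 mailbox */

void sq_put(SimpleQueue *q, PingMsg *m) { q->msg = m; }
PingMsg *sq_get(SimpleQueue *q)
{
    PingMsg *m = q->msg;
    q->msg = 0;          /* ownership moves to the receiver */
    return m;
}

int pingpong_demo(void)
{
    PingMsg msg = { 0 };                 /* allocated exactly once */
    SimpleQueue toTask2 = { 0 }, toTask1 = { 0 };
    int round;
    sq_put(&toTask2, &msg);              /* task one sends the first ping */
    for (round = 0; round < 3; round++) {
        PingMsg *m = sq_get(&toTask2);   /* task two receives... */
        m->hops++;                       /* ...updates the contents... */
        sq_put(&toTask1, m);             /* ...and sends it back */
        m = sq_get(&toTask1);            /* task one receives... */
        m->hops++;
        sq_put(&toTask2, m);             /* ...and pings again */
    }
    return msg.hops;                     /* two hops per round */
}
```

After three rounds the single message has made six hops, with no per-hop allocation, which is the efficiency the article attributes to message re-use.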

Message passing greatly simplifies the development and maintenance of complex processor communications by providing an abstract interface to data movement.

- This article was contributed by Texas Instruments Inc.
