EE Times-Asia

Balance HW/SW needs in multicore designs

Posted: 01 Jun 2006

Keywords: bryon moyer, teja technologies, teja, fpga, hardware

Embedded system design is a dance between software and hardware. The question is, which of the two gets to call the tune? Who leads? Who controls the relationship?

On one hand, it is the hardware that really does the work. On the other hand, the ultimate functionality of the system tends to be in the software, with the hardware supporting that effort. Hardware and software engineers may have differing opinions as to which of those spins is more accurate.

In reality, system hardware and software are defined together at a high level. Board content, memory size, I/O and other such pieces of support hardware are provided to address the needs of the functions to be executed by the software. But at a lower level, once a processing engine is picked, the on-chip computational resources are fixed and immutable.

Any change to this fixed structure has to be patched in with separate accelerator chips. As soon as the processor decision is final, the process shifts from designing hardware (to meet the software's needs) to reworking the software (to ensure it fits within the constraints of the processor, or processors in the case of a multicore implementation).

This is intensified in embedded design, where resources are relatively scarce and performance may be mission-critical. From here on, the hardware leads the dance, and the software follows. The software becomes an enabler in a co-dependent relationship that caters to the sometimes unreasonable needs of the hardware.

To minimize this pain, it is tempting to over-provision the hardware to avoid being trapped later; the result can be a system with more hardware resources than required, raising costs.

Alternatively, if no extra resources are provisioned, there is a risk that the needs of the software will exceed the available hardware, and a larger processor must be swapped in later; the software will get tired of having its feet stepped on and will call for a new dance partner.

Most specialized processors, such as network processors, have few family members. The next device up could be much larger than the small extra increment needed, increasing the cost and wasting the extra resources. The alternative is to add more chips for external acceleration, which adds an I/O bottleneck to processing.

An FPGA with multiple soft embedded processors provides a way to avoid this power imbalance. Because it's programmable, hardware resources can be tuned much more precisely to the needs of the software.

Furthermore, because FPGAs have such wide applicability, the economics allow the offering of a fuller set of sizes so that running out of room in one device means a more modest jump to the next device size. This can be easily illustrated by looking at packet processing subsystems, which often use specialized multicore processing elements to meet the needs of multigigabit line rate processing.

In a discussion of multicore processing, words like "core", "processor" and "engine" tend to get used a lot, sometimes synonymously. For clarity, this article uses the word "core" to refer specifically to a microprocessor, along with support logic or memory that gets implemented many times in a multicore configuration. The word "engine" refers to the structure that is made up out of one or more cores. The word "processor" is used in the traditional sense of a single CPU.

Picking it apart
Several kinds of hardware get commandeered in a packet processing engine, including ports, the fast path engine, the control plane processor, code and data store, accelerators, external memory controllers and peripherals.

All of these elements can be built in an FPGA. In some cases, the hardware cost is in logic gates, sometimes in memory, and in some cases, in pins. Many cases involve a combination of these.

There is always the constraint that the sum total of the used resources must fall within the number provided in a given device. But the power comes in being able to dedicate lower-level resources like gates to one function or another rather than having them dedicated to a function that may or may not ever be used.
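The budgeting constraint can be sketched in a few lines of Python. The block names, resource counts and device family below are invented for illustration and do not describe any real FPGA; the only point is that every instantiated block draws on shared pools of logic, memory and pins, and that the totals must fit the chosen device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    logic: int      # logic cells
    memory_kb: int  # on-chip memory, in KB
    pins: int       # I/O pins

    def __add__(self, other):
        return Resources(self.logic + other.logic,
                         self.memory_kb + other.memory_kb,
                         self.pins + other.pins)

    def fits_in(self, device):
        return (self.logic <= device.logic
                and self.memory_kb <= device.memory_kb
                and self.pins <= device.pins)


# Hypothetical per-block costs (assumed numbers, not vendor data).
BLOCK_COST = {
    "core": Resources(2000, 16, 0),
    "checksum_accel": Resources(500, 0, 0),
    "ext_mem_controller": Resources(1500, 4, 60),
    "port": Resources(800, 8, 20),
}

# Hypothetical device family, ordered from smallest to largest.
DEVICE_FAMILY = [
    ("small", Resources(10000, 100, 150)),
    ("medium", Resources(20000, 200, 250)),
    ("large", Resources(40000, 400, 400)),
]


def total_cost(design):
    """Sum the resources of every instantiated block in the design."""
    total = Resources(0, 0, 0)
    for block, count in design.items():
        for _ in range(count):
            total = total + BLOCK_COST[block]
    return total


def smallest_fit(design):
    """Return the name of the smallest device that holds the design, or None."""
    need = total_cost(design)
    for name, capacity in DEVICE_FAMILY:
        if need.fits_in(capacity):
            return name
    return None


design = {"core": 4, "checksum_accel": 1, "ext_mem_controller": 1, "port": 2}
print(total_cost(design), smallest_fit(design))
```

With these assumed numbers, dropping one core and the checksum accelerator brings the design back within the smallest device, which is the kind of fine-grained trade-off a fixed-architecture part does not offer.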

In a multicore- or multiprocessor-based design, especially in packet processing systems, the benefits of a softcore FPGA approach are most apparent in the fast path engine, the control plane processor and in how accelerators are deployed in such environments.

Hardware acceleration
Hardware acceleration is essential for pieces of many algorithms. There are two kinds of function that may have to be accelerated: computationally-intensive functions, like encryption or checksum calculation; and long-lead items, like external memory access. The intent is to speed up slow items. With fixed-architecture solutions, however, there are some challenges:

  • Any existing on-chip accelerators are there whether needed or not.

  • Accelerators typically have to be shared.

  • Accelerators that are not built-in must be generated on a separate chip.

With an FPGA, the inherent flexibility of the technology addresses these issues: only the accelerators that are needed are created; accelerators can be shared or not; and accelerators exist on-chip, alongside the cores that use them.

If an application requires no checksum, no checksum accelerator is created. If an application requires checksums and can withstand sharing, sharing is possible; if sharing hurts performance, multiple accelerators can be created, with each core having a private accelerator. No off-chip delays are required, since the accelerator is created on the same FPGA.
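One rough way to judge whether an application "can withstand sharing" is a bottleneck bound. The sketch below is a simplified model, not a measurement: it assumes every packet needs a fixed number of software cycles on a core plus a fixed number of cycles in a synchronous accelerator, and that any core can use any accelerator in the pool. The function name and the cycle counts are hypothetical.

```python
def engine_throughput(cores, sw_cycles, accel_cycles, accelerators):
    """Upper bound on packets per cycle for a multicore engine.

    Each core is busy (or waiting) for sw_cycles + accel_cycles per packet,
    and each accelerator is busy for accel_cycles per packet. The engine
    cannot go faster than the tighter of the two limits.
    """
    if cores < 1 or accelerators < 1:
        raise ValueError("need at least one core and one accelerator")
    core_limit = cores / (sw_cycles + accel_cycles)
    accel_limit = accelerators / accel_cycles
    return min(core_limit, accel_limit)


# Assumed workload: 60 software cycles and 20 accelerator cycles per packet.
shared = engine_throughput(cores=8, sw_cycles=60, accel_cycles=20, accelerators=1)
private = engine_throughput(cores=8, sw_cycles=60, accel_cycles=20, accelerators=8)
print(shared, private)
```

Under these assumptions, eight cores sharing one accelerator are limited by the accelerator, while private accelerators move the limit back to the cores. With only two cores, the same single accelerator would not be the bottleneck, so sharing would cost nothing.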

FPGAs allow one more level of flexibility: the scheduling of the accelerated function. "Accelerators" in the traditional sense are synchronous; the core that called them waits for the result before continuing.

An alternative is a coprocessor, which is asynchronous; the calling core can work on other tasks while the coprocessor executes.
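The difference between the two calling styles can be shown with a software analogy. In the sketch below, a single-worker thread pool from Python's standard library stands in for the coprocessor; the `checksum` and `parse_header` functions are hypothetical placeholders for the offloaded function and for the other work a core might do in the meantime. It illustrates the scheduling only, not the hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def checksum(payload: bytes) -> int:
    """Stand-in for the offloaded function: 16-bit sum of the payload bytes."""
    return sum(payload) & 0xFFFF


def parse_header(packet: bytes) -> int:
    """Stand-in for other work the core could do: read the first byte."""
    return packet[0]


def process_synchronous(packet):
    # Accelerator style: the caller waits for the result before continuing.
    csum = checksum(packet)
    header = parse_header(packet)
    return header, csum


def process_asynchronous(packet, coprocessor):
    # Coprocessor style: start the job, do other work, collect the result later.
    pending = coprocessor.submit(checksum, packet)
    header = parse_header(packet)
    csum = pending.result()  # blocks only if the job has not finished yet
    return header, csum


packet = bytes([0x45, 0x00, 0x01, 0xFF])
with ThreadPoolExecutor(max_workers=1) as coprocessor:
    print(process_synchronous(packet))
    print(process_asynchronous(packet, coprocessor))
```

Both paths produce the same result; what changes is whether the caller sits idle while the offloaded function runs.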

Finding harmony
Hardware flexibility on a wide variety of fronts allows the best performance/cost trade-offs to be made. The control of those decisions lies in the hands of the engineer designing the system.

By keeping the bulk of the processing algorithm in software and combining that with flexible hardware, fewer hard take-them-or-leave-them constraints are placed on the system; good engineering decisions will allow convergence to line rate much more quickly.

Because only the needed hardware is instantiated, the designer can control cost and trade it off against performance. With more give and take, the dance between hardware and software becomes more accommodating, with hardware calling some tunes and software calling others, but without control issues, co-dependency or lingering resentments. Both can then co-exist in a happy marriage for a long and useful life.

- Bryon Moyer
VP of Product Marketing, Teja Technologies
