EE Times-Asia

Parallel FFT for multi-GHz FPGA signal processing

Posted: 26 Apr 2013

Keywords: fast Fourier transform, real-time spectral monitoring, FPGAs

High-speed fast Fourier transform (FFT) cores are a vital requirement for any real-time spectral-monitoring system. As the demand for monitoring bandwidth grows in pace with the proliferation of wireless devices in different parts of the spectrum, these systems must convert time-domain samples to the frequency domain ever more rapidly, necessitating faster FFT operations. Indeed, in most modern monitoring systems it is often necessary to use parallel FFTs that sustain sample throughputs several times the highest clock rate achievable in FPGAs such as the Xilinx Virtex-7, taking advantage of wideband A/D converters that can easily attain sample rates of 12.5 Gigasamples/second and more. [1]

At the same time, as communications protocols become increasingly packetised, the duty cycles of signals that need to be monitored are decreasing. This phenomenon requires a dramatic decrease in scan repeat time, which necessitates low-latency FFT cores. Parallel FFTs can help in this regard as well, since the latency scales down almost proportionally to the ratio of sample rate to clock speed.

For all of these reasons, let's delve into the design of a parallel FFT (PFFT) with runtime-configurable transform length, taking note of the throughput and utilisation numbers that are achievable with a parallel FFT.

Figure 1: A parallel FFT processes multiple samples at a time to scale throughput beyond achievable system clocks of the target device. Optional features include flow control, synchronisation and dynamic length programmability.

Hardware parallelism for FFTs
Due to the complexity of implementing FFTs directly in logic, many hardware designers use off-the-shelf FFT cores from various vendors. [2] However, most off-the-shelf FFT cores use "streaming" or "block" architectures that process at most one sample per clock, which limits the throughput to the maximum clock speed achievable by the FPGA or ASIC device. A PFFT offers a faster alternative: it accepts multiple samples per clock and processes them in parallel to deliver multiple output samples per clock. This architecture multiplies the throughput beyond the achievable device clock speed, but comes at an additional cost in area and complexity. Thus, to use a PFFT you will have to trade throughput against area. The trade-offs for a typical Virtex-7 FPGA design are outlined in figure 1 and table 1.
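One textbook way to see how several samples per clock can be transformed in parallel is a decimation-in-time split: with P lanes, lane p handles samples p, p+P, p+2P and so on, each lane computes a shorter transform independently, and the lane outputs are then recombined with twiddle factors. The sketch below illustrates that idea only. It is an assumption for illustration, not the internal architecture of any particular PFFT core; the names dft and parallel_fft are hypothetical, and a direct DFT stands in for each lane's FFT to keep the example short.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT; stands in for one lane's FFT and serves as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def parallel_fft(x, lanes):
    """N-point DFT assembled from `lanes` interleaved sub-transforms.

    Lane p sees x[p], x[p + lanes], ... which is the stream a lane would
    receive if `lanes` consecutive samples arrived on every clock.
    """
    n = len(x)
    if n % lanes:
        raise ValueError("transform length must be a multiple of lanes")
    m = n // lanes

    # Each sub-transform is independent, so hardware could run them side by side.
    sub = [dft(x[p::lanes]) for p in range(lanes)]

    # Recombine: X[k] = sum over p of W_N^(p*k) * Sub_p[k mod m].
    out = []
    for k in range(n):
        acc = 0j
        for p in range(lanes):
            twiddle = cmath.exp(-2j * cmath.pi * p * k / n)
            acc += twiddle * sub[p][k % m]
        out.append(acc)
    return out


samples = [complex(i % 5, -i) for i in range(16)]  # arbitrary test input
reference = dft(samples)
result = parallel_fft(samples, lanes=4)
max_error = max(abs(a - b) for a, b in zip(reference, result))
```

With 4 lanes and 16 samples, the recombined result agrees with the single direct transform to within floating-point rounding, which is the property a parallel implementation has to preserve.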

Table 1: Area scalability is generalised by hardware multiplier utilisation. Throughput scalability vs. area is slightly better than linear and generally very usable for increasing throughput to multi-gigahertz sample rates.

Looking at the table, a few general features can be seen in the trade-off curve:
1. As parallel throughput increases, multiplier (area) utilisation also increases, but by a slightly smaller multiple (better than linear).
2. Slower system clocks and harder timing closure make throughput growth sublinear as parallelism increases. However, on modern FPGAs this degradation is diminishing.
3. Taken together, No. 1 and No. 2 above yield overall better-than-linear throughput/area growth.
4. Latency decreases as parallelism increases.
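The throughput and latency trends in this list can be explored with back-of-envelope arithmetic. The sketch below is illustrative only: the 12.5 Gigasamples/second rate comes from the article, but the 400MHz system clock, the 4096-point length, the restriction to power-of-two lane counts and both function names are assumptions. It also counts only the time needed to stream one frame into the core; the pipeline depth of a real core adds to that figure.

```python
import math


def lanes_required(sample_rate_hz, clock_hz):
    """Smallest power-of-two samples-per-clock that keeps up with the ADC.

    Restricting lanes to powers of two is an assumption of this sketch.
    """
    lanes = 1
    while lanes * clock_hz < sample_rate_hz:
        lanes *= 2
    return lanes


def frame_load_time_ns(n_points, lanes, clock_hz):
    """Time to stream one n_points frame into the core, in nanoseconds."""
    cycles = math.ceil(n_points / lanes)
    return cycles / clock_hz * 1e9


SAMPLE_RATE_HZ = 12_500_000_000  # 12.5 GS/s, as quoted in the article
CLOCK_HZ = 400_000_000           # hypothetical system clock (assumption)
N_POINTS = 4096                  # hypothetical transform length (assumption)

lanes = lanes_required(SAMPLE_RATE_HZ, CLOCK_HZ)            # 32 under these assumptions
parallel_ns = frame_load_time_ns(N_POINTS, lanes, CLOCK_HZ)
serial_ns = frame_load_time_ns(N_POINTS, 1, CLOCK_HZ)
speedup = serial_ns / parallel_ns
```

Under these assumed numbers the frame load time shrinks by the lane count, which is consistent with the observation that latency decreases as parallelism increases.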
