Selecting memory for high performance FPGA platforms
Keywords: high-frequency trading, HFT, algorithms, Network Interface Card, OS
At the turn of the 21st century, high-frequency trading (HFT) focused on superior algorithms and trading strategies. The advantage lay in strategy rather than speed, and the most popular systems had latencies on the order of seconds. By 2010, algorithmic improvements alone were no longer enough to gain a trading advantage, so participants began competing by reducing tick-to-trade latency. This brought trade times down to microseconds.
Driven by sub-millisecond buy and sell orders, HFT platforms are locked in a highly competitive race to cut market data round-trip latency to the order of microseconds. Because a difference of even a few nanoseconds can create a large competitive advantage through latency arbitrage (sometimes referred to as 'front running'), trading firms are constantly on the lookout for faster trading servers.
Traditionally, HFT has been performed with software running on high-performance computing systems that can execute complex trading strategies efficiently (figure 1). The OS kernel on these systems controls access to CPU and memory resources, while the application stack implements the trading strategies. A Network Interface Card (NIC) connects the system to the stock exchange.
Figure 1: Order processing in a software-based approach (Source: Cypress).
However, this configuration suffers from drawbacks with respect to tick-to-trade latency:
• Standard NICs are not optimised to handle TCP/IP and proprietary exchange protocols, and cannot process market feeds onboard.
• The PCI Express bus between the host system and the Ethernet card adds a delay of a few microseconds.
• The kernel's interrupt-driven approach inherently causes long delays.
• These solutions are based on multi-core processors that share memory resources. Shared memory access is poorly suited to deterministic latency, which is critical when handling feeds from a stock exchange.
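To make the list above more concrete, the sketch below treats tick-to-trade latency as the sum of the per-stage delays along the software path, and expresses determinism as the spread between the median and the 99th-percentile latency of repeated measurements. The stage names and nanosecond values are hypothetical placeholders chosen for illustration, not figures from this article or from any real system; only the PCI Express entry loosely follows the "few microseconds" quoted above.

```python
import statistics


def tick_to_trade_ns(stages_ns):
    """Total tick-to-trade latency: the sum of every stage on the path (ns)."""
    return sum(stages_ns.values())


def latency_profile(samples_ns):
    """Summarise repeated latency measurements (ns).

    A deterministic path shows a small gap between the median and the
    99th percentile; contention on shared resources can widen that gap.
    """
    ordered = sorted(samples_ns)
    # Nearest-rank 99th percentile, ceil(0.99 * n), computed with integers.
    rank = (99 * len(ordered) + 99) // 100
    p99 = ordered[rank - 1]
    median = statistics.median(ordered)
    return {"median": median, "p99": p99, "jitter": p99 - median}


if __name__ == "__main__":
    # HYPOTHETICAL per-stage delays for a software path, in nanoseconds.
    software_path = {
        "nic_receive": 1_000,
        "pcie_crossings": 3_000,   # "a few microseconds" on the PCIe bus
        "kernel_interrupt": 5_000,
        "strategy": 3_000,
    }
    print("tick-to-trade (ns):", tick_to_trade_ns(software_path))

    # HYPOTHETICAL measurements: mostly steady, with occasional spikes.
    samples = [10_000] * 95 + [25_000] * 5
    print(latency_profile(samples))
```

Reporting the percentile spread alongside the average is the point of the design: two paths with the same mean can differ greatly in how predictable they are, and it is the worst cases that matter when reacting to an exchange feed.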
Recent advances in algorithmic trading have introduced lower-latency solutions, the most promising of which is custom hardware built using Field Programmable Gate Arrays (FPGAs). These devices bridge the gap between the extreme performance of hard-coded ASICs and the flexibility of CPUs. FPGAs provide a vast array of concurrent resources that can be configured to drastically reduce round-trip trade latency compared with software-based solutions (figure 2).
Figure 2: Order processing in an FPGA-based approach (Source: Cypress).
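The latency benefit of concurrent hardware resources can be illustrated with a deliberately simplified model. Suppose decoding an incoming market-data message involves several independent steps, for example extracting the price, quantity, symbol and side fields. A processor that executes them one after another pays the sum of their costs, whereas dedicated FPGA logic for each step can run them at the same time, so only the slowest step counts. The step names, cycle counts and the 4 ns (250 MHz) clock period below are hypothetical, and the model ignores clock-rate differences between CPUs and FPGAs, memory stalls and any dependencies between steps.

```python
def sequential_latency_ns(step_cycles, clock_ns):
    """One processing element executes each step in turn, so costs add up."""
    return sum(step_cycles) * clock_ns


def concurrent_latency_ns(step_cycles, clock_ns):
    """Each independent step has its own logic and all steps run in parallel,
    so the slowest step determines when the result is ready."""
    return max(step_cycles) * clock_ns


if __name__ == "__main__":
    # HYPOTHETICAL cycle counts for independent field-decoding steps.
    steps = {"price": 6, "quantity": 4, "symbol": 8, "side": 2}
    clock_ns = 4.0  # HYPOTHETICAL 250 MHz clock, used for both models
    cycles = list(steps.values())
    print("sequential (ns):", sequential_latency_ns(cycles, clock_ns))
    print("concurrent (ns):", concurrent_latency_ns(cycles, clock_ns))
```

With these placeholder numbers the sequential model gives 80 ns and the concurrent model 32 ns. Real designs also pipeline dependent steps, but the same principle applies: spreading work across parallel logic removes the serialisation that a single instruction stream imposes.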