EE Times-Asia > Memory/Storage

Selecting memory for high performance FPGA platforms

Posted: 04 May 2016

Keywords: high-frequency trading, HFT, algorithms, Network Interface Card, OS

Besides being flexible, FPGAs can be programmed to be self-sufficient in processing critical tasks like data acquisition, risk matching and order processing. This self-sufficiency makes them faster and more reliable than software algorithms. The key factor that allows FPGA-based solutions to offer such massive performance improvements in electronic trading is that they enable processes traditionally handled by software to run directly on the FPGA.

FPGAs hold these advantages over software-based algorithms because the following functions are offloaded to the FPGA itself:
1. Handling of TCP/IP messages
2. Decoding FAST or similar exchange-specific protocols and stripping out the relevant data
3. Making trading decisions without incurring any kernel-based interrupt delay
4. Mitigating risk by managing order books and trade logging within the FPGA

Due to these differences, FPGA-based solutions provide ultra-low latency feed handling as well as ever-faster order execution and risk assessment. They also attain maximum performance per watt to minimise energy and thermal requirements. Another advantage of FPGA solutions is the ability to scale to deploy "FPGA Farm" implementations.

A key part of the FPGA-based approach is the clever integration of QDR memories, which allow for deterministic memory access rates, combined with properly optimised VHDL code. The two most critical data sets that need to be maintained in the FPGA's memory are stock information for maintaining order books, and data and time stamp logging for risk analysis. Each places different requirements on the cache memory. The data and time stamp logging of packets is important for keeping an accurate record of trade decisions, so that past events can be reconstructed and learned from. The granularity needed for these records is in the tens of nanoseconds. This makes memory latency (i.e., the time lag between providing the memory with an address and getting the data out on the data bus) highly critical.
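As an illustrative sketch (not taken from any specific trading platform), a fixed-size log record with the tens-of-nanoseconds granularity described above might look like the following in C. The field names and widths are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical trade-log record: one fixed-size entry per trading
   decision, time-stamped at ~10 ns granularity as the article requires.
   Field layout is illustrative, not from the article. */
typedef struct {
    uint64_t ts_10ns;   /* time stamp in 10 ns units */
    uint32_t symbol_id; /* hashed instrument identifier */
    uint32_t order_id;  /* order this decision refers to */
    uint8_t  action;    /* 0 = hold, 1 = buy, 2 = sell */
} log_record_t;

/* Convert a raw nanosecond counter into the 10 ns log granularity. */
static inline uint64_t to_log_ticks(uint64_t ns)
{
    return ns / 10;
}
```

A fixed-size record like this lets the FPGA write one log entry per memory transaction, which is why deterministic write latency matters more than raw capacity here.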

The other data set, the order book, is a database of all orders, with symbols and prices, that the trading system needs to maintain. This database is usually a small subset of all the instruments that the exchange carries, based on the stocks its clients are interested in. The order book needs to be updated and accessed simultaneously based on information received from the exchange and from clients. The relevant data in the order book is compared with the data received from the exchange, and based on the trading algorithm a decision to buy, sell or hold the instrument is taken.
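A minimal software sketch of such an order-book entry and lookup is shown below. The structure and linear scan are assumptions for illustration only; a real FPGA design would index QDR memory directly, for example by a hashed symbol ID, rather than scanning.

```c
#include <string.h>

/* Hypothetical order-book entry: symbol plus best bid/ask in
   fixed-point price ticks. Layout is illustrative only. */
typedef struct {
    char     symbol[8];  /* e.g. "AAPL" */
    unsigned bid_ticks;  /* best bid, in price ticks */
    unsigned ask_ticks;  /* best ask, in price ticks */
} book_entry_t;

/* Linear scan over a small subscribed subset of instruments.
   In hardware this would be a direct (hashed) memory index instead. */
const book_entry_t *book_lookup(const book_entry_t *book, int n,
                                const char *sym)
{
    for (int i = 0; i < n; i++)
        if (strncmp(book[i].symbol, sym, sizeof book[i].symbol) == 0)
            return &book[i];
    return NULL;
}
```

The point of the sketch is the access pattern: each incoming exchange message triggers one small, random read-modify-write against the book, which is exactly the workload characterised next.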

Since the input data stream from stock exchanges is not received in a deterministic, sequential manner, the memory accesses for implementing a trading strategy are also random, occurring in small data bursts and requiring quick data retrieval with minimum latency. In memory parlance, this ability to perform random accesses is measured by a metric called Random Transaction Rate (RTR). RTR represents the number of random read or write accesses that a memory can support in a given timeframe. It is measured in multiples of transactions per second (for example, MT/s or GT/s). In most memories, the random access time is defined by the cycle time latency (tRC). The maximum RTR is approximately the inverse of tRC (1/tRC).
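The RTR formula is simple enough to verify directly. The helper below converts a tRC given in nanoseconds into MT/s, reproducing the figures quoted later in the article for SDRAM and RLDRAM 3:

```c
/* Maximum Random Transaction Rate from the memory's cycle time:
   RTR ≈ 1 / tRC. With tRC in nanoseconds, 1000 / tRC yields MT/s.
   e.g. tRC = 48 ns -> ~20.8 MT/s; tRC = 8 ns -> 125 MT/s. */
double max_rtr_mts(double trc_ns)
{
    return 1000.0 / trc_ns;
}
```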

The choice of cache memory can often limit the full capabilities of FPGA-based hardware. Most FPGAs use traditional DRAM-based memories solely due to their cost advantage and higher density. However, these memories are comparatively slow for random accesses and prone to soft errors. Given the volume of trades undertaken by these systems every second, speed and reliability cannot be compromised.

Consider the two most widely used DRAM options from a pure technology perspective: Synchronous DRAM (SDRAM) and Reduced Latency DRAM (RLDRAM). SDRAM tRC has not evolved substantially over the past 10 years (nor is it expected to going forward) and stands at ~48 ns, which correlates to a 21 MT/s RTR. Other DRAM-based memory devices have been designed to improve tRC at the expense of density. For example, RLDRAM 3 has a tRC of 8 ns, which correlates to a 125 MT/s RTR. Essentially, DRAMs are optimised for sequential access involving deterministic computation algorithms, but high-frequency trading does not work that way.

A better alternative is synchronous Static RAM (SRAM). Although DRAM-based memories offer higher capacity, they fail to meet the latency and performance demanded of cache memories for trading platforms. Static RAMs have been the memory of choice for most high-performance applications for decades. Compared to a typical DRAM-based solution, an SRAM-based solution can be faster by up to a factor of 24.
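The factor-of-24 figure can be sanity-checked against the tRC numbers above. Assuming an SRAM random cycle time of about 2 ns (an assumed figure, consistent with the article's claim, not stated in it), the ratio against SDRAM's ~48 ns works out as follows:

```c
/* Random-access speedup = DRAM tRC / SRAM tRC.
   With SDRAM at 48 ns and an assumed SRAM cycle time of 2 ns,
   the ratio is 24, matching the article's "factor of 24". */
double random_access_speedup(double dram_trc_ns, double sram_trc_ns)
{
    return dram_trc_ns / sram_trc_ns;
}
```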

