There is a lot of buzz around FPGA-based trading at the moment, and in this post I want to cover why that's the case. FPGAs bring certain benefits that can't be achieved by software running on a CPU, in particular low latency and determinism. However, many people find it difficult to migrate existing software algorithms onto these devices. That is the problem I want to address here.
The benefits of FPGA offload
In many applications, the delay between receiving some information and being able to act on it is important. This is particularly the case in trading, where being the first to act on a piece of information can mean the difference between capturing an opportunity and missing it.
In some applications, the total decision making time incorporates the delay from getting data from the exchange to your network, through your infrastructure, into a network device, up into your application, and back out through your infrastructure and to the exchange. There are a number of techniques you can use to eliminate latency at any of these points - faster switches, faster network cards, faster servers, etc.
At some point though, the latency of getting data into a CPU becomes a significant portion of the total system latency. At this point, it may be time to consider offloading a portion (the 'critical path') of your application to an FPGA. One method is to take an existing software application and progressively offload parts of it to an FPGA. This can be done by decoupling the application logic that acts on information from the part that determines how to act.
Progressively offloading pieces of decision-making logic to an FPGA in this way means a shorter time to production, cheaper development cycles and the ability to see results sooner.
Decoupling your application
I want to explore the concept of decoupling the latency critical part of your application from the other modules in a bit more detail. What follows is just one idea, which won't be applicable to everyone but serves to illustrate the point. Let's take the paradigm of listen-decide-act, where:
- We listen for information that we want to act on,
- We make a decision based on this information,
- Finally, we act in some way by sending a reply.
The aim is to reduce the time from learning something new to sending a reply. Two things help here. First, we often know a little bit about the opportunities we're interested in. Second, whilst our latency needs to be as low as possible, the time between incoming messages of interest might be much longer.
Given that we have an idea of opportunities we might expect to see in advance, we can use the longer 'off time' between messages to decide how we might want to act in response to any of these. This is the kind of task that can be performed by software running on a CPU. Once this has been computed, we can push this information to an FPGA, which need only implement simple logic to check whether any of the pre-computed conditions have been met and send the appropriate reply.
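To make the split concrete, here is a minimal software sketch of the idea. It is only an illustration, not the ExaNIC example application: the `Trigger` fields, the `FastPath` class and the text reply format are all hypothetical names invented for this post. The slow path precomputes a table of replies during the 'off time', and the fast path (standing in for the FPGA) does nothing more than a table lookup when a message arrives.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Trigger:
    """A condition we expect to see in an incoming message (hypothetical fields)."""
    symbol: str
    price: int  # price in integer ticks


class FastPath:
    """Stands in for the FPGA: a fixed table lookup with no decision logic."""

    def __init__(self) -> None:
        self._table: Dict[Trigger, bytes] = {}

    def load(self, table: Dict[Trigger, bytes]) -> None:
        # Replace the current table with one computed by the slow path.
        self._table = dict(table)

    def on_message(self, symbol: str, price: int) -> Optional[bytes]:
        # Return the pre-computed reply if a condition matches, else None.
        return self._table.get(Trigger(symbol, price))


def precompute(
    opportunities: Iterable[Tuple[str, int, int]]
) -> Dict[Trigger, bytes]:
    """Slow path (CPU): decide in advance how to reply to each expected event."""
    table: Dict[Trigger, bytes] = {}
    for symbol, price, quantity in opportunities:
        # Hypothetical reply format, purely for illustration.
        table[Trigger(symbol, price)] = f"BUY {symbol} {quantity}@{price}".encode()
    return table


fast_path = FastPath()
fast_path.load(precompute([("ABC", 1000, 50), ("XYZ", 250, 10)]))

hit = fast_path.on_message("ABC", 1000)   # matches a pre-computed condition
miss = fast_path.on_message("ABC", 1001)  # no match, so no reply is sent
```

The point of the design is that everything expensive happens in `precompute`, which can run as slowly as it likes between messages. The lookup in `on_message` is the only part that would need to move into hardware.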
Putting it all together
Implementing something like this requires the right hardware and software platform to build upon. We believe we've put that platform together in the form of the ExaNIC, combined with the ExaNIC FPGA development kit. The ExaNIC is an ultra-low-latency network card with a low-latency software stack built on top.
The FPGA development kit opens up the FPGA on the ExaNIC, allowing you to put the 'critical path' of your application directly on the card itself. This means that you get very low 'bump in the wire' latencies between receiving new information and having the ability to act on that information.
The kit ships with an example application that illustrates this concept in more detail. For those of you who already have ExaNICs, your existing cards are already compatible!
If you're interested in learning more about this concept, get in contact with us. We'd love to share more details with you!