Accelerating Transactions Through FPGA-Enabled Switching

In this interview for the HFT Review, Mike O’Hara talks to John Peach, Consulting Engineer for EMEA at Arista Networks, which recently launched the Arista Application Switch, the first in a new category of FPGA-enabled switches that let customers run their own applications directly inside the network.

HFT Review: John, welcome to the HFT Review. Earlier this year, Arista unveiled the 7124FX Application Switch, which has been generating a fair amount of excitement in the high frequency trading space. What was the background behind the development of that switch?

John Peach: In high frequency trading and the surrounding regulatory environment there have been a variety of initiatives not just to reduce the latency of the trading flow, but also to add greater functionality while maintaining that low latency. Previously, improvements in latency were achievable by simply migrating to a newer, faster switching platform. Now that switching latency has fallen to under 500ns, the returns available through subsequent ASIC generations have diminished, meaning customer focus is firmly back on the application.

Newer CPU technologies and networking acceleration options, such as kernel bypass and, to a lesser extent, add-in FPGA cards, are augmenting the software effort, but there is space for a more integrated, network-centric, higher-layer processing capability. Offloading server CPU cycles for the benefit of the trading facility as a whole improves performance and determinism while reducing overall latency and costs.

To draw a parallel in the enterprise data centre, we’re accustomed to the concept of adding specific hardware to provide upper layer functions (e.g. firewall or load balancing). By offloading functionality that can’t easily be done in software, these platforms free up server CPU cycles and increase the overall efficiency of the data centre server estate.

Our goal is to provide an analogous solution that allows the customer to implement their own logic for embedding latency sensitive mission critical applications without worrying about the intricacies of integrating multiple disparate hardware components.

Deployment of the Arista 7124FX can deliver a 3-5X reduction in end to end transaction latency by reducing dependency on and removing server and/or application tiers.

HFTR: Can you take us through a couple of use cases of how this works in practice?

JP: Certainly. One specific use case is the requirement for inline risk checking in brokerage environments. Increasing regulation has encouraged brokerages to come up with innovative solutions to implement in-path risk checking while minimising impact on their clients, in turn creating a new competitive front for high performance. This particular problem is elegantly addressed by FPGA acceleration and has led to a proliferation of FPGA daughter cards in servers, which has introduced new scaling challenges.
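To make the idea of an in-path risk check concrete, here is a minimal software sketch of the kind of logic such a gate applies to each order before forwarding it. It is not Arista's or any vendor's implementation, and a real deployment would express this in FPGA logic rather than Python. The names `Order`, `Limits` and `RiskGate`, the three limit types, and the choice to count every accepted order against the position immediately are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    client: str
    symbol: str
    side: str       # "BUY" or "SELL"
    quantity: int
    price: float


@dataclass(frozen=True)
class Limits:
    # Hypothetical per-client limits; real rule sets are far richer.
    max_order_qty: int
    max_notional: float
    max_position: int


class RiskGate:
    """Accepts or rejects each order before it is forwarded to the venue."""

    def __init__(self, limits):
        self.limits = limits      # client -> Limits
        self.positions = {}       # (client, symbol) -> signed open quantity

    def check(self, order):
        lim = self.limits.get(order.client)
        if lim is None:
            return False, "unknown client"
        if order.quantity <= 0 or order.quantity > lim.max_order_qty:
            return False, "order size"
        if order.quantity * order.price > lim.max_notional:
            return False, "notional"
        signed = order.quantity if order.side == "BUY" else -order.quantity
        key = (order.client, order.symbol)
        projected = self.positions.get(key, 0) + signed
        if abs(projected) > lim.max_position:
            return False, "position"
        # Assumption: treat every accepted order as filled (worst case).
        self.positions[key] = projected
        return True, "ok"
```

Because every check is a fixed comparison against per-client state, the work per order is constant, which is the property that makes this kind of rule set a natural fit for hardware pipelines.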

HFTR: In what way?

JP: While FPGA processing is a good solution to the problem of in-line processing, it is the logistics of a large solution that tend to make scaling difficult. For example, today the most accessible form of FPGA processing is a simple PCIe add-in card for a server; however, due to the relatively low FPGA capacity and port density of each card, several are required to handle even modest numbers of customers. Several cards naturally require several servers, since the cards must have a host to provide them with power and control, as well as a network and associated cabling to tie the whole solution together.

The overall space and power requirement quickly grows, while the addition of network hops and external cabling adds both latency and more potential failure modes. Furthermore, the combination of vendors for server, FPGA card and network adds operational overhead. Given the premium for co-location real estate and power, plus the need for 100% uptime, this model soon becomes unwieldy.

Solution scale, ease of integration, proximity to the data, compact packaging and reduced operational overhead are at the root of the 7124FX concept.

HFTR: Hence the idea of placing the FPGA doing the inline risk checking directly on the switch itself?

JP: That’s right. In building the 7124FX, we took our lowest latency 500ns 10Gb Ethernet switch, which is already very widely deployed, and we co-located the most advanced Altera FPGA adjacent to the switch ASIC. The FPGA supports sixteen 10-GbE ports, a marked uplift from the usual two to four that you would get on a PCIe add-in card.

With the goal of accelerating transactions as they pass through the network, it was crucial for us to find a truly in-line solution; rather than simply attaching the FPGA as a client of the switching chipset, the processor sits directly in line with the traffic flow, leveraging both the high functionality of the FPGA and the more traditional network forwarding, multicasting and filtering capabilities that are inherent to the ASIC itself.

Combining this functionality into the same 1RU package as our other platforms provides substantial benefits. Customers no longer have to integrate multiple third party components and knit them all together via additional networking hardware, which consumes cost and space, not to mention the additional support overheads.

For inline risk specifically, organisations have been racing to find the best total solution to the problem at large scale and that’s the key factor here. What can be done on a single device fairly easily is much tougher to maintain as the solution grows. Scaling and saving on all those metrics is something that hasn’t been achieved until now.

HFTR: What about other applications for this technology within the HFT space?

JP: Once you’ve got the ability to do this processing directly within the network, you can provide services that are leveraged across the pool of servers. Feed-handling (normalisation, translation), line arbitration, and symbol based routing are some of the obvious network-centric applications.
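As an illustration of line arbitration, the sketch below merges two redundant copies of a sequenced feed (often called the A and B lines) and forwards whichever copy of each message arrives first. It assumes every message carries a monotonically increasing sequence number; the class name `LineArbiter` and its gap-tracking policy are hypothetical, and a production arbiter would also bound the gap set and trigger retransmission requests.

```python
class LineArbiter:
    """Forwards the first copy of each sequence number, drops later copies."""

    def __init__(self, first_seq=1):
        self.highest = first_seq - 1   # highest sequence number forwarded so far
        self.missing = set()           # gaps below `highest` not yet filled

    def accept(self, seq):
        """Return True if this packet should be forwarded, False if duplicate."""
        if seq > self.highest:
            # New high-water mark: remember any skipped numbers as gaps.
            self.missing.update(range(self.highest + 1, seq))
            self.highest = seq
            return True
        if seq in self.missing:
            # Late arrival from the other line fills a gap.
            self.missing.discard(seq)
            return True
        return False


# Interleaved arrivals from both lines; sequence 4 was lost on line A.
arrivals = [("A", 1), ("B", 1), ("A", 2), ("B", 3), ("A", 5),
            ("A", 3), ("B", 4), ("B", 5)]
arbiter = LineArbiter()
forwarded = [seq for _, seq in arrivals if arbiter.accept(seq)]
```

In this run `forwarded` ends up as `[1, 2, 3, 5, 4]`: the downstream consumer sees each message exactly once, and the loss on line A is covered by line B.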

Other areas include translation from the standards-based FIX protocol to private binary formats, and unicast-to-multicast conversion where a specific venue or platform does not provide multicast data feeds. All of those applications are well suited to the FX platform.
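The FIX-to-binary translation mentioned here can be sketched in a few lines. The FIX tags used (35 MsgType, 55 Symbol, 54 Side, 38 OrderQty, 44 Price) are standard, but the 21-byte binary layout, the fixed-point price scale and the function name `fix_to_binary` are invented for this example and do not correspond to any real venue format. Body length and checksum validation are deliberately omitted.

```python
import struct
from decimal import Decimal

SOH = "\x01"  # standard FIX field delimiter

# Hypothetical little-endian layout: 8-byte symbol, 1-byte side,
# 4-byte unsigned quantity, 8-byte signed price in units of 1/10000.
LAYOUT = struct.Struct("<8sBIq")
PRICE_SCALE = 10000


def fix_to_binary(message: str) -> bytes:
    """Translate a FIX New Order Single into the hypothetical binary record."""
    fields = dict(f.split("=", 1) for f in message.strip(SOH).split(SOH))
    if fields.get("35") != "D":
        raise ValueError("only New Order Single (35=D) is handled here")
    symbol = fields["55"].encode("ascii")
    if len(symbol) > 8:
        raise ValueError("symbol too long for this layout")
    # Decimal avoids binary floating point rounding of the price.
    price = int(Decimal(fields["44"]) * PRICE_SCALE)
    return LAYOUT.pack(symbol.ljust(8, b" "), int(fields["54"]),
                       int(fields["38"]), price)


sample = SOH.join(["8=FIX.4.2", "35=D", "55=ABC", "54=1",
                   "38=100", "44=101.25"]) + SOH
record = fix_to_binary(sample)
```

The fixed-width record is what makes the binary form attractive for low-latency paths: every field sits at a known offset, so no delimiter scanning is needed on the receiving side.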

Of course for the right firm in the right space, there is no reason a strategy couldn’t be implemented directly on the FX platform, dramatically lowering total latency and jitter, increasing performance and minimising the power and rack footprint needed.

HFTR: How closely do you work with other technology vendors and what sort of skill sets do customers typically need?

JP: While it’s key for us to ensure technology vendors in our ecosystem have great access to the technology and to our hardware and software resources, Arista doesn’t limit the scope of this type of platform by controlling development access. As with our own EOS operating system, we don’t impose any specific limitations on who can develop, but we continually evaluate whether we should be working more closely with additional partners based on customer feedback.

Our policy when working with technology partners is very much to connect our customers with the right organisations. There’s no reason that an organisation who we don’t have a direct partnership with now couldn’t go ahead and develop for the platform, because the system has been designed in such a way that the FPGA component is accessible to the user, as it would be had they installed an add-in card in their existing server.

We also provide a reference image to explain basic packet flow through the device and further demonstrate how to interact directly with Arista EOS, the switch operating system software. We want to ensure that instead of an invisible lump of logic that just happens to be in the same box as the switch, the FX platform’s FPGA subsystem can be fully managed and controlled, as you would expect any Arista EOS powered switch to be.

We do expect a significant proportion of development to be fully proprietary, as befits a customer base where technology intellectual property (IP) is a major differentiator. It is here that focusing on the platform and allowing users free rein is a great advantage: whether a firm wants to develop from the ground up or integrate a selection of third party IP blocks, they are at liberty to implement their application on the FPGA independently, with their intellectual property remaining proprietary.

Leveraging Impulse Accelerated Technologies’ advanced tool set, Arista offers an alternative path into the hardware world through Impulse’s C-to-RTL compilers, libraries and consultancy offerings. Furthermore, there are numerous third parties, including Altera themselves, providing IP blocks for everything from low-level functionality up to complex functions such as TCP stacks and CPUs.

Another use case is expanding or augmenting existing commercial platforms. Turnkey offerings from Exegy and NovaSparks provide broader solutions.

HFTR: You mentioned the scalability aspect earlier. How important is that?

JP: Scalability is key in many ways, beyond just raw performance. Firstly, there hasn’t been access to this scale of FPGA technology and processing power in the past without custom PCB design, as most off-the-shelf add-in cards make use of much smaller components.

The combination of switch ASIC and FPGA means the 7124FX offers a total of 24 10Gb interfaces, which allows connectivity to multiple inputs and outputs. If needed, cascading multiple switches allows for additional scalability.

Secondly, to date there hasn’t been access to an easily supportable platform, i.e. one that can be quickly replaced with an identical unit and is covered under the same support contracts and processes that are expected of mission critical network gear.

This simply hasn’t been possible with self-integrated solutions consisting of components from several different manufacturers. We’re bringing not only the performance and the platform knowledge, but also the supportability and logistics aspect of that to lower the overall operational cost of the proposition and increase the usability.

HFTR: How do you see this technology evolving in the future?

JP: One of the near term challenges is the availability of monitoring applications and services to wrap around 40GbE networks. With each leap in network performance, there tends to be a lag before applications, servers and network adapters catch up. Having the option to place an FPGA in line with the device gives you immediate flexibility to build wire-rate capture and monitoring tools with important features such as filtering, header manipulation and time-stamping.

From the point of view of both the trader and the venue, understanding traffic profiles with nanosecond granularity provides much greater scope for analysis, capacity planning and troubleshooting. By offering a precision oscillator in the device along with support for external synchronization methods, the 7124FX allows a completely new approach to managing traffic and to understanding traffic flow. Capitalising on this opens up a new era of control.

There is also the wider applicability of the 7124FX outside the high frequency trading markets, in the Government, Telecoms and Medical segments, where the same inline processing with low latency and high performance is able to deliver significant advantages.

HFTR: Any final thoughts?

JP: I would say the key take-aways are the application performance, deployment flexibility and efficiency the FX platform provides. I expect this wide applicability to enable more firms to make a first foray into hardware acceleration, bearing in mind the complementary nature of the product.

The 7124FX can optimise and augment existing solutions and has very broad applicability in the trading environment, regardless of strategy or place in the trading cycle, whether for actual trade flow, data analysis and capture, time stamping, or any number of more esoteric applications.

The challenge with many existing hardware solutions is that from a customer perspective they’re still effectively closed “black box” technologies. The flexibility and the openness of our solution are critical in terms of the ability to deploy on a number of levels, whether it’s turnkey, completely self-managed or somewhere in between.