TCP Offload Engine

Imagine a daily race with hundreds of top fuel dragsters all lined up rumbling along in parallel waiting for the same green Christmas tree light before launching off the line. In some electronic markets, with specific products, every weekday morning this is exactly what happens. It’s a race where being the fastest is the primary attribute used to determine if you’re going to be doing business. On any given day only the top finishers are rewarded with trades. Those who transmit their first orders of the day the fastest receive a favorable position at the head of the queue and are likely to do some business that day. In this market, EVERY nanosecond (a billionth of a second) of delay matters, and can be monetized. Last week the new benchmark was set at 98 nanoseconds, plus your trading algorithm, in some cases 150 nanoseconds total tick to trade.

“Latency” is the industry term for the unavoidable network delays, and “Tick to Trade Latency” aggregates together the network travel time for a UDP market data signal to arrive at a trading system, and for that trading system to transmit a TCP order into the exchange. Last year Solarflare introduced Application Nanosecond TCP Send (ANTS) and lowered the “Tick to Trade Latency” bar to 350 Nanoseconds. ANTS executes in collaboration with Solarflare’s Application Onload Engine (AOE) based on an Altera Stratix FPGA. Solarflare further advanced this high-speed trading platform to achieve 250 Nanoseconds. Then in the spring of 2017 Solarflare collaborated with LDA Technologies. LDA brought their Lightspeed TCP cores to the table and replaced the AOE with a Xilinx FPGA board once again lowering the “Tick to Trade Latency” to 120 Nanoseconds. Now through further advances, and moving to the latest Penguin Computing Skylake computing platform, all three partners just announced a STAC-T0 qualified benchmark of 98 nanoseconds “Tick to Trade Latency!”

There was even a unique case in this STAC-T0 testing where the latency was measured at negative 68 nanoseconds, meaning that a trade could be injected into the exchange before the market data from the exchange had even been completely received. Compared to traditional trading systems which require that the whole market data network packet to be received before ANY processing can be done, these advanced FPGA systems receive the market data in the packet in four-byte chunks and can begin processing that data while it is arriving. Imagine showing up in the kitchen before you wife even finishes calling your name for dinner. There could be both good and bad side effects of such rapid action, you have a moment or two to taste a few things before the table is set, or you may get some last minute chores. The same holds true for such aggressive trading.

Last week, in a Podcast with the same name we had a discussion with Vahan Sardaryan, CEO of LDA Technologies, where we went into this in more detail.

Penguin Computing is also productizing the complete platform, including Solarflare’s ANTS technology and NIC, LDA Technologies Lightspeed TCP, along with a high-performance Xilinx FPGA to provide the Ultimate Trading Machine.

This article was originally published in January of 2009 at 10GbE.net.

Thinning the herd.

In 2007 over one million 10GbE network ports were purchased. Many of those were for a switch to switch interconnects but some were to connect servers to networks via 10GbE. Natural selection is now taking effect in the 10GbE NIC market as the big dogs, Intel & Broadcom, start thrashing around in an effort to secure market share as 10GbE matures. Both want to dominate the 10GbE LAN on Motherboard (LoM) market. In the NIC market, four companies likely supply over 80% of the 10GbE NICs purchased and they are Chelsio, Intel, Myricom, and Neterion. The remaining 20% of NIC sales fall to companies like Broadcom, SMC, NetXen, ServerEngines, Tehuti, AdvancedIO, Endace, Napatech, etc… One should be wondering why Broadcom is in the second group, it’s because Broadcom’s focus is on selling 10GbE silicon to OEMs like IBM and HP for LoM projects positioning their silicon on high-end server mother boards and not retailing NIC cards.

Officially the first documented victim is NetEffect, the leader in iWarp (Infiniband for 10GbE) NICs. NetEffect rose from the ashes of a failed Infiniband company, Banderacom, earlier this decade to apply their silicon development skills and Infiniband algorithms to the more stable Ethernet market as a new feature called iWarp. NetEffect in-fact led the iWarp charge, it was the self-proclaimed leader in low-latency iWarp 10GbE NICs. In August NetEffect filed for reorganization in US Bankruptcy Court. With the failure of NetEffect the market has cast its vote and drove a stake through the heart of iWarp, hopefully terminating this feature.

Rumors have been swirling around Teak Technologies, a maker of 10GbE NICs and a switch, for some time. It appears that Teak has not weathered the storm and has since faded away, their domain name is no longer resolving to an IP address. The domain was never transferred from the founder, and the founder announced this spring on Linkedin that he had moved on some time ago. Is it conclusive evidence, no, but would you buy technology from a tech company whose URL won’t resolve to a server?

It is a tough economic climate for start-up NIC companies, particularly those in the bottom 20% as they have likely never had a quarter in the black. Now is a challenging time to be out there seeking another round of capital from ones VCs. Several have been without an injection of new funding for over two years and lack the sales volume required to sustain their own existence much beyond year end. As such we’ve directly questioned one firm to see if they are alive, and another that is widely rumored in the industry to be in trouble, but their marketing departments are still bailing.

For those not into style Manolo Blahnik is one of the leading female shoe designers, and often Blahnik’s start at $700/pair, the price of a good 10GbE NIC. As most servers have moved to dual socket quad-core processors the value proposition for TCP Offload Engine (TOE) 10GbE NICs has quickly eroded.

In the spring of 2006, a good non-TOE 10GbE NIC consumed 40% of the host CPU in a dual-socket dual-core server and provided >6Gbps of performance, while a similar TOE did the same job using only 10% of the host CPU. So with a 30% savings in host CPU, there was some value in using a TOE. With two years of improvements in silicon, stateless offloads, and servers moving to dual-socket quad cores we now have 10GbE NICs capable of near-wire rate (>9.5Gbps) that consume only 10% of the host CPU. Similarly, TOE NICs in the same environment consume roughly 5% of the host CPU.

By most estimates, servers are typically running at 20% CPU utilization, as a result of application load. So will a 5% savings in host CPU be noticed, let alone worth the added purchase price of a TOE? No. Add to that the Linux Foundation’s 14-point argument against using TOES, written by the Linux Kernel developers themselves, and one would wonder why people still consider TOEs in style.

Here are the 14 reasons cited by the Linux Foundation on their

Security updates

Point-in-time solution

Different network behavior

Performance

Hardware-specific limits

Resource-based denial-of-service attacks

RFC compliance

Linux features

Requires vendor-specific tools

Poor user support

Short term kernel maintenance

Long term user support

Long term kernel maintenance

Eliminates global system view

If you are seriously interested in buying a TOE you should read their TOE page.