Lightning Interview #3: Robert Engels on Tuning Java for Performance

Robert Engels works for OptionsCity, which develops financial services platforms using Java. Their products include Freeway, a multi-asset algorithmic trading platform that won a 2012 Chicago Innovation Award. Robert is the Chief Architect for Freeway development. If you're familiar with modern automated trading, you'll be well aware of the need for maximal performance.

Robert has been building low-latency communications systems and large-scale information retrieval systems for more than 25 years, working with a variety of Fortune 500 companies before he came to OptionsCity. He studied Computer Science and Mathematics at the University of Illinois at Urbana-Champaign.

In this third Java.net "Lightning Interview" I asked Robert about the Java performance tuning his team has implemented in Freeway.

1. Can you describe the performance you've attained for Freeway?

Robert: A really excellent example of the raw throughput is that in "almost live" testing we are able to pump across the network to the exchange more than 75k quotes per second sustained, including the processing of all resulting market data and quote/price generation. This is on fairly commodity server hardware: 2x6 core 3.2 GHz with 1 Gb Ethernet. It's hard to judge real-world performance, as our customers' current needs are more than an order of magnitude lower. In our "low latency mode", the throughput is about half that, but the average response time to a single piece of market data is less than 30 microseconds.

2. What types of tweaking/tuning did you apply to Java to achieve this performance?

Robert: The major performance advances were obtained using lock-free data structures and managing thread priorities. When you are attempting to shave microseconds, performance is greatly affected by memory cache hits, which are in turn affected by CPU affinity. Modern Linux kernels do a pretty good job of scheduling on their own, but for certain subsystems being able to assign cores to work can make a big difference.
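To make "lock-free data structure" concrete, here is a minimal sketch of one classic example, a Treiber stack built on a compare-and-set loop. It is not Freeway code, and the interview does not say which structures the team uses; the class name LockFreeStack and its methods are hypothetical, chosen only for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative lock-free (Treiber) stack. Hypothetical example, not Freeway code. */
final class LockFreeStack<T> {
    /** Immutable node; a fresh node per push lets the garbage collector rule out ABA reuse. */
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    /** Pushes a value, retrying the compare-and-set if another thread changed the head. */
    void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(value, oldHead);
        } while (!head.compareAndSet(oldHead, newHead));
    }

    /** Pops the most recently pushed value, or returns null if the stack is empty. */
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.value;
    }

    public static void main(String[] args) throws InterruptedException {
        LockFreeStack<Integer> stack = new LockFreeStack<>();
        Runnable producer = () -> {
            for (int i = 0; i < 1000; i++) {
                stack.push(i);
            }
        };
        Thread a = new Thread(producer);
        Thread b = new Thread(producer);
        a.start();
        b.start();
        a.join();
        b.join();

        int count = 0;
        while (stack.pop() != null) {
            count++;
        }
        System.out.println("popped " + count + " items");
    }
}
```

No thread ever blocks while holding a lock here: a thread that loses the race simply re-reads the head and retries. That property is what keeps a stalled or preempted thread from holding up the critical path.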

3. Can you describe in more detail the single change that made the greatest difference in improving the performance?

Robert: Since most of our customers are more concerned with latency than throughput, a Linux 3.x kernel coupled with realtime thread priorities makes all the difference. You just don't want low-priority or housekeeping work to interrupt or delay your critical path. Lastly, I would be remiss if I didn't mention using the Azul Systems Zing JVM with its "pause-less garbage collector". Java GC pauses are killers in a low-latency environment, and Zing does wonders here.
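The idea of separating critical-path work from housekeeping can be sketched with the standard java.lang.Thread priority API. This is only an approximation of what Robert describes: Java priorities are hints, whether the operating system honors them depends on the JVM and platform configuration, and true realtime scheduling typically needs OS-level setup outside the standard Java API. The class and thread names below are hypothetical.

```java
/** Illustrative only: tags critical-path and housekeeping threads with different priorities. */
final class PriorityDemo {
    /** Creates (but does not start) a named thread with the requested priority. */
    static Thread newThread(String name, int priority, boolean daemon, Runnable work) {
        Thread t = new Thread(work, name);
        t.setPriority(priority); // a hint; the JVM and OS decide how it maps to scheduling
        t.setDaemon(daemon);
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical critical-path worker: reacts to market data.
        Thread critical = newThread("market-data", Thread.MAX_PRIORITY, false,
                () -> System.out.println("handling market data"));

        // Hypothetical housekeeping worker: should never delay the critical path.
        Thread housekeeping = newThread("housekeeping", Thread.MIN_PRIORITY, true,
                () -> System.out.println("flushing logs"));

        critical.start();
        housekeeping.start();
        critical.join();
        housekeeping.join();
    }
}
```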

Conclusion

OptionsCity Director of Client Technology Freddy Guime told me that once in a while, Robert takes time off from performance tuning Java: "If Robert's thinking linearly, he's probably on the golf course." But then again, who knows what Robert's really thinking about as he's out on the course tuning his drives, approaches, chips, and putts?