Realtime bandwidth

Strategies for high volume messaging over WebSockets

We love data at Football Radar — we have a lot of it, so this is probably a good thing. There are some significant engineering challenges that come with this, however.

Feeling the strain

One of our key products is a trading dashboard that handles hundreds of thousands of events from football matches and related betting markets. To consistently deliver value, we need to be able to handle this flurry of messages over a WebSocket connection and reliably keep the UI in sync.

At its peak, on a typical Saturday afternoon, each WebSocket connection will receive upwards of 500 messages per second. Translating each of these messages into a UI update yields incredibly bad performance: the incumbent dashboard does just that, but as the business has scaled, we are hitting the limits of the current solution.

Profiling the performance of the current dashboard has yielded some very troubling results

So that we can continue to provide value to our users, we need to think about ways we can separate the concerns of receiving messages and updating the UI. Doing so has pushed us to evaluate our tech stack and rebuild portions of our dashboard from the ground up.

Decoupling

The first step is to separate the data and view layers. Our existing data abstraction has a thorough understanding of the view logic, and upon receipt of each message directly triggers rendering logic against any affected elements.
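In rough terms, the old handler looked something like this (a reconstruction for illustration; apart from renderAt, the helper names and message shape are assumptions):

```javascript
// Reconstruction of the tightly coupled handler (illustrative only).
var Model = {
  store: function (message) { /* persist into the data layer */ },
  renderAt: function (el, message) {
    /* compile a template string, then mutate the DOM node */
  },
  onMessage: function (message) {
    this.store(message);
    // Every incoming message immediately triggers rendering logic
    // against each affected element: no batching, no deferral.
    message.affectedElements.forEach(function (el) {
      this.renderAt(el, message);
    }, this);
  }
};
```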

The call to this.renderAt results in one or more DOM mutations against the affected nodes. Typically, it also involves compiling a template string. As you can see from the profile above, this tight coupling between the model and the views has scaled incredibly badly.

Most modern JavaScript frameworks provide some sort of event bus to neatly decouple the application. In our case, that seemed like a very worthwhile addition — all view-dependent logic in the model has been replaced by calls to an internal event bus. Views listen for these events and manage their own updates.

var Model = {
  onMessage: function (message) {
    this.emit(message);
  }
};

Even with this in place, it is not obvious how the performance of the application will improve. Surely this just moves the performance bottleneck to the view layer? Well yes, and we have seen exactly that — until we tweak the event bus to more intelligently batch and defer messages.

By overloading this.emit, we see sizeable performance gains. Instead of the message being directly dispatched, it is pushed onto an event queue and merged with similar messages. For example, two messages for a single game can easily be merged and dispatched as a single event.
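For instance, a score update and a clock update for the same game can collapse into one event (the message shapes here are illustrative assumptions):

```javascript
// Two messages arriving close together for the same game:
var first  = { gameId: 17, payload: { homeScore: 1 } };
var second = { gameId: 17, payload: { minute: 63 } };

// Merging shallow-copies both payloads into a single event,
// with later fields winning on conflict.
function mergeMessages(a, b) {
  var payload = {};
  [a, b].forEach(function (msg) {
    Object.keys(msg.payload).forEach(function (key) {
      payload[key] = msg.payload[key];
    });
  });
  return { gameId: a.gameId, payload: payload };
}

var merged = mergeMessages(first, second);
// merged: { gameId: 17, payload: { homeScore: 1, minute: 63 } }
```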

We can then defer the dispatch of messages, periodically flushing the event queue. Given the nature of the data we present, we do not have to deliver UI updates at 60 FPS, so we can throttle message dispatch very aggressively.
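A minimal sketch of the deferred dispatch (the queue, flush interval, and dispatch hook are illustrative assumptions, not our production bus):

```javascript
var EventBus = {
  queue: [],
  emit: function (message) {
    // Enqueue rather than dispatching straight away.
    this.queue.push(message);
  },
  flush: function () {
    if (this.queue.length === 0) return;
    var batch = this.queue.splice(0, this.queue.length);
    this.dispatch(batch); // notify listeners with the whole batch
  },
  dispatch: function (batch) { /* fan out to view listeners */ }
};

// Throttle aggressively: flush at most once every 250ms.
var flushTimer = setInterval(function () { EventBus.flush(); }, 250);
```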

In this example, we flush events once every 250ms; in reality, we vary this interval with the volume of messages to keep performance consistent.
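One way to vary the interval (the thresholds below are invented for illustration, not our production values):

```javascript
// Pick a flush interval in ms from the recent message rate:
// busier streams tolerate a longer interval, quiet ones stay snappy.
function flushInterval(messagesPerSecond) {
  if (messagesPerSecond > 400) return 1000;
  if (messagesPerSecond > 100) return 500;
  return 250;
}
```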

Changing this relationship between the model and view layers gives us huge scope to regulate the flow of messages through the application, but in truth it’s really only half the battle.

Enter React

It is well documented that the slowest part of any web application is the DOM; in an ideal world, you should touch the DOM as little as possible, avoiding costly reflows. This is easier said than done, as there are many unexpected ways to trigger a reflow.

Being such a well understood problem, it has paved the way for impressive solutions. React, from the engineering team at Facebook, is far from just the latest technology bandwagon: it cuts right to the heart of browser performance woes, while offering a delightfully declarative component API.

I won’t dwell too much on the specifics of React, such as its virtual DOM, its declarative DOM syntax, or its composable interfaces, but it suffices to say that it is a very elegant, focused solution to a common challenge. And more than that, it perfectly suits our requirements.

By using React, we actually avoid changing too much of our architecture. In much the same way that jQuery allows you to forget about browser compatibility, React lets us forget about bad DOM performance; we happily bind event listeners to our model to trigger a re-render of an entire invalidated component, letting the React internals take care of the optimisations. This drastically simplifies our code.
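The shape of that binding, with React specifics elided (the bus and the component interface here are illustrative assumptions):

```javascript
// A tiny event bus, standing in for the one described above.
var EventBus = {
  listeners: {},
  on: function (name, fn) {
    (this.listeners[name] = this.listeners[name] || []).push(fn);
  },
  emit: function (name, event) {
    (this.listeners[name] || []).forEach(function (fn) { fn(event); });
  }
};

// On mount, a component subscribes to its game's events; each event
// simply calls setState, and React's diff minimises the real DOM work.
function bindComponent(gameId, component) {
  EventBus.on("game:" + gameId, function (event) {
    component.setState({ game: event.payload });
  });
}
```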

Further tweaks

With React in place, we see huge improvements, but there is still plenty of headroom. Reflows and repaints are at an all-time low, but the sheer volume of messages piped through the system creates a lot of extraneous work for React's diffing engine.

Fortunately, React offers a hook for handling chatty state changes: this.shouldComponentUpdate. This method allows us to selectively short-circuit various state changes, if we know that they shouldn’t trigger any sort of DOM manipulation.

For example, if we truncate a timestamp or a floating point number in the UI, we don’t really care about marginal updates. Messages from the WebSocket stream might include data about stoppage time or betting prices that is incredibly precise, but then rounded to the nearest minute or nearest whole number when displayed. Knowing this, we can skip needless computation.
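A sketch of that short-circuit (PriceCell and the two-decimal display rule are assumptions for illustration):

```javascript
// Skip the re-render when the displayed (rounded) value is unchanged,
// even though the underlying precise price has moved.
var PriceCell = {
  props: { price: 2.4821 },
  shouldComponentUpdate: function (nextProps) {
    return nextProps.price.toFixed(2) !== this.props.price.toFixed(2);
  }
};
```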

What does it all mean?

We invested less than four weeks in rewriting our trading dashboard, and the gains have been very encouraging

The benefits we see with this new architecture are very promising. There is certainly much room for improvement, but we are now able to deliver a very high quality experience to our users, even on devices where it was previously not possible. This project has been an eye-opener: a testament to focused profiling and choosing the most appropriate tool for the job.