So far in this series on queueing theory, we’ve seen single server queues,
bounded queues, and most recently multi-server queues. A fascinating result
from queueing theory is that wait time degrades dramatically as utilisation
tends towards 100%. We saw that unbounded queues exhibit degenerate
behaviour under heavy load: once utilisation reaches dangerous levels, wait
times grow without limit.
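To make that blow-up concrete, here’s a minimal sketch (assuming the simplest single-server M/M/1 model, with arrival rate `lam` and service rate `mu`; the numbers are illustrative) of mean residence time as utilisation climbs:

```python
# Mean residence time in an M/M/1 queue: W = 1 / (mu - lam).
# As utilisation rho = lam / mu approaches 1, W grows without bound.

def residence_time(lam: float, mu: float) -> float:
    """Mean time a customer spends in the system (waiting + service)."""
    if lam >= mu:
        raise ValueError("queue is unstable when lam >= mu")
    return 1.0 / (mu - lam)

mu = 100.0  # server handles 100 customers/second
for rho in (0.5, 0.9, 0.99, 0.999):
    lam = rho * mu
    print(f"rho={rho:.3f}  W={residence_time(lam, mu) * 1000:7.1f} ms")
```

Running this prints 20 ms at 50% utilisation, 100 ms at 90%, one second at 99%, and ten seconds at 99.9%: each step towards full utilisation multiplies the wait.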

In my previous post I presented the multi-server queue as a model for
multi-server architectures. As discussed at the end of that post, the
model has two main drawbacks: each server must have the same
service rate, and there’s no mechanism for modelling the overhead of
routing between servers. Modelling a multi-server system using a single
queue - even a queue with multiple servers - ignores important real-world
system characteristics. In this post, I’ll explain how we can arrange
queues into networks that capture the cost of routing and allow for
servers with different service rates.
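As a preview, here’s a hedged sketch of the kind of network this post builds towards: a router queue sitting in front of servers with different service rates. Treating each node as an independent M/M/1 queue (a Jackson-network-style simplification; all rates and routing weights below are hypothetical) lets us price in both the routing overhead and the heterogeneous servers:

```python
# Two-stage queue network: every customer passes through a router queue,
# then is dispatched to one of several server queues with probability p.
# Each node is approximated as an independent M/M/1 queue.

def mm1_residence(lam: float, mu: float) -> float:
    """Mean residence time (wait + service) for an M/M/1 node."""
    if lam >= mu:
        raise ValueError("unstable node: lam >= mu")
    return 1.0 / (mu - lam)

lam = 80.0                       # total arrival rate (customers/second)
mu_router = 500.0                # routing is fast, but not free
server_mus = [60.0, 40.0, 30.0]  # heterogeneous service rates
weights = [0.5, 0.3, 0.2]        # routing probabilities

w_router = mm1_residence(lam, mu_router)
w_servers = sum(p * mm1_residence(p * lam, mu)
                for p, mu in zip(weights, server_mus))
print(f"router: {w_router * 1000:.1f} ms, "
      f"servers (expected): {w_servers * 1000:.1f} ms, "
      f"end-to-end: {(w_router + w_servers) * 1000:.1f} ms")
```

Note how the router contributes its own few milliseconds of residence time - exactly the routing cost that the single-queue model ignores.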

A few questions seem to come up again and again from people who’ve been
reading my posts on queueing theory. Perhaps the most common is: “How do I
model multi-server applications using queues?”. This is an excellent
question, since most of us run production systems with more than one
server, be that multiple collaborating services or just a simple
load-balanced service with a few servers sharing the same incoming queue
of customers.

After I showed my previous post around at work, a colleague responded
with this article, in which the author compares the performance of
a Java EE application running on Windows and on Linux. When running on
Linux, the application exhibits the performance characteristics outlined
in my post: at high utilisation, latency grows uncontrollably. What might
be surprising, however, is that on Windows latency doesn’t change much at
all, even at high utilisation. Does this mean the results we saw for
queues are wrong? Not quite! Whereas the Linux results show increased
latency at high utilisation, the Windows results show an increased error
count: at high utilisation, Windows is simply dropping connections and
kicking waiting customers out of the queue - in other words, it behaves
like a bounded queue.
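We can reproduce both behaviours with a toy simulation: an unbounded queue whose latency explodes at high utilisation, and a bounded queue that keeps latency capped by dropping arrivals. This is only an illustrative sketch of the mechanism, not a model of either operating system’s accept queue:

```python
# Toy single-server FIFO simulation. With capacity=None the queue is
# unbounded and latency grows at high utilisation; with a capacity,
# arrivals that would exceed it are dropped and latency stays capped.
import random

def simulate(lam, mu, capacity=None, n=100_000, seed=42):
    rng = random.Random(seed)
    t = 0.0        # arrival clock
    free_at = 0.0  # time at which the server next becomes free
    waits, dropped = [], 0
    for _ in range(n):
        t += rng.expovariate(lam)        # next Poisson arrival
        backlog = max(0.0, free_at - t)  # queued work, in seconds
        # backlog * mu approximates the number of customers waiting
        if capacity is not None and backlog * mu > capacity:
            dropped += 1                 # bounded queue rejects the arrival
            continue
        free_at = max(free_at, t) + rng.expovariate(mu)
        waits.append(free_at - t)        # sojourn time: wait + service
    return sum(waits) / len(waits), dropped

for cap in (None, 10):
    latency, drops = simulate(lam=95.0, mu=100.0, capacity=cap)
    print(f"capacity={cap}: mean latency={latency * 1000:.0f} ms, "
          f"dropped={drops}")
```

At 95% utilisation the unbounded run averages close to the theoretical 200 ms, while the bounded run keeps latency capped but sheds a noticeable fraction of arrivals - latency traded for errors, just as in the article.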

At Skipjaq, we are interested in how applications perform as they
approach their maximum sustainable load. We don’t want to saturate an
application completely so that it falls over, but nor do we want to
under-load it and miss its true performance numbers. In particular, we
are interested in finding the points on the load curve where latencies
are on the precipice of moving outside acceptable limits.
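Under an M/M/1 assumption we can even solve for that precipice directly: mean residence time W = 1/(mu(1 - rho)) stays within a limit w_max exactly when rho <= 1 - 1/(mu * w_max). Here’s a small sketch (the rates and SLA are hypothetical, not Skipjaq measurements):

```python
# Highest utilisation that keeps M/M/1 mean residence time within an SLA:
#   1 / (mu * (1 - rho)) <= w_max  =>  rho <= 1 - 1 / (mu * w_max)

def max_sustainable_utilisation(mu: float, w_max: float) -> float:
    """Largest utilisation keeping mean residence time under w_max."""
    return max(0.0, 1.0 - 1.0 / (mu * w_max))

mu = 100.0     # service rate: 100 requests/second
w_max = 0.050  # acceptable mean latency: 50 ms
rho = max_sustainable_utilisation(mu, w_max)
print(f"stay below {rho:.0%} utilisation ({rho * mu:.0f} requests/second)")
```

For these numbers the precipice sits at 80% utilisation; push past that and mean latency crosses the 50 ms limit.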