This page documents a set of benchmarks demonstrating the scalability
of a Java-based server using nonblocking I/O mechanisms. Here, we
compare the use of thread-based concurrency with nonblocking I/O, using
the Java NBIO library.

Note: These results are somewhat outdated. More recent analysis is provided in our SOSP'01 paper.

Basic server setup:
The server in question accepts socket connections from clients; for each
burst of 1000 8192-byte packets received, it sends out a short 32-byte ack.
The server and clients simply measure the bandwidth of the connection.

There are seven implementations of the server:

Threaded: Forks a thread for each incoming connection.
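The benchmark code itself is not shown here, but the thread-per-connection pattern is easy to sketch with the standard java.net API. The class name ThreadedServer and the single-packet demo client below are my own illustration, not the benchmark code: the server forks a handler thread per accepted connection and acks an 8192-byte packet with 32 bytes.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class ThreadedServer {
    public static void main(String[] args) throws IOException {
        ServerSocket srv = new ServerSocket(0); // any free port

        // One thread per connection: simple to write, but every
        // connection permanently costs one thread.
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket conn = srv.accept();
                    new Thread(() -> handle(conn)).start();
                }
            } catch (IOException e) { /* server socket closed */ }
        });
        acceptor.setDaemon(true);
        acceptor.start();

        // Demo client: send one 8192-byte packet, read the 32-byte ack.
        try (Socket c = new Socket("127.0.0.1", srv.getLocalPort())) {
            c.getOutputStream().write(new byte[8192]);
            byte[] ack = new byte[32];
            new DataInputStream(c.getInputStream()).readFully(ack);
            System.out.println("ack bytes: " + ack.length);
        }
    }

    static void handle(Socket conn) {
        try {
            InputStream in = conn.getInputStream();
            byte[] buf = new byte[8192];
            int got = 0;
            while (got < buf.length) {        // read one full packet
                int n = in.read(buf, got, buf.length - got);
                if (n < 0) return;            // client hung up
                got += n;
            }
            conn.getOutputStream().write(new byte[32]); // 32-byte ack
        } catch (IOException e) { /* connection dropped */ }
    }
}
```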

NBIO: Uses a single thread along with my NBIO nonblocking socket library for Java.
This version of NBIO uses the poll(2) system call to test for incoming events.
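NBIO's own SelectSet API is not reproduced here; as a rough stand-in, this sketch uses the standard java.nio Selector (which was added to Java well after NBIO) to illustrate the same single-threaded, poll-style readiness loop. The class name and the in-process demo client are mine, and the sketch acks after the first chunk of data arrives rather than per 1000-packet burst.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class PollLoop {
    public static void main(String[] args) throws IOException {
        // One thread, one readiness loop, many sockets.
        Selector sel = Selector.open();
        ServerSocketChannel srv = ServerSocketChannel.open();
        srv.configureBlocking(false);
        srv.socket().bind(new InetSocketAddress("127.0.0.1", 0));
        srv.register(sel, SelectionKey.OP_ACCEPT);

        // Blocking demo client standing in for a benchmark client.
        SocketChannel client = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", srv.socket().getLocalPort()));
        client.write(ByteBuffer.wrap(new byte[8192])); // one 8192-byte packet

        boolean acked = false;
        while (!acked) {
            sel.select(); // block until some socket is ready, poll-style
            for (SelectionKey key : sel.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel conn = srv.accept();
                    conn.configureBlocking(false);
                    conn.register(sel, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel conn = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(8192);
                    while (conn.read(buf) > 0) buf.clear(); // drain arrivals
                    conn.write(ByteBuffer.wrap(new byte[32])); // 32-byte ack
                    acked = true;
                }
            }
            sel.selectedKeys().clear();
        }
        ByteBuffer ack = ByteBuffer.allocate(32);
        while (ack.hasRemaining() && client.read(ack) >= 0) { }
        System.out.println("ack bytes: " + ack.position());
    }
}
```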

SelectSource: Uses the poll(2)-based NBIO along with SandStorm's
SelectSource, a "shim" that implements an event queue on top
of NBIO's SelectSet class. In addition, SelectSource
randomizes the order of readiness events passed to the application, so
that I/O is balanced fairly across many sockets.
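The randomization itself can be as simple as a Fisher-Yates shuffle over each batch of readiness events before they are handed to the application. A toy sketch (the int[] stands in for an event batch; this is my illustration, not SandStorm's actual code):

```java
import java.util.Random;

public class ShuffleEvents {
    // Fisher-Yates shuffle: every ordering of the event batch is equally
    // likely, so no socket is consistently serviced first.
    static void shuffle(int[] events, Random rng) {
        for (int i = events.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = events[i];
            events[i] = events[j];
            events[j] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] events = {0, 1, 2, 3, 4, 5, 6, 7}; // pretend socket IDs
        shuffle(events, new Random());
        long sum = 0;
        for (int e : events) sum += e;
        // The shuffle only permutes; every event is still delivered once.
        System.out.println(sum);
    }
}
```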

/dev/poll NBIO: Uses a single thread along with my NBIO nonblocking
socket library for Java. This version of NBIO uses the /dev/poll
mechanism to test for incoming events.

aSocket Threaded: Uses the original threadpool-based version
of Rob von Behren's aSocket library (ninja2.core.io_core.aSocket).
This provides a nice asynchronous socket abstraction (cleaner and easier
to program than raw nonblocking sockets). However, this version is
implemented without NBIO, so it devotes a pool of threads to deal with
the incoming connections.

aSocket NBIO: Uses SandStorm's NBIO-based aSocket library.
This library uses only three threads: one for reading, one for writing, and one for
accepting incoming connections.

/dev/poll aSocket NBIO: Uses SandStorm's NBIO-based aSocket library
with /dev/poll support.
This library uses only three threads: one for reading, one for writing, and one for
accepting incoming connections.

The server and all clients are 4-way 500 MHz Pentium III machines
running Linux 2.2.15 with IBM JDK 1.1.8. All nodes are connected with
Gigabit Ethernet.

This graph shows the aggregate bandwidth measured by the server as the
number of connections grows from 0 to 1000. Some things to note about
this:

First, the NBIO-based servers all sustain good throughput even out to a
large number of connections. When using poll(2), there is some penalty
as the number of
connections increases, which is no doubt due to the overhead of the
Linux poll(2) implementation -- see below.
When using /dev/poll instead, the graph is essentially flat out to 1000
connections!

The threaded servers cannot run beyond 400 and 450 connections,
respectively. This is because each server requires at least one thread per
socket connection, but Linux has a per-user limit of 512 processes (and
IBM "native threads" are actually processes under Linux). In fact, to run
above 256 threads, one needs to run as root and set
ulimit -u unlimited. Even so, we can see that threaded
server performance starts to degrade as the number of connections increases.

The aSocket servers perform worse than the "raw" NBIO and threaded
servers. This is not surprising, since aSocket provides a nice level of
abstraction on top of "raw" sockets. In particular, aSocket is
responsible for allocating a new byte array for each incoming packet and
passing it up to the user; this increases memory pressure as well as
garbage collection overhead. However, it is more general in the sense
that the user need not manage incoming buffer space for packets.
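To make the allocation cost concrete, here is a toy comparison (my own illustration, not aSocket code) of handing the user a fresh byte array per packet versus reading into one reused application buffer. Over a 1000-packet burst, the per-packet style produces about 8 MB of short-lived garbage for the collector to chase.

```java
public class BufferAlloc {
    public static void main(String[] args) {
        // aSocket-style: a fresh 8192-byte array is allocated for every
        // incoming packet, handed to the user, and then becomes garbage.
        long allocated = 0;
        for (int pkt = 0; pkt < 1000; pkt++) {
            byte[] fresh = new byte[8192];
            fresh[0] = (byte) pkt;   // pretend we filled it from the socket
            allocated += fresh.length;
        }

        // Raw-NBIO style: the application reads every packet into one
        // buffer it manages itself, so steady-state allocation is zero.
        byte[] reused = new byte[8192];
        for (int pkt = 0; pkt < 1000; pkt++) {
            reused[0] = (byte) pkt;
        }

        System.out.println("bytes allocated per 1000-packet burst: " + allocated);
    }
}
```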

The NBIO servers exhibit lower performance for a small number of
connections than their threaded counterparts. This is also not
surprising: the overhead of using a small number of threads is low
compared to setting up event-handling loops and using the NBIO
SelectSet mechanism to test for incoming events. However, NBIO
is not optimized for small numbers of connections -- if you only have a
few connections, then you might as well be using threads! Sustaining
high performance for a large number of connections is the goal.

The comparison between SelectSource and raw NBIO is meant to show that
the overhead for event queue randomization is low.

Scaling to 10,000 Connections

This graph shows the aggregate bandwidth measured by the server as the
number of connections grows from 0 to 10,000. Note that only the
/dev/poll-based aSocket server was measured with 10,000
connections, although all of the NBIO-based servers could support the load.

At 10,000 connections the server obtains an aggregate bandwidth of
101 Mbps, which, while lower than the peak bandwidth of 161 Mbps
(at 100 connections), is still very good. The performance of the
poll(2)-based and threaded aSocket servers is shown for
comparison. Note that the x-axis uses a log scale here.