Which I/O Strategy Should I Use?

by Warren Young

There are several different conventions for communicating with
Winsock, and each method has distinct advantages. The question of
the hour is, what are these advantages, and how does someone choose
the convention that makes the most sense for their application? The
choices are:

Blocking sockets - By default, a Winsock call blocks, meaning that
it will not return until it has completed its task or has failed
while trying.

Pure Non-blocking sockets - Calls on non-blocking sockets return
immediately, even if they cannot complete their task immediately.
Although this allows the program to do other things while the
network operations finish, it requires that the program repeatedly
poll to find out when each request has finished.

Asynchronous sockets - These are non-blocking sockets, except that
you don’t have to poll: the stack sends the program a special
window message whenever something "interesting" happens.

select() - The select() function call is a way to block a thread
until something interesting happens on any of a group of sockets.
It is usually used with non-blocking sockets, in order to avoid
polling.

Event objects - Used with WSAEventSelect(), this mechanism is
similar to the select() method, but a bit more efficient. It also
only works on platforms with Winsock, whereas select() works on any
platform with BSD sockets.

Overlapped I/O - One of Winsock 2’s major features is that it ties
sockets into Win32’s unified I/O mechanism. In particular, you can
now use overlapped I/O on sockets, which is intrinsically more
efficient than the above options.
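
To make the difference between the first two options concrete, here is a minimal sketch of one polling attempt on a non-blocking socket. It is written against BSD sockets so that it compiles on POSIX systems; under Winsock you would switch modes with ioctlsocket() and FIONBIO rather than fcntl(), and check WSAGetLastError() for WSAEWOULDBLOCK rather than errno. The names set_nonblocking(), poll_recv() and RECV_WOULD_BLOCK are made up for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical return code: nothing has arrived yet, try again later. */
#define RECV_WOULD_BLOCK (-2)

/* Switch a socket to non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* One polling attempt. Returns the number of bytes read (0 means the
 * peer closed the connection), RECV_WOULD_BLOCK if no data is waiting,
 * or -1 on a real error. */
long poll_recv(int fd, char *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    if (n >= 0)
        return (long)n;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return RECV_WOULD_BLOCK;
    return -1;
}
```

A blocking socket would simply sit inside recv() until data arrived. Here the caller gets RECV_WOULD_BLOCK back right away and has to decide when to call poll_recv() again, which is the polling burden the other strategies try to remove.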

Further confusing the issue are threads, because each of the above
mechanisms changes in nature when used with threads.

In trying to find an answer to the "which I/O strategy" question,
it becomes apparent that there are only a few major kinds of programs,
and the successful ones follow the same patterns. From those patterns
and practical experience — some personal and some borrowed
— I have derived the following set of heuristics. None of these
heuristics are absolute laws, no one isolated heuristic is sufficient,
and the heuristics sometimes conflict. When two heuristics conflict,
you need to decide which is more important to your application and
ignore the other. However, beware of ignoring a heuristic simply
because violating it does not create noticeable consequences for your
program. If you get into the habit of ignoring a certain heuristic,
it becomes useless.

The heuristics are ordered in terms of compatibility, then speed,
and finally functionality. Compatibility is first, because if a
given I/O strategy won’t work on the platforms you need to
support, it doesn’t matter how fast or functional it is. Speed
is next because performance requirements are easy to determine,
and often important. Functionality is last, because once you decide
the compatibility and speed issues, your choices become much more
subjective.

Heuristic 1: Narrow your choices by deciding which operating
systems you need to support.

There are many versions of Windows, but when it comes to the
network stack, you can put most of them into one of two groups:
the Windows 95 derivatives and the Windows NT 4.0 derivatives.
This article treats everything else — Windows NT 3.x, Win16,
Windows CE and non-Windows platforms
— separately.

Your code may also need to be compatible with POSIX-based systems.
This includes Unix, Linux, MacOS X, QNX, and BeOS. Although there
are a few different network and threading APIs used by the various
POSIX-based systems, I’ll only talk about BSD sockets and POSIX
threads in this article.

None of these operating systems have exactly the same set of
networking features. You can exploit this fact to rule out I/O
strategies that not all of your target operating systems support.

                       Win9x    WinCE  WinNT 4+  WinNT 3.x  Win16  Unix
Blocking Sockets       yes      yes    yes       yes        yes    yes
Non-blocking Sockets   yes      yes    yes       yes        yes    yes
Asynchronous Sockets   yes      no     yes       yes        yes    no
Event Objects          yes      no     yes       no         no     no
Overlapped I/O         yes [1]  no     yes       no         no     no [2]
Threads                yes      yes    yes       yes        no     yes [3]

[1] Win9x does not support overlapped I/O in the kernel. Where
overlapped I/O calls work on Win9x, it is because the mechanism
is emulated at the API layer. (This applies to Winsock, file and
serial/parallel port I/O at least.) This means that programs that
only use overlapped I/O functionality guaranteed by the Winsock
spec will run fine on Win9x. If, on the other hand, you stray into
functionality that only WinNT 4+ provides, your application will
fail on Win9x. One example of this is calling ReadFile()
with a socket: this works fine on NT4+, but will fail on Win9x.

[2] If you only need scatter/gather I/O support, BSD sockets
provides this functionality in the readv() and writev()
calls. There is no standard Unix mechanism that provides similar
efficiencies to Win32’s overlapped I/O. Some Unixes provide
the aio_*() family of functions (called asynchronous I/O,
but not related to Winsock’s asynchronous I/O), but this
is not implemented widely at the moment.

[3] Although all current Unixes support POSIX threads, there
are still a lot of older Unix machines out there with broken,
nonstandard or nonexistent threading. You will have to choose a
subset of all the Unixes if you want to use the same threading
code on all Unixes. You’ll definitely be writing different
threading code for Windows, since its threading API is completely
different.
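
As an illustration of the scatter/gather support mentioned in the note on readv() and writev(), the sketch below sends a header and a body from two separate buffers with a single writev() call. This is POSIX code, not Winsock, and send_two_parts() is a name invented for the example.

```c
#include <stddef.h>
#include <sys/socket.h>   /* socket calls used by the caller */
#include <sys/types.h>
#include <sys/uio.h>      /* struct iovec, writev() */

/* Send a header and a body with one writev() call, without first
 * copying them into a single contiguous buffer. Returns the number of
 * bytes written, or -1 on error. A short write is possible; a real
 * program would loop until everything has been sent. */
ssize_t send_two_parts(int fd, const void *hdr, size_t hdr_len,
                       const void *body, size_t body_len)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;   /* iov_base is not const-qualified */
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = body_len;

    return writev(fd, iov, 2);
}
```

A call such as send_two_parts(fd, "HDR:", 4, "body", 4) puts eight bytes on the wire in order. Note that this only gathers buffers; the call still blocks or not according to the socket's mode, so it is not a substitute for overlapped I/O.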

Heuristic 2: Avoid select().

select() is the least efficient way to manage non-blocking
I/O, because there is a lot of overhead associated with the
function. Most of this overhead is a linear function of the number
of connections: double the number of connections, and you double the
processing time.

About the only time you should use select() is for
compatibility reasons: it’s the only non-blocking I/O strategy
that works on all versions of Windows (including CE) and on virtually
all POSIX-based systems. If your program only needs to work on non-CE
versions of Windows, there are better alternatives.
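
For reference, a bare-bones select() wait looks something like the sketch below. It uses the BSD sockets form of the call; under Winsock the first argument to select() is ignored and descriptors are SOCKET values rather than small integers, but the shape is the same. wait_readable() and its parameters are names chosen for this example only.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Block until at least one socket in fds[] is readable or the timeout
 * expires. On return, ready[i] is 1 if fds[i] can be read without
 * blocking, else 0. Returns the number of readable sockets, 0 on
 * timeout, or -1 on error. */
int wait_readable(const int *fds, int count, int timeout_ms, int *ready)
{
    fd_set readset;
    struct timeval tv;
    int maxfd = -1;
    int i, n;

    FD_ZERO(&readset);
    for (i = 0; i < count; i++) {       /* linear pass 1: build the set */
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;
        FD_SET(fds[i], &readset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(maxfd + 1, &readset, NULL, NULL, &tv);
    if (n < 0)
        return -1;

    for (i = 0; i < count; i++)         /* linear pass 2: scan results */
        ready[i] = FD_ISSET(fds[i], &readset) ? 1 : 0;

    return n;
}
```

The two loops are where the per-connection cost shows up on the caller's side: the set has to be rebuilt before every call, because select() overwrites it, and every socket has to be checked afterwards even if only one of them is ready.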

Heuristic 3: Asynchronous sockets work best with low volumes of data.

Asynchronous Winsock I/O (WSAAsyncSelect()) isn’t the
most efficient I/O strategy, but it’s not the least efficient,
either. It’s a fine way to go in a program that deals with low
volumes of data. As the volume of data goes up, the overhead becomes
more significant.

Heuristic 4: For high-performance servers, prefer
overlapped I/O.

Of all the various I/O strategies, overlapped I/O has the highest
performance. (I/O completion ports are even more efficient, but are
nonstandard vis-a-vis Winsock proper, so I don’t cover them in
the FAQ.) With careful use of overlapped I/O (and boatloads of memory
in the server!) you can support tens
of thousands of connections with a single server. No other I/O
strategy comes close to the scalability of overlapped I/O.

Heuristic 5: To support a moderate number of
connections, consider asynchronous sockets and event objects.

If your server only has to support a moderate number of connections
— up to 100 or so — you may not need overlapped
I/O. Overlapped I/O is not easy to program, so if you don’t
need its efficiencies, you can save yourself a lot of trouble by
using a simpler I/O strategy.

Programmed correctly, asynchronous sockets are a reasonable choice
for a dedicated server supporting a moderate number of connections. The
main problem with doing this is that many servers don’t have
a user interface, and thus no message loop. A server without a UI
using asynchronous sockets would have to create an invisible window
solely to support its asynchronous sockets. If your program already
has a user interface, though, asynchronous sockets can be the least
painful way to add a network server feature to it.

Another reasonable choice for handling a moderate number of
connections is event objects. These are very efficient in and of
themselves. The main problem you run into with them is that you
cannot block on more than 64 event objects at a time. To block on
more, you need to create multiple threads, each of which blocks on a
subset of the event objects. Before choosing this method, consider
that handling 1024 sockets requires 16 threads. Any time you have
many more active threads than you have processors in the system,
you start causing serious performance problems.
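
The thread count above is just a ceiling division by the 64-object limit. The sketch below spells it out; MAX_EVENTS_PER_WAIT and threads_needed() are illustrative names (Winsock itself calls the limit WSA_MAXIMUM_WAIT_EVENTS), and the model assumes every thread is filled to the limit.

```c
/* Per-wait limit from the text: one blocking wait covers at most
 * 64 event objects. */
#define MAX_EVENTS_PER_WAIT 64

/* Number of waiting threads needed if each thread blocks on up to
 * MAX_EVENTS_PER_WAIT sockets (ceiling division). */
unsigned threads_needed(unsigned socket_count)
{
    return (socket_count + MAX_EVENTS_PER_WAIT - 1) / MAX_EVENTS_PER_WAIT;
}
```

threads_needed(1024) gives 16, matching the figure above, while 100 sockets already need 2 threads and 65 sockets need 2 as well.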

One caution: it’s very easy to underestimate the number of
simultaneous connections you will get on a public Internet server. It
may make sense to design for massive scalability even if your estimates
don’t currently predict thousands of simultaneous clients.

Heuristic 6: Low-traffic servers can use most any
I/O strategy.

For low-traffic servers, there isn’t much call to be
super-efficient. Perhaps your server just doesn’t see high
traffic, or perhaps it’s running a Windows 95 derivative and
so is limited to 100 sockets at a time by the OS. Suitable
strategies for 1-100 connections are event objects, non-blocking
sockets with select(), asynchronous sockets, and threads with
blocking sockets.

We’ve covered the first three methods already, so let’s
consider threads with blocking sockets. This is often the simplest
way by far to write a server. You just have a main loop that accepts
connections and spins each new connection off to its own thread, where
it’s handled with blocking sockets. Blocking sockets have several
advantages. They are efficient, because when a thread blocks, the
operating system immediately lets other threads run. Also, synchronous
code is more straightforward than equivalent non-synchronous code.

There are two main problems with thread-per-connection
servers. First, threads often require a lot of synchronization work,
which is hard to get right; this may outstrip the simplicity benefits
of using blocking sockets. Second, threads don’t scale well
at all: as the number of threads increases, the operating system
overhead associated with context switches between the threads becomes
significant. This method is only suitable for a fairly small number
of connections, or a greater number of connections that are mostly
idle.
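
The thread-per-connection pattern can be sketched as follows. This version uses BSD sockets and POSIX threads so that it builds on Unix-like systems; a Windows version would use the Win32 threading calls and closesocket() instead. The echo behavior and the names echo_session(), spawn_session() and serve_forever() are assumptions made for the example, standing in for whatever your protocol does.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Per-connection thread: echo bytes back with plain blocking calls
 * until the peer closes the connection or an error occurs. */
static void *echo_session(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char buf[512];
    ssize_t n;

    while ((n = recv(fd, buf, sizeof buf, 0)) > 0) {
        ssize_t sent = 0;
        while (sent < n) {
            ssize_t m = send(fd, buf + sent, (size_t)(n - sent), 0);
            if (m <= 0)
                break;
            sent += m;
        }
        if (sent < n)
            break;                  /* send failed: drop the session */
    }
    close(fd);
    return NULL;
}

/* Hand one accepted connection to its own detached thread. Returns 0
 * on success, or the pthread_create() error code. */
int spawn_session(int conn_fd)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, echo_session,
                            (void *)(intptr_t)conn_fd);
    if (rc == 0)
        pthread_detach(tid);
    return rc;
}

/* Main loop: block in accept(), then spin each connection off. */
void serve_forever(int listen_fd)
{
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            break;
        if (spawn_session(conn) != 0)
            close(conn);
    }
}
```

Notice that echo_session() touches no shared data, which is why it needs no locking. As soon as sessions have to share state, the synchronization work described above comes into play. On older toolchains you may need to compile with -pthread.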

Heuristic 7: Do not block inside a user interface
thread.

This heuristic sounds more like a straightforward rule of
Windows programming, but I bring it up because most programs are
single-threaded. In a single-threaded GUI program, any time you call
a Winsock function that blocks the UI thread, buttons can’t be
pressed, menus won’t pull down, scroll bars won’t move,
keypresses are ignored...your UI freezes.

Heuristic 8: For GUI client programs, prefer
asynchronous sockets.

There are two reasons for this heuristic:

Asynchronous sockets were designed from the start to work
well with GUI programs. You already have a message loop going,
and you already have window management code in the rest of the
program. Adding asynchronous network I/O is about as easy as
adding a dialog to your program.

All of the alternatives require at least one additional
thread to handle the networking in order to satisfy the previous
heuristic. With asynchronous sockets, you can handle both the
network and the UI with a single thread. Since window messages
are handled one at a time in the order they arrive, everything
is automatically synchronized.

Heuristic 9: Threads are rarely helpful in client
programs.

When a programmer first learns about threads, he is eager to
try them out in his own programs. He sees that they have several
advantages, but he doesn’t yet see the drawbacks. Unfortunately
for the soon-to-be-educated newbie, these drawbacks can have very
significant consequences.

One real benefit of threads is that a thread doing I/O on a
blocking socket has a linear control flow, and is therefore easier
to understand. Asynchronous code is more spread out, so it is harder
to write and debug.

Another perceived benefit of threads is a kind of encapsulation:
a programmer can split a program up into a number of threads, each of
which has a single well-defined task. But, this is only valid if each
thread is mostly independent from the rest of the program. If not,
the threads will have to share data through a common data structure,
destroying any potential encapsulation.

In the end, the biggest problem with threads is also related to
shared data structures: synchronization. This issue is covered better
elsewhere, so I won’t spend many words on it here. In short,
synchronization is hard to get right: poorly-synchronized threads
are subject to serialization delays, context switching overhead,
deadlocks, race conditions and corrupted data. These are hard problems,
and for most programs the benefits are not large enough to make them
worth overcoming.

A saner alternative is to use asynchronous I/O. This buys you the
synchronization benefits described in the previous heuristic. You
can even partition the application in a similar manner to threads
by creating an invisible window for each socket. If you have two
different types of sockets, each socket can have its notifications
sent to a different type of window. In straight API terms it means
a separate WndProc() for each type of socket. In terms of
frameworks like MFC, you can put the code for each type of socket in
a different subclass of CWnd.

Heuristic 10: Use threads only when their effect on
the rest of the program is easily contained.

The previous heuristic cautions that threads are often very hard
to program correctly, but the truth is that they are sometimes very
useful. You can make an educated guess about whether threads will
improve the program by doing a bit of design work: is there a clean
interface between each thread and the rest of the program? If so,
synchronization becomes simple. If not, you’re going to end up
with a mess that crashes and destroys data unpredictably.

Examples where threads are viable are:

An FTP server. One way to write an FTP server is to
let the main thread accept the incoming network connections,
and send each one to a separate thread. Then, each thread can
process the incoming FTP commands, send any required replies,
and terminate when the session closes. Because each thread never
has to interact with any other, and they all act alike, this is
an ideal application of threads. (But, keep in mind the previous
server-related heuristics: one thread per client severely limits
your server’s scalability.)

A web browser. When you download a file with a modern
web browser, the file comes down in the background, so that you
can continue browsing. That download stream is most likely handled
by a dedicated thread.

An email program. In an email program, the primary
focus is usually on reading and writing email. However, when
an email message needs to be sent, it is best not to interrupt
the user’s work. You can send that message with a separate
network thread, since the process affects the rest of the program
only minimally.

A stock ticker. Reduced to basics, a stock ticker simply
displays a small amount of continuous real-time data in a pleasing
and useful format. When the amount of network data involved is low,
the thread synchronization overhead becomes negligible. Plus,
this kind of application only has a single data structure that
needs protection; the really big synchronization problems appear
when multiple data structures need to be protected.

Heuristic 11: Design around your protocol.

Some network protocols are inherently synchronous, and others are
not. An example of a synchronous protocol is the POP3 e-mail protocol:
send a user name, get a response, send a password, get a response,
send a request to get the list of emails, get a response... With POP,
you have to send these commands in a specific order: you can’t
send the password before the user name, and you can’t get the
list of emails without sending the user name and password. Writing
a POP client with a non-synchronous socket type would require also
writing a state machine.
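
To give a feel for what that state machine involves, here is a minimal sketch covering only the exchange described above: greeting, user name, password, message list. It tracks protocol state and nothing else; the network I/O, reply parsing and command arguments are left out, and pop_state, pop_step() and the state names are invented for this example.

```c
#include <stddef.h>

/* Where the client is in the POP3 login-and-list exchange. */
typedef enum {
    POP_WAIT_GREETING,    /* connected, waiting for the server banner */
    POP_WAIT_USER_REPLY,  /* sent USER, waiting for the reply */
    POP_WAIT_PASS_REPLY,  /* sent PASS, waiting for the reply */
    POP_WAIT_LIST_REPLY,  /* sent LIST, waiting for the reply */
    POP_DONE,
    POP_FAILED
} pop_state;

/* Advance the machine on one server reply. reply_ok is nonzero for a
 * "+OK" reply and zero for "-ERR". *next_cmd is set to the command to
 * send next, or NULL if there is nothing to send. */
pop_state pop_step(pop_state s, int reply_ok, const char **next_cmd)
{
    *next_cmd = NULL;

    if (s == POP_DONE || s == POP_FAILED)
        return s;                       /* terminal states stay put */
    if (!reply_ok)
        return POP_FAILED;

    switch (s) {
    case POP_WAIT_GREETING:
        *next_cmd = "USER";
        return POP_WAIT_USER_REPLY;
    case POP_WAIT_USER_REPLY:
        *next_cmd = "PASS";
        return POP_WAIT_PASS_REPLY;
    case POP_WAIT_PASS_REPLY:
        *next_cmd = "LIST";
        return POP_WAIT_LIST_REPLY;
    case POP_WAIT_LIST_REPLY:
        return POP_DONE;
    default:
        return POP_FAILED;
    }
}
```

With a non-synchronous socket, each "data arrived" notification would feed one reply into pop_step() and send whatever command comes back. With a blocking socket the same ordering falls out of writing the send and receive calls one after another, which is why a synchronous protocol pairs so naturally with synchronous code.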

On the other hand, if your protocol is non-synchronous, you might
as well use non-synchronous sockets. Non-synchronous protocols tend
to resemble a set of function calls. Consider, for example, a program
to retrieve data from a networked SQL database: send a SQL statement,
and retrieve the result set. At the end of each "function call", the
program is back to its original state: you don’t need to maintain
a state machine to keep track of where you are in the protocol.

This heuristic is almost a restatement of all the above material. It
just bears repeating that, while blocking sockets are attractive for
their simplicity, you may find that their disadvantages eventually
force you to redesign your program to use some form of non-blocking
sockets. This is especially true if your program will be supporting
more than one socket. (Virtually all server programs fall into this
category.) The only reasonable way to use multiple blocking sockets
at once is to use threads, but with non-blocking sockets, you have
many more design options.

Conclusion

It is my hope that you find these heuristics helpful. Although
you may not agree with each of them, I think that they will at least
make you think about your own choices. Design is a highly subjective
enterprise, and this list is based mainly on my own thoughts and
preferences.

Special thanks go to Philippe Jounin for his comments on the 1998
version of this paper. The 2000 version reflects my greater experience,
as well as commentary from David Schwartz and Alun Jones, both of
whom expanded my ideas of the proper way to build a Winsock server.