Multi-threading in .NET: Introduction and suggestions

One of the greatest understatements I've heard was made by Patricia Shanahan,
in a Java newsgroup in 2001: "Multi-threaded programming
needs a little care." Multi-threading is probably one of the least well understood
aspects of programming, and these days almost all application programmers
need to understand it to some extent. This article acts as an introduction
to multi-threading and gives some hints and tips for how to do it safely.
Warning: I'm not an expert on the subject, and when the real experts start
discussing it in detail, my head starts to spin somewhat. However, I've tried
to pay attention to those who know what they're doing, and hopefully the
contents of this article form at least part of a multi-threading "best practice".

This article uses the C# type shorthands throughout - int
for Int32 etc. I hope this makes it easier for C# developers
to read, and won't impede any other developers too much. It also only
talks about the C# ways of declaring variables to be volatile and locking
monitors. Developers using other languages can find the equivalents in their
own preferred environment, I'm sure.

The fact that you're reading this article in the first place means you probably
have at least some idea of what multi-threading is about: it's basically trying
to do more than one thing at a time within a process.

So, what is a thread? A thread (or "thread of execution") is a sort of context
in which code is running. Any one thread follows the program flow from
wherever it is in the code, in the obvious way. Before multi-threading,
effectively there was always one thread running for each process in an
operating system (and in many systems, there was only one process running anyway).
If you think of processes running in parallel in an operating system (e.g. a
browser downloading a file and a word processor allowing you to type, both "at
the same time"), then apply the same kind of thinking within a single process,
that's a reasonable way to visualise threading.

Multi-threading can occur in a "real" sense, in that a multi-processor box may
have more than one processor executing instructions for a particular process at
a time, or it may be effectively "simulated" by multiple threads executing in
sequence: first some code for thread 1 is executed, then some code for thread 2,
then back to thread 1 etc. In this situation, if both thread 1 and thread 2 are
"compute bound" (all they're doing is computation, without waiting for any input
from the network, or file system, or user etc) then that won't actually speed
things up at all - in fact, it'll slow things down as the operating system has to
switch between threads, and the memory cache probably won't be as effective. However,
much of today's computing involves waiting for something to happen, and during that
time the processor can be doing something else. Intel's "Hyper-Threading" technology,
which is available on some of its more recent chips (bearing in mind that this article
was written in early 2004!), is a sort of hybrid between this "real" and "simulated"
threading - for more information, see
Intel's web page on the subject.

.NET has been designed from the start to support multi-threaded operation. There
are two main ways of multi-threading which .NET encourages: starting your own threads
with ThreadStart delegates, and using the ThreadPool class
either directly (using ThreadPool.QueueUserWorkItem) or indirectly using
asynchronous methods (such as Stream.BeginRead, or calling BeginInvoke
on any delegate).
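
For instance, queuing a short-lived job on the thread pool can be as simple as the
sketch below. The DoWork method and the state string are just illustrative names of
my own; the only framework pieces involved are ThreadPool.QueueUserWorkItem and the
WaitCallback delegate it expects:

```csharp
using System;
using System.Threading;

class ThreadPoolDemo
{
    static void Main()
    {
        // Queue a short job; the second argument is passed to the callback as its state
        ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork), "hello");

        // Pool threads are background threads, so give the job a chance
        // to run before the process exits
        Thread.Sleep(1000);
    }

    // This signature is dictated by the WaitCallback delegate
    static void DoWork(object state)
    {
        Console.WriteLine("Work item running with state: {0}", state);
    }
}
```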

In general, you should create a new thread "manually" for long-running tasks, and use
the thread pool only for brief jobs. The thread pool can only run so many jobs at once,
and some framework classes use it internally, so you don't want to clog it up with lots
of tasks which themselves need to block, waiting for other things. On the other hand,
for short-running tasks, particularly those created often, the thread pool is an
excellent choice. The examples in this article mostly use manual thread creation.
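
Here's a sketch of the program described below, reconstructed from the description:
the ThreadJob method name, the loop bounds and the sleep timings all come from the
text, but treat this as a sketch rather than the exact original listing.

```csharp
using System;
using System.Threading;

class ThreadTest
{
    static void Main()
    {
        // Create a delegate pointing at the method the new thread should run
        ThreadStart job = new ThreadStart(ThreadJob);
        Thread thread = new Thread(job);
        thread.Start();

        // Meanwhile, the main thread counts slowly: 0 to 4, once a second
        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine("Main thread: {0}", i);
            Thread.Sleep(1000);
        }
    }

    // The new thread counts quickly: 0 to 9, about twice a second
    static void ThreadJob()
    {
        for (int i = 0; i < 10; i++)
        {
            Console.WriteLine("Other thread: {0}", i);
            Thread.Sleep(500);
        }
    }
}
```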

The code creates a new thread which runs the ThreadJob
method, and starts it. That thread counts from 0 to 9 fairly fast (about
twice a second) while the main thread counts from 0 to 4 fairly slowly
(about once a second). The way they count at different speeds is by each
of them including a call to Thread.Sleep, which just makes
the current thread sleep (do nothing) for the specified period of time.
Between each count in the main thread we sleep for 1000ms, and between
each count in the other thread we sleep for 500ms. Here are the results
from one test run on my machine:

Main thread: 0
Other thread: 0
Other thread: 1
Main thread: 1
Other thread: 2
Other thread: 3
Main thread: 2
Other thread: 4
Other thread: 5
Main thread: 3
Other thread: 6
Other thread: 7
Main thread: 4
Other thread: 8
Other thread: 9

One important thing to note here is that although the above is very
regular, that's by chance. There's nothing to stop the first "Other
thread" line coming first, or the pattern being slightly off -
Thread.Sleep is always going to be somewhat approximate, and
there's no guarantee that the sleeping thread will immediately start
running as soon as the sleep finishes. (It will become able to
run, but another thread may be currently running, and on a single
processor machine that means the thread which has just "woken up" will
have to wait until the thread scheduler decides to give it some processor
time before it next does anything.)

As with all delegates, there's nothing to restrict you to static methods,
or methods within the class that the delegate is used from. You need to
have access to the method, of course, and if you want to specify an
instance method, you have to use a particular instance. Here's another
version of the program above, using an instance method in a different
class. If the Count method had been static, the value of the
job variable would have been new ThreadStart(Counter.Count).
Most examples given in this article use methods within the same
class, but that's just for brevity and simplicity.
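
A sketch of that version, built from the names used above (Counter, Count and the
job variable), might look like this - again reconstructed from the description
rather than being the exact original listing:

```csharp
using System;
using System.Threading;

public class Counter
{
    // An instance method - so creating the delegate requires a Counter instance
    public void Count()
    {
        for (int i = 0; i < 10; i++)
        {
            Console.WriteLine("Other thread: {0}", i);
            Thread.Sleep(500);
        }
    }
}

public class ThreadTest
{
    static void Main()
    {
        Counter counter = new Counter();

        // The delegate captures both the method and the instance to call it on.
        // Had Count been static, this would be new ThreadStart(Counter.Count) instead.
        ThreadStart job = new ThreadStart(counter.Count);
        Thread thread = new Thread(job);
        thread.Start();

        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine("Main thread: {0}", i);
            Thread.Sleep(1000);
        }
    }
}
```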