Multithreading tutorial, part one: Introduction

This is the first of a multi-part series demonstrating multithreading techniques and performance characteristics in VB.Net.

Introduction

Multithreading is treated as a black art by many programmers, and for good reason. Even for an experienced programmer, knowing when, where, and how to multithread an application can be difficult. Multithreading is definitely not an exact science; there are no hard and fast rules to guide the programmer. In preparation for writing this series, I contacted Rick Brewster at Microsoft (who worked on the excellent Paint.net software) for advice. He quoted Rico Mariani at Microsoft: "measure, measure, measure." This is sound advice. This series of articles is not going to provide any pat advice, but it will offer some suggestions, sample code, and demonstrations of various multithreading techniques, and it will show some performance characteristics, so that you can get an idea of your multithreading options.

Where We Were, Where We Are

The idea behind multithreading is not new; it goes back decades. However, only very recently, with the advent of HyperThreading processors and then multi-core processors, did multithreading become as critical to understand when writing applications as it suddenly is. Before these advances in hardware, unless you were writing applications for "big iron" such as expensive SMP or parallel processing servers, chances were that the system you were coding for would not have more than one physical or logical CPU. The Windows world has been almost entirely single processor until recently, while only the high end Macintosh machines had multiple processors. The game has completely changed, particularly with Intel introducing the Core 2 CPUs (codenamed "Conroe"). With this line of CPUs and the guaranteed-to-follow AMD counterpunch, along with the previous generations of multi-core CPUs, even low end desktop PCs are coming out with multiple logical CPUs.

On a single core, single CPU computer, all multithreading is performed by the operating system with a trick called "time slicing." As a basic idea, time slicing multiplexes requests to the CPU, interleaving them and giving a bigger portion of the pipeline to higher priority processes. While the technique works, it is quite inefficient: the operating system must keep track of which thread each request belongs to, and every switch between threads carries overhead. With the new systems entering (and already on) the market, the multithreading game has changed completely, allowing the operating system to dynamically allocate different threads to different logical CPUs based upon the threads' needs. For example, one long-running thread can devour all of one core's resources, while the other core handles short-lived, simple-to-process threads such as user input and mouse clicks. As a result, applications that make maximum use of the new architecture will be more responsive, run more smoothly, and provide a better experience to the user.

As an interesting sideline to the new architectural changes, while the number of cores per physical CPU is increasing (Intel and AMD are currently producing dual core systems, and quad core systems are on their way!), the clock speed of the individual cores has remained stagnant and has even gone down a bit in some cases. In other words, while the new CPUs can juggle multiple tasks better than before, single tasks pay a price. This makes it more crucial than ever before that applications be written to use multiple threads in the right places in the right way. Most desktop applications tend not to use multithreading, or if they do, the main processing occurs on one thread, leaving the application's main thread free to provide status updates and process user input such as handling a "Cancel" button. We have all seen what a single threaded application looks like: the window goes grey, Task Manager reports that the process is not responding until it suddenly finishes and updates the screen, we see the spinning hourglass, and we have no way of stopping it from cranking away without killing the whole process. Even worse, these applications frequently dominate the CPU, causing the entire system to run slowly. A properly multithreaded application, particularly one on a multi-core or multi-CPU system, does not inflict itself on the user like this.

To sum it all up, an application that is written to be properly multithreaded will, all else being equal, be preferred by a user over one that runs on a single thread. And programmers who understand multithreading and have demonstrable experience in its techniques will be in increasingly hot demand as the desktop market shifts to the new CPUs.

A Quick Primer to Multithreading Concepts

Multithreading seems to have a language all to itself, and it does! Many of the concepts involved in multithreading simply do not get discussed in traditional procedural or object oriented code, because the problems do not exist there. If they are encountered at all, it is usually when working with a database, in terms of maintaining data integrity. Here is a brief primer on the ideas in the world of multithreading.

Physical CPU: An actual CPU or processor that fits into a socket on the motherboard. A computer with only one core per CPU and no HyperThreading will have the same number of physical and logical CPUs.

Logical CPU: A separate CPU pipeline, as seen by the operating system. A HyperThreaded processor will appear to have two logical CPUs per core, and a multi-core processor will have at least one logical CPU for each core. For example, a computer with two dual-core, HyperThreaded CPUs will appear to have eight logical CPUs (2 physical CPUs x 2 cores per CPU x 2 pipelines per core).

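Atomic Operation: An operation that must run as a single, indivisible unit, so that no other thread can see or change the data involved partway through. In practice this usually means the code is executed by only one thread at a time, typically to maintain data integrity. For example, the following code block will not work correctly if other threads try to update the variable intSharedVariable at the same time, which makes it necessary to turn this block of code into an atomic operation: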
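A minimal sketch of such a block, assuming intSharedVariable is a shared Integer (the module and routine names are hypothetical and only for illustration). UnsafeAddTen reads, changes, and writes the variable in three separate steps, so two threads can interleave and lose an update. AtomicAddTen performs the same update with Interlocked.Add, which the .Net Framework carries out as one indivisible step.

```vbnet
Imports System
Imports System.Threading

Module AtomicOperationSketch
    ' Shared by every thread (the name comes from the text above).
    Public intSharedVariable As Integer = 0

    ' NOT thread safe: another thread can change intSharedVariable
    ' between the read on the first line and the write on the last.
    Public Sub UnsafeAddTen()
        Dim intTemp As Integer = intSharedVariable
        intTemp = intTemp + 10
        intSharedVariable = intTemp
    End Sub

    ' Thread safe: the read, the add, and the write happen as one step.
    Public Sub AtomicAddTen()
        Interlocked.Add(intSharedVariable, 10)
    End Sub

    ' Work done by each thread: 10,000 atomic updates.
    Private Sub Worker()
        Dim i As Integer
        For i = 1 To 10000
            AtomicAddTen()
        Next
    End Sub

    ' Runs four threads and returns the final value of the variable.
    Public Function RunAtomicDemo() As Integer
        intSharedVariable = 0
        Dim threads(3) As Thread
        Dim i As Integer
        For i = 0 To 3
            threads(i) = New Thread(AddressOf Worker)
            threads(i).Start()
        Next
        For i = 0 To 3
            threads(i).Join()
        Next
        Return intSharedVariable
    End Function

    Sub Main()
        Console.WriteLine(RunAtomicDemo())
    End Sub
End Module
```

With the atomic version, four threads making 10,000 updates of 10 each always end at 400000. Swapping UnsafeAddTen into Worker may produce a smaller total on some runs, which is exactly the data integrity problem described above.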

Block: A thread (or process) is "blocked" when it cannot continue its work until another thread finishes what it is doing or releases a resource. Using the example for "Atomic Operation," any thread trying to access intSharedVariable will block until the atomic operation is completed.

Lock: A mechanism for restricting other threads' access to a particular variable or object. The other threads will block until the lock is released. Some locks pertain to both reading and writing (any thread accessing the object will block no matter what), and others are write locks, which allow other threads to read the data but block them when they try to change its value. It is extremely rare to find a read lock, which allows data to be written but blocks read operations.
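In VB.Net, the most common way to take a lock is the SyncLock statement. The following is a minimal sketch, assuming two hypothetical balance variables that must always change together; every name in it is made up for illustration.

```vbnet
Imports System
Imports System.Threading

Module LockSketch
    ' Private object used only as the lock token.
    Private ReadOnly objLock As New Object()
    Private intChecking As Integer
    Private intSavings As Integer

    ' Moves one unit at a time. Both variables change inside the same
    ' lock, so no thread can slip in between the two updates.
    Private Sub TransferWorker()
        Dim i As Integer
        For i = 1 To 250
            SyncLock objLock
                intChecking -= 1
                intSavings += 1
            End SyncLock
        Next
    End Sub

    ' Runs four transfer threads and returns the final savings balance.
    Public Function RunLockDemo() As Integer
        intChecking = 1000
        intSavings = 0
        Dim threads(3) As Thread
        Dim i As Integer
        For i = 0 To 3
            threads(i) = New Thread(AddressOf TransferWorker)
            threads(i).Start()
        Next
        For i = 0 To 3
            threads(i).Join()
        Next
        Return intSavings
    End Function

    Sub Main()
        Console.WriteLine(RunLockDemo())
    End Sub
End Module
```

Four threads each move 250 units, so the function returns 1000. Locking on a private object rather than on a publicly visible one keeps unrelated code from accidentally taking the same lock.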

Mutex: A locking primitive used to ensure that only one thread at a time has access to a resource.
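As a rough illustration, the sketch below uses the System.Threading.Mutex class to guard a shared list. The list, the thread count, and all of the names are assumptions made for the example. Unlike SyncLock, a Mutex can also be given a name and shared between processes, which this sketch does not do.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading

Module MutexSketch
    Private mtxLog As New Mutex()
    Private lstLog As New List(Of String)()

    ' Each thread appends 100 entries. List(Of String) is not thread
    ' safe, so every Add happens while the mutex is held.
    Private Sub Worker(ByVal state As Object)
        Dim intId As Integer = CInt(state)
        Dim i As Integer
        For i = 1 To 100
            mtxLog.WaitOne()
            Try
                lstLog.Add("Thread " & intId.ToString() & " entry " & i.ToString())
            Finally
                ' Always release, even if Add throws.
                mtxLog.ReleaseMutex()
            End Try
        Next
    End Sub

    ' Runs three threads and returns the number of entries written.
    Public Function RunMutexDemo() As Integer
        lstLog.Clear()
        Dim threads(2) As Thread
        Dim i As Integer
        For i = 0 To 2
            threads(i) = New Thread(AddressOf Worker)
            threads(i).Start(i)
        Next
        For i = 0 To 2
            threads(i).Join()
        Next
        Return lstLog.Count
    End Function

    Sub Main()
        Console.WriteLine(RunMutexDemo())
    End Sub
End Module
```

Three threads writing 100 entries each leave 300 entries in the list. The Try/Finally block matters: a thread that fails while holding the mutex would otherwise leave every other thread blocked.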

Semaphore: A counter that tracks how many slots of a limited resource are available at any given time. Each thread that acquires the semaphore takes one slot; when no slots remain, further calls to acquire the semaphore block until another thread releases it. Semaphores are used to limit the number of threads performing work or accessing a resource, such as in a thread pool.
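A small sketch of the idea, using the System.Threading.Semaphore class with two slots and six threads. The slot count, the 50 millisecond pause that stands in for real work, and all names are assumptions for illustration.

```vbnet
Imports System
Imports System.Threading

Module SemaphoreSketch
    ' Two slots available at the start, two slots at most.
    Private semSlots As New Semaphore(2, 2)
    Private ReadOnly objStats As New Object()
    Private intActive As Integer
    Private intPeak As Integer

    Private Sub Worker()
        semSlots.WaitOne()              ' blocks while both slots are taken
        Try
            SyncLock objStats
                intActive += 1
                If intActive > intPeak Then intPeak = intActive
            End SyncLock
            Thread.Sleep(50)            ' stand-in for real work
            SyncLock objStats
                intActive -= 1
            End SyncLock
        Finally
            semSlots.Release()          ' hand the slot to a waiting thread
        End Try
    End Sub

    ' Runs six threads and returns the highest number seen working at once.
    Public Function RunSemaphoreDemo() As Integer
        intActive = 0
        intPeak = 0
        Dim threads(5) As Thread
        Dim i As Integer
        For i = 0 To 5
            threads(i) = New Thread(AddressOf Worker)
            threads(i).Start()
        Next
        For i = 0 To 5
            threads(i).Join()
        Next
        Return intPeak
    End Function

    Sub Main()
        Console.WriteLine("Peak concurrent workers: " & RunSemaphoreDemo().ToString())
    End Sub
End Module
```

However the six threads are scheduled, the peak never exceeds two, because a third thread cannot get past WaitOne until one of the first two calls Release.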

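Monitor: A method of controlling and synchronizing access to objects. In the .Net Framework, the System.Threading.Monitor class grants a lock on an object to one thread at a time, and VB.Net's SyncLock statement is built on top of it.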
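One thing the Monitor class offers that a plain lock statement does not is the ability to give up after a timeout instead of blocking forever. The sketch below shows that with Monitor.TryEnter; the 100 millisecond timeout and all names are assumptions for the example.

```vbnet
Imports System
Imports System.Threading

Module MonitorSketch
    Private ReadOnly objLock As New Object()
    Private blnWorkerGotLock As Boolean

    ' Tries to take the lock, waiting at most intTimeoutMs milliseconds.
    ' Returns True if the protected work ran, False if the wait timed out.
    Public Function TryUpdate(ByVal intTimeoutMs As Integer) As Boolean
        If Monitor.TryEnter(objLock, intTimeoutMs) Then
            Try
                ' ... protected work would go here ...
                Return True
            Finally
                Monitor.Exit(objLock)
            End Try
        End If
        Return False
    End Function

    Private Sub Worker()
        blnWorkerGotLock = TryUpdate(100)
    End Sub

    ' Holds the lock while a second thread tries to take it, then tries
    ' again from this thread once the lock is free.
    Public Function RunMonitorDemo() As String
        Dim thrWorker As New Thread(AddressOf Worker)
        Monitor.Enter(objLock)
        Try
            thrWorker.Start()
            thrWorker.Join()            ' the worker times out while we hold the lock
        Finally
            Monitor.Exit(objLock)
        End Try
        Dim blnMainGotLock As Boolean = TryUpdate(100)
        Return blnWorkerGotLock.ToString() & "," & blnMainGotLock.ToString()
    End Function

    Sub Main()
        Console.WriteLine(RunMonitorDemo())
    End Sub
End Module
```

The demo returns "False,True": the worker gives up because the lock is held for its entire wait, and the later attempt succeeds because the lock has been released by then.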

Thread Pool: The .Net thread pool manages a set of worker threads on behalf of the applications that use it. Any number of work items may be queued to run on the thread pool, but the .Net Framework controls how many are active at any given moment.
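As a hedged sketch of what using the thread pool looks like, the code below queues ten small work items with ThreadPool.QueueUserWorkItem and waits for all of them to finish. The pending counter and the ManualResetEvent used for waiting are one possible approach chosen for this example, and all names are hypothetical.

```vbnet
Imports System
Imports System.Threading

Module ThreadPoolSketch
    Private intPending As Integer
    Private intTotal As Integer
    Private evtDone As New ManualResetEvent(False)

    ' Runs on whichever pool thread the framework assigns.
    Private Sub Work(ByVal state As Object)
        Interlocked.Add(intTotal, CInt(state))
        ' The last work item to finish signals the waiting thread.
        If Interlocked.Decrement(intPending) = 0 Then evtDone.Set()
    End Sub

    ' Queues the numbers 1 through 10 and returns their sum.
    Public Function RunThreadPoolDemo() As Integer
        intTotal = 0
        intPending = 10
        evtDone.Reset()
        Dim i As Integer
        For i = 1 To 10
            ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf Work), i)
        Next
        evtDone.WaitOne()               ' block until every item has run
        Return intTotal
    End Function

    Sub Main()
        Console.WriteLine(RunThreadPoolDemo())
    End Sub
End Module
```

The function returns 55. Note that the code never creates a thread itself; it only hands work items to the pool, and the framework decides how many of them run at the same time.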

A Brief Aside Regarding Virtual Machines

Because of the increasing prevalence of virtual machines (both the host/guest type and the hypervisor variety), it is unwise to rely upon a count of the number of physical processors in a machine. Low level functions that query the hardware directly are unreliable at best. Instead, you should count on the OS and/or the .Net Framework to determine the number of processors that are apparent to the OS. The best way to do this within .Net is a call to System.Environment.ProcessorCount, which returns the number of logical processors visible to the operating system. That is an important distinction, as a hypervisor or virtual machine system may make only one logical or physical processor accessible to the operating system running your application, while more logical or physical CPUs actually exist. As an example, if you have two dual core CPUs installed and the hypervisor only makes one core available to the OS running your application, but you base your thread usage on the number of logical cores found at a low level (which would be four, not the one usable by your application and OS), you will be using four times as many threads as you meant to use.
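A minimal sketch of reading that value. The helper name GetWorkerCount and the choice to use one worker thread per logical processor are assumptions for illustration, not a rule.

```vbnet
Imports System

Module ProcessorCountSketch
    ' Suggests one worker thread per logical processor visible to the OS,
    ' and never fewer than one.
    Public Function GetWorkerCount() As Integer
        Return Math.Max(1, Environment.ProcessorCount)
    End Function

    Sub Main()
        Console.WriteLine("Logical processors visible to the OS: " & _
                          Environment.ProcessorCount.ToString())
        Console.WriteLine("Worker threads to use: " & GetWorkerCount().ToString())
    End Sub
End Module
```

Inside a virtual machine that has been given a single core, this reports 1 no matter how many cores the physical host has, which is the number your application should plan around.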

The Challenges

Writing applications that use multithreading is not very difficult, particularly if the application is of the type where a single processing thread runs separately from the main thread, to allow for progress bars and cancel buttons. In that case, the programmer has not strayed very far from having a single threaded application running on a multithreaded operating system. Other systems process data that is in an N-dimensional format and does not require any atomic operations, so dividing the workload by logical CPU and processing the pieces simultaneously is not too difficult; many multimedia applications are written in this manner. For example, a graphics program might divide an image into quarters on a system with four logical CPUs, and use four threads to simultaneously apply a filter to each quarter. Other applications like database or application servers will create and queue a thread for each operation, and perform them as possible in the order they were entered, carefully locking key data. Yet other applications will perform as much processing as possible while network or file I/O runs on a separate thread, splitting the two as early as possible and joining them as late as possible to give the I/O the most possible time to finish on its own. And yet other applications need to perform a good number of atomic operations, and must weigh the cost of each context switch and blocking against the benefit of each additional thread.
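The image filter case can be sketched roughly as follows. The "image" here is just a one-dimensional array of Integer values between 0 and 255, and the "filter" inverts each value; both are stand-ins, and every name is hypothetical. Each thread works on its own slice of the array, so no locks or atomic operations are needed.

```vbnet
Imports System
Imports System.Threading

Module PartitionSketch
    ' Holds one slice of the array and the work to perform on it.
    Private Class SliceWorker
        Private ReadOnly m_data() As Integer
        Private ReadOnly m_first As Integer
        Private ReadOnly m_last As Integer

        Public Sub New(ByVal data() As Integer, ByVal first As Integer, ByVal last As Integer)
            m_data = data
            m_first = first
            m_last = last
        End Sub

        ' Inverts every value in this slice only.
        Public Sub Run()
            Dim i As Integer
            For i = m_first To m_last
                m_data(i) = 255 - m_data(i)
            Next
        End Sub
    End Class

    ' Splits the array into one slice per logical CPU and filters the
    ' slices at the same time.
    Public Sub InvertInParallel(ByVal data() As Integer)
        Dim intThreads As Integer = Math.Max(1, Math.Min(Environment.ProcessorCount, data.Length))
        Dim intChunk As Integer = data.Length \ intThreads
        Dim threads(intThreads - 1) As Thread
        Dim i As Integer
        For i = 0 To intThreads - 1
            Dim intFirst As Integer = i * intChunk
            Dim intLast As Integer = intFirst + intChunk - 1
            ' The last slice also takes any leftover elements.
            If i = intThreads - 1 Then intLast = data.Length - 1
            Dim worker As New SliceWorker(data, intFirst, intLast)
            threads(i) = New Thread(AddressOf worker.Run)
            threads(i).Start()
        Next
        For i = 0 To intThreads - 1
            threads(i).Join()
        Next
    End Sub

    Sub Main()
        Dim data(9) As Integer
        Dim i As Integer
        For i = 0 To 9
            data(i) = i
        Next
        InvertInParallel(data)
        Console.WriteLine(data(0).ToString() & " " & data(9).ToString())
    End Sub
End Module
```

For the ten values 0 through 9, Main prints "255 246" regardless of how many logical CPUs the machine reports. Whether the split is actually faster than a single loop depends on the size of the data and the cost of the filter, which is where measuring comes in.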

By far, the biggest difficulty with writing multithreaded applications is maintaining data integrity. It is tempting to lock as much code as possible within atomic operations, or to tie objects up with mutexes, and just hope for the best. While that may preserve data integrity, the cost of the atomic operations and locking is so high that your application will probably run significantly slower than a single threaded application!

After the data integrity issue has been solved, performance is the next concern. A profiler such as the one included with Visual Studio 2005 is invaluable for this. By using the profiler on a single threaded piece of code, you can see where the bottlenecks are, and judge whether or not they are big enough to justify the cost of multithreading them. Once the application is multithreaded, the profiler helps you measure how many threads are best for your application.
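When a full profiler run is more than you need, a rough wall-clock comparison can be made with the System.Diagnostics.Stopwatch class. This is only a sketch and not a substitute for the profiler; the WorkItem delegate, the helper name, and the 50 millisecond pause that stands in for real work are all assumptions.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

Module TimingSketch
    ' Any parameterless routine can be timed through this delegate.
    Public Delegate Sub WorkItem()

    ' Runs the routine once and returns the elapsed time in milliseconds.
    Public Function MeasureMilliseconds(ByVal work As WorkItem) As Long
        Dim swTimer As Stopwatch = Stopwatch.StartNew()
        work.Invoke()
        swTimer.Stop()
        Return swTimer.ElapsedMilliseconds
    End Function

    ' Stand-in for the code being measured.
    Public Sub SimulatedWork()
        Thread.Sleep(50)
    End Sub

    Sub Main()
        Dim lngElapsed As Long = MeasureMilliseconds(AddressOf SimulatedWork)
        Console.WriteLine("Elapsed: " & lngElapsed.ToString() & " ms")
    End Sub
End Module
```

Timing the same routine with one thread, then two, then four, and repeating each measurement several times, gives a quick first impression of whether extra threads are paying for themselves.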

Testing is another challenge. Due to the nature of a multithreaded application, stepping through the application in a debugger is frequently unrealistic. The best way to test these types of applications is to prepare a battery of input/expected output scenarios and test against those. Code review, particularly by peers, can pick up critical details that paper planning and even testing might not catch. Repeated tests are quite important, because depending on your threading technique, the threads may not always be processed in the same order, so a problem may produce different results from one run to the next. Finally, having a wide range of CPU architectures to test on is extremely important if you are relying upon the number of processors to manage a thread pool or the number of concurrent threads.
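A repeated test can be as simple as running one scenario many times and counting how often the result differs from the expected output. The sketch below does that for a hypothetical scenario in which four threads each add 1,000 to a shared counter; the scenario, the run count, and all names are assumptions for illustration.

```vbnet
Imports System
Imports System.Threading

Module RepeatedTestSketch
    Private ReadOnly objLock As New Object()
    Private intCounter As Integer

    Private Sub Worker()
        Dim i As Integer
        For i = 1 To 1000
            SyncLock objLock
                intCounter += 1
            End SyncLock
        Next
    End Sub

    ' One scenario with a known expected output: 4 threads x 1,000 = 4000.
    Private Function RunScenario() As Integer
        intCounter = 0
        Dim threads(3) As Thread
        Dim i As Integer
        For i = 0 To 3
            threads(i) = New Thread(AddressOf Worker)
            threads(i).Start()
        Next
        For i = 0 To 3
            threads(i).Join()
        Next
        Return intCounter
    End Function

    ' Repeats the scenario and counts the runs that gave a wrong answer.
    Public Function CountFailures(ByVal intRuns As Integer, ByVal intExpected As Integer) As Integer
        Dim intFailures As Integer = 0
        Dim i As Integer
        For i = 1 To intRuns
            If RunScenario() <> intExpected Then intFailures += 1
        Next
        Return intFailures
    End Function

    Sub Main()
        Console.WriteLine("Failures: " & CountFailures(50, 4000).ToString())
    End Sub
End Module
```

With the SyncLock in place, every run matches the expected value and the failure count is zero. A single passing run proves very little for threaded code, because a race condition may only show up once in many runs; even a clean batch of repeated runs raises confidence rather than proving correctness.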

Building a Multithreaded Application

The next blog post in this series will walk us through building the outline of an application designed to demonstrate various multithreading techniques and their performance characteristics. Stay tuned!