Design

Introducing Multithreading to Mature Desktop Applications

By Stefan Wörthmüller, June 08, 2011

A crash course tutorial

Today, all programs must parallelize tasks if they are to enjoy the power available through multicore processors. Until recently, though, parallel programming was the domain of server programmers and their scientific counterparts. Now, programmers from other domains are faced with the problem of migrating existing applications from serial to multithreaded architectures.

The good news is that you do not have to migrate your whole application. The migration can be done step by step, and a large portion of the code does not need to be migrated at all: Most programs spend 80 to 90 percent of their runtime in 5 to 10 percent of their code. If your application has 100,000 lines of code, it's probably sufficient to parallelize 5,000 to 10,000 lines to get most of the lift of parallelism. Still, that's a lot of work.

This article shows how to introduce multithreading to mature desktop applications written in C and C++. These applications usually have some attributes that make it more difficult to introduce multithreading: They often consist of old code with many nooks and crannies whose functions are undocumented and unknown, or they are GUI applications with a single main thread. Let's explore how to migrate these applications.

Practical Steps

The core practice of converting existing apps to multiple threads consists of several steps.

Step 1: Profile the code
The first step of parallelizing an application should always be profiling. Determine the parts that benefit the most from parallelization. These are typically hotspots of activity. Everything not near the top of this list should be left to run serially. Intel VTune Amplifier XE is one of the best profilers for this purpose. Its value increases once the threading process begins because of its extensive support for threaded apps.

Step 2: Review the hotspots
After determining the most time-consuming parts of an application, review them carefully. If there is work unrelated to threading that can improve performance, this should be completed and the application should be profiled again.

Step 3: Look for potential conflicts
The most difficult part of introducing multithreading is to identify potential conflicts and resolve them. This is simple within a small loop, but it becomes more difficult when multiple functions are called and global data is used (both of which are common in mature applications). It is a good idea to review all code that will run in multiple threads for problems of parallelization. (Again, Intel tools are uniquely good at this; see Intel Parallel Studio for several tools that can aid in this step.) Also make sure not to call any third-party libraries from multiple threads before verifying their thread safety.

Step 4: Start with small parts
To keep things manageable, it's best to start with small code fragments and introduce multithreading in successive short rounds after sufficient testing (if possible, on several different machines). The performance improvements may not be significant in the first steps, but this approach will make the transition much easier and bring fewer surprises.

Threading APIs

To make changes to the code, you'll need to choose a threading API. In many cases, the API is determined by the platform, but Boost and other multiplatform frameworks, such as Nokia Qt, provide portable threading APIs. On Linux and UNIX systems, Pthreads is the native API, whereas on native Microsoft Windows, it's the extensive Win32 API set. Win32 is wrapped by modern Windows environments, such as MFC and .NET.

Table 1 presents a short overview of the function names for the most-important constructs in these alternatives. The samples shown in this article are Windows-specific, but the syntax is fairly similar for other thread libraries.

Table 1: Function names in three widely used threading APIs.

OpenMP

Of the many useful toolkits available, OpenMP is one that can be particularly helpful in migrating applications written in C/C++ or Fortran. OpenMP for C/C++ is a #pragma-based extension for compilers. It was created for the very purpose of introducing multithreading into existing code. It does so principally by parallelizing existing loops.

OpenMP does nothing at all by default. But it can be used to parallelize individual loops by inserting a #pragma omp parallel for directive before the loop. OpenMP also has many pragmas by which you can define variables as shared or private among all threads, introduce locks, and so on. The real benefit is that much of this work is done by pragmas alone; most of the existing code does not change. If OpenMP is disabled with a compiler switch, the pragmas are simply ignored. All major C++ compilers (including Visual Studio, GCC, and Intel) support OpenMP. It is well-established and introduces no additional costs.

Problems You'll Encounter

Once you've settled on an API and you begin to make changes, you're likely to encounter various practices that are safe in single-threaded apps, but become problematic when more than one thread is used.

Shared Resources and Synchronization

The most common problem results from shared resources such as global or static variables. Consider the following code:

Because thread execution cannot be controlled by the programmer, two threads executing the code can create problems for each other. Thread 1 could be completing the loop, having just set the variable to 1. If Thread 2 hits the first statement immediately after, the variable will be reset to 0 and Thread 1 will continue looping, rather than exiting.

The effects of this interference might be obvious (such as causing the program to crash), but they might also silently corrupt data or program results, which is much worse because the problem might remain unnoticed.

There are a couple of ways to work around this problem:

Eliminate the problem's cause. Decouple things, so that two threads don't share the same variable. This is generally considered the best approach.

Introduce a lock, so that the code shown can be executed only by a single thread at a time. This is safe and won't affect performance excessively, provided the code concerned is not called frequently.

Deadlocks

Managing a lock (that is, an element that blocks more than one thread at a time from executing a given piece of code) has the potential to cause several problems. The most common is a deadlock. This occurs when two threads are each trying to access code locked by the other. As each thread waits for the lock on a section of code to be released, neither thread can advance. This deadly embrace means both threads are effectively hung. Any other thread that wants to access the code or data locked by one of the threads will hang as well.

A common way to work around this problem is to acquire locks in a fixed order throughout the program (for example, always in increasing order of a user-defined lock index). But this does not help if the locking calls are buried in different nested functions. At the beginning of a migration from single- to multithreaded code, deadlocks should be a rather uncommon problem. A good rule is to always use as few locks as possible.
