Parallel Programming in Visual C++ 2010 CTP

The CTP build of Visual C++ 2010 includes a new library to help you write native parallel code. Writing parallel code is becoming more and more important now that quad-core CPUs are broadly available and many-core CPUs are expected in the coming years. This article covers only the new concurrency library for native code. Of course, writing parallel code has been possible for a long time. However, you had to create and manage all threads yourself, which is often a complex task. As a result, even parallelizing a simple loop over multiple threads took quite a bit of time. The new native concurrency library makes this much easier.

This article will explain the parallel_for construct that is part of the native concurrency library in more detail and will briefly touch on a few other constructs. For the example, I will parallelize a trivial implementation of a Mandelbrot fractal renderer.

This code first calculates the width and height of the window to which you will be rendering. It also sets up the position and size of your view on the complex plane. The renderer works line by line in a memory device context, so you set up a memory device context and select into it a bitmap that is as wide as the rendering window and just 1 pixel high. Then, you loop over each line. In each line, you loop over each pixel, and for each pixel you iterate a number of times to calculate its value. When the iteration escapes (that is, the point turns out not to be in the Mandelbrot set), you calculate the value "d" to get a smooth grayscale coloring of the fractal. Once a row has been rendered, it is blitted to the screen using BitBlt so you can see the progress of the rendering.
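As a rough illustration of the per-pixel loop described above, here is a minimal sketch in portable C++ that leaves out all GDI calls. The names View, shade, and render_row, the bailout radius, and the smoothing formula used for "d" are assumptions chosen for illustration. They are not taken from the article's source code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstdint>
#include <vector>

// Hypothetical description of the visible region of the complex plane.
struct View {
    double left;    // real coordinate of the left edge
    double top;     // imaginary coordinate of the top edge
    double width;   // extent along the real axis
    double height;  // extent along the imaginary axis
};

// Returns a gray level (0..255) for the point c.
// Points that do not escape within max_iter iterations are drawn black.
inline std::uint8_t shade(std::complex<double> c, int max_iter)
{
    std::complex<double> z(0.0, 0.0);
    for (int i = 0; i < max_iter; ++i) {
        z = z * z + c;
        const double mag2 = std::norm(z);          // |z|^2
        if (mag2 > 256.0) {                        // escaped (assumed bailout radius 16)
            // Assumed smoothing: fractional iteration count, clamped to [0, max_iter].
            double d = i + 1 - std::log2(0.5 * std::log(mag2));
            if (d < 0.0) d = 0.0;
            if (d > max_iter) d = max_iter;
            return static_cast<std::uint8_t>(255.0 * d / max_iter);
        }
    }
    return 0;
}

// Renders one line of the image: one gray value per pixel of row y.
std::vector<std::uint8_t> render_row(const View& v, int y,
                                     int width_px, int height_px, int max_iter)
{
    assert(width_px > 0 && height_px > 0 && max_iter > 0);
    std::vector<std::uint8_t> row(width_px);
    const double im = v.top - v.height * y / height_px;
    for (int x = 0; x < width_px; ++x) {
        const double re = v.left + v.width * x / width_px;
        row[x] = shade(std::complex<double>(re, im), max_iter);
    }
    return row;
}
```

In the real renderer, each finished row would be copied into the 1-pixel-high bitmap and blitted to the window. Here the row is simply returned so that the calculation can be studied on its own.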

When you run this renderer, it renders line by line from top to bottom. A screenshot of this can be seen below:

Parallelizing the Mandelbrot Renderer

Before you can use the new native concurrency library, you need to include the ppl.h header. The concurrency functions live in the Concurrency namespace, so either add "using namespace Concurrency" or qualify every use of something from the library with Concurrency::.

#include <ppl.h>
using namespace Concurrency;

To parallelize the above Mandelbrot renderer using the new parallel_for construct from the Visual C++ 2010 CTP concurrency library, you basically only need to change the outer for loop, that is, the loop that iterates over all the rows. A first version would look like the following:

Only two things have changed compared to the original code (the changed parts are shown in bold in the listing): the "for" has been replaced with "parallel_for", and the creation of the memory DC and bitmap has moved inside the parallel_for, because every thread that is created needs its own memory DC. The parallel_for construct uses another new C++ feature called lambda expressions, which allow you to create anonymous inline functions. Describing lambda expressions is outside the scope of this article.
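The shape of that change can be sketched without any GDI code. The example below is an illustration under assumptions, not the article's listing: it writes iteration counts into a plain buffer instead of a memory DC, and the names escape_iterations and render, the fixed view, and the HAVE_PPL macro are hypothetical. When ppl.h is not available, it falls back to an ordinary for loop so that it still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(_MSC_VER) && _MSC_VER >= 1600
#include <ppl.h>            // Concurrency::parallel_for
#define HAVE_PPL 1
#else
#define HAVE_PPL 0
#endif

// Number of iterations before z = z*z + c leaves the circle of radius 2
// (max_iter if it never does).
inline int escape_iterations(double re, double im, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int i = 0;
    while (i < max_iter && zr * zr + zi * zi <= 4.0) {
        const double t = zr * zr - zi * zi + re;
        zi = 2.0 * zr * zi + im;
        zr = t;
        ++i;
    }
    return i;
}

// Fills a width*height buffer for the assumed view [-2, 1] x [-1.5, 1.5].
std::vector<int> render(int width, int height, int max_iter)
{
    assert(width > 0 && height > 0);
    std::vector<int> pixels(static_cast<std::size_t>(width) * height);

    // Body of the outer loop: renders one row. Each row writes only its own
    // part of the buffer, so rows can be processed independently.
    auto render_row = [&](int y) {
        const double im = 1.5 - 3.0 * y / height;
        for (int x = 0; x < width; ++x) {
            const double re = -2.0 + 3.0 * x / width;
            pixels[static_cast<std::size_t>(y) * width + x] =
                escape_iterations(re, im, max_iter);
        }
    };

#if HAVE_PPL
    // y runs over [0, height); the library decides how to spread rows over threads.
    Concurrency::parallel_for(0, height, render_row);
#else
    // Fallback without PPL: the same body, executed serially.
    for (int y = 0; y < height; ++y)
        render_row(y);
#endif
    return pixels;
}
```

Because every row writes to a distinct range of the buffer, this version needs no locking. That no longer holds once the rows are drawn to a shared device context.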

Making the Renderer Thread Safe

If you run the above code, you will notice that some lines are missing from the rendering result. This is because you are writing to the same device context from different threads without any synchronization. To fix this, you use a critical section to guard access to the device context so that only one thread can draw on it at a time. The changes look as follows:

Before starting the parallel_for loop, you initialize a critical section. Inside the parallel_for loop, you wrap every use of the device context "dc" in an EnterCriticalSection/LeaveCriticalSection pair to make sure that only one thread accesses that device context at a time.
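The enter/leave pattern can be sketched as follows. This is an illustration under assumptions: Lock, ScopedLock, Surface, and blit_row are hypothetical names, and the Surface buffer only stands in for the window's device context. On Windows the sketch uses the Win32 critical section calls named above. Elsewhere it substitutes std::mutex so that it still compiles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef _WIN32
#include <windows.h>
// Owns a Win32 critical section for its whole lifetime.
class Lock {
public:
    Lock()  { InitializeCriticalSection(&cs_); }
    ~Lock() { DeleteCriticalSection(&cs_); }
    Lock(const Lock&) = delete;
    Lock& operator=(const Lock&) = delete;
    void enter() { EnterCriticalSection(&cs_); }
    void leave() { LeaveCriticalSection(&cs_); }
private:
    CRITICAL_SECTION cs_;
};
#else
#include <mutex>
// Portable stand-in (not the Win32 API) with the same interface.
class Lock {
public:
    void enter() { m_.lock(); }
    void leave() { m_.unlock(); }
private:
    std::mutex m_;
};
#endif

// Enters the lock on construction and leaves it on destruction,
// so the lock is released even if the guarded code throws.
class ScopedLock {
public:
    explicit ScopedLock(Lock& l) : lock_(l) { lock_.enter(); }
    ~ScopedLock() { lock_.leave(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;
private:
    Lock& lock_;
};

// Hypothetical shared drawing target, standing in for the device context "dc".
struct Surface {
    int width;
    std::vector<int> pixels;   // width * height values
    Lock lock;
    Surface(int w, int h)
        : width(w), pixels(static_cast<std::size_t>(w) * h) {}
};

// Copies one finished row to the shared surface. Only the copy is guarded,
// so the expensive per-pixel calculation can still run in parallel.
void blit_row(Surface& s, int y, const std::vector<int>& row)
{
    assert(static_cast<int>(row.size()) == s.width);
    ScopedLock guard(s.lock);
    std::copy(row.begin(), row.end(),
              s.pixels.begin() + static_cast<std::ptrdiff_t>(y) * s.width);
}
```

Keeping the guarded region as small as possible matters here: if the whole row calculation sat inside the lock, the threads would simply take turns and the parallel version would be no faster than the serial one.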

When you execute this code, you will see that it renders in blocks, as the following screenshot shows. Each block is handled by a different thread.

This article comes with one attachment. MandelbrotPar_src.zip contains the above Mandelbrot example. In the toolbar of the application, you will find a button with a P in it. Toggling this button switches between serial and parallel rendering. The title bar of the application shows the time it took to render the image, in milliseconds. Please note that this is a very basic example application. The rendering happens directly in the WM_PAINT handler, meaning the entire fractal is redrawn each time the window needs to be painted.

Other Concurrency Constructs

The native concurrency library also has some other high-level constructs, such as parallel_for_each and parallel_invoke.

parallel_for_each

This construct can replace std::for_each to iterate over a container in parallel. An example could be a vector drawing application. The application might hold a vector of all objects that have to be rendered. When the application needs to render everything to the screen, it simply iterates over the vector and calls the Draw() function on each object. For example:

The above defines a base class for renderable objects and a specific rectangle class as an example. A vector is filled with renderable objects; then, for_each calls the Draw() method on each object, one after another. You can simply parallelize this as follows:
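A minimal sketch of such a class hierarchy and its parallel traversal might look like this. The names Renderable, RectangleShape, and draw_all, the draw counter, and the HAVE_PPL macro are hypothetical. The counter is atomic because Draw() may be called from several threads at once. Without ppl.h, the sketch falls back to std::for_each.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>

#if defined(_MSC_VER) && _MSC_VER >= 1600
#include <ppl.h>            // Concurrency::parallel_for_each
#define HAVE_PPL 1
#else
#define HAVE_PPL 0
#endif

// Base class for everything that can be drawn.
class Renderable {
public:
    virtual ~Renderable() {}
    virtual void Draw() = 0;
};

// Example shape. Instead of drawing, it counts how often Draw() was called.
class RectangleShape : public Renderable {
public:
    explicit RectangleShape(std::atomic<int>& counter) : counter_(counter) {}
    void Draw() override { ++counter_; }   // atomic increment, safe across threads
private:
    std::atomic<int>& counter_;
};

// Calls Draw() on every object, in parallel when PPL is available.
void draw_all(const std::vector<std::unique_ptr<Renderable>>& objects)
{
    auto draw = [](const std::unique_ptr<Renderable>& obj) {
        assert(obj != nullptr);
        obj->Draw();
    };
#if HAVE_PPL
    Concurrency::parallel_for_each(objects.begin(), objects.end(), draw);
#else
    std::for_each(objects.begin(), objects.end(), draw);   // serial fallback
#endif
}
```

The parallel version is only correct if Draw() itself is safe to call concurrently. A Draw() that paints to one shared device context would need the same kind of synchronization as the Mandelbrot renderer.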

parallel_invoke

This example launches two functions in parallel if you have more than one core. The first function prints the numbers 0 to 100 and waits 100 milliseconds between two numbers. The second function prints the numbers 0 to 10 and waits one second between two numbers. When you execute the above application on a system with more than one core, you will see the numbers of the two functions interleave. Note that I didn't include any synchronization, so some numbers might appear on the same line. The parallel_invoke function accepts from 2 up to 10 functions to run in parallel. Internally, parallel_for uses parallel_invoke to split its work.
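The two-function case can be sketched as follows. This is an illustration under assumptions: count_to and run_both are hypothetical names, the delays are left out, and the numbers are collected in two separate vectors rather than printed so that the result can be checked. Without ppl.h, the two functions simply run one after the other.

```cpp
#include <cassert>
#include <vector>

#if defined(_MSC_VER) && _MSC_VER >= 1600
#include <ppl.h>            // Concurrency::parallel_invoke
#define HAVE_PPL 1
#else
#define HAVE_PPL 0
#endif

// Appends the numbers 0..last to out (stands in for printing them).
void count_to(int last, std::vector<int>& out)
{
    assert(last >= 0);
    for (int i = 0; i <= last; ++i)
        out.push_back(i);
}

// Runs both counting functions; each one writes only to its own vector,
// so no synchronization is needed between them.
void run_both(std::vector<int>& a, std::vector<int>& b)
{
    auto first  = [&] { count_to(100, a); };
    auto second = [&] { count_to(10, b); };
#if HAVE_PPL
    // Returns only after both functions have finished.
    Concurrency::parallel_invoke(first, second);
#else
    first();    // serial fallback
    second();
#endif
}
```

After run_both returns, both vectors are complete, whichever function finished first.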

Profiling Parallel Code

Visual Studio 2010 CTP contains a new profiling option to help you in profiling the concurrency of your application. It allows you to check CPU utilization, thread blocking, and core execution. Explaining this profiler is outside the scope of this article. You can find some more information regarding it at http://msdn.microsoft.com/en-us/magazine/cc817396.aspx.

About the Author

Marc Gregoire

Marc graduated from the Catholic University Leuven, Belgium, with a degree in "Burgerlijk ingenieur in de computer wetenschappen" (equivalent to Master of Science in Engineering in Computer Science) in 2003. In 2004 he obtained the degree of Master in Artificial Intelligence, cum laude, at the same university. In 2005 he started working for a big software consultancy company. His main expertise is C/C++, specifically Microsoft VC++ and the MFC framework. Next to C/C++, he also likes C# and uses PHP for creating webpages. Besides his main interest in Windows development, he also has experience in developing C++ programs running 24x7 on Linux platforms and in developing critical 2G and 3G software running on Solaris for big telecom operators.
