This book describes patterns for parallel programming, with code examples, that use the new parallel programming support in Visual C++. This support is commonly referred to as the Parallel Patterns Library (PPL). There is also an example of how to use the Asynchronous Agents Library in conjunction with the PPL…

The CPU meter shows the problem. One core is running at 100 percent, but all the other cores are idle. Your application is CPU-bound, but you are using only a fraction of the computing power of your multicore system. What next?

The Dataflow Network pattern decomposes computation into cooperating asynchronous components that communicate by sending and receiving messages. Buffering the messages allows concurrency. There are a variety of techniques for implementing dataflow networks. The techniques described in this chapter involve the use of in-process messaging blocks and asynchronous agents, which are provided by the Asynchronous Agents library.

There are also samples for Visual Studio 2010 to go with the chapters. You can find all of this on the project downloads page.

Much of the introductory material is similar to the .NET book; this is intentional. The patterns are the same, but how they are implemented using the PPL and the Asynchronous Agents Library is not.

We’d love to hear your feedback on this draft material. If you have time to read the chapters or look at the code, please post comments on the CodePlex site. All feedback is read and taken into account as we shape the material.

6 Responses to “Parallel Programming with Microsoft Visual C++”

I am wondering if you could provide some benchmark statistics for your examples, such as how well the work is divided among threads and cores, which would be an intuitive way of seeing the power of the libraries.

I am also interested in seeing a comparison between the VC++ libraries, TBB, and other open source libraries; not about which is better or worse, but about the features of each.

Just gave the C++ Parallel Patterns Library a whirl on a Core2 660 @ 2.4 GHz: an HMAC task took 78 seconds with PPL. It took 190 seconds serially.

I’m just starting to learn PPL, so I’ll try to give some feedback on the chapters as I get a chance to read them.

Here is some feedback on the introduction:

Figure 1: “Parallel programming patterns” is helpful once you understand the terms. The table on page 17 is even more helpful, and might be better placed earlier. It takes the reader from the characteristics of their algorithm to the pattern and chapter.

—

This paragraph seems muddled to me:
“Another advantage to grouping work into larger and fewer tasks is that such tasks are often more independent of each other than smaller but more numerous tasks. Larger tasks are less likely than smaller tasks to share local variables or fields. Unfortunately, in applications that rely on large mutable object graphs, such as applications that expose a large object model with many public classes, methods, and properties, the opposite may be true. In these cases, the larger the task, the more chance there is for unexpected sharing of data or other side effects.”

The guidance should be to build independent tasks large enough to make the overhead unimportant, I think. Not much point in telling the user that sometimes large tasks have dependencies, and sometimes they don’t.
—

Amdahl’s law can be concisely explained with this equation, assuming linear speedup:

total time = serial time + (parallel time / processors)

As the number of processors goes to infinity, the serial time dominates.

The speedup is the time for one processor divided by the time for N processors (writing s for the serial time and p for the parallel time), so in the limit of infinitely many processors,

speedup = (s + p) / s

which is precisely the inverse of the fraction of time spent in serial code.

So why did my program actually run more than twice as fast with PPL on two cores compared to the straight C++ code? Looking at Task Manager, I could see that both cores were running at about 50% with the straight C++ code. Since I used some ncrypt library calls, ncrypt must have been dividing the execution between the two cores. My guess is that the overhead of this caused the serial program to run just a bit slower than it would have if it ran on one core. Both cores ran at 100% with PPL.

DISCLAIMER

The content of this site is my own personal opinion and does not in any way represent my employer, its subsidiaries, or affiliates. These postings are provided "AS IS" with no warranties, and confer no rights.