
PLINQ: Parallel Queries in .NET

By Donis Marshall, March 30, 2012

LINQ queries execute when you iterate over results, and they execute sequentially. With PLINQ, the iterations are performed in parallel, as tasks are scheduled on threads running in the .NET Framework 4 thread pool.

PLINQ Operators and Methods

You can modify the behavior of a PLINQ query with a variety of clauses and methods, which are actually extension methods on ParallelQuery<TSource> (defined in the ParallelEnumerable class). Most of these are the same clauses and methods available to LINQ, and you can use them either independently or together. PLINQ also adds some constructs of its own, which this section covers.


You create a PLINQ query to parallelize your code. In most circumstances, the next step is to iterate over the results with a foreach or for statement. Because of deferred execution, that is most likely when the query actually runs, and the results are processed in the iterations of the foreach loop. There is only one problem: The foreach loop is sequential. This is a classic "hurry-up-and-wait" scenario. After executing a PLINQ query, you might want to extend parallelism to handle the results in parallel as well.

The Task Parallel Library's Parallel.ForEach method is useful for parallelizing the same operation over a collection of values. It would appear natural to adhere to the same model to process the results of a PLINQ query. PLINQ returns a ParallelQuery<TSource> type, which represents multiple streams of data. However, Parallel.ForEach expects a single stream of data, which it then partitions into multiple streams. For this reason, the multistream output of the query must first be merged into a single stream before Parallel.ForEach can split it up again. There is a performance cost for this conversion.

The solution is the ForAll extension method on ParallelQuery<TSource>. The ForAll method directly accepts multiple streams, so it avoids the overhead of the Parallel.ForEach method. ForAll has two parameters. The first is the target of the extension method, which is a ParallelQuery<TSource> type. The second is an Action<TSource> delegate, for which you can supply a named method, a lambda expression, or an anonymous method. Each element of the query result is passed to the delegate as its parameter.
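As a minimal sketch of the method in use, the program below squares the even numbers in an array and hands each result to ForAll. The signature in the comment is the one declared in System.Linq.ParallelEnumerable; the class name, the SquaresOfEvens helper, and the sample data are illustrative assumptions, not part of the original article.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// ForAll is declared in System.Linq.ParallelEnumerable as:
//   public static void ForAll<TSource>(
//       this ParallelQuery<TSource> source, Action<TSource> action);

public static class ForAllSketch
{
    // Hypothetical helper: squares the even numbers in parallel and
    // collects each result through ForAll.
    public static int[] SquaresOfEvens(int[] source)
    {
        // The delegate runs on several worker threads at once, so the
        // collection it writes to must be thread-safe.
        var results = new ConcurrentBag<int>();

        ParallelQuery<int> query = source.AsParallel()
            .Where(n => n % 2 == 0)
            .Select(n => n * n);

        // No merge back to a single stream happens here.
        query.ForAll(n => results.Add(n));

        // ForAll gives no ordering guarantee, so sort before returning.
        return results.OrderBy(n => n).ToArray();
    }

    public static void Main()
    {
        int[] squares = SquaresOfEvens(Enumerable.Range(1, 10).ToArray());
        Console.WriteLine(string.Join(", ", squares)); // 4, 16, 36, 64, 100
    }
}
```

Because the delegate can run concurrently on several threads, anything it touches has to be safe for concurrent use, which is why the sketch writes to a ConcurrentBag<int> rather than a List<int>.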

Here is a short demonstration that illustrates how to use the ForAll operator. In this example, you will perform a parallel query on a string array and then select and display the strings that are longer than two characters.

Perform a parallel query of a string array

1. Create a console application for C# in Visual Studio. In the Main method, define a string array.

string[] stringArray = { "A", "AB", "ABC", "ABCD" };

2. Perform a PLINQ query on the string array. Select strings with a length greater than two.
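Put together, the steps above might look like the following console program. This is a sketch under stated assumptions rather than the article's original listing: the class name and the LongStrings helper are hypothetical, and the final ForAll call that displays the results is inferred from the description of the demonstration.

```csharp
using System;
using System.Linq;

public static class StringQueryDemo
{
    // Hypothetical helper wrapping step 2: a PLINQ query that keeps
    // the strings with a length greater than two.
    public static ParallelQuery<string> LongStrings(string[] source)
    {
        return from item in source.AsParallel()
               where item.Length > 2
               select item;
    }

    public static void Main()
    {
        // Step 1: define the string array.
        string[] stringArray = { "A", "AB", "ABC", "ABCD" };

        // Step 2: build the parallel query (execution is deferred).
        ParallelQuery<string> result = LongStrings(stringArray);

        // Display the results in parallel. "ABC" and "ABCD" are
        // printed, but the order is not guaranteed.
        result.ForAll(item => Console.WriteLine(item));
    }
}
```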

So far, we have used the AsParallel method to convert LINQ to PLINQ. It is a simple change to a LINQ query that alters the semantics completely.

A PLINQ query is not guaranteed to actually execute in parallel, and when it does, it is not guaranteed to be faster. The overhead of parallel execution, such as thread-related costs, synchronization, and the parallelization code itself, can exceed the performance gain. Determining the relative performance benefit of the PLINQ query is an inexact science based on several factors. Here are some of the considerations that might affect the performance of a PLINQ query:

Length of operations

Number of processor cores

Result type

Merge options

One of the biggest factors is the duration of the parallel operations, such as the Select clause. Dependencies and the synchronization that results from them adversely affect the performance of any parallel solution. Furthermore, shorter operations might not be worth parallelizing, because the associated overhead might exceed the duration of the operation. For small operations, you could change the chunking so that each task does more work per unit of overhead. Custom partitioners, including those that change the chunk size, are an option.
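One way to change the chunk size can be sketched with the range partitioner in System.Collections.Concurrent. Partitioner.Create(fromInclusive, toExclusive, rangeSize) hands PLINQ index ranges instead of single elements, so each delegate call processes a whole chunk. The SumOfSquares helper, the sample data, and the range size of 100 are assumptions chosen for illustration, not values from the article.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public static class ChunkingSketch
{
    // Hypothetical helper: sums the squares of the array, processing
    // rangeSize elements per delegate call. The array must not be
    // empty, because Partitioner.Create rejects an empty range.
    public static long SumOfSquares(int[] data, int rangeSize)
    {
        // Each partition element is a Tuple holding [Item1, Item2).
        OrderablePartitioner<Tuple<int, int>> ranges =
            Partitioner.Create(0, data.Length, rangeSize);

        return ranges.AsParallel()
            .Select(range =>
            {
                // A plain sequential loop inside each chunk keeps the
                // per-element overhead low.
                long subtotal = 0;
                for (int i = range.Item1; i < range.Item2; i++)
                    subtotal += (long)data[i] * data[i];
                return subtotal;
            })
            .Sum();
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(1, 1000).ToArray();
        Console.WriteLine(SumOfSquares(data, 100));
    }
}
```

A suitable range size depends on how expensive each element is to process, so it is worth measuring rather than guessing.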

The number of processor cores might affect the performance of your parallel application, including PLINQ. However, you should typically ignore the number of processor cores, because that's mostly beyond your control. Maintaining hardware independence in your application is important for both scalability and portability.

PLINQ does not consider all of the above factors when deciding to execute a query in parallel. Based on the shape of the query and the clauses used, PLINQ decides to execute a query either in parallel or sequentially. You can override this default by using the WithExecutionMode clause with a value of the ParallelExecutionMode enumeration as a parameter. The two options are ParallelExecutionMode.ForceParallelism and ParallelExecutionMode.Default. Use the ParallelExecutionMode.ForceParallelism value to require parallel execution.

The ParallelExecutionMode.Default value defers to PLINQ for the appropriate decision on the execution mode. Here is an example that forces a parallel PLINQ query.
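A minimal sketch of such a query follows. WithExecutionMode and ParallelExecutionMode.ForceParallelism are the real PLINQ names; the class name, the EvenSquares helper, and the sample data are assumptions made for this illustration.

```csharp
using System;
using System.Linq;

public static class ForceParallelSketch
{
    // Hypothetical helper: squares the even numbers, requiring PLINQ
    // to run the query in parallel instead of deciding for itself.
    public static int[] EvenSquares(int[] source)
    {
        return source.AsParallel()
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .Where(n => n % 2 == 0)
            .Select(n => n * n)
            .ToArray();
    }

    public static void Main()
    {
        int[] squares = EvenSquares(Enumerable.Range(1, 10).ToArray());

        // Without AsOrdered the result order is not guaranteed,
        // so sort before printing.
        Array.Sort(squares);
        Console.WriteLine(string.Join(", ", squares)); // 4, 16, 36, 64, 100
    }
}
```

Forcing parallelism only overrides PLINQ's decision; it does not make a short query faster, so the considerations listed earlier still apply.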

How the result of your query expression is handled can also affect performance. For example, the following PLINQ query returns a List<T> type. Converting the query results to a list requires that all of the results be buffered before the complete list can be returned.

// intArray is assumed to be an int[] defined earlier.
List<int> results = intArray.AsParallel()
    .Where(value => value > 5)
    .ToList();

In this code, the results are buffered because ToList cannot return until every result is available. In other circumstances, PLINQ might also buffer the results, but that is mostly transparent to your code.

Using the .NET Framework 4 thread pool, PLINQ uses multiple threads to execute the query in parallel. The results of these parallel operations are then merged back into the joining thread. The merge option describes the buffering used when merging results from the various threads.

Here are the merge options as defined in the ParallelMergeOptions enumeration:

NotBuffered: The results are not buffered. For operations such as the ForAll operation, NotBuffered is the default.

FullyBuffered: The results are fully buffered, which can delay receipt of the first result.

AutoBuffered: This option is similar to NotBuffered, except that the results are returned in chunks.

Default: The default is AutoBuffered.

You can override the default buffer preference with the WithMergeOptions operator.
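As a rough illustration, the sketch below runs the same query under each explicit merge option and consumes it with a sequential foreach loop, which is where the merge takes place. WithMergeOptions and the ParallelMergeOptions values are the real PLINQ names; the DoubleAll helper and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeOptionsSketch
{
    // Hypothetical helper: doubles each value and collects the merged
    // results on the calling thread.
    public static List<int> DoubleAll(int[] source, ParallelMergeOptions mergeOptions)
    {
        ParallelQuery<int> query = source.AsParallel()
            .WithMergeOptions(mergeOptions)
            .Select(n => n * 2);

        // The foreach loop runs on the consuming thread. The merge
        // option controls how results from the worker threads are
        // buffered before they reach this loop.
        var received = new List<int>();
        foreach (int value in query)
            received.Add(value);
        return received;
    }

    public static void Main()
    {
        int[] source = Enumerable.Range(1, 8).ToArray();
        var options = new[]
        {
            ParallelMergeOptions.NotBuffered,
            ParallelMergeOptions.AutoBuffered,
            ParallelMergeOptions.FullyBuffered
        };

        foreach (ParallelMergeOptions option in options)
        {
            List<int> values = DoubleAll(source, option);
            Console.WriteLine("{0}: {1} results", option, values.Count);
        }
    }
}
```

The three options produce the same set of results. What changes is how soon the first result reaches the loop, which only becomes visible when each element takes noticeable time to compute.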

Using AsSequential

The difference between PLINQ and LINQ starts with the AsParallel clause. As we've seen, converting from LINQ to PLINQ is often as simple as adding the AsParallel method to a LINQ query. Here is a basic LINQ query:

numbers.Select(/* selection */ )
.OrderBy( /* sort */ );

Here is a parallel version of the same query, with the required AsParallel method added.
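The sketch below shows one way the parallel version might look, followed by a variant that uses AsSequential to hand the remainder of the query back to ordinary LINQ. The selection and sort delegates are placeholders in the original, so the squaring and ascending sort used here, along with the helper names and sample data, are assumptions for illustration only.

```csharp
using System;
using System.Linq;

public static class AsSequentialSketch
{
    // Parallel version of the basic query: AsParallel is the only change.
    public static int[] SortedSquares(int[] numbers)
    {
        return numbers.AsParallel()
            .Select(n => n * n)
            .OrderBy(n => n)
            .ToArray();
    }

    // Everything after AsSequential runs as a normal LINQ query on
    // one thread, in the order produced by OrderBy.
    public static int[] SmallestThreeSquares(int[] numbers)
    {
        return numbers.AsParallel()
            .Select(n => n * n)
            .OrderBy(n => n)
            .AsSequential()
            .Take(3)
            .ToArray();
    }

    public static void Main()
    {
        int[] numbers = { 5, 3, 1, 4, 2 };
        Console.WriteLine(string.Join(", ", SortedSquares(numbers)));        // 1, 4, 9, 16, 25
        Console.WriteLine(string.Join(", ", SmallestThreeSquares(numbers))); // 1, 4, 9
    }
}
```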
