Task Parallel and Data Parallel Defined

Definitions of Task and Data Parallelism abound. Essentially, Task
Parallelism means running distinct units of work concurrently. The Data
Parallelism definition is very similar, with some seemingly subtle
differences: Data Parallel code usually operates on a Collection, and
normally the same Action is applied to each member in the collection.

The Data Parallel data structures are built on top of the TPL Task
Parallel data structures. It's a mistake to look at Data Parallel in TPL
without some understanding of the Task Parallel components underpinning it.
It is also difficult to grasp where an application can benefit
from Data Parallelism without seeing how the Data Parallel classes can be
substituted for basic TPL data structures.

Migrating code implemented with basic TPL data structures to
a TPL Data Parallel implementation is the best way to understand where and how
to apply Data Parallel.

Basic TPL Sample

Following the definition above, not all parallel
problems fit the Data Parallel model. As stated earlier, Data Parallel code usually
operates on a Collection and executes the same action on each member in the
collection.
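The original listing did not survive here; below is a minimal sketch of what a one-Task-per-member implementation might look like. The 10,000-element string array and the uppercasing action are hypothetical stand-ins for the article's actual workload.

```csharp
using System;
using System.Threading.Tasks;

public class BasicTplSample
{
    // Allocates one Task per collection member. This compiles and runs,
    // but creates far more Tasks than there are cores to execute them.
    public static string[] Process(string[] data)
    {
        var results = new string[data.Length];
        var tasks = new Task[data.Length];

        for (int i = 0; i < data.Length; i++)
        {
            // The index is passed as Task state rather than captured in a
            // closure; the reason is discussed later in the article.
            tasks[i] = Task.Factory.StartNew(state =>
            {
                int index = (int)state;
                results[index] = data[index].ToUpper();
            }, i);
        }

        Task.WaitAll(tasks);
        return results;
    }
}
```

Called with a 10,000-element array, this allocates 10,000 Task objects even though only a handful can execute at any one time.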

While the sample code above compiles and runs, there
are inefficiencies.

The biggest inefficiency is that the code visits each
member in the array and allocates a separate Task for each member. While
allocating a Task for each member in the collection may seem sensible, consider
that the number of cores on the machine roughly equates to the number of Tasks
that can execute concurrently.

Since the operation is the same on each collection
member, a better approach is to partition the data according to the number of
cores and assign a separate partition to each core. This would reduce the Task
allocation overhead, replacing a Task per member with a Task per partition.

A Partitioning Improvement

One partitioning method would involve creating a
separate array of indexes and, assuming a two-core machine, running two Tasks,
each on a separate index partition. The code would resemble the
sample below.
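The partitioned listing is also missing here; the following sketch keeps the same hypothetical string array and uppercasing action, splitting the index range into two partitions for the assumed two-core machine:

```csharp
using System;
using System.Threading.Tasks;

public class PartitionedSample
{
    // Splits the index range into two partitions (assuming a two-core
    // machine) and runs one Task per partition instead of one per member.
    public static string[] Process(string[] data)
    {
        var results = new string[data.Length];
        int mid = data.Length / 2;

        // Each entry holds an inclusive start and an exclusive end index
        var partitions = new[]
        {
            Tuple.Create(0, mid),
            Tuple.Create(mid, data.Length)
        };

        var tasks = new Task[partitions.Length];
        for (int p = 0; p < partitions.Length; p++)
        {
            tasks[p] = Task.Factory.StartNew(state =>
            {
                var range = (Tuple<int, int>)state;
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    results[i] = data[i].ToUpper();
                }
            }, partitions[p]);
        }

        Task.WaitAll(tasks);
        return results;
    }
}
```

Only two Tasks are allocated regardless of the collection's size; each Task loops over its own slice of the index range.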

Since Partitioning and Task allocation would be
identical whether partitioning a string collection like in the sample above or
some other data structure, a developer could construct some set of classes to
perform the partitioning on any Collection type.

However, it would be easier to just use the Data
Parallel data structures built into TPL and let TPL Data Parallel handle all
the plumbing code.

Moving to TPL Data Parallel

The sample code below applies the Data Parallel data
structures to the same workload as the code above.
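Again the listing itself is lost; a sketch of the same hypothetical workload expressed with TPL's Parallel.For would look something like this:

```csharp
using System.Threading.Tasks;

public class DataParallelSample
{
    // Parallel.For partitions the index range and schedules the
    // underlying Tasks itself; no manual plumbing is required.
    public static string[] Process(string[] data)
    {
        var results = new string[data.Length];

        Parallel.For(0, data.Length, i =>
        {
            // Unlike a captured loop variable, each invocation of this
            // delegate receives its own index value in i.
            results[i] = data[i].ToUpper();
        });

        return results;
    }
}
```

All the partitioning and Task allocation from the previous sample collapses into a single Parallel.For call.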

Aside from being more compact, the sample code above is more
readable than the code samples shown earlier in the article. In fact, the code
is virtually identical to a traditional For loop. Yet while it resembles a For
loop, the code above works nothing like one: its execution behavior
more closely resembles the second, Task-based sample above.

Another major difference between the Data Parallel sample
and the earlier samples is the use of the iterator index value. When using a
lambda, most developers prefer a Closure over passing a value to the
delegate.

Looking at the earlier Task samples, why did they pass the index through an
Action with an object parameter instead of referencing the iterator
value directly? Because the Task executes after the loop has terminated; by
the time the Task executes, the loop iterator will most likely be at its
terminal value. In the samples above the terminal value is 9999.
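That capture behavior can be demonstrated deterministically with plain delegates invoked after the loop finishes. This hypothetical sketch (four iterations rather than 10,000) shows why the earlier samples passed the index as Task state instead of closing over the loop variable:

```csharp
using System;
using System.Linq;

public class ClosureCapture
{
    // Every delegate closes over the same loop variable, so when invoked
    // after the loop terminates they all observe its terminal value.
    public static int[] CapturedVariable()
    {
        var funcs = new Func<int>[4];
        for (int i = 0; i < 4; i++)
        {
            funcs[i] = () => i;
        }
        // Invoked after the loop: every delegate returns 4
        return funcs.Select(f => f()).ToArray();
    }

    // Copying the variable inside the loop body gives each delegate its
    // own value, the same effect that passing Task state achieves.
    public static int[] CapturedCopy()
    {
        var funcs = new Func<int>[4];
        for (int i = 0; i < 4; i++)
        {
            int copy = i;
            funcs[i] = () => copy;
        }
        // Invoked after the loop: returns 0, 1, 2, 3
        return funcs.Select(f => f()).ToArray();
    }
}
```

With Tasks the result is merely nondeterministic rather than guaranteed, which makes the bug harder to spot; Parallel.For sidesteps the problem by handing each delegate invocation its own index value.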

TPL Data Parallel is simply a cleaner way to do Data
Parallelism.

One major Data Parallel caveat: like all parallel code,
concurrency-friendly code comes with an overhead price.

Parallel != Better

Parallel code is not necessarily faster code. There is
overhead in the Data Parallel partitioning and in allocating the necessary TPL
data structures. The code below performs as well as or better than all of the
samples above.
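The sequential listing is not reproduced here either; a plain loop over the same hypothetical workload illustrates the point:

```csharp
public class SequentialSample
{
    // A plain sequential loop: no partitioning, no Task allocation,
    // and no synchronization overhead.
    public static string[] Process(string[] data)
    {
        var results = new string[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            results[i] = data[i].ToUpper();
        }
        return results;
    }
}
```

For a cheap operation like uppercasing 10,000 short strings, the per-element work can easily be smaller than the parallel bookkeeping around it.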

Conclusion

Data Parallelism and Task Parallelism are common forms of Parallel
Computing. The .NET Framework Task Parallel Library (TPL) supports both
forms. TPL Data Parallelism is built on the TPL Task Parallelism
data structures. Though TPL Data Parallel code may often resemble traditional
looping, it is still concurrent code, and care must always be
taken with concurrent code.

About the Author

Jeffrey Juday

Jeff is a software developer specializing in enterprise application integration solutions utilizing BizTalk, SharePoint, WCF, WF, and SQL Server.
Jeff has been developing software with Microsoft tools for more than 15 years in a variety of industries including: military, manufacturing, financial services, management consulting, and computer security.
Jeff is a Microsoft BizTalk MVP.
Jeff spends his spare time with his wife Sherrill and daughter Alexandra.

