Is using Parallel.For and Parallel.ForEach better suited to working with tasks that are ordered or unordered?

My reason for asking is that I recently converted a serial loop to Parallel.ForEach. The loop used a StringBuilder to generate a SQL statement based on various parameters. The resulting SQL came out jumbled (to the point of containing syntax errors) compared with the output of the standard foreach loop, so my gut feeling is that the TPL is not suited to tasks where the data must appear in a particular order.
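To illustrate, here is a stripped-down sketch of the pattern I mean (the values and separators are made up, not my actual SQL code):

```csharp
using System.Text;
using System.Threading.Tasks;

var parameters = new[] { "a", "b", "c" };

// Serial version: appends happen in order and the output is deterministic.
var serial = new StringBuilder();
foreach (var p in parameters)
    serial.Append(p).Append(',');

// Parallel version: StringBuilder is not thread-safe, so concurrent appends
// can land in an unpredictable order or even corrupt the buffer.
var jumbled = new StringBuilder();
Parallel.ForEach(parameters, p => jumbled.Append(p).Append(','));
```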

Question 2.

Does the TPL automatically make use of multicore architectures, or must I provision anything prior to execution?

My reason for asking this relates back to an earlier question I asked about performance profiling of TPL operations. An answer to that question enlightened me to the fact that the TPL is not always more efficient than a standard serial loop: the application may not have access to multiple cores, in which case the overhead of creating additional threads and loops causes a performance decrease compared with a standard serial loop.

If the results must be ordered, then to parallelize the loop you need to be able to do the actual work in any order and then sort the results afterwards. Whether this is more efficient than doing the work serially depends on the situation: if the benefit of parallelizing the order-independent work exceeds the cost of ordering the results, it's a net gain. If the task isn't sufficiently complex, your hardware doesn't allow for much parallelization, or the work doesn't parallelize well (i.e. there is a lot of waiting due to data dependencies), then sorting the results can take more time than you gain from parallelizing the loop (or, worse yet, parallelizing the loop takes longer even without the sort; see question two), and you shouldn't parallelize it.
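A sketch of the work-then-sort approach, where `Work` stands in for whatever expensive, order-independent computation you actually have:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Stand-in for an expensive computation whose items don't depend on each other.
static int Work(int n) => n * n;

var inputs = Enumerable.Range(0, 100).ToArray();
var results = new ConcurrentBag<(int Index, int Value)>();

// Do the work in whatever order the scheduler picks, remembering each
// item's original position...
Parallel.ForEach(inputs, (n, _, index) => results.Add(((int)index, Work(n))));

// ...then restore the original order once all the work is done.
var ordered = results.OrderBy(r => r.Index).Select(r => r.Value).ToArray();
```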

Note that if the actual units of work need to run in a certain order, rather than merely the results needing to come out in a certain order, then you either won't be able to parallelize the loop at all, or won't be able to parallelize it nearly as effectively. And if you don't properly synchronize access to shared resources, you'll actually end up with the wrong result (as happened in your case). To that end, remember that performance optimizations are meaningless if you can't actually produce the correct result.
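Synchronizing access to a shared resource can be as simple as a `lock` around the shared state; this sketch serializes the updates so the result is deterministic:

```csharp
using System.Threading.Tasks;

// A shared total guarded by a lock so parallel updates don't race.
var gate = new object();
long total = 0;

Parallel.For(0, 1000, i =>
{
    lock (gate)   // only one iteration touches the shared variable at a time
    {
        total += i;
    }
});
// total is now deterministic: 0 + 1 + ... + 999 = 499500.
```

Note that a lock this coarse serializes the entire loop body, so in real code you would keep only the shared-state update inside the lock.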

You don't really need to worry much about your hardware with the TPL, and you don't need to explicitly add or restrict tasks. While there are a few ways to do so, virtually any time you do you'll hurt performance, because such restrictions prevent the TPL from doing what it wants. It often knows better than you.

You also touch on another point here: parallelizing a loop will often take longer than not parallelizing it (you just didn't give the likely reasons for that behavior). Often the actual work to be done is very small, so small that creating threads, managing them, dealing with context switches, and synchronizing data as needed costs more than what you gain from doing some of the work in parallel. This is why it's important to do plenty of testing when deciding whether to parallelize some work, to ensure that it actually benefits.
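A simple way to test this is to time both versions on the same data with a `Stopwatch`; for per-item work this trivial, the parallel version frequently loses (sketch only, results depend entirely on your hardware):

```csharp
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var data = Enumerable.Range(0, 1_000_000).ToArray();

var sw = Stopwatch.StartNew();
long serialSum = 0;
foreach (var n in data) serialSum += n;
var serialMs = sw.ElapsedMilliseconds;

sw.Restart();
long parallelSum = 0;
// Interlocked.Add keeps the shared sum correct, but that per-item
// synchronization is exactly the kind of overhead that can erase the gains.
Parallel.ForEach(data, n => Interlocked.Add(ref parallelSum, n));
var parallelMs = sw.ElapsedMilliseconds;

// Compare serialMs and parallelMs on your own machine before committing.
```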

It's not better or worse for unordered lists; your issue in #1 was that you had a shared dependency on a StringBuilder, which is why the parallel query failed. The TPL works wonderfully on independent units of work. Even then, there are simple tricks you can use to force evaluation of a parallel query and keep the original ordering of the results once the parallel operations have all finished.
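One such trick is PLINQ's `AsOrdered`, which does the work in parallel but reassembles the results in source order:

```csharp
using System.Linq;

// AsOrdered tells PLINQ to buffer and emit results in source order,
// even though the Select bodies run on multiple threads.
var squares = Enumerable.Range(0, 10)
    .AsParallel()
    .AsOrdered()
    .Select(n => n * n)
    .ToArray();
// squares is always [0, 1, 4, 9, ...] regardless of execution order.
```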

TPL and PLINQ are technically distinct things; PLINQ uses the TPL to accomplish its goal. That said, PLINQ tries to inspect your architecture and structure the execution of the set as best it can, while the TPL is simply the wrapper around the task architecture. It's going to be up to you to determine whether the overhead of creating a task (which is something like 1 MB of memory) and the context switching needed to execute the tasks is greater than simply running the work serially.

Creating a task only costs "1 MB" of memory if the underlying thread pool decides to create a new thread, which it shouldn't do often (if at all) because, as the name implies, it keeps a pool of threads ready. Creating a new task instance and creating a new thread are very much unrelated things, and you shouldn't shy away from creating a new task.
– JulianR Oct 31 '12 at 1:31

On point 1: when using the TPL you don't know in which order the tasks run. That's the beauty of parallel versus sequential execution. There are ways to control the ordering, but then you probably lose the benefits of parallelism.

On point 2: the TPL makes use of multiple cores out of the box, but there is indeed always overhead in using multiple threads. Load on the scheduler increases, and thread (context) switching is not free. And to keep data in sync and avoid race conditions you'll probably need some locking mechanism, which also adds to the overhead.
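Even the lightest synchronization primitive has a per-iteration cost; e.g. a lock-free `Interlocked` increment is needed just to keep a shared counter correct (sketch):

```csharp
using System.Threading;
using System.Threading.Tasks;

// Without Interlocked, concurrent "counter++" calls would race and lose
// updates; with it, every iteration pays for an atomic operation instead.
int counter = 0;
Parallel.For(0, 10_000, _ => Interlocked.Increment(ref counter));
```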

The TPL makes crafting a fast parallel algorithm a lot easier, but it is still something of an art.