Concurrent Programming - A Primer

An overview of Microsoft's Parallel FX initiative, including the Task Parallel Library and PLINQ.

Introduction

I have always been interested in parallel computing, starting in my teens when I worked on a robotics application in a language called SAIL12 (huh, look at that, it was Bill Gates' favorite language). I even implemented an extension in 6502 assembly to Commodore PET Basic that supported concurrent subroutine execution. It was clear to me by the 1980's that concurrent programming was something we developers would have to get our heads wrapped around, and soon. Apparently, "soon" meant more than 25 years later, as there is finally a dawning in the mainstream (meaning Microsoft, quite frankly) world that concurrent programming is not only the next logical step in software development, but the next necessary step.

This article is essentially a formalized journal of my research into what's going on right now with Parallel LINQ (PLINQ) and Microsoft's Task Parallel Library (TPL). There is obviously a push to utilize the functionality, no pun intended, of LINQ and lambda expressions in C# 3.0 and .NET 3.5 to gain a foothold in concurrent programming. While there is a plethora of information on parallel computing across numerous languages, it is in my interest (and I suspect in this community's interest) to see what Microsoft is doing with regards to C# and .NET (and obviously, VB.NET as well), as this will affect us more directly than other work (past or present). So yes, this article is biased in that it focuses primarily on Microsoft technologies.

I've put together my notes into a hopefully readable article on the topic. There's lots of references at the end of the article for you to dig deeper, and I've quoted liberally from other people, as they, not I, are the experts. I discovered that concurrent programming, the Microsoft "way", involves some digging into not just the TPL but the foundations of LINQ, lambda expressions, and functional programming. This has been revealing in that I feel it has created a more complete picture of the different technologies involved in concurrent programming, how they interact with each other, and their strengths and weaknesses.

As a disclaimer, any errors in understanding these technologies are entirely mine, and please correct me if I've made wrong statements and conclusions.

What is Concurrent Programming?

The answer is obvious: executing multiple tasks (or work units) in parallel. True parallelism doesn't occur on a single-core computer (instead, the CPU switches between tasks), but on systems with multiple CPUs (and multiple cores per CPU), true parallelism is achieved. Nor is parallel computing confined to utilizing the cores of one physical machine. Distributed computing is a form of parallel computing in which work units are distributed across numerous machines. However, distributed computing adds additional requirements to task management (namely task distribution), and is not discussed here.

The essence of concurrent programming involves two things: task management and communication. A task manager is necessary to distribute work units to available threads, and communication involves setting up the initial parameters for a task and obtaining the result of the task's work. It is this last aspect, task communication, that is the most difficult. With incorrect locking mechanisms, a developer can kill any performance gains and, even worse, create subtle bugs: race conditions, where multiple tasks attempt to change the same memory locations simultaneously, and deadlocks, where tasks each wait for the other to complete its work.

The issues of task communication are more broadly categorized as state and memory sharing issues. A typical solution to the synchronization issues involved with shared state and memory is to use locks, monitors, semaphores, and other techniques to block threads from altering state while one single thread makes changes. Synchronization is hard to test, and this is where bugs and performance problems develop. In Erlang (a functional language), synchronization is not even an issue. This is achieved by treating variables as immutable, and by messaging between threads, where the message is a copy of the sender thread's variables.

Another technique is transactional memory. Locking is a pessimistic approach--a lock assumes that the memory will be written to by someone else, who is also going to lock the memory. Transactional memory is optimistic: "...a thread completes modifications to shared memory without regard for what other threads might be doing, recording every read and write that it is performing in a log. Instead of placing the onus on the writer to make sure it does not adversely affect other operations in progress, it is placed on the reader, who, after completing an entire transaction, verifies that other threads have not concurrently made changes to memory that it accessed in the past."10
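Although the TPL contains no transactional memory, the optimistic read-validate-commit pattern described above can be sketched with plain .NET interlocked operations. This is my own illustrative example (OptimisticCounter is a made-up name, not part of any Microsoft library): no lock is taken; we read, compute, and commit only if nobody else wrote in the meantime, which is conceptually what an STM transaction does with its log.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var counter = new OptimisticCounter();
Parallel.For(0, 1000, _ => counter.Add(1));
Console.WriteLine(counter.Value);  // 1000: every conflicting commit was retried

class OptimisticCounter
{
    private int _value;
    public int Value => _value;

    public void Add(int amount)
    {
        while (true)
        {
            int snapshot = _value;            // "read log": remember what we saw
            int proposed = snapshot + amount; // do the work without regard to others
            // Commit only if _value is still what we read; otherwise retry.
            if (Interlocked.CompareExchange(ref _value, proposed, snapshot) == snapshot)
                return;
        }
    }
}
```

Note how the onus is on the committer to detect a conflict and retry, rather than on a writer to exclude everyone else in advance.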

So, fundamentally, concurrent programming involves task management and some form of state management, be it synchronization techniques, messaging with serialization, or transactional memory. There may be other techniques for state management that I'm not aware of. However, as you can see, task management is well, task management. State management, ah, there's the rub!

What is Microsoft Doing?

F#

First off, F# doesn't directly have anything to do with concurrent programming. "F# is a typed functional programming language for the .NET framework"1. However, F# includes the concept of asynchronous workflows, primarily for "writing concurrent and reactive programs that perform asynchronous I/O where you don't want to block threads."2 The TPL will eventually be incorporated into F# "as a key underlying technology for F# asynchronous workflows."4 Also, as I discuss later, functional languages are, by their nature, well suited to concurrent programming, and so it is natural that the F# team is interested in parallel computing.

The Task Parallel Library (TPL)

"The Task Parallel Library (TPL) is designed to make it much easier to write managed code that can automatically use multiple processors. Using the library, you can conveniently express potential parallelism in existing sequential code, where the exposed parallel tasks will be run concurrently on all available processors." 3

What To Do, Not How

I've heard it said: you can tell a man what to do, or you can tell a man how to do it, but you had better not tell a man what to do and how to do it. The TPL, via the vehicle of PLINQ and the TPL API, eliminates the "how to do it" part of your imperative code with regard to performing work in parallel. You express what to do as a work unit, and the TPL figures out how best to do that work. PLINQ takes advantage of the TPL by taking the query iterations and designating them as work units for the TPL to assign to threads (typically one per processor core).

Parallel Task Basics

Let's take CPian livibetter's Mandelbrot Set with Smooth Drawing13 and parallelize it, demonstrating the Parallel.For loop. The Parallel.For loop is the simplest "structured" method in the Parallel class. But, if you download livibetter's code to do this exercise, make sure you:

Convert the solution to a Visual Studio 2008 solution

Change the "Ms" project's target framework to .NET Framework 3.5

Add a reference to System.Threading

You will, of course, also need to download and install the TPL CTP.

First, we have to make the code thread safe by removing side effects. The main computational loop looks like this, and I've added comments to the side-effect code:
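A minimal, self-contained stand-in for that main computational loop (a simplification of mine -- livibetter's actual code does smooth coloring and draws to a bitmap, and the names here are illustrative), with comments marking the side-effect code:

```csharp
const int Width = 64, Height = 64, MaxIterations = 100;
const double XMin = -2.0, YMin = -1.5;
double xStep = 3.0 / Width, yStep = 3.0 / Height;
int[,] counts = new int[Width, Height];

var z = new Complex { Re = XMin };        // SIDE EFFECT: one shared instance for the whole loop
for (int x = 0; x < Width; x++)
{
    z.Im = YMin;
    for (int y = 0; y < Height; y++)
    {
        double zr = 0, zi = 0;
        int i = 0;
        while (i < MaxIterations && zr * zr + zi * zi < 4.0)
        {
            double t = zr * zr - zi * zi + z.Re;   // z' = z^2 + c
            zi = 2 * zr * zi + z.Im;
            zr = t;
            i++;
        }
        counts[x, y] = i;                 // escape-time count for this pixel
        z.Im += yStep;
    }
    z.Re += xStep;                        // SIDE EFFECT: couples iteration x to iteration x - 1
}

class Complex { public double Re, Im; }   // minimal stand-in for the article's Complex type
```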

If we parallelize the outer loop, it's obvious that each work unit needs its own instance of z; otherwise, every work unit would manipulate the same instance, with severe consequences for the work each unit is doing. Second, z.Re can no longer be incremented by xStep: with the loop parallelized over x, z.Re must be calculated from each x rather than simply incremented. The resulting parallelized version looks like this (using a lambda expression as a delegate):
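A sketch of that parallelized version, again simplified and with illustrative names (note that in the CTP the Parallel class lived in the System.Threading namespace, while in later releases it moved to System.Threading.Tasks):

```csharp
using System.Threading.Tasks;

const int Width = 64, Height = 64, MaxIterations = 100;
const double XMin = -2.0, YMin = -1.5;
double xStep = 3.0 / Width, yStep = 3.0 / Height;
int[,] counts = new int[Width, Height];

// The outer x loop becomes a Parallel.For, with the body passed as a lambda.
Parallel.For(0, Width, x =>
{
    // Each work unit gets its OWN z, and z.Re is calculated from x
    // instead of being incremented -- both side effects are gone.
    var z = new Complex { Re = XMin + x * xStep, Im = YMin };
    for (int y = 0; y < Height; y++)
    {
        double zr = 0, zi = 0;
        int i = 0;
        while (i < MaxIterations && zr * zr + zi * zi < 4.0)
        {
            double t = zr * zr - zi * zi + z.Re;
            zi = 2 * zr * zi + z.Im;
            zr = t;
            i++;
        }
        counts[x, y] = i;     // safe: each work unit writes to a distinct column
        z.Im += yStep;
    }
});

class Complex { public double Re, Im; }
```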

The program will now utilize the available cores to generate the Mandelbrot fractal. It definitely runs faster, and you can clearly see the CPU utilization go to 100%.

An interesting question is: why parallelize the outer loop rather than the inner loop? One answer is the overhead added to the TPL's task manager. With the outer loop parallelized, the task manager splits the work into p.Width work units across the cores, and each work unit performs the entire p.Height inner iteration. If you parallelize the inner loop instead, each work unit is much smaller--just the while loop that determines the number of iterations before z escapes--so the overhead of fetching the next work unit is incurred p.Width * p.Height times rather than just p.Width times, not to mention that Parallel.For queues the work units p.Width times, rather than just once as happens when the outer loop is parallelized.

Other Parallel Task Concepts

If you read Optimize Managed Code for Multi-Core Machines3, you'll note it also discusses an Aggregate method. This has either been dropped, or did not make it into the CTP. Currently, another concept supported by the Parallel class is the Parallel.Do method. This method takes, as a parameter, an array of Action instances, and the task manager will execute each Action asynchronously. Similarly, the Parallel.ForEach method takes a single action on an enumerable data source, and potentially processes the action on each item in the data source in parallel.
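A sketch of both methods follows, with the caveat that the CTP's Parallel.Do was later renamed Parallel.Invoke in the released library, and that the ConcurrentBag collection shown here is also from the released framework rather than the CTP:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Parallel.Do took an array of Action instances in the CTP; the released
// library calls it Parallel.Invoke. Each action may run on a different
// thread, and the call returns when all of them have completed.
var log = new ConcurrentBag<string>();
Parallel.Invoke(
    () => log.Add("configuration loaded"),
    () => log.Add("cache warmed"),
    () => log.Add("plugins scanned"));

// Parallel.ForEach applies a single action to each item of an enumerable
// data source, potentially processing the items in parallel.
var words = new[] { "alpha", "beta", "gamma" };
var lengths = new ConcurrentBag<int>();
Parallel.ForEach(words, w => lengths.Add(w.Length));

Console.WriteLine(log.Count);      // 3
Console.WriteLine(lengths.Count);  // 3
```

A thread-safe collection is used for the results precisely because the actions may run concurrently; an ordinary List<T> here would reintroduce the shared-state problem discussed later.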

Another concept is called "a future". "A future, which is a task that computes a result, is constructed not with a normal action, but with an action that returns a result. This result is a delegate with the Func<T> type, where T is the type of the future value. The result of the future is retrieved through the Value property. The Value property calls Wait internally to ensure that the task has completed and the result value has been computed."3
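The same idea survives in released .NET as Task<T>, where StartNew takes the Func<T> and the Result property plays the role of the future's Value, waiting if the computation hasn't finished. A sketch using the released names:

```csharp
using System;
using System.Threading.Tasks;

// A "future": a task constructed from a function that returns a result.
Task<long> future = Task.Factory.StartNew(() =>
{
    long sum = 0;
    for (int i = 1; i <= 1_000_000; i++) sum += i;
    return sum;
});

// Other work can proceed here while the future computes in the background...

// Reading Result (the CTP's Value) waits internally until the task has
// completed and the value has been computed.
long value = future.Result;
Console.WriteLine(value);  // 500000500000
```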

The Task Manager

There really isn't any better way of stating it than Daan Leijen and Judd Hall already have: "All tasks belong to a task manager, which, as the name implies, manages the tasks and oversees worker threads to execute tasks. While there is always a default task manager available, an application can also explicitly create a task manager."3

Exception Handling

With regards to exception handling, tasks that generate exceptions are propagated to the code that invokes the parallel processing of the task. To quote: "...the Parallel.For and Parallel.Do functions accumulate all exceptions thrown and are re-raised when all tasks complete. This ensures that exceptions are never lost and are properly propagated to dependents."3

Parallel FX (PFX)

What exactly is PFX? That's a bit hard to describe. TPL is said to "use the Parallel FX Library"3. I'm unclear as to what that means, and perhaps Microsoft is suffering from a bit of acronym dyslexia itself. So, the only reason I mention it here is that it seems to be the umbrella assembly/library/extension for PLINQ and the TPL.

The Parallel Language Integrated Query (PLINQ)

PLINQ is LINQ where the query is run in parallel. Converting a query from sequential to parallel execution is accomplished very easily, by adding "AsParallel()" to the data source. For example14:
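A sketch in the spirit of that example, using the released PLINQ surface, where AsParallel is an extension method on the data source:

```csharp
using System;
using System.Linq;

int[] numbers = Enumerable.Range(1, 1000).ToArray();

// Sequential LINQ query:
var sumSeq = (from n in numbers
              where n % 2 == 0
              select n * n).Sum();

// The same query run in parallel -- the only change is AsParallel()
// on the data source:
var sumPar = (from n in numbers.AsParallel()
              where n % 2 == 0
              select n * n).Sum();

Console.WriteLine(sumSeq == sumPar);  // True
```

One caveat worth knowing: PLINQ does not preserve source ordering by default; when ordering matters, AsOrdered must be added after AsParallel (a sum, as here, is order-insensitive).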

"LINQ's set-at-a-time programming model for expressing computations places an emphasis on specifying what needs to get done instead of how it is to be done. Without LINQ, the how part would typically otherwise be expressed through the use of loops and intermediary data structures, but by encoding so much specific information, the compiler and runtime cannot parallelize as easily. LINQ's declarative nature, on the other hand, leaves the flexibility for a clever implementation like PLINQ to use parallelization to obtain the same results."14

State and Memory Sharing

Does it strike you that the efforts of TPL, PFX, and PLINQ are focused on parallel task processing whilst completely ignoring state and memory sharing issues? This certainly struck me. It seems to me that the cart is being put in front of the horse. So far, the TPL addresses parallel computing only within the context of loop parallelism where the tasks are essentially autonomous tasks. To really leverage parallel processing for real world tasks, in which tasks need to communicate with each other, it becomes essential that state and memory sharing in a concurrent environment are resolved. One approach to this (taken by Erlang, to mention one language) is to completely eliminate shared state (tasks get copies of objects so they don't have to set locks or deal with race conditions) and eliminate shared memory (copies of objects are passed between tasks rather than references, again eliminating the need for locks). Instead, messages are used to communicate between tasks, discussed next.

Messaging and Message Serialization

In Slava Akhmechet's blog on Erlang-style concurrency15, he takes the reader through the process of redesigning Java with message-passing concurrency in order to eliminate the issues of state and memory sharing, thus eliminating synchronization issues. He does so by eliminating synchronization keywords and by instantiating objects on a heap specific to each thread, so that access to objects on another thread is simply not possible. Of course, threads do have to communicate with each other, and this is accomplished by sending messages. Each thread gets its own message queue, and blocks until there is a message to process. However, since a message sent by thread 1 will be placed in the queue for thread 2, thread 2 would then hold a reference into thread 1's heap, which is a violation of the goal. So instead, the message is serialized. Akhmechet next points out a very interesting thing: because the message is serialized and the send/receive operations implement an interface, we now have the ability to distribute work not only across cores, but across machines. However, this opens another can of worms--determining whether the receiver actually received the message.
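This Erlang-style scheme can be sketched in C#: each "process" owns its own queue, and messages are serialized on send and deserialized on receive, so only copies ever cross the thread boundary. The Mailbox and Job types below are my own illustrative inventions, built on released-framework types (BlockingCollection, System.Text.Json) rather than anything from the TPL:

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.Json;
using System.Threading.Tasks;

var mailbox = new Mailbox<Job>();
var worker = Task.Run(() =>
{
    Job job = mailbox.Receive();        // blocks until a message arrives;
    return job.Payload.ToUpper();       // the copy is private to this thread
});

mailbox.Send(new Job(1, "hello"));      // the sender pays the serialization cost
Console.WriteLine(worker.Result);       // HELLO

record Job(int Id, string Payload);

class Mailbox<T>
{
    private readonly BlockingCollection<string> _queue = new();

    // Send serializes; Receive deserializes a fresh copy, so no reference
    // to the sender's heap ever reaches the receiving thread.
    public void Send(T message) => _queue.Add(JsonSerializer.Serialize(message));

    public T Receive() => JsonSerializer.Deserialize<T>(_queue.Take())!;
}
```

Because the wire format is just a string, the same interface could in principle deliver messages to another machine, which is exactly the observation Akhmechet makes.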

Drawbacks

One significant drawback of this approach is the performance hit you take when serializing an object to be passed between tasks. Since the target task receives a copy, it must be serialized by the sender, and the copy de-serialized by the receiver. I can imagine major performance problems if you have extensive communication between tasks using non-trivial (basically, non-value type) objects.

Software Transactional Memory

Another approach that appears to be on the radar is STM. "In computer science, Software Transactional Memory (STM) is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. It functions as an alternative to lock-based synchronization, and is typically implemented in a lock-free way."10 In the Channel 9 video5, the interviewer asks about "transactional task management", which is referring to software transactional memory. As Anders Hejlsberg points out, this is a Microsoft research project. See Further Reading for links.

Considerations

Task Management

Not all tasks take up 100% of a CPU's cycles; therefore, it would be useful to be able to organize tasks in a manner that gives the developer control over the thread pool size for a particular category of tasks. Or, conversely, the underlying task manager should be capable of assigning additional threads, realizing that the processors are underutilized. With the TPL, an application can create its own task manager. As stated: "...you might want to use multiple task managers, where each has a different concurrency level or each handles a separate set of tasks."3
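In the CTP this is done by constructing your own task manager; the sketch below instead uses ParallelOptions.MaxDegreeOfParallelism from the released library to cap the concurrency level of one particular batch of work. The peak-tracking code is purely illustrative:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Cap this batch of tasks at a concurrency level of 2, regardless of
// how many cores are available.
var throttled = new ParallelOptions { MaxDegreeOfParallelism = 2 };

int peak = 0, current = 0;
object gate = new();

Parallel.For(0, 50, throttled, i =>
{
    int now = Interlocked.Increment(ref current);
    lock (gate) { if (now > peak) peak = now; }   // record the highest concurrency seen
    Thread.Sleep(1);                              // simulate a unit of work
    Interlocked.Decrement(ref current);
});

Console.WriteLine(peak <= 2);  // True: never more than two bodies at once
```

The same option with MaxDegreeOfParallelism = 1 gives the sequential, single-worker execution mode useful for debugging, discussed next.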

Debugging

It's not hard to imagine the complexity in debugging an application with numerous concurrent tasks. As stated: "Another important use of the task manager interface is to run all code sequentially using a single worker thread."3 While this obviously doesn't solve concurrency related bugs, it does provide a mechanism to debug your application in a sequential execution mode.

Also, with a concurrency model that supports messaging, debugging is facilitated by being able to audit the messages.

Side Effect Free

Anders Hejlsberg makes the following interesting statement regarding PLINQ: "...we can be much smarter about how to run these queries. We can make, maybe, some simple assumptions about whatever functions you call in here are pure and side effect free..."5 The key phrase here is "side effect free". This is the crux of concurrent programming: tasks must be side effect free. One mechanism to achieve this is, of course, the stateless, no shared memory paradigm used by Erlang and others. However, I would be hard pressed to say that making these assumptions is anything but "simple".

To reiterate, as Joe Duffy states: "If you can express your program side-effect free in a mostly functional manner using PLINQ..."5 There is a lot of discussion that TPL makes it easy to write concurrent applications; however, the hard work, making your tasks side-effect free, is apparently being largely ignored.

Confession time: "This library doesn't change the way you synchronize access to data or the way you transact it...but when it comes to access to shared resources or shared data, you still have locks, monitors, or whatever." (Anders Hejlsberg)5 Parallel task management is the simple side of the concurrent programming equation, and as Anders Hejlsberg says, "while the debate rages" on how to manage shared memory and state, the TPL simplifies task management. Unfortunately, the developer is left holding the bag in the real-world use cases of the TPL, where shared memory and state are a consideration.

Does Concurrent Programming Require Functional Languages?

When asked why LINQ can take advantage of parallelism, Anders Hejlsberg replies: "It's because of the more declarative or functional style of writing programs and queries instead of the more traditional, imperative statement-like way of doing it."5 How does a more declarative style of writing programs facilitate parallelism? This quote on "Declarative Concurrency Permits the Runtime to Optimize" helps to explain the rationale:

"Serializing access to data with locks involves trading correctness for parallelism and scalability. Clever locking schemes can be used to reap the benefits of both, but require complex code and engineering. If you're crafting a highly reusable, scientific-quality library, this amount of work might be par for the course. But, if you're writing an application, the tax is simply too high.

Locks employ a technique known in database circles as pessimistic concurrency, while transactional memory prefers optimistic concurrency. The former prohibits concurrent access to data structures while the transaction executes, while the latter detects (and sometimes even tolerates) problematic conflicts at commit time. If two concurrent tasks work on the same data structure and do not conflict, for example, serialization is not necessary; yet, a naïve lock-based algorithm might impose it. Permitting the transaction manager to choose an appropriate execution strategy using runtime heuristics can lead to highly scalable code.

Moreover, in-memory transactions can eliminate hard-to-debug "heisenbugs" such as deadlocks, which often result in program hangs and are hard to provoke, and priority inversion and lock convoys, both of which can lead to fairness and scalability problems. Because the transaction manager controls locking details, it can ensure fair and deadlock-free commit protocols."11

However, when trying to understand why functional languages are well suited for concurrent programming, the primary reason I've found is that state is immutable. This becomes clearer in Chris Sells' Functional Language Summary6, to quote:

All "variables" are immutable, often called "symbols"

Program state is kept in functions (specifically, arguments passed on the stack to functions), not variables

The second point, that state is managed on the stack and not in variables, is the key to why functional programming is well suited for parallel computing. To quote again from Chris Sells:

Functions cannot cause side effects ("variables" are immutable) ...

No need for multi-threaded locks, as state is immutable

This makes functional programs automatically parallelizable

Ah ha! So, this is a much more definite answer as to the advantages of functional languages with regards to concurrent programming. And, since LINQ utilizes lambda calculus as the underlying mechanism in its query expressions, and "Lambda calculus provides a theoretical framework for describing functions and their evaluation"7, we have an association between LINQ and its application in concurrent programming.

So, does concurrent programming require a functional language? No, of course, not. But, it is greatly facilitated by the immutable state implicit in a functional language. In other words, the developer doesn't have to worry about locks, semaphores, etc., which is required in concurrent programming in which state is shared.

But is LINQ Truly Functional?

No. "The LINQ query language uses "lambda expressions", an idea originating in functional programming languages such as LISP - a lambda expression defines an unnamed function (and lambda expressions will also be a feature of the next C++ standard). In LINQ, lambdas are passed as arguments to its operators (such as Where, OrderBy, and Select) - you can also use named methods or anonymous methods similarly - and are fragments of code much like delegates, which act as filters."9 The caveat is that C#'s lambda expressions (and therefore LINQ) do not implement true functional programming; any lambda expression can access mutable variables outside of its function, and program state is still kept outside of the functions themselves. Therefore, the programmer must still be diligent in writing code with a lambda syntax that is thread safe, and this also means writing LINQ statements that are thread safe. For example8:

int i = 0;
var q = from n in numbers select ++i;

is not thread safe. As the PFX blog points out, "There is a set of 101 LINQ samples available at MSDN. Unfortunately, many of these samples rely on implementation details and behaviors that don't necessarily hold when moving to a parallel model like the one employed by PLINQ. In fact, some of them are dangerous when it comes to PLINQ."8
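A thread-safe rewrite of that counter query is possible by letting PLINQ supply the element index itself (via the indexed Select overload) and pinning the ordering with AsOrdered, rather than mutating a captured variable. A sketch:

```csharp
using System;
using System.Linq;

int[] numbers = { 10, 20, 30, 40 };

// Unsafe under PLINQ: ++i mutates a counter shared by all worker threads.
// int i = 0;
// var q = from n in numbers.AsParallel() select ++i;

// Safe alternative: no shared mutable state; PLINQ hands each element
// its index, and AsOrdered preserves the source ordering.
var q = numbers.AsParallel()
               .AsOrdered()
               .Select((n, index) => index + 1)
               .ToArray();

Console.WriteLine(string.Join(",", q));  // 1,2,3,4
```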

Work Stealing

Threads can steal work assigned to other threads. This helps distribute the work units across free cores/threads when units of work result in unbalanced thread usage. The intent of the TPL is to support this capability.

Conclusion

The .NET Parallel class definitely eases the development of threaded applications from the perspective of work unit management. State and shared memory issues remain for the programmer to solve. Adding concurrent programming capabilities to LINQ is a natural extension of LINQ, and should prove to be valuable in a variety of scenarios. In the end, the PFX, if I'm using the term correctly, should help in leveraging multi-core systems. I will be interested to see if the architecture Microsoft is putting together can be extended further into true distributed computing. I will also be interested to see what sort of state and shared memory solutions can work with PLINQ and TPL. Clearly, we are on the edge of another series of technological advances, this time in concurrent programming.

Personal Observations

In watching the videos with Anders Hejlsberg, I can't help but feel disappointed that there isn't any mention of Erlang. In fact, there seems to be an avoidance of the whole issue of mutability and state management. Granted, Anders Hejlsberg has a "we must be honest about this" statement when he points out that the TPL doesn't change the way you have to deal with synchronization. It is clear to me that he and Joe Duffy don't want to touch this elephant as the rest of the world "rages on in the debate" on how to deal with synchronization. Fair enough, the TPL certainly does make it easier to add parallelism to well defined work units. Still, I'm disappointed that the real meat of the issue, synchronization, has been pushed aside. I tend to draw the conclusion that, because .NET's lambda expressions aren't true functional programming, the .NET languages will never be capable of truly eliminating the developer's work (and therefore bugs) in dealing with synchronization in the way that languages like Erlang achieve success in resolving the synchronization issue.

Comments and Discussions

Great intro into concurrent programming. It would be better if you included my favourite framework from Microsoft, CCR (Concurrency and Coordination Runtime).
http://msdn.microsoft.com/en-us/library/bb648752.aspx

It is nice to see someone who writes much better than me do this. I would have (and did, in a proposal to the AF a few months back) broken it into 5 divisions rather than two. Certainly I can see how those issues each fit into your two. Absolutely well done.

_________________________
Asu no koto o ieba, tenjo de nezumi ga warau.
Talk about things of tomorrow and the mice in the ceiling laugh. (Japanese Proverb)
John Andrew Holmes "It is well to remember that the entire universe, with one trifling exception, is composed of others."

I was excited when Microsoft announced their efforts towards parallel programming, mainly the TPL. Nevertheless, in the last several months, there have not been many articles (relative to my wishes and expectations) on this topic.

This nicely written, easily readable article came just in time! And besides doing a critical analysis of what Microsoft offers, it also presents a wider perspective of the subject by discussing non-Microsoft technologies.

Developers have been writing multithreaded apps for years by letting IIS handle the task management and normally using a database for communication. I know this kind of concurrency, where separate tasks are executed in parallel, is a lot easier than splitting up each task.

I wonder how common is the requirement to split up each task? Processors are so fast now that most of the time, you are waiting for disk or network access. Even main memory is now painfully slow in comparison.

The other option is to optimise your app, which seems to be a lost art since .NET moved us a step further from the hardware. However, this can often produce order-of-magnitude increases in speed. Of course, on the other hand, 10-processor boxes are available. But they tend to be used to run highly-tuned, expensive-to-write, server software. What is the gain to cost ratio for a typical desktop database app?

Since dual-core processors became mainstream, concurrency has become a buzzword that is too often presented as a panacea. There are cases where multiple threads help, but for example in a WCF server, they don’t help by increasing the performance of individual tasks, but by providing scalability and encapsulation.

I think my point is that splitting up a loop is not such a great improvement in most cases. And really, writing a basic multi-threaded desktop app with existing technology is not that difficult. With all the unaddressed issues, I think these new tools have a long way to go before it's economic to multi-thread everything!

Very common. Besides the visual rendering of data, which is often easily parallelized and has applications ranging from scientific to gaming, there's a variety of task-based analysis that is ripe for multithreading. I worked on a complex switch ring analysis program a few years ago that was obviously a candidate for running on multiple cores. In fact, at the time I was investigating distributed processing because multiple cores weren't an option. And believe me, I worked very hard at optimizing not only the app, but the underlying logic in the analysis, which was actually highly productive and resulted in a wealth of rules that engineers could apply when creating these switch rings.

Nick Butler wrote:

I think my point is that splitting up a loop is not such a great improvement in most cases.

Well, look at the simple example in my article--splitting up the loop in rendering the fractal almost doubled the rendering speed. If I had a quad or 8 core system, wow. And that's just for doing a fractal. Imagine distributed computing examples, like visualizing protein folds, that can also take advantage of all the cores on the computer as well.

Security (code breaking) is another great application for parallel computing.

Firstly, thanks again for this article: it’s a great summary of the current position, and has really made me question things.

I’ve thought back through every commercial project I’ve worked on and had to go back ten years to 1998 to find one where the type of multi-threading promoted by PFX would have helped. Maybe I’m not representative, but I have to say that’s not “very common” for me.

That’s not to say I don’t use threads; in fact the opposite is true: I use them all the time. My last project needed to send emails and the naive implementation made the UI thread wait while the emails were sent to the SMTP servers. This is a good candidate for a new thread, especially since the app didn’t care if the emails were actually sent ( in fact, it couldn’t care because email is not a reliable protocol ).

I spent an hour yesterday optimising the original Mandelbrot program and managed to increase time performance by a factor of between about 2.5 and 6, depending on MaxIterations. I have posted my changes on the article’s message board. I had hoped for a factor of 10, but I ran out of time and so couldn’t try changing the algorithm.

Even assuming you have a computationally-intensive problem that has been optimised and is still not performant enough, I still don't see that there are many scenarios where just adding .AsParallel is going to be sufficient. Introducing concurrency inevitably introduces overheads which can easily negate any increases in time performance, especially when the units of work are small. As far as I can see, managing this complexity is not handled by PFX and is left to the developer ( although Daniel Moth does mention some thread-safe collections in one of his videos ).

Marc Clifton wrote:

If I had a quad or 8 core system, wow.

This is insightful. On my dual-core, Vista box the system uses about 15% of available processor when “idle”. A single thread will use 50%, leaving only 35% free. Is it worth the bother to factor out enough of the overheads of multi-threading to make this actually worthwhile? I think not in most cases, on a dual-core machine. Things will change when everybody has many-core machines, but this is away in the future. Perhaps this is why PFX is still a “research” project at MS.

Lastly, another separate thought springs to mind. I have been using multi-proc machines since 1995; not to improve performance of individual applications, but to keep my machines responsive while a single-threaded app churns away in the background ( for example, doing a C++ build ). An app using .AsParallel will take over my machine and make it unusable for a period of time. This takes us back to the bad old days of non-pre-emptive multitasking where one badly behaved app can take over my system.

That's an excellent point. Many apps I've written are multithreaded (obviously the server ones are), but yes, it is rarer to encounter applications that could benefit from parallel algorithms, however, because of my work experience, I've encountered a few.

Nick Butler wrote:

I spent an hour yesterday optimising the original Mandelbrot program and managed to increase time performance by a factor of between about 2.5 and 6, depending on MaxIterations.

Another excellent point--reaching for Parallel.For is like fixing a broken leg with Tylenol. And it mirrors my experience, that the key to performance improvement is algorithmic, not concurrency. In fact, if you read my second article, I was amazed that the performance with 2 threads was worse than with a single thread, even though I can clearly see that the CPU utilization is much less for a single thread, and 100% using PFX.

Nick Butler wrote:

Introducing concurrency inevitably introduces overheads which can easily negate any increases in time performance, especially when the units of work are small.

Absolutely.
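To make the overhead point concrete, here is a minimal C# sketch (the class and variable names are illustrative, not from the article, and actual timings will vary by machine): for a trivial per-element operation, the cost of partitioning the data and coordinating threads can easily exceed the work itself, so the PLINQ version may run no faster, or even slower, than the sequential one.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class OverheadSketch
{
    static void Main()
    {
        int[] data = Enumerable.Range(1, 1000000).ToArray();

        // Sequential: each unit of work is trivial (one addition).
        var sw = Stopwatch.StartNew();
        long seqSum = data.Sum(x => (long)x + 1);
        sw.Stop();
        Console.WriteLine("Sequential: {0} ms", sw.ElapsedMilliseconds);

        // Parallel: the same trivial work, plus the partitioning and
        // thread-coordination overhead, which may well dominate.
        sw = Stopwatch.StartNew();
        long parSum = data.AsParallel().Sum(x => (long)x + 1);
        sw.Stop();
        Console.WriteLine("Parallel:   {0} ms", sw.ElapsedMilliseconds);
    }
}
```

The moral is to measure both versions: if the parallel query isn't clearly faster, the units of work are too small to pay for the overhead.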

Nick Butler wrote:

As far as I can see, managing this complexity is not handled by PFX and is left to the developer

Sadly true. But worse is that there's no real discussion going on regarding your points above--optimizing algorithms, testing whether there really is a performance gain, etc.

Nick Butler wrote:

Is it worth the bother to factor out enough of the overheads of multi-threading to make this actually worthwhile?

On an algorithm I was working on to analyze switch rings for satellites, which could take hours to days to determine redundancy, that 35% would be quite meaningful.

Nick Butler wrote:

I have been using multi-proc machines since 1995; not to improve performance of individual applications, but to keep my machines responsive while a single-threaded app churns away in the background ( for example, doing a C++ build ).

Yes, another excellent point. The OS itself takes advantage of the cores to keep applications, the kernel, and so on running responsively.

In conclusion, your post is so excellent, I would actually like to add it directly to the article itself. That OK with you?

_________________________
Asu no koto o ieba, tenjo de nezumi ga warau.
Talk about things of tomorrow and the mice in the ceiling laugh. (Japanese Proverb)
John Andrew Holmes "It is well to remember that the entire universe, with one trifling exception, is composed of others."

I just voted you up, Nick (your comment was rated as 1)... you have put some effort into writing that long comment, and it clearly shows your genuine interest. I don't understand why this forum is always voting down the ones with different ideas. I think we should encourage them, and attack their points when they are wrong. The effort one puts into writing a comment should still be highly valued. You haven't written anything bad here, Nick; your points may be wrong, but that is no reason to vote someone down.

I know that the common practice of this forum is to vote someone down when they post a wrong idea or post something offensive; the second is fine, but I have my doubts about the first.

Marc - this is another top notch article. Your exploration of the whole parallel area that is coming out of Microsoft is excellent. I look forward to seeing practical problems being solved by these technologies. Now, it's up to me to sit down and actually take the time to think about these things.

I am disappointed, though, that it doesn't free us from still having to manage issues such as locks. It's a missed opportunity. Anyway, this article deserves a 5 for being such a thorough overview.

Although I'm sure parallelism is important to some fields, in mine I have enough performance already (as in blink and you'll miss it). I just don't get why business processes, which are inherently synchronous in nature, need to go "multi-core".

Nice article, btw, Marc - got my 5!

"On one of my cards it said I had to find temperatures lower than -8. The numbers I uncovered were -6 and -7 so I thought I had won, and so did the woman in the shop. But when she scanned the card the machine said I hadn't.

"I phoned Camelot and they fobbed me off with some story that -6 is higher - not lower - than -8 but I'm not having it."
-Tina Farrell, a 23 year old thicky from Levenshulme, Manchester.

I just don't get why business processes, which are inherently synchronous in nature, need to go "multi-core".

Well, off the cuff, the workflow of a single document in a business process is, yes, sequential. However, multiple documents in that same workflow can be processed in parallel. Imagine what would happen if all credit card transactions were processed sequentially, relative to each other.
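A rough C# sketch of that idea (Transaction, Authorize, and Settle are hypothetical names invented for illustration, not part of any real API): each transaction's workflow stays strictly sequential, while independent transactions are processed concurrently via PLINQ.

```csharp
using System;
using System.Linq;

class Transaction
{
    public int Id;
}

static class BatchProcessor
{
    // The workflow for ONE transaction is sequential:
    // authorize first, then settle.
    static void Process(Transaction t)
    {
        Authorize(t);
        Settle(t);
    }

    static void Authorize(Transaction t) { /* contact the card network... */ }
    static void Settle(Transaction t)   { /* post to the ledger... */ }

    // But independent transactions share no state, so the
    // batch as a whole can be processed in parallel.
    public static void ProcessBatch(Transaction[] batch)
    {
        batch.AsParallel().ForAll(Process);
    }
}
```

The parallelism lives at the batch level, not inside any one workflow, which is exactly why "inherently synchronous" business processes can still benefit from multiple cores.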

For some reason I went all boss-eyed when I read this bit, and managed to open a whole new door of understanding... the proverbial light bulb switched on!

I read it as Parallel Workflow and everything clicked into place. To take your credit card example and twist it a bit:

I own a shop which is very popular, but for most of the year only requires one till/credit card processing facility. At Christmas the shop becomes very busy and long queues at the till form. Now I don't want to lose these sales because people can't be bothered to wait in line, so I have to find a way of speeding up the payment process - so what do I do?

I install another till/credit card processor and employ a temporary employee to operate it. In effect I create a complete copy of the first till (like you would in functional programming), which operates independently of its source, and because the tills share no state, they both operate completely independently of each other.


Great work Marc. I gave it a quick once over, and you've pretty much conglomerated a thousand resources into one digestible document.

Functional Programming is something I enjoy looking into both from an Academic point of view and also from a Practical one.

I'll be giving this article a more thorough read later. I just dropped everything and started reading.

Though I have to say I'm surprised you have not mentioned other functional programming languages of note such as CAML/OCAML/Haskell and others. Indeed, a performance comparison, of both development time and run time, might be a good metric for proving the validity of the functional model with respect to the (now) traditional structural model.

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning." - Rick Cook

"There is no wealth like knowledge, no poverty like ignorance." Ali ibn Abi Talib

Though I have to say I'm surprised you have not mentioned other functional programming languages of note such as CAML/OCAML/Haskell and others.

One of the Microsoft Research links I provide discusses them, but quite frankly, if you're wondering what sort of time warp engine I use to write these things, well, I really have to put a lid on scope creep. There is so much stuff I simply don't have the time to do justice.

Hey, no worries. This article reads beautifully Marc. I'm giving it the thorough read now, and I'll probably plow through it in 6 hours or so when I wake up. I'll be updating my comments then.
