General-purpose thread pools are more complicated to get right than you might think. In CLR 4 (the next version of the VM that powers .NET), the thread pool has made significant advances in performance and in support for concurrency and parallelism.

Since V1, .NET programmers have been afforded the luxury of an automatic queue-dequeue-execute thread management infrastructure inside the CLR: .NET's thread pool.

As expected, the CLR's thread pool has improved with each iteration of the CLR (hey, V1 was, well, V1...). The goal has always been efficient, reliable, performant thread management. With CLR 4, the team that designs and implements the thread pool has made some truly compelling changes, which should add up to a very solid thread pool shipping with CLR 4. One of the big changes is the addition of work-stealing algorithms to support concurrency and parallelism. Indeed, CLR 4 has native support for the Parallel Computing Platform's Parallel Extensions for .NET. What does this mean, exactly? How does it work? What else is new in CLR 4's thread pool?

Meet developer Eric Eilebrecht and program manager Erika Parsons. Eric helped implement the thread pool (he's been doing this for multiple versions, actually). Erika, as PMs do, helped design the thread pool and ensured that the design and implementation meet the needs expressed by customers who rely on it.

Tune in. Lots to learn. You'll be impressed both by the enhancements and direction set forth for the future in CLR 4's thread pool.

Eric has some great blog posts on the new additions to the thread pool in CLR 4 that will be very useful for expanding on the knowledge you gain from this conversation.

I'm pretty sure that he is running Windows 7. Sorry for this off-topic stuff...

Some time ago you were talking to a kernel developer, and if I remember right, one of the things he was talking about was thread scheduling in user mode. The threads are managed in user mode, so there is no context switch to the kernel (better performance). So I think this is very much related to the work that the CLR team has done, right?

Well, no. The User Mode Scheduler in Windows 7 (this is what you're talking about) was written by the Windows and PCP (Parallel Computing Platform) teams. The ConcRT (Concurrency Runtime, which provides native APIs; also from the PCP team) is written on top of UMS. The CLR's thread pool implementation is not...

"But would it make sense that the CLR take use of the UMS on windows 7?"

Not so much. If you can stay in managed code, you don't have to context switch into native code, which saves a lot of time and work. So keeping locks, queues, and thread pools (as much as possible) in managed code is generally more efficient.

I think it will turn out that we don't need more threading abstractions, but fewer. In fact, we need as close to zero as we can get. In most code, it turns out that blocking on I/O is the major reason to spin up new threads, so you can wait on one thing and continue to do something else. If you can remove blocking, or at least make it appear to be gone, you can remove a lot of this. Take, for example, a common server app. You block waiting for a connection, then get the request, then block on some other I/O during processing, then send a reply. During this cycle, you're doing a lot of thread management to keep things lively while also not making a thread per request. However, a thread per request is exactly how you really want to program, because it's easier.

This may be a language issue, but why can't blocking and callbacks and delegates/lambdas be further abstracted away? Take a simple handler that reads from a socket, writes to disk, and writes OK back to the client:
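The code sample that originally followed here appears to have been lost from the page. A minimal sketch of the blocking style being described might look like this (the `HandleClient` method and file name are hypothetical, not the poster's actual code):

```csharp
using System.IO;
using System.Net.Sockets;
using System.Text;

class BlockingHandler
{
    // Hypothetical per-client handler: every call below ties up the
    // calling thread until the I/O completes.
    static void HandleClient(Socket client)
    {
        var buffer = new byte[4096];
        int n = client.Receive(buffer);          // blocks on socket read

        using (var file = File.OpenWrite("request.dat"))
            file.Write(buffer, 0, n);            // blocks on disk write

        byte[] ok = Encoding.ASCII.GetBytes("OK");
        client.Send(ok);                         // blocks on socket write
    }
}
```

Straightforward to read, but as the next paragraph notes, one thread is held hostage per client for the entire request.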

So in today's world, we block on the read, block on the file write, and block on the socket write. We normally can't write it this way because we'd be tying up a thread for each client request. So we fall back to async I/O, which adds tons of complication and mess. Now, tools such as the TPL and CCR try to address this and make it easier, but the model is still overly complex. We need a new model. Why can't the Read above fire off the request, transparently (to me) return the now-blocking thread to do other work for others, and come back with the result when the request is done (kinda coroutine-ish)? Maybe the returning thread is different; we should not need to know or care. Same with the other methods above. So everything is actually async but looks and feels sync. No callback blocks or other such goo. The runtime handles this in the background. It does require that all thread-dependency code be removed or abstracted for us in the language and BCL. Programmers should not have to think about that anyway; it should be abstracted away. Couldn't this work?

Why can't the Read above fire off the request, transparently (to me) return the now-blocking thread to do other work for others, and come back with the result when the request is done (kinda coroutine-ish)?

Isn't that what it actually does? When a thread blocks on an I/O, its CPU time slice is preempted by the OS scheduler and another thread immediately gets to run. It's exactly the behavior you described, except the thread object isn't reused for other work (which is hard because a thread has a lot of context... the stack, to start with), but another thread can run. So no CPU time is "lost".

The point is that this situation doesn't create parallelism. If your application wasn't written with some multithreading in mind, it may well have no work to do during the blocking call. This is something a compiler can't invent; you have to express your parallelism or "tasks" to some degree. For example, most apps today are written in a purely single-threaded way, so even if the scheduler wanted to reuse your thread during the blocking call, what task would it use it for?

Another point is that the thread is actually *blocked*. Even if you have more work going on on other threads, this may be bad... e.g., if this is your UI thread.

Overall I think that the Tasks concept is a good move. It's a bit like LINQ: don't say how to do something, describe the result you want and let the "black box" operate. The applications I am working on could easily benefit from throwing tasks at the runtime
on multicore machines.
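For illustration, here's the kind of declarative "throw tasks at the runtime" style being described, using the Task Parallel Library that ships with .NET 4 (the `Square` work function is just a made-up placeholder):

```csharp
using System;
using System.Threading.Tasks;

class TasksDemo
{
    // Placeholder work item standing in for real computation.
    static int Square(int x) { return x * x; }

    static void Main()
    {
        // Describe the work; let the runtime's work-stealing
        // scheduler decide how to spread it across cores.
        Task<int> a = Task.Factory.StartNew(() => Square(6));
        Task<int> b = Task.Factory.StartNew(() => Square(7));

        // Reading .Result waits for each task to complete.
        Console.WriteLine(a.Result + b.Result);  // prints 85
    }
}
```

As with LINQ, you state *what* you want computed and the "black box" (here, the CLR 4 thread pool's scheduler) decides *how*.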

Of course, the hard problem that remains to be solved is how to handle concurrency when there is shared state (and there always is some). Correctness and performance are hard, and I would love to see some simplifications in that area (the concurrent collections and other primitives in .NET 4 being a good start, but I have the feeling this is not enough to ensure easy and safe multithreaded development).

"When a thread blocks on an I/O, its CPU time slice is preempted by the OS scheduler and another thread immediately gets to run."

That is true, but my suggestion is for another reason. If you're doing a server, for example, you can't write code like that, because it's not just that you're blocking threads; it's that you could be using and blocking thousands of threads, which is one of the problems Eric described. The workaround for this issue has typically been async methods, and we know this is a hard way to program. My thought experiment is to gain both natural async behavior with a sync look and allow the blocked thread to return to the pool, with the callback done internally to continue executing the next step on the same thread or another from the pool. With the CCR, for example, you can accomplish this with the Iterator arbiter, but that also requires some code goo that is not that natural (though probably better than the alternatives to date). Basically, all I am suggesting is the same idea, but with more language sugar that makes it look more sync (without any delegates) and has the same goal of not blocking any threads.
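For what it's worth, later versions of C# added exactly this kind of sugar via `async`/`await` (which shipped after this conversation). A sketch of the proposed shape, with the same hypothetical read-disk-reply handler as before:

```csharp
using System.IO;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

class AsyncHandler
{
    // Reads, writes to disk, and replies -- written as if synchronous,
    // but each await hands the thread back to the pool while the I/O
    // is in flight; the continuation may resume on a different thread.
    static async Task HandleClientAsync(NetworkStream client)
    {
        var buffer = new byte[4096];
        int n = await client.ReadAsync(buffer, 0, buffer.Length);

        using (var file = File.OpenWrite("request.dat"))
            await file.WriteAsync(buffer, 0, n);

        byte[] ok = Encoding.ASCII.GetBytes("OK");
        await client.WriteAsync(ok, 0, ok.Length);
    }
}
```

Everything is actually async, but it looks and feels sync: no callback blocks, no delegates in sight, and no thread is blocked waiting on I/O.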

@Charles: I've read a bit about Axum, but didn't check it in depth yet. Indeed I think we'll probably have to introduce new concepts into the languages to be able to safely and easily code multithreaded applications.

@staceyw: indeed threads are an important resource, if you block 1000s of them you need a better way to handle concurrency. You are totally correct: currently the solution is asynchronous programming and current languages make it unnecessarily hard.

Did you check the F# asynchronous features? It's very similar to the idea you described. Basically you write your code inside an async {} block, and you can write everything just like the usual synchronous code (with the exception of adding an exclamation
mark (!) before asynchronous calls). The language takes care of all the details. It's a very nice feature.

I think I can help people with my private solution. Think of it: my framework uses only a handful of BCL classes, yet lets us deliver the full power of multi-core, multi-threaded apps directly from .NET 3.5.

I used only these BCL classes: AutoResetEvent, Monitor, ThreadPool, and WaitHandle. The framework is designed for both kinds of parallelism: vertical parallelism (number of concurrent items per processor) and horizontal parallelism (number of processors consumed).

It is so simple that the programmer only has to care about how to implement parallelism for a given algorithm using parallel work items. A programmer needs to implement a single work item for the algorithm itself and the generic type of data used in that work item.

By default, the queue engine tries to scale the algorithm horizontally, processing work items to fit the number of virtual cores (Environment.ProcessorCount), one per thread. Then, if that succeeds (that is, the overall number of estimated parallel work-item tasks is greater than or equal to the required number of processors), it scales vertically, such that the core with the fewest tasks gets the highest priority when the next work item is allocated, so all cores are used roughly equally, depending on the algorithm. But nothing stops you from customizing the algorithm to run 85% of the ("cheap") parallel work items on a single core while the other 15% ("hard") of the work items run on all the other processor cores (for example, if you have an 8-core i7), and, believe me, it is very easy.

It is about a couple of kilobytes and several lines of code, extremely easy to read and understand. I used Pex and Code Contracts. So it really makes me happy! And it just works!
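The framework itself isn't shown, but the primitives listed suggest a shape roughly like this. A much-simplified, hypothetical sketch (not the poster's actual code) that fans work items out to the CLR thread pool and waits for them all using only ThreadPool, Interlocked, and a WaitHandle:

```csharp
using System;
using System.Threading;

// Minimal work-item engine in the spirit of the primitives above:
// queue every item on the thread pool, signal an event when the
// last one completes.
class WorkQueue<T>
{
    private readonly Action<T> work;
    private int pending;                      // outstanding work items
    private readonly ManualResetEvent done = new ManualResetEvent(false);

    public WorkQueue(Action<T> work) { this.work = work; }

    public void RunAll(T[] items)
    {
        pending = items.Length;               // set count before queuing
        foreach (T item in items)
        {
            T captured = item;                // capture loop variable
            ThreadPool.QueueUserWorkItem(_ =>
            {
                work(captured);
                if (Interlocked.Decrement(ref pending) == 0)
                    done.Set();               // last item finished
            });
        }
        done.WaitOne();                       // block until all complete
    }
}

class Demo
{
    static void Main()
    {
        int sum = 0;
        var q = new WorkQueue<int>(x => Interlocked.Add(ref sum, x));

        var items = new int[100];
        for (int i = 0; i < 100; i++) items[i] = i + 1;

        q.RunAll(items);                      // sums 1..100 in parallel
        Console.WriteLine(sum);               // prints 5050
    }
}
```

Incrementing the pending count up front (rather than per enqueue) avoids the race where early completions drop the count to zero before all items are queued.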
