Introduction

The intent of this paper is to evaluate the threading paradigms available within the .NET environment. My initial thought was to compare the performance of different design patterns across a number of languages, but my first foray into coding solutions in multiple languages proved problematic: the added variables detracted from the primary objective of evaluating the design patterns themselves. Each of the design patterns tested is based on functionality within the .NET CLR. The three patterns reviewed are:

The Thread class and its ThreadStart delegate.

Use of the CLR Thread Pool (QueueUserWorkItem method).

Asynchronous threading using BeginInvoke and asynchronous callbacks.

In the following section, I will describe the object design used within the tests. This design was chosen for its flexibility and ease of understanding; it has not been optimized for speed or efficiency. The numerical results should therefore be read relative to one another rather than as absolute indicators.

The design consists of a Manager object, which is responsible for creating a series of Worker objects and allocating their task assignments. The task assignments are, in actuality, instance data provided to each Worker object upon its instantiation.

The Worker object encapsulates a secondary thread, which does all the heavy lifting requested by the Manager object.

Depending upon the threading design pattern being implemented, the Manager object plays anywhere from a small role to no role at all in the actual thread creation. In two of the three implementations tested, the secondary thread creation is completely encapsulated and hidden from the Manager’s view. I’ll discuss this further as we delve into each of the solutions.

Earlier, I mentioned my initial intention to code each threading solution in a different language. I started down this road with solutions in C++ .NET and in C#. The C++ .NET solution proved significantly more difficult and confusing, owing to the added pointer indirection required to create a properly garbage-collected class. The initial results were very surprising.

C# Solution.

C++.NET Solution.

The C# implementation was the clear winner in terms of timing. A closer look at the C++ .NET solution hints at a severe memory leak, which would have a direct impact on the overall performance measure. The C++ .NET solution was abandoned at this juncture in order to focus on the threading patterns; the remainder of this document is based upon solutions built using C#. If anyone has done similar tests using managed C++, I would be very interested in the results.

The Thread class and its ThreadStart delegate

In our first example (ManagedThreadCS project), the secondary thread creation is completely obscured from the Manager object. The Manager creates a series of Worker objects and then waits for their completion. The Manager also takes on the task of controlling and monitoring the number of threads it can create. This information is passed to the Manager via a parameterized constructor.

The actual secondary thread creation occurs in the constructor of the Worker object. The Worker object contains an event object (ManualResetEvent), which is signaled by the secondary thread upon its completion.

It is the ManualResetEvent object that is queried by the Manager in order to determine if the Worker object has completed its assigned task.
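The pattern described above can be sketched as follows. This is a minimal illustration, not the article's actual code: the class names (`Worker`, `Manager`) and the task data are placeholders, but the mechanics (thread creation in the Worker's constructor, a `ManualResetEvent` signaled on completion, and the Manager waiting on it) follow the description.

```csharp
using System;
using System.Threading;

// Illustrative Worker: the secondary thread is created in the
// constructor, hiding thread management from the Manager.
class Worker
{
    private readonly int _task;
    public ManualResetEvent Done = new ManualResetEvent(false);

    public Worker(int task)
    {
        _task = task;
        Thread t = new Thread(new ThreadStart(DoWork));
        t.Start();
    }

    private void DoWork()
    {
        // ... perform the assigned task here ...
        Console.WriteLine("Worker {0} finished", _task);
        Done.Set();   // signal completion to the Manager
    }
}

class Manager
{
    static void Main()
    {
        Worker[] workers = new Worker[3];
        for (int i = 0; i < workers.Length; i++)
            workers[i] = new Worker(i);

        // The Manager queries each Worker's event object
        // to determine when its task is complete.
        foreach (Worker w in workers)
            w.Done.WaitOne();

        Console.WriteLine("All workers complete");
    }
}
```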

Use of the CLR Thread Pool (QueueUserWorkItem method)

In the ManagedThreadCS-Pool project, the Manager object takes on the responsibility of queuing thread pool requests. The Manager no longer needs to concern itself with the number of threads; that responsibility now falls upon the CLR-resident thread pool.

The Manager still instantiates the Worker objects and assigns them their individual tasks. In addition, the Manager then requests a thread from the pool associated with this process. The requested thread is instructed to execute within the instance of the Worker object just created by the Manager.

Note: The thread pool is static in the sense that only one pool exists per process. The pool is created upon the first invocation of a thread pool method.
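A minimal sketch of this variant follows. Again the class names are illustrative rather than the article's own; what it shows is the Manager handing each Worker instance to `ThreadPool.QueueUserWorkItem`, with the pool deciding when and on which thread each item runs.

```csharp
using System;
using System.Threading;

class PoolWorker
{
    private readonly int _task;
    public ManualResetEvent Done = new ManualResetEvent(false);

    public PoolWorker(int task) { _task = task; }

    // Signature required by the WaitCallback delegate.
    public void DoWork(object state)
    {
        // ... perform the assigned task here ...
        Console.WriteLine("Pool worker {0} finished", _task);
        Done.Set();
    }
}

class Manager
{
    static void Main()
    {
        PoolWorker[] workers = new PoolWorker[5];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = new PoolWorker(i);
            // Request a pool thread; the CLR schedules the work item.
            ThreadPool.QueueUserWorkItem(new WaitCallback(workers[i].DoWork));
        }

        foreach (PoolWorker w in workers)
            w.Done.WaitOne();

        Console.WriteLine("All workers complete");
    }
}
```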

Asynchronous threading using BeginInvoke and Asynchronous callbacks

In the last of our examples (ManagedThreadPool-Async project), we see the asynchronous threading example taking on a few characteristics of the prior two samples. From the perspective of encapsulation, this sample is similar to the first in that the Manager knows nothing about the actual secondary thread creation.

In terms of the underlying implementation, this example relies on the use of the CLR thread pool.

In addition to specifying a function to be executed by the secondary thread, we also must specify a function that is called after the secondary thread completes. In our example, the purpose of specifying an AsyncCallback function is to ensure a call to EndInvoke. It is extremely important that each BeginInvoke has an associated EndInvoke. Failure to call EndInvoke prevents the CLR from cleaning up the asynchronous operation, which may result in memory leaks and unwanted code behavior.

Note: The AsyncCallback function may be executed on a thread separate from both the primary thread and the secondary thread discussed above.
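A sketch of this pattern follows, with illustrative names (the article's own code uses different class and method names, e.g. ThreadFuncCallback). It shows BeginInvoke queuing the call on a pool thread and the callback recovering the delegate to call EndInvoke. Note that delegate BeginInvoke is supported on .NET Framework only, not on .NET Core / .NET 5+.

```csharp
using System;
using System.Runtime.Remoting.Messaging;
using System.Threading;

public class Program
{
    // A delegate whose BeginInvoke runs the target on a pool thread.
    delegate int WorkDelegate(int n);

    static ManualResetEvent _done = new ManualResetEvent(false);

    public static int DoWork(int n)
    {
        return n * n;  // stand-in for the real task
    }

    static void WorkFinished(IAsyncResult ar)
    {
        // Recover the original delegate and ALWAYS call EndInvoke;
        // skipping EndInvoke prevents the CLR from cleaning up.
        WorkDelegate d = (WorkDelegate)((AsyncResult)ar).AsyncDelegate;
        int result = d.EndInvoke(ar);
        Console.WriteLine("Result: {0}", result);
        _done.Set();
    }

    static void Main()
    {
        WorkDelegate d = new WorkDelegate(DoWork);
        // Queue the call on a pool thread; WorkFinished runs upon
        // completion, possibly on yet another pool thread.
        d.BeginInvoke(7, new AsyncCallback(WorkFinished), null);
        _done.WaitOne();
    }
}
```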

Test Results

Each of our samples was tested from within a console application. The console application instantiated a Manager object, provided it the data to operate upon, and recorded the average time of execution.

The size of the data set provided to the Manager object had a direct impact on the number of threads that could be created. In our first example (Thread Start), the code constrained the maximum number of threads to three. The second and third examples were free to create any number of threads permitted by the CLR Thread Pool (which defaults to a maximum of 25).
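The pool's limit can be inspected at run time with `ThreadPool.GetMaxThreads`. A small sketch (the printed numbers will vary by runtime version and machine; the 25-thread figure quoted above is the .NET 1.x-era default):

```csharp
using System;
using System.Threading;

class PoolLimits
{
    static void Main()
    {
        int workerMax, ioMax;
        // Query the per-process thread pool limits.
        ThreadPool.GetMaxThreads(out workerMax, out ioMax);
        Console.WriteLine("Max worker threads: {0}, max I/O threads: {1}",
                          workerMax, ioMax);
    }
}
```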

The first data set provided could conceivably require 15 threads if not constrained by the code.

15 calls (33300 underlying objects created by the secondary thread).

Program       T(avg) 100 trials (ms)   T(avg) 1000 trials (ms)   ~Thread Cnt
ThreadStart   98                       96                        3
ThreadPool    106                      101                       5
Async         110                      105                       4

The second data set provided could conceivably require 250 threads if not constrained by the code. Since the thread pool only allows for a maximum of 25 threads, this data set proved useful in reviewing pool management under heavy loading conditions. Stressing the pool manager was achieved by lengthening the duration of the worker thread. A Sleep timer in the worker thread was utilized to simulate additional processing time.

250 calls (627500 underlying objects created by the secondary thread).

Program       T(avg) 1 trial (ms)   T(avg) 10 trials (ms)   ~Thread Cnt
ThreadStart   3254                  3124                    3
ThreadPool    3996                  3745                    3
Async         4116                  3830                    3

The performance of each implementation varied from run to run, with differences ranging from 3% to 6%. From a pure timing perspective, there isn’t a clear winner.

CPU utilization can be affected by many other factors. The tests were executed within typical Windows NT and Windows XP environments. In each environment, an attempt was made to minimize the number of tasks vying for processor time.

With that said, a superior implementation starts to emerge when you look at the results from a slightly different perspective.

Empirical analysis of thread CPU usage reveals that operations under the thread pool are more efficient: processor utilization never reaches 100% while under thread pool control.

The “Thread Start” implementation quickly grabs as much processor time as it can based upon its default normal priority. Processor utilization approaches 100% and remains fairly constant.

The graph of the thread pool’s processor utilization reveals a completely different picture. The CLR thread pool ramps up in a more controlled manner, keeping peak utilization well under 100% while still completing in a similar time frame.

The CLR Thread Pool enables the system to optimize throughput with respect to all other running processes. The benefit of the thread pool seems self-evident.

Note: Not all tasks are candidates for thread pool management. Microsoft documentation is rather explicit with regards to when and when not to use the Thread Pool. Consult the docs to see which approach best works for your problem domain.

One Last Pass

The last project (ManagedThreadCS-Pool2) takes what this writer feels is the most versatile of the threading solutions tested and adds event handling. The event handling mechanism allows for a more robust design, in which the primary thread can continue processing while the secondary threads perform their assigned tasks.

I use this example to simply point out that the various threading solutions are not mutually exclusive. A balanced blend of the different threading solutions can make for some very capable designs.
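One way such a blend might look is sketched below; this is an illustrative reconstruction, not the ManagedThreadCS-Pool2 code itself, and all names are hypothetical. It combines the thread pool with a .NET event raised on completion, so the primary thread can keep working instead of blocking on the Worker.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: a pool-based worker that raises a .NET event
// when its task completes.
class EventWorker
{
    public delegate void WorkCompleteHandler(int taskId);
    public event WorkCompleteHandler WorkComplete;

    private readonly int _taskId;
    public EventWorker(int taskId) { _taskId = taskId; }

    public void Start()
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork));
    }

    private void DoWork(object state)
    {
        // ... task body here ...
        WorkCompleteHandler handler = WorkComplete;
        if (handler != null)
            handler(_taskId);   // note: fires on the pool thread
    }
}

class Demo
{
    static void Main()
    {
        ManualResetEvent finished = new ManualResetEvent(false);
        EventWorker w = new EventWorker(1);
        w.WorkComplete += delegate(int id)
        {
            Console.WriteLine("Task {0} done", id);
            finished.Set();
        };
        w.Start();

        // The primary thread is free to do other work here...
        finished.WaitOne();   // block only when the result is needed
    }
}
```

Because the event fires on a pool thread, a real design would marshal any UI updates back to the primary thread.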

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Comments and Discussions

Hi. I'm developing a system which uses asynchronous methods (BeginXXX, EndXXX). The Microsoft documentation says that these methods are executed within the thread pool. However, the number of threads shown in Task Manager for my process grows to more than 1000. I never call Thread.Start(). I've added some traces to debug the application, and sometimes when I call a BeginXXX method the number of threads remains the same. I don't understand what is happening.
I've made some tests and noticed that the CLR cleans up the threads after a while. Is there something similar to the garbage collector for threads? Is there a way to call it? Have you any idea why the number of threads is increasing?
I'd appreciate any suggestions.

See Marc Clifton's article under "It's more complicated."
As he points out, the key is that worker threads need to complete their work quickly. I have noticed that if I force thread operations to be very time consuming, the OS will indeed spawn additional threads. Can you be a bit more specific about the duration of your worker threads?

Regards
Gary J. Kuehn

It is better to fail at attempting something great than to do nothing and succeed at it.

Assuming this project worked before 2003, I'd say there is a problem in the function ThreadFuncCallback; it seems the generic AsyncDelegate is no longer callable in 2003, thus requiring you to define your own delegate. If you have the time and 2003, maybe you could update this project. Thanks for the article.

I downloaded sample 2 (ManagedThreadCS-ASync) and I am somewhat perplexed. I must have made some cut and paste errors when I was submitting the project. Within the constructor of the CGSRMgrThread class you should find the instantiation of a new ThreadDelegate and an AsyncCallback respectively.

Thanks for the update; it works well now. I'm an old-timer programmer and, like everyone else, trying to master a new language. I've read many articles on the subject of threads and I'm still working out when to use this method over that one; complex it is, but with everyone's help I'm getting there. Thanks for the great article; I give it a 5 for your effort so far, but I would hope that you never stop updating this article based on user feedback; that's what makes this Site out of sight, if you know what I mean.

Because a hyperthreaded CPU still has cache contention, the result will still be different from a true SMP system. The next-generation processors from AMD and Intel will both be dual core and have separate caches for each logical processor, but even then, they will not have the same behavior as, for instance, a quad Xeon.

I agree that an HT system is not the same as a dual-processor system, but you get real contention because two threads are executing at the same time. HT systems are cheap and give you a starting point.

An HT system is different from a 2P system, which is different from a 4P system, which is different from an 8P system. Etc. I've seen software which scaled well on a 200MHz 4P system scale horribly on a 500MHz 4P system.

On a uniprocessor system where only one thread is executing at a time and a typical timeslice lasts for millions of cycles, you get next to no contention. Measurements of multithreaded software on a uniprocessor system are nearly meaningless. If MT software runs badly on a UP system, you know you've got a real problem. If MT software runs well on a UP system, you cannot predict how well it will run on an MP system.

As someone also currently evaluating the threading models for a project, I think this is a good start.

What I do not find addressed, however, is that the various threading models are designed to satisfy different needs. The control that ThreadStart gives, for instance, is not available when using the ThreadPool.

In most cases, the applicability of a thread model is more important than its performance. If I have to render 1000 polygons on a memory image, for instance, I will not use ThreadStart, nor will an asynchronous callback make sense to me. The article would, therefore, have been more interesting if it had discussed the various applications for each model before discussing the performance.

BTW, I do not seem to understand the tabular presentations - any further help?

The table results come from some coarse timing tests, measured in milliseconds. The first set of results depicts numerous calls to short-duration threads, whereas the second table contains fewer calls to longer-duration threads. I hope to construct some more meaningful tests shortly and will post my findings upon completion.