Jeffrey Richter: Excerpt #4 from CLR via C#, Third Edition

Good morning, everyone, Jeffrey Richter here. Today I’d like to share another section from my new book with you. It’s from Chapter 25, “Thread Basics.” Enjoy, and search this blog for more excerpts from the book.

Stop the Madness

If all we cared about was raw performance, then the optimum number of threads to have on any machine is identical to the number of CPUs on that machine. So a machine with one CPU would have only one thread, a machine with two CPUs would have two threads, and so on. The reason is obvious: If you have more threads than CPUs, then context switching is introduced and performance deteriorates. If each CPU has just one thread, then no context switching exists and the threads run at full speed.

However, Microsoft designed Windows to favor reliability and responsiveness as opposed to favoring raw speed and performance. And I commend this decision: I don’t think any of us would be using Windows or the .NET Framework today if applications could still stop the OS and other applications. Therefore, Windows gives each process its own thread for improved system reliability and responsiveness. On my machine, for example, when I run Task Manager and select the Performance tab, I see the image shown in Figure 25-1.

It shows that my machine currently has 60 processes running on it, and so we’d expect that there were at least 60 threads on my machine since each process gets at least 1 thread. But Task Manager also shows that my machine currently has 829 threads in it! This means that there is about 829 MB of memory allocated for just the thread stacks, and my machine has only 2 GB of RAM in it. This also means that there is an average of approximately 13.8 threads per process.
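The two figures above are simple arithmetic, sketched here in C#. This is a minimal illustration, not code from the book. It assumes, as the paragraph does, that every thread is charged a full 1 MB for its user-mode stack. The names ThreadCost, StackMegabytes, and AverageThreadsPerProcess are hypothetical.

```csharp
using System;
using System.Globalization;

public static class ThreadCost
{
    // Assumption: each thread's user-mode stack is charged at 1 MB.
    public const int StackMegabytesPerThread = 1;

    // Total stack memory, in MB, attributed to the given number of threads.
    public static int StackMegabytes(int threadCount)
    {
        return threadCount * StackMegabytesPerThread;
    }

    // Average number of threads per process.
    public static double AverageThreadsPerProcess(int threadCount, int processCount)
    {
        return (double)threadCount / processCount;
    }

    public static void Main()
    {
        // Figures quoted from the Task Manager reading: 829 threads, 60 processes.
        Console.WriteLine(StackMegabytes(829)); // prints 829
        Console.WriteLine(
            AverageThreadsPerProcess(829, 60)
                .ToString("F1", CultureInfo.InvariantCulture)); // prints 13.8
    }
}
```

Against 2 GB (2,048 MB) of RAM, 829 MB would be roughly 40 percent of the machine's memory under this assumption.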

Now, look at the CPU Usage reading: It shows that my CPU is busy 0 percent of the time. This means that 100 percent of the time, these 829 threads have literally nothing to do; they are just soaking up memory that is definitely not being used when the threads are not running. You have to ask yourself: Do these applications need all these threads to do nothing 100 percent of the time? The answer to this question has to be “No.” Now, if you want to see which processes are the most wasteful, click the Processes tab, add the Threads column, and sort this column in descending order, as shown in Figure 25-2. (You add the column by selecting the View menu’s Select Columns menu item.)

As you can see here, Outlook has created 38 threads and is using 0 percent of the CPU, Microsoft Visual Studio (Devenv.exe) has created 34 threads to use 0 percent of the CPU, Windows Live Messenger (Msnmsgr.exe) has created 34 threads to use 0 percent of the CPU, and so on. What is going on here?

When developers were learning about Windows, they learned that a process in Windows is very, very expensive. Creating a process usually takes several seconds, a lot of memory must be allocated, this memory must be initialized, the EXE and DLL files have to load from disk, and so on. By comparison, creating a thread in Windows is very cheap, so developers decided to stop creating processes and start creating threads instead. So now we have lots of threads. But even though threads are cheaper than processes, they are still very expensive compared to most other system resources, so they should be used sparingly and appropriately.

Well, without a doubt, we can say for sure that all of these applications we’ve just discussed are using threads inefficiently. There is just no way that all of these threads need to exist in the system. It is one thing to allocate resources inside an application; it’s quite another to allocate them and then not use them. This is just wasteful, and allocating all the memory for thread stacks means that there is less memory for more important data, such as a user’s document.

I just can’t resist sharing with you another demonstration of how bad this situation is. Try this: Open Notepad.exe and use Task Manager to see how many threads are in it. Then select Notepad’s File Open menu item to display the common File Open dialog box. Once the dialog box appears, look at Task Manager to see how many new threads just got created. On my machine, 22 additional threads are created just by displaying this dialog box! In fact, every application that uses the common File Open or File Save dialog box will get many additional threads created inside it that sit idle most of the time. A lot of these threads aren’t even destroyed when the dialog box is closed.

To make matters worse, what if these were the processes running in a single user’s Remote Desktop Services session, and what if there were actually 100 users on this machine? Then there would be 100 instances of Outlook, all creating 38 threads only to do nothing with them. That’s 3,800 threads each with its own kernel object, TEB, user-mode stack, kernel-mode stack, etc. That is a lot of wasted resources. This madness has to stop, especially if Microsoft wants to give users a good experience when running Windows on netbook computers, many of which have only 1 GB of RAM. Again, the chapters in this part of the book will describe how to properly design an application to use very few threads in an efficient manner.

Now, I will admit that today, most threads in the system are created by native code. Therefore, the thread’s user-mode stack is really just reserving address space and most likely, the stack is not fully committed to using storage. However, as more and more applications become managed or have managed components running inside them (which Outlook supports), then more and more stacks become fully committed, and they are allocating a full 1 MB of physical storage. Regardless, all threads still have a kernel object, kernel-mode stack, and other resources allocated to them. This trend of creating threads willy-nilly because they are cheap has to stop; threads are not cheap—rather, they are expensive, so use them wisely.

That's interesting information about threads, but unfortunately you gave no additional information on how to avoid using threads 🙂 What should we do to reduce the number of threads used?

Actually, I think that we should not worry about threads as long as they fulfill their function within the application. Of course you should pay attention to the total number of threads created. Nevertheless, we have to live with them until someone invents something totally different from threads but with the same behavior.

I have a question for you regarding the Processes tab in Task Manager. I've noticed for quite some time that the CPU column reports usage in whole numbers only. That is to say, if an application is using, for example, 2.5% of the CPU, it is reported as either 2 or 3 percent. I can't say which, because I do not know what rounding method is used for the values in the CPU column. Do you think it would be a good idea to go into more depth on Task Manager itself and add a section to the 3rd edition of your text discussing the matter?

BTW, in terms of the number of threads used by an application, I agree that too many threads are no good. I recall from the work we did on the first edition of your CLR via C# text that thread pool threads are a better thread usage model than creating threads from scratch. There is a "snag" in using thread pool threads, though. If you need to run several threads to perform the logic in a method, for instance, and you want to know which thread performed the instructions in the method, you may have to create a regular thread. I say so because I don't know of a way to name a thread pool thread. Thus, thread pool threads would not be a good candidate for the aforementioned situation.

Ray: In my book, I talk about many techniques that you can use to reduce the number of threads in your application while also exploiting all the CPUs in the machine. I disagree with the sentiment that you shouldn’t worry about threads and again, my book goes into a lot of detail about why this is.

Jamie: Process Explorer shows fractional CPU usage, so there are some tools that more accurately reflect what is going on. I am a big fan of the thread pool and I’d say that if you are naming threads then your application’s architecture is not optimal. There should not be a need to name threads (except for GUI threads and the GC’s Finalizer thread) in a properly architected application. Again, the 3rd edition of my book goes into this in great detail.
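On the question of telling thread pool threads apart without naming them, one possibility is to record each thread's managed ID. The following is a minimal sketch, not code from the book, and it assumes .NET Framework 4 or later (for CountdownEvent). PoolDemo and RunOnPool are hypothetical names. ThreadPool.QueueUserWorkItem, CountdownEvent, and Thread.CurrentThread.ManagedThreadId are existing .NET APIs.

```csharp
using System;
using System.Threading;

public static class PoolDemo
{
    // Queues itemCount work items to the CLR thread pool and returns,
    // for each item, the managed ID of the thread that executed it.
    public static int[] RunOnPool(int itemCount)
    {
        var ids = new int[itemCount];
        using (var done = new CountdownEvent(itemCount))
        {
            for (int i = 0; i < itemCount; i++)
            {
                int index = i; // per-iteration copy so each callback sees its own index
                ThreadPool.QueueUserWorkItem(state =>
                {
                    // Identify the executing pool thread by ID rather than by name.
                    ids[index] = Thread.CurrentThread.ManagedThreadId;
                    done.Signal();
                });
            }
            done.Wait(); // block until every work item has signaled
        }
        return ids;
    }

    public static void Main()
    {
        int[] ids = RunOnPool(8);
        for (int i = 0; i < ids.Length; i++)
        {
            Console.WriteLine("Work item {0} ran on pool thread {1}", i, ids[i]);
        }
    }
}
```

The pool decides how many threads to use, so several work items may report the same ID, and the IDs will differ from run to run.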