CLR Hosting - part 4 (scheduling and threading)

Introduction

Part 4 of my CLR Hosting journey: scheduling and threading. Today's applications typically rely on threading primitives to perform multiple tasks concurrently. Schedulers are needed in the operating system (and in other places, as I'll show later) to plan the execution of these tasks, because only one task can run at a time on one processor. The CLR, being a layer on top of the Windows OS, has support for threading too (luckily). In previous versions of the CLR, threading was handled by the operating system directly, because mscorwks.dll (or the server builds) sent its threading calls straight to the Win32 API. Just as was the case with memory management, the CLR in v2.0 of the .NET Framework has an intermediate layer that forwards threading calls to the host for further processing. The default CLR host will call into the Win32 API to perform all of the threading/scheduling, whereas other hosts can choose to take responsibility for this. Again, SQL Server 2005 was the key influence for the CLR team to adapt the design to allow customization by hosting applications. The stuff that's covered includes:

Task creation and management

Thread pool support

Synchronization mechanism support (e.g. mutex, semaphore, etc)

In this post, I'll start with the needs of SQL Server 2005 concerning the CLR's scheduling/threading mechanisms. Once I've explained this, we'll jump into mscoree again to explain how a CLR Host can take advantage of the various APIs to control scheduling and threading in quite some detail.

Threading and scheduling in SQL Server

The basics

It's clear that a database server such as SQL Server should be capable of doing a lot of work at (virtually) the same time. In a multi-user database environment, it's key to achieve great performance and scalability in order to be competitive with other vendors. This brings us to the need for an efficient scheduling mechanism that allows the database engine to do various things at a time, in order to serve users as fast as possible when processing their requests. SQL Server uses a multi-threading model to accomplish this goal, while running in one single process. This means there is one process with multiple threads and one single memory address space, therefore eliminating all the stuff around shared memory. Now, okay, we do have multiple threads, but how are those threads put on the processor to perform work? That's where the User Mode Scheduler comes into play. The User Mode Scheduler (further abbreviated to UMS) is a key component of SQL Server that gives the SQL OS (that's the core engine of SQL Server in relation to resource allocation and control) more control over how threads (and fibers, see further) are scheduled. Every processor has one instance of the UMS associated with it (well, that's only true for the processors that SQL Server is allowed to use, which is configurable through the affinity mask).

So, we know there are threads being used by SQL Server. Next, you should know that these threads are grouped into thread pools which are dedicated to various kinds of operations. First of all, there is a series of threads that support the basic core tasks of the SQL Server database engine, such as the lazywriter, the log writer (which writes out log caches - in-memory representations of transaction log entries - to disk based on a flushQueue, and uses a freeQueue to return the caches that have been written to disk), cleanup threads, etc. Besides its own housekeeping, SQL Server should also be able to process requests from users and client applications to retrieve/manipulate data (that's what a database is for, right?). In order to do this, there's a pool of so-called worker threads. Clients submit requests to the SQL Server database through the Net-Library up to the User Mode Scheduler before they reach the Relational Engine. This is where the UMS kicks in to assign a worker thread to the incoming request for further processing. This request submission is done through an IO completion port through which it's queued up in a completion queue. There are various actions the scheduler can take to assign a thread: either it can take an available thread from the pool, or it can create a new thread (there is a configurable upper limit called max worker threads that defaults to 255, which should be okay), put it in the pool and assign it to the client's request for further processing. The assigned worker thread stays alive and bound to the user's request until the request completes. Worker threads are divided among the different UMS schedulers that are running on the machine (remember, one for each processor).
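To make the assignment decision a bit more tangible, here's a minimal C++ sketch of it. This is only a model: WorkerPool, Assignment and the member names are made up for the illustration, and the rule that a request has to wait once the limit is reached is my own assumption, not something taken from the SQL Server sources.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the "take an idle worker or create a new one"
// decision. No real threads are created here; only the bookkeeping is shown.
enum class Assignment { ReusedIdleWorker, CreatedNewWorker, MustWait };

class WorkerPool {
public:
    explicit WorkerPool(std::size_t max_workers) : max_workers_(max_workers) {}

    // Decide how an incoming request gets a worker.
    Assignment assign() {
        if (idle_ > 0) {                 // an available worker sits in the pool
            --idle_;
            return Assignment::ReusedIdleWorker;
        }
        if (total_ < max_workers_) {     // still below the configured upper limit
            ++total_;
            return Assignment::CreatedNewWorker;
        }
        ++waiting_;                      // assumption: the request waits its turn
        return Assignment::MustWait;
    }

    // A worker finished its request: it serves a waiting request, or goes idle.
    void release() {
        assert(idle_ < total_);          // at least one worker must be busy
        if (waiting_ > 0) --waiting_;
        else ++idle_;
    }

    std::size_t total() const { return total_; }
    std::size_t idle() const { return idle_; }
    std::size_t waiting() const { return waiting_; }

private:
    std::size_t max_workers_;
    std::size_t total_ = 0;    // workers created so far
    std::size_t idle_ = 0;     // workers in the pool without a request
    std::size_t waiting_ = 0;  // requests that could not get a worker yet
};
```

With a limit of 2, the third concurrent request has to wait until one of the two workers is released.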

Threads versus fibers

SQL Server supports two modes it can run in. The first one is the default and is called "thread mode", the other one is called "fiber mode". In order to understand the difference, let's explain the difference between threads and fibers:

Threads are kernel-mode objects known by the OS and act as the default unit of work which can be put on the scheduler. At regular times the (preemptive) scheduler of the Windows OS starts its mission to take a thread off the processor (or off one of the processors on an SMP machine), find another thread (by scanning the list of threads and executing some thread election algorithm, based on thread priorities, credits, etc) and put that thread on the processor to start executing. Switching between threads is called a context switch and has a relatively high overhead on the system.

Fibers are often called lightweight threads and live 100% in user mode, so the Windows OS's scheduler doesn't even know about their existence. Code running in user mode is responsible for all of the scheduling work for the fibers to run.

Fiber mode (or lightweight pooling) looks promising but there are a couple of drawbacks, one of those being the fact that not all of the functionality (e.g. SQLMail - if you use that at all - and SQLXML) works in fiber mode. The key recommendation however is to avoid fibers if you can. Tempting as the checkbox in the SQL Server configuration might be, it's generally not a good idea to enable fiber mode (see also MSDN at http://msdn.microsoft.com/library/en-us/dnsqldev/html/sqldev_02152005.asp?frame=true). One scenario where fibers might be advisable is when the overhead of context switching is so high that it starts to hurt other work that needs to be done by the server.

The User Mode Scheduler

In this paragraph, I'll explain the basics of the User Mode Scheduler. If you want to know more about it, I recommend taking a look at the book "The Guru's Guide to SQL Server Architecture and Internals" by Ken Henderson, chapter 10. First of all there's the UMS thread running on every processor that can be used by SQL Server. There's only one such thread per processor. It's the task of the OS to schedule this thread on any of the available processors. When you do configure an affinity mask, you tell SQL Server which CPUs it can use, which is likely not a good idea because it causes overhead on the UMS instances to schedule their threads on the same processor every time (the OS is simply prevented from choosing any free processor to put the UMS thread on).

All code running in SQL Server (because of user requests for example, performing transactions, queries, etc) is scheduled by the UMS. If there's non-SQL Server code that has to be run, the UMS will keep its hands off it and leave it to the OS's scheduler to perform the work (e.g. extended procedures). The reason to introduce a UMS to control the scheduling originates from the SQL Server 7.0 timeframe where it became clear that relying on the underlying OS to perform all of the scheduling was not flexible enough and the scalability potential of the product was hindered because of this.

The UMS lives in a file called ums.dll inside the Binn folder of your SQL Server installation. The overall goal of the UMS is to avoid wasting processor cycles because of things such as context switches and a thread switching between user and kernel mode. The UMS contains a subset of the functionality of the Win32 threading functionality, containing only those things that are relevant for SQL Server. When the UMS performs its work, it calls into the Win32 functions to do further work, after controlling SQL Server's needs to achieve high scalability and performance. Actually the UMS is nothing more than a thin layer between the OS and the database engine when it comes down to scheduling and threading.

Now, the big difference between SQL Server's scheduling and the scheduling implemented by the Windows OS. SQL Server's UMS performs so-called cooperative scheduling, whereas the Windows OS (since NT) works based on preemptive scheduling. What do these two terms mean?

Preemptive scheduling: The scheduler doesn't trust anyone and kicks in at regular times to take a currently running thread from the processor it was assigned to, to save that thread's state (stack, program counter, registers, etc), find another thread to run and put that thread on the processor by restoring its state and waking it up. This way, no single program or thread can monopolize the processor, something that could bring down the computer (in the sense that no-one else is allowed to run, e.g. explorer.exe would starve and the OS would seem to hang).

Cooperative scheduling: "We're friends, aren't we?" The UMS knows it can trust the threads that are part of the game and that these threads will voluntarily yield. As a matter of fact, all of the thread code inside SQL Server is implemented by the SQL Server team, so it's possible to know its behavior. However, if there is one single thread that does not yield (and thus acts as a bad guy), the whole database system will be hurt because of this.
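The cooperative model can be shown in a few lines of C++. The sketch below is a toy, not the UMS: a task is modeled as a function that runs until its next voluntary yield point, and CoopTask, make_task and run_cooperatively are names I invented for the illustration.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// A task runs until its next voluntary yield point and returns true once
// it has no more work left.
struct CoopTask {
    std::string name;
    std::function<bool()> run_until_yield;
};

// Round-robin over the tasks; returns the order in which they got the CPU.
std::string run_cooperatively(std::deque<CoopTask> tasks) {
    std::string trace;
    while (!tasks.empty()) {
        CoopTask task = std::move(tasks.front());
        tasks.pop_front();
        trace += task.name;
        // Nobody can interrupt this call: the scheduler only regains
        // control when the task decides to return (that is, to yield).
        bool finished = task.run_until_yield();
        if (!finished) tasks.push_back(std::move(task));
    }
    return trace;
}

// Builds a well-behaved task that needs `slices` turns to complete.
CoopTask make_task(std::string name, int slices) {
    return CoopTask{std::move(name), [slices]() mutable { return --slices <= 0; }};
}
```

Notice that the scheduler has no way to take the processor back. A task whose run_until_yield never returns would starve every other task in the queue, which is exactly the "bad guy" scenario described above.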

The UMS uses the Win32 API to reach its goal on an underlying preemptive OS. The basic mechanism it employs is the following. Every thread running in a UMS has an event object that can be signaled. When the corresponding UMS for the thread does not want the thread to be scheduled (cf. worker thread pool etc), it asks that thread to start an infinite wait on its event object by calling WaitForSingleObject. Because of this, the OS will judge that this thread is waiting and can't possibly do anything, therefore bypassing it when looking (in a preemptive manner) for a thread to be put on the processor to start its work. At a certain point in time, the UMS might want the thread to start doing something and signals the event object of that thread. Because of this, the WaitForSingleObject call ends and the thread comes alive. Now, the Windows scheduler will see that the thread is alive and kicking and ready to do some work, so it can be selected by the scheduler to run on some processor. In order to avoid context switches and swapping on processors, the UMS will attempt to keep the number of viable threads as low as possible, ideally having only one thread per processor. That way, the Windows OS has no choice but to select that "dedicated to a given processor" thread.
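Here's a sketch of that parking trick in portable C++. Instead of a Win32 event object and WaitForSingleObject, I'm using a std::mutex plus std::condition_variable as a stand-in, so the code runs outside Windows too. Event, Worker and run_in_order are hypothetical names; the real UMS internals aren't public. The `done` event is an addition of mine so that the "scheduler" knows when a worker gives control back.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Portable stand-in for a Win32 auto-reset event object.
class Event {
public:
    void signal() {
        {
            std::lock_guard<std::mutex> lock(m_);
            signaled_ = true;
        }
        cv_.notify_one();
    }
    // Plays the role of WaitForSingleObject with an infinite timeout.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;  // auto-reset
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

struct Worker {
    Event wake;  // signaled by the scheduler to make this worker viable
    Event done;  // signaled by the worker when it hands control back
};

// Wakes the workers one at a time in the given order and returns the order
// in which they actually ran. `order` must be a permutation of 0..n-1,
// otherwise some worker is never woken up and the join below blocks.
std::vector<int> run_in_order(const std::vector<int>& order) {
    const std::size_t count = order.size();
    std::vector<Worker> workers(count);
    std::vector<int> ran;
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < count; ++i) {
        threads.emplace_back([&workers, &ran, i] {
            workers[i].wake.wait();              // parked: the OS skips this thread
            ran.push_back(static_cast<int>(i));  // "do some work"
            workers[i].done.signal();            // give control back
        });
    }
    for (int id : order) {
        workers[id].wake.signal();  // exactly one viable worker at a time
        workers[id].done.wait();
    }
    for (std::thread& t : threads) t.join();
    return ran;
}
```

Although the OS schedules these threads preemptively, only one worker is ever runnable, so the order of execution is fully decided by whoever signals the events.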

To run extended procedures, the UMS can't even think of controlling them in a cooperatively scheduled fashion, because the code in an extended procedure can't be trusted and has no way to indicate that it yields. Because of this, preemptive scheduling is used for extended procedures (and other stuff such as OAs, debugging, distributed queries, etc). In turn, this causes a drop in scalability and concurrency because the UMS is no longer able to serve a lot of requests on a limited set of workers.

An explanation of the UMS scheduler itself and all of the associated lists would bring us too far. Instead, I'll just mention these and give a short one-line description:

Worker list - contains all of the available UMS workers, encapsulating a fiber or thread

Runnable list - all of the UMS workers ready to execute but waiting to be signaled by the UMS (which is done indirectly by another worker calling yield)

Waiter list - UMS workers waiting for a resource; another worker that holds the resource is responsible to signal the waiting worker by scanning the waiter list

I/O list - list containing outstanding asynchronous I/O requests waiting for completion; any yielding worker has to walk through this list to check for completed asynchronous I/O operations and to wake the corresponding worker

Timer list - list with timed work requests; any yielding worker has to walk through this list to check for expired timer requests and to wake the corresponding worker

As you can see, voluntary thread yielding is a key job for each of the workers.
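As a rough illustration of the housekeeping a yielding worker does, here's a single-threaded C++ sketch covering the runnable list, the I/O list and the timer list. MiniScheduler, IoRequest and TimerRequest are names I made up, worker ids are plain integers, and the waiter list is left out. The real UMS is certainly more involved than this.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

struct IoRequest { int worker; bool completed; };
struct TimerRequest { int worker; long due_tick; };

class MiniScheduler {
public:
    std::deque<int> runnable;              // worker ids ready to run
    std::vector<IoRequest> io_list;        // outstanding asynchronous I/O
    std::vector<TimerRequest> timer_list;  // timed work requests

    // Called by the worker that voluntarily gives up the processor.
    // Returns the id of the worker that should run next.
    int yield(int current, long now) {
        // Wake the workers whose asynchronous I/O has completed.
        auto io_done = [](const IoRequest& r) { return r.completed; };
        for (const IoRequest& r : io_list)
            if (io_done(r)) runnable.push_back(r.worker);
        io_list.erase(std::remove_if(io_list.begin(), io_list.end(), io_done),
                      io_list.end());

        // Wake the workers whose timer request has expired.
        auto expired = [now](const TimerRequest& t) { return t.due_tick <= now; };
        for (const TimerRequest& t : timer_list)
            if (expired(t)) runnable.push_back(t.worker);
        timer_list.erase(std::remove_if(timer_list.begin(), timer_list.end(), expired),
                         timer_list.end());

        // The yielding worker queues up behind everyone else, and the
        // worker at the head of the runnable list gets the processor.
        runnable.push_back(current);
        int next = runnable.front();
        runnable.pop_front();
        return next;
    }
};
```

If no worker ever called yield, completed I/O and expired timers would simply go unnoticed, which shows why a single non-yielding worker hurts the whole system.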

You can check the statistics of the UMS by calling DBCC SQLPERF(umsstats):

Note, the results shown above are these from a system that has just been started a couple of minutes ago. On a system that's up and running for a longer time, you should see far higher numbers for the total work and the context switches.

CLR Hosting

In this first post about scheduling and threading in the CLR Hosting APIs I'll give a brief overview of the basic concepts and API functions you should know about. In the next posts of CLR Hosting part 4, I'll dig into more details by taking a look with you at the CoopFiber sample that's included in the .NET Framework v2.0 SDK. More information about this sample can also be found on Robert "Dino" Viehland's blog. Dino's working on the CLR team as an SDE/T and has been posting about this sample too in the past.

Tasks

Okay, so now how can SQL Server 2005 make sure that cooperative scheduling can be combined with the CLR integration stuff? Before v2.0 of the CLR, the CLR was only capable of working on a preemptively scheduled operating system layer. The hosting APIs did not provide a means to put the CLR on top of a cooperatively scheduled "host". As a basic concept, the CLR Hosting API provides the notion of a task, which is the abstraction of an underlying unit of execution, being either a thread or a fiber (see above for the explanation of both terms). Based on the mode SQL Server 2005 is running in, the notion of a task is mapped either to a thread or a fiber on the OS.

Basically, there are 4 interfaces:

IHostTaskManager - interface to implement for your own task manager; the CLR obtains a reference to it through the IHostControl::GetHostManager method, as explained in previous posts

ICLRTaskManager - the IHostTaskManager interface contains a method called SetCLRTaskManager, which is used by the CLR to pass in an instance of an ICLRTaskManager object; this is the side of the CLR in relation to task management

In here you'll find a limited set of functionality that's similar to the functionality in the IHostTaskManager. I'll cover this further in this post.

IHostTask - the host's notion of a task

Used to start, stop, join and alert tasks and to get/set priorities.

ICLRTask - the CLR's notion of a task; as with the ICLRTaskManager, the CLR provides the IHostTask instance with an instance of ICLRTask through a method called SetCLRTask

This interface contains methods to communicate with the CLR, for example to notify it that the host is scheduling/unscheduling a task (SwitchIn/SwitchOut methods). Other functionality includes yielding (cf. the SQL Server elaboration on cooperative scheduling), an ExitTask function and functions to Abort and RudeAbort a task (the difference will be explained later). There are also functions for statistics information (memory statistics, number of held locks).

A task has a lifecycle that you should understand in order to implement the IHost* interfaces correctly. Let's show such a lifecycle:

1. A task is created and starts its life in an unscheduled state. The CLR calls the IHostTaskManager::CreateTask method and creates an instance of ICLRTask which is passed to the IHostTask through the SetCLRTask method.

2. The task is started by calling IHostTask::Start.

3. The host decides to schedule the task. Using the ICLRTask reference it got in step 1, it calls SwitchIn to tell the CLR that the task will be scheduled.

4. Now the task is up and running.

5. When the host decides to switch out the task, it can call the SwitchOut method on the ICLRTask object, in a similar way as explained in step 3. The host can decide this based on various conditions, including I/O completion (e.g. relation to the I/O List in the UMS of SQL Server) or synchronization blocking (e.g. access to a resource protected with mutex/semaphore).

6. Later on, the task can be scheduled again, using SwitchIn as explained in step 3. However, this time it might end up on another physical OS thread (when being mapped to threads). The CLR can indicate thread affinity however when it needs this for the particular thread. It does this by calling BeginThreadAffinity and EndThreadAffinity on the IHostTaskManager. In that case, the host must reschedule the task on the same OS thread.

7. A last situation occurs when the task has completed its work. In that case, the host calls ICLRTask::ExitTask to notify the CLR about this.

Aborts and rude aborts will be covered later. Notice however that these are communicated to the CLR too by the host. This stuff has to do with finalization, as I'll explain in further posts. Finally, there is a Reset method on ICLRTask too, to clean up the task and to enable it to be reused by the CLR in the future (rather than calling ExitTask to destroy the task on the CLR level).
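The lifecycle can be summarized as a small state machine. The C++ below is a model only: it does not use the real COM interfaces from mscoree.h, TaskModel and TaskState are names I invented, and which transitions count as legal (for instance, that ExitTask and Reset are only accepted while the task is running) is a simplifying assumption of mine.

```cpp
#include <cassert>

// Hypothetical states for a task as seen by the host.
enum class TaskState { NotStarted, Unscheduled, Running, Exited };

class TaskModel {
public:
    TaskState state() const { return state_; }

    // Each function mirrors one call from the lifecycle and returns false
    // when the call does not make sense in the current state.
    bool start()      { return transition(TaskState::NotStarted, TaskState::Unscheduled); }
    bool switch_in()  { return transition(TaskState::Unscheduled, TaskState::Running); }
    bool switch_out() { return transition(TaskState::Running, TaskState::Unscheduled); }
    bool exit_task()  { return transition(TaskState::Running, TaskState::Exited); }
    // Reset makes the task reusable instead of destroying it.
    bool reset()      { return transition(TaskState::Running, TaskState::NotStarted); }

private:
    bool transition(TaskState from, TaskState to) {
        if (state_ != from) return false;
        state_ = to;
        return true;
    }
    TaskState state_ = TaskState::NotStarted;
};
```

A host implementation could keep a check like this around in debug builds to catch a SwitchIn on a task that's already running, or any call on a task that has exited.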

Note: Besides the threads created through BCL functions such as the Thread class (and later on redirected to the host to perform the actual work), the CLR always creates some management threads through the Win32 API when it's loaded. These threads are responsible for various tasks such as garbage collection (gc and finalizer), debugging (debugger helper thread), thread pool access control (timer, gate, worker, waiter and I/O completion thread) and internal timing.

Managed/unmanaged code execution transitions

Transitions between unmanaged code and managed code (in both directions) can be intercepted by providing a host manager for scheduling too. Examples include the use of COM interop to call unmanaged code from within a managed code context and the use of function pointers (which correspond to delegates in managed code) to call back into managed code through such a pointer that was marshaled to native code. Hosts that use cooperative scheduling (such as the SQL Server 2005 CLR Host) need to know when execution leaves managed code, because from that point on, threading is no longer under the control of the CLR. Therefore, the host can't control the scheduling anymore (remember the voluntary yielding of threads on which the mechanism of cooperative scheduling is built), so it has to allow the thread to be scheduled preemptively by the OS to continue its work.

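In order to support those scenarios, the CLR calls the host to notify it about these managed/unmanaged code transitions using four methods on the IHostTaskManager object: LeaveRuntime and EnterRuntime for calls from managed code out to unmanaged code, and ReverseEnterRuntime and ReverseLeaveRuntime for calls from unmanaged code back into managed code.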
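To make the bookkeeping concrete, here's a small C++ model of how a cooperative host could track these transitions for one task. The four member functions are named after the IHostTaskManager methods as I know them (LeaveRuntime, EnterRuntime, ReverseEnterRuntime, ReverseLeaveRuntime). TransitionTracker itself and the stack-based bookkeeping are my own invention, not something the hosting API prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Mode { Managed, Unmanaged };
enum class Scheduling { Cooperative, Preemptive };

// Hypothetical per-task tracker; a task starts out in managed code.
class TransitionTracker {
public:
    // Managed code is about to call out to unmanaged code.
    void leave_runtime() { stack_.push_back(Mode::Unmanaged); }
    // That unmanaged call has returned to managed code.
    void enter_runtime() { pop(Mode::Unmanaged); }
    // Unmanaged code calls back into managed code.
    void reverse_enter_runtime() { stack_.push_back(Mode::Managed); }
    // That callback has returned to unmanaged code.
    void reverse_leave_runtime() { pop(Mode::Managed); }

    // While unmanaged code runs, the host can't rely on voluntary yields.
    Scheduling scheduling() const {
        return stack_.back() == Mode::Managed ? Scheduling::Cooperative
                                              : Scheduling::Preemptive;
    }
    // Number of transitions that have not returned yet.
    std::size_t depth() const { return stack_.size() - 1; }

private:
    void pop(Mode expected) {
        // A mismatch means the notifications arrived in an impossible order.
        assert(stack_.size() > 1 && stack_.back() == expected);
        stack_.pop_back();
    }
    std::vector<Mode> stack_{Mode::Managed};
};
```

A stack is used rather than a single flag because the transitions can nest: managed code calls native code, which calls back into managed code through a delegate, and so on.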

In the next subpart of CLR Hosting part 4, I'll cover the stuff around synchronization management and thread pooling. After having covered this stuff, I'll take a look at the CoopFiber sample together with you.