Chapter 3 Developing Phase: Process and Thread Management

Published: May 31, 2006

This chapter discusses the similarities and differences in the implementation of process and thread management in UNIX and Microsoft® Windows® operating systems. The chapter first discusses the UNIX and Windows process management mechanisms and the Windows application programming interfaces (APIs) related to processes. It then discusses threads and their implementation in Windows.

Process Management

The UNIX and Windows process management mechanisms are very different, and the major difference between them lies in the creation of processes. UNIX uses fork to create a new copy of a running process and exec to replace the current process image with the new process image. Windows does not have a fork function. Instead, Windows creates processes in one step by using CreateProcess. In Windows, there is no need to execute the process after its creation as it will already be executing the new code. However, the standard exec functions are still available in Windows. These differences (and others) result in the need to convert the UNIX code before it can run on a Windows platform.

The following sections discuss the process management topics that must be considered for migration:

Creating a New Process

Replacing a Process Image (exec)

Retrieving Process Information

Waiting for a Spawned Process

Processes vs. Threads

Managing Process Resource Limits

Limiting File I/O When Using Windows

Process Accounting

Managing and Scheduling Processes

This section also introduces Windows jobs, which allow you to group processes together for management purposes. This functionality is not available in UNIX. With the information provided in this section, you will understand the process management routines in UNIX and Windows. Using this information, you will be able to replace UNIX process routines with the corresponding Windows-compatible routines.

Note There are a number of process management functions in the Windows API. For more information on these functions, refer to the Windows API reference on the Microsoft Developer Network (MSDN®) Web site.

Creating a New Process

In UNIX, you can create a new process by using fork. The fork function creates a child process that is almost an exact copy of the parent process. The fact that the child is a copy of the parent ensures that the process environment is the same for the child as it is for the parent.

In Windows, the CreateProcess function enables the parent process to create an operating environment for a new process. The CreateProcess function creates a new process and its primary thread. The environment includes the working directory, window attributes, environment variables, execution priority, and command-line arguments. The new process runs the specified executable file in the security context of the calling process. A handle is returned by the CreateProcess function, which enables the parent application to perform operations on the process and its environment while executing. Unlike UNIX, the executable file run by CreateProcess is not a copy of the parent process and must be explicitly specified in the call to the CreateProcess function. If the calling process is impersonating another user, the new process uses the token for the calling process, not the impersonation token. To run the new process in the security context of the user represented by the impersonation token, use the CreateProcessAsUser or CreateProcessWithLogonW functions.

An alternative to using CreateProcess is to use one of the spawn functions that are present in the standard C runtime. There are 16 variations of the spawn function. Each spawn function creates and executes a new process. Many of these functions have the same functionality as the similarly named exec functions in UNIX. The spawn functions include an additional argument that permits the new process to replace the current process, suspend the current process until the spawned process terminates, run asynchronously with the calling process, or run simultaneously and detach as a background process. For a UNIX application to change the executable file run in the child process, the child process must explicitly call an exec function to overwrite the executable file with a new application. The combination of fork and exec is similar to, but not the same as, CreateProcess. The following example shows a UNIX application that forks to create a child process and then runs the UNIX ps command by using execlp.

You can port this code to Windows by using the Windows CreateProcess function discussed earlier, or by using a spawn function from the standard C runtime library. In both cases, the old and new processes run in parallel and asynchronously. The following example shows how you can port the previous code using the CreateProcess function.

The arguments supported by CreateProcess (shown in the preceding example) give you a considerable degree of control over the newly created process. This contrasts with the spawn functions on UNIX, which do not provide options to set process priority, security attributes, or the debug status. The _spawn function creates and executes a new process on Windows. The following example shows how the same code was ported using the _spawnlp function.

Replacing a Process Image (exec)

Each of the functions in the exec family replaces the current process image with a new process image.

In UNIX, the exec family of functions replaces the executing process image with that of another process image. The new image is constructed from a regular, executable file called the new process image file. As mentioned previously, a fork followed by an exec is similar to CreateProcess. Windows supports the six POSIX variants of the exec function plus two additional ones (execlpe and execvpe). The function signatures are identical and come as part of the standard C runtime, so porting UNIX code that uses exec to Windows is straightforward. The following is a simple UNIX example showing the use of the execlp function.

Note For more information about exec support on Windows, refer to the standard C runtime library documentation that comes with the Microsoft Visual Studio® .NET 2003 development system.

The preceding example compiles and runs on Windows with only minor modifications. However, it requires an executable file called ps.exe to be available (one is included with the Interix product). If Interix is not installed, this command can be replaced by any other Windows command. The <unistd.h> header is not valid on Windows; change it to <process.h>. With that change, you can compile, link, and run this simple application.

Retrieving Process Information

In Windows, information about the current process is returned to the parent when a child is created.

When a process is started, its startup state can be specified in the STARTUPINFO structure. For graphical user interface (GUI) processes, this information affects the first window created by the CreateWindow function. For console processes, it affects the console window if a new console is created for the process. A process can use the GetStartupInfo function to retrieve the STARTUPINFO structure specified when it was created.

Waiting for a Spawned Process

In the preceding section, an example showed how you can create an asynchronous process where the parent and child processes execute simultaneously. No synchronization is performed. This section describes how to modify the previous example to include functionality that enables the parent process to wait for the child process to complete or terminate before continuing. To accomplish this in UNIX, a developer would use one of the wait functions to suspend the parent process until the child process terminates. The same semantics are available when using Windows. The functions used are different, but the results are the same. When you view the examples, remember that this is not an exhaustive comparison between the two platforms.

The scenario described here is very simple. If you need to expand it to include waiting for multiple child processes, the spawn-based example does not map adequately because it does not support this functionality. In that case, consider the CreateProcess approach together with WaitForMultipleObjects. The WaitForMultipleObjects function returns when any one (or all) of the specified objects is in the signaled state or when the time-out interval elapses.

The following example shows how UNIX code that waits for a child process can be migrated to Windows.

UNIX example: Waiting for a spawned process

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

Processes vs. Threads

In the next example, the UNIX code forks a process, but does not execute a separate run-time image. This creates a separate execution path within the application. When using Windows, this is achieved by using threads instead of processes. If your UNIX application creates separate threads of execution in this manner, you should use the Windows API CreateThread. The process of creating threads is discussed in the “Creating a Thread” section of this chapter.

Managing Process Resource Limits

Developers often want to create processes that run with a specific set of resource restrictions. In some cases, they may impose limitations for the purposes of stress testing or forced failure condition testing. In other cases, the limitations may be imposed to restrict runaway processes from using up all the available memory, CPU cycles, or disk space.

In UNIX, the getrlimit function retrieves resource limits for a process, the getrusage function retrieves current usage, and the setrlimit function sets new limits. The common limit names and their meanings are listed in Table 3.1.

Table 3.1. Common Limit Names and Definitions

Limit

Description

RLIMIT_CORE

The maximum size (in bytes) of a core file created by this process. If the core file is larger than RLIMIT_CORE, the write is terminated at this value. If the limit is set to 0, then no core files are created.

RLIMIT_CPU

The maximum time (in seconds) of CPU time a process can use. If the process exceeds this time, the system generates SIGXCPU for the process.

RLIMIT_DATA

Maximum size (in bytes) of a data segment of the process. If the data segment exceeds this value, the functions brk, malloc, and sbrk will fail.

RLIMIT_FSIZE

The maximum size (in bytes) of a file created by a process. If the limit is 0, the process cannot create a file. If a write or truncation call exceeds the limit, further attempts will fail.

RLIMIT_NOFILE

The highest possible value for a file descriptor, plus one. This limits the number of file descriptors a process may allocate. If the number of files being allocated is more than the value of RLIMIT_NOFILE, functions allocating new file descriptors may fail with the error EMFILE.

RLIMIT_STACK

The maximum size (in bytes) of a stack of the process. The stack will not automatically exceed this limit; if a process tries to exceed the limit, the system generates SIGSEGV for the process.

RLIMIT_AS

Maximum size (in bytes) of the total available memory of a process. If this limit is exceeded, the memory functions brk, malloc, mmap, and sbrk will fail with errno set to ENOMEM, and automatic stack growth will fail as described for RLIMIT_STACK.

Windows uses job objects to set job limits instead of process limits. Unlike UNIX, Windows job objects do not provide file input/output (I/O) restrictions. If you require file I/O limits in your application, you need to create your own code to handle this.

Windows Job Objects

Windows supports the concept of job objects, which allows you to group one or more processes into a single entity. A job object allows groups of processes to be managed as a unit. Job objects are namable, securable, sharable objects that control attributes of the processes associated with them. Operations performed on the job object affect all processes associated with the job object.

After a job object has been populated with the desired processes, the entire group can be manipulated for various purposes ranging from termination to imposing resource restrictions.

Job objects can be used to implement the concept of UNIX process groups. A process for which you have defined a process-execution rule can be placed into a process group, and process groups use job objects to define rules for the group. Table 3.2 lists the Windows job object structures.

Table 3.2. Windows Job Objects Structures

Member

Description

Notes

JOBOBJECT_BASIC_ACCOUNTING_INFORMATION

Contains basic accounting information for a job object.

This structure holds information on total user-mode execution time, kernel-mode execution time, page faults, and the total number of processes associated with the job, for all active and terminated processes.

JOBOBJECT_BASIC_AND_IO_ACCOUNTING_INFORMATION

Contains basic accounting and I/O accounting information for a job object.

This structure includes information for all processes that have ever been associated with the job, in addition to the information for all processes currently associated with it.

JOBOBJECT_BASIC_LIMIT_INFORMATION

Contains basic limit information for a job object.

This structure sets various restrictions on the job object such as time limit, working set size, and active process limit. Refer to Table 3.3 for more information.

JOBOBJECT_BASIC_PROCESS_ID_LIST

Contains the process identifier list for a job object.

This structure holds the ProcessIdList, which holds information on process identifiers for the job object.

JOBOBJECT_BASIC_UI_RESTRICTIONS

Contains basic user-interface restrictions for a job object.

This structure holds the UIRestrictionsClass, which restricts the processes associated with the job from creating or switching desktops, changing display settings and system parameters, accessing global atoms and handles, and reading from or writing to the Clipboard.

JOBOBJECT_END_OF_JOB_TIME_INFORMATION

Specifies the action the system will perform when the end-of-job time limit is exceeded.

The default action when the limit is exceeded is to terminate all processes and set the exit status. Alternatively, the system can post a completion packet to an associated completion port and clear the end-of-job time limit, allowing the processes in the job to continue executing.

JOBOBJECT_EXTENDED_LIMIT_INFORMATION

Contains basic and extended limit information for a job object.

Contains basic limit information, such as per-process memory limit or per-job memory limit. These are ignored if the corresponding flags are not set in the LimitFlags member of the JOBOBJECT_BASIC_LIMIT_INFORMATION structure. It also holds peak memory used by processes in the job.

JOBOBJECT_SECURITY_LIMIT_INFORMATION

Contains the security limitations for a job object.

Holds the SecurityLimitFlags, which sets the security limitations for the job and pointers to tokens to specify privileges/control access.

The restrictions that job objects allow you to enforce are in the JOBOBJECT_BASIC_LIMIT_INFORMATION structure.

Table 3.3. JOBOBJECT_BASIC_LIMIT_INFORMATION Members

Member

Description

Notes

PerProcessUserTimeLimit

Specifies the maximum user-mode time allotted to each process (in 100 ns intervals).

The system automatically terminates any process that uses more than its allotted time. To set this limit, specify the JOB_OBJECT_LIMIT_PROCESS_TIME flag in the LimitFlags member.

PerJobUserTimeLimit

Specifies how much more user-mode time the processes in this job can use (in 100 ns intervals).

By default, the system automatically terminates all processes when the time limit is reached. You can change this value periodically as the job runs. To set this limit, specify the JOB_OBJECT_LIMIT_JOB_TIME flag in the LimitFlags member.

LimitFlags

Specifies the job restrictions to apply.

Refer to the job objects API reference for more information.

MinimumWorkingSetSize/ MaximumWorkingSetSize

Specifies the minimum and maximum working set size for each process (not for all processes within the job).

Normally, the working set of a process can grow beyond its maximum; setting MaximumWorkingSetSize forces a hard limit. When the working set of the process reaches this limit, the process pages against itself. Calls to SetProcessWorkingSetSize by an individual process are ignored unless the process is just trying to empty its working set. To set this limit, specify the JOB_OBJECT_LIMIT_WORKINGSET flag in the LimitFlags member.

ActiveProcessLimit

Specifies the maximum number of processes that can run concurrently in the job.

Any attempt to go over this limit causes the new process to be terminated with a “not enough quota” error. To set this limit, specify the JOB_OBJECT_LIMIT_ACTIVE_PROCESS flag in the LimitFlags member.

Affinity

Specifies the subset of the CPUs that can run the processes.

Individual processes can limit this even further. To set this limit, specify the JOB_OBJECT_LIMIT_AFFINITY flag in the LimitFlags member.

PriorityClass

Specifies the priority class that all processes use.

If a process calls SetPriorityClass, the call will return successfully even though it actually fails. If the process calls GetPriorityClass, the function returns what the process has set the priority class to even though this might not be the actual priority class of the process. In addition, SetThreadPriority cannot raise threads above normal priority but can be used to lower the priority of a thread. To set this limit, specify the JOB_OBJECT_LIMIT_PRIORITY_CLASS flag in the LimitFlags member.

SchedulingClass

Specifies a relative time quantum difference assigned to threads in the job.

Value can be from 0 to 9 inclusive; refer to the text after this table for more information. To set this limit, specify the JOB_OBJECT_LIMIT_SCHEDULING_CLASS flag in the LimitFlags member.

As you may have observed by reviewing the table for setrlimit and job objects, the restrictions offered by job objects are comparable to UNIX except in one major area—file I/O.

Limiting File I/O When Using Windows

When a process is created in UNIX, the process control block (PCB) in kernel space contains an array of limits that is initialized with default values. In the case of the RLIMIT_FSIZE limit, the write procedures in the kernel are aware of the limit structure in the PCB, and these functions make checks to enforce the limits. The Windows operating system does not implement similar limits on files. To solve this problem, you must write your own solution and build it into your application.

This section presents a solution that you can use in your application. This solution emulates the UNIX file resource limits with:

An array of limits held as a static variable. This is similar to how some of the C run-time functions use static variables.

Versions of the UNIX functions getrlimit() and setrlimit(). These functions manipulate the limit array.

Wrappers for each of the disk write functions. These wrappers are resource limit-aware.

This solution is implemented as three files. Two of the files, resource.h and resource.c, implement the getrlimit(), setrlimit(), rfwrite(), and _rwrite() functions. Only fwrite() and _write() are wrapped because they are the most common disk write functions encountered in the UNIX world. The third file is rlimit.c, which is a very simple test program used to confirm that rfwrite() fails when the limit is reached.

Process Accounting

The Windows API provides functions for gathering process accounting information, including:

GetProcessShutdownParameters. Retrieves shutdown parameters for the currently calling process.

Managing and Scheduling Processes

This section looks at how you can change the scheduling priority of a process in UNIX and Windows.

In UNIX, getpriority(), setpriority(), and nice() functions can be used to change the priority of processes. The getpriority() call returns the current nice value for a process, process group, or a user. The returned nice value is in the range of [-NZERO, NZERO-1]. NZERO is defined in /usr/include/limits.h. The default process priority always has the value 0 for UNIX. The setpriority() call sets the current nice value for a process, process group, or a user to the value of value + NZERO.

In Windows, processes are scheduled to run based on their scheduling priority. Each thread is assigned a scheduling priority. The priority levels range from zero (lowest priority) to 31 (highest priority). Only the zero-page thread can have a priority of zero. The zero-page thread is a system thread responsible for zeroing any free pages when there are no other threads that need to run.

Each process belongs to one of the priority classes listed in Table 3.4.

Table 3.4. Priority Classes

Priority Classes

IDLE_PRIORITY_CLASS

BELOW_NORMAL_PRIORITY_CLASS

NORMAL_PRIORITY_CLASS

ABOVE_NORMAL_PRIORITY_CLASS

HIGH_PRIORITY_CLASS

REALTIME_PRIORITY_CLASS

By default, the priority class of a process is NORMAL_PRIORITY_CLASS. Use the CreateProcess function to specify the priority class of a child process when you create it. If the calling process is IDLE_PRIORITY_CLASS or BELOW_NORMAL_PRIORITY_CLASS, the new process will inherit this class. The GetPriorityClass function and the SetPriorityClass function can be used to retrieve and set the priority class of processes, respectively. Table 3.5 lists the functions that are related to scheduling in UNIX and Windows.

Table 3.5. Functions Related to Scheduling in UNIX and Windows

UNIX Function

Description

Windows Function

nice()

Change the priority of a conventional process.

SetPriorityClass

getpriority()

Get the maximum priority of a group of conventional processes.

GetPriorityClass

setpriority()

Set the priority of a group of conventional processes.

SetPriorityClass

sched_getscheduler()

Get the scheduling policy of a process.

GetPriorityClass

sched_setscheduler()

Set the scheduling policy and priority of a process.

SetPriorityClass and SetThreadPriority

sched_getparam()

Get the scheduling priority of a process.

GetThreadPriority

sched_setparam()

Set the priority of a process.

SetThreadPriority

sched_yield()

Relinquish the processor voluntarily without blocking.

Use Sleep(0) or SwitchToThread to relinquish the remainder of the current time slice.

sched_rr_get_interval()

Get the time quantum value for the Round Robin policy.

Not available

For threads, the scheduling priority is determined by the priority class of the process that they belong to and the priority level of the thread. Thread scheduling and priority are discussed in detail in the next section.

Thread Management

This section introduces the concept of threads. The following sections discuss the similarities and differences between UNIX and Windows APIs in managing threads:

Creating a Thread

Canceling a Thread

Synchronization of Threads

Thread Attributes

Thread Scheduling and Prioritizing

Managing Multiple Threads

I/O Completion Ports

A thread is an independent path of execution in a process that shares the address space, code, and global data of the process. Time slices are allocated to each thread based on priority. Each thread has its own set of registers, stack, I/O handles, and message queue.

Threads can usually run on separate processors on multiprocessor computers. Windows enables you to assign threads to a specific processor on a multiprocessor hardware platform.

An application using multiple processes usually has to implement some form of interprocess communication (IPC). This can result in significant overhead and, possibly, a communication bottleneck. In contrast, threads share the process data between them, and interthread communication can be much faster. The problem with threads sharing data is that this can lead to data access conflicts between multiple threads. You can address these conflicts by using synchronization techniques, such as semaphores and mutexes.

In UNIX, threads are implemented by using the POSIX pthread functions. In Windows, developers can implement UNIX threading by using the Windows API thread management functions. Although the functionality and operation of threads in UNIX and Windows is very similar, the function calls and syntax are very different.

The following are some similarities between UNIX and Windows in their management of threads:

Every thread must have an entry point. The name of the entry point is entirely up to you as long as the signature is unique and the linker can adequately resolve any ambiguity.

Each thread is passed a single parameter when it is created. The contents of this parameter are entirely up to the developer and have no meaning to the operating system.

A thread function must return a value.

A thread function needs to use local parameters and variables as much as possible. When you use global variables or shared resources, threads must use some form of synchronization to avoid potentially corrupting data.

This section discusses how you should go about converting UNIX threaded applications into Windows threaded applications. As a supplement to threads, Windows has a more primitive execution vehicle called a fiber. Windows fibers are used to provide full control over scheduling for special needs such as thread pooling, which is useful for certain server applications that manage worker threads for incoming requests.

For details on thread management functions in the Windows API, see the Windows API reference in Visual Studio .NET 2003 or MSDN.

Creating a Thread

When creating a thread in UNIX, use the pthread_create function. This function has four arguments: a pointer to a data structure that describes the thread, the attributes of the thread (usually set to NULL, indicating default settings), the function that the thread will run, and an argument to pass to that function. The thread finishes execution with pthread_exit where, in this case, it returns a string. The process can wait for the thread to complete by using the function pthread_join.

The following simple UNIX example creates a thread and waits for it to finish.

UNIX example: Creating a single thread

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

In Windows, threads are created using the CreateThread function, which requires:

The stack size of the thread.

The security attributes of the thread.

The address at which to begin execution of a procedure.

A pointer to a variable to be passed to the thread.

Flags that control the creation of the thread.

An address to store the system-wide unique thread identifier.

After a thread is created, the thread identifier can be used to manage the thread (like get and set the priority of thread) until it has terminated. The next example demonstrates how you should use the CreateThread function to create a single thread.

Windows example: Creating a single thread

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

Do not cast a function that returns void into a LPTHREAD_START_ROUTINE. On the IA64 architecture, the kernel raises a STATUS_REG_NAT_CONSUMPTION exception.

If you pass too few parameters to a function, the compiler may find that it needs to spill the missing parameter (even if the function is careful not to access that parameter until some other conditions are met), thereby raising the STATUS_REG_NAT_CONSUMPTION exception on the IA64 architecture.

The UNIX and Windows examples have roughly equivalent semantics. There are only two notable differences:

The thread function in the Windows code cannot return a string value. Developers must use some other means to convey the string message back to the parent (for example, returning an index into a string array).

The Windows version of the thread function just returns a DWORD value instead of calling a function to terminate the thread. ExitThread could have been called, but this is not necessary because ExitThread is called automatically upon the return from the thread procedure. TerminateThread could also be called, but this is neither necessary nor recommended. This is because TerminateThread causes the thread to exit unexpectedly. The thread then has no chance to execute any user-mode code, and its initial stack is not deallocated. Furthermore, any DLLs attached to the thread are not notified that the thread is terminating.

The two solutions have vastly different syntaxes. Windows uses a different set of API calls to manage threads. As a result, the relevant data elements and arguments are considerably different.

Canceling a Thread

The details of terminating threads differ significantly between UNIX and Windows. While both environments allow threads to block termination entirely, UNIX offers additional facilities that allow a thread to specify whether it is to be terminated immediately or whether termination is deferred until it reaches a safe recovery point. Moreover, UNIX provides a facility known as cancellation cleanup handlers, which a thread can push onto and pop from a stack; the handlers are invoked in last-in-first-out (LIFO) order when the thread is terminated. These cleanup handlers are coded to clean up and restore any resources before the thread is actually terminated. The Windows API allows you to terminate a thread asynchronously. Unlike UNIX, Windows code cannot create cleanup handlers, and it is not possible for a thread to defer its own termination. Therefore, it is recommended that you design your code so that threads terminate by returning an exit code and so that threads cannot be terminated forcibly. To do this, you should design your thread code to accept some form of message or event to signal that the threads should be terminated.

Based on this notification, the thread logic can elect to execute cleanup handling code and return normally. To prevent a thread from being terminated, remove the THREAD_TERMINATE access right from the security attributes of the thread object. Although forcing a thread to end by using the TerminateThread function is not recommended, for completeness, the following example shows how you could convert UNIX code that cancels a thread into Windows code that cancels a thread.

UNIX example: Canceling a thread

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

When you compare the UNIX and Windows examples, you can see that in the Windows implementation, the setting for the deferred termination is absent. TerminateThread is not immediate, and it is not predictable. The termination resulting from a TerminateThread call can occur at any point during the thread execution. In contrast, UNIX threads tagged as deferred can terminate when they reach a safe cancellation point.

If you need to match the UNIX behavior in your Windows application exactly, you must create your own cancellation code, thereby preventing the thread from being forcibly terminated.

Synchronization of Threads

UNIX and Windows provide mechanisms for controlling resource access. These mechanisms are referred to as synchronization techniques. In a multithreaded program, you must use synchronization objects whenever there is a possibility of conflict in accessing shared data or resources. For example, if your thread increments a global variable, you cannot predict the result because the variable may have been modified by another thread before or after the increment. The reason that you cannot predict the result is that the order in which threads have access to a shared resource is indeterminate.

The following example illustrates code that is, in principle, indeterminate.

Note This is a very simple example and on most computers the result would always be the same, but the important point to note is that this is not guaranteed.

The main thread in the following example is represented by the parent, which outputs a “P”; the child, or secondary, thread outputs a “T”. A UNIX example and a Windows example are shown as follows:

UNIX example: Multiple nonsynchronized threads

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

No actual synchronization between the main thread and child thread is performed; each thread prints different characters. The sequence of the characters printed to the output may be different in each execution.

Once again, it is not possible to predict the output from these examples. In most applications, unpredictable results are an undesirable feature. Consequently, it is important that you take great care in controlling access to shared resources in threaded code.

There are a variety of ways to coordinate multiple threads of execution. To synchronize access to a resource, use one of the synchronization objects with one of the wait functions.

The wait functions allow a thread to block its own execution. The wait functions do not return until the specified criteria have been met. A synchronization object is an object whose handle can be specified in one of the wait functions to coordinate the execution of multiple threads. More than one process can have a handle to the same synchronization object, making interprocess synchronization possible.

The next sections discuss the different synchronization techniques.

Synchronization with Interlocked Exchange

A simple form of synchronization is to use what is known as an interlocked exchange. An interlocked exchange performs a single operation that cannot be preempted.

The functions InterlockedExchange, InterlockedCompareExchange, InterlockedDecrement, InterlockedExchangeAdd, and InterlockedIncrement provide a simple mechanism for synchronizing access to a variable that is shared by multiple threads. The threads of different processes can use this mechanism if the variable is in shared memory. (InterlockedCompareExchange is discussed in the next section.)

The InterlockedExchange function atomically exchanges a pair of values. The function prevents more than one thread from using the same variable simultaneously.

The variable pointed to by the Target parameter must be aligned on a 32-bit boundary; otherwise, these functions will fail on multiprocessor x86 systems and on any non-x86 systems.

Because that is not the case in the following example, the example has limited value; it does, however, illustrate the use of the InterlockedExchange functions.

Note The InterlockedExchange function should not be used on memory allocated with the PAGE_NOCACHE modifier.

The following example demonstrates the usage of InterlockedExchange for synchronizing the shared resource or global variable.

Windows example: Thread synchronization using interlocked exchange

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

The InterlockedExchange function generates a full memory barrier (or fence) and performs the exchange operation. This ensures the strict memory access ordering that is necessary, but it can decrease performance. To operate on 64-bit memory locations and values, use the InterlockedExchange64 function.

The variable pointed to by the Target parameter must be aligned on a 64-bit boundary.

The variable pointed to by the Target parameter and the Value parameter must both be of the LONGLONG data type.

Synchronization with Spinlocks

In the previous example, as noted, you still have no synchronization between the two threads. The output may still be out of order. One simple mechanism that offers synchronization is to implement a spinlock. To accomplish this, a variant of the Interlocked function called InterlockedCompareExchange is used.

The InterlockedCompareExchange function performs an atomic comparison of the destination value with the comperand value: if the two are equal, the exchange value is stored at the address specified by destination; otherwise, no operation is performed. Because the comparison and the exchange happen as a single atomic operation, the function prevents more than one thread from using the same variable simultaneously.

The variables for InterlockedCompareExchange must be aligned on a 32-bit boundary; otherwise, this function will fail on multiprocessor x86 systems and any non-x86 systems.

Note This function and all other functions of the InterlockedExchange and InterlockedExchange64 family should not be used on memory allocated with the PAGE_NOCACHE modifier because this may cause hardware faults on some processor architectures. To ensure ordering between reads and writes to PAGE_NOCACHE memory, use explicit memory barriers in your code.

Spinlocks work well for synchronizing access to a single object, but most applications are not this simple. Moreover, spinning is not the most efficient means of controlling access to a shared resource: running a while loop in user mode while waiting for a global value to change wastes CPU cycles unnecessarily. A mechanism is needed that does not consume CPU time while a thread waits to access a shared resource.

When a thread requires access to a shared resource (for example, a shared memory object), it must either be notified or scheduled to resume execution. To accomplish this, a thread calls an operating system function, passing parameters that indicate what it is waiting for. If the operating system detects that the resource is available, the function returns and the thread resumes. If the resource is unavailable, the system places the thread in a wait state, making it nonschedulable and preventing it from wasting any CPU time. The operating system tracks the resources that a thread is waiting for and automatically resumes the thread when they become available, so the execution of the thread is synchronized with the availability of the resource. Mechanisms that prevent a waiting thread from wasting CPU time include:

Mutexes

Critical sections

Semaphores

Windows includes all three of these mechanisms, and UNIX provides both semaphores and mutexes. These three mechanisms are described in the following sections.

Synchronization Using Mutexes

A mutex is a kernel object that provides a thread with mutually exclusive access to a single resource. The state of a mutex object is set to signaled when it is not owned by any thread, and nonsignaled when it is owned. Only one thread at a time can own a mutex object, whose name comes from the fact that it is useful in coordinating mutually exclusive access to a shared resource.

Any thread of the calling process can specify the mutex-object handle in a call to one of the wait functions. The single-object wait functions return when the state of the specified object is signaled. When the state of the mutex is signaled, one waiting thread is granted ownership, the state of the mutex changes to nonsignaled, and the wait function returns. The owning thread uses the ReleaseMutex function to release its ownership.

The next example looks at the use of mutexes to coordinate access to a shared resource and to handshake between two threads.

UNIX example: Thread synchronization using mutexes

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

Synchronization with Critical Sections

Another mechanism for solving this simple scenario is to use a critical section. A critical section is similar to InterlockedExchange except that you have the ability to define the logic that takes place as an atomic operation.

Critical section objects provide synchronization similar to that provided by mutex objects, except that critical section objects can be used only by the threads of a single process. Critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization (a processor-specific test and set instruction) as compared to event, mutex, and semaphore objects, which can also be used in a single-process application. There is no guarantee about the order in which threads will obtain ownership of the critical section; however, the system will be fair to all threads. Unlike a mutex object, there is no way to tell whether a critical section has been abandoned.

The process is responsible for allocating the memory used by a critical section; typically, this is done simply by declaring a variable of type CRITICAL_SECTION. Before the threads of the process can use it, initialize the critical section (with InitializeCriticalSection); a thread can then request ownership (with EnterCriticalSection). If the critical section object is currently owned by another thread, EnterCriticalSection waits indefinitely for ownership. In contrast, when a mutex object is used for mutual exclusion, the wait functions accept a specified time-out interval. The TryEnterCriticalSection function attempts to enter a critical section without blocking the calling thread.

A thread uses the InitializeCriticalSectionAndSpinCount or SetCriticalSectionSpinCount functions to specify a spin count for the critical section object. On single-processor systems, the spin count is ignored and the critical section spin count is set to 0. On multiprocessor systems, if the critical section is unavailable, the calling thread will spin dwSpinCount times before performing a wait operation on a semaphore associated with the critical section. If the critical section becomes free during the spin operation, the calling thread avoids the wait operation.

Any thread of the process can release the system resources that were allocated when the critical section object was initialized by calling the DeleteCriticalSection function. After this function has been called, the critical section object can no longer be used for synchronization.

Windows example: Thread synchronization using critical sections

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

Synchronization Using Semaphores

A semaphore object is a synchronization object that maintains a count between zero and a specified maximum value. The count is decremented each time a thread completes a wait for the semaphore object and incremented each time a thread releases the semaphore. When the count reaches zero, no more threads can successfully wait for the semaphore object state to become signaled. The state of a semaphore is set to signaled when its count is greater than zero, and nonsignaled when its count is zero. The semaphore object is useful in controlling a shared resource that can support a limited number of users.

In the following examples, two threads are created that use a shared memory buffer. Access to the shared memory is synchronized using a semaphore. The primary thread (main) creates a semaphore object and uses this object to handshake with the secondary thread (thread_function). The primary thread instantiates the semaphore in a state that prevents the secondary thread from acquiring the semaphore while it is initiated.

After the user types in text at the console and presses ENTER, the primary thread relinquishes the semaphore. The secondary thread then acquires the semaphore and processes the shared memory area. At this point, the main thread is blocked waiting for the semaphore and will not resume until the secondary thread has relinquished control by calling ReleaseSemaphore.

In UNIX, the semaphore object functions of sem_post and sem_wait are all that are required to perform handshaking. With Windows, you must use a combination of WaitForSingleObject and ReleaseSemaphore in both the primary and the secondary threads in order to facilitate handshaking. The two solutions are also very different from a syntactic standpoint. The primary difference between their implementations is with the API calls that are used to manage the semaphore objects.

One aspect of CreateSemaphore that you need to be aware of is the last argument in its parameter list: a string specifying the name of the semaphore. You should not pass NULL for this parameter. Kernel objects, including semaphores, can be named, and all kernel object names are stored in a common namespace (on a server running Microsoft Terminal Server, there is also a namespace for each session). Because the namespace is shared, two unassociated applications could attempt to use the same name for a semaphore. To avoid such contention, applications should use a unique naming convention; one solution is to base semaphore names on globally unique identifiers (GUIDs).

Terminal Server and Naming Semaphore Objects

As mentioned earlier, Terminal Server has multiple namespaces for kernel objects. There is one global namespace, which is used by kernel objects that are accessible by any and all client sessions and is usually populated by services. Additionally, each client session has its own namespace to prevent namespace collisions between multiple instances of the same application running in different sessions.

In addition to the session and global namespaces, Terminal Server also has a local namespace. By default, the named kernel objects of an application reside in the session namespace. It is possible, however, to override what namespace will be used. This is accomplished by prefixing the name with Global\ or Local\. These prefix names are reserved by Microsoft, are case-sensitive, and are ignored if the computer is not operating as a Terminal Server.

UNIX example: Synchronization using semaphores

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

Thread Attributes

There are a number of attributes associated with threads in UNIX that you need to convert to equivalent attributes in Windows. This section contrasts the UNIX and Windows thread attributes and describes how you should convert your code. Table 3.6 lists the relevant UNIX thread attributes.

Table 3.6. UNIX Thread Attributes

detachstate
  PTHREAD_CREATE_JOINABLE: Thread may be joined by other threads.
  PTHREAD_CREATE_DETACHED: Thread may not be waited on for termination.

inheritsched
  PTHREAD_INHERIT_SCHED: Scheduling parameters, policy, and scope are inherited from the creating thread.
  PTHREAD_EXPLICIT_SCHED: Scheduling parameters for the newly created thread are specified in the thread attribute.

schedparam
  Priority set to the default for the scheduling policy.

schedpolicy
  SCHED_OTHER: Scheduling policy is determined by the system.
  SCHED_FIFO: Threads are scheduled in first-in-first-out order.
  SCHED_RR: Threads are scheduled in a round-robin fashion.

scope
  PTHREAD_SCOPE_SYSTEM: Threads are scheduled system-wide.
  PTHREAD_SCOPE_PROCESS: Threads are scheduled relative to other threads in the owning process.

stackaddr
  Attribute not supported; the address is selected by the operating system.

stacksize
  0: Stack size is inherited from the process stack size attribute.

Detachstate indicates whether a thread can be waited on for termination. In Windows, the same effect is achieved by closing any handles that exist for a given thread. Because a handle is required by the wait and thread management functions, without a handle you are effectively prevented from acting on a thread. You can also control access to thread objects through a security descriptor provided at the time the thread is created.

The handle returned by the CreateThread function has THREAD_ALL_ACCESS access to the thread object. When you call the GetCurrentThread function, the system returns a pseudohandle with the maximum access that the security descriptor of the thread allows the caller.

The valid access rights for thread objects include the DELETE, READ_CONTROL, SYNCHRONIZE, WRITE_DAC, and WRITE_OWNER standard access rights, in addition to the thread-specific access rights listed in Table 3.7.

Table 3.7. Thread-Specific Access Rights

SYNCHRONIZE: A standard right required to wait for the thread to exit.

THREAD_ALL_ACCESS: Specifies all possible access rights for a thread object.

THREAD_DIRECT_IMPERSONATION: Required for a server thread that impersonates a client.

THREAD_GET_CONTEXT: Required to read the context of a thread by using GetThreadContext.

THREAD_IMPERSONATE: Required to use the security information of a thread directly without calling it by using a communication mechanism that provides impersonation services.

THREAD_QUERY_INFORMATION: Required to read certain information from the thread object.

THREAD_SET_CONTEXT: Required to write the context of a thread.

THREAD_SET_INFORMATION: Required to set certain information in the thread object.

THREAD_SET_THREAD_TOKEN: Required to set the impersonation token for a thread.

THREAD_SUSPEND_RESUME: Required to suspend or resume a thread.

THREAD_TERMINATE: Required to terminate a thread.

Inheritsched/schedparam/schedpolicy/scope indicates that the scheduling is either inherited from the thread that created the new thread or that it is set explicitly. It also defines the policy and scope applied to scheduling threads. In Windows, by default, the priority of a thread is THREAD_PRIORITY_NORMAL.

You can use the GetThreadPriority function to determine the current priority of a thread and the SetThreadPriority function to change the priority of a thread.

Stacksize indicates the stack size applied to a thread at the time of its creation by using the CreateThread function. The initial size of the stack is specified in bytes. The system rounds this value to the nearest page. If this parameter has a zero value, the new thread uses the default stack size for the executable.

Setting Thread Attributes

This section presents a simple example of how the attributes of a thread can be set. The UNIX example makes some basic use of thread attributes. The corresponding Windows example does not need attributes to accomplish the same functionality. All that is required in Windows is to create a thread that cannot be acted upon by a wait. This can be accomplished by passing NULL as the lpThreadId parameter to the CreateThread function and closing the handle that the call returns.

The net effect of these combined activities is to hinder the ability of an application to manage the thread. This issue is addressed in the “Thread Scheduling and Prioritizing” section later in this chapter.

UNIX example: Setting thread attributes

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

Windows Security and Thread Objects

Threads are kernel objects that are protected by Windows security; therefore, a process must request permission before attempts are made to manipulate an object. The creator of the object can deny access to an unauthorized user.

Object flags are covered as part of the thread discussion here, but this information also pertains to other kernel objects that are obtained by using one of the Windows Create functions.

Until now, threads have been created in these solutions with a NULL security attribute. This indicates that the thread should be created using the default security and that the returned handle cannot be inherited. If you want to change the behavior of the previous example to prevent the thread handle from being inherited or closed, you can use the SetHandleInformation function as follows:

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

To change both flags in a single call, you should join the flags by using a bitwise OR operator. After this call, attempting to close the handle by using the CloseHandle function results in an exception being raised.

Thread Scheduling and Prioritizing

This section looks at how you can change the scheduling priority of a thread in UNIX and Windows.

Ideally, you want to map Windows priority classes to UNIX scheduling policies and Windows thread priority levels to UNIX priority levels, but unfortunately, it is not this simple. The priority level of a Windows thread is determined by both the priority class of its process and its priority level. The priority class and priority level are combined to form the base priority of each thread.

Every thread in Windows has a base priority level determined by the priority value of the thread and the priority class of its owning process. The operating system uses the base priority level of all executable threads to determine which thread gets the next slice of CPU time. Threads are scheduled in a round-robin fashion at each priority level, and scheduling of threads at a lower level will only take place when there are no executable threads at a higher level.

UNIX offers both round-robin and FIFO scheduling algorithms, whereas Windows uses only round-robin. This does not mean that Windows is less flexible; it just means that any fine-tuning that was performed on thread scheduling in UNIX has to be implemented differently when using Windows. Table 3.8 lists the base priority levels for combinations of priority class and priority value for Windows.

Table 3.8. Process and Thread Priority for Windows

Base priority | Process Priority Class | Thread Priority Level
1 | IDLE_PRIORITY_CLASS | THREAD_PRIORITY_IDLE
1 | BELOW_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_IDLE
1 | NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_IDLE
1 | ABOVE_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_IDLE
1 | HIGH_PRIORITY_CLASS | THREAD_PRIORITY_IDLE
2 | IDLE_PRIORITY_CLASS | THREAD_PRIORITY_LOWEST
3 | IDLE_PRIORITY_CLASS | THREAD_PRIORITY_BELOW_NORMAL
4 | IDLE_PRIORITY_CLASS | THREAD_PRIORITY_NORMAL
4 | BELOW_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_LOWEST
5 | IDLE_PRIORITY_CLASS | THREAD_PRIORITY_ABOVE_NORMAL
5 | BELOW_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_BELOW_NORMAL
5 | Background NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_LOWEST
6 | IDLE_PRIORITY_CLASS | THREAD_PRIORITY_HIGHEST
6 | BELOW_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_NORMAL
6 | Background NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_BELOW_NORMAL
7 | BELOW_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_ABOVE_NORMAL
7 | Background NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_NORMAL
7 | Foreground NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_LOWEST
8 | BELOW_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_HIGHEST
8 | NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_ABOVE_NORMAL
8 | Foreground NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_BELOW_NORMAL
8 | ABOVE_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_LOWEST
9 | NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_HIGHEST
9 | Foreground NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_NORMAL
9 | ABOVE_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_BELOW_NORMAL
10 | Foreground NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_ABOVE_NORMAL
10 | ABOVE_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_NORMAL
11 | Foreground NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_HIGHEST
11 | ABOVE_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_ABOVE_NORMAL
11 | HIGH_PRIORITY_CLASS | THREAD_PRIORITY_LOWEST
12 | ABOVE_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_HIGHEST
12 | HIGH_PRIORITY_CLASS | THREAD_PRIORITY_BELOW_NORMAL
13 | HIGH_PRIORITY_CLASS | THREAD_PRIORITY_NORMAL
14 | HIGH_PRIORITY_CLASS | THREAD_PRIORITY_ABOVE_NORMAL
15 | HIGH_PRIORITY_CLASS | THREAD_PRIORITY_HIGHEST
15 | HIGH_PRIORITY_CLASS | THREAD_PRIORITY_TIME_CRITICAL
15 | IDLE_PRIORITY_CLASS | THREAD_PRIORITY_TIME_CRITICAL
15 | NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_TIME_CRITICAL
15 | BELOW_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_TIME_CRITICAL
15 | ABOVE_NORMAL_PRIORITY_CLASS | THREAD_PRIORITY_TIME_CRITICAL
16 | REALTIME_PRIORITY_CLASS | THREAD_PRIORITY_IDLE
17 | REALTIME_PRIORITY_CLASS | -7
18 | REALTIME_PRIORITY_CLASS | -6
19 | REALTIME_PRIORITY_CLASS | -5
20 | REALTIME_PRIORITY_CLASS | -4
21 | REALTIME_PRIORITY_CLASS | -3
22 | REALTIME_PRIORITY_CLASS | THREAD_PRIORITY_LOWEST
23 | REALTIME_PRIORITY_CLASS | THREAD_PRIORITY_BELOW_NORMAL
24 | REALTIME_PRIORITY_CLASS | THREAD_PRIORITY_NORMAL
25 | REALTIME_PRIORITY_CLASS | THREAD_PRIORITY_ABOVE_NORMAL
26 | REALTIME_PRIORITY_CLASS | THREAD_PRIORITY_HIGHEST
27 | REALTIME_PRIORITY_CLASS | 3
28 | REALTIME_PRIORITY_CLASS | 4
29 | REALTIME_PRIORITY_CLASS | 5
30 | REALTIME_PRIORITY_CLASS | 6
31 | REALTIME_PRIORITY_CLASS | THREAD_PRIORITY_TIME_CRITICAL

Managing Thread Priorities in Windows

The Windows API provides a number of functions for managing thread priorities, including the following:

GetThreadContext. This function returns the execution context of the specified thread. The following is an example showing the thread context:

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

GetThreadPriority. This function returns the assigned thread priority level for the specified thread. To see how thread priority affects the system, a simple test, such as the one that follows, could be added to a simple Windows application:

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

GetThreadPriorityBoost. This function retrieves the priority boost control state of the specified thread. Threads have dynamic priority, which is the priority that the scheduler uses to identify which thread will execute. Initially, the priority of a thread is the same as its base priority, but the system may increase or decrease the priority to maintain thread responsiveness. Only threads with a priority between 0 and 15 are eligible for dynamic priority boost.

The system boosts the dynamic priority of a thread to enhance its responsiveness as follows:

When a process that uses NORMAL_PRIORITY_CLASS is brought to the foreground, the scheduler boosts the priority class of the process associated with the foreground window so that it is equal to or greater than the priority class of any background processes. The priority class returns to its original setting when the process is no longer in the foreground. In Microsoft Windows, the user can control the boosting of processes that use NORMAL_PRIORITY_CLASS through Control Panel.

When a window receives input, such as timer messages, mouse messages, or keyboard input, the scheduler boosts the priority of the thread that owns the window.

When the wait conditions for a blocked thread are satisfied, the scheduler boosts the priority of the thread. For example, when a wait operation associated with disk or keyboard I/O finishes, the thread receives a priority boost.

SetThreadIdealProcessor. This function specifies the preferred processor for a specific thread. The system schedules threads on the preferred processor when possible.

SetThreadPriority. This function changes the priority level for a thread. For details on the different priority levels, see the Windows API reference.

SetThreadPriorityBoost. This function enables or disables dynamic priority boosts by the system.

Example of Converting UNIX Thread Scheduling into Windows

In this example, the thread priority level is set to the lowest level within the given policy or class for UNIX and Windows respectively. For UNIX, lowering the thread priority level requires creating an attribute object prior to instantiating the thread and then setting the policy of the attribute object. After this activity is complete, the thread is created with the modified attribute. Upon successful instantiation of the thread, the priority level is adjusted to the lowest level within the designated policy and class. In UNIX, this is accomplished by a call to pthread_attr_setschedparam. When using the Windows API, it is accomplished by a call to SetThreadPriority.

UNIX example: Thread scheduling

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

In the preceding Windows example, the priority level of the thread is adjusted to the lowest level within the priority class of the owning process. If you want to change the priority class as well as the priority level, insert the following code just before the SetThreadPriority call:

SetPriorityClass(GetCurrentProcess(), PriorityClass);

where PriorityClass is one of the values shown in Table 3.9.

Table 3.9 summarizes how to change the scheduling priority for a thread and priority class for the owning process.

HIGH_PRIORITY_CLASS

Specify this class for a process that performs time-critical tasks that must be executed immediately. The threads of the process preempt the threads of normal-priority or idle-priority-class processes. An example is the Task List, which must respond quickly when called by the user, regardless of the load on the operating system. Use extreme care with the high-priority class, because a high-priority-class application can use nearly all available CPU time.

IDLE_PRIORITY_CLASS

Specify this class for a process whose threads run only when the system is idle. The threads of the process are preempted by the threads of any process running in a higher priority class. An example is a screen saver. The idle-priority class is inherited by child processes.

NORMAL_PRIORITY_CLASS

Specify this class for a process with no special scheduling needs.

REALTIME_PRIORITY_CLASS

Specify this class for a process that has the highest possible priority. The threads of the process preempt the threads of all other processes, including the operating system processes, which may be performing important tasks. For example, a real-time process that executes for more than a very brief interval can prevent disk caches from flushing or can cause the mouse to be unresponsive.

Managing Multiple Threads

In the next two examples, numerous threads are created that terminate at random times. Their terminations are then caught, and messages are displayed to indicate their termination status. Although these examples are contrived, they illustrate one key point: the semantics of creating multiple threads and waiting for their completion are similar on both platforms.

UNIX example: Multiple threads

Note: Some of the lines in the following code have been displayed on multiple lines for better readability.

I/O Completion Ports

There should always be enough live threads to fully use the available CPUs, but there should never be so many threads that the overhead becomes too large. Multiplexing a large number of clients across a smaller number of live threads is difficult for an application to do. The application cannot always know when a given thread is going to block; and without this knowledge, it cannot activate another thread to take its place. To solve this problem and make it easy for programmers to write efficient and scalable applications, Windows provides a mechanism called the I/O completion port.

An I/O completion port is designed for use with overlapped I/O. A completion port is created with the CreateIoCompletionPort function.

The CreateIoCompletionPort function associates the port with multiple file handles. When asynchronous I/O initiated on any of these file handles completes, an I/O completion packet is queued to the port. This combines the synchronization point for multiple file handles into a single object. If each file handle represents a connection to a client (usually through a named pipe or socket), a handful of threads can manage I/O for any number of clients by waiting on the I/O completion port. Instead of directly waiting for overlapped I/O to complete, these threads use GetQueuedCompletionStatus to wait on the I/O completion port. Any thread that waits on a completion port becomes associated with that port. The Windows kernel keeps track of the threads associated with an I/O completion port.

The WaitForMultipleObjects function can produce similar behavior, but the most important property of I/O completion ports is the controllable concurrency they provide. The concurrency value of an I/O completion port is specified when it is created. This value limits the number of runnable threads associated with the port; once the number of runnable threads reaches the concurrency value, additional threads are blocked. As a result, when a thread calls the GetQueuedCompletionStatus function, it returns only when a completed I/O is available and the number of runnable threads associated with the completion port is less than the concurrency of the port. Because there is one central synchronization point for all the I/O, a small pool of worker threads can service many clients.

Unlike the other Windows synchronization objects, threads that block on an I/O completion port, by using GetQueuedCompletionStatus(), unblock in last-in-first-out (LIFO) order.

With thread pooling, a dozen threads can easily service a large set of clients, although the exact number will vary depending on how often each transaction needs to wait.