Thread Scheduling for
Multiprogrammed Multiprocessors

Robert D. Blumofe

The University of Texas at Austin

I will present an efficient user-level thread scheduler for shared-memory
multiprocessors and an analysis of its performance under multiprogramming.
This scheduler is a non-blocking implementation of the work-stealing algorithm.
Idle processes (kernel threads) steal (user-level) threads from randomly
chosen victims, and all concurrent data structures are implemented with
non-blocking synchronization. Without any need for special kernel-level
resource management, such as coscheduling or process control, this non-blocking
work stealer efficiently utilizes whatever processor resources are provided
by the kernel.

We demonstrate this efficiency with an algorithmic analysis and an empirical
analysis. For our algorithmic analysis, we assume that the kernel is an
adversary, and we prove that the execution time is optimal to within a
constant factor. We have implemented the non-blocking work stealer in Hood:
a C++ threads library built on top of Solaris pthreads, and we have studied
its performance. This study shows that application performance does conform
to the theoretical bound with a very small constant factor, roughly 1.
Applications efficiently utilize processor resources even when the number
of processes exceeds the number of processors and even when the number
of processors grows and shrinks arbitrarily.

This work has been done in collaboration with Nimar Arora, Dionisios
Papadopoulos, and Greg Plaxton of The University of Texas at Austin.