You are in a maze of twisty little passages, all alike… (a GHC hacking post)

About a month ago I decided that it would be cool to fix the bug "GHC's runtime never terminates unused worker threads." Well, I just got around to looking at it today, and after wandering aimlessly around the twisty maze that is the GHC RTS for an hour or so, I finally found a light at the end of a tunnel, in the form of a heart-warmingly simple patch. I’ve sent mail off to Simon Marlow to make sure the light isn’t actually a train, but it occurred to me that it would be interesting to look at my command history and blog about the process by which I concluded that line 464 of Capability.c was the correct place to add my change, since this sort of mental journey isn’t really ever recorded anywhere in any shape or form.

Warmups before running the maze. In a shifty maze like GHC, you want to make sure the guided route (i.e. a clean build) is working before trying anything fancy. I use a build tree separate from my source tree, so getting everything up to date involves pulling the latest patches into the source tree and then kicking off a clean build.

When this has been resolved in a satisfactory manner (a non-trivial task on some platforms, Windows in particular), the code hunting can begin.

Grab your equipment. What? You mean to say you’ve wandered into this maze and you don’t even know how to tell you’ve gotten to your destination? That’s no good... you’ll need a dowsing rod of some sort... something to tell you when you’ve got it right.

In this particular case, the original bug reporter had written up a small, incomplete test script, so the first thing I did was flesh it out into a script that required no human interaction. The benchmark for the new script was clear: /proc/PID/task should report a number substantially smaller than 200. To see that the current implementation is broken:

Getting your bearings. Ok, so what do we want? We want threads to die instead of hanging around. There are two ways to do this: have the thread commit seppuku when it realizes it isn’t wanted, or have some manager kill the thread as necessary. The latter is generally considered poor form, since you can’t easily ensure that the thread isn’t in the middle of something critical that will get corrupted if it dies. So seppuku it is. Here, now, there are two questions:

1. When does the thread decide to go into a waiting pool? This is presumably where we’d want it to terminate itself instead.

2. How does the thread decide whether it should hang around or bug out?

Mapping out the land. GHC has this little runtime flag called -Ds. It’s pretty useful: it dumps out a whole gaggle of debug information concerning threads, which is precisely what we’d like to look for. Our plan of action is to look at what the thread activity looks like in our test script, and identify the points at which threads should be dying instead of hanging around.

Note the number b75006d0; that’s our main thread and it’s going to be quite a busy beaver. Here is the very first thread we spin off to make a foreign call, but it finishes fairly quickly and isn’t the foreign call we are looking for:

The thread stops, but it doesn’t die; it just gives up the capability. These two events are extremely good candidates for points where the thread might instead decide to kill itself.

Placemarkers. It’s time to bust out the trusty old grep and figure out where these debug messages are being emitted from. Unfortunately, 5 and finished are probably dynamically generated messages, so stopped is the only real identifier. Fortunately, that’s specific enough for me to find the right line in the RTS:

ezyang@javelin:~/Dev/ghc-clean/rts$ grep -R stopped .
./Capability.c: // list of this Capability. A worker can mark itself as stopped,
./Capability.c: if (!isBoundTask(task) && !task->stopped) {
./RaiseAsync.c: - all the other threads in the system are stopped (eg. during GC).
./RaiseAsync.c: // if we got here, then we stopped at stop_here
./Task.c: if (task->stopped) {
./Task.c: task->stopped = rtsFalse;
./Task.c: task->stopped = rtsFalse;
./Task.c: task->stopped = rtsTrue;
./Task.c: task->stopped = rtsTrue;
./Task.c: debugBelch("task %p is %s, ", taskId(task), task->stopped ? "stopped" : "alive");
./Task.c: if (!task->stopped) {
./sm/GC.c: // The other threads are now stopped. We might recurse back to
./Schedule.c: "--<< thread %ld (%s) stopped: requesting a large block (size %ld)\n",
./Schedule.c: "--<< thread %ld (%s) stopped to switch evaluators",
./Schedule.c: // stopped. We need to stop all Haskell threads, including
./Trace.c: debugBelch("cap %d: thread %lu stopped (%s)\n", ### THIS IS THE ONE
./Task.h: rtsBool stopped; // this task has stopped or exited Haskell
./Task.h:// Notify the task manager that a task has stopped. This is used
./Task.h:// Put the task back on the free list, mark it stopped. Used by
./Interpreter.c: // already stopped at just now
./Interpreter.c: // record that this thread is not stopped at a breakpoint anymore
./win32/Ticker.c: // it still hasn't stopped.

That line in Trace.c is actually in a generic debugging function, traceSchedEvent_stderr, but fortunately there’s a big case statement on one of its arguments, tag:

Going digging. We first have to pick which site to inspect more closely. Fortunately, we notice that the second trace event corresponds to suspending the thread before making a safe FFI call; that’s certainly not what we’re looking for here. The first, in contrast, is in the scheduler, which makes a lot of sense. But there’s nothing obvious in its vicinity that you might associate with stashing a worker task away due to lack of work.

What about that giving up capability message? Some more grepping reveals it to be in the yieldCapability function (like one might expect). If we then trace backwards calls to yieldCapability, we see it is invoked by scheduleYield, which is in turn called by the scheduler loop:

scheduleYield(&cap,task);
if (emptyRunQueue(cap)) continue; // look for work again
// Get a thread to run
t = popRunQueue(cap);

This is very, very interesting. It suggests that the capability itself will tell us whether or not there is work to do, and that yieldCapability is a promising function to look into further:

debugTrace(DEBUG_sched, "giving up capability %d", cap->no);
// We must now release the capability and wait to be woken up
// again.
task->wakeup = rtsFalse;
releaseCapabilityAndQueueWorker(cap);

That last call looks intriguing:

static void
releaseCapabilityAndQueueWorker (Capability* cap USED_IF_THREADS)
{
    Task *task;

    ACQUIRE_LOCK(&cap->lock);
    task = cap->running_task;

    // If the current task is a worker, save it on the spare_workers
    // list of this Capability. A worker can mark itself as stopped,
    // in which case it is not replaced on the spare_worker queue.
    // This happens when the system is shutting down (see
    // Schedule.c:workerStart()).
    if (!isBoundTask(task) && !task->stopped) {
        task->next = cap->spare_workers;
        cap->spare_workers = task;
    }
    // Bound tasks just float around attached to their TSOs.

    releaseCapability_(cap,rtsFalse);
    RELEASE_LOCK(&cap->lock);
}

We’ve found it!

Checking the area. The spare_workers queue looks like the queue in which worker threads without anything to do go to chill out. We should verify that this is the case:

Writing up the solution. So, the patch from here is simple, since we’ve found the correct location. We check whether the queue of spare workers has already reached some limit, and if it has, instead of saving ourselves to the queue we just clean up and then kill ourselves:

Postscript. There are some obvious deficiencies with this proof of concept. It’s not portable. We need to convince ourselves that it truly does all of the cleanup that the RTS expects a worker to do. Maybe our data representation could be more efficient (we certainly don’t need a linked list if the number of values we’ll be storing is fixed). But these are questions best answered by someone who knows the RTS better, so at this point I sent in the proof of concept for further review. Fingers crossed!