So, I wanted a nice place to tear down TLS data, and found (yet another) piece of Win32 that isn’t very nicely designed.

Whilst the (newer) fiber-local storage allows you to specify a clean-up callback, the same isn’t true of TLS. So if you want to write a library that uses TLS behind the scenes, you have an annoying problem: either leak the TLS data whenever a thread exits, or require the library user to explicitly call a clean-up function.

One solution is to package your code as a DLL. DLLs get notified when each thread exits, providing a neat place to tear down the TLS data. But sometimes, I want a static .lib instead of a DLL. For installer-less programs, I typically want everything statically linked into a single .exe.

There are also situations where I’m writing a program (not a library) but don’t have any particular control over threads; for example, when using thread pooling. Yes, my work items could each tear down the TLS data, but that’s not necessarily what I want. Thread-specific caches (for memory allocators, say) shouldn’t be torn down at the end of each work item: they should be allowed to exist right up until the thread pool ends the thread.

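Neither of these is particularly appealing. However, it turns out that the PE file format has a solution. A PE image can carry a TLS directory (the data behind the .tls section), and that directory includes a pointer to a null-terminated array of callback functions. The loader calls them with the same arguments as DllMain, and for the same attach and detach events, so they are essentially equivalent to the DllMain callbacks.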

So, to get automatic cleaning of TLS, we just need to get a suitable callback into the executable’s TLS callback array.

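The difficult bit is getting the linker to put our function pointers into that array. The VC++ C Runtime actually knows about TLS callbacks (even though it doesn’t generally use them), and so we can get our callbacks registered by putting function pointers in a section named .CRT$XLx. The linker merges these sections in alphabetical order of suffix, and the CRT keeps its own start and end markers in .CRT$XLA and .CRT$XLZ, so in practice x should be B to Y.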
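To make that concrete, here is a rough sketch of how the registration might look with MSVC. It is not the code from my project: cache_init, cache_get, on_thread_event and p_thread_callback are names made up for illustration, and the pragma incantation is the commonly circulated one rather than anything officially documented. The whole thing is guarded by _MSC_VER because the .CRT$XLx trick is specific to that toolchain.

```c
/* MSVC-only sketch: the .CRT$XLx sections and these pragmas are toolchain-specific. */
#if defined(_MSC_VER)
#include <windows.h>

/* Hypothetical library state: one TLS slot holding a heap-allocated cache. */
static DWORD g_cache_slot = TLS_OUT_OF_INDEXES;

/* Hypothetical init, assumed to be called once before any thread uses the cache. */
BOOL cache_init(void)
{
    g_cache_slot = TlsAlloc();
    return g_cache_slot != TLS_OUT_OF_INDEXES;
}

/* Hypothetical accessor: lazily allocates the calling thread's cache. */
void *cache_get(SIZE_T size)
{
    void *cache;
    if (g_cache_slot == TLS_OUT_OF_INDEXES)
        return NULL;
    cache = TlsGetValue(g_cache_slot);
    if (cache == NULL) {
        cache = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, size);
        TlsSetValue(g_cache_slot, cache);
    }
    return cache;
}

/* Same arguments as DllMain; the loader calls this on attach/detach events. */
static void NTAPI on_thread_event(PVOID module, DWORD reason, PVOID reserved)
{
    void *cache;
    (void)module;
    (void)reserved;
    if (reason != DLL_THREAD_DETACH || g_cache_slot == TLS_OUT_OF_INDEXES)
        return;
    cache = TlsGetValue(g_cache_slot);
    if (cache != NULL) {
        HeapFree(GetProcessHeap(), 0, cache);
        TlsSetValue(g_cache_slot, NULL);
    }
}

/* Force the linker to keep the TLS directory and our pointer, then place the
   pointer between the CRT's .CRT$XLA and .CRT$XLZ markers. Symbol names carry
   an extra leading underscore on 32-bit x86. */
#ifdef _WIN64
#pragma comment(linker, "/INCLUDE:_tls_used")
#pragma comment(linker, "/INCLUDE:p_thread_callback")
#pragma const_seg(".CRT$XLB")
EXTERN_C const PIMAGE_TLS_CALLBACK p_thread_callback = on_thread_event;
#pragma const_seg()
#else
#pragma comment(linker, "/INCLUDE:__tls_used")
#pragma comment(linker, "/INCLUDE:_p_thread_callback")
#pragma data_seg(".CRT$XLB")
EXTERN_C PIMAGE_TLS_CALLBACK p_thread_callback = on_thread_event;
#pragma data_seg()
#endif

#endif /* _MSC_VER */
```

The sketch only handles DLL_THREAD_DETACH. Whatever the thread that runs process shutdown still holds would need handling under DLL_PROCESS_DETACH, if you care about it at that point. The /INCLUDE directives matter: without them the linker is free to discard both the TLS directory and the otherwise unreferenced pointer, and the callback silently never runs.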

Something that’s irked me throughout this GCD project is just how bad Microsoft’s documentation is in a lot of places.

On the whole, I quite like MS’s documentation. The split between the “conceptual” and the reference material and the abundance of sample code are both good things. A lot of it is pretty detailed, too.

But then, you also have documentation like this. Great. It sets it to run on a persistent thread. I could have guessed that from the function name. What I would like to know is, what are the implications of this? Why is this something I care about? What makes it useful?

Documentation that merely reiterates the function name is not proper documentation.

I am finding that this style of documentation is increasingly common, especially in Microsoft’s newer APIs. The older stuff is generally much richer. Not always as good as it should be—I still find the overlapped I/O docs awful, for example—but much better. Most importantly, there’s at least an effort to explain what a function call does that goes above and beyond the name of the damn thing.

We see the same thing in quite a bit of .NET documentation, too. I was recently looking at the Visual Studio 2010 SDK documentation (used for writing add-ins), and came across the same thing: documentation that repeats the function or class name, but adds no value.

I realize that documentation isn’t sexy, and that developers don’t like writing it, but that’s not really good enough for a company like Microsoft.

The essence of GCD is pretty simple, but there are enough wrinkles (target queues and suspension) that a naive implementation won’t work. I can’t just dump things into the pool and forget about them, so instead I have to maintain my own queues of work items and plop them into the pool myself.
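To show why a fire-and-forget approach falls over, here is a minimal, single-threaded sketch of the bookkeeping involved. It is an illustration under my own assumptions, not my actual implementation and certainly not Apple’s: every name in it (queue, queue_async, pool_submit and so on) is invented, there is no locking, and pool_submit just runs the item inline where a real version would hand it to the thread pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*work_fn)(void *ctx);

typedef struct work_item {
    work_fn fn;
    void *ctx;
    struct work_item *next;
} work_item;

typedef struct queue {
    work_item *head, *tail;  /* FIFO of items not yet handed to the pool */
    int suspend_count;       /* greater than zero means suspended */
    struct queue *target;    /* NULL for a root queue */
} queue;

/* Stand-in for submitting to the real thread pool: runs the item inline. */
static void pool_submit(work_item *item)
{
    item->fn(item->ctx);
    free(item);
}

/* A queue may release work only if it and every queue it targets is active. */
static int queue_is_runnable(const queue *q)
{
    for (; q != NULL; q = q->target)
        if (q->suspend_count > 0)
            return 0;
    return 1;
}

/* Hand pending items to the pool in order; returns how many were released. */
static int queue_drain(queue *q)
{
    int released = 0;
    while (q->head != NULL && queue_is_runnable(q)) {
        work_item *item = q->head;
        q->head = item->next;
        if (q->head == NULL)
            q->tail = NULL;
        pool_submit(item);
        released++;
    }
    return released;
}

/* Enqueue an item, then release whatever is allowed to run.
   Returns the number of items released, or -1 on allocation failure. */
static int queue_async(queue *q, work_fn fn, void *ctx)
{
    work_item *item;
    assert(q != NULL && fn != NULL);
    item = malloc(sizeof *item);
    if (item == NULL)
        return -1;
    item->fn = fn;
    item->ctx = ctx;
    item->next = NULL;
    if (q->tail != NULL)
        q->tail->next = item;
    else
        q->head = item;
    q->tail = item;
    return queue_drain(q);
}

static void queue_suspend(queue *q)
{
    q->suspend_count++;
}

static int queue_resume(queue *q)
{
    if (q->suspend_count > 0)
        q->suspend_count--;
    return queue_drain(q);
}

/* Example work function: increments the int that ctx points at. */
static void bump(void *ctx)
{
    ++*(int *)ctx;
}
```

If a child queue targets a suspended root, queue_async on the child parks the item rather than running it. That is exactly the case where dumping straight into the pool would get it wrong. The sketch also exposes one of the wrinkles: resuming the root does nothing for the child’s parked items, because the root has no record of which queues target it, so something still has to call queue_drain on the child afterwards.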

Apple’s implementation has the same issue; behind the scenes it uses Apple’s proprietary “pthread_workqueues”, but it layers its own queuing on top.

It’s an interesting experience. One thing I’ve found is that for all the noise that people made about GCD when Snow Leopard was in beta, there’s vanishingly little content about it on the Web. Either nobody is actually using it (which is a pity, because it enables quite a nice coding style for GUI applications), their uses are all extremely simple (such that they only use a relatively small subset of its features), or they’re just not talking about it.

The last of these seems a bit strange, because some of its features aren’t entirely obvious. The utility of the ability to set target queues, in particular, is not at all clear.

I wonder if perhaps it’s not useful as such, but rather a feature they added for their NSOperationQueue stuff, which layers on top of GCD to provide some richer capabilities. NSOperationQueue is used to run NSOperation objects, and those objects can have dependencies on other NSOperations. This allows a DAG of NSOperations to be constructed. My speculation is that perhaps it constructs an equivalent DAG of GCD queues, and this is why GCD has the targeting capability. But I don’t really know for sure.

The big sticking point with any implementation will be dispatch sources. Windows has no direct equivalent (its asynchronous/overlapped I/O mechanism works in a fundamentally different way), so it’s going to require some thinking. It would be nice to be able to support dispatch sources for file/socket I/O, and something similar should be feasible. But that will have to wait until I have the fundamentals done.