Dr. Dobb's Bloggers

Implementing Thread Local Storage on OS X

Global data in a program is, well, global, and is accessible to any thread started up by that program. This is called implicit sharing. But since access to it is not inherently synchronized, all kinds of inadvertent threading problems can arise from using it.

One way to resolve this is for each thread to have its own distinct and separate copy of the data. This is called TLS (Thread Local Storage). In C, TLS is declared in different ways depending upon your particular compiler and system. For example, under Windows it is typical to see:

__declspec(thread) int x = 3;

and on Linux:

__thread int x = 3;

There's special support for TLS wired into the compiler, linker, and operating system to handle the allocation and initialization of, and access to, TLS, so your source code can refer to x just like any other global variable. The benchmarks I've run on Windows and Linux show that TLS access is not significantly different in performance from regular global data access (although the code generated is different, and varies from system to system).
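To make "its own distinct and separate copy" concrete, here is a minimal C sketch using the __thread form shown above together with POSIX threads. It assumes a target where __thread is supported (Linux with gcc, for instance); the helper names bump, run_in_new_thread, and x_in_this_thread are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Every thread gets its own copy of x, initialized to 3. */
static __thread int x = 3;

/* Thread body: modify this thread's x and report what it ended up as. */
static void *bump(void *result)
{
    x += 10;                    /* touches only this thread's copy */
    *(int *)result = x;
    return NULL;
}

/* Run bump() in a fresh thread and return the value of x it saw. */
int run_in_new_thread(void)
{
    pthread_t t;
    int seen = 0;
    int rc = pthread_create(&t, NULL, bump, &seen);
    assert(rc == 0);            /* sketch only: no real error handling */
    (void)rc;
    pthread_join(t, NULL);
    return seen;
}

/* Value of x as seen by the calling thread. */
int x_in_this_thread(void)
{
    return x;
}
```

Each call to run_in_new_thread() returns 13, because every new thread starts from a fresh x of 3, while x_in_this_thread() in the caller still returns 3.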

Version 2 of the D programming language reverses this idiom and makes TLS the default for global data:

int x = 3; // x is thread local

and implicitly shared global data has to be declared specially:

shared int x = 3; // x is shared between threads

The idea is that in order to support multithreaded programming properly, implicitly shared data should require extra programming effort. But that's a topic for another time; this article is about implementing TLS.

D is meant to work hand-in-glove with the local C compiler on the same system, which means D implements TLS in the same manner as, and is completely compatible with, TLS in the C compiler. The problem, though, crops up on Mac OS X, where:

__thread int x = 3;

gives the error message from gcc:

error: thread-local storage not supported for this target

Uh-oh, weez in trouble. We need to roll our own.

The general idea is that every thread, when created, gets its own copy of a block of data that is statically initialized at program start. This original block is read only and is not normally accessible by program code.

The first step is to emit all the thread local data into special segments, which will form this original block. I picked the names __tls_data (for initialized data) and __tls_coal_nt (for "coalesced" data) more or less at random. Because the runtime needs to find the beginning and end of the TLS data, these are bracketed by the segments __tls_beg, which contains a declaration of a void*-sized _tls_beg, and __tls_end, which similarly contains a declaration of a void*-sized _tls_end. We can refer to them in D code with the declarations:

extern (C)
{
    extern __gshared
    {
        void* _tls_beg;
        void* _tls_end;
    }
}

(The __gshared storage class exists to tell the compiler this is shared data but not to type it as shared - let the programmer handle all the details.)

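When a thread starts, the runtime allocates a block the size of the span from _tls_beg to _tls_end and copies the original block into it. A scope(exit) statement makes sure to free the TLS data when the thread ends.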

(There's a little more falderol than that, involving the TLS constructors and destructors, handling exceptions, etc., but that's beyond the scope of this article.)
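The thread-start step can be sketched in C roughly as follows. This is not the D runtime's code: a plain array named tls_image stands in for the region the linker lays out between _tls_beg and _tls_end, the struct thread_obj and the function names are hypothetical, and since C has no scope(exit) the copy is freed by hand at the end of the thread body.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the read-only original block (an assumption made so the
   sketch is self-contained; the real block is bracketed by the linker). */
static const int tls_image[] = { 3, 42 };
#define TLS_BEG ((const char *)tls_image)
#define TLS_END ((const char *)tls_image + sizeof tls_image)

/* Hypothetical per-thread record. */
struct thread_obj {
    void  *tls;       /* this thread's private copy of the block */
    size_t tls_size;  /* size of that copy in bytes */
    int    result;    /* filled in by the thread body, for the demo */
};

static void *thread_entry(void *arg)
{
    struct thread_obj *obj = arg;

    /* Allocate this thread's block and copy the original into it. */
    obj->tls_size = (size_t)(TLS_END - TLS_BEG);
    obj->tls = malloc(obj->tls_size);
    if (obj->tls == NULL)
        return NULL;
    memcpy(obj->tls, TLS_BEG, obj->tls_size);

    /* The thread's work: it changes only its private copy. */
    int *mine = obj->tls;
    mine[0] += 1;
    obj->result = mine[0];

    /* Free the copy when the thread ends. */
    free(obj->tls);
    obj->tls = NULL;
    return NULL;
}

/* Start one thread and return the first TLS slot as that thread left it. */
int run_thread_with_private_tls(void)
{
    struct thread_obj obj = { NULL, 0, 0 };
    pthread_t t;
    int rc = pthread_create(&t, NULL, thread_entry, &obj);
    assert(rc == 0);  /* sketch only: no real error handling */
    (void)rc;
    pthread_join(t, NULL);
    return obj.result;
}
```

Calling run_thread_with_private_tls() repeatedly returns 4 every time, since each thread starts from an untouched copy of the original block.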

Accessing the TLS now involves a runtime computation: figure out where the TLS for the current thread is, compute its offset from where the original block is, and add that offset to the static address of the TLS variable. Thus, a reference to x is rewritten as:

x => *__tls_get_addr(&x)

The compiler is done. (Weeell, the compiler does note that __tls_get_addr() always returns the same value for the same argument, so it is smart enough to apply common subexpression elimination to references to x.) What remains is for the runtime to implement __tls_get_addr().

All this function does is get the thread object (obj) for the current thread, look up where the TLS data is for that thread, and do a little arithmetic to offset and return the pointer.
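A C sketch of that arithmetic might look like the following. The assumptions are worth spelling out: the function is named my_tls_get_addr to avoid clashing with any system symbol, a plain array tls_image stands in for the original block, the current thread's object is found with pthread_getspecific, and the per-thread copy is created lazily on first access rather than at thread start. None of this is taken from the D runtime source.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the read-only original block; slot 0 plays the role of x. */
static const int tls_image[] = { 3, 42 };

/* Hypothetical thread object holding this thread's copy of the block. */
struct thread_obj { char tls[sizeof tls_image]; };

static pthread_key_t  obj_key;
static pthread_once_t obj_once = PTHREAD_ONCE_INIT;

/* free() releases a thread's object when that thread exits. */
static void make_key(void) { pthread_key_create(&obj_key, free); }

/* Find the current thread's object, creating it on first use. */
static struct thread_obj *current_thread_obj(void)
{
    pthread_once(&obj_once, make_key);
    struct thread_obj *obj = pthread_getspecific(obj_key);
    if (obj == NULL) {
        obj = malloc(sizeof *obj);
        assert(obj != NULL);  /* sketch only: no real error handling */
        memcpy(obj->tls, tls_image, sizeof tls_image);
        pthread_setspecific(obj_key, obj);
    }
    return obj;
}

/* Map a static address inside the original block to the matching
   address inside the current thread's copy. */
void *my_tls_get_addr(const void *p)
{
    struct thread_obj *obj = current_thread_obj();
    size_t offset = (size_t)((const char *)p - (const char *)tls_image);
    assert(offset < sizeof tls_image);
    return obj->tls + offset;
}

static void *worker(void *result)
{
    int *x = my_tls_get_addr(&tls_image[0]);  /* x => *my_tls_get_addr(&x) */
    *x += 10;
    *(int *)result = *x;
    return NULL;
}

/* Bump x in a fresh thread and return the value that thread saw. */
int x_after_bump_in_new_thread(void)
{
    pthread_t t;
    int seen = 0;
    int rc = pthread_create(&t, NULL, worker, &seen);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return seen;
}

/* Value of x in the calling thread. */
int x_in_calling_thread(void)
{
    return *(int *)my_tls_get_addr(&tls_image[0]);
}
```

The new thread sees 13 while the calling thread still sees 3, and repeated calls from one thread with the same argument return the same pointer, which is what lets the compiler reuse the result.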

Voila! It's done, and works like a champ. Almost; my benchmarks show it to be 10 times slower than a simple access to a shared global. (The benchmark is just a simple loop repeatedly reading a TLS variable.) I don't think there's much to be done about it, because the whole reason that Windows and Linux have TLS support wired into the operating system is to make it fast. If OS X eventually does support TLS, we'll have to scrap our version and switch, but that's the life of the language implementer. The important thing is that D programs using TLS are completely source portable between Windows, Linux and now OS X.

Thanks to Jason House and Bartosz Milewski for their helpful suggestions on this article.
