Now suppose you want to run some preparation code once per thread, say SetThreadName, SetThreadPriority or whatnot. How would you go about that? Code placed before the loop executes only once, on the calling thread, and code placed inside the loop executes 1000000 times.
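One way out is to put the preparation code in the default constructor of a small class and list an object of that class in a private clause, so that every thread gets its own default-constructed copy. The sketch below is a reconstruction under that assumption: only the ThreadInit name comes from this post, while the counter, the LoopResult struct and run_loop are hypothetical names added so the effect can be observed. The actual thread-naming and priority calls are left out to keep it portable.

```cpp
#include <atomic>

// Counts constructor calls so the effect is visible without a debugger.
static std::atomic<int> g_init_calls{0};

// Hypothetical reconstruction of ThreadInit; only the name comes from the post.
struct ThreadInit {
    ThreadInit() {
        // Per-thread preparation goes here, e.g. the SetThreadName or
        // SetThreadPriority calls mentioned above (omitted to stay portable).
        ++g_init_calls;
    }
    int iterations = 0;  // per-thread scratch state, touched inside the loop
};

struct LoopResult {
    int init_calls;  // constructor runs: the original plus one per thread copy
    long long sum;   // loop result, to show the work itself is unaffected
};

// Hypothetical driver; build with OpenMP enabled (e.g. -fopenmp or /openmp).
LoopResult run_loop(int n) {
    g_init_calls = 0;
    ThreadInit ti;  // constructed once, on the calling thread
    long long sum = 0;

    // private(ti) gives every thread its own default-constructed ThreadInit.
    #pragma omp parallel for private(ti) reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        ++ti.iterations;  // uses the thread's private copy
        sum += i;
    }
    return {g_init_calls.load(), sum};
}
```

The loop body deliberately touches ti: some compilers may drop a private clause for a variable that is never referenced inside the region, in which case the constructor would not run per thread. If the build has OpenMP disabled, the pragma is ignored and the constructor runs just once.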

You can set a breakpoint in ThreadInit::ThreadInit() and watch it being executed exactly once in each thread. You can also enjoy the view in the Threads window, where all OMP threads now have their name and priority adjusted.

[Edit:] Improvement

The original object, created before entering the parallel region, is just a dummy meant for duplication, but it still runs its own constructor and destructor. This might be benign, as in the example above (a redundant setting of thread properties), but in my real-life situation I used the ThreadInit object to create a thread-specific heap, and an extra heap is not an acceptable overhead.

Here’s another trick: as the spec says, a default constructor is called for the object copies in the parallel region. Just create the dummy object with a non-default constructor, and make sure the real action happens only in the default one. Here’s one way to do so (you can also code different ctors altogether):
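A minimal sketch of that idea, assuming a single constructor whose bool parameter defaults to false: it is still callable without arguments, so it serves as the default constructor for the per-thread copies, while the dummy is built with an explicit true and skips the real work. The dummy parameter, the active flag, the counter and run_loop_with_dummy are hypothetical names, not the original code.

```cpp
#include <atomic>

// Counts only the "real" initializations, i.e. those of per-thread copies.
static std::atomic<int> g_real_inits{0};

struct ThreadInit {
    // Callable with no arguments, so it doubles as the default constructor
    // that OpenMP uses for the private copies. Passing true marks the dummy.
    ThreadInit(bool dummy = false) : active(!dummy) {
        if (active) {
            // Real per-thread setup goes here (e.g. creating a thread heap).
            ++g_real_inits;
        }
    }
    ~ThreadInit() {
        if (active) {
            // Matching per-thread teardown goes here; the dummy skips it.
        }
    }
    bool active;         // false for the dummy, true for per-thread copies
    int iterations = 0;  // per-thread scratch state, touched inside the loop
};

// Hypothetical driver; returns how many real initializations took place.
int run_loop_with_dummy(int n) {
    g_real_inits = 0;
    ThreadInit ti(true);  // dummy: non-default construction, no real work

    #pragma omp parallel for private(ti)
    for (int i = 0; i < n; ++i) {
        ++ti.iterations;  // uses the thread's private, fully initialized copy
    }
    return g_real_inits.load();
}
```

With OpenMP enabled, the returned count should match the number of threads in the team, and the dummy contributes nothing to it. Two separate constructors, a default one and one taking a tag argument, would work equally well.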

5 Responses to Executing Code Once Per Thread in an OpenMP Loop

I experimented with direct TLS storage (“__declspec(thread)”), which I believe thread_specific_ptr wraps. It is more cumbersome and does not directly execute code once per thread: you’d need to do something like atomically test the value of the TLS slot, act on it and update it *in every loop iteration*, which is exactly what I’m trying to avoid. Do you see another usage for boost::thread_specific_ptr here?
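To make the objection concrete, here is roughly what the TLS route looks like. This is a sketch that uses the standard thread_local keyword in place of __declspec(thread); the flag, the counters and run_with_tls_flag are hypothetical names.

```cpp
#include <atomic>

static std::atomic<int> g_tls_inits{0};          // per-thread setups performed
static std::atomic<long long> g_flag_checks{0};  // how often the flag was tested

// One flag per thread, living for the lifetime of that thread.
static thread_local bool t_initialized = false;

// Hypothetical driver; returns the number of per-thread setups so far.
int run_with_tls_flag(int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        ++g_flag_checks;       // instrumentation only: this test runs n times
        if (!t_initialized) {  // checked on every single iteration
            t_initialized = true;
            ++g_tls_inits;     // the once-per-thread work would go here
        }
        // ... actual loop body ...
    }
    return g_tls_inits.load();
}
```

The setup itself runs once per thread, but the check is paid on every iteration, and the flag outlives the loop, so a later parallel region that reuses the same threads would skip the setup.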