> OK. But that doesn't really address my initial point: implementing
> call_once in terms of the POSIX pthread_once is a bit pointless if
> you need to implement pthread_once separately anyway. If you don't
> have pthread_once, I would write call_once (or, equivalently
> pthread_once2_np), and implement pthread_once in terms of that.

This is tied to another question: whether the implementation should be
header-only. If one is committed to producing a header-only implementation,
it makes sense to reuse your current call_once in the implementation of
pthread_once (and possibly pthread_once2_np).
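A minimal sketch of that header-only direction, assuming std::call_once as a
stand-in for the existing call_once. The names my_once_t and my_pthread_once
are hypothetical and were chosen so they don't collide with a real
<pthread.h>; they just show a pthread_once-shaped entry point forwarding to
a call_once-style primitive.

```cpp
#include <mutex>

// Hypothetical once-control type. std::once_flag has a constexpr default
// constructor, so a static my_once_t is constant-initialized, much like a
// pthread_once_t initialized with PTHREAD_ONCE_INIT.
struct my_once_t {
    std::once_flag flag;
};

// Hypothetical pthread_once-style entry point layered on a header-only
// call_once (std::call_once stands in for it here). Returns 0, as
// pthread_once does on success.
inline int my_pthread_once(my_once_t* once, void (*init_routine)(void)) {
    std::call_once(once->flag, init_routine);
    return 0;
}
```

Because everything here is inline, it is compiled into each client; that is
exactly the property the next paragraph argues against for a separately
shipped library.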

The N2178 pthread.h aims to support the scenario in which a separately
compiled pthread-win32.lib/dll is delivered (as part of the Platform SDK,
say ;-) ). This is why I've tried to make sure that the C++ templated layer
is implementable, without loss of performance or generality, in terms of a
non-templated API (which happens to be a slightly extended pthreads layer).
With careful definitions of the pthread_* types, it's possible to achieve
binary compatibility as well, that is, to link the program against a newer
version of pthread-win32.lib/dll and have it work without recompilation.
This is not possible with header-only templates or inline functions, because
they are burned into the application code at compile time.
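One way a templated call_once can sit on top of a non-templated API without
losing generality is a per-type trampoline passed as a plain function
pointer plus a void* argument. The sketch below assumes a hypothetical
library function lib_once_with_arg, in the spirit of pthread_once2_np but
with an assumed signature, and a hypothetical lib_once_t type; std::once_flag
is used inside it only for brevity, where a real library would give the type
a fixed, C-compatible layout.

```cpp
#include <memory>
#include <mutex>
#include <type_traits>

// ---- Non-templated layer (would live in the separately compiled lib/dll) ----

// Hypothetical once type; std::once_flag is a placeholder for a real
// library's fixed layout.
struct lib_once_t {
    std::once_flag flag;
};

// Hypothetical non-template entry point: runs fn(arg) at most once per
// `once` object. Only this function's code lives in the library.
void lib_once_with_arg(lib_once_t* once, void (*fn)(void*), void* arg) {
    std::call_once(once->flag, fn, arg);
}

// ---- Header-only templated layer ----

// Accepts any callable. The type information lives only in the trampoline,
// which is instantiated per callable type and handed to the library as an
// ordinary function pointer, together with the callable's address.
template <class F>
void call_once_sketch(lib_once_t& once, F&& f) {
    using Fn = std::remove_reference_t<F>;
    void (*trampoline)(void*) = [](void* p) { (*static_cast<Fn*>(p))(); };
    void* arg = const_cast<void*>(static_cast<const void*>(std::addressof(f)));
    lib_once_with_arg(&once, trampoline, arg);
}
```

The only per-type code the client compiles is the small trampoline; the
synchronization logic stays in lib_once_with_arg, so a newer library can
change it without the application being recompiled, provided lib_once_t's
layout stays the same.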