In the other direction we remember the following points
in the job execution:

- 'started' - set when the job is picked up by a worker thread
- 'executed' - set once the job function returned.
- 'finished' - set when pthreadpool_tevent_job_signal() is entered
- 'dropped' - set when pthreadpool_tevent_job_signal() leaves with orphaned
- 'signaled' - set when pthreadpool_tevent_job_signal() leaves normal

There're only one side writing each element,
either the main process or the job thread.

This means we can do the coordination with a full memory
barrier using atomic_thread_fence(memory_order_seq_cst).
lib/replace provides fallbacks if C11 stdatomic.h is not available.

A real pthreadpool requires pthread and atomic_thread_fence() (or an
replacement) to be available, otherwise we only have pthreadpool_sync.c.
But this should not make a real difference, as at least
__sync_synchronize() is availabe since 2005 in gcc.
We also require __thread which is available since 2002.