I use hwloc-1.4.1 stable on Red Hat 5 and am seeing a possible
concurrency issue not covered by the "Thread Safety" guidelines:

- I start a small number (4) of threads, each of which does
some work and periodically executes
hwloc_get_last_cpu_location() with HWLOC_CPUBIND_PROCESS

- occasionally, one or two of those threads will see the call
fail with ENOSYS (even though the same call has already executed
successfully a number of times)

These errors are transient and seem to occur only when some
of the threads in the group are terminating. I've skimmed
through the implementation in topology-linux.c and it seems
plausible to me that the errors could be caused by failure to
read /proc state "atomically" in the presence of concurrent
thread starts/exits.

Of course, the latter is hard (impossible?) to do because
the state always changes and a snapshot can only be obtained
with a single read() (which in turn would require knowing how
many thread entries to expect in advance). However, returning
ENOSYS in such cases does not seem intended; it looks more like a
flaw in the retry logic. Similar issues may be present with other API
methods that rely on hwloc_linux_foreach_proc_tid() or
hwloc_linux_get_proc_tids().
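
To make the snapshot problem concrete, here is a minimal sketch of
the "re-read until two consecutive listings agree" approach. It is
not hwloc's actual implementation; list_tids() and stable_tids() are
hypothetical names, and it assumes a Linux /proc with the usual
/proc/self/task layout. Even a matching pair of reads only narrows
the race window, since a thread can still exit right after the second
read.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: read the thread ids listed in /proc/self/task.
 * Returns the number of tids stored in tids[], or -1 on error. */
static int list_tids(pid_t *tids, int max)
{
    DIR *dir = opendir("/proc/self/task");
    if (!dir)
        return -1;

    int n = 0;
    struct dirent *ent;
    while (n < max && (ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                      /* skip "." and ".." */
        tids[n++] = (pid_t)atoi(ent->d_name);
    }
    closedir(dir);
    return n;
}

/* Hypothetical helper: re-read the list until two consecutive reads
 * are identical, performing at most max_tries comparisons.
 * Returns the tid count, or -1 if no stable listing was obtained. */
static int stable_tids(pid_t *tids, int max, int max_tries)
{
    pid_t *prev = malloc((size_t)max * sizeof *prev);
    if (!prev)
        return -1;

    int result = -1;
    int nprev = list_tids(prev, max);
    for (int attempt = 0; attempt < max_tries && nprev >= 0; attempt++) {
        int n = list_tids(tids, max);
        if (n < 0)
            break;
        if (n == nprev && memcmp(tids, prev, (size_t)n * sizeof *tids) == 0) {
            result = n;                    /* two reads agree */
            break;
        }
        memcpy(prev, tids, (size_t)n * sizeof *tids);
        nprev = n;                         /* list changed, try again */
    }
    free(prev);
    return result;
}
```

The point of the sketch is that running out of retries is a distinct
outcome from "operation not supported", so reporting it as ENOSYS
would be misleading.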

Can you try the attached patch? It doesn't abort the loop
immediately on per-tid errors anymore. This may work better when
threads disappear. I don't remember if the retry logic was written
while thinking about adding threads only or about adding and
removing threads.
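
The idea can be sketched as follows. This is only an illustration,
not the attached patch and not hwloc code: the names
foreach_tid_tolerant(), tid_cb and fail_on_odd_tid() are
hypothetical, and the rule "fail only if every tid failed" is an
assumption made for the example.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-thread callback: returns 0 on success, -1 on failure. */
typedef int (*tid_cb)(pid_t tid, void *data);

/* Apply cb to every tid without aborting on the first failure.
 * A failing tid is assumed to belong to a thread that has just exited.
 * Returns 0 if at least one callback succeeded, -1 otherwise. */
static int foreach_tid_tolerant(const pid_t *tids, size_t n,
                                tid_cb cb, void *data)
{
    size_t succeeded = 0;
    for (size_t i = 0; i < n; i++) {
        if (cb(tids[i], data) == 0)
            succeeded++;
        /* on failure: skip this tid and keep going */
    }
    return succeeded > 0 ? 0 : -1;
}

/* Stand-in callback for demonstration: pretends that threads with odd
 * ids have vanished, and counts the successful calls in *data. */
static int fail_on_odd_tid(pid_t tid, void *data)
{
    if (tid % 2 != 0)
        return -1;
    (*(int *)data)++;
    return 0;
}
```

With tids {10, 11, 12} and the stand-in callback, the loop visits all
three entries, counts two successes and returns 0 instead of stopping
at 11.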

If the patch doesn't help, can you send your code to help debug
things?

Will try this within a day or two. At the moment I am simply using
a retry loop on ENOSYS, and usually no more than one retry is needed.
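
A retry loop of this kind could look roughly like the sketch below.
The names retry_on_enosys(), query_fn and flaky_query() are
hypothetical; in real use the callback would wrap
hwloc_get_last_cpu_location(), and flaky_query() is only a stand-in
that simulates a transient failure.

```c
#include <errno.h>

/* Hypothetical callback type: returns 0 on success, or -1 with errno
 * set on failure. */
typedef int (*query_fn)(void *arg);

/* Call fn up to max_attempts times, retrying only while it fails with
 * ENOSYS. Any other error is returned to the caller immediately. */
static int retry_on_enosys(query_fn fn, void *arg, int max_attempts)
{
    int rc = -1;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        errno = 0;
        rc = fn(arg);
        if (rc == 0 || errno != ENOSYS)
            break;
    }
    return rc;
}

/* Stand-in for the real query: fails with ENOSYS for the number of
 * calls stored in *arg, then succeeds. */
static int flaky_query(void *arg)
{
    int *remaining_failures = arg;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        errno = ENOSYS;
        return -1;
    }
    return 0;
}
```

Bounding the number of attempts keeps a genuine ENOSYS (a platform
where the call really is unsupported) from spinning forever.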