I'm curious about the difference between OpenMP and pthreads. I know
there is a bit of overhead in OpenMP but I didn't realize it would
be so meaningful. This will of course vary between architectures and
compilers,

My experience in a nutshell: OpenMP is elegant, but immature and less
available compared to pthreads.

At my company, we started using OpenMP but after a while we scrapped it
in favor of pthreads because of

a) availability across platforms
b) predictability across platforms
c) a major shortcoming we uncovered in Intel's OpenMP -- its critical
   section locks are implemented as spinlocks
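
To make point c) concrete, here is a rough sketch of the kind of code
that replaces an OpenMP critical section when you move to pthreads. It
is not our production code; the names (run_counter, worker,
counter_lock, MAX_THREADS) are made up for illustration. The point is
only that the lock is an explicit pthread_mutex_t, and on most
implementations a thread waiting on a contended mutex is put to sleep
instead of burning CPU in a spin loop.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 64  /* arbitrary cap for this sketch */

/* Shared state guarded by an explicit mutex (hypothetical names). */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

struct work {
    int iters;  /* increments each thread performs */
};

static void *worker(void *arg)
{
    const struct work *w = arg;

    for (int i = 0; i < w->iters; i++) {
        /* This region is what "#pragma omp critical" would wrap. */
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Start nthreads workers, wait for them, and return the final count. */
long run_counter(int nthreads, int iters)
{
    pthread_t tid[MAX_THREADS];
    struct work w = { iters };
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[started], NULL, worker, &w) != 0)
            break;  /* stop launching if thread creation fails */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return counter;
}
```

Calling run_counter(4, 100000) should return 400000 if every thread
starts. Build with something like "cc -pthread file.c"; the exact flag
depends on your compiler.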