What I can say regarding compilation is:
* indeed gcc is significantly slower than clang, which is very unusual (normally it's the other way around). I discovered a gcc bug recently, perhaps it's the same. ...

No bug. I think it's LTO, more precisely cross-module inlining.

Here's what happened. During the development of my draughts program Scan, I discovered Clang's -flto. For the first time ever, I could move "critical" code out of the include files to where it belongs: next to related functions. And lose no performance, or very little (at least with Clang/LLVM). Having code in the include files also triggers a lot of useless recompilation during development. By contrast, the cost of LTO is slower linking, which I can live with. When preparing the release, I moved a selected set of functions back to the include files. That took some time and was no fun at all. I also couldn't do profiling on my machine, so a lot of guesswork was needed. At least I still remembered which functions used to be there.

But since then, I've gotten used to not thinking about it. After all, it's the compiler's job to inline functions, regardless of where they sit in the source code. The Senpai 2 release was already taking a lot of time and, among other things, I skipped the "manual inlining" step.

GCC's cross-module inlining is most likely vastly inferior to LLVM's, perhaps because it was designed at a time when separate compilation was a must. Thinking about it, the Windows compilers might have similar shortcomings. So it's not a bug, and it's also partly my fault. The question is, do we do something about it?

Totally agree with you regarding inlining. I never use the keyword "inline" in my code. I simply declare (externally linked) functions in headers and implement them in source files, even the most trivial ones. And I rely on LTO to make the magic happen (code generation is done at link time, which allows cross-module optimization, including inlining). But I use GCC and it works perfectly. Last I checked, my code compiled to a slower executable with Clang than with GCC.
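To make the layout described above concrete, here is a minimal sketch. The file names and the functions square_file, square_rank and manhattan are hypothetical, not taken from any of the programs discussed, and the three "files" are collapsed into one listing so that it compiles on its own.

```cpp
// Hypothetical example: three files shown as one listing.
#include <cstdlib>  // std::abs

// --- square.hpp: declarations only, no "inline", no function bodies ---
int square_file(int sq);
int square_rank(int sq);

// --- square.cpp: trivial definitions, kept next to related code ---
int square_file(int sq) { return sq & 7; }   // column on an 8x8 board
int square_rank(int sq) { return sq >> 3; }  // row on an 8x8 board

// --- search.cpp: a caller that only sees the declarations ---
// Without LTO, each call below stays a real function call, because the
// bodies live in another translation unit. With LTO, the optimizer sees
// both sides at link time and is free to inline them.
int manhattan(int a, int b) {
    return std::abs(square_file(a) - square_file(b))
         + std::abs(square_rank(a) - square_rank(b));
}
```

With the files split as the comments indicate, something like "g++ -O2 -flto square.cpp search.cpp main.cpp" (or the same flags with clang++) asks for link-time optimization. Whether a given call actually gets inlined is up to the compiler's heuristics, which is exactly where the two compilers may differ.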

But then my code is in C, yours is in C++. And with C++ (at least the way you use it) come lots of do-nothing wrappers, making performance more sensitive to LTO inlining capacity. Perhaps that's why.
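A rough illustration of what "do-nothing wrappers" can look like, and why they lean on the inliner. The classes Square and Bitboard below are hypothetical and are not copied from Senpai or Scan; the definitions are written out of line, as they would be in a .cpp file.

```cpp
#include <cstdint>

// Hypothetical wrapper: a type-safe square index around a plain int.
class Square {
public:
    explicit Square(int v);
    int value() const;
private:
    int v_;
};

// Hypothetical wrapper around a 64-bit set of squares.
class Bitboard {
public:
    explicit Bitboard(std::uint64_t bits);
    bool has(Square sq) const;
private:
    std::uint64_t bits_;
};

// Out-of-line definitions, as they would sit in a source file.
Square::Square(int v) : v_(v) {}
int Square::value() const { return v_; }

Bitboard::Bitboard(std::uint64_t bits) : bits_(bits) {}
bool Bitboard::has(Square sq) const {
    // Calls Square::value(), which is itself a separate function.
    return ((bits_ >> sq.value()) & 1u) != 0;
}

// C-style equivalent: the same test with no wrapper layers at all.
bool bit_is_set(std::uint64_t bits, int sq) {
    return ((bits >> sq) & 1u) != 0;
}
```

If nothing is inlined across modules, a test like board.has(Square(4)) goes through several small calls, while the C-style version is one call at most. Once everything is inlined, both should reduce to a shift and a mask, so the C++ version depends much more on how aggressive the LTO inliner is.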