Instead of hashing every byte of an integral type individually,
we leverage qHash and boost::hash_combine to create an overall
hash. The result should be just as good, as it's proven technology,
but the hash functions themselves become much faster. This is
demonstrated by the newly added benchmark.

qHash for integral types is just a cast to uint; maybe it would be clearer to do that directly.

Often you end up having to balance the cost of hashing vs the cost of collisions, so a cheaper hash isn't in and of itself a win.

My benchmarking was focused on reducing collisions, so in retrospect it's likely the current implementation is sacrificing real performance. Possibly we shouldn't be taking byte-sized chunks of /any/ data type, whatever its size (though then padding doesn't help the hash much).

OTOH, benching the hash seems like an exercise in futility: the best hash by this measurement will be "return 0". The cost of any other hash is pure waste unless it's improving the overall performance of the system via reduced collisions. Changes to the hash should then be ignoring these benchmarks and testing if real insertion and lookup patterns are improved... unfortunately benchmarking that is quite a bit more work.

I looked at std::hash, and that one also uses the value directly. I'd rather use qHash or std::hash, as that clearly documents what I want it to be, and if they ever change the implementation, we automagically pick that up as well.

And I looked at this code because I've seen it show up as a hotspot a couple of times. A hash function should be fast, but of course you are right in saying that the collision reduction is even more important. But can you explain why using the integral value directly would decrease the randomness (or however one calls it) of the hash function? Especially looking at our current code:

Then, when we keep in mind that DEFAULT_SEED is 2166136261u, we are extremely close to what boost::hash_combine does internally:

seed ^= hash_value(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);

Thus I propose to refactor KDev::hash to operate on boost::hash_combine until we get http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3876.pdf . I cannot imagine that the boost code is worse than the KDevHash code; quite the contrary.

When we follow this approach, I also want to remove the implicit hash function that converts any T to const char* and hashes that. Rather, I want people (i.e. me, in the first step) to write custom qHash or std::hash functions for their types.

I agree that the implementation should be changed to use the much cheaper combination. There's no value in magic that happened to work in the few mediocre benchmarks I did, where I never benchmarked /performance/, only collisions.

The combination is all that really matters, and that's where the pre-kdevhash code was really failing; boost::hash_combine is obviously an improvement over kdevhash. (IOW, we are effectively doing the same as the no-op std::hash/qHash to get bits from types, but with expensive 8-bit combination instead of uint combination.)

So +2 to not even bothering with the SFINAE stuff and just punting to std::hash (or qHash) + hash_combine.

Side note: I doubt that kdevhash is used for non-integral types at all (yes, that means the current implementation is /more/ braindead than you think). The whole point of its existence is to act as a nice API so that you don't shoot yourself in the foot writing a custom hash implementation for your type. We should just turn it into a nice wrapper for boost::hash_combine and be done with it.

Hashing always uses random (as in https://xkcd.com/221/) magic numbers, tweaked by trial and error; there are no studies showing that the golden ratio (as used by boost::hash_combine) has useful properties for hashing, it's simply not obviously bad.
As noted in the comment, this used to be one-at-a-time starting with the FNV offset basis: slow since it's byte-by-byte, dumb since it's hashing integrals, but not at all unproven.