3rd Party Code

Works by maintaining the xxHash32 state variables (v1 -> v4, length) as well as a queue of values that fall outside of the block size (16 bytes/4 ints). The seed is initialized to random bytes.
Further details, unit tests and history: dotnet/corefx#25013
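The per-lane mixing step that the state variables perform can be sketched as follows. The constants are the published xxHash32 primes, but the class and method names are illustrative only, not the corefx implementation:

```csharp
// Illustrative sketch of the round applied to each accumulator (v1..v4)
// for every 4-byte lane of a 16-byte block. The constants are the published
// xxHash32 primes; everything else here is a stand-in.
static class XxHash32Sketch
{
    private const uint Prime1 = 2654435761u;
    private const uint Prime2 = 2246822519u;

    public static uint RotateLeft(uint value, int count)
        => (value << count) | (value >> (32 - count));

    // acc <- rotl(acc + input * Prime2, 13) * Prime1
    public static uint Round(uint acc, uint input)
        => RotateLeft(acc + input * Prime2, 13) * Prime1;
}
```

Values that don't fill a complete 16-byte block are the ones held in the queue until enough arrive to run a full round.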

The method is just a bit too big to fit into the jit's "always inline" category, and the inliner can't easily anticipate that the IL sequence in this method is going to reduce to something that nicely fits a machine instruction. So the jit falls back to its profitability heuristics and doesn't see much else of merit, and so it decides not to inline.

The jit's inliner is less aggressive than inliners in C/C++ compilers, as it needs to balance jit time and code size growth vs optimization potential. Heuristics like those the inliner uses are often mistaken on specific cases; the hope is that they are generally decent on average.

Currently there's no easy automatic way for the jit to know that a particular inline is more important than others. Hence the AggressiveInlining attribute exists, though it's best used sparingly.

@jkotas after quite a bit of investigation about whether or not to inline the public-facing methods I've found that they shouldn't be. Changes are staged and ready to hit this PR assuming that you agree.

This has led me down a rabbit hole of a missed value-range optimization opportunity in RyuJIT (the gist in the review comment). I'm honestly not sure if I should log an item for RyuJIT - thoughts?

Pasted rant to save you a click:

I've noticed that jumps are still present in the Add() code. They shouldn't be: there is enough information to eliminate them entirely. These results might look very different if RyuJIT is improved - possibly exactly like the manually optimized Combine() code. Changing _length & 0x3 to _length % 4 doesn't change the code (so it's not bit-twiddling sorcery tripping up the jitter).

If hc were a field on a heap object I'd totally understand why no optimization can occur (another thread might access it); however, in this case it is absolutely known that all the code exists 100% in the stack (apart from a static readonly read). At the very least, the jumps could be eliminated. I'm guessing some constraint must be loosened?
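To make the mask/remainder point concrete, for an unsigned counter the two forms compute the same thing, which is why swapping one for the other is a fair test of whether the JIT understands the value range (names below are illustrative):

```csharp
static class QueuePositionSketch
{
    // For an unsigned length, masking with 0x3 and reducing mod 4 yield the
    // same queue position (0..3). For signed values they would differ on
    // negatives, which is one reason a compiler treats them differently.
    public static uint PositionByMask(uint length) => length & 0x3;
    public static uint PositionByMod(uint length) => length % 4;
}
```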

If hc were a field on a heap object I'd totally understand why no optimization can occur (another thread might access it); however, in this case it is absolutely known that all the code exists 100% in the stack

Yeah but ToHashCode is not inlined and that means that the address of the local variable hc is taken. This tends to block optimizations.

Yeah but ToHashCode is not inlined and that means that the address of the local variable hc is taken. This tends to block optimizations.

The address of hc is on the stack, though; it's a value type. I tested that theory, and the size did come down, but only to 1789b (from 1814b). Here's my test bench. I might just open an issue anyway - it can always be closed. I'm now worried about flooding this PR with off-topic discussion.
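A minimal illustration of the address-taken point (types and names here are hypothetical): calling a non-inlined instance method on a struct local passes a managed reference to that local as the hidden `this`, so the JIT must assume the callee can observe or mutate the local, which inhibits enregistration and range tracking:

```csharp
struct Counter
{
    private int _value;

    // Inside these methods, `this` is a reference to the caller's local, so a
    // non-inlined call site exposes the local's address to the callee.
    public void Add(int amount) => _value += amount;
    public int ToValue() => _value;
}
```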

we might also want public static int Combine(T1 value);. I know it looks a little funny, but it would provide a way of diffusing bits from something with a limited input hash space. For example, many enums only have a few possible hashes, only using the bottom few bits of the code. Some collections are built on the assumption that hashes are spread over a larger space, so diffusing the bits may help the collection work more efficiently.
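The kind of diffusion being asked for is what xxHash32's finalization step already provides; a sketch of the idea follows (this is the published xxHash32 avalanche, but a real Combine(T1) would also fold in the random seed):

```csharp
static class DiffusionSketch
{
    // xxHash32's final avalanche: low-entropy inputs (like enum values that
    // only occupy the bottom few bits) come out spread across all 32 bits,
    // which is what hash-bucketing collections assume.
    public static uint Mix(uint h)
    {
        h ^= h >> 15;
        h *= 2246822519u; // Prime2
        h ^= h >> 13;
        h *= 3266489917u; // Prime3
        h ^= h >> 16;
        return h;
    }
}
```

Because every step is invertible, the mix is a bijection: distinct inputs always produce distinct outputs.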

I don't think this is necessary. It's untestable without spending a lot of CPU time, will never come up in a real scenario, and is adding costs to a hot path (if Add is used instead of the Combine methods, it is probably being called in a loop). Do you have a good reason for adding this?

I don't think it's super necessary either; if you are hashing uint.MaxValue fields you probably have far bigger DoS issues than HashCode re-initializing state. The strange (_length ^ position) == 0 check came from the original code, which was more robust without leaning on checked:

(_v1 | _v2 | _v3 | _v4 | (_length ^ position)) == 0

It further decreases the chances of re-initialization to some vanishingly small probability (_v1 | (_position... would probably be good enough). The comment about the bloom filter is also erroneous for this reason (and should be updated).

Should I restore the original bloom filter, restore the shorter option or simply drop the checked?
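For clarity, the two variants under discussion look like this as predicates (field names follow the thread; this is a sketch, not the PR code):

```csharp
static class ReinitCheckSketch
{
    // Narrow check: fires whenever the wrapped length happens to line up with
    // the queue position again.
    public static bool LengthOnly(uint length, uint position)
        => length == position;

    // More robust original: also requires every state word to be zero at the
    // same time, pushing an accidental match to an astronomically small
    // probability.
    public static bool FullState(uint v1, uint v2, uint v3, uint v4,
                                 uint length, uint position)
        => (v1 | v2 | v3 | v4 | (length ^ position)) == 0;
}
```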

You should drop the checked. I am not familiar with the details of the hashing algorithm, so I will leave the rest to your discretion. However, if you decide not to include the _v1 | _v2 | _v3 | _v4 check, then it should just be _length == position, not (_length ^ position) == 0.

Hopefully branch prediction can make it incredibly cheap since it'll never get taken.

I'm not specifically concerned about a DoS, rather just that this could start yielding incorrect results at a certain point. We shouldn't silently start producing incorrect data since it's incredibly difficult to debug or recover from.

@morganbr if we're throwing an exception, I assume we should expose the length so that users don't have to try/catch? If CLI is still a thing, it will have to be a long.

Edit: throw behaves exactly like checked, trying some other tricks.

Edit: Amazing. If I put the check into its own method, it does get removed. It doesn't matter, though; Add (which is the only method that invokes Add(int)) isn't inlined anyway. This is becoming an exercise in academia.

I don't have strong feelings either way on exposing it. I also don't know the exact CLS rules, but yeah we probably would have to make it an int or a long so that all languages can reference it. If it's a long, I don't mind skipping an overflow check since 2^63 bytes is impossibly large, but I don't know how much that would slow down this code.

This whole thing with s_seed being static readonly was mentioned in the original PR. It's not really relevant because this code ends up being crossgened and thus those values are not actually compile time constants.

@mikedn Oh I forgot that this code is crossgened, that's a shame. @jkotas What happened to the BypassNGenAttribute that you mentioned earlier, in my PR to Memmove? It seems like it would be useful here.

I think some developers are going to mistakenly try to call GetHashCode instead of ToHashCode. It'd be worth including "Use ToHashCode to retrieve the computed hash code." in both the obsolete and exception message for GetHashCode as a hint of what to do.
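A sketch of what that guidance would look like (the messages use the wording suggested above; the struct and its members are stand-ins, not the PR code):

```csharp
using System;
using System.ComponentModel;

public struct HashCodeSketch
{
    private int _combined; // stand-in accumulator state

    public void Add(int value) => _combined = (_combined * 31) + value;

    public int ToHashCode() => _combined;

    // Hidden from IntelliSense, a compile-time error when called directly,
    // and a runtime throw if reached via object -- all pointing at ToHashCode.
    [Obsolete("HashCode is a mutable struct and should not be compared with other HashCodes. Use ToHashCode to retrieve the computed hash code.", error: true)]
    [EditorBrowsable(EditorBrowsableState.Never)]
    public override int GetHashCode() =>
        throw new NotSupportedException("Use ToHashCode to retrieve the computed hash code.");
}
```

Calls through an `object` reference still compile (they bind to `object.GetHashCode`), which is why the runtime throw is needed in addition to the obsolete error.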

This has the same effect as the nested-ifs you had earlier, actually. I had something different in mind for goto, but now I don't think it will let us get the same perf as if we could use switch. You can revert this part to how it was before.

Have you tried marking ToHashCode with AggressiveInlining and using switch? Does switch generate more code than ifs?

It may be better to manually inline the other Add here - Add((comparer != null) ? comparer.GetHashCode(value) : (value?.GetHashCode() ?? 0)) - to save on number of generic method instantiations that have to be created.
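Sketched against a stand-in accumulator, the shape being suggested is a thin generic wrapper over a single non-generic core (only the wrapper is instantiated per T; the mixing step here is a placeholder, not the real one):

```csharp
using System.Collections.Generic;

public struct GenericAddSketch
{
    private int _combined;

    // Thin generic surface: the type-specific comparer/GetHashCode dance
    // happens here, then everything funnels into one shared non-generic
    // Add(int), so each T costs only one small instantiation.
    public void Add<T>(T value, IEqualityComparer<T> comparer)
        => Add(comparer != null ? comparer.GetHashCode(value) : (value?.GetHashCode() ?? 0));

    private void Add(int value) => _combined ^= value; // placeholder mixing step

    public int ToHashCode() => _combined;
}
```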

* HashCode based on xxHash32
Works by maintaining the xxHash32 state variables (v1 -> v4, length) as well as a queue of values that fall outside of the block size (16 bytes/4 ints). The seed is initialized to random bytes.
Further details, unit tests and history: dotnet/corefx#25013
Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

