A question about the Hash algorithm in Hashtable [Kit George]

A question was raised regarding the hash algorithm used in Hashtable by Frank Hileman:"Any chance that the Hashtable will be changed from the xor/mod prime oldskool type to something modern like this: http://www.isthe.com/chongo/tech/comp/fnv/"

Frank, its unlikely. We may certainly explore the opportunity for a new collection which supports an alternate algorithm, but changing the behavior of the existing class is probably not going to happen.

Thanks for the response. A different hashtable class could be created by anyone. But the FNV is not a new data structure, as much as a new way of computing hash codes. So to use these more modern types of tables, you need different GetHashCode algorithms, in many cases. I asked because the more GetHashCode functions that are written, the more we are locked into the old type of hash table. And the newer algorithms, such as FNV, are supposed to be much more efficient in distribution (which means faster and smaller tables). Many OS hashtables are much faster thanks to FNV.

FNV isnt the only possible string hash function, and, in fact, FNV produces bad bit distributions amongst the higher bits.
See "Performance in Practice of String Hashing Functions" by M.V. Ramakrishna and Justin Zobel for a thoughrogh review of string hashing functions.
http://citeseer.nj.nec.com/ramakrishna97performance.html
See also http://www.burtleburtle.net/bob/hash/evahash.html by Bob jenkins for a thoughrogh review of hash testing approaches.

Garry Trinder

10 Nov 2003 2:45 PM

Sorry for accidental posting on question form, I decided to ask here instead but it submitted for some reason, please discard.
In MSDN it is stated that a Hashtable can support one writer and multiple readers concurrently. On the other topic it is written that "To support one or more writers, all operations on the Hashtable must be done through the wrapper returned by the Synchronized method.". Which is true?
If first, aren't there performance issues for single-threaded application, or how do you achieve one-writer-many-readers without copying data locally? If second, there is a bug in SyncHashtable not locking indexer getter...

Brian Grunkemeyer

11 Nov 2003 10:32 PM

FYI - we have thought about changing the hash function on String, and we might do that sometime depending on some performance testing. It looks like we already have tweaked it a little bit since V1.1. We do hope that no one depends on the current behavior of GetHashCode, as we'll probably change it. Just ensure that you don't persist hash functions to disk or try to reverse engineer our hash function - you'll be broken later if you do.
BTW, we have changed the behavior of Object::GetHashCode and ValueType::GetHashCode for performance reasons - they still obey the contract that they must obey, but if you are somehow counting on an object to give you the same value between different versions of the CLR, you have a problem. (Note that Hashtable implements an OnDeserializationCallback which calls GetHashCode on every key again, so this isn't a problem for serialized Hashtable instances.)
The upshot is we reserve the right to reimplement GetHashCode on every object differently in every build of the CLR.

Frank Hileman

12 Nov 2003 1:08 AM

damien: yes, I saw those papers. The Zobel one seems a little old, testing perf on a sparc 20? I am not saying FNV is the best, but if you look around, you will see where it has been used and made a big difference. The main point is that more support is needed from the BCL because if people write their own GetHashCode functions they may be optimizing to a particular type of data structure. Also regardless of your favorite hash algorithm/structure the mod prime one is dated.