That's true.
> An STL map is required to have a worst-case find() of O(log N).
> Balanced trees give you that. Although hash maps are typically faster
> than balanced trees on large amounts of random data, they do not
> guarantee a worst-case complexity of O(log N).

Actually, it IS entirely possible to create a hash map that guarantees
O(log N) complexity in the worst case for insert, delete, and find.

The requirement that's difficult to meet is iterating over the elements
in sorted order within a guaranteed time bound.
> >TR1 (and C++ 0x) add unordered_[multi]set and unordered_[multi]map,
> >which are intended to be based on hashing.
>
> I.e. TR1 did accept that hash maps are also useful and added them.
> Now all we need is for developers to be able to make an informed
> choice on the correct one to use for a particular application.

Actually, there never seems to have been a belief that hash-based
containers weren't/couldn't be useful -- but they're relatively complex
to specify correctly, and the committee had decided not to accept any
new features (for the original 1998 standard) before a suitable
specification of hash-based containers was finished.

James Kanze wrote:
> I don't know about the "typically faster". Their average access
> time is faster IF the hash function is good. Typically, I find
> that it's not always that good (although the situation is
> improving).

Question: Will the new C++ standard require for the libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
be forced to always provide one himself?

Creating good-quality hashing functions is not a trivial task at all
(and is the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge and experience on the subject.

On 2008-05-31 17:50, Juha Nieminen wrote:
> James Kanze wrote:
>> I don't know about the "typically faster". Their average access
>> time is faster IF the hash function is good. Typically, I find
>> that it's not always that good (although the situation is
>> improving).
>
> Question: Will the new C++ standard require for the libraries to
> provide high-quality default hashing functions for internal types (and
> perhaps some other standard types such as std::string), or will the user
> be forced to always provide one himself?
>
> Creating good-quality hashing functions is not a trivial task at all
> (and is the subject of continuing extensive research). One cannot expect
> the average user to invent a good-quality function without extensive
> knowledge and experience on the subject.

Currently the parametrised hash struct (which is used by the unordered
containers) is required to be instantiable for integers, float, double,
pointers, and string types. It says nothing about the quality of the
implementation.

Pete Becker wrote:
>> Creating good-quality hashing functions is not a trivial task at all
>> (and is the subject of continuing extensive research). One cannot expect
>> the average user to invent a good-quality function without extensive
>> knowledge and experience on the subject.
>
> Indeed. That's why I've never thought that standardizing hashed
> containers was a good idea. There's just too much flexibility, making
> them hard to specify well, with the result that naive users can get
> horrible performance without knowing why.
>
> But, to answer your question, no, there is no requirement for "high
> quality" hashing functions. Implementations will be required to provide
> hashing functions for the builtin types, pointers, std::string,
> std::u16string, std::u32string, std::wstring, std::error_code, and
> std::thread::id.

[ ... ]
> Question: Will the new C++ standard require for the libraries to
> provide high-quality default hashing functions for internal types (and
> perhaps some other standard types such as std::string), or will the user
> be forced to always provide one himself?

Hash functions are required in the standard library. <functional> has
specializations of hash<> for the obvious arithmetic types, plus
pointers, the string types (string, u16string, u32string, wstring),
error_code, and thread::id.

As usual, the quality is up to the individual implementation. [N2521,
unord.hash/2]: "The return value of operator() is unspecified, except
that equal arguments shall yield the same result."

On Jun 2, 3:28 pm, (Yannick Tremblay) wrote:
> In article
> <>,
> James Kanze <> wrote:
> >On May 30, 4:28 pm, (Yannick Tremblay) wrote:
> >> An STL map is required to have a worst-case find() of O(log N).
> >> Balanced trees give you that. Although hash maps are
> >> typically faster than balanced trees on large amounts of random
> >> data, they do not guarantee a worst-case complexity of O(log N).
> >I don't know about the "typically faster". Their average access
> >time is faster IF the hash function is good. Typically, I find
> >that it's not always that good (although the situation is
> >improving).
> Ok, sorry about my lack of definition for "typically":
> Assuming a non-perfect hash function that returns a bucket
> identifier (almost necessary, since a perfect hash function
> would be too expensive to be worthwhile), this hash
> function makes assumptions about the data it receives (e.g.
> that it is random). The hash table making use of this hash
> function will be very fast when presented with "typical" data
> (i.e. data that generally meets the assumptions used for
> creating the hash function). Unfortunately, there may be some
> worst-case scenario under which the performance of the hash
> table will be abysmal.

That's not my point. The performance of a hash table depends
very much on the quality of the hash function. While a perfect
hash function is only possible if the exact set of data is
known in advance, hash functions do differ enormously in their
quality, and I've seen more than a few cases where even some
very competent programmers have come up with relatively poor
hash functions; the most obvious is in Java's hash map, where
Java was forced to change the hash function as a result. I
suspect that part of the reason why the original STL had a
balanced tree, but not a hash table, is that one can sort of
expect an average programmer to get the ordering relationship
right, but you can't expect him to define a good hash function.
(There's also the point that if he gets the ordering function
wrong, there's a good chance of the program not working at all,
so he'll notice. Whereas in the case of a hash function, the
program will only be a little bit slower than it should be.)

"James Kanze" <> wrote in message
news:...
On Jun 2, 3:28 pm, (Yannick Tremblay) wrote:
> I suspect that part of the reason why the original STL had a
> balanced tree, but not a hash table, is that one can sort of
> expect an average programmer to get the ordering relationship
> right, but you can't expect him to define a good hash function.
> (There's also the point that if he gets the ordering function
> wrong, there's a good chance of the program not working at all,
> so he'll notice. Whereas in the case of a hash function, the
> program will only be a little bit slower than it should be.)

James Kanze's description is correct, as far as it goes; but it misses one
point that I considered socially (or, if you prefer, politically) important
at the time: If a user supplies a naive hash function that results in
terrible performance, the user is likely to blame the library (or the
language!) rather than the hash function. Therefore, when I designed the
associative-array class, I felt that it was more important to obtain
acceptable worst-case performance easily than it was to obtain excellent
average-case performance with substantial effort.
