C/C++

User-Defined Hash Functions for Unordered Maps in C++0x

The basic hash map should be the first- or second-most used container in your arsenal, but unless you're a C++ savant, you might find yourself ditching it out of sheer frustration.

The container classes included in the C++ standard library serve as good illustrations of both the strengths and weaknesses of the language. The strengths are obvious: efficient, type-safe containers with performance guarantees suitable for a huge variety of applications. And the weaknesses? Compiler error messages that redefine the term useless, and documentation that makes a mockery of the word.

In this article I'll illustrate how you might bump into these problems using the unordered_map container, as well showing you how to work past the problems. By rights, this basic hash map should be the first- or second-most used container in your arsenal, but if you are less than a C++ savant, you might find yourself ditching it out of frustration.

Hash Tables

It's a little bit of an embarrassment to the C++ community that it didn't have a hash table in the standard library until TR1 was published in 2005. In a perfect world, the original standard should have contained hash map and hash set containers. But Alexander Stepanov didn't include these containers in the original Standard Template Library, and the standardization committee was reluctant to bless containers that didn't have a decent amount of mileage in the real world.

By 2005, there were enough non-standard implementations to allow the TR1 extension to confidently add four new template classes:

unordered_map

unordered_multimap

unordered_set

unordered_multiset

With basically the same semantics as their ordered counterparts (map, multimap, etc.) and the ideal O(1) performance afforded by hashed indexing, C++ finally had a feature that was basically table stakes for any high-level language created since the mid-1980s.

The container classes in general, and unordered_map in particular, are such a useful part of the language that we generally want to introduce them as early as possible to people learning C++. Admittedly, the underpinnings of template programming are out of the beginner's depth, but using the containers doesn't require much knowledge about templates beyond an understanding of a few syntax rules.

A simple piece of sample code that you might use to teach beginners about these containers is shown here:

This example works great in the classroom or in the documentation page for unordered_map. But the new C++ programmer runs into trouble as soon as he or she steps outside the classroom and tries to implement a real-world example. A very common change to this program on its path to the real world will be in the use of a simple class or structure to hold the person's name. To keep it simple, I'll just assume we want to keep first and last names separate, and use the built-in pair class to hold the name:

The g++ compiler doesn't seem to actually produce a sentence describing the error. I think it figures no human could parse it anyway.

The inscrutability of compiler errors in template classes is hardly something new — I was complaining about it all the way back in 1995. And the C++0x committee tried to do something about it. Had the C++0x Concepts feature been accepted, the compiler might instead have issued an error message that looked more like this:

hash_test.cpp(15): type 'Name' does not have a hash function

Unfortunately, Concepts did not make it, and we are stuck with error messages that are of no help at all.

RTFM

There are a couple of obvious places to try to figure out what these errors mean. Google would be one, and Visual Studio's class documentation pages would be another.

Just as an experiment, try putting yourself in the place of a novice, and execute a search on:

You will find some good clues, but probably no solutions to your problem. Much of the published code deals with pre-standard hash tables using boost implementations, or early g++ hash tables, which are not going to help.

But let's say you eventually figure out that you need to write a hash function for your Name class. All you need to know are the mechanics. Where do you find out what they are?

The logical place to do this would be in the Visual Studio unordered_map documentation page. This page has some good information in it, but nowhere does it address an undoubtedly common problem: how do I create a user-defined hash function for unordered_map?. And don't even think about getting anything useful from the g++ documentation page.

These stumbling blocks are the kind of thing that give C++ a reputation as one of the most difficult languages to learn. To see the contrast, you might compare it to Java. The Java programmer won't ever run into this problem, because the language defines a default hash function in the base class Object. Any Java object may be used as a key in Java's HashMap generic class. While the base class definition may be far from optimal for many cases, it will at least work, and won't prevent the beginner from using the container in real programs.

C++ could have done the same thing when implementing the unordered containers, but it was constrained by both philosophy and language limitations. It would have been a real accomplishment to have overcome both, and to be honest, the people on the standards committee have to work hard enough as it is. Insisting on X-Ray vision and a cape for each member might be pushing it.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!