STL 101 Part D - sorted associative containers, Set and Map

Welcome to part IV of my series on the STL. This time around I want to lok at two containers called set and map. I'll explain set first and then look at what map brings to the party.

Creating a set

Both of these containers are implemented as a binary tree by every STL implementation I know of. As a result, the push_back/pop_back paradigm does not work here - items are inserted in order, so you are not able to specify a location. Instead we use the insert method, which can take a location as an option, this is a suggestion to the STL where to start looking for an insert location, and can obviously speed things up tremendously if a large set is being built. To remove items from a set, we need to use the find algorithm and pass the returned iterator into the erase function. Whenever you use the find algorithm, you should check if the return value is the same as end(), just in case the item was not found.

The other thing we can do when we create a set is specify the manner in which items are sorted. I could use string in this container also, but I've elected to use char * from here on instead. We are going to create a set with the same data as before, but sort it in reverse order through a functor as follows.

As you can see, our policy has allowed the items to be sorted in reverse order. While this is a trivial case, user defined types will always require such a functor because no default sort order will be possible.

Advantages of set

The set has a number of advantages over other containers. It has a much better insert cost than vector, and a much better search cost than list, making it an excellent trade off in some situations. As Scott Meyers points out though, if you're creating a container which does not change much, then a sorted array may be a better choice. In both cases the main advantage comes from the ease of searching a container that is sorted, and the STL provides a number of functions especially for use with set.

Set algorithms
The STL contains a number of algorithms that perform set related functions on a sorted range, as follows:

Function name

Function returns

includes

true if range from set one exists in set two

set_union

all elements in both sets

set_intersection

elements that appear in both sets

set_difference

elements only in set one

set_symmetric_difference

elements only in either of the two

Set and map offer bidirectional iterators, so any other algorithms that accept these iterators can be used with them, and any sorted container that supports at least these iterators can be used with the set functions listed above. The following code uses the last four of these algorithms on two sets that are similar, but not the same

Set 3
well
this
the
test
could
be
Set Difference
is boys a
Set Intersection
this test
Set Symmetric Difference
well the could be is boys a
Set Union
well this the test could be is boys a

A note about hashed sets

As has been noted, set and map are generally implemented as binary trees. The other option is to use has tables, and while they are not part of the standard yet, many implimentation offer hashed sets and maps. The advantage of hashing is that it provides potentially better performance than a binary tree, but a binary tree provides stable performance where a hash table at worst can provide O(n) performance. I'm guessing that is the reason that when time became tight, the binary tree implementation got into the standard and the hash one was made to wait.

Map

A map is essentially a set which also stores a second value in each location. The first value can then be considered a key whereby the second can be found. An example we have all probably used exists in the heart of MFC. Because MFC wraps a HWND into a class, but must ultimately process it in a global WndProc, it creates a map which links a HWND to an instance of a CWnd*. Then the WinProc can access the class by looking it up through the HWND which it has had passed into it.

To add to a map, we can use the insert method, but it is much easier to use array type notation, like this:

In other words, our second attempt to access a non-existent element did not result in an unwanted insertion.

Map Iterators

Being an STL container, a map uses iterators to control access to it's elements. However, a map stores two values in each location, so what does the iterator give us to deal with this ? An iterator from a map returns a pair<key, value>, and the two values involved can be accessed using pair.first and pair.second respectively. The following code illustrates this

Printing out the list again would reveal that this element has now been changed. Accessing first allows changing of the key for an element in the same way.

Having established this, a better way to access an element would be to catch the iterator returned by our find test, and call second on it, as using the [] operators after establishing the existence of an element requires two lookups, whereas storing the iterator means only one is required.

Using multiset/multimap
The set and map both require a unique key, that is to say, inserting the same value into a set twice results in only one insertion, the same is true of two elements with the same key into a map. Where more than one element of the same value is required, we must use multiset/multimap. These containers work the same as thier single element cousins, but have some extra functions to assist us in using them

Function

Returns

count

number of elements with a given value

lower_bound

first element that contains a value

upper_bound

last element that contains a value

equal_range

pair of iterators that define the range that contains a given value.

I hope you've found this article useful, and that it inspires you to make use of the STL. Up until now I have spoken mainly about the different containers the STL provides ( although some remain uncovered as of yet ), and how they can be specialised. However, the real power of the STL lies in the way it interfaces with the algorithms the standard library provides. A selection of these algorithms have appeared in the series to date, but the next article will seek to list a good number of them and explore the ways in which they can be used.