Many container operations apply to more than one type of container and have the same interface for each type that supports them. In this article, Andrew Koenig and Barbara Moo show how the library exploits these common interfaces to provide a collection of standard algorithms. By using these algorithms, you can avoid writing (and rewriting) the same code repeatedly. More important, you can write programs that are smaller and simpler than you would write otherwise -- sometimes astonishingly so.

This sample chapter is derived from the book Accelerated C++ (Addison Wesley Longman, 2000), by Andrew Koenig and Barbara Moo. It is part of Bjarne Stroustrup's C++ In-Depth series.

Many container operations apply to more than one type of container. For
example, vector, string, and list allow us to insert
elements by calling insert and remove elements by calling
erase. These operations have the same interface for each type that
supports them. For that matter, many container operations also apply to the
string class.

Every containeras well as the string classprovides
companion iterator types, which let us navigate through a container and examine
its elements. Again, the library ensures that every iterator that supplies an
operation does so through the same interface. For example, we can use the
++ operator to advance any type of iterator from one element to the
next, we can use the * operator to access the element associated with
any type of iterator, and so on.

In this article, we'll see how the library exploits these common
interfaces to provide a collection of standard algorithms. By using these
algorithms, we can avoid writing (and rewriting) the same code over and over
again. More important, we can write programs that are smaller and simpler than
we would write otherwisesometimes astonishingly so.

Like containers and iterators, algorithms also use consistent interface
conventions. This consistency lets us learn a few of the algorithms and then
apply that knowledge to others as the need arises. In this chapter, we'll
use several of the library algorithms to solve problems related to processing
strings and student grades. Along the way, we'll cover most of the
core concepts in the algorithm library.

Unless we say otherwise, the <algorithm> header defines all
the algorithms that we introduce in this chapter.

Analyzing strings

A loop that pushes each element of bottom onto the back of ret, one at a time,
is equivalent to inserting a copy of the elements of
bottom at the end of ret, an operation that vectors
provide directly:

ret.insert(ret.end(), bottom.begin(), bottom.end());

This problem has an even more general solution: We can separate the notion of
copying elements from that of inserting elements at the end of a container, as
follows:

copy(bottom.begin(), bottom.end(), back_inserter(ret));

Here, copy is an example of a generic algorithm, and
back_inserter is an example of an iterator adaptor.
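
To make the equivalence concrete, here is a small sketch that appends bottom to ret in each of the three ways. The function names append_with_loop, append_with_insert, and append_with_copy are hypothetical, chosen only for this illustration; each takes ret by value and returns the result so that the three can be compared side by side.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

using std::back_inserter; using std::copy;
using std::string;        using std::vector;

// Hypothetical helper: append bottom to ret with an explicit loop.
vector<string> append_with_loop(vector<string> ret, const vector<string>& bottom)
{
    for (vector<string>::const_iterator it = bottom.begin();
         it != bottom.end(); ++it)
        ret.push_back(*it);
    return ret;
}

// Hypothetical helper: the same effect through vector's own insert member.
vector<string> append_with_insert(vector<string> ret, const vector<string>& bottom)
{
    ret.insert(ret.end(), bottom.begin(), bottom.end());
    return ret;
}

// Hypothetical helper: the same effect through copy and back_inserter.
vector<string> append_with_copy(vector<string> ret, const vector<string>& bottom)
{
    copy(bottom.begin(), bottom.end(), back_inserter(ret));
    return ret;
}
```

All three functions should return equal vectors for the same arguments.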

A generic algorithm is an algorithm that is not part of any
particular kind of container, but instead it takes a cue from its
arguments' types about how to access the data it uses. The standard
library's generic algorithms usually take iterators among their arguments,
which they use to manipulate the elements of the underlying containers. So, for
example, the copy algorithm takes three iterators, which we'll
call begin, end, and out, and copies all the elements
in the range [begin, end) to a sequence of elements starting at
out and extending as far as necessary. In other words,

copy(begin, end, out);

has the same effect as

while (begin != end) *out++ = *begin++;

except that the while body changes the values of the iterators, and
copy doesn't.
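
One way to picture this is a hand-written stand-in for copy. The name copy_sketch is hypothetical, and the code is only an illustration of the idea, not a claim about how any particular library implements copy. Because the iterators are passed by value, the loop advances the function's own copies and leaves the caller's iterators alone.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical stand-in for the library's copy, for illustration only.
// begin, end, and out are copies of the caller's iterators, so changing
// them here has no effect on the caller.
template <class In, class Out>
Out copy_sketch(In begin, In end, Out out)
{
    while (begin != end)
        *out++ = *begin++;
    return out;   // one past the last element written
}
```

A call such as copy_sketch(v.begin(), v.end(), back_inserter(w)) would append the elements of v to w, just as the corresponding call to copy does.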

Before we describe iterator adaptors, we should note that this loop depends
on the use of the postfix version of the increment operators.
These operators differ from the prefix versions, which we have used up to now,
in that begin++ returns a copy of the original value of begin,
incrementing the stored value of begin as a side effect. In other
words,

it = begin++;

is equivalent to

it = begin; ++begin;

The postfix increment operator has higher precedence than the unary *
operator, which means that *out++ has the same meaning as
*(out++). Thus,

*out++ = *begin++;

is equivalent to the more verbose

{ *out = *begin; ++out; ++begin; }
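
The sketch below writes both forms as functions over vector<int> iterators so that they can be compared directly. The names copy_compact and copy_verbose are hypothetical, and both functions assume that the destination already has room for the elements being copied.

```cpp
#include <cassert>
#include <vector>

using std::vector;

// Compact form: each postfix ++ yields the old iterator value,
// then advances the iterator as a side effect.
void copy_compact(vector<int>::const_iterator begin,
                  vector<int>::const_iterator end,
                  vector<int>::iterator out)
{
    while (begin != end)
        *out++ = *begin++;
}

// Verbose form: the same steps written out one at a time.
void copy_verbose(vector<int>::const_iterator begin,
                  vector<int>::const_iterator end,
                  vector<int>::iterator out)
{
    while (begin != end) {
        *out = *begin;
        ++out;
        ++begin;
    }
}
```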

Let's return to iterator adaptors, which are functions that yield
iterators with properties that are related to their arguments in useful ways.
The iterator adaptors are defined in <iterator>. The most common
iterator adaptor is back_inserter, which takes a container as its
argument and yields an iterator that, when used as a destination, appends values
to the container. For example, back_inserter(ret) is an iterator that,
when used as a destination, appends elements to ret. Therefore,

copy(bottom.begin(), bottom.end(), back_inserter(ret));

copies all of the elements of bottom and appends them to the end of
ret. After this function completes, the size of ret will have
increased by bottom.size().
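
As a rough illustration of what "used as a destination" means, the sketch below writes through the iterator by hand instead of calling copy. The function name append_three is hypothetical; back_insert_iterator is the library type that back_inserter returns.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

using std::back_insert_iterator; using std::back_inserter;
using std::vector;

// Hypothetical helper: append three values to v by assigning through
// the iterator that back_inserter yields.
void append_three(vector<int>& v, int a, int b, int c)
{
    back_insert_iterator<vector<int> > out = back_inserter(v);
    *out++ = a;   // each assignment appends to v by calling v.push_back
    *out++ = b;
    *out++ = c;
}
```

After the call, v holds its original elements followed by a, b, and c, so its size has grown by three.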

Passing ret.end() as the destination, instead of back_inserter(ret), is a
particularly insidious mistake because the program will
compile. What it does when you try to run it is another story entirely. The
first thing copy will try to do is assign a value to the element at
ret.end(). There's no element there, so what the implementation
will do is anybody's guess.

Why is copy designed this way? Because separating the notions of
copying elements and expanding a container allows programmers to choose which
operations to use. For example, we might want to copy elements on top of
elements that already exist in a container, without changing the
container's size. As another example, we might want to use
back_inserter to append elements to a container that are not merely
copies of another container's elements.
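
Both uses can be sketched briefly. The function names overwrite_front and append_n are hypothetical, and fill_n is a library algorithm from <algorithm> that the text has not introduced; we use it here only as one example of writing values that are not copies of another container's elements.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

using std::back_inserter; using std::copy;
using std::fill_n;        using std::vector;

// Hypothetical helper: copy src on top of the leading elements of dest.
// Assumes dest already holds at least src.size() elements; the size of
// dest does not change.
void overwrite_front(const vector<int>& src, vector<int>& dest)
{
    copy(src.begin(), src.end(), dest.begin());
}

// Hypothetical helper: append n elements equal to value. The appended
// elements do not come from another container.
void append_n(vector<int>& dest, vector<int>::size_type n, int value)
{
    fill_n(back_inserter(dest), n, value);
}
```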

Another Way to split

Another function that we can write more directly using the standard
algorithms is split. The hard part of writing that function was dealing
with the indices that delimited each word in the input line. We can replace the
indices by iterators and use standard-library algorithms to do much of the work
for us:
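
What follows is a sketch of this version of split, reconstructed from the explanation in this section. The predicates space and not_space are the ones the text names; the cast to unsigned char inside space is a defensive assumption of ours, because isspace expects a value that is representable as an unsigned char.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

using std::find_if; using std::string; using std::vector;

// true if the argument is whitespace, false otherwise
bool space(char c)
{
    // the cast is an assumption of this sketch (see the note above)
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// false if the argument is whitespace, true otherwise
bool not_space(char c)
{
    return !space(c);
}

vector<string> split(const string& str)
{
    typedef string::const_iterator iter;
    vector<string> ret;

    iter i = str.begin();
    while (i != str.end()) {
        // ignore leading blanks
        i = find_if(i, str.end(), not_space);

        // find the end of the next word
        iter j = find_if(i, str.end(), space);

        // copy the characters in [i, j)
        if (i != str.end())
            ret.push_back(string(i, j));
        i = j;
    }
    return ret;
}
```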

This code uses a lot of new functions, so it will take a bit of explanation.
The key idea to keep in mind is that it implements the same algorithm as the
original, using i and j to delimit each word in str.
Once we've found a word, we copy it from str and push the
copy onto the back of ret.

This time, i and j are iterators, not indices. We use
typedef to abbreviate the iterator type so that we can use
iter instead of the longer string::const_iterator. Although
the string type does not support all of the container operations, it
does support iterators. Therefore, we can use the standard-library algorithms on
the characters of a string, just as we can use them on the elements of
a vector.

The algorithm that we use in this example is find_if. Its first two
arguments are iterators that denote a sequence; the third is a predicate, which
tests its argument and returns true or false. The
find_if function calls the predicate on each element in the sequence,
stopping when it finds an element for which the predicate yields true.
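
To see find_if on its own, consider this small sketch on a vector<int>. The predicate is_negative and the function first_negative are hypothetical names invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using std::find_if; using std::vector;

// Hypothetical predicate: tests its argument and returns true or false.
bool is_negative(int x)
{
    return x < 0;
}

// Hypothetical helper: returns an iterator denoting the first negative
// element of v, or v.end() if no element satisfies the predicate.
vector<int>::const_iterator first_negative(const vector<int>& v)
{
    return find_if(v.begin(), v.end(), is_negative);
}
```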

The standard library provides an isspace function to test whether a
character is a space. However, that function is overloaded so that it will work
with languages such as Japanese that use other character types, such as
wchar_t (§1.3/14). It's not easy to pass an overloaded
function directly as an argument to a template function. The trouble is that the
compiler doesn't know which version of the overloaded function we mean
because we haven't supplied any arguments that the compiler might use to
select a version. Accordingly, we'll write our own predicates, called
space and not_space, that make clear which version of
isspace we intend.

The first call to find_if seeks the first nonspace character that
begins a word. Remember that one or more spaces might begin a line or might
separate adjacent words in the input. We don't want to include these spaces
in the output.

After the first call to find_if, i will denote the first
nonspace, if any, in str. We use i in the next call to
find_if, which looks for the first space in [i, str.end()). If
find_if fails to find a value that satisfies the predicate, it returns
its second argument, which, in this case, is str.end(). Therefore,
j will be initialized to denote the blank that separates the next word
in str from the rest of the line, or, if we are on the last word in the
line, j will be equal to str.end().

At this point, i and j delimit a word in str. All
that's left is to use these iterators to copy the data from str
into ret. In the earlier version of split, we used
string::substr to create the copy. However, that version of
split operated on indices, not iterators, and there isn't a
version of substr that operates on iterators. Instead, we construct a
new string directly from the iterators that we have. We do so by using
an expression, string(i, j), that is somewhat similar to the definition
of spaces. Our present example constructs a string that is a
copy of the characters in the range [i, j). We push this new
string onto the back of ret.
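
The relationship between the two approaches can be sketched as follows. The function names piece_by_index and piece_by_iter are hypothetical, and both assume that i <= j <= s.size().

```cpp
#include <cassert>
#include <string>

using std::string;

// Hypothetical helper: the characters at indices [i, j), using substr,
// which takes a starting index and a length.
string piece_by_index(const string& s, string::size_type i, string::size_type j)
{
    return s.substr(i, j - i);
}

// Hypothetical helper: the same characters, constructed directly from a
// pair of iterators that delimit the range.
string piece_by_iter(const string& s, string::size_type i, string::size_type j)
{
    return string(s.begin() + i, s.begin() + j);
}
```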

It is worth pointing out that this version of the program omits the tests of
the index i against str.size(). Nor are there the obvious
equivalent tests of the iterator against str.end(). The reason is that
the library algorithms are written to handle gracefully calls that pass an empty
range. For example, at some point the first call to find_if will set
i to the value returned by str.end(), but there is no need to
check i before passing it to the second call to find_if. The
reason is that find_if will look in the empty range [i,
str.end()) and will return str.end() to indicate that there is no
match.

Palindromes

Another character-manipulation problem that we can use the library to solve
succinctly is determining whether a word is a palindrome. Palindromes are words
that are spelled the same way front to back and back to front. For example,
civic, eye, level, madam, and rotor are all
palindromes.
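
A minimal sketch of such a function follows. The name is_palindrome and the const string& parameter are our assumptions; the text fixes neither.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

using std::equal; using std::string;

// true if s reads the same front to back and back to front
bool is_palindrome(const string& s)
{
    return equal(s.begin(), s.end(), s.rbegin());
}
```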

The return statement in this function's body calls the
equal function and the rbegin member function, both of which
we have not yet seen.

Like begin, rbegin returns an iterator, but this time it is
an iterator that starts with the last element in the container and marches
backward through the container.

The equal function compares two sequences to determine whether they
contain equal values. As usual, the first two iterators passed to equal
specify the first sequence. The third argument is the starting point for the
second sequence. The equal function assumes that the second sequence is
the same size as the first, so it does not need an ending iterator. Because we
pass s.rbegin() as the starting point for the second sequence, the
effect of this call is to compare values from the back of s to values
in the front. The equal function will compare the first character in
s with the last. Then it will compare the second to the next-to-last,
and so on. This behavior is precisely what we want.

Finding URLs

As the last of our examples of character manipulation, let's write a
function that finds Web addresses, called uniform resource locators (URLs), that
are embedded in a string. We might use such a function by creating a
single string that holds the entire contents of a document. The
function would then scan the document and find all the URLs in it.

A URL is a sequence of characters of the form:

protocol-name://resource-name

where protocol-name contains only letters, and
resource-name may consist of letters, digits, and certain
punctuation characters. Our function will take a string argument and
will look for instances of :// in that string. Each time we
find such an instance, we'll look for the protocol-name that precedes it and the resource-name that follows
it.

Because we want our function to find all the URLs in its input,
we'll want it to return a vector<string>, with one element
for each URL. The function executes by moving the iterator b through
the string, looking for the characters :// that might be a
part of a URL. If we find these characters, it looks backward to find the
protocol-name, and it looks forward to find the
resource-name:
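
Here is a sketch of how find_urls might look, pieced together from the description in this section. The helper functions url_beg, url_end, and not_url_char, which the text goes on to describe, are defined in the same block so that it compiles on its own. The exact set of punctuation characters in url_ch and the casts to unsigned char are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

using std::find;   using std::find_if; using std::search;
using std::string; using std::vector;

typedef string::const_iterator iter;

// helpers, defined after find_urls
iter url_beg(iter b, iter e);
iter url_end(iter b, iter e);

vector<string> find_urls(const string& s)
{
    vector<string> ret;
    iter b = s.begin(), e = s.end();

    // look through the entire input
    while (b != e) {
        // look for one or more letters followed by ://
        b = url_beg(b, e);

        if (b != e) {
            // find the end of the URL, remember it, and move past it
            iter after = url_end(b, e);
            ret.push_back(string(b, after));
            b = after;
        }
    }
    return ret;
}

// true if c cannot appear in a URL
bool not_url_char(char c)
{
    // punctuation allowed in a URL; this particular set is an assumption
    static const string url_ch = "~;/?:@=&$-_.+!*'(),";

    return !(std::isalnum(static_cast<unsigned char>(c)) ||
             find(url_ch.begin(), url_ch.end(), c) != url_ch.end());
}

iter url_end(iter b, iter e)
{
    return find_if(b, e, not_url_char);
}

iter url_beg(iter b, iter e)
{
    static const string sep = "://";

    // i marks where the separator was found
    iter i = b;
    while ((i = search(i, e, sep.begin(), sep.end())) != e) {
        // the separator must not be at the beginning or end of the input
        if (i != b && i + sep.size() != e) {
            // beg marks the beginning of the protocol-name
            iter beg = i;
            while (beg != b && std::isalpha(static_cast<unsigned char>(beg[-1])))
                --beg;

            // at least one letter before and one valid character after?
            if (beg != i && !not_url_char(i[sep.size()]))
                return beg;
        }
        // not part of a URL; advance i past this separator
        i += sep.size();
    }
    return e;
}
```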

We start by declaring ret, which is the vector into which
we will put the URLs as we find them, and by obtaining iterators that delimit
the string. We will have to write the url_beg and
url_end functions, which will find the beginning and end of any URL in
the input. The url_beg function will be responsible for identifying
whether a valid URL is present and, if so, for returning an iterator that refers
to the first character of the protocol-name. If it does not
identify a URL in the input, then it will return its second argument
(e, in this case), to indicate failure.

If url_beg finds a URL, the next task is to find the end of the URL
by calling url_end. That function will search from the given position
until it reaches either the end of the input or a character that cannot be part
of a URL. It will return an iterator positioned one past the last character in
the URL.

Thus, after the calls to url_beg and url_end, the iterator
b denotes the beginning of a URL, and the iterator after
denotes the position one past the last character in the URL:

We construct a new string from the characters in this range and push
that string onto the back of ret.

All that remains is to increment the value of b and to look for the
next URL. Because URLs cannot overlap one another, we set b to (one
past) the end of the URL that we just found and continue the while loop
until we've looked at all the input. Once that loop exits, we return the
vector that contains the URLs to our caller.

Now we have to think about url_beg and url_end. The
url_end function is simpler, so we'll start there:

This function just forwards its work to the library find_if
function. The predicate that we pass to find_if is one that we will
write, named not_url_char. It will return true when passed a
character that cannot be in a URL:

Despite being small, this function uses a fair bit of new material. First is
the use of the static storage class specifier. Local variables that are
declared to be static are preserved across invocations of the function. Thus, we
will construct and initialize the string url_ch only on the
first call to not_url_char. Subsequent calls will use the object that
the first call constructed. Because url_ch is a const string, its value will not change once we have initialized it.

The not_url_char function also uses the isalnum function,
which the <cctype> header defines. This function tests whether
its argument is an alphanumeric character (a letter or a digit).

Finally, find is another algorithm that we haven't used yet. It
is similar to find_if, except that, instead of calling a predicate, it
looks for the specific value given as its third argument. As with
find_if, if the value that we want is present, the function returns an
iterator denoting the first occurrence of the value in the given sequence. If
the value is not found, then find returns its second argument.
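
A smaller example may help separate these ideas from the URL problem. In the sketch below, the names is_vowel and first_vowel, and the contents of the string vowels, are hypothetical. It combines a static const local string, a call to find for a specific value, and a call to find_if with a predicate.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

using std::find; using std::find_if; using std::string;

// Hypothetical predicate: true if c is one of the characters in vowels.
bool is_vowel(char c)
{
    // constructed only on the first call, then reused by later calls
    static const string vowels = "aeiouAEIOU";

    // find looks for the specific value c; it returns its second
    // argument if c is not present
    return find(vowels.begin(), vowels.end(), c) != vowels.end();
}

// Hypothetical helper: an iterator denoting the first vowel in s,
// or s.end() if there is none.
string::const_iterator first_vowel(const string& s)
{
    return find_if(s.begin(), s.end(), is_vowel);
}
```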

With this information in hand, we can now understand the
not_url_char function. Because we negate the value of the entire
expression before we return it, not_url_char will yield false
if c is a letter, a digit, or any of the characters in url_ch.
If c is any other value, the function returns true.

Now the hard part begins: implementing url_beg. This function is
messy because it must deal with the possibility that the input might contain
:// in a context that cannot be a valid URL. In practice, we'd
probably have a list of acceptable protocol-names and
look only for those. For simplicity, though, we'll limit ourselves to being
sure that one or more letters precede the :// separator and at least
one character follows it:

string::const_iterator
url_beg(string::const_iterator b, string::const_iterator e)
{
    static const string sep = "://";
    typedef string::const_iterator iter;
    // i marks where the separator was found
    iter i = b;
    while ((i = search(i, e, sep.begin(), sep.end())) != e) {
        // make sure the separator isn't at the beginning or end of the line
        if (i != b && i + sep.size() != e) {
            // beg marks the beginning of the protocol-name
            iter beg = i;
            while (beg != b && isalpha(beg[-1]))
                --beg;
            // is there at least one appropriate character before and after the separator?
            if (beg != i && !not_url_char(i[sep.size()]))
                return beg;
        }
        // the separator we found wasn't part of a URL; advance i past this separator
        i += sep.size();
    }
    return e;
}

The easy part is writing the function header. We know that we'll be
passed two iterators denoting the range in which to look and that we'll
return an iterator that denotes the beginning of the first URL in that range, if
one exists. We also declare and initialize a local string, which will
hold the characters that make up the separator that identifies a potential URL.
Like url_ch in the not_url_char function, this string
is static and const. Thus, we will not be able to change the
string, and its value will be created only on the first invocation of
url_beg.

The function executes by placing two iterators into the string
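delimited by b and e:

[Figure: the string "text http://www.acceleratedcpp.com more text", with b at its first character, e one past its last character, beg at the h of http, and i at the : that begins the separator.]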

The iterator i will denote the beginning of the URL separator, if
any, and beg will indicate the beginning of the
protocol-name, if any.

The function first looks for the separator by calling search, a
library function that we haven't used before. This function takes two pairs
of iterators: The first pair denotes the sequence in which we are looking, and
the second pair denotes the sequence that we want to locate. As with other
library functions, if search fails, it returns the second iterator.
Therefore, after the call to search, either i denotes (one
past) the end of the input string, or it denotes a : that is
followed by //.
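
As a small illustration of search by itself, the sketch below looks for the separator in a string. The function name find_sep is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

using std::search; using std::string;

// Hypothetical helper: returns an iterator denoting the first occurrence
// of :// in s, or s.end() if the separator does not occur. The iterator
// refers into s, so s must outlive it.
string::const_iterator find_sep(const string& s)
{
    static const string sep = "://";

    // first pair: the sequence in which we are looking;
    // second pair: the sequence that we want to locate
    return search(s.begin(), s.end(), sep.begin(), sep.end());
}
```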

If we found a separator, the next task is to get the letters (if any) that
make up the protocol-name. We first check whether the separator
is at the beginning or the end of the input. If the separator is in either of
those places, we know that we don't have a URL because a URL has at least
one character on each side of its separator. Otherwise, we need to try to
position the iterator beg. The inner while loop moves
beg backward through the input until it hits either a nonalphabetic
character or the beginning of the string. It uses two new ideas: The
first is the notion that if a container supports indexing, so do its iterators.
In other words, beg[-1] is the character at the position immediately
before the one that beg denotes. We can think of beg[-1] as an
abbreviation for *(beg - 1). The second new idea is the
isalpha function, defined in <cctype>, which tests
whether its argument is a letter.
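
The inner loop can be tried in isolation. In the sketch below, the function name letters_before is hypothetical, the function assumes that pos <= s.size(), and the cast to unsigned char is a defensive assumption of ours that does not appear in the original code.

```cpp
#include <cassert>
#include <cctype>
#include <string>

using std::string;

// Hypothetical helper: returns the run of letters that ends immediately
// before index pos in s. Assumes pos <= s.size().
string letters_before(const string& s, string::size_type pos)
{
    typedef string::const_iterator iter;

    iter b = s.begin();
    iter i = b + pos;
    iter beg = i;

    // beg[-1] is the character just before the one that beg denotes;
    // it is shorthand for *(beg - 1)
    while (beg != b && std::isalpha(static_cast<unsigned char>(beg[-1])))
        --beg;

    return string(beg, i);
}
```

If the result is empty, beg never moved, which is the case that url_beg treats as "no protocol-name found."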

If we were able to advance the iterator over as much as a single character,
we can assume that we've found a protocol-name. Before
returning beg, we still have to check that there's at least one
valid character following the separator. This test is more complicated. We know
that there is at least one more character in the input because we're inside
the body of an if that compares the value of i + sep.size()
with e. We can access the first such character as
i[sep.size()], which is an abbreviation for *(i + sep.size()).
We test whether that character can appear in a URL by passing the character to
not_url_char. This function returns true if the character is
not valid, so we negate the return to check whether the character is valid.

If the separator is not part of a URL, then the function advances i
past the separator and keeps looking.

This code uses the decrement operator, which we have not previously
used. It works like the increment operator, but it decrements its operand
instead. As with the increment operator, it comes in prefix and postfix
versions. The prefix version, which we use here, decrements its operand and
returns the new value.