Andrew Koenig

Dr. Dobb's Bloggers

The Case Against int, Part 3: The Advantages and Perils of Unsigned Arithmetic

March 07, 2012

C++ defines an unsigned integer type, size_t, that is well-suited to storing array indices, or otherwise counting items that can fit in the computer's memory.

Last week, I noted that integers are often used for counting, and that counting is often better suited to unsigned than to signed arithmetic. To that end, C++ defines an unsigned integer type, size_t, that is well-suited to storing array indices, or otherwise counting items that can fit in the computer's memory. Moreover, each of the standard-library containers, such as vector<T>, defines a type analogous to vector<T>::size_type, which is an unsigned integer with an appropriate capacity to store the number of elements in a vector<T>.

If you're counting objects, the number of objects you're counting can't be negative. Because unsigned integers cannot express negative values, a whole category of errors becomes impossible. This claim isn’t mere sophistry, either. Suppose, for example, you want to check whether n is a suitable index for a vector v. You might think you would have to write

if (n >= 0 && n < v.size()) { /* n is suitable */ }

However, the fact that v.size() yields an unsigned value makes it possible to write

if (n < v.size()) { /* n is suitable */ }

The reason is that, in general, combining signed and unsigned integers in C++ converts the signed quantity to unsigned. Accordingly, if n is negative, it is converted to a huge positive unsigned value, which will appear to be larger than v.size().

Such behavior is often desirable, as it is here; but it can also cause trouble. As an example of such trouble, consider a loop to visit all of the elements of v in ascending order:

Here, I have started by defining vsize_t as the type yielded by v.size(), and done so in a way that makes it unnecessary for me to know what kind of container v is. I have then written a straightforward loop that steps n through each index of v.

What if I want to make this loop run backward? The obvious way to write it is

for (vsize_t n = v.size()-1; n >= 0; --n) { /* Wrong! */ }

but a little examination should convince you that this code is a disaster. In fact, a considerate compiler will warn about the problem: n is unsigned, so n >= 0 is always true. Therefore, the loop will never terminate.

This example exposes one of the difficulties in working with unsigned quantities: Loops often terminate after a variable has passed beyond the range of acceptable values. If that range is bounded by 0, and you're using an unsigned variable, it may be impossible to go beyond that bound.

There's a general rule for working with such bounds: When a loop runs backward, decrement at the beginning of the loop; when it runs forward, increment at the end. Decrementing at the beginning simplifies the code in two ways: It makes it possible to start a loop at the upper off-the-end value, rather than one element earlier, and it ensures that the lower bound will never be transgressed.

This last paragraph is a bit of a mouthful, so let me show the two general forms I have in mind, using while instead of for to reduce the code to its essentials. Let's start with the familiar form of a loop that increases a variable's value each time through:

vsize_t n = 0;
while (n != v.size()) {
// Process v[n]
++n;
}

If we are moving backward through v, the form is less familiar:

vsize_t n = v.size();
while (n != 0) {
--n;
// Process v[n]
}

Writing the loop in this slightly unusual form allows us to stop the loop before n crosses zero (which it will never do, so we had better not try to test for it). It also makes it unnecessary to write a separate calculation to start n at v.size()-1 instead of v.size(). Moreover, the same technique works for iterators:

The similarity between these last two code examples comes from a similarity between unsigned integers and iterators: Because there is an off-the-end value but no off-the-beginning value, both unsigned integers and iterators require special care when a value might run off the beginning of a range (i.e., might underflow).

Summary

Unsigned integers are particularly well suited to counting, because the number of items that one is counting cannot be negative.

Mixing signed and unsigned integers usually converts the signed integer to unsigned. This conversion can be useful (i.e., it can simplify bounds checking), but it can also cause trouble, particularly around lower bounds of ranges.

One rule of thumb for avoiding such trouble is to decrement a control variable at the beginning of a loop. Using this rule requires the variable to start out one step beyond its "real" first value, but that state of affairs is usually easy to achieve.

Techniques for writing loops that use unsigned integers can carry over to loops that involve iterators.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!