Constant Objects and Constant Expressions

Constant objects are a bit more useful in C++ than they are in C. Unfortunately, they're still not as useful as some people would like them to be.

A couple of months ago, I explained that C and C++ offer three different ways to define symbolic constants: as macros, enumeration constants, or constant objects ("Symbolic Constants," November 2001). For example, to define buffer_size as a symbol representing 256, you can use a macro:

#define buffer_size 256

an enumeration constant:

enum { buffer_size = 256 };

or a constant object:

int const buffer_size = 256;

I recently received e-mail about that column. It raises some interesting questions about the behavior of these different forms. I thought I'd share the letter and my response with you. Here goes:

Dear Mr. Saks:

I always enjoy your column in ESP, and often use your articles and suggestions for my work. I was happy to see a column about symbolic constants and how to use enumeration constants to replace macros. It does work, but not across compilation units.

I am sometimes in the situation of delivering both source code and object libraries. Sometimes the libraries have been compiled with dependencies on the macro values appearing in the source handed to customers. A customer might change a macro value, recompile the source, and then link to object code that had been compiled using the old values for the macro. The problem is that if the objects are, say, allocating buffers depending on a macro value now modified, the resulting executable is likely to misbehave.

As an example, suppose file a.h contains the following:

#define BUFSIZE 100

file a.c contains:

#include "a.h"
int buf_a[BUFSIZE];

and file b.c contains:

#include "a.h"
extern int buf_a[];
int buf_b[BUFSIZE];

Suppose also that you deliver files a.h and a.c to your customer in source format, but you deliver file b.c in object format only as b.o. The customer might edit a.h so that BUFSIZE defines a smaller value, then recompile a.c and link a.o (the object file for the just-compiled a.c) to b.o. After relinking, buf_b and buf_a will have different sizes, and any code in b.o depending on BUFSIZE and operating on buf_a will probably cause memory overruns.

Yes, this example is a bit contrived, but not really too far from what I deal with at work. Using an enumeration constant instead of a macro does not help with this problem because enumeration constants are translated to explicit numeric values by the compiler. Using an enumeration constant instead of a macro in the example would still produce a b.o containing a value for BUFSIZE that stays unchanged regardless of the modifications to the value of BUFSIZE in a.h.

Whether this is a good design or not is beside the point. Let's just say that my customers must have access to the macro values, but not to some source code which may be using them.

I believe that using a constant object would help here, since it should be left as a symbol, rather than translated into a constant by the compiler. Therefore, it should be possible for the linker to detect changes in the constant's value in the scenario above. However, our C compiler does not support constructs like:

int buf_b[BUFSIZE];

where BUFSIZE is a const int. Short of a redesign, the problem remains in place.

From your article, I understand C++ allows this construct if BUFSIZE is left as a symbol in each compilation unit. If that's the case, it would solve my problem. I hope I can test this in the future.

Carlo Mastrogiacomo
LSI Logic

Carlo is indeed correct that changing the definition for BUFSIZE in a.h from:

#define BUFSIZE 100

to:

enum { BUFSIZE = 100 };

will not change the situation. If the customer later changes the value of BUFSIZE as defined in this header, yet does not recompile every source file that refers to BUFSIZE, it's likely that the program will misbehave at run time.

It's also true that a C compiler won't let you define BUFSIZE as a constant object and use it as an array dimension. That is, a C compiler will let you define BUFSIZE as:

int const BUFSIZE = 100;

but then it won't let you declare:

int buf_a[BUFSIZE]; // error in C

C++ will let you do this, but this still won't solve Carlo's problem. If the customer changes the value of BUFSIZE as defined in the header without recompiling all the source files that depend on the value of BUFSIZE (b.c as well as a.c), the program will probably misbehave in the same way whether BUFSIZE is a constant object or an enumeration constant.

Constant expressions

In both C and C++, the expression specifying the dimension in an array declaration, if present, must be an integral constant expression. The expression may be as simple as an integer literal, as in:

int a[10];

It may also be an expression with several operands and operators, as in:

int b[2 * (M - N) + 1];

as long as each operand is itself a constant expression, so the compiler can reduce the expression down to a single integer constant at compile time.
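
Here is a minimal sketch of that reduction. It assumes M and N are enumeration constants; the values are hypothetical, as is the helper b_length.

```cpp
#include <cstddef>

// Hypothetical values; any integral constant expressions would do here.
enum { M = 8, N = 3 };

// 2 * (M - N) + 1 reduces to 11 at compile time, so it is a valid dimension.
int b[2 * (M - N) + 1];

// Number of elements in b, computed from the array's own size.
std::size_t b_length() {
    return sizeof(b) / sizeof(b[0]);
}
```

Calling b_length() yields the dimension the compiler computed, which is 11 for these values.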

Both C and C++ require an integral constant expression in a few other contexts: as the expression in a case label; as the size of a bit-field member in a structure, union, or C++ class; or as the value of an enumeration constant. C++ also requires an integral constant expression as the initializer for an integral constant static data member initialized within its class, and as a non-type template argument.
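
The sketch below gathers those contexts into one C++ fragment. All of the names (WIDTH, Mode, Flags, Limits, Buffer, classify, buffer_length) are hypothetical; only the contexts themselves come from the list above.

```cpp
// Hypothetical names throughout; only the contexts come from the text.
enum { WIDTH = 3 };

enum Mode { idle = 0, busy = WIDTH + 1 };    // value of an enumeration constant

struct Flags {
    unsigned ready : WIDTH;                  // size of a bit-field member
    unsigned error : 1;
};

struct Limits {
    static int const max_items = 2 * WIDTH;  // in-class initializer (C++ only)
};

template <int Size>                          // non-type template parameter (C++ only)
struct Buffer {
    int data[Size];
};

int classify(int x) {
    switch (x) {
    case WIDTH:                              // case label
        return 1;
    case WIDTH + 1:                          // a larger constant expression
        return 2;
    default:
        return 0;
    }
}

// Element count of a Buffer whose template argument is a constant expression.
int buffer_length() {
    Buffer<Limits::max_items> buf = {};
    return static_cast<int>(sizeof(buf.data) / sizeof(buf.data[0]));
}
```

Replace any of these constant expressions with an ordinary variable and the compiler should reject the declaration.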

Although their respective standards use different phrasing, a constant expression as defined in C++ is mostly the same as a constant expression as defined in C, with one notable exception. In both languages, an integral constant expression is an expression of integral or enumeration type whose operands are only integer or character literals, enumeration constants, sizeof expressions, or floating literals that are cast to integral or enumeration types. (I've left out a few minor details, but this is pretty close.)

The notable exception is that, in C++ but not C, an integral constant expression may have operands that are constant objects of integral or enumeration types initialized with constant expressions. For example, given:

int const BUFSIZE = 100;

then BUFSIZE can be an integral constant expression by itself, or an operand of a larger such expression, as in:

int buf[2 * BUFSIZE - 1];

The initializer for a constant object need not be an integral constant expression, but then the object cannot be used in a constant expression. For example, C++ accepts:

int n = 100;
...
int const BUFSIZE = n;

but BUFSIZE's initializer is not a constant expression, and so BUFSIZE itself is not a constant expression.

Regardless of the form of the initializer, a constant object is never a constant expression in standard C.
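
In C++ terms, the two cases can be sketched side by side as follows. The name RUNTIME_SIZE is hypothetical, chosen so that both objects can live in one file, and the two helper functions are there only to report the results.

```cpp
int const BUFSIZE = 100;     // initialized with a constant expression
int buf[2 * BUFSIZE - 1];    // OK in C++: the dimension reduces to 199

int n = 100;                 // not const, so not usable in a constant expression
int const RUNTIME_SIZE = n;  // legal, but the initializer is not a constant expression
// int bad[RUNTIME_SIZE];    // error in standard C++: not a constant expression

// Number of elements in buf.
int buf_length() {
    return static_cast<int>(sizeof(buf) / sizeof(buf[0]));
}

// RUNTIME_SIZE still holds a usable value; it just isn't a compile-time constant.
int runtime_size() {
    return RUNTIME_SIZE;
}
```

Both objects are const and both hold 100, but only BUFSIZE can appear where the language demands a constant expression.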

Linkage for constant objects

As I explained last month ("Enumeration Constants vs. Constant Objects," December 2001), C and C++ differ in their treatment of constant objects in another regard: in C, a constant object at global scope has external linkage by default; in C++, it has internal linkage by default. Let's see how this affects the example at hand.

Suppose you rewrite file a.h so that it defines BUFSIZE as a constant object:

int const BUFSIZE = 100;

Again, file a.c contains:

#include "a.h"

int buf_a[BUFSIZE];

and file b.c contains:

#include "a.h"
extern int buf_a[];
int buf_b[BUFSIZE];

This code won't compile as C because C won't accept a constant object such as BUFSIZE as a constant expression. However, it will compile as C++, and it behaves as follows.

By default, BUFSIZE has internal linkage. This means that each translation unit that includes a.h gets its own copy of BUFSIZE. That is, only code in a.c can refer to the copy of BUFSIZE in a.c, and only code in b.c can refer to the copy of BUFSIZE in b.c. As I explained last month, if the compiler can determine that a.c doesn't actually need the storage for BUFSIZE, the compiler need not allocate that storage. Ditto for b.c.

Suppose you compile a.c and b.c with a.h containing:

int const BUFSIZE = 100;

and you ship a.h, a.c, and b.o to your customer. Your customer then changes a.h so that it contains:

int const BUFSIZE = 50;

then recompiles a.c and links a.o with b.o. The two object files will then have been compiled using different values of BUFSIZE. The linker can't use the value of BUFSIZE from a.o to adjust the storage allocation in b.o because each definition for BUFSIZE has internal linkage. It's quite possible that neither object file contains a symbol definition for BUFSIZE.
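
A single file can't reproduce a stale object file, but the mismatch can be simulated. In this sketch, the namespaces unit_a and unit_b are hypothetical stand-ins for the two translation units, each holding the private copy of BUFSIZE it was compiled with. The helper functions are likewise assumptions made for illustration.

```cpp
// Stand-in for a.o, rebuilt by the customer after editing a.h.
namespace unit_a {
    int const BUFSIZE = 50;
    int buf_a[BUFSIZE];
}

// Stand-in for b.o, still built from the original a.h.
namespace unit_b {
    int const BUFSIZE = 100;
    int buf_b[BUFSIZE];

    // The length that code in b.o would assume for buf_a.
    int assumed_buf_a_length() { return BUFSIZE; }
}

// The length buf_a really has after the customer's rebuild.
int actual_buf_a_length() {
    return static_cast<int>(sizeof(unit_a::buf_a) / sizeof(unit_a::buf_a[0]));
}

// True when b.o's assumption reaches past the end of buf_a.
bool b_would_overrun_a() {
    return unit_b::assumed_buf_a_length() > actual_buf_a_length();
}
```

Nothing here touches memory out of bounds. The sketch only compares the two lengths, which is exactly the check the linker has no way to perform.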

Now, suppose you define a.h so that BUFSIZE has external linkage, as in:

extern int const BUFSIZE = 100;

Then each translation unit that you compile will have a definition for BUFSIZE with external linkage. That is, a.o and b.o will each have a definition for BUFSIZE with external linkage. When you or your customer tries to link these object files together, you should get a linker error complaining about duplicate external symbols.

The bottom line

I don't believe there's a tidy solution to Carlo's problem. You can get something close if you're willing to dynamically allocate the buffers. But that has a cost in speed and space that he, and you, might not be willing to pay.
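
One way the dynamic approach might look is sketched below, with the three files folded into one so it compiles on its own. It assumes a.h only declares BUFSIZE and a.c supplies the single definition. The accessor buf_b(), the helper buf_b_length(), and the use of std::vector are my assumptions, not part of Carlo's code.

```cpp
#include <cstddef>
#include <vector>

// --- would live in a.h (shipped as source) ---
extern int const BUFSIZE;        // declaration only: no value visible here

// --- would live in a.c (shipped as source; the customer may edit the value) ---
extern int const BUFSIZE = 100;  // the one definition, with external linkage

// --- would live in b.c (shipped only as b.o) ---
// b.c would see only the declaration, so BUFSIZE is not a constant
// expression there and cannot be an array dimension. The buffer is
// sized at run time from whatever value the linked a.o provides.
std::vector<int> &buf_b() {
    static std::vector<int> storage(BUFSIZE);  // allocated on first use
    return storage;
}

// Current length of buf_b, for callers that need to check it.
std::size_t buf_b_length() {
    return buf_b().size();
}
```

Because b.o reads BUFSIZE through the linker rather than baking in a number, a customer who changes the value in a.c and relinks gets buffers of the new size. The price is the allocation at start-up and an extra indirection on each access.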

Dan Saks is the president of Saks & Associates, a C/C++ training and consulting company. You can write to him at dsaks@wittenberg.edu.