Andrew Koenig

Dr. Dobb's Bloggers

The Nightmare of Binary Compatibility

April 02, 2014

Last week, I wrote that compatibility between C++ and C is a major reason that C programmers adopted C++, and also a major source of difficulty in C++ implementations. This duality is particularly evident when we think about binary compatibility.

From the early days of the C++ standards committee, people would ask me, "Is C++ required to be binary compatible with C?" Most of the time, after I would answer no, the questioner would become perplexed or angry. Wasn't C compatibility a major reason that C++ was designed the way it was? If so, why not write that desire into the standard?

My usual answer was to invite the questioner to consider:

A C++ implementation on a computer with no C implementation available.

A C++ implementation on a computer with two or more C implementations that are not binary compatible with each other.

In both cases, it is impossible to make C++ "compatible with C." The best possible scenario is for a C++ implementer to say that a C++ implementation is binary compatible with a particular C implementation.

Even such a mild statement can be difficult to make, because it requires the C++ implementer to follow the lead of the C implementer. Suppose a C implementation changes in a way that introduces a C++ incompatibility. Does that change require the C++ implementer to follow suit immediately? What if the C implementer does not even tell the C++ implementer about the change? How does the C++ implementer find out about the change? Even at such a high level, it is clear that defining the precise meaning of compatibility is not always easy.

In addition to the definition problem, there are implementation problems — even for parts of the language that may seem obvious. For example, for binary compatibility to be useful, it must include the notion of calling a C function from a C++ function and vice versa. Implementing this notion seems to require using a single linker for both C and C++ programs. This is a reasonable requirement, at least on the surface, and usually leads to the requirement to use the linker that was already in place for C programs.

Now consider a C++ program that includes an overloaded function — i.e., two different functions with the same name. How can a linker, intended for C programs, which presumably identifies functions by their names, cope with the notion of two different functions with the same name? Most likely, it can't. Instead, the C++ compiler must translate these two functions into two differently named C functions.

The obvious way to do this translation is by what has come to be called name mangling: Enlarge each function's name to encode information about the types of its parameters. Not only does this strategy cater to overloading, but it also allows stricter type checking across translation-unit boundaries. What it does not allow, of course, is C compatibility.

Here, the C++ compiler has no problem inferring, at least in principle, that foo might be a C function, and therefore that it must mangle foo(double) but not foo(int). However, we can make the compiler's job impossible by making one small change to the program:

Our C++ program now has two functions, both named foo, defined in other translation units. At most one of those can be a C function, because there cannot be two C functions named foo. If one of them is a C function, which is it?

I see no way in which a C++ compiler can possibly answer this question automatically. You might think it could choose whichever version of foo is defined first, and indeed that strategy seems almost mandatory. Suppose, for example, that we rewrite our program yet again:

Nothing stops us from declaring a new version of foo after we have declared and used another version. However, at the point at which the compiler sees the definition of main, it does not yet know that foo is going to be overloaded. Therefore, if the compiler is to decide automatically, it must decide that foo(int) is compatible with the (perhaps not yet compiled) C version of foo. Therefore, choosing the first version of an overloaded function to be the one that is compatible with C seems necessary if the compiler is not to have to examine every function in an entire translation unit before it can compile any of them.

However, this strategy goes horribly wrong in the face of the following translation unit:

extern double foo(double); // These two versions of foo are declared in the
extern double foo(int);    // opposite order from the other translation unit
// anything

Which version of foo is compatible with C here? If it is foo(double), then the program is broken: foo(int) and foo(double) cannot both be compatible with C because there can be only one C function named foo. Yet we now have the ugly implication that when we declare overloaded functions, the order in which they are declared makes a difference, to the extent that all of the translation units that use overloaded functions must declare all of those versions of each such function in the same order.

Because of these complexities, early C++ compilers adopted a manual strategy: In order to overload a function, every translation unit had to say explicitly that that function name was overloaded. If there was a declaration such as

overload foo;

then every use of foo in that translation unit was assumed to be overloaded, hence not compatible with C. Otherwise, the function could not be overloaded at all. However, this strategy still had a disadvantage: To overload a function that was not previously overloaded required adding an overload declaration — not only in the translation unit that defined the overloaded function, but in every translation unit in the entire program that used that function. In effect, overloading a function required recompiling every translation unit that declared that function, even if it did not care about the overloading.
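Concretely, under that early scheme a translation unit that wanted to overload foo might have read roughly as follows. This is historical, pre-standard syntax, sketched from the description above; modern compilers reject it:

```
overload foo;              // every foo in this translation unit is
                           // overloaded, hence not compatible with C
extern double foo(int);
extern double foo(double);
```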

Eventually, C++ compilers adopted a different strategy for C compatibility in the face of overloading. We shall start exploring that strategy next week.
