Walter Bright

Dr. Dobb's Bloggers

C's Biggest Mistake

December 22, 2009

C is arguably the world's most successful programming language. Its success has, of course, endlessly tempted people to improve upon it. Thus, C is probably the patriarch of the longest list of languages. Notable among these are C++, the D programming language, and most recently, Go. There are endless discussion threads on how to fix C, going back to the 1980s.

So this is well-trodden ground. What could possibly be added to this soup? I posit that most such discussions center on detail. The more interesting question is which fundamental mistake is the largest. We should take into account the context of the times that spawned C, the problems it was trying to solve, and the environment in which it was intended to be used. Keep in mind it was developed for a 16-bit machine with extremely limited resources. I'd like to dismiss complaints that it doesn't do garbage collection, functional programming, dynamic typing, or OOP. Those aren't problems C attempted to address, so the lack of them is not a mistake.

What mistake has caused more grief, more bugs, more workarounds, more endless hours consumed, etc., than any other? Many people would say null pointers. I don't agree. C's biggest mistake is:

Conflating pointers with arrays.

I don't mean that they share the same syntax, or the implicit conversion of arrays to pointers. I mean the inability to pass an array to a function as an array, even if the parameter is declared to be an array. C will silently convert the array to a pointer, and will rewrite the function declaration so it is semantically a pointer:

void foo(char a[])

is exactly equivalent to [1]:

void foo(char *a)
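The equivalence can be checked with a small sketch. The function name count_until_nul is made up for illustration. Both declarations below are accepted as redeclarations of the same function, and the parameter can be incremented, which would not be legal for a real array:

```c
#include <stddef.h>

/* Both spellings declare the same function type, so the compiler
   accepts the second line as a redeclaration of the first. */
size_t count_until_nul(char a[]);
size_t count_until_nul(char *a);

/* Hypothetical example function: counts chars before the 0 terminator. */
size_t count_until_nul(char a[])
{
    size_t n = 0;
    while (*a != '\0') {
        a++;    /* legal only because 'a' is a pointer variable, not an array */
        n++;
    }
    return n;
}
```

In the caller, sizeof applied to a char buf[16] yields 16. Inside the function there is nothing left to ask: the dimension did not come along.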

This seemingly innocuous convenience feature is the root of endless evil. It means that once arrays leave the scope in which they are defined, they become pointers, and lose the information which gives the extent of the array - the array dimension. What are the consequences of losing this information?

An alternative must be used. For strings, it's the whole reason for the 0 terminator. For other arrays, the dimension is inferred programmatically from the context. Naturally, every situation is different, and so an endless array (!) of bugs ensues.
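The two usual workarounds look roughly like this. It is only a sketch, and the names ARRAY_LEN, length_by_sentinel and sum_with_length are invented for the example:

```c
#include <stddef.h>

/* Works only where 'arr' is still a real array, i.e. in the scope
   that defines it. Applied to a pointer it silently gives nonsense. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Workaround 1: a sentinel value marks the end (how C strings work). */
size_t length_by_sentinel(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Workaround 2: the caller passes the dimension separately and is
   responsible for keeping it in sync with the actual array. */
long sum_with_length(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

A call such as sum_with_length(v, ARRAY_LEN(v)) is correct only while v is a true array at the call site. Nothing ties the two arguments together, so a wrong n compiles without complaint.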

The trainwreck just unfolds in slow motion from there.

The galaxy of C string functions, from the unsafe strcpy() to sprintf() onwards, is a direct result. There are various attempts at fixing this, such as the Safe C Library. Then there are all the buffer overflows, because functions handed a pointer have no idea what the limits are, and no array bounds checking is possible.
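To make the point concrete: strcpy(dst, src) has no way to learn how large dst is, so any check has to be bolted on by hand. The helper below is hypothetical (copy_checked is not a standard or Safe C Library function). It shows the kind of extra capacity parameter such fixes tend to require:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, including
   the 0 terminator. Returns 0 on success, -1 if dst is too small.
   The check is only as good as the dst_cap the caller supplies. */
int copy_checked(char *dst, size_t dst_cap, const char *src)
{
    size_t need = strlen(src) + 1;   /* bytes required, with terminator */
    if (need > dst_cap)
        return -1;
    memcpy(dst, src, need);
    return 0;
}
```

The capacity still travels separately from the pointer, so the function cannot verify it.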

This problem was inherited in toto by C++, which consequently spawned 10+ years of attempts to create a usable string class. Even the eventual std::string result is compromised by its need to be compatible with C 0-terminated strings. C++ addressed the more general array problem by inventing std::vector<T>, and many programming guidelines eschew using T[] style arrays. But the legacy of C arrays continues in C++ with the unsafe iterator design.

The C99 standard attempted to fix this problem, but the fatal error it made was still not combining the array dimension with the array pointer into one type.

The Fix

But all isn't lost. C can still be fixed. All it needs is a little new syntax:

void foo(char a[..])

meaning an array is passed as a so-called "fat pointer", i.e. a pair consisting of a pointer to the start of the array, and a size_t of the array dimension. Of course, this won't fix any existing code, but it will enable new code to be written correctly and robustly. Over time, the syntax:
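The [..] syntax is a proposal, not something current C compilers accept, but the fat pointer itself can be approximated in standard C with a struct. A rough sketch, assuming C99 and using invented names (char_slice, SLICE_OF, slice_set, slice_fill):

```c
#include <stddef.h>

/* Hypothetical stand-in for the proposed 'char a[..]':
   a pointer to the first element plus the array dimension. */
typedef struct {
    char  *ptr;
    size_t dim;
} char_slice;

/* Builds a slice from a real array; must be used where 'arr'
   is still an array, so that sizeof sees its full extent. */
#define SLICE_OF(arr) ((char_slice){ (arr), sizeof(arr) / sizeof((arr)[0]) })

/* Bounds-checked store: returns 0 on success, -1 if i is out of range. */
int slice_set(char_slice s, size_t i, char c)
{
    if (i >= s.dim)
        return -1;
    s.ptr[i] = c;
    return 0;
}

/* The callee can walk the whole array without a separate length argument. */
void slice_fill(char_slice s, char c)
{
    for (size_t i = 0; i < s.dim; i++)
        s.ptr[i] = c;
}
```

Unlike this emulation, the proposed syntax would have the compiler build the pair at the call site, so the pointer and the dimension could not drift apart through a programmer's slip.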

void foo(char a[])

can be deprecated by convention and by compilers. Even better, transitioning to the new way can be done by making the declarations binary compatible with older code.

This change isn't going to transform C into a modern language with all the shiny bells and whistles. It'll still be C, in spirit as well as practice. It will just relieve C programmers of dealing with one particular constant, pernicious source of bugs.

References

[1] The relevant text from K+R's The C Programming Language 5.3 is "When an array name is passed to a function, what is passed is the location of the beginning of the array. Within the called function, this argument is a variable, just like any other variable, and so an array name argument is truly a pointer, that is, a variable containing an address."
