Wednesday, May 1, 2013

GCC 4.8 was recently released. This is
the first GCC release that is written in C++ instead of C. Which got
me thinking ...

Would this make sense for PostgreSQL?

I think it's worth a closer look.

Much of GCC's job isn't actually that much different from PostgreSQL's.
It parses language input, optimizes it, and produces some output. It
doesn't have a storage layer; it just produces code that someone else
runs. Also note that Clang and LLVM are written in C++. I think it
would be fair to say that these folks are pretty well informed about
selecting a programming language for their job.

It has become apparent to me that C is approaching a dead end.
Microsoft isn't updating their compiler to C99 and advises people to
move to C++ instead. So as long as PostgreSQL (or any other project,
for that matter) wants to support that compiler, they will be stuck on
C89 forever. That's a long time. We have been carefully introducing
the odd post-C89 feature, guarded by configure checks and #ifdefs,
but that will either come to an end, or the range of compilers that
actually get the full benefit of the code will become narrower and
narrower.

C++ on the other hand is still a vibrant language. New standards come
out and get adopted by compiler writers. You know how some
people require Java 7 or Python 2.7 or Ruby 1.9 for their code? You wish you
could have that sort of problem for your C code! With C++ you
reasonably might.

I'm also sensing that at this point there are more C++ programmers
than C programmers in the world. So using C++ might help grow the
project better. (Under the same theory that supporting Windows
natively would attract hordes of Windows programmers to the project,
which probably did not happen.)

Moving to C++ wouldn't mean that you'd have to rewrite all your code
as classes or that you'd have to enter template hell. You could
initially treat a C++ compiler as a pickier C compiler, and introduce
new language features one by one, as you have done before.

Most things that C++ is picky about are things that a C programmer
might appreciate anyway. For example, it refuses implicit conversions
from void pointers to other pointer types, or intermixing different
enums. Actually, if you review various design discussions about the
behavior of SQL-level types, functions, and type casts in PostgreSQL,
PostgreSQL users and developers generally lean toward a strict
type system. C++ appears to be much more in line with that thinking.
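
To make the pickiness concrete, here is a small sketch of the two cases
mentioned above. The names (Color, Shape, make_counter, release_counter,
shape_for) are made up for illustration and have nothing to do with the
PostgreSQL source. The lines that C accepts but C++ rejects appear only
in comments, so the snippet itself should compile as C++.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical enums, for illustration only.
enum Color { RED, GREEN };
enum Shape { CIRCLE, SQUARE };

// Allocates an int the way C code typically would.
int *make_counter(int start)
{
    // In C, "int *p = malloc(sizeof(int));" compiles as is. A C++
    // compiler rejects the implicit void* -> int* conversion, so the
    // cast has to be written out.
    int *p = static_cast<int *>(std::malloc(sizeof(int)));
    if (p != nullptr)
        *p = start;
    return p;
}

// The opposite direction (int* -> void*) is still implicit in C++.
void release_counter(int *p)
{
    void *raw = p;
    std::free(raw);
}

// Maps one enum to the other explicitly.
Shape shape_for(Color c)
{
    assert(c == RED || c == GREEN);
    // "Shape s = c;" is valid C but an error in C++; mixing the two
    // enums has to be spelled out.
    return (c == RED) ? CIRCLE : SQUARE;
}
```

In both cases the C++ compiler only asks for something to be stated
explicitly that C would have let pass silently.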

There are also a number of obvious areas where having the richer
language and the richer standard library of C++ would simplify coding,
reduce repetition, and avoid bugs: memory and string handling;
container types such as lists and hash tables; fewer macros necessary;
the node management in the backend screams class hierarchy; things
like xlog numbers could be types with operators; careful use of
function overloading could simplify some complicated internal APIs.
There are more. Everyone probably has their own pet peeve here.
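
As a rough sketch of the "types with operators" idea, here is what an
xlog position might look like as its own type. XLogPos and its layout (a
single 64-bit value) are assumptions made for this example, not
PostgreSQL's actual definition.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around a WAL position; not PostgreSQL's real type.
struct XLogPos
{
    std::uint64_t value;
};

inline bool operator==(XLogPos a, XLogPos b) { return a.value == b.value; }
inline bool operator<(XLogPos a, XLogPos b) { return a.value < b.value; }
inline bool operator<=(XLogPos a, XLogPos b) { return a.value <= b.value; }

// Advancing a position by a byte count yields another position.
inline XLogPos operator+(XLogPos p, std::uint64_t nbytes)
{
    return XLogPos{p.value + nbytes};
}

// The distance between two positions is a plain byte count.
// The caller must pass the later position first.
inline std::uint64_t operator-(XLogPos a, XLogPos b)
{
    assert(b <= a);
    return a.value - b.value;
}
```

With something like this, comparisons and arithmetic read like ordinary
expressions, while adding two positions together or passing a bare
integer where a position is expected simply doesn't compile, because no
such operator or conversion is defined.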

I was looking for evidence of this C++ conversion in the GCC source
code, and it's not straightforward to find. As a random example,
consider
gimple.c.
It looks like a normal C source file at first glance. It is named
.c after all. But it actually uses C++ features (exercise for the
reader to find them), and the build process compiles it using a C++
compiler.