Wednesday, May 1, 2013

GCC 4.8 was recently released. This is
the first GCC release that is written in C++ instead of C. That got
me thinking ...

Would this make sense for PostgreSQL?

I think it's worth a closer look.

Much of GCC's job isn't actually that different from PostgreSQL's.
It parses language input, optimizes it, and produces some output. It
doesn't have a storage layer; it just produces code that someone else
runs. Also note that Clang and LLVM are written in C++. I think it
would be fair to say that these folks are pretty well informed about
selecting a programming language for their job.

It has become apparent to me that C is approaching a dead end.
Microsoft isn't updating their compiler to C99, advising people to
move to C++ instead. So as long as PostgreSQL (or any other project,
for that matter) wants to support that compiler, they will be stuck on
C89 forever. That's a long time. We have been carefully introducing
the odd post-C89 feature, guarded by configure checks and #ifdefs,
but that will either come to an end, or the range of compilers that
actually get the full benefit of the code will become narrower and
narrower.

C++ on the other hand is still a vibrant language. New standards come
out and get adopted by compiler writers. You know how some
people require Java 7 or Python 2.7 or Ruby 1.9 for their code? You wish you
could have that sort of problem for your C code! With C++ you
reasonably might.

I'm also sensing that at this point there are more C++ programmers
than C programmers in the world. So using C++ might help the project
grow. (Under the same theory that supporting Windows natively would
attract hordes of Windows programmers to the project, which probably
did not happen.)

Moving to C++ wouldn't mean that you'd have to rewrite all your code
as classes or that you'd have to enter template hell. You could
initially treat a C++ compiler as a pickier C compiler, and introduce
new language features one by one, as has been done with post-C89
features so far.

Most things that C++ is picky about are things that a C programmer
might appreciate anyway. For example, it refuses implicit conversions
between void pointers and other pointers, or intermixing different
enums. Actually, if you review various design discussions about the
behavior of SQL-level types, functions, and type casts in PostgreSQL,
PostgreSQL users and developers generally lean toward a strict
type system. C++ appears to be much more in line with that thinking.
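To make the "pickier C compiler" point concrete, here is a small sketch. The two enums and make_int_array() are made-up names, not PostgreSQL code; the static_asserts simply ask a C++11 (or later) compiler which implicit conversions it permits, so the file only compiles if the claims in the paragraph above hold.

```cpp
#include <cassert>
#include <cstdlib>
#include <type_traits>

// Two unrelated enums (hypothetical names, for illustration only).
enum LockLevel { LevelNone, LevelShare, LevelExclusive };
enum Direction { DirBackward = -1, DirNone = 0, DirForward = 1 };

// C accepts "int *p = malloc(...)"; C++ rejects the implicit void* -> int*.
static_assert(!std::is_convertible<void *, int *>::value,
              "no implicit void* to int* conversion in C++");
// The opposite direction (int* -> void*) stays implicit in both languages.
static_assert(std::is_convertible<int *, void *>::value,
              "int* to void* is still implicit");
// A value of one enum type cannot silently become another enum type ...
static_assert(!std::is_convertible<LockLevel, Direction>::value,
              "no implicit conversion between different enums");
// ... and a plain int cannot silently become an enum either.
static_assert(!std::is_convertible<int, LockLevel>::value,
              "no implicit int to enum conversion");
// An unscoped enum still converts to int, as in C.
static_assert(std::is_convertible<LockLevel, int>::value,
              "unscoped enum to int remains implicit");

// Allocation written so that it compiles as C++: the cast is mandatory here.
int *make_int_array(std::size_t n)
{
    return static_cast<int *>(std::malloc(n * sizeof(int)));
}
```

Most of the fixes needed when compiling existing C as C++ look like make_int_array(): an explicit cast where C relied on an implicit one.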

There are also a number of obvious areas where having the richer
language and the richer standard library of C++ would simplify coding,
reduce repetition, and avoid bugs: memory and string handling;
container types such as lists and hash tables; fewer macros necessary;
the node management in the backend screams class hierarchy; things
like xlog numbers could be types with operators; careful use of
function overloading could simplify some complicated internal APIs.
There are more. Everyone probably has their own pet peeve here.
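As an illustration of "xlog numbers could be types with operators", here is a minimal sketch assuming a 64-bit WAL position. LogPosition and its members are hypothetical names, not PostgreSQL's XLogRecPtr; the idea is that a position and a byte distance become different types, and the text formatting lives in one place. The output mimics the familiar high/low hex notation (for example 16/B374D848).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical strongly typed WAL position (not PostgreSQL's XLogRecPtr).
class LogPosition {
public:
    constexpr explicit LogPosition(std::uint64_t v = 0) : value_(v) {}

    constexpr std::uint64_t raw() const { return value_; }

    // Advancing a position by a byte count yields another position.
    LogPosition operator+(std::uint64_t bytes) const { return LogPosition(value_ + bytes); }

    // The distance between two positions is a byte count, not a position.
    // Assumes other <= *this.
    std::uint64_t operator-(LogPosition other) const { return value_ - other.value_; }

    bool operator<(LogPosition o) const { return value_ < o.value_; }
    bool operator==(LogPosition o) const { return value_ == o.value_; }

    // High 32 bits and low 32 bits in hex, separated by a slash.
    std::string str() const
    {
        char buf[32];
        std::snprintf(buf, sizeof buf, "%X/%X",
                      static_cast<unsigned>(value_ >> 32),
                      static_cast<unsigned>(value_ & 0xFFFFFFFFu));
        return buf;
    }

private:
    std::uint64_t value_;
};
```

With this, adding two positions together simply does not compile, since only position + byte count is defined, and the explicit constructor keeps raw integers from sneaking in unnoticed.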

I was looking for evidence of this C++ conversion in the GCC source
code, and it's not straightforward to find. As a random example,
consider gimple.c. It looks like a normal C source file at first
glance. It is named .c after all. But it actually uses C++ features
(exercise for the reader to find them), and the build process compiles
it using a C++ compiler.

Postgres is a good database, but it was designed 25 years ago, and a lot of its internal components have aged. I see no point in spending time and energy on a port to C++ without significant refactoring and redesign. Many internal patterns (lists, nodes, the executor, error handling) would probably be implemented differently in C++. But deep changes bring deep bugs and issues; everybody knows the problems that came with KDE4 or GNOME3.
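The remark that nodes would be implemented differently in C++ can be sketched with a toy expression tree. Node, Const, Plus, and is_a are hypothetical names, loosely modeled on the backend's tagged structs and its IsA() macro; here a virtual function stands in for a switch over the node tag.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical sketch: expression nodes as a class hierarchy instead of
// C structs that share a leading tag field and are dispatched via switch.
struct Node {
    virtual ~Node() = default;
    // Each node type evaluates itself (virtual dispatch, no tag switch).
    virtual long eval() const = 0;
};

struct Const : Node {
    explicit Const(long v) : value(v) {}
    long eval() const override { return value; }
    long value;
};

struct Plus : Node {
    Plus(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : left(std::move(l)), right(std::move(r)) {}
    long eval() const override { return left->eval() + right->eval(); }
    std::unique_ptr<Node> left, right;
};

// Type test in the spirit of IsA(), but checked by the compiler.
template <typename T>
bool is_a(const Node *n)
{
    return dynamic_cast<const T *>(n) != nullptr;
}
```

Whether virtual dispatch and RTTI would be acceptable in the backend is exactly the kind of redesign question the comment raises; the sketch only shows the shape.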

Being neither a C++ programmer nor a C one, I find C much easier to read, and it has certainly given me a lot less hassle when compiling.

The biggest problems I've had are with integrating C++ code dependencies (e.g., GEOS for PostGIS, and the V8 engine for PL/V8). That's where all my crashing happens. I'm not sure whether your proposal makes things better or worse. I suspect worse, just because I get the (possibly misguided) feeling that C++ is more sensitive to things like which GCC version each part is compiled with, and to ABI compatibility (I still have no idea what that means).

I guess I'm just saying that PostgreSQL and other libraries (e.g., PostGIS) have their own dependencies, much of which is not under their control and not even guaranteed to be compiled by the same group. So a thorough review of those is necessary before jumping onto a hot plate. Even as a Windows user/developer, I take anything that Microsoft is doing or saying with a grain of salt, simply because I know that what they say is often how they would like things to be, and they will be driven by inertia in whatever direction the inertia is swinging. I'd rather predict the direction of the wind than pay attention to what they are doing.

And the jury is still out on it 2 years later. That said, it seems to be gaining some traction as the language of choice for personal projects, if GitHub is any indication. Someone even implemented a SNES emulator in it...

IMO, PostgreSQL could use a lot of C++. But I'm afraid it would require pretty much a rewrite, and I don't think some of the top developers would be happy to even consider it. Don't get me wrong, the PostgreSQL code is very nice, but as far as code goes, it has long functions, global variables, and all sorts of other ugly constructs that could use a rewrite in C++.

Using a C++ compiler as a better (supported) C compiler should not bring too many problems and would ensure continued support on many platforms. One concern could be embedded platforms, but perhaps even those have better C++ support than they did in the past.

And taking the pragmatic route like GCC has done (switch over, but don't rewrite the whole system to make it fit, if ever) should not change that much. And it gives you the option to start using some features in the future if they are deemed beneficial by the core developers.

The GCC migration is, I think, the RIGHT way to do it for a mature project like PostgreSQL. Don't let the scope turn into a situation where you ALSO need to rewrite everything; just start, to the degree reasonable, taking advantage of the features when available.

Someone did implement a Postgres-inspired RDBMS in C++. It is called electrondb. I was an advisor to them for a couple of years. They are kind of in stasis, awaiting more VC funding; I'm trying to convince them to go open source. But on a technical note, a couple of guys wrote about three quarters of a million lines of code in about 4 to 5 years. As a C guy (ex-Sybase architect), I found reviewing this code a bit counterintuitive, but it is not really that difficult once you get used to it. Template syntax is a pain, though. Not sure about performance; we were starting to do TPC runs when things went into a hiatus.

So, to cut a long story short, it can be done, and the end result is acceptable. But how maintainable the code is, how portable it is, and where the compiler and standard library trip you up can only be determined if this code sees production. Drizzle should give sufficient pointers on this, though, since it is open source (albeit not Postgres-inspired).

On a related note, I develop Linux kernel code and an L4 microkernel. One is in C and the other is in C++. I need C++ for some of our security research design paradigms, so I have no choice there. The C++ toolchain is a pain. But microkernels are less than 20k lines of code, so we can live with it. I'm not sure the code could take contributions at the scale of Linux. Unless the competence of the average (not the gurus!) driver writer improves, C++ is best avoided.

C++ does not suit every application. For example, the kernel cannot do without C, and some assembly code is inevitable too. So C++ is NOT a drop-in replacement for C; it's just another programming language. Whether it helps the project in the long run depends on an analysis of which C++ features are needed versus which C features would be lost, and whether it's all worth it.

Here is some good material from Herb Sutter on why C++, for those who might be interested: http://channel9.msdn.com/posts/C-and-Beyond-2011-Herb-Sutter-Why-C

Everything C can do, C++ can do too. With C++ it is easier to write type-safe code, and C++ supports cleaner code in several significant cases. It never requires "uglier" code like those ad hoc macros in Linux.

If you look at the way the Linux kernel uses macros combined with GCC extensions like typeof(x), it is obvious that they are actually writing templates. And many of their struct definitions reproduce inheritance and virtual method calls.
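A smaller example shows why typeof()-style macros are really templates in disguise. NAIVE_MAX and max_of are hypothetical names, and this is not the kernel's actual macro; it only reproduces the double-evaluation problem that typeof()-based statement-expression macros are written to avoid, next to a template that avoids it by construction.

```cpp
#include <cassert>

// The classic C macro evaluates the winning argument twice.
#define NAIVE_MAX(a, b) ((a) > (b) ? (a) : (b))

// Template version: each argument is evaluated exactly once, and both
// arguments must deduce the same type T (max_of(1, 2L) will not compile),
// which is the kind of checking typeof()-based macros emulate by hand.
template <typename T>
constexpr T max_of(T a, T b)
{
    return (a > b) ? a : b;
}
```

Passing an expression with a side effect, such as a counter increment, to NAIVE_MAX runs it twice when it wins the comparison, while max_of runs it once.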

In 2006-2008, at Dataupia, Postgres was already ported to C++ AND converted from a multi-process architecture to a multi-threaded single process. It was closed-source (part of a high-performance parallel database grid appliance). I left Dataupia in 2008, and I don't know what happened to that code base in the last 3-4 years. Speaking of that conversion: it was a HUGE effort, and the people involved had previously invented bitmap indexes and written major commercial database engines. Knowing the amount of work needed for this, I would strongly advise against this initiative.

Well, changing PostgreSQL from multiprocess to multithread would likely be a 3-4 year effort as well, but no one is proposing that. I am in fact arguing for not changing the architecture at all, just upgrading the language slightly.

I support using a C++ compiler to compile the C-based PostgreSQL code. There is no need to change the architecture or add any class or template features. I have never read the PostgreSQL source code, but it is entirely possible to link C code with C++ objects and intermix C and C++ in the same program. I would suggest:
a) Try to compile the code base as C++, one file at a time, testing and fixing along the way.
b) Consider using C++ features where they bring ease of use or performance.
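Mixing C and C++ in one program mostly comes down to linkage. The sketch below assumes a hypothetical helper, add_checked(), declared in a header that both languages can include; the extern "C" guard keeps the symbol name unmangled, so objects built by a C compiler can still link against the C++-compiled definition. Both "files" are shown in one block so that it compiles on its own.

```cpp
#include <cassert>
#include <climits>

/* ---- hypothetical header util.h, includable from C and from C++ ---- */
#ifdef __cplusplus
extern "C" {
#endif
int add_checked(int a, int b, int *result);
#ifdef __cplusplus
}
#endif

/* ---- util.cpp, compiled with a C++ compiler ---- */
// The earlier declaration gave this function C linkage, so the definition
// keeps the plain C symbol name. Returns 1 on success, 0 on overflow.
int add_checked(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;   // the sum would overflow; leave *result untouched
    *result = a + b;
    return 1;
}
```

The same guard is what lets a header be shared during a file-by-file conversion: files already compiled as C++ and files still compiled as C see one consistent, unmangled interface.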

My experience at a previous company:
a) They did that, and it worked for C programs.
b) Programs that use C++ templates take a very long time to compile.
c) Programs that use deep and multiple inheritance had memory size and performance problems, and we could not easily debug them line by line (this was 9 years ago, when memory was limited and expensive on Sun systems). It was worse when an array of C++ objects was created, as each object needs to call its constructor and its parent constructors.
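The constructor cost can be observed directly. In this sketch, Base, Derived, Plain, and the counters are made up for illustration: each element of an array of Derived runs Base() and then Derived(), while a C-style struct with no user-written constructor stays trivial, so compiling existing C structs as C++ does not by itself add this cost.

```cpp
#include <cassert>
#include <type_traits>

// Counters to observe how many constructors run.
static int base_ctor_calls = 0;
static int derived_ctor_calls = 0;

struct Base {
    Base() { ++base_ctor_calls; }
};

struct Derived : Base {
    // Base() runs implicitly before this constructor body.
    Derived() { ++derived_ctor_calls; }
};

// A C-style struct with no user-written constructor stays trivial:
// default-constructing an array of it runs no constructor code.
struct Plain { int x; double y; };
static_assert(std::is_trivially_default_constructible<Plain>::value,
              "plain structs cost nothing to default-construct");
```

Declaring Derived arr[3] therefore performs six constructor calls, three for Base and three for Derived.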

I've done a lot of work in C++ over the years, including porting a pretty large program from Fortran 77 to C++. And as a newbie to the PG code base, there are a few things that strike me as being cleaner in C++. For example, if linked-list elements could inherit from {d|s}list_node, rather than embed it as a member, then the list-handling code could be functions rather than macros if one wanted. (I'm not a fan of macros except where necessary.)
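Here is roughly what that could look like. DListNode, DListHead, Item, and the helper templates are hypothetical names. PostgreSQL's real ilist.h uses an embedded dlist_node plus an essentially offsetof-based dlist_container macro; this sketch replaces that with inheritance and a static_cast that the compiler checks.

```cpp
#include <cassert>

// Hypothetical intrusive doubly linked list node (not ilist.h's API).
struct DListNode {
    DListNode *prev = this;
    DListNode *next = this;
};

// Circular list with a sentinel head node.
struct DListHead {
    DListHead() = default;
    // The head points into itself, so copying would leave dangling links.
    DListHead(const DListHead &) = delete;
    DListHead &operator=(const DListHead &) = delete;

    void push_tail(DListNode *node)
    {
        node->prev = head.prev;
        node->next = &head;
        head.prev->next = node;
        head.prev = node;
    }

    bool empty() const { return head.next == &head; }

    DListNode head;
};

// The element type inherits the link fields instead of embedding them.
struct Item : DListNode {
    explicit Item(int v) : value(v) {}
    int value;
};

// Type-checked replacement for a container-of macro.
template <typename T>
T *dlist_container_of(DListNode *node)
{
    return static_cast<T *>(node);
}

// Visit every element in order; an ordinary function, not a macro.
template <typename T, typename F>
void dlist_foreach(DListHead &list, F fn)
{
    for (DListNode *cur = list.head.next; cur != &list.head; cur = cur->next)
        fn(*dlist_container_of<T>(cur));
}
```

A real version would also need removal and an iteration variant that tolerates deleting the current element.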

But I think there's a considerable risk of a slippery slope, which I've run into in the past. So many of C++'s features look like perfect solutions to certain problems (lambda functions, exceptions, templates of all kinds, inheritance, etc.) But it's so easy to *start* using those sensibly, and then wake up two months later in a ditch with a hangover, a new tattoo, an incomprehensible mess of "idiomatic" C++ code, and no idea how you got there.

If PG could be ported to a small, well-disciplined subset of C++ that made life better rather than worse, it could be a win. But coming to a good consensus about what subset that is, and sticking with it, might be a bit harder.

Sorry, let me expand on that: Rust has a Foreign Function Interface. Mozilla says they're going to start adding Rust to their products incrementally - if something needs refactoring, they'll do it in Rust and call that piece of code from the current C++ codebase.

I'm suggesting Rust because it's effectively C++ done right, and unlike Go it has no garbage collection. You keep the performance of C/C++ with a language that is safer than C and much smaller than C++.

Will this proposed change make it easier to write PG extensions in C++? The lack of official examples/tutorials makes it nearly impossible for newcomers to reuse C++ code to extend PostgreSQL. All one can find is half a page of principles and a suggestion to look at rather large projects such as PL/V8 and PostGIS.
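Whatever happens to the backend, the central rule for C++ extension code is that C++ exceptions must not unwind through the server's C frames, because PostgreSQL does its own error handling with setjmp/longjmp (PG_TRY/PG_CATCH), and functions the server calls need C linkage. Below is a minimal sketch of such an exception barrier, assuming a hypothetical entry point my_ext_sum() that reports errors through a return code and a message buffer. It deliberately avoids the real fmgr and ereport() interfaces; in an actual extension, the code at this boundary would raise the PostgreSQL error only after all C++ objects had gone out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Ordinary C++ code that may throw.
static long checked_sum(const std::vector<long> &values)
{
    long total = 0;
    for (long v : values) {
        if (v < 0)
            throw std::invalid_argument("negative value: " + std::to_string(v));
        total += v;
    }
    return total;
}

// Hypothetical C-callable entry point. Returns 0 on success, 1 on error.
// Assumes n >= 0 and errlen >= 1.
extern "C" int my_ext_sum(const long *values, int n, long *out,
                          char *errbuf, std::size_t errlen)
{
    try {
        *out = checked_sum(std::vector<long>(values, values + n));
        return 0;
    } catch (const std::exception &e) {
        // Never let a C++ exception escape into C frames.
        std::strncpy(errbuf, e.what(), errlen - 1);
        errbuf[errlen - 1] = '\0';
        return 1;
    } catch (...) {
        std::strncpy(errbuf, "unknown C++ exception", errlen - 1);
        errbuf[errlen - 1] = '\0';
        return 1;
    }
}
```

The catch (...) clause matters: anything the C++ side can throw has to be converted into a plain C result before control returns to the server.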