I wonder about clang/llvm. I want something as far along as possible to try to learn the changes from C++11. I've been compiling it this week on Slackware (the machine won't run OpenBSD yet). My first try, I went with the debug build (debugging info is good, yeah, especially since I took the tip of their source control) and ld wanted 3 GiB. Fine, but I gave up when the install (plus the object files I had left over) was going to take something over 10 GiB. So this morning I started again with the "release" build and it's still building tonight. As my machines go, this one's not that slow (only 5 years old and multicore -- oh shoot, I forgot -j 3, oh well, at least the system's still responsive).

Has anyone ever manually built gcc? Is it the same?

Then clang needs someone else's standard C++ library, right? Plus it requires gcc to bootstrap. I guess these aren't showstoppers, but, I dunno, to me it almost seems partway along the road to building something like GHC. Okay, maybe that's exaggerating.

I occasionally glance at the Bitrig project, which is based on LLVM's clang 3.1; I haven't had the guts to try it though. Their package list has grown to the point that most of Xfce4 and Firefox/Xombrero have compiled for the amd64 port: http://mirror2.us.bitrig.org/pub/bit...0723/packages/

Building gcc is a nightmare... at least if it's done right. You have to build gcc-newver with gcc-oldver, then build gcc-newver with gcc-newver. Not the most efficient use of time, as you can imagine.
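For what it's worth, those stages are exactly what gcc's own `make bootstrap` target automates. A rough sketch (directory names and prefix are hypothetical, and details vary by gcc version):

```shell
# Hypothetical out-of-tree build; paths are illustrative only.
mkdir build && cd build
../gcc-src/configure --prefix=/opt/gcc-new
make bootstrap   # stage 1: the old system compiler builds the new gcc
                 # stage 2: the stage-1 gcc rebuilds gcc from the same source
                 # stage 3: the stage-2 gcc rebuilds it once more; stage-2 and
                 #          stage-3 objects are then compared, and a mismatch
                 #          means the compiler miscompiled itself
make install
```

The stage-2/stage-3 comparison is the payoff: it's a self-consistency check you only get by paying for the extra rebuilds.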

Yeah, I guess I was missing the obvious fact that an existing gcc binary (or other C++ compiler) is also a prerequisite for building gcc itself. And then there must be a window of versions you'd have to use, to get whatever ISO C++98 features they use now (bound to expand over time -- perhaps eventually to C++11). Now that my clang build's finished, I'm thinking it wasn't so bad at all. The dependencies were very minimal -- fewer than gcc's, I think, looking here: http://gcc.gnu.org/install/prerequisites.html (except gcc's dependencies are clang's dependencies transitively, in a way).

So if you're starting from the beginning on a new platform that's not supported yet, how do you bootstrap the compiler? You have to cross-compile, I guess. It would be interesting if someone somehow kept track of which compilers are the parents of which binaries, all the way back. Do our present compilers have as their ultimate parent the original pcc from the 70s, or Ken Thompson's original compiler written in assembly? Or is there a more recent C compiler written in assembler (which is written in...?) that's the root of what we have now? Or could it be multirooted, if pcc's yacc was based on one thing and the compiler processing yacc's output on something else? I guess more likely it would all be rooted at Ken Thompson, maybe his B compiler. Maybe he's having a laugh on us all and played the trick he wrote about in that paper.

I think "not maintained for general use" is the key thing here. The developers hammer it into place just enough to make a new arch "native build" capable, and then throw away the cross compiler. Not really intended for general use.

Edit - and as long as the compiler is designed in a semi-sane fashion (take the source in, say, C, and emit the equivalent assembly, then assemble to machine code), the compiler doesn't need to be rewritten in assembler; it just needs a new target arch written into it so it can output correct assembler for that architecture. (But I'll grant that this observation is the 10,000-foot view of the problem, and just because a compiler outputs proper assembly for an unsupported arch when run on a supported arch doesn't mean the compiler can cross-compile itself to become "native build" capable.)

I'm not making myself clear. I'm over the idea that moving to clang is difficult because of bootstrapping or dependencies (although I now have something at home that fails to find std::cout when it links a simple C++ hello world -- never mind, I'll do the research myself). Over the course of my post I transition to wondering how compilers get bootstrapped to realizing they must be generated by other compilers, not necessarily running on the same platform, to wondering about what the ultimate root of all these compilers is.

Assembler code is kind of a red herring here. When thinking of an ultimate root I should have either said hand entered machine code or else said manually written assembly code and (arbitrarily?) decided to leave out the assembler in the compiler family tree.

So the ultimate root, the point up the tree where a human directly created the machine code rather than that code (via C, assembler, or some other intermediate format) being output by another program -- I leave the text editor out as a generating program here -- is it Ken Thompson's B compiler, something further back, or something later? Wouldn't it be interesting (for the person looking at the chart, not the person creating it) to have a graph mapping compiler versions and platforms all the way back through each compiler build? For instance, when BSD first started using gcc, did they compile it with pcc? When Bell switched from Ken Thompson's compiler to pcc, did they compile the latter with the former, along with yacc, which was also compiled with the former? Or are there some other oddities in the tree, e.g. a VAX VMS system compiler in the ancestry?

It's interesting to think (if it's true) that all our C and C++ compilers were ultimately generated, across many levels of indirection as each succeeding compiler was built, by Ken Thompson. Kind of like finding out your family descends from the Mayflower -- if you're an American, that is. Or maybe a more universally relatable analogy would be Carl Sagan pointing out that our bodies' molecules came from supernovae.

This is the paper I referred to: http://cm.bell-labs.com/who/ken/trust.html -- from this you wonder whether the valiant OpenBSD devs handling compiler issues ever have to find bugs in the compiler that generated the compiler they're seeing problems with. You could imagine the new compiler being bug-free (okay, not bug-free, but not suffering from the bug at issue) at the source level, yet having a bug in its binary caused by the old compiler. I can't even imagine how hard that would be to figure out.

I've read that paper. David Wheeler's response is a nice followup (albeit many moons later =)

"Simply recompile the source code twice: once with a second (trusted) compiler, and again using the result of the first compilation."
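That check can be sketched with a toy. Everything below is made up for illustration: the "compiler" is just a shell script that "compiles" by copying its input, so builds are deterministic and outputs can be compared with cmp(1). The real procedure (Wheeler's diverse double-compiling) does the same comparison on actual compiler binaries, which additionally requires reproducible builds.

```shell
# Toy "compiler": a script that turns source into a "binary" by copying it.
cat > compiler.src <<'EOF'
#!/bin/sh
cp "$1" "$2"
EOF
chmod +x compiler.src

# The suspect binary we were shipped (here honestly built from that source).
cp compiler.src suspect && chmod +x suspect

# Step 1: compile the suspect's source with a second, trusted compiler
# (played here by cp itself).
cp compiler.src stage1 && chmod +x stage1

# Step 2: recompile the same source with the stage-1 result.
./stage1 compiler.src stage2

# If suspect carried a self-reproducing trojan, stage2 (built through the
# trusted path) would differ from what suspect emits from the same source.
./suspect compiler.src via_suspect
cmp stage2 via_suspect && echo "bit-identical"
```

The point of the diversity is that a trojan would have to recognize and subvert both toolchains the same way for the comparison to come out clean.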

Where does this trusted compiler come from? If Ken Thompson's at the root of all C compilers, what if he is secretly an evil genius? (Or, slightly less paranoid: what about the generations between your so-called trusted compiler and his, supposing we all trust him?)