Friday, August 4, 2017

Writing a GCC back end

It is surprisingly easy to design a CPU (see for example Colin Riley’s blog series), and I was recently asked how hard it is to write a GCC back end for a new architecture. That too is easy, provided you have done it once before. But the first time is quite painful...

Over the coming weeks I plan to write some blog posts that will try to ease the pain. They will show what is involved in creating a “working” back end that is capable of compiling simple functions, give some pointers on how to proceed to make it production-ready, and in general provide the overview I would have liked before I started developing my back end. (GCC has a good reference manual, “GNU Compiler Collection Internals”, describing everything you need to know, but it is a bit overwhelming when you start...)

The series will cover the following (I’ll update the list with links to the posts as they become available):

GPL vs BSD could be a reason. I know most people don't agree, but that's a long debate. Quoting Stallman: 'For GCC to be replaced by another technically superior compiler that defended freedom equally well would cause me some personal regret, but I would rejoice for the community's advance. The existence of LLVM is a terrible setback for our community precisely because it is not copylefted and can be used as the basis for nonfree compilers — so that all contribution to LLVM directly helps proprietary software as much as it helps us.'

I write about GCC because I think there is too much focus on LLVM... Both GCC and LLVM are very capable compilers with different strengths and weaknesses: LLVM is the best choice for some use cases and GCC for others.

For example, GCC's back-end support is very flexible, and it is much easier to add “strange” architectures to GCC than to LLVM. I know that Embecosm has tried to improve the situation for LLVM (see e.g. their work with the AAP architecture), but my understanding is that there is still much work left to do in LLVM.

I am really looking forward to this series of articles. I like making instruction sets and implementing them. I tried to make a GCC back end for one of those instruction sets earlier this year, but I never made it past assignment statements (without conditionals).

I have a dream of being able to compile C for the Soviet mainframe BESM-6 (a 48-bit word-oriented architecture with 6 chars/bytes per word), but several people who have attempted to write a GCC back end for it got cold feet. I wonder whether it can be done at all when the number of chars per word is not a power of two, integers have reserved bit ranges, etc.