I am currently taking a course on Compilers. I don't like the idea of blindly memorising facts without anywhere to apply them. I want to learn by doing.

So, I would like to have the complete code of 3-4 compilers, possibly for languages with different syntax rules (Python, C, C++, Java), to study while working through the Dragon book.

If complete compilers are too much of an ask, examples of parsers (well-written LL, LR, LALR parsers) and intermediate-code generators for these languages would also do.

There is a LOT of code out there on the Internet regarding this, but I want something that is considered to be high-quality and standard. I would be grateful for any resources that you can refer me to in this matter. Thanks.

Complete compilers are not too much of an ask. High quality is easy to find. Almost all compilers for the languages you've listed are high quality. You'll have a hard time finding a bad compiler for Java, C, C++ or Python.
– S.Lott, Jan 27 '11 at 16:22

I pointed out in the question that there is a LOT of code. How do I differentiate the good from the bad?
– Arjun J Rao, Jan 27 '11 at 16:25

@Arjun J Rao: You will have a hard time finding a bad compiler. Who would post a broken non-working compiler? Just look at the number of downloads and use the most popular download from SourceForge. It's really easy.
– S.Lott, Jan 27 '11 at 16:26

@S.Lott, almost all industrial-strength compiler sources are of very low quality. They're fast and robust, at the price of using hackish handwritten parsers. Students should not be exposed to this kind of thing.
– SK-logic, Jan 27 '11 at 16:28

@SK-logic. Can you provide links or references to support that assertion? More importantly, can you provide an answer to the question, rather than a broad opinion?
– S.Lott, Jan 27 '11 at 16:32

C/C++: GCC (old and crusty codebase, but extremely popular), or clang (newer, modular, getting close to production quality, backed by Apple among others). There's also TCC - the Tiny C Compiler, which would probably be good for learning.

Java: If you just care about the bytecode compiler, look at, e.g., Jikes. If you want the JIT and whatnot as well, OpenJDK is for you.

That said, real compilers can be quite complex; building a toy compiler may be easier to understand. Of this group, TCC would likely be the best starting point, as it's small enough to understand easily.

Your course on compilers should be giving you the pieces that will eventually lead to a full-blown compiler.

For example, the section on lexical analysis can lead to a component called the Lexer. If you keep an eye open to generics and re-usability, you can turn this into a component that can be used later in your compiler.

I highly suggest you take the approach of having at least two components in every homework project: a main component and a library component. In the example of lexical analysis, the main component would handle input and testing. The library component would be the lexer. This technique will help greatly after you graduate and develop huge applications in the real world.
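To make the main/library split concrete, here is a minimal sketch in Python. The token set (NUMBER, IDENT, OP) and the names Token, tokenize and main are my own assumptions for illustration, not something your course prescribes; the point is only that tokenize knows nothing about where its input comes from, so it can be reused later by a parser.

```python
import re
from typing import Iterator, NamedTuple


# ---- library component: the lexer ----

class Token(NamedTuple):
    kind: str   # token category, e.g. "NUMBER"
    text: str   # the matched source text
    pos: int    # offset of the token in the input


# Hypothetical token set for a small expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from source, skipping whitespace."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group(), pos)
        pos = match.end()


# ---- main component: input handling and quick testing ----

def main() -> None:
    for token in tokenize("x = 3 * (y + 42)"):
        print(token)


if __name__ == "__main__":
    main()
```

In a real homework project the two halves would live in separate files, so that the next assignment (the parser) can import the lexer without dragging the test driver along.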

I would definitely look into The LLVM Compiler Infrastructure. It is not a compiler by itself, but rather core tools for writing compilers, interpreters and virtual machines. Clang is a C/C++ compiler built on this framework.

Just note that implementing compiler theory directly will yield a very naive compiler. Most compilers extend that theory with many years of advanced research on parsing techniques, optimizations and code generation.

If you can, look into smaller projects, limited to a single architecture (e.g. some RISC machine) and a single language. Once you've progressed through that, look into bigger compiler suites which support multiple languages on the front-end and multiple architectures on the back-end.
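To give a feel for how small "a single language, a single target" can be, here is a toy sketch in Python: a recursive-descent (LL(1)-style) parser for integer arithmetic that emits code for a made-up stack machine. The instruction names (PUSH, ADD, SUB, MUL, DIV) and the function compile_expr are assumptions of mine for illustration, not any real architecture or library. It is exactly the kind of naive, straight-from-the-textbook compiler mentioned above, with no optimization at all.

```python
import re
from typing import List, Optional


def compile_expr(source: str) -> List[str]:
    """Compile an arithmetic expression to hypothetical stack-machine code.

    Grammar (textbook style):
        expr   -> term   (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | '(' expr ')'
    """
    # Naive scanner: characters outside the token set are silently ignored.
    tokens = re.findall(r"\d+|[-+*/()]", source)
    pos = 0
    code: List[str] = []

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> str:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr() -> None:
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            code.append("ADD" if op == "+" else "SUB")

    def term() -> None:
        factor()
        while peek() in ("*", "/"):
            op = advance()
            factor()
            code.append("MUL" if op == "*" else "DIV")

    def factor() -> None:
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        elif tok.isdigit():
            code.append(f"PUSH {advance()}")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return code


if __name__ == "__main__":
    for instruction in compile_expr("(1 + 2) * 3"):
        print(instruction)
```

For "1 + 2 * 3" this yields PUSH 1, PUSH 2, PUSH 3, MUL, ADD: operator precedence falls out of the grammar's structure, which is the sort of thing the Dragon book's parsing chapters explain in depth.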