During the last LLVM developers' meeting, several Apple engineers gave presentations about their work on different aspects of LLVM. One of them explains the work on "clang", a new C front-end, which will also support C++ and Objective-C. Chris Lattner also discussed the use of LLVM in Leopard's OpenGL. Slides and videos are available. Additionally, LLVM 2.0 was released recently.

Seriously; I can't understand it. I tried to look it up, and all the information I could find out about it was very obfuscated, without giving a clear idea of what it did or where it came from.

(It doesn't help that the video presentations in the site linked were only provided in Quicktime format, either...)

The closest thing to an inkling I got was that it's a sort of cross-platform compiler, something that would compete with GCC, except that it incorporates GCC in it. Is this correct, or was that site far off?

The only platform I have that Quicktime has been ported to is Windows XP, and my experience with Windows-based Quicktime is that it's very slow, a memory-hog, too buggy, full of advertisements for other Apple products (I hear it's got much worse after iTunes and the iProduct revolution), and I don't get the same quality as with other file formats the same size. One of the last crashes I had with Windows was the boot after I installed Quicktime; I uninstalled it in Safe Mode, and my computer booted again.

I'm told it's none of these things on a Mac. Too bad my only Mac is an SE/30...

Think of a compiler as a pipeline. You stick human-readable source code in one end, and machine code comes out the other.

Conventionally, this pipeline is divided up into a number of stages, which are often grouped into three parts, the "front end", "back end", and "middle end" (it's a stupid term, I know). The front end is responsible for lexing and parsing the program text into an abstract syntax tree, and then converting that syntax tree into some lower-level representation for analysis. The back end converts that lower-level representation into machine code.

In between, the code is in a language-agnostic and machine-agnostic format. In other words, the code can be optimized with absolutely no regard to the source language or the target architecture. As a result, any optimization on this middle format can instantly be taken advantage of by any language front-end or any machine back-end.
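To make the three stages concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the tuple-based IR, the names front_end, middle_end and back_end, and the imaginary target assembly. It is not how GCC or LLVM are actually organized internally; it only shows that the middle stage never needs to know what the source language or the target machine was.

```python
import ast

# Hypothetical three-address IR: (opcode, destination, operand_a, operand_b)

def front_end(source):
    """Front end: parse an arithmetic expression and lower it to the toy IR."""
    tree = ast.parse(source, mode="eval")
    code = []
    ops = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}

    def lower(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            left, right = lower(node.left), lower(node.right)
            dest = f"t{len(code)}"          # fresh temporary name
            code.append((ops[type(node.op)], dest, left, right))
            return dest
        raise ValueError("unsupported construct")

    code.append(("ret", None, lower(tree.body), None))
    return code

def middle_end(code):
    """Middle end: constant folding, with no knowledge of source or target."""
    fold = {"add": lambda a, b: a + b,
            "sub": lambda a, b: a - b,
            "mul": lambda a, b: a * b}
    known = {}                               # temporaries with a known value
    out = []
    for op, dest, a, b in code:
        a, b = known.get(a, a), known.get(b, b)
        if op in fold and isinstance(a, int) and isinstance(b, int):
            known[dest] = fold[op](a, b)     # computed at compile time
        else:
            out.append((op, dest, a, b))
    return out

def back_end(code):
    """Back end: print the IR as assembly for an imaginary machine."""
    return [f"ret {a}" if op == "ret" else f"{op} {dest}, {a}, {b}"
            for op, dest, a, b in code]

print(back_end(middle_end(front_end("x * (2 + 3)"))))
```

Swapping in a different front_end (another language) or a different back_end (another machine) would leave middle_end untouched, which is the whole point of the middle format.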

There are a number of useful formats that can be used for this middle stage, but the best we have now is called SSA, for "static single assignment". In SSA, each variable is assigned exactly once. Whenever a source variable is assigned multiple times, each assignment gets its own subscripted version, and any reference to that version of the variable carries the same subscript. This has the effect of making it explicit exactly which expressions depend on which, and so optimizing code in this form is relatively straightforward.
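As a small illustration of the subscripting, here is a sketch that renames straight-line code into SSA form. The statement format (target, opcode, operands) and the function name to_ssa are invented for this example, and it deliberately ignores branches and loops, which need extra machinery (phi functions) to merge versions.

```python
def to_ssa(statements):
    """Rename straight-line assignments so each variable is assigned once.

    Each statement is (target, opcode, operands); operands are variable
    names (strings) or literal values.
    """
    version = {}    # variable name -> latest subscript
    out = []
    for target, opcode, operands in statements:
        # Uses refer to the most recent version of each variable.
        renamed = [f"{v}_{version[v]}" if v in version else v
                   for v in operands]
        # Every assignment creates a new version of its target.
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}_{version[target]}", opcode, renamed))
    return out

program = [
    ("x", "const", [1]),         # x = 1
    ("x", "add", ["x", 2]),      # x = x + 2
    ("y", "mul", ["x", "x"]),    # y = x * x
]
for stmt in to_ssa(program):
    print(stmt)
```

After renaming, it is obvious that y depends on the second version of x and not the first, so an optimizer can see at a glance which definition feeds which use.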

The idea behind LLVM was "lifetime program optimization". Conventional compilers (including GCC) optimize program modules, but go no further. LLVM is made to extend this not only to link-time (which some compilers do already), but to install-time, run-time, and even idle-time. After all, every computer has a slightly different configuration, and why shouldn't programs be able to optimize themselves with this information?

Thus, LLVM is a combination of two things. First, it's an SSA language, which is clean, type-safe, and completely specified, in contrast to GCC's intermediate representations (there are three that I know of). And second, it's an optimizing framework and code generator. All LLVM-based optimizers read in and write out LLVM code; the code generation (which mostly consists of translating instruction names and allocating registers) happens at the very end.

So, in effect, LLVM is a compiler "middle" and back-end. It's the second half of that compiler pipeline; you feed it in intermediate code, and it gives you heavily-optimized machine code.

Of course, if you want to compile C programs, you need to convert your C code into LLVM. That's the job of a front-end. So the LLVM guys took the GCC front-end and rewrote it to generate LLVM code instead of GIMPLE. The combination of that front-end and the LLVM back-end makes a complete compiler.

Unfortunately, while GCC is excellent software, it's showing signs of age. For one, the code base is just a huge mess, which isn't really a surprise when you consider that it's 20 years old. For another, it was written before compiler programmers realized how awesome SSA is; GCC has only used SSA (in the form of GIMPLE) since v4 came out in 2005. And finally, the C/C++ parser converts code into the language-agnostic GENERIC form, which is kind of useless for an IDE that wants to refactor code.

LLVM took care of all of the backend problems, and the goal of clang is to take care of the front-end problems. clang's architecture is very straightforward; first, lex and parse C/C++ code into an abstract syntax tree, which other software can access; second, convert that syntax tree into LLVM code without bothering with any other intermediate representations. This has a number of useful effects; for example, because there are so many fewer intermediate stages, clang/LLVM uses far less memory than gcc (and thus can more effectively apply whole-program optimization to larger programs). But most importantly, because it generates a language-specific abstract syntax tree, you can use the front-end anywhere that you need to analyze C/C++ code, and you can be guaranteed that you'll be analyzing code in exactly the same way as the compiler.

You should be able to; from what I understand, it's not a 'virtual machine' in the traditional sense of the word. You can also go straight to native code if you want. It's going to be interesting to see how mature LLVM is, and how much of a role it'll play in the future development of Mac OS X; it will be interesting if they (Apple) eventually drop GCC in favour of LLVM.

"Low Level Virtual Machine, generally known as LLVM, is a compiler infrastructure designed for compile-time, link-time, run-time, and "idle-time" optimization of programs written in arbitrary programming languages.

Using LLVM, one can create a virtual machine for languages similar to Java and JVM relation, a code generator for a specific machine architecture and optimizers independent from particular platforms or languages. LLVM is language and architecture independent; it lies between a language-specific module and a code generator for a machine. LLVM includes aggressive interprocedural optimization support, static and JIT compilers, and has many components in various stages of development (including Java bytecode and MSIL frontends, a Python frontend, a new graph coloring register allocator, and more). The JIT compiler is capable of optimising unnecessary static branches out of a program at runtime, and is therefore useful for cases where a program has many options, most of which can easily be determined unnecessary in any environment. Because of this, it is used in the OpenGL pipeline of Mac OS X 10.5 ("Leopard") to provide support for missing hardware features.

It currently supports the compilation of C and C++ programs, using front-ends derived from version 3.4 and 4.0.1 of the GNU Compiler Collection (GCC). LLVM is written in C++ and was started in 2000 at the University of Illinois at Urbana-Champaign. It is publicly available under the University of Illinois Open Source License [1], an OSI-approved license that is very similar to the BSD license."

Basically, LLVM is half of a compiler/JIT; it performs optimization and code generation. (This is often called the "back end" of the compiler.) Since LLVM is only used by people who develop compilers and VMs, the Web site is probably aimed at them, not at the general public.

LLVM has a lot of knowledge about processors, but it knows nothing about programming languages, so someone cut GCC in half and welded the GCC front end (which performs source code parsing, type checking, etc.) onto LLVM to create a C compiler called llvm-gcc. However, it sounds like these Apple people really dislike GCC, so they wrote a new C front end called clang. When you combine clang and LLVM, you get a complete C compiler that is supposedly cross-platform, super-fast, light on memory, BSD-licensed, and (best of all) completely under Apple's control.

Being BSD-licensed can't make it completely under Apple's control, can it?
Wes mentioned cross-platform so whether Apple turns Xcode into a stand-alone IDE or keeps it on the Mac, using BSD code will allow Apple to produce a tighter IDE without worrying about having to give up any source code. This might be the "completely under Apple's control" that was mentioned.

What I mean is that the development is under Apple's control, so they can add whatever features they want. Also, they can choose what code to release and what to keep private because of the BSD license.

1. A compiler toolkit including a mid-level IR, many optimization passes, a machine-level IR and code gen for many targets.
2. A set of libraries for building compiler related tools such as translators, assemblers, linkers, archive tools, etc.
3. A competent C/C++/Obj-C compiler based on GCC 4.0
4. A framework for program analysis and other compiler related research.

When I first read about LLVM it also sounded pretty obscure to me. But according to Zack Rusin's blog (http://zrusin.blogspot.com/2007/05/mesa-and-llvm.html) it could be used as a framework to write OpenGL Shading Language vertex and fragment programs in any language (that LLVM supports) and generate GLSL code from it. Well, read Zack Rusin's blog for more details; it is interesting to read anyway.

How does the LLVM idea compare to the "Virtual Processor" (VP), which is at the root of the portable Intent RTOS that the TAO Group has been selling for many years?
The comments here are very compiler-oriented, but the VP is only the processor instruction set that a compiler can target.
What about the VP code once it has been generated? It needs a special "loader" which translates the VP code into native processor code on the fly, when it loads the bytecode from disk to memory. TAO said that this didn't affect the load time.
So the LLVM concept seems the same to me, but all the explanations are very obscure, as if the concept was not clear.
Could somebody enlighten me and show how they differ, please?

Both LLVM and VP are fundamentally virtual machines which a compiler can target (as opposed to a real machine). The difference between the two stems from the goals of each. LLVM is designed with a focus on optimization. VP is designed to facilitate portability. The design of the two systems reflects their purposes. VP is a low-level register machine. LLVM encodes high-level things like function calls, array references, structures, etc., which provide additional information to the optimizer. LLVM code is always in SSA form; to reflect this, it has no move instruction and it has a phi pseudo-operation. VP has no phi instruction and allows arbitrary register-register copies.
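To show what the phi-versus-copy difference amounts to, here is a rough sketch that turns a phi into register copies in the predecessor blocks, which is roughly what has to happen before SSA code can run on a plain register machine. The block and instruction encoding, the opcode names ("phi", "copy", "br", "cbr") and the function lower_phis are all made up for this example; they are neither real LLVM nor real VP instructions. The lowering is also the naive version and assumes every predecessor ends in a terminator; a real compiler has to deal with critical edges and with phis that read each other's results.

```python
def lower_phis(blocks):
    """Replace each phi with copies at the end of its predecessor blocks.

    blocks maps a block name to a list of instruction tuples. A phi is
    ("phi", dest, {predecessor_name: incoming_value}); the last
    instruction of every block is assumed to be its terminator.
    """
    # Start from a copy of every block with the phis stripped out.
    lowered = {name: [i for i in instrs if i[0] != "phi"]
               for name, instrs in blocks.items()}
    for instrs in blocks.values():
        for instr in instrs:
            if instr[0] == "phi":
                _, dest, incoming = instr
                for pred, value in incoming.items():
                    # Insert the copy just before the predecessor's terminator.
                    position = len(lowered[pred]) - 1
                    lowered[pred].insert(position, ("copy", dest, value))
    return lowered

# if (c) a = 1; else a = 2; return a;   written in SSA form
ssa = {
    "entry": [("cbr", "c", "then", "else")],
    "then":  [("const", "a_1", 1), ("br", "join")],
    "else":  [("const", "a_2", 2), ("br", "join")],
    "join":  [("phi", "a_3", {"then": "a_1", "else": "a_2"}),
              ("ret", "a_3")],
}
for name, instrs in lower_phis(ssa).items():
    print(name, instrs)
```

In the SSA version the merge of a_1 and a_2 is stated once, at the join point, which is convenient for an optimizer. In the lowered version the same information is spread over two copy instructions, which is convenient for a machine that just executes registers.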

As for why LLVM and VP look that similar to begin with, it's because an assembly-like representation is a convenient form both for doing optimizations and for doing processor-neutral distribution. They allow language-specific details to be expressed using lower-level features, and expose optimization opportunities that may be obscured under high-level constructs.

If you want to know how LLVM and gcc compare to each other, please start by reading this: http://gcc.gnu.org/ml/gcc/2005-11/msg00888.html. This text was posted to a gcc mailing list and is a proposal by one of the LLVM authors to integrate LLVM into gcc.

Yes, BSD licensing means it's completely under anyone's control. BSD is a freer license than GPL for the user of a given piece of code-- it has no political agenda, just "use and share if you want to (or don't)".

It looks like a win all around. Apple gets a better compiler than gcc for the languages they care about (C/C++/ObjC), they're freed from the GPL restrictions, can more easily move to x86-64, and they retain compatibility with gcc.

The only major downside I can see is that this appears to further lock Apple into Objective-C, which is one of the major things holding them back from mainstream developer adoption. Although one of the slides explicitly called out that they're not interested in Java and such, if it came down to it I wonder if they could do a C# or Java front-end.

"Yes, BSD licensing means it's completely under anyone's control. BSD is a freer license than GPL for the user of a given piece of code-- it has no political agenda, just "use and share if you want to (or don't)"."

I'm not sure what that means since I believe it's always been said that BSD is "freer" for the developer and GPL is "freer" for the end user via the four freedoms. Not to mention that they are only copyright (DISTRIBUTION) licenses, so there is no agenda placed on an end user.

I'm not sure what that means since I believe it's always been said that BSD is "freer" for the developer and GPL is "freer" for the end user via the four freedoms.

Yes, it's been said that, but that doesn't make it true. The bottom line is that the BSD license is pretty much completely free. The GPL license is fairly restrictive, and the FSF is very hostile to the commercial software industry which has brought the "end users" most of the software in existence today (most of which has "freed" me from the drudgery of not having the software at all.)

Anyway, to keep this discussion on-topic, it's very much in any company's interest to minimize dependencies on GPL'ed code, especially in light of GPLv3 and the FSF going off the deep end of late, and in that sense Apple's pretty much forced to move away from GCC. I'm glad they found a solution that seems superior to GCC in all respects when they did it. The fact that they're still sharing it despite it being BSD is also illuminating.

"Yes, it's been said that, but that doesn't make it true. The bottom line is that the BSD license is pretty much completely free. The GPL license is fairly restrictive, and the FSF is very hostile to the commercial software industry which has brought the "end users" most of the software in existence today (most of which has "freed" me from the drudgery of not having the software at all.)"

I'm not sure about that. You refer to "commercial software", and yet GPL'd software can also be commercial; it just has to follow the GPL. Compared to the default copyright that the Berne Convention gives, GPL and BSD are both free. But they accomplish different things in terms of the rights a user/developer gets, because BSD code can be taken proprietary and still distributed in public (non-free in GPL parlance). That can be a good thing in some cases, like how RMS blessed OGG going BSD, but bad in terms of free-ness in other cases.

Copyrighted software didn't exist for a long time, so the GPL only restores that balance, especially today when copyright terms are too long, and when software patents threaten individual software developers. GPL creates opportunities for developers that BSD can't, e.g., the GPL allows use in GPL'd products, while also allowing people to license their code for proprietary products like Trolltech does to Opera, Adobe, Google Earth. Otherwise, with BSD, a company could fork Qt and drive Trolltech out of business.

I don't believe the FSF has gone off the deep end either. They are protecting the four freedoms.

BTW, Apple and Steve Jobs have a history of not playing nice with the GPL, e.g., the KHTML saga, GCC and Objective-C in NeXTSTEP, etc., so I guess they don't like it very much, like you've said. I think they also took Darwin closed when they switched from PPC to Intel, which pissed off the developers. Free software has certainly driven down the price of software to the benefit of customers, so FOSS licenses play a part in that decision.

Did we really need to turn this into yet another GPL vs. BSD flamewar? Can't we just let the subject rest and say that people prefer different options for different reasons?

Yes, it's been said that, but that doesn't make it true. The bottom line is that the BSD license is pretty much completely free. The GPL license is fairly restrictive

Yes, you could say that. By the same argument I could say that Sudan is more "free" than the US and Western Europe, because they don't have any laws you have to follow and we do (like not murdering someone). But I wouldn't say that, because sometimes restrictions do promote freedom.

OK, perhaps that example was a little inflammatory. I apologize to anyone who was offended. I'm simply saying that the BSD license is like anarchy - some people might argue that = freedom, while others would argue that following a set of moral rules which benefits the community is what truly gives people freedom. Regardless of what your definition of "freedom" is, I think different people have the right to prefer one over the other without constantly being attacked for how they should switch to be more free. That goes to both sides.

"Regardless of what your definition of "freedom" is, I think different people have the right to prefer one over the other without constantly being attacked for how they should switch to be more free. "

Amen, but I fear that a forum without license flamewars is a utopian dream.