IBM's Octopiler is a compiler that takes as its target not an architecture but …

All afternoon I've been slogging through IBM's 25-page paper on their newly released Octopiler, and now things are clearer to me. See, Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip. So Cell has immense performance potential, but if you want to make it programmable by mere mortals then you need a compiler that can ingest code written in a high-level language and produce optimized binaries that fit not just a programming model or a microarchitecture, but an entire multiprocessor system. This isn't just a tall order, or even a doctoral dissertation. It's a generation's worth of doctoral research. Meanwhile, the PS3 is due out in 2006.

Octopiler is intended to become just such a compiler—one that can take in a sequential program that's written to a unified memory model, and output binaries that make efficient use of the massive, heterogeneous system-on-a-chip that is the Cell Broadband Engine. I say "intended to become," because judging from the paper the guys at IBM are still in the early stages of taming this many-headed beast. This is by no means meant to disparage all the IBM researchers who have done yeoman's work in their practically single-handed attempts to move the entire field of computer science forward by a quantum leap. No, the Octopiler paper is full of innovative ideas to be fleshed out at a further date, results that are "promising," avenues to be explored, and overarching approaches that seem likely to bear fruit eventually. But meanwhile, the PS3 is still due out in 2006.

What IBM has in mind is what I would call a tiered approach to Cell. At Tier I, there's the "expert programmer" (IBM really means "expert programming team") who codes to the bare metal, manages memory alignment and traffic flow issues like NYC's finest, and just generally makes all the parts of the Cell scream in perfect harmony. This guy, if he exists, is going to be worth his weight in gold. No, scratch that. He'll be worth Marlon Brando's weight in diamond-studded platinum.

On Tier II is the mortal but still highly paid and very overworked C programmer, who uses branch hints, prefetching, profiling, DMA commands, and so on manually to keep code and data flowing to all the parts of the Cell, while letting the compiler handle the lower-level stuff, like register allocation, code scheduling, and optimizations.
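To give a feel for what "manual" branch hints and prefetching look like at this tier, here's a rough sketch in plain C. It uses the GCC/Clang builtins `__builtin_expect` and `__builtin_prefetch` as stand-ins; these are not the Cell-specific mechanisms from IBM's paper, and the function name, the `LIKELY`/`UNLIKELY` macros, and the prefetch distance of 16 are all my own assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical convenience macros wrapping the GCC/Clang branch-hint builtin. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Sum the non-negative entries of data[0..n). The programmer, not the
 * compiler, decides which branch is rare and how far ahead to prefetch. */
long sum_nonnegative(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Request data 16 elements ahead (an arbitrary, untuned distance),
         * staying inside the array bounds. */
        if (i + 16 < n)
            __builtin_prefetch(&data[i + 16], 0, 1);

        /* Hint that negative values are the uncommon case. */
        if (UNLIKELY(data[i] < 0))
            continue;

        total += data[i];
    }
    return total;
}
```

The hints change nothing about the result, only (possibly) how fast you get it, which is exactly why tuning them by hand is such thankless work.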

Tier III is the domain of the programmer who can't be bothered to vectorize his own code. This guy lets the Octopiler auto-vectorize his programs for use with the Cell's SPEs. He still has quite a bit of work to do, since auto-vectorization is easier said than done. So he has to work with the compiler's feedback in order to tune his code for maximum auto-vectorization potential in order to get the best possible performance out of the SPEs.
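Why is auto-vectorization easier said than done? A minimal sketch in generic C (not Cell-specific, and not taken from IBM's paper; both function names are hypothetical) shows the kind of distinction a vectorizing compiler has to make. The first loop has independent iterations and non-overlapping arrays, so it's a plausible candidate for SIMD. The second carries a dependence from one iteration to the next, so it can't be vectorized as written.

```c
#include <stddef.h>

/* Independent iterations: each out[i] depends only on a[i] and b[i].
 * `restrict` promises the compiler that the arrays do not overlap, the
 * sort of guarantee an auto-vectorizer typically needs before it will
 * process several elements per instruction. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = k * a[i] + b[i];
}

/* Loop-carried dependence: x[i] needs the updated x[i - 1] from the
 * previous iteration, so the iterations cannot simply be run side by
 * side in SIMD lanes. */
void prefix_sum(float *x, size_t n)
{
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];
}
```

Tuning code "for maximum auto-vectorization potential" mostly means nudging loops of the second kind toward the first kind, or at least giving the compiler enough information to prove that it's safe to treat them that way.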

Finally, on Tier IV is the programmer who just wants to port his single-threaded x86 program to Cell in as painless a manner as possible. This person doesn't even care to know anything about "heterogeneous multiprocessing" or any of that fancy stuff. He just wants to see "Hello World" greet him on the screen. Ok, just kidding. IBM claims the following for this highest level of Octopiler hand-holding:

The compiler provides user-guided parallelization and compiler management of the underlying memories for code and data. When the user directives are applied in a thoughtful manner by a competent user, the compiler provides significant ease of use without significantly compromising performance.

Getting Tier IV to work is where the money is. It's also going to be quite painful for IBM to achieve their stated goal of "not significantly compromising performance." I think they, or someone else, will get there eventually. Meanwhile, the PS3 is still due out in 2006.

This brings me to the question, is the PS3 launch doomed? No, of course it isn't. Developers will make something happen. That "something" just isn't at all likely to rise to the full potential of what the Cell could be capable of with another decade of industry-wide effort on heterogeneous multiprocessing systems.

The final point I want to make is that nowhere in this post have I mentioned what is perhaps the biggest challenge facing programmers who write non-deterministic applications for a highly multithreaded SoC like Cell: debugging. But I felt justified in skipping over the topic of debugging, because IBM didn't really cover it in their paper.

Update: My apologies to Stephen Shankland over at CNET for neglecting to include a link to his earlier coverage of Octopiler. I got the links in my post from Shankland's article, and I should've referenced him in my original story.