Parallel Programmer

James Reinders loves a puzzle. As the Director of Marketing and Sales for processor manufacturing giant Intel Corp., Reinders spends his days thinking about the low-level software that turns developer code into machine-readable bits. Reinders began his career at General Electric, working with the W2 compiler for the Department of Defense WARP supercomputing project. He joined Intel two years later, in 1989, and helped design the world's first TeraFLOP supercomputer (ASCI Red). He's been directly involved in low-level compiler and architectural work ever since.

Lately, Reinders has been very active talking about code parallelism as Intel looks for ways to make software work more efficiently with the new generation of multi-core processors. Not surprisingly, Reinders is the author of a new book, "Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism," from O'Reilly Media (July 2007). We spoke with Reinders about compilers and parallelism, and what dev managers must change in the age of multi-core processors.

You've been deeply involved in compiler development. How exactly did you end up working in the guts of the machine?
I have a deep interest in exactly how things work, and in particular how computers work. So I'm very interested in the bit-level -- the transistor-level, if you will -- logic of the computer. I want to know how to do it. I'm very fascinated with how to take best advantage of it at a very low level.

Compilers are a fascinating place to work, because you're trying to take what you believe to be a language that's abstract, and you try to keep the programmer working on the task of programming, and translate that into something that's efficient for the machine to operate on. It keeps me engaged with the latest ideas in development and how to design the microprocessor.

But it also lets me step back and look at the programmer productivity issues and what we can do to have features in the compiler at a high level that can translate into efficient machine stuff, and are efficient interfaces for the programmers as well.

Boy, that has evolved a lot over time. When I started with Intel, Fortran 77 and C were the languages I worked on. Now it's Fortran 90 and Fortran 2003 and C++, and obviously there are a lot of other variations and languages that have popped up, especially the managed runtimes like Java and C#, and of course Perl and Python -- although I was using Perl the first year when I joined Intel and I still use Perl a lot.

You joined Intel in 1989. How much has the complexity of the underlying computing platform grown in that time?
We've come a long way. Architectures are much easier to target by compilers. But at the same time, compilers have increased their complexity enormously. We've seen the microprocessor architectures get much more complex and we've seen compilers in general get much more complex. It used to be that an optimizer was somewhat of a novel feature in compilers -- only the most exotic compilers had an optimizer, or really good ones. Nowadays it's pretty much a prerequisite.

One of the implications is that there's a lot more reluctance than there used to be to try out something new. Or it's much more difficult to introduce something radically new.

I think 20 years ago people were more open to work with radical new languages like Forth and things like that, playing with microprocessor architectures that didn't support the same instruction sets and so on. Now we're in a position where the investment is so high, and [with] the infrastructure that exists, that any revolutionary idea needs to have an evolutionary method of implementing it. You can still have revolutions, but they need to come at a controlled cost for the industry to absorb them.

"Now we're in a position where the investment is so high ... that any revolutionary idea needs to have an evolutionary method of implementing it. You can still have revolutions, but they need to come at a controlled cost for the industry to absorb them."

James Reinders, Director of Marketing and Sales,
Intel Corp.

How does the emergence of different languages and programming models affect a company like Intel?
At a broad level at Intel we're always interested in how people are programming and using machines and how we can help them. It's gotten a lot more diverse than it used to be. There are a lot of environments like Java and C# where the compiler really isn't the key to the performance, it's generally the runtime. So we have to help in more ways and be more nimble in a way to try and figure out where we can best help and where we can best understand the features of these runtimes and so on.

It affects microprocessor design. Things like managed runtimes tend to be less predictable than compiled languages, so we have to do things in the design of the micro-architecture, because things that might have been really rare before become more common types of code.

I know parallelism is a major thrust for your group. How does this compare to earlier optimizations like MMX, SSE and SIMD?
The way I think about it is, instructions like MMX and SSE and SSE2 and so forth, they all had in general a highly localized effect on a program. Where we used them in a program was localized. When we would visit a company that was doing something we would help them identify key routines that would benefit from adding the use of MMX or a key driver that could be changed. It was kind of a surgical strike.

Parallelism in general doesn't conform to that -- that's the biggest change. There are some applications and some things that we're doing that we kind of consider low-hanging fruit, where you can get a lot of benefit out of doing localized adjustments. But to really get the benefit out of parallelism requires some higher-level structural change to programs. And that won't happen overnight.

In a sense it's urgent to get going with it. But it doesn't have to happen instantly for multi-core to have value. You look a decade from now, any popular program that hasn't done some restructure somehow to take advantage of parallelism at a high level isn't really going to take the best advantage of the latest processors.

What should compel dev shops to begin moving to parallelism?
My analogy is, if you thought someone might be coming over to your house but you weren't sure, would you pick it up and clean it up a little bit? Well, if you had a serial processor, you would wait until the doorbell rang to clean your house up. And you'd hang a little sign on the front door that says 'Please wait, I'm cleaning the house up.'

I think there are fundamentally things people will add to their applications to make the user interface better, because that's what we always do with extra compute power. That's what happened when we went from 50MHz to 2GHz, right? We went to better screen displays, more dense displays, font smoothing -- all sorts of things. It went to a better user interface.

I think parallelism opens up some things that fundamentally give us the ability to have better interfaces, to try to make computers more usable.

What tools should developers be familiar with to take advantage of multi-core processors?
For most of us, we'll be studying how to make our applications use parallelism. You should always start with libraries and auto-parallelism -- from compilers -- to see how much you can get easily. Again, for most of us, we need to continue on and look at abstractions that are widely available, such as OpenMP and Intel Threading Building Blocks.

Beyond abstractions is the low-level world of raw threads -- pthreads, Windows threads, Boost threads. This low-level world is not the right world for application development. It is more difficult to program, tune and debug, so results are neither efficient nor likely to do well on any 'future proofing' measure.
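
As a rough illustration of the difference between an abstraction and raw threads, here is a minimal sketch in Python. It is not OpenMP or Intel Threading Building Blocks; the standard-library `concurrent.futures` pool merely stands in for that style of abstraction. The names `checksum` and `parallel_checksum` and the chunking scheme are hypothetical, chosen only for this example. Note also that in CPython a thread pool mainly helps I/O-bound work, so the sketch shows the programming model rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def checksum(chunk):
    """Hypothetical per-chunk task: sum of the squares of the values."""
    return sum(v * v for v in chunk)


def parallel_checksum(values, chunk_size=1000, max_workers=None):
    """Split `values` into chunks, run them on a pool, and combine the results."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # max_workers=None lets the library choose a default pool size for this
    # machine; the caller never creates, joins or synchronizes a thread.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(checksum, chunks)
        return sum(partials)


if __name__ == "__main__":
    data = list(range(10))
    print(parallel_checksum(data, chunk_size=3))
```

The same call gives the same answer with one worker or many, because the decision about how many threads to use lives inside the library rather than in the application code.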

How do you start shifting to a multi-core development style when many of the computers you're programming to in the enterprise today are single-core?
Making a program work well on a single-core machine and a quad-core machine is much more difficult than making a program work well on a quad-core and an eight-core machine. The overhead in a program to make it able to break into multiple threads can be an issue, and so the number of programming techniques which are suitable will stay more limited for now than it will be later in this evolution.

Libraries, OpenMP and Intel Threading Building Blocks all leave this balancing act to the implementations under the covers -- and have shown really good results. I see many projects run into problems with other techniques in getting good performance on both single-core machines and quad-core machines.

Our processors may get faster, but programmers aren't getting any smarter. How can we improve programmer throughput and efficiency?
Again, it's a matter of perspective. If programmers could be 10 times as productive, would we need a tenth as many programmers? I think we are nowhere near a point where we have so many programmers that we don't need them. But we're constrained by their productivity. We always need to be looking for more productive environments.

Obviously that's what C# and Java and even Python or Perl [are helping to do], because if you have a tool that's more specialized for the task you're trying to do, it's more likely to make you more productive. I think it's a very rich area and one very much on my mind for parallel computing, because I know we aren't going to get a lot of people programming using threads directly. It's not productive enough. We need something much more productive if we expect people to do more parallelism. I think it applies to all walks of programming.

Beyond the enabling tools, what needs to happen to make parallel programming mainstream?
We need to wrestle with two things. One, how do we start training our newest programmers to think [about parallelism] from day one? And the other is, how do we re-train ourselves to think that way? Because no amount of adding tools and so forth can ever make up for the necessity of people to at least be able to think a little bit about parallelism.

I'm not talking about way down in the bits, exactly how to implement it, but fundamentally thinking about parallelism like my doorbell example earlier. People write programs to wait for the doorbell to ring to clean house in their program. They don't do that in real life, so why do we do that and how do we get out of that mode?
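
The doorbell analogy can be sketched in a few lines of Python. This is only one possible reading of it, under stated assumptions: `clean_house`, `serial_host`, `parallel_host` and the `wait_for_doorbell` callback are all hypothetical names, and the `time.sleep` call stands in for whatever slow preparation a real program would do.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def clean_house():
    """Hypothetical slow preparation step (stands in for loading or precomputing)."""
    time.sleep(0.05)
    return "house is clean"


def serial_host(wait_for_doorbell):
    """Serial style: do nothing until the doorbell rings, then make the guest wait."""
    wait_for_doorbell()
    return clean_house()


def parallel_host(wait_for_doorbell):
    """Parallel style: start cleaning in the background before the doorbell rings."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        cleaning = pool.submit(clean_house)  # background work starts right away
        wait_for_doorbell()                  # waiting overlaps with the cleaning
        return cleaning.result()             # blocks only if cleaning is unfinished


if __name__ == "__main__":
    doorbell = lambda: time.sleep(0.05)  # hypothetical guest arriving after a delay
    print(serial_host(doorbell))
    print(parallel_host(doorbell))
```

Both hosts end up with a clean house. The difference is that in the second version the waiting and the cleaning overlap, so the guest is not left standing at the door.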

You had some thoughts about developer productivity and the impact on offshoring. Can you tell us about that?
[It's something] I've never seen anyone articulate. We've had a bit of a one-time boost in the number of programmers and the cost of programmers with India and China -- and to some extent other countries, Russia, Poland and what not. A lot of the world has suddenly joined us and is benefiting from Europe and the United States outsourcing a lot of programming to them. A lot of that programming is needed by Europe and the United States because we're at the level of computer usage we are now.

Two things are happening -- two dynamics. One is that the lifestyle and expense of those programmers in those markets is rising very quickly. So the cost advantages aren't going to go away instantly, but they're really changing quickly. You can't expect that to always be as huge of a differential as it has been. But the other thing is, what happens when India, Russia, China and so on start competing with us for programmers for their own domestic needs?

When I think through things like that, that's why I have no doubt that the world doesn't have too many programmers. We need productivity gains. At some point we're going to be competing for the programmers we take for granted right now. At some point China and India will turn around and want to use more and more of those programmers for their domestic needs, and the price goes up again. Supply and demand. And we definitely need productivity increases to deal with that.