11/03/2008 @ 6:00AM

Why Apps Can't Run Faster

The problem may not be evident on processors with two, four or even eight cores. But when applications are written to take advantage of eight cores and the next iteration of the processor has 20, the vast majority of them will continue to use only eight.

Moreover, even if an application is rewritten to use all 20 cores, the performance gain will be smaller than the gains of a decade ago, when the clock speed of single-core processors was doubling. And that’s a best-case scenario. A word processing application threaded for 20 cores will not run much faster than a version that uses one or two cores.
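
The column does not name it, but the standard way to reason about this ceiling is Amdahl's law: if a fraction p of a program's work can run in parallel, the best-case speedup on n cores is 1 / ((1 - p) + p / n). A faster clock, by contrast, speeds up the serial part too. The minimal Python sketch below assumes illustrative values of p; they are not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup under Amdahl's law.

    Ignores communication and synchronization overhead, which in practice
    only lower the result further.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only for illustration.
    for p in (0.5, 0.9, 0.99):
        for n in (2, 8, 20):
            print(f"p={p:.2f} cores={n:2d} speedup={amdahl_speedup(p, n):5.2f}x")
```

With half of the work serial (p = 0.5), moving from 2 to 20 cores lifts the best case only from about 1.33x to about 1.90x, and no number of cores can push it past 2x.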

This poses a fundamental problem for companies expecting performance gains with every new server that’s introduced, and it’s an even bigger problem for those selling the servers. For now, many corporations are content with hardware that uses less power to do the same job. The newer chips and power supplies in servers are more efficient, and applications are being threaded to split up functionality across multiple cores to provide at least a reasonable performance boost.

And for now, at least, virtualization can boost the utilization rates of servers and soak up some of the excess cores. But virtualization doesn’t offer performance gains. It improves the utilization rate of machines, which is a measure of efficiency. And even the best virtualization scheme isn’t as efficient as a mainframe.

I have asked the top technologists at big server and processor companies about this issue, and the response I routinely receive is that great advances are being made in multicore programming. The companies always say they’ll get back to me with more information and follow-up interviews, but those interviews never materialize. Nevertheless, they’re still talking about more and more cores: hundreds, or even thousands, on a single chip.

It is possible to build many, many more cores on a chip. It’s even possible to run more applications on a single chip using virtualization. But so far, no one has found a way to make the vast majority of applications scale with the number of cores, so that software written today automatically takes advantage of the additional cores in new chips.

The problem is one of parallelization, and universities have been studying it for the past four decades. So far, no one has figured out a way to build parallelization into most applications. It works great for things like search and databases, which really are the same thing except that the data is located in different places. It also works for highly repetitive processes, such as splitting up math functions among multiple servers or cores. The hard part in all of those applications is splitting up the work and then bringing the results back together into a cohesive form.
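
To make the split-then-recombine pattern concrete, here is a minimal Python sketch using the standard library's `concurrent.futures.ProcessPoolExecutor`. The workload (a sum of squares over a range of integers) and the function names are hypothetical, picked only because every chunk can be computed independently, which is the property that makes the search and math cases tractable.

```python
from __future__ import annotations

import os
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(bounds: tuple[int, int]) -> int:
    """Independent work on one chunk: sum k*k for k in [start, stop)."""
    start, stop = bounds
    return sum(k * k for k in range(start, stop))


def split_range(n: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, n) into `parts` contiguous chunks of nearly equal size."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def parallel_sum_of_squares(n: int, workers: int | None = None) -> int:
    """Split the range, compute the chunks in separate processes, then combine."""
    workers = workers or os.cpu_count() or 1
    chunks = split_range(n, workers)                 # 1. split the problem
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum_of_squares, chunks)  # 2. compute chunks in parallel
        return sum(partials)                         # 3. bring results back together


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    n = 1_000_000
    total = parallel_sum_of_squares(n)
    assert total == (n - 1) * n * (2 * n - 1) // 6  # closed-form check
    print(f"{os.cpu_count()} cores, sum of squares below {n}: {total}")
```

The split and the final `sum` are easy here only because no chunk depends on another; most applications lack that property.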

However, the vast majority of other applications cannot be easily parallelized. Instead, developers have threaded different functions across different cores, essentially dedicating one piece of the application to each core. Add more cores, and the application gets no faster, because the work has already been divided into a fixed number of pieces.
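
Unlike splitting data into chunks, the threading-by-function approach can be sketched as follows. The word-processor tasks below are hypothetical, and the point is the structure rather than speed: the number of threads equals the number of functions, so it never grows with `os.cpu_count()`. (In the standard CPython build, threads running Python code also contend for the global interpreter lock, so this is not a benchmark.)

```python
from __future__ import annotations

import os
import threading


# Hypothetical pieces of an imaginary word processor. Each function is one
# "piece of the application"; none of these names come from a real product.
def count_words(text: str, out: dict) -> None:
    out["words"] = len(text.split())


def count_sentences(text: str, out: dict) -> None:
    out["sentences"] = sum(text.count(mark) for mark in ".!?")


def find_long_words(text: str, out: dict) -> None:
    out["long_words"] = [w for w in text.split() if len(w) > 12]


TASKS = [count_words, count_sentences, find_long_words]


def analyze(text: str) -> tuple[dict, int]:
    """Run each task on its own dedicated thread (functional decomposition).

    The thread count is fixed by len(TASKS), not by the machine, so a
    20-core processor gets no more parallelism here than a 4-core one.
    """
    out: dict = {}
    threads = [threading.Thread(target=task, args=(text, out)) for task in TASKS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, len(threads)


if __name__ == "__main__":
    result, used = analyze("Parallelization is hard. Threading by function is easier!")
    print(f"cores available: {os.cpu_count()}, threads used: {used}")
    print(result)
```

On an 8-core or a 20-core machine alike, `analyze` starts exactly three threads; using more cores would mean breaking the program into more independent pieces, which is the hard problem the researchers have not solved.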

There are always promises of new programming languages. Intel is developing one now, called Ct, which allows some applications to be programmed in parallel more easily. The limited number of success stories should say something about that approach. And universities are working feverishly on this problem, but so far they don’t see any breakthroughs. Privately, the researchers say there may not be any significant breakthroughs.

That means we’re going to start hearing a completely different type of business pitch out of hardware vendors in the future. In the short term, performance is taking a back seat to efficiency. In the long term, we’re likely to see some very creative marketing as companies try to duck this issue for as long as they can.