Newer systems such as OpenCL are being created so that we can run more and more of our code on our graphics processors, which makes sense: we should be able to use as much of the power in our systems as possible.

However, with all of these new systems, it seems as if GPUs are better than CPUs in every way. Because GPUs can do parallel calculation, multi-core GPUs seem like they'd be much better than multi-core CPUs; you'd be able to do many calculations at once and really improve speed. Are there still cases where serial processing is better, faster, and/or more efficient than parallel?

Not really a question about hardware. It should be reworded to "when is programming the CPU(s) better than programming the GPU(s)", and as such it is a pretty good p.se question IMO. See the GPGPU tag among others on SO. But architecture "What tech to use" questions are better here than there.
– Kate Gregory, Sep 11 '11 at 15:04

@Kate That angle seems to be very well covered in the linked Super User question. Reading through it, I'm a bit surprised it didn't get migrated here, to be honest. There's also this on SO. I'll reopen the question (since you're right, the programming aspects of it are on-topic here). I hope we see an answer that isn't just pointing to existing (excellent) coverage of this problem.
– Anna Lear♦, Sep 11 '11 at 15:19

To @Anna's point, I think the answers need to be much more about when a programmer should use the GPU rather than a purely theoretical discussion of what the difference is between a GPU and CPU. I've edited the title to reflect this.
– user8, Sep 11 '11 at 17:14

@RetroX We can't close questions as duplicates if they are on different sites.
– Anna Lear♦, Sep 11 '11 at 19:33

7 Answers

However, with all of these new systems, it seems as if GPUs are better
than CPUs in every way.

This is a fundamental misunderstanding. Present GPU cores are still limited compared to current top-of-the-line CPUs. I think NVIDIA's Fermi architecture is the most powerful GPU currently available. It has only 32-bit registers for integer arithmetic, and less capability for branch prediction and speculative execution than a current commodity Intel processor. Intel i7 chips provide three levels of caching, while Fermi cores have only two, and each cache on the Fermi is smaller than the corresponding cache on the i7. Interprocess communication between the GPU cores is fairly limited, and your calculations have to be structured to accommodate that limitation (the cores are ganged into blocks, and communication between cores in a block is relatively fast, but communication between blocks is slow).
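
To make that block structure concrete, here is a minimal sketch in CUDA syntax (the kernel name block_sum and the fixed block size of 256 are illustrative choices, not anything from the answer above): threads within a block can cooperate through fast on-chip shared memory and a barrier, but blocks cannot synchronize with each other inside a single kernel launch, so anything that crosses block boundaries has to go through slow global memory.

    __global__ void block_sum(const float *in, float *block_totals, int n) {
        __shared__ float partial[256];            // fast memory, visible only to this block

        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;
        partial[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();                          // barrier for the threads in this block only

        // Tree reduction within the block (assumes blockDim.x == 256, a power of two).
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                partial[tid] += partial[tid + stride];
            __syncthreads();
        }

        // Cross-block communication has to go through global memory:
        // one partial result per block, combined later by another kernel or by the CPU.
        if (tid == 0)
            block_totals[blockIdx.x] = partial[0];
    }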

A significant limitation of current GPUs is that the cores all have to be running the same code. Unlike the cores in your CPU, you can't tell one GPU core to run your email client, and another core to run your web server. You give the GPU the function to invert a matrix, and all the cores run that function on different bits of data.
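
As a rough illustration of that model, here is what a data-parallel kernel looks like in CUDA (the same idea applies in OpenCL); the function name scale and the sizes are made up for the example, not the matrix-inversion routine itself. Every thread executes the same function; only the index it computes differs.

    #include <cuda_runtime.h>

    // Every thread runs this same function on a different element of the array.
    __global__ void scale(float *data, float factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique index for this thread
        if (i < n)
            data[i] *= factor;                          // same operation, different data
    }

    int main() {
        const int n = 1 << 20;                          // a million elements
        float *d;
        cudaMalloc(&d, n * sizeof(float));
        cudaMemset(d, 0, n * sizeof(float));

        // Launch enough blocks of 256 threads to cover all n elements.
        // You cannot ask some of these threads to go run your email client instead.
        scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);

        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }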

The processors on the GPU live in an isolated world. They can control the display, but they have no access to the disk, the network, or the keyboard.

Access to the GPU system has substantial overhead costs. The GPU has its own memory, so your calculations will be limited to the amount of memory on the GPU card. Transferring data between the GPU memory and main memory is relatively expensive. Pragmatically this means that there is no benefit in handing a handful of short calculations from the CPU to the GPU, because the setup and teardown costs will swamp the time required to do the calculation.
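
Here is a hedged sketch of where those overheads show up (CUDA syntax; add_one_on_gpu is a hypothetical helper, not an API): for a small vector, the allocation and the two copies below dominate the total time while the kernel itself is essentially free, so the CPU would win outright.

    #include <vector>
    #include <cuda_runtime.h>

    __global__ void add_one(float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] += 1.0f;
    }

    void add_one_on_gpu(std::vector<float> &host) {
        int n = static_cast<int>(host.size());
        float *dev;

        // Setup: allocate GPU memory and copy the data across the bus.
        cudaMalloc(&dev, n * sizeof(float));
        cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        add_one<<<(n + 255) / 256, 256>>>(dev, n);      // the actual work

        // Teardown: copy the result back and free the GPU memory.
        cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev);
    }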

The bottom line is that GPUs are useful when you have many (as in hundreds or thousands) copies of a long calculation that can be performed in parallel. Typical tasks for which this is common are scientific computing, video encoding, and image rendering. For an application like a text editor, the only function where a GPU might be useful is rendering the type on the screen.

GPUs aren't generalist processors the way CPUs are. They specialize in doing one very specific thing--applying the same code to a large amount of data--and they do it very, very well, much better than a CPU does. But the bulk of most applications isn't about applying the same code to a large amount of data; it's an event loop: waiting for input, reading the input, acting on it, and then waiting for more input. That's a pretty serial process, and GPUs suck at "serial."

When you have a large amount of data that you need to process, and each item can be processed in parallel, independently of the others, then go ahead and send it to the GPU. But don't think of this as "the new paradigm" that everything has to be squeezed into.

This question is tagged "optimization," so remember to treat it as one. Apply GPU optimization where testing and profiling reveal that optimization is needed and the nature of the task is such that GPU optimization can be applied. Otherwise, don't bother with it, as that would be premature or incorrect optimization, which causes more problems than it fixes.

The simple answer is that a GPU works best when you need to do a fairly small, fairly simple computation on each of a very large number of items. To gain much this way, the computation for each item must be independent of the computations for the other items. If there's (normally) some dependency between one item and another, you generally need to figure out some way to break it before you'll get much out of executing that code on the GPU. If the dependency can't be broken at all, or requires too much work to break, the code might execute faster on the CPU.
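
A small sketch of that distinction, again in CUDA-flavoured C++ (square_all and running_total are illustrative names): the first loop has no dependency between items and maps directly onto GPU threads; the second carries a dependency from one iteration to the next and stays serial unless you restructure it, for example as a parallel prefix sum.

    // Independent per-element work: each output depends only on its own input,
    // so every thread can run at the same time. This maps naturally onto a GPU.
    __global__ void square_all(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i] * in[i];
    }

    // A loop-carried dependency: element i needs the result for element i-1.
    // Written this way it is inherently serial; to run it on a GPU you would have
    // to recast it (e.g., as a parallel scan), and if that rework costs too much,
    // the CPU version may simply be the better choice.
    void running_total(const float *in, float *out, int n) {
        float sum = 0.0f;
        for (int i = 0; i < n; ++i) {
            sum += in[i];
            out[i] = sum;
        }
    }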

Most current CPUs also support quite a few types of operations that current GPUs simply don't attempt to support at all (e.g., memory protection for multitasking).

Looking at it from a slightly different direction, CPUs have been (largely) designed to be reasonably convenient for programmers, and the hardware people have done their best (and a darned good best it is!) to create hardware that maintains that convenient model for the programmer, but still executes as quickly as possible.

GPUs come at things from rather the opposite direction: they're designed largely to be convenient for the hardware designer, and things like OpenCL have attempted to provide as reasonable a programming model as possible given the constraints of the hardware.

Writing code to run on a GPU will typically take more time and effort (so it will cost more) than doing the same on the CPU. As such, doing so primarily makes sense when/if either:

The problem is so parallel that you can expect a large gain from minimal effort, or

The speed gain is so important that it justifies a lot of extra work.

There are some obvious possibilities for each -- but a huge number of applications clearly aren't even close to either one. I'd be quite surprised to see (for example) a CRUD application running on a GPU any time soon (and if it does, it'll probably happen because somebody set out with that exact goal in mind, not necessarily anything approaching an optimal cost/benefit ratio).

The reality is that for a lot of (I'm tempted to say "most") applications, a typical CPU is far more than fast enough, and programming convenience (leading to things like easier development of new features) is much more important than execution speed.

you'd be able to do many calculations at once and really improve speed.

Improve speed? So what? Over the last year I can recall only once or twice when that was needed. Most of the time I've been asked to modify or fix logic, to adjust for a different data source, to improve user interaction, etc. The only speed customers cared about in those cases was the speed of making a change: "Please release the new feature in a month, or better yet, in two weeks."

Don't get me wrong - as a coder I thoroughly enjoy squeezing out CPU ticks. It's just that this art is not typically in high demand.

Are there still cases where serial processing is better, faster, and/or more efficient than parallel?

I would say there are plenty of cases. Serial processing is simpler than parallel, which makes it more efficient whenever speed is not a critical requirement. Serial processing allows for easier implementation of complicated logic and user interfaces; it is easier to specify and test, and easier to maintain and change.

As a rule, serial processing allows a clearer expression of the programmer's intent and easier reading of the code. I would say it saves the most precious and scarce resource - the programmer's brain.

If you need raw number-crunching, GPUs are the way to go. However, all those ALUs mean that fewer transistors are dedicated to control-flow (branching) circuitry. So, if you need to write something that involves a lot of complex control flow and lots of conditionals, then a CPU will be faster.
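
To illustrate why (a sketch in CUDA syntax; heavy_a, heavy_b, and branchy are made-up names): threads in a warp execute in lock-step, so when a data-dependent branch splits them, the hardware runs both paths one after the other with the inactive threads masked off, and much of the GPU's throughput advantage evaporates. A CPU's branch predictor handles this kind of control flow far more gracefully.

    __device__ float heavy_a(float x) { return x * x + 1.0f; }    // stand-in for real work
    __device__ float heavy_b(float x) { return 0.5f * x - 1.0f; } // stand-in for real work

    __global__ void branchy(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        // Divergent branch: within a single warp some threads take each side,
        // so both paths are executed serially, throwing away parallelism.
        if (in[i] > 0.0f)
            out[i] = heavy_a(in[i]);
        else
            out[i] = heavy_b(in[i]);
    }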