If anyone has noticed the trend in how many cores a processor has over time, they might've noticed that we've pretty much flatlined for mainstream parts since 2007 (taking the AMD Phenom as the first true quad-core part). Sure, higher-end parts and servers have gone up, but even the general-purpose processor with the highest core count has only quadrupled in the past 7 years, to 16 cores. So why have CPUs flatlined while GPUs, which are many-core parts, continue to rise? The answer, to the best of my guess, is that GPUs, unlike CPUs, have two things that make a multicore processor of any kind work really well: problems that are deterministic and independent of each other.

What do I mean by deterministic? Essentially, that an operation done on a computer completes in the same exact time, every time it's run. If I feed the computer 1+1, then I can say it will always take, for example, 1 clock cycle to get the result. It's easy to see why: the values are known and the computer doesn't need to gather any more information. However, if the operation is 1 plus some user-inputted value, then the operation is no longer deterministic. Sure, once the value is received, it will churn out the answer in 1 clock cycle, but the input could take a second or years to arrive. For a GPU running graphics, all of the data is present and accounted for. Sure, some problems, like post-processing effects, require an input, but that input has been generated by the time the step is reached. Even with physics, the inputs are known and can be run independently of each other, regardless of whether the result ends up in, say, a collision. It'll be accounted for in the next slice of time. But a CPU handles all other kinds of programs, a lot of which are waiting for something to happen.

Take, for instance, your word processor. 99% of the time it'll probably sit there, waiting for some input. Sure, you could be hammering away at the keyboard at 100WPM (which, by the way, is the equivalent of about 500 characters a minute, or roughly 8 a second), but to a computer, that's an eternity. By the time you've let go of the key, it's already processed the input and is waiting for you to press the next one. Even an internet browser sits there waiting for you to click a link or for something from the network to happen. Even if some gaudy Flash animation is playing in the background, it's probably not going any faster than 30FPS, or 33ms per frame. Still, that's a long time for a computer (a tick on a 4.0GHz processor is 0.25 nanoseconds, or 250 picoseconds, so a 30FPS frame lasts about 133 million ticks). There's no point in throwing more workers at a problem that requires idling most of the time.
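A quick sanity check of the numbers above (using the usual convention that a "word" is 5 characters):

```python
# Back-of-the-envelope arithmetic for the paragraph above.
WPM = 100
chars_per_minute = WPM * 5            # a "word" is conventionally 5 characters
chars_per_second = chars_per_minute / 60

clock_hz = 4.0e9
tick_s = 1 / clock_hz                 # one cycle at 4.0 GHz

frame_s = 1 / 30                      # one frame at 30 FPS
ticks_per_frame = frame_s / tick_s

print(f"{chars_per_second:.1f} chars/sec")        # ~8.3
print(f"{tick_s * 1e12:.0f} ps per tick")         # 250 ps
print(f"{ticks_per_frame:,.0f} ticks per frame")  # ~133 million
```

So even a busy typist gives the processor hundreds of millions of cycles between keystrokes.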

The other area is independence. Ever since we decided to make computers do more than just compute numbers, we've run into a bit of a snag known as resource management. Two or more programs will fight over resources: anything from needing to use the hard drive, to something as simple as owning a single byte in memory. And then there's another separate, but related, problem. What if one core was working on something, switches context to another program, and then another core picks up where the first left off? Since we'd like to run our programs as much as possible in cache memory, where it's very fast, we have the issue of making sure each core's cache contains the same data if they're working on the same problem (called cache coherency). If the cache isn't updated every time a core picks up the program, it will be working on stale, incorrect data. It would be like two collaborators working on a Wikipedia page. One person does some work, but needs to do something else, so another user (presumably elsewhere in the world) picks up the job. Except how does that person know what was added? The data that person picked up is stale and old, and if worked on to completion, the result will be incorrect.
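The stale-data problem can be sketched in a few lines. This is a deliberately simplified simulation, not real hardware caches: two workers each take a private copy of a shared value (standing in for a per-core cache), update it, and write back, and one update is silently lost:

```python
# Toy illustration of the "stale data" problem described above:
# two workers each cache a shared value, update it, and write back.
shared = 0

a_copy = shared      # worker A caches the current value...
b_copy = shared      # ...and worker B caches it too, before A writes back

a_copy += 1          # A's update, done on its private copy
b_copy += 1          # B's update, done on an already-stale copy

shared = a_copy      # A writes back: shared == 1
shared = b_copy      # B writes back: shared == 1, A's work is lost

print(shared)        # 1, not the 2 that both updates together should give
```

Cache-coherency hardware exists precisely to detect and prevent this interleaving.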

While we can have hardware ensure the cache is coherent across all cores working on the problem... it becomes a little silly when much of your silicon isn't devoted to actual processing, but to the infrastructure that keeps data consistent across all players. So why isn't this a problem for GPUs? Simply put, despite their flexibility and computational prowess, we treat GPUs like simple computers. Simple like the old mainframes of the '60s and '70s, with the CPU in place of the human sorting out the punch cards for processing. They still have some logic similar to a CPU's for managing jobs and reordering them, but most of their programs are "do this until it's done, then spit out the results". A GPU core cannot afford to switch contexts because of the types of jobs we feed it and the purpose it serves. Imagine playing a game, and halfway through rendering the game's frame the GPU decides that rendering the Windows GUI is necessary instead. Nobody would like that.

So to take away from all this... Most of the programs we use daily are ill-suited for multi-core processors. You would be spending silicon on creating more workers, rather than using that silicon to give the workers you have better tools. They would all need constant updates because they can be interrupted at any time. And while we can blame software for not catching up with hardware and where it's going, the problem is fundamental: you can only parallelize your operations so much before you hit a wall. If adding a second core doubled the throughput, adding a third would only add 50%. The next, 25%; the next, 12.5%; and so on. But the cost of adding each core remains constant, if not increasing, because of the infrastructure you have to build to support it.
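That diminishing-returns intuition is basically Amdahl's law. A quick sketch (the 90%-parallel fraction is just an assumed example, not a measured workload):

```python
# Amdahl's law: with a fraction p of the work parallelizable,
# speedup on n cores = 1 / ((1 - p) + p / n).
def speedup(p, n):
    return 1 / ((1 - p) + p / n)

p = 0.9  # assume 90% of the program parallelizes (illustrative only)
for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} cores -> {speedup(p, n):.2f}x")
```

Even with 90% of the work parallelizable, 16 cores only get you about 6.4x, and the ceiling is 10x no matter how many cores you add.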

Graphics cards have more than 2 cores? They are all single core. The 3DFx Voodoo 6000 may have had 4 GPUs, but they were all separate single-core units, just as the GTX 790 is.

I'm on a quad-core phone, so let me digest the rest of that awesome wall of text a bit. Right now, I can only state, "because none of us need more than 8 cores at all". Jam 16 cores in there, hyperthread them all, and do what with them? You'd use the same amount of energy because many of them would be resting for many more cycles than a current 6-core does.

Edit: close conclusion, but I feel your theory of CPU and GPU behavior is off. The data is all there for a GPU just as it is for a CPU. A GPU can guess as well as a CPU can... and does. Thinking that a CPU has to predict while a GPU doesn't means the GPU knows which direction you'll turn next and when you'll pull the trigger for a frag? It all goes back to random number generators and prediction pipelines.

Last edited by Chumly on Tue Oct 14, 2014 6:19 am, edited 1 time in total.

It's kind of hard to define what a core in a GPU is. But for now I'm just going to say it's the functional block that gets added or removed across versions of cards. So in NVIDIA's Kepler, a core is an SMX (which contains 192 execution units).

Edit on top of edit:

Chumly wrote:

Edit: close conclusion, but I feel your theory of CPU and GPU behavior is off. The data is all there for a GPU just as it is for a CPU. A GPU can guess as well as a CPU can... and does. Thinking that a CPU has to predict while a GPU doesn't means the GPU knows which direction you'll turn next and when you'll pull the trigger for a frag? It all goes back to random number generators and prediction pipelines.

The thing is, the CPU is still controlling the GPU: it's feeding it jobs. So the GPU doesn't have to predict anything. Everything is there for it to process by the time it's given the command to process. The fact that NVIDIA pretty much replaced a complex scheduler with a simpler one in Kepler, a la VLIW CPUs like Itanium or the Transmeta Crusoe, is kind of telling. The only branching a GPU needs to do is "I either do this or I don't".
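That "do this or don't" style of branching can be sketched as predication, the way SIMD/GPU-style lane groups commonly handle divergence: every lane evaluates both sides of the branch, and a mask picks each lane's result. This is a toy model, not any particular GPU's ISA:

```python
# Toy model of predicated branching across a group of lanes:
# all lanes compute both paths, then a per-lane mask selects one.
values = [3, -1, 4, -5]

mask     = [v >= 0 for v in values]   # the branch condition, per lane
if_true  = [v * 2 for v in values]    # every lane runs the 'then' path
if_false = [-v for v in values]       # ...and every lane runs the 'else' path

result = [t if m else f for m, t, f in zip(mask, if_true, if_false)]
print(result)   # [6, 1, 8, 5]
```

No lane ever jumps anywhere; divergent branches just cost both paths' work, which is why GPU code avoids heavy branching.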

Either way, the takeaway is that we can't really keep pushing the CPU the way we can other parts that have gained "many-core" status, by the very nature of what they're designed to do.

It's probably better to think about it like a manufacturing facility and scheduling/managing your resources. The CPU is the main assembly floor: it can do anything. The GPU is a specialized assembly floor: you don't put it to work until you can feed it.

I would guess several things are occurring: 1. Unlike parallel processing, in which each core is simple with a simple task at which it excels, a complex (sorry, brainfart here on proper terminology) CPU core can handle many tasks but is not the master of any. Doubling cores does not halve the task. Complex algorithms aren't keeping up.

2. Lack of a compiler and developed apps taking advantage of more than two (or lately four) cores, in no small part because lack of a market means no time spent on such optimizations.

3. Overhead and diminishing returns in increasing core numbers.

I would think all would be demonstrable and the data would exist in studies answering this very question.

Worse, it's chicken or egg: can't get demand without product, can't get product without demand. "If you build it they will come" versus the market reality of not putting money where there is no demand. I would suspect the latter is winning.

1. Unlike parallel processing, in which each core is simple with a simple task at which it excels, a complex (sorry, brainfart here on proper terminology) CPU core can handle many tasks but is not the master of any. Doubling cores does not halve the task. Complex algorithms aren't keeping up.

Algorithm complexity also has little to do with it. Rendering the scene of, say, that Elemental UE4 tech demo is very complicated. And I believe the most complicated part of a core is just the front end. The back end, at the end of the day, is just an ALU or AGU. Those haven't really changed much, ever.

Quote:

2. Lack of a compiler and developed apps taking advantage of more than two (or lately four) cores, in no small part because lack of a market means no time spent on such optimizations.

I think this is a myth that needs to die. I believe application software has no concept of cores; it's really up to the operating system to schedule everything appropriately. The most an application does with thread management is be aware that it has threads and make sure their data is synchronized properly. At no time does an application explicitly say "I want to run threads 1 and 2 on core 0", even though it could run them on two cores. That's a task the operating system needs to handle.

In fact, depending on the circumstances, it might be better to schedule two threads on one core. Say you have a quad-core processor and three threads with run-time completions of 1, 2, and 3 units. Your total run time is going to be 3 units no matter what. Rather than firing up three cores for all three threads, just fire up two: put the threads with completion times of 1 and 2 on one core and the last thread on another. Your run time is the same and you've made better use of your resources.
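The scheduling example above can be checked with a tiny simulation (assuming each core simply runs its queue of threads back to back):

```python
# Compare finish times ("makespan") of two schedules for the
# three threads above, with run times of 1, 2, and 3 units.
def makespan(schedule):
    # schedule: one list of run times per active core
    return max(sum(queue) for queue in schedule)

three_cores = [[1], [2], [3]]     # one thread per core, three cores awake
two_cores   = [[1, 2], [3]]       # threads 1 and 2 share a core

print(makespan(three_cores))      # 3
print(makespan(two_cores))        # 3: same finish time, one core fewer
```

Both schedules finish in 3 units, so the two-core schedule wins on power for free.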

Here's the other problem: every sector is demanding lower power consumption, which means parking your cores as often as possible. Two cores consume more power than one, even one that's slightly overclocked. Which relates to the example I gave above.
