I don't think it's so nonsensical. It's a valid point. Pure, across-the-board performance improvements aren't really as great as they used to be. On top of that, it's taking longer and longer to get to that next process node, so the "just add more transistors because we shrink them" ideology won't work nearly as well in the near future.

So I think the issue is twofold: 1) We're concentrating more on power improvements than pure performance improvements. 2) The tricks engineers have been using for years to get more performance out of a chip (higher clock speed, smaller node, etc.) are going to hit walls soon. We're going to need some fresh ideas to keep performance going up.

Ryhadar wrote:I don't think it's so nonsensical. It's a valid point. Pure, across-the-board performance improvements aren't really as great as they used to be. On top of that, it's taking longer and longer to get to that next process node, so the "just add more transistors because we shrink them" ideology won't work nearly as well in the near future.

So I think the issue is twofold: 1) We're concentrating more on power improvements than pure performance improvements. 2) The tricks engineers have been using for years to get more performance out of a chip (higher clock speed, smaller node, etc.) are going to hit walls soon. We're going to need some fresh ideas to keep performance going up.

It's completely nonsensical. The fact that there ARE ways to continue improving -- new materials such as graphene, or III-V materials, and new ways of processing (quantum computing, etc.) -- means that we haven't hit the pinnacle.

If you wanted to ask "have we reached the pinnacle of Intel Core design?" well, the answer is still no, but it's a heck of a lot closer.

The fact that there ARE ways to continue improving -- new materials such as graphene, or III-V materials, and new ways of processing (quantum computing, etc.) -- means that we haven't hit the pinnacle.

Well, first of all, those are all process and material improvements and not really related to microarchitecture. (Well, quantum computing is probably neither and both, at the same time.) The problem with all of the above is that they are still very much laboratory babies. I read all the press releases and it is invariably stuff like "1 transistor just 5 atoms wide!" Great, but give me something the least bit commercially ready. How about a simple graphene embedded controller or a quantum calculator? Is this stuff gonna scale? How about manufacturability? And cost? They'd better start ramping up now, because this stuff is not just going to appear overnight. Oh wait. They're not.

As an analogy, there ARE (see, I can do All Caps too) ways to reach Mach 12+ speeds with scramjets but do you see Boeing proposing a 797?

If you wanted to ask "have we reached the pinnacle of Intel Core design?" well, the answer is still no, but it's a heck of a lot closer.

You mention that Intel went for power saving and I certainly agree with that. But Intel has gone for the low-hanging fruit (as they should) before. Remember the Intel charts which had future CPU clocks projected up to 10 GHz. But this does suggest that it is getting harder to improve single-threaded performance. Most of the really new architecture was in parallel computation. I hope I am wrong, but technologies do mature and, dare I say it, stagnate. Just as many humans lived from the Wright Brothers' flight to the jet age, maybe some of us might live from the 8086 to the i9. So nonsense or not, I still ask: even if transistor budgets keep expanding, do we know how to harness this ever-increasing resource?

There are still some major changes coming to computing hardware independent of moves to exotic materials like graphene and the like. In particular, we are already seeing the gradual merging of CPUs and GPUs into processing units that are designed to handle both types of computing organically instead of having an either-or approach.

A huge part of the problem is that we no longer get very big "free" performance boosts where the exact same piece of software has a large performance jump when new silicon comes out, unless the software is already properly setup to take advantage of more cores and more parallel processing resources. Despite what you see in most online benchmarks, the very large majority of software out there that people use every day is *not* heavily parallelized. Even the large majority of multi-threaded programs do not have perfect scaling beyond a comparatively small number of processing cores, and most truly parallel software is still limited to scientific programs that are specifically tailored to run on massive compute clusters. Right now, software and not hardware is the biggest barrier to improving compute performance since we can build hardware that is quite powerful in theory, but the large majority of software programs just can't take advantage of it.
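The scaling ceiling described here is, at bottom, Amdahl's law: the serial fraction of a program caps the speedup no matter how many cores you throw at it. A minimal sketch (function name and numbers are mine, purely illustrative):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Best-case speedup when only part of a program parallelizes (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# A program that is 80% parallelizable stalls out well below the core count:
for cores in (2, 4, 8, 64):
    print(cores, round(amdahl_speedup(0.80, cores), 2))
```

Even at 64 cores the 80%-parallel program gains less than 5x, which is why "just add cores" doesn't rescue the everyday software described above.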

chuckula wrote:Despite what you see in most online benchmarks, the very large majority of software out there that people use every day is *not* heavily parallelized.

Nor does it usually take advantage of specific architectures in any way.

I'd be happy to see the day where I could compile my games/apps on Windows to suit my specific machine instead of using precompiled binaries...but I highly doubt that day will come soon.

+++ To that quote too. As a heavy Linux user, the ability to recompile binaries with architecture-specific compiler flags is a plus, although getting noticeable performance improvements often requires much more work than simply using some compiler flags. In the future, both human programmers and compilers need to become much much better at taking advantage of hardware resources or else even theoretically amazing chips won't show all that much in the way of performance gains.

For the title question - No, we haven't, but the only obvious low-hanging fruit left is integrating I/O on-die. After the Northbridge was integrated the improvements are not huge - a wider bus here, better cache there, etc...'tweaks' if you will.

Without some big hardware changes some have mentioned (new materials, or far-out stuff like quantum computing) the way real improvement leaps will happen is software taking advantage of special instructions or dedicated hardware blocks. (Dedicated in this case just means anything designed to be more application-type specific than the general compute sections.) Some recent examples include video decode/encode hardware blocks which are even included in otherwise very parallel silicon like GPUs. AVX instructions are another example but they haven't been fully exploited yet - any CPU with AVX could see a large jump in performance just with the proper software updates.
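To put a rough upper bound on the AVX remark: the best-case gain from vectorized software is just the number of elements that fit in a vector register. A toy calculation (names mine; real-world gains depend heavily on the workload):

```python
def simd_lanes(vector_bits, element_bits):
    # Elements processed per vector instruction; AVX registers are 256 bits wide.
    return vector_bits // element_bits

# Ideal-case speedup for single-precision (32-bit) float math under AVX:
print(simd_lanes(256, 32))  # 8 lanes, so up to ~8x over scalar code
```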

As a response to the original post, yes, I believe we are reaching the pinnacle of microarchitectural design, that is, the CPU cores themselves. This is true particularly in the x86 industry. Intel, AMD, and whoever else can theoretically cram as many cores as possible (assuming they have the process tech to do so) into one piece of silicon, but designing a very efficient core in terms of IPC, as well as reaching ever higher clock speeds, is far more difficult, IMHO, and few companies have the resources to do it, and that's IF they're willing to do it. Back in the 1990s we had Intel, AMD, Cyrix, NexGen, IBM, MIPS, etc. doing all sorts of things to improve IPC. Consider that the golden age of microprocessor design. IPC was going up fast, clock speeds were still making leaps and bounds, power consumption wasn't such a big issue yet, and there were tons of players vying for your money. Best of all? Most of their chips fit into Socket 7 (or Super 7). Far more exciting, really. I remember taking out my Cyrix 6x86MX-PR233, swapping in an AMD K6-2/450, setting a few jumpers, and off I went. Fat chance of that happening today.

Today, R&D costs have never been higher while, at the same time, prices have never been cheaper on a price/performance basis. So in a sense, the manufacturers are being squeezed, selling far more sophisticated chips at far lower prices.

It may also be said that x86 is perhaps one of the most evolved (if not the most evolved) ISAs ever, thanks to Intel's Ivy Bridge and upcoming Haswell microarchs (can you also say Bulldozer? Hmm..). For comparison, consider Sun/Oracle's SPARC T4, the first chip from Sun/Oracle to implement out-of-order integer execution, introduced only 2 years ago. So when did x86 get OoO? The Pentium Pro and Cyrix 6x86 had this feature way back in 1995.

Perhaps you can credit Intel as well as AMD (and other defunct x86 vendors) for the rapid advancement of x86 implementations. Too bad only AMD is left to compete with Intel. Perhaps ARM will continue to keep Intel from stagnating too much.

If people stick with you just because you have a Rolex on your wrist, you can bet losing them is as OK as losing an Invicta. And if they stick with you even if you only have an Invicta, losing them is as OK as losing a Rolex.

ronch wrote:It may also be said that x86 is perhaps one of the most evolved (if not the most evolved) ISAs ever, thanks to Intel's Ivy Bridge and upcoming Haswell microarchs (can you also say Bulldozer? Hmm..). For comparison, consider Sun/Oracle's SPARC T4, the first chip from Sun/Oracle to implement out-of-order integer execution, introduced only 2 years ago. So when did x86 get OoO? The Pentium Pro and Cyrix 6x86 had this feature way back in 1995.

Please don't make it sound as though Intel really innovated with OoO. Ok, I'm not saying they suck or anything, but SPARC is not a good counterpoint:

A buncha RISC chips beat Intel to the OoO punch by several years; decades if you consider mainframes. I consider the Alpha team to have been some of the last real innovators. They showed what was to come years before it happened:

The first few generations of the Alpha chips were some of the most innovative of their time. The first version, the Alpha 21064 or EV4, was the first CMOS microprocessor whose operating frequency rivalled higher-powered ECL minicomputers and mainframes. The second, 21164 or EV5, was the first microprocessor to place a large secondary cache on chip. The third, 21264 or EV6, was the first microprocessor to combine both high operating frequency and the more complicated out-of-order execution microarchitecture. The 21364 or EV7 was the first high performance processor to have an on-chip memory controller. The unproduced 21464 or EV8 would have been the first to include simultaneous multithreading, but this version was canceled after the sale of DEC to Compaq. The Tarantula research project, which most likely would have been called EV9, would have been the first Alpha processor to feature a vector unit.

Granted, the vector unit was not new, MMX having already come out, but the integrated memory controller and SMT were boss.

I would say the answer lies somewhere between "no" and "maybe." The "maybe" side would be, perhaps, dependent on sticking with silicon and excluding new things like quantum computing. If we can utilize quantum properties, we might be able to open whole new fields of microarchitecture design, but strictly thinking in terms of the silicon and micro(/nano)-scale processing we are familiar with, we may have reached the peak.

The problem is that moving electrons generate heat. Shrinking to a smaller process means less heat, but a silicon atom has a radius of about 111 pm. On an 11 nm process, that means (ignoring atomic and electric forces) we can stack about 50 silicon atoms side by side. But, as I mentioned, there are forces at work, so that is a very idealistic assumption. We don't have much smaller to go before we don't have enough atoms to actually build a transistor. Even then, smaller transistors are a little more fragile, so you can't really crank up the frequency to get more performance, and producing chips at this size becomes an increasingly delicate process.
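The back-of-the-envelope arithmetic above can be spelled out like this (same idealization: covalent radius of roughly 111 pm, atomic and electric forces ignored):

```python
SI_ATOMIC_RADIUS_PM = 111              # approximate covalent radius of silicon
PROCESS_NM = 11

feature_pm = PROCESS_NM * 1000         # 11 nm = 11,000 pm
atom_diameter_pm = 2 * SI_ATOMIC_RADIUS_PM
atoms_across = feature_pm // atom_diameter_pm
print(atoms_across)                    # 49, i.e. roughly 50 atoms side by side
```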

So, as others have mentioned, we are starting to bump up against walls that can't be bypassed by shrinking the process and/or cranking up the frequency, but if we can use something other than silicon or start producing quantum chips for consumer use, we could do more.

"A life is like a garden. Perfect moments can be had, but not preserved, except in memory. LLAP"

My opinion: no, we're not quite at the pinnacle. The things Intel ARE (you guys are right, these caps are fun!) focusing on make sense. Intel's CPU speeds have really taken off to the point that we're now bottlenecking at the video cards, so what's the point of making faster CPUs right now? Strictly in the consumer space, that is. But I guess you can argue that IPC should be a little more of a focus, since you guys are absolutely correct that not much takes true advantage of multi-core architecture. But I think if we can't seem to get programmers or compilers to work on parallelism, then maybe Intel's next big focus should be automating parallel execution a little better.

Literally speaking, I don't think anyone would agree that we've truly reached the "pinnacle" of computing performance from Intel or anyone, for that would imply that there is no room for improvement and that it can only go down from here. I think the real question here is if we have reached a point of diminishing returns. I would argue that from 2000 to now we've already been experiencing diminishing returns. Consider the following graph:

This curve has been sloping off for a while now for non-parallel computing. However, there is still quite a ways to go before these diminishing returns become so severe that there isn't much to get excited about from one iteration to the next. We simply haven't reached that point yet. As auxy has said, the past few generations of chips from Intel have been primarily focused on efficiency rather than performance, and they have made some major gains in that department. However, some valid points have been made that we'll need engineers to keep thinking outside the box and come up with creative solutions to keep this momentum going, because in the next decade we are likely to reach the limits of what we can do with silicon. There are always architectural improvements that can be made, and of course Intel has only dabbled in parallel computing, relatively speaking. The tech industry is as healthy as ever.

Moore's law relates to the density of transistors and the cost to manufacture them. It says nothing about performance. That's a common misconception, but for a long time process shrinks also coincided with performance increases, so people correlated the two.
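As a sketch of what the law actually projects: density doubling on a roughly fixed cadence, with performance appearing nowhere in the formula (the 2-year period here is the commonly quoted figure, used as an assumption):

```python
def projected_density(start_density, years, doubling_period_years=2.0):
    """Moore's law projects transistor density, not clock speed or IPC."""
    return start_density * 2 ** (years / doubling_period_years)

# Density grows ~32x over a decade; nothing here promises faster software.
print(projected_density(1.0, 10))  # 32.0
```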

Ryhadar wrote:I don't think it's so nonsensical. It's a valid point. Pure, across-the-board performance improvements aren't really as great as they used to be. On top of that, it's taking longer and longer to get to that next process node, so the "just add more transistors because we shrink them" ideology won't work nearly as well in the near future.

So I think the issue is twofold: 1) We're concentrating more on power improvements than pure performance improvements. 2) The tricks engineers have been using for years to get more performance out of a chip (higher clock speed, smaller node, etc.) are going to hit walls soon. We're going to need some fresh ideas to keep performance going up.

It's completely nonsensical. The fact that there ARE ways to continue improving -- new materials such as graphene, or III-V materials, and new ways of processing (quantum computing, etc.) -- means that we haven't hit the pinnacle.

If you wanted to ask "have we reached the pinnacle of Intel Core design?" well, the answer is still no, but it's a heck of a lot closer.

I'll have to give you that. I wasn't thinking about quantum computing or graphene. I came away from the original post thinking about Intel's and AMD's current architectures (though re-reading it, it's obvious there was no such restriction). That said, based on the technology that's currently in use on a commercial scale, things are absolutely slowing down, and I do believe we're going to start hitting walls soon in improving non-parallel workloads.

I actually wonder if fabrication technology will eventually allow for a new kind of design? For instance, get some of IBM's integrated water/coolant channels and be able to layer a chip into a 3D monster. I've seen some talk about it a bit but nothing concrete. Would allowing for more interconnects let you come up with a radical design? I don't know enough about chip layout and design to make an intelligent comment about it, but maybe there's some university boffins designing stuff using something beyond AND/OR/NOR/etc gates and they just need some funky new transistor type or fabrication to make it happen.

On a core-per-core performance basis we may be in a bit of a lull. I saw a massive leap in performance when I upgraded from my OC'ed Q6600 to an OC'ed i7-920. I did not see such a leap when I upgraded to my current 3770K system. I won't be upgrading to Haswell.

Scrotos wrote:I actually wonder if fabrication technology will eventually allow for a new kind of design? For instance, get some of IBM's integrated water/coolant channels and be able to layer a chip into a 3D monster. I've seen some talk about it a bit but nothing concrete. Would allowing for more interconnects let you come up with a radical design? I don't know enough about chip layout and design to make an intelligent comment about it, but maybe there's some university boffins designing stuff using something beyond AND/OR/NOR/etc gates and they just need some funky new transistor type or fabrication to make it happen.

Speaking of which, where's our dang memristors and flying cars?!?

Chips are very 2D because you're basically acid washing a wafer. You put a mask on so that you end up carving the paths needed, but that process doesn't really lend itself to 3D arrangement.

The other issue is cooling. The transistors used actually don't generate heat in steady state, but they do generate heat when switching, which happens a few billion times a second (a billion per 1 GHz). So even if you could come up with a 3D design, cooling everything equally could be a problem. You mention IBM, but all my relevant Google hits are from five years ago. Doesn't look like anything ever came from it.
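The switching-heat point is the classic CMOS dynamic-power relation, P ≈ a·C·V²·f: power scales linearly with switching frequency and quadratically with voltage. A toy sketch (all values invented for illustration):

```python
def dynamic_power_watts(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Classic CMOS switching power: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz

base = dynamic_power_watts(1e-9, 1.0, 1e9)   # made-up switched capacitance
# Doubling frequency doubles power; doubling voltage quadruples it:
print(dynamic_power_watts(1e-9, 1.0, 2e9) / base,
      dynamic_power_watts(1e-9, 2.0, 1e9) / base)
```

This is also why a dense 3D stack is such a cooling headache: every added layer switches and dissipates on top of the ones below it.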

And speaking directly to "university boffins," that stuff is already deeper than logic gates in the form of CMOS, and I don't think you're going to get much better than that because, as I mentioned, they only generate heat when switching because they don't draw current (at least not very much) at steady state.



My main musing was whether or not newer fabrication processes would promote different architectural advances or changes. Hence, "radical." I know the gist of how current chips are fabricated and that wasn't my point. It's supposition, not bound by practicality. Would a new fabrication process promote a radical architecture change?

I specifically talked about a 3D arrangement (more than just the tri-gate stuff, which I think only stacks one layer up) and threw in a possible way to cool such a system, given that the power density would be nasty: instead of dumping heat per unit area, you now have to do it per unit volume. I was throwing a bone to people who might take issue with that part of my "what if?" scenario.

Left out of the discussion of process shrinking, though, is stuff like electromigration and electron leakage via quantum tunneling; how much of that's gonna jack with Moore's law and with people trying to innovate a new architecture if the architecture depends on transistor density that can't easily be attained?

Look at Intel's "extreme" processors these days. They can't add more cache, cores, or memory channels without slowing down average-case performance. This parallels the limitations placed on the number of execution units. They have found lots of clever improvements, but at some point the masterpiece must be finished if the big parameters are not changing.

I'm guessing there is significant room left with bulk vector processing.

It seems to me that AMD was being innovative with Bulldozer, we'll see where things stand in 10 years, maybe they are on to something.

No, I'm not crediting OoO to Intel at all. My point is, when it comes to microarchitectural evolution, x86 has made and includes many advances already and is, as far as I can tell, nowhere near bringing up the rear of the pack. Today's Intel and AMD chips pack in so much technology (OK, AMD may be behind these days) that it's hard to imagine what else they could do to make things faster. Heck, they both even hold overclocking records, which, at the circuit-design level, is no easy thing. In terms of IPC, it's known that adding more and more integer units doesn't yield much higher performance anymore, and both Intel and AMD have what, 3 or 4 ALUs max? I don't think other competing ISAs have managed to get further than that. Making use of those few ALUs as much as possible means cramming in features such as OoO, SMT, advanced branch predictors and prefetchers, large caches, etc.

So what else can they add that can net significant performance increases? They probably still have a few tricks to use but as someone here has mentioned/implied diminishing returns are getting worse. And as manufacturing processes become more and more expensive as we go smaller, it's not like anybody can just double the transistor count to achieve all-out performance. And that is assuming we'd know what to do with those extra transistors.


One thing it seems nobody has considered is just how much performance you can squeeze out of these chips with a basic overclock. By default they don't seem that impressive, but boost one by 1 GHz and you can see just how much these processors are capable of. I think mganai has it right when he says they are being deliberately held back for lack of competition. The conservative clock speeds allow Intel to bin their chips pretty much however they want and still get good volume. I sometimes wonder, though, if they're banking some latent improvements this way for the rainy day when AMD decides to step their game up. This is much the same as NVIDIA having had the fastest chip and only giving us GK104 for so long, because they knew it was good enough to compete with AMD. Well, at least that was the situation in 2012; those two are still leapfrogging each other year after year.