nk497 writes "To see one of the 32nm transistors on an Intel chip, you would need to enlarge the processor to beyond the size of a house. Such extreme scales have led some to wonder how much smaller Intel can take things and how long Moore's law will hold out. While Intel has overcome issues such as leaky gates, it faces new challenges. For the 22nm process, Intel faces the problem of 'dark silicon,' where the chip doesn't have enough power available to take advantage of all those transistors. Using the power budget of a 45nm chip, if the processor remains the same size only a quarter of the silicon is exploitable at 22nm, and only a tenth is usable at 11nm. There's also the issue of manufacturing. Today's chips are printed using deep ultraviolet lithography, but it's almost reached the point where it's physically impossible to print lines any thinner. Diffraction means the lines become blurred and fuzzy as the manufacturing processes become smaller, potentially causing transistors to fail. By the time 16nm chips arrive, manufacturers will have to move to extreme ultraviolet lithography — which Intel has spent 13 years and hundreds of millions trying to develop, without success."
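A rough way to sanity-check the "dark silicon" figures in the summary is sketched below. It assumes (this is not from the article) that transistor density scales with the inverse square of the feature size and that power per transistor does not drop at all between nodes; the function name is made up for illustration.

```python
def usable_fraction(old_nm, new_nm):
    """Fraction of a same-size die that can be powered at new_nm
    using the power budget of the old_nm chip.

    Assumptions (illustrative only): density scales as 1/feature_size^2
    and power per transistor stays constant across nodes.
    """
    density_gain = (old_nm / new_nm) ** 2
    return 1.0 / density_gain

for node in (22, 11):
    print(f"{node} nm: {usable_fraction(45, node):.3f} of the die usable")
```

Under these assumptions the 22nm result comes out close to the quoted quarter, while the 11nm result comes out below a tenth, which suggests the article's numbers assume at least some reduction in per-transistor power at the smaller node.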

Make them bigger. More space to put stuff on them then anyway. Tostito's Restaurant style tortilla chips can fit much more guacamole and salsa on them than their bite size chips. Bigger is better when it comes to chips.

Larger dies generally cost more because it's more likely that they'll have a defect. I haven't done any chip design since college (and even then it was really entry level stuff) but if you could break the chip down into 10 different subcomponents that need to be spaced out, you could put 100 of those components on the chip and then after manufacture you could select the blocks that perform best and are defect free, spacing your choices accordingly.

Since they are so parallel they are made as a bunch of blocks. A modern GPU might be, say, 16 blocks each with a certain number of shaders, ROPs, TMUs, and so on. When they are ready, they get tested. If a unit fails, it can be burned off the chip or disabled in firmware, and the unit can be sold as a lesser card. So the top card has all 16 blocks, the step down has 15 or 14 or something. Helps deal with cases where there's a defect, but overall the thing works.
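The effect of die size and of selling partly working parts can be sketched with a simple Poisson defect model. Everything numeric here (defect density, die area) is a hypothetical value picked for illustration, and the model ignores defects that land outside the 16 blocks.

```python
import math

def poisson_yield(area_cm2, defects_per_cm2):
    """Probability that a region of the given area has zero defects."""
    return math.exp(-area_cm2 * defects_per_cm2)

def prob_at_least_good(n_blocks, k_min, p_block):
    """Probability that at least k_min of n_blocks independent blocks are defect free."""
    return sum(
        math.comb(n_blocks, k) * p_block**k * (1 - p_block) ** (n_blocks - k)
        for k in range(k_min, n_blocks + 1)
    )

# Hypothetical numbers, for illustration only.
DEFECT_DENSITY = 0.5  # defects per cm^2
DIE_AREA = 4.0        # cm^2, split into 16 equal blocks
N_BLOCKS = 16

p_block = poisson_yield(DIE_AREA / N_BLOCKS, DEFECT_DENSITY)
full = prob_at_least_good(N_BLOCKS, 16, p_block)     # all 16 blocks work
salvage = prob_at_least_good(N_BLOCKS, 14, p_block)  # 14 or more work

print(f"fully working dies: {full:.1%}")
print(f"sellable with >= 14 blocks: {salvage:.1%}")
```

In this model the fully working fraction falls off exponentially with die area, while accepting dies with one or two dead blocks recovers a large share of the rest.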

Actually, it's pretty common practice to put spare arrays and spare cells in the design that aren't connected in the metal layers. When a chip is found defective, the upper metal layers can be cut and fused to form new connections and use the spare cells/arrays instead of the ones that failed by use of a focused ion beam.

But that still adds time and cost. Decreasing die area is pretty much always preferable. Also, larger dies means even more of the chip's metal interconnects have to be devoted to power dist

Actually, it's pretty common practice to put spare arrays and spare cells in the design that aren't connected in the metal layers. When a chip is found defective, the upper metal layers can be cut and fused to form new connections and use the spare cells/arrays instead of the ones that failed by use of a focused ion beam.

Am I the only one who finds it pretty awesome that we're actually using focused ion beams in the manufacture of everyday items?

Distant parts of the chip then have a communication lag, but yes, this will really help. Certainly much less lag than communicating with something outside the die.

Wouldn't that suggest that three dimensional chips are the logical next step? Although heat dissipation would become more difficult, not to mention the fact that the production process would be an order of magnitude more complicated.

Making 3D chips is the holy grail of semiconductor processing but is still beyond reach. They've not been able to lay down a single crystal second layer to make your stacked chip. They have tried using amorphous silicon but the devices are not near as good so there is no point.

We are already seeing the outcrop of all of this, as next year's machines are not necessarily 2x the performance at the same cost. I really think that money would be better spent helping all of you coders out there in creating a language/compiler programming paradigm that can use 12 threads efficiently for something beyond rendering GTA. I certainly don't have the answer and given that that problem has not been solved yet, neither does anybody else at this time.

It's a very, very hard problem. It is going to be interesting here in the next few years. If nothing changes, you're going to have to get accustomed to the fact that next year's PC is going to cost you MORE, not less, and that's really going to suck.

We are already seeing the outcrop of all of this, as next year's machines are not necessarily 2x the performance at the same cost.

I don't know how long you've been buying computers, but it has never been the case of "2x performance every year". The best it ever was was every 18 months or so, processing power doubled, and that was bumped back to about every 2 years back in the late 80's/early 90's. But even that has never meant 2x all around performance. You might be able to crunch numbers 2x as fast after two years (never one), but there have always been bottlenecks - like RAM and hard drive speed - which have kept it down to around

You are incorrect about the reason for lack of 3D stacking. It's not that we can't stack them. There has been a lot of work on it. In fact, the reason flash chips are increasing in capacity is because they are stacked usually 8 layers high. The problem quite simply is heat dissipation. A modern CPU has a TDP of 130W, most of which is removed from the top of the chip, through the casing, to the heatsink. Put a second core on top of it, and the bottom layer develops hotspots that cannot be handled. There are currently some approaches based on microfluidic channels interspersed between the stacked dies, but that has its own drawbacks.

No, you are incorrect. You are talking about stacked gates. That is significantly different than what I am talking about which is making entire stacked devices where you have a second level of additional devices including sources and drains as well as gates. Work has been tried with amorphous silicon with mixed results, none of which amount to much.

You are correct in that the power density issue trumps all other concerns.

... I really think that money would be better spent helping all of you coders out there in creating a language/compiler programming paradigm that can use 12 threads efficiently for something beyond rendering GTA.

The entirety of programming as we know it is stuck in a single threaded paradigm and making the shift to massively parallel computing requires a huge shift in thinking.

This is so hard because our technique, languages and compilers all have their roots in a world that barely even multi-tasked let alone considered doing anything in parallel for performance.

Every coder that ever learnt to code, coded for kicks or money, learnt this way, and they still do.

The biggest performance bottleneck is still harddrives. So rather than focusing on faster CPUs, I'd love to see fast SSDs come down in price. I also can't wait until 16 gigs of RAM is standard.

Agreed, except I'd like to disagree on your preference: I'd love to have slow SSDs come down in price and go up in capacity. It will be Good Enough, or at least significantly better.

I mean, seriously: does the common desktop really need secondary storage which has higher throughput than the majority of DDR memory? There are SATA 6Gb/s disks out there with >400MB/s rates, whereas DDR 400 only maxed out at 400MB/s. That's freaking INCREDIBLE.

They're going to hit atomic scale transistors fairly soon from what I can see as well. The manufacturing process for those is probably prohibitively expensive, but that is as small as they can go (according to our current knowledge of the universe at least).

I can't imagine Intel has all of its eggs in one basket on Extreme Ultraviolet Lithography though. Something that's been in development for even 5 years and doesn't show any concrete signs of success should at least have alternatives developed for it. After 5 years if you still can't say for certain if it's ever going to work, you definitely need to start looking in different directions.

No, I really haven't. I tend not to pay much attention to things that are released more than 2 years after their original announced release date.

Though, I have to point out I didn't advocate terminating a project after 5 years of zero results (a la Itanium), just looking in additional directions and not keeping all the eggs in the questionable basket.

You seem to miss the point. You imagine that Intel doesn't put all of its eggs in one basket. The development of Itanium disproves that notion as they had no other real alternatives being developed at the same time.

For one, Itanium is still going strong in high end servers. It is a tiny market, but Itanium sells well (no I don't know why).

However in terms of the desktop, you might notice something: When AMD came out with an x64 chip and everyone, most importantly Microsoft, decided they liked it and started developing for it, Intel had one out in a hurry. This doesn't just happen. You don't design a chip in a couple months, it takes a long, long time. What this means is Intel had been hedging their bets. They developed an x64 chip (they have a license for anything AMD makes for x86 just as AMD has a license for anything they make) should things go that way. They did and Intel ran with it.

Ran with it well, I might add, since now the top performing x64 chips are all Intel.

They aren't a stupid company, and if you think they are I'd question your judgment.

They're going to hit atomic scale transistors fairly soon from what I can see as well

Yeah, there was an article here in the spring on atomic computing, where I did a little math on it. I was surprised, but it worked out that in roughly a decade Moore's Law would get down to atomic transistors if reducing the part size was the method employed.

I had always presumed before that it would never run out, but it's going to have to zig sideways if that's going to be true.

There's a difference here... those reports were about being practically impossible, not theoretically impossible. Going below the atomic scale, you're hitting the theoretically impossible (given current understanding) along with the practically impossible. We've had the theory for atomic size transistors for quite a while; it's the practical side that really needs to catch up.

I deal with EUV lithography for a living. Not at Intel, but at ASML [asml.com], the world's largest supplier of lithography machines and the only one that has actually manufactured working EUV lithography tools.

Something that's been in development for even 5 years and doesn't show any concrete signs of success should at least have alternatives developed for it. After 5 years if you still can't say for certain if it's ever going to work, you definitely need to start looking in different directions.

You are misinformed. On our Alpha development machines, working 22 nm devices were already manufactured last year. (source [www2.imec.be]) We are shipping the first commercial EUV lithography machines in the coming year (source [asml.com], source [chipdesignmag.com]) A problem for the chip manufacturers is that the capacity on the alpha machines is rather low and needs to be shared among competitors.

There is a temporary alternative; it is called double patterning [wikipedia.org] (and triple patterning, etcetera). The first problem is that you need twice (thrice) as many process steps for the small features, and also proportionally more lithography machines that are not exactly cheap. The second problem is that double patterning imposes tough restrictions on the chip design; basically you can only make chips that consist mostly of repeating simple patterns. That is doable for memory chips, but much less so for CPUs. Moreover, if you want to continue Moore's law that way, the manufacturing cost will increase exponentially, so this is not a long-term viable alternative.

You can bet that the semiconductor manufacturers have looked for alternatives. But those don't exist, at least not viable ones.

I wasn't aware of someone succeeding where Intel failed. I assumed that Intel would have simply licensed the tech from anyone that had by now.

IMEC is not the only ASML customer who has played with one of the two EUV Alpha tools, but it's the only one I could find with a quick Google search that has published the results. IMEC is a research institute. Other customers (actual chip manufacturers) have little to gain by disclosing to the competition exactly how much progress they have made.

Then again, just last year means that the licensing talks could easily still be going on. I'm going to keep an eye on this from now on.

Licensing is not the business model. The article suggests that Intel develops these machines ("fancy camera's") themselves, but in reality, they simply buy the machines from one of the three manufacturers (ASML, Nikon, and Canon). We spend an R&D budget of 500 M€ per year to develop these machines; Intel's R&D costs are likely mostly in the design of their chips and optimizing process parameters to squeeze as much as possible out of their fabs.

Why does Intel need to push the envelope that hard and that fast just to create a product that will, in the end, have extremely low yield and extremely high cost?

Just so they can adhere to some ancient "law" proposed by one of their founders? It's time to let go of Moore's Law. It's outdated and doesn't scale well... just like the x86 architecture! *ba-dum, chhh*

At the extreme, maybe it might be time for a new CPU architecture? Intel has been doing so much stuff behind the scenes to keep the x86 architecture going, that it may be time to just bite the bullet and move to something that doesn't require as much translation?

Itanium comes to mind here because it offers a dizzying amount of registers, both FPU and CPU available to programs. To boot, it can emulate x86/amd64 instructions.

Virtual machine technology is coming along rapidly. Why not combine a hardware hypervisor and other technology so we can transition to a CPU architecture that was designed in the past 10-20 years?

Very true, but it eventually needs to be done. You can only get so big with a jet engine that is strapped onto a biplane. The underlying architecture needs to change sooner or later. As things improve, maybe we will get to a point where we have CPUs with enough horsepower to be able to run emulated amd64 or x86 instructions at a decent speed. The benefits will be many by doing this. First, in assembly language, we will save a lot of instructions because programs will have enough registers to do acti

Itanium failed because it used a VLIW architecture - great for specialized processing tasks on big machines but for general purpose computing (ie. what 99.9% of people do) it wasn't much faster than x86.

Are computers really 'too slow' now? It seems to me that an x64 desktop at 3GHz is fast enough for just about anything a normal person would do. The only "normal task" I can think of that's too slow at the moment is decoding H.264 video on netbooks and they're better off with a little hardware decoder tacked

Itanium failed because it used a VLIW architecture - great for specialized processing tasks on big machines but for general purpose computing (ie. what 99.9% of people do) it wasn't much faster than x86.

Itanium failed - because it could not run x86 code at an acceptable speed. Which meant that if you wanted to switch over to Itanium, you had to start from scratch - rebuying every piece of software that you depended on, or getting new versions for Itanium.

AMD's 64bit CPUs, on the other hand, were excellent at running older x86 code while also giving you the ability to code natively in 64bit for the future. AMD's method took the market by storm and Intel had to relent and produce a 64bit x86 CPU.

(There were other reasons why Itanium failed - such as relying too much on compilers to produce optimal code, cost of the units due to being limited quantity, and Intel arrogance.)

x86 and amd64 have an installed base. Itanium doesn't. This doesn't mean x86 is any better than Itanium, in the same way that Britney Spears is better than $YOUR_FAVORITE_BAND because Britney has sold far more albums.

Intel has done an astounding job at keeping the x86 architecture going. However, there is only so much lipstick you can put on a 40 year old pig.

Um, actually Intel has done a lot of work on the architecture and microarchitecture of its processors. The CPUs Intel makes today are almost RISC like, with a tiny translation engine, which thanks to the shrinking size of transistors takes a trivial amount of die space. The cost of adding a translation unit is tiny, compared to the penalty of not being compatible with a vast majority of the software out there.

Itanium was their clean room redesign, and look what happened to it. Outside HPCs and very niche applications, no one was willing to rewrite all their apps, and more importantly, wait for the compiler to mature on an architecture that was heavily dependent on the compiler to extract instruction level parallelism.

All said, the current instruction set innovation is happening with the SSE, and VT instructions, where some really cool stuff is possible. There is something to be said for the choice of CISC architecture by Intel. In RISC ones, once you run out of opcodes, you are in pretty deep trouble. In CISC, you can keep adding them, making it possible to have binaries that can run unmodified on older generation chips, but able to take advantage of newer generation features when running on newer chips.

You are right in that a new architecture could offer improved performance, however it is a one shot deal. Once you've rolled out the new architecture there will be a short period while everything catches up and then you are right back to cramming more on the die.

The point of a new architecture would be for it NOT to be a one shot deal and that it would give you ample room for evolution before hitting physical limitations at least for a few "generations". The problem of stepping sideways is the risk. You don't have to look too far on /. to find other examples of civilization being irrationally tied to a legacy they're unwilling to walk away from even if by doing so they accept mediocre technology.

It seems to be almost an article of faith with geeks that if only we didn't have that nasty x86 we could have so much better chips. However the thing is, there ARE non-x86 chips out there. Intel and AMD may love it, others don't. You can find other architectures. So then, where's the amazing chip that kicks the crap out of Intel's chips? I mean something that is faster, uses the same or less power and costs less to produce (it can be sold for more, but the fab costs have to be less). Where is the amazing ch

Because nowadays, the ISA really has very little impact on resulting performance. The total die space devoted to translating x86 instructions on a modern Nehalem is tiny compared to the rest of the chip. The only time the ISA decode logic matters is for very low power chips (smartphones). This is part of the reason why ARM is so far ahead of Intel's x86 offerings in that area.

Modern x86, with SSE and x86-64, is actually not that bad of an ISA and there aren't too many ugly workarounds necessary anymore that justify a big push to change.

Moore's Law has nothing to do with computing power, but with the NUMBER of transistors on a piece of silicon. He said that number would double every 2 years, which has been pretty much true and will most likely remain true for the next decade.
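To put numbers on what a doubling period means, here is a small sketch that compares the 2-year period with the 18-month figure that often gets quoted instead. The function name is just for illustration.

```python
def growth_factor(years, doubling_period_years):
    """How many times larger the transistor count is after `years`,
    if it doubles once every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)

for period in (2.0, 1.5):
    print(f"doubling every {period} years: x{growth_factor(10, period):.0f} in a decade")
```

Doubling every 2 years gives 32 times the transistors after a decade, while doubling every 18 months would give roughly 100 times, so the two popular versions of the "law" diverge quickly.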

I miss the pressure AMD used to put on Intel. When Intel had an agile competitor often leaping ahead of it chip speeds shot up like a rocket - seems like they've been resting on their laurels lately...

The latest revision of my Phenom II X4 disagrees with you. The Phenom II series is absolutely steamrolling over every other Intel product in its price range.

Hint: Notice I said "in its price range." Because not everyone prefers spending $1300 on a CPU that's marginally better than one at $600. It seems like Intel has stepped away from the "chip speed" game and stepped right into "ludicrously expensive".

The only Intel chips that are $1000+ are those that are either a few months old and/or are of the "Extreme" series. The core i7-860s and 930s are under 300 bucks and pretty much the entire core i5 line is at 200 or less.

The price difference is negligible between AMD and Intel boards, unless you are attending the race to the bottom, where AMD rules. You also can't upgrade from an AM2 to AM3 CPU on an AM2 board. The talk about upgrading is meaningless in a broader sense too: Why would you buy something not optimal just so that you can upgrade it later? It's false economy, get the best you can afford now, and a whole new rig with whole new tech a few years later.

You also present a false dichotomy, because upgrading isn't ONLY about buying suboptimal hardware and then upgrading it later. Anyone who purchased bleeding edge AM2 gear when it was introduced can get a BIOS update and then socket an AM3 Phenom II chip. They still only have DDR2, but amazingly Phenom IIs support both DDR2 on AM2 and DDR3 on AM3.

So that guy who purchased a dual-core AM2 Phenom when they were cutting edge can now socket a hexa-core AM3 Phenom II.

It's amazing what designing for the future gives your customers. Intel users have only rarely had the chance to substantially upgrade CPUs.

For the false dichotomy part, you build up another in your case, too. In the last few years (AM2 and AM3 age), the quad cores haven't been too expensive compared to the dual cores. Your example user has made the wrong choice when buying the dual core in the first place; the combined price of the dual and the hexa core CPUs would have given him/her a nice time in multithreaded apps for t

You asked me to provide evidence supporting my claim of 2x performance gains and 8x the pricetag. I did exactly that. AMD and Intel may be in a tight race at the midrange ($140-$200) but the interoperability between AMD's three socket specs (AM2, AM2+, AM3) and the DDR2/DDR3 backwards compatibility are what send AMD leaps and bounds ahead of Intel. From a holistic standpoint AMD's offering is a lot more stable in the long-term, and this is how they steamroll over the competition.

I think there has been a major article asking this question every six months for the last decade. Then: surprise surprise, there's a new tech development that improves the technology. We've been "almost at the physical limit" for transistor size since the birth of the computer, why will it be any different this time?

Because sooner or later, it has to be. You reach a breaking point where the new technology is sufficiently different from the old that they don't represent the same device anymore. I think you'd have to be crazy to think that we're approaching the peak of our ability to solve computational problems, but I don't think it's unreasonable to think that we're approaching the limit of what we can do with this technology (transistors).

Eventually there's a theoretical limit, a limit that can't be exceeded without violating the laws of physics, specifically quantum mechanics. Once your transistors get close enough together, the probability of an electron tunneling from one side to the other gets high enough that it isn't possible to tell between your on and off states. We are rapidly approaching that limit even if all the manufacturing issues can be overcome (I believe it's somewhere around 5nm, but I could be wrong).
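How quickly tunneling takes over as the gap shrinks can be sketched with the textbook estimate for a rectangular barrier, T ≈ exp(-2κd) with κ = sqrt(2·m·V)/ħ. The 1 eV barrier height below is a hypothetical round number, not a figure for any real transistor, and a real device is far more complicated than a single rectangular barrier.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s
M_E = 9.1093837015e-31  # electron mass, kg
EV = 1.602176634e-19    # one electronvolt in joules

def tunneling_probability(width_m, barrier_ev):
    """Approximate transmission through a rectangular barrier of the given
    width (metres) and height above the electron energy (eV)."""
    kappa = math.sqrt(2 * M_E * barrier_ev * EV) / HBAR
    return math.exp(-2 * kappa * width_m)

# Barrier height is an assumed, illustrative value.
for width_nm in (10, 5, 2, 1):
    t = tunneling_probability(width_nm * 1e-9, 1.0)
    print(f"{width_nm:>2} nm gap: T ~ {t:.1e}")
```

The point of the sketch is the exponential dependence on width: each nanometre removed multiplies the leakage probability by a large constant factor, so there is no gradual slope to engineer around, only a wall.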

The article mentions "dark transistors", which are transistors on the chip that can't be powered because you can't get enough power onto the chip. This is the problem that reversible [theregister.co.uk] computing [wikipedia.org] was supposed to solve.

People have been proposing circuits for regenerative switching (mainly for clocking) for a long long time. The problem always being that if you add an inductance to your circuit to store and feedback the energy, you will significantly decrease how fast you can switch.

Also, you think transistors are difficult to build in small sizes? Try building tiny inductors.

Current technology is based on a single planar layer of silicon substrate. A chip is built with a metal interconnect on top, but the base layers are essentially a 2D structure. We are already postprocessing things with through vias to stack substrates into a single package. This increases density from the package perspective. Advances in stacking will keep Moore's law going for another decade (as long as you consider Moore's law to be referencing density in 2D).

It is close and this region is sometimes referred to as "soft" X-rays but there is nothing incorrect about the "UV" moniker. It also helps to distinguish EUV from actual X-ray lithography, a largely abandoned approach which used wave lengths on the order of 1nm. http://en.wikipedia.org/wiki/X-ray_lithography [wikipedia.org]

There's actually some truth to this. Originally it was called soft x-ray projection lithography. The other type of x-ray lithography was a near contact shadow technique using shorter (near 1nm) x-rays. To distinguish the two techniques they changed the name from soft x-ray to EUV.

This was also done for marketing reasons. X-ray lithography had failed (after sinking a lot of $$ into it), while optical lithography had successfully moved from visible to UV, to DUV. By calling it EUV it sounds like the next

Folks don't often realize how much work we software writers go through to write this big, complex, core-eating software. Back in the day with 8-bit 500 KHz CPUs we could write a simple 1000-iteration loop with a bit of code in it, and it might lag the CPU for a whole second. Now with these fast processors we have to go through all kinds of hoops to use up all those cycles! Building languages on top of languages, interpreted languages, all kinds of extra error checking (error checking can often take 80%-90% of the cycles and code), objects on top of arrays on top of pointers on top of objects... you get the idea. SOMEBODY has to make the software to use up all those cycles.

It's a dirty job, but somebody has to do it!!!

WE CAN NOT LET THE HARDWARE PEOPLE WIN!!! For every added processor, every bump in Hz, we WILL come up with a way to burn it! Soon we will embark on the new 3D ray-traced desktop - THAT will keep the HW folks busy for a while!!! And (don't tell anybody) soon we will establish the need for full time up-to-date indexing of everything on the LAN. Of course, that could be done by one machine, but if we all do it independently on each machine, that will burn another whole 2GHz CPU's worth of cycles.

Our goal and our motto: "A computer is nothing but a very complicated and expensive heater." :D

The diameter of a silicon atom is roughly 0.25 nm. That means that 32nm is about 120 atoms across. A 16nm line is about 60 atoms across.

For reliable use, there is going to be an approximate minimum to the number of atoms in a line. Electron interactions among individual atoms are quantum events, so for any sort of predictability you're going to need enough atoms for the probabilities to average out enough. I don't know how many that is, but it pretty much has to be more than one.
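The atom counts above are easy to reproduce. This sketch simply divides the line width by the rough 0.25 nm atomic diameter quoted in the comment; it ignores the actual crystal lattice spacing, so treat the results as order-of-magnitude only.

```python
SILICON_ATOM_DIAMETER_NM = 0.25  # rough figure taken from the comment above

def atoms_across(feature_nm, atom_nm=SILICON_ATOM_DIAMETER_NM):
    """Approximate number of atoms that fit side by side across a feature."""
    return feature_nm / atom_nm

for node in (32, 22, 16, 11):
    print(f"{node} nm line: about {atoms_across(node):.0f} atoms across")
```

With exactly 0.25 nm the division gives 128 and 64 atoms for 32nm and 16nm, in line with the "about 120" and "about 60" estimates.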

Another critical dimension is gate thickness. When you speak of a 16 nm process, you are (generally) talking about the minimum dimension in the XY plane, which is usually reserved for gate length. Gate thickness is a much smaller dimension, and if I recall correctly we're already down to about 4 molecules of thickness. Quantum tunneling is a problem.

Why hasn't Intel rolled out 3D chips stacked in layers, with microfluidics cooling between layers? I used to see all kinds of engineering PR about it, but it's been years since I saw any progress, and it's taken way longer than I expected.

3D would not only increase the amount of transistors (and other devices) fit into a "chip", but put the circuits closer together, requiring less voltage/power and shorter propagation times. What's holding it up?

Actually, 3D has picked up quite a bit in the last few years. However, the primary interest is in connecting different chips together in the same package with short, fast interconnect. It's a lot better than conventional System In Package and much much better than circuit board connections. Unfortunately, the connections are a bit too coarse to spread a single design like an Intel processor across the layers.

For that you need more sophisticated methods like growing a new wafer on top of one that has already been built up. These methods are not yet ready for production.

It has always been about making it smaller. Clock speed was able to increase because the chips got smaller. We were able to add more cores per die because the chips got smaller. Moore's law is about size: it doesn't say computers will get faster, it says they will get smaller.

What we are able to do with the smaller chips is what's changed. Raising the clock speed worked for years, and that is the best option, but because of physical problems, in the latest generations we weren't able to do that. So the next best thing is to add cores. Now the article is suggesting we may not even be able to do that anymore.

I will tell you I've been reading articles like this for as long as I've known what a computer was, so if you're a betting man, you would do well to bet against this type of article every time you read it. But in theory it has to end somewhere, unless we learn how to make subatomic particles, which presumably is outside the reach of the research budget at Intel.

Another problem is that adding cores is not as effective, right now, as upping clock speed.

This may change, however, if the designs change from multiple universal cores to something more like the Cell CPU that powers the PlayStation 3, or maybe something like the latest GPUs. Basically, a couple of universal cores like before (as they provide some benefit, if the OS does a proper job in spreading processes across them) combined with multiple simpler cores that can be arranged like an assembly line. Then yo

And today, we already know the problem with this approach: most everyday problems aren't easily parallelizable. Yes, there are specific areas where the problems are sometimes embarrassingly parallel (some scientific/number crunching applications, graphics rendering, etc), but generally speaking, your average software problem is unfortunately very serial. As such, those multiple cores don't provide much benefit for any single task. So if you want to execute one of these problems faster, the only thing you can do is ramp up the clock rate.
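The usual way to quantify this is Amdahl's law: if only a fraction p of a task can run in parallel, n cores give a speedup of 1 / ((1 - p) + p / n). A minimal sketch, with the parallel fractions chosen arbitrarily for illustration:

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Overall speedup from Amdahl's law for a task where only
    `parallel_fraction` of the work can be spread across `n_cores`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)

# Illustrative parallel fractions, not measurements of any real program.
for p in (0.0, 0.5, 0.95):
    print(f"p = {p:.2f}: 12 cores give x{amdahl_speedup(p, 12):.2f}")
```

A task that is half serial gets less than a 2x speedup from 12 cores, and no number of cores can push it past 2x, which is why a faster clock is the only thing that helps the serial part.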

Actually they do go faster. Clock speed doesn't mean processing speed. Modern CPUs do much more per clock cycle than their predecessors because of their greater instruction-level parallelism, shorter instruction latencies, larger caches, etc. While their cores don't generally operate at a higher frequency, they perform many times faster.

That's not even considering the additional cores and massively improved power efficiency. It's difficult to overstate just how fucking amazingly good CPUs are now.

I'd settle for less bloat-ware. Back in the day amazing things were done with extremely limited CPU resources by programming closer to the wire. Now we have orders of magnitude more resources but most programming is done at a very high level with numerous layers of inefficiency which negates, possibly more than negates, the benefits of increased CPU resources. Yes, yes- I wax a little "in my day/up hill both ways, etc." but do the benefits of high level programming and efficient use of resources have to be