CMOS power trend: data and comments desired

I posted this before in a thread about computer power, but I think it is better to start a new thread. Since the 90 nm node, leakage power has been taking a greater share of total power. I think every node has gotten worse, and some of the talk of leakage reduction at 28 nm and below seems to be wishful thinking or selling. Here is what I am after: data that speaks to the real, practical speed and dynamic power improvement seen on a realistic circuit, taking into account the (it seems to me) increased RC delays between the gates. For example, let's say you synthesize something simple like a 32x32 multiplier in a few process nodes. What really matters to me is the simple summary of area/performance/dynamic_power/static_power. Not to get too fussy, but performance should be at the slow corner and leakage at the high-temperature corner. As a first pass, that is what matters to me. I am guessing that:

a) from 65 to 40 to 28 nm the real speed improvement has been modest.
b) the ratio of leakage power to dynamic power (at maximum performance speed) has increased.

Building really large chips (FPGA or otherwise) that have a lot of transistors sitting around leaking has become more painful. The leakage current is notoriously difficult to control, so it can be a yield-limiting constraint in many designs now. Can anyone point me to some graphs that show this kind of data? It seems that things got tougher after 130 nm. By the way, any comparison has to compare LP to LP or G-like to G-like to be useful.
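
The summary asked for above could be tabulated with something like the following minimal sketch. The `NodeResult` fields and the `summarize` helper are hypothetical names chosen for illustration, and the numbers in `example` are round placeholders, not measured data for any real process.

```python
from dataclasses import dataclass


@dataclass
class NodeResult:
    """One synthesis result for the same design at one process node."""
    node_nm: int
    area_mm2: float     # placed area
    fmax_mhz: float     # maximum clock at the slow corner
    dynamic_mw: float   # dynamic power when running at fmax
    leakage_mw: float   # static power at the high-temperature corner


def summarize(results):
    """Return speedup and area relative to the oldest node, plus leakage/dynamic ratio."""
    ordered = sorted(results, key=lambda r: r.node_nm, reverse=True)
    base = ordered[0]  # largest feature size is used as the baseline
    rows = []
    for r in ordered:
        rows.append({
            "node_nm": r.node_nm,
            "speedup": r.fmax_mhz / base.fmax_mhz,
            "area_ratio": r.area_mm2 / base.area_mm2,
            "leak_to_dyn": r.leakage_mw / r.dynamic_mw,
        })
    return rows


# Placeholder values only (NOT real data); substitute your own synthesis results.
example = [
    NodeResult(node_nm=65, area_mm2=1.0, fmax_mhz=100.0, dynamic_mw=10.0, leakage_mw=1.0),
    NodeResult(node_nm=40, area_mm2=0.5, fmax_mhz=120.0, dynamic_mw=8.0, leakage_mw=2.0),
]

for row in summarize(example):
    print(row)
```

As noted in the post, rows are only comparable if every entry uses the same flavor of process (LP against LP, G against G) and the same corners.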

One should be aware that leakage current consists of two parts: gate leakage and source-drain leakage. The former occurs when the gate oxide becomes so thin that it starts to leak; the latter occurs when the transistor channel becomes too short (i.e. small L) for the gate to turn it off fully. The former was fixed by using high-k/metal gates (HKMG); the latter is being fixed by having a fully depleted channel, e.g. FD-SOI or FinFET. I also think the introduction of HKMG has made the leakage power better for one or two nodes.

Originally Posted by pseudosemi

It seems that things got tougher after 130 nm. By the way, any comparison has to compare LP to LP or G-like to G-like to be useful.

I think you have to be careful with such a statement; for each node you will need to do a trade-off study of which option is best for your specific design, and the conclusion may be counter-intuitive. That's why we at imec are providing a service that does exactly that: Technology Targeting Service-imec. For example, LP may only be the lowest-power option if you really can compromise sufficiently on speed, and GP may be more power-efficient if you need to meet certain timing constraints.

I heard someone say recently that they use multicore CPUs but turn on only one core and overclock it. I think that will be the approach going forward: power gating rather than clock gating. You can round-robin what's being used to get redundancy and avoid hot spots.

The "overclocking" means your useful power outweighs the leakage power, with no leakage power for things that are off.
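
To make the power-gating argument concrete, here is a minimal sketch under simple assumptions: every core has the same dynamic and leakage power, a clock-gated idle core still leaks fully, and a power-gated idle core leaks only a fraction `gated_residual` of that (0 for an ideal power switch). The function names and the example wattages are hypothetical.

```python
def chip_power(n_cores, active, dyn_w, leak_w, gated_residual=0.0):
    """Total power with `active` cores running and the rest idle.

    gated_residual is the fraction of leakage an idle core still draws:
    1.0 models clock gating (full leakage), 0.0 models ideal power gating.
    """
    idle = n_cores - active
    return active * (dyn_w + leak_w) + idle * leak_w * gated_residual


def round_robin(n_cores, steps):
    """Yield the index of the single active core for each time slice."""
    for t in range(steps):
        yield t % n_cores


# Hypothetical 4-core chip, one core active: 2.0 W dynamic, 0.5 W leakage per core.
clock_gated = chip_power(4, 1, dyn_w=2.0, leak_w=0.5, gated_residual=1.0)
power_gated = chip_power(4, 1, dyn_w=2.0, leak_w=0.5, gated_residual=0.0)
print(clock_gated, power_gated)      # 4.0 2.5
print(list(round_robin(4, 6)))       # [0, 1, 2, 3, 0, 1]
```

Real power switches have some residual leakage and a wake-up cost, which this sketch ignores.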

Great questions. I do know that the push for FinFETs and FD-SOI is to reduce leakage currents to an acceptable, lower level.

It's not just to reduce leakage currents, it's also to allow operation at a lower supply voltage to reduce dynamic power. With both FinFETs and planar FDSOI the gate has better electrostatic control over the channel compared to 20nm (the end-of-the-line bulk planar process), so the subthreshold slope is steeper (lower leakage) and device variation is smaller (both random device-to-device and also wafer-to-wafer and lot-to-lot). The lower supply voltage this allows means dynamic power is less (CV^2) as well as leakage power, so overall power is considerably lower.
This is also why there's the big push to the so-called "14nm" processes (Intel, Samsung, TSMC, GF), which are really 20nm processes (same metal) with faster, lower-power transistors. 20nm doesn't deliver the power saving expected for a node shrink and cost per gate is similar to 28nm, so applications which need low power (mW/MIPS) will probably leapfrog 20nm and ones which need low cost will stay in 28nm.
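
As a rough illustration of the two effects described above, the sketch below uses the textbook first-order relations: dynamic power proportional to C*V^2*f, and subthreshold off-current proportional to 10^(-Vt/S), where S is the subthreshold swing in mV/decade. The function names and the example voltages and swing values are assumptions for illustration, not figures for any real process.

```python
def dynamic_power_ratio(vdd_new, vdd_old):
    """Dynamic power ratio at the same capacitance and frequency (P ~ C * V^2 * f)."""
    return (vdd_new / vdd_old) ** 2


def off_current_ratio(vt_v, swing_new_mv, swing_old_mv):
    """Off-current ratio at the same Vt, assuming I_off ~ 10^(-Vt / S)."""
    vt_mv = vt_v * 1000.0
    decades_new = vt_mv / swing_new_mv   # decades of current drop below threshold
    decades_old = vt_mv / swing_old_mv
    return 10.0 ** (decades_old - decades_new)


# Hypothetical example: Vdd lowered from 1.0 V to 0.8 V.
print(dynamic_power_ratio(0.8, 1.0))       # about 0.64

# Hypothetical example: swing improved from 100 to 75 mV/decade at Vt = 0.3 V.
print(off_current_ratio(0.3, 75.0, 100.0))  # about 0.1
```

A steeper slope can be spent either way: lower leakage at the same Vt, or a lower Vt (and therefore lower Vdd) at the same leakage.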

Actually, device variability is also an important contributor to the minimum supply voltage you can use. Let's consider a big SRAM block (1 Mbit or so). The lowest Vt values will be a big factor in the yield; the highest Vt values will determine the supply voltage you need to get a certain performance for the block. So if Vt variation is smaller, you can lower the nominal Vt and still have a good yield. The highest Vt will then also be lower, and you can use a lower supply voltage. That's why supply voltages haven't scaled with the new technology nodes: variability has become bigger.
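
The SRAM argument can be sketched numerically. The following assumes each cell's Vt is an independent Gaussian sample (a simplification) and asks how many sigmas the design must tolerate so that every cell in the block falls inside the window with a given probability. The function names, the 99% target and the Vt and sigma values are hypothetical.

```python
import math
from statistics import NormalDist


def sigma_multiple(n_cells, yield_target=0.99):
    """k such that all n_cells lie within +/- k sigma with probability yield_target."""
    # Probability that one cell falls outside the window: 1 - yield_target**(1/n_cells),
    # computed with expm1 to avoid cancellation for large n_cells.
    p_out = -math.expm1(math.log(yield_target) / n_cells)
    # Split evenly between the low-Vt and high-Vt tails.
    return -NormalDist().inv_cdf(p_out / 2.0)


def vt_window(vt_nominal, sigma_vt, n_cells, yield_target=0.99):
    """Lowest and highest Vt the block must tolerate, in volts."""
    k = sigma_multiple(n_cells, yield_target)
    return vt_nominal - k * sigma_vt, vt_nominal + k * sigma_vt


# Hypothetical 1 Mbit block, nominal Vt 0.40 V, comparing two sigma values.
for sigma in (0.030, 0.015):
    low, high = vt_window(0.40, sigma, n_cells=2**20)
    print(f"sigma={sigma * 1000:.0f} mV: Vt from {low:.3f} V to {high:.3f} V")
```

Halving sigma halves the width of the window, which is the room that lets both the nominal Vt and the supply voltage come down.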

That's exactly what I meant about smaller device variation allowing lower Vdd. However, going to a fully-depleted channel does reduce the variability, because the biggest cause is random dopant fluctuation (RDF) due to too few implant ions in the channel.

FDSOI has the lowest variation, then FinFET on SOI (which has additional causes of variation like fin height and etching), then FinFET on bulk (some residual doping in the channel from the implant needed under the fin to block channel formation in the substrate), then planar bulk (RDF dominates mismatch).

So FinFET can operate at a somewhat lower voltage than bulk planar without losing speed while saving significant power, and FinFET on SOI even more so. FDSOI has the lowest mismatch (and allows Vth tuning using the back gate to largely remove process variation), so it can operate at the lowest voltage. It is the most power-efficient of all (and also has the lowest capacitance) if this is critical and the circuits can be designed to run more slowly (increased parallelism), but in most cases it is not as fast as FinFET because the drive current is lower.