If you've ever overclocked a system, chances are that at some point or another you've had the opportunity to become upset with your Vdroop "problem." Some users, confused as to why their system refuses to exactly match actual processor supply voltage to the value specified in BIOS, are quick to blame the quality of their motherboard; still others find fault with the difference noted between their board's idle and full-load processor supply voltages. Actually, load line droop (Vdroop) is an inherent part of any Intel power delivery design specification and serves an important role in maintaining system stability. In most cases, comments regarding unacceptable power delivery performance are completely unfounded. To make matters worse, unjustified negative consumer perception surrounding this often misunderstood design feature eventually forced a few motherboard manufacturers to respond to enthusiasts' demands for action by adding an option in their BIOS that effectively disables this important function.

Based on the currently running tasks, processor load can vary significantly during system operation. The voltage regulator module (VRM) circuit closely regulates CPU supply voltage by sensing instantaneous changes in processor loading and then responding by varying the individual on-time for a bank of power MOSFETs used to charge a multi-phased LC network. This LC network is responsible for providing all of the power demanded by the processor. If the VRM senses a decreasing supply voltage, it provides more current; the opposite is true in the case where voltage is rising. This cycle of sense-and-correct, known as negative feedback, can occur at a rate of thousands to millions of times per second, depending on the particular circuit's switching frequency.
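To make the sense-and-correct cycle concrete, here is a minimal Python sketch of a negative feedback loop. It is not a model of any real VRM controller: the proportional-integral correction, the lumped output capacitance, and every number (gains, time step, load currents) are assumptions chosen only so the loop settles quickly.

```python
def simulate_feedback(v_target, load_profile, dt=1e-6, capacitance=1e-3,
                      kp=500.0, ki=100.0):
    """Toy sense-and-correct loop (hypothetical values, not a real VRM design).

    Each iteration senses the supply voltage, computes the error against the
    target, and adjusts the supplied current: more current when the voltage
    sags, less when it rises. The voltage then changes according to the
    mismatch between supplied and demanded current.
    """
    v = v_target
    integral = load_profile[0]  # assume the loop starts in steady state
    trace = []
    for i_load in load_profile:
        error = v_target - v                 # positive when voltage is too low
        integral += ki * error               # slow correction (removes offset)
        i_supply = integral + kp * error     # fast correction
        v += (i_supply - i_load) * dt / capacitance
        trace.append(v)
    return trace


# Light load (20 A) for 100 steps, then a sudden jump to heavy load (80 A).
load = [20.0] * 100 + [80.0] * 400
trace = simulate_feedback(1.20, load)
print(f"lowest voltage during the transient: {min(trace):.3f} V")
print(f"voltage after the loop settles:      {trace[-1]:.3f} V")
```

The dip right after the load step appears because the loop can only react after it has sensed the change, which is the same limitation the article goes on to discuss.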

Just like CPU power, CPU supply current increases quickly at higher frequencies

During periods of high CPU demand, the VRM circuit works hard to supply the current required by the processor. However, as soon as that load is gone, the VRM circuit must act quickly in order to reduce the current supply to the level needed to match the new demand. Because it's impossible for the VRM circuit to respond instantaneously, the larger the load change the greater the maximum potential peak overshoot voltage. Controlling the magnitude of these peak values is critical for maintaining system stability. By positioning the processor's no-load (idle) voltage level higher during periods of light loading, it's possible to sustain a larger negative voltage dip without crossing the processor's lower specified voltage limit. In addition, "drooping" the load voltage as a function of supply current allows the VRM to effectively limit the maximum positive peak overshoot voltage (experienced during a heavy to light load transient) to a value below the maximum allowable CPU voltage. This resulting control system ensures the processor supply voltage, regardless of CPU load, never violates a specified limit. The following figure should help to illustrate these concepts.

As intended, Voffset and Vdroop ensure that the supply voltage never exceeds CPU VID
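The relationship behind the figure can be sketched as a straight load line. The Python below assumes the simplest form, supply voltage = VID - Voffset - (load-line resistance x current). The function names, the 0.8 milliohm slope, the 20mV offset, and the currents are illustrative assumptions and are not read from the figure or from any Intel specification.

```python
def load_line_voltage(vid, v_offset, r_loadline_ohm, i_load_amps):
    """Steady-state supply voltage on an assumed linear load line."""
    return vid - v_offset - r_loadline_ohm * i_load_amps


def vid_for_load_voltage(v_needed, v_offset, r_loadline_ohm, i_load_amps):
    """VID that would place the loaded voltage at v_needed (same model)."""
    return v_needed + v_offset + r_loadline_ohm * i_load_amps


# Hypothetical values for illustration only.
VID = 1.25        # volts, BIOS setting
V_OFFSET = 0.02   # volts
R_LL = 0.0008     # ohms (0.8 milliohm load-line slope)

idle = load_line_voltage(VID, V_OFFSET, R_LL, 10.0)   # light load, 10 A
full = load_line_voltage(VID, V_OFFSET, R_LL, 75.0)   # heavy load, 75 A
print(f"idle: {idle:.3f} V, full load: {full:.3f} V, VID: {VID:.3f} V")

# If the CPU needs 1.20 V under the same heavy load, raise VID accordingly.
print(f"VID needed: {vid_for_load_voltage(1.20, V_OFFSET, R_LL, 75.0):.3f} V")
```

In this model both the idle and the full-load voltage sit below VID, and the gap between them grows with load current.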

The CPU VID setting establishes the absolute maximum allowable processor supply voltage experienced during transient conditions and is not the target idle voltage. We hope this statement draws attention to this important distinction, as many believe the opposite to be true - a mistake all too commonly made. Together, Vdroop and Voffset ensure that the peak CPU supply voltage seen during heavy to light loading changes remains well below the established maximum. If you determine that 1.17V, as in the case above, is not sufficient for maintaining CPU stability under load, simply increasing the CPU VID will correct the problem. Let's now examine how the system responds if we remove Voffset.

Voltage oscillations while leaving heavy load can cause problems with no Voffset

As we can see, the system exceeds maximum allowable processor voltage whenever any heavy to light load transient is significant enough to warrant one or more voltage excursions above the CPU VID value. Even worse, this all happens without the user's knowledge. Again, removing Voffset completely undermines the purpose of the VID setting - which establishes the maximum CPU voltage, not the target value.
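A rough way to check this numerically is to compare the transient peak against VID with and without Voffset. The sketch below assumes the peak equals the settled light-load voltage on a linear load line plus a fixed overshoot; that simplification, the function name, and all numbers are hypothetical.

```python
def transient_peak(vid, v_offset, r_loadline_ohm, i_light_amps, overshoot_v):
    """Peak voltage after a heavy-to-light load change (simplified model).

    Assumes the supply settles on the load line at the light-load current
    and briefly overshoots that level by overshoot_v volts.
    """
    settled = vid - v_offset - r_loadline_ohm * i_light_amps
    return settled + overshoot_v


# Hypothetical values for illustration only.
VID = 1.25         # volts
R_LL = 0.0008      # ohms
I_LIGHT = 10.0     # amps drawn after the load is released
OVERSHOOT = 0.025  # volts of overshoot caused by the VRM's finite response

for v_offset in (0.02, 0.0):
    peak = transient_peak(VID, v_offset, R_LL, I_LIGHT, OVERSHOOT)
    verdict = "within VID" if peak <= VID else "EXCEEDS VID"
    print(f"Voffset={v_offset:.3f} V -> peak {peak:.3f} V ({verdict})")
```

With these assumed numbers the same overshoot stays under VID when Voffset is present and crosses it when Voffset is removed.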

56 Comments

Very impressive. Seems more like a thesis paper than a typical tech site article. While the content on AT is of a higher quality than the rest of the sites out there, I think the other authors, founder included, could learn a thing or two from an article like this. Less commentary/controversy and more quality is the way to go.

"Do they worry more about the $5000-$10000 per month (or more) spent on the employee using a workstation, or the $10-$30 spent on the power for the workstation? The greater concern is often whether or not a given location has the capacity to power the workstations, not how much the power will cost."

For High Performance Computers (HPC, a.k.a. supercomputers) every little bit helps. We are not only concerned about the power from the CPU, but also the power from the little 5 Watt Ethernet port that goes unused, but consumes power. When you are talking about HPC systems, they now scale into the tens of thousands of CPUs. That 5 Watt Ethernet port is now a 50 kW problem (5 W across 10,000 ports) just from the additional power required. That problem now has to be cooled as well. More cooling requires more power. Now can your infrastructure handle the power and cooling load, or does it need to be upgraded?

This is somewhat of a straw-man argument since most (but not all) HPC vendors know about the problem. Most HPC vendors do not include items on their systems that are not used. They know that if they want to stay in the race with their competitors that they have to meet or exceed performance benchmarks. Those performance benchmarks not only include how fast it can execute software, but also how much power and cooling and (can you guess it?) noise.

In 2005, we started looking at what it would take to house our 2009 HPC system. In 2007, we started upgrades to be able to handle the power and cooling needed. The local power company loves us, even though they have to increase their power substation.

Thought for the day:
How many car batteries does it take to make a UPS for an HPC system with tens of thousands of CPUs?

this is a great article - very technical, will have to read it step by step to get it all ;-)

but I have one question that remains for me: what about electromigration with the very fine 45nm structures? We have new materials here, like the hafnium-based high-k dielectric, and I guess this may improve the resistance against EM... but how far can we really push this CPU before we risk a very short life and destruction? Intel gives headroom up to a max of 1.3625V... well, what can I risk with good water cooling? How far can I go?

I mean, feeding a 45nm core, e.g., 1.5V is the same as giving a 65nm core 1.6375V! Would you do that to your Q6600?

Electromigration is an effect usually seen in the interconnect, not in the gate stack. It occurs when a wire (or material) has a high enough current density that the atoms actually move, leading to an open circuit, or in some cases, a short.

To address your questions:
1. The high-k dielectric in the gate stack has no effect on the resistance of the interconnect.
2. The finer features of wires on a 45nm process do have a lower threshold to electromigration effects, i.e., smaller wires can tolerate less current before breaking.
3. The effects of electromigration are fairly well understood at this point; there are all kinds of automated checks built in to the design tools before tapeout, as well as very robust reliability tests performed on the chips prior to volume production to catch these types of reliability issues.
4. The voltage a chip can tolerate is limited by a number of factors. Ignoring breakdown voltages and other effects limited by the physics of transistor operation, heat is where most OC'ers are concerned. As power dissipation is most crudely thought of in terms of CV^2f (capacitance times voltage-squared times frequency), the reduced capacitance in the gate due to the high-k dielectric does dramatically lower power dissipation, and is well cited. The other main component in modern CPUs is the leakage, which again is helped by the high-k dielectric. So you should expect to be able to hit a bit higher voltage before hitting a thermal envelope limitation. However, the actual voltage it can tolerate is going to depend on the CPU and what corner of the process it came from. In all, there's no general guideline for what is "safe". Of course, anything over the recommended isn't "safe", but the only way you'll find out, unfortunately, is trial and error.

Doh! Just noticed my own mistake:
high-k dielectric does not reduce capacitance! Quite the contrary, a high-k dielectric will have higher capacitance if the thickness is kept constant. Don't know what I was thinking.

Regardless, the capacitance of the gate stack is a factor, as the article mentioned. I don't know how the cap of Intel's 45nm gate compares with that of their 65nm gate, but I would venture it is lower:

1. The area of the FETs is smaller, so less W*L parallel plate cap.
2. The thickness of the dielectric was increased. Usually this decreases cap, but the addition of high-k counteracts that. Hard to say what balance was actually achieved.

Asking how much voltage can be safely applied to a (45nm) CPU is a lot like asking which story of a building you can jump from without the risk of breaking both legs on the landing. There's inherent risk in exceeding the manufacturer's specification at all and if you asked Intel what they thought I know exactly what they would say -- 1.3625V (or whatever the maximum rated VID value is). The fact of the matter is that choices like these can only be made by you. Personally, I feel exceeding about 1.4V with a quad 45nm CPU is a lot like beating your head against a wall, especially if your main concern is stability. My recommendation is that you stay below this value, assuming you have adequate cooling and can keep your core temperatures in check.