Wednesday, September 25, 2013

Overclocking Haswell Quick Fail Method

Overclocking is the process of increasing the clock speed on a system processor with the idea of getting higher performance. Given the time, this can be an interesting pursuit if you like pushing hardware performance to the highest level possible. With proper testing, this can be done once on a system and it will remain stable for 24/7/365 use for the life of the platform.

With a week off work, I decided to upgrade my aging i7-920/X58 platform to a Haswell-based i7-4770K/Z87. Overclocking this platform proved to have a steep learning curve, so I figured I'd share my general approach so others could save time getting started.

Overclocking Quick Fail Method

The general idea of the quick fail method (not sure if that's a name, but let's just pretend it is now) is to isolate each component of your build and then push it quickly to failure, then back it off and find the appropriate voltage/speed combo.

Software/Hardware Needed

Prime95 v27.9 for stress testing (discussed below); alternatively, you can use OCCT.

Intel K series processor (K series has an unlocked multiplier)

Z87 chipset motherboard that supports overclocking; I used Asus, but there are good boards from Gigabyte, MSI, and others as well.

Getting Started/Clocking Approach

Make sure you have applied your thermal interface material correctly and disabled the IGP in the BIOS (assuming you're not using it). After your hardware is ready to go, we need to isolate components for testing. By doing so, we can more effectively ensure that each piece is stable before we combine them. We'll tweak/test in the following order: the CPU core, then the uncore (CPU cache), then the memory.

Those are also listed in order of performance. The Uncore, for example, does not need to be synced to the core clock and is not nearly as important from a performance perspective.

Familiarize Yourself With Voltage and Temperature Norms

Before getting started, I recommend familiarizing yourself with the voltage/temperature behavior of your CPU at stock. To do so, we'll use the AIDA64 sensor display and the built-in stress test. (Note: I don't recommend AIDA64 for stress testing other than here, to generate heat and load the voltages.) Launch AIDA64 and open the sensor display by expanding Computer->"Sensor". Take note of the CPU Core (vCore), CPU Cache (vCache, sometimes called vRing), VCCSA (or Vsa, System Agent), and CPU VRM (vRIN) readings, as we'll be targeting those for modification later. Launch the system stability test tool by selecting "Tools"->"System Stability Test" and click "Start". Watch the behavior of the voltages and take note of which are dynamic, how they behave under load, and what they top out at. You'll need to know this for comparison purposes when we start adjusting.
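If you jot the readings down by hand, a few lines of Python can summarize them. This is only a rough sketch: the sample numbers are made up for illustration, and the 0.02 V cutoff for calling a rail "dynamic" is my own assumption, not something the sensor display reports.

```python
from statistics import mean


def summarize_rail(samples, dynamic_threshold=0.02):
    """Summarize one voltage rail from a list of readings in volts."""
    lowest, highest = min(samples), max(samples)
    return {
        "min": lowest,
        "max": highest,
        "mean": round(mean(samples), 3),
        # Assumption: a swing above the threshold means the rail scales with load.
        "dynamic": (highest - lowest) > dynamic_threshold,
    }


# Hypothetical readings noted by hand (idle first, then under load).
readings = {
    "vCore": [0.72, 0.85, 1.10, 1.12, 1.12],
    "vCache": [0.80, 0.95, 1.05, 1.05, 1.05],
    "VCCSA": [0.82, 0.82, 0.82, 0.82, 0.82],
    "vRIN": [1.78, 1.78, 1.79, 1.78, 1.78],
}

for rail, samples in readings.items():
    print(rail, summarize_rail(samples))
```

The "max" value is the number to compare against once you start adding voltage.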

Setting Voltages

Most Z87 motherboards offer three or four options for overriding voltages: Offset, Adaptive, Manual, and Auto. Let's take a moment to discuss each:

Auto: It's wise to avoid Auto unless you don't plan on spending much time testing. Most boards will apply far too much voltage to many aspects of the chip, creating additional heat, which wastes energy and negatively affects your clocking potential.

Manual will set voltage to a static number, which I'd shy away from as well for regular 24/7 use, as it counters a lot of the benefit of the Haswell chip by not allowing for a dynamic reduction in voltage. Reducing the voltage reduces power consumption and heat, mainly while idle.

Offset: Now we're getting somewhere... Offset will apply a modifier to the target voltage on top of Intel's scaling algorithm. For example, if the vCore were set to 0.95 V at a given point in time, and you specified a +0.10 V offset, then the resulting voltage would be 1.05 V. The only downside here is that the offset is always applied, but at least the voltage can scale downward. That leads us to:

Adaptive adds additional voltage on top of the offset voltage, but only while the turbo multiplier is active. This is a great new addition, as it allows whatever is set here to be entirely rolled off when not in turbo mode. Unfortunately, in most cases you will still need to use offset for most of the voltage boost to keep things stable. A good rule of thumb to start with is to put 75% of the desired extra voltage into the offset and 25% into the adaptive (turbo) voltage.
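To make the arithmetic concrete, here is a small sketch of the 75/25 rule of thumb and the offset example from above. The function names are my own, and the model is deliberately simplified: it assumes the offset is always added and the adaptive portion is added only in turbo, and it ignores whatever else the board does.

```python
def split_extra_voltage(extra_v, offset_share=0.75):
    """Split the desired extra voltage between the offset and adaptive settings."""
    offset = round(extra_v * offset_share, 3)
    adaptive = round(extra_v - offset, 3)
    return offset, adaptive


def resulting_vcore(base_v, offset_v, adaptive_v, turbo_active):
    """Simplified model: offset always applies, adaptive only while in turbo."""
    total = base_v + offset_v + (adaptive_v if turbo_active else 0.0)
    return round(total, 3)


# Wanting 0.10 V extra in total: 75% goes to offset, 25% to adaptive.
print(split_extra_voltage(0.10))  # (0.075, 0.025)

# The offset example from the text: 0.95 V target with a +0.10 V offset.
print(resulting_vcore(0.95, 0.10, 0.0, turbo_active=False))  # 1.05
```

Treat the output as a starting point for the BIOS fields, not as a prediction of what the sensors will show under load.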

Note that many people recommend against adjusting Load-Line Calibration (LLC) when using Offset/Adaptive.

Stress Testing, Throttling, and the AVX Problem

Why Prime95?

Of all the tools I've tried, including AIDA64, Prime95 has provided me the most consistent and reproducible results. Prime seems to do a better job of bombing an unstable Haswell than a lot of other tools. For example, I was able to complete an AIDA64 stress test for two hours, while the same setup locked up my system nearly immediately on Prime95 SmallFFT.

More importantly, Prime allows for an effective customization of what portions of the CPU are being tested. SmallFFT will test the CPU and a bit of the cache(s) while Blend will effectively test the uncore and quite a bit of the system memory.

Haswell Thermal Throttling

The Haswell has an internal throttling mechanism (Thermal Control Circuit) to protect the chip that cannot be disabled. This throttling reduces the speed of the processor until the temperature drops below the temperature specified by Intel. For more information on the throttling point (at most 100 deg C) see table 26 in this document from Intel.

The AVX Issue

AVX is a set of instruction extensions, originally introduced with the Sandy Bridge platform, that accelerates certain math functions. The problem with AVX is that stress testing programs generally use it in an "unrealistic" way (much more heavily than one would see in normal use), generating substantially more heat than the non-AVX equivalent processing. This can be enough heat to trigger thermal throttling of the Haswell chip, which invalidates stress testing because it causes the chip to run at a slower speed.

We'll start testing with AVX enabled because we do want to ensure the cooling solution can handle as much heat as theoretically could be thrown out, but if it causes the chip to throttle we will disable it. That will be covered below.
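One way to make the "does it throttle too much" call less subjective is to note, at regular intervals during a test, whether throttling was flagged, and then look at the share of flagged samples. The sketch below does that. The 5% cutoff is purely my own assumption for what counts as more than occasional throttling; pick whatever you're comfortable with.

```python
def should_disable_avx(throttle_flags, max_throttled_fraction=0.05):
    """Return True if throttling showed up in more than the allowed share of samples.

    throttle_flags: list of booleans, one per observation during the test,
    True meaning throttling was flagged at that moment.
    """
    if not throttle_flags:
        return False  # no observations, nothing to decide on
    throttled = sum(1 for flag in throttle_flags if flag)
    return throttled / len(throttle_flags) > max_throttled_fraction


# Hypothetical log: one check per minute over a 20 minute run.
occasional = [False] * 20
occasional[7] = True          # a single blip, 5% of samples
frequent = [True, False] * 10  # throttled half the time

print(should_disable_avx(occasional))  # False
print(should_disable_avx(frequent))    # True
```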

Determine Max Core Speed

Lower Speed of Non-Core Components

To ensure we can definitively find the failure point of the core we need to take the uncore and memory out of the picture.

Reduce the "Uncore" (sometimes called CPU Cache) multiplier to something substantially lower than your target overclock (5-8 steps lower, depending on how aggressive you want to be). For example, if you are targeting 4.5 GHz, set the max Uncore multiplier to 38. If there is an option for minimum Uncore multiplier, leave it at "Auto".

Reduce the memory speed to something your RAM can easily handle, e.g. if you have DDR3-2133, set it to DDR3-1600 speed.
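The uncore example above works out as follows. This sketch assumes the stock 100 MHz base clock (so 4.5 GHz is a 45x multiplier); the function names and the default backoff of 7 steps are mine, chosen to reproduce the example.

```python
BCLK_MHZ = 100  # assumption: base clock left at its stock value


def core_multiplier(target_ghz, bclk_mhz=BCLK_MHZ):
    """Multiplier needed to reach the target core speed at the given base clock."""
    return round(target_ghz * 1000 / bclk_mhz)


def uncore_test_multiplier(target_ghz, backoff=7, bclk_mhz=BCLK_MHZ):
    """Uncore multiplier to use while testing the core: 5-8 steps below the core."""
    if not 5 <= backoff <= 8:
        raise ValueError("backoff should be between 5 and 8 multiplier steps")
    return core_multiplier(target_ghz, bclk_mhz) - backoff


print(core_multiplier(4.5))         # 45
print(uncore_test_multiplier(4.5))  # 38
```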

Overclock the Core

Up the multiplier of the cores (if you have the option, just sync all cores) to a reasonable starting point. On the i7-4770K, for example, a good starting point would be 40 or so.

To ensure that your system isn't being throttled, launch the AIDA64 stress testing tool but don't start it. The lower portion of the display will monitor and alert on throttling.

Perform a 20 minute SmallFFT stress test using Prime95. To do so, launch the 64-bit Prime95 client and select the "SmallFFT" option. Make sure you monitor and take notes of the system voltages and temperatures during the test. If there is a problem with the overclock at this stage, it will generally manifest itself as a BSOD. If your system throttles more than occasionally, you need to disable AVX (see Appendix A) and re-start testing.

If the system seems stable thus far, feel free to up the multiplier and repeat. If the system crashed and you want to push further, with the understanding that more voltage and speed bring more power consumption and heat, go ahead and add voltage (see above for methods) and then repeat testing. Voltage to vCore should be added in steps of 0.01 V to 0.02 V or so. I wouldn't exceed 1.35 V under load for 24/7 running. If vCore doesn't seem to have an impact here, you may want to bump CPU Input Voltage (vRIN) or the uncore voltage (vCache) just a bit... just don't go too high with those because they don't matter nearly as much as vCore.

After working your multiplier and voltage up to where you're willing to go, perform an extended stability test by saving the BIOS settings and then running the Prime95 SmallFFT test for at least 4 hours. If the extended test fails, bump voltage up or multiplier down and re-start testing.
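The core-clocking loop above can be sketched in a few lines. To be clear, this is an illustration of the logic, not a tool: `is_stable` stands in for you manually running a 20 minute SmallFFT test, `fake_chip` is a made-up stability model, and the starting voltage is hypothetical. Voltages are in millivolts to keep the arithmetic exact; the 20 mV step and 1350 mV ceiling come from the numbers in the text.

```python
def quick_fail_search(is_stable, start_mult=40, max_mult=50,
                      start_mv=1100, step_mv=20, limit_mv=1350):
    """Raise the multiplier until the test fails, add voltage, and repeat.

    is_stable(mult, vcore_mv) stands in for a manual short stress test.
    Returns the last (multiplier, vCore in mV) pair that passed, or None.
    """
    mult, vcore_mv = start_mult, start_mv
    best = None
    while mult <= max_mult:
        if is_stable(mult, vcore_mv):
            best = (mult, vcore_mv)
            mult += 1  # passed: try the next multiplier
        else:
            if vcore_mv + step_mv > limit_mv:
                break  # out of voltage headroom: stop pushing
            vcore_mv += step_mv  # failed: add a little voltage and retest
    return best


def fake_chip(mult, vcore_mv):
    """Made-up model: each multiplier step above 40 needs 50 mV more."""
    return vcore_mv >= 1100 + 50 * (mult - 40)


print(quick_fail_search(fake_chip))  # (44, 1300)
```

Whatever pair the short tests settle on still needs the extended run described above before you call it stable.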

Overclock the UnCore

Use the same methodology listed above to clock the Uncore, with two changes: the main voltage to adjust is vCache rather than vCore, and to ensure the uncore is stable you'll need to run Prime95 Blend after you finish the SmallFFT test. Don't be surprised if SmallFFT is stable and Blend requires more voltage and/or a lower multiplier.

Clock Your Memory

Finally, clock your RAM using the correct voltage, speed, and timing adjustments. (Details outside the scope of this article) Use an extended Prime95 Blend test to finalize everything. Personally I'm not comfortable with anything less than 24 hrs for the final test, but opinions on that vary.

Results/Final Thoughts

Obviously there is a ton here I'm not covering, but hopefully this information will save you time when starting your overclock. In some cases you may need to manipulate other voltages and voltage ramps, but those vary between motherboards so I won't touch on them.

Using this methodology I was able to get my water-cooled i7-4770K 24/7 stable @ 4.8 GHz core/4.4 GHz uncore without de-lidding. Not too shabby, I should think. Finally I can get the performance I've been looking for when running Lotus 1-2-3.

Appendix A: (If Needed) Disable AVX

Launch command prompt as administrator

Execute "bcdedit /set xsavedisable 1"

Reboot

Remember to re-enable when done testing by using the same methodology, only executing bcdedit /set xsavedisable 0 rather than 1.

Appendix B: General Usage Notes

If you want to take full advantage of Haswell's power management make sure you use a power profile for 24/7 use that includes reducing the core processor speed. (Stored in advanced options)