Reducing power consumption on Haswell and Broadwell systems

Edit to add: These patches on their own won't enable this functionality; they just give us a better set of options. Once they're merged we can look at changing the defaults so people get the benefit of this out of the box.

Haswell and Broadwell (Intel's previous and current generations of x86) both introduced a range of new power saving states that promised significant improvements in battery life. Unfortunately, the typical experience on Linux was an increase in power consumption. The reasons why are kind of complicated and distinctly unfortunate, and I'm at something of a loss as to why none of the companies who get paid to care about this kind of thing seemed to actually care until I got a Broadwell and looked unhappy. But here we are, so let's make things better.

Recent Intel mobile parts have the Platform Controller Hub (Intel's term for the Southbridge, the chipset component responsible for most system I/O like SATA and USB) integrated onto the same package as the CPU. This makes it easier to implement aggressive power saving: the CPU package already has a bunch of hardware for turning various clock and power domains on and off, and these can be shared between the CPU, the GPU and the PCH. But that also introduces additional constraints, since if any component within a power management domain is active then the entire domain has to be enabled. We've pretty much been ignoring that.

The tl;dr is that Haswell and Broadwell can only get into deeper package power-saving states if several different components are in their own power-saving states. If the CPU is active, you'll stay in a higher-power state. If the GPU is active, you'll stay in a higher-power state. And if the PCH is active, you'll stay in a higher-power state. The last one is the killer here. Having a SATA link in a full-power state is enough to keep the PCH active, and that limits the deepest package power-saving state you can enter.
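To make the dependency concrete, here is a toy model in Python. It is not anything the hardware or kernel actually exposes: each component caps how deep the package may go, and the package can only reach the shallowest of those caps. The state list is a simplified subset of the package states, and the per-component caps in the example are illustrative assumptions, chosen so that an active SATA link holds the package at PC2, matching the figure given later in the post.

```python
# Toy model: package C-states ordered from shallowest to deepest.
# This is a simplified subset chosen for illustration, not a full list.
PACKAGE_STATES = ["PC0", "PC2", "PC3", "PC6", "PC7"]


def deepest_package_state(limits: dict[str, str]) -> str:
    """Return the deepest package state every component will allow.

    `limits` maps a component name to the deepest package state that
    component permits in its current condition (hypothetical values).
    The package is held back by whichever component is least idle.
    """
    if not limits:
        raise ValueError("need at least one component limit")
    shallowest = min(PACKAGE_STATES.index(state) for state in limits.values())
    return PACKAGE_STATES[shallowest]


# CPU cores and GPU are idle, but an active SATA link keeps the PCH busy.
print(deepest_package_state({"cpu": "PC7", "gpu": "PC7", "pch": "PC2"}))  # PC2
# With the SATA link in a low-power state the PCH stops holding things back.
print(deepest_package_state({"cpu": "PC7", "gpu": "PC7", "pch": "PC7"}))  # PC7
```

Taking the minimum over components is the whole point: improving the CPU or GPU alone changes nothing while the PCH cap stays at PC2.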

SATA power management on Linux is in a kind of odd state. We support it, but we don't enable it by default. In fact, right now we even remove any existing SATA power management configuration that the firmware has initialised. Distributions don't enable it by default because there are horror stories about some combinations of disk, controller and power management configuration resulting in corruption and data loss, and apparently nobody had time to investigate the problem.
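On mainline kernels the SATA policy is exposed per host adapter through the libata sysfs attribute `/sys/class/scsi_host/hostN/link_power_management_policy`, which accepts `max_performance`, `medium_power` and `min_power`. A minimal Python sketch for checking which policy each host is currently using might look like the following; the `sysfs_root` parameter exists only so the function can be pointed at a fake tree for testing, and hosts that don't expose the attribute are skipped.

```python
from pathlib import Path


def read_lpm_policies(sysfs_root: str = "/sys") -> dict[str, str]:
    """Map each SCSI host name (e.g. "host0") to its SATA link PM policy."""
    hosts_dir = Path(sysfs_root) / "class" / "scsi_host"
    policies = {}
    # Entries under class/scsi_host are usually symlinks; Path follows them.
    for host in sorted(hosts_dir.glob("host*")):
        attr = host / "link_power_management_policy"
        if attr.is_file():  # skip hosts that don't expose the attribute
            policies[host.name] = attr.read_text().strip()
    return policies
```

Calling `read_lpm_policies()` is mainly useful for confirming which policy is actually in effect before and after changing anything.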

I did some digging and it turns out that our approach isn't entirely inconsistent with the industry. The default behaviour on Windows is pretty much the same as ours. But vendors don't tend to ship with the Windows AHCI driver; they replace it with the Intel Rapid Storage Technology driver, and it turns out that that has a default-on policy. But to make things even more awkward, the policy implemented by Intel doesn't match any of the policies that Linux provides.

In an attempt to address this, I've written some patches. The aim here is to provide two new policies. The first simply inherits whichever configuration the firmware has provided, on the assumption that the system vendor probably didn't configure their system to corrupt data out of the box[1]. The second implements the policy that Intel use in IRST. With luck we'll be able to use the firmware settings by default and switch to the IRST settings on Intel mobile devices.

This change alone drops my idle power consumption from around 8.5W to about 5W. One reason we'd pretty much ignored this in the past was that SATA power management simply wasn't that big a win. Even at its most aggressive, we'd struggle to see 0.5W of savings. But on these new parts, the SATA link state is the difference between going to PC2 and going to PC7, and the difference between those two package states is whether a large part of the CPU package stays powered up.
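The arithmetic behind those numbers is worth spelling out. Assuming idle draw dominates and battery capacity stays fixed (so it cancels out of the ratio), a quick Python sketch of the relative saving, the implied idle-runtime gain, and how it compares with the roughly 0.5W that SATA link power management used to be worth:

```python
def idle_savings(before_w: float, after_w: float) -> tuple[float, float, float]:
    """Return (watts saved, fractional saving, idle runtime multiplier).

    Runtime scales as capacity / power, so for a fixed battery the
    multiplier is simply before_w / after_w.
    """
    saved = before_w - after_w
    return saved, saved / before_w, before_w / after_w


saved, fraction, runtime_gain = idle_savings(8.5, 5.0)
print(f"{saved:.1f} W saved ({fraction:.0%}), idle runtime x{runtime_gain:.2f}")
# 3.5 W saved (41%), idle runtime x1.70
# Compared with the old ~0.5 W best case for SATA link PM on its own:
print(f"{saved / 0.5:.0f}x the old best case")  # 7x the old best case
```

The gap between 0.5W and 3.5W is the package-state effect: the link itself draws little, but it decides whether the rest of the package can power down.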

But this isn't the full story. There's still work to be done on other components, especially the GPU. Keeping the link between the GPU and an internal display panel active both wastes power and requires additional chipset components to stay powered up. Embedded DisplayPort (eDP) 1.3 introduced a feature called Panel Self-Refresh (PSR) that lets the GPU and the screen negotiate dropping the link, leaving it up to the screen to maintain its contents. There are patches to enable this on Intel systems, but it's still not turned on by default. Turning it on increases the amount of time spent in PC7 and brings corresponding improvements to battery life.
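Whether the i915 driver has been asked to use PSR can be checked through its module parameter, exposed at `/sys/module/i915/parameters/enable_psr` (the same `i915.enable_psr` option that can be set on the kernel command line). The sketch below assumes the parameter exists on the running kernel and is integer-valued, and reading it may need root. Treating negative values as "use the driver's per-platform default" is an assumption that holds for some kernel versions but not necessarily all, and a non-zero value only means PSR was requested, not that the panel supports it.

```python
from pathlib import Path


def psr_setting(sysfs_root: str = "/sys") -> str:
    """Describe the i915 enable_psr module parameter (assumed integer-valued)."""
    attr = Path(sysfs_root) / "module" / "i915" / "parameters" / "enable_psr"
    if not attr.is_file():
        return "unavailable"  # i915 not loaded, or no such parameter
    value = int(attr.read_text().strip())
    if value == 0:
        return "disabled"
    if value < 0:
        return "driver default"  # assumption: negative means per-platform default
    return "requested"  # PSR asked for; panel support is a separate question
```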

This trend is likely to continue. As systems become more integrated we're going to have to pay more attention to the interdependencies in order to obtain the best possible power consumption, and that means that distribution vendors are going to have to spend some time figuring out what these dependencies are and what the appropriate default policy is for their users. Intel's done the work to add kernel support for most of these features, but they're not the ones shipping it to end-users. Let's figure out how to make this right out of the box.

We are looking at these issues for the Fedora Workstation, but because the new kernel engineer on the Fedora team only started last week, we haven't had a chance to even think about the kernel side yet. But Josh will likely be reaching out to you about this at some point.

Hopefully we can work with you to make this stuff top notch in Fedora.

It's great to see someone working on reducing power usage on Linux, every little bit counts :) Also, it's always nice to see your explanations of the issues you deal with.

What jumped into my mind right away was this[1-2]. I bought my first SSD very recently, and while googling for possible problems I found those bug reports.

Correct me if I'm wrong, which I may very well be, but your proposed changes will make all systems try more aggressively to save power. If I'm interpreting the bug reports and your changes correctly, it seems that in a few corner cases this might lead to problems.

Just wanted to raise this concern as some hardware combinations might not play well with these changes.

The min_power state enabled ALPE (aggressive link PM enable), which allowed the host to promote the power management mode from PARTIAL to SLUMBER. It also enabled devslp, a mode that asks the device to power itself (rather than the link) down when idle. Windows doesn't enable either of these by default, even with the IRST driver, so it's a less tested configuration. The aim of these patches is to provide settings that are much more likely to work without breaking devices.
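The PARTIAL/SLUMBER distinction can be pictured as a tiny state machine. This Python toy model simply follows the description in the reply above and is not how the AHCI registers actually behave (real hardware uses timers and host/device negotiation, none of which is modelled): an idle link first drops to PARTIAL, and only when ALPE is set may the host promote it further to SLUMBER.

```python
from enum import Enum


class LinkState(Enum):
    ACTIVE = "active"
    PARTIAL = "partial"   # shallower low-power state, faster to wake
    SLUMBER = "slumber"   # deeper low-power state, slower to wake


def next_state(state: LinkState, idle: bool, alpe: bool) -> LinkState:
    """One step of a toy link-PM model (not real AHCI register semantics)."""
    if not idle:
        return LinkState.ACTIVE  # any traffic brings the link back up
    if state is LinkState.ACTIVE:
        return LinkState.PARTIAL
    if state is LinkState.PARTIAL and alpe:
        return LinkState.SLUMBER  # promotion only allowed with ALPE set
    return state


def settle(alpe: bool, idle_steps: int = 3) -> LinkState:
    """Run the model for a few idle steps starting from an active link."""
    state = LinkState.ACTIVE
    for _ in range(idle_steps):
        state = next_state(state, idle=True, alpe=alpe)
    return state


print(settle(alpe=False))  # LinkState.PARTIAL
print(settle(alpe=True))   # LinkState.SLUMBER
```

The point of the model is only that SLUMBER is an extra step that a less-tested setting unlocks, which is why the new policies avoid depending on it.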

On Twitter you mentioned using some PSR patches from drm-intel. Do you have a list of the patches you used from that branch? Also, did you experience any flicker on your laptop as a result of enabling PSR?

I have noticed this difference simply by running powertop --auto-tune. So now I launch it at boot, and go from 9 watts to 4.5! The auto-tune activates SATA power saving, and also audio codec power saving, and even USB power saving.
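The SATA part of that tuning comes down to writing a policy name into the per-host `link_power_management_policy` sysfs attribute, which is also how the link power management policies are selected at runtime. A minimal Python sketch, assuming root privileges and that attribute; it doesn't validate the name, since the accepted set depends on the kernel (patched kernels may accept extra values), and the kernel rejects names it doesn't recognise, which surfaces here as an `OSError`.

```python
from pathlib import Path


def set_lpm_policy(policy: str, sysfs_root: str = "/sys") -> list[str]:
    """Write `policy` to every host's link_power_management_policy.

    Returns the names of the hosts that were updated. Needs root on a
    real system; an unknown policy name makes the write fail with OSError.
    """
    hosts_dir = Path(sysfs_root) / "class" / "scsi_host"
    updated = []
    for attr in sorted(hosts_dir.glob("host*/link_power_management_policy")):
        attr.write_text(policy)
        updated.append(attr.parent.name)
    return updated
```

For example, `set_lpm_policy("medium_power")`. Given the corruption reports discussed above, trying `medium_power` before `min_power` is the cautious order.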

So, if I understood correctly, building the kernel with these patches isn't enough: I still need to reconfigure something before building, right?

> The first simply inherits whichever configuration the firmware has provided, on the assumption that the system vendor probably didn't configure their system to corrupt data out of the box.

This may fail with odd combinations (common on desktops, since no single vendor puts the whole system together), but it will probably hold true for laptops. I assume this is especially true for MacBooks: given the small variety of hardware they use, they're probably quite thoroughly tested.

I just patched a 4.1-rc1 kernel with your patches and will test drive it on a Haswell-powered Lenovo T440s. I had a bad experience with ALPM earlier and lost my partition table several times because of it, until I realized it was the power setting I had tweaked, see: https://lkml.org/lkml/2014/1/20/486

I'm now trying the medium_power and firmware_defaults settings. I will report back if I hit problems with either of them, fingers crossed :-)

I'm running this patch set on my XPS 13 (2015) and am not seeing significantly lower numbers with the 'min_power' and 'firmware_defaults' ALPM values set. Definitely hitting 4.x watts on normal-usage-idle but I was doing that prior to this set using TLP and running with max_power on the sata link (due to btrfs corruption concerns, etc.).

The following screenshot was with min_power: http://i.imgur.com/5xuJ5tv.png

On my ThinkPad T440p with a Haswell i7-4700MQ, Fedora 21 was never going below platform C-state PC3, even though the CPU cores and GPU were showing much deeper states (C7 and RC6, respectively). In a default installation, even after tuning things with powertop, it had a terrible idle power draw of 10-12 watts at the dimmest backlight level.

This is using the Intel integrated GPU in Xorg and having the discrete NVIDIA GPU powered down by the nouveau module. (This machine has the displays connected to the Intel GPU and can only use the NVIDIA GPU as an off-screen rendering source.)

Remembering an older tip to save power, I restarted Xorg with a 16 bits-per-pixel color depth, and now it spends 80% or more of the time in platform C-state PC6 at idle, with an idle power draw of around 6-8 watts. So I think either dynamic color depth changes (or ideally, LCD self-sustaining mode to allow the embedded DisplayPort link to shut down at idle) will be necessary to get good power savings.

Nice article. I was searching for the reason why, when running on battery, the highest CPU frequency reached is 1200 MHz (minimum 800 MHz), while on AC it hits 2900 MHz with boost on a single CPU, or 2600 MHz on two. I was looking for a way to reach at least 1900 MHz on battery... I'm running pstate with the powersave governor (kernel 4.0.4, openSUSE Tumbleweed).

As for power usage, at idle it is around 5-6W (40% brightness, Wi-Fi on, Bluetooth/Ethernet off) and can go all the way down to ~4W. This is much better than my T430s with an i5-3320, which sat at around 7-8W. With Firefox in reading mode (no Flash/HTML5 video) the average is 8W, which is quite good.

I have powertop + LTM running, and on battery the SATA link power management for host1 and host2 is automatically enabled (power-saving mode). This feature had worked well even on my previous Sandy Bridge CPU with a 3.x kernel.

As my processor doesn't use PC6/PC7, I tried your patches in the hope of some improvement. But there was none: it still keeps using sleep states <= PC3, even if I disable the monitor, activate ASPM for all devices, apply all the power-saving improvements listed by powertop and apply your patches to Linux 4.2. The CPU is in C7 about 98% of the time and the GPU in RC6 96% of the time; the package uses PC2 about 25% and PC3 about 50% of the time, yet something prevents it from using PC6/PC7. I enabled several kernel parameters for power saving: "pcie_aspm=force i915.enable_psr=1 i915.enable_fbc=1 drm.vblankoffdelay=1 i915.semaphores=1". Is there anything else I could try? Is there any chance these patches make it into the kernel? So far it seems there was some data corruption and the work has stalled (correct me if I'm wrong). I haven't noticed any errors in dmesg, but there was no improvement either. It would be great if someone could give me a hint about what could be done so that PC6/PC7 are used.

Does all this apply to Ivy Bridge as well? With the 4.2.0 kernel and TLP running (powertop reports all tunables as "Good") I can never get below 12 watts at idle on a 2012 MacBook Pro 13" (no dedicated graphics card, just Intel i915). Both Windows and OS X can get as low as 4.5 watts. Kind regards, M

Well, this explains why my battery life took a huge hit when I had to switch from the min_power profile to the medium_power one to avoid crashes on my Broadwell Chromebook with an upgraded M.2 SSD. The problem is that enabling ALPM crashes my system within about a minute of booting, due to an alleged firmware bug in the SSD controller (it's an ADATA drive). So if I want that incredible battery life back, I have to get a drive guaranteed to support ALPM on Linux, and none explicitly say so. Good luck, me!