Zack's Kernel News

New Power-Saving Techniques

Alejandra Morales announced Cryogenic [1], a kernel module that attempted to reduce a system's energy consumption by scheduling non-urgent input/output operations at times when the input/output devices on the system would already be in use. By scheduling this sort of activity in bursts rather than trickles, Cryogenic created a situation that kept devices in an idle, low-power state for longer periods of time. Alejandra added, "Cryogenic is the result of my Master's thesis, completed at the Technical University of Munich under the supervision of Christian Grothoff."
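The batching idea can be illustrated with a toy model. The sketch below is not Cryogenic's interface or algorithm; the function name, the time units, and the notion of a per-operation deadline are assumptions made for illustration. It simply counts how often a device must wake up when deferrable operations run immediately versus when they wait, within an allowed window, for a wake-up that is happening anyway.

```python
def count_wakeups(urgent_times, deferrable, batch):
    """Count device wake-ups in a toy model (hypothetical, not Cryogenic's API).

    urgent_times: times at which the device is used regardless.
    deferrable:   (ready, deadline) pairs; each operation may run at any
                  time t with ready <= t <= deadline.
    batch:        if False, every operation runs as soon as it is ready.
    """
    if not batch:
        # Each distinct start time wakes the device once.
        return len(set(urgent_times) | {ready for ready, _ in deferrable})

    wake_times = set(urgent_times)
    # Handle the tightest deadlines first so forced wake-ups land as late as
    # possible and can be shared by operations with later deadlines.
    for ready, deadline in sorted(deferrable, key=lambda op: op[1]):
        # Piggyback on a wake-up that already falls inside the window.
        if any(ready <= t <= deadline for t in wake_times):
            continue
        # Otherwise the deadline forces a wake-up of its own.
        wake_times.add(deadline)
    return len(wake_times)


urgent = [10, 50]
deferrable = [(2, 12), (5, 15), (20, 55), (30, 40), (35, 45)]

print(count_wakeups(urgent, deferrable, batch=False))  # 7
print(count_wakeups(urgent, deferrable, batch=True))   # 3
```

In this made-up workload, the device wakes seven times without batching and three times with it. How that translates into actual power savings depends on the hardware, which is what the measurements discussed below address.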

H. Peter Anvin replied, saying that Cryogenic's programmer interface was too clumsy and that such an ambitious project was likely to receive a lot of technical feedback as part of getting it into the kernel.

But, Peter said, "This is NOT a negative, but rather an indication that the work is valuable enough to work with to integrate it into the kernel. Most likely, in my opinion, making this a standalone driver just isn't going to fly, but rather we will want to integrate it into the core I/O model."

Pavel Machek also saw value in Cryogenic, particularly for devices like phones that typically perform input/output operations over whatever network is available at the time, with whatever signal quality applies at the time. Pavel felt that Cryogenic could schedule those operations to have a minimal drain on the device's battery.

Pavel also asked for some numbers that showed how much power would actually be saved. Christian Grothoff replied, "it depends on the device, but we have demonstrated power savings for two different types of devices using two different measurement setups performed by two independent groups. Some of the measurements are available on the website, the second set should become available 'soon' (but we can already say that for the scenario we measured, the savings are in the same range as before)."

Christian added that Pavel's phone usage idea was one of Cryogenic's main points.

Pavel looked at the data and noted that the power savings seemed to be under 10 percent. Christian replied, "note that we only allowed 50% of the packets transmitted to be delayed (a bit). If you were to increase the allowed delay or allowed a larger fraction of packets to be delayed, you should see larger savings."

David Lang reiterated Peter's statement that Alejandra's code would probably go into the core kernel, rather than being a standalone kernel module. As David put it, "this sort of capability is something that is very desirable; there are many people making attempts to provide this sort of event consolidation. So I think it's _very_ safe to say that if this is accepted, it will be a change to the core, not just a loadable module."

He encouraged Alejandra to follow Peter's advice about which parts of the code to clean up and how to change the interface to fit more into the core kernel. He also pointed out that the kernel folks would need to see some metrics of kernel performance with the patch included but not in use – to see how it would affect the performance of tasks that didn't need its particular feature.

So, there seems to be strong, universal support for Alejandra's and Christian's work. It seems to be one of those features that nobody thinks about until someone comes up with it, and then it seems obvious to everybody.

An Overabundance of Backporting

Luis R. Rodriguez pointed out a burgeoning problem in the form of old kernels that have received backports of new drivers. He said, "Today we backport down to the last 30 kernels, from 2.6.24 up to 3.14, and while this is manageable right now I expect the number of supported drivers and features to keep increasing (I've stopped counting). I am very aware of the reasons to support a slew of old kernels, but its nothing but our own fault for not educating enough about the importance on upgrading."

He gave a link [2] to a lengthy argument he wrote in favor of automating the backporting process using Coccinelle [3].

For starters, Luis asked, what was the oldest kernel that really needed to be supported? Was it really 2.6.25, or could that number be bumped up to 3.0 without anyone having a problem?

Felix Fietkau from OpenWrt (an embedded Linux distribution) replied that the OpenWrt project only supported kernels going back to version 3.3, so a limit of Linux 3.0 would not affect them at all.

Arend van Spriel also responded to Luis, saying, "A lot of test teams in Broadcom WLAN are still using Fedora 15 running a 2.6.38 kernel. We are pushing them to move to Fedora 19," which, as Luis pointed out, runs a 3.13 kernel.

To Felix, Luis pointed out that OpenWrt's 3.3 kernel was actually not listed on kernel.org as being supported. Luis said, "I'm fine in carrying the stuff for those for now but ultimately it'd also be nice if we didn't even have to test the kernels in between which are not listed." He added, "This does however raise the question of how often a kernel in between a list of supported kernels gets picked up to be supported eventually. Greg, Jiri, do you happen to know what the likelihood of that can be?"

Greg Kroah-Hartman replied, "I don't know of anything ever getting picked up after I have said it would not be supported anymore." Luis asked, "How soon after a release do you mention whether or not it will be supported?" Greg replied:

"I only mention it around the time that it would normally go end-of-life.

"For example, if 3.13 were to be a release that was going to be 'long term', I would only say something around the normal time I would be no longer supporting it. Like in 2-3 weeks from now.

"So for 3.14, I'll not say anything about that until 3.16-rc1 is out, give or take a week or two."

Luis also asked, "Are you aware [of] any distribution picking an unsupported kernel for their next choice of kernel?" To which Greg said, "Sure, lots do, as they don't line up with my release cycles (I only pick 1 long term kernel to maintain each year). Look at the Ubuntu releases for examples of that. Also openSUSE and Fedora (although Fedora does rev their kernel pretty regularly) don't usually line up. The 'enterprise' distros are different, but even then, they don't always line up either (which is why Jiri is maintaining 3.12 …)."

Luis replied, "given that some distributions might choose a kernel in between and given also your great documented story behind the gains on trying to steer folks together on the ol' 2.6.32 (http://www.kroah.com/log/linux/2.6.32-stable.html) and this now being faded, I'll be bumping backports to only support >=3.0 soon, but we'll include all the series from 3.0 up to the latest. That should shrink compile/test time/support time on backports to 1/2."
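As a rough check on the "1/2" estimate, the sketch below counts mainline release series on either side of the proposed cutoff. It assumes one supported series per mainline release and takes the 2.6.24 lower bound from Luis's earlier quote; it says nothing about how much work each series actually costs.

```python
# One series per mainline release. The 2.6 line ended at 2.6.39; 3.0 followed.
old_series = [f"2.6.{n}" for n in range(24, 40)]  # 2.6.24 .. 2.6.39
new_series = [f"3.{n}" for n in range(0, 15)]     # 3.0 .. 3.14

before = len(old_series) + len(new_series)  # everything from 2.6.24 up
after = len(new_series)                     # only >= 3.0

print(before, after, round(after / before, 2))  # 31 15 0.48
```

Counted this way, there are 31 series before the cut, close to the "last 30 kernels" Luis mentioned, and 15 after it, or roughly half.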

But Greg thought that 3.0 seemed like an arbitrary choice because, as he put it, "That's not supported by anyone anymore for 'new hardware'." He suggested putting the lower bound at the 3.2 kernel, which was used in Debian Stable.

Luis replied, "That's two stakeholders for 3.0 – but nothing is voiced for anything older than that. Today I will rip the older kernels into oblivion. Thanks for all the feedback!"

But Arend van Spriel reminded Luis that he had already voiced a need for 2.6.38. Luis replied that this was at least better than 2.6.25.

The discussion veered off at that point, but it seems as though there will be new limits on which kernels receive backports, both to keep the backporting effort manageable and to encourage users to upgrade to newer kernel releases.

Booting Up More Like Windows

For whatever reason, Windows does still seem to do certain things better than Linux – like getting tested by hardware vendors.

Steven Rostedt recently reported that one of his test systems was failing to reboot when it was supposed to. He traced the problem to a patch that changed the logic surrounding the EFI and CF9 reboot methods. EFI (Extensible Firmware Interface) is a BIOS replacement developed by Intel. CF9 refers to the 0xCF9 port that can be used to trigger a cold boot.

Aubrey Li, Matthew Garrett, and H. Peter Anvin tried to ferret out the problem. At one point, Matthew remarked, "Production hardware should never require CF9" to reboot. Production hardware means systems that hardware vendors ship to customers, as opposed to pre-production hardware, which is still being tested and refined prior to actual sales.

Peter said to Matthew, "There are a lot of things that shouldn't be."

Matthew explained, "Windows doesn't hit CF9, and production hardware is always tested with Windows, so, adding CF9 may make things work in some situations but it clearly breaks them in others – we'd be better off spending our time figuring out why systems that appear to need CF9 won't otherwise work."

Linus Torvalds also replied to Matthew's remark that production hardware should never require rebooting via 0xCF9. He said:

That's total BS.

The fact is, we may be doing something wrong, but ACPI fails on a *lot* of systems. A huge swath of Dell machines in particular for some reason (laptops, desktops, _and_ now there's tablet reports).

And EFI isn't even in the running.

The keyboard controller is sadly unreliable too, although I really don't understand why. Even when a legacy keyboard controller exists (which isn't as universal as you'd think, even though the *hardware* is pretty much guaranteed to be there in the chipset, it can be disabled), there seem to be machines where the reset line isn't hooked up. Don't ask me why. Same goes for the triple fault failure case.

End result: there's a *lot* of machines that seem to want the PCI or legacy BIOS reboot. And it has absolutely _zero_ to do with "production hardware."

It would be interesting if somebody can figure out *exactly* what Windows does, because the fact that a lot of Dell machines need quirks almost certainly means that it's _us_ doing something wrong. Dell doesn't generally do lots of fancy odd things. I pretty much guarantee it's because we've done something odd that Windows doesn't do.

Anyway, claiming that the PCI method shouldn't be needed is living in some kind of dream-land. The fact is, all the other methods are equally broken, or more so.

Matthew said that using 0xCF9 might indeed work around certain issues such as ACPI failing, but he said, "the actual fix is to figure out why the firmware is wedging and fix it. Otherwise we're going to spend the rest of our lives maintaining a giant DMI list that's still going to be missing entries." Matthew pointed out that ACPI failures probably had to do with "other reboot methods" leaving certain hardware in a different state than Windows did.

Regarding Linus's question about what Windows actually did in these cases, Matthew said, "Windows hits the keyboard controller and then tries the ACPI vector. It then sleeps for a short period, then tries the keyboard controller again and the ACPI vector again. This means that systems which put cf9 in the ACPI vector tend to work because of the second write, which is obviously not what the spec envisaged but here we are. The only time it hits CF9 is when the ACPI tables tell it to."

Linus disagreed with this whole line of reasoning. In particular, he said, "There are no 'other reboot methods' playing games." He said ACPI was the default method that was tried first. He went on:

[ACPI reboot] fails on tons of machines. It shouldn't be failing, but it is. We are doing something sufficiently different from Windows that it doesn't work. At some point there was some talk about having enabled VT-d at boot but not disabled it before reboot, but that's just one theory.

And given this *fact*, your denial that "PCI reboot should never be used" is counterfactual. It may be true in some theoretical "this is how the world should work" universe, but in the real world it is just BS.

Linus added that he had already offered to test patches the last time the issue had come up, because he had access to one of the affected machines, but that no one had sent him any to test.

Matthew acknowledged that the kernel tried an ACPI reboot before other methods, but he clarified, saying, "We try ACPI. That will sometimes work, but infrequently. We then try the keyboard controller. That will generally work. We then try ACPI again, which will typically work because it's often now the second write to CF9. We then try the keyboard controller again, because that's what Windows does."

He went on, "But *any* of those accesses could have generated an SMI. For all we know the firmware is running huge quantities of code in response to any of those register accesses. We don't know what other hardware that code touches. We don't know what expectations it has. We don't know whether it was written by humans or written by some sort of simulated annealing mechanism that finally collapsed into a state where Windows rebooted and then shipped (or even humans behaving indistinguishably from a simulated annealing mechanism)."
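The two fallback orders described in this thread can be compared with a small simulation. This is only a model of the ordering, not kernel or Windows code: the method names, the `attempt_reboot` helper, and the sample firmware behaviors are all hypothetical, and the model deliberately ignores the side effects (such as SMIs) that Matthew warns about.

```python
# Orders as described in the discussion (Windows sleeps briefly between pairs).
LINUX_ORDER = ["acpi", "kbd", "acpi", "kbd"]
WINDOWS_ORDER = ["kbd", "acpi", "kbd", "acpi"]


def attempt_reboot(order, works):
    """Try each method in turn; return (method, nth_try) for the first success.

    works(method, nth_try) stands in for the firmware: it reports whether the
    nth use of a method actually resets the machine. Returns None if nothing
    in the sequence works.
    """
    tries = {}
    for method in order:
        tries[method] = tries.get(method, 0) + 1
        if works(method, tries[method]):
            return method, tries[method]
    return None


def second_acpi_write_only(method, nth_try):
    # Hypothetical firmware: the ACPI reset vector points at CF9 and only the
    # second write triggers a reset; the keyboard controller does nothing.
    return method == "acpi" and nth_try == 2


def keyboard_controller_only(method, nth_try):
    # Hypothetical firmware: only the keyboard controller reset is wired up.
    return method == "kbd"


print(attempt_reboot(LINUX_ORDER, second_acpi_write_only))      # ('acpi', 2)
print(attempt_reboot(WINDOWS_ORDER, second_acpi_write_only))    # ('acpi', 2)
print(attempt_reboot(LINUX_ORDER, keyboard_controller_only))    # ('kbd', 1)
print(attempt_reboot(["acpi", "kbd"], second_acpi_write_only))  # None
```

In this idealized model, both orders reboot the same machines, and dropping the repeated attempts strands the machine that needs a second write. What the model cannot capture is the point Matthew is making: on real firmware, each attempt may change hardware state, so the order itself can matter.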

Regarding 0xCF9, Matthew also said:

We know that CF9 fixes some machines. We know that it breaks some machines. We don't know how many machines it fixes or how many machines it breaks. We don't know how many machines are flipped from a working state to a broken state whenever we fiddle with the order or introduce new heuristics. We don't know how many go from broken to working. The only way we can be reasonably certain that hardware will work is to duplicate precisely what Windows does, because that's all that most vendors will ever have tested.

While we may know exactly what Windows does in terms of actually triggering the reboot, we don't know everything else it does on the shutdown path and it's difficult to instrument things like embedded controller accesses when qemu doesn't emulate one.

Linus replied that getting rid of 0xCF9 was not an option – it was simply needed to reboot certain systems, including production hardware. However, he said he wanted to identify whatever it was that Windows did that Linux did not do and fix that code. He reiterated his request for patches.

The discussion petered out around there, but it's interesting to see Linus flatly saying that Windows does something better than Linux. True, there are practical reasons behind it, such as the idea that hardware vendors may focus their quality assurance efforts on Windows support more than Linux support. Still, it's nice to see that there's not a dogmatic assertion amongst Linux folks that everything Linux is superior to everything Windows. It's nice to see the usefulness or value of a thing recognized, regardless of any other details of the situation.

Zack Brown

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.