On Tue, Jul 28, 2009 at 04:03:08PM +0200, Andreas Mohr wrote:
> Still, an average of +8.16% during 5 test runs each should be quite some incentive,
> and once there's a proper "idle latency skipping during expected I/O replies"
> even with idle/wakeup code path reinstated we should hopefully be able to keep
> some 5% improvement in disk access.

Then I changed ./drivers/cpuidle/governors/menu.c (make sure you're using the menu governor!) to use

extern bool io_reply_expected(void);

and updated

if (io_reply_expected())
	data->expected_us = 10;
else {
	/* determine the expected residency time */
	data->expected_us =
		(u32) ktime_to_ns(tick_nohz_get_sleep_length()) / 1000;
}

Rebuilt, rebootloadered ;), rebooted, and then booting and disk operation
_seemed_ to be snappier (I'm damn sure the hdd seek noise is a bit
higher-pitched ;). And it's exactly seeks which should be shorter-intervalled
now, since the system triggers a hdd operation and then is forced to wait
(idle) until the seeking is done.
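For anyone who wants to play with the idea outside the kernel, the hack boils down to roughly the following userspace model. This is only a sketch under assumptions: the helper bodies, the set_io_reply_expected(bool) signature and the expected_us() wrapper are made up for illustration and are not the actual patch; sleep_length_ns stands in for ktime_to_ns(tick_nohz_get_sleep_length()).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one global flag, as in the hack (no per-CPU handling). */
static bool io_reply_pending;

/* Hypothetical signature: set when a request is issued, cleared on reply. */
void set_io_reply_expected(bool expected)
{
	io_reply_pending = expected;
}

bool io_reply_expected(void)
{
	return io_reply_pending;
}

/*
 * Model of the governor's residency estimate, in microseconds.
 * sleep_length_ns stands in for the time until the next timer event.
 */
uint32_t expected_us(uint64_t sleep_length_ns)
{
	if (io_reply_expected())
		return 10;	/* pretend we will be woken almost immediately */

	return (uint32_t)(sleep_length_ns / 1000);
}
```

With the flag set, the governor sees a tiny expected residency and should therefore avoid idle states with a long exit latency; with the flag clear, the estimate is the usual time until the next timer event.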

bonnie test results (of patched kernel vs. kernel with set_io_reply_expected() muted)
seem to support this, but then a "time make bzImage" (of newly rebooted box each)
showed inconsistent results again and a much higher sample rate (with reboots each)
would be needed to really confirm this.

I'd expect improvements to be in the 3% to 4% range, at most, but still,
compared to the yield of other kernel patches this ain't nothing.

Now the question becomes whether one should implement such an improvement
and especially, how.

Perhaps the io reply decision making should be folded into the
tick_nohz_get_sleep_length() function (or rather create a higher-level
"expected sleep length" function which consults both
tick_nohz_get_sleep_length() and io reply mechanism).

And another important detail is that my current hack completely ignores
per-cpu operation and thus causes suboptimal power savings of _all_ cpus,
not just the one waiting for the I/O reply (i.e., we should properly take
into account cpu affinity settings of the reply interrupt).

And of course it would probably be best to create a mechanism which stores
a record of average responsiveness delays of various block devices and then
derive a maximum idle wakeup latency value from this to request.
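To make the three ideas above a bit more concrete, here is a rough userspace sketch of how they could fit together. Everything in it is hypothetical: the names (io_reply_begin, io_reply_end, blkdev_latency, expected_sleep_us), the fixed NR_CPUS, and the 7/8 weighting of the running average are assumptions chosen for illustration, not existing kernel interfaces.

```c
#include <stdint.h>

#define NR_CPUS 4	/* assumption: small fixed CPU count for this model */

/* Per-CPU count of I/O replies whose interrupt is expected on that CPU. */
static unsigned int io_replies_pending[NR_CPUS];

/* Hypothetical per-device record of the average reply delay. */
struct blkdev_latency {
	uint32_t avg_us;	/* running average, 0 = no samples yet */
};

/* Running average: new = 7/8 * old + 1/8 * sample (weights are arbitrary). */
void blkdev_latency_update(struct blkdev_latency *l, uint32_t sample_us)
{
	if (l->avg_us == 0)
		l->avg_us = sample_us;
	else
		l->avg_us = (uint32_t)(((uint64_t)l->avg_us * 7 + sample_us) / 8);
}

/* Called when a request is issued / completed for the given CPU. */
void io_reply_begin(unsigned int cpu)
{
	io_replies_pending[cpu]++;
}

void io_reply_end(unsigned int cpu)
{
	if (io_replies_pending[cpu] > 0)
		io_replies_pending[cpu]--;
}

/*
 * Higher-level "expected sleep length": the timer-based estimate, capped
 * by the device's average reply delay while this CPU awaits a reply.
 * With no samples yet (avg_us == 0) this returns 0, i.e. the most
 * conservative answer.
 */
uint32_t expected_sleep_us(unsigned int cpu, uint32_t tick_sleep_us,
			   const struct blkdev_latency *l)
{
	if (io_replies_pending[cpu] == 0)
		return tick_sleep_us;

	return l->avg_us < tick_sleep_us ? l->avg_us : tick_sleep_us;
}
```

Only the CPU that is expected to take the reply interrupt gets the shortened estimate, so the other CPUs can still enter deep idle states. How to learn which CPU that is (interrupt affinity) is left open here, as it is in the text above.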

Does anyone else have thoughts on this or benchmark numbers which would support this?