Commit Message

Provide an exponential back-off after initial busy-loop
to prvent extremly long busy-looping respectively arming
of many highresolution timers.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
---
During a review of hieghrestimer users I have been looking at the users
of read*_poll_timemout that have a tight loop declared (i.e. passing in
sleep_us as 0) most of them have quite large timeouts defined. At least
in some cases it is documented to be a tight loop intentionally as the
expectation is that it will normally succeed e.g. commit ae02ab00aa3c
("mtd: nand: jz4780: driver for NAND devices on JZ4780 SoCs")
<snip drivers/mtd/nand/jz4780_bch.c>
/*
* While we could use interrupts here and sleep until the operation
* completes, the controller works fairly quickly (usually a few
* microseconds) and so the overhead of sleeping until we get an
* interrupt quite noticeably decreases performance.
*/
ret = readl_poll_timeout(bch->base + BCH_BHINT, reg,
(reg & irq) == irq, 0, BCH_TIMEOUT_US);
<snip>
While this seems justified here to have a tight loop the problem is that
BCH_TIMEOUT_US is 100000 - so in the case of failure this would busyloop
for 100ms and in some cases tightloops of 5 seconds can be found (e.g.
drivers/mtd/spi-nor/intel-spi.c:intel_spi_wait_hw_busy())
There currently are (if my coccinelle script is correct) 7 files with
read*_poll_timeout where sleep_us is set to 0 (busy-loop)
timeout_us value file
INTEL_SPI_TIMEOUT * 1000 5000000 intel-spi.c
FMC_WAIT_TIMEOUT 1000000 hisi-sfc.c
BCH_TIMEOUT_US 100000 z4780_bch.c
numeric const. 10000 clkgen-pll.c
numeric const. 10000 clk-stm32f4.c
numeric const. 1000 tango_nand.c
numeric const. 1000 mxsfb_crtc.c
numeric const. 100 clk.c
One case in /drivers/mmc/host/sdhci-cadence.c has a timeout_us of 1 and
sleep_us of 0 which might actually be a bug as it means that the poll loop
would do a single read OP in many cases (this is non-atomic context) so it
would have to be robust for a single read, thus the poll_timeout might not
be suitable - not sure though - but thats a different problem.
If we now look at those cases where sleep_us is using a min < the recommended
10 us: that can be found in 68 cases (as of 4.10-rc6) and that is probably
also not that sensible given that the timeout_us are in the same ranges as
the cases above (minimum is 20 max is 5000000). So in the error case again
that would result in thousands of high-resolution timers being initialized
- presumably that is not that reasonable.
The problem thus is not the success case but the failure case. A possible
mitigation of this would be to have something like an exponential back-off
built into the non-atomic poll_timeout function:
#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us) \
({ \
ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
might_sleep(); \
unsigned long min = 0; \
unsigned long max = sleep_us | 1; \
for (;;) { \
(val) = op(addr); \
if (cond) \
break; \
if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \
(val) = op(addr); \
break; \
} \
if (min >= 10) \
usleep_range(min, max); \
max <<= 1; \
min = max >> 2; \
} \
(cond) ? 0 : -ETIMEDOUT; \
})
which results in min,max values for the initial iterations of:
min max
0 1
0 2
1 4
2 8
4 16
8 32
16 64
32 128
...
for sleep_us being passed in as 0 and would thus effectively be a
busy-loop for the first 6 iterations and then switch to sleeping.
The above code proposal would busy-loop at most 6 times and the
busy-wait would reduce if sleep_us > 0 is passed in:
sleep_us busy-loops
0-1 6
2-3 4
4-9 3
10-19 2
20-39 1
40+ 0
which should be a resonable behavior for the current use cases and eliminate
side effects of very long busy-wait-loops in the failure cases as well.
Pleas let me know if this sounds reasonable or what might have been
overlooked here. The only downside located is that some of the constant
folding that would be possible now would no longer be doable (e.g.
(sleep_us >> 2) + 1 in case of passing in a compile-time constant.
Also I guess readx_poll_timeout() could (should?) be converted to an
inline function.
I did compile test this but actually the key issue is to get feedback on the
concept rather than if the patch is usable in the below form.
Patch was compile tested with: x86_64_defconfig
Patch is against 4.10-rc7 (localversion-next is next-20170213)
include/linux/iopoll.h | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Hi Nicholas,
2017-02-14 1:52 GMT+09:00 Nicholas Mc Guire <der.herr@hofr.at>:
>> One case in /drivers/mmc/host/sdhci-cadence.c has a timeout_us of 1 and> sleep_us of 0 which might actually be a bug as it means that the poll loop> would do a single read OP in many cases (this is non-atomic context) so it> would have to be robust for a single read, thus the poll_timeout might not> be suitable - not sure though - but thats a different problem.
Sorry, I could not understand why you thought sdhci-cadence.c seemed a bug.
The SDHCI_CDNS_HRS06_TUNE_UP is auto-cleared by the hardware.
This will normally be done really soon (within 1us),
but it may not be cleared if the hardware is in trouble.
It is the intention of the code:
readl_poll_timeout(reg, tmp, !(tmp & SDHCI_CDNS_HRS06_TUNE_UP),
0, 1);
> which results in min,max values for the initial iterations of:> min max> 0 1> 0 2> 1 4> 2 8> 4 16> 8 32> 16 64> 32 128> ...
I notice you were sending this to me, but
please notice I am not responsible for this file
(include/linux/iopoll.h) in any way.
Please take my comments for grain of salt:
With this patch, the sleep range is doubled in each iteration.
Let's say this routine is called with delay_us=1, timeout_us=50,
then it slept 16 us, then 32 us.
If it sleeps 64 us in the next iteration,
it ends up with sleeping 112 us (=16 + 32 + 64) in total
where we know waiting 50 us is enough.
So, the sleep range granularity may get bigger than
users' intention.
Probably, waiting too long is not a problem in most cases.
If so, what is the meaning of the argument "sleep_us" after all?

On Thu, Feb 23, 2017 at 05:29:03PM +0900, Masahiro Yamada wrote:
> Hi Nicholas,> > > 2017-02-14 1:52 GMT+09:00 Nicholas Mc Guire <der.herr@hofr.at>:> > >> > One case in /drivers/mmc/host/sdhci-cadence.c has a timeout_us of 1 and> > sleep_us of 0 which might actually be a bug as it means that the poll loop> > would do a single read OP in many cases (this is non-atomic context) so it> > would have to be robust for a single read, thus the poll_timeout might not> > be suitable - not sure though - but thats a different problem.> > Sorry, I could not understand why you thought sdhci-cadence.c seemed a bug.> > The SDHCI_CDNS_HRS06_TUNE_UP is auto-cleared by the hardware.> This will normally be done really soon (within 1us),> but it may not be cleared if the hardware is in trouble.> It is the intention of the code:> > readl_poll_timeout(reg, tmp, !(tmp & SDHCI_CDNS_HRS06_TUNE_UP),> 0, 1);>
Im not sure my selfe but as this would need to be robust for a single read try
to be reliable I did not see the point in using a pool loop of 1 microsecond
on x86 I tried this and timeout_us of 1 will resulted in a single
read in more than 10% on the idle system and in almost 100% doing
a single read on loaded systems. So if the poll loop was introduced with
the assumption that just reading once imediately could miss the ritical
event with resonable probability then timeout_us==1 does not change that.
> > > > > which results in min,max values for the initial iterations of:> > min max> > 0 1> > 0 2> > 1 4> > 2 8> > 4 16> > 8 32> > 16 64> > 32 128> > ...> > > I notice you were sending this to me, but> please notice I am not responsible for this file> (include/linux/iopoll.h) in any way.
I included you as your it showed up with:
scripts/get_maintainer.pl -f include/linux/iopoll.h
> > Please take my comments for grain of salt:> > With this patch, the sleep range is doubled in each iteration.> Let's say this routine is called with delay_us=1, timeout_us=50,> then it slept 16 us, then 32 us.> If it sleeps 64 us in the next iteration,> it ends up with sleeping 112 us (=16 + 32 + 64) in total> where we know waiting 50 us is enough.
that is right - but the assumption of poll_timeout is that the
timeout case would actually be rare and if you look at the actual
timings that one has of poll_timout routines (that are based on
usleep_range()) on loaded systems they overrun by 100s of microseconds
very frequnently.
But you are right that in the 50us case this is not ideal - the
focus I had was on the very long timeouts that in some cases were set
to up to 5 seconds with tight-loops (or close to timght-loops)
> So, the sleep range granularity may get bigger than> users' intention.> > Probably, waiting too long is not a problem in most cases.> If so, what is the meaning of the argument "sleep_us" after all?
The problem really is that on idle ssytems the sleep_us argument
works ok and the overruns are not that dramatic - but on loaded
systems the sleep_us routlinely overruns significantly
usleep_range() 5000 samples - idle system
100,200 200,200 190,200
Min. :188481 Min. :201917 Min. :197793
1st Qu.:207062 1st Qu.:207057 1st Qu.:207051
Median :207139 Median :207133 Median :207133
Mean :207254 Mean :207233 Mean :207244
3rd Qu.:207341 erd Qu.:207262 3rd Qu.:207610
Max. :225340 Max. :214222 Max. :214885
CONFIG_PREEMPT_VOLUNTARY=y
usleep_range() 5000 samples - load ~ 8
100,200 190,200 200,200
Min. : 107812 Min. : 203307 Min. : 203432
1st Qu.: 558221 1st Qu.: 557490 1st Qu.: 510356
Median :1123425 Median : 1121939 Median : 1123316
Mean :1103718 Mean : 1100965 Mean : 1100542
3rd Qu.:1541986 3rd Qu.: 1531478 3rd Qu.: 1517414
Max. :8979183 Max. :13765789 Max. :12476136
CONFIG_PREEMPT=y
usleep_range() 5000 samples - load ~ 8
100,200 190,200 200,200
Min. : 115321 Min. : 203963 Min. : 203864
1st Qu.: 510296 1st Qu.: 451479 1st Qu.: 548131
Median : 1148660 Median : 1062576 Median : 1145228
Mean : 1193449 Mean : 1079379 Mean : 1154728
3rd Qu.: 1601552 3rd Qu.: 1378622 3rd Qu.: 1570742
Max. :12936192 Max. :12346313 Max. :13858732
so really small sleep_us make no sense I think - setting it to 0
tight-loop might be justified for small timeout_us (say 10us)
but long busy-wait loops are bad (and probably technically not
that sensible ither). If a busy-wait loop does not get the data/state
it wants within a few loops then busy-waiting is nonsentical and
that is why the intent of the exponential back-off solution does
a few tight-loops and then switches to sleeping delays.
It might be necessary to set it to a more fine grain stepping than
the brute-force *2 - but the principle I think would be better
than what is being done now.
and thanks for your comments !
thx!
hofrat