Convert calculations of proportion of writeback each bdi does to new flexible proportion code. That allows us to use aging period of fixed wallclock time which gives better proportion estimates given the hugely varying throughput of different devices.

So the problem with using a deferred timer is that it 'ignores' idle time. So if a very busy period is followed by a real quiet period you'd expect all the proportions to have aged to 0, but they won't have.

One way to solve that is to track a jiffies count of the last time the timer triggered and compute the missed periods from that and extend fprop_new_period() to deal with period increments of more than 1.
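A minimal user-space sketch of that idea, not the actual patch. The names `fprop_sketch`, `fprop_sketch_new_period()`, `missed_periods()` and `timer_fired()` are hypothetical, the tick arguments stand in for jiffies, and it assumes that aging one period means halving the event count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the global proportion state. */
struct fprop_sketch {
	uint64_t events;	/* aged event count */
	unsigned int period;	/* number of aging periods elapsed so far */
};

/* Age by several periods at once. Assumption: one period halves the count. */
void fprop_sketch_new_period(struct fprop_sketch *p, unsigned int periods)
{
	/* Shifting a 64-bit value by 64 or more is undefined in C, so clamp. */
	if (periods >= 64)
		p->events = 0;
	else
		p->events >>= periods;
	p->period += periods;
}

/*
 * Whole periods elapsed between last_run and now (both in ticks).
 * The unsigned subtraction stays correct across a tick counter wrap.
 */
unsigned int missed_periods(unsigned long now, unsigned long last_run,
			    unsigned long period_len)
{
	assert(period_len > 0);
	return (unsigned int)((now - last_run) / period_len);
}

/*
 * Timer callback body: catch up on every period missed while the deferred
 * timer slept through idle time. Returns the new "last run" timestamp,
 * aligned to a period boundary.
 */
unsigned long timer_fired(struct fprop_sketch *p, unsigned long now,
			  unsigned long last_run, unsigned long period_len)
{
	unsigned int n = missed_periods(now, last_run, period_len);

	if (n == 0)
		return last_run;
	fprop_sketch_new_period(p, n);
	return last_run + (unsigned long)n * period_len;
}
```

With a period of 300 ticks, a timer that last ran at tick 100 and fires again at tick 1000 has missed three periods, so the count is aged three times in a single call.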

On Fri 18-05-12 00:04:33, Peter Zijlstra wrote:
> On Tue, 2012-05-15 at 17:43 +0200, Jan Kara wrote:
> > +static struct timer_list writeout_period_timer =
> > +		TIMER_DEFERRED_INITIALIZER(writeout_period, 0, 0);
>
> So the problem with using a deferred timer is that it 'ignores' idle
> time. So if a very busy period is followed by a real quiet period you'd
> expect all the proportions to have aged to 0, but they won't have.

Ah, I see. Thanks for warning me.

> One way to solve that is to track a jiffies count of the last time the
> timer triggered and compute the missed periods from that and extend
> fprop_new_period() to deal with period increments of more than 1.

Yeah, that should be easy enough so I'll try it that way since I presume it's nicer to power usage to use deferred timers if it's reasonably possible.

On Fri, 2012-05-18 at 16:24 +0200, Jan Kara wrote:
> Yeah, that should be easy enough so I'll try it that way since I presume
> it's nicer to power usage to use deferred timers if it's reasonably
> possible.

Btw, your current scheme also drifts. Since you do jiffies + 3*HZ, your period might actually be longer if the timer got delayed.
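To illustrate the drift, here is a small user-space sketch contrasting the two ways of re-arming the timer. The function names are hypothetical, the tick values stand in for jiffies, and it assumes the timer never fires before its scheduled expiry.

```c
#include <assert.h>

/*
 * Drifting variant: the next expiry is computed from the time the handler
 * actually ran, so every delay is added to the period permanently.
 */
unsigned long next_expiry_drifting(unsigned long now, unsigned long period_len)
{
	return now + period_len;
}

/*
 * Drift-free variant: the next expiry is computed from the previously
 * scheduled expiry, skipping period boundaries that have already passed.
 * Assumes now is at or after scheduled; the unsigned subtraction is
 * wrap-safe.
 */
unsigned long next_expiry_fixed(unsigned long scheduled, unsigned long now,
				unsigned long period_len)
{
	unsigned long late;

	assert(period_len > 0);
	late = now - scheduled;
	return scheduled + (late / period_len + 1) * period_len;
}
```

If the timer was due at tick 300 but ran at tick 310 with a 300-tick period, the drifting variant re-arms for tick 610 while the fixed variant stays on the grid at tick 600.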


On Thu, 2012-05-24 at 18:59 +0200, Jan Kara wrote:
> Convert calculations of proportion of writeback each bdi does to new flexible
> proportion code. That allows us to use aging period of fixed wallclock time
> which gives better proportion estimates given the hugely varying throughput of
> different devices.
>
> Signed-off-by: Jan Kara <jack [at] suse>
> ---

On Mon 28-05-12 17:49:45, Sasha Levin wrote:
> Hi Jan,
>
> On Thu, 2012-05-24 at 18:59 +0200, Jan Kara wrote:
> > Convert calculations of proportion of writeback each bdi does to new flexible
> > proportion code. That allows us to use aging period of fixed wallclock time
> > which gives better proportion estimates given the hugely varying throughput of
> > different devices.
> >
> > Signed-off-by: Jan Kara <jack [at] suse>
> > ---
>
> This patch appears to be causing lockdep warnings over here:

Actually, this is not caused directly by my patch. My patch just makes the problem more likely because I use a smaller counter batch in __fprop_inc_percpu_max() than the original __prop_inc_percpu_max() uses, so the probability that the percpu counter takes its spinlock (which is what triggers the warning) is higher.
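A rough single-CPU model of why the batch size matters. This is not the kernel's percpu counter code: the struct, the function names and the batch values in the note below are made up for illustration, and the spinlock is replaced by a counter of how often the locked slow path would run.

```c
#include <assert.h>

/* Hypothetical single-CPU model of a batched per-CPU counter. */
struct pcpu_counter_sketch {
	long global;		/* shared count; spinlock-protected in real code */
	long local;		/* this CPU's delta, not yet folded into global */
	long batch;		/* fold threshold */
	unsigned long lock_acquisitions;	/* times the slow path ran */
};

void pcpu_sketch_add(struct pcpu_counter_sketch *c, long amount)
{
	long delta = c->local + amount;

	if (delta >= c->batch || delta <= -c->batch) {
		/* Slow path: real code would take the spinlock here. */
		c->lock_acquisitions++;
		c->global += delta;
		c->local = 0;
	} else {
		/* Fast path: only the per-CPU delta is touched. */
		c->local = delta;
	}
}

/* Count slow-path entries for a run of unit increments at a given batch. */
unsigned long count_lock_acquisitions(long batch, unsigned long increments)
{
	struct pcpu_counter_sketch c = { 0, 0, batch, 0 };
	unsigned long i;

	assert(batch > 0);
	for (i = 0; i < increments; i++)
		pcpu_sketch_add(&c, 1);
	return c.lock_acquisitions;
}
```

In this model, 1024 unit increments reach the locked path 32 times with a batch of 32 and 128 times with a batch of 8, so a smaller batch hits the lock proportionally more often.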

The only safe solution seems to be to create a variant of percpu counters that can be used from an interrupt. Or do you have another idea, Peter?