sched_yield() options - Kernel

This is a discussion on sched_yield() options - Kernel ; I've seen a lot of discussion about how sched_yield is abused by
applications. I'm working with a developer on one application that looks
like it's falling into this same trap (mutexes between threads and using
sched_yield (or more precisely pthread_yield()) ...

sched_yield() options

I've seen a lot of discussion about how sched_yield is abused by
applications. I'm working with a developer on one application that looks
like it's falling into this same trap (mutexes between threads and using
sched_yield (or more precisely pthread_yield()) to let other threads get
the lock)

however I've been having a hard time tracking down the appropriate
discussions to forward on to the developer (both for why what he's doing
is bad, and for what he should be doing instead)

could someone point out appropriate mailing list threads, or other
documentation for this?

Re: sched_yield() options

Em Mon, Oct 20, 2008 at 03:34:07PM -0700, david@lang.hm escreveu:
> I've seen a lot of discussion about how sched_yield is abused by
> applications. I'm working with a developer on one application that looks
> like it's falling into this same trap (mutexes between threads and using
> sched_yield (or more precisely pthread_yield()) to let other threads get
> the lock)
>
> however I've been having a hard time tracking down the appropriate
> discussions to forward on to the developer (both for why what he's doing
> is bad, and for what he should be doing instead)
>
> could someone point out appropriate mailing list threads, or other
> documentation for this?

Re: sched_yield() options

david@lang.hm wrote:
> I've seen a lot of discussion about how sched_yield is abused by
> applications. I'm working with a developer on one application that looks
> like it's falling into this same trap (mutexes between threads and using
> sched_yield (or more precisely pthread_yield()) to let other threads get
> the lock)
>
> however I've been having a hard time tracking down the appropriate
> discussions to forward on to the developer (both for why what he's doing
> is bad, and for what he should be doing instead)
>
> could someone point out appropriate mailing list threads, or other
> documentation for this?

The main reason why it's bad is that the behaviour of yield() for
SCHED_OTHER tasks is not strongly defined in the spec. Depending on
OS/version you may yield to all other SCHED_OTHER tasks, only one task,
or anywhere in between.

Also, yield() gives the kernel no information on why it's yielding and
to whom, so it is impossible for the kernel to make the optimal decision
in all cases.

For more information, try searching the linux.kernel google groups
archive. There's a thread called "yield API" with some information.
See also " CFS: some bad numbers with Java/database threading [FIXED]".

Re: sched_yield() options

On Mon, 20 Oct 2008, Arnaldo Carvalho de Melo wrote:
> Em Mon, Oct 20, 2008 at 03:34:07PM -0700, david@lang.hm escreveu:
>> I've seen a lot of discussion about how sched_yield is abused by
>> applications. I'm working with a developer on one application that looks
>> like it's falling into this same trap (mutexes between threads and using
>> sched_yield (or more precisely pthread_yield()) to let other threads get
>> the lock)
>>
>> however I've been having a hard time tracking down the appropriate
>> discussions to forward on to the developer (both for why what he's doing
>> is bad, and for what he should be doing instead)
>>
>> could someone point out appropriate mailing list threads, or other
>> documentation for this?
>
> http://kerneltrap.org/Linux/Using_sc...eld_Improperly

that helps, but the case that seems closest to what I'm looking at is
> > > One example I know of is a defragmenter for a multi-threaded memory
> > > allocator, and it has to lock whole pools. When it releases these
> > > locks, it calls yield before re-acquiring them to go back to work.
> > > The idea is to "go to the back of the line" if any threads are
> > > blocking on those mutexes.
> > at a quick glance this seems broken too - but if you show the specific
> > code i might be able to point out the breakage in detail. (One
> > underlying problem here appears to be fairness: a quick unlock/lock
> > sequence may starve out other threads. yield wont solve that
> > fundamental problem either, and it will introduce random latencies
> > into apps using this memory allocator.)
> You are assuming that random latencies are necessarily bad. Random
> latencies may be significantly better than predictable high latency.

in the case I'm looking at there are two (or more) threads running with
one message queue in the center.

'input threads' are grabbing the lock to add messages to the queue

'output threads' are grabbing the lock to remove messages from the queue

the programmer is doing a pthread_yield() after each message is processed
in an attempt to help fairness (he initially added it in when he started
seeing starvation on single-core systems)

what should he be doing instead?

the link above talks about other cases more, but really doesn't say what
the right thing to do is for this case.

Re: sched_yield() options

On 10/20/2008 06:08 PM, david@lang.hm wrote:
> in the case I'm looking at there are two (or more) threads running with
> one message queue in the center.
>
> 'input threads' are grabbing the lock to add messages to the queue
>
> 'output threads' are grabbing the lock to remove messages from the queue
>
> the programmer is doing a pthread_yield() after each message is
> processed in an attempt to help fairness (he initially added it in when
> he started seeing starvation on single-core systems)
>
> what should he be doing instead?

If you're seeing starvation, to me that's a good indicator that the
granularity of queue items are too small... probably there'd be an overall
benefit of grabbing more things at once from the queue.

To make a silly metaphor out of it - imagine you've got a bunch of office
interns (threads) whose jobs are to fill out PQ9-12b forms. An intern has
to wait in line before they can get to the desk where the secretary is
handing out the forms to be filled out, one by one. An intern can fill out
the form just as fast as the secretary hands it to them; if they do so,
then one intern ends up doing all the work while the others just wait in line.

But in order to make everyone equally busy (by injecting sched_yield())
you're making the interns take the paper, walk back to their desk, fill it
out, and then get back in line, which takes somewhat more time overall,
despite making the interns look much busier. A better solution would be to
give each intern a big stack of forms so that they spend a greater
proportion of time filling them out rather than waiting in line; perhaps if
each intern is kept busy enough, the line will always be empty, which would
be the best situation of all.

Re: sched_yield() options

On Mon, 20 Oct 2008, David M. Lloyd wrote:
> On 10/20/2008 06:08 PM, david@lang.hm wrote:
>> in the case I'm looking at there are two (or more) threads running with one
>> message queue in the center.
>>
>> 'input threads' are grabbing the lock to add messages to the queue
>>
>> 'output threads' are grabbing the lock to remove messages from the queue
>>
>> the programmer is doing a pthread_yield() after each message is processed
>> in an attempt to help fairness (he initially added it in when he started
>> seeing starvation on single-core systems)
>>
>> what should he be doing instead?
>
> If you're seeing starvation, to me that's a good indicator that the
> granularity of queue items are too small... probably there'd be an overall
> benefit of grabbing more things at once from the queue.
>
> To make a silly metaphor out of it - imagine you've got a bunch of office
> interns (threads) whose jobs are to fill out PQ9-12b forms. An intern has to
> wait in line before they can get to the desk where the secretary is handing
> out the forms to be filled out, one by one. An intern can fill out the form
> just as fast as the secretary hands it to them; if they do so, then one
> intern ends up doing all the work while the others just wait in line.
>
> But in order to make everyone equally busy (by injecting sched_yield())
> you're making the interns take the paper, walk back to their desk, fill it
> out, and then get back in line, which takes somewhat more time overall,
> despite making the interns look much busier. A better solution would be to
> give each intern a big stack of forms so that they spend a greater proportion
> of time filling them out rather than waiting in line; perhaps if each intern
> is kept busy enough, the line will always be empty, which would be the best
> situation of all.

I've suggested that, but the changes nessasary to support that mode of
operation are very invasive, and so not an option in the near/medium term.

in the meantime is there something better than sched_yield() that should
be happening

clarifying the situation a bit

in this case you have two sets of interns, and a secretary maintaining a
stack of pending messages

one set of interns are listening to the phone and scribbling notes, then
delivering them to the secretary

the other set of interns are picking up the papers from the secretary
and delivering them to their destination.

each set of interns has it's own line, but they are both very pushy and
instead of moving back and forth between the two lines reasonably fairly
the secretary spends too much time servicing one line, and the other set
of interns end up all queued up and unable to do anything, so the total
number of messages plummets.

the sched_yield is an attempt to have the secretary pause once in a while
and check to see if the other line has someone waiting.

from looking at the software running, it doesn't seem to work very well.
I've also suggested investigating lockless algorithms for the queue, but
that is also a lot of high-risk (but high-reward) work. what else can be
done to make a mutex more fair?

Re: sched_yield() options

On 10/20/2008 07:44 PM, david@lang.hm wrote:
> On Mon, 20 Oct 2008, David M. Lloyd wrote:
>
>> On 10/20/2008 06:08 PM, david@lang.hm wrote:
>>> in the case I'm looking at there are two (or more) threads running
>>> with one message queue in the center.
>>>
>>> 'input threads' are grabbing the lock to add messages to the queue
>>>
>>> 'output threads' are grabbing the lock to remove messages from the queue
>>>
>>> the programmer is doing a pthread_yield() after each message is
>>> processed in an attempt to help fairness (he initially added it in
>>> when he started seeing starvation on single-core systems)
>>>
>>> what should he be doing instead?
>>
>> If you're seeing starvation, to me that's a good indicator that the
>> granularity of queue items are too small... probably there'd be an
>> overall benefit of grabbing more things at once from the queue.
>>
> I've suggested that, but the changes nessasary to support that mode of
> operation are very invasive, and so not an option in the near/medium term.
>
> in the meantime is there something better than sched_yield() that should
> be happening
>
> the sched_yield is an attempt to have the secretary pause once in a
> while and check to see if the other line has someone waiting.
>
> from looking at the software running, it doesn't seem to work very well.
> I've also suggested investigating lockless algorithms for the queue, but
> that is also a lot of high-risk (but high-reward) work. what else can be
> done to make a mutex more fair?

No, you're not going to make much progress trying to fix the wrong problem
in my opinion. A lockless algorithm *might* work, but I suspect that since
your computation units are apparently so small, you'll still spend a lot of
time doing compare-and-swap and barriers and that sort of thing anyway, and
it will still be the same sort of situation. I think your design is
basically broken. You're frankly probably better off just ditching the
queue and doing the work directly in the queuing threads. At least then
you won't have contention.

RE: sched_yield() options

> in the case I'm looking at there are two (or more) threads running with
> one message queue in the center.
>
> 'input threads' are grabbing the lock to add messages to the queue
>
> 'output threads' are grabbing the lock to remove messages from the queue
>
> the programmer is doing a pthread_yield() after each message is processed
> in an attempt to help fairness (he initially added it in when he started
> seeing starvation on single-core systems)

Yuck. That forces the worst possible scenario, which is nearly strict
alternation.

There is no reason he should ever be seeing starvation. Each thread should
either:

1) Use out its timeslice. In which case, a thread can at most be starved for
the number of threads multiplied by the size of the timeslice. If this is
too long, then you have too many threads or your timeslice is too big. You
can't fix it with ugly hacks.

2) Get blocked because it needs another thread to help it make forward
progress. In this case, there should be no starvation, since each thread
can't continue until other threads continue.
> what should he be doing instead?

Simply set a limit on the maximum number of messages that can go in the
queue when output threads are CPU limited. Output threads can't starve input
threads because they have to block if the queue is empty (assuming you never
let the queue get too ridiculously full). Input threads can't starve output
threads if there's a reasonable limit on the queue size.
> the link above talks about other cases more, but really doesn't say what
> the right thing to do is for this case.
>
> David Lang

Re: sched_yield() options

On Tue October 21 2008 01:08:39 david@lang.hm wrote:
> On Mon, 20 Oct 2008, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Oct 20, 2008 at 03:34:07PM -0700, david@lang.hm escreveu:
> >> I've seen a lot of discussion about how sched_yield is abused by
> >> applications. I'm working with a developer on one application that looks
> >> like it's falling into this same trap (mutexes between threads and using
> >> sched_yield (or more precisely pthread_yield()) to let other threads get
> >> the lock)
> >>
> >> however I've been having a hard time tracking down the appropriate
> >> discussions to forward on to the developer (both for why what he's doing
> >> is bad, and for what he should be doing instead)
> >>
> >> could someone point out appropriate mailing list threads, or other
> >> documentation for this?
> >
> > http://kerneltrap.org/Linux/Using_sc...eld_Improperly
>
> that helps, but the case that seems closest to what I'm looking at is
>
> > > > One example I know of is a defragmenter for a multi-threaded memory
> > > > allocator, and it has to lock whole pools. When it releases these
> > > > locks, it calls yield before re-acquiring them to go back to work.
> > > > The idea is to "go to the back of the line" if any threads are
> > > > blocking on those mutexes.
> > >
> > > at a quick glance this seems broken too - but if you show the specific
> > > code i might be able to point out the breakage in detail. (One
> > > underlying problem here appears to be fairness: a quick unlock/lock
> > > sequence may starve out other threads. yield wont solve that
> > > fundamental problem either, and it will introduce random latencies
> > > into apps using this memory allocator.)
> >
> > You are assuming that random latencies are necessarily bad. Random
> > latencies may be significantly better than predictable high latency.
>
> in the case I'm looking at there are two (or more) threads running with
> one message queue in the center.
>
> 'input threads' are grabbing the lock to add messages to the queue
>
> 'output threads' are grabbing the lock to remove messages from the queue
>
> the programmer is doing a pthread_yield() after each message is processed
> in an attempt to help fairness (he initially added it in when he started
> seeing starvation on single-core systems)
>
> what should he be doing instead?

This sounds like standard producer/consumer problem. Why don't you simply use
counting semaphore (increased by producers/input and decreased by
consumers/output threads)?
Using pthread_yield sounds really broken (unpredictable results depending on
system implementation of yield) here.
>
> the link above talks about other cases more, but really doesn't say what
> the right thing to do is for this case.
>
> David Lang
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/