Very high CPU usage running growisofs

On 1.5.4-Preview, using an up-to-date pkgsrc, the growisofs program
is eating CPU (it's CPU bound on an AMD64 and burning very slowly, at
~1.5MB/s instead of the ~10MB/s which is usual for the media I have to
hand). Pointing truss at the running process reveals over 10k calls per
second to sys_set_tls_area - all identical, viz:

sys_set_tls_area(0x0,0x28262f98,0x8) = 123 (0x7b)

Any ideas or suggestions for further investigation?

History

> Hi,
>
> On 1.5.4-Preview, using an up-to-date pkgsrc, the growisofs program
> is eating CPU (it's CPU bound on an AMD64 and burning very slowly, at
> ~1.5MB/s instead of the ~10MB/s which is usual for the media I have to
> hand). Pointing truss at the running process reveals over 10k calls per
> second to sys_set_tls_area - all identical, viz:
>
> sys_set_tls_area(0x0,0x28262f98,0x8) = 123 (0x7b)

I finally got fed up enough with this to try things I don't really
understand. I found that adding a usleep(10000); before the only call to
__thread_yield() - which just calls sched_yield() - restored the performance
of growisofs to normal (and yes the DVDs burnt with it still work fine). In
more detail I turned this loop:

Can anyone comment on the sanity or otherwise of this change?
Presumably this form of loop - similar to those mentioned in the
"Threadding issue" thread in July - does not cause similar problems in
FreeBSD or Linux. Does anyone understand what the difference is in DragonFly
that makes it cause problems here?

On Mon, Sep 18, 2006 at 04:37:37PM +0100, Steve O'Hara-Smith wrote:
> I finally got fed up enough with this to try things I don't really
> understand. I found that adding a usleep(10000); before the only call to
> __thread_yield() - which just calls sched_yield() - restored the performance
> of growisofs to normal (and yes the DVDs burnt with it still work fine). In
> more detail I turned this loop:

I'm fully aware of it, but the issue is that it changes the original
intention of the code a lot. I never got around to implementing an adaptive
loop.
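
Joerg does not say what his adaptive loop would look like, so the
following is only a guess at the general shape: yield a few times in case
the condition clears quickly, then fall back to sleeps that double up to a
cap. The names adaptive_wait and next_sleep_us and all three constants are
made up for illustration and are not taken from growisofs or libc_r.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical tuning values, chosen only for this sketch. */
enum { SPIN_YIELDS = 16, MIN_SLEEP_US = 10, MAX_SLEEP_US = 10000 };

/* Double the sleep interval, capped at MAX_SLEEP_US. */
static unsigned next_sleep_us(unsigned cur)
{
    assert(cur >= MIN_SLEEP_US);
    unsigned next = cur * 2;
    return next > MAX_SLEEP_US ? MAX_SLEEP_US : next;
}

/*
 * Wait until *flag becomes nonzero. The first SPIN_YIELDS passes only
 * yield; later passes sleep for progressively longer intervals.
 * Returns the number of passes made, which is 0 if the flag was
 * already set on entry.
 */
static unsigned long adaptive_wait(atomic_int *flag)
{
    unsigned long passes = 0;
    unsigned sleep_us = MIN_SLEEP_US;

    while (atomic_load(flag) == 0) {
        if (passes < SPIN_YIELDS) {
            sched_yield();
        } else {
            usleep(sleep_us);
            sleep_us = next_sleep_us(sleep_us);
        }
        passes++;
    }
    return passes;
}
```

The cap keeps the worst-case wake-up latency at the 10ms figure from the
usleep(10000) workaround, while short waits would still be handled by the
initial yields.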

> Can anyone comment on the sanity or otherwise of this change?
> Presumably this form of loop - similar to those mentioned in the
> "Threadding issue" thread in July - does not cause similar problems in
> FreeBSD or Linux. Does anyone understand what the difference is in DragonFly
> that makes it cause problems here?

The userland scheduler of pthread sees that it has no other runnable
thread and continues the execution. FreeBSD or Linux go to the kernel
and the kernel scheduler most likely decides to switch to a different
process. But in principle, it should show the same behaviour.

:On Mon, Sep 18, 2006 at 04:37:37PM +0100, Steve O'Hara-Smith wrote:
:> I finally got fed up enough with this to try things I don't really
:> understand. I found that adding a usleep(10000); before the only call to
:> __thread_yield() - which just calls sched_yield() - restored the performance
:> of growisofs to normal (and yes the DVDs burnt with it still work fine). In
:> more detail I turned this loop:
:
:I'm fully aware of it, but the issue is that it changes the original
:intention of the code a lot. I never got around to implementing an adaptive
:loop.
:
:> Can anyone comment on the sanity or otherwise of this change?
:> Presumably this form of loop - similar to those mentioned in the
:> "Threadding issue" thread in July - does not cause similar problems in
:> FreeBSD or Linux. Does anyone understand what the difference is in DragonFly
:> that makes it cause problems here?
:
:The userland scheduler of pthread sees that it has no other runnable
:thread and continues the execution. FreeBSD or Linux go to the kernel
:and the kernel scheduler most likely decides to switch to a different
:process. But in principle, it should show the same behaviour.
:
:Joerg

Well, from my point of view, the code is simply broken. The code
is making two false assumptions. First, it is assuming that the ONLY
other process running is another thread that will immediately update
the condition of the loop when allowed to run. What if there are other
processes in the system that the kernel decides to schedule instead?
Nowhere was the yield code ever intended to allow every single other
runnable thread in the system to run before returning control to the caller.
Secondly, it assumes that the yield will always transfer control to
this other process. What am I supposed to do? Depress the priority of
the calling thread in order to force a thread in a lower priority
queue to run when I call into the scheduler? The code is simply broken,
on any operating system. It just so happens that it is more broken
on DragonFly.
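
For comparison, a wait that makes neither assumption would block until it
is explicitly woken instead of polling around a yield. Nobody in this
thread proposes the following; it is just the conventional pthread
condition variable pattern, with the struct and function names (handoff,
handoff_signal, handoff_wait) invented for this sketch.

```c
#include <pthread.h>
#include <stdbool.h>

/* One-slot handoff: the producer sets ready, the consumer waits for it. */
struct handoff {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
};

static void handoff_init(struct handoff *h)
{
    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->cond, NULL);
    h->ready = false;
}

/* Producer side: publish the condition and wake one waiter. */
static void handoff_signal(struct handoff *h)
{
    pthread_mutex_lock(&h->lock);
    h->ready = true;
    pthread_cond_signal(&h->cond);
    pthread_mutex_unlock(&h->lock);
}

/*
 * Consumer side: sleep until the condition is published, then consume it.
 * The while loop re-checks the flag because pthread_cond_wait may return
 * spuriously.
 */
static void handoff_wait(struct handoff *h)
{
    pthread_mutex_lock(&h->lock);
    while (!h->ready)
        pthread_cond_wait(&h->cond, &h->lock);
    h->ready = false;
    pthread_mutex_unlock(&h->lock);
}
```

The waiter here does not care which thread or process the scheduler picks
next, so it does not depend on how any particular system implements yield.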

> On Mon, Sep 18, 2006 at 04:37:37PM +0100, Steve O'Hara-Smith wrote:
> > I finally got fed up enough with this to try things I don't
> > really understand. I found that adding a usleep(10000); before the only
> > call to __thread_yield() - which just calls sched_yield() - restored the
> > performance of growisofs to normal (and yes the DVDs burnt with it
> > still work fine). In more detail I turned this loop:
>
> I'm fully aware of it, but the issue is that it changes the original
> intention of the code a lot. I never got around to implementing an adaptive
> loop.

I had a strong suspicion that it wasn't an ideal solution. OTOH it
does the job of making the application work and follows an old lesson that
I recall, namely that a small sleep often improves the performance of a
wait loop.

> The userland scheduler of pthread sees that it has no other runnable
> thread and continues the execution. FreeBSD or Linux go to the kernel
> and the kernel scheduler most likely decides to switch to a different
> process. But in principle, it should show the same behaviour.

I have done some searching and I can't find any reports of similar
performance issues on other OSs, so presumably the trip through the kernel
on other OSs is somehow reducing this problem to an acceptable level. That
being said, putting in a usleep(10) instead of the usleep(10000) only
doubled the write speed and left the CPU consumption at 75%, so it seems to
need more than just a trip through kernel space to make it work.