Re: lkwt in DragonFly

:> Best case is as I outlined above... no critical section or bus locked
:> operation is required at all, just an interlock against an IPI on
:> the current cpu and that can be done by setting a field in
:> the globaldata structure.
:
: That is of course assuming that your token protects your data
: from other threads on other CPUs, and not local interrupts as well?
: (I'm assuming you'll be masking out interrupts for this).

The 'other cpus' case is handled by the fact that the current cpu must
give away a token that it owns before another cpu can mess with the token.
So in the case where tok->cpu == mycpu, the interlock is sufficient.

Interrupt masking (spl*() code) is an issue I have to think about some
more. The 4.x code is already fairly well protected with SPLs since
4.x interrupts run in the context of the interrupted thread, so
preemptive switching in DragonFly to service an interrupt has inherited
that feature and does not generally have to worry about recursive lock
attempts. But, still, it's a weakness in the design that should be
addressed lest future programmers make mistaken assumptions.
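As a rough illustration of the token ownership rule described above (a cpu must give a token away before another cpu can touch it, so a plain per-cpu store is enough of an interlock when tok->cpu == mycpu), here is a minimal single-threaded sketch in C. All of the names in it (struct globaldata fields, token_try_local, token_release_local, token_ipi_request) are hypothetical and chosen only for the example; this is not the DragonFly implementation, and a real kernel would also need compiler barriers and actual IPI delivery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu state, standing in for the globaldata structure. */
struct globaldata {
    int   cpuid;
    void *ipi_interlock;    /* token the local cpu is currently working on */
};

/* Hypothetical token: 'cpu' is the id of the cpu that currently owns it. */
struct token {
    int cpu;
};

/*
 * Fast path: use a token that this cpu already owns.  Only a plain store
 * to per-cpu data is needed; no bus-locked instruction is involved.
 * Returns false if another cpu owns the token (the slow path, asking the
 * owner to hand it over, is not modelled here).
 */
static bool
token_try_local(struct globaldata *gd, struct token *tok)
{
    bool owned;

    gd->ipi_interlock = tok;            /* interlock against a local IPI */
    owned = (tok->cpu == gd->cpuid);
    if (!owned)
        gd->ipi_interlock = NULL;       /* not ours, drop the interlock */
    return owned;
}

/* Drop the interlock once the caller is done with the token. */
static void
token_release_local(struct globaldata *gd)
{
    gd->ipi_interlock = NULL;
}

/*
 * What an IPI handler running on the owning cpu might do when another cpu
 * asks for the token: hand it over only if it is not interlocked.
 */
static bool
token_ipi_request(struct globaldata *owner, struct token *tok, int newcpu)
{
    if (tok->cpu != owner->cpuid)
        return false;                   /* we are not the owner */
    if (owner->ipi_interlock == tok)
        return false;                   /* in use locally, request must wait */
    tok->cpu = newcpu;                  /* give the token away */
    return true;
}
```

In this model a request from cpu 1 is refused while cpu 0 holds the interlock and succeeds once cpu 0 has called token_release_local(), after which cpu 0's own fast path fails because tok->cpu no longer matches.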

: It should be theoretically possible to disable CPU migration with a
: simple interlock even on FreeBSD 5.x. That, along with masking out
: interrupts, could be used to protect pcpu data without the use of
: bus-locked instructions.
:
: However - and this may not be the case for DragonFly - I have
: previously noticed that when I ripped out the PCPU mutexes
: protecting my pcpu uma_keg structures (just basic per-cpu
: structures), thereby replacing the xchgs on the pcpu mutex with
: critical sections, performance in general on a UP machine decreased
: by about 8%. I can only assume at this point that the pessimisation
: is due to the cost of interrupt unpending when compared to the nasty
: scheduling requirements for an xchg (a large and mostly empty
: pipeline).
:...
:Bosko Milekic * bmilekic@xxxxxxxxxxxxxxxx * bmilekic@xxxxxxxxxxx
:TECHNOkRATIS Consulting Services * http://www.technokratis.com/

It would only add cost to unpending if an interrupt occurred
while the critical section was active. This implies that the
critical section is being held for too long a period of time. I
would expect this since the UMA code rightfully assumes that it can
hold the pcpu uma mutex for as long as it wants without penalty.

The DragonFly slab allocator does not need to use a mutex or token
at all and only uses a critical section for very, very short periods
of time within the code. I have suggested that Jeff recode UMA to
remove the UMA per-cpu mutex on several occasions.

You should also check the critical section API overhead in FreeBSD-5.
If it is a subroutine call and if it actually disables interrupts
physically, the overhead is going to be horrendous... probably similar
to the overhead of a mutex (sti and cli are very expensive instructions).
In DFly the critical section code is only two or three 'normal' inlined
instructions and does not physically disable interrupts.
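To make the cost argument concrete, here is a minimal sketch in C of a critical section that is just an inlined counter and never touches the hardware interrupt flag. An interrupt that arrives while the counter is non-zero is only marked pending and is run on exit, so the unpending cost is paid only if an interrupt actually occurred inside the section. The structure, the field names and the interrupt_arrives() helper are assumptions made up for this example; it models the idea in a single thread and is not the actual DragonFly code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state; field names are illustrative only. */
struct thread {
    int  crit_count;    /* nesting depth of critical sections */
    bool int_pending;   /* an interrupt arrived while crit_count > 0 */
    int  ints_run;      /* how many times the handler ran (demo only) */
};

/* Stand-in for the real interrupt handler. */
static void
run_interrupt(struct thread *td)
{
    td->ints_run++;
}

/* Entering is a single increment: no cli, no bus-locked instruction. */
static inline void
crit_enter(struct thread *td)
{
    td->crit_count++;
}

/*
 * Leaving is a decrement and a test.  The deferred interrupt is run only
 * if one was marked pending while the section was held.
 */
static inline void
crit_exit(struct thread *td)
{
    assert(td->crit_count > 0);
    if (--td->crit_count == 0 && td->int_pending) {
        td->int_pending = false;
        run_interrupt(td);
    }
}

/* Simulated interrupt delivery: defer inside a critical section. */
static void
interrupt_arrives(struct thread *td)
{
    if (td->crit_count > 0)
        td->int_pending = true;     /* just mark it, run it at crit_exit() */
    else
        run_interrupt(td);          /* not in a critical section, run now */
}
```

An empty crit_enter()/crit_exit() pair costs only the increment, decrement and test; the handler runs from crit_exit() only when interrupt_arrives() was called in between.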

-Matt
Matthew Dillon
<dillon@xxxxxxxxxxxxx>