Spinlocks Part II

SPINLOCKS, Part II

Simultaneous spinlocks

Now after you get into this stuff, you may find that you have to have more
than one spinlock at a time. This can lead to problems. Consider this:

Routine A:

lock spinlock X
maybe do some work
lock spinlock Y
do some more work
unlock spinlock Y
unlock spinlock X

And Routine B:

lock spinlock Y
maybe do some work
lock spinlock X
do some more work
unlock spinlock X
unlock spinlock Y

So CPU #1 comes along and starts routine A, while CPU #2 starts routine B. Well, they'll
never finish, because CPU #1 will be waiting for CPU #2 to release spinlock Y and CPU #2
will be waiting for CPU #1 to release spinlock X.

So we have another simple rule: When locking more than one spinlock, they must always be locked in the same order.

So for our example, both routines need to be changed so that they either both lock X then
Y or they both lock Y then X. You may be thinking, 'I might not be able to do that in ALL
cases!' Easy to fix, replace the two spinlocks with one, say Z.

Now I am terrible at checking to make sure I do everything in order. Computers are good
at that sort of thing. So rather than just using an 'int' for my spinlocks, I use a struct
as follows:

Then I have a spinlock_wait routine that checks to make sure my new spinlock's level is .gt.
the last spinlock's level that was locked by this cpu. If I try to do it out of order, the
routine panics/BSOD's on the spot.

Interrupts

Another question you may wonder, must I always inhibit hardware interrupt delivery when I
have a spinlock?

No. Only if the data being worked with can also be accessed by an interrupt routine. And
then, you only have to disable those interrupts that have access to it.

For example, I don't allow my interrupt routines to do malloc's. So the malloc routine can
run with interrupts enabled, even when it has the spinlock. Same with pagetables, disk cache
pages, and other stuff like that.

You must, however, block any pre-emptive scheduling while you have a any spinlock. Or at
least, you will make things *very* complicated if you don't.

So I ended up with several 'classes' of spinlocks:

low priority - these run with all hardware interrupt delivery enabled

interrupt level - there is one per IRQ level. IRQ 0 is the highest, IRQ 7 is the lowest when one of these is set on a cpu, it inhibits that IRQ and any lower priority interrupts

high priority - these run with all hardware interrupt delivery inhibited

So if a cpu has one of the low priority spinlocks, it is running with hardware interrupt delivery
enabled (but the pre-emptive scheduler is disabled).

There is one spinlock per interrupt level. For example, the keyboard driver uses IRQ 1. So whenever the
keyboard interrupt routine is active, I have the cpu take the IRQ 1 spinlock. Whenever the main keyboard
driver routine is accessing data that the interrupt routine would access, I take the IRQ 1 spinlock. So
it in essence acts as a 'inhibit interrupt delivery' mechanism that works across cpu's in an SMP system.

The high-priority interrupts block all interrupt delivery on the cpu. These are used for routines (like
wake_a_thread) that need to be callable from any interrupt level.

Performance

This is a big thing about spinlocks. It's easy to make your performance go to pot with
poor use of spinlocks.

Consider that you can *theoretically* use just *one* spinlock, period. Whenever you enter
kernel mode, inhibit all interrupt delivery and set your one-and-only spinlock. When you
leave kernel mode, release the spinlock then enable interrupt delivery. Great!

Well, you can imagine then, that what you end up with is having just one cpu doing anything
useful at a time in kernel mode, while the other(s) just grind away spinning. Really bad!

So what you do is try to come up with as many spinlocks as you can. Above I mentioned that
you must combine X and Y if you have a case where you must set X then Y and another case where
you must set Y then X. So now you have two limits to work with:

Too few spinlocks and you have poor performance

Too many spinlocks and you violate the ordering

Here's where the creativity comes in, working with those two limits to
maximize performance. Strive to eliminate as much spinning as possible.

How does one measure spinlock performance? Make an array of counters like this:

static int spin_counts[256];

Then increment spin_counts[spinlock_level] each time the 'test_and_set' fails. After a while
of running stuff, find which spinlock_level has big counts (compared to the others). Try to
break that smplock down into smaller ones if you can figure out how to.

Summary

They may seem sort of nebulous to begin with. But after you work with them, application
becomes very scientific. I hope this little tutorial has helped clear up some issues and
gives you confidence.