mutexes, locks and so on...

I've been looking through the NetBSD mutex and lock implementations over
the last few days, with the intent of improving performance on the VAX,
which has suffered somewhat under NetBSD (maybe other platforms have as
well, but they might not feel the pain that much :-) ).

While doing this, I have stumbled upon a few questions, as well as some
general observations.

A few observations first.

The mutex implementation in place now is nice in many ways, as it is
rather open to different implementations based on what the hardware can
do. However, I found that only one platform (hppa) is currently using
this. All others rely on the __HAVE_SIMPLE_MUTEXES implementation, which
utilizes a CAS function. Obviously the VAX does not have a CAS, and it is
rather costly to simulate it, so I'm working on getting away from this.
(Do all other platforms really have a CAS?)
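
To make the difference concrete, here is a rough sketch of the two
styles. This is not NetBSD code: the struct and function names are made
up for illustration, and C11 atomics stand in for whatever the hardware
provides. The first variant needs a CAS on the owner word; the second
only needs an interlocked test-and-set of a single bit, which is the
kind of primitive the VAX does offer (something like BBSSI/BBCCI).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* CAS style: the owner lives in the lock word itself (0 = unowned). */
struct cas_mutex {
	_Atomic uintptr_t owner;
};

static void cas_mutex_init(struct cas_mutex *m)
{
	atomic_init(&m->owner, (uintptr_t)0);
}

static bool cas_mutex_tryenter(struct cas_mutex *m, uintptr_t self)
{
	uintptr_t expected = 0;

	/* Succeeds only if the word still holds 0. */
	return atomic_compare_exchange_strong(&m->owner, &expected, self);
}

static void cas_mutex_exit(struct cas_mutex *m, uintptr_t self)
{
	uintptr_t expected = self;
	bool ok = atomic_compare_exchange_strong(&m->owner, &expected,
	    (uintptr_t)0);

	assert(ok);	/* releasing a mutex we do not own is a bug */
	(void)ok;
}

/*
 * Test-and-set style: one interlocked bit guards the lock, and the
 * owner is an ordinary field written only while the bit is held.
 */
struct tas_mutex {
	atomic_flag locked;
	uintptr_t owner;
};

static void tas_mutex_init(struct tas_mutex *m)
{
	atomic_flag_clear(&m->locked);
	m->owner = 0;
}

static bool tas_mutex_tryenter(struct tas_mutex *m, uintptr_t self)
{
	/* test_and_set returns the previous value: true means busy. */
	if (atomic_flag_test_and_set_explicit(&m->locked,
	    memory_order_acquire))
		return false;
	m->owner = self;
	return true;
}

static void tas_mutex_exit(struct tas_mutex *m, uintptr_t self)
{
	assert(m->owner == self);
	m->owner = 0;
	atomic_flag_clear_explicit(&m->locked, memory_order_release);
}
```

Both variants only show the uncontended try/release path. A real
implementation also has to deal with waiters and sleeping, which is
where the CAS-based scheme packs extra state into the owner word.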

What would be nice here is if it had been possible to inline the
mutex_lock and mutex_unlock (as well as their spin equivalents)
functions, but I found out (the hard way) that this is not possible.

I also found out that mutex spin locks are more or less equivalent to
the old scheme of using splraise() and splx(). However, I noticed one
big difference. In the old days, splraise() and splx() were always
handled as a pair. You did not do:

    a = splraise(..)
    b = splraise(..)
    splx(a)
    splx(b)

which would have been very broken.

With mutex_spin, you instead store the original spl at the first
mutex_spin_enter, and later calls to mutex_spin_enter can only raise
the ipl further. At mutex_spin_exit, we do not lower the spl again
until the final mutex_spin_exit, which resets the spl to the value it
had before any mutex was held.

This causes slightly different behaviour, as the spl may remain very
high even though you are only holding a low-ipl mutex. While it
obviously doesn't cause the system to fail, it can introduce delays
which might not be necessary, and could in theory cause interrupts to
be dropped that needn't be.
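
A small model of the behaviour described above, as I understand it. This
is a simulation only, not the kernel code: struct cpu_model and the
model_* functions are names I made up, and the "ipl" is just an integer
in a struct.

```c
#include <assert.h>

/* Hypothetical per-CPU state, for illustration only. */
struct cpu_model {
	int cur_ipl;	/* current interrupt priority level */
	int mtx_count;	/* number of spin mutexes currently held */
	int mtx_oldipl;	/* ipl before the first spin mutex was taken */
};

/* Raise the ipl if needed (never lowers it); return the previous level. */
static int model_splraise(struct cpu_model *ci, int ipl)
{
	int old = ci->cur_ipl;

	if (ipl > ci->cur_ipl)
		ci->cur_ipl = ipl;
	return old;
}

static void model_splx(struct cpu_model *ci, int ipl)
{
	ci->cur_ipl = ipl;
}

static void model_spin_enter(struct cpu_model *ci, int mutex_ipl)
{
	int old = model_splraise(ci, mutex_ipl);

	/* Only the outermost enter remembers the level to restore. */
	if (ci->mtx_count++ == 0)
		ci->mtx_oldipl = old;
}

static void model_spin_exit(struct cpu_model *ci)
{
	assert(ci->mtx_count > 0);

	/* The ipl is lowered only when the last spin mutex is released. */
	if (--ci->mtx_count == 0)
		model_splx(ci, ci->mtx_oldipl);
}
```

Starting from ipl 0, entering a mutex at ipl 2 and then one at ipl 6
leaves the model at 6. Exiting the ipl 6 mutex leaves it at 6 as well,
even though only the ipl 2 mutex is still held. It drops back to 0 only
on the final exit.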

Is this a conscious design? Do we not expect/enforce mutexes to be
released in the reverse order they were acquired?

Moving on to locks:

The simple lock defined in lock.h is easy enough, and I haven't found
any problems with it. The rwlock, however, is written with the explicit
assumption that there is a CAS function. It would be great if that code
were a little more like the mutex code, so that alternative
implementations could be done for architectures where you'd like to do
it in other ways. Is the reason for this not being the case just an
oversight, a lack of time and resources, or is there some underlying
reason why this could not be done?
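
As an illustration of what such an alternative could look like, here is
a minimal sketch of a reader/writer lock that needs no CAS, only a
test-and-set interlock guarding ordinary fields. Everything here is
hypothetical (the tas_rwlock names are mine, and C11 atomic_flag stands
in for the hardware primitive). It covers only try-enter and exit, with
no sleeping or waiter handling, so it is nowhere near a replacement for
the real rwlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct tas_rwlock {
	atomic_flag interlock;	/* protects the two fields below */
	int readers;		/* number of read holders */
	bool writer;		/* true while write-held */
};

static void tas_rw_init(struct tas_rwlock *rw)
{
	atomic_flag_clear(&rw->interlock);
	rw->readers = 0;
	rw->writer = false;
}

static void tas_rw_interlock_acquire(struct tas_rwlock *rw)
{
	/* Spin until the previous value of the flag was clear. */
	while (atomic_flag_test_and_set_explicit(&rw->interlock,
	    memory_order_acquire))
		continue;
}

static void tas_rw_interlock_release(struct tas_rwlock *rw)
{
	atomic_flag_clear_explicit(&rw->interlock, memory_order_release);
}

static bool tas_rw_tryenter(struct tas_rwlock *rw, bool want_write)
{
	bool ok;

	tas_rw_interlock_acquire(rw);
	if (want_write) {
		/* A writer needs the lock completely free. */
		ok = !rw->writer && rw->readers == 0;
		if (ok)
			rw->writer = true;
	} else {
		/* Readers can share as long as nobody is writing. */
		ok = !rw->writer;
		if (ok)
			rw->readers++;
	}
	tas_rw_interlock_release(rw);
	return ok;
}

static void tas_rw_exit(struct tas_rwlock *rw)
{
	tas_rw_interlock_acquire(rw);
	if (rw->writer) {
		rw->writer = false;
	} else {
		assert(rw->readers > 0);
		rw->readers--;
	}
	tas_rw_interlock_release(rw);
}
```

The obvious cost is that every operation takes the interlock, so
readers serialize briefly against each other. Whether that is cheaper
than an emulated CAS would have to be measured.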

Also, there are a few places in the kernel code where atomic_cas is
called explicitly. Wouldn't it be better if such code were abstracted
out, so we didn't depend on specific CPU instructions in the MI code?

Well, just a few thoughts, and a question for all of you... I'll happily
listen to your wisdom as I try to get my alternative locking on the VAX
running...