CPU Idling Driver for Solaris 2.5.1 (i386)

Continuing my messing around with Solaris 2.5.1, I began exploring the
reasons behind the VM's constant CPU usage. The reason, as you may
already suspect, is that Solaris never yields the CPU when it idles.
I've written a driver to resolve this, which can be found on my
GitHub. I've also made a binary package.

It's intriguing that, by default on a uniprocessor system, Solaris's
idle thread simply calls an empty subroutine to idle the CPU. Perhaps
platform drivers were expected to handle this in a platform-specific
way. Who knows? Anyhow, here's the pertinent part of the idle thread:

If the system contains multiple processors, the idle thread instead calls
the set_idlecpu and unset_idlecpu callbacks provided by the PSM
driver. This doesn't help either, because the pcplusmp module provides
no implementation for either callback.

This means that even when the kernel intends the CPU to be "idle", it's
effectively spinning on the dispatcher queue count (the number of
runnable threads in the queue). The end result is that Solaris hogs a
great deal of CPU time on the host processor. This degrades
performance, especially on systems with Hyper-Threading, as does the way
Solaris implements locking, but that's a topic for another post. :P
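To make the problem concrete, here is a minimal, simplified C model of the idle-thread behavior described above. It is not the Solaris source: every name in it (disp_nrunnable, ncpus, psm_set_idlecpu, psm_unset_idlecpu, tick, wake_after) is hypothetical, and tick() stands in for an interrupt that eventually makes a thread runnable. What it illustrates is that both paths keep going around the loop on the runnable count, and no iteration ever actually waits.

```c
/* Hypothetical stand-ins for kernel state; names are illustrative only. */
static volatile int disp_nrunnable;   /* runnable threads on the dispatch queue */
static int ncpus = 1;

static void generic_idle_cpu(void) { }                  /* empty subroutine */
static void psm_set_idlecpu(int cpu) { (void)cpu; }     /* pcplusmp: no-op */
static void psm_unset_idlecpu(int cpu) { (void)cpu; }   /* pcplusmp: no-op */
static void (*idle_cpu)(void) = generic_idle_cpu;

/* Simulation only: after wake_after iterations, pretend an interrupt
   made a thread runnable. */
static long wake_after;
static long spins;

static void tick(void)
{
    if (++spins >= wake_after)
        disp_nrunnable = 1;
}

/* Returns how many times the loop went around before work arrived. */
long idle_loop(int cpu)
{
    spins = 0;
    disp_nrunnable = 0;

    if (ncpus == 1) {
        while (disp_nrunnable == 0) {
            idle_cpu();          /* returns immediately: nothing yields */
            tick();
        }
    } else {
        psm_set_idlecpu(cpu);    /* no-op */
        while (disp_nrunnable == 0)
            tick();              /* pure spin on the queue count */
        psm_unset_idlecpu(cpu);  /* no-op */
    }
    return spins;
}
```

With a wake-up after 1000 iterations, idle_loop() returns 1000 on either path: the CPU went around the loop a thousand times instead of sleeping.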

Solving this problem for a single processor is simple. Fortunately,
Solaris exports the idle_cpu symbol, which means it can be replaced
from within a kernel driver, and that's precisely what I did. The
difference is the pushf ; sti ; hlt ; popf sequence in my routine,
which translates to:

Save EFLAGS.

Ensure interrupts are enabled.

Halt until the next interrupt.

Restore EFLAGS.
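A minimal sketch of the single-processor fix, assuming GCC-style inline assembly and a kernel build that defines _KERNEL. In the real driver, idle_cpu is the hook exported by the kernel; here it is declared locally (with a hypothetical empty default) so the sketch compiles on its own. idlehook_install() and idlehook_remove() are hypothetical helpers that a loadable module would presumably call from its _init() and _fini() entry points. Outside a kernel build, sti and hlt are privileged, so the halt routine is replaced by a counting stub.

```c
/* Hypothetical local stand-in for the kernel's exported idle hook; in the
   real driver, idle_cpu comes from the kernel and is not defined here. */
static void generic_idle_cpu(void) { }
void (*idle_cpu)(void) = generic_idle_cpu;

static void (*saved_idle_cpu)(void);   /* previous hook, restored on unload */

#if defined(_KERNEL) && defined(__GNUC__)
static void halt_idle_cpu(void)
{
    /* Save EFLAGS, enable interrupts, halt until the next interrupt,
       then restore EFLAGS (including the caller's interrupt flag). */
    __asm__ volatile("pushf\n\tsti\n\thlt\n\tpopf" : : : "memory", "cc");
}
#else
/* User-space stand-in: sti and hlt are privileged, so just count calls. */
static int halt_count;
static void halt_idle_cpu(void) { halt_count++; }
#endif

/* Hypothetical helpers, e.g. called from the module's _init() and _fini(). */
void idlehook_install(void)
{
    saved_idle_cpu = idle_cpu;
    idle_cpu = halt_idle_cpu;
}

void idlehook_remove(void)
{
    if (saved_idle_cpu != 0) {
        idle_cpu = saved_idle_cpu;
        saved_idle_cpu = 0;
    }
}
```

Because popf restores the saved EFLAGS, the interrupt flag goes back to whatever state the caller had, rather than being left enabled after the halt.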

This way, when the CPU gets an interrupt, it returns to the idle
thread after servicing it, and halts again whenever it should be idle.
Conveniently, Solaris uses the processor's local APIC to drive the
system clock (or the PIT if the system lacks a local APIC), so you
know beforehand that the CPU will wake up at some point. Otherwise,
the kernel would simply appear to hang the first time it idled the CPU.

Solving this for multiple processors is a bit trickier, but we'll
assume that if you have multiple processors, each one has its own
local APIC. For one, we can't just use poke_cpu(), because it causes the
APIC to raise the same interrupt as the system clock. Since the clock
ISR calls the dispatcher, doing so results in a race condition between a
processor holding the dispatcher lock and another processor trying to
switch to a runnable thread (which also wants to hold the dispatcher lock).

Besides, as far as I can tell, the dispatcher never calls
poke_cpu(), and the unset_idlecpu() callback appears to be called
only from the idle thread. If so, then either the unset_idlecpu()
callback is useless in the first place, or somebody forgot to use it
in the dispatcher code. :P

The one saving grace of this whole thing is that pcplusmp only uses the
APIC timer on the boot processor to feed the clock ISR. The local APIC
timers on all the other CPUs are ripe for the picking, and can be used
to make sure those CPUs wake up every now and again.

So what the driver does in this case is:

Save EFLAGS and disable interrupts.

If the local APIC timer is not set to periodic mode, we know that
this CPU is not the boot processor, so we set up a one-shot timer.

Re-enable interrupts and halt, just as we would on a single-CPU system.

Mask the timer we just set up, so it doesn't fire again.

Restore EFLAGS.
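The multiprocessor steps can be sketched the same way. This is an illustration, not the driver's code: it assumes a memory-mapped local APIC reachable through a hypothetical apic_base pointer, and the wake-up vector IDLE_WAKE_VECTOR and count IDLE_WAKE_COUNT are made-up values (the vector would need a handler that simply acknowledges the interrupt). The register offsets and bit positions (LVT timer at 0x320, initial count at 0x380, mask bit 16, periodic bit 17) follow Intel's local APIC layout. Arming the timer with interrupts disabled matters: sti delays interrupt recognition until after the following instruction, so a timer that fires early is still pending when hlt executes and wakes the CPU right away instead of being missed.

```c
#include <stdint.h>

/* Local APIC register offsets and LVT timer bits (Intel SDM, vol. 3). */
#define APIC_LVT_TIMER  0x320u
#define APIC_TIMER_ICR  0x380u          /* timer initial-count register */
#define LVT_MASKED      (1u << 16)
#define LVT_PERIODIC    (1u << 17)

/* Hypothetical values: a vector whose handler just sends EOI, and an
   arbitrary one-shot count. */
#define IDLE_WAKE_VECTOR 0xF0u
#define IDLE_WAKE_COUNT  1000000u

/* Hypothetical: mapping of this CPU's local APIC, set up by the driver. */
static volatile uint32_t *apic_base;

static uint32_t apic_read(uint32_t reg) { return apic_base[reg / 4]; }
static void apic_write(uint32_t reg, uint32_t val) { apic_base[reg / 4] = val; }

#if defined(_KERNEL) && defined(__GNUC__)
static unsigned long save_flags_cli(void)
{
    unsigned long flags;
    __asm__ volatile("pushf\n\tpop %0\n\tcli" : "=r"(flags) : : "memory");
    return flags;
}
static void sti_hlt(void) { __asm__ volatile("sti\n\thlt" : : : "memory"); }
static void restore_flags(unsigned long flags)
{
    __asm__ volatile("push %0\n\tpopf" : : "r"(flags) : "memory", "cc");
}
#else
/* User-space stand-ins: cli/sti/hlt are privileged, so just count halts. */
static int halts;
static unsigned long save_flags_cli(void) { return 0; }
static void sti_hlt(void) { halts++; }
static void restore_flags(unsigned long flags) { (void)flags; }
#endif

void mp_idle_cpu(void)
{
    unsigned long flags = save_flags_cli();  /* save EFLAGS, disable interrupts */
    int armed = 0;

    /* A periodic timer means this is the boot CPU driving the clock ISR. */
    if (!(apic_read(APIC_LVT_TIMER) & LVT_PERIODIC)) {
        apic_write(APIC_LVT_TIMER, IDLE_WAKE_VECTOR);  /* one-shot, unmasked */
        apic_write(APIC_TIMER_ICR, IDLE_WAKE_COUNT);   /* start counting down */
        armed = 1;
    }

    sti_hlt();                               /* enable interrupts and halt */

    if (armed) {
        /* Mask the timer and stop it so it doesn't fire again. */
        apic_write(APIC_LVT_TIMER, LVT_MASKED | IDLE_WAKE_VECTOR);
        apic_write(APIC_TIMER_ICR, 0);
    }

    restore_flags(flags);                    /* restore EFLAGS */
}
```

Writing 0 to the initial-count register stops the APIC timer, which is why the sketch clears it in addition to setting the mask bit; the boot processor's periodic clock timer is never touched.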

Pretty simple, I'd say, and it works nicely with my VM to boot. Speaking
of boot, my guest VM boots in one-third of the time with this driver
installed, which only further illustrates the performance penalty on
processors with Hyper-Threading, like my Core i7.