All sorts of information is now stored directly in
the mbuf header instead of in a separate mbuf tag. This
brings roughly a 100% performance increase in comparison
to OpenBSD 4.1. For DragonFly this basically means
performance is the same as in 2.6, but we are back in
sync with OpenBSD's pf data structures.
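
Roughly, the difference looks like this (structure and flag names below
are approximations of the pf/mbuf definitions, not exact identifiers):
the old path walked the mbuf tag chain for every packet, while the new
layout is a plain member of the packet header.

    /* Old: pf metadata hangs off the mbuf tag chain and has to be
     * looked up (and originally allocated) for every packet. */
    struct m_tag *mtag = m_tag_find(m, PACKET_TAG_PF, NULL);

    /* New: the pf tag/flags/rtableid live directly in the packet
     * header, so reaching them is a plain structure dereference. */
    m->m_pkthdr.pf.flags |= PF_TAG_GENERATED;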

* Remove the Forth interpreter from the build. The last straw was when
I tried to fix the module path and six hours later still couldn't get
it right.

* Write a C-based menu system and loader, replacing what the Forth code
used to do. This is about 85% complete (TFTP and boot chaining are not
yet handled). This took exactly one day to do, by the way.

* Reformulate installkernel so that it now creates a directory
/boot/kernel.blah and places the kernel and modules inside that
directory.

* The low level cninit() code dives into dev/video and does a ton
of tty_token acquisitions and releases, and even after the bug
fixes there is still something weird going on in there.

As a workaround, wrap a master tty_token around the cninit() code,
which prevents an early-boot crash when lwkt_gettoken()'s td_mpcount
optimization is turned off (the optimization masks the problem).
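
The workaround boils down to something like this (simplified; the
actual call site sits in the early boot path):

    lwkt_gettoken(&tty_token);      /* master token held across all the */
    cninit();                       /* nested dev/video acquisitions    */
    lwkt_reltoken(&tty_token);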

* Also assert that the mpcount remains correct after the mess is done
initializing; if it is wrong, the SMP/AP boot will blow up on us.

* Separate out td_mpcount into td_xpcount and td_mpcount. td_xpcount
is an inherited mpcount. A preempting thread inherits the mpcount
on the thread being preempted until it switches out to guarantee
that the mplock remains atomic through the preemption (as expected
by the poor thread that got preempted).
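
A minimal sketch of the idea, using simplified thread fields (the real
scheduler code does considerably more bookkeeping around the switch
itself):

    struct thread_sketch {
        int td_mpcount;     /* mplock holds acquired by this thread    */
        int td_xpcount;     /* mplock holds inherited while preempting */
    };

    /* Effective hold count while a preemption is in progress. */
    #define TD_MPHELD(td)   ((td)->td_mpcount + (td)->td_xpcount)

    static void
    preempt_inherit(struct thread_sketch *ntd, struct thread_sketch *otd)
    {
        /* The preempting thread inherits the victim's holds ... */
        ntd->td_xpcount = TD_MPHELD(otd);
    }

    static void
    preempt_return(struct thread_sketch *ntd)
    {
        /* ... and drops them when it switches back out, so the mplock
         * stays atomically held from otd's point of view. */
        ntd->td_xpcount = 0;
    }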

* Fix a serious but hard to reproduce bug in lwkt_gettoken(). This
function marks the token reference as being MPSAFE if td_mpcount
is non-zero, even when the token is not an MPSAFE token.

However, until this patch td_mpcount also included inherited mpcounts
(from one thread preempting another), and those inherited mpcounts could
go away if the thread blocks or switches, leaving the token unprotected.
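
In other words, the test may only look at the thread's own holds. A
sketch, with a hypothetical helper name for the marking step:

    /* td_mpcount no longer includes inherited holds (those now live in
     * td_xpcount), so this test can no longer be satisfied by a hold
     * that evaporates when the thread blocks or switches away. */
    if (td->td_mpcount != 0)
        mark_tokref_mpsafe(ref);        /* hypothetical helper */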

* Fix a less serious bug where a new token reference was being populated
prior to td_toks_stop being incremented, and where an existing token
reference was being depopulated after td_toks_stop was decremented.
Nothing can actually race us, but switch the index increment/decrement
around anyway to protect the slot being operated upon.
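
A simplified sketch of the corrected ordering (the real code operates
on the lwkt_tokref array embedded in the thread structure):

    lwkt_tokref_t ref;

    /* Acquire: reserve the slot before writing it, so a nested
     * acquisition (e.g. from an interrupt) cannot pick the same slot. */
    ref = td->td_toks_stop;
    ++td->td_toks_stop;
    cpu_ccfence();              /* keep the compiler from reordering */
    ref->tr_tok = tok;

    /* Release: clear the reference while the slot is still reserved,
     * then shrink the active range. */
    ref->tr_tok = NULL;
    cpu_ccfence();
    --td->td_toks_stop;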

* Add a ton of assertions in the interrupt, trap, and syscall paths
to assert that the mplock, number of tokens, and critcount remain
unchanged across driver and other calls.
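
The assertions all follow the same save/compare pattern; a sketch of
one wrapped around a driver interrupt handler (the handler call is
just a placeholder):

    int           save_crit = td->td_critcount;
    lwkt_tokref_t save_toks = td->td_toks_stop;
    int           save_mp   = td->td_mpcount;

    handler->func(handler->arg);    /* placeholder for the driver call */

    KKASSERT(td->td_critcount == save_crit);
    KKASSERT(td->td_toks_stop == save_toks);
    KKASSERT(td->td_mpcount == save_mp);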

* Slightly rework the initial exponential backoff and test. Do the
atomic swap after the exponential backoff instead of before so
we do not add a superfluous backoff after actually acquiring
the spinlock.
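
A sketch of the reworked contested loop, assuming a simple
exchange-based spinlock (the real spinlock path has additional
heuristics):

    static void
    spin_lock_contested_sketch(volatile u_int *lk)
    {
        int backoff = 1;
        int i;

        for (;;) {
            /* back off FIRST ... */
            for (i = 0; i < backoff; ++i)
                cpu_pause();
            if (backoff < 1024)
                backoff <<= 1;

            /* ... THEN attempt the acquisition, so a successful swap
             * is not followed by a pointless extra delay. */
            if (atomic_cmpset_int(lk, 0, 1))
                return;
        }
    }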

* This isn't expected to have much of an effect on performance, and I
want to get rid of shared spinlocks. If it becomes an issue for
descriptor lookups we could move to a spinless lookup/hold model using
deferred frees.

MPSAFE TTY - Refactor the keyboard switch code to make all drivers MPSAFE

* Add a per-kbd lock.

* Replace the keyboard kbd_*() macros with function wrappers which acquire
and release the per-kbd lock on behalf of each keyboard driver. The
wrapper code also understands polled mode and will not acquire/release
locks in polled mode (aka debugger).
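
Each wrapper follows roughly this shape (the lock, ops-table, and
polled-mode helper names below are illustrative, not the exact
identifiers):

    static int
    kbd_read_wrap(keyboard_t *kbd, int wait)
    {
        int ret;

        if (kbd_is_polled(kbd))                 /* debugger / polled    */
            return kbd_ops(kbd)->read(kbd, wait);  /* mode: no locking  */

        kbd_lock(kbd);                          /* per-kbd lock */
        ret = kbd_ops(kbd)->read(kbd, wait);
        kbd_unlock(kbd);
        return ret;
    }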

* Have crit_exit() call an actual procedure instead of inlining it.
This doesn't seem to affect performance at all and it reduces the
size of the kernel noticeably.

Modern cpus heavily optimize call/return paths, and there may even be
advantages to the smaller code and branch-cache footprint. The
conditionals inside crit_exit() are nearly perfectly predicted now
that there is no differentiation between the N->(N-1) and 1->0 cases.

These functions create a hard code section that, like an interrupt or
IPI, does not allow anything which might potentially block or switch
threads. While in a hard code section any such operation will assert
and panic the system.

For example, acquiring a token that is not already held would be disallowed
even if the acquisition could be accomplished without blocking. However,
acquiring a token which is already held would be allowed. Same with the
mplock, lockmgr locks, etc. (mtx's and serializers have not been dealt
with yet).

* Introduce ASSERT_LWKT_TOKEN_HARD() and ASSERT_LWKT_TOKEN_CRIT().

These assert that a token is held and a hard critical section (hard)
or any critical section (crit) is in place.
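
Conceptually (the helper names here are placeholders; the real macros
use the kernel's own token and critcount accessors):

    #define ASSERT_LWKT_TOKEN_CRIT(tok) \
            KKASSERT(token_held_by_curthread(tok) && in_any_crit_section())

    #define ASSERT_LWKT_TOKEN_HARD(tok) \
            KKASSERT(token_held_by_curthread(tok) && in_hard_code_section())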

* Rework the critical section macros and optimize the crit_exit*() code
down to two conditionals which are almost always false, regardless of
whether critcount is transitioning 1->0 or not. Also declare
crit_panic() __dead2, which may produce better code.
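
A rough sketch of the exit path after the rework, with simplified
names; the key property is that neither branch is normally taken,
whichever transition the count is making:

    void
    crit_exit_sketch(thread_t td)
    {
        if (__predict_false(td->td_critcount <= 0))
            crit_panic();           /* __dead2: no return path emitted */
        --td->td_critcount;
        if (__predict_false(td->td_critcount == 0 && work_pending(td)))
            run_deferred_work(td);  /* hypothetical splz()-style flush */
    }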

* Rework get_mplock() to reduce code generation. The hard code section
assertions would have made it too big. We still optimize the case
where the mplock is already held.
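
The retained fast path looks roughly like this (simplified; the
contested path is a placeholder):

    static void get_mplock_contested_sketch(thread_t td); /* placeholder */

    void
    get_mplock_sketch(thread_t td)
    {
        /* Recursive acquisition: the thread already owns the mplock,
         * so just bump the per-thread count and skip the heavy path. */
        if (td->td_mpcount++ != 0)
            return;
        get_mplock_contested_sketch(td);
    }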

* badfo_kqfilter() must return an error to cause the kqueue
registration to drop the knote; otherwise a kernel panic will occur
because the default file_filtops is not replaced and has no detach
or event functions.
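
The fix is essentially this (the specific errno may differ in the
tree, but it has to be non-zero):

    static int
    badfo_kqfilter(struct file *fp, struct knote *kn)
    {
        /* A non-zero return makes kqueue drop the knote instead of
         * leaving it attached to the default file_filtops. */
        return (EOPNOTSUPP);
    }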

* Callbacks into the main kernel are not allowed while holding a
cothread lock, as this could cause a deadlock between cpus. If the
callback into the kernel blocks, the cothread lock, being a pthreads
lock, remains locked.

* Refactor the network and disk pipeline to hold the cothread lock for
a much shorter period of time, allowing data to be pipelined without
any stall conditions.
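
The shape of the refactored loops, with hypothetical queue helpers;
the only point being made is which work happens inside versus outside
the cothread lock:

    /* hold the cothread (pthreads) lock only around the fifo work */
    cothread_lock(cotd, 0);
    while ((m = fifo_dequeue(&sc->rxfifo)) != NULL)
        local_enqueue(&lq, m);          /* no callbacks into the kernel */
    cothread_unlock(cotd, 0);

    /* kernel-facing work happens after the lock has been released */
    while ((m = local_dequeue(&lq)) != NULL)
        vke_rx_input(sc, m);            /* hypothetical helper */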

* For vknet if the tx mbuf return fifo is full we wait until it isn't.