* Remove the rest of the LWKT fairq code. It may be added back in a different
form later. Go back to the strict priority model with round-robining
of same-priority LWKT threads.

Currently the model scans gd_tdrunq linearly to find the sorted insertion
point, which is probably too inefficient.
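
A minimal user-space sketch of strict-priority sorted insertion, not the
kernel code. The names struct thr, struct runq, and runq_insert() are
hypothetical stand-ins for the LWKT thread, gd_tdrunq, and the insertion
path. It assumes a higher number means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an LWKT thread; higher pri runs first. */
struct thr {
    int         pri;
    struct thr *next;
};

/* Hypothetical stand-in for the per-cpu run queue (gd_tdrunq). */
struct runq {
    struct thr *head;
};

/*
 * Insert td in priority order.  The scan skips every queued thread whose
 * priority is >= td's, so td lands behind its same-priority peers, which
 * yields round-robin order within one priority.  The cost is O(n) in the
 * queue length.
 */
static void
runq_insert(struct runq *q, struct thr *td)
{
    struct thr **pp;

    assert(q != NULL && td != NULL);
    for (pp = &q->head; *pp != NULL && (*pp)->pri >= td->pri;
         pp = &(*pp)->next)
        ;
    td->next = *pp;
    *pp = td;
}
```

Inserting priorities 5, 9, 5 in that order leaves the queue as 9, 5 (first),
5 (second).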

* Refactor the LWKT scheduler clock. The round-robining is now based on
the head of gd->gd_tdrunq, and the lwkt_schedulerclock() function is
responsible for moving that head thread. When a thread that is not at the
head is selected to run (because the head is contending on a token), the
round-robin code forces a reschedule on the next tick. As before, we never
reschedule-ahead the kernel scheduler helper thread or threads that have
already dropped to a user priority.
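
One plausible reading of the head-based round-robin, sketched in user space.
rr_tick() and struct thr are hypothetical names. The real
lwkt_schedulerclock() may differ, and the exemptions for the helper thread
and user-priority threads are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an LWKT thread on a priority-sorted queue. */
struct thr {
    int         pri;
    struct thr *next;
};

/*
 * Called once per scheduler tick.  If the head thread has a same-priority
 * peer directly behind it, move the head behind the last such peer and
 * return 1 to request a reschedule.  Otherwise leave the queue alone and
 * return 0.
 */
static int
rr_tick(struct thr **headp)
{
    struct thr *td;
    struct thr *last;

    assert(headp != NULL);
    td = *headp;
    if (td == NULL || td->next == NULL || td->next->pri != td->pri)
        return 0;
    for (last = td->next;
         last->next != NULL && last->next->pri == td->pri;
         last = last->next)
        ;
    *headp = td->next;      /* next same-priority thread becomes the head */
    td->next = last->next;  /* old head goes behind its last peer */
    last->next = td;
    return 1;
}
```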

* The token code now tries a little harder to acquire the token before
giving up, controllable with lwkt.token_spin and lwkt.token_delay
(token_spin is the number of times to try and token_delay is the delay
between tries, in nanoseconds).
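
A rough user-space sketch of a bounded try-with-delay loop of this kind.
struct token, token_try(), and token_acquire_spin() are hypothetical names,
and the spin and delay_ns parameters stand in for lwkt.token_spin and
lwkt.token_delay. nanosleep() is used only as a stand-in for whatever delay
primitive the kernel uses.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical token: 0 = free, 1 = held. */
struct token {
    atomic_int held;
};

/* Single non-blocking attempt; returns 1 on success, 0 if already held. */
static int
token_try(struct token *tok)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&tok->held, &expected, 1);
}

/*
 * Try up to 'spin' times, pausing 'delay_ns' nanoseconds between
 * attempts.  Returns 1 if the token was acquired, 0 if the caller
 * should give up (for example, by switching to another thread).
 */
static int
token_acquire_spin(struct token *tok, int spin, long delay_ns)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = delay_ns };
    int i;

    assert(spin > 0 && delay_ns >= 0 && delay_ns < 1000000000L);
    for (i = 0; i < spin; ++i) {
        if (token_try(tok))
            return 1;
        if (delay_ns > 0 && i + 1 < spin)
            nanosleep(&ts, NULL);
    }
    return 0;
}
```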

* Fix a serious bug in usched_bsd4.c which improperly reassigned the 'dd'
variable and caused the scheduler helper to monitor the wrong dd
structure.

* Refactor the vm_page coloring code. On SMP systems we now use the
coloring code to implement cpu localization when allocating pages.
The pages are still 'twisted' based on their physical address so both
functions are served, but cpu localization is now the more important
function.
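
This message does not give the actual color formula. The sketch below shows
only one way cpu localization could be layered on page colors: give each cpu
its own slice of the color space and spread pages within that slice.
cpu_local_color() and all of its parameters are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical: split 'ncolors' page-color queues evenly across 'ncpus'
 * and return the starting color for an allocation made on 'cpuid'.
 * 'pindex' spreads consecutive pages across that cpu's own slice, so
 * different cpus start their searches in disjoint queues.
 */
static unsigned
cpu_local_color(unsigned cpuid, unsigned ncpus, unsigned ncolors,
    unsigned long pindex)
{
    unsigned slice;

    assert(ncpus > 0 && ncolors >= ncpus && cpuid < ncpus);
    slice = ncolors / ncpus;
    return cpuid * slice + (unsigned)(pindex % slice);
}
```

With 256 colors and 4 cpus, cpu 1 would draw from colors 64 through 127.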

* Implement NON-OBJECT vm_page allocations. NULL may now be passed in place
of the VM object, which allocates a VM page unassociated with any VM
object. This will be used by the pmap code.

* Implement cpu localization for zalloc() and friends. This removes a major
contention point when handling concurrent VM faults. The only major
contention point left is the PQ_INACTIVE vm_page_queues[] queue.
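
A simplified illustration of why per-cpu localization removes contention,
not the zalloc() implementation: each cpu allocates from and frees to its own
list, so the fast path touches no shared state. NCPU, struct pcpu_zone,
pz_alloc(), and pz_free() are hypothetical, and a real allocator would fall
back to a shared pool when the local list is empty.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 4                      /* hypothetical cpu count */

struct item {
    struct item *next;              /* free-list link stored in the object */
};

struct pcpu_zone {
    struct item *free[NCPU];        /* one private free list per cpu */
};

/* Return an object to the free list of the cpu that is freeing it. */
static void
pz_free(struct pcpu_zone *z, unsigned cpu, void *p)
{
    struct item *it = p;

    assert(cpu < NCPU && p != NULL);
    it->next = z->free[cpu];
    z->free[cpu] = it;
}

/* Take an object from this cpu's list only; NULL if the list is empty. */
static void *
pz_alloc(struct pcpu_zone *z, unsigned cpu)
{
    struct item *it;

    assert(cpu < NCPU);
    it = z->free[cpu];
    if (it != NULL)
        z->free[cpu] = it->next;
    return it;
}
```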

* Temporarily remove the VM_ALLOC_ZERO request. This will probably be
reenabled in a later commit.