Avoid comparing negative signed values to positive unsigned values. This
led to a bug where the C-state did not decrease on sleeps shorter than the
declared transition latency. Fixing this deprecates the workaround for broken
C-states on some hardware.
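
The comparison problem can be sketched as follows. This is only an
illustration under assumptions: the function names are hypothetical, and
treating a negative sleep time as "short" is an assumed policy, not the
code from the commit.

```c
#include <assert.h>

/*
 * Buggy form: the signed sleep time is converted to unsigned before the
 * comparison (written out explicitly here; the compiler applies the same
 * conversion implicitly for "int < unsigned int"), so -1 becomes UINT_MAX.
 */
int sleep_is_short_buggy(int sleep_us, unsigned int latency_us)
{
    return (unsigned int)sleep_us < latency_us;
}

/* Fixed form: handle the negative case before comparing as unsigned. */
int sleep_is_short_fixed(int sleep_us, unsigned int latency_us)
{
    if (sleep_us < 0)
        return 1;   /* assumed policy: a negative sleep counts as short */
    return (unsigned int)sleep_us < latency_us;
}

/* Small self-check showing where the two forms differ. */
void sleep_compare_selfcheck(void)
{
    assert(sleep_is_short_buggy(-1, 100) == 0);  /* wrong: seen as long */
    assert(sleep_is_short_fixed(-1, 100) == 1);
    assert(sleep_is_short_fixed(50, 100) == 1);
    assert(sleep_is_short_fixed(200, 100) == 0);
}
```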

While here, change the state selection logic a bit. Instead of the last sleep
time, use a short-term average of it. The global interrupt rate in the system
is too random a value to correlate subsequent sleeps so directly.
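
A short-term average can be kept with a simple running blend. The sketch
below is an assumption about the general technique only; the function name
and the weight are hypothetical and are not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical smoothing step: blend the newest sleep time into a running
 * average, giving the new sample a share of 1/weight. The weight is an
 * assumption, not a value from the commit.
 */
uint32_t sleep_avg_update(uint32_t avg_us, uint32_t sample_us, uint32_t weight)
{
    assert(weight > 0);
    /* 64-bit intermediate so the multiplication cannot overflow. */
    return (uint32_t)(((uint64_t)avg_us * (weight - 1) + sample_us) / weight);
}
```

With a weight of 4, a single unusually short sleep only pulls the average
down by a quarter of the difference, so one stray interrupt does not
immediately change the selected state.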

Move the code to update cpu_cx_count out of acpi_cpu_generic_cx_probe().

Put it into acpi_cpu_startup() which is where all the other code to update
this global variable lives. This fixes a bug where cpu_cx_count was not
updated correctly if acpi_cpu_generic_cx_probe() returned early.

When nanosleep gets interrupted, it returns EINTR. In the case of a
non-zero error status, sys_nanosleep will copyout() the remaining sleep
time. However it would overwrite the nanosleep error status with the
error status of copyout() -- which is 0 (success) most of the time. This
means the important error status of nanosleep (EINTR) would be overwritten
by 0. Follow FreeBSD and NetBSD and only return the copyout status if it
failed.
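
The error handling described above can be sketched like this. The function
names are hypothetical and the copyout() result is passed in as a plain
parameter so the example stands alone; it is not the kernel code itself.

```c
#include <assert.h>
#include <errno.h>

/* Old behavior: the copyout status replaces the sleep status. */
int nanosleep_status_buggy(int sleep_error, int copyout_error)
{
    int error = sleep_error;

    assert(sleep_error >= 0 && copyout_error >= 0);
    if (error)
        error = copyout_error;  /* EINTR is lost when copyout succeeds */
    return error;
}

/* New behavior: only report the copyout status if copyout failed. */
int nanosleep_status_fixed(int sleep_error, int copyout_error)
{
    int error = sleep_error;

    assert(sleep_error >= 0 && copyout_error >= 0);
    if (error && copyout_error)
        error = copyout_error;
    return error;
}
```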

It turns out that AMD C1E only happens after the ACPI-CA module is
running, so we have to broadcast an IPI at the end of the ACPI-CA
attach to clear the C1E related bits and kick start the possibly
stalled lapic timer.

- Add lapic_timer_process_oncpu(), which fires the per-cpu systimer queue.
- Add lapic_timer_intr_reload(), which starts or restarts the lapic timer.
- Change cputimer_intr_reload to a function pointer so it can be
overridden when needed. It remains the original cputimer_intr_reload function
on amd64 and vkernel. On i386, APIC initialization will set it to
lapic_timer_intr_reload if the lapic_timer_enable tunable is set to 1;
otherwise i8254_intr_reload (the original cputimer_intr_reload) will be used.
- If lapic_timer_enable is 1, don't try to register the "clk" interrupt
handler at all.
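
The function pointer override can be sketched as below. All names carry a
sketch_ prefix because they are hypothetical stand-ins; the real reload
functions program hardware, while these only record the requested value.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*intr_reload_fn)(unsigned int reload_us);

/* Record the last value each stand-in backend was asked to load. */
unsigned int sketch_i8254_last_us;
unsigned int sketch_lapic_last_us;

static void sketch_i8254_intr_reload(unsigned int reload_us)
{
    sketch_i8254_last_us = reload_us;
}

static void sketch_lapic_timer_intr_reload(unsigned int reload_us)
{
    sketch_lapic_last_us = reload_us;
}

/* Defaults to the i8254 backend, as in the original code path. */
intr_reload_fn sketch_cputimer_intr_reload = sketch_i8254_intr_reload;

/* Stand-in for APIC initialization: pick the backend from the tunable. */
void sketch_apic_init(int lapic_timer_enable)
{
    sketch_cputimer_intr_reload = lapic_timer_enable ?
        sketch_lapic_timer_intr_reload : sketch_i8254_intr_reload;
}

/* Callers go through the pointer and never name a backend directly. */
void sketch_reload(unsigned int reload_us)
{
    assert(sketch_cputimer_intr_reload != NULL);
    sketch_cputimer_intr_reload(reload_us);
}
```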

As of this commit, lapic timer support is complete. It is not enabled by
default; set 'hw.lapic_timer_enable' to enable it.

- Add lapic_timer_oneshot_intr_enable(), which sets the lapic timer into
one-shot mode and enables the lapic timer interrupt. It is called
during per-cpu systimer initialization.
- Add lapic_timer_oneshot_quick(), which only sets the lapic timer's ICR

When merge-printing multiple cpu buffers, we already treat ts=0 as
a condition to prefer a more recent entry. However, when searching for
the first entry, ts=0 (empty) is treated like any other timestamp. This
can lead to a situation where ktrdump only prints entries from the last CPU:

Assume you had 4 CPUs, and the buffer for CPU #2 and #3 started out with
empty entries (which would not be ignored by earliest_ts()). When
searching for the next entry, the empty (ts=0) entry of CPU #2 would
always be selected as the first entry. However a ts=0 entry of CPU #3
would override this. In this case only the index of CPU #3 would
advance until full entries would be printed. Once in this situation,
processing the ts of CPU #2 would always reset ts to 0, and this would
be treated as "not found" when processing CPU #3's entries, leading to
an output that only contains CPU #3 entries.
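
One way to avoid this is to skip empty entries when picking the next CPU
to print from. The sketch below assumes that approach; the function name
and array layout are hypothetical and this is not the ktrdump code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ts[i] is the timestamp of the next pending entry for CPU i, with 0
 * meaning "empty". Returns the CPU with the smallest non-zero timestamp,
 * or -1 if every CPU's pending entry is empty.
 */
int earliest_cpu(const uint64_t *ts, int ncpus)
{
    int best = -1;

    assert(ts != NULL && ncpus >= 0);
    for (int i = 0; i < ncpus; i++) {
        if (ts[i] == 0)
            continue;           /* empty entries never win */
        if (best < 0 || ts[i] < ts[best])
            best = i;
    }
    return best;
}
```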

Align the requested size to the nearest alignment to improve our chances
of coming up with a power-of-2.

Greatly improve the fitting algorithm for oddly sized requests, e.g.

(1) 32 byte alignment on a 1026 size. In this case the zone for 1026
already has a chunking (128) that exceeds the requested alignment,
so we just do a _slaballoc().

(2) A 256 byte alignment on a 513 byte size. In this case the zone
for 513 has a chunking of 64, which is not sufficient, so we
find the nearest power-of-2 >= 513 and allocate that. In our
case we would find 1024. Since _slaballoc() guarantees that
power-of-2 allocations within the zone limit will be on the
same-sized boundary, we then just allocate the nearest power of 2.
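
The two cases can be sketched as a size calculation. This is a simplified
illustration, not the allocator itself: the function names are
hypothetical, and the zone chunking is passed in as a parameter rather
than looked up from the zone for the requested size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power of 2 that is >= n. */
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;

    assert(n <= SIZE_MAX / 2 + 1);  /* keep the shift from overflowing */
    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Returns the size to request so that the result is aligned to 'align',
 * given the chunking of the zone that would serve the request.
 */
size_t fit_request(size_t size, size_t align, size_t chunking)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Round the size up to a multiple of the alignment. */
    size = (size + align - 1) & ~(align - 1);

    /* Case (1): the zone's chunking already satisfies the alignment. */
    if (chunking >= align)
        return size;

    /* Case (2): fall back to a power-of-2 sized allocation. */
    return round_up_pow2(size);
}
```

For the examples above, fit_request(1026, 32, 128) gives 1056 and
fit_request(513, 256, 64) gives 1024.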

Data CRC errors should now generate EIO instead of panic()ing the system.
B-Tree CRC errors might still panic() and freemap CRC errors WILL still
panic().

Continuing from DDB on a B-Tree node CRC error when debugging is enabled
now no longer marks the B-Tree node as good.

The mirror-read command will now transfer data records with bad CRCs
instead of aborting the transfer, identifying them with a new type field.
The mirror-write ioctl currently ignores such records.

If a directory entry is encountered and the related inode cannot be
looked up, generate a dummy in-memory inode of type FIFO to placemark
the bad directory entry, allowing it to be removed. Currently it is
possible for a directory entry to be synced to the media in a different
transaction than the related inode (a bug which needs to be fixed).
If a crash occurs at the wrong time the recovery code can leave the media
in a state where the directory entry exists but the inode does not. This
change allows the bad directory entry to be removed.

Add the posix_memalign() function in all of its glory. Our new slab
allocator already does most of the job perfectly, particularly when
alignment < size (for things like cache-line aligned allocations).

Correct a bug in _vmem_alloc() for the case where (size) is much larger
than (alignment). The hack to get mmap() to return an aligned address
was not properly unmapping temporarily-mapped space.

Reformulate how errno is set to support posix_memalign(), which is defined
by the standard to return the error rather than set errno.
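
From the caller's side, the return convention looks like this. The wrapper
name and the 64 byte cache-line size are assumptions made for the example;
only posix_memalign() itself is the standard interface.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_CACHELINE 64     /* assumed cache-line size */

/*
 * Allocate cache-line aligned memory. The error number comes from the
 * return value of posix_memalign(), not from errno, and is stored in
 * *errorp (0 on success).
 */
void *cacheline_alloc(size_t size, int *errorp)
{
    void *p = NULL;
    int error;

    assert(errorp != NULL);
    error = posix_memalign(&p, SKETCH_CACHELINE, size);
    *errorp = error;
    if (error != 0)
        return NULL;
    assert(((uintptr_t)p & (SKETCH_CACHELINE - 1)) == 0);
    return p;
}
```

The returned pointer is released with free() as usual.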

Due to Ramadan shifting through the Gregorian calendar it will end before
the fourth Thursday in September in 2009 and the next couple of years, so
Egypt is expected to end DST on the last Thursday in September.

Add ifpoll, which supports hardware TX/RX queue based polling.
The implementation is mainly based on the polling(4) code.

Differences from polling(4):
- Instead of registering one polling handler for both TX/RX and status,
drivers can register multiple TX/RX polling handlers on different
CPUs based on their own needs. Drivers can also register one status
check handler, which is always polled on CPU0.
- TX can be polled at a lower frequency than RX; normally we don't
need high frequency polling for TX, but RX may need a relatively
higher polling frequency.
- Better serializer integration.

ifnet changes:
- ifnet.if_qpoll is added, which should be implemented by drivers that
support ifpoll.
- IFF_NPOLLING is added to indicate that the driver is using ifpoll.

ifconfig(8):
- Add 'npolling' and '-npolling'; they are used to turn on/off ifpoll
on the specified interface.

Drivers:
- emx(4) is converted to use ifpoll. Coexistence of ifpoll and
polling(4) in one driver requires extra effort in the driver itself,
so drop polling(4) support in emx(4) for now.

IFPOLL_ENABLE kernel option is added, which is not enabled by default.

If running as a user instead of root, uid, gid, and flags changes are allowed
to fail. Also, if running as a user, no longer force a copy if they
differ but the mtime and size are the same; generate a single warning
instead.

Reorder the call to setutimes to occur after chown/chmod instead of before,
and to occur after a chflags call if IMMUTABLE is not set.

Fix an installworld failure due to kernel fixes and a libthread_xu issue.

Build the bootstrap version of cpdup without threading to work around a
bug in libthread_xu. Libthread_xu was trying to map the original user
stack's red zone without using MAP_FIXED or MAP_TRYFIXED or MAP_STACK,
a behavior which the kernel now prohibits.

Add a dummy offset to the arrays generated by genassym to avoid ary[0]

The dummy offset avoids the generation of dummy arrays of size zero.
This whole code path is a hack, but after a lot of messing around
Alex and I determined that it was easier to hack it than to try to
redo the code due to complications introduced by cross-compiled
environments.

1) remove uses of __label__, which is not supported by llvm/clang
2) remove uses of register type var __asm("ecx") and other variable
register bindings, as they are not supported by llvm/clang and are superfluous
3) add an ugly hack, conditionalized on __clang__, to allow correct
compilation of atomic_intr_cond_try()