* smp_invltlb() was running asynchronously when it really needs to run
synchronously. Generally speaking the asynchronous ipi did in fact work
pretty well but it still presents a 1uS window of opportunity which
bypasses normal write ordering safeties.

Run smp_invltlb() synchronously.

* Fixing the above led to the discovery of an ACPI issue. The ACPI
cpu idle halt code, at least on the Gigabyte Phenom X6 I've been
testing with, can cause IPIs to be lost. Not just delayed, straight
out lost. Gone. Poof. It doesn't matter whether the IPI is a
broadcast IPI or a directed IPI, it can still get lost.

This was particularly noticeable when I fixed smp_invltlb() and my
test box started locking up due to a random cpu sometimes not receiving
the Xinvltlb IPI, and it is quite possible that this issue was also
responsible for the random seg-faults we would sometimes get on 64-bit
boxes.

For now the acpi halt code has been disabled. It can be enabled with
sysctl machdep.cpu_idle_hlt=2 if you want to risk it.

* Use doreti_syscall_ret and doreti_iret in several cases that were
previously popping the interrupt frame and iret'ing manually. This
is operationally equivalent.

* Add a missing "sti" in the idle loop. Usually the cpu_idle_hook()
deals with this but there are some alternative paths which might not,
potentially causing interrupts to be delayed unnecessarily.
At worst the idle thread has an extra sti in it.

nrelease: Fix an annoying bug that was preventing the ISOs from booting UP.

It seems our CD9660 support in libstand at least has some problems with
directory names containing dots. This prevented the LiveCD from properly
booting the UP kernel. Don't ask me about how it ended up using the SMP
kernel in this case, but that's what it did, making the UP boot
impossible. My guess is it has something to do with the order in which
stuff had been added to the ISO. But that's pure speculation. I'm not
even sure about the dot part. It might just as well be directory name length.
In any case, UP boot from the LiveCD was broken.

To fix all this, install the UP kernel to /boot/UP and the SMP kernel to
/boot/SMP on the LiveCD and images. They will be picked up by the
installer from there by a separate commit.

While here, remove some non-functional code from dloader.menu. We can't
easily build i386/x86_64 dual boot ISOs at the moment (for this we would
need cross-building pkgsrc).

* Conditionally output extra types with most-recently-used offsets last
(for use by systems with pre-2011 versions of localtime.c, helping to
ensure that globals "altzone" and "timezone" get set correctly).

* Fix generation of POSIX strings for zones with rules using
"weekday<=n" forms of dates (thanks to Lei Liu for finding the
problem). Also, limit output for non-POSIX-specifiable zones defined
to follow the same rules every year. (Note that no zones of either of
the above types appear in the distribution; these changes cater to
add-on zones).

* For now check out 2010Q3 by default in master. This will be changed
to 'master' in a month or two, but for now we want to match the release.

* Add GITHOST feature to the nrelease Makefile, defaulting to
git.dragonflybsd.org. This allows developers who do release builds
to specify a more local clone/mirror/copy of the pkgsrcv2.git and
dragonfly.git repos.

nrelease pulls from these repos so this can save a bunch of time.

* nrelease now installs each kernel into its own kernel.XXX directory
in /boot, with a complete set of modules for each kernel. This is
instead of installing all kernels into /boot/kernel/ and naming them
differently inside /boot/kernel/.

This is to conform to the recent dloader work and very recent new
menu features.

* All kernels+modules are installed with INSTALLSTRIPPEDMODULES so the
ISO comes in at a reasonable size.

* The release ISO now contains both a UP and a SMP kernel, selectable
at boot time.

* Do not try to include pkgsrc in the release ISO/IMG (at 1G+ it is too
big). We continue to include system sources. There isn't enough room
for full sources in the ISO/IMG. Note that GUI builds will include
a full pkgsrc and full sources, including a .git base for them.

* nrelease now supports an IMGSIZE override on the make line (in sectors),
for the usb image.

* nrelease now autosizes the IMG file (when not overridden by IMGSIZE) to
the nearest base-10 gigabyte to ensure that it fits on the USB stick
and provide a bit of extra space for messing around.
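
The IMG autosizing rule above can be sketched as follows. This is only an
illustration, not the nrelease Makefile logic: it assumes 512-byte sectors
and assumes "nearest" means rounding up to the next whole base-10 gigabyte.
The names img_sectors(), SECTOR_BYTES and GIGABYTE are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES	512ULL		/* assumed sector size */
#define GIGABYTE	1000000000ULL	/* base-10 gigabyte */

/*
 * Return the image size in sectors.  A nonzero override (a stand-in
 * for IMGSIZE, which is given in sectors) wins.  Otherwise round the
 * content size up to the next whole base-10 gigabyte, with a minimum
 * of one gigabyte.
 */
uint64_t
img_sectors(uint64_t content_bytes, uint64_t imgsize_override)
{
	uint64_t gigs;

	assert(GIGABYTE % SECTOR_BYTES == 0);
	if (imgsize_override != 0)
		return imgsize_override;
	gigs = (content_bytes + GIGABYTE - 1) / GIGABYTE;
	if (gigs == 0)
		gigs = 1;
	return gigs * (GIGABYTE / SECTOR_BYTES);
}
```

Under these assumptions 1.5G of content yields a 2G image, which leaves
roughly half a gigabyte of slack on the stick.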

* Conditionalize dloader.menu to only present menu options for which
the related kernels are present in /boot.

* Set the default kernel to one of:

kernel, kernel.GENERIC, kernel.GENERIC_SMP, kernel.X86_64_GENERIC,
or kernel.X86_64_GENERIC_SMP. The first one in the list found becomes
the default.

* If ${default_kernel}_SMP is available supply a menu option to change
the default to that kernel, aka UP->SMP

* If kernel.X86_64_GENERIC[_SMP] is available and the current default is
not the same, supply menu options to change the default to the 64-bit
UP or SMP kernel.

Note however that for this to work the related boot kernel directory
had better have a loader.conf.local in it that points the root mount
to a 64-bit root 'cause 64-bit kernels can't run 32 bit binaries yet.

* Adjust the menu item execution code to copy the items before executing
them, allowing recursive menu execution. The recursive menu feature
is a bit of a hack right now however.

* NOTE: the optcd for people ESCaping into the boot prompt was moved
to dloader.rc. When updating you may have to manually reinstall that
file to get the functionality.

This code has a bug because the switch code ALSO optimizes the loading
of %cr3 to avoid reloading it if it hasn't changed, for example when
switching between two user threads associated with the same process.
The other cpu(s) running similar threads may lose track of the fact
that our cpu also needs an IPI for page invalidations in the pmap for
a short period of time.

Because we don't reload %cr3 in this case, our tlb can become invalid.
This can also occur with vfork() sequences.

* Fix by testing that we are switching to the same vmspace and do not
clear the pm_active bit in that case. Retain the %cr3 optimization.
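
A user-space model of the fixed rule might look like the sketch below. It
is not the kernel's switch code: struct vmspace_sim and switch_vmspace()
are hypothetical, and pm_active is modeled as a plain 64-bit cpu mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vmspace_sim {
	uint64_t pm_active;	/* one bit per cpu that still needs IPIs */
};

/*
 * Switch the given cpu from oldvm to newvm.  Returns 1 if %cr3 would
 * be reloaded, 0 if the reload is skipped.
 */
int
switch_vmspace(int cpu, struct vmspace_sim *oldvm, struct vmspace_sim *newvm)
{
	uint64_t bit;

	assert(cpu >= 0 && cpu < 64);
	bit = (uint64_t)1 << cpu;

	if (oldvm == newvm) {
		/*
		 * Same vmspace: leave our pm_active bit set so other
		 * cpus keep sending us invalidation IPIs, and keep the
		 * %cr3 optimization by skipping the reload.
		 */
		return 0;
	}
	if (oldvm != NULL)
		oldvm->pm_active &= ~bit;
	if (newvm != NULL)
		newvm->pm_active |= bit;
	return 1;
}
```

The point of the model is that the bit is never cleared and re-set across
a same-vmspace switch, so there is no window in which another cpu can skip
our IPI.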

* lwp_wait() must defer reaping of a lwp that is still running on another
cpu (i.e. in the midst of exiting).

* It shouldn't be possible for this to happen but just in case the thread
gets switched out after TDF_EXITING has been set, also make sure
the thread is no longer on the LWKT run queue.

* Remove old debugging in the LWKT scheduler path.

* Protect gd_freetd (per-cpu td cache) with a critical section. Again
this case should not occur as new threads are not allocated from
interrupts, but protect it anyway. Also assert that the cached free
td is in no way still scheduled.

Note that this cache is required to ensure that the td does not
end up in the MP-accessible objcache before it has been fully
descheduled.

* Use %cr3 from the dumppcb instead of KPML4phys on x86_64, and similarly
for i386, to access the full page table as of when the panic occurred
instead of just the kernel page table.

* minidumps do not dump userspace so userspace will still not be available,
but this gives us the option of sysctl'ing off minidumps when userspace
access is desired, and kgdb will then be able to access the current
userspace context as of the panic, as well.

* Implement dmdump and dump routines for the three main targets (linear,
stripe and crypt).

* The top-level dmdump will call all the required dump() methods in the
targets just as it does with strategy() calls. The lower level
target-specific dump routines will then redirect (after processing,
etc) these requests to the underlying device's dump routines.

* This should provide quite reliable dumping even through device mapper,
although it is more error-prone than the equivalent dumping on normal
disks as there's a lot more going on behind the scenes.

* Move the MP lock from outside to inside exit1(), also fixing an issue
where sigexit() was calling exit1() without it.

* Move calls to dsched_exit_thread() and biosched_done() out of the
platform code and into the mainline code. This also fixes an
issue where the code was improperly blocking way too late in the
thread termination code, after the point where it had been descheduled
permanently and tsleep decommissioned for the thread.

* Cleanup and document related code areas.

* Fix a missing proc_token release in the SIGKILL exit path.

* Fix FAKE_MCOUNT()s in the x86-64 code. These are NOPs anyway
(since kernel profiling doesn't work), but fix them anyway.

* Use APIC_PUSH_FRAME() in the Xcpustop assembly code for x86-64
in order to properly acquire a working %gs. This may improve the
handling of panic()s on x86_64.

* Also fix some cases of #if JG'd (ifdef'd out) code in case the
code is ever used later on.

* Protect set_user_TLS() with a critical section to be safe.

* Add debug code to help track down further x86-64 seg-fault issues,
and provide better kprintf()s for the debug path in question.

* Ignore any file handles passed from bootp. These are NFSv2 handles
and NFSv2 just doesn't have the directory support needed if the
server is running a filesystem with 64-bit directory cookies (as HAMMER
does).

This will force the kernel's diskless nfs mount to re-authorize and
acquire a NFSv3 handle for the root mount.

* Undo a bit of a previous commit where I tried to enable readdirplus for
NFSv2 to work around NFSv2 directory cookie issues. It just doesn't work.
readdirplus is again force-disabled for NFSv2 mounts.

* qemu does not debounce the RTC data. The RTC chip uses a serial interface
as well as a simple carry propagation incrementer which can catch
transitions in the middle of an update, resulting in corrupted data.
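
One common way to guard against a read that lands in the middle of an
update is to re-read until two consecutive samples agree. The sketch below
illustrates that idea only; it is not taken from the actual change, and
rtc_read_debounced(), fake_rtc_read() and struct fake_rtc are hypothetical
names. The fake reader stands in for real register access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t (*rtc_read_fn)(void *ctx);

/*
 * Read until two consecutive samples agree, giving up after max_tries
 * reads.  Returns 0 and stores the value on success, -1 otherwise.
 */
int
rtc_read_debounced(rtc_read_fn rd, void *ctx, int max_tries, uint8_t *out)
{
	uint8_t prev, cur;
	int i;

	assert(rd != NULL && out != NULL);
	if (max_tries < 2)
		return -1;
	prev = rd(ctx);
	for (i = 1; i < max_tries; ++i) {
		cur = rd(ctx);
		if (cur == prev) {
			*out = cur;
			return 0;
		}
		prev = cur;
	}
	return -1;
}

/* Test double: replays canned samples, repeating the last one. */
struct fake_rtc {
	const uint8_t *samples;
	int count;
	int pos;
};

uint8_t
fake_rtc_read(void *ctx)
{
	struct fake_rtc *f = ctx;
	int idx = f->pos;

	if (f->pos < f->count - 1)
		f->pos++;
	return f->samples[idx];
}
```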

* A snapshot can sometimes contain visible inodes whose nlinks count is 0;
essentially the snapshot 'catches' the file in the middle of being deleted.

* HAMMER was attempting to truncate the data for such inodes if the file
were opened and then closed, and failed to check whether the inode was a
snapshot or a current inode. This flowed through until it hit an assertion
designed to detect precisely that case.

* Fixed by adding a check to determine if the inode is a snapshot and/or
the filesystem is mounted read-only.
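
The shape of the added check can be sketched like this. It is a simplified
model and not HAMMER's code: struct inode_sim, its fields and
may_truncate_on_close() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct inode_sim {
	uint64_t nlinks;	/* link count as seen through this view */
	bool is_snapshot;	/* inode is accessed via a snapshot */
	bool mount_rdonly;	/* filesystem is mounted read-only */
};

/*
 * Decide whether the data of an inode may be truncated when the file
 * is closed.  Only a current inode on a writable mount qualifies.
 */
bool
may_truncate_on_close(const struct inode_sim *ip)
{
	assert(ip != NULL);
	if (ip->nlinks != 0)
		return false;	/* still linked, nothing to reclaim */
	if (ip->is_snapshot || ip->mount_rdonly)
		return false;	/* the case the fix adds */
	return true;
}
```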

* A very long standing bug in the server cache was finally whacked. The
write-gather code was improperly returning the wrong mbuf for the server
to reply with, causing client stalls. This behavior depends on the client
doing burst asynchronous writes. Newer releases of DragonFly do burst
asynchronous writes but older ones tended not to.

* The server cache was not MPSAFE. Add a MP token to fix that.

* Remove critical sections from the server cache which are no longer needed.

* Fix a potential client-side rpc request race where a request's
NEEDSXMIT flag is not set until after the request possibly blocks,
which can lead to issues if another thread picks up the request
and then believes that it has already been transmitted when it
has not.

* Document a big problem with NFSv2 and HAMMER-served directories. NFSv2
only has 32-bit directory cookies. It is possible to work around the
problem by using rdirplus (which is the default now). However, some
servers may not be able to handle rdirplus with a NFSv2 mount.

Users who need to serve out NFSv2 cannot serve HAMMER directories
with NFSv2 unless the clients support rdirplus.

Our defaults are NFSv3 and rdirplus and NFSv3 does NOT have this problem.
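
To make the cookie problem concrete, here is a small sketch of what
happens when a 64-bit directory cookie has to pass through a 32-bit field.
The helper names are hypothetical and the cookie values are made up; this
is not the NFS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the cookie survives a round trip through a 32-bit field. */
bool
cookie_fits_nfsv2(uint64_t cookie)
{
	return cookie <= UINT32_MAX;
}

/* Convert a cookie that is known to fit. */
uint32_t
cookie_to_nfsv2(uint64_t cookie)
{
	assert(cookie_fits_nfsv2(cookie));
	return (uint32_t)cookie;
}

/* What a blind truncation keeps: only the low 32 bits. */
uint32_t
cookie_truncate(uint64_t cookie)
{
	return (uint32_t)(cookie & 0xffffffffULL);
}
```

Two distinct 64-bit cookies that differ only in their upper 32 bits
truncate to the same value, so a client resuming a directory scan from
such a cookie cannot be told apart from one resuming elsewhere.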

Add a vkernel_bin var that contains the default path to the binary,
for all the vkernels without a vkernel specific entry.
Also the root image is no longer required (diskless vkernels have no
use for a root image).
And when stopping a vkernel a pidfile is required so we don't kill
innocent bystanders.