Fix the recently committed (and described) page writability nerf. The real
kernel was unconditionally mapping writable pages read-only on read faults
in order to be able to take another fault on a write attempt. This was needed
for early virtual kernel support in order to set the Modify bit in the
virtualized page table, but was being applied to ALL mappings rather than
just those installed by the virtual kernel.

Now the real kernel only does this for virtual kernel mappings. Additionally,
the real kernel no longer makes the page read-only when clearing the Modify
bit in the real page table (in order to rearm the write fault). When this
case occurs VPTE_M has already been set in the virtual page table and no
re-fault is required.

The virtual kernel now only needs to invalidate the real kernel's page
mapping when clearing the virtualized Modify bit in the virtual page table
(VPTE_M), in order to rearm the real kernel's write fault so it can detect
future modifications via the virtualized Modify bit. Also, the virtual kernel
no longer needs to install read-only pages to detect the write fault. This
allows the real kernel to do ALL the work required to handle VPTE_M and
make the actual page writable. This greatly reduces both the number of real
page faults and the number of page faults that must be passed through to the
virtual kernel.
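The fault flow described above can be modeled in user space as a minimal sketch. It assumes a single page and uses hypothetical names throughout (struct page_model, real_fault(), access_page(), vkernel_clear_modify()). The VPTE_* values are placeholders, not the real kernel's definitions. Only the real kernel's write fault sets VPTE_M, and only clearing VPTE_M invalidates the real mapping.

```c
#include <stdbool.h>

/* Placeholder virtual PTE bits, loosely modeled on the VPTE_* names. */
#define VPTE_V 0x01u /* valid */
#define VPTE_W 0x02u /* writable in the virtual page table */
#define VPTE_M 0x04u /* modified */

struct page_model {
    unsigned vpte;      /* virtual kernel's page table entry */
    bool real_mapped;   /* real kernel currently has a mapping */
    bool real_writable; /* real mapping permits writes */
    int real_faults;    /* faults taken by the real kernel */
};

/* Real kernel fault handler for a vkernel-managed mapping (sketch). */
static void real_fault(struct page_model *p, bool is_write)
{
    p->real_faults++;
    p->real_mapped = true;
    if (is_write && (p->vpte & VPTE_W)) {
        /* Record the modification and map the page writable at once. */
        p->vpte |= VPTE_M;
        p->real_writable = true;
    } else {
        /* Read fault: stay read-only until VPTE_M is set, so the first
         * write still faults and sets VPTE_M. */
        p->real_writable = (p->vpte & VPTE_M) && (p->vpte & VPTE_W);
    }
}

/* An access by the emulated context; faults only when required. */
static void access_page(struct page_model *p, bool is_write)
{
    if (!p->real_mapped || (is_write && !p->real_writable))
        real_fault(p, is_write);
}

/* Virtual kernel clears VPTE_M and invalidates the real mapping so the
 * next write faults again and re-sets VPTE_M. */
static void vkernel_clear_modify(struct page_model *p)
{
    p->vpte &= ~VPTE_M;
    p->real_mapped = false;
    p->real_writable = false;
}
```

For a sequence of read, write, write, clear VPTE_M, write on a page with VPTE_V|VPTE_W, this model takes three real faults, and none of them has to be reflected to the virtual kernel.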

This fix reduces fork() overhead for processes running under a virtual
kernel by 70%, from around 2100uS to around 650uS.

Add missing link options to export global symbols to the _DYNAMIC section,
allowing the kernel namelist functions to operate. For now just make
certain static variables global instead of using linker magic to export
static variables.

Add infrastructure to allow out-of-band kernel memory to be accessed. The
virtual kernel's memory map does not include the virtual kernel executable
or data areas.

Major pmap update. Note also that this commit temporarily nerfs performance
because the virtual kernel support is being applied globally instead of just
to virtual page tables. This will be fixed in a followup commit.

When clearing the Modify bit, also clear the write bit to force a fault.
We have no other means of detecting when a page is modified. Note that
this may not be necessary in the vkernel implementation once the real
kernel implementation is fixed to properly set the Modify bit (by doing the
same thing).

Invalidate the real kernel's pmap when clearing either the Write bit or
the Modify bit, instead of just the write bit, and clear both bits when
clearing either, again in order to force a page fault to detect page
modifications.

Code cleanup - remove degenerate cases. e.g. pmap_changebit() was only
ever used to clear bits, rip out the set/clear functionality and just
make it pmap_clearbit().
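A clear-only routine can be sketched as follows. This is a hypothetical model, not the pmap code: a page carries a small array of pointers to the PTEs that map it (standing in for its pv list), and invalidate_page() only counts calls where the real code would shoot down the TLB entry. The PG_* values are the usual x86 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define PG_V  0x001u /* valid */
#define PG_RW 0x002u /* writable */
#define PG_M  0x040u /* modified (dirty) */

typedef uint32_t pt_entry_t;

/* A physical page and the PTEs currently mapping it (hypothetical). */
struct page {
    pt_entry_t *mappings[4];
    size_t nmappings;
};

/* Stand-in for a TLB invalidation; this sketch only counts calls. */
static unsigned tlb_invalidations;
static void invalidate_page(pt_entry_t *pte)
{
    (void)pte;
    tlb_invalidations++;
}

/*
 * Clear `bit` in every valid mapping of `m`. Unlike a combined set/clear
 * "changebit" routine, there is no set path to maintain.
 */
static void pmap_clearbit(struct page *m, pt_entry_t bit)
{
    for (size_t i = 0; i < m->nmappings; i++) {
        pt_entry_t *pte = m->mappings[i];
        if ((*pte & PG_V) && (*pte & bit)) {
            *pte &= ~bit;
            invalidate_page(pte); /* a cached TLB entry may still hold it */
        }
    }
}
```

Only entries that are valid and actually have the bit set are modified and invalidated, so clearing PG_M on a page whose mappings are already clean costs no invalidations.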

Open the root disk with O_DIRECT. We do not want both the real kernel and
the virtual kernel caching the 'disk' data. We just want the virtual kernel
to cache the data so memory resource use is limited to the 'physical' memory
specified when starting up the virtual kernel.
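Opening a disk image with O_DIRECT looks roughly like this minimal sketch. open_disk_image() is a hypothetical helper. The fallback to a plain open() when the filesystem rejects O_DIRECT with EINVAL is an assumption of this sketch; the commit does not say what the virtual kernel does in that case.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in glibc's <fcntl.h> */
#include <errno.h>
#include <fcntl.h>

/*
 * Open a disk image so the host kernel does not cache its data as well.
 * Falls back to a normal open if the filesystem rejects O_DIRECT.
 * Sets *direct to 1 or 0 on success; returns -1 on other errors.
 */
static int open_disk_image(const char *path, int *direct)
{
    int fd = open(path, O_RDWR | O_DIRECT);
    if (fd >= 0) {
        *direct = 1;
        return fd;
    }
    if (errno != EINVAL)
        return -1;
    fd = open(path, O_RDWR);
    if (fd >= 0)
        *direct = 0;
    return fd;
}
```

On some systems (Linux in particular) O_DIRECT I/O also requires suitably aligned buffers, offsets, and lengths, so the caller's read and write paths must respect that.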

date: 2004-08-17 22:56:07 -0700; author: njl; state: Exp; lines: +5 -2;
When one entry in the RSDT is corrupted, just skip it instead of bailing out.
This gets us the info we need on systems which have proprietary tables that
don't match the standard. For instance, an AMI system has a table of type
"OEMB" with an invalid checksum.

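The skip-instead-of-abort behavior for the RSDT can be sketched as below. An ACPI table is valid when all of its bytes sum to zero modulo 256. The table_ref structure and walk_rsdt() are hypothetical helpers. The real code maps each entry from its physical address and reads the length from the table header, which this sketch omits.

```c
#include <stddef.h>
#include <stdint.h>

/* An ACPI table is valid when all of its bytes sum to 0 modulo 256. */
static int acpi_checksum_ok(const uint8_t *table, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += table[i];
    return sum == 0;
}

/* Hypothetical already-mapped RSDT entry. */
struct table_ref {
    const uint8_t *data;
    size_t len;
};

/*
 * Walk the entries, skipping any with a bad checksum instead of bailing
 * out. `handle` may be NULL. Returns the number of tables accepted.
 */
static size_t walk_rsdt(const struct table_ref *entries, size_t n,
                        void (*handle)(const uint8_t *, size_t))
{
    size_t accepted = 0;
    for (size_t i = 0; i < n; i++) {
        if (!acpi_checksum_ok(entries[i].data, entries[i].len))
            continue; /* e.g. a proprietary "OEMB" table */
        if (handle != NULL)
            handle(entries[i].data, entries[i].len);
        accepted++;
    }
    return accepted;
}
```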
acpi.c
revision 1.26
date: 2004-08-13 15:59:09 -0700; author: marcel; state: Exp; lines: +44 -10;
branches: 1.26.2;
Add support for SSDT tables. Dumping or disassembling the DSDT will
now include the contents of any SSDT table as well. This makes use
of the property that one can concatenate the body of SSDT tables to
the DSDT, updating the DSDT header (length and checksum) and end up
with a larger and valid DSDT table. Hence, this also works with -f.

Reviewed by: njl@
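The concatenation property can be sketched in C. The offsets follow the standard 36-byte ACPI table header (32-bit little-endian Length at offset 4, Checksum byte at offset 9). append_ssdt() is a hypothetical helper, not the acpidump implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SDT_HEADER_LEN 36 /* standard ACPI table header size */
#define SDT_LENGTH_OFF 4  /* uint32_t little-endian total length */
#define SDT_CKSUM_OFF  9  /* uint8_t checksum */

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/*
 * Append the body (everything after the header) of `ssdt` to `dsdt`, then
 * fix up the length and checksum. Both Length fields must be >= 36.
 * Returns a malloc'd table, or NULL on allocation failure.
 */
static uint8_t *append_ssdt(const uint8_t *dsdt, const uint8_t *ssdt)
{
    uint32_t dlen = get_le32(dsdt + SDT_LENGTH_OFF);
    uint32_t blen = get_le32(ssdt + SDT_LENGTH_OFF) - SDT_HEADER_LEN;
    uint8_t *out = malloc((size_t)dlen + blen);
    if (out == NULL)
        return NULL;
    memcpy(out, dsdt, dlen);
    memcpy(out + dlen, ssdt + SDT_HEADER_LEN, blen);
    put_le32(out + SDT_LENGTH_OFF, dlen + blen);

    /* Recompute the checksum so all bytes again sum to 0 mod 256. */
    uint8_t sum = 0;
    out[SDT_CKSUM_OFF] = 0;
    for (uint32_t i = 0; i < dlen + blen; i++)
        sum += out[i];
    out[SDT_CKSUM_OFF] = (uint8_t)(0x100 - sum);
    return out;
}
```

Calling it once per SSDT yields a single larger table that still passes the checksum test, so tools that read a DSDT from a file can consume it unchanged.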

acpidump.8
revision 1.18
date: 2004-08-16 13:33:20 -0700; author: marcel; state: Exp; lines: +4 -2;
branches: 1.18.2;
We now handle SSDT tables. Remove a reference from the BUGS section
and explicitly mention SSDT when we talk about the DSDT so that people
don't have to guess whether it includes the SSDT.
While here, touch date.

Pointed out by: le@

acpidump.c
revision 1.9
date: 2004-08-13 15:59:09 -0700; author: marcel; state: Exp; lines: +10 -7;
branches: 1.9.2;
Add support for SSDT tables. Dumping or disassembling the DSDT will
now include the contents of any SSDT table as well. This makes use
of the property that one can concatenate the body of SSDT tables to
the DSDT, updating the DSDT header (length and checksum) and end up
with a larger and valid DSDT table. Hence, this also works with -f.

Reviewed by: njl@

acpidump.h
revision 1.17
date: 2004-08-13 15:59:09 -0700; author: marcel; state: Exp; lines: +4 -3;
Add support for SSDT tables. Dumping or disassembling the DSDT will
now include the contents of any SSDT table as well. This makes use
of the property that one can concatenate the body of SSDT tables to
the DSDT, updating the DSDT header (length and checksum) and end up
with a larger and valid DSDT table. Hence, this also works with -f.

Get floating point working in virtual kernels. Add a feature that allows
the virtual kernel to request that the real kernel return a T_DNA fault
if an emulated user context attempts to use the FP unit.

Add a temporary fix to ata_interrupt() to ignore weird interrupts seen on
SMP+ATAPI+DMA. For some reason, the ATAPI device asserts INTRQ while it's
otherwise completely ready to fulfill the request. DRQ is set in the status
register, and no matter how I read the specs, this just should never
happen. Is this actually a race condition? I have no idea :(. Ignoring the
interrupt, however, does not seem to produce adverse results (tested with
a DragonFlyBSD livecd with SMP kernel).

Bail out of acd_open() when we're called with a cdev_t that wasn't created
by us. This happened when, e.g., booting from CD: 'acd0c' would then be
resolved to minor 2 instead of the minor 0 we create in acd_attach(). If,
instead of booting the kernel straight through, one escapes from the
bootloader to the loader prompt, issues `unload` and `load kernel`, and then
`boot`, the problem does not show up. My guess is that somewhere, somehow,
the bootloader is mucking about in our cdev storeroom. I haven't yet found
where this happens.

- The Bahamas also observe the new US DST rules this year.
(Thanks to Sue Williams for this.)

* Changes affecting TZ names, to reflect typical English practice better.
The old names are still supported, in the 'backward' file.

- Rename Africa/Asmera to Africa/Asmara.

- Rename Atlantic/Faeroe to Atlantic/Faroe.

* A new zone Australia/Eucla, covering the Eucla area. (Thanks to
Alex Livingston for this.)

* Changes affecting old time stamps.

- The 1982 transition for Pacific/Easter occurred Mar 13, not Jan 18.
Also, the pre-1932 time stamps are now labeled EMT, not MMT, since
they are not Mataveri Mean Time. (Thanks to Jesper Norgaard Welen
for this.)

A virtual kernel running another virtual kernel running an emulated process
context must pop back into its virtual kernel context before posting
any signal. Correct a number of cases that were not being handled.

change_[er]uid() both use cratom(); however, their consumers cached the
original cred in a variable. If the cred had been shared beforehand (for
example due to an open(2)), a later conditional would still reference the
old cred instead of the new one instantiated in change_[er]uid().

Fix this by returning the new cred from change_[er]uid() and using it in
subsequent conditionals.
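The stale-pointer bug and its fix can be shown with a heavily simplified, hypothetical credential. cratom_sketch() mimics the copy-on-write behavior described for cratom() (replace a shared cred with a private copy), and change_euid_sketch() returns the cred it actually modified. Neither is the kernel's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified credential with a reference count. */
struct ucred {
    int cr_ref;
    int cr_uid;
};

/* Copy-on-write: if the cred is shared, swap in a private copy. */
static struct ucred *cratom_sketch(struct ucred **crp)
{
    struct ucred *cr = *crp;
    if (cr->cr_ref == 1)
        return cr;
    struct ucred *ncr = malloc(sizeof(*ncr));
    if (ncr == NULL)
        abort();
    memcpy(ncr, cr, sizeof(*ncr));
    ncr->cr_ref = 1;
    cr->cr_ref--; /* drop our reference to the shared cred */
    *crp = ncr;
    return ncr;
}

/* The fix: hand the (possibly new) cred back to the caller. */
static struct ucred *change_euid_sketch(struct ucred **crp, int euid)
{
    struct ucred *cr = cratom_sketch(crp);
    cr->cr_uid = euid;
    return cr;
}
```

A caller that had cached the old pointer before the call would still see the old uid; using the returned pointer in later conditionals avoids that.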

Implement vm_fault_page_quick(), which will soon be replacing
vm_fault_quick(). vm_fault_quick() does not hold the underlying page
in any way and is not SMP friendly. It also uses architecture-specific
tricks to force the page into a pmap which do not work with the VKERNEL.

Don't call device_busy() when /dev/psmN is opened. DS_BUSY is used to
prevent recursive bus scanning on DragonFly. Also remove the matching
device_unbusy() call from psmclose().
This should make module loading after system startup work on some systems.

- Add {TAILQ,STAILQ}_CONCAT() macros
- Add comment about *_FOREACH_MUTABLE()
- Sync ath(4) with FreeBSD (sam@freebsd.org):
o Add/Correct some debug information.
o Add more statistics.
o Receive control frames in monitor mode.
o Close race in handling mcast traffic when operating as an AP with
stations in power save: add a new q where mcast frames are stashed
and on beacon update (at DTIM) move frames from the mcast q to the
cabq and start it. This ensures the cabq is only manipulated in
one place.
o Correct the type of ath_descdma.dd_desc_len
o Correct max segment size when creating DMA tag
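TAILQ_CONCAT() and the ath(4) multicast fix fit together naturally: at DTIM time, every stashed multicast frame is spliced onto the CAB queue in one O(1) operation. In this sketch, struct frame, struct frameq and beacon_dtim_update() are hypothetical; the driver's actual queue structures differ. The fallback macro mirrors the BSD definition in case the host's <sys/queue.h> lacks it.

```c
#include <stddef.h>
#include <sys/queue.h>

#ifndef TAILQ_CONCAT
#define TAILQ_CONCAT(head1, head2, field) do {                      \
    if (!TAILQ_EMPTY(head2)) {                                      \
        *(head1)->tqh_last = (head2)->tqh_first;                    \
        (head2)->tqh_first->field.tqe_prev = (head1)->tqh_last;     \
        (head1)->tqh_last = (head2)->tqh_last;                      \
        TAILQ_INIT((head2));                                        \
    }                                                               \
} while (0)
#endif

/* Hypothetical frame and queue types for the sketch. */
struct frame {
    int seq;
    TAILQ_ENTRY(frame) link;
};
TAILQ_HEAD(frameq, frame);

/*
 * On the DTIM beacon update, move all stashed mcast frames to the CAB
 * queue, so the CAB queue is only manipulated in this one place.
 */
static void beacon_dtim_update(struct frameq *cabq, struct frameq *mcastq)
{
    TAILQ_CONCAT(cabq, mcastq, link); /* leaves mcastq empty */
}
```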

When parsing an invalid parameter expansion (e.g. ${} or ${foo@bar}), do not
issue a syntax error immediately; instead, record that it is erroneous and
report the error later, when the parameter expansion is actually performed.
This means that, e.g., "false && ${}" will not generate an error, which seems
to be required by POSIX.
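The deferred error can be sketched as follows, assuming only bare ${name} expansions without modifiers. The node type and functions are hypothetical, and the exit status 2 for a bad substitution is an arbitrary choice for the sketch. The parser only marks the node as bad, and the error surfaces only if the expansion is ever evaluated.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parse node for a bare ${name} expansion. */
struct param_expansion {
    const char *name;
    bool bad; /* recorded at parse time instead of a syntax error */
};

/* Special parameter, positional parameter (digits), or identifier. */
static bool is_valid_name(const char *s)
{
    if (s[0] == '\0')
        return false;
    if (s[1] == '\0' && strchr("@*#?-$!", s[0]) != NULL)
        return true;
    if (isdigit((unsigned char)s[0])) {
        for (; *s != '\0'; s++)
            if (!isdigit((unsigned char)*s))
                return false;
        return true;
    }
    if (!isalpha((unsigned char)s[0]) && s[0] != '_')
        return false;
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s) && *s != '_')
            return false;
    return true;
}

static struct param_expansion parse_param(const char *name)
{
    struct param_expansion pe = { name, !is_valid_name(name) };
    return pe;
}

/* Expansion time: only now is the stored error reported. */
static int expand_param(const struct param_expansion *pe)
{
    if (pe->bad) {
        fprintf(stderr, "${%s}: bad substitution\n", pe->name);
        return 2;
    }
    return 0;
}

/* "lhs && ${...}": the right side is never expanded if lhs fails. */
static int and_list(int lhs_status, const struct param_expansion *rhs)
{
    if (lhs_status != 0)
        return lhs_status;
    return expand_param(rhs);
}
```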

Remove some white space at EOL.

The sub-expression "(nulonly || 1)" always evaluates to true and,
according to the CVS logs, appears to be a debugging leftover that was
introduced by accident.

Implement nearly all the remaining items required to allow the virtual kernel
to actually execute code on behalf of a virtualized user process. The
virtual kernel is now able to execute the init binary through to the point
where it sets up a TLS segment.

* Create a pseudo tf_trapno called T_SYSCALL80 to indicate system call traps.

* Add MD shims when creating or destroying a struct vmspace, allowing the
virtual kernel to create and destroy the corresponding real-kernel vmspaces.

Add appropriate calls to vmspace_mmap() and vmspace_mcontrol() to map
memory inside the user process vmspace. The memory is mapped VPAGETABLE
and the page table directory is set to point to the pmap page directory.

* Clean up user_trap, handle T_PAGEFLT properly.

* Implement go_user(). It calls vmspace_ctl(... VMSPACE_CTL_RUN) and
user_trap() in a loop, allowing the virtual kernel to 'run' a user
mode context under its control.

* Reduce VM_MAX_USER_ADDRESS to 0xb8000000 for now, until I figure out the
best way to have the virtual kernel query the actual max user address from
the real kernel.

* Correct a pm_pdirpte assignment. We can't look up the PTE until after
we have entered it into the kernel pmap.

Handle page faults within the virtual kernel process itself (what would be
called kernel mode page faults in a real kernel). Fortunately the system
trapframe is a subset of the ucontext supplied to userland signal handlers.

Since user vs kernel addresses and trap types cannot be discerned by looking
at the frame (because they are in entirely different VM spaces), separate
trap() into user_trap() and kern_trap().

Implement a poor-man's (really kludgy) version of copyin, copyout, and related
functions.

Implement and process T_PAGEFLT virtual kernel traps. They work about 85% of
the time.

The stack frame available from a signal to user mode stores the fault address
in tf_err. The PGEX bits that used to be stored in tf_err are lost.

When the process is a virtual kernel (p->p_vkernel != NULL), store the PGEX bits
in the high 16 bits of the tf_trapno field so the virtual kernel can
figure out whether the page fault was a read fault or a write fault.
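The packing can be sketched as a pair of helpers. The PGEX_* values are the standard x86 page-fault error-code bits; T_PAGEFLT is assumed here to be 12, matching the BSD i386 trap numbers as far as we know. The helper names are hypothetical.

```c
#include <stdint.h>

#define PGEX_P 0x01u /* protection violation (page was present) */
#define PGEX_W 0x02u /* fault caused by a write */
#define PGEX_U 0x04u /* fault occurred in user mode */

#define T_PAGEFLT 12u /* assumed BSD i386 page-fault trap number */

/* Real kernel side: keep the trap type low, stash PGEX in bits 16-31. */
static uint32_t pack_trapno(uint32_t trapno, uint32_t pgex)
{
    return (trapno & 0xffffu) | ((pgex & 0xffffu) << 16);
}

/* Virtual kernel side: split the two halves back apart. */
static uint32_t trapno_type(uint32_t packed) { return packed & 0xffffu; }
static uint32_t trapno_pgex(uint32_t packed) { return packed >> 16; }

static int is_write_fault(uint32_t packed)
{
    return trapno_type(packed) == T_PAGEFLT &&
           (trapno_pgex(packed) & PGEX_W) != 0;
}
```

For example, a user-mode write to a present page packs to 0x0007000c: trap type 12 in the low half and PGEX bits 0x7 in the high half.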

Use -s to flag POSIX's "special built-in" utilities in builtins.def. Add a
new member to struct builtincmd and set it to 1 if -s was specified. This
is done because there are cases where special builtins must be treated
differently from other builtins.

Implement some of the differences between special built-ins and other builtins
demanded by POSIX.
- A redirection error is only fatal (meaning the execution of a shell script is
terminated) for special built-ins. Previously it was fatal for all shell
builtins, causing problems like the one reported in FreeBSD PR 88845.
- Variable assignments remain in effect for special built-ins.
- Option or operand errors are only fatal for special built-ins.
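The role of the new struct builtincmd member can be sketched with a simplified, hypothetical version of the structure and a few table entries. The decision functions show the pattern for a non-interactive shell: behavior keys off the special flag, and external utilities (no table entry) behave like regular builtins. Option and operand errors would follow the same pattern.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for sh's struct builtincmd. */
struct builtincmd {
    const char *name;
    int special; /* 1 if flagged with -s in builtins.def */
};

static const struct builtincmd builtins[] = {
    { "exec", 1 }, { "export", 1 }, { "set", 1 }, { "times", 1 },
    { "cd", 0 }, { "command", 0 }, { "echo", 0 },
};

static const struct builtincmd *find_builtin(const char *name)
{
    for (size_t i = 0; i < sizeof(builtins) / sizeof(builtins[0]); i++)
        if (strcmp(builtins[i].name, name) == 0)
            return &builtins[i];
    return NULL; /* not a builtin: an external utility */
}

enum error_action { CONTINUE_SCRIPT, EXIT_SHELL };

/* Non-interactive shell: redirection errors are fatal only for specials. */
static enum error_action on_redirection_error(const struct builtincmd *bp)
{
    return (bp != NULL && bp->special) ? EXIT_SHELL : CONTINUE_SCRIPT;
}

/* Prefix assignments (VAR=x cmd) outlive the command only for specials. */
static bool assignments_persist(const struct builtincmd *bp)
{
    return bp != NULL && bp->special;
}
```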

Add the times builtin. It reports the user and system time for the shell
itself and its children. Instead of calling times() (as implied by POSIX), this
implementation calls getrusage() directly to obtain the times, because this is
more convenient.
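A getrusage()-based times builtin can be sketched in a few lines. times_builtin() and format_tv() are hypothetical names, and the "XmY.ZZZs" millisecond format is a choice made for this sketch. The first line reports the shell's own user and system time, the second line its children's.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Format a timeval as "<minutes>m<seconds>.<milliseconds>s". */
static int format_tv(char *buf, size_t len, struct timeval tv)
{
    return snprintf(buf, len, "%ldm%ld.%03lds",
                    (long)(tv.tv_sec / 60), (long)(tv.tv_sec % 60),
                    (long)(tv.tv_usec / 1000));
}

/* Print user/system time for the shell, then for its children. */
static int times_builtin(void)
{
    struct rusage self, kids;
    char u[32], s[32];

    if (getrusage(RUSAGE_SELF, &self) != 0 ||
        getrusage(RUSAGE_CHILDREN, &kids) != 0)
        return 1;
    format_tv(u, sizeof(u), self.ru_utime);
    format_tv(s, sizeof(s), self.ru_stime);
    printf("%s %s\n", u, s);
    format_tv(u, sizeof(u), kids.ru_utime);
    format_tv(s, sizeof(s), kids.ru_stime);
    printf("%s %s\n", u, s);
    return 0;
}
```

getrusage() already returns struct timeval values, so no clock-tick conversion is needed, unlike the struct tms values from times().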

Print pointers with %p rather than casting them to long.

Replace home-grown dup2() implementation with actual dup2() calls. This
should slightly reduce the number of system calls in critical portions of
the shell, and select a more efficient path through the fdalloc code.

Implement the PS4 variable which is defined by the POSIX User Portability
Utilities option. Its value is printed at the beginning of the line if tracing
(-x) is active. PS4 defaults to the string "+ " which is compatible with the
old behaviour to always print "+ ".
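The PS4 default can be sketched as follows. For simplicity this reads the process environment with getenv(); a real shell looks the value up in its own variable table and, per POSIX, expands it before printing. trace_prefix() and trace_command() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv()/unsetenv() in callers */
#include <stdio.h>
#include <stdlib.h>

/* Prefix for -x trace lines: $PS4 if set, otherwise "+ ". */
static const char *trace_prefix(void)
{
    const char *ps4 = getenv("PS4");
    return ps4 != NULL ? ps4 : "+ ";
}

/* Write one traced command line to stderr, as "set -x" does. */
static void trace_command(int argc, char *const argv[])
{
    fputs(trace_prefix(), stderr);
    for (int i = 0; i < argc; i++)
        fprintf(stderr, "%s%s", i ? " " : "", argv[i]);
    fputc('\n', stderr);
}
```

With PS4 unset, trace_command() prints "+ echo hi" for the arguments echo and hi, matching the old hard-coded behaviour.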

Don't crash on "<cmd> | { }".

Remove some white space at EOL.

Add the POSIX options -v and -V to the 'command' builtin. Both describe the
type of their argument, if it is a shell function, an alias, a builtin, etc.
-V is more verbose than -v.

Do not assume there is only a space between #define and the macro name
when grepping for JOBS in mkbuiltins.

Add a new procedure, vm_fault_page(), which does all actions related to
faulting in a VM page given a vm_map and virtual address, including any
necessary I/O, but returns the held page instead of entering it into a pmap.

Use the new function in procfs_rwmem, allowing gdb to 'see' memory that
is governed by a virtual page table.