Factor out 'curnetport'
This macro returns the current thread's msgport if the current thread
is a network protocol thread; otherwise the current CPU's netisr msgport
is returned. The latter case should be fixed.

Make sure to read the last byte of the EEPROM descriptor. Previously
the last byte of the ethernet address was not read, which in turn
resulted in getting only 5 of the 6 bytes of the ethernet address and
always returning ENOENT.

The blockmap layer1/2 CRCs were being checked without the blockmap lock
being held. It was possible for the check to occur while another thread
was blocked with the layer half-modified, resulting in an assertion
failure but NO on-media corruption.

Fix the issue in an optimal manner by rechecking the CRC with the blockmap
locked when the first check fails. Only assert if the second check fails.

Generic layer changes:
- Pass more detailed information to ifaddr_event handler.
o The ifaddr which triggers the event is passed in
o The action (add/delete/change) performed upon the ifaddr is
passed in
- Add ifa_prflags field in ifaddr_container. This field should
be used to hold protocol specific flags. For inet addresses,
IA_PRF_RTEXISTOK is defined to ignore rtinit() EEXIST error in
in_ifinit().

carp(4) changes:
- Add virtual address struct, which holds corresponding carp(4)
inet address and backing address of a "real" interface (backing
interface).
- The list holding the virtual address structs is sorted. This is
mainly used to fix the bug in the following case:
host1:
ifconfig carp0 192.168.5.1
ifconfig carp0 alias 192.168.5.2
host2:
ifconfig carp0 192.168.5.2
ifconfig carp0 alias 192.168.5.1
Before this change, the sha1 calculated over the inet addresses of
these two hosts would differ, causing CARP to fail.
Based-on: OpenBSD
- Allow inet addresses to be added to carp(4) interface, even if
no backing interface could be found or the backing interface is
not running.
- Don't abuse IFF_UP, which is administrative flag; use IFF_RUNNING
instead.
- Factor out carp_stop().
- Handle ifaddr_event; most of the carp(4) inet address configuration
happens in this event handler. In carp_ioctl(), we just mark the
carp(4) interface IFF_UP|IFF_RUNNING and set IA_PRF_RTEXISTOK on
the inet address.
- Fix the ifdetach_event handler:
o Don't sit on the branch while we are sawing it off.
o We always need to leave the joined multicast group.
- Free carp_if to the proper kmalloc pool.
- Simplify the carp_if struct; except for the TAILQ_HEAD, the rest of
the fields are unused; nuke them.
- Use 'void *' as ifnet.if_carp's type. This could ease upcoming
carp(4) MPSAFE work.
- M_NOWAIT -> MB_DONTWAIT
- Throw in assertions
- Cleanup:
o Nuke SC2IFP
o Nuke carp_softc.sc_ifp compat shim
o Constify function parameters
o ...

This is a MAJOR rewrite of usched_bsd4 and related support logic, plus
additional improvements to the LWKT scheduler.

* The LWKT scheduler used to run a user thread not needing the MP lock
if it was unable to run a kernel thread that did need it, due to some
other cpu holding the lock. This created a massive priority inversion.

LWKT no longer does this. It will happily run other MPSAFE kernel
threads, but as long as kernel threads exist which need the MP lock,
LWKT will no longer switch to a user mode thread.

Add a new sysctl lwkt.chain_mplock which defaults to 0 (off). If set
to 1, LWKT will attempt to use IPIs to notify conflicting cpus when the
MP lock is available and will also allow user mode threads to run if
kernel threads are present needing the MP lock (but unable to get it).
NOTE: Currently, turning on this feature results in reduced performance,
though not as bad as pre-patch.

* The main control logic USCHED_BSD4 was almost completely rewritten,
greatly improving interactivity in the face of cpu bound programs
such as compiles.

USCHED_BSD4 no longer needs to use the scheduler helper when the
system is under load. The scheduler helper is only used to allow
one cpu to kick another idle cpu when additional processes are
present.

USCHED_BSD4 now takes great advantage of the scheduler's cpu-local
design and uses a bidding algorithm for processes trying to return
to user mode to determine which one is the best. Winners simply
deschedule losers, and since the loser is clearly not running when
the winner does this the descheduling operation is ultra simple to
accomplish.

This is a major revamping of the pageout and low-memory handling code.

The pageout daemon now detects out-of-memory conditions and properly
kills the largest process(es). This condition occurs when swap is
full (or you have no swap) and most of the remaining VM pages in memory
have become dirty. With no swap to page to, the dirty pages squeeze out
the clean ones. The pageout daemon detects this case and starts killing
processes.

The pageout daemon now detects stress in the form of excess cpu use
and tries to reduce its cpu footprint when that occurs. Excess cpu use
can occur when the only pages left in-core are dirty and there is nowhere
to swap them to. Previously, if this case occurred, the system would basically
just stop working.

These changes make the system truly have VM = RAM+SWAP. If you have 1G
of RAM and 1G of swap, the system can run up to 2G worth of processes.

Scanning PCI configuration registers (which are not going to change) on
every interrupt looks expensive, especially when the interrupt is shared.
Profiling (in FreeBSD) shows 3% of time spent by atapci0 under pure network
load due to IRQ sharing with em0.

Fix bugs in dealing with low-memory situations when the system has run out
of swap or has no swap.

* Fix an error where the system started killing processes before it needed
to.

* Continue propagating pages from the active queue to the inactive queue
when the system has run out of swap or has no swap, even though the
inactive queue has become bloated. This occurs because the inactive
queue may be unable to drain due to an excess of dirty pages which
cannot be swapped out.

* Use the active queue to detect excessive stress which combined with
an out-of-swap or no-swap situation means the system has run out of
memory. THEN start killing processes.

* This also allows the system to recycle nearly all the clean pages
available when it has no swap space left, to try to keep things going,
leaving only dirty pages in the VM page queues.

This actually fixes a potential "bug" in vfs_register(), which does not
compare the new VFS being registered with the last entry of the list,
i.e. two (or more) sequential vfs_register() calls with the same
argument would succeed.

Originally there was no time gap between the running of the tcp timer
handler and the deactivation of the tcp timer callout, but the message
based tcp timer has a time gap between these two actions. This
time gap affects code paths which depend on the current state of
the tcp timer, i.e. the return value of callout_active(tcp_timer). To
close this time gap, we take the pending and running tcp timer tasks
into consideration when testing the current state of the tcp timer.