Adjust the kern_utimes() code in the kernel to check for write permissions
prior to diving into the VFS. UFS checks for write perms but HAMMER doesn't.
Generally speaking, we want (at least for now) the kernel to do as many of
these checks as possible.
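The kind of filesystem-independent check the kernel can do before calling into the VFS can be sketched in C. This is a sketch assuming the POSIX rules for utimes(): with a NULL times argument, the owner or any process with write permission may update the timestamps (otherwise EACCES); with explicit times, only the owner or a privileged process may (otherwise EPERM). The helper utimes_check_perm() and its parameters are hypothetical, and treating uid 0 as "privileged" is a simplifying assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not the actual kern_utimes() code.
 *   euid:       effective uid of the caller
 *   owner:      uid owning the file
 *   can_write:  caller has write permission on the file
 *   null_times: caller passed NULL (set both times to "now")
 * Treating uid 0 as "appropriate privileges" is an assumption.
 */
static int
utimes_check_perm(uid_t euid, uid_t owner, bool can_write, bool null_times)
{
    if (euid == 0 || euid == owner)
        return 0;                       /* owner or privileged: allowed */
    if (null_times)
        return can_write ? 0 : EACCES;  /* "now" only needs write access */
    return EPERM;                       /* explicit times need ownership */
}
```

Doing this once in the syscall layer means a filesystem that forgets the check (HAMMER, per the text) still gets the same behaviour as UFS.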

When deleting a file, msdosfs keeps its denode in the denode cache until it is
reclaimed. This causes a collision in the cache when recycling the directory
entry of a deleted but still open file for a new or renamed file. This
collision was incorrectly handled, resulting in a kernel panic (rename case)
or a syscall error and corrupted in-core state (new file case).

Fix by allowing denodes pointing to the same directory entry to coexist in the
cache as long as at most one of them represents an existing (not deleted) file.

This function returns an error if a denode for the same directory entry is
already in the hash table: EBUSY if the hashed denode represents a live file
and EINVAL if it represents a deleted but still open file.

A typo in the function made it check the liveness of the denode being
inserted instead of the one already in the hash table. As a consequence, if N
threads raced in deget() to insert a new denode for the same file into the hash
table, the losers failed with EINVAL instead of retrying.
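The corrected insertion logic can be sketched as follows, following the return-value rules described above. The struct layout, field names and denode_hash_insert() are hypothetical simplifications (a single hash chain, a boolean "deleted" flag), not the actual msdosfs code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an msdosfs denode. */
struct denode {
    unsigned long dirclust;   /* cluster holding the directory entry */
    unsigned long diroffset;  /* offset of the entry in that cluster */
    bool deleted;             /* file removed but still open */
    struct denode *next;      /* hash chain link */
};

/*
 * Insert 'dep' into the hash chain '*bucket'. If a denode for the same
 * directory entry is already hashed, report the state of THAT denode:
 * EBUSY if it is live (a racing deget() caller should retry its lookup),
 * EINVAL if it is a deleted but still open file.
 */
static int
denode_hash_insert(struct denode **bucket, struct denode *dep)
{
    for (struct denode *cur = *bucket; cur != NULL; cur = cur->next) {
        if (cur->dirclust != dep->dirclust ||
            cur->diroffset != dep->diroffset)
            continue;
        /* Test the hashed denode 'cur', not 'dep' (the original typo). */
        return cur->deleted ? EINVAL : EBUSY;
    }
    dep->next = *bucket;
    *bucket = dep;
    return 0;
}
```

With the typo, the check looked at `dep`, the freshly built denode, so a thread losing the race could get EINVAL even though the winner's denode was live and EBUSY (retry) was the right answer.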

With this change, the device will have at most 48 TX descriptors pending
write-back. 48 was chosen according to the table in:
Intel 82571EB/82572EI Ethernet Controller Revision 6.0, Page 43,
Item 70. 82571/82572 Overwrites Transmit Descriptors in Internal Buffer.

We don't use TIDV/TADV to implement TX interrupt moderation, i.e. the
TX desc's IDE bit should always be off. When we set a TX desc's RS
bit, we want the TX interrupt to arrive immediately after the hardware
sets that desc's DD bit.

The RS (report status) bit in a TX desc controls whether the device
should set the DD bit (via a write request) and whether a TX interrupt
should be generated. By setting the RS bit only in the last TX desc of
every int_tx_nsegs TX descs, we greatly reduce the TX interrupt rate
(from 20000/s to 1200/s for full-speed 1472-byte UDP datagrams) and
the number of TX desc write requests issued by the device. This also
gives me an additional 10Kpps on 82573E_IAMT. Add a sysctl node for
int_tx_nsegs; its default value is 1/16 of the number of TX descs. The
implementation details are commented near the related fields of
struct adapter.
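The RS policy can be sketched as a per-ring counter. This is a minimal illustration under assumptions: struct tx_state, tx_state_init(), tx_need_rs() and their fields are hypothetical names, not the real fields of struct adapter; only the default of 1/16 of the ring size comes from the text above.

```c
#include <stdbool.h>

/* Hypothetical, simplified TX ring state. */
struct tx_state {
    unsigned int num_tx_desc;   /* size of the TX ring */
    unsigned int int_tx_nsegs;  /* request RS once per this many descs */
    unsigned int tx_nsegs;      /* descs queued since the last RS */
};

static void
tx_state_init(struct tx_state *st, unsigned int num_tx_desc)
{
    st->num_tx_desc = num_tx_desc;
    st->int_tx_nsegs = num_tx_desc / 16;   /* default: 1/16 of the ring */
    if (st->int_tx_nsegs == 0)
        st->int_tx_nsegs = 1;              /* tiny rings: RS on every packet */
    st->tx_nsegs = 0;
}

/*
 * Account for a packet that used 'nsegs' descs. Returns true if the
 * packet's last desc should carry the RS bit.
 */
static bool
tx_need_rs(struct tx_state *st, unsigned int nsegs)
{
    st->tx_nsegs += nsegs;
    if (st->tx_nsegs >= st->int_tx_nsegs) {
        st->tx_nsegs = 0;
        return true;
    }
    return false;
}
```

For a 256-desc ring this requests a DD write-back and interrupt roughly once per 16 descs instead of once per packet. A real driver presumably also has to force RS when the send queue drains, so that completed descs are eventually reclaimed; that case is left out of this sketch.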

OPIE requires a minimum seed length and generates a default seed using
a zero-padded random number. Each subsequent password change
increments this seed. The code did not zero-pad the incremented number,
so opiepasswd would advance the seed from e.g. "la0092" to "la93". This
prevented opiekey(1) from working, as it complained that the seed was
too short.
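Incrementing the seed while keeping its zero-padding can be sketched as below. seed_increment() is a hypothetical helper, not the actual opiepasswd code; it preserves the width of the numeric suffix with the `%0*lu` conversion.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: increment the numeric suffix of an OPIE-style seed
 * in place, keeping its zero-padding ("la0092" -> "la0093").
 * 'size' is the size of the buffer holding 'seed'.
 * Returns 0 on success, -1 if there is no numeric suffix or no room.
 */
static int
seed_increment(char *seed, size_t size)
{
    size_t len = strlen(seed);
    size_t start = len;

    if (size <= len)
        return -1;
    while (start > 0 && isdigit((unsigned char)seed[start - 1]))
        start--;
    if (start == len)
        return -1;                      /* no numeric suffix */

    int width = (int)(len - start);     /* number of digits to preserve */
    unsigned long n = strtoul(seed + start, NULL, 10) + 1;
    int w = snprintf(seed + start, size - start, "%0*lu", width, n);
    if (w < 0 || (size_t)w >= size - start)
        return -1;                      /* result did not fit */
    return 0;
}
```

Printing with a plain `%lu` instead of `%0*lu` reproduces the bug described above: "la0092" becomes "la93".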

HAMMER Utilities: undo can now detect all prior replacements of a file.

The undo code will now iterate over the history of the parent directory and
attempt to locate all versions of the requested file, even if the inode
number changes because the file was deleted and recreated, or
renamed over.

undo -i attempts to show inode number changes and deletions in the list.
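The idea of following a name through the parent directory's history, rather than following a single inode, can be sketched over an in-memory list of directory-entry records. Everything here is hypothetical: struct dirent_hist and collect_versions() stand in for whatever the undo utility actually reads from HAMMER's history, and the records are assumed to be sorted by transaction id.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of one change to a directory entry over time. */
struct dirent_hist {
    uint64_t tid;        /* transaction id, ascending */
    const char *name;    /* file name in the parent directory */
    uint64_t ino;        /* inode number the name pointed to */
    bool deleted;        /* entry was removed at this tid */
};

/*
 * Walk the parent directory's history and record, in order, each inode
 * number that 'name' referred to; consecutive duplicates are collapsed.
 * Returns the number of inode numbers stored in 'out' (at most 'max').
 */
static size_t
collect_versions(const struct dirent_hist *h, size_t n, const char *name,
                 uint64_t *out, size_t max)
{
    size_t count = 0;

    for (size_t i = 0; i < n && count < max; i++) {
        if (h[i].deleted || strcmp(h[i].name, name) != 0)
            continue;
        if (count > 0 && out[count - 1] == h[i].ino)
            continue;                   /* same inode as previous version */
        out[count++] = h[i].ino;        /* new inode: delete+recreate/rename */
    }
    return count;
}
```

Each change in the collected inode numbers corresponds to the kind of replacement that `undo -i` tries to show in its listing.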

According to Intel's PCIe GbE Controllers Open Source Software
Developer's Manual Revision 1.8, a csum offloading TX desc
prevents TX data read requests from being pipelined, thus reducing TX
performance. The pipelining effect is not obvious when transmitting
bulk data (e.g. 1472-byte UDP datagrams), but it can dominate
when transmitting tiny packets. So we should avoid allocating a
csum offloading TX desc whenever possible to take advantage of
pipelining.
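The text does not say exactly how the driver avoids csum offloading descs. One plausible approach, assumed here, is to remember the offload parameters last programmed into the hardware and only queue a new context desc when they change. The names struct csum_ctx, struct tx_ctx_cache and need_ctx_desc() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical offload parameters a context desc would program. */
struct csum_ctx {
    uint32_t flags;      /* which checksums to offload; 0 = none */
    uint16_t ip_off;     /* start of the IP header */
    uint16_t ip_hlen;    /* IP header length */
};

struct tx_ctx_cache {
    bool valid;               /* hardware holds a known context */
    struct csum_ctx last;     /* context last written to the ring */
};

/*
 * Return true if a new csum context desc must be queued for a packet
 * needing 'want'; false if no offload is needed or the cached context
 * can be reused (keeping TX data reads pipelined).
 */
static bool
need_ctx_desc(struct tx_ctx_cache *c, const struct csum_ctx *want)
{
    if (want->flags == 0)
        return false;                   /* no offload requested */
    if (c->valid && c->last.flags == want->flags &&
        c->last.ip_off == want->ip_off && c->last.ip_hlen == want->ip_hlen)
        return false;                   /* hardware already has this one */
    c->last = *want;
    c->valid = true;
    return true;
}
```

In a stream of identical small UDP packets only the first one pays for a context desc. The cache must be invalidated whenever the ring is reset, since the hardware then no longer holds the old context.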

On 82573E_IAMT,
Before this commit: ~700Kpps
After this commit: ~990Kpps

The funny thing about this commit is that the older code in Intel's
FreeBSD driver 6.2.9 did roughly what we are doing in this commit,
while Intel's FreeBSD driver 6.9.6 simply follows Linux's way of
flushing the performance down the toilet ...

In addition to adding support for some chips (e.g. 82574L), this also gives
me the chance to rearrange and clean up if_em.[ch] :)

Noticeable changes to the FreeBSD driver:
- The hardware abstraction layer is put into a separate module (ig_hal)
- IP csum offloading is supported when hardware TX csum offloading is enabled
- mbufs on the RX/TX rings are freed in em_stop(), i.e. during "ifconfig emX down"
- The TX logic is adjusted so that we check the number of available TX descs
before dequeuing an mbuf from ifq. We also reserve twice as many spare TX
descs for 82544 cards on a PCI-X bus, so we never need to unload an already
loaded mbuf midway due to a shortage of TX descs; this makes the logic a
little simpler.
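The "check before dequeue" TX pattern can be sketched with a toy queue and ring. All names are hypothetical, and the sketch assumes that the reserve is at least the number of descs any single packet can consume (doubled for the 82544-on-PCI-X case, as the text describes), so a dequeued packet can always be posted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet: 'segs' is the number of TX descs it will use. */
struct pkt { int segs; };

/* Hypothetical send queue standing in for ifq. */
struct txq {
    struct pkt *pkts;
    size_t head, count;
};

/* Hypothetical ring state. */
struct ring {
    int avail;            /* free TX descs */
    int spare;            /* descs reserved for a worst-case packet */
    bool is_82544_pcix;   /* 82544 on a PCI-X bus: reserve twice as many */
};

static int
tx_reserve(const struct ring *r)
{
    return r->is_82544_pcix ? 2 * r->spare : r->spare;
}

/* Post as many packets as fit; never dequeues one it cannot post. */
static int
tx_start(struct ring *r, struct txq *q)
{
    int sent = 0;

    /* Check for room BEFORE dequeuing, so nothing must be unloaded later. */
    while (q->count > 0 && r->avail >= tx_reserve(r)) {
        const struct pkt *p = &q->pkts[q->head++];
        q->count--;
        r->avail -= p->segs;    /* cannot run short, by the check above */
        sent++;
    }
    return sent;
}
```

Because the space test happens first, the error path that unloads a half-loaded mbuf and puts it back on the queue disappears, which is the simplification the text refers to.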

When a tty is revoked, the opencount of its associated vnode is forced to
zero, and calling vop_stdclose() on this vnode then causes a panic.
Therefore, call vop_stdclose() from spec_close() only if the opencount is
strictly positive.
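The guard can be sketched with stand-in types. struct vnode here is a hypothetical one-field mock, and fake_stdclose() and spec_close_sketch() are hypothetical names standing in for vop_stdclose() and spec_close(); the sketch only shows the "close only if opencount > 0" decision.

```c
/* Hypothetical one-field stand-in for a vnode. */
struct vnode { int v_opencount; };

/*
 * Stand-in for vop_stdclose(): drops one open reference. Per the text,
 * the real routine panics if called once the opencount is already zero.
 */
static int
fake_stdclose(struct vnode *vp)
{
    vp->v_opencount--;
    return 0;
}

static int
spec_close_sketch(struct vnode *vp)
{
    /* A revoked tty's vnode may already have its opencount forced to 0. */
    if (vp->v_opencount > 0)
        return fake_stdclose(vp);
    return 0;   /* nothing left to close; skip the generic close */
}
```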

I haven't managed to reproduce the original panic locally, so this fixes a
potential panic rather than a confirmed one.