While here, clean up the comments a bit. -Wextra cleanness is not
something we're aiming for. What we want are warnings that help
catch bugs and/or keep the code nice, but at the same time
don't get on anyone's nerves.

Currently the passed-in address is copied into newly allocated
memory (grr, an additional blocking kmalloc), and PRUS_FREEADDR
is set so that the protocol thread knows it must free the
address.
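
A minimal userland sketch of that ownership handoff, for illustration
only. Every sketch_* name and SKETCH_FREEADDR is a hypothetical
stand-in rather than the kernel's real definition, and malloc()/free()
stand in for the kernel allocator:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag playing the role of PRUS_FREEADDR. */
#define SKETCH_FREEADDR 0x1

/* Hypothetical, simplified socket address. */
struct sketch_addr {
	unsigned char len;
	unsigned char data[15];
};

/* Hypothetical message handed from the sender to the protocol thread. */
struct sketch_sendmsg {
	int flags;
	struct sketch_addr *addr;
};

/*
 * Sender side: copy the caller-owned address so the message can outlive
 * the caller's buffer, and flag the copy as owned by the message.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int
sketch_send_prepare(struct sketch_sendmsg *msg, const struct sketch_addr *addr)
{
	assert(msg != NULL);
	msg->flags = 0;
	msg->addr = NULL;
	if (addr != NULL) {
		msg->addr = malloc(sizeof(*addr));
		if (msg->addr == NULL)
			return -1;
		memcpy(msg->addr, addr, sizeof(*addr));
		msg->flags |= SKETCH_FREEADDR;
	}
	return 0;
}

/*
 * Protocol-thread side: free the address only when the flag says the
 * message owns it, then clear the flag so it cannot be freed twice.
 */
static void
sketch_send_done(struct sketch_sendmsg *msg)
{
	assert(msg != NULL);
	if (msg->flags & SKETCH_FREEADDR) {
		free(msg->addr);
		msg->addr = NULL;
		msg->flags &= ~SKETCH_FREEADDR;
	}
}
```

The flag keeps the free decision with whoever consumes the message, so
a path that passes an address it does not own can leave the flag clear.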

Before this change netperf UDP_STREAM (unconnected socket) could
only do ~200Kpps (w/ -m 18); now it can do ~990Kpps (w/ -m 18).
This gives ~500% performance improvement for tiny UDP packet TX.
The improvement is not as good as the connected socket, which is
~600%, mainly because of the additional memory allocation for
the address. We _may_ further optimize out the address allocation.

There is no performance impact on the most commonly used sockets:
- IPv4/IPv6 TCP implement pru_savefaddr, so their pru_accept will not
be called at all
- UNIX domain sockets use a sync msgport, so there is no protocol
thread dispatching
* Reorder the vnode ref/rele sequence in the exec path so p_textvp is
left in a more valid state while being initialized.

* Remove the vm_exitingcnt test in exec_new_vmspace(). Release
various resources unconditionally on the last exiting thread regardless
of the state of vm_exitingcnt. This just moves some of the resource
releases out of the wait*() system call path and back into the exit*()
path.

* Implement a hold/drop mechanism for vmspaces and use it in procfs_rwmem(),
vmspace_anonymous_count(), and vmspace_swap_count(), and various other
places.

This does a better job protecting the vmspace from deletion while various
unrelated third parties might be trying to access it.

* Implement vmspace_free() for other code to call instead of calling
sysref_put() directly. Interlock with a vmspace_hold() so
final termination processing always keys off the vm_holdcount.

* Implement vm_object_allocate_hold() and use it in a few places in order
to allow OBJT_SWAP objects to be allocated atomically, so other third
parties (like the swapcache cleaning code) can't wiggle their way in
and access a partially initialized object.

* Reorder the vmspace_terminate() code and introduce some flags to ensure
that resources are terminated at the proper time and in the proper order.
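
The hold/drop and vmspace_free() items above can be pictured with the
following userland sketch. It is not the kernel code: the sketch_*
names and fields are hypothetical, C11 atomics stand in for the
kernel's primitives, and it assumes no new hold is taken once both
counts have reached zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vmspace; field names are assumptions. */
struct sketch_space {
	atomic_int refs;	/* long-lived references */
	atomic_int holds;	/* temporary holds by third parties */
	atomic_bool terminated;	/* set once by final termination */
};

static void
sketch_init(struct sketch_space *sp)
{
	atomic_init(&sp->refs, 1);
	atomic_init(&sp->holds, 0);
	atomic_init(&sp->terminated, false);
}

/* Final teardown; the compare-and-swap makes it run at most once. */
static void
sketch_terminate(struct sketch_space *sp)
{
	bool expected = false;

	if (atomic_compare_exchange_strong(&sp->terminated, &expected, true)) {
		/* resources would be released here, in a fixed order */
	}
}

/* Keep the space from being torn down while we look at it. */
static void
sketch_hold(struct sketch_space *sp)
{
	atomic_fetch_add(&sp->holds, 1);
}

/* Termination keys off the hold count: the last drop does the work. */
static void
sketch_drop(struct sketch_space *sp)
{
	int prev = atomic_fetch_sub(&sp->holds, 1);

	assert(prev > 0);
	if (prev == 1 && atomic_load(&sp->refs) == 0)
		sketch_terminate(sp);
}

/*
 * Release a reference. The hold taken around the decrement means the
 * teardown decision is always made in sketch_drop(), never here.
 */
static void
sketch_free(struct sketch_space *sp)
{
	sketch_hold(sp);
	atomic_fetch_sub(&sp->refs, 1);
	sketch_drop(sp);
}
```

With a third party holding the space, the final sketch_free() does not
tear it down; teardown happens when that holder calls sketch_drop().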