sync with head.
for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.
this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")

Allow using CMSG_NXTHDR with -Wcast-align.
If various checks are omitted, the CMSG_NXTHDR macro expands to
(struct cmsghdr *)((char *)(cmsg) + \
_ALIGN(((struct cmsghdr *)(cmsg))->cmsg_len))
Although there is no alignment problem (assuming cmsg is properly aligned
and _ALIGN is correct), this violates -Wcast-align on strict-alignment
architectures. Therefore an intermediate cast to void * is appropriate here.
There is no workaround other than not using -Wcast-align.
Taken from FreeBSD commit r220742 by jilles

Change CMSG_SPACE and CMSG_LEN to provide Integer Constant Expressions
again. This was changed in sys/socket.h r1.51 to work around fallout
from the IPv6 aux data migration. It broke the historic ABI on some
platforms. This commit restores compatibility for netbsd32 code on such
platforms and provides a template for future changes to the CMSG_*
alignment. Revert PCC/Clang workarounds in postfix and tmux.

- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).

* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).
- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.
* Fix the system calls that take socklen_t arguments to actually do so.
* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).
* Bump libc version for the new syscalls.

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.

Make ifconfig(8) set and display preference numbers for IPv6
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.
In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.
Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.
In support of the changes above,
1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().
2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.
3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).

Add accept filters, ported from FreeBSD by Coyote Point Systems. Add inetd
support for specifying an accept filter for a service (mostly as a usage
example, but it can be handy for other things). Manual pages to follow
in a day or so.
OK core@.

1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:
int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff);
2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.
3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.
Shorten some pr_ctloutput staircases for readability.
4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():
struct mbuf *m_getsombuf(struct socket *so)
Create an mbuf and make its owner the socket `so'.
struct mbuf *m_intopt(struct socket *so, int val)
Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).
int fsocreate(..., int *fd)
Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.
void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)
Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.
socklen_t sockaddr_getlen(const struct sockaddr *sa)
Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.
const struct sockaddr *sockaddr_any(const struct sockaddr *sa)
Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.
const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)
Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.
Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.
Avoid using sizeof(struct sockaddr_dl) in the kernel.
Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.
Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().
Constify: LLADDR() -> CLLADDR().
Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.

Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socked names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, and old LKMs will fail to load.

Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.
The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.
Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.
DETAILS
1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:
struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);
sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.
sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.
The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).
2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.
3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:
int rtcache_setdst(struct route *, const struct sockaddr *);
rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.
It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.
4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.

version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.
from pavel@ with a little bit of clean up from myself.
XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.

Initial import of bluetooth stack on behalf of Iain Hibbert. (plunky@,
NetBSD Foundation Membership still pending.) This stack was written by
Iain under sponsorship from Itronix Inc.
The stack includes support for rfcomm networking (networking via your
bluetooth enabled cell phone), hid devices (keyboards/mice), and headsets.
Drivers for both PCMCIA and USB bluetooth controllers are included.

Pull up following revision(s) (requested by kleink in ticket #10233):
sys/sys/socket.h: revision 1.78
Since NULL isn't necessarily available, change CMSG_*() to use the
integer constant expression directly instead.
Fixes a problem in the build of rxvt-unicode-7.2, as reported by
Thomas Klausner.

Pull up following revision(s) (requested by kleink in ticket #1143):
sys/sys/socket.h: revision 1.78
Since NULL isn't necessarily available, change CMSG_*() to use the
integer constant expression directly instead.
Fixes a problem in the build of rxvt-unicode-7.2, as reported by
Thomas Klausner.

Solve problem invented in revision 1.73 differently:
Use "__uint64_t" etc. instead of pulling in "sys/socket.h" to avoid
namespace pollution. Based on comments made by Klaus Klein on the
current-users mailing list.

Add the following nodes to the sysctl tree:
net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist
which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.

Add a new feature-test macro, _NETBSD_SOURCE. If this is defined
by the application, all NetBSD interfaces are made visible, even
if some other feature-test macro (like _POSIX_C_SOURCE) is defined.
<sys/featuretest.h> defined _NETBSD_SOURCE if none of _ANSI_SOURCE,
_POSIX_C_SOURCE and _XOPEN_SOURCE is defined, so as to preserve
existing behaviour.
This has two major advantages:
+ Programs that require non-POSIX facilities but define _POSIX_C_SOURCE
can trivially be overruled by putting -D_NETBSD_SOURCE in their CFLAGS.
+ It makes most of the #ifs simpler, in that they're all now ORs of the
various macros, rather than having checks for (!defined(_ANSI_SOURCE) ||
!defined(_POSIX_C_SOURCE) || !defined(_XOPEN_SOURCE)) all over the place.
I've tried not to change the semantics of the headers in any case where
_NETBSD_SOURCE wasn't defined, but there were some places where the
current semantics were clearly mad, and retaining them was harder than
correcting them. In particular, I've mostly normalised things so that
_ANSI_SOURCE gets you the smallest set of stuff, then _POSIX_C_SOURCE,
_XOPEN_SOURCE and _NETBSD_SOURCE in that order.
Tested by building for vax, encouraged by thorpej, and uncontested in
tech-userlevel for a week.

Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
have to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(
This code was tested to pass regression tests.

move socklen_t decl to sys/ansi.h.
sys/socket.h pulls in sys/ansi.h and declare socklen_t as necessary.
the change is to allow declaration of socklen_t with less inclusion ordering
constraints. (netdb.h needs this change)

make CMSG_ALIGN always synchronize with kernel's idea of ALIGNBYTES.
ancillary data alignment will be ALIGNBYTES, not sizeof(long) - 1, from now.
CMSG_xx will NOT resolve into constant. if you use CMSG_xx to allocate
arrays, you'll lose.
bump shlib minor for libc.
NOTE: if you are on top of arch with ALIGNBYTES != sizeof(long) - 1,
you need to recompile IPv6-related binaries. there is no way to guarantee
backward compat in this aspect. sorry for this. this should be the last
backward compat breakage for IPv6-related ancillary data manipulation.
(we still have PR 9516 for unix-domain sockets...)

bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.
The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.
synchronization to latest KAME will take place on HEAD branch soon.

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.

* Define socklen_t, an unsigned integral type used to hold the lengths of
socket options, addresses etc., and use it where specified by XNS5.
* Per XNS5, change data pointers types in struct msghdr from caddr_t to void *;
make msg_iovlen a signed int (apparently for consistency with the iovcnt
argument to readv()/writev()).
* Some name space protection.

Ok, _really_ back out the sockaddr alignment change this time. We can't
reasonably "fix" the compiler "bug" that causes the forced alignment
to fail on certain platforms (e.g. m68k), so we _have_ to solve this
problem a different way.

Deal with an alignment problem on the Alpha port. The maximum required
alignment of any field in a struct sockaddr is 1, since all members are
chars or char arrays (as noted by Ross Harvey on port-alpha). This causes
the possibility of unaligned access faults when a sockaddr is cast to
e.g. a sockaddr_in. Solution: explicitly direct the compiler to
longword-align the start of a struct sockaddr.

Allocate and define AF_ARP and PF_ARP. These will be used to
a) communicate the hardware independent part of ARP packets (rfc826) from/to
if_*subr.c in the ongoing effort to rewrite the ARP subsystem for non-Ethernet
networks
b) communicate the hardware independent part of RARP packets (rfc903) from/to
userland sockets to avoid putting link level header format knowledge into
rarpd, when it is enhanced to support non-Ethernet networks.

Allocate and define AF_ARP and PF_ARP. These will be used to
a) communicate the hardware independent part of ARP packets (rfc826) from/to
if_*subr.c in the ongoing effort to rewrite the ARP subsystem for non-Ethernet
networks
b) communicate the hardware independent part of RARP packets (rfc903) from/to
userland sockets to avoid putting link level header format knowledge into
rarpd, when it is enhanced to support non-Ethernet networks.

Remove #ifdefs
Thanks to cgd@NetBSD.ORG for pointing the following out to me:
listen (fd, SOMAXCONN); would break.
As programs wouldn't see the changes that might be specified in
the kernel config file.
As penance I am going to see if it would be possible to move this
into param.h and provide away of finding out what the kernel
value is. On busy network servers this value is useful to have as a tunable
kernel parameter.