Tuesday Aug 16, 2005

Pardon my channelling of Jeff Spicoli, but Mike Kupfer is working on delivering the with-kernel-crypto build of OpenSolaris. This means that OpenSolaris users will have a working implementation of ESP at their fingertips. It also means Bill and I can (time permitting... uggh) give some more details and explanations about what's going on under the covers.

Tuesday Jun 14, 2005

An OpenSolaris IPsec Hello

Hi! Those of you visiting here probably know I'm one of the IPsec guys
(actually, I'm the original IPsec guy) here in Solaris-land. Bill may also have some stuff to
say about the IPsec source in Solaris.

The kernel source for IPsec (AH, ESP, and the internal databases) lives in
usr/src/uts/common/inet/ip/,
because we're an integral part of our
IP implementation. I should warn you now that there's a mixture of STREAMS
boundaries and function calls between the different parts of the IPsec
subsystem. It used to be almost all STREAMS, because of broken US Export restrictions (across
all political party lines, BTW). We figured we could sell it as
exportable to the powers that be more easily if we used a
"general-purpose interface" which allowed for easy module perforation for
moving our data around. As the restrictions loosened up, we were able to
streamline things somewhat. We hope to do even more now that OpenSolaris is
available. There are bits and pieces of the actual Solaris IPsec missing
from OpenSolaris (especially from ESP) that will show up on OpenSolaris soon
as well, now that we're officially open-source. (It's a bit of a chicken &
egg problem.)

This entry will be discussing the PF_KEY implementation in
Solaris. I assume you know something about how IPsec works, have read RFC
2367, and have a handle on TCP/IP protocol suite principles.

A Brief PF_KEY Synopsis

PF_KEY is analogous to the PF_ROUTE routing socket. See Keith Sklower's Radix-Tree
paper at his site for the introduction to routing sockets. Where the
routing socket manipulates IP forwarding entries (or routes), the PF_KEY
socket manipulates IPsec Security Associations (SAs). A user-space
application sends a message to the kernel telling it to ADD, DELETE, or
UPDATE SAs, and the kernel sends back a message indicating either success or
failure.

The paper makes
mention of a message that's little-used in most PF_ROUTE implementations --
RTM_RESOLVE. RTM_RESOLVE allows a user-space application to resolve an
address, e.g. a user-space ARP. This inspired PF_KEY's similar message,
SADB_ACQUIRE, which is used to tell a user-space key management (KM) daemon
that an outgoing IPsec SA is needed. RFC 2367 has the specification
for a PF_KEY socket.
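
To make the request/reply exchange concrete, here is a minimal sketch of filling in a base message header for an SADB_FLUSH request. The field layout and the numeric values come from RFC 2367; the struct and constants are redeclared locally under a hypothetical rfc_/RFC_ prefix so the fragment stands alone, and rfc_build_flush() is an illustrative helper, not a Solaris function. Real code would include net/pfkeyv2.h and write the buffer to a privileged socket opened with socket(PF_KEY, SOCK_RAW, PF_KEY_V2).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values from RFC 2367, renamed so they cannot clash with a real header. */
#define	RFC_PF_KEY_V2		2
#define	RFC_SADB_FLUSH		9
#define	RFC_SADB_SATYPE_ESP	3

/* Base message header with the field layout given in RFC 2367. */
struct rfc_sadb_msg {
	uint8_t  sadb_msg_version;
	uint8_t  sadb_msg_type;
	uint8_t  sadb_msg_errno;
	uint8_t  sadb_msg_satype;
	uint16_t sadb_msg_len;		/* length in 64-bit words */
	uint16_t sadb_msg_reserved;
	uint32_t sadb_msg_seq;
	uint32_t sadb_msg_pid;
};

static_assert(sizeof (struct rfc_sadb_msg) == 16,
    "base header is exactly two 64-bit words");

/* Hypothetical helper: build a FLUSH request for all ESP SAs. */
static void
rfc_build_flush(struct rfc_sadb_msg *msg, uint32_t seq, uint32_t pid)
{
	memset(msg, 0, sizeof (*msg));
	msg->sadb_msg_version = RFC_PF_KEY_V2;
	msg->sadb_msg_type = RFC_SADB_FLUSH;
	msg->sadb_msg_satype = RFC_SADB_SATYPE_ESP;
	/* A FLUSH carries no extensions, so the length is just the header. */
	msg->sadb_msg_len = sizeof (*msg) / 8;
	msg->sadb_msg_seq = seq;
	msg->sadb_msg_pid = pid;
}
```

The kernel's reply reuses the same header, with sadb_msg_errno set to 0 on success or to a UNIX errno value on failure.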

Solaris Changes from RFC 2367

Most, if not all, existing PF_KEY implementations either alter or add to
the message types in RFC 2367. Most changes were made because:

RFC 2367 does not mesh well with some KM protocols (esp. IKEv1).

The UNIX errno space is not sufficient to describe some failures.

Some implementors thought PF_KEY would be a suitable place to put their
IPsec Security Policy Database (SPD) manipulations.

Solaris addresses the last bullet by introducing a separate PF_POLICY
socket for SPD manipulation. The other issues, however, were a problem for
us.

All of our changes to PF_KEY were a direct result of implementing The Internet Key Exchange
(IKE) as part of our work in Solaris 9. They are summarized below:

Extended ACQUIRE - Instead of sending up a message for every IPsec
SA that is not present, send up a list of what is needed to protect the
packet to the listening KM daemon. This allows a packet that requires AH and
ESP to express that protection in one ACQUIRE message.

Extended REGISTER - Goes hand-in-hand with Extended ACQUIRE. You
tell the kernel that you can handle the Extended ACQUIRE.

Inverse ACQUIRE - The closest to policy manipulation we come in
PF_KEY... it's a one-time consultation of the IPsec SPD, and you get as an
answer an Extended ACQUIRE, just as if an outbound packet were triggering
it. This is useful for IKE responders, and for diagnostic listeners on the PF_KEY
socket.

Diagnostic codes - EINVAL is a frequently occurring value for
sadb_msg_errno. Was it a weak DES key? Was it a botched sockaddr
structure? The reserved field in struct sadb_msg now contains
useful extra data when an EINVAL occurs.

typedefs - It's easier to type sadb_ext_t instead of
struct sadb_ext.

64-bit alignment - RFC 2367 claims all PF_KEY structures can be
aligned on 64-bit boundaries. In Solaris, we force it to happen. That's why
net/pfkeyv2.h
has a lot of unions in most of its structure definitions.

But Dan... weren't you an author of RFC 2367?!?

Yes I was. Hence the question: Dude, Where's My Spec?

I wasn't allowed (yes, I'm serious; and no, it had nothing to do with
any government interference) to work on IPsec or IKE when I first got to Sun,
but the RFC was work that was a continuation from my previous job. In
hindsight, I think we should've been paying more attention to the customers
(authors of KM daemons, of which I'd be one someday). I was wrapped up in
non-IPsec work at Sun when I wasn't working on what would become RFC 2367,
and I split my attention in a non-optimal fashion.

Enough yapping, let's see some code!

The first place to look is usr/src/uts/common/net/pfkeyv2.h,
which gets deposited into /usr/include/net/ on a running system.
You'll notice every structure that doesn't have a field of type uint64_t has
a union in it. The base PF_KEY message, sadb_msg_t, is one of them.

Notice that every extra field that is not in RFC 2367 uses the _X_ naming
convention. In sadb_msg_t, we took the two uint32_ts (the RFC's sadb_msg_seq
and sadb_msg_pid) and merged them
into a union with a uint64_t so that we can force 64-bit alignment on
sadb_msg_t. This makes PF_KEY message manipulations 64-bit happy.
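
The union trick described above can be sketched as follows. This is a reconstruction from the description in this entry, not a copy of net/pfkeyv2.h: the ex_ prefix is hypothetical, and the member names inside the union are assumptions that only follow the _X_ convention, so check the real header before relying on any of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ex_sadb_msg {
	uint8_t  sadb_msg_version;
	uint8_t  sadb_msg_type;
	uint8_t  sadb_msg_errno;
	uint8_t  sadb_msg_satype;
	uint16_t sadb_msg_len;		/* length in 64-bit words */
	uint16_t sadb_msg_reserved;
	/* The union exists only to force 64-bit alignment. */
	union {
		struct {
			uint32_t sadb_x_msg_useq;	/* assumed name */
			uint32_t sadb_x_msg_upid;	/* assumed name */
		} sadb_x_msg_actual;
		uint64_t sadb_x_msg_alignment;
	} sadb_x_msg_u;
} ex_sadb_msg_t;

/* Keep the RFC 2367 spellings usable on top of the union. */
#define	ex_sadb_msg_seq	sadb_x_msg_u.sadb_x_msg_actual.sadb_x_msg_useq
#define	ex_sadb_msg_pid	sadb_x_msg_u.sadb_x_msg_actual.sadb_x_msg_upid

/* The union changes alignment only, never size or field offsets. */
static_assert(sizeof (ex_sadb_msg_t) == 16, "still two 64-bit words");
static_assert(offsetof(ex_sadb_msg_t, sadb_x_msg_u) == 8,
    "seq and pid stay where RFC 2367 puts them");
static_assert(_Alignof (ex_sadb_msg_t) == _Alignof (uint64_t),
    "structure is aligned like a uint64_t");
```

Without the union, the widest member would be a uint32_t and the compiler would be free to place the structure on a 4-byte boundary.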

If you look at an extension that has a 64-bit type in it already, you'll see
that there's no alignment-forcing union inserted into the definition. The
lifetime extension, whose RFC 2367 definition already carries uint64_t
fields, is one example.
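
As a sketch of the no-union case, here is the lifetime extension with the field layout from RFC 2367. It is redeclared under a hypothetical ex_ prefix so the fragment compiles on its own; whether this is the extension the entry originally showed is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Lifetime extension, fields as specified in RFC 2367. */
typedef struct ex_sadb_lifetime {
	uint16_t sadb_lifetime_len;		/* length in 64-bit words */
	uint16_t sadb_lifetime_exttype;
	uint32_t sadb_lifetime_allocations;
	uint64_t sadb_lifetime_bytes;
	uint64_t sadb_lifetime_addtime;
	uint64_t sadb_lifetime_usetime;
} ex_sadb_lifetime_t;

/* The uint64_t members already give the alignment a union would force. */
static_assert(sizeof (ex_sadb_lifetime_t) == 32, "four 64-bit words");
static_assert(_Alignof (ex_sadb_lifetime_t) == _Alignof (uint64_t),
    "aligned like a uint64_t without any union");
```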

ipsec_info.h - The structures for M_CTL messages prepended to data that gets passed around
between IPsec STREAMS modules. The keysock consumer interface definitions
are important here. Note the keysock_in_t, especially
ks_in_extv[]. This vector allows easy access to all PF_KEY
extension headers. (As we remove STREAMS from IPsec, this file will
shrink. If all goes well, all that will remain (in some form) are IPSEC_IN
and IPSEC_OUT messages, and that's because you can't enforce policy without
some form of packet tagging.)

keysock.h - The keysock driver implements the PF_KEY socket interface at its
most basic. A keysock_t represents an open PF_KEY socket, and a
keysock_consumer_t represents a consumer of PF_KEY messages (i.e. AH
and ESP).

All relevant PF_KEY internal-implementation source lives one level down from
the headers, in the ip/
directory. They are:

keysock.c - STREAMS driver that implements the PF_KEY socket as /dev/keysock.
It is also a STREAMS module that sits atop AH and ESP listening for their messages.

sadb.c - Where IPsec's Security Association Database (SADB) is mostly implemented.
You'll notice an ip_sadb.c
file, because we want the fast-path lookups to be in IP without going through
modstubs.

Most of this entry will be spent in keysock.c. Other portions of
the subsystem will be visited by one of Team IPsec, either when we have
cycles or if there are enough requests.

keysock handles messages either from a PF_KEY socket or from the
SADB and its consumers. The heavy lifting for user-generated PF_KEY messages
is in keysock_parse().

The first thing keysock_parse()
does is perform some reality checks. First off, the actual data length
should match what's in the sadb_msg_len field. (Note the first of
many SADB_nnTOmm() macros, for converting units of 64 bits to units of 8 bits,
etc.) The first really interesting part of reality-checking is the
"extension vector", or as it's called in the IPsec code, the extv.
A PF_KEY message has a base header, followed by one or more extension
headers. Let me quote this section from RFC 2367:
"There MUST be only one instance of a extension type in a message. (e.g.
Base, Key, Lifetime, Key is forbidden). An EINVAL will be returned if there
are duplicate extensions within a message."
The keysock code takes this to heart in keysock_get_ext().
The extv is a vector of sadb_ext_t pointers, where the
specific extension type (SADB_EXT_foo) can be found by merely indexing into
the vector by that value. Say you want the sadb_sa_t extension:

sadb_sa_t *sa;
sa = (sadb_sa_t *)extv[SADB_EXT_SA];

The above code snippet shows what you need to do. As we generate the
extv, if we see a collision, we return EINVAL (with an appropriate
diagnostic). We do not enforce extension ordering inbound or outbound. Once
keysock is done with first-pass reality-checks, the extv is sent around (as
part of the KEYSOCK_IN M_CTL that is prepended to the data) to all who need
it.
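
The reality checks described above can be sketched roughly as follows. This is not the keysock_get_ext() source: ex_get_ext(), ex_len_matches(), EX_64TO8() and the EX_EXT_MAX bound are hypothetical stand-ins, and rejecting zero-length, reserved, or out-of-range extensions is an assumption. Only the generic extension header layout (RFC 2367) and the EINVAL on a collision come from the text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	EX_EXT_MAX	32			/* hypothetical bound */
#define	EX_64TO8(x)	((size_t)(x) << 3)	/* 64-bit words to bytes */

/* Generic extension header from RFC 2367. */
typedef struct ex_sadb_ext {
	uint16_t sadb_ext_len;	/* in 64-bit words, header included */
	uint16_t sadb_ext_type;
} ex_sadb_ext_t;

/* First check: the claimed length must match the bytes received. */
static int
ex_len_matches(uint16_t msg_len_words, size_t nbytes)
{
	return (EX_64TO8(msg_len_words) == nbytes);
}

/*
 * Walk the extensions that follow the base header and record each one
 * in extv[], indexed by its type. Returns 0 on success, EINVAL otherwise.
 */
static int
ex_get_ext(const uint64_t *words, size_t nwords,
    const ex_sadb_ext_t *extv[EX_EXT_MAX + 1])
{
	size_t off = 0;		/* offset in 64-bit words */

	for (size_t i = 0; i <= EX_EXT_MAX; i++)
		extv[i] = NULL;

	while (off < nwords) {
		ex_sadb_ext_t hdr;

		/* Copy the header out rather than alias the buffer. */
		memcpy(&hdr, &words[off], sizeof (hdr));

		if (hdr.sadb_ext_len == 0 || hdr.sadb_ext_len > nwords - off)
			return (EINVAL);	/* zero-length or truncated */
		if (hdr.sadb_ext_type == 0 || hdr.sadb_ext_type > EX_EXT_MAX)
			return (EINVAL);	/* reserved or out of range */
		if (extv[hdr.sadb_ext_type] != NULL)
			return (EINVAL);	/* duplicate extension */

		extv[hdr.sadb_ext_type] = (const ex_sadb_ext_t *)&words[off];
		off += hdr.sadb_ext_len;
	}
	return (0);
}
```

Because the vector is indexed by type, the order in which extensions appear in the message does not matter, which matches the lack of ordering enforcement mentioned above.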

Most messages are shuttled off to keysock_passdown()
for sending off via STREAMS to either AH or ESP. Unusual inbound messages
for keysock are the SADB_REGISTER, SADB_FLUSH, SADB_DUMP,
SADB_ACQUIRE, and SADB_X_INVERSE_ACQUIRE ones.

SADB_REGISTER sets socket state, as well as informs consumers about the
register. If an SADB_REGISTER is sent for a specific SA type (e.g. ESP, AH),
then the message is treated like a common-case message, except that when its
reply arrives, keysock_t state is altered to indicate a registered
socket. If the sadb_msg_satype is set to 0, then the message is an
EXTENDED register, and an extended-register extension
(sadb_x_ereg_t) is required. keysock converts the
0-terminated list of one-byte values into a bit vector internally. Then it
sends the message to consumers like a normal REGISTER.
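
The list-to-bit-vector conversion can be sketched like this. ex_ereg_to_bits() is a hypothetical helper, and both the 64-bit width of the vector and the choice to ignore values of 64 or more are assumptions; the text only says that a 0-terminated list of one-byte values becomes a bit vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a 0-terminated list of one-byte SA types into a bit vector.
 * "max" bounds the scan in case the terminator is missing.
 */
static uint64_t
ex_ereg_to_bits(const uint8_t *satypes, size_t max)
{
	uint64_t bits = 0;

	assert(satypes != NULL);
	for (size_t i = 0; i < max && satypes[i] != 0; i++) {
		/* Values that do not fit in the vector are skipped. */
		if (satypes[i] < 64)
			bits |= (uint64_t)1 << satypes[i];
	}
	return (bits);
}
```

With the RFC 2367 values for AH (2) and ESP (3), the list { 2, 3, 0 } sets bits 2 and 3, so a later check of whether a socket registered for a given SA type becomes a single mask test.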

An inbound SADB_ACQUIRE can be used to signal other KM applications, if
PF_KEY is used to keep keys in the kernel for user-space consumers. The
more common case, however, is a negative ACQUIRE, which means a KM
negotiation failed and the internal ACQUIRE record (more on this in a bit)
needs to be cancelled.

SADB_FLUSH and SADB_DUMP messages need to lock down the keysock
module until their respective operations are finished. FLUSH doesn't take as
long, but DUMP needs to keep track of all consumer-originated replies until
the consumer indicates it is done.

SADB_X_PROMISC merely changes some keysock state. It never goes to a
consumer.

The SADB_X_INVERSE_ACQUIRE handling is a glimpse of things to come for
keysock. It does not use the keysock_passdown()
method of calling a consumer. It instead calls directly into IPsec (and if
we had other in-kernel consumers, it would directly call to those) and
returns a message to the user immediately.
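
The per-message handling described above can be summarized in a small classifier. This is only a reading aid: ex_classify() and the disposition names are hypothetical and do not appear in keysock.c, and the value used for SADB_X_INVERSE_ACQUIRE is an assumption. The other message-type values are the ones listed in RFC 2367.

```c
#include <assert.h>
#include <stdint.h>

/* Message types from RFC 2367. */
#define	EX_SADB_ADD		3
#define	EX_SADB_ACQUIRE		6
#define	EX_SADB_REGISTER	7
#define	EX_SADB_FLUSH		9
#define	EX_SADB_DUMP		10
#define	EX_SADB_X_PROMISC	11
/* Assumed value for the Solaris extension; check net/pfkeyv2.h. */
#define	EX_SADB_X_INVERSE_ACQUIRE	12

static_assert(EX_SADB_X_INVERSE_ACQUIRE != EX_SADB_X_PROMISC,
    "assumed value must not collide with an RFC 2367 type");

typedef enum ex_disposition {
	EX_PASSDOWN,	/* common case: send to AH or ESP over STREAMS */
	EX_REGISTER,	/* also updates socket state when the reply arrives */
	EX_ACQUIRE,	/* signal other KM apps, or cancel on a negative ACQUIRE */
	EX_LOCKDOWN,	/* lock keysock until all consumers have finished */
	EX_LOCAL_STATE,	/* changes keysock state only, never reaches a consumer */
	EX_DIRECT_CALL	/* calls straight into IPsec and replies immediately */
} ex_disposition_t;

static ex_disposition_t
ex_classify(uint8_t type)
{
	switch (type) {
	case EX_SADB_REGISTER:
		return (EX_REGISTER);
	case EX_SADB_ACQUIRE:
		return (EX_ACQUIRE);
	case EX_SADB_FLUSH:
	case EX_SADB_DUMP:
		return (EX_LOCKDOWN);
	case EX_SADB_X_PROMISC:
		return (EX_LOCAL_STATE);
	case EX_SADB_X_INVERSE_ACQUIRE:
		return (EX_DIRECT_CALL);
	default:
		return (EX_PASSDOWN);
	}
}
```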

The keysock_rput()
function handles all messages from consumers. The KEYSOCK_OUT portion of the
switch checks for FLUSH and DUMP messages, and releases the clamps on keysock
if the final message for a FLUSH or a DUMP has been received. Otherwise,
keysock_passup()
does the work.
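
One way to picture the FLUSH/DUMP clamp is a counter of consumers that still owe a final reply. The sketch below is purely illustrative: the ex_keysock_gate_t type, its fields, both functions, and the EBUSY return for a second operation are all assumptions, not the bookkeeping keysock.c actually uses.

```c
#include <assert.h>
#include <errno.h>

typedef struct ex_keysock_gate {
	int      kg_locked;	/* nonzero while a FLUSH or DUMP is in flight */
	unsigned kg_pending;	/* consumers that still owe a final reply */
} ex_keysock_gate_t;

/* Called when a FLUSH or DUMP is passed down to all consumers. */
static int
ex_gate_begin(ex_keysock_gate_t *kg, unsigned nconsumers)
{
	if (kg->kg_locked)
		return (EBUSY);		/* assumed policy for a second request */
	if (nconsumers == 0)
		return (0);		/* nothing to wait for */
	kg->kg_locked = 1;
	kg->kg_pending = nconsumers;
	return (0);
}

/*
 * Called when a consumer sends its final message for the operation.
 * Returns 1 if this reply released the clamp, 0 otherwise.
 */
static int
ex_gate_final_reply(ex_keysock_gate_t *kg)
{
	assert(kg->kg_locked && kg->kg_pending > 0);
	if (--kg->kg_pending > 0)
		return (0);
	kg->kg_locked = 0;
	return (1);
}
```

For a DUMP, intermediate replies would pass through untouched and only the message marking the end of a consumer's dump would reach ex_gate_final_reply().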

The more interesting parts of PF_KEY are handled inside the SADB code (sadb.c)
and in the consumers. Those will be the subject of one or more other
entries, because of all of the interaction with the IPsec SADB.