Federico: Mike, how did you get the idea to include p0f features in
PF?

MF: One of my coworkers, Greg Taleck, added p0f features to
NFR's IDS to resolve traffic ambiguities. And then a damn SMTP worm hit. That
annoyed me and I wanted to filter all Windows boxes from connecting to my mail
server for the duration of the worm. So I talked to Michal Zalewski who wrote
p0f v1 and he was cool with integrating p0f into PF but the guy who had been
maintaining p0f never responded to relicense the fingerprints. Michal then
started p0f v2 which was not encumbered with the maintainer's copyright; I got
it working in PF; and then blocked all Windows boxes from connecting to my mail
server for the duration of the worm. Hurray! Never underestimate the annoyed
developer.

Federico: What are you working on for 3.5?

MF: Working on brewing my own beer. Made a pretty good
Boston-style ale, though it needed a little more hopping. A golden ale is next.
Theo has been calling me a nasty hobbittsesss lately too, so I've been working
on growing hair on my feet. Hopefully, I'll get back to TCP scrubbing and
normalization in time for 3.5.

HB: well, the focus this time is obviously
bgpd.

bgp, the Border Gateway Protocol, is what ISPs speak to each other to
announce reachability of their networks through certain paths. A bgp daemon
announces its own networks to its neighbors, and its neighbors in turn announce
their own networks plus all the networks, including the paths to reach them,
that they learned from their respective neighbors. In the usual so-called
full-mesh setup, that results in bgpd having a table of about 130 thousand
networks (prefixes), with multiple paths to reach each. Of those it picks the
"best" path (the
algorithm for that decision is actually rather easy), and enters the resulting
route into the kernel routing table.

Now, that is a bit more complicated than described here, and it is quite
obvious that keeping these huge tables and working on them with reasonable
performance is not that easy.

There are a few more or less free bgp implementations, but they all have
major design flaws, and the resulting runtime problems. As I had been bitten by
those, I was considering doing a bgpd for some time, but was a bit
scared by the project's size. When I was in Calgary in September I finally
talked to Theo about it, who tricked me into starting coding. Back in Germany I
finally did in mid-November, and much to my surprise I had a fully working bgp
session engine, fully implementing the Finite State Machine described in RFC
1771 at its core, within 9 days, and had sessions established and held up to
other bgp speakers. We found a few bugs later, but it is basically still what I
had
then. I talked to a few people and showed code, and fortunately, Claudio Jeker
joined. He did an incredible amount of work implementing what we call the RDE,
the Route Decision Engine, which holds the tables of prefixes and paths. At the
same
time I started working on the code to interface the kernel routing table, which
includes holding an internal view of it.

Well, nowadays we are feature complete for the basics.

We have no showstopper bugs we are aware of, heck, I am not aware of any bug
right now (tho', let me assure you, there are a few). We learn routes, sync the
one picked as best into the kernel routing table, can send them to our
neighbors, and can announce our own networks. We have a control utility,
bgpctl, too, which can be used to gather and show run-time data,
take single sessions up/down, reload configuration, etc. And we have something
that I have not seen anywhere before: we can couple and decouple the internal
view from the kernel routing table.

So you can start up decoupled, adjust your settings while evaluating the
internal view of the routing table, and then, after you are satisfied, you can
issue a bgpctl fib couple and the routes enter the kernel. In the
same vein a bgpctl fib decouple removes them again, leaving the
kernel routing table as it was before coupling. Oh, and as opposed to the other
implementations, bgpd notices when you statically enter routes to
the kernel routing tables and doesn't mess with them. It even tracks interfaces
showing up and being removed at runtime like it is possible with PCMCIA and
USB-based ones, and cloneable devices like tun and
vlan. For most Ethernet devices it can even notice when you pull
the cable (or the link gets lost for other reasons) and react accordingly.
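The couple/decouple workflow described above might look like this in practice
(a sketch; it assumes a running bgpd and uses only the bgpctl commands
mentioned here):

```
# bgpctl fib decouple    (evaluate: bgpd leaves the kernel routing table alone)
  ... inspect the internal view, adjust the configuration, reload ...
# bgpctl fib couple      (commit: the selected best routes enter the kernel)
```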

bgpd is 11500 lines of code as of tonight, of which about 500
are manpages. And it is very fast...

CB: I'm working on many little enhancements in the way PF
deals with interfaces.

That includes better support for dynamic/cloneable interfaces, the ability to
lock states to a single or group of interfaces, better handling of interface
aliases, and other related things. I believe there were 12 little points in
the commit message. :)

RM: I've been mainly working on the components necessary to
deploy OpenBSD in high availability and load balancing configurations,
including the Common Address Redundancy Protocol (CARP), which handles IP
address failover, and pfsync enhancements which synchronise state between two
or more firewalls. I also added source IP tracking, which keeps track of states
by source IP address, but this work was actually done before 3.4, at the
hackathon in Calgary.

CEA: As you may have noticed, I have moved away from pf to
privilege separation and bpf. I already worked on privsep for named in 3.5, and
now at least the DHCP tools are waiting for privilege separation. Henning
is already working on dhclient. If I can find some time, I want to
design some kind of framework for developing userland proxies.

Atomic commits of ruleset changes (reduce the chance of ending
up in an inconsistent state).

CB: This change ensures that when you type pfctl -f
pf.conf, then the entire content of pf.conf will be loaded
into PF kernel memory, or nothing at all if there are errors. Before that
change, it was possible in rare circumstances that only half of the
pf.conf ruleset would be loaded inside the kernel.

So for example, you could have the new RDR rules loaded, but not FILTER
rules.

Or, if your main pf.conf contains load anchor
entries, and some of the anchor files had a syntax error, then only part of the
anchors would be loaded.

This change does not bring any new functionality to PF, but it makes
pfctl -f more reliable in case of errors (syntax errors,
pfctl gets kill(1)ed, not enough memory is available,
...).
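As a sketch of how this behaves from the command line (/etc/pf.conf is just an
example path; the -n flag asks pfctl to parse without loading):

```
# pfctl -nf /etc/pf.conf    parse the ruleset only, as a dry run
# pfctl -f /etc/pf.conf     load it: everything, or nothing if any error occurs
```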

A 30% reduction in the size of state table entries.

RM: Basically I found a little trick of storing the tree
indexes inside the state structure, rather than having separate tree nodes that
point to the state structure. It's actually a pretty obvious thing in
retrospect, but nobody had really considered it. For the end user, all this
means is that they can have more states in the same amount of memory.

Source-tracking (limit the number of clients and states per
client).

RM: Source IP tracking allows you to create an entry for
the source of connections and link states to it. This is useful for a number of
reasons: first, it allows you to use a round-robin address allocation mechanism
for translation or redirection, while ensuring that the connections for a
particular client are always mapped the same way. This functionality is
important for some applications or protocols which rely on source address for
identification, or in the case of server balancing, where the application keeps
state across multiple connections, so the client must always connect to the
same server.

Second, it allows you to set limits on how many distinct sources can connect
to a service, and how many simultaneous connections each source can have. This
can be used to limit connections from internal clients, or to mitigate certain
kinds of denial-of-service attacks.
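A hypothetical pf.conf fragment showing both kinds of limits (the macro names
are invented; the source-track, max-src-nodes, and max-src-states options
follow the feature described above):

```
pass in on $ext_if proto tcp to $web_server port www \
    keep state (source-track rule, max-src-nodes 100, max-src-states 10)
```

Here at most 100 distinct sources may hold states under this rule, and each
source may hold at most 10 simultaneous states.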

Sticky-address (the flexibility of round-robin with the benefits
of source-hash).

RM: When sticky-address is enabled, we create a
source-tracking entry for each source IP address, and states are associated
with it. In this entry, we store the translation address that was selected by
round robin, and subsequent connections from this source that hit the nat
or rdr rule will get this translation address rather than the next
round-robin address. The source-tracking entries last at least as long as there
are states associated with them, plus an additional configurable lifetime.

So if you're redirecting traffic to a pool of web servers, and the first time
a client connects, they get redirected to server 4, all connections afterward
from that client will hit server 4, so long as the source-tracking entry
exists.

This is very similar in behaviour to source-hash, except it removes the
restriction that the pool must be specified as a CIDR netblock; it can be a
list of addresses, including network blocks, or more powerfully, it can be a
table.
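A sketch of a sticky round-robin redirection (the addresses and macro names
are invented for illustration):

```
web_servers = "{ 10.0.0.10, 10.0.0.11, 10.0.0.12 }"
rdr on $ext_if proto tcp from any to any port 80 \
    -> $web_servers round-robin sticky-address
```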

Invert the socket match order when redirecting to localhost
(prevents the potential security problem of mis-identifying remote connections
as local).

DH: It is common practice to redirect incoming TCP
connections to local daemons using pf, for instance to force HTTP connections
through a proxy, or to redirect spam to a tarpit.

Often, the daemon was bound to 127.0.0.1 and the redirection used 127.0.0.1
as replacement destination. While using the loopback address is convenient in
such cases (it's always present), that can have security implications.

Many daemons assume that the loopback interface is isolated from the real
network, i.e., that connections to sockets bound to 127.0.0.1 are local, and
may grant some privileges based on this assumption.

pf redirecting foreign connections to the loopback address violates that
assumption: suddenly, foreign peers might be able to connect to daemons
listening on loopback sockets.

To deal with this potential risk, the network code has been changed so that
foreign connections to loopback addresses are first matched against listeners
on unbound sockets (listening on any address). Only if no such socket is found
is the connection matched against a specific loopback listener.

So, if you're running a daemon listening on both 127.0.0.1 and
ANY, and use pf to redirect external connections to 127.0.0.1,
these connections will now connect to the ANY socket, instead of
the 127.0.0.1 one, where the daemon might wrongly assume a local
connection.

This problem only occurs with daemons that follow this pattern (listen on
127.0.0.1 in addition to other addresses, and treat 127.0.0.1 connections as
privileged local connections); many daemons don't.

1) PF should do the right thing when unplugging/replugging or
cloning/destroying NICs.

2) Rules can be loaded in the kernel for not-yet-existing devices (USB,
PCMCIA, Cardbus). For example, it is valid to write "pass in on kue0" before
the kue USB adapter is plugged in.

3) It is possible to write rules that apply to a group of interfaces
(drivers), like "pass in on ppp all".

4) There is a new ":peer" modifier that completes the ":broadcast" and
":network" modifiers.

5) There is a new ":0" modifier that will filter out interface aliases. It can
also be applied to DNS names to restore the original PF behaviour.

pass in from www.openbsd.org:0 will only select the first IP
returned by the resolver, while pass in from www.openbsd.org will
select all IPs. Similarly, pass in from fxp0:0 or pass in
from (fxp0:0) will not take into account address aliases on
fxp0.

6) The dynamic interface syntax (foo) has been vastly improved, and now
supports multiple addresses, v4 and v6 addresses, and all userland modifiers,
like "pass in from (fxp0:network)".

Specifying pass [...] from (ifspec) is now equivalent in all
cases to pass [...] from ifspec, except that the
ifspec -> IP address resolution is done in the kernel, i.e.,
will adapt automatically to interface address changes (dhcp, hot plug removal,
whatever).

7) Scrub rules now support the !if syntax.

scrub in on !fxp0 now works.

8) States can be bound to the specific interface that created them or to
a group of interfaces, for example:

pass all keep state (if-bound)
pass all keep state (group-bound)
pass all keep state (floating)

9) The default value when only keep state is given
can be selected by using the "set state-policy" statement.

If you put set state-policy if-bound, then all rules declared
with keep state like pass out on fxp0 keep state will
be if-bound.
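Putting points 8 and 9 together, a hypothetical pf.conf fragment might read:

```
set state-policy if-bound

pass out on fxp0 keep state               # if-bound via the default policy
pass in on fxp1 keep state (floating)     # per-rule override of the policy
```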

Previously, you had a fixed number of vlan interfaces in your kernel config.
If you needed more, you needed a new kernel and a reboot. Now, you don't have
any vlan interface by default — but the kernel has a "template". You
create the interfaces as needed on the fly. So, when you configure your first
vlan, you could do something along the lines of

# ifconfig vlan0 create
# ifconfig vlan0 vlan 100 vlandev fxp0 192.168.0.1 up

Of course, you can collapse those into one, but it is even nicer: ifconfig
creates the interface for you when you configure it, without an explicit
create:

# ifconfig vlan0 vlan 100 vlandev fxp0 192.168.0.1 up

is sufficient. When you don't need the interface any more, you just destroy
it, and it is gone:

# ifconfig vlan0 destroy

Federico: The 3.5 presentation page says "authpf(8) now tags
traffic in pflog(4) so that users may be associated with traffic through a NAT
setup." How does it work?

DH: This is best explained with the example in the
authpf(8) man page. You can use the following in authpf.rules (the ruleset
which is loaded for each user who authenticates)

nat on $ext_if from $user_ip to any tag $user_ip -> $ext_addr
pass in quick on $int_if from $user_ip to any
pass out log quick on $ext_if tagged $user_ip keep state

Nothing special about the usage of tag/tagged here, except that we use a
macro that gets expanded to the user's IP address, for instance NATed
connections from 10.1.2.3 get tag 10.1.2.3.

The point of adding a unique per-user tag on the internal interface is so
that we can pass connections on the external interface, after translation, with
a unique rule as well. Without tags, connections from different source
addresses would all pass by the same rule on the external interface.

The reason for this construct is that tcpdump on pflog0 shows the anchor and
ruleset name of the rule that created the matched state, and the ruleset name
conveniently contains the user name and pid of the authpf process
authenticating the user.

1) CARP (the Common Address Redundancy Protocol) carp(4)
allows multiple machines to share responsibility for a given IP address or
addresses. If the owner of the address fails, another member of the group will
take over for it.

Ryan, could you explain the new Common Address Redundancy Protocol
(CARP)?

RM: The Common Address Redundancy Protocol allows multiple
hosts to transfer an IP address amongst each other, ensuring that this address
is always available. CARP is much like VRRP, although it improves on it in many
ways: it supports IPv6 addresses, provides strong authentication via a SHA1
HMAC, and supports a limited degree of load balancing via an "arp balancing"
feature.
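A minimal configuration sketch (the vhid, password, and address are invented;
the exact option spellings should be checked against carp(4)):

```
# ifconfig carp0 create
# ifconfig carp0 vhid 1 pass mypassword 192.168.1.100 netmask 255.255.255.0
```

Run the equivalent commands on each host in the group; the master answers for
192.168.1.100, and a backup takes over if it fails.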

CARP is the direct result of our frustration with the current IETF standards
process: Cisco maintains that they hold a patent which covers VRRP and none of
the right people at the IETF are willing to stand up and tell them their patent
is irrelevant. It's a specific case of the general problem of vendors
involving themselves in the standards process, then producing patents after the
standard is finalised. The same sort of thing is happening with the various
IPSec standards. We'd like very much for the IETF to put an end to this, and
use a non-RAND intellectual property policy, much as the w3c has done. An open
standard is not really an open standard if you have to enter into licensing
agreements to use it.

2) Additions to the pfsync(4) interface allow it to
synchronise state table entries between two or more firewalls which are operating
in parallel, allowing stateful connections to cross any of the firewalls
regardless of where the state was initially created.

Federico: Ryan, how would state table synchronization work?

RM: The pfsync protocol works by sending out state
creations, updates, and deletions via multicast on a specified interface. Other
firewalls listen for such messages, and import the changes into their state
table. There is some additional complexity, of course: we implement some
methods for minimizing pfsync traffic, and a mechanism for recovering from
missed messages.

The net benefit of all this is that you can have two firewalls running in
parallel, with each able to act as a backup for the other. In many situations
this will be combined with CARP.
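As a sketch, enabling synchronization might be as simple as bringing up the
pfsync interface on both firewalls (fxp1 is an invented example; the option
naming should be checked against pfsync(4), and the sync interface should be a
trusted, dedicated link):

```
# ifconfig pfsync0 syncif fxp1 up
```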