Harald Welte's netfilter blog

For some reason, the number of inquiries from companies who want to put ads
on netfilter.org has increased significantly. Since the content of that
site has not really changed much in (at least) the last four years, this
sudden interest is somewhat surprising to me.

However, we are absolutely not interested in advertisements. I personally
hate any form of advertisement, whether in print media, radio, TV, WWW or on
billboards. In fact, advertisements are the reason I haven't watched or listened
to any privately owned TV or radio stations for at least eight years.

So to all the advertising companies out there: Only over my dead body will
there be any kind of banner ads on any of the websites of the projects in which
I have anything to say.

Since I haven't been doing any netfilter/iptables related work recently, I've
announced that the three-day training is going to be the last one, at least for the time being.

Though stressful as usual (have you ever talked/presented for 8 hours straight on
three consecutive days?), it was quite a joyful experience. Apart from the
netfilter/iptables workshop earlier this year, it was my only contact with my former
much-beloved project in 2007.

However, the training made me realize how outdated all the existing
documentation (and even my own training material) is. Basically everything was
written in the early 2.4.x days - and much has changed since then.

There are all the nf_conntrack / nf_nat related changes, as well as the x_tables transition, which can cause many subtle errors because old scripts expect different kernel module names, etc.

None of the HOWTOs or similar documents mention the conntrack userspace program yet, and there's no documentation (and no release) for ulogd2, etc.

So I'll really try to sit down and find some time to improve some of those areas. It remains to be seen whether I can actually make it. But I feel there's a real gap to be filled...

I've returned to Germany in order to attend the 5th netfilter development
workshop in Karlsruhe. It's sponsored by Astaro, whose continuing support
of netfilter/iptables is really outstanding. Even after I took my "leave" to
work on OpenMoko, they continue their
funding by paying for Patrick's maintenance of the netfilter/iptables codebase,
and things like hosting the netfilter workshop.

It's really great to meet the old colleagues with whom I've worked for
a number of years on netfilter/iptables. I really miss those days, basically
spending most of my day working together and communicating with cool people
hacking on similar problems. Quite a bit different from what I'm doing right now.

So while I'm here, I'm actually trying to spend most of my time on
netfilter/iptables, which is really refreshing.

I've been gone for long enough. Even though neither my RFID projects nor
OpenMoko are anywhere close to being finished, I'm determined to get back into
netfilter work again.

Started to catch up with mailing lists. There has been amazing progress, most notably
the implementation of NAT for nf_conntrack, which should finally let us get rid of the old
ip_conntrack code in one of the upcoming kernel releases. No more maintaining
two versions in parallel. And the ability to do IPv4 NAT and IPv6 connection tracking
on the same machine. Isn't that all that we wanted? Not quite...

So for now, I'm participating in the discussions again, and I'm now also working on
getting IPv6 interpreter plug-ins into ulogd2. The nfnetlink_log mechanism can happily
send IPv6 packets to user space, it's just that ulogd2 doesn't yet know what to
do with them. That needs to be changed.
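To make the missing piece a bit more concrete, here is a minimal C sketch of what an IPv6 "interpreter" step has to do: split the fixed 40-byte IPv6 header of a raw packet into individual fields. The struct and function names are hypothetical and are not ulogd2's plugin API. Extension headers are not walked, so `next_header` may name an extension header rather than TCP or UDP.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Output "keys" a hypothetical IPv6 interpreter might produce;
 * real ulogd2 key names and types may differ. */
struct demo_ip6_keys {
    uint8_t  traffic_class;
    uint32_t flow_label;      /* 20 bits */
    uint16_t payload_len;
    uint8_t  next_header;     /* may be an extension header */
    uint8_t  hop_limit;
    uint8_t  saddr[16], daddr[16];
};

/* Parse the fixed IPv6 header of a raw packet (e.g. as delivered by
 * nfnetlink_log). Returns 0 on success, -1 if it is not IPv6. */
int demo_interp_ip6(const uint8_t *pkt, size_t len, struct demo_ip6_keys *k)
{
    if (len < 40 || (pkt[0] >> 4) != 6)
        return -1;
    /* traffic class straddles bytes 0 and 1 */
    k->traffic_class = (uint8_t)(((pkt[0] & 0x0f) << 4) | (pkt[1] >> 4));
    /* flow label: low nibble of byte 1 plus bytes 2 and 3 */
    k->flow_label = ((uint32_t)(pkt[1] & 0x0f) << 16) |
                    ((uint32_t)pkt[2] << 8) | pkt[3];
    k->payload_len = (uint16_t)((pkt[4] << 8) | pkt[5]);
    k->next_header = pkt[6];
    k->hop_limit = pkt[7];
    memcpy(k->saddr, pkt + 8, 16);
    memcpy(k->daddr, pkt + 24, 16);
    return 0;
}
```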

Only two months after the involuntary absence of bugzilla.netfilter.org (due
to database corruption while doing a gentoo mysql update), I have finally
found some time (and a way) to fix the problem. As of today,
bugzilla.netfilter.org is up and running again.

This was possible because the bugzilla tables were still stored
in MyISAM format. The mysql tables of patchwork.netfilter.org were not
that lucky: they were stored in exactly the InnoDB file that got corrupted.
However, the loss of archived (and lots of unmaintained) information on patches
that had been submitted on netfilter-devel is not really all that important
anyway.

However, let this be a lesson: Do daily dumps of all mysql tables in a cronjob before doing backups ;)

It's been terrible to be away from netfilter development for about two months
now. This really has to change, I have to cut down on other stuff if I don't
want to lose track completely.

Anyway, I finally did what I had wanted to do for many weeks: push new
releases of libnfnetlink, libnetfilter_log, libnetfilter_queue,
libnetfilter_conntrack and conntrack. The files are available from their usual location.
Haven't been in the mood to write changelogs yet, so if you're really
interested in them, you'll have to wait for a bit more.

The main architectural change is that the internal API between libnfnetlink
and libnetfilter_* has changed, e.g. caller-allocated structures are now
callee-allocated. Apart from that, a very important bugfix was made in libnfnetlink,
one that actually affects future-compatibility of the kernel/userspace interface.
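The difference between the two allocation styles can be sketched in C. The names below (`demo_handle`, `demo_open()` and so on) are hypothetical and not the libnfnetlink API; the point is only that with callee allocation the caller never needs `sizeof` of the library's structure, so its layout can change without breaking callers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle type; not the real libnfnetlink structure. */
struct demo_handle {
    int fd;                       /* netlink socket; -1 in this sketch */
    unsigned int subscriptions;
};

/* Caller-allocated style: the caller provides the memory, so it must know
 * sizeof(struct demo_handle) and the layout becomes part of the ABI. */
int demo_init_caller_alloc(struct demo_handle *h)
{
    assert(h != NULL);
    memset(h, 0, sizeof(*h));
    h->fd = -1;
    return 0;
}

/* Callee-allocated style: the library allocates and returns the object.
 * Callers only hold a pointer, so the struct can be opaque to them. */
struct demo_handle *demo_open(void)
{
    struct demo_handle *h = calloc(1, sizeof(*h));
    if (h == NULL)
        return NULL;
    h->fd = -1;
    return h;
}

/* Matching destructor: memory the library allocated is freed by it. */
void demo_close(struct demo_handle *h)
{
    free(h);                      /* free(NULL) is a no-op */
}
```

In a real header only `struct demo_handle;` would be declared for the callee-allocated variant, which is what makes it robust against layout changes.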

Otherwise, it's mainly a maintenance release.

libnetfilter_queue doesn't yet contain the bits required for the 'upcoming'
libnetfilter_cthelper (userspace helpers), because I felt pushing that code
without having the rest of the infrastructure plus some test cases running
isn't really worth it.

So please include in your prayers that there won't be too many GPL violations
during the next couple of weeks, and that I finally get to the bottom of that stupid PPTP problem
that has been bugging me for many weeks. If that happens, I think I'll be back to
netfilter stuff early next week after returning from the Barcelona GPLv3 event.

Not sure whether I mentioned it already: I'm actually skipping OLS (and Kernel
Summit) this year in order to gain some time. Meeting folks and attending talks
is a lot of fun, but it also (including the travel overhead, jetlag, drinking, etc.)
eats a lot of time. So I'll actually take my long-announced pkttables
holidays while the rest of the Linux kernel developers are in Ottawa. For those
not familiar with the term: The idea is to 'go on holidays' (i.e. abandon everything
else, like reading email) and stay focused on netfilter work for
at least one week, in order to finally see the ideas so far known as pkttables
materialize in one way or another.

Meanwhile, I have to extend my deepest thanks to Patrick McHardy for all the work he's
been putting into netfilter maintenance over the last year or so.

I've spent the whole Monday in the hosting center where netfilter.org,
gnumonks.org and most of my other projects are hosted. The main reasons for
this visit were:

- doing kernel updates on two boxes that are known to be difficult with new kernels
- moving all five machines to a new rack, since the old one is too crowded (no space for new machines, too hot)
- adding yet another new box (parvati.gnumonks.org), which brings the number of machines to six

As usual, Murphy's law applied, so just about everything that could go wrong went wrong.
And, true to Murphy's law, the most important machine (vishnu.netfilter.org)
had the longest downtime, close to 9 hours.

This was mainly due to
the last Gentoo update overriding my custom-modified yaboot boot script (for
using the serial port, this is a headless XServe cluster node) with the default
one, which wants to use the non-existent framebuffer.

On top of that, KDUMP-capable kernels can't be booted from
OpenFirmware (why isn't this indicated in the menuconfig help???), and thus the new
default boot kernel couldn't be booted from yaboot.

That day I tried just about everything, from attaching a PowerBook with a bootable CD
in FireWire target mode to booting yaboot via tftp (which fails to load
yaboot.conf via tftp *sigh*).

I don't understand how securityfocus, heise.de and others can claim
that the recently discovered and fixed 'do_replace()' bug is remotely exploitable.

In fact, the bug (which was found and fixed by Solar Designer while working for
the OpenVZ project) can only happen in a codepath that can be executed by the
local root user. Neither non-root users nor remote parties can hit
that bug, let alone exploit anything.

Well, before I try to build some conspiracy theories about somebody manipulating the bug id number sequence generation of our bugzilla installation, I'd rather concentrate on the real work.

Dave Remien is an excellent bug reporter; as a maintainer you really
can't expect more than his detailed documentation
(yes, I know, the certificate has expired, too lazy and busy to update it right
now, stay tuned). From an outside perspective, it appears that packets get
'stuck' in nfnetlink_queue. In reality, it seems the kernel is doing
everything fine, and the library just eats some packets from time to time, meaning
that they remain inside the kernel queue and increase its length (and thus
leak memory) one at a time.

The real cause has yet to be discovered; I'm confident that there will be some news tomorrow.

Since we now have the x_tables kernel side code in the upcoming 2.6.16 series,
I'm working on getting iptables-1.4.x done to actually take advantage of
the new kernel's abilities.

The main reason why people are interested in this is to get matches like
'state' and 'conntrack' working for IPv6. Even though 2.6.15 has nf_conntrack
and thus state tracking for IPv6, you cannot really use it from ip6tables yet.

The same goes for all native x_tables matches and targets. However, I think
we'll also release a new version of iptables-1.3.x just with 'state' and
'conntrack' support, since it gives a more stable foundation for production
users than a completely new 1.4.x branch with hundreds of kilobytes of patches.

Finally, both the kernel side (nfnetlink_helper) and the userspace side
(libnetfilter_cthelper) code for userspace conntrack helper support is
basically finished and compiles. I haven't dared to test it yet, and I'd rather
head off to bed now. Testing will be done tomorrow.

So how is this supposed to work? Well, basically a new nfnetlink subsystem
exists, which can (on behalf of a userspace process) create dummy
"nf_conntrack_helper" structures inside the kernel. Such a dummy structure has
the usual properties (tuple, mask, timeout, etc.) but a dummy help function (helpfn())
which only returns the NF_QUEUE verdict to send the packet to userspace. Userspace can then look
at the packet, possibly modify it and re-inject it into the kernel. Since
helpers are now processed at a different netfilter hookfn() than the rest of
the conntrack code, this actually works.
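A rough userspace model of such a dummy helper, assuming a drastically simplified structure: `demo_helper` is a hypothetical stand-in for `nf_conntrack_helper`, and its help function does nothing except return an NF_QUEUE verdict carrying the queue number userspace asked for. The verdict values follow the numbering in the kernel's `<linux/netfilter.h>`; putting the queue number into the upper 16 bits is modeled on the kernel's `NF_QUEUE_NR()` macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Verdict values as numbered in the kernel's <linux/netfilter.h>. */
enum { NF_DROP = 0, NF_ACCEPT = 1, NF_STOLEN = 2, NF_QUEUE = 3, NF_REPEAT = 4 };

/* Queue number in the upper 16 bits, modeled on NF_QUEUE_NR(). */
#define DEMO_QUEUE_NR(q) ((((uint32_t)(q)) << 16) | NF_QUEUE)

struct demo_packet;                  /* opaque in this sketch */

/* Hypothetical, heavily simplified stand-in for nf_conntrack_helper. */
struct demo_helper {
    const char *name;
    uint16_t    port;                /* stands in for tuple/mask */
    unsigned    timeout;             /* expectation timeout in seconds */
    uint16_t    queue_num;           /* nfnetlink_queue queue chosen by userspace */
    uint32_t  (*help)(const struct demo_helper *h, struct demo_packet *pkt);
};

/* The dummy help function: no protocol parsing in the kernel at all,
 * it only asks for the packet to be queued to the userspace helper. */
static uint32_t demo_queue_to_userspace(const struct demo_helper *h,
                                        struct demo_packet *pkt)
{
    (void)pkt;
    return DEMO_QUEUE_NR(h->queue_num);
}

/* What a (hypothetical) nfnetlink request would set up in the kernel. */
struct demo_helper demo_helper_create(const char *name, uint16_t port,
                                      unsigned timeout, uint16_t queue_num)
{
    struct demo_helper h = { name, port, timeout, queue_num,
                             demo_queue_to_userspace };
    return h;
}
```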

Now, on receiving such a packet in userspace, the process is likely
going to want to create a new expectation. Expectations can already be
created by means of libnetfilter_conntrack/nf_conntrack_netlink. However, in
order to create the expectation, a number of things are needed. Mainly the
tuple(s) of the master conntrack, but also other ancillary data such as ctinfo
are sometimes desired. As long as we don't do NAT, the process could derive
the tuple from the packet's IP[v6] header, and query nf_conntrack_netlink for
the remaining details. However, this is inefficient since we'd add another
kernel/userspace round-trip and the associated latency. So instead, I chose to
extend nfnetlink_queue a bit, and allow it to have a new queue_mode
(NFQ_MODE_PACKET_CT) in which there is a new nested attribute (NFQA_CT) which in
turn contains the tuple, id and ctinfo.
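The nested-attribute layout can be illustrated with plain netlink TLV encoding: a 4-byte header (length, type), the payload, padding up to a 4-byte boundary, and the NLA_F_NESTED flag (0x8000) on the container. The attribute numbers and the two child attributes (an ID and ctinfo; the tuple is omitted for brevity) are hypothetical placeholders, not the values nfnetlink_queue uses, and payloads are kept in host byte order for simplicity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Netlink attribute header layout (same shape as struct nlattr). */
struct demo_nlattr {
    uint16_t nla_len;   /* header + payload, without trailing padding */
    uint16_t nla_type;
};

#define DEMO_NLA_ALIGN(len) (((len) + 3u) & ~3u)
#define DEMO_NLA_HDRLEN     ((unsigned)DEMO_NLA_ALIGN(sizeof(struct demo_nlattr)))
#define DEMO_NLA_F_NESTED   0x8000u

/* Hypothetical attribute numbers, not taken from any real header. */
enum { DEMO_NFQA_CT = 20, DEMO_CTA_ID = 1, DEMO_CTA_CTINFO = 2 };

/* Append one attribute at offset off; returns the offset after it. */
static size_t demo_put(uint8_t *buf, size_t off, uint16_t type,
                       const void *data, uint16_t len)
{
    struct demo_nlattr hdr = { (uint16_t)(DEMO_NLA_HDRLEN + len), type };
    memcpy(buf + off, &hdr, sizeof(hdr));
    if (len)
        memcpy(buf + off + DEMO_NLA_HDRLEN, data, len);
    size_t padded = DEMO_NLA_ALIGN(hdr.nla_len);
    memset(buf + off + hdr.nla_len, 0, padded - hdr.nla_len);  /* zero padding */
    return off + padded;
}

/* Build NFQA_CT { CTA_ID, CTA_CTINFO }; returns total bytes written. */
size_t demo_build_nfqa_ct(uint8_t *buf, uint32_t ct_id, uint32_t ctinfo)
{
    size_t off = demo_put(buf, 0, DEMO_NFQA_CT | DEMO_NLA_F_NESTED, NULL, 0);
    off = demo_put(buf, off, DEMO_CTA_ID, &ct_id, sizeof(ct_id));
    off = demo_put(buf, off, DEMO_CTA_CTINFO, &ctinfo, sizeof(ctinfo));
    uint16_t nest_len = (uint16_t)off;   /* container length covers its children */
    memcpy(buf, &nest_len, sizeof(nest_len));
    return off;
}
```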

Userspace now has all the information it needs to create a new expectation. But wait, what
do we do about expectfn()? We use the same magic as with helpfn(): Userspace
tells the kernel to which nfnetlink_queue queue_id packets hitting the
expectfn() should be sent. The 'minor' difficulty here is that expectfn() is
called from the middle of the conntrack code (init_conntrack() actually), and
when we get back from the queue (set_verdict or re-inject), then the netfilter
hook code would continue at the next hookfn, skipping most of the conntrack
code. But we can also return NF_REPEAT in order to call conntrack again.
Since our expectation is already confirmed, expectfn() will not be called and
it _SHOULD_ somehow just magically work, maybe with some tiny ugly hack here or
there.
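The NF_REPEAT trick can be modeled in a few lines. This is a hypothetical userspace simulation, not kernel code: the conntrack hook queues the packet on its first pass (standing in for the dummy expectfn()), and once userspace accepts it the traversal answers NF_REPEAT, so the conntrack hook runs once more instead of being skipped. As in the text, the assumption is that the entry counts as confirmed on the second pass, so expectfn() is not invoked again.

```c
#include <assert.h>
#include <stdbool.h>

enum { NF_DROP = 0, NF_ACCEPT = 1, NF_QUEUE = 3, NF_REPEAT = 4 };

/* Hypothetical, simplified per-packet state. */
struct demo_pkt {
    bool expected;          /* matches a userspace-created expectation */
    bool ct_confirmed;      /* conntrack entry already set up */
    int  expectfn_calls;    /* how often the dummy expectfn ran */
    int  conntrack_calls;   /* how often the conntrack hook ran */
};

/* Conntrack hook: on the first pass the dummy expectfn() queues the packet. */
static int demo_conntrack_hook(struct demo_pkt *p)
{
    p->conntrack_calls++;
    if (p->expected && !p->ct_confirmed) {
        p->expectfn_calls++;
        p->ct_confirmed = true;   /* assumption: confirmed once queued */
        return NF_QUEUE;
    }
    return NF_ACCEPT;
}

/* Stand-in for the userspace helper issuing a verdict. */
static int demo_userspace_verdict(struct demo_pkt *p)
{
    (void)p;
    return NF_ACCEPT;
}

/* After re-injection, answer NF_REPEAT so conntrack is called again
 * rather than continuing with the next hookfn. */
int demo_traverse(struct demo_pkt *p)
{
    for (;;) {
        int v = demo_conntrack_hook(p);
        if (v == NF_QUEUE) {
            if (demo_userspace_verdict(p) != NF_ACCEPT)
                return NF_DROP;
            v = NF_REPEAT;
        }
        if (v != NF_REPEAT)
            return v;
    }
}
```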

The NFQA_CT way is still far from being optimal, since we copy the same
conntrack tuple for every packet of the control connection to userspace, no
matter that this information never changes, and no matter that we actually only
need it in those few cases where we want to raise an expectation. So the
mid-term plan is to make userspace keep a small copy of selected conntrack
state entries. This can be done by sending NEW and DELETE events for all
conntracks that have a helper assigned. We could create a new multicast group
specifically for this purpose, in order to keep the overhead and memory usage
low. Userspace keeps a hash table indexed by ct->id. Packets sent via
nfnetlink_queue will therefore only need a single 32-bit ID attribute and not
the full tuple(s).
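The userspace side of that plan might look roughly like this: a small chained hash table keyed by the 32-bit conntrack ID, filled from NEW events, pruned on DELETE events, and consulted when a queued packet carries only the ID. The cached fields, table size and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_HASH_BITS 8
#define DEMO_BUCKETS   (1u << DEMO_HASH_BITS)   /* hypothetical size */

/* The few conntrack fields a userspace helper might cache (hypothetical). */
struct demo_ct {
    uint32_t id;
    uint32_t orig_src, orig_dst;        /* IPv4 addresses, host order */
    uint16_t orig_sport, orig_dport;
    struct demo_ct *next;               /* bucket chain */
};

struct demo_ct_cache {
    struct demo_ct *bucket[DEMO_BUCKETS];
};

/* Multiplicative (Fibonacci) hashing: keep the top DEMO_HASH_BITS bits. */
static unsigned demo_hash(uint32_t id)
{
    return (uint32_t)(id * 2654435761u) >> (32 - DEMO_HASH_BITS);
}

/* NEW event: remember the entry. Returns 0, or -1 if out of memory. */
int demo_ct_new(struct demo_ct_cache *c, const struct demo_ct *ev)
{
    struct demo_ct *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    *e = *ev;
    unsigned b = demo_hash(e->id);
    e->next = c->bucket[b];
    c->bucket[b] = e;
    return 0;
}

/* A queued packet carries only the ID: look the rest up locally. */
const struct demo_ct *demo_ct_lookup(const struct demo_ct_cache *c, uint32_t id)
{
    for (const struct demo_ct *e = c->bucket[demo_hash(id)]; e; e = e->next)
        if (e->id == id)
            return e;
    return NULL;
}

/* DELETE event: forget the entry. */
void demo_ct_destroy(struct demo_ct_cache *c, uint32_t id)
{
    struct demo_ct **pp = &c->bucket[demo_hash(id)];
    while (*pp) {
        if ((*pp)->id == id) {
            struct demo_ct *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
        pp = &(*pp)->next;
    }
}
```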

Apart from userspace helper code, I've been working on getting some x_tables /
nf_conntrack refcounting / dependency issues sorted out. Yet another issue
where having a couple dozen interdependent netfilter modules seems to
become a major PITA. Sometimes I wish I had the simplicity of a
truly monolithic kernel back.

Linus has merged x_tables, even though I introduced some "doesn't build without
IPv6 support" breakage that only somebody not into networking would ever detect
(hey, would you build a kernel without ipv6?) ;)

Anyway, will try to be more cautious about these issues, as nobody wants to end
up with a "your patches break the kernel tree" reputation.

If you use (and like) ulogd-1.x, you should definitely have a look at the 2.x
release. Apart from packet-based logging, ulogd-2.x now also supports
flow-based logging. This means that you can just run this daemon (and a recent
2.6.14/2.6.15 kernel) to log per-connection meta data into text files, syslog,
mysql, postgresql, or sqlite3 databases. If you enabled per-connection packet/byte counters in your kernel config, you even get flow-based accounting.

Today I've posted the (hopefully) final version of x_tables, the in-kernel
generalization of {arp,ip,ip6}_tables to netfilter-devel.
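One way to picture the generalization: a match registered once as family-independent is found by lookups from both the IPv4 and the IPv6 side, while genuinely protocol-specific matches stay bound to one family. The registry below is a hypothetical, static sketch; the "family-specific first, then family-independent" lookup is similar in spirit to what later kernels' `xt_find_match()` does, but the real structures are far richer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical family constants, loosely modeled on NFPROTO_*. */
enum demo_family { DEMO_UNSPEC = 0, DEMO_IPV4 = 2, DEMO_IPV6 = 10 };

struct demo_match {
    const char      *name;
    unsigned         revision;
    enum demo_family family;   /* DEMO_UNSPEC: usable by every table type */
};

/* A tiny static registry; a real one is populated at module load time. */
static const struct demo_match demo_registry[] = {
    { "state",     0, DEMO_UNSPEC },   /* one implementation for v4 and v6 */
    { "conntrack", 1, DEMO_UNSPEC },
    { "icmp",      0, DEMO_IPV4   },   /* genuinely protocol-specific */
    { "icmp6",     0, DEMO_IPV6   },
};

static const struct demo_match *demo_scan(enum demo_family af,
                                          const char *name, unsigned rev)
{
    for (size_t i = 0; i < sizeof(demo_registry) / sizeof(demo_registry[0]); i++) {
        const struct demo_match *m = &demo_registry[i];
        if (m->family == af && m->revision == rev && strcmp(m->name, name) == 0)
            return m;
    }
    return NULL;
}

/* Family-specific match first, then the family-independent one. */
const struct demo_match *demo_find_match(enum demo_family af,
                                         const char *name, unsigned rev)
{
    const struct demo_match *m = demo_scan(af, name, rev);
    if (m == NULL && af != DEMO_UNSPEC)
        m = demo_scan(DEMO_UNSPEC, name, rev);
    return m;
}
```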

After some nfsim hacking, I've been able to add x_tables support to nfsim and
have been successfully running the full nfsim testsuite. The testsuite found a single bug (which has been fixed), but otherwise all tests pass.

Seems like we're going to push x_tables as well as the nf_conntrack port of
ctnetlink (nf_conntrack_netlink) for 2.6.16. Also, as I just noticed on kaber's
blog, his IPsec patches have made it in time, too. Userspace conntrack
helper support is definitely 2.6.17, though.

It seems that, with DaveM away, some communication problem led to
none of the netfilter related fixes going into the 2.6.14.y series (up
to 2.6.14.2) so far. I'm sorry for that; all the fixes have been submitted
now.

To be more efficient in flooding DaveM with netfilter patches, I've now hacked
up a set of 'wrapper scripts' around my git tree. They enable me to
efficiently apply patches to my tree, generate sequential sets, and send them
off (actually not using a mail user agent).

This means that, for now, my patch submissions are (like those of 99.9% of the other kernel hackers) not PGP/GPG signed. If I find some time, I'll add that feature to my script.

Anyway, I've sent off the first set of 10 netfilter patches and it worked like
a charm.

ulogd2 has now reached beta stage, and it has almost all the plugins of
ulogd-1.x. Only the SQL database backends are missing. It also features a
ctnetlink input plugin for flow-based accounting with 2.6.14 kernels.

Next, I'll be working on documentation, testing and a simple IPFIX output
plugin.

I've already received three different serious bug reports about problems with
netfilter/iptables in 2.6.14. This is frustrating, considering how long the
2.6.14 development cycle was. People should try new features of a new kernel
_before_ there is a release. Afterwards it's too late.

Given that David Miller is on holidays and Arnaldo Carvalho de Melo is
maintaining the network stack meanwhile, Linus didn't accept that huge patch
in the first round, since he lacked an explanation of why such a monster was required.

I hope my comments will convince him that nf_conntrack really is the way to
go... let's hope we'll have nf_conntrack in mainline in one or two days.

I hope Yasuyuki (the main author behind nf_conntrack) will make a big party with his USAGI friends once that happens ;)

One of the best early design choices of iptables was its support for plugin
matches and plugin targets. Over the last five years, we have seen some 100
such user-developed special-purpose plugins.

One that I find particularly funny is ipt_SYSRQ, a target
module that allows you to issue the "magic SysRq" command via a network
packet. This way you can sync, unmount and reboot an otherwise stuck machine that still responds to interrupts.

Obviously quite dangerous, but the author includes a time stamp and a
cryptographic signature, so replay attacks can only occur in a very small
time frame.
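The replay argument rests on a freshness check along the following lines. This is a hypothetical sketch, not code from ipt_SYSRQ: it assumes the signature over the request (including its timestamp) has already been verified, and the 10-second window is an arbitrary illustrative value. Within that window a captured packet could still be replayed, which is exactly the small time frame mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical acceptance window; the real module's value may differ. */
#define DEMO_WINDOW_SECS 10

/* Accept a request only if its (already signature-verified) timestamp
 * lies within +/- DEMO_WINDOW_SECS of the local clock. Because the
 * signature covers the timestamp, an attacker cannot refresh it, so a
 * captured packet stops working once the window has passed. */
bool demo_timestamp_fresh(int64_t now, int64_t pkt_ts)
{
    int64_t delta = now - pkt_ts;
    return delta >= -DEMO_WINDOW_SECS && delta <= DEMO_WINDOW_SECS;
}
```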

It's definitely a cool hack, although I'm not sure whether I'd want to put this
on a production system or not.

Some years ago, the netfilter project only had the kernel side
netfilter/iptables code, and the userspace iptables program. Then we added
patch-o-matic(-ng), and more recently a number of further sub-projects
have emerged, like ipset, all the nfnetlink-related code, ctnetlink, etc.

Unfortunately the homepage design didn't really cope with the fact that there is
now a more hierarchical structure with many sub-projects.

It was always my hope that some "new webmaster" would take care of it. Unfortunately
we still don't have a webmaster, so I spent some time on it today. You can see
the results at www.netfilter.org.

Since DaveM is on holidays, Acme is now in charge of running the net-2.6.15 tree. I've already
submitted nf_conntrack, the ip_conntrack hash table resizing code from Rusty, as
well as "revisions" support for {arp,ip6}_tables.

I'm also basically finished with x_tables now. Everything has been merged with
a post-nf_conntrack tree, and all the conntrack related matches/targets have been ported
to x_tables.

Now I need to do some serious testing (including nfsim), before it can be
submitted, too.

After the traditional workshop part ended, today we had day 1
of the workshop.netfilter.org
hacking sessions.

Despite the different topic, I spent the better part of the day with Michael
Bellion and Henrik Nordstrom working out the details of nf-hipac / nfnetlink
integration.

Apart from that, there's now a nf_conntrack header cleanup in my git tree, I've
ported ebt_[u]log to nf[netlink]_log, fixed some minor Kconfig issues, merged
some patches from Yasuyuki and Pablo, and pushed forward a round of fixes and
updates to DaveM.

I've managed to bring ulogd2 to a state where it finally does something. The
dynamic key resolution/linking of plugin stacks is working, and some basic
plugins (NFLOG input, IPV4 packet interpreter (BASE), LOGEMU output) are
working, too.

So the remaining work will mostly be in the plugin area. We're currently missing

Yet another of my projects that never received the amount of attention that was
required is ulogd2. If you
already know the ulogd-1.x series, then you know it as an efficient packet
filter policy violation logging daemon, with backends for files, syslog and
various SQL databases.

ulogd2 is much more than that. It's more abstract, and more universal. It's
no longer limited to receiving packets from the ULOG target, but is fully
modularized, with modules for ULOG, NFLOG (see linux-2.6.14), IPFIX, ctnetlink,
... Now you might wonder: why IPFIX and ctnetlink?
That's because ulogd2 can also process (aggregate, export) per-flow
information.

The most difficult part of the implementation is the dynamic creation of
"plugin stacks", but I think I wrote about this earlier in my blog.

The good news is that just before I went to bed, ulogd2 compiled for the first
time ;) This means I've waded through the tons of errors and warnings created
by all the changes introduced since it forked off ulogd-1.x about a year ago.

Now there are some bits of missing functionality here and there, and certainly
a large bunch of bugs. But if you are a software developer, you know it's much
easier (and more rewarding) once the beast actually runs :)

Following up on the recent site-wide installation of blosxom on people.netfilter.org, I've now also
created our own planet.netfilter.org. At the
moment, only three netfilter related blogs/journals/diaries are aggregated
there, but with some luck (and your help, since you will have to tell me about
other netfilter related weblogs) it will grow :)

I first wrote about this in early 2005: Having developer blogs on people.netfilter.org. Unfortunately I
never finished that project. I'm not really a web guy at all, so doing
stuff related to (X)HTML and CSS always gives me the creeps. Why can't we just have a technically skilled webmaster volunteer for netfilter.org? *sigh*

I've continued work on ulogd2, the next generation netfilter userspace logging
daemon. In addition to packet-based logging, it supports flow-based logging.

It turns out my overly flexible concept of plugin stacks leads to quite
a lot of implementation complexity. The problem is similar to a linker
problem (linking symbols of multiple objects), but in addition involves resolving
dynamically changing dependencies, with some 'symbols' being optional, and with
objects that you can ask "if I give you input symbol X, which output symbols
can you give me?"
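Reduced to its core, the "linker" view of a plugin stack can be sketched as follows: every input key of a plugin is bound to the nearest upstream plugin that outputs a key of the same name, unresolved optional inputs are tolerated, and an unresolved mandatory input makes the whole stack invalid. The structures and key names are illustrative assumptions, not ulogd2's actual data structures, and the dynamic "given input X, which outputs can you produce?" query is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_KEYS 4

/* Hypothetical key and plugin descriptions. */
struct demo_key {
    const char *name;
    int optional;                        /* inputs only: may stay unresolved */
};

struct demo_plugin {
    const char *name;
    struct demo_key in[DEMO_MAX_KEYS];   /* ends at name == NULL or array end */
    struct demo_key out[DEMO_MAX_KEYS];
    int source[DEMO_MAX_KEYS];           /* resolved provider index, -1 if none */
};

/* Find the closest plugin before position pos that outputs key. */
static int demo_provider(const struct demo_plugin *stack, int pos, const char *key)
{
    for (int p = pos - 1; p >= 0; p--)
        for (int k = 0; k < DEMO_MAX_KEYS && stack[p].out[k].name; k++)
            if (strcmp(stack[p].out[k].name, key) == 0)
                return p;
    return -1;
}

/* "Link" the stack: bind every input key to an upstream output key.
 * Returns 0 on success, -1 if a mandatory input cannot be satisfied. */
int demo_resolve(struct demo_plugin *stack, int n)
{
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < DEMO_MAX_KEYS && stack[i].in[k].name; k++) {
            int src = demo_provider(stack, i, stack[i].in[k].name);
            stack[i].source[k] = src;
            if (src < 0 && !stack[i].in[k].optional)
                return -1;
        }
    }
    return 0;
}
```

For example, an NFLOG → BASE → LOGEMU stack resolves as long as BASE provides the address keys LOGEMU requires; dropping BASE makes resolution fail.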

I really need to resolve some tax issues before the netfilter workshop, so
I'm not sure whether I can finish it before then... especially since I've also started to merge years-old pkttables code into a recent kernel.

This triple-release is in anticipation of a 2.6.14 kernel release. The two
libs as well as the conntrack program are userspace counterparts to the "next
generation" subsystems inside the kernel netfilter part.

The release involved lots of painful learning-by-doing of autoconf/automake.
I'm not a fan of them at all, but I still think it's less of a burden than trying to
invent everything on your own (like we did with the iptables package) and thus
shifting more of the burden onto the package maintainers of the distributions.

I'll probably release libnfnetlink_log and libnfnetlink_queue tomorrow... but I really don't have any time to work on netfilter at the moment, despite this TODO list :(.

Following some serious testing today, I've finally submitted the latest
version of the PPTP helper from the netfilter-2.6.14#pptp tree to the mainline
kernel.

With some luck, it will be included before 2.6.14 is final. It should go in,
since it doesn't modify existing code but is merely an addition.

Also, please note that the "ip_conntrack_proto_gre.ko" and "ip_nat_proto_gre.ko"
modules are gone with that 3.x version of the PPTP helper. The respective
code has been integrated into ip_{conntrack,nat}_pptp.ko. My initial dream
of doing some generic (non-PPTP) GRE connection tracking has evaporated, and
thus the PPTP helper now really only handles the special case of pptp-GRE.
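What "pptp-GRE only" means at the packet level: PPTP uses the enhanced GRE header from RFC 2637 (version 1, protocol type 0x880B, key field present), and the lower 16 bits of its key field carry the call ID, which connection tracking can use much like a port number. The following is a minimal parsing sketch with hypothetical names, not the helper's actual code; plain version-0 GRE is deliberately rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_GRE_PROTO_PPP  0x880Bu  /* PPP protocol type used by PPTP */
#define DEMO_GRE_VERSION_1  1u       /* enhanced GRE (RFC 2637) */
#define DEMO_GRE_FLAG_KEY   0x2000u  /* K bit: key field present */

/* Extract the PPTP call ID from an enhanced GRE header.
 * Returns 0 and sets *call_id on success, -1 if this is not PPTP-GRE. */
int demo_pptp_gre_call_id(const uint8_t *hdr, size_t len, uint16_t *call_id)
{
    if (len < 8)
        return -1;
    uint16_t flags_ver = (uint16_t)((hdr[0] << 8) | hdr[1]);
    uint16_t proto     = (uint16_t)((hdr[2] << 8) | hdr[3]);
    if ((flags_ver & 0x7u) != DEMO_GRE_VERSION_1 ||
        proto != DEMO_GRE_PROTO_PPP ||
        !(flags_ver & DEMO_GRE_FLAG_KEY))
        return -1;
    /* bytes 4-5: payload length (upper half of the key field),
     * bytes 6-7: call ID (lower half of the key field) */
    *call_id = (uint16_t)((hdr[6] << 8) | hdr[7]);
    return 0;
}
```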

Some time ago, Jeremy Kerr wrote the patchwork program as a
means to track patches sent to mailing-lists (specifically netfilter-devel in our case).

I'm now using it more or less frequently and it has already uncovered a number
of patches that would otherwise have been lost. Therefore I consider it a very helpful tool. Hopefully reports of netfilter-devel being "a write-only mailing-list" will
cease now.