QOS and IP Accounting with BGP under linux

At NSP we’ve go a fibre connection into the building, and a 10MBit feed from our ISP, and over that we’re allowed 10MBit of national and 3 Mbit PIR of international traffic. Note that this adds up to more than 10Mbit in total! This can cause annoying problems, like someone doing a lot of national or APE traffic at 10MBit, and closing out real international traffic. For a long time I’ve wanted to separate this out, but have not had the time to look into it

This week I finally organised a BGP from my ISP, and had a look at what my options were. I’d seen the Route-based QOS mini-HOWTO a while back, and it looked like it would work ok, but had a few problems. There’s no current way it to apply tc or iptables rules selectively based on a routing decision, or even on a route table. You can match on a route realm, however. The mini-HOWTO suggests copying your BGP routes into a separate table and into a realm at the same time, and then using tc and iptable’s realm matching code.

A quick aside: route realms are best described as a collection of routes. The decision as to which realm a route is placed is made by the local administrator, and each realm can contain routes from a mix of origins. Realms are used to allow administrators to perform bulk operations on large groups of routes in an easy manner. From the iproute command reference:

The main application of realms is the TC route classifier [7], where they are used to help assign packets to traffic classes, to account, police and schedule them according to this classification.

After a bit of digging, I found a link to a patch for quagga to provide route realms support. It’s even still maintained! After a bit of battling with autotools[1], and a bit of battling with linux capabilities[2], I had it up and running.

The route realms patch page covered off the BGP configuration I needed, and now I have a set of iptables counters for national, international and total traffic (for completeness). The only bit it doesn’t cover off is graphing, but we already have a set of perl scripts which pull information from interface totals or iptables FWMARK counters, so I modified that to pull from these counters as well, and set up RRD graphs. I was previously graphing interface totals out the external nic, and it’s interesting to note that the iptables “total” traffic, while adding up to the sum of national and international, does not correspond to the interface totals.

It’s worth pointing out that, as seen in iproute command reference, the rtacct tool will grab realm counts for you without needing iptables, so if you just want to something to graph things quickly, rtacct might do the job:

rtacct has a naive limit of 256 realms however, where as the actual implementation supports a 16 bit number, so if you have a large number of realms, or you autoclassify your inbound BGP into realms based on the AS number, you will have to use iptables only

I’m currently only accounting for traffic using this mechanism, but I can also do QOS on it – tc will match directly on realm tags, and any iptables based match systems you may have can be adapted to match on a realm as well.

[1] The realms patch touched configure.ac, which then required the autotools chain to rebuild everything, but it needed a very particular combination of autoconf and automake. Because it took me an hour or so to get this right, I’ll record it here:

autoheader and autoconf above are version 2.13. I have no idea why I had to run autoconf2.13 then autoconf2.50, but it seems that this actually worked.

[2] I initially tried building against quagga-0.98.6, because the quaggarealms patch site implied this was the “stable” verson, but it seems that quagga drops priviledges too soon. This works out fine if you have “capabilities” support in your kernel, which mine didn’t. They’ve changed this behaviour in 0.99.5, and incidentally this is the version in debian etch.