Linux and Unix at the University of Bristol

Main menu

Monthly Archives: February 2014

I currently look after 6 DNS resolvers, which are used to provide DNS two large sections of the university network.

While they’re all running BIND, there’s a real mix of versions so I want to consolidate them down, retire the oldest pair and generally tidy up.

The two oldest are in an IP address range that I can’t easily move onto any of the other servers, so any clients using them need to be pointed at the newer servers before I can retire them.

The problem is, I’m not 100% sure which clients are using them.

So, how do you find out what machines are using a given DNS server?
The temptation is to turn on querylogging, and then look at the log files after a week. There are two problems with that, the first is a technical issue (querylogging appears to block responses while waiting for the logging IO to happen, slows your DNS server down significantly and sends the load average through the roof) but the second (more important) problem is a privacy issue.

Querylogging logs every DNS request made by your clients, as well as the IP the client machine is using. Taken together, those two pieces of information are enough to identify what a given user is looking at on the internet. That’s a pretty serious breach of privacy, needs signoff from on-high and is generally a world of paperwork and hastle which is best avoided.

So we need an approach which is lightweight, and only identifies the machines which are talking to us and not what they’re asking for. Gathering the minimum of data required to allow us to make the decision we’re interested in.

On Linux, that’s pretty easy to arrange with a bit of tcpdump and some good old awk:

What does that do then?
It’s quite a long pipeline, so if we split it into chunks, it looks a bit like this:

/usr/bin/nohup

nohup combined with the ‘&’ at the end runs the whole pipeline in such a way that it doesn’t die when you log out. Which is useful if you want to set something running and then come back to it later.

/usr/sbin/tcpdump -ln port 53 2>&1

We then use tcpdump to dump all traffic on port 53 (DNS) The -n tells it to not resolve any IP addresses to dns names, and the -l changes the STDOUT buffering to be line buffered. This is helpful if you want to be able to see clients hitting your box in real time, for example to make sure the script is doing what you’re expecting it to.

“2>&1” merges STDERR and STDOUT into a single STDOUT stream.

awk '/\.domain:/ { print $3; system("") }'

Next we awk to look for lines which contain the string “.domain:” these are the ones which are headed towards our DNS server, rather than outbound requests made by processes running on the server. “print $3” prints the third field on the line (which is the source IP address/port of the request)

system(“”) is a useful bodge which makes the output from awk line buffered, in the same way as we used -l for tcpdump.

awk -F'.' '{ print $1"."$2"."$3"."$4; system("") }'

A bit more awk to strip off the port number, leaving us with just the IPv4 address. The observant amongst you will notice that this does nothing for IPv6 addresses. I should probably come up with a better solution, but this is sufficient for the boxes I’m currently interested in.

> /tmp/dns_clients.log &

redirects all the output into the /tmp/dns_clients.log file.

Handling the log file
So set that running on a DNS server, and client IP addresses will be logged to /tmp/dns_clients.log without logging what they’re looking at or slowing down the server too much. However, it logs one line for every request so it can get large quite quickly on a busy server.

I’ve been rotating that logfile out daily and running it through “sort -u” to get a list of unique clients. Once I’ve tracked them all down and pointed them at a more appropriate DNS server I’ll be able to retire the old ones.

Win!

Solaris Bonus Round
If you need to do this on solaris, tcpdump isn’t generally available (unless you’re willing to build it yourself) but /usr/sbin/snoop has roughly equivalent functionality. The output format is a little different, although it’s a bit easier to handle.

That does broadly the right thing. It includes traffic originated by the server as well as traffic hitting it from clients (so you could also pipe it through “grep -v $MY_IP” to weed that out if you wanted)