DNS logging and tools

Data Retention

Logging DNS replies can prove very useful. Kurt explains how to capture the data and keep it forever.

In last month's issue, I described how you can easily force all DNS queries through your own DNS servers and use RPZ (Response Policy Zones) to filter and redirect outgoing queries [1]. What about logging replies, though? Being able to block traffic based on the results returned is great, but without looking at the results, how do you know what to block?

Tcpdump, Wireshark, and Snort

Unfortunately, BIND does not have an internal logging method to log the replies of DNS queries. However, DNS is a well-understood protocol and is not encrypted (DNSSEC only signs data, it does not encrypt it), so the best method to handle logging of DNS replies is simply to run a network sniffer – such as tcpdump, Wireshark, or Snort – and to log everything on port 53 (both TCP and UDP).

I recommend logging the raw data, which has the advantage of providing data for correlation that you might not have stored otherwise (e.g., the TTL of packets could prove useful). Additionally, you can either process the network capture files in batches or add logging to a database. For example, with Wireshark's tshark utility, you can simply create a script to write the network data to a MySQL database and then pipe the network capture through the script for real-time logging to MySQL [2].

Please note that databases and NoSQL add complexity and may drop traffic instead of logging it if the data comes in too fast (whereas logging the raw capture to a disk is relatively simple and reliable).

Making Sure You Get It All

In the previous article [1], I talked about blocking queries that aren't sent through your specified server(s). This approach is simple but can potentially cause problems with devices hardwired to use certain DNS servers and ignore DHCP options. (Google's ChromeCast hardware is hardwired to use 8.8.8.8.) It can also result in malicious activity not being logged, for example, if a user brings in an infected laptop that attempts DNS lookups against a known evil DNS server, which simply fail.

Capturing everything is simple; you can set up a second server and redirect all network traffic to port 53 to it, then either log the queries, drop them, or answer them. Please note depending on your jurisdiction and organization, this may or may not be legal, or may require informed consent, so using the walled garden functionality within RPZ can be a good way to solve those issues. For example, you can notify users that they are trying to use a non-approved DNS server and allow them to apply for an exception.

Using a second server also gives you additional clues, because systems using other IPs may be misconfigured, malicious, or compromised (e.g., systems with hard-coded DNS, typos, etc.). You can also configure all outgoing queries to go to your existing DNS server; however, this means you can only apply a single set of RPZ policies easily. If you want to apply different sets of RPZ policies to the clients using the official server(s) and a different RPZ (e.g., a more restrictive one) to the clients trying to use other DNS servers, you will need a second server.

You can run both servers on the same system. Simply run one on port 53 for the queries directed at it, and run a second instance of BIND on a different port (e.g., 54). Listing 1 shows a firewall running BIND on port 53 (for queries sent to 10.1.2.1) and one on port 54 (for all other queries).

Note that the second/minute/hour/day/year might be off – for example, if a packet comes in 23:59:59 toward the end of the second and is forwarded and then processed/logged at 00:00:01 the next day. This is unlikely to happen with local REDIRECT rules but is likely to happen with DNAT to another server (because of network latency).

One way to deal with this is to log the data into a MySQL database, for example, using one table for the BIND logs and one for the outgoing connection logs. The IP and source port will be the same, and then you can SELECT for time using a range of a few seconds, for example,

Reconciling Logs

Make sure all the clocks on all the systems generating logged events are synchronized, definitely against each other, and ideally to the correct time as well. Please note that if you are logging at various levels (e.g., incoming packets in iptables, then the actual request processing in BIND, and the web proxy in Squid), the timestamps will most likely vary by tens to hundreds of milliseconds or more. If logging occurs after a request is processed, some requests will take seconds or even minutes to process.

Note that you will probably see a lot of suspicious-looking behavior that is in fact legitimate. For example, Google's Chrome web browser attempts to speed things up by looking up the DNS entries for all links on a page.

Thus, if you click a link the DNS has already resolved, you save tens to hundreds of milliseconds. Google Chrome will also attempt to look up domain names as items are typed into the address bar. For example, when you are typing a .com domain, Google will look up the record for whatever you are typing in the .co (Colombia) top-level domain.

Because many new top-level domains start with .co, .ca, and so on, you will see a lot of outgoing DNS requests for things users are not actually interested in seeing. Also, even if you are blocking replies from a certain server, BIND will still make the outgoing query. Once the reply is returned, then BIND will block it, so you will most likely still see outgoing connections to servers you have blocked.

Finally, the more logs you have, the more likely you are to have useful data. I'm slowly moving to a "log everything" strategy, including logging all HTTP queries through a transparent proxy, HTTPS queries where possible using an HTTPS proxy, all outgoing connections/packets (TCP, UDP, ICMP, etc.), all email using an email proxy, and so on. I'm not sure if the data will be useful, but at least I will have the data to look at to make that determination.