21 January, 2018

A string of DNS protocol bugs.

I went to turn on DNSSEC for cqx.ltd.uk today - the server that signed it broken right before my Christmas busy period so I disabled DNSSEC on that zone until I got round to fixing it.

I've encountered three different apparent protocol implementation bugs in the space of a few hours:

Andrews and Arnold's web based control panel accepts DS records as generated by BIND's dnssec-keygen tool but then throws a complicated looking error when talking to Nominet, the UK domain registry, to put those records where they need to be. As far as I can tell, this is because the BIND output has whitespace in the middle of a hex string, something RFC 4034 s5.3 seems to think is acceptable. Why is installing crypto keys always so hard?

For a while, Hetzner's recursive resolvers were unable to verify (and therefore refused to answer) results for my zone. I have a suspicion (but I don't have much to go on other than a hunch) that this was something to do with DS records and the actual zone having some kind of mismatch - although Google Public DNS at 8.8.8.8, and Verisign's DNSSEC checker both worked ok.

I discovered an implementation quirk in the Haskell dns library, which I use inside a debugging tool I'm slowly building. This is to do with the mechanism which DNS uses to compress replies: where a domain name would be repeated in a response, it can be replaced by a pointer to another occurence of that name in the reply. It looks like in this case that the dns library will only accept those pointers if they point to regions of the reply that have specifically already been parsed by the domain name parsing code, rather than pointers to arbitrary bytes in the reply.
This is frustratingly familiar to another bug I encountered (at Campus London) where their (not-so) transparent firewall was reordering DNS replies; giving a bug that only manifested when I was sitting in their cafe. (github issue #103)