Two new DNS measurements, both designed to assess the impact of issues with one or more root name servers, have been added to every RIPE Atlas probe.

As you may be aware, DNSMON and the RIPE Atlas built-in measurements are often used when analysing outages and other issues with important DNS services, particularly the root name servers. Because these measurements query authoritative name servers directly, they do not give a direct indication of the impact on actual end users: they ignore both the redundancy provided by multiple authoritative name servers and the caching done by local recursive resolvers.

To improve upon our DNS monitoring, we have initiated two built-in DNS measurements from all RIPE Atlas probes that use the probes’ default resolvers instead of authoritative name servers. These measurements should complement each other and existing RIPE Atlas DNS measurements to allow extra analysis of important DNS events.

The first measurement queries for random top-level domains with the intention of avoiding caches and therefore hitting at least one root name server. The second measurement queries for popular domain names, with the intention of hitting caches where appropriate and getting an idea of the true impact on users.

Random domains

The first measurement, which has ID 30001, deliberately bypasses DNS resolver caches to explicitly measure the availability of root servers. Probes query for an A record for a domain called <probe_id>.atlas.ripe.net<random_string>. The intention of this measurement is to make it possible to see the effect on the root name server infrastructure as a whole if a limited number of root server letters are affected by availability issues.
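The cache-bypassing idea can be illustrated with a minimal Python sketch. It is not the code that runs on the probes, and it does not reproduce the exact name layout given above: the function name, the label length, and the placement of the probe ID are assumptions made for illustration only. The point is that a fresh random top-level label is very unlikely to be in any resolver's cache, so the resolver has to ask a root name server.

```python
import random
import string


def random_tld_query_name(probe_id, rng=None, length=12):
    """Build a query name whose top-level label is random.

    Hypothetical helper: the real measurement's name format may differ.
    """
    rng = rng or random.SystemRandom()
    # Letters only, so the label cannot be mistaken for a numeric TLD.
    label = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    # Assumed layout: probe ID as the left-most label, random label as the TLD,
    # with a trailing dot to mark the name as fully qualified.
    return f"{probe_id}.{label}."


if __name__ == "__main__":
    # Each call yields a new name, so a cached answer is very unlikely.
    print(random_tld_query_name(1234))
    print(random_tld_query_name(1234))
```

A resolver that receives such a name would normally return NXDOMAIN after consulting a root name server, so a timeout or SERVFAIL is the signal of interest here rather than the answer itself.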

Popular domains

As a starting point for the second measurement, ID 30002, we will use a recent snapshot of the global top 50 visited sites according to the Alexa top sites list, which combines unique visitors and individual page views in its calculations. Probes will cycle through the domains on this list, querying for a different A record every 10 minutes.
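The cycling behaviour can be sketched in a few lines of Python. This is one possible way to schedule it, not a description of how the probes are implemented: the function name is hypothetical, the three domains are placeholders rather than entries from the actual list, and deriving the position from wall-clock time is an assumption.

```python
INTERVAL_SECONDS = 600  # one query every 10 minutes, as described above

# Placeholder names only; the real measurement uses a 50-entry list.
PLACEHOLDER_DOMAINS = ["example.com", "example.net", "example.org"]


def domain_for_time(domains, epoch_seconds, interval=INTERVAL_SECONDS):
    """Pick the domain to query in the interval containing epoch_seconds."""
    # Number of whole intervals elapsed since the epoch.
    slot = int(epoch_seconds) // interval
    # Wrap around so the list is walked repeatedly in order.
    return domains[slot % len(domains)]


if __name__ == "__main__":
    for t in (0, 600, 1200, 1800):
        print(t, domain_for_time(PLACEHOLDER_DOMAINS, t))
```

With one query every 10 minutes, a list of 50 domains takes 500 minutes (a little over eight hours) to traverse once, so each domain is queried roughly three times a day per probe.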

This list could be manually adjusted in the future to reflect changes in end-user usage or to use a different data source. The list was proposed on the RIPE MAT WG and RIPE Atlas mailing lists to allow feedback from the community, and we deliberately exclude all domains with an “Adult” categorisation.

In the event of a major DNS outage, this measurement would provide a fairly good approximation of end-user impact, because it is sensitive both to redundancy in the root name server system and to the caches of DNS resolvers.