Will if fail over, or just fail? Some DNS empirical testing

While out to dinner the other night we got to talking about name resolution in Windows, as one does while out enjoying a fine meal with friends… During the discussion we found that one of us had experienced some strange behavior with regards to the prioritization of DNS servers and failover. Specifically the issue was that the DNS resolver in Windows would not fail over to the next DNS server on the list if the server currently in use became unavailable. This sounded like a major bug and we were amazed that if this was indeed the case; how come we had not heard about it before? Out came the smartphones, the conversation stopped, the food went cold and the beer warm. After quite some time (at this point the waitress had started sending us long gazes, wondering if we were all stricken by some strange debilitating disease) we still had nothing to back up this case. So there was nothing for it: I went back to the hotel to do some testing. This is what I found…

Resolving names on Windows

It is important to understand that we are talking about the DNS name resolution behavior here when the machine is acting as a DNS client, not a DNS Server. Windows uses two components to resolve, register and cache DNS names. These are the DNS Client and the DNS Resolver. The DNS Resolver, aka the Windows resolver, is part of the TCP/IP protocol and cannot be disabled without disabling the protocol itself. The DNS Client is implemented as a service with the friendly name DNS Client, and a service name of Dnscache. Its description reads as follows:

The DNS Client service (dnscache) caches Domain Name System (DNS) names and registers the full computer name for this computer. If the service is stopped, DNS names will continue to be resolved. However, the results of DNS name queries will not be cached and the computer’s name will not be registered. If the service is disabled, any services that explicitly depend on it will fail to start.

So the DNS Client is responsible for caching the results from the DNS Resolver, and register the computer’s FQDN in Dynamic DNS. If you stop the DNS Client you can still resolve names, but they will not be cached and a DNS Server will be queried each time a name needs to be resolved to an IP address. Needless to say, this may impact performance.

Name resolution works the same for both Windows clients; XP, Vista, 7, 8 etc., and servers; 2003, 2008, 2008 R2, 2012. Both servers and clients need to be able to resolve, cache and register network names and they all do it the same way.

The DNS Server list

Each network adapter in a Windows machine that is bound to either the TCP/IPv4 or TCP/IPv6 protocol has a prioritized list of zero or more DNS servers to which queries to resolve names can be sent. The adapters themselves are also prioritized. A name is resolved using this process:

The DNS Resolver queries the DNS servers in the following order:

The DNS Resolver sends the name query to the first DNS server on the preferred adapter’s list of DNS servers and waits one second for a response.

If the DNS Resolver does not receive a response from the first DNS server within one second, it sends the name query to the first DNS servers on all adapters that are still under consideration and waits two seconds for a response.

If the DNS Resolver does not receive a response from any DNS server within two seconds, the DNS Client service sends the query to all DNS servers on all adapters that are still under consideration and waits another two seconds for a response.

If the DNS Resolver still does not receive a response from any DNS server, it sends the name query to all DNS servers on all adapters that are still under consideration and waits four seconds for a response.

If the DNS Resolver does not receive a response from any DNS server, the DNS client sends the query to all DNS servers on all adapters that are still under consideration and waits eight seconds for a response.

If the DNS Resolver receives a positive response, it stops querying for the name, adds the response to the cache (via the DNS Client service) and returns the response to the client.

If the DNS Resolver has not received a response from any server within eight seconds, the DNS Resolver responds with a timeout.

The Case

The case we are exploring here is as follows:

On a Windows machine with one network adapter, which is bound to the TCP/IPv4 protocol, with two or more DNS servers specified in its TCP/IP properties; if the primary DNS server does not respond the machine will not fail over to the secondary DNS server and will be unable to resolve names.

I used this setup to test.

2 Windows Server 2012 DC/DNS servers named MAYA1 and MAYA2 with the addresses 192.168.131.10 and 192.168.131.11 respectively. Both have the same DNS zones and configuration, i.e.. they will both answer with the same information when queried.

A Windows 8 Professional client, named MAYA-CLIENT1, with a dynamically assigned IP address from MAYA1, acting as a DHCP server, and 2 dynamically configured DNS servers (MAYA1 and MAYA2) in that order.

All machines on the same subnet.

No Internet access

Resolving names from a zone the two DNS servers were authoritative for

Network Monitor on the Windows 8 client used to capture network traffic

All servers and client were VMs running on Hyper-V

2 DNS names for testing; nothere1 (1.1.1.1) and nothere2 (1.1.1.2)

Testing

The test I did was very simple. With Network Monitor running on the client I first pinged the name nothere1. In Network Monitor I verified that the response had come from MAYA1 (the configured primary DNS server). I emptied the DNS cache on the client and disconnected MAYA1 from the network. I then pinged nothere1 again, using Network Monitor to see which server answered.

It should come as no surprise that this worked exactly as expected. Under normal conditions Windows will use its configured primary DNS server, if that fails it will use the next configured server on the list after a short delay.

Here is the process in Network Monitor

Frame

Operation

15

Query from client to primary DNS server for the name nothere1

16

Answer from primary DNS server to client with the IP address of nothere1 (1.1.1.1)

<Primary DNS server MAYA1 disconnected from network and the DNS Cache is emptied on the client>

31

New query from client to primary DNS server for the name nothere1

34

Since no reply has been received from the primary DNS server for 1 second a new query is sent to the secondary DNS server MAYA2 (192.168.131.11) for the name nothere1

37

Answer from secondary DNS server to client with the IP address of nothere1 (1.1.1.1)

Notes

These tests were performed with Windows 8 as the resolving client. It is quite possible that an earlier version of Windows behaves differently, but nothing I have found suggests so. Testing of that will have to wait for the time being, though.

I did come across people who reported variations of the problem originally stated in this post, but none with the exact same result. Some people were able to log on to their domain and resolve names successfully when the primary DNS server was offline, but were baffled that nslookup did not work. I guess they didn’t know that nslookup always queries the primary DNS server.

Some documentation claims that it is the DHCP Client service (service name Dhcp) that registers the computer’s FQDN with DNS. I have tested this on the setup used in this article and have not been able to reproduce that behavior. This article; How to configure DNS dynamic updates in Windows Server 2003, claims that the DHCP Client service registers the name even for statically configured addresses. A funny thing here is that the service description for the DHCP Client service also claims that it registers addresses: Registers and updates IP addresses and DNS records for this computer. If this service is stopped, this computer will not receive dynamic IP addresses and DNS updates. If this service is disabled, any services that explicitly depend on it will fail to start. I also remember reading about this in the Windows 2000 days, so maybe this was the way it worked before?

Conclusion

So far it looks like Windows behaves exactly as you would expect! That is always nice.

Post navigation

2 thoughts on “Will if fail over, or just fail? Some DNS empirical testing”

In a Windows XP environment, domain logins were taking 5 to 10 MINUTES to log in. After reviewing the configuration I found that the Primary DNS was set to a Public DNS and the Secondary DNS record was configured as the local domain server. It was surprising it attached to the domain at all! The timing of logins gave the impression that Windows XP was not using the second DNS for quite a while or possibly using a cached domain login instead. Interesting experiment and thanks for posting it!