It's been a while since I dabbled in DNS setups, so my knowledge might be a bit outdated.

We have a domain whose master server is administered by our team and whose slave server is provided by the hosting company. At some point we decided to switch to another hosting company, and while at it, we decided that both the master and slave servers would be administered by us. In preparation, we reduced the refresh, retry and expire parameters of the domain so that any changes would propagate a bit faster. Then we moved the server to the new host, made the changes to the zone file, incremented the serial and reloaded the DNS server. The propagation has been uneven. Most DNS servers out there have fresh data. However, some do not, even though it is way past both our expiry setting and the default expiry setting of one week (forced by some DNS admins).

While investigating possible reasons for this, we found that our server was misconfigured and was allowing queries only from the slave. So for a fairly long period of time, any outside query to our server would fail but would work against the slave. This was of course corrected immediately. We also found that the previous hosting company still has our slave zone up and responds to manual (nslookup/host) queries with an outdated view of the zone.

So my question is: can an old slave DNS server somehow be responsible for flaky DNS propagation? If some DNS server hits the expire/refresh time for some zone, does it get its data by walking the DNS tree all the way from the top (root servers)? Or does it simply check with the server that it recently got zone information from?

EDIT: Sorry for not mentioning it clearly, but naturally I changed the DNS entries for the domain at the registrar level. If I do non-recursive queries with nslookup/host and work my way down from the TLD to my domain, everything is consistent with what it should be. Yet some DNS servers still provide old zone data as a non-authoritative answer.

4 Answers

You simply can't fix admins that refuse to listen to the TTL setting. So forget that problem, because it's not one you can solve.

Assuming that can't be solved, if you've done all the other steps correctly then there is nothing more you can do. You need to try very hard to sync the servers as closely as you can within a small window, but even then there is always a small window where the master is ahead of the slaves.

The important thing is that if you're dropping a slave, change the NS records in the zone and change the parent zone's notion of the NS records too. Make sure you remember that your parent has a copy of the NS records and the associated "glue" (IP addresses) to associate with those NS records. If you haven't changed your parent's notion, then this could be the cause of your issues.

This is not accurate. The problem in situations such as this is far from people ignoring TTLs. TTLs are in fact being properly respected, and that is the problem, ironically.
– JdeBP, May 16 '11 at 14:18

They're both problems. Reducing the TTL before changing things that have mission-critical timing is always a good thing. However, there are many documented cases of resolvers caching information for much longer than the TTL says to cache it.
– Wes Hardaker, May 18 '11 at 19:34

As I said, they are not, however, the problem here. Don't mistake an answer being accepted for it being correct. The problem here, which is a well-known and long-known one when changing content DNS servers, results, ironically, from TTLs being properly respected.
– JdeBP, May 20 '11 at 12:05

DNS doesn't propagate. The only way that the slave server at your previous host can affect name resolution for your domain is if it's still listed as a name server for your domain. Check the WHOIS information for your domain and check with your domain registrar to see what name servers are listed for your domain. If the old slave server is listed then you need to get that fixed. Once your name servers are listed as the sole name servers, you should be good. As for other DNS servers not honoring your TTLs, there's nothing you can do about that. As for the slave server hosting a zone for your domain, it doesn't matter as long as it isn't authoritative for your domain.

I can host DNS zones for Microsoft and Google if I want to, but it only matters to DNS clients that use my DNS servers for name resolution. If the slave server is used by any DNS clients and still answers authoritatively for your domain, then it will affect name resolution of your domain for those clients, but all other DNS clients will be unaffected.

Here j.gtld-servers.net says that the name servers for att.com are ns[123].attdns.com, and ns3.attdns.com says there is actually a fourth one. If you ask all the name servers for com. about the NS records for att.com. (dig NS att.com @a.gtld-servers.net. then dig NS att.com @b.gtld-servers.net. etc.) you'll have a complete picture of what a properly configured client can receive when it asks for a name server for your domain.

Then ask each of the actual name servers for your domain (ns[1234].attdns.com. in the example) what NS records it holds for the domain itself (att.com in the example), and you have the full picture from both the parent's side and the zone's side.
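The cross-check described above can be sketched in Python. This is only an illustration under stated assumptions: the helper names (`dig_commands`, `compare_ns`) are made up for this example, the NS sets are the ones quoted in the att.com example rather than live data, and the answers are assumed to have been collected by hand (for instance by running the generated dig commands).

```python
def dig_commands(domain, servers):
    """Build one non-recursive NS query command line per server to ask."""
    return [f"dig +norecurse NS {domain} @{server}" for server in servers]


def normalize(name):
    """Lowercase a host name and drop the trailing root dot."""
    return name.strip().lower().rstrip(".")


def compare_ns(expected, answers):
    """Report every server whose NS set differs from the expected one.

    expected: iterable of name server names you intend to be listed.
    answers:  dict mapping the server you asked to the NS names it returned.
    """
    want = {normalize(n) for n in expected}
    report = {}
    for server, names in answers.items():
        got = {normalize(n) for n in names}
        if got != want:
            report[server] = {
                "missing": sorted(want - got),  # intended, but not returned
                "extra": sorted(got - want),    # returned, but not intended
            }
    return report


if __name__ == "__main__":
    # Illustrative data only, taken from the att.com example in this answer.
    intended = [f"ns{i}.attdns.com" for i in range(1, 5)]
    collected = {
        "j.gtld-servers.net": ["ns1.attdns.com.", "ns2.attdns.com.", "ns3.attdns.com."],
        "ns3.attdns.com": [f"ns{i}.attdns.com." for i in range(1, 5)],
    }
    for line in dig_commands("att.com.", collected):
        print(line)
    print(compare_ns(intended, collected))
```

An empty report means the parent zone and every listed name server agree on the delegation; any entry naming your old slave under "extra" is the thing to chase up.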

It may be a bit tedious, but if all the answers you get are sane, then your configuration is demonstrably correct. You did your part to shorten the transition phase, and eventually all will be well, but you cannot do anything about ISPs that purposefully ignore the TTL of DNS entries to lower the load on their servers.

As mentioned, the other answers are not in fact entirely accurate. So, answering your question:

Can an old slave dns server, somehow be responsible for flaky DNS propagation?

Yes, it can, for approximately the reason that you yourself identified, if you substitute "the effects that I am seeing" for "flaky propagation" (which isn't what's happening):

If some DNS server hits expire/refresh time for some zone, does it get his data by walking DNS tree all the way from top (root servers)? Or does it simply check with the server that it recently got zone information from?

Apart from the fact that people making DNS queries don't download whole zones, and the fact that the expire/refresh times have nothing to do with it, you are thinking along the correct lines. People who have already looked up DNS information for your domain and its subdomains will have been sent the delegation information for that domain. Until that information's TTL is reached, they'll keep talking directly to the content DNS servers that they know about; and if they keep talking to your old DNS hosting service, they'll keep being told the old delegation information with a fresh TTL each time.
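A toy simulation may make that feedback loop easier to see. It is a deliberately simplified sketch, not a model of any particular resolver: it assumes the resolver re-caches the NS set it receives with every answer from the old server (real resolvers differ in whether and when they do this), and the function name and all the numbers are invented for illustration.

```python
def first_parent_lookup(ns_ttl, query_interval, total_time, old_server_answers):
    """Return the first time (in seconds) at which the cached delegation has
    expired and the resolver must go back to the parent zone, or None if
    that never happens within total_time.

    ns_ttl:             TTL on the NS records, in seconds.
    query_interval:     seconds between client queries for the domain.
    old_server_answers: True if the old server still serves the zone and
                        hands out its stale NS set with a fresh TTL.
    """
    expires_at = ns_ttl  # delegation was cached at t = 0
    t = 0
    while t <= total_time:
        if t >= expires_at:
            return t  # cache expired: the resolver re-fetches from the parent
        if old_server_answers:
            # Assumption: each answer carries the NS set again, and the
            # resolver restarts the TTL countdown on its cached copy.
            expires_at = t + ns_ttl
        t += query_interval
    return None


if __name__ == "__main__":
    day = 24 * 3600
    # Old server still answering: the stale delegation is renewed forever.
    print(first_parent_lookup(3600, 600, day, old_server_answers=True))
    # Old server no longer answering for the zone: the cache runs out after one TTL.
    print(first_parent_lookup(3600, 600, day, old_server_answers=False))
```

Under these assumptions, any resolver that is queried more often than once per TTL never leaves the old server while it keeps answering, which is why getting the previous host to drop the zone matters more than waiting.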