My Goal: Let nscd maintain a fairly large DNS cache in excess memory since I have it available.

Description:

I have a webserver with a broadly dispersed but high-repeat user base. It has plenty of memory, so I thought I'd improve response time by caching lookups. According to nscd -g, though, I'm only at a 6% cache hit rate, meaning nscd is most likely adding more latency (saving to the cache, or searching it for an entry it will never find) than it saves by avoiding a trip to the network.

Probably a large contributor to the 6% hit rate is the fact that apparently it's only cached 17 entries. Doing a strings /var/db/nscd/hosts shows that the host cache entries it has created are mostly for machines on our internal network. It's good to have these cached since the daily re-publish of the website is likely sped up but my goal is to speed up end user experience without making any real configuration changes.

Basically, I need help understanding how my host cache can be so small even though I've set the positive TTL's on the host cache to be incredibly high. I'm sure it's the small number of actual cached entries that is causing the hit rate to be so low.

I'm assuming that, since the hit rate is 6% but my positive TTL is fairly large, my current workload is performing DNS host lookups that are simply not being saved. I have no idea why they aren't being saved or what to check next. I had expected a fairly large DNS cache by now.

Even if the hit rate stayed small (i.e., clients weren't repeating as often as I thought), I'd still expect those DNS lookups to be cached, but judging by the "current number of cached values" that doesn't appear to be happening either.

4 Answers

What part of your webserver is even doing DNS lookups? Most webserver configurations explicitly disable reverse DNS lookup of each incoming user, for speed (because DNS is slow in general).

As Patrick notes, nscd is doing the right thing and respecting the positive TTL values. Yes, you could override it (unbound would let you do this easily: just modify server.cache-min-ttl, though it carries warnings about increasing it beyond 1 hour for the same reasons). HOWEVER, your queries are probably mostly rDNS, which will tend to have longer TTLs in general.
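To see why a large configured TTL does not help when the answer's own TTL governs expiry, here is a toy model in Python. It is only a sketch, not nscd's actual implementation: the class `TtlCache`, the function `simulate`, and all the numbers (100 clients, one visit every 600 seconds) are made up for illustration, and it assumes each entry lives exactly as long as the TTL that came with the answer.

```python
class TtlCache:
    """Toy cache: each entry expires after the TTL supplied with the answer."""

    def __init__(self):
        self._entries = {}  # name -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def lookup(self, name, now, resolve):
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Missing or expired: ask the (hypothetical) resolver again.
        self.misses += 1
        value, ttl = resolve(name)
        self._entries[name] = (value, now + ttl)
        return value

    def live_entries(self, now):
        return sum(1 for _, expires in self._entries.values() if expires > now)


def simulate(record_ttl, clients=100, revisit_every=600, duration=3600):
    """Look up each client once every `revisit_every` seconds; return hit rate."""
    cache = TtlCache()

    def resolve(name):
        # Stand-in for a real DNS query: returns (answer, TTL in seconds).
        return "host-" + name, record_ttl

    for now in range(0, duration, revisit_every):
        for i in range(clients):
            cache.lookup(str(i), now, resolve)
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    for ttl in (300, 36000):
        print(ttl, simulate(ttl))
```

With these made-up numbers, a 300-second record TTL produces no hits at all, because every entry has expired before its client comes back, while a 10-hour TTL produces a hit on every round after the first.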

Additionally, since your maximum number of cached values is so low, I'd like to note that you're hardly getting any traffic.

If you do care about where your users repeat from that often, I'd suggest logging it outside nscd, and not worrying about it anymore.
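If the names are only needed when reports are built from the logs, one way to keep the lookups out of nscd is to memoize them inside the report script itself. The following Python sketch is hypothetical: `make_resolver` and `annotate` are illustrative names, and it assumes the client IP is the first space-separated field on each log line. Only `socket.gethostbyaddr` and `functools.lru_cache` are real standard-library calls.

```python
import socket
from functools import lru_cache


def make_resolver(lookup=socket.gethostbyaddr, maxsize=100_000):
    """Return a function mapping an IP to a hostname, memoized in-process."""

    @lru_cache(maxsize=maxsize)
    def resolve(ip):
        try:
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
            return lookup(ip)[0]
        except OSError:
            # No PTR record or resolver failure: fall back to the raw IP.
            return ip

    return resolve


def annotate(lines, resolve):
    """Replace the leading IP field of each log line with its hostname."""
    for line in lines:
        ip, _, rest = line.partition(" ")
        yield f"{resolve(ip)} {rest}"


if __name__ == "__main__":
    import sys

    resolver = make_resolver()
    for out in annotate((raw.rstrip("\n") for raw in sys.stdin), resolver):
        print(out)
```

Each distinct IP is resolved at most once per run (until the memo fills up), regardless of any TTL. The name reported is whatever the PTR record says when the script runs, not when the request was logged.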

The DNS lookups are being done so that upper management can see DNS names in their reports (generated from the log files). I suppose I could set up BIND to do this, but my question is how to manage nscd since that's more generally useful especially for stuff like user ID's and groups.
– Bratchley Dec 9 '13 at 14:49

Also, I realize the maximum number of cached values is low, but I still think that a 6% hit rate with the TTL set to 10 hours should result in a larger cache than 17 cached host values. If there's a way to get nscd to hold onto the records longer, that's preferable and would minimize the impact of the reverse DNS.
– Bratchley Dec 9 '13 at 14:52

@JoelDavis just using BIND/unbound isn't going to increase the hit rate directly. I do see a related problem for you if you're doing the rDNS lookup later from logfiles: there is no guarantee that the rDNS points to the same entry now as it did when the event happened.
– robbat2 Dec 9 '13 at 18:03

Marking this as the answer since I think this is as close to an answer as the question makes possible.
– Bratchley Dec 10 '13 at 19:17

It may be a bit off-topic but instead of using nscd you can switch to sssd (which I consider its successor).

I'm using it on SUSE Linux Enterprise Server 11.3 (fully supported) and I'm glad that I made the switch. It has many more, and finer-grained, configuration options than nscd, and its capabilities go far beyond what nscd can achieve.

This is on RHEL5 but when we upgrade the server (probably to RHEL7) I can look into it. It's interesting for a lot of reasons. Can SSSD cache UID's in general or does it just do authentication and DNS caching?
– Bratchley Dec 10 '13 at 19:14

Checked in #sssd on FreeNode and one of the devs said that they still recommend using something else for DNS caching and mentioned the unbound and nscd options. They did say, though, that it is designed to cache user and group information.
– Bratchley Dec 10 '13 at 20:43

There is unscd out there as well, but it runs into the same root problem.
– robbat2 Dec 10 '13 at 20:46

Is there a way to override this behavior?
– Bratchley Dec 6 '13 at 16:09

Here is an ugly way: you could filter the incoming DNS answer packet with iptables and send it to a userspace program (NFQUEUE target), which would then rewrite it to change the TTL.
– Totor Dec 6 '13 at 21:34

I would not recommend this even if it were possible. One scenario: When servers are brought down for maintenance, they are removed from DNS. The admins will then wait for the DNS records to expire before shutting the server itself down. By overriding the TTL you'll be sending traffic to a server that could be shut down.
– Patrick Dec 7 '13 at 0:56