I want to set up several Stratum 2 time servers on my local network. Virtual machines would certainly be a cheaper way to do this than buying three 1U servers. What limitations would doing so impose? That is, to what degree will accuracy be adversely impacted?

Additionally, my instinct is that these local time servers should reside on different physical machines in order to mitigate any hardware irregularities. Is this intuition correct?

Edit: I should say that by "virtual machines" I didn't specifically mean VMware. Rather, I meant the general concept of virtualized instances.


The simple fact is that clock accuracy within a VM is still really bad. There are a few causes, but the killer is that the drift is not constant; the drift rate changes from moment to moment. NTP has clock-drift compensation built in, but it was designed around a drift rate that stays essentially constant. For example, if a physical machine loses 12 seconds every 30 days, NTP can compensate for that, and does so very well. But if that machine can lose anywhere from 4 to 70 seconds every 30 days, NTP isn't nearly as good at tracking that level of change.
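To put those figures in the units ntpd itself uses for its frequency estimate (parts per million, as stored in its drift file), here is a small conversion sketch. The helper name is made up for illustration; the inputs are just the 12 s, 4 s and 70 s per 30 days quoted above.

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(seconds_lost: float, days: float) -> float:
    """Convert an observed clock error over a period into parts per million (ppm)."""
    return seconds_lost / (days * SECONDS_PER_DAY) * 1e6


# Steady physical clock: 12 s lost every 30 days.
steady = drift_ppm(12, 30)
# VM clock: anywhere from 4 s to 70 s lost every 30 days.
vm_low, vm_high = drift_ppm(4, 30), drift_ppm(70, 30)

print(f"physical: {steady:.2f} ppm")
print(f"VM: {vm_low:.2f} to {vm_high:.2f} ppm")
```

That works out to roughly 4.6 ppm for the physical box versus a range of about 1.5 to 27 ppm for the VM. None of those is large in absolute terms (ntpd can slew out considerably bigger frequency errors); the problem is that the VM's value doesn't hold still.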

What makes it really hard for NTP to keep up in a VM environment is that the local clock it sees can change its drift rate over the course of a minute. Depending on how often the server polls its upstream time sources, those swings can produce large jumps in its drift estimate and cause it to go out of sync far more often. And out-of-sync time cascades throughout your organization, because every client downstream inherits it.
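A toy model shows why a moving drift rate hurts. This is deliberately much cruder than ntpd's real clock discipline: it assumes the server corrects each poll interval using the drift it measured over the *previous* interval, and reports how much error is left over by the next poll. The drift sequences are hypothetical.

```python
def residual_errors(drift_ppm_per_interval, poll_seconds):
    """Toy model: correct each poll interval with the drift measured over the
    previous interval and return the leftover clock error, in milliseconds,
    accumulated by the end of each following interval."""
    errors_ms = []
    for previous, current in zip(drift_ppm_per_interval, drift_ppm_per_interval[1:]):
        # Leftover error = (actual drift - stale estimate) * elapsed time
        errors_ms.append(abs(current - previous) * 1e-6 * poll_seconds * 1000)
    return errors_ms


POLL = 1024  # seconds (2**10)

steady = [4.63] * 5                    # hypothetical constant drift (physical host)
wobbly = [4.63, 20.0, 1.5, 27.0, 9.0]  # hypothetical drift swings inside a VM

print([round(e, 2) for e in residual_errors(steady, POLL)])
print([round(e, 2) for e in residual_errors(wobbly, POLL)])
```

With a constant drift the leftover error is zero every interval; with the wandering drift it is tens of milliseconds per poll, and it scales linearly with the poll interval, which is why polling more often helps a VM-hosted server.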

NTP for a local network is a low-impact protocol with a very small memory footprint, and it can happily piggy-back on your other network infrastructure servers, such as your DNS and DHCP servers. Some routers can also provide NTP service, so you may want to look into that.

Ideally you want two separate servers in separate locations, each syncing against a different set of upstream servers. It would also be a very good idea if both time servers were configured to use each other as a 'peer', which minimizes the impact on time service should one of the upstream time sources go awry; there will be a stratum change, but at least it won't report out-of-sync. And finally, be nice to your upstream time providers and configure your servers to go a very long time between polls once time is well established. This is the 'maxpoll' parameter on the 'server' line; its value is the base-2 logarithm of the maximum interval between polls, in seconds (for example, maxpoll 10 means 2^10 = 1024 seconds).
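Since maxpoll is an exponent rather than a number of seconds, it's easy to misjudge how long an interval a given value means. A quick conversion sketch, assuming the reference ntpd, which accepts poll exponents from 4 (16 s) up to 17 (about 36.4 hours) and defaults maxpoll to 10; other implementations may use different limits. The function name is just for illustration.

```python
def poll_interval_seconds(exponent: int) -> int:
    """ntpd's minpoll/maxpoll values are log2 of the poll interval in seconds."""
    if not 4 <= exponent <= 17:
        raise ValueError("ntpd accepts poll exponents from 4 to 17")
    return 2 ** exponent


for exp in (6, 10, 12, 17):
    secs = poll_interval_seconds(exp)
    print(f"maxpoll {exp:2d} -> {secs:6d} s ({secs / 60:.1f} min)")
```

So "a very long time between polls" at the top of the range (maxpoll 17) is 131,072 seconds, while the default of 10 is only about 17 minutes.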

If you absolutely had to use VMs for this, I'd set up no fewer than three such NTP servers. Each of those needs to be on a different host, and if possible in a different data center. As I just suggested, they need different time sources and should peer with each other. Then configure all of your NTP clients to use all three as parent sources. Make sure your maxpoll values are low enough never to go more than an hour and a half between sync packets to off-network sources, and 30 minutes on-network. Chances are good that at least one of the three will be in sync at any given time. Clients that can only talk to one time host will just have to put up with the occasional out-of-sync event. Overall, time quality in this scenario would not be as exact as it would be with physical servers.
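Turning those "no more than an hour and a half / 30 minutes" ceilings into actual maxpoll values just means finding the largest power of two that fits under each one. A sketch, again assuming ntpd's 4 to 17 exponent range; the helper and constant names are made up for illustration.

```python
def largest_maxpoll(max_gap_seconds: int) -> int:
    """Largest ntpd poll exponent n (4..17) with 2**n <= max_gap_seconds."""
    if max_gap_seconds < 2 ** 4:
        raise ValueError("ceiling is below ntpd's shortest poll interval (16 s)")
    # bit_length() - 1 is floor(log2(x)) for a positive int, with no float rounding
    return min(17, max_gap_seconds.bit_length() - 1)


OFF_NETWORK_CEILING = 90 * 60  # 1.5 hours between polls of upstream sources
ON_NETWORK_CEILING = 30 * 60   # 30 minutes between polls of the local servers

print("off-network maxpoll:", largest_maxpoll(OFF_NETWORK_CEILING))  # 12 -> 4096 s
print("on-network maxpoll:", largest_maxpoll(ON_NETWORK_CEILING))    # 10 -> 1024 s
```

That suggests `maxpoll 12` (about 68 minutes) on the VM servers' upstream 'server' lines and `maxpoll 10` (about 17 minutes) on the clients, which happens to be ntpd's default anyway.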

If I had to ballpark it, I'd say your consensus time in the pure-VM environment would probably be within, oh, 30 to 100 ms of true. In a purely physical environment, your consensus time would probably be within 10 ms once the time servers had been up long enough for time to settle.

I'm fairly certain that I want at least three, not just two local NTP servers. How would a client disambiguate between just two?
– James A. Rosen, Jan 26 '10 at 22:42

You need a minimum of four for some arcane reason. That said, we just have two internal servers which are synced to half a dozen external servers (and their local clocks as backups). Works well enough for us.
– James, Jan 26 '10 at 23:29

James A Rosen - That's the joy of the peer-group configuration. So long as at least one member of the peer group has an outside connection and is in-sync, the entire peer-group is in-sync. Clients may degrade a stratum, but at least they don't go out-of-sync. Have three in the peer group? No problem.
– sysadmin1138♦, Jan 29 '10 at 16:58

Running NTP in a virtualised environment, you'll be lucky to achieve 20 ms accuracy (that's what we've managed using VMware). Virtualised clock skew is bad, particularly in an environment with resource contention.

It depends on how accurate you need to be. If you only care about accuracy to the second (e.g. for web servers), you'll likely be fine, as long as you don't have resource contention. If you want millisecond accuracy (for a busy database, a log server, or a research project, say), then forget virtualised time servers.

NTP servers should always be on physical hosts. You should have at least three of them peering in a pool (so that one rogue server gets voted down by the others), and, if possible, have them get their time from GPS or another local stratum-0 reference source rather than over the Internet.
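The "voted down" part, and the earlier comment question about how a client could disambiguate between just two servers, come down to interval intersection. Here is a simplified sketch of Marzullo's algorithm, the idea NTP's intersection algorithm is derived from (ntpd's real selection adds further clustering and combining steps, so treat this as an illustration only). Each source reports an offset and an error bound; the code finds the span that the most sources agree on. The source names and numbers are hypothetical.

```python
from typing import List, Tuple


def best_agreement(sources: List[Tuple[str, float, float]]) -> Tuple[int, float, float, List[str]]:
    """Simplified Marzullo intersection. Each source is (name, offset_ms, error_ms).
    Returns (count, low, high, agreeing names) for the span most sources overlap."""
    edges = []
    for name, offset, err in sources:
        edges.append((offset - err, -1, name))  # interval opens
        edges.append((offset + err, +1, name))  # interval closes
    # Sort by position; at equal positions opens (-1) sort before closes (+1)
    edges.sort()
    best, count = 0, 0
    best_low = best_high = 0.0
    for i, (pos, kind, _) in enumerate(edges):
        count -= kind  # open -> +1, close -> -1
        if count > best:
            # Only an open can raise the count, so a later edge always exists
            best, best_low, best_high = count, pos, edges[i + 1][0]
    agreeing = [n for n, off, err in sources if off - err <= best_low and best_high <= off + err]
    return best, best_low, best_high, agreeing


three = [("A", 2.0, 5.0), ("B", 4.0, 5.0), ("C", 250.0, 5.0)]
two = [("A", 2.0, 5.0), ("C", 250.0, 5.0)]

for label, srcs in (("three servers", three), ("two servers", two)):
    count, low, high, names = best_agreement(srcs)
    verdict = "majority" if count > len(srcs) / 2 else "no majority"
    print(label, count, names, verdict)
```

With three servers, A and B overlap and C is outvoted as the rogue. With only two that disagree, each interval has exactly one supporter, so there's no majority and no principled way to pick the right one (the function just returns whichever it found first).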

Unfortunately, NTP and virtualisation do not go very well together. Clients are OK in most cases; however, an NTP server (especially at stratum 2 and above) generally won't work reliably on a virtual server.

I'm commenting from a Xen and Xen Enterprise perspective, but I believe VMware and KVM will be much the same.

Regarding different servers: yes, you are right. Ideally they should also be in different environments, so that temperature and humidity don't affect the accuracy either, though I don't bother with that myself. Also, don't forget that whatever you do, it still won't be as accurate as a proper atomic clock, so just accept this (very slight) deviation.