Top 10 Networking Features in Windows Server 2019: #10 Accurate Network Time

This blog is part of a series for the Top 10 Networking Features in Windows Server 2019!
Look for the Try it out sections then give us some feedback in the comments!
Don't forget to tune in next week for the next feature in our Top 10 list!

Windows Server 2019 provides regulatory compliance with highly accurate time that is traceable and UTC-compliant, including support for leap seconds. In this article, we’ll talk about the technical advances we made between Windows Server 2016 and Windows Server 2019, including true UTC-compliant leap second support, a new time protocol called Precision Time Protocol, and end-to-end traceability. But before we talk about the technical details, let’s talk about why this matters to you.

In the past, the requirement for time accuracy on Windows was limited to domain-based scenarios that required all devices to be synchronized within 5 minutes. Now worldwide government regulations (for example, US: FINRA, EU: ESMA/MiFID II) are demanding much higher time accuracy – as stringent as 100 µs (microseconds). Self-proclaimed accuracy is not enough; you must also be able to prove, or “trace,” your time back to an authoritative time source – more on this later. ESMA justifies the accuracy and traceability requirements in this way: “…It is also essential for conducting cross-venue monitoring of orders and detecting instances of market abuse and allows for a clearer comparison between the transaction and the market conditions prevailing at the time of their execution.”

As a result, we first brought 1 ms (millisecond) time accuracy to Windows Server 2016, meeting some of the regulatory requirements – this is supported in-market today. However, our work was not done, so Windows Server 2019 makes further improvements to comply with these regulations and make Windows the preferred choice for workloads with time dependencies. Now, let’s talk a little bit about the features you’ll find in Windows Server 2019 and current Insider builds.

Important! While many of our efforts directly address concerns from regulated industries,
this technology applies to any industry, application, or cloud-service with a time dependency.

There’s a lot of content in this article (because we did a lot!) – here’s a quick summary of what you’ll see:

Leap Second Support

Accuracy Improvements (Precision Time Protocol, Software Timestamping, and Clock Source Stability)

Traceability (including system logging, performance counters, and our work with partners)

Leap Second Support

A leap second is an occasional 1-second adjustment to UTC. Now you may be thinking, “why on earth would anybody need to adjust UTC?” As the earth’s rotation slows, UTC (an atomic timescale) diverges from mean solar time, or astronomical time. Before that divergence exceeds 0.9 seconds, a leap second is inserted to keep UTC in sync with mean solar time. Since the practice of inserting leap seconds began in 1972, a leap second has typically occurred about every 18 months (for more information, please see the Leap Second FAQ).

In the US, the maximum end-to-end divergence from UTC(NIST) is 50ms – It’s even more strict in the EU. This requires that Windows Server 2019 be able to maintain accuracy during a Leap Second.

Note: It’s not enough to apply leap seconds; it matters how you apply them.
Leap second smearing has been condemned by the time authorities at NIST and other national labs
around the world. As such, Microsoft will not include a smearing option in Windows Server 2019.
Keep reading to understand the difference between the Microsoft approach
and the non-compliant practice of leap second smearing.

To most, this seems like such a simple idea – just add 1 more tiny, little, insignificant second to the day. As IT Pros, we remember all those Y2K shenanigans that had us (rightfully) a little…well…worried…

So how does a leap second actually work? Normally, computers keep seconds from 0 through 59 for a total of 60 seconds. When a leap second occurs, an extra second is added to the last minute of the UTC day and the clock goes from 0 through 60 for a total of 61 seconds.

On the clock it looks like this (in my time zone, the last minute of the UTC day is actually 4:59 PM local time):

Without a Leap Second    With a Leap Second
16:59:58                 16:59:58
16:59:59                 16:59:59
17:00:00                 16:59:60
17:00:01                 17:00:00
17:00:02                 17:00:01
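
If it helps to see the mechanics, here’s a minimal Python sketch (purely illustrative – this is not how Windows implements it) that labels the same five consecutive seconds with and without a positive leap second, reproducing the table above:

```python
# Minimal sketch (illustrative only - not how Windows implements this): label the
# same five consecutive seconds with and without a positive leap second.

def label(seconds_after_1659: int, leap: bool) -> str:
    """Label a second counted from 16:59:00, where that minute normally holds
    60 seconds but holds 61 when a positive leap second is inserted."""
    minute_length = 61 if leap else 60
    if seconds_after_1659 < minute_length:
        return f"16:59:{seconds_after_1659:02d}"
    return f"17:00:{seconds_after_1659 - minute_length:02d}"

print("Without a Leap Second    With a Leap Second")
for s in range(58, 63):  # five consecutive ticks spanning the minute boundary
    print(f"{label(s, leap=False)}                 {label(s, leap=True)}")
```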

Important: Some of the "gurus" out there (I’m looking at you Neil deGrasse Tyson) might
rightfully say “technically, there can be both positive or negative leap seconds. A positive
leap second adds one second and a negative leap second removes one second from the day.”
Rest assured Neil, while a negative leap second has never actually occurred, if it does, you
can still celebrate your leap seconds with very tiny bottles of champagne – We’ll support both 😊

As always, I celebrate my leap seconds with very tiny bottles of champagne.

The problem with Leap Second smearing

As noted above, we will not include a leap second smearing option. Leap second smearing (where you carve the extra second up into smaller units and spread them throughout the day) has “an error of order ±0.5 s with respect to the definition of UTC” (see the excerpt below). That alone fails the accuracy requirements in these regulated industries, and, as the excerpt also notes, there is no standard method for applying the smearing frequency adjustment, so different implementations can disagree in their time stamps. As such, smearing does not meet customer regulatory requirements.

Some corporations, in an attempt to minimize the impact on their systems and eliminate the discontinuity, have implemented “smears”, that slow down their clocks for a period around the time of the leap second insertion.

This method has the advantage that the time stamps are monotonically increasing even in the vicinity of the leap second, but it has an error of order ±0.5 s with respect to the definition of UTC.

In addition, there is no standard method for applying this frequency adjustment, so that different implementations may disagree among themselves in addition to the time error with respect to UTC.
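
To make the quoted “±0.5 s” concrete, here’s a rough back-of-the-envelope sketch. It assumes a hypothetical linear smear spread over the 24 hours centered on the leap second; real smear schedules vary by implementation, which is itself part of the problem described in the excerpt above:

```python
# Back-of-the-envelope look at why a smear diverges from UTC by up to ~0.5 s.
# Assumes a hypothetical linear smear spread over the 24 hours centered on the
# leap second; actual smear schedules vary by implementation.

SMEAR_WINDOW = 24 * 3600    # seconds of real time over which the smear runs

def divergence(t: float) -> float:
    """Signed divergence between a linearly smeared clock and true UTC,
    t seconds after the smear window starts (leap second at the midpoint)."""
    smeared_adjustment = t / SMEAR_WINDOW                        # applied gradually
    stepped_adjustment = 1.0 if t >= SMEAR_WINDOW / 2 else 0.0   # applied all at once by UTC
    return smeared_adjustment - stepped_adjustment

worst = max(abs(divergence(t)) for t in range(0, SMEAR_WINDOW + 1, 60))
print(f"worst-case divergence from UTC: {worst:.3f} s")   # ~0.5 s
```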

I’m sure there will be many implementation and application compatibility questions stemming from this article; please stay tuned for more detailed information. In the meantime, please note that for regular day-to-day operations, you won’t need to change anything. Check the “Leap Seconds for the Dev” validation guide for examples and stay tuned for further guidance.

Accuracy Improvements

We’re also improving the inherent accuracy of the platform. First, why is it so hard to get the time right!? While the answer may not be immediately apparent, there are a lot of pieces working against time-sensitive systems: variable (and often asymmetric) network latency, delays introduced as timing packets traverse the local networking stack, and drift in the local clock between synchronizations.

Here’s some of the work we did to address each of these challenges:

Precision Time Protocol:

In Windows Server 2019, Windows will include a new time synchronization protocol called Precision Time Protocol (PTP). You may be asking yourself what’s wrong with NTP? It’s served us well for so many years!

Think back to the last thunderstorm you saw – Did you see lightning and hear thunder at the same time? Unless you’re very close to the storm, you’ll likely detect an audible delay after you’ve seen the lightning. How much of an audible delay are you experiencing? The delay is not based strictly on the speed of sound and your distance from the storm. It’s also affected by buildings or other influences that introduce additional acoustic delay. If you want to know just how close you are to the storm, you’d have to consider all the influences.

Likewise, there is delay (latency) introduced in the timing packets being passed from the time server across the network. If that delay is not accounted for, or if it is not symmetric (equal in both directions – to and from the client), then it becomes increasingly difficult for the client to properly apply the time stamp sent from the time server.

Network Time Protocol (NTP) has long been the primary time synchronization method for Windows but unfortunately, NTP does not have a solution to this problem; NTP assumes that the round-trip delay introduced by the network is symmetric.
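
To see why that assumption matters, here’s the textbook NTP on-wire calculation with made-up numbers. When the outbound and return paths are not equal, the computed offset is wrong by roughly half the asymmetry – and the client has no way to detect it:

```python
# Textbook NTP on-wire calculation (per RFC 5905), with made-up numbers, showing
# how path asymmetry biases the computed offset.

def ntp_offset_and_delay(t1, t2, t3, t4):
    """t1: client send, t2: server receive, t3: server send, t4: client receive."""
    offset = ((t2 - t1) + (t3 - t4)) / 2   # assumes outbound and return delays are equal
    delay = (t4 - t1) - (t3 - t2)          # total round-trip network delay
    return offset, delay

# A client whose clock is exactly right, but whose outbound path takes 9 ms and
# whose return path takes 1 ms (10 ms round trip, heavily asymmetric).
t1 = 0.000
t2 = t1 + 0.009      # server receives after 9 ms outbound delay
t3 = t2 + 0.0001     # server turns the request around quickly
t4 = t3 + 0.001      # reply arrives after 1 ms return delay

offset, delay = ntp_offset_and_delay(t1, t2, t3, t4)
print(f"computed offset: {offset * 1000:.1f} ms (the true offset is 0)")  # ~4 ms of error
print(f"round-trip delay: {delay * 1000:.1f} ms")
```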

Enter Precision Time Protocol (IEEE 1588v2). PTP enables network devices to add the latency introduced by each network device into the timing measurements thereby providing a far more accurate time sample to the endpoint (Windows Server 2019 or Windows 10, host or virtual machine).
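
Conceptually, one of the mechanisms PTP defines for this – transparent clocks – has each network device measure how long the timing packet sat inside it (its “residence time”) and accumulate that into a correction field. A rough sketch with illustrative numbers:

```python
# Conceptual sketch of the transparent-clock idea in IEEE 1588v2 (numbers are
# illustrative): each device on the path measures how long the timing packet sat
# inside it ("residence time") and accumulates that into a correction field, so
# the endpoint can subtract device-induced delay instead of guessing.

wire_delay = 50e-6                   # 50 µs of true propagation delay (unknown to the client)
residence_times = [120e-6, 30e-6]    # per-switch queuing/processing delay, measured per device

t_sent = 10.000000
t_received = t_sent + wire_delay + sum(residence_times)
correction = sum(residence_times)    # accumulated along the path

print(f"apparent one-way delay without correction: {(t_received - t_sent) * 1e6:.0f} µs")
print(f"after subtracting residence times:         {(t_received - t_sent - correction) * 1e6:.0f} µs")
```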

Precision Time Protocol is not for everyone; due to the network configuration requirements, NTP will continue as the default protocol. However, for customers with the highest of accuracy requirements, you can drive towards even higher accuracy systems using our inbox PTP Client in Windows Server 2019.

Ready to give it a shot!? Download the latest Insider build and Try it out!

Software Timestamping:

When a timing packet is received over the network from a time server, it must be processed by the OS’s networking stack prior to being consumed by the time service. Each component in the networking stack introduces a variable amount of latency that affects the accuracy of the timing measurement. This may sound insignificant, but it can add 30 µs of delay – and in extreme scenarios closer to 200 µs. You may remember from earlier in this article that some systems are targeting sub-100 µs accuracy!

In addition, there may be many other services on the system all looking for data from the network. As a simple example, imagine a SQL Server with remote databases, or file servers with SAN/NAS storage that also require time accuracy. Packets for these workloads would all compete with the Windows Time service packets attempting to traverse the networking stack introducing additional delay.

To address this problem, we timestamp packets both before and after they traverse the Windows networking components. Now we can improve time accuracy by accounting for software delays!
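
As a rough illustration of why the timestamp location matters (this is a sketch, not the W32Time implementation): if the receive timestamp is taken only when the time service finally consumes the packet, everything the packet spent in the networking stack gets misattributed to the network:

```python
# Sketch of why the timestamp location matters (illustrative, not the W32Time
# implementation). If the receive timestamp (t4) is taken only when the time
# service consumes the packet, time spent in the networking stack is
# misattributed to the network and skews the computed offset.

def ntp_offset(t1, t2, t3, t4):
    return ((t2 - t1) + (t3 - t4)) / 2

t1, t2, t3 = 0.000000, 0.000500, 0.000510   # made-up client send / server receive / server send
t4_at_nic = 0.001010                        # receive timestamp taken low in the stack
stack_delay = 0.000180                      # ~180 µs spent traversing the networking stack
t4_at_service = t4_at_nic + stack_delay     # receive timestamp taken at the time service

print(f"offset using stack-level timestamp:   {ntp_offset(t1, t2, t3, t4_at_nic) * 1e6:.0f} µs")
print(f"offset using service-level timestamp: {ntp_offset(t1, t2, t3, t4_at_service) * 1e6:.0f} µs")
# The difference (stack_delay / 2, about 90 µs here) is exactly the error that
# timestamping at the stack boundary removes.
```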

Ready to give it a shot!? Download the latest Insider build and Try it out!

Clock Source Stability

Our final accuracy-based improvement actually affects the stability of the clock. It’s not enough to have an accurate clock occasionally; you must maintain that accuracy over long periods of time. It’s important to understand that a host system receives time “samples” from its time server, however it does not immediately apply these samples to the clock.

You can imagine that if a time sample is subject to variable network delay (among other unpredictable network challenges) and we immediately stepped the clock to match every time sample, the clock would likely be incorrect fairly often – it could even move backwards – a problem that would certainly make for a rainy day in the life of an IT Pro…

Instead we take multiple time samples, eliminate the outliers, and discipline the clock with the goal of bringing the system closer and closer to synchronization with the time server.

Disciplining the clock entails making adjustments to gradually converge on the correct time. Ultimately there is a natural limit to how small of a change we can make but the key is that smaller is better. Just how granular can we get? This is a complicated question but is based on the frequency of the QPC clock.
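
Here’s a conceptual sketch of that discipline loop – not the actual W32Time algorithm, just the shape of the idea: gather a batch of offset samples, discard outliers, and apply only a small, bounded correction each interval:

```python
# Conceptual sketch of a clock-discipline loop (not the actual W32Time algorithm):
# gather several offset samples, discard outliers, and apply only a small bounded
# correction each interval instead of stepping the clock to every noisy sample.

from statistics import median

MAX_SLEW_PER_INTERVAL = 500e-6   # illustrative cap: adjust by at most 500 µs per interval

def discipline_step(offset_samples, clock_error):
    """Return the new clock error after one disciplining interval."""
    m = median(offset_samples)
    kept = [s for s in offset_samples if abs(s - m) < 1e-3]   # drop wild outliers
    target_correction = sum(kept) / len(kept)
    # Apply only a bounded portion of the correction so the clock converges
    # smoothly rather than jumping with every sample.
    applied = max(-MAX_SLEW_PER_INTERVAL, min(MAX_SLEW_PER_INTERVAL, target_correction))
    return clock_error - applied

clock_error = 2e-3   # start 2 ms fast (illustrative)
for interval in range(6):
    # Each sample is the true error plus measurement noise; one sample is a wild outlier.
    samples = [clock_error + noise for noise in (50e-6, -80e-6, 4e-3, 10e-6)]
    clock_error = discipline_step(samples, clock_error)
    print(f"after interval {interval}: clock error ~ {clock_error * 1e6:.0f} µs")
```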

For a more in-depth look at this subject including QPC, please reference this article.

Previous versions of Windows allowed for a QPC granularity (the smallest change we could make to the system clock) of 6.4 µs/second (microseconds / second). In Windows Server 2019, the QPC granularity drops to 100 nanoseconds / second! This is akin to the difference in clarity between 480p and 4K television. There is much finer granularity in the 4K picture!
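
Some quick arithmetic on those numbers (assuming the per-second correction must be a multiple of the granularity, so the leftover quantization error is at most half a step):

```python
# Quick arithmetic on those granularity numbers. If each per-second correction
# must be a multiple of the granularity, the leftover (quantization) error is at
# most half a step, so a finer step directly bounds how precisely the clock rate
# can be trimmed.

old_step = 6.4e-6    # smallest per-second adjustment before (6.4 µs/s)
new_step = 100e-9    # smallest per-second adjustment in Windows Server 2019 (100 ns/s)

print(f"improvement factor: {old_step / new_step:.0f}x")
print(f"worst-case leftover rate error, old: {old_step / 2 * 1e6:.1f} µs per second")
print(f"worst-case leftover rate error, new: {new_step / 2 * 1e9:.0f} ns per second")
```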

So why does all this matter? Well, accuracy as measured over time reflects your stability; not only can we hit the bulls-eye, we can hit the bulls-eye over and over again. In a 3.5-day measurement, our partners at Sync-N-Scale measured, and NIST corroborated, Windows Server 2019 pre-release bits. In the picture below, notice the MIN Time Offset reports 41 µs (microseconds) RMS divergence from UTC(NIST)!

Note: The AVG method involves comparing the system under test to UTC(NIST) every 10 seconds, then averaging these measurements for 10 minutes (60 readings). UTC(NIST) is available with 0.0001 ms resolution. The difference between the two 10-minute averages is the difference between the time broadcast by the server and UTC(NIST).

The MIN method involves comparing each NTP server to UTC(NIST) every 10 seconds for a 10 minute interval (60 measurements). However, only one of the 60 measurements is saved, the one with the shortest round trip delay. This method is based on the assumption that NTP measurements with the shortest round trip delays provide the best estimate of the true time difference.
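
Both methods are easy to express in code. Here’s a small sketch over fabricated (offset, round-trip delay) samples – 60 readings at 10-second intervals:

```python
# Sketch of the two measurement methods described above, applied to a batch of
# fabricated (offset_from_utc_nist, round_trip_delay) samples - 60 readings taken
# every 10 seconds over a 10-minute interval.

import random
random.seed(1)

samples = [(40e-6 + random.gauss(0, 15e-6),            # offset from UTC(NIST), seconds
            2e-3 + abs(random.gauss(0, 1e-3)))          # round-trip delay, seconds
           for _ in range(60)]

avg_method = sum(offset for offset, _ in samples) / len(samples)   # AVG: mean of all 60 offsets
min_method = min(samples, key=lambda s: s[1])[0]                   # MIN: offset of the single sample
                                                                   # with the shortest round trip

print(f"AVG method estimate: {avg_method * 1e6:.1f} µs")
print(f"MIN method estimate: {min_method * 1e6:.1f} µs")
```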

This leads me to our last topic, Traceability.

Traceability

Self-proclaimed accuracy is not enough – you must be able to prove, or trace, your accuracy to a known reference time source. In the US, this would be UTC(NIST). Traceability is a multi-faceted aspect of the regulations. FINRA for example, states:

Members must document and maintain their clock synchronization procedures. Among other requirements, members must keep a log of the times when they synchronize their clocks and the results of the synchronization process.

System Logging

The first step in meeting these requirements is auditing changes to and synchronization of the local system. To do this, Windows Server 2019 will include additional logging capabilities that can be used to audit the actions taken by the Windows Time service. We’ve documented the full list of events here. These logs can be used to answer questions such as:

What is the chosen time server and synchronization frequency?

When was the last synchronization, and what were the results of that synchronization?

What actions were taken after the synchronization (did we discipline the clock)?

These logs are contained in a standard event log channel called Time-Service (more details in the link provided) and can be queried and forwarded by your SIEM of choice.
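
As a hedged sketch of how you might pull these events for SIEM ingestion, here’s a call to the in-box wevtutil tool from Python. Note that the full channel name used below is an assumption (the post refers to it simply as Time-Service) – confirm the exact channel and event IDs against the event documentation linked above:

```python
# Hedged sketch: pulling recent Windows Time service events for audit or SIEM
# ingestion using the in-box wevtutil CLI. The full channel name below is an
# assumption - confirm it against the event documentation linked above.
# Run this on the Windows Server 2019 machine itself.

import subprocess

CHANNEL = "Microsoft-Windows-Time-Service/Operational"   # assumed channel name

result = subprocess.run(
    ["wevtutil", "qe", CHANNEL, "/c:10", "/rd:true", "/f:text"],  # 10 newest events, plain text
    capture_output=True, text=True, check=True)
print(result.stdout)
```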

Performance Counters

We also have performance counters that allow you to observe and troubleshoot a number of critical time-related areas. In the picture below, you can see two of the included counters, the Computed Time Offset (in microseconds) and the NTP Roundtrip Delay (also in microseconds).

The Computed Time Offset is the absolute time offset between the system clock and the chosen time source, as computed by the W32Time service – this number should be as small as possible, as it indicates how closely your clock is synchronized with the reference clock. The NTP Roundtrip Delay is the time elapsed on the NTP client between transmitting a request to the NTP server and receiving a valid response from the server – the higher this number, the harder it will be to maintain an accurate clock. There are other counters as well, and we encourage you to explore and provide some feedback!
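
If you want to script the collection, here’s a hedged sketch using the in-box typeperf tool. The counter-set name used below (“Windows Time Service”) is an assumption – verify the exact paths with typeperf -q on your system:

```python
# Hedged sketch: sampling the two counters mentioned above with the in-box
# typeperf CLI. The counter-set name ("Windows Time Service") is an assumption -
# verify the exact paths with `typeperf -q` before relying on this.

import subprocess

counters = [
    r"\Windows Time Service\Computed Time Offset",   # assumed path; offset from the time source
    r"\Windows Time Service\NTP Roundtrip Delay",    # assumed path; client round-trip delay
]

# Take 6 samples, one every 10 seconds, and print the CSV output typeperf emits.
result = subprocess.run(
    ["typeperf", *counters, "-si", "10", "-sc", "6"],
    capture_output=True, text=True, check=True)
print(result.stdout)
```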

SCOM Management Pack

If your monitoring system includes SCOM, you could also leverage a SCOM management pack that allows you to monitor and alert when a specified NTP Offset threshold is exceeded for a particular node.

Ready to give it a shot!? Download the latest Insider build and Try it out!

Completing the Unbroken Chain

Dr. Judah Levine of NIST defines traceability as requiring an unbroken chain of measurements. While Windows can provide information about its local system, traceability requires timing information from the entire chain of time sources as well – this is more than Windows alone can provide. Windows Server 2019 can participate in a fully traceable environment through our partners like Sync-N-Scale and Spectracom. Shown here is the partner solution from Spectracom:

Summary

Previous time accuracy requirements were lax by today’s standards. Now regulated industries have much more stringent accuracy requirements but accuracy alone is not enough – Your systems must also be traceable.

Windows Server 2019 meets the current accuracy and regulatory requirements required for time-sensitive workloads through a variety of improvements including compliant and accurate time during a leap second, a new time synchronization method in Precision Time Protocol, inherent platform improvements for stability, and lastly (but equally important), system-wide and end-to-end traceability. You can use Windows Server 2019 for time-sensitive workloads, whether you’re in a regulated industry, application, or cloud service.

I’m sure there will be additional questions about some of these features as we near the Windows Server 2019 launch at Ignite; please stay tuned as we’ll update our public documentation and provide additional blogs on this site as necessary. Please give our validation guides (shown in the Try it Out links above!) a shot! And most importantly, let us know what you think in the comments!

Join the conversation

I just read the articles on time accuracy improvements in WS 2016 a couple of months ago and was very impressed; this is even cooler. Will read up on the new time protocol and looking forward to the rest of the series.

Hi Simon – VMs are included. In fact, the 41us (microsecond) accuracy screenshot as measured by Sync-N-Scale, and NIST corroborated through the NISTDC is in fact a WS2019 virtual machine sitting on top of a WS2019 Hyper-V host.

I read the dev guide – good to see I was on the right track with my thinking, that applications would have to opt-in.

Curious to know about
a) SQL Server, Exchange, etc – will they support this
b) Downstream apps for the two above – will database drivers be getting updates, need some sort of connection-string “opt in”? Will Outlook opt-in to get this extra second from Exchange
c) The .NET issue mentioned in the dev docs was interesting. Good to see it’ll be addressed, but it made me think of UNIX-style epoch timings. If I have “number of seconds since 1970” in a field, I’d now need to be aware of whether leap seconds are included or ignored in my conversion to a regular date+time value.
I think it’s for the reasons above that smearing is such an attractive option for regular people.
I appreciate why it’s not an option for you at the OS level since you’re aiming to support high-frequency trading and other scenarios. However, for 99% of IT pros and developers, all of the considerations above will mean we’ll just ignore leap seconds altogether since it’s much less ambiguous and hasn’t caused any grief so far (unlike 2->4 digit years for example)

Thanks for the great post with details, etc and the bit of humour thrown in here and there (I’ll try to run really fast East to bring on that leap second!)

Hi Ian – Thanks for reading and you have some great questions. Suffice to say that we’ll follow-up with a detailed blog (more heavily developer focused as well).

Yes, many IT Pros may not need to worry about the leap second, and for them there will be no changes required. However, I’m curious why leap second smearing would matter to those IT Pros at that point. If they could accept smearing as an option, which has a ±0.5 second error, they may be just as well off ignoring the leap second and having a 1-second adjustment come down lazily as previous Windows versions did. Curious to hear your thoughts.

Net-net, if you’re required to support leap seconds (you’d be required not to smear), there’s a path forward for Windows 10 and Windows Server 2019 systems 🙂

Interesting article – thanks. Applause for doing a proper job with leap seconds! Question: seeing times like ‘4:59:60’ makes sense (if you know about leap seconds), but what about all the software out there that will fail when processing this (de facto “invalid”) time? Does even .tryparse() understand this?

Hi Chris – In the picture you mentioned, the process is opted-in to the leap second. For compatibility reasons, we intentionally do not automatically opt all applications into the leap second, but make it available if desired by the application/process.

Please check out our “Try it Out” guides for the Dev and IT Pros as they will provide some of the details (including methods for testing etc.). Suffice to say, that if you don’t need to know about the leap seconds, nothing will change for you. If you want an application to be leap second aware (for example, you’re in a regulated industry) there’s a path forward for Windows 10 and Windows Server 2019 systems.

Thanks. We at Meinberg are providing time and frequency synchronization products, and it would also be interesting to know how a driver e.g. for a GPS PCI card can forward a leap second announcement to the Windows kernel.

We provide GPS PCI cards, which get the announcement directly from GPS, and there are Time Code Receiver cards which may get the information from an incoming IRIG input signal *if* the used time code format supports this, and there are long wave receivers which get it from the German DCF77 transmitter.

The driver software can get the leap second announcement from the PCI card, and can provide it to the OS kernel, if there’s a known API.

I built a firewall, threw pfSense on it, enabled NTP, and according to the displayed stats, it’s within 1ms of several stratum 1 public NTP servers around the Internet, and every 600 seconds adjusts itself by 30us-200us depending on load. I assume higher load adds more noise from both thread scheduling and heat affecting the RTC. A stable Internet connection is very important for reliable time. Or your own Stratum 0. Even 100ms+ RTT servers in Europe can have less than 1ms of maximum observed offset for months at a time.

Hi Ben – Yes, you’re correct. There are a lot of factors that affect time accuracy, and it’s important to distinguish high-accuracy time that is maintained vs. observationally seen. For example, we maintained 41 µs accuracy over the course of 4 days (shown above). That requires certain conditions to be met – see our current documentation on the subject here (note this has not yet been updated to reflect Server 2019): https://docs.microsoft.com/en-us/windows-server/networking/windows-time-service/support-boundary

If you require high-accuracy time as outlined by some of the regulatory requirements mentioned in this article, you should not rely on time over the internet. You should have a time source in your datacenter that chains to a known reference clock (e.g. GPS).

“A stable Internet connection” is hard to achieve and maintain regardless of how one defines “stable”.

For demanding apps and workloads, Sync-n-Scale disciplines the system clock continuously to deliver load-insensitive no-drift stratum-one UTC-accuracy directly to the general-purpose computing Windows platform and its Hyper-V VM instances without costly and complex networking dependency. Multiple Sync-n-Scale enabled Azure Stack deployments and all of their VM instances would be UTC accurate in the low double-digit microsecond precision range and naturally clock synchronized, even off-grid.

‘“A stable Internet connection” is hard to achieve and maintain regardless of how one defines “stable”.’

I use HFSC shaping in combination with Codel AQM. Even when running WAN saturating thousands of connections download and upload, I can maintain less than 0.01ms of jitter. This is at home on my $40 fiber connection.

Even under DDOS, my ping never goes above 40ms, but my loss goes sky high. 1Gb DDOS against my 150Mb connection resulted in 85% loss, but an average 15ms ping. That’s because my ISP rocks and also uses an AQM. During a 1 hour long WAN saturation test, I showed less than 0.001ms difference in avg and std-dev between loaded and unloaded latency to my ISP’s speedtest server.

I did a 30 day ping test to AWS Germany, about 120ms RTT. Two samples per second, observed min and avg ping where within less than 0.01ms of each other, max ping was less than 20ms over avg, std-dev was less than 1ms, and total packets lost were fewer than 100.

An 8 hour long download saturation test shaped down to 99.8% of my provisioned rate, uncapped Bittorrent downloads of every Linux ISO I could find. All heavily seeded. Less than a 0.1Mb/s difference between min and max link utilization, ping to my ISP never went over 0.14ms, 0 ping packets lost. ICMP is not prioritized.

Even when my ISP was under a DDOS, it was relatively minor. My ping to Chicago was about 20ms about 2-3% loss, which is much higher than the normal 6ms. Called the ISP, because they guarantee dedicated bandwidth and +14ms and 3% loss is ridiculous. Turned out they were getting hit with a several hundred gigabit DDOS and their link was at 100%. 15 minutes later, problem abated, back to 6ms and 0% loss. Got a call back from their engineering dept to let me know they increased their trunk bandwidth to the point that the DDOS was being absorbed.

To me, my connection feels quite “stable”. Being a complete greenhorn to networking, it took me a good week to learn all the networking to set this up. The biggest hurdle was all of the professional misunderstandings out there. I had to dig into some theory and algorithms to understand how these “black boxes” work, so I could properly configure them. Surprisingly easy, but no guides. And a bit of packet sniffing to see what was going on.

After such an accurate and precise article (puns intended), there, almost at the end, what do I find but a tiny grammatical error, i.e., “about it’s local system” should read “about its local system”. Hopefully we won’t see such error creeping into the new Precision Time Protocol (PTP).