Note: Windows is a registered trademark of Microsoft Corporation in the United States and other countries. The Windows Timestamp Project is an independent publication and is not affiliated with, nor has it been authorized, sponsored, or otherwise approved by Microsoft Corporation.

A substantial amount of time and effort has been spent on the attempt to get a proper high resolution time service implemented for Windows.
However, the performance of these implementations is still not satisfactory. The complexity arises from the variety of Windows versions
running on an even greater variety of hardware platforms.

Proper implementation of an accurate time service for Windows will be discussed and diagnosed within the Windows timestamp project.
Test code will be released to prove functionality on a broader range of hardware platforms. Besides the timestamp functionality,
high resolution (microsecond) timer functions are also discussed.

2. Resources

Time resources on Windows are mostly interrupt controlled entities. Therefore, they show a certain granularity. Typical interrupt periods are 10 ms to 20 ms.
The interrupt period can also be set to 1 ms or even slightly below 1 ms by using API calls to NtSetTimerResolution or
timeBeginPeriod. However, for several reasons it cannot and should not be set to anything near the 1 μs regime.
The best resolution to observe by means of Windows time services is therefore in the 1 ms regime.

The best resource for retrieving the system time is the GetSystemTimeAsFileTime
API. It is a fast access API that is able to hold sufficiently accurate (100 ns units) values in its arguments. The alternative API is GetSystemTime,
which is 20 times slower, has double the structure size, and does not provide a well-suited data format.

An interrupt-independent system resource, the performance counter, is used to extend the accuracy into the microsecond regime.
The performance counter API provides the asynchronous calls QueryPerformanceCounter and QueryPerformanceFrequency.
A virtual counter delivers a performance counter value, which increases at the performance counter frequency. The frequency is typically a few MHz
and can therefore open the microsecond regime. The counter parameters are typically backed by a physical counter, but they are not necessarily independent of the
version of the operating system. A hardware platform can deliver different performance frequencies when running Windows 7 or Windows Vista, for example.

The Sleep() API and the WaitableTimer API are further timing resources in the context of this project. Their functionality and behavior also need to be examined.

2.1. GetSystemTimeAsFileTime API

The GetSystemTimeAsFileTime API provides access to the system time in file time format. It is declared as

VOID GetSystemTimeAsFileTime(LPFILETIME lpSystemTimeAsFileTime);

A 64-bit FILETIME structure receives the system time in 100 ns units that have elapsed since Jan 1, 1601.
After some 400 years, about 1.28×10^10 seconds or 1.28×10^17 100 ns slices have accumulated.
The 64-bit value can hold almost 2×10^19 100 ns time slices. The remaining time before this scheme wraps is about 58,000 years from now.
The call to GetSystemTimeAsFileTime typically requires 10 ns to 15 ns.
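
The range estimate above can be checked with a few lines of arithmetic. The following is only a sketch: the helper name counter_range_years and the mean year length of 365.2425 days are assumptions introduced here for illustration and are not part of any Windows API.

```c
#include <assert.h>

/* Hypothetical helper: number of years an unsigned 64-bit tick count
   can represent when it advances at ticks_per_second.
   Assumes a mean Gregorian year of 365.2425 days. */
double counter_range_years(double ticks_per_second)
{
    assert(ticks_per_second > 0.0);
    const double seconds_per_year = 365.2425 * 86400.0;
    const double max_ticks = 18446744073709551616.0; /* 2^64 */
    return max_ticks / ticks_per_second / seconds_per_year;
}
```

Called with 1e7 (one tick per 100 ns), the function returns a value between 58,000 and 59,000 years counted from 1601, which is in line with the estimate given in the text.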

In order to investigate the real accuracy of the system time provided by this API, the granularity that comes along with the time values needs to be discussed.
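In other words: How often is the system time updated? A first estimate is provided by the hidden API call:

NTSTATUS NtQueryTimerResolution(OUT PULONG MinimumResolution, OUT PULONG MaximumResolution, OUT PULONG ActualResolution);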

NtQueryTimerResolution is exported by the native Windows NT library NTDLL.DLL. The ActualResolution reported by this call represents the update period
of the system time in 100 ns units, which obviously does not necessarily match the interrupt period. The value depends on the hardware platform. Common hardware platforms
report 156,250 or 100,144 for ActualResolution; older platforms may report even larger numbers. This is one of the heartbeats controlling the system.
The MinimumResolution and the ActualResolution are relevant for the multimedia timer configuration.
Two common hardware platform configurations are discussed here to highlight the details to be dealt with:

Platform configuration A

- Min. Res.: 156,250
- Max. Res.: 10,000
- ActualRes.: 156,250

Platform configuration B

- Min. Res.: 100,144
- Max. Res.: 10,032
- ActualRes.: 100,144

Platform A simply has 64 timer interrupts per second (64 x 156,250 x 100 ns = 1 s), but when looking at platform B the difficulties become more obvious:
99.856 interrupts per second? Answer: The full second interrupt is not available on all platforms.
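
The interrupt rates quoted above follow directly from the reported ActualResolution. A minimal sketch, assuming only that ActualResolution is given in 100 ns units; the function name is hypothetical:

```c
#include <assert.h>

/* Hypothetical helper: interrupts per second for an ActualResolution
   given in 100 ns units (10,000,000 such units make one second). */
double interrupts_per_second(unsigned long actual_resolution)
{
    assert(actual_resolution > 0);
    return 10000000.0 / (double)actual_resolution;
}
```

For 156,250 this gives exactly 64, while 100,144 gives a non-integer rate just below 100.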

However, the system time may be updated at these interrupt events. An API call to

BOOL GetSystemTimeAdjustment(PDWORD lpTimeAdjustment, PDWORD lpTimeIncrement, PBOOL lpTimeAdjustmentDisabled);

will disclose the time adjustment and time increment values. The actual purpose of this call is to query the status of the system time correction, which is active when
TimeAdjustmentDisabled is FALSE. When TimeAdjustmentDisabled is TRUE, no adjustment takes place and TimeAdjustment and TimeIncrement are equal
and report exactly what was read as ActualResolution before. For a platform A type system, the call will report that the system time is incremented by 156,250
100 ns units every 156,250 100 ns units. Within this description, this is considered the granularity of the system time.

Knowing the system time granularity raises doubts about its accuracy. Certainly, the TimeIncrement will be applied, thus changes of the system
time will always be one TimeIncrement, but does the interrupt period or any multiple of it always match the time increment?

Even when the standard setting of ActualResolution corresponds to the MinimumResolution, the ActualResolution may have a setting different from
MinimumResolution (see table below). In fact it may be configured to values in the range from MinimumResolution to MaximumResolution.
The ActualResolution determines the interrupt period of the system. That is the period after which the timer generates an interrupt to let the system react.
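The ActualResolution can be set through the multimedia timer interface: timeBeginPeriod(uPeriod) requests a timer period of uPeriod milliseconds, and timeGetDevCaps() reports the supported range as wPeriodMin and wPeriodMax in a TIMECAPS structure.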

Typical values are 1 ms for wPeriodMin and 1,000,000 ms for wPeriodMax. The 1,000 s period for wPeriodMax is somewhat meaningless within the context
of this description. However, the possibility of setting the timer resolution to 1 ms requires a more detailed investigation. When the multimedia timer interface is used to set
the multimedia timer to wPeriodMin, the ActualResolution received by a call to NtQueryTimerResolution will show a new value.
For the two platform configurations discussed, the examples are as follows:

Platform configuration   A         B
MinimumRes.              156,250   100,144
MaximumRes.              10,000    10,032
ActualRes.               156,250   100,144

ActualResolution varies according to the varying multimedia timer periods uPeriod applied by the timeBeginPeriod() API:

Platform configuration   A         B         uPeriod
ActualRes.               9,766     10,032    1 ms
ActualRes.               19,532    20,064    2 ms
ActualRes.               19,532    30,096    3 ms
ActualRes.               39,063    39,952    4 ms
ActualRes.               39,063    49,984    5 ms
ActualRes.               39,063    60,016    6 ms
ActualRes.               39,063    70,048    7 ms
ActualRes.               156,250   80,080    8 ms
ActualRes.               156,250   89,936    9 ms
ActualRes.               156,250   100,144   10 ms
ActualRes.               156,250   100,144   11 ms
ActualRes.               156,250   100,144   12 ms
…                        …         …         …
ActualRes.               156,250   100,144   100 ms

This list shows the supported interrupt periods for platforms of type A and B in 100 ns units. Platform A only supports four different
interrupt heartbeat frequencies, while platform B has a better approximation to the desired period. The specific numbers are relevant
for the procedures described here and thus need a detailed interpretation.

Note: TimeIncrement provided by GetSystemTimeAdjustment and ActualResolution provided by NtQueryTimerResolution are not
necessarily identical.
Platform A operates with an ACPI PM timer and platform B operates with a PIT timer. More modern platforms do not show "unsupported" values of uPeriod.

2.1.1. ActualResolution on Platform Type A

The timer intervals are given with 100 ns accuracy in the last digit. Since the true ActualResolution of 0.9765625 ms cannot be expressed
in these units, the call to NtQueryTimerResolution reports the rounded value of
0.9766 ms. The other values are rounded as well (the true values are 1.953125 ms and 3.90625 ms, respectively).

A quick test using the Sleep(dwMilliseconds) API confirms this assumption:

Sleep(1) = 1.9531 ms = 2 x 0.9765625 ms

Sleep(2) = 2.9295 ms = 3 x 0.9765625 ms

Sleep(3) = 3.9062 ms = 4 x 0.9765625 ms

Sleep() only returns when n x ActualResolution exceeds the desired duration. The required accuracy for the interval
specification would have to extend to 0.5 ns, in other words show the 100 ps digit. The number would be 156,250,000 for the MinimumResolution
and 9,765,625 for the MaximumResolution (in 100 ps or 10^-10 s units).
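
The Sleep() results listed above can be reproduced with integer arithmetic in 100 ps units. This is a sketch of the rule stated in the text (return at the smallest n for which n x ActualResolution exceeds the requested delay), not of the actual scheduler; the function names are hypothetical and the call is assumed to start exactly at an interrupt.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt period of platform A at maximum resolution in 100 ps units
   (0.9765625 ms), as quoted in the text. */
#define PERIOD_100PS 9765625ULL

/* Hypothetical model: number of interrupt periods spanned by a
   synchronized Sleep(ms), i.e. the smallest n with n * period > ms. */
uint64_t sleep_periods(uint64_t ms)
{
    assert(ms <= UINT64_MAX / 10000000ULL);      /* overflow guard */
    const uint64_t requested = ms * 10000000ULL; /* 1 ms = 10^7 x 100 ps */
    return requested / PERIOD_100PS + 1;
}

/* Resulting delay in milliseconds. */
double sleep_delay_ms(uint64_t ms)
{
    return (double)(sleep_periods(ms) * PERIOD_100PS) / 1e7;
}
```

sleep_periods(1), sleep_periods(2) and sleep_periods(3) evaluate to 2, 3 and 4 periods, matching the measured multiples of 0.9765625 ms.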

Note: Sleep(1) measurements (10,000, with 100 ahead) result in a mean delay of 1953.163824 μs. This is 2.0000397 times
the interrupt time slice (should have been 1953.125 μs, so the measurement was off by 0.04 μs).

2.1.2. ActualResolution on Platform Type B

An interrupt timer period of 1.0032 ms accumulates to 10.032 ms after 10 interrupts, at which point the system time is advanced by 10.0144 ms.
A time change of 10.0144 ms after 10.032 ms means that the time is behind by 17.6 μs. At the 57th of such periods, the deviation
has accumulated to 1.0032 ms, which is exactly one timer interrupt period, and the time is updated after just 9 interrupts (9.0288 ms).
This way the time is updated by 10.0144 ms 56 times after 10.032 ms and one time after 9.0288 ms, which is a total elapsed time of
570.8208 ms with an adjustment of 57*10.0144 ms = 570.8208 ms. This corresponds to a total number of interrupts of 569 (57*100,144 = 569*10,032).
As a result, the time loses 17.6 μs in each of the 56 consecutive system time updates and then gains 0.9856 ms in the 57th interrupt interval.
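
The update pattern of platform B can be reproduced with a small simulation. The sketch below assumes a simple rule that is consistent with the numbers above but is not taken from Windows itself: at every interrupt the system time is advanced by one TimeIncrement as soon as it lags the elapsed time by at least one increment. The function name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulation of platform B in 100 ns units: the timer
   interrupts every 10,032 units and the system time advances by
   100,144 units once it lags the elapsed time by at least that much.
   Returns the number of interrupts needed for `updates` time updates
   and counts how many updates came after 10 and after 9 interrupts. */
uint64_t simulate_platform_b(int updates, int *after10, int *after9)
{
    const uint64_t period = 10032, increment = 100144;
    uint64_t elapsed = 0, system_time = 0, interrupts = 0, last = 0;
    int done = 0;

    assert(updates > 0 && after10 != 0 && after9 != 0);
    *after10 = 0;
    *after9 = 0;
    while (done < updates) {
        ++interrupts;
        elapsed += period;
        if (elapsed - system_time >= increment) {
            system_time += increment;
            if (interrupts - last == 10) ++*after10;
            if (interrupts - last == 9)  ++*after9;
            last = interrupts;
            ++done;
        }
    }
    return interrupts;
}
```

Running the simulation for 57 updates yields 569 interrupts, with 56 updates after 10 interrupts each and a single update after 9 interrupts.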

2.1.3. Changes of System File Time

The system time changes according to the described mechanisms after a certain period of time. Additional time changes
occur when periodic time corrections are applied continuously to the system time over a
longer period of time to adjust to an external time reference. The occurrence and the parameters of this adjustment can be gathered by
a call to GetSystemTimeAdjustment. Sudden time changes, for example those introduced by using the clock GUI or SetSystemTime(…),
are not announced or predictable; they happen spontaneously.

Changes of the system time will have no influence on the expiration of Sleep periods or waitable timer periods. The actual change will
be taken over by the routines here. Nevertheless, system time changes are discontinuities in time, whether they are sudden or spread over a
longer period of time. What is an accurate time stamp supposed to deliver when the system inserts several hundred seconds at an interval
of 1.0000032 s? The system will assume that the seconds are that long (elongated) for the time being. This can be accomplished by
the temporary adaptation of the performance counter frequency to the applied granular time correction.

2.1.4. Windows 7/8/8.1/10 and Server 2008 R2/2012/2012 R2/2016

Time services on Windows have undergone changes with every new version of Windows. Considerable changes are to be reported beyond Vista
and Server 2008. The synchronous progress in hardware and software development requires the software to stay compatible with a whole
variety of hardware platforms. On the other hand, new hardware enables the software to achieve better performance. Today's hardware
provides the High Precision Event Timer (HPET) and an invariant Time Stamp Counter (TSC). The variety of timers is described in
"Guidelines For Providing Multimedia Timer Support". The "IA-PC HPET Specification"
is now more than 10 years old and some of its goals have not yet been reached (e.g. aperiodic interrupts). While QueryPerformanceCounter
benefited from the HPET/TSC when compared to the ACPI PM timer, these days the HPET is outdated by the invariant TSC for many applications.
However, the typical HPET signature (TimeIncrement of the function GetSystemTimeAdjustment() and MinimumResolution of the
function NtQueryTimerResolution() are 156001) disappeared with Windows 8.1. Windows 8.1 goes back to the roots; it goes back to 156250.
The TSC frequency is calibrated against HPET periods to finally get proper timekeeping.

An existing invariant TSC noticeably influences the behavior of GetSystemTimeAsFileTime(). The influence on the functions QueryPerformanceCounter()
and QueryPerformanceFrequency() is described in sections 2.4.3. and 2.4.4. Windows 8 introduces the function
GetSystemTimePreciseAsFileTime() "with the highest possible level of precision (<1us)". This seems to be the counterpart to the Linux gettimeofday() function.

2.1.4.1. Resolution, Granularity, and Accuracy of System Time

Since Windows 7, the operating system runs tests on the underlying hardware to see which hardware is best used for timekeeping.
When the processor's Time Stamp Counter (TSC) is suitable, the operating system uses the TSC for timekeeping. If the TSC cannot
be used for timekeeping the operating system reverts to the High Precision Event Timer (HPET). If that does not exist it reverts
to the ACPI PM timer. For performance reasons it shall be noted that HPET and ACPI PM timer cause IPC overhead, while the use
of the TSC does not. The evolution of TSC shows a variety of capabilities:

Constant: The TSC does not change with CPU frequency changes, however it does change on C state transitions.

Invariant: The TSC increments at a constant rate in all ACPI P-, C- and T-states.

Nonstop: The TSC has the properties of both Constant and Invariant TSC.

"The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support
for invariant TSC is indicated by CPUID.80000007H:EDX[8].

The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is the architectural behavior moving forward.
On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers).
TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource."

An invariant TSC enables QueryPerformanceCounter(), QueryPerformanceFrequency(), and GetSystemTimeAsFileTime() to be served by the
same hardware. Deviations as described in 2.4.3 do not occur when the performance counter values and the wall clock are supported
by the same counter (TSC).

Polling system time changes by repeated calls of GetSystemTimeAsFileTime() discloses a new behavior on Windows 8: The examples given
in 2.1.1. and 2.1.2. are typical timekeeping schemes for systems running with an ACPI PM timer and a PIT timer, respectively. There, system time changes occurred on
a regular basis. This is not the case on Windows 8; a wide range of varying file time increments is observed when polling for
file time transitions. A truly periodic cycle can only be approximated by a "mean increment". However, this mean increment matches
the result given by ActualResolution. Despite these little hiccups, resolution, granularity, and accuracy of GetSystemTimeAsFileTime()
are comparable to earlier Windows versions.

2.1.4.2. Desktop Applications: GetSystemTimePreciseAsFileTime()

GetSystemTimePreciseAsFileTime() uses the performance counter to achieve the microsecond precision.
Depending on the hardware platform and Windows version, a call to QueryPerformanceCounter may be expensive or not
(HPET, ACPI PM timer, or TSC, see "MSDN: Acquiring high-resolution time stamps."). Consecutive calls may return the same result. The call time is less than the smallest increment of the system time.
The granularity is in the sub-microsecond regime. The function may be used for time measurements but some care has to be taken:
Time differences may be ZERO.

The function shall also be used with care when a system time adjustment is active.
Current Windows versions treat the performance counter frequency as a constant. The high resolution of GetSystemTimePreciseAsFileTime()
is derived from the performance counter value at the time of the call and the performance counter frequency. However, the performance
counter frequency should be corrected during system time adjustments to adapt to the modified progress in time. Current Windows versions
don't do this. The obtained microsecond part may be severely affected when system time adjustments are active. Seconds may consist of
more or less than 1,000,000 microseconds. Microsoft may or may not fix this in one of the next updates/versions.

GetSystemTimePreciseAsFileTime() works on all platforms.

As of Windows 10 (Build 10240), the inaccuracy of GetSystemTimePreciseAsFileTime() during system time adjustments persists.

2.1.4.3. Timer Periods with Invariant TSC

Using the processor's invariant time stamp counter for timekeeping requires a calibration of timer periods.
The TSC is used as a measure for the progress in time. However, the periodic update of the system time is still done by timer hardware
because the TSC does not produce periodic events. Such periodic events may be generated by the HPET. Querying the timer resolutions on
such a platform as described in section 2.1 will produce a pattern like this:

ActualResolution   uPeriod
1.0007 ms          1 ms
2.0001 ms          2 ms
3.0008 ms          3 ms
4.0002 ms          4 ms
5.0009 ms          5 ms
6.0003 ms          6 ms
7.0010 ms          7 ms
8.0005 ms          8 ms
9.0012 ms          9 ms
10.0006 ms         10 ms
11.0000 ms         11 ms
12.0007 ms         12 ms
13.0001 ms         13 ms
14.0008 ms         14 ms
15.0003 ms         15 ms
15.6251 ms         16 ms

The values of ActualResolution are accompanied by small offsets which may vary from boot to boot but stay
constant during operation. This clearly indicates that the timer periods are calibrated during boot time. Consequently, system time
updates are done at those periods with a mean progress of ActualResolution.

The calibration of the performance counter frequency during boot is described in section 2.4. The tiny deviations
seen in the list above are a result of the calibration accuracy. Again: The TSC frequency is calibrated against HPET timer periods.
This is to be done in a reasonably short time so as not to extend the boot time too much. The remaining deviations are small but noticeable
(e.g. a deviation of 1.2 μs in 9.0012 ms for a 9 ms period corresponds to about 133 ppm).

2.2. The Sleep API

The Sleep function suspends the execution of the current thread for a specified interval.

VOID Sleep(DWORD dwMilliseconds);

This would indeed be a very useful function if it were doing what it is supposed to do. Unfortunately, a detailed view discloses
some artifacts, some of which are helpful, and others that are not. The Sleep() function is backed up by the system's interrupt services. As described
in section 2.1, the interrupt period can be configured to some extent. This has a direct impact on Sleep(). The call to Sleep() passes
the parameter dwMilliseconds to the system and expects the function to return after dwMilliseconds. In practice the
Sleep() only returns when two conditions are met: Firstly, the requested delay must be expired and secondly an interrupt has occurred
(the test to see if the requested delay has expired is only done with an interrupt). A simple Sleep(1) call may therefore have a number of
different results. The results also depend on the time at which the call was made with respect to the interrupt period phase.

Say the ActualResolution is set to 156,250, so that the interrupt heartbeat of the system runs at 15.625 ms periods or 64 Hz, and a
call to Sleep is made with a desired delay of 1 ms. Two scenarios are to be looked at:

The call was made < 1ms (ΔT) ahead of the next interrupt. The next interrupt will not confirm that the desired period of time has expired.
Only the following interrupt will cause the call to return. The resulting sleep delay will be ΔT + 15.625ms.

The call was made >= 1ms (ΔT) ahead of the next interrupt. The next interrupt will force the call to return.
The resulting sleep delay will be ΔT.
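
The two scenarios can be condensed into a small model. It is a sketch of the behavior described above, not of the Windows scheduler: the function name is hypothetical, all times are in microseconds, and delta_t_us stands for the ΔT between the call and the next interrupt.

```c
#include <assert.h>

/* Hypothetical model of an unsynchronized Sleep(): the call is made
   delta_t_us before the next interrupt and returns at the first
   interrupt at which the requested delay has expired. */
long sleep_delay_us(long delta_t_us, long requested_us, long period_us)
{
    assert(delta_t_us >= 0 && requested_us >= 0 && period_us > 0);
    long delay = delta_t_us;      /* time until the next interrupt */
    while (delay < requested_us)  /* not yet expired: wait one more period */
        delay += period_us;
    return delay;
}
```

With a period of 15,625 μs and a requested delay of 1,000 μs, a call made 500 μs ahead of the next interrupt returns after 16,125 μs, while a call made 5,000 μs ahead returns after 5,000 μs.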

The observed delay heavily depends on the time at which the call was made. This matters particularly when the desired delay is
shorter than the ActualResolution. However, when the ActualResolution is set to MaximumResolution, the system runs at
its maximum interrupt frequency and the deviations are in the order of one interrupt period.

This behavior can be used to synchronize code with the interrupt period in an easy way by simply calling two or more consecutive sleeps.
Regardless of what ΔT is, the first will end at the time of an interrupt. Consequently the following sleep call will start at the interrupt
time (or at least so close to it that the system will assume that it happened at the same time). As a result a ΔT = 0 applies and the sleep will
return when N x ActualResolution becomes larger than the desired period. Right after the return of a sleep, the system has just processed
an interrupt. Some additional latency may occur due to priority and/or task/process switching delays or because interrupt handlers occupy the CPU.
Typical latencies of a few μs can be observed with very little implementation effort.

A special case is the call Sleep(0). It looks meaningless, but it is a very powerful tool since it relinquishes the remainder of the thread's time slice.
That means that other threads of equal priority level will take over when ready to run. When a number of threads are running at the same priority level
and all of them are very responsive, all of them will make frequent calls to Sleep(0) whenever they can afford it. As a result, a task switch can be
forced to happen in just a few μs.

2.3. The WaitableTimer API

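Another important mechanism for performing timed operations is provided by the waitable timer interface. A timer created with CreateWaitableTimer() is armed by a call to

BOOL SetWaitableTimer(HANDLE hTimer, const LARGE_INTEGER *lpDueTime, LONG lPeriod, PTIMERAPCROUTINE pfnCompletionRoutine, LPVOID lpArgToCompletionRoutine, BOOL fResume);

This tool can be used in a variety of ways. Below are just a few things that need to be mentioned within the scope of this description: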

The LARGE_INTEGER structure DueTime specifies when the timer is to be set signaled for the first time. This is
basically a file time, but formatted as LARGE_INTEGER to allow signed values. The sign is used by the system to allow input of absolute times (positive) or
relative times (negative). The system time only changes in steps of TimeIncrement and the DueTime is only compared when an interrupt occurs.
This effectively means that the timer can only reach a signaled state for the first time when a system time transition occurs.

The Period parameter specifies whether the timer will be a single shot timer or a periodic timer.
With Period = 0, the timer will only get signaled once when the system time has reached the DueTime. With Period > 0, the period
specifies a timer period in ms, resulting in a timer heartbeat of Period ms. Similar to the Sleep, the periodic waitable timer will be set signaled
when Period expires. But this is only tested when an interrupt occurs. Real cyclic periods can only be observed if Period is a multiple of
ActualResolution (the interrupt period) or when the overshoot remains constant. An example for the first case can be easily described for a platform
configuration of type A. A timer Period of 1,000 ms hosts exactly 1,024 interrupt intervals of 0.9765625 ms. Such a periodic timer will be truly cyclic.
If, on such a platform, Period is set to 995 ms, the timer will expire after 1,019 interrupt periods, resulting in a delay of 995.1171875 ms.
However, the waitable timer uses the system file time and those overshoots will show deviations when a Period hits a system time transition.
In other words: A non-truly cyclic timer setup will suffer from a beat frequency with the system file time increment frequency. A detailed discussion
of this behavior falls outside the scope of this description. Evidently, a truly cyclic timer interval can also be set up when the beat frequency
stays in phase with the system file time update. A typical scenario can be described for a platform configuration B type system:

Assume the ActualResolution is set to MaximumResolution (10,032 100 ns units). The
TimeIncrement (100,144 100 ns units) is not a multiple of the ActualResolution. In order to set up a truly cyclic timer, the least common multiple of 100,144 and 10,032
has to be found. The value of 5,708,208 suits this need here; it hosts 57 periods at MinimumResolution or 569 periods at ActualResolution.
The first truly cyclic timer period is therefore 570.8208 ms. It will be set up by a Period value of 570 and will expire after 569 interrupt periods.
At the time of expiration the system will have done 57 system time updates. More truly cyclic timer setups can be created at any multiple of 5,708,208 for this type of platform.
(Example: Period= 1,141, the timer will expire after 1,138 interrupt periods or 1141.6416 ms and the system time will have progressed by 114 x 10.0144 ms
which is 1141.6416 ms too.)
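
The numbers used in this section can be derived with a least common multiple and a ceiling division. The helpers below are a sketch with hypothetical names; all intervals are assumed to be passed in one common unit (100 ns units for platform B, 100 ps units for platform A).

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor (Euclid). */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of two non-zero intervals in the same unit. */
uint64_t lcm_u64(uint64_t a, uint64_t b)
{
    assert(a > 0 && b > 0);
    return a / gcd_u64(a, b) * b;
}

/* Hypothetical helper: number of interrupt periods after which a timer
   with the given period expires (smallest n with n * interrupt >= period). */
uint64_t interrupts_for_period(uint64_t period, uint64_t interrupt)
{
    assert(interrupt > 0);
    return (period + interrupt - 1) / interrupt;
}
```

lcm_u64(100144, 10032) yields 5,708,208, which is 57 time increments or 569 interrupt periods. For platform A in 100 ps units, interrupts_for_period(9950000000, 9765625) yields 1,019 and interrupts_for_period(10000000000, 9765625) yields 1,024.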

An optional asynchronous CompletionRoutine (APC) with an optional pointer to arguments
ArgToCompletionRoutine can be passed to the timer. However, the calling thread needs to be in the alertable state to allow execution
of the APC. The only advantage of the scheme with a completion routine is that this routine is automatically supplied with the system's FILETIME at
which the timer was signaled. Calling the APC unfortunately results in a considerable extension of the observed cyclic interval. When the system
file time is needed, it can be queried as described in 2.1. The extra time required to do this is a tiny fraction (1/2,000) of the time added to the
timer period by calling the APC.

The expired (signaled) timer can be handled by means of an asynchronous procedure (APC) call or by means of a call to WaitForSingleObject, for example.
According to the last point above, the former is useless when high accuracy is required. The latter suits the needs of the mechanisms described here much
better. The API needs the handle to the object to wait for and allows specifying a timeout dwMilliseconds, which can optionally be set to INFINITE.

Waitable timers synchronize to the rhythm of the system's interrupt period (ActualResolution). This has to be kept in mind because it has
severe implications for the system's overall performance. All of the tasks waiting for a Sleep() or a timer to reach a signaled state will continue
after the interrupt has occurred. The system's load tends to reach peaks at interrupts.

2.4. The QueryPerformanceCounter and QueryPerformanceFrequency API

This API is backed by a virtual counter running at a "fixed" frequency started at boot time. The following two basic calls are used to explore
the microsecond regime: QueryPerformanceCounter() and QueryPerformanceFrequency(). The counter values are derived from some hardware counter, which is platform
dependent. However, the Windows version also influences the results by handling the counter in a version specific manner. Windows 7, in particular
has introduced a new way of supplying performance counter values.

2.4.1. QueryPerformanceCounter

The call to

BOOL QueryPerformanceCounter(OUT LARGE_INTEGER *lpPerformanceCount);

will update the content of the LARGE_INTEGER structure PerformanceCount with a count value. The count value is initialized to zero at boot time.

2.4.2. QueryPerformanceFrequency

The call to

BOOL QueryPerformanceFrequency(OUT LARGE_INTEGER *lpFrequency);

will update the content of the LARGE_INTEGER structure PerformanceFrequency with a frequency value. The frequency is treated by
the system as a constant.
From Windows 7/Server 2008 R2 onwards the result of QueryPerformanceFrequency() may be calibrated at boot time and may therefore
return varying results. This depends on the underlying hardware (see 2.1.4.1.). However, QueryPerformanceFrequency() never reports any
changes of the frequency during operation; its result remains constant. The following section describes deviations on systems
on which the underlying hardware provides neither an invariant TSC nor an HPET for time services.

2.4.3. Performance of the Performance Counter

The range in time that can be held by the LARGE_INTEGER structure PerformanceCount depends on the update rate or the Frequency
at which the count increases. Depending on the hardware platform the counter may be an Intel 8254 at 1,193,000 Hz or an ACPI Power Management
Timer chip with an update frequency of 3,579,545 Hz or even another source. A number of platforms do not have these timers at all; they mimic
the timer by providing the CPU clock. As a result of the latter, the frequency can get into the GHz range. PerformanceCount.QuadPart (signed) will
change sign after 2^63 increments. At a frequency of say 1 GHz (10^9 s^-1), such a system can run for about 290 years without
reaching the sign bit. Even for multi-GHz platforms, there does not seem to be a serious limit.
However, regardless of how the system treats it, the frequency cannot be considered constant. Firstly, the frequency generating hardware
will deviate from the specified value by an offset and secondly the frequency may vary (e.g., due to thermal drift). The impact of these
deviations is not negligible. Oscillators have tolerances in the range of a few ppm and would consequently introduce errors of a few μs/s in the measured
time period. Within this description the performance counter will be used to predict time intervals over a few seconds at accuracies better than 1 μs.
If an accuracy of 0.1 μs is to be reached after 10 s, the frequency needs to be known to 0.01 ppm, which corresponds to 0.035 Hz at a nominal frequency of 3,579,545 Hz.
Obviously, that value is not provided by the system and needs to be calibrated. A first estimate of the true frequency can be gathered by querying two counter
values a known time apart, for example by reading QueryPerformanceCounter() and timeGetTime() before and after a Sleep() and dividing the counter difference by the elapsed time to obtain ticks_per_second.
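
A minimal sketch of such a first estimate is given below. It is not the project's own code: the function name is hypothetical, and the counter and millisecond readings are passed in as plain numbers so that the arithmetic stays platform independent. On Windows they would be taken from QueryPerformanceCounter() and timeGetTime() before and after a Sleep().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: first estimate of the performance counter
   frequency from two counter readings and two millisecond readings.
   The millisecond values are treated as 32-bit unsigned numbers, so a
   single wrap-around between the two readings is handled correctly. */
double estimate_ticks_per_second(int64_t count_start, int64_t count_end,
                                 uint32_t ms_start, uint32_t ms_end)
{
    const uint32_t elapsed_ms = ms_end - ms_start; /* modulo 2^32 */
    assert(elapsed_ms > 0 && count_end >= count_start);
    const double ticks_per_second =
        (double)(count_end - count_start) * 1000.0 / (double)elapsed_ms;
    return ticks_per_second;
}
```

The result is held in a double (64-bit float), which leaves enough digits to express fractions of a Hz at MHz frequencies.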

However, due to artifacts described in 2.2, timeGetTime() is accompanied by an inaccuracy of up to 2 ms, thus a Sleep(1,000) would give an accuracy
for ticks_per_second of 0.002 (2,000 ppm) at most. An accuracy of 2 ppm would be achievable when the Sleep extends to 1,000,000 ms
or 1,000 s. In order to obtain 0.01 ppm, the Sleep would have to cover more than 55 hours. This is obviously a hopeless approach. It also averages out
temporary changes of the frequency and cannot follow frequency changes due to thermal drift.
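
The measurement times quoted above follow from dividing the timing uncertainty by the desired relative accuracy. A small sketch with a hypothetical function name:

```c
#include <assert.h>

/* Hypothetical helper: measurement time in seconds needed so that a
   timing uncertainty of error_ms milliseconds corresponds to a
   relative accuracy of target_ppm parts per million. */
double required_seconds(double error_ms, double target_ppm)
{
    assert(error_ms > 0.0 && target_ppm > 0.0);
    return (error_ms / 1000.0) / (target_ppm * 1e-6);
}
```

With an uncertainty of 2 ms, a target of 2 ppm requires about 1,000 s, and a target of 0.01 ppm requires about 200,000 s, which is a little more than 55 hours.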
The thermal drift of the performance counter frequency can be severe:

This graph shows an older system with heavy thermal drift. At boot time (~8:00) the measured performance counter frequency is
off by about 60 Hz. The system reports the performance counter frequency as 3,579,545 Hz. In fact, it is already at 3,579,605 Hz when it is
"cold". After many hours of doing nothing, the system seems to reach thermal equilibrium. At ~14:00 (six hours after boot), the system
was heavily loaded for about 45 minutes and consequently warmed up. The load increased the main board temperature by only 5 °C, but the influence on the measured
performance counter frequency is quite considerable. It rose to an offset of almost 100 Hz or a true performance counter frequency of 3,579,645 Hz.
A 100 Hz offset at a base frequency of 3,579,605 Hz is a deviation of about 28 ppm or an error in time of 28 μs/s.
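
The conversion between a frequency offset in Hz and a relative deviation in ppm is used repeatedly in this section. The two helpers below are a sketch with hypothetical names:

```c
#include <assert.h>

/* Relative deviation in parts per million of offset_hz at base_hz. */
double deviation_ppm(double offset_hz, double base_hz)
{
    assert(base_hz > 0.0);
    return offset_hz / base_hz * 1e6;
}

/* Absolute frequency tolerance in Hz for a relative accuracy in ppm. */
double tolerance_hz(double ppm, double base_hz)
{
    assert(base_hz > 0.0);
    return ppm * 1e-6 * base_hz;
}
```

deviation_ppm(100.0, 3579605.0) gives just under 28 ppm, and tolerance_hz(0.01, 3579545.0) gives a little more than 0.035 Hz.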

The calibration procedure used for the time stamp mechanism described here uses a repeated averaging period evaluation and reaches an accuracy
of better than 0.05 ppm after about 100s. Thermal drifts can be captured reasonably well and can be applied without much delay.
(Note: The declaration of ticks_per_second as a 64-bit float in the code snippet above enables the ticks_per_second
to hold a number with an accuracy of 15 digits. A value of 3,579,545.12 Hz shows the 0.01 ppm accuracy in the last digit.)

The use of QueryPerformanceCounter on multi-processor platforms requires that the call is always made on the same processor.
The SetThreadAffinityMask API and its associated calls are used to ensure this.
This rule only applies to systems using non-invariant TSC hardware. The system analyzed in this chapter operates time services based on ACPI PM hardware.

2.4.4. Is the CPU Time Stamp Counter an Alternative?

RDTSC is the processor instruction that reads the time stamp counter of the CPU. With the advent of multi-processor platforms and multi-core processors, using RDTSC
is strongly discouraged. Newer processors also support adaptive CPU frequency adjustments, which is another reason not to use RDTSC for the
purpose discussed here.
Microsoft strongly discourages using the TSC for high-resolution timing
("Game Timing and Multicore Processors").
However, the introduction of invariant Time Stamp Counters has changed the situation. Starting with Windows 7/Server 2008 R2,
Windows has a clear preference: Look for invariant TSCs, see whether they can be synchronized on different cores and use them
for wall clock and performance counter whenever possible
("MSDN: Acquiring high-resolution time stamps.").

2.5. Discussion of Resources

Some of the resources discussed show platform-specific behavior. Their results may depend on the hardware and/or
on the Windows version. The precision time functions developed within the Windows Timestamp Project mainly rely on four function suites
provided by the operating system:

GetSystemTimeAsFileTime

QueryPerformanceCounter with QueryPerformanceFrequency

Sleep

The WaitableTimer function together with WaitForSingleObject

The complexity of the system time update with respect to the interrupt settings has been explained. A complex automatic
diagnosis of the system has to establish proper settings in order to obtain the desired accuracy.
In particular, the continuous calibration of the performance counter frequency described in 2.4.3 is of utmost importance for obtaining high accuracy.
In addition, a proper interrupt period setting that yields truly cyclic timer behavior (as described, for example, in 2.1) is very important.
Another set of APIs is used to establish functionality:

Pipes

Events

Shared Memory

Mutexes

The description of these functions falls outside the scope of this document.

3. Goals

The Windows Timestamp Project provides the tools to enable access to time at microsecond resolution and accuracy. Furthermore, it provides
timer functions at the same resolution and accuracy. The high accuracy and microsecond resolution are achieved by synchronizing the system time
with the performance counter. In fact, the performance counter is phase locked to the system time. A diagnosis determines the system's specific
parameters and establishes a "truly cyclic" timer interval for updating the phase of the performance counter value. The drift of the
performance counter is permanently evaluated and taken into account while the system is running.

3.1. Time Support

Any time-providing mechanism needs time for its internals. Thus, the following question arises with respect to time:
Is the requested time the time at which the call is made, or the time at which the call returns?
This may sound strange, but considering the level of resolution and accuracy aimed for here, it matters.

Example:

Something just happened and you want to assign a timestamp to it.
In this case, you would want the time at the time you're asking.

You want to do something at a specific time.
In this case, you would want the time at the time you are getting the answer.

Two time functions are implemented to fulfill these two needs:

3.1.1. GetTimeStamp

The function GetTimeStamp, declared as

void GetTimeStamp(TimeStamp_TYPE * TimeStamp);

fills the argument pointed to by TimeStamp with numbers according to the TimeStamp structure definition:

The 64-bit value Time represents the number of 100-nanosecond intervals elapsed since January 1, 1601. ScheduledDueTime
reports the system file time at which the next reference time is scheduled for an attempt to update the phase. This value should
primarily be used to verify the operation of the precision time mechanism. If ScheduledDueTime is noticeably behind the
current system file time, the scheduled update of the time reference must have failed for a number of consecutive attempts.
Finally, the 32-bit value Accuracy gives an estimate of the assumed accuracy (rms) of the time stamp in 1 ns units (error in ns/s).

RefinedPCF holds the calibrated frequency of the values returned by QueryPerformanceCounter().
GetTimeStamp() may be called at any time; it therefore also provides information about the state of the calibration.
The state can be one of the following:

TIME_STAMP_OFFLINE (1): Time calibration service is offline.

TIME_STAMP_AWAITING_CALIBRATION (2): Calibration service just started but time service not yet calibrated.

TIME_STAMP_CALIBRATED (3): Time service calibrated.

TIME_STAMP_LICENSE_EXPIRED (4): License expired during runtime.

The call to GetTimeStamp is fast and it reports the time at the time it is called.
As of version 2.01, this call is done in 10 to 20 ns on current platforms.

3.1.2. Time

A simple function is stated as

long long Time(void);

The function is as fast as GetTimeStamp and it returns the time at the time the call returns. With the need for a few thousand
CPU cycles, the call requires only a few μs on current hardware. Time() can be used to compare times or to wait until a certain
time is observed. The 64-bit return value represents the number of 100-nanosecond intervals elapsed since January 1, 1601.
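
Values in this format (100 ns intervals since January 1, 1601) often have to be compared with Unix-style time stamps. The following is a minimal sketch of such a conversion; the helper name filetime_to_unix_us is hypothetical, and the only constant involved is the well-known distance of 11,644,473,600 s between the 1601 and 1970 epochs.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 100 ns intervals between 1601-01-01 and 1970-01-01 (UTC). */
#define EPOCH_DIFF_100NS 116444736000000000LL

/* Convert a count of 100 ns intervals since 1601 into microseconds
   since 1970. Sub-microsecond digits are truncated. This sketch
   assumes the input is not earlier than 1970-01-01. */
int64_t filetime_to_unix_us(int64_t t100ns)
{
    assert(t100ns >= EPOCH_DIFF_100NS);
    return (t100ns - EPOCH_DIFF_100NS) / 10;
}
```

A value returned by Time() could be passed to such a helper directly, since both use the same 64-bit 100 ns representation.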

3.2. Timer Support

The following set of timer functions is provided:

Creating a timed named event:

HANDLE CreateTimedEvent(BOOL bManualReset,LPCTSTR lpTimerName);

bManualReset [in]

If this parameter is TRUE, the function creates a manual reset event object, which requires the use of the ResetEvent
function to set the event state to nonsignaled. If this parameter is FALSE, the function creates an auto reset event object, and the system automatically
resets the event state to nonsignaled after a single waiting thread has been released.

lpTimerName [in, optional]

The name of the event object. The name is limited to MAX_PATH characters. Name comparison is case sensitive.
If lpTimerName matches the name of an existing named event object, this function will fail. If lpTimerName is NULL, the event object is created without a name.
If lpTimerName matches the name of another kind of object in the same namespace (such as an existing semaphore, mutex, waitable timer, job, or file-mapping object),
the function fails and the GetLastError function returns ERROR_INVALID_HANDLE. This occurs because these objects share the same namespace.
The name can have a "Global\" or "Local\" prefix to explicitly create the object in the global or session namespace. The remainder of the name can
contain any character except the backslash character (\). For more information, see Kernel Object Namespaces. Fast user switching is implemented
using Terminal Services sessions. Kernel object names must follow the guidelines outlined for Terminal Services so that applications can support
multiple users. The object can be created in a private namespace. For more information, see Object Namespaces.

Return value

If the function succeeds, the return value is a handle to the event object. If the named event object existed before the
function call, the function returns NULL and GetLastError returns ERROR_ALREADY_EXISTS. If the function fails, the return value is NULL.
To get extended error information, call GetLastError.

Setting the timed event with a DueTime in 100 ns units and an optional Period in 100 ns units:

hTimerEvent [in]

A handle to a named timed event. The CreateTimedEvent() function returns this value.

TimerDueTime [in]

The time after which the state of the timer is to be set to signal in 100 nanosecond intervals.
Positive values indicate absolute time. Be sure to use a UTC-based absolute time,
since the system uses UTC-based time internally. Negative values indicate relative time.

TimerPeriod [in]

The period of the timer in 100 ns intervals. If TimerPeriod is zero, the timer is signaled once.
If TimerPeriod is greater than zero, the timer is periodic. A periodic timer automatically reactivates each time the period elapses,
until the timer is canceled using the CancelTimedEvent function or reset using SetTimedEvent.
If TimerPeriod is less than zero, the function fails.

Return value

If the function succeeds, the return value is nonzero. If the function fails, the return value is zero.
To get extended error information, call GetLastError.
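
The sign conventions for TimerDueTime and TimerPeriod described above can be captured in a few lines. This is a minimal sketch in portable C; the helper names are hypothetical and do not belong to the library. The text does not specify the meaning of a due time of exactly zero, so the sketch does not classify it.

```c
#include <assert.h>
#include <stdint.h>

/* Build a relative due time: negative, expressed in 100 ns units. */
int64_t relative_due_time(int64_t microseconds_from_now)
{
    assert(microseconds_from_now >= 0);
    return -(microseconds_from_now * 10);
}

/* Negative values indicate relative time, positive values absolute (UTC) time. */
int due_time_is_relative(int64_t due_time_100ns)
{
    return due_time_100ns < 0;
}

/* A period of zero means single shot, a positive period means periodic,
   and a negative period makes the call fail. */
int period_is_valid(int64_t period_100ns)
{
    return period_100ns >= 0;
}
```

For example, a timer that should fire 250 μs from now and then repeat every millisecond would use a due time of relative_due_time(250), i.e., -2500, and a period of 10000.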

Canceling the timed event:

int CancelTimedEvent(HANDLE hTimerEvent);

hTimerEvent [in]

A handle to a named timed event. The CreateTimedEvent() function returns this value.

Return value

If the function succeeds, the return value is nonzero. If the function fails, the return value is zero.
To get extended error information, call GetLastError.

Opening a timed event:

HANDLE OpenTimedEvent(LPCTSTR lpTimerName);

lpTimerName [in]

The timed event name used when the timed event was created.

Return value

If the function succeeds, the return value is the handle to the named timed event. If the function fails, the return value is NULL.
To get extended error information, call GetLastError.

Deleting the timed event:

int DeleteTimedEvent(HANDLE hTimerEvent);

hTimerEvent [in]

A handle to a named timed event. The CreateTimedEvent() function returns this value.

Return value

If the function succeeds, the return value is nonzero. If the function fails, the return value is zero.
To get extended error information, call GetLastError.

These timer functions are based on timed events. The handle returned by CreateTimedEvent() is in fact a handle to a named event whose
signaled state is supervised by a time service routine. Standard wait functions like WaitForSingleObject or WaitForMultipleObjects
can be used to wait for the high resolution timer events.

4. Implementation

Only two hardware platforms were described here to highlight some of the problems to bear in mind when implementing reliable time services for Windows.
Many more configurations need to be diagnosed to ensure largely platform-independent functionality. However, a flexible and automatic evaluation
of hardware-specific behavior may result in hardware independence.

All of the above is implemented as a time service by careful separation into different processes and threads. The time
critical parts are hosted by a process running at real-time priority class. Some of the threads inside this process even
run at time-critical priority level. In the case of a multi-processor or multi-core system, certain threads are assigned to a specific
CPU/core. This process is called the Kernel; it hosts the time service routines. For testing and debugging, the Kernel process has some IO
capabilities shared with the IO process. A later version may not need this additional functionality. The high priority class requires the
Kernel process to run with administrator privileges.

A second process hosts all kinds of less time-critical service threads. It shares some IO service with the Kernel process by means
of piped IO between these two processes. Furthermore, it provides pipe services to the graphical user interface (GUI).

The third process is a graphical user interface (GUI), which runs optionally and helps in the current stage of the development to get an
insight into what is going on.

The GUI and the IO process are development tools only. The only process that needs to run to access the time functions discussed here is the Kernel process.

4.1. The Real-time Priority Class Process: Kernel

The Kernel is the heart of the time service described here. It provides the important link between the system file time and the performance
counter value. The idea in this context is to provide data triplets of system file time, performance counter, and performance counter frequency.
Knowing the performance counter value at a certain system file time allows the extrapolation of the system file time to the actual time by applying the
performance counter value and the performance counter frequency. As discussed, the frequency reported by the system is of insufficient accuracy; a refined performance counter
frequency is therefore supplied in double (64-bit float) format. There is also some internal information which allows a refinement of the performance counter value
itself (as a result of some self-calibration). Thus, it is also represented in double (64-bit float) format.

This information is sufficient to establish time services. Querying the current performance counter value gives the difference to the value calculated
to match the last captured file time. This difference is divided by the performance counter frequency and the result is the elapsed time since the last
file time capture. This data triplet is, besides other parameters, written to a mutex-protected shared memory section. Other processes/threads have access
to this data triplet. As of version 2.02, the mutex scheme has been replaced by a read-copy-update (RCU) scheme to improve performance.
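
The extrapolation described above can be sketched in a few lines of portable C. The function name and parameter layout are assumptions made for illustration; the project's actual data layout and refinement steps are not shown in this text. On Windows, counter_now would come from QueryPerformanceCounter.

```c
#include <assert.h>
#include <stdint.h>

/* Extrapolate the current time (100 ns units) from a reference triplet:
   ref_file_time  system file time at the last capture, 100 ns units
   ref_counter    performance counter value matching that file time
   frequency_hz   refined performance counter frequency
   counter_now    performance counter value read just now */
int64_t extrapolate_time(int64_t ref_file_time, double ref_counter,
                         double frequency_hz, double counter_now)
{
    assert(frequency_hz > 0.0);
    double elapsed_s = (counter_now - ref_counter) / frequency_hz;
    double elapsed_100ns = elapsed_s * 1e7;
    /* Round to the nearest 100 ns unit. */
    int64_t rounded = (int64_t)(elapsed_100ns >= 0.0 ? elapsed_100ns + 0.5
                                                     : elapsed_100ns - 0.5);
    return ref_file_time + rounded;
}
```

With a frequency of 3,579,545 Hz, a counter difference of 3,579,545 ticks corresponds to one second, so the result is the reference file time plus 10,000,000 units.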

As described in sections 2.1 and 2.3, the important part is to get the file time updated correctly. It proves best to gather the data triplet exactly
when the file time transitions or has just transitioned. Difficulties in achieving this have been described for platform examples A and B. At startup, a complex
diagnosis of the interrupt timing structure and file time update/transition structure is performed. This results in a timing scheme for updating the data triplet.
The desired update period is in the range of 1 to 10 seconds. As discussed, the period duration influences the accuracy. Algorithms are looking for patterns and
beat frequencies in the file time update and interrupt timing structure. As a result, a periodic timer is set up to run the data triplet generation and the calibration
in parallel. Once exact ∆T file time periods do occur, the true performance counter frequency can be measured and averaged over a number of consecutive
measurements. A running average over the last n captures is maintained at all times to provide information about the true (calibrated) performance counter frequency.
When the accuracy of the average reaches a certain quality, the phase locking of file time change and performance counter is considered as established and
timestamp requests are accompanied by information about their accuracy.
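
The running average over the last n captures can be illustrated with a small ring buffer. This is a simplified sketch only: the window size PCF_WINDOW, the type PcfAverage, and the function names are hypothetical, and the project's actual calibration (pattern detection, quality criteria) is more elaborate than a plain moving average.

```c
#include <assert.h>
#include <stddef.h>

#define PCF_WINDOW 8   /* hypothetical number of captures to average */

typedef struct {
    double samples[PCF_WINDOW];
    size_t count;      /* number of valid samples, at most PCF_WINDOW */
    size_t next;       /* slot that receives the next sample */
} PcfAverage;

/* Frequency (Hz) measured between two captures: counter ticks divided by
   the elapsed file time, which is given in 100 ns units. */
double measure_frequency(double ticks, double file_time_delta_100ns)
{
    assert(file_time_delta_100ns > 0.0);
    return ticks * 1e7 / file_time_delta_100ns;
}

/* Store a new measurement, overwriting the oldest one when the window is full. */
void pcf_average_add(PcfAverage *avg, double frequency_hz)
{
    avg->samples[avg->next] = frequency_hz;
    avg->next = (avg->next + 1) % PCF_WINDOW;
    if (avg->count < PCF_WINDOW)
        avg->count++;
}

/* Mean of the measurements currently held in the window. */
double pcf_average_value(const PcfAverage *avg)
{
    assert(avg->count > 0);
    double sum = 0.0;
    for (size_t i = 0; i < avg->count; i++)
        sum += avg->samples[i];
    return sum / (double)avg->count;
}
```

A PcfAverage initialized to all zeros is empty; each completed ∆T period contributes one call to pcf_average_add with the result of measure_frequency.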

Running all of this at utmost priority ensures that there is very little overhead after an interrupt. Remember: Many processes/threads are waiting for
interrupts. Therefore, systems do have a workload peak at the occurrence of an interrupt. Even running at such priority settings, it is unavoidable to be
influenced by the load of other processes. However, the accuracy of this scheme easily stays below a few microseconds, even with heavy load on the system.

The routines Time() and GetTimeStamp() apply the extrapolation scheme described here. Both calls complete in far less than a few tens of nanoseconds, even on older systems.

The functionality of the timer routines listed in 3.2 is handled in this real-time process as well. Timed events are registered in a timer event queue.
They are monitored with respect to their due time/period. When there is less than one interrupt period left before the due time expires, the timer service polls
the timed event queue for the precise time to set the event. This may happen for a number of timed events, even within the same interrupt period.
However, it should be noted that the time service thread is running at a high priority level and the signaled event may not be accessible to other
processes/threads when there is just one CPU.
A single CPU/core system simply cannot cope with multiple timed events set up to signal within the same interrupt period.

4.2. Less critical services: The IO-Process

In order to keep the Kernel as small as possible, much of the functionality is performed by a second process. IO, in particular, matters.
The IO process establishes pipe services to relieve the Kernel of blocking IO. All IO done by the Kernel is queued into the IO process's pipe service.
These operations are non-blocking. A complex fprintf() can be queued in just a few microseconds. This allows extensive output for diagnosis.
Furthermore, output is logged into a file.

4.3. The Optional GUI-Process

The current GUI is mainly created for developing the time service. Meanwhile, it has become a valuable tool for diagnosing platforms. It runs optionally.

Fig. 4.3.1: The Graphical User Interface (Version 1.70).

The output is split into four tabs: the all output tab, the error messages tab, the Calibrated Performance Counter Offset tab,
and the NTP Offset tab. The text output within
the first two tabs is produced using the queued qfprintf(…) function. This function time-stamps its messages and also shows some other parameters
of the output piping thread:

As already mentioned, the GUI runs optionally and any number of GUIs can be started and ended at any time. Ending a GUI will end neither the
Kernel process nor the IO process. In order to terminate the whole group of processes, the Kernel process has to be stopped. The Stop Kernel button
(lower right corner) stops the Kernel. Queued messages that were still waiting to be processed then remain stuck. A few message windows will pop up to show the contents of
the unprocessed parts of the queues of all involved processes. These popup windows are not error messages; they just report what was happening while the Kernel
was stopped.

The plot at the left lower corner shows the history of the accuracy in μs/s during the last 600 seconds. The GUI produces this information by means of
GetTimeStamp() imported from the time service DLL.

It also provides a tiny test of the timer functionality: A single shot timer can be set up. The due time setting here
is absolute, thus the time has to be in the future. Hint: Use the Update Date/Time Fields button to get the actual time into the fields and then, e.g.,
increment the minute field by 1. Press the Create Timed Event button quickly before the due time expires. Progress of the timed event approaching its due time
is shown next to the button, which has now turned into a Stop Timed Event button to allow cancellation of the timed event. A message window will pop up when
the timed event has signaled. It shows the precise time at which the signaled state was detected and how much it deviates from the requested due time.

Output to the all output tab can be paused with the Hold Output button. All output is then queued, and the button turns into a Continue Output button until
it is pressed. An optional auto cont. check box lets the GUI continue automatically when the queue buffer reaches a critical stage.
The auto cont. check box can only be checked while the output is on hold.

The Calibrated Performance Counter Frequency Offset tab shows the offset of the calibrated performance counter frequency.
The graph shown in 2.4.3 was created within this tab. The graph's context menu (right mouse button) allows saving the graph or clearing the graph's data.
Clearing the data will not stop further recording; creation of the graph will continue.
Version 1.2 introduced the NTP Offset tab, the NTP/autoadjust status line, and the NTP/autoadjust check boxes. Details about these items are given in
"Part II: Adjustment of System Time".

4.4. The Libraries

The functions described above are accessible to other processes/threads through a static library (LIB) or a dynamic link library (DLL).

5. Results

Microsecond resolution time stamps are possible on Windows systems. Resolution in the microsecond regime can be observed at accuracies of a few microseconds
without distracting the system too much. Timer functions at the same resolution and accuracy are implemented and tested. Handling many timed events created by those
timer functions set up to fire within the same millisecond is tricky but possible. The evaluation at the startup of the services may sometimes take a few seconds and needs all the CPU time. Doing this at utmost priority will freeze
single core/processor systems for a moment.

Fine granularity time services are established with the tools described above. System time adjustment following an NTP time server relies on fine granularity
of time keeping. The accuracy obtained when synchronizing to an NTP server is determined by the accuracy of the system time. Granularities of 15.625 ms are far
too coarse to achieve reasonable NTP synchronization. Time keeping has improved with newer Windows versions. The function GetSystemTimePreciseAsFileTime() was
described in 2.1.4.2. It is supposed to provide very fine granularity. Unfortunately, GetSystemTimePreciseAsFileTime() misbehaves while a Windows system time
adjustment is active. This is counterproductive when the goal is precise synchronization of the system time to an NTP server.

The following second part "Adjustment of System Time" deals with high accuracy system synchronization.

Part II: Adjustment of System Time

Arno Lentfer, June 2012

Last Update: Version 3.10, May 2019

Windows provides the following simple tools to manage and monitor system time adjustments: The Internet Time GUI and the console application w32tm.exe.
These tools are sufficient to obtain an initial rough estimate of the performance of the Windows internet time synchronization.

1. The Internet Time GUI

Synchronization to an internet time server is accomplished directly from the user interface.
Windows Vista, Windows 7 and Windows 8 provide the Internet Time Settings window and Windows XP provides
the Internet Time tab in the Date and Time Properties window:

Fig. 1.1: Internet Time Settings window of Windows Vista and higher.

Fig. 1.2: Internet Time Settings window of Windows XP.

An internet time provider can be chosen from a list or a new NTP server address can be added to the list.
It is also possible to add an IP address to the list. Adding an IP address may be advisable when the name represents a pool
of servers and the server needs to be explicitly indicated.

The common "Update Now" button attempts to synchronize the system time to the time server.
Synchronization either takes place immediately or becomes active upon confirmation. Note: The message "...has been successfully synchronized..." does
not necessarily mean that synchronization has finished. It could also mean that a synchronization process was successfully started.
Such processes can last for many hours.

2. w32tm.exe

In order to verify the result or progress of the synchronization, another tool has to be run in parallel.
The console application w32tm.exe
allows monitoring of the offset of the local time to the time of an internet time server.

The easiest way to do this is from a console window with the following set of parameters:

w32tm /stripchart /computer:time.windows.com /period:120

As a result, the system time and its offset to the time server are dumped to the console every 120 seconds:

Each line consists of the local time (08:38:57), an internal delay (time difference between the UDP packet received and the UDP packet sent
on the server side, i.e., d:+00.0419394s), the actual offset between the local time and the server time
(o:+00.1024506s) and a very basic stripchart of the offset.
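
For longer recordings it can be convenient to extract the delay and offset fields from such lines programmatically. The following is a minimal sketch in portable C, assuming only that each data line contains the fields d:<seconds>s and o:<seconds>s as described above; the function names are hypothetical, and header or error lines simply fail to parse.

```c
#include <stdlib.h>
#include <string.h>

/* Extract the number that follows a tag such as "d:" or "o:".
   Returns 1 on success, 0 if the tag or a number is missing. */
static int value_after_tag(const char *line, const char *tag, double *out)
{
    const char *p = strstr(line, tag);
    if (p == NULL)
        return 0;
    p += strlen(tag);
    char *end;
    double value = strtod(p, &end);   /* stops at the trailing 's' */
    if (end == p)
        return 0;
    *out = value;
    return 1;
}

/* Parse one stripchart line into delay and offset, both in seconds.
   Returns 1 if both fields were found, 0 otherwise. */
int parse_stripchart_line(const char *line, double *delay_s, double *offset_s)
{
    return value_after_tag(line, "d:", delay_s)
        && value_after_tag(line, "o:", offset_s);
}
```

Searching for the tags instead of relying on fixed column positions keeps the sketch tolerant of small differences in spacing or punctuation between w32tm versions.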

The first output line of w32tm also shows the IP address to which the name of the time server (time.windows.com) resolves (UDP port 123 is reserved for NTP).
This is important because time.windows.com does not refer to a single server but rather to a pool of servers; therefore, consecutive attempts
to synchronize to it may use different physical servers. However, w32tm reports the IP address of the server it is currently using.
This IP can also be chosen as the server for the synchronization. For example, one of the addresses of the time.windows.com pool is 65.55.21.14.
The most meaningful check is obtained when the same IP address is used both in the Internet Time GUI described above and in the w32tm command:

w32tm /stripchart /computer:65.55.21.14 /period:120

3. Results

The results obtained with w32tm are difficult to interpret. When the offset in time is large (i.e., several seconds),
synchronization of the system time seems to happen in one step. In these cases, the remaining offset is typically larger than a few milliseconds.
However, when the offset is less than a few seconds, an algorithm gently adjusts the offset in small steps. This procedure can take many hours.

It turns out that obtaining detailed insight into this adjustment algorithm by using w32tm is difficult.
A more in-depth investigation may uncover the cause of the observed behavior; however, this requires additional software.

4. Discussion

Applying the scheme described above frequently gives very dissatisfying results. Sometimes the synchronization results in a time offset that is worse
than the offset prior to synchronization. In particular, Windows Vista and Windows 7 show strange behavior, e.g., seemingly never-ending
adjustments to huge offsets.

A piece of software is necessary to find out the secret of the adjustment algorithm. Actual system time adjustment parameters can be obtained by a call
to the function GetSystemTimeAdjustment because Windows
performs the system time adjustment through calls to the function SetSystemTimeAdjustment.

MSDN: "For each lpTimeIncrement period of time that actually passes, lpTimeAdjustment will be
added to the time of day." Assuming this rule, the adjustment gain can be calculated:

gain = (lpTimeAdjustment - lpTimeIncrement)/ lpTimeIncrement
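
As a minimal sketch, the gain equation translates directly into C. The function name adjustment_gain is hypothetical; on Windows, the two arguments would be the values reported by GetSystemTimeAdjustment.

```c
#include <assert.h>

/* Relative gain of the system time according to the rule quoted from MSDN:
   time_adjustment is added to the time of day for every time_increment
   of time that actually passes (both in 100 ns units). */
double adjustment_gain(unsigned long time_adjustment, unsigned long time_increment)
{
    assert(time_increment > 0);
    return ((double)time_adjustment - (double)time_increment)
           / (double)time_increment;
}
```

The result is dimensionless; multiplying it by 1e6 gives the gain in μs/s. A gain of zero means the clock runs at its nominal rate, a negative gain slows it down.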

A simple program can call GetSystemTimeAdjustment frequently while a system time adjustment is active and evaluate the gains for individual values of lpTimeAdjustment.
The function SetSystemTimeAdjustment makes it possible to initiate and control a system time adjustment:

System time adjustments occur when bTimeAdjustmentDisabled is set to FALSE and dwTimeAdjustment is set to some meaningful value.
Unfortunately, the influence of the values of dwTimeAdjustment depends on the Windows version: The MSDN description of the SetSystemTimeAdjustment function
contains the note:
"Currently, Windows Vista and Windows 7 machines will lose any time adjustments set less than 16." Note: Windows 8 is not mentioned here;
the related knowledge base article
KB2537623 does not mention Windows 8 either.

The update scheme of the system time and also the scheme of system time adjustments depends on the presence of a
High Precision Event Timer [HPET].
Intel specifies [hpetspec.pdf]:
"An existing HPET does not replace the RTC Time of Day, the RTC Alarm, and the RTC CMOS functionality.
The HPET architecture supplements/replaces only the RTC Periodic Interrupt function."
The RTC (Real Time Clock) Periodic Interrupt function used to be the heartbeat of the system time update. However, an existing HPET will replace this
functionality and remove the system time update activity from the RTC periodic interrupt function. Those systems can typically be identified by a specific value of
the update period lpTimeIncrement: 156001. HPET and RTC are driven by different hardware.
Therefore, they are neither synchronized nor in phase by default; additionally, they may show specific drifts.
More information about the evolution of the HPET architecture is given in
"Guidelines For Providing Multimedia Timer Support" [MSDN].
Newer systems may provide hardware with an invariant Time Stamp Counter (TSC) as described in section 17.17 of
"Intel® 64 and IA-32 Architectures, Software Developer’s Manual".
Windows has a clear preference about what hardware resource is to be used for timekeeping. When suitable TSC characteristics
are obtained, Windows uses the TSC for timekeeping. If the TSC is not suitable, Windows uses the HPET when available,
and if that is not available or disabled in the BIOS, Windows uses the ACPI PM timer
("MSDN: Acquiring high-resolution time stamps.").

It was already shown in section 2.3 of Microsecond Resolution Time Services for Windows that the Windows system timing cannot be assumed to show a fixed
pattern. The evolution of Windows with newly introduced limitations (... will lose any time adjustments set less than 16.) and emerging new hardware result
in a wide variety of schemes for system time adjustments. A few relevant combinations are diagnosed and described here.

4.1. Windows XP and Windows Server 2003: The Classical Case

A call to GetSystemTimeAdjustment reveals a value of 156250 for lpTimeIncrement on most platforms running Windows XP or its server variants
(some specific hardware may return other values, e.g., 100144). Note: A value of 156250 represents 15.625 ms, an RTC Periodic Interrupt at 64 Hz.
This is a very common hardware fingerprint.
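
Because lpTimeIncrement is given in 100 ns units, the conversion to a period and an interrupt rate is a one-liner each. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>

/* Update period in milliseconds for an increment given in 100 ns units. */
double increment_to_ms(unsigned long time_increment)
{
    return (double)time_increment / 10000.0;
}

/* Corresponding update rate in Hz. */
double increment_to_hz(unsigned long time_increment)
{
    assert(time_increment > 0);
    return 1e7 / (double)time_increment;
}
```

For 156250, these helpers give 15.625 ms and 64 Hz; for the value 156001 mentioned in section 4, they give 15.6001 ms and about 64.1 Hz.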

Calling SetSystemTimeAdjustment with dwTimeAdjustment = 156250 and bTimeAdjustmentDisabled = FALSE initiates a system time adjustment.
However, according to the gain equation given in 4., no actual adjustment takes place: the gain is zero, although the adjustment is active
with lpTimeAdjustmentDisabled = FALSE.

Setting dwTimeAdjustment to any number different from lpTimeIncrement results in a system time adjustment.
Example: lpTimeIncrement = 156250 and dwTimeAdjustment = 156257. The system time will advance by 15.6257 ms every 15.6250 ms;
the system time will gain 0.0448 ms/s (7/156250). This way the gains are predictable. A small list shows the gains obtained in the
neighborhood of 156250 for dwTimeAdjustment from 156255 to 156248:

This hardware also consistently follows the gain equation provided by the MSDN description. However, the smallest adjustment gain on
this hardware is almost 10 μs/s.

Windows XP and Windows Server 2003 do not support a hardware HPET. These Windows versions may use Programmable Interrupt Timers (PIT),
Real Time Clocks (RTC), the processor's Time Stamp Counter (TSC), and the Power Management Timer (PMTIMER) to mimic what is later done by
the High Precision Event Timer (HPET). These Windows versions increment the system time at a fixed period of lpTimeIncrement.
This period does not depend on settings of the timer resolution by means of the
timeBeginPeriod() function.
This is most easily confirmed by polling system file time transitions over a longer period of time with different settings of timeBeginPeriod().
As a result, the granularity of the system time is typically in the range of 10 ms to 20 ms.

4.2. Windows Vista, Windows 7, Windows 8, 8.1 and Windows 10

Windows Vista introduced HPET support. It was the first public Windows version to decouple the system time update and the system
time adjustment from the RTC Periodic Interrupt function or the ACPI PM timer when HPET hardware is present. This was a big step towards higher timing
accuracy. However, it also caused some inconsistency with a remarkable drawback for Windows Vista and Windows 7 (KB2537623)
persisting until now. Windows Vista also introduced the influence of the multimedia timer resolution (set by timeBeginPeriod) on the
update period of the system time: The system time is updated at a period of ActualResolution returned by the function NtQueryTimerResolution.

The following list of system time gains vs. dwTimeAdjustment (156154 to 156330) was taken with Windows Vista on a platform without HPET/TSC support
(lpTimeIncrement = 156250):

This list sheds some light on the statement "... will lose any time adjustments set less than 16...".
It seems that time adjustments with values less than 16 are not lost; rather, SetSystemTimeAdjustment ignores the lower 4 bits
of dwTimeAdjustment. The obtained gain is the same for all dwTimeAdjustment values in one group.
The group size is 16. Only the group ranging from 156234 to 156250 has 17 members. It is not yet clear why the
scheme shows this exception. However, the gain equation used for the gain calculation obviously does not apply here.
Therefore, the MSDN statement "For each lpTimeIncrement period of time that actually passes, lpTimeAdjustment will be added to the time of day"
is incorrect for this configuration. Exception: Gain is zero at dwTimeAdjustment = lpTimeIncrement.

The next list is taken with Windows Vista on a platform with HPET/TSC support (dwTimeAdjustment: 155908 to 156079, lpTimeIncrement = 156001):

Windows 7 and Windows Server 2008 R2 introduced Timer Coalescing
(more detailed: TimerCoal.docx)
to "...improve the efficiency of periodic software activity by expiring multiple distinct software timers at the same time...".
This portion of software shifts interrupts into groups of interrupts. A requested interrupt is accompanied by a tolerance to tell the OS by
how much it is allowed to shift the interrupt in time. This may affect the update of the system time and has to be diagnosed carefully.
Windows 7 does not update the system time by fixed increments.

Capturing the adjustment gain on a Windows 7 platform with constant TSC support results in the following list
(dwTimeAdjustment: 155908 to 156079, lpTimeIncrement = 156001):

The gain distribution is asymmetric. The gain steps are in the order of 0.1 ms/s, but the smallest positive gain differs from the smallest negative gain.

The smallest available adjustment is 42 μs/s in the positive direction and -57 μs/s in the negative direction.
This does not appear to be a good resolution when compared to Windows XP and Windows Server 2003.

This behavior raises the question of whether a specific gain for a specific value of dwTimeAdjustment
remains constant over time. Careful evaluation of this matter has not confirmed any variation of the gain
(added advancement of the system time) when a constant value of dwTimeAdjustment is applied. Nevertheless, it remains
difficult to predict the adjustment gain for values of dwTimeAdjustment on systems affected by this scheme
(Windows Vista and Windows 7 with HPET/TSC support). "For each lpTimeIncrement period of time that actually passes, dwTimeAdjustment
will be added to the time of day." In this regard, [MSDN]'s
claim turns out to be wrong on Windows 7 too.
Note: This specific asymmetry occurs with the system's interrupt period set to the minimum by
means of e.g. timeBeginPeriod(wPeriodMin).

All software packages that use SetSystemTimeAdjustment and rely on predictable gains are at serious risk on these systems.
It should also be noted that there is no dwTimeAdjustment setting for a gain of 0.0 ms/s. It was shown in section 4.1 that
earlier versions of Windows had a much more predictable scheme. The scheme observed on Windows Vista and Windows 7 requires the
software to calibrate itself to the appropriate gain for values of dwTimeAdjustment because the gain cannot be easily derived from the
given values of lpTimeIncrement and lpTimeAdjustment.

The system time synchronization routines of these newer Windows versions do not seem to take these facts into account.
A typical synchronization to an internet time server uses all bits for setting the values of dwTimeAdjustment.
This can be easily monitored through frequent use of GetSystemTimeAdjustment. Furthermore, these tools expect the
lower 4 bits to be taken into account by the system. Windows calculates a correction scheme ahead of the actual
adjustment based on the offset to the network time. Unfortunately, the gains are not set as expected and the
predicted scheme messes up the adjustment/synchronization, which results in the synchronization being completely off.
This is accompanied by the fact that there is no monitoring of the internet time provider while the system time adjustment progresses.
Such an adjustment can run for hours and a big deviation may appear with wrong gain estimates resulting from the synchronization
algorithm. Finally, at some point the deviation will be several seconds and the next synchronization will only set the local time
to the network time without applying the function SetSystemTimeAdjustment.

Windows 8 has finally fixed this mishap. This list has been captured on a Windows 8 system with constant TSC support:

The missing resolution for the value of dwTimeAdjustment is gone. Each value has its own gain and the gain is close
to the predicted gain (Example: 156003: (156003 - 156001)/156001 = 0.0128 ms/s). The deviation of the gains shown in this
list is a result of the changes in Windows 8 timekeeping. Windows 8 does not increment the system time by constant
increments; it rather applies a variety of increments to achieve a desired mean increment. As a consequence, the above
measurement would have to be taken over many more periods to show results with smaller deviations. However, it is very
obvious that the described adjustment scheme is fulfilled with Windows 8.

As of Windows 8.1, timekeeping has again undergone some modifications. The same hardware now reports 156250 for lpTimeIncrement.
The list of gains appears as follows:

This looks very much like the classical Windows XP adjustment gain scheme. It matches the
formula gain = (lpTimeAdjustment - lpTimeIncrement) / lpTimeIncrement.
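
Where the formula holds, it can also be inverted to pick the dwTimeAdjustment value closest to a desired gain. The following is a minimal sketch; the function names are hypothetical, and it assumes the platform actually follows the formula (which, as shown above, is not the case on Windows Vista and Windows 7).

```python
def gain_us_per_s(time_adjustment: int, time_increment: int) -> float:
    """Gain in microseconds per second according to the formula above."""
    return (time_adjustment - time_increment) / time_increment * 1e6


def adjustment_for_gain(desired_gain_us_per_s: float, time_increment: int) -> int:
    """Nearest dwTimeAdjustment (100 ns units) for a desired gain in microseconds per second."""
    return time_increment + round(desired_gain_us_per_s * 1e-6 * time_increment)


TIME_INCREMENT = 156250  # lpTimeIncrement as reported on the Windows 8.1 system

desired = 20.0  # microseconds per second, an arbitrary example value
adj = adjustment_for_gain(desired, TIME_INCREMENT)
print(f"desired {desired} us/s -> dwTimeAdjustment = {adj}, "
      f"gain actually set: {gain_us_per_s(adj, TIME_INCREMENT):.1f} us/s")
```

Because dwTimeAdjustment is an integer, the gain actually set generally differs from the desired gain by up to half a step.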

The system time adjustment ensures that the system time progresses by TimeAdjustment during
TimeIncrement. This effectively happened with Windows XP. Since Windows 8 (on specific hardware also since Windows 7)
this process may also appear as progress in smaller steps, depending on the setting of the timer resolution. When the timer
resolution is set to maximum resolution (see section 2.1. of Microsecond Resolution Time Services for Windows), the
obtained increments are in the same order of magnitude as the timer resolution. However, Windows 8 and Windows 8.1 maintain
the average progress of TimeAdjustment during TimeIncrement.

MSDN: "If the time difference between the local clock and the selected accurate time sample (also called the time skew)
is too large to correct by adjusting the local clock rate, the time service sets the local clock to the correct time."
[How the Windows Time Service Works]

4.3. Monitoring an NTP time provider

A much more detailed view of the system time adjustment can be obtained when the local time is compared to a precise remote
time while the system time adjustment is active. The accuracy of w32tm.exe is simply too poor to extract meaningful results.
Also, the accuracy of time.windows.com is unsatisfactory.

In order to facilitate a closer look at the problems described above, an NTP
(Network Time Protocol) client was added to the time services
and the user interface was extended by an NTP Offset tab. This makes it possible to see how the local
time progresses against a reference time.

The calibrated performance counter frequency receives an additional correction when a system time adjustment is active.
The system time adjustment forces the local time to advance slower or faster, thus the performance counter frequency has to
be corrected in a way that takes the modified duration of the "second" during the adjustment into account (see section 2.1.3. of
Microsecond Resolution Time Services for Windows).
Consequently, an applied system time adjustment becomes visible in the "Calibrated Performance Counter Frequency Offset" tab.
As of version 1.2, the calibrated performance counter frequency offset is given in ppm. It is referenced to the value
given by QueryPerformanceFrequency() and scaled to show deviation in parts per million. This corresponds to μs/s.
This way applied system time adjustment gains will directly show in the plot with real numbers.

The user interface now also provides two checkboxes. When the NTP checkbox is checked, NTP monitoring is activated.
The "Autoadjust" checkbox enables permanent synchronization of the local time to a network time:

The NTP status and the current offset to the network time are reported at the bottom in the NTP status line.
Another status line contains information about the automatic adjustment (see section 4.4 for more information on automatic adjustment).

The following two plots were captured when a system time adjustment was triggered by Windows XP:

Fig. 4.3.3: NTP Offset during the adjustment (Windows XP).

Fig. 4.3.2 shows that the performance counter frequency offset jumps to about 140 ppm. This corresponds to an initial adjustment
gain of 120 μs/s because the initial offset was already 20 ppm. The gain was reduced in steps over a long period of time (the total
adjustment lasted from 8:46 to around 16:00). In the first part, until about 11:33, the gain was reduced at roughly equal intervals.
At that point, the granularity of dwTimeAdjustment prohibited smaller steps and the time between the modifications of
dwTimeAdjustment was extended. This way, the target could be approached with a decreasing adjustment speed. The last step,
from about 13:50, corresponds to dwTimeAdjustment = 156250. The system time adjustment was still enabled, but the gain was 0.0 ms/s.
At this point, the system drifted with its own drift rate.

Typical drifts of local time are in the range of a few μs/s. However, the smallest gain obtainable on Windows XP is 1/156250 = 6.4 μs/s.
In practice, the drift may be higher than the smallest gain setting. This way, a final adjustment step may not move in the desired direction.
This can be seen in Fig. 4.3.3. As mentioned, the whole scheme of how and when the various gain settings are applied is worked out ahead
of the actual adjustment; however, the local drift can add a considerable offset when the adjustment takes many hours.
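
The interplay of gain granularity and drift can be illustrated with a small calculation. The drift value of 3 μs/s is an assumed example (the text only states "a few μs/s"), and the sign convention (a positive drift means the local clock loses time) is chosen here for illustration.

```python
TIME_INCREMENT = 156250  # lpTimeIncrement on the Windows XP system (100 ns units)


def net_rate_us_per_s(steps: int, drift_us_per_s: float) -> float:
    """Net rate of the local clock against the reference, in microseconds per second.

    steps: dwTimeAdjustment - lpTimeIncrement (integer number of 100 ns units)
    drift_us_per_s: rate at which the local clock loses time (assumed convention)
    """
    gain = steps / TIME_INCREMENT * 1e6
    return gain - drift_us_per_s


ASSUMED_DRIFT = 3.0  # microseconds per second, example value only

for steps in (0, 1, 2):
    rate = net_rate_us_per_s(steps, ASSUMED_DRIFT)
    print(f"dwTimeAdjustment = {TIME_INCREMENT + steps}: net rate {rate:+.1f} us/s")
```

With this drift, no integer setting cancels the drift: the local clock either keeps losing 3.0 μs/s or gains 3.4 μs/s, so the offset cannot be held constant by a fixed setting.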

As described in 4.2, a lot can fail during an adjustment on newer Windows versions. The following plot was recorded during an adjustment on Windows 7:

Fig. 4.3.4: Calibrated performance counter frequency during a system time adjustment (Windows 7).

The initial offset is about -40 ppm. The jump to 540 ppm indicates an initial gain of about 580 ppm or μs/s.
Due to poor resolution (granularity of gain), the sign of the adjustment gain changes after just 2 steps and remains there
for a long time (at least for another day). This is a typical example of a failing system time adjustment on a Windows 7 system.
The offset time is basically the sum of the adjustments and is completely messed up (large negative offset) during this attempt.

Windows 8 has fixed the limited resolution of dwTimeAdjustment and shows adjustments comparable to Windows XP.
The following two plots show a system time adjustment initiated by the Windows 8 internet time GUI:

Fig. 4.3.5: Calibrated performance counter frequency during a system time adjustment (Windows 8).

Fig. 4.3.6: NTP offset during a system time adjustment (Windows 8).

NTP monitoring was enabled at 10:15:15. From this point in time no adjustment was active and the system drifted at about 14.4 μs/s
until 10:33:13, when the NTP offset reached 0.5 s (500 ms) and the system time adjustment was enabled. The procedure was performed
by Windows in 11 steps, starting with dwTimeAdjustment = 156014:

The list shows the progress of the adjustment for each setting of dwTimeAdjustment followed by the period of time
during which dwTimeAdjustment was active. The gain was calculated using the expression given in 4. From this,
the adjustment contribution and the remaining offset were calculated. The adjustment scheme looks identical
to the scheme observed on Windows XP. Presumably no changes have been made to the system's adjustment tool.
However, some more details can be extracted from the list above:

It extends the duration to achieve a similar effect. However, the duration is unnecessarily fixed
to multiples of 1024 seconds (156004: 2048 s, 156003: 3072 s ...).

Reaching dwTimeAdjustment = 156003 causes the desired gain of 12.82 μs/s to be below the system's drift.
From this point onwards, the adjustment gain is not capable of compensating for the system's drift.
This also becomes very obvious in the NTP offset plot: from about 13:06 the offset starts to increase again.

Adjustment is effectively disabled at 15:06:17 by setting dwTimeAdjustment to 156001, which causes the (mean-) gain
to be zero. But lpTimeAdjustmentDisabled remains FALSE for an unknown reason. Even many hours later (past 19:10:00)
lpTimeAdjustmentDisabled was kept FALSE by Windows.

The observed offset at the end of the active adjustment was approx. 270 ms. The total adjustment time was 16386 s
(10:33:11 to 15:06:17, approx. 16 x 1024 s). The system's drift was 14.4 μs/s. At this drift rate the system drifted by
235.96 ms over the 16386 seconds. The difference to the observed offset of 270 ms is about 34 ms. This corresponds to the
remaining offset derived from the adjustment progress table.
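
The arithmetic of this estimate can be re-checked with a few lines. The timestamps, the drift rate and the observed offset are the values quoted in the text; the variable names are chosen here for illustration.

```python
from datetime import datetime

# Start and end of the active adjustment as quoted in the text.
start = datetime.strptime("10:33:11", "%H:%M:%S")
end = datetime.strptime("15:06:17", "%H:%M:%S")
duration_s = (end - start).total_seconds()

DRIFT_US_PER_S = 14.4       # system drift quoted in the text
OBSERVED_OFFSET_MS = 270.0  # approximate offset at the end of the adjustment

# Offset accumulated by drift alone over the whole adjustment.
drift_ms = DRIFT_US_PER_S * duration_s / 1000.0
difference_ms = OBSERVED_OFFSET_MS - drift_ms

print(f"duration:            {duration_s:.0f} s")
print(f"drift contribution:  {drift_ms:.2f} ms")
print(f"remaining difference: {difference_ms:.2f} ms")
```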

This evidently shows that Windows calculates an adjustment scheme based on a one-time offset measurement ahead of the
actual adjustment. Unfortunately, the scheme captured here leaves a remarkable remaining offset. The drift is
not taken into account at any time. This way an adjustment like the one shown here may take several hours to
bring the offset into the few milliseconds regime, and then just about the same time to drift back to where the offset was prior to
the attempt to adjust.
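
The effect of a precalculated scheme that ignores drift can be modelled roughly as follows. This is a sketch under stated assumptions: the schedule below is purely illustrative and is not the table captured on Windows 8, the function name is hypothetical, and the offset is defined here as the amount by which the local time lags behind the reference.

```python
def remaining_offset_ms(initial_offset_ms, schedule, time_increment, drift_us_per_s=0.0):
    """Offset left after applying a fixed schedule of (dwTimeAdjustment, duration_s) steps.

    A positive gain reduces the offset; a positive drift (local clock losing
    time) increases it. Both conventions are assumptions made for this sketch.
    """
    offset = initial_offset_ms
    for time_adjustment, duration_s in schedule:
        gain = (time_adjustment - time_increment) / time_increment  # s per s
        offset -= gain * duration_s * 1000.0                        # correction by the gain
        offset += drift_us_per_s * duration_s / 1000.0              # accumulated drift
    return offset


TIME_INCREMENT = 156001
# Illustrative schedule only: decreasing gains, durations in multiples of 1024 s.
SCHEDULE = [(156014, 1024), (156008, 1024), (156004, 2048), (156003, 3072)]

planned = remaining_offset_ms(250.0, SCHEDULE, TIME_INCREMENT)
actual = remaining_offset_ms(250.0, SCHEDULE, TIME_INCREMENT, drift_us_per_s=14.4)
print(f"remaining offset as planned (no drift): {planned:.1f} ms")
print(f"remaining offset with 14.4 us/s drift:  {actual:.1f} ms")
```

The difference between the two results is exactly the drift accumulated over the duration of the schedule, which a scheme based on a one-time offset measurement cannot see.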

Larger offsets are not adjusted using such a scheme. An offset of say 10 seconds is simply corrected by setting
the system time in one shot. This produces a jump in time which may be confusing to software, particularly when the
jump in time is backwards.

4.4. Synchronizing to an NTP time provider

Windows broadcasts a WM_TIMECHANGE message to all top level windows when a system time change occurs. This can be used to detect
changes of system time but it requires a window. However, there is no notification when the system time is adjusted. As a result, the system time changes
gradually without any notification other than the actual changes in the flow of time. This is an obvious drawback: the state of such
asynchronous behavior can only be estimated closely by calling GetSystemTimeAdjustment frequently.
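
Such polling might look like the following sketch. GetSystemTimeAdjustment is the kernel32 function discussed above; the helper names, the polling interval and the structure of the loop are assumptions made for illustration, and the Windows-specific reader is only usable on Windows.

```python
import ctypes
import time


def read_adjustment_win32():
    """Return (adjustment, increment, disabled) from GetSystemTimeAdjustment (Windows only)."""
    adjustment = ctypes.c_uint32()
    increment = ctypes.c_uint32()
    disabled = ctypes.c_int()
    ok = ctypes.windll.kernel32.GetSystemTimeAdjustment(
        ctypes.byref(adjustment), ctypes.byref(increment), ctypes.byref(disabled))
    if not ok:
        raise ctypes.WinError()
    return adjustment.value, increment.value, bool(disabled.value)


def watch_adjustment(read, polls, interval_s=1.0, sleep=time.sleep):
    """Poll `read` and return a list of (poll index, state) for every observed change.

    `read` is any callable returning the current adjustment state, e.g.
    read_adjustment_win32 on Windows. Changes between two polls go unnoticed,
    which is why the polling has to be frequent.
    """
    changes = []
    last = None
    for index in range(polls):
        state = read()
        if state != last:
            changes.append((index, state))
            last = state
        if index + 1 < polls:
            sleep(interval_s)
    return changes
```

On Windows, `watch_adjustment(read_adjustment_win32, polls=60)` would report every change of the adjustment state seen within one minute.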

Time control with high accuracy, as proposed by the Windows Timestamp Project, cannot accept the uncertainties and inaccuracies
described here. The proposed solution is continuous synchronization of the system time to a network time using NTP. This automatic
adjustment can be enabled by checking the "Autoadjust" checkbox of the GUI (Fig. 4.3.1). Synchronizations of the local time may
still occur asynchronously when scheduled by the operating system; however, the service described here is capable of detecting
and canceling them. Nevertheless, disabling the automatic synchronization provided by Windows (see the Windows GUI in section 1)
is recommended in order to obtain the greatest accuracy.

The following graph shows a Windows 8 system:

Fig. 4.4.1: Drift and autoadjust on a Windows 8 system.

NTP monitoring was started at around 18:16 and the local time drifted at a rate of about -14.2 μs/s. The NTP offset increased
from around 0.0005 s to around 0.015 s within the next 19 min (green plot line). At about 18:35, the autoadjust was enabled
and the local time was synchronized to the network time.

The effect of the system time adjustment on the performance counter frequency has been described in section 4.3.
The plot of the calibrated performance counter offset for the adjustment shown in Fig. 4.4.1 is given below:

Fig. 4.4.2: Adjustment steps on a Windows 8 system.

Fig. 4.4.1 shows that the network time is running faster and the local time loses about 14 μs/s. Positive gains are required to catch
up with the network time. The time service started by applying the smallest positive gain with dwTimeAdjustment = 156002. This resulted
in a gain of 0.0064 ms/s. Afterwards, the value of dwTimeAdjustment was incremented periodically. At a value of 156051, the gain
increased to 0.3205 ms/s. The dwTimeAdjustment was decremented periodically after half of the desired offset was adjusted.
A positive gain causes the system time to progress faster; the calibrated performance counter frequency consequently
gets lowered with positive gains. As already mentioned, the calibrated performance counter offset is normalized to
the performance counter frequency given by the system to show ppm. As a result the plot effectively shows negated gain
values (e.g. a gain of +18.2 μs/s will show as -18.2 ppm).
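
The relation between an applied gain and the plotted offset can be sketched as follows. The correction model used here (dividing the nominal frequency by 1 + gain) and the nominal frequency of 10 MHz are assumptions for illustration; the actual correction applied by the time service is described in the referenced section and may differ in detail.

```python
def calibrated_frequency_hz(nominal_hz: float, gain_us_per_s: float) -> float:
    """Counter ticks per (adjusted) system second while a gain is active.

    Assumed model: with a positive gain the system second is shorter,
    so fewer counter ticks fall into it.
    """
    return nominal_hz / (1.0 + gain_us_per_s * 1e-6)


def offset_ppm(calibrated_hz: float, nominal_hz: float) -> float:
    """Deviation of the calibrated frequency from the nominal frequency in ppm."""
    return (calibrated_hz - nominal_hz) / nominal_hz * 1e6


NOMINAL_HZ = 10_000_000.0  # example value for the result of QueryPerformanceFrequency

for gain in (18.2, -18.2, 320.5):
    shown = offset_ppm(calibrated_frequency_hz(NOMINAL_HZ, gain), NOMINAL_HZ)
    print(f"gain {gain:+.1f} us/s -> plotted offset {shown:+.1f} ppm")
```

For gains in the μs/s range the plotted value is, to a very good approximation, simply the negated gain.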

The continuous adjustment results in a mean offset of the network time to local time in the range of a few hundred microseconds.
However, this may be affected by network bandwidth and/or NTP server quality. The network time server pool used here is
pool.ntp.org (it is highly recommended to read the information provided by this site).
The accuracy of servers provided by this source typically outperforms the accuracy of time.windows.com. The available bandwidth is essential
for very high accuracy. Heavy traffic on the network connection may temporarily drop the level of accuracy to within a few milliseconds.

The next graph shows a continuous adjustment interrupted by a three minute drift phase in between to highlight the narrow band
in which the NTP offset is held during the adjustment:

This figure was taken as a screenshot of the GUI to show the estimated local drift. This local drift can be estimated from the mean
of the applied gains after a few minutes of continuous operation of "autoadjust". Its value appears in the "all output" tab and at the
end of the NTP status line when available.

The quality of adjustment becomes visible when the network time offset drifts. In just three minutes, the offset drifted to about 2.7 ms.
If high accuracy is required, it is not only necessary to synchronize the local time to a network time periodically; it is essential
to synchronize it continuously.

Note: Version 1.70 introduced the precision mode. The NTP capture leaves this mode when the offset exceeds 2 ms and re-enters it
when the offset is below 1.5 ms. This behavior was already visible in Fig. 4.3.1 and is also visible here.
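
The two thresholds form a simple hysteresis, which can be sketched as follows. Only the 2 ms and 1.5 ms limits are taken from the text; the class name, the use of the absolute offset and the initial state are assumptions, and the sketch says nothing about what the time service does differently while in precision mode.

```python
class PrecisionMode:
    """Hysteresis between the two offset thresholds quoted in the text."""

    LEAVE_ABOVE_MS = 2.0   # leave precision mode when the offset exceeds this
    ENTER_BELOW_MS = 1.5   # re-enter precision mode when the offset is below this

    def __init__(self, active: bool = False):
        self.active = active

    def update(self, offset_ms: float) -> bool:
        """Feed one NTP offset sample (ms) and return whether precision mode is active."""
        magnitude = abs(offset_ms)  # assumption: the sign of the offset does not matter
        if self.active and magnitude > self.LEAVE_ABOVE_MS:
            self.active = False
        elif not self.active and magnitude < self.ENTER_BELOW_MS:
            self.active = True
        return self.active


mode = PrecisionMode()
for sample in (1.0, 1.8, 2.1, 1.8, 1.4):
    print(f"offset {sample:.1f} ms -> precision mode: {mode.update(sample)}")
```

Between 1.5 ms and 2 ms the mode keeps its previous state, which avoids rapid toggling when the offset hovers around a single threshold.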

4.5. Conclusions

Windows synchronization to a network time reference has proven not to be very accurate. In particular, Windows Vista and Windows 7
seem to have lost some of the capabilities for some unknown reason. Unfortunately, there is not much information on this issue and
the little information available basically says that Windows time synchronization should not be expected to be more accurate than a
few seconds and that there may be a mishap in the behavior of SetSystemTimeAdjustment with respect to the meaning of the value of
dwTimeAdjustment.
Only Windows 8 has overcome these drawbacks; its system time adjustment performs like it did on Windows XP.

Unfortunately, there are still many NTP synchronization packages around which operate under the assumption of the current
MSDN description that "For each lpTimeIncrement period of time that actually passes, lpTimeAdjustment will be added
to the time of day". Evidently, this assumption is not true for Windows VISTA and Windows 7.
These versions need software that is capable of dealing with the artifacts described here to set the system
time correctly to obtain good accuracy.

Offsets of system time may drift by seconds per day. Even on systems with a low drift rate the drift can easily reach half
a second per day. This can only be overcome by a correction of the system's knowledge of its clock frequency. Newer Windows
versions calibrate the performance counter frequency (result of QueryPerformanceFrequency) at boot time when operating with TSC and/or HPET.
This was initially done by Windows 7 and has improved with Windows 8. But there does not seem to be an on-the-fly correction of
this value while a network time synchronization occurs. This is basically the reason for the noticeable drift and the need for
a continuous adjustment. Windows 8.1 has not shown any improvements with respect to the "built-in" system time adjustment.
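
The daily figures follow directly from the drift rate. A short conversion, using the 14.4 μs/s drift observed in section 4.3 as an example (the function names are chosen here for illustration):

```python
SECONDS_PER_DAY = 86_400


def drift_per_day_s(rate_us_per_s: float) -> float:
    """Offset accumulated in one day (seconds) at a constant drift rate (microseconds per second)."""
    return rate_us_per_s * 1e-6 * SECONDS_PER_DAY


def rate_for_daily_drift_us_per_s(drift_s_per_day: float) -> float:
    """Drift rate (microseconds per second) that accumulates the given offset per day."""
    return drift_s_per_day / SECONDS_PER_DAY * 1e6


print(f"14.4 us/s -> {drift_per_day_s(14.4):.2f} s per day")
print(f"0.5 s per day -> {rate_for_daily_drift_us_per_s(0.5):.2f} us/s")
```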

