Monday, February 6, 2012

In java world there is a very good perception about System.nanoTime(). There is always some guys who says that it is fast, reliable and, whenever possible, should be used for timings instead of System.currentTimemillis(). In overall he is absolutely lying, it is not bad at all, but there are some drawback which developer should be aware about. Also, although they have a lot in common, these drawbacks are usually platform-specific.

Windows

Functionality is implemented using QueryPerformanceCounter API, which is known to have some issues. There is possibility that it can leap forward, some people are reporting that is can be extremely slow on multiprocessor machines, etc. I spent a some time on net trying to find how exactly QueryPerformanceCounter works and what is does. There is no clear conclusion on that topic but there are some posts which can give some brief idea how it works. I would say that the most useful, probably are that and that ones. Sure, one can find more if search a little bit, but info will be more or less the same.

So, it looks like implementation is using HPET, if it is available. If not, then it uses TSC with some kind of synchronization of the value among CPUs. Interestingly that QueryPerformanceCounter promise to return value which increases with constant frequency. It means that in case of using TSC and several CPUs it may have some difficulties not just with the fact that CPUs may have just different value of TSC, but also may have different frequency. Keeping all that in mind Microsoft recommends to use SetThreadAffinityMask to stuck thread which calls to QueryPerformanceCounter to single processor, which, obviously, is not happening in JVM.

Linux

Linux is very similar to Windows, apart from the fact that it is much more transparent (I managed to download sources :) ). The value is read from clock_gettime with CLOCK_MONOTONIC flag (for real man, source is available in vclock_gettime.c from Linux source). Which uses either TSC or HPET. The only difference with Windows is that Linux not even trying to sync values of TSC read from different CPUs, it just returns it as it is. It means that value can leap back and jump forward with dependency of CPU where it is read. Also, in contrast to Windows, Linux doesn't keep change frequency constant. On the other hand, it definitely should improve performance.

Solaris

Solaris is simple. I believe that via gethrtime it goes to more or less the same implementation of clock_gettime as linux does. The difference is that Solaris guarantees that counter will not leap back, which is possible on Linux, but it is possible that the same value will be returned back. That guarantee, as can be observed from source code, is implemented using CAS, which requires sync with the main memory and can be relatively expensive on multi-processor machines. The same as on Linux, change rate can vary.

Conclusion

The conclusion is king of cloudy. Developer has to be aware that function is not perfect, it can leap back or just forward. It may not change monotonically and change rate can vary with dependency on CPU clock speed. Also, it is not as fast as many may think. On my Windows 7 machine in a single threaded test it is just about 10% faster than System.currentTimeMillis(), on multi threaded test, where number of threads is the same as number of CPUs, it is just the same. And on IBM Z400 workstation with WinXP System.nanoTime() is always approximately 8 times slower.

So, in overall, all it gives is increase in resolution, which may be important for some cases. And as a final note, even when CPU frequency is not changing, do no think that you can map that value reliably to system clock, see details here (this example describes just Windows, but more or less the same stuff is applicable to all other OSes).

Appendix

Appendix contains implementations of the function for different OSes. Source code is from OpenJDK v.7.

Solaris

// gethrtime can move backwards if read from one cpu and then a different cpu
// getTimeNanos is guaranteed to not move backward on Solaris
inline hrtime_t getTimeNanos() {
if (VM_Version::supports_cx8()) {
const hrtime_t now = gethrtime();
// Use atomic long load since 32-bit x86 uses 2 registers to keep long.
const hrtime_t prev = Atomic::load((volatile jlong*)&max_hrtime);
if (now <= prev) return prev; // same or retrograde time;
const hrtime_t obsv = Atomic::cmpxchg(now, (volatile jlong*)&max_hrtime, prev);
assert(obsv >= prev, "invariant"); // Monotonicity
// If the CAS succeeded then we're done and return "now".
// If the CAS failed and the observed value "obs" is >= now then
// we should return "obs". If the CAS failed and now > obs > prv then
// some other thread raced this thread and installed a new value, in which case
// we could either (a) retry the entire operation, (b) retry trying to install now
// or (c) just return obs. We use (c). No loop is required although in some cases
// we might discard a higher "now" value in deference to a slightly lower but freshly
// installed obs value. That's entirely benign -- it admits no new orderings compared
// to (a) or (b) -- and greatly reduces coherence traffic.
// We might also condition (c) on the magnitude of the delta between obs and now.
// Avoiding excessive CAS operations to hot RW locations is critical.
// See http://blogs.sun.com/dave/entry/cas_and_cache_trivia_invalidate
return (prev == obsv) ? now : obsv ;
} else {
return oldgetTimeNanos();
}
}

2 comments:

On Linux (RedHat at least) you can tune what System.nanoTime does (speed vs. precision): https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-General_System_Tuning-gettimeofday_speedup.html