Friday, March 24, 2017

On Linux vDSO and clock_gettime sometimes being slow

Like the previous post on this somewhat dormant blog, I want to share an oddity I discovered that no search engine could really find for me - even though once I found what the problem was, it turns out I was by no means the first person to discover this.

Some system calls that are used extremely frequently in Linux can be sped up by a mechanism called vDSO: a virtual dynamic shared object. Through it, the kernel publishes selected functions that run entirely in userspace. A regular program dynamically links in bits of kernel-supplied code, which in turn means there is no overhead from "jumping into the kernel" to execute that code. All good.
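The mapping is visible from inside any process. As a minimal sketch, assuming Linux with glibc 2.16 or later (for getauxval), the following asks the auxiliary vector where the kernel placed the vDSO and checks that the mapping starts with the ELF magic. The helper names vdso_base and vdso_looks_like_elf are made up for this illustration.

```c
#include <elf.h>        /* ELFMAG, SELFMAG */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval, AT_SYSINFO_EHDR */

/* Hypothetical helper: address where the kernel mapped the vDSO,
 * or NULL if the auxiliary vector has no AT_SYSINFO_EHDR entry. */
const void *vdso_base(void)
{
    return (const void *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: 1 if the mapping begins with "\177ELF", else 0. */
int vdso_looks_like_elf(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

Calling vdso_base() and printing the pointer with %p should give an address that matches the [vdso] line in /proc/self/maps for the same process.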

One way you notice your system call has received the vDSO treatment is that "strace" and friends no longer see it, since there actually is no system call anymore.
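To see the difference for yourself, it helps to have both paths side by side. This is a sketch, assuming a 64-bit Linux system with glibc: the first function goes through the C library (and so normally through the vDSO), the second forces a real kernel entry with syscall(2). The function names are invented for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_clock_gettime */
#include <time.h>
#include <unistd.h>       /* syscall */

/* Goes through the C library, which normally dispatches to the vDSO.
 * For CLOCK_MONOTONIC this should not show up in strace. */
int time_via_libc(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Bypasses the vDSO and enters the kernel directly.
 * This one always shows up in strace. */
int time_via_syscall(struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}
```

Running a small program that calls each function once under strace should show a clock_gettime line for the second call only.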

Of specific interest are time-related calls, like gettimeofday and clock_gettime. Many programs make a ton of these calls, and little can be done to prevent it. You might want to cache the current time perhaps, but to do so, you'd need to know the time. So quite a lot of code relies on time-related system calls being really, really fast.

Within PowerDNS software (dnsdist), we use clock_gettime() in hopes of getting the kind of timer we want, and also one that is fast and cheap for the kernel to provide. While doing "million QPS" scale benchmarking of dnsdist today, we did a strace to find out what dnsdist was doing, and lo, within there we found millions and millions of system calls to clock_gettime(). Help!

My first thought was that the platform we were on might perhaps not actually support clock_gettime as vDSO. To figure out what is actually in the kernel-supplied vDSO, I used a program called dump-vdso.c that can be found strewn across the web. This emits the library on stdout, and we can then run the regular objdump tool on it to list the symbols it exports.

From this we see that clock_gettime is in fact in there. So why was it not getting used? I donned the protective gear and the spelunking equipment and entered the caves of glibc, where I found several nested files, each #including a file from a parent directory, in an impressive attempt to abstract out per CPU, per OS and C library logic. I stared at that code for what felt like a long time, but it appeared to check lots of things, to eventually always end up calling __vdso_clock_gettime(). Weird.

I then headed to __vdso_clock_gettime() in the Linux kernel, where things finally became clear. It turns out the vDSO code ITSELF will generate an actual system call for many of the timers you can request. In fact, this happens for all cases except CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE (as of Linux 3.13 up to 4.11-rc3).

So that solved the mystery: the vDSO stuff was working, but it was itself causing an old fashioned system call. Perhaps the other timers are too difficult (or perhaps even impossible) to supply from the userspace context.
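The effect is easy to measure. The sketch below, assuming Linux and a reasonably recent glibc, times a loop of clock_gettime() calls for a given clock id and reports the average cost per call in nanoseconds. The name avg_ns_per_call is invented here, and the numbers you get will depend entirely on your kernel, CPU and clocksource.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a single nanosecond count. */
static int64_t to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Hypothetical helper: average wall-clock cost, in nanoseconds, of one
 * clock_gettime(id, ...) call. Returns -1.0 on error or bad input. */
double avg_ns_per_call(clockid_t id, long iterations)
{
    struct timespec start, end, scratch;

    if (iterations <= 0 || clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (long i = 0; i < iterations; i++) {
        if (clock_gettime(id, &scratch) != 0)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    return (double)(to_ns(&end) - to_ns(&start)) / (double)iterations;
}
```

Comparing avg_ns_per_call(CLOCK_MONOTONIC, 1000000) with avg_ns_per_call(CLOCK_PROCESS_CPUTIME_ID, 1000000) on the kernels mentioned above should show the second one paying for a real system call on every iteration.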

3 comments:

On Skylake cores, the vDSO times out and always calls the syscall (tested on 4.9.54) since it wants to talk to the hypervisor (that isn't there). Disable virtualization and the cycle count for clock_gettime() drops from 6900 cycles to 59 cycles.

Further analysis shows that it was clocksource=hpet that caused the most TSC problems. Dell has taken to modifying the TSC rate on the fly, so it no longer is a constant which makes this much worse than it first was. Since HPET is also unstable as far as the kernel is concerned, it never settles, and will eventually revert to clocksource=tsc. Turning off virtualization just made the kernel realize that hpet was unreliable faster, and it switches back to TSC, which is also unreliable. But ptpd/timekeeper compensates for that.

It seems like even though CLOCK_MONOTONIC, along with the others you listed, might not do an actual system call with vDSO enabled, this post suggests that it might still be very slow (e.g. a few microseconds) due to the implementation: https://stackoverflow.com/questions/45863729/clock-gettime-might-be-very-slow-even-using-vdso