Background on time scales (TAI, UTC, UT1)

Let me start with some background. I discussed elsewhere
the reason why the day is ever so
slightly longer than
86400 SI
seconds. One of the consequences of this is that there are
several different time scales involved in accurate timekeeping (which
I discuss in more
detail here). One
is TAI, which is a pure linear time scale,
maintained by several atomic clocks across the globe and
counting SI seconds on the geoid; it has been maintained
since 1958 (though it can conceptually be extended back into the past
as a variant of Terrestrial Time); it was initially synchronized
with universal time and has since drifted away from it, by
approximately 34.136 seconds as of today. Another
is Universal
time UT1 (there are other variants of universal time
such as UT2, but they will not concern us here): though
there are subtleties, this is essentially the real mean solar time on
Greenwich meridian, and it is the parameter of the real orientation of
Earth in space. So as measured in UT1, one solar day is
always 86400 seconds: conversely, this means that UT1
does not measure actual SI seconds, but rather some kind
of angle parameter. This time scale makes sense very far back in the
past.

Finally, we
have UTC, which is the most important time
standard as it is the basis of civil time: this is a compromise
between TAI and UT1, in the following
manner: since 1972[#], it always
differs from TAI by an integral number of seconds, and
always stays within 1s of UT1; the way these goals are
accomplished is by inserting (or possibly,
subtracting) leap
second discontinuities into UTC, at the end of June
or December of a year (or possibly March or September, but these
possibilities have never been used), as decided by the International
Earth Rotation Service and published at least three months in advance
(for
example, this
is the decision implying that no leap second will be inserted at
the end of this year). Unlike TAI and UT1,
the UTC time scale should not be considered as a pure
real number (or seconds count): instead, it should be viewed as a
broken-down time (year-month-day-hour-minute-second-fraction) in which
the number of seconds ranges from 0 to 60 inclusive (there
can be 61 or 59 seconds in a minute); during a positive leap second
the number of seconds takes the value 60 (while a negative leap second
would skip the value 59, but this has never occurred). For example,
today at noon civil time in Paris, UTC was (exactly)
2010-12-27T11:00:00.000 while TAI was (exactly)
2010-12-27T11:00:34.000 and UT1 was (approximately)
2010-12-27T10:59:59.863. The last leap second took place at the end
of 2008 (and increased the TAI−UTC
difference from 33s to 34s): the following six instants, separated
by half-second intervals, straddle it:

UTC=2008-12-31T23:59:59.0 = TAI=2009-01-01T00:00:32.0

UTC=2008-12-31T23:59:59.5 = TAI=2009-01-01T00:00:32.5

UTC=2008-12-31T23:59:60.0 = TAI=2009-01-01T00:00:33.0

UTC=2008-12-31T23:59:60.5 = TAI=2009-01-01T00:00:33.5

UTC=2009-01-01T00:00:00.0 = TAI=2009-01-01T00:00:34.0

UTC=2009-01-01T00:00:00.5 = TAI=2009-01-01T00:00:34.5

(Note that the instant at which the leap
second occurs is the same in every time zone, the local time is not.
So the last leap second, 2008-12-31T23:59:60 in UTC, was
2009-01-01T00:59:60 in Paris (+0100), 2008-12-31T18:59:60 in New
York (−0500), 2009-01-01T10:59:60 in Sydney (+1100), and so on:
Australians get their leap second at the end of the morning on New
Year's day or July 1, and Americans get theirs during the evening of
New Year's eve or June 30.)

If we attempt to condense UTC to a single number (say,
the number of seconds since 1970-01-01T00:00:00 or since
1900-01-01T00:00:00, or the number of 86400s-days since
1858-11-17T00:00:00, or something of the sort), we encounter the
problem that the same value can refer to two different instants since
the clock has been set back one second (negative leap seconds, of
course, would cause no such difficulty). One of the themes of what
follows is what Unix functions such as gettimeofday()
or clock_gettime() can, should, do or might return at
these various points in time.

What the spec says

The earlier versions of the Single
Unix specification stated that: The gettimeofday()
function obtains the current time, expressed as seconds and
microseconds since 00:00 Coordinated Universal Time
(UTC), January 1, 1970 (a moment known as the Unix
epoch). The meaning of this phrase is a bit obscure, because, after
all, leap seconds have elapsed just the same as non-leap seconds (so
one might interpret this sentence to mean the number of
seconds actually elapsed in a linear time scale such
as TAI since the Unix epoch). However, the intent is
clear, and some documentation specifies that leap seconds are not
counted or are ignored; this still leaves some room for doubt
as to what happens with the “rubber seconds” used
by UTC between 1970 and 1972. To make things even
clearer, versions 3 and 4 of the Single Unix specification have
introduced the following wording:

§4.15. Seconds Since the Epoch

A value that approximates the number of seconds that have elapsed
since the Epoch. A Coordinated Universal Time name (specified in
terms of seconds (tm_sec), minutes
(tm_min), hours
(tm_hour), days since January 1 of the year
(tm_yday), and calendar year minus 1900
(tm_year)) is related to a time represented as
seconds since the Epoch, according to the expression below.

If the year is <1970 or the value is negative, the relationship
is undefined. If the year is >=1970 and the value is non-negative,
the value is related to a Coordinated Universal Time name according to
the C-language expression,
where tm_sec, tm_min, tm_hour, tm_yday,
and tm_year are all integer types:

The relationship between the actual time of day and the current
value for seconds since the Epoch is unspecified.

How any changes to the value of seconds since the Epoch are made to
align to a desired relationship with the current actual time is
implementation-defined. As represented in seconds since the Epoch,
each and every day shall be accounted for by exactly 86400
seconds.

The key point here is that instead of using a vague expression such
as number of seconds elapsed since <Epoch> not counting leap
seconds, a precise expression is given to convert a
broken-down UTC value to a time as seconds since the
Epoch. But the meaning is the same. And the consequence is as
mentioned above: by reducing UTC to a single number, we
have an ambiguity in the actual instant being referred to. For example,
if we apply the formula to the same six instants as above, we get:

UTC=2008-12-31T23:59:59.0 → seconds_since_epoch=1230767999[.0]

UTC=2008-12-31T23:59:59.5 → seconds_since_epoch=1230767999[.5]

UTC=2008-12-31T23:59:60.0 → seconds_since_epoch=1230768000[.0]

UTC=2008-12-31T23:59:60.5 → seconds_since_epoch=1230768000[.5]

UTC=2009-01-01T00:00:00.0 → seconds_since_epoch=1230768000[.0]

UTC=2009-01-01T00:00:00.5 → seconds_since_epoch=1230768000[.5]

So the value 1230768000 can refer to two different consecutive
seconds: the leap second itself (that is, the start of the leap second
if we indicate a precise instant) and the next second (that is, the
end of the leap second); or, if we
consider seconds_since_epoch to be a
fractional quantity (as indicated within brackets), then every value
between 1230768000.0 inclusive and 1230768001.0 exclusive refers to
two instants at a one-second interval. So it is not possible to
unambiguously break down a seconds_since_epoch value into
hours, minutes and seconds (and, based on this value, one could never
actually infer a time of 23:59:60).

This is what the spec says; however, that does not mean
that gettimeofday() actually returns these values on real
Unix systems. What actually happens depends on the implementation
details.

What actually happens

The TAI−10 setup

First, there are some people who suggest ignoring the spec and
basing gettimeofday() not on UTC as
the spec defines it (number of seconds counted by UTC
since 1970-01-01T00:00:00(UTC) excluding leap seconds,
see above) but on TAI as the number of seconds since
1970-01-01T00:00:10(TAI), sometimes summarized
as TAI−10. The reason for the 10s offset is
that it was the TAI−UTC offset in 1972
when leap seconds were introduced, so that TAI−10
coincides, post-1972, with the number of seconds counted
by UTC since
1970-01-01T00:00:00(UTC) including leap seconds.
The deduction of leap seconds is then left to the time zone file,
i.e., converting to UTC or any civil time is considered
as a time zone shift. This proposal was made by Arthur David Olson
(founding author of
the timezone
database) and, for this reason, a series of time zones, known as
the right/ time zones exist in the database which assume
that the system clock is set to TAI−10. The main
advantage of this setup is that it is unambiguous: for the six
sample instants above one would have

TAI=2009-01-01T00:00:32.0 → tai_minus_10=1230768022[.0]

TAI=2009-01-01T00:00:32.5 → tai_minus_10=1230768022[.5]

TAI=2009-01-01T00:00:33.0 → tai_minus_10=1230768023[.0]

TAI=2009-01-01T00:00:33.5 → tai_minus_10=1230768023[.5]

TAI=2009-01-01T00:00:34.0 → tai_minus_10=1230768024[.0]

TAI=2009-01-01T00:00:34.5 → tai_minus_10=1230768024[.5]

The advantage of this scheme is that TAI−10
uniquely refers to an instant in time, and the difference between two
such values determines the interval between these instants (but see
the next paragraph for a related misconception). It is only by
setting the system time to TAI−10 and using
the right/ time zones that a Unix system can presently
display the time correctly during a leap second
(as 23:59:60 in the +0000 time zone, say). The
disadvantages, however, in my mind outweigh the advantages.
Using TAI−10 breaks the spec: it will confuse not
only programs that make an explicit assumption about the reference of
time (say, ephemerides programs), but also interoperability
(timestamps written in many Unix-related filesystems and formats,
e.g., tar: the twenty-odd-second difference
between UTC and TAI−10 is no longer so
negligible that we can ignore it) and probably a number of programs
that rely on the fact that one can convert a time to a date by simple
integer division by 86400. Also, there is no reliable way to
synchronize clocks to TAI (GPS receivers
and NTP servers, for example, only
broadcast UTC, not TAI). And even ignoring
the fact that we have an explicit commitment to UTC by
the spec and the notion of civil time, using some form of universal
time as a time basis is probably a good thing because we are still
more concerned with the position of the Sun in the sky than some
abstract thing like the flow of time measured by atomic clocks on the
geoid. Maybe using UT1 as time basis would have been
smarter (as long as PCs don't have atomic clocks built
into them[#2]), because it is
easier to skew the clock ever so slightly than to insert a full second
every now and then, but anyway.

Digression: measuring intervals

At this point, I should probably rebut a misconception: the idea
that one can/should use gettimeofday() to measure
intervals. This is wrong not merely because leap seconds can set the
system clock back by one second every now and then: there are many
other reasons why the system clock can be readjusted after getting out
of synch, and one should consequently avoid taking the difference
between values of gettimeofday() to measure delays.
Instead, use clock_gettime(CLOCK_MONOTONIC,...):
the gettimeofday() function should only be used to obtain
the current date and time (as a wall clock), not to measure
intervals (as a stopwatch) except if those intervals span
over several months (or, perhaps more to the point, if they are
expected to survive over a reboot). One should imagine that the
accuracy of gettimeofday() is never too good and that the
error can vary from one measurement to the next (because the clock can
be reset), while clock_gettime(CLOCK_MONOTONIC,...) gives
you a slowly changing error: it cannot tell you the date, but it is more
appropriate for measuring intervals in time.

Keeping UTC in reality

Now let us return to gettimeofday(): what if we do
follow the spec and use UTC as a time basis for the
system clock? If the system is completely unaware of leap seconds, it
will tick past the leap second without noticing it, and (if
synchronized with some external time source like NTP)
suddenly realize that it is ticking one second early and attempt to
resynchronize. One possibility is that resynchronization will be
achieved by slowing the clock by a few dozen parts per million, thus
effectively diluting the error caused by leap second into a few hours
or perhaps a day. This is a common view of things (and perhaps
desirable, see the suggestion about CLOCK_UTS below), but
I am uncertain whether any real-life system (typically a Unix system +
an NTP implementation or some other timekeeping device)
actually does this. Another possibility is that resynchronization
will be performed brutally, by stepping the system clock back one
second: this is like counting the leap second, except that it is
performed a few minutes or hours too late and at an unpredictable
moment: obviously undesirable. Now assume the system clock handling
part in the kernel knows about leap seconds and tries to handle them
gracefully: what can it actually do?

The NTP protocol attempts to do things as follows
(described here in
detail by
the inventor
of NTP): NTP packets record time as
seconds counted by UTC since
1900-01-01T00:00:00(UTC) excluding leap seconds, but that
is not really relevant and the actual handling of the leap second is
really left
to
the kernel
clock discipline. The idea is that the system clock will be set
back by one second during the leap second, but that for the duration
of this second, gettimeofday() will nearly stall: it
returns a constant value incremented by the smallest possible
increment at each call. (Personally, I don't really see the point of
the increment: anyone assuming that gettimeofday() must
strictly increase between two subsequent calls is in error.) So if we
take our recurring example of the leap second at the end of 2008 and
assume gettimeofday() is called every half-second as
displayed, we would get:

UTC=2008-12-31T23:59:59.0 → gettimeofday() returns 1230767999.0

UTC=2008-12-31T23:59:59.5 → gettimeofday() returns 1230767999.5

UTC=2008-12-31T23:59:60.0 → gettimeofday() returns 1230768000.0

UTC=2008-12-31T23:59:60.5 → gettimeofday() returns 1230768000.000001

UTC=2009-01-01T00:00:00.0 → gettimeofday() returns 1230768000.000002

UTC=2009-01-01T00:00:00.5 → gettimeofday() returns 1230768000.5

One could argue that this violates the spec: I don't think it does
because the spec only seems to say something about the integral part
of seconds_since_epoch, but anyway that would
be reading the spec in a very anal way. However, it is still
profoundly unsatisfactory, because we lose all accurate timekeeping
during the leap second itself, and this still leaves no way of
displaying a time as 23:59:60 (or distinguishing 23:59:60
from 00:00:00). There is no point in using NTP to
synchronize clocks to a few milliseconds if we have to lower the
expectation to one second every time a leap second is inserted.
Incidentally, what the actual NTP protocol should show
during the leap second itself
is very
obscure (but that doesn't matter much because the servers can
always be interrogated again one second later).

A proposal: CLOCK_UTC and friends

What I suggest to fix this mess is to create a new clock specifier
to clock_gettime(), called, for
example, CLOCK_UTC, which differs
from gettimeofday() (or
equivalently CLOCK_REALTIME) in only one
respect: during a leap
second, clock_gettime(CLOCK_UTC,...) would return the
same value in tv_sec as during the previous second, and a
value in tv_nsec which is greater than 1000000000.
E.g.:

UTC=2008-12-31T23:59:59.0 → tv_sec=1230767999, tv_nsec=0

UTC=2008-12-31T23:59:59.5 → tv_sec=1230767999, tv_nsec=500000000

UTC=2008-12-31T23:59:60.0 → tv_sec=1230767999, tv_nsec=1000000000

UTC=2008-12-31T23:59:60.5 → tv_sec=1230767999, tv_nsec=1500000000

UTC=2009-01-01T00:00:00.0 → tv_sec=1230768000, tv_nsec=0

UTC=2009-01-01T00:00:00.5 → tv_sec=1230768000, tv_nsec=500000000

There are a number of nice features about this scheme. The
creation of a new clock CLOCK_UTC is meant to preserve
compatibility of gettimeofday() for those programs which
believe that tv_nsec should always be less than 1000000000. The
values of tv_sec and tv_nsec returned
by clock_gettime(CLOCK_UTC,...) uniquely determine the
full value of UTC and (with the knowledge of the leap
seconds table) the instant in time. The duration of the leap second
is characterized by the simple
test tv_nsec>=1000000000L. The boundaries of days are
simply determined by the value of tv_sec being a multiple
of 86400 (whereas this is not the case
with gettimeofday() as above: it returns a multiple of
86400 at 23:59:60, just one second before the beginning of the new
day), so converting a time to a date becomes straightforward. And
even displaying the time becomes a simple matter of adding 1 to
the tm_sec field in a broken-down time (returned
by gmtime() or localtime()) if
the tv_nsec value exceeds 1000000000, after the
time has been broken down:

Assuming clock_gettime(CLOCK_UTC,...) works as I
suggest, this would correctly display the time
as 23:59:60 (and fraction) during a leap second, and the
same trick works for local time (using localtime()
instead of gmtime()) as well as universal time.
The strftime() function is designed to work with a value
of tm_sec of 60, because leap seconds can exist (and
indeed do if one uses the right/ time zones).
Using while instead of if in the
test t.tv_nsec >= 1000000000L above ensures that the
code will still work even if double leap seconds are introduced some
day. So this code is very robust. Negative leap seconds, of course,
present no difficulty: they are simply skipped.

Similarly, should one wish to convert the return value
of clock_gettime(CLOCK_UTC,...) to an accurate value
of TAI−10, one would simply consult a table of leap
seconds, count the number of those whose time of occurrence is less
than tv_sec (and subtract negative leap seconds whose
time of occurrence is similarly less than tv_sec) to
compute the (TAI−10)−UTC offset,
and add this result to tv_sec (and then subtract
1000000000 from tv_nsec and increase tv_sec
so long as tv_nsec exceeds 1000000000). The point here
is that whether or not the leap second has elapsed (and hence the
value of (TAI−10)−UTC) is
entirely reflected in the tv_sec value,
ignoring tv_nsec. While this computation can be done in
user space, it might be argued that the kernel could/should store
the TAI offset and provide
a clock_gettime(CLOCK_TAIMINUS10,...) as well.

Besides clock_gettime(CLOCK_UTC,...)
and possibly clock_gettime(CLOCK_TAIMINUS10,...) which
have been discussed here, it might also be desirable for the kernel to
provide a clock_gettime(CLOCK_UTS,...) clock,
where UTS stands for Universal Time, Smoothed:
this clock would be equal to UTC except for a few hours
after a leap second, where it would slowly compensate for the latter
by speeding up by a small (but unspecified) factor. This would be for
the benefit of programs wishing to “ignore away” leap
seconds while keeping a reasonable level of precision when measuring
durations.

So far I haven't actually looked into what goes on inside the
kernel (say, Linux): the little that I have seen
(in kernel/time/timekeeping.c and so on) is messy and
undocumented, so it's hard to gauge how hard it would be to actually
implement this clock_gettime(CLOCK_UTC,...) proposal;
and, of course, it would probably be even harder to get it accepted.
Even assuming it can be done, this does not fully solve the leap
second mess: some filesystems provide sub-second granularity on
timestamps, and it is hard to fix the stat() return
structure to provide correct access to times inside a leap second
without breaking compatibility with those programs that might have
already assumed that the sub-seconds field in struct stat
will never exceed 1000000000. It is also unclear which clock or which
time scale is referred to when a timeout is specified in a function or
system call
(e.g., select(), pthread_cond_timedwait()…),
though the
Linux timerfd
interface solves this particular problem. And, of course, there
are zillions of programs in existence which simply
call gettimeofday(), that need to be fixed one way or
another to work properly inside a leap second; and zillions of
programming languages beside C which would need to be given access to
some accurate way of measuring time. I find this horribly
depressing.

Footnotes

[#] The UTC
time scale has existed since 1961, but before 1972, the relation
between UTC and TAI was more complex: not
only were there discontinuities, but also the length of the second
itself was adjusted to keep closer to UT1;
and TAI−UTC was not kept integral.
For example, the Unix epoch of UTC=1970-01-01T00:00:00
equals TAI=1970-01-01T00:00:08.000082 (exactly).

[#2] A
typical PC quartz seems to have an accuracy of about ten
to fifty parts per million, i.e., a couple of seconds per day.
A clock counting SI seconds without any attempt at correction
tracks UT1 to within about 2ms per day, i.e., 20
parts per billion, so that's around a thousand times better.
Hence, without external assistance (say by NTP, an atomic
clock or a GPS receiver), the typical PC
quartz cannot see or hope to see the difference between universal time
and atomic time.