Requirements

(these notes are from the face-to-face meeting on Jan 10th)

- The clock needs to support instrumentation spanning the boot-up period, which
may last up to 2 minutes. For warm resets, the clock may not be reinitialized (is this true?)
Therefore, even for bootup measurements, the clock may be running far into its cycle when
the measurements are made.
- Note that embedded processors have CPU clocks ranging from a few MHz to
a few GHz. A 1 GHz clock will overrun 32 bits in about 4 seconds.
- The result is that we should probably use a 64-bit clock value.
- For supporting periods of several seconds, a clock driver will need
to manage the upper bits to handle hardware clock overrun (wrap).
- There should be a config option to turn support for this feature on or off.
- The instrumentation clock must be available early
(before calibrate_delay()). It should probably be initialized inside setup_arch()
- Note that a free-running clock could be set up by firmware
(before kernel start)
- Note that often you can use same clock as system clock (used for
jiffies)
- Resolution of 1 usec or finer is desired.
- Accuracy of 100 usec is desired for our current bootup time measurement work.
- The value returned should not fluctuate with changes in CPU frequency.
- Alternatively, the CPU frequency should not be modified during the
timing period. For the BTWG, the timing period is system bootup, so it
should be easy to avoid changing CPU frequency during this period. (??)
- The value returned should be monotonically increasing (except on rollover, or when
some process specifically sets the clock).
- The values returned need not be linearly related to time. That is, it is acceptable
for the values to be non-linear, as long as the conversion to time results (sec, nsec)
is correct. Thus, as one example of value management, it is possible to
store the hardware clock value in the low 32 bits, and the number of rollovers
in the high 32 bits. This works even if the clock source itself is less than
32 bits wide (e.g., 12 bits or 16 bits).
- The API should be available on all architectures of interest (e.g., a CPU
cycle-counter read is not available on ARM or SH).
- It should add minimal overhead to the system.
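As a sketch of the rollover-management requirement above: a driver can widen a
narrow free-running counter to 64 bits by keeping the raw value in the low bits
and a wrap count in the high bits. All names here are illustrative (not from any
real driver), and the widening function must be called at least once per hardware
wrap period (e.g., from the tick handler) or wraps will be missed.

```c
#include <stdint.h>

/* Hypothetical 16-bit free-running hardware counter. */
#define HW_COUNTER_BITS  16
#define HW_COUNTER_MASK  ((1u << HW_COUNTER_BITS) - 1)

static uint32_t rollovers;   /* count of hardware wraps observed */
static uint32_t last_raw;    /* last raw counter value seen      */

/* Widen the narrow counter to 64 bits: raw value in the low bits,
 * rollover count in the high bits. */
uint64_t clock_read_wide(uint32_t raw)
{
    raw &= HW_COUNTER_MASK;
    if (raw < last_raw)      /* counter wrapped since last read */
        rollovers++;
    last_raw = raw;
    return ((uint64_t)rollovers << HW_COUNTER_BITS) | raw;
}
```

This is the "rollovers in the high 32 bits" scheme described above, just with a
16-bit example counter; the composed value is monotonic even though the raw
counter is not.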

Proposed Specification

Old proposed spec.

- unsigned long long get_cycles(void) - which maps to get_arch_cycles()
- unsigned long cycles_to_usec(unsigned long long) - which maps to arch_cycles_to_usec()

Problems:

- usecs returned in an unsigned 32-bit value overflow after about 4000 seconds. This
should be enough for a reasonable bootup time.
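The "about 4000 seconds" figure is just 2^32 microseconds expressed in seconds,
which a one-line check confirms:

```c
#include <stdint.h>

/* A 32-bit microsecond count wraps at 2^32 us:
 * 4294967296 / 1000000 = 4294 seconds (roughly 71.5 minutes). */
static unsigned long usec32_wrap_seconds(void)
{
    return (unsigned long)(((uint64_t)1 << 32) / 1000000);
}
```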

Another old proposed spec. (deprecated)

New proposed spec.

- use sched_clock(), and admonish board-support authors to implement it properly
- also, document methods to provide the scaling factor prior to time_init(),
so that measurements are available from the very start of kernel execution.
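Under this spec, instrumentation reduces to taking nanosecond deltas. A userland
sketch (sched_clock() is stubbed here with a fake counter; in the kernel it
returns an unsigned long long nanosecond count, and the helper names are
illustrative):

```c
#include <stdint.h>

static uint64_t fake_ns;                         /* stand-in counter */
static uint64_t sched_clock(void) { return fake_ns; }

static uint64_t phase_start;

/* Record the start of a boot phase, then report its duration
 * in whole microseconds. */
static void phase_begin(void)   { phase_start = sched_clock(); }
static uint64_t phase_usecs(void)
{
    return (sched_clock() - phase_start) / 1000;
}
```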

printk-times

In the patch submitted to the forum, this was only supported on x86.
highres_timer_ticks_to_timeval used a hardcoded value which had
to be set at compile time. highres_timer_read_ticks used a read of
the Time Stamp Counter (TSC) of the central processor.
Also, highres_timer_ticks_to_timeval did not handle rollover
from fast to slow very well.

Current implementation for x86 (see Printk Times) uses hardcoded
cpu_fixed_khz (in the 2.4 version of the patch), which requires
a code change for different targets.

These techniques are not portable.

Problems:

- separation of the timer value into high and low 32-bit values
doesn't seem necessary
- it doesn't resemble other clock-read APIs in the kernel
- it should use the term "clock" instead of "timer"
- it uses a hard-coded (compiled-in) value for the conversion function

Current Linux (2.4) get_cycles()

Linux version 2.4.20 has:

- typedef unsigned long long cycles_t
- cycles_t get_cycles(void) defined in include/asm/timex.h
This returns 0 on x86 processors without a TSC, and 0 on some other processors
- supported on x86, ppc, mips, alpha
- not supported on arm, sh

There appears to be no supporting function to convert to usecs.
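Lacking a conversion helper, a 2.4 caller has to pair get_cycles() with a CPU
frequency it obtains itself. A userland sketch (the TSC read is stubbed, and
cpu_khz is passed explicitly here to keep the sketch self-contained, unlike the
proposed one-argument cycles_to_usec() above):

```c
#include <stdint.h>

typedef unsigned long long cycles_t;

static cycles_t fake_tsc;                        /* stand-in for TSC */
static cycles_t get_cycles(void) { return fake_tsc; }

/* Convert a cycle count to microseconds given the CPU clock in kHz:
 * cycles per usec = khz / 1000, so usec = cycles * 1000 / khz. */
static unsigned long cycles_to_usec(cycles_t cycles, unsigned long cpu_khz)
{
    return (unsigned long)(cycles * 1000 / cpu_khz);
}
```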

Current Linux (2.6) sched_clock()

Linux version 2.6.7 has:

- sched_clock() - returns current time in nanosec units.
- unsigned long long sched_clock(void)
- this routine won't function correctly (it returns 0) until a valid scale
factor is set (for x86 and ppc). For x86, this means until the routine
set_cyc2ns_scale() is called, which normally happens from time_init().
- on x86, it reads TSC.
- on x86, found in arch/i386/kernel/timers/timer_tsc.c
- set_cyc2ns_scale() (x86 only)
- static inline void set_cyc2ns_scale(unsigned long cpu_mhz)
- initializes the conversion factor for the clock scaling of sched clock.
- this is also called?? for CPU frequency changes

- this only gives jiffy accuracy (10 ms, when HZ=100)
- this is completely unacceptable for microsecond timings
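The conversion behind set_cyc2ns_scale() is a multiply-and-shift: the ratio
10^6/khz (nanoseconds per cycle) is pre-scaled by 2^10 to keep fractional
precision. A userland sketch in the style of the x86 code (the shift width and
names are illustrative):

```c
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10

static unsigned long cyc2ns_scale;

/* Precompute (10^6 / cpu_khz) << CYC2NS_SCALE_FACTOR, i.e. the
 * nanoseconds-per-cycle ratio with 10 fraction bits. */
static void set_cyc2ns_scale(unsigned long cpu_khz)
{
    cyc2ns_scale = (unsigned long)
        (((uint64_t)1000000 << CYC2NS_SCALE_FACTOR) / cpu_khz);
}

/* ns = cycles * (10^6 / khz), applied via the precomputed factor. */
static uint64_t cycles_2_ns(uint64_t cyc)
{
    return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
}
```

At 1 GHz (cpu_khz = 1000000) the factor is exactly 1024, so cycles map 1:1 to
nanoseconds; at 500 MHz each cycle is 2 ns.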

do_gettimeofday()

This is supported on all platforms. Most (??) have sub-jiffy resolution (usecs
or better?).

- When is this available in boot cycle?
- What is overhead of call?

I'm assuming the overhead of the call to do_gettimeofday is what has prompted
the proposal for Fast Timestamps (see next section).

Todd Poynor writes:

Re: gettimeofday(), the implementation can vary considerably between
architectures. Generally, I believe architectures read the count of
jiffies and also a board-specific microsecond timer source, if any, to
add the number of microseconds that have elapsed since the last timer
interrupt bumped jiffies. And a spinlock is expected to be held during
this operation. gettimeofday() is available everywhere, but not all
boards necessarily implement microsecond accuracy -- I don't know
statistics on this. You would probably also need some hook to
compensate for the place in the boot sequence in which the system time
is seeded from the RTC or set via settimeofday, etc. gettimeofday()
isn't set up immediately at kernel startup, but not long afterwards, and
it would probably be easy to force the init earlier.

Directly using the microsecond-level accuracy time source for
gettimeofday would be board-specific, so an API wrapper would still be
needed.

Greg Ungerer writes:

On many platforms (I would think most) it [gettimeofday] gives much better
than jiffy resolution.

Looking around the underlying architecture and platform code
in 2.6 it looks like most have code to deal with determining
the time reasonably accurately in do_gettimeofday(). Even on
the small/slower embedded processors I deal with this is easy
to do, and mostly gives resolutions in the usecs range.

The support to do this on an architecture and platform basis
is flexible enough, and easy enough to implement, that I would argue there
is more value in implementing a better do_gettimeofday()
[if it is only jiffie resolution on your platform of interest]
than in having a separate API. A good gettimeofday helps all
system timing calculations.

Fast Timestamp (proposal)

This was a proposal for a mechanism that could quickly record a timer value
that could be translated into a timeofday (timeval) at some later time.
The purpose of this would be to separate the operation of acquiring
the timing data from that of converting the units into a recognizable form.
I didn't understand the full context, but I gather that this was decoupled
to allow for very quick time recording, with subsequent interpretation,
possibly to preserve performance inside the network stacks.

1) Some kind of fast_timestamp_t, whose property is that it stores
enough information at time "T" such that at time "T + something"
the fast_timestamp_t can be converted to what the timeval was back at
time "T".

For networking, make skb->stamp into this type.

2) store_fast_timestamp(fast_timestamp_t *)

For networking, change do_gettimeofday(&skb->stamp) into
store_fast_timestamp(&skb->stamp)

3) fast_timestamp_to_timeval(fast_timestamp_t *, struct timeval *)

For networking, change things that read the skb->stamp value
into calls to fast_timestamp_to_timeval().

It is required that the timeval given by fast_timestamp_to_timeval()
be the same as what do_gettimeofday() would have returned
at the time store_fast_timestamp() was called.
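One way the three pieces could fit together is to have the timestamp store only
a raw cycle count, with conversion anchored to a reference point (a timeval
captured once at a known cycle count). A userland sketch: the three function
names come from the proposal above, but the reference-point mechanism, the
struct layout, and the fixed cpu_khz are my assumptions, not part of it.

```c
#include <stdint.h>

struct timeval_s { long tv_sec; long tv_usec; };

typedef struct { uint64_t cycles; } fast_timestamp_t;

static uint64_t ref_cycles;              /* cycle count at reference  */
static struct timeval_s ref_tv;          /* timeofday at reference    */
static unsigned long cpu_khz = 1000000;  /* assumed 1 GHz clock       */

/* The fast path: one counter read stored, no arithmetic. */
static void store_fast_timestamp(fast_timestamp_t *ts, uint64_t now_cycles)
{
    ts->cycles = now_cycles;
}

/* The slow path, run later: cycles since the reference, converted to
 * microseconds and added to the reference timeval. */
static void fast_timestamp_to_timeval(const fast_timestamp_t *ts,
                                      struct timeval_s *tv)
{
    uint64_t delta_usec = (ts->cycles - ref_cycles) * 1000 / cpu_khz;

    tv->tv_sec  = ref_tv.tv_sec + (long)(delta_usec / 1000000);
    tv->tv_usec = ref_tv.tv_usec + (long)(delta_usec % 1000000);
    if (tv->tv_usec >= 1000000) {
        tv->tv_sec++;
        tv->tv_usec -= 1000000;
    }
}
```

The design point this illustrates is the split: store_fast_timestamp() costs one
read and one store (cheap enough for skb->stamp), while all division and carry
handling is deferred to fast_timestamp_to_timeval().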