Hello Ron,
Thanks for tracing all this, and many thanks for pointing out the div64
bug. It would be nice if you could open a bug report on sf.net, so we
don't forget to fix co_div64 at some point. Currently I have no idea
for a better function.
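For reference, one common way to replace co_div64 is plain binary long division (shift and subtract). The sketch below assumes a 64-bit dividend and a 32-bit divisor (the examples in the quoted mail fit that). The name div64_u32_sketch is hypothetical and is not the coLinux API. Inside the Linux kernel, the existing do_div() macro from <asm/div64.h> already does this job.

```c
#include <stdint.h>

/* Hypothetical replacement for co_div64(): restoring (shift-and-subtract)
 * binary long division of a 64-bit dividend by a 32-bit divisor.
 * It uses only shifts, compares and subtractions, so no native
 * 64-by-64 divide instruction is needed. Caller must pass divisor != 0. */
static uint64_t div64_u32_sketch(uint64_t dividend, uint32_t divisor,
                                 uint32_t *remainder)
{
    uint64_t quotient = 0;
    uint64_t rem = 0;   /* always < divisor, so rem << 1 fits in 33 bits */
    int bit;

    for (bit = 63; bit >= 0; bit--) {
        /* Bring down the next dividend bit into the partial remainder. */
        rem = (rem << 1) | ((dividend >> bit) & 1);
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= (uint64_t)1 << bit;
        }
    }
    if (remainder)
        *remainder = (uint32_t)rem;
    return quotient;
}
```

With this sketch, 0x100000000 / 0x10000000 gives 16 and 0x1000000000000 / 0x10000000 gives 1048576, the values expected in the quoted mail.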
I don't think that is the problem, because the rounding error is
adjusted later: the quotient is multiplied back and the remainder is
stored in the variable timestamp_reminder. I mean this line:
cmon->timestamp_reminder = timestamp_diff - (jiffies * cmon->timestamp_freq.quad);
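The remainder-carry idea can be sketched in isolation as follows. This is a simplified model, not the coLinux code: the struct and function names are hypothetical, HZ = 100 is assumed from the factor of 100 used below, and a native 64-bit `/` stands in for co_div64.

```c
#include <stdint.h>

/* Hypothetical model of the remainder carry: convert elapsed counter
 * ticks (scaled by 100, i.e. assuming HZ = 100) into jiffies and keep
 * whatever did not make a whole jiffy for the next call, so truncation
 * never accumulates over time. */
struct tick_state {
    uint64_t freq;       /* counter frequency, e.g. 3579545 Hz */
    uint64_t remainder;  /* leftover scaled ticks from the previous call */
};

static uint64_t ticks_to_jiffies(struct tick_state *st, uint64_t ticks)
{
    uint64_t diff = st->remainder + 100 * ticks;
    uint64_t jiffies = diff / st->freq;   /* exact here; the real code uses co_div64 */

    /* The carry self-corrects only while jiffies is never larger than the
     * exact quotient. A too-large quotient would make jiffies * freq exceed
     * diff, and this unsigned subtraction would wrap to a huge value. */
    st->remainder = diff - jiffies * st->freq;
    return jiffies;
}
```

For example, feeding one second's worth of ticks (3579545) in many small pieces still adds up to exactly 100 jiffies with a remainder of 0, because the leftovers are carried from call to call.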
A debug version is available from here:
http://www.henrynestler.com/colinux/testing/devel-0.7.8/20100916-jiffies
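A debug hook for large leaps could look roughly like the sketch below. It is hypothetical and not what the debug build above actually contains: the threshold, the names and the use of fprintf are assumptions (inside the kernel this would be a printk). The idea is to log the raw inputs whenever a single update advances jiffies by more than some limit, so the calculation can be replayed afterwards.

```c
#include <stdint.h>
#include <stdio.h>

#define SKETCH_HZ 100          /* assumed tick rate (matches the factor 100) */
#define LEAP_LOG_SECONDS 60    /* hypothetical threshold for logging */

/* Hypothetical debug hook: report any single update that would advance
 * jiffies by more than LEAP_LOG_SECONDS, together with the raw inputs
 * needed to reproduce the calculation later. Returns 1 if it logged. */
static int log_if_large_leap(FILE *out, uint64_t prev_ts, uint64_t now_ts,
                             uint64_t freq, uint64_t remainder,
                             uint64_t jiffies)
{
    if (jiffies <= (uint64_t)LEAP_LOG_SECONDS * SKETCH_HZ)
        return 0;
    fprintf(out, "large leap: %llu jiffies (prev=%llu now=%llu freq=%llu rem=%llu)\n",
            (unsigned long long)jiffies, (unsigned long long)prev_ts,
            (unsigned long long)now_ts, (unsigned long long)freq,
            (unsigned long long)remainder);
    return 1;
}
```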
I have changed the types from "long long" to "unsigned long long" and
removed the casts where we don't need them. That gives us one more bit
and no negative values.
Old:
long long timestamp_diff;
timestamp_diff += 100 * (((long long)timestamp.quad) - ((long long)cmon->timestamp.quad));
New:
unsigned long long timestamp_diff;
timestamp_diff += 100 * (timestamp.quad - cmon->timestamp.quad);
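As a side note on the unsigned version: C defines unsigned arithmetic modulo 2^64, so the difference stays correct even if the counter wrapped once between two readings. The flip side is that a reading that goes backwards no longer shows up as a negative number but as a value close to 2^64. A minimal illustration, with a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: elapsed counter ticks between two readings.
 * Unsigned subtraction is computed modulo 2^64, so a single wrap of the
 * counter between prev and now still yields the correct delta. If now
 * is ever smaller than prev (a counter going backwards), the result is
 * not negative but a huge value near UINT64_MAX. */
static uint64_t counter_delta(uint64_t prev, uint64_t now)
{
    return now - prev;
}
```

For example, counter_delta(UINT64_MAX - 4, 5) is 10, while counter_delta(100, 90) is UINT64_MAX - 9 rather than -10.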
Henry
On 16.09.2010 19:06, Ron Avriel wrote:
> Hi,
>
> Any update on this issue? The server leaped again, by an almost
> identical amount (30949 seconds).
> Is it possible to at least have a debug version with log prints in
> case of a large leap?
> I also suggest replacing co_div64() - see below.
>
> Thanks,
> Ron
>
>
> From: [email protected]
> To: [email protected]
> Date: Sun, 12 Sep 2010 14:29:25 +0000
> Subject: Re: [coLinux-users] Very large time offset in coLinux
>
> Hi Henry,
>
> One of our servers leaped forward again. The interesting part is that
> the leap is almost identical to a previous leap.
> Last time it leaped forward by 30944 seconds, and this time by 30961
> seconds.
> The performance counter frequency is 3579545 Hz.
>
> Since these two leaps are very close, I have a feeling it's not a
> random error, but rather a calculation error.
> It's possible that Windows/Linux were under heavy load at the time of the leap.
>
> I went over some of the code and found that co_div64() isn't accurate
> (!), although I couldn't explain the leap with this bug.
>
> For example,
> co_div64(0x100000000,0x10000000) returns 15 instead of 16.
> co_div64(0x1000000000000,0x10000000) returns 983055 instead of 1048576.
>
> I'm sure you'll find more accurate algorithms.
>
> Could you also go over the relevant code and see if you notice any
> overflow or signed/unsigned error that could explain the leap, given the
> above data?
> Would it be possible to get a debug version to collect more information
> next time the problem occurs?
>
> Thanks in advance,
> Ron