Sorry for the mail flood. This is the last one and then I'm going to wait for some reactions.

On Wednesday 11 March 2009, Frans Pop wrote:
> So, lets look next what happens if I allow clock->error to be changed
> here. This makes the boot fail and I believe that this is the critical
> change in 5cd1c9c5cf30.
[...]
> Note that clock->xtime_nsec is now running backwards and the crazy
> values for clock->error.
>
> From this I conclude that clock->error is getting buggered somewhere
> else: we get a completely different value back from what is calculated
> here. The calculation here is still correct:
> $ echo $(( -4292487689804800 + (-256 << 24) ))
> -4292491984772096
>
> I suspect that clock->error running back is what causes my hang.

s/clock->error/clock->xtime_nsec/ of course.

Looking a bit closer at what Roman's patch 5cd1c9c5cf30 does, I see this:

So, in the old situation the code first added xtime.tv_nsec to clock->xtime_nsec and later subtracted it again, so there's symmetry.

In the new code we no longer do the first, but still do the second. That seems strange and probably upsets assumptions in the code in between, which includes the call to clocksource_adjust(). AFAICT this is the root cause of the overflow visible in my earliest traces.

I've made some attempts to correct that, but did not find anything that really worked.
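
To make the asymmetry concrete, here is a toy model in userspace C. It is not the kernel code: struct toy_clock and both functions are made up for illustration, and shifts and the real accumulation are left out. It only shows that add-then-subtract leaves the accumulator where it started, while subtract-only pulls it down by a full tv_nsec.

```c
#include <stdint.h>

/* Hypothetical stand-in for the clocksource; only the one field matters. */
struct toy_clock {
	int64_t xtime_nsec;
};

/* Old flow as described above: add tv_nsec first, subtract it later. */
int64_t toy_symmetric(int64_t start, int64_t tv_nsec)
{
	struct toy_clock c = { start };

	c.xtime_nsec += tv_nsec;	/* first step */
	/* ... code in between would run here ... */
	c.xtime_nsec -= tv_nsec;	/* second step */
	return c.xtime_nsec;		/* back at 'start' */
}

/* New flow as described above: the add is gone, the subtract remains. */
int64_t toy_asymmetric(int64_t start, int64_t tv_nsec)
{
	struct toy_clock c = { start };

	/* ... code in between would run here ... */
	c.xtime_nsec -= tv_nsec;	/* second step only */
	return c.xtime_nsec;		/* 'start' minus tv_nsec */
}
```

With a small starting value and a tv_nsec close to a full second, the second variant ends up far below zero, which would fit with a value that appears to run backwards.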

I also now know with near certainty where the system hangs with the vanilla 2.6.28.7: in the 'while (offset >= clock->cycle_interval)' loop in update_wall_time. That loop should probably have some mechanism to warn if it's running wild...

This whole code is pretty tricky, but I'm convinced Roman's patch is structurally broken.
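
As for the loop guard mentioned above, something along these lines is what I have in mind. This is a userspace sketch under my own assumptions, not a patch: toy_accumulate(), TOY_MAX_LOOPS and the ran_wild flag are hypothetical names, the limit is arbitrary, and in the kernel the warning would presumably be a printk or WARN_ON_ONCE rather than fprintf.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical limit; a sensible real value would need some thought. */
#define TOY_MAX_LOOPS 1000UL

/*
 * Consume 'offset' in steps of 'cycle_interval', like the loop in
 * update_wall_time, but bail out with a warning after TOY_MAX_LOOPS
 * iterations. Returns the number of iterations done and sets
 * *ran_wild to 1 if the limit was hit.
 */
unsigned long toy_accumulate(uint64_t offset, uint64_t cycle_interval,
			     int *ran_wild)
{
	unsigned long loops = 0;

	*ran_wild = 0;
	while (offset >= cycle_interval) {
		offset -= cycle_interval;
		/* the real loop accumulates xtime_nsec and error here */

		if (++loops >= TOY_MAX_LOOPS) {
			fprintf(stderr,
				"accumulate loop running wild: %lu iterations\n",
				loops);
			*ran_wild = 1;
			break;
		}
	}
	return loops;
}
```

This does not fix anything, of course. It would only turn a silent hang into a visible warning, which would have saved me quite a bit of tracing.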