[ Cross-posting to tech-kern to try to get some help on working
out WTF is going on here. ]
Hi folks,
Can anyone explain how the x86 local APIC timer is supposed to work?
For some reason, just keeping the box that shows the symptoms in the PR
busy with builds doesn't make time unhappy, but a busy named does. This
box is a Dell (I think) dual Xeon with an Intel E7525 chipset.
I've written a couple of programs that roughly simulate the
system call sequences that the busy named generates (these are at
ftp://ftp.NetBSD.org/pub/NetBSD/misc/simonb/named-simul.tar.gz ).
Running "receiver" and "sender <host> 2000000 2000" on the same box
leads it to totally lose time the same way that named makes it lose time.
On that particular i386 MP box when idle, we get to the top of
lapic_clockintr() approx every 29926950 TSC ticks in most cases. Here's
some data showing the TSC cycles between lapic_clockintr() calls
(measured only on the primary CPU):
tsc from last clockintr
tsc cycles over 29926950
29926950 0
29926950 0
29926950 0
29950837 23887
29926973 23
29926950 0
29926950 0
29926950 0
29926950 0
29926950 0
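As an aside, here's roughly how those two columns are computed - a
minimal sketch in Python with made-up TSC sample values, not the actual
kernel-side instrumentation:

```python
NOMINAL = 29926950  # expected TSC ticks per lapic_clockintr() call

# Hypothetical raw TSC values sampled at the top of lapic_clockintr().
tsc = [0, 29926950, 59853900, 89804737, 119731710]

for prev, cur in zip(tsc, tsc[1:]):
    delta = cur - prev                # TSC ticks since last clockintr
    print(delta, delta - NOMINAL)     # and how far over/under nominal
```

which prints deltas of 29926950, 29926950, 29950837 and 29926973,
matching the format of the tables in this mail.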
I would have expected that if we're late getting to lapic_clockintr()
because, say, interrupts were blocked at the time, then we'd get to
some subsequent call to lapic_clockintr() early WRT the TSC tick
count - if, that is, the local APIC timer interrupts are really coming
in at a fixed rate. So above I'd expect to
see something like:
29926950 0
29950837 23887
29903063 -23887
29926950 0
and thus keeping an average of 29926950 TSC ticks per call to
lapic_clockintr().
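That expectation can be sketched with a small simulation (a toy model
with made-up delay values, not the actual kernel code): if the timer
really fires at a fixed rate and only the handler entry is delayed, the
deltas telescope and a late entry is paid back by the next one.

```python
P = 29926950  # nominal TSC ticks per local APIC timer interrupt

# Model: interrupts fire at exactly t = k * P; the handler is sometimes
# entered late (e.g. interrupts were blocked).  The delay values here
# are made up for illustration.
delays = [0, 0, 23887, 0, 0]
entry = [k * P + d for k, d in enumerate(delays)]
deltas = [b - a for a, b in zip(entry, entry[1:])]
print([d - P for d in deltas])  # -> [0, 23887, -23887, 0]
```

A +23887 delta is immediately followed by a -23887 one, so the average
stays at P and no time is lost - which is not what the busy-system data
below shows.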
Occasionally I do see little cycles like:
tsc from last clockintr
tsc cycles over 29926950
29926950 0
29927025 75
29926875 -75
29926935 -15
29926965 15
29926950 0
and
29926957 7
29926950 0
29926943 -7
29926950 0
29926957 7
29926943 -7
29926950 0
29927092 142
29926808 -142
29926950 0
where we miss a few TSC ticks but then make them up over the next call
or two to lapic_clockintr(), which is what I'd expect to see. However,
this only ever happens for small values like the above - it never
happens when any significant number of TSC ticks are missed.
However, while the system is busy running the programs mentioned above
we see sequences like the following:
tsc from last clockintr
tsc cycles over 29926950
29926950 0
30573397 646447
30620333 693383
30596385 669435
30739912 812962
30812250 885300
30691905 764955
30440408 513458
30082155 155205
30453150 526200
30525195 598245
30571875 644925
30692910 765960
30691215 764265
30501015 574065
30465150 538200
30062040 135090
30472365 545415
30596445 669495
30070425 143475
29926995 45
29926950 0
where we have a much larger than normal delay between lapic_clockintr()
calls.
Here's some further data. First, here are three sets of the total number
of TSC ticks for 10,000 lapic_clockintr() calls while the box is idle:
299274903660
299273899440
299274126585
and three sets while it's busy running the above programs:
302256563415
302291485440
302242349520
The busy TSC tick counts are pretty close to 1% over the idle counts,
and the clock on the box has lost approximately 23 seconds in 40
minutes, which is also close to a 1% time loss.
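The arithmetic behind that 1% figure, redone in Python using just the
numbers above:

```python
idle = 299274903660   # TSC ticks per 10,000 clockintr calls, idle (first set)
busy = 302256563415   # same measurement while busy (first set)

extra = busy / idle - 1    # fraction of extra TSC ticks when busy
lost = 23 / (40 * 60)      # 23 seconds lost over 40 minutes

# Both come out a little under 1%.
print(f"busy overhead: {extra:.3%}, observed time loss: {lost:.3%}")
```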
Looking at i386/i386/vector.S, the lapic_ltimer interrupt vector sends
an end-of-interrupt (EOI) to the APIC and then calls lapic_clockintr().
Is it somehow possible that, if there's a delay in sending the EOI,
the APIC timer won't restart counting down automatically?
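To make that theory concrete, here's a toy model (an assumption about
the hardware for the sake of argument, not documented behaviour) in
which each countdown only rearms once the EOI is sent, so a late EOI
stretches the period and the lost ticks are never made up:

```python
P = 29926950  # programmed APIC timer period, in TSC ticks

def intr_times(eoi_delays):
    """Hypothesis: each countdown starts only after the previous EOI,
    so a delayed EOI pushes every later interrupt back permanently."""
    t, times = 0, []
    for d in eoi_delays:
        times.append(t)
        t += P + d        # next interrupt is P ticks after this EOI
    return times

# Made-up EOI delays, sized like the excesses seen in the busy data.
times = intr_times([0, 646447, 693383, 0])
deltas = [b - a for a, b in zip(times, times[1:])]
print([d - P for d in deltas])  # -> [0, 646447, 693383]
```

Unlike the fixed-rate model, every excess here is positive and nothing
is ever paid back, which matches the shape of the busy-system data
above.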
Any other theories to describe the behaviour we're seeing?
I also have a dual Xeon here based on the Intel E7505 chipset that
doesn't appear to be affected by this problem, so it may well be
chipset-related.
Simon.
--
Simon Burge <simonb@wasabisystems.com>
NetBSD Support and Service: http://www.wasabisystems.com/