Commit Message

This adds a register identifier for use with the one_reg interface
to allow the decrementer expiry time to be read and written by
userspace. The decrementer expiry time is in guest timebase units
and is equal to the sum of the decrementer and the guest timebase.
(The expiry time is used rather than the decrementer value itself
because the expiry time stays constant while the guest vcpu is not
running, whereas the decrementer value is constantly changing.)
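
As a rough illustration of the relationship described above, the
conversion between a decrementer value and an expiry time could be
sketched as follows. The helper names are hypothetical and are not
taken from the patch; only the arithmetic (expiry = decrementer +
guest timebase) comes from the description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; only the arithmetic comes from the text above. */

/* Expiry time in guest timebase units: decrementer plus guest timebase. */
static inline uint64_t dec_to_expiry(int64_t dec, uint64_t guest_tb)
{
	return guest_tb + (uint64_t)dec;
}

/* Inverse: the decrementer value seen at a given guest timebase. */
static inline int64_t expiry_to_dec(uint64_t expiry, uint64_t guest_tb)
{
	return (int64_t)(expiry - guest_tb);
}

/* Converting to an expiry and back at the same timebase is lossless. */
static inline void check_round_trip(int64_t dec, uint64_t guest_tb)
{
	assert(expiry_to_dec(dec_to_expiry(dec, guest_tb), guest_tb) == dec);
}
```

Because the expiry is a fixed point on the guest timebase, it can be
read at any moment while the vcpu is stopped and still yield the
correct decrementer value once the vcpu resumes.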

Without this, a guest vcpu migrated to a new host will see its
decrementer set to some random value. On POWER8 and earlier (and on
POWER9 in POWER7 or POWER8 compatibility mode), the decrementer is 32
bits wide and counts down at 512MHz, so the guest vcpu will
potentially see no decrementer interrupts for up to about 4 seconds,
which will lead to a stall. With POWER9, the decrementer is now 56
bits wide, so the stall can be much longer (up to 2.23 years) and more
noticeable.
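
The figures quoted above can be checked with a little arithmetic. The
sketch below assumes the decrementer is a signed counter, so its
largest positive value is 2^(bits-1) - 1 ticks, and uses a 365.25-day
year; the function and macro names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TB_HZ 512000000ULL	/* 512MHz tick rate, as stated in the text */

/* Longest wait, in seconds, before a signed `bits`-wide decrementer
 * holding its largest positive value counts down past zero. */
static double max_stall_seconds(unsigned bits)
{
	assert(bits >= 2 && bits <= 64);
	uint64_t max_ticks = (UINT64_C(1) << (bits - 1)) - 1;
	return (double)max_ticks / (double)TB_HZ;
}

/* Convert seconds to years, assuming 365.25 days per year. */
static double seconds_to_years(double seconds)
{
	return seconds / (365.25 * 24 * 3600);
}
```

Under these assumptions, max_stall_seconds(32) comes to roughly 4.19
seconds and seconds_to_years(max_stall_seconds(56)) to roughly 2.23
years, matching the numbers in the description.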

To help work around the problem in cases where userspace has not been
updated to migrate the decrementer expiry time, we now set the
default decrementer expiry at vcpu creation time to the current time
rather than the maximum possible value. This should mean an
immediate decrementer interrupt when a migrated vcpu starts
running. In cases where the decrementer is 32 bits wide and more
than 4 seconds elapse between the creation of the vcpu and when it
first runs, the decrementer would have wrapped around to positive
values and there may still be a stall - but this is no worse than
the current situation. In the large-decrementer case, we are sure
to get an immediate decrementer interrupt (assuming the time from
vcpu creation to first run is less than 2.23 years) and we thus
avoid a very long stall.
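
The wrap-around behaviour described above can be modelled in a few
lines. This is a simplified sketch, not the code in the patch: it
treats a decrementer interrupt as pending whenever the sign bit of the
`bits`-wide decrementer value is set, and the function name is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: returns true if the `bits`-wide decrementer derived
 * from (expiry - now) is negative, i.e. has its sign bit set. */
static bool dec_is_negative(uint64_t expiry, uint64_t now, unsigned bits)
{
	assert(bits >= 2 && bits <= 64);
	uint64_t dec = expiry - now;		/* wraps modulo 2^64 */
	return (dec >> (bits - 1)) & 1;		/* sign bit after truncation */
}
```

With the expiry set to the vcpu creation time t0, dec_is_negative(t0,
t0 + 1, 32) is true, so an interrupt would be due as soon as the vcpu
runs. Once more than 2^31 ticks have elapsed, the 32-bit value has
wrapped back to positive and the call returns false, while the same
elapsed time with bits set to 56 still returns true.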

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/uapi/asm/kvm.h | 2 ++
arch/powerpc/kvm/book3s_hv.c | 8 ++++++++
arch/powerpc/kvm/powerpc.c | 2 +-
3 files changed, 11 insertions(+), 1 deletion(-)