Introduce a paravirtualization interface for KVM/arm64 based on the
"Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
This only adds the details about "Stolen Time" as the details of "Live
Physical Time" have not been fully agreed.
User space can specify a reserved area of memory for the guest and
inform KVM to populate the memory with information on time that the host
kernel has stolen from the guest.
A hypercall interface is provided for the guest to interrogate the
hypervisor's support for this interface and the location of the shared
memory structures.
Signed-off-by: Steven Price <steven.price@arm.com>
---
Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
1 file changed, 100 insertions(+)
create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
new file mode 100644
index 000000000000..1ceb118694e7
--- /dev/null
+++ b/Documentation/virt/kvm/arm/pvtime.txt
@@ -0,0 +1,100 @@+Paravirtualized time support for arm64
+======================================
+
+Arm specification DEN0057/A defined a standard for paravirtualised time
+support for AArch64 guests:
+
+https://developer.arm.com/docs/den0057/a
+
+KVM/arm64 implements the stolen time part of this specification by providing
+some hypervisor service calls to support a paravirtualized guest obtaining a
+view of the amount of time stolen from its execution.
+
+Two new SMCCC compatible hypercalls are defined:
+
+PV_FEATURES 0xC5000020
+PV_TIME_ST 0xC5000022
+
+These are only available in the SMC64/HVC64 calling convention as
+paravirtualized time is not available to 32 bit Arm guests. The existence of
+the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
+mechanism before calling it.
+
+PV_FEATURES
+ Function ID: (uint32) : 0xC5000020
+ PV_func_id: (uint32) : Either PV_TIME_LPT or PV_TIME_ST
+ Return value: (int32) : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
+ PV-time feature is supported by the hypervisor.
+
+PV_TIME_ST
+ Function ID: (uint32) : 0xC5000022
+ Return value: (int64) : IPA of the stolen time data structure for this
+ (V)CPU. On failure:
+ NOT_SUPPORTED (-1)
+
+The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
+with inner and outer write back caching attributes, in the inner shareable
+domain. A total of 16 bytes from the IPA returned are guaranteed to be
+meaningfully filled by the hypervisor (see structure below).
+
+PV_TIME_ST returns the structure for the calling VCPU.
+
+Stolen Time
+-----------
+
+The structure pointed to by the PV_TIME_ST hypercall is as follows:
+
+ Field | Byte Length | Byte Offset | Description
+ ----------- | ----------- | ----------- | --------------------------
+ Revision | 4 | 0 | Must be 0 for version 0.1
+ Attributes | 4 | 4 | Must be 0
+ Stolen time | 8 | 8 | Stolen time in unsigned
+ | | | nanoseconds indicating how
+ | | | much time this VCPU thread
+ | | | was involuntarily not
+ | | | running on a physical CPU.
+
+The structure will be updated by the hypervisor prior to scheduling a VCPU. It
+will be present within a reserved region of the normal memory given to the
+guest. The guest should not attempt to write into this memory. There is a
+structure per VCPU of the guest.
+
+User space interface
+====================
+
+User space can request that KVM provide the paravirtualized time interface to
+a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
+
+ struct kvm_create_device pvtime_device = {
+ .type = KVM_DEV_TYPE_ARM_PV_TIME,
+ .attr = 0,
+ .flags = 0,
+ };
+
+ pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
+
+Creation of the device should be done after creating the vCPUs of the virtual
+machine.
+
+The IPA of the structures must be given to KVM. This is the base address
+of an array of stolen time structures (one for each VCPU). The base address
+must be page aligned. The size must be at least 64 * number of VCPUs and be a
+multiple of PAGE_SIZE.
+
+The memory for these structures should be added to the guest in the usual
+manner (e.g. using KVM_SET_USER_MEMORY_REGION).
+
+For example:
+
+ struct kvm_dev_arm_st_region region = {
+ .gpa = <IPA of guest base address>,
+ .size = <size in bytes>
+ };
+
+ struct kvm_device_attr st_base = {
+ .group = KVM_DEV_ARM_PV_TIME_PADDR,
+ .attr = KVM_DEV_ARM_PV_TIME_ST,
+ .addr = (u64)&region
+ };
+
+ ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &st_base);
--
2.20.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

On 22/08/2019 16:28, Sean Christopherson wrote:
> On Wed, Aug 21, 2019 at 04:36:50PM +0100, Steven Price wrote:
>> kvm_put_guest() is analogous to put_user() - it writes a single value to
>> the guest physical address. The implementation is built upon put_user()
>> and so it has the same single copy atomic properties.
>
> What you mean by "single copy atomic"? I.e. what guarantees does
> put_user() provide that __copy_to_user() does not?
Single-copy atomicity is defined by the Arm architecture[1] and I'm not
going to try to go into the full details here, so this is a summary.
For the sake of this feature what we care about is that the value
written/read cannot be "torn". In other words if there is a read (in
this case from another VCPU) that is racing with the write then the read
will either get the old value or the new value. It cannot return a
mixture. (This is of course assuming that the read is using a
single-copy atomic safe method).
__copy_to_user() is implemented as a memcpy() and as such cannot provide
single-copy atomicity in the general case (the buffer could easily be
bigger than the architecture can guarantee).
put_user() on the other hand is implemented (on arm64) as an explicit
store instruction and therefore is guaranteed by the architecture to be
single-copy atomic (i.e. another CPU cannot see a half-written value).
Steve
[1] https://static.docs.arm.com/ddi0487/ea/DDI0487E_a_armv8_arm.pdf#page=110
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

On Thu, Aug 22, 2019 at 04:46:10PM +0100, Steven Price wrote:
> On 22/08/2019 16:28, Sean Christopherson wrote:
> > On Wed, Aug 21, 2019 at 04:36:50PM +0100, Steven Price wrote:
> >> kvm_put_guest() is analogous to put_user() - it writes a single value to
> >> the guest physical address. The implementation is built upon put_user()
> >> and so it has the same single copy atomic properties.
> >
> > What you mean by "single copy atomic"? I.e. what guarantees does
> > put_user() provide that __copy_to_user() does not?
>
> Single-copy atomicity is defined by the Arm architecture[1] and I'm not
> going to try to go into the full details here, so this is a summary.
>
> For the sake of this feature what we care about is that the value
> written/read cannot be "torn". In other words if there is a read (in
> this case from another VCPU) that is racing with the write then the read
> will either get the old value or the new value. It cannot return a
> mixture. (This is of course assuming that the read is using a
> single-copy atomic safe method).
Thanks for the explanation. I assumed that's what you were referring to,
but wanted to double check.
> __copy_to_user() is implemented as a memcpy() and as such cannot provide
> single-copy atomicity in the general case (the buffer could easily be
> bigger than the architecture can guarantee).
>
> put_user() on the other hand is implemented (on arm64) as an explicit
> store instruction and therefore is guaranteed by the architecture to be
> single-copy atomic (i.e. another CPU cannot see a half-written value).
I don't think kvm_put_guest() belongs in generic code, at least not with
the current changelog explanation about it providing single-copy atomic
semantics. AFAICT, the single-copy thing is very much an arm64
implementation detail, e.g. the vast majority of 32-bit architectures,
including x86, do not provide any guarantees, and x86-64 generates more
or less the same code for put_user() and __copy_to_user() for 8-byte and
smaller accesses.
As an alternative to kvm_put_guest() entirely, is it an option to change
arm64's raw_copy_to_user() to redirect to __put_user() for sizes that are
constant at compile time and can be handled by __put_user()? That would
allow using kvm_write_guest() to update stolen time, albeit with
arguably an even bigger dependency on the uaccess implementation details.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

On 22/08/2019 17:24, Sean Christopherson wrote:
> On Thu, Aug 22, 2019 at 04:46:10PM +0100, Steven Price wrote:
>> On 22/08/2019 16:28, Sean Christopherson wrote:
>>> On Wed, Aug 21, 2019 at 04:36:50PM +0100, Steven Price wrote:
>>>> kvm_put_guest() is analogous to put_user() - it writes a single value to
>>>> the guest physical address. The implementation is built upon put_user()
>>>> and so it has the same single copy atomic properties.
>>>
>>> What you mean by "single copy atomic"? I.e. what guarantees does
>>> put_user() provide that __copy_to_user() does not?
>>
>> Single-copy atomicity is defined by the Arm architecture[1] and I'm not
>> going to try to go into the full details here, so this is a summary.
>>
>> For the sake of this feature what we care about is that the value
>> written/read cannot be "torn". In other words if there is a read (in
>> this case from another VCPU) that is racing with the write then the read
>> will either get the old value or the new value. It cannot return a
>> mixture. (This is of course assuming that the read is using a
>> single-copy atomic safe method).
>
> Thanks for the explanation. I assumed that's what you were referring to,
> but wanted to double check.
>
>> __copy_to_user() is implemented as a memcpy() and as such cannot provide
>> single-copy atomicity in the general case (the buffer could easily be
>> bigger than the architecture can guarantee).
>>
>> put_user() on the other hand is implemented (on arm64) as an explicit
>> store instruction and therefore is guaranteed by the architecture to be
>> single-copy atomic (i.e. another CPU cannot see a half-written value).
>
> I don't think kvm_put_guest() belongs in generic code, at least not with
> the current changelog explanation about it providing single-copy atomic
> semantics. AFAICT, the single-copy thing is very much an arm64
> implementation detail, e.g. the vast majority of 32-bit architectures,
> including x86, do not provide any guarantees, and x86-64 generates more
> or less the same code for put_user() and __copy_to_user() for 8-byte and
> smaller accesses.
>
> As an alternative to kvm_put_guest() entirely, is it an option to change
> arm64's raw_copy_to_user() to redirect to __put_user() for sizes that are
> constant at compile time and can be handled by __put_user()? That would
> allow using kvm_write_guest() to update stolen time, albeit with
> arguably an even bigger dependency on the uaccess implementation details.
I think it's important to in some way ensure that the desire that this
is a single write is shown. copy_to_user() is effectively
"setup();memcpy();finish();" and while a good memcpy() implementation
would be identical to put_user() there's a lot more room for this being
broken in the future by changes to the memcpy() implementation. (And I
don't want to require that memcpy() has to detect this case).
One suggestion is to call it something like kvm_put_guest_atomic() to
reflect the atomicity requirement. Presumably that would be based on a
new put_user_atomic() which architectures could override as necessary if
put_user() doesn't provide the necessary guarantees.
Steve
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

On 27/08/2019 09:57, Christoffer Dall wrote:
> On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
>> Introduce a paravirtualization interface for KVM/arm64 based on the
>> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
>>
>> This only adds the details about "Stolen Time" as the details of "Live
>> Physical Time" have not been fully agreed.
>>
>> User space can specify a reserved area of memory for the guest and
>> inform KVM to populate the memory with information on time that the host
>> kernel has stolen from the guest.
>>
>> A hypercall interface is provided for the guest to interrogate the
>> hypervisor's support for this interface and the location of the shared
>> memory structures.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>> Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>> 1 file changed, 100 insertions(+)
>> create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
>>
>> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
>> new file mode 100644
>> index 000000000000..1ceb118694e7
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/arm/pvtime.txt
>> @@ -0,0 +1,100 @@
>> +Paravirtualized time support for arm64
>> +======================================
>> +
>> +Arm specification DEN0057/A defined a standard for paravirtualised time
>> +support for AArch64 guests:
>> +
>> +https://developer.arm.com/docs/den0057/a
>> +
>> +KVM/arm64 implements the stolen time part of this specification by providing
>> +some hypervisor service calls to support a paravirtualized guest obtaining a
>> +view of the amount of time stolen from its execution.
>> +
>> +Two new SMCCC compatible hypercalls are defined:
>> +
>> +PV_FEATURES 0xC5000020
>> +PV_TIME_ST 0xC5000022
>> +
>> +These are only available in the SMC64/HVC64 calling convention as
>> +paravirtualized time is not available to 32 bit Arm guests. The existence of
>> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
>> +mechanism before calling it.
>> +
>> +PV_FEATURES
>> + Function ID: (uint32) : 0xC5000020
>> + PV_func_id: (uint32) : Either PV_TIME_LPT or PV_TIME_ST
>> + Return value: (int32) : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
>> + PV-time feature is supported by the hypervisor.
>> +
>> +PV_TIME_ST
>> + Function ID: (uint32) : 0xC5000022
>> + Return value: (int64) : IPA of the stolen time data structure for this
>> + (V)CPU. On failure:
>> + NOT_SUPPORTED (-1)
>> +
>> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
>> +with inner and outer write back caching attributes, in the inner shareable
>> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
>> +meaningfully filled by the hypervisor (see structure below).
>> +
>> +PV_TIME_ST returns the structure for the calling VCPU.
>> +
>> +Stolen Time
>> +-----------
>> +
>> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
>> +
>> + Field | Byte Length | Byte Offset | Description
>> + ----------- | ----------- | ----------- | --------------------------
>> + Revision | 4 | 0 | Must be 0 for version 0.1
>> + Attributes | 4 | 4 | Must be 0
>> + Stolen time | 8 | 8 | Stolen time in unsigned
>> + | | | nanoseconds indicating how
>> + | | | much time this VCPU thread
>> + | | | was involuntarily not
>> + | | | running on a physical CPU.
>> +
>> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
>> +will be present within a reserved region of the normal memory given to the
>> +guest. The guest should not attempt to write into this memory. There is a
>> +structure per VCPU of the guest.
>> +
>> +User space interface
>> +====================
>> +
>> +User space can request that KVM provide the paravirtualized time interface to
>> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
>> +
>> + struct kvm_create_device pvtime_device = {
>> + .type = KVM_DEV_TYPE_ARM_PV_TIME,
>> + .attr = 0,
>> + .flags = 0,
>> + };
>> +
>> + pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
>> +
>> +Creation of the device should be done after creating the vCPUs of the virtual
>> +machine.
>> +
>> +The IPA of the structures must be given to KVM. This is the base address
>> +of an array of stolen time structures (one for each VCPU). The base address
>> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
>> +multiple of PAGE_SIZE.
>> +
>> +The memory for these structures should be added to the guest in the usual
>> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
>> +
>> +For example:
>> +
>> + struct kvm_dev_arm_st_region region = {
>> + .gpa = <IPA of guest base address>,
>> + .size = <size in bytes>
>> + };
>
> This feel fragile; how are you handling userspace creating VCPUs after
> setting this up,
In this case as long as the structures all fit within the region created
VCPUs can be created/destroyed at will. If the VCPU index is too high
then the kernel will bail out in kvm_update_stolen_time() so the
structure will not be written. I consider this case as user space
messing up, so beyond protecting the host from the mess, user space gets
to keep the pieces.
> the GPA overlapping guest memory, etc.
Again, the (host) kernel is protected against this, but clearly this
will end badly for the guest.
> Is the
> philosophy here that the VMM can mess up the VM if it wants, but that
> this should never lead attacks on the host (we better hope not) and so
> we don't care?
Yes. For things like GPA overlapping guest memory it's not really the
host's position to work out what is "guest memory". It's quite possible
that user space could decide to place the stolen time structures right
in the middle of guest memory - it's just up to user space to inform the
guest what memory is usable. Obviously the expectation is that the
shared structures would be positioned "out of the way" in GPA space in
any normal arrangement.
> It seems to me setting the IPA per vcpu throught the VCPU device would
> avoid a lot of these issues. See
> Documentation/virt/kvm/devices/vcpu.txt.
That is certainly a possibility, I'm not really sure what the benefit is
though? It would still lead to corner cases:
* What if only some VCPUs had stolen time setup on them?
* What if multiple VCPUs pointed to the same location?
* The structures can still overlap with guest memory
It's also more work to setup in user space with the only "benefit" being
that user space could choose to organise the structures however it sees
fit (e.g. no need for them to be contiguous in memory). But I'm not sure
I see a use case for that flexibility.
Perhaps there's some benefit I'm not seeing?
Steve
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

On 28/08/2019 14:49, Christoffer Dall wrote:
> On Tue, Aug 27, 2019 at 10:57:06AM +0200, Christoffer Dall wrote:
>> On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
>>> Introduce a paravirtualization interface for KVM/arm64 based on the
>>> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
>>>
>>> This only adds the details about "Stolen Time" as the details of "Live
>>> Physical Time" have not been fully agreed.
>>>
>>> User space can specify a reserved area of memory for the guest and
>>> inform KVM to populate the memory with information on time that the host
>>> kernel has stolen from the guest.
>>>
>>> A hypercall interface is provided for the guest to interrogate the
>>> hypervisor's support for this interface and the location of the shared
>>> memory structures.
>>>
>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>> ---
>>> Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>>> 1 file changed, 100 insertions(+)
>>> create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
>>>
>>> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
>>> new file mode 100644
>>> index 000000000000..1ceb118694e7
>>> --- /dev/null
>>> +++ b/Documentation/virt/kvm/arm/pvtime.txt
>>> @@ -0,0 +1,100 @@
>>> +Paravirtualized time support for arm64
>>> +======================================
>>> +
>>> +Arm specification DEN0057/A defined a standard for paravirtualised time
>>> +support for AArch64 guests:
>>> +
>>> +https://developer.arm.com/docs/den0057/a
>>> +
>>> +KVM/arm64 implements the stolen time part of this specification by providing
>>> +some hypervisor service calls to support a paravirtualized guest obtaining a
>>> +view of the amount of time stolen from its execution.
>>> +
>>> +Two new SMCCC compatible hypercalls are defined:
>>> +
>>> +PV_FEATURES 0xC5000020
>>> +PV_TIME_ST 0xC5000022
>>> +
>>> +These are only available in the SMC64/HVC64 calling convention as
>>> +paravirtualized time is not available to 32 bit Arm guests. The existence of
>>> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
>>> +mechanism before calling it.
>>> +
>>> +PV_FEATURES
>>> + Function ID: (uint32) : 0xC5000020
>>> + PV_func_id: (uint32) : Either PV_TIME_LPT or PV_TIME_ST
>>> + Return value: (int32) : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
>>> + PV-time feature is supported by the hypervisor.
>>> +
>>> +PV_TIME_ST
>>> + Function ID: (uint32) : 0xC5000022
>>> + Return value: (int64) : IPA of the stolen time data structure for this
>>> + (V)CPU. On failure:
>>> + NOT_SUPPORTED (-1)
>>> +
>>> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
>>> +with inner and outer write back caching attributes, in the inner shareable
>>> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
>>> +meaningfully filled by the hypervisor (see structure below).
>>> +
>>> +PV_TIME_ST returns the structure for the calling VCPU.
>>> +
>>> +Stolen Time
>>> +-----------
>>> +
>>> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
>>> +
>>> + Field | Byte Length | Byte Offset | Description
>>> + ----------- | ----------- | ----------- | --------------------------
>>> + Revision | 4 | 0 | Must be 0 for version 0.1
>>> + Attributes | 4 | 4 | Must be 0
>>> + Stolen time | 8 | 8 | Stolen time in unsigned
>>> + | | | nanoseconds indicating how
>>> + | | | much time this VCPU thread
>>> + | | | was involuntarily not
>>> + | | | running on a physical CPU.
>>> +
>>> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
>>> +will be present within a reserved region of the normal memory given to the
>>> +guest. The guest should not attempt to write into this memory. There is a
>>> +structure per VCPU of the guest.
>>> +
>>> +User space interface
>>> +====================
>>> +
>>> +User space can request that KVM provide the paravirtualized time interface to
>>> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
>>> +
>>> + struct kvm_create_device pvtime_device = {
>>> + .type = KVM_DEV_TYPE_ARM_PV_TIME,
>>> + .attr = 0,
>>> + .flags = 0,
>>> + };
>>> +
>>> + pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
>>> +
>>> +Creation of the device should be done after creating the vCPUs of the virtual
>>> +machine.
>>> +
>>> +The IPA of the structures must be given to KVM. This is the base address
>>> +of an array of stolen time structures (one for each VCPU). The base address
>>> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
>>> +multiple of PAGE_SIZE.
>>> +
>>> +The memory for these structures should be added to the guest in the usual
>>> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
>>> +
>>> +For example:
>>> +
>>> + struct kvm_dev_arm_st_region region = {
>>> + .gpa = <IPA of guest base address>,
>>> + .size = <size in bytes>
>>> + };
>>
>> This feel fragile; how are you handling userspace creating VCPUs after
>> setting this up, the GPA overlapping guest memory, etc. Is the
>> philosophy here that the VMM can mess up the VM if it wants, but that
>> this should never lead attacks on the host (we better hope not) and so
>> we don't care?
>>
>> It seems to me setting the IPA per vcpu throught the VCPU device would
>> avoid a lot of these issues. See
>> Documentation/virt/kvm/devices/vcpu.txt.
>>
>>
> I discussed this with Marc the other day, and we realized that if we
> make the configuration of the IPA per-PE, then a VMM can construct a VM
> where these data structures are distributed within the IPA space of a
> VM, which could lead to a lower TLB pressure for some
> configurations/workloads.
Ok, I'm dubious it will make much difference in terms of TLB pressure,
but I've done the refactoring and I think it actually simplifies the
code. So I'll post a new version where the base address is set via the
VCPU device.
Thanks for the review,
Steve
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

On 29/08/2019 18:15, Andrew Jones wrote:
> On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
>> Introduce a paravirtualization interface for KVM/arm64 based on the
>> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
>>
>> This only adds the details about "Stolen Time" as the details of "Live
>> Physical Time" have not been fully agreed.
>>
>> User space can specify a reserved area of memory for the guest and
>> inform KVM to populate the memory with information on time that the host
>> kernel has stolen from the guest.
>>
>> A hypercall interface is provided for the guest to interrogate the
>> hypervisor's support for this interface and the location of the shared
>> memory structures.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>> Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>> 1 file changed, 100 insertions(+)
>> create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
>>
>> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
>> new file mode 100644
>> index 000000000000..1ceb118694e7
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/arm/pvtime.txt
>> @@ -0,0 +1,100 @@
>> +Paravirtualized time support for arm64
>> +======================================
>> +
>> +Arm specification DEN0057/A defined a standard for paravirtualised time
>> +support for AArch64 guests:
>> +
>> +https://developer.arm.com/docs/den0057/a
>> +
>> +KVM/arm64 implements the stolen time part of this specification by providing
>> +some hypervisor service calls to support a paravirtualized guest obtaining a
>> +view of the amount of time stolen from its execution.
>> +
>> +Two new SMCCC compatible hypercalls are defined:
>> +
>> +PV_FEATURES 0xC5000020
>> +PV_TIME_ST 0xC5000022
>> +
>> +These are only available in the SMC64/HVC64 calling convention as
>> +paravirtualized time is not available to 32 bit Arm guests. The existence of
>> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
>> +mechanism before calling it.
>> +
>> +PV_FEATURES
>> + Function ID: (uint32) : 0xC5000020
>> + PV_func_id: (uint32) : Either PV_TIME_LPT or PV_TIME_ST
>> + Return value: (int32) : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
>> + PV-time feature is supported by the hypervisor.
>> +
>> +PV_TIME_ST
>> + Function ID: (uint32) : 0xC5000022
>> + Return value: (int64) : IPA of the stolen time data structure for this
>> + (V)CPU. On failure:
>
> Why the () around the V in VCPU?
There's nothing preventing the same mechanism being used without the
virtualisation of CPUs. For example a hypervisor like Jailhouse could
implement this interface even though there the CPU isn't being
virtualised but is being handed over to the guest. Equally it is
possible for firmware to provide the same mechanism (using the SMC64
calling convention).
Admittedly that's a little confusing here because the rest of this
document is talking about KVM's implementation and normal hypervisors.
So I'll drop the brackets.
>> + NOT_SUPPORTED (-1)
>> +
>> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
>> +with inner and outer write back caching attributes, in the inner shareable
>> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
>> +meaningfully filled by the hypervisor (see structure below).
>> +
>> +PV_TIME_ST returns the structure for the calling VCPU.
>
> The above sentence seems redundant here.
It is an important detail that each VCPU must use PV_TIME_ST to fetch
the address of the structure for that VCPU. E.g. It could have been
implemented so that the hypercall took a VCPU number. So while redundant
I do feel it's worth pointing this out explicitly.
>> +
>> +Stolen Time
>> +-----------
>> +
>> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
>> +
>> + Field | Byte Length | Byte Offset | Description
>> + ----------- | ----------- | ----------- | --------------------------
>> + Revision | 4 | 0 | Must be 0 for version 0.1
>> + Attributes | 4 | 4 | Must be 0
>> + Stolen time | 8 | 8 | Stolen time in unsigned
>> + | | | nanoseconds indicating how
>> + | | | much time this VCPU thread
>> + | | | was involuntarily not
>> + | | | running on a physical CPU.
>> +
>> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
>> +will be present within a reserved region of the normal memory given to the
>> +guest. The guest should not attempt to write into this memory. There is a
>> +structure per VCPU of the guest.
>> +
>> +User space interface
>> +====================
>> +
>> +User space can request that KVM provide the paravirtualized time interface to
>> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
>> +
>> + struct kvm_create_device pvtime_device = {
>> + .type = KVM_DEV_TYPE_ARM_PV_TIME,
>> + .attr = 0,
>> + .flags = 0,
>> + };
>> +
>> + pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
>
> The ioctl doesn't return the fd. If the ioctl returns zero the fd will be
> in pvtime_device.fd.
Good catch - I'm not sure quite why I wrote that. Anyway I've agreed to
change the interface to operate on the VCPU device instead so this text
is rewritten completely.
>> +
>> +Creation of the device should be done after creating the vCPUs of the virtual
>> +machine.
>
> Or else what? Will an error be reported in that case?
This is now enforced by calling the ioctl on the VCPU device, so it's
impossible to do in the wrong order.
>> +
>> +The IPA of the structures must be given to KVM. This is the base address
>> +of an array of stolen time structures (one for each VCPU). The base address
>> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
>> +multiple of PAGE_SIZE.
>> +
>> +The memory for these structures should be added to the guest in the usual
>> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
>
> Above it says the guest shouldn't attempt to write the memory. Should
> KVM_MEM_READONLY be used with KVM_SET_USER_MEMORY_REGION for it?
That is optional - the specification states the guest must not attempt
to write to it - so marking it read-only for the guest should work fine
with a conforming guest. But it's not required.
>> +
>> +For example:
>> +
>> + struct kvm_dev_arm_st_region region = {
>> + .gpa = <IPA of guest base address>,
>> + .size = <size in bytes>
>> + };
>> +
>> + struct kvm_device_attr st_base = {
>> + .group = KVM_DEV_ARM_PV_TIME_PADDR,
>
> This is KVM_DEV_ARM_PV_TIME_REGION in the code.
Gah - I obviously missed that when I refactored to define the region
rather than just the base address. Anyway this has all changed (again)
because each VCPU has its own base address so the size is no longer
necessary.
Thanks for the review,
Steve
>> + .attr = KVM_DEV_ARM_PV_TIME_ST,
>> + .addr = (u64)&region
>> + };
>> +
>> + ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &st_base);
>> --
>> 2.20.1
>>
>
> Thanks,
> drew
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

On Wed, Aug 28, 2019 at 01:09:15PM +0100, Steven Price wrote:
> On 27/08/2019 09:57, Christoffer Dall wrote:
> > On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
> >> Introduce a paravirtualization interface for KVM/arm64 based on the
> >> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
> >>
> >> This only adds the details about "Stolen Time" as the details of "Live
> >> Physical Time" have not been fully agreed.
> >>
> >> User space can specify a reserved area of memory for the guest and
> >> inform KVM to populate the memory with information on time that the host
> >> kernel has stolen from the guest.
> >>
> >> A hypercall interface is provided for the guest to interrogate the
> >> hypervisor's support for this interface and the location of the shared
> >> memory structures.
> >>
> >> Signed-off-by: Steven Price <steven.price@arm.com>
> >> ---
> >> Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
> >> 1 file changed, 100 insertions(+)
> >> create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
> >>
> >> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
> >> new file mode 100644
> >> index 000000000000..1ceb118694e7
> >> --- /dev/null
> >> +++ b/Documentation/virt/kvm/arm/pvtime.txt
> >> @@ -0,0 +1,100 @@
> >> +Paravirtualized time support for arm64
> >> +======================================
> >> +
> >> +Arm specification DEN0057/A defined a standard for paravirtualised time
> >> +support for AArch64 guests:
> >> +
> >> +https://developer.arm.com/docs/den0057/a
> >> +
> >> +KVM/arm64 implements the stolen time part of this specification by providing
> >> +some hypervisor service calls to support a paravirtualized guest obtaining a
> >> +view of the amount of time stolen from its execution.
> >> +
> >> +Two new SMCCC compatible hypercalls are defined:
> >> +
> >> +PV_FEATURES 0xC5000020
> >> +PV_TIME_ST 0xC5000022
> >> +
> >> +These are only available in the SMC64/HVC64 calling convention as
> >> +paravirtualized time is not available to 32 bit Arm guests. The existence of
> >> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
> >> +mechanism before calling it.
> >> +
> >> +PV_FEATURES
> >> + Function ID: (uint32) : 0xC5000020
> >> + PV_func_id: (uint32) : Either PV_TIME_LPT or PV_TIME_ST
> >> + Return value: (int32) : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
> >> + PV-time feature is supported by the hypervisor.
> >> +
> >> +PV_TIME_ST
> >> + Function ID: (uint32) : 0xC5000022
> >> + Return value: (int64) : IPA of the stolen time data structure for this
> >> + (V)CPU. On failure:
> >> + NOT_SUPPORTED (-1)
> >> +
> >> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
> >> +with inner and outer write back caching attributes, in the inner shareable
> >> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
> >> +meaningfully filled by the hypervisor (see structure below).
> >> +
> >> +PV_TIME_ST returns the structure for the calling VCPU.
> >> +
> >> +Stolen Time
> >> +-----------
> >> +
> >> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
> >> +
> >> + Field | Byte Length | Byte Offset | Description
> >> + ----------- | ----------- | ----------- | --------------------------
> >> + Revision | 4 | 0 | Must be 0 for version 0.1
> >> + Attributes | 4 | 4 | Must be 0
> >> + Stolen time | 8 | 8 | Stolen time in unsigned
> >> + | | | nanoseconds indicating how
> >> + | | | much time this VCPU thread
> >> + | | | was involuntarily not
> >> + | | | running on a physical CPU.
> >> +
> >> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
> >> +will be present within a reserved region of the normal memory given to the
> >> +guest. The guest should not attempt to write into this memory. There is a
> >> +structure per VCPU of the guest.
> >> +
> >> +User space interface
> >> +====================
> >> +
> >> +User space can request that KVM provide the paravirtualized time interface to
> >> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
> >> +
> >> + struct kvm_create_device pvtime_device = {
> >> + .type = KVM_DEV_TYPE_ARM_PV_TIME,
> >> + .attr = 0,
> >> + .flags = 0,
> >> + };
> >> +
> >> + pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
> >> +
> >> +Creation of the device should be done after creating the vCPUs of the virtual
> >> +machine.
> >> +
> >> +The IPA of the structures must be given to KVM. This is the base address
> >> +of an array of stolen time structures (one for each VCPU). The base address
> >> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
> >> +multiple of PAGE_SIZE.
> >> +
> >> +The memory for these structures should be added to the guest in the usual
> >> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
> >> +
> >> +For example:
> >> +
> >> + struct kvm_dev_arm_st_region region = {
> >> + .gpa = <IPA of guest base address>,
> >> + .size = <size in bytes>
> >> + };
> >
> > This feel fragile; how are you handling userspace creating VCPUs after
> > setting this up,
>
> In this case as long as the structures all fit within the region created
> VCPUs can be created/destroyed at will. If the VCPU index is too high
> then the kernel will bail out in kvm_update_stolen_time() so the
> structure will not be written. I consider this case as user space
> messing up, so beyond protecting the host from the mess, user space gets
> to keep the pieces.
>
> > the GPA overlapping guest memory, etc.
>
> Again, the (host) kernel is protected against this, but clearly this
> will end badly for the guest.
>
> > Is the
> > philosophy here that the VMM can mess up the VM if it wants, but that
> > this should never lead attacks on the host (we better hope not) and so
> > we don't care?
>
> Yes. For things like GPA overlapping guest memory it's not really the
> host's position to work out what is "guest memory". It's quite possible
> that user space could decide to place the stolen time structures right
> in the middle of guest memory - it's just up to user space to inform the
> guest what memory is usable. Obviously the expectation is that the
> shared structures would be positioned "out of the way" in GPA space in
> any normal arrangement.
>
> > It seems to me setting the IPA per vcpu throught the VCPU device would
> > avoid a lot of these issues. See
> > Documentation/virt/kvm/devices/vcpu.txt.
>
> That is certainly a possibility, I'm not really sure what the benefit is
> though? It would still lead to corner cases:
>
> * What if only some VCPUs had stolen time setup on them?
> * What if multiple VCPUs pointed to the same location?
> * The structures can still overlap with guest memory
>
> It's also more work to setup in user space with the only "benefit" being
> that user space could choose to organise the structures however it sees
> fit (e.g. no need for them to be contiguous in memory). But I'm not sure
> I see a use case for that flexibility.
>
> Perhaps there's some benefit I'm not seeing?
>
So this is now mostly moot as you said you were going to change this,
but overall I think the assumption that it's trivial to maintain an
interface such the one proposed here to never be an attack vector for
the host is not as easy as it may sound. In the past, we've had
numerous bugs related to things like calculating an offset into some
region based on some index and a size, because suddenly someone changed
how an index is calculated, and it had unintended side-effects etc.
So if an API is used to specify something which is effectively per-CPU,
it's better to use the per-CPU handles (VCPU fds) to directly describe
this relationship.
Thanks,
Christoffer
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel