*[PATCH 00/10] IOASID extensions for guest SVA
@ 2020-03-25 17:55 Jacob Pan
2020-03-25 17:55 ` [PATCH 01/10] iommu/ioasid: Introduce system-wide capacity Jacob Pan
` (10 more replies)
0 siblings, 11 replies; 57+ messages in thread
From: Jacob Pan @ 2020-03-25 17:55 UTC (permalink / raw)
To: Joerg Roedel, Alex Williamson, Lu Baolu, iommu, LKML,
David Woodhouse, Jean-Philippe Brucker
Cc: Tian, Kevin, Raj Ashok, Jonathan Cameron
IOASID was introduced in v5.5 as a generic kernel allocator service for
both PCIe Process Address Space ID (PASID) and ARM SMMU's Sub Stream
ID. In addition to basic ID allocation, ioasid_set was introduced as a
token that is shared by a group of IOASIDs. This set token can be used
for permission checking, but it lacks some features needed by guest
Shared Virtual Address (SVA). In addition, IOASID life cycle management
is needed to coordinate multiple users.
This patchset introduces two extensions to the IOASID code:
1. IOASID set operations
2. Notifications for IOASID state synchronization
Part #1:
The IOASIDs used by each VM fit naturally into an ioasid_set. Per-set
management requires the following features:
- Quota enforcement - This prevents one VM from abusing the
allocator to take all the system IOASIDs. Though the VFIO layer could
also enforce a quota, it cannot cover the case where both guest and
host SVA are used on the same system.
- Storing the guest IOASID to host IOASID mapping within the set. To
support live migration, the IOASID namespace should be owned by the
guest. This requires a per-set lookup between guest and host
IOASIDs. This patchset does not introduce non-identity guest-host
IOASID lookup; it merely introduces the infrastructure in the per-set data.
- Set level operations, e.g. when a guest terminates, it is likely to
free the entire set. Having a single place to manage the set where the
IOASIDs are stored makes iteration much easier.
New APIs are:
- void ioasid_install_capacity(ioasid_t total);
Set the system capacity prior to any allocations. On x86, the VT-d
driver calls this function to set the maximum number of PASIDs,
typically 1 million (20 bits).
- int ioasid_alloc_system_set(int quota);
The host system has a default ioasid_set; during boot it is expected
that this default set is allocated with a reasonable quota, e.g.
PID_MAX. This default/system set is used for bare-metal SVA.
- int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
*sid);
Allocate a new set with a token; the returned sid (set ID) will be
used to allocate IOASIDs within the set. Allocations of IOASIDs
cannot exceed the quota.
- void ioasid_free_set(int sid, bool destroy_set);
Free the entire set and notify all users, with an option to destroy
the set. The set ID can be reused for allocation if not destroyed.
- int ioasid_find_sid(ioasid_t ioasid);
Look up the set ID from an IOASID. No reference is held, assuming
the set has a single owner.
- int ioasid_adjust_set(int sid, int quota);
Change the quota of the set; the new quota cannot be less than the
number of IOASIDs already allocated within the set. This is useful
when IOASID resources need to be balanced among VMs.
Part #2:
Notification service. Since IOASIDs are used by many consumers that
follow a publisher-subscriber pattern, notification is a natural choice
to keep states synchronized. For example, on x86 systems, guest PASID
allocation and bind calls result in VFIO IOCTLs that can add and change
guest-host PASID states. At the same time, the IOMMU driver and KVM need
to maintain their own PASID contexts. In this case, VFIO is the publisher
within the kernel; the IOMMU driver and KVM are the subscribers.
This patchset introduces a global blocking notifier chain and APIs to
operate on it. Not all events or IOASIDs are of interest to all
subscribers, e.g. KVM is only interested in the IOASIDs within its set,
and the IOMMU driver is not ioasid_set aware. A further optimization
could be having both global and per-set notifiers. But considering the
infrequent nature of bind/unbind and the relatively long process life
cycle, this optimization may not be needed at this time.
To register/unregister notification blocks, use these two APIs:
- int ioasid_add_notifier(struct notifier_block *nb);
- void ioasid_remove_notifier(struct notifier_block *nb);
To send notification on an IOASID with one of the commands (FREE,
BIND/UNBIND, etc.), use:
- int ioasid_notify(ioasid_t id, enum ioasid_notify_val cmd);
This work is a result of collaboration with many people:
Liu, Yi L <yi.l.liu@intel.com>
Wu Hao <hao.wu@intel.com>
Ashok Raj <ashok.raj@intel.com>
Kevin Tian <kevin.tian@intel.com>
Thanks,
Jacob
Jacob Pan (10):
iommu/ioasid: Introduce system-wide capacity
iommu/vt-d: Set IOASID capacity when SVM is enabled
iommu/ioasid: Introduce per set allocation APIs
iommu/ioasid: Rename ioasid_set_data to avoid confusion with
ioasid_set
iommu/ioasid: Create an IOASID set for host SVA use
iommu/ioasid: Convert to set aware allocations
iommu/ioasid: Use mutex instead of spinlock
iommu/ioasid: Introduce notifier APIs
iommu/ioasid: Support ioasid_set quota adjustment
iommu/vt-d: Register PASID notifier for status change
drivers/iommu/intel-iommu.c | 20 ++-
drivers/iommu/intel-svm.c | 89 ++++++++--
drivers/iommu/ioasid.c | 387 +++++++++++++++++++++++++++++++++++++++-----
include/linux/intel-iommu.h | 1 +
include/linux/ioasid.h | 86 +++++++++-
5 files changed, 522 insertions(+), 61 deletions(-)
--
2.7.4
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

*Re: [PATCH 00/10] IOASID extensions for guest SVA
2020-03-25 17:55 [PATCH 00/10] IOASID extensions for guest SVA Jacob Pan
` (9 preceding siblings ...)
2020-03-25 17:55 ` [PATCH 10/10] iommu/vt-d: Register PASID notifier for status change Jacob Pan
@ 2020-04-01 14:03 ` Jean-Philippe Brucker
2020-04-01 23:38 ` Jacob Pan
10 siblings, 1 reply; 57+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-01 14:03 UTC (permalink / raw)
To: Jacob Pan
Cc: Tian, Kevin, Raj Ashok, Jean-Philippe Brucker, LKML, iommu,
Alex Williamson, David Woodhouse, Jonathan Cameron
Hi Jacob,
On Wed, Mar 25, 2020 at 10:55:21AM -0700, Jacob Pan wrote:
> IOASID was introduced in v5.5 as a generic kernel allocator service for
> both PCIe Process Address Space ID (PASID) and ARM SMMU's Sub Stream
> ID. In addition to basic ID allocation, ioasid_set was introduced as a
> token that is shared by a group of IOASIDs. This set token can be used
> for permission checking but lack of some features needed by guest Shared
> Virtual Address (SVA). In addition, IOASID support for life cycle
> management is needed among multiple users.
>
> This patchset introduces two extensions to the IOASID code,
> 1. IOASID set operations
> 2. Notifications for IOASID state synchronization
My main concern with this series is patch 7 changing the spinlock to a
mutex, which prevents SVA from calling ioasid_free() from the RCU callback
of MMU notifiers. Could we use atomic notifiers, or do the FREE
notification another way?
Most of my other comments are just confusion on my part, I think, as I
haven't yet properly looked through the VFIO and VT-d changes. I'd rather
avoid the change to ioasid_find() if possible, because it adds a seemingly
unnecessary indirection to the fast path, but it's probably insignificant.
Thanks,
Jean
>
> Part #1:
> IOASIDs used by each VM fits naturally into an ioasid_set. The usage
> for per set management requires the following features:
>
> - Quota enforcement - This is to prevent one VM from abusing the
> allocator to take all the system IOASIDs. Though VFIO layer can also
> enforce the quota, but it cannot cover the usage with both guest and
> host SVA on the same system.
>
> - Stores guest IOASID-Host IOASID mapping within the set. To
> support live migration, IOASID namespace should be owned by the
> guest. This requires per IOASID set look up between guest and host
> IOASIDs. This patchset does not introduce non-identity guest-host
> IOASID lookup, we merely introduce the infrastructure in per set data.
>
> - Set level operations, e.g. when a guest terminates, it is likely to
> free the entire set. Having a single place to manage the set where the
> IOASIDs are stored makes iteration much easier.
>
>
> New APIs are:
> - void ioasid_install_capacity(ioasid_t total);
> Set the system capacity prior to any allocations. On x86, VT-d driver
> calls this function to set max number of PASIDs, typically 1 million
> (20 bits).
>
> - int ioasid_alloc_system_set(int quota);
> Host system has a default ioasid_set, during boot it is expected that
> this default set is allocated with a reasonable quota, e.g. PID_MAX.
> This default/system set is used for baremetal SVA.
>
> - int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> *sid);
> Allocate a new set with a token, returned sid (set ID) will be used
> to allocate IOASIDs within the set. Allocation of IOASIDs cannot
> exceed the quota.
>
> - void ioasid_free_set(int sid, bool destroy_set);
> Free the entire set and notify all users with an option to destroy
> the set. Set ID can be used for allocation again if not destroyed.
>
> - int ioasid_find_sid(ioasid_t ioasid);
> Look up the set ID from an ioasid. There is no reference held,
> assuming set has a single owner.
>
> - int ioasid_adjust_set(int sid, int quota);
> Change the quota of the set, new quota cannot be less than the number
> of IOASIDs already allocated within the set. This is useful when
> IOASID resource needs to be balanced among VMs.
>
> Part #2
> Notification service. Since IOASIDs are used by many consumers that
> follow publisher-subscriber pattern, notification is a natural choice
> to keep states synchronized. For example, on x86 system, guest PASID
> allocation and bind call results in VFIO IOCTL that can add and change
> guest-host PASID states. At the same time, IOMMU driver and KVM need to
> maintain its own PASID contexts. In this case, VFIO is the publisher
> within the kernel, IOMMU driver and KVM are the subscribers.
>
> This patchset introduces a global blocking notifier chain and APIs to
> operate on. Not all events nor all IOASIDs are of interests to all
> subscribers. e.g. KVM is only interested in the IOASIDs within its set.
> IOMMU driver is not ioasid_set aware. A further optimization could be
> having both global and per set notifier. But consider the infrequent
> nature of bind/unbind and relatively long process life cycle, this
> optimization may not be needed at this time.
>
> To register/unregister notification blocks, use these two APIs:
> - int ioasid_add_notifier(struct notifier_block *nb);
> - void ioasid_remove_notifier(struct notifier_block *nb)
>
> To send notification on an IOASID with one of the commands (FREE,
> BIND/UNBIND, etc.), use:
> - int ioasid_notify(ioasid_t id, enum ioasid_notify_val cmd);
>
> This work is a result of collaboration with many people:
> Liu, Yi L <yi.l.liu@intel.com>
> Wu Hao <hao.wu@intel.com>
> Ashok Raj <ashok.raj@intel.com>
> Kevin Tian <kevin.tian@intel.com>
>
> Thanks,
>
> Jacob
>
> Jacob Pan (10):
> iommu/ioasid: Introduce system-wide capacity
> iommu/vt-d: Set IOASID capacity when SVM is enabled
> iommu/ioasid: Introduce per set allocation APIs
> iommu/ioasid: Rename ioasid_set_data to avoid confusion with
> ioasid_set
> iommu/ioasid: Create an IOASID set for host SVA use
> iommu/ioasid: Convert to set aware allocations
> iommu/ioasid: Use mutex instead of spinlock
> iommu/ioasid: Introduce notifier APIs
> iommu/ioasid: Support ioasid_set quota adjustment
> iommu/vt-d: Register PASID notifier for status change
>
> drivers/iommu/intel-iommu.c | 20 ++-
> drivers/iommu/intel-svm.c | 89 ++++++++--
> drivers/iommu/ioasid.c | 387 +++++++++++++++++++++++++++++++++++++++-----
> include/linux/intel-iommu.h | 1 +
> include/linux/ioasid.h | 86 +++++++++-
> 5 files changed, 522 insertions(+), 61 deletions(-)
>
> --
> 2.7.4
>

*Re: [PATCH 00/10] IOASID extensions for guest SVA
2020-04-01 14:03 ` [PATCH 00/10] IOASID extensions for guest SVA Jean-Philippe Brucker
@ 2020-04-01 23:38 ` Jacob Pan
2020-04-02 12:26 ` Jean-Philippe Brucker
0 siblings, 1 reply; 57+ messages in thread
From: Jacob Pan @ 2020-04-01 23:38 UTC (permalink / raw)
To: Jean-Philippe Brucker
Cc: Tian, Kevin, Raj Ashok, Jean-Philippe Brucker, LKML, iommu,
Alex Williamson, Wu, Hao, David Woodhouse, Jonathan Cameron
On Wed, 1 Apr 2020 16:03:01 +0200
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
> Hi Jacob,
>
> On Wed, Mar 25, 2020 at 10:55:21AM -0700, Jacob Pan wrote:
> > IOASID was introduced in v5.5 as a generic kernel allocator service
> > for both PCIe Process Address Space ID (PASID) and ARM SMMU's Sub
> > Stream ID. In addition to basic ID allocation, ioasid_set was
> > introduced as a token that is shared by a group of IOASIDs. This
> > set token can be used for permission checking but lack of some
> > features needed by guest Shared Virtual Address (SVA). In addition,
> > IOASID support for life cycle management is needed among multiple
> > users.
> >
> > This patchset introduces two extensions to the IOASID code,
> > 1. IOASID set operations
> > 2. Notifications for IOASID state synchronization
>
> My main concern with this series is patch 7 changing the spinlock to a
> mutex, which prevents SVA from calling ioasid_free() from the RCU
> callback of MMU notifiers. Could we use atomic notifiers, or do the
> FREE notification another way?
>
Maybe I am looking at the wrong code; I thought
mmu_notifier_ops.free_notifier() is called outside the spinlock with
call_srcu(), which will be invoked in thread context.
In mmu_notifier.c, mmu_notifier_put():
	spin_unlock(&mm->notifier_subscriptions->lock);
	call_srcu(&srcu, &subscription->rcu, mmu_notifier_free_rcu);
Anyway, if we have to use atomic notifiers: I tried an atomic notifier
first; there are two subscribers to the free event on x86.
1. IOMMU
2. KVM
For #1, the problem is that in the free operation, the VT-d driver
needs to do a lot of clean up in thread context:
- hold a mutex to traverse a list of devices
- clear the PASID entry and flush caches
For #2, KVM might be able to deal with spinlocks for updating the VMCS
PASID translation table. +Hao
Perhaps two solutions I can think of:
1. Use a cyclic IOASID allocator. The main reason for cleaning up at
free time is to prevent races with IOASID alloc. Similar to PIDs, 2M
IOASIDs will take a long time to overflow. Then we can use an atomic
notifier and a deferred workqueue to do the IOMMU cleanup. The downside
is a large and growing PASID table; that may not be a performance issue
since it is backed by a TLB.
2. Let VFIO ensure free always happens after unbind. Then there is no
need to do cleanup. But that requires VFIO to keep track of all the
PASIDs within each VM. When the VM terminates, VFIO is responsible for
the clean up. That was Yi's original proposal. I also tried to provide
an IOASID set iterator for VFIO to free the IOASIDs within each VM/set,
but the private data belongs to the IOMMU driver.
Thanks,
Jacob
> Most of my other comments are just confusion on my part, I think, as I
> haven't yet properly looked through the VFIO and VT-d changes. I'd
> rather avoid the change to ioasid_find() if possible, because it adds
> a seemingly unnecessary indirection to the fast path, but it's
> probably insignificant.
>
> Thanks,
> Jean
>
> >
> > Part #1:
> > IOASIDs used by each VM fits naturally into an ioasid_set. The usage
> > for per set management requires the following features:
> >
> > - Quota enforcement - This is to prevent one VM from abusing the
> > allocator to take all the system IOASIDs. Though VFIO layer can
> > also enforce the quota, but it cannot cover the usage with both
> > guest and host SVA on the same system.
> >
> > - Stores guest IOASID-Host IOASID mapping within the set. To
> > support live migration, IOASID namespace should be owned by the
> > guest. This requires per IOASID set look up between guest and host
> > IOASIDs. This patchset does not introduce non-identity guest-host
> > IOASID lookup, we merely introduce the infrastructure in per set
> > data.
> >
> > - Set level operations, e.g. when a guest terminates, it is likely
> > to free the entire set. Having a single place to manage the set
> > where the IOASIDs are stored makes iteration much easier.
> >
> >
> > New APIs are:
> > - void ioasid_install_capacity(ioasid_t total);
> > Set the system capacity prior to any allocations. On x86, VT-d
> > driver calls this function to set max number of PASIDs, typically 1
> > million (20 bits).
> >
> > - int ioasid_alloc_system_set(int quota);
> > Host system has a default ioasid_set, during boot it is expected
> > that this default set is allocated with a reasonable quota, e.g.
> > PID_MAX. This default/system set is used for baremetal SVA.
> >
> > - int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int
> > *sid);
> > Allocate a new set with a token, returned sid (set ID) will be
> > used to allocate IOASIDs within the set. Allocation of IOASIDs
> > cannot exceed the quota.
> >
> > - void ioasid_free_set(int sid, bool destroy_set);
> > Free the entire set and notify all users with an option to destroy
> > the set. Set ID can be used for allocation again if not destroyed.
> >
> > - int ioasid_find_sid(ioasid_t ioasid);
> > Look up the set ID from an ioasid. There is no reference held,
> > assuming set has a single owner.
> >
> > - int ioasid_adjust_set(int sid, int quota);
> > Change the quota of the set, new quota cannot be less than the
> > number of IOASIDs already allocated within the set. This is useful
> > when IOASID resource needs to be balanced among VMs.
> >
> > Part #2
> > Notification service. Since IOASIDs are used by many consumers that
> > follow publisher-subscriber pattern, notification is a natural
> > choice to keep states synchronized. For example, on x86 system,
> > guest PASID allocation and bind call results in VFIO IOCTL that can
> > add and change guest-host PASID states. At the same time, IOMMU
> > driver and KVM need to maintain its own PASID contexts. In this
> > case, VFIO is the publisher within the kernel, IOMMU driver and KVM
> > are the subscribers.
> >
> > This patchset introduces a global blocking notifier chain and APIs
> > to operate on. Not all events nor all IOASIDs are of interests to
> > all subscribers. e.g. KVM is only interested in the IOASIDs within
> > its set. IOMMU driver is not ioasid_set aware. A further
> > optimization could be having both global and per set notifier. But
> > consider the infrequent nature of bind/unbind and relatively long
> > process life cycle, this optimization may not be needed at this
> > time.
> >
> > To register/unregister notification blocks, use these two APIs:
> > - int ioasid_add_notifier(struct notifier_block *nb);
> > - void ioasid_remove_notifier(struct notifier_block *nb)
> >
> > To send notification on an IOASID with one of the commands (FREE,
> > BIND/UNBIND, etc.), use:
> > - int ioasid_notify(ioasid_t id, enum ioasid_notify_val cmd);
> >
> > This work is a result of collaboration with many people:
> > Liu, Yi L <yi.l.liu@intel.com>
> > Wu Hao <hao.wu@intel.com>
> > Ashok Raj <ashok.raj@intel.com>
> > Kevin Tian <kevin.tian@intel.com>
> >
> > Thanks,
> >
> > Jacob
> >
> > Jacob Pan (10):
> > iommu/ioasid: Introduce system-wide capacity
> > iommu/vt-d: Set IOASID capacity when SVM is enabled
> > iommu/ioasid: Introduce per set allocation APIs
> > iommu/ioasid: Rename ioasid_set_data to avoid confusion with
> > ioasid_set
> > iommu/ioasid: Create an IOASID set for host SVA use
> > iommu/ioasid: Convert to set aware allocations
> > iommu/ioasid: Use mutex instead of spinlock
> > iommu/ioasid: Introduce notifier APIs
> > iommu/ioasid: Support ioasid_set quota adjustment
> > iommu/vt-d: Register PASID notifier for status change
> >
> > drivers/iommu/intel-iommu.c | 20 ++-
> > drivers/iommu/intel-svm.c | 89 ++++++++--
> > drivers/iommu/ioasid.c | 387
> > +++++++++++++++++++++++++++++++++++++++-----
> > include/linux/intel-iommu.h | 1 + include/linux/ioasid.h |
> > 86 +++++++++- 5 files changed, 522 insertions(+), 61 deletions(-)
> >
> > --
> > 2.7.4
> >
[Jacob Pan]

*Re: [PATCH 00/10] IOASID extensions for guest SVA
2020-04-01 23:38 ` Jacob Pan
@ 2020-04-02 12:26 ` Jean-Philippe Brucker
2020-04-02 16:09 ` Jacob Pan
0 siblings, 1 reply; 57+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-02 12:26 UTC (permalink / raw)
To: Jacob Pan
Cc: Tian, Kevin, Raj Ashok, Jean-Philippe Brucker, LKML, iommu,
Alex Williamson, Wu, Hao, David Woodhouse, Jonathan Cameron
On Wed, Apr 01, 2020 at 04:38:42PM -0700, Jacob Pan wrote:
> On Wed, 1 Apr 2020 16:03:01 +0200
> Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
>
> > Hi Jacob,
> >
> > On Wed, Mar 25, 2020 at 10:55:21AM -0700, Jacob Pan wrote:
> > > IOASID was introduced in v5.5 as a generic kernel allocator service
> > > for both PCIe Process Address Space ID (PASID) and ARM SMMU's Sub
> > > Stream ID. In addition to basic ID allocation, ioasid_set was
> > > introduced as a token that is shared by a group of IOASIDs. This
> > > set token can be used for permission checking but lack of some
> > > features needed by guest Shared Virtual Address (SVA). In addition,
> > > IOASID support for life cycle management is needed among multiple
> > > users.
> > >
> > > This patchset introduces two extensions to the IOASID code,
> > > 1. IOASID set operations
> > > 2. Notifications for IOASID state synchronization
> >
> > My main concern with this series is patch 7 changing the spinlock to a
> > mutex, which prevents SVA from calling ioasid_free() from the RCU
> > callback of MMU notifiers. Could we use atomic notifiers, or do the
> > FREE notification another way?
> >
> Maybe I am looking at the wrong code, I thought
> mmu_notifier_ops.free_notifier() is called outside spinlock with
> call_srcu(), which will be invoked in the thread context.
> in mmu_notifier.c mmu_notifier_put()
> spin_unlock(&mm->notifier_subscriptions->lock);
>
> call_srcu(&srcu, &subscription->rcu, mmu_notifier_free_rcu);
free_notifier() is called from an RCU callback, and according to
Documentation/RCU/checklist.txt:
5. If call_rcu() or call_srcu() is used, the callback function will
be called from softirq context. In particular, it cannot block.
When applying the patch I get the sleep-in-atomic warning:
[ 87.861793] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
[ 87.863293] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 74, name: kworker/6:1
[ 87.863993] 2 locks held by kworker/6:1/74:
[ 87.864493] #0: ffffff885ac12538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x740/0x1880
[ 87.865593] #1: ffffff88591efd30 ((work_completion)(&sdp->work)){+.+.}-{0:0}, at: process_one_work+0x740/0x1880
[ 87.866993] CPU: 6 PID: 74 Comm: kworker/6:1 Not tainted 5.6.0-next-20200331+ #121
[ 87.867393] Hardware name: FVP Base (DT)
[ 87.867893] Workqueue: rcu_gp srcu_invoke_callbacks
[ 87.868393] Call trace:
[ 87.868793] dump_backtrace+0x0/0x310
[ 87.869293] show_stack+0x14/0x20
[ 87.869693] dump_stack+0x124/0x180
[ 87.870193] ___might_sleep+0x2ac/0x428
[ 87.870693] __might_sleep+0x88/0x168
[ 87.871094] __mutex_lock+0xa0/0x1270
[ 87.871593] mutex_lock_nested+0x1c/0x28
[ 87.872093] ioasid_free+0x28/0x48
[ 87.872493] io_mm_free+0x1d0/0x608
[ 87.872993] mmu_notifier_free_rcu+0x74/0xe8
[ 87.873393] srcu_invoke_callbacks+0x1d0/0x2c8
[ 87.873893] process_one_work+0x858/0x1880
[ 87.874393] worker_thread+0x314/0xcd0
[ 87.874793] kthread+0x318/0x400
[ 87.875293] ret_from_fork+0x10/0x18
>
> Anyway, if we have to use atomic. I tried atomic notifier first, there
> are two subscribers to the free event on x86.
> 1. IOMMU
> 2. KVM
>
> For #1, the problem is that in the free operation, VT-d driver
> needs to do a lot of clean up in thread context.
> - hold a mutex to traverse a list of devices
> - clear PASID entry and flush cache
>
> For #2, KVM might be able to deal with spinlocks for updating VMCS
> PASID translation table. +Hao
>
> Perhaps two solutions I can think of:
> 1. Use a cyclic IOASID allocator. The main reason of clean up at free
> is to prevent race with IOASID alloc. Similar to PID, 2M IOASID
> will take long time overflow. Then we can use atomic notifier and a
> deferred workqueue to do IOMMU cleanup. The downside is a large and
> growing PASID table, may not be a performance issue since it has TLB.
That might be a problem for SMMU, which has 1024 * 64kB leaf PASID tables,
for a total of 64MB per endpoint if there is too much fragmentation in
the IOASID space.
> 2. Let VFIO ensure free always happen after unbind. Then there is no
> need to do cleanup. But that requires VFIO to keep track of all the
> PASIDs within each VM. When the VM terminates, VFIO is responsible for
> the clean up. That was Yi's original proposal. I also tried to provide
> an IOASID set iterator for VFIO to free the IOASIDs within each VM/set,
> but the private data belongs to IOMMU driver.
Not really my place to comment on this, but I find it nicer to use the
same gpasid_unbind() path when VFIO frees a PASID as when the guest
explicitly unbinds before freeing.
Thanks,
Jean

*Re: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use
2020-04-13 22:06 ` Jacob Pan
@ 2020-04-15 15:10 ` Jean-Philippe Brucker
0 siblings, 0 replies; 57+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-15 15:10 UTC (permalink / raw)
To: Jacob Pan
Cc: Tian, Kevin, Raj Ashok, Jean-Philippe Brucker, LKML, iommu,
Alex Williamson, David Woodhouse, Jonathan Cameron
On Mon, Apr 13, 2020 at 03:06:31PM -0700, Jacob Pan wrote:
> > > > But quotas are only necessary for VMs, when the host shares the
> > > > PASID space with them (which isn't a use-case for Arm systems as
> > > > far as I know, each VM gets its own PASID space).
> > > Is there a host-guest PASID translation? or the PASID used by the
> > > VM is physical PASID? When a page request comes in to SMMU, how
> > > does it know the owner of the PASID if PASID range can overlap
> > > between host and guest?
> >
> > We assign PCI functions to VMs, so Page Requests are routed with
> > RID:PASID, not PASID alone. The SMMU finds the struct device
> > associated with the RID, and submits the fault with
> > iommu_report_device_fault(). If the VF is assigned to a VM, then the
> > page request gets injected into the VM, otherwise it uses the host
> > IOPF handler
> >
> Got it, VM private PASID space works then.
> For VM, the IOASID search is within the VM ioasid_set.
> For SVA, the IOASID search is within host default set.
> Should be faster than global search once we have per set xarray.
> I guess the PASID table is per VM instead of per RID (device)? Sorry if
> you already answered it before.
The PASID table is per IOMMU domain, so it's closer to per RID than per
VM, unless userspace puts all devices in the same VFIO container (hence in
the same IOMMU domain).
Thanks,
Jean