Your example is missing some important detail. When I was looking at this
I thought of the same scenario because initially I thought this might be
the problem Dave's test case was hitting. Obviously I then proceeded to
mess up anyway so take this with a grain of salt but why is this particular
situation not prevented by vma_merge? is_mergeable_vma() should have spotted
that the vm_files differed and mbind_range() should not have tried
sharing them.

> Look at alloc_pages_vma(), it uses get_vma_policy() and mpol_cond_put() pair
> for maintaining mempolicy refcount. The current rule is, get_vma_policy() does
> NOT increase a refcount if the policy is not attached shmem vma and mpol_cond_put()
> DOES decrease a refcount if mpol has MPOL_F_SHARED.
>

The rules about refcounting are indeed annoying. It would be a lot easier
to understand if the reference counting was unconditional but then every
page allocation in a large VMA would also bounce the cacheline storing
the count which would just generate a new bug later.

> In above case, vma1 is not shmem vma and vma->policy has MPOL_F_SHARED! then,
> get_vma_policy() doesn't increase a refcount and mpol_cond_put() decrease a
> refcount whenever alloc_page_vma() is called.
>
> The bug was introduced by commit 52cd3b0740 (mempolicy: rework mempolicy Reference
> Counting) at 4 years ago.
>
> More unfortunately mempolicy has one another serious broken. Currently,
> mempolicy rebind logic (it is called from cpuset rebinding) ignore a refcount
> of mempolicy and override it forcibly. Thus, any mempolicy sharing may
> cause mempolicy corruption. The bug was introduced by commit 68860ec10b
> (cpusets: automatic numa mempolicy rebinding) at 7 years ago.
>

I suspect these bugs were not noticed because the shmem policies are
typically large and very long lived without much use of mbind() but
that's not an excuse.

If we're going to change this, change the policy_vma() name as well to
set_vma_policy(). We currently have policy_vma() and vma_policy() which mean
totally different things, which is partially why I deleted it entirely the
first time around. It's a small issue but it might make mempolicy.c 0.0001%
easier to follow.