The sloppy cleanup of migration PTEs in V2 was to no one's liking, the fork() change was unnecessary, and Rik devised a locking scheme for anon_vma that was more robust for transparent hugepage support than what was proposed in V2.

Andrew, patch one of this series is about the correctness of locking of anon_vma with respect to migration. While I am not aware of any reproduction cases, it is potentially racy. Rik will probably release another version, so I'm not expecting this one to be picked up, but I'm including it for completeness.

Patch two of this series addresses the reported swapops bug, a race between migration due to compaction and execve where pages get migrated from the temporary stack before it is moved. Technically, it would be best if the anon_vma lock was held while the temporary stack is moved, but that would make exec significantly more complex, particularly in move_page_tables, just to handle a corner case in migration. I don't think adding complexity is justified. If there are no objections, please pick it up and place it between the patches

There are a number of races between migration and other operations that mean a migration PTE can be left behind. Broadly speaking, migration works by locking a page, unmapping it, putting a migration PTE in place that looks like a swap entry, copying the page and remapping the page, removing the old migration PTE. If a fault occurs, the faulting process waits until migration completes.

The problem is that there are some races that either allow migration PTEs to be copied or a migration PTE to be left behind. Migration still completes and the page is unlocked, but later a fault will call migration_entry_to_page() and BUG() because the page is not locked. This series aims to close some of these races.
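To make the failure mode concrete, here is a minimal userspace sketch of the lifecycle described above. It is a toy model only: every type and function name in it (toy_page, toy_pte, toy_fault and so on) is hypothetical and none of it is kernel code. It assumes a race can be modelled as the remap step walking fewer PTEs than the unmap step did.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not the kernel's structures. */
enum toy_pte_kind { TOY_PTE_PRESENT, TOY_PTE_MIGRATION };
enum toy_fault_result { TOY_FAULT_MAPPED, TOY_FAULT_WAIT, TOY_FAULT_STALE };

struct toy_page {
	bool locked;
};

struct toy_pte {
	enum toy_pte_kind kind;
	struct toy_page *page;
};

/* Lock the page, then replace every mapping of it with a migration entry. */
void toy_unmap_for_migration(struct toy_page *old, struct toy_pte *ptes, size_t n)
{
	old->locked = true;
	for (size_t i = 0; i < n; i++)
		if (ptes[i].kind == TOY_PTE_PRESENT && ptes[i].page == old)
			ptes[i].kind = TOY_PTE_MIGRATION;
}

/*
 * After the copy: point each migration entry that the walk can see at the
 * new page, then unlock the old page. A PTE outside the walked range is
 * left behind as a migration entry for an unlocked page.
 */
void toy_remove_migration_ptes(struct toy_page *old, struct toy_page *new_page,
			       struct toy_pte *ptes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (ptes[i].kind == TOY_PTE_MIGRATION && ptes[i].page == old) {
			ptes[i].kind = TOY_PTE_PRESENT;
			ptes[i].page = new_page;
		}
	}
	old->locked = false;
}

/*
 * A fault on a migration entry waits while the page is locked. If the page
 * is no longer locked, the entry is stale; this is the situation in which
 * the real kernel hits BUG().
 */
enum toy_fault_result toy_fault(const struct toy_pte *pte)
{
	if (pte->kind == TOY_PTE_PRESENT)
		return TOY_FAULT_MAPPED;
	return pte->page->locked ? TOY_FAULT_WAIT : TOY_FAULT_STALE;
}
```

Unmapping two PTEs and then calling toy_remove_migration_ptes() with n = 1 leaves the second PTE stale, so a later toy_fault() on it returns TOY_FAULT_STALE rather than TOY_FAULT_WAIT.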

Patch 1 notes that with the anon_vma changes, taking one lock is not necessarily enough to guard against changes in all VMAs on a list. It introduces a new lock to allow taking the locks on all anon_vmas to exclude migration from VMA changes.

Patch 2 notes that while a VMA is moved under the anon_vma lock, the page tables are not similarly protected. To avoid migration PTEs being left behind, pages within a temporary stack are simply not migrated.

The reproduction case was as follows:

1. Run kernel compilation in a loop
2. Start four processes, each of which creates one mapping. These stress
   different aspects of the problem. The operations they undertake are:
	a) Forks a hundred children, each of which faults the mapping
	   Purpose: stress test migration pte removal
	b) Forks a hundred children, each of which punches a hole in the
	   mapping and faults what remains
	   Purpose: stress test VMA manipulations during migration
	c) Forks a hundred children, each of which execs and calls echo
	   Purpose: stress test the execve race
	d) Size the mapping to be 1.5 times physical memory. Constantly
	   memset it
	   Purpose: stress swapping
3. Constantly compact memory using /proc/sys/vm/compact_memory so migration
   is active all the time. In theory, you could also force this using
   sys_move_pages or memory hot-remove but it'd be nowhere near as easy
   to test.
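For anyone wanting to rebuild something similar, the following is a rough sketch of two pieces of that setup: worker (a) and the compaction trigger from step 3. It is not the original test program. The function names trigger_compaction and fork_and_fault are made up for illustration, and the path is passed in as a parameter so the helper can be exercised without root.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "1" to a compaction trigger file such as
 * /proc/sys/vm/compact_memory. Returns 0 on success, -1 on failure
 * (for example when not running as root).
 */
int trigger_compaction(const char *path)
{
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	ssize_t written = write(fd, "1\n", 2);
	close(fd);
	return written == 2 ? 0 : -1;
}

/*
 * Hypothetical worker (a): fork nchildren processes, each of which touches
 * one byte in every page of the mapping so that each page is faulted.
 * Returns the number of children that exited cleanly, or -1 on error.
 */
int fork_and_fault(volatile char *map, size_t len, int nchildren)
{
	long page = sysconf(_SC_PAGESIZE);
	int started = 0;
	int ok = 0;

	if (page <= 0)
		return -1;

	for (int i = 0; i < nchildren; i++) {
		pid_t pid = fork();
		if (pid < 0)
			break;
		if (pid == 0) {
			/* Child: read and write one byte per page. */
			for (size_t off = 0; off < len; off += (size_t)page)
				map[off] = (char)(map[off] + 1);
			_exit(0);
		}
		started++;
	}

	/* Parent: reap every child that was started. */
	for (int i = 0; i < started; i++) {
		int status;
		if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}
```

A driver would call fork_and_fault(map, len, 100) in a loop in one process while another, running as root, repeatedly calls trigger_compaction("/proc/sys/vm/compact_memory"). Workers (b) through (d) would follow the same fork-and-wait pattern with a different child body.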