With PTI enabled, the LDT must be mapped in the usermode tables somewhere. The LDT is per process, i.e. per mm.

An earlier approach mapped the LDT on context switch into a fixmap area, but that's a big overhead and exhausted the fixmap space when NR_CPUS got big.

Take advantage of the fact that there is an address space hole which provides a completely unused pgd. Use this pgd to manage per-mm LDT mappings.

This has a down side: the LDT isn't (currently) randomized, and an attack that can write the LDT is instant root due to call gates (thanks, AMD, for leaving call gates in AMD64 but designing them wrong so they're only useful for exploits). This can be mitigated by making the LDT read-only or randomizing the mapping, either of which is straightforward on top of this patch.

This will significantly slow down LDT users, but that shouldn't matter for important workloads -- the LDT is only used by DOSEMU(2), Wine, and very old libc implementations.

 #else
 /*
- * User space process size. 47bits minus one guard page. The guard
- * page is necessary on Intel CPUs: if a SYSCALL instruction is at
- * the highest possible canonical userspace address, then that
- * syscall will enter the kernel with a non-canonical return
- * address, and SYSRET will explode dangerously. We avoid this
- * particular problem by preventing anything from being mapped
- * at the maximum canonical address.
+ * User space process size. This is the first address outside the user range.
+ * There are a few constraints that determine this:
+ *
+ * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
+ * address, then that syscall will enter the kernel with a
+ * non-canonical return address, and SYSRET will explode dangerously.
+ * We avoid this particular problem by preventing anything executable
+ * from being mapped at the maximum canonical address.
+ *
+ * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
+ * CPUs malfunction if they execute code from the highest canonical page.
+ * They'll speculate right off the end of the canonical space, and
+ * bad things happen. This is worked around in the same way as the
+ * Intel problem.
+ *
+ * With page table isolation enabled, we map the LDT in ... [stay tuned]
  */
 #define TASK_SIZE_MAX	((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)