Date: Fri, 12 Jan 2018 17:19:23 +0300
From: Alexander Popov <alex.popov@...ux.com>
To: kernel-hardening@...ts.openwall.com,
Kees Cook <keescook@...omium.org>,
PaX Team <pageexec@...email.hu>,
Brad Spengler <spender@...ecurity.net>,
Ingo Molnar <mingo@...nel.org>,
Andy Lutomirski <luto@...nel.org>,
Tycho Andersen <tycho@...ho.ws>,
Laura Abbott <labbott@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
Borislav Petkov <bp@...en8.de>,
Thomas Gleixner <tglx@...utronix.de>,
"H . Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
"Dmitry V . Levin" <ldv@...linux.org>,
x86@...nel.org,
alex.popov@...ux.com
Subject: [PATCH RFC v7 0/6] Introduce the STACKLEAK feature and a test for it
This is the 7th version of the patch series introducing STACKLEAK to the
mainline kernel. STACKLEAK is a security feature developed by Grsecurity/PaX
(kudos to them), which:
- reduces the information that can be revealed through kernel stack leak bugs;
- blocks some uninitialized stack variable attacks (e.g. CVE-2010-2963);
- introduces some runtime checks for kernel stack overflow detection.
Changes in v7
=============
1. Improved erase_kstack(). Now it detects which stack we are currently using.
If we are on the thread stack, erase_kstack() writes the poison value up
to the stack pointer. If we are not on the thread stack (we are on the
trampoline stack introduced by Andy Lutomirski), erase_kstack() writes
the poison value up to the cpu_current_top_of_stack.
N.B. Right now there are two situations when erase_kstack() is called
on the thread stack:
- from sysret32_from_system_call in entry_64_compat.S;
- from syscall_trace_enter() (see the 3rd patch of this series).
2. Introduced STACKLEAK_POISON_CHECK_DEPTH macro and BUILD_BUG_ON() checking
its consistency with CONFIG_STACKLEAK_TRACK_MIN_SIZE.
3. Added "disable" option for the STACKLEAK gcc plugin (asked by Laura Abbott).
4. Improved CONFIG_STACKLEAK_METRICS. Now /proc/<pid>/stack_depth shows
the maximum kernel stack consumption for the current and previous syscalls.
Although this information is not precise, it can be useful for estimating
the STACKLEAK performance impact for different workloads.
5. Removed redundant erase_kstack() calls from syscall_trace_enter().
There is no need to erase the stack in syscall_trace_enter() when it
returns -1 (thanks to Dmitry Levin for noticing that).
6. Introduced MIN_STACK_LEFT to get rid of hardcoded numbers in check_alloca().
STACKLEAK instrumentation statistics
====================================
These numbers were obtained for the 4th version of the patch series.
Size of vmlinux (x86_64_defconfig):
file size:
- STACKLEAK disabled: 35014784 bytes
- STACKLEAK enabled: 35044952 bytes (+0.086%)
.text section size (calculated by size utility):
- STACKLEAK disabled: 10752983
- STACKLEAK enabled: 11062221 (+2.876%)
The readelf utility shows 45602 functions in vmlinux. The STACKLEAK gcc plugin
inserted 36 check_alloca() calls and 1265 track_stack() calls (42274 calls are
inserted during GIMPLE pass and 41009 calls are deleted during RTL pass).
So 2.853% of functions are instrumented.
STACKLEAK performance impact
============================
The STACKLEAK description in Kconfig includes:
"The tradeoff is the performance impact: on a single CPU system kernel
compilation sees a 1% slowdown, other systems and workloads may vary and you are
advised to test this feature on your expected workload before deploying it".
Here are the results of a brief performance test on x86_64 (for the 2nd version
of the patch). The numbers are very different because the STACKLEAK performance
penalty depends on the workloads.
Hardware: Intel Core i7-4770, 16 GB RAM
Test #1: building the Linux kernel with Ubuntu config (time make -j9)
Result on 4.11-rc8:
real 32m14.893s
user 237m30.886s
sys 11m12.273s
Result on 4.11-rc8+stackleak:
real 32m26.881s (+0.62%)
user 238m38.926s (+0.48%)
sys 11m36.426s (+3.59%)
Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
Average result on 4.11-rc8: 8.71s
Average result on 4.11-rc8+stackleak: 9.08s (+4.29%)
Previous changes
================
Changes in v6
1. Examined syscall entry/exit paths.
- Added missing erase_kstack() call at ret_from_fork() for x86_32.
- Added missing erase_kstack() call at syscall_trace_enter().
- Solved questions previously marked with TODO.
2. Rebased onto v4.15-rc2, which includes Andy Lutomirski's entry changes.
Andy removed sp0 from thread_struct for x86_64, which was the only issue
during rebasing.
3. Removed the recursive BUG() in track_stack() that was caused by the code
instrumentation. Instead, CONFIG_GCC_PLUGIN_STACKLEAK now implies
CONFIG_VMAP_STACK and CONFIG_SCHED_STACK_END_CHECK, which seems to be
an optimal solution.
4. Put stack erasing in syscall_trace_enter() into a separate patch and
fixed my mistake with secure_computing() (found by Tycho Andersen).
5. After some experiments, kept the asm implementation of erase_kstack(),
because it gives a full control over the stack for clearing it neatly
and doesn't offend KASAN.
6. Improved the comments describing STACKLEAK.
Changes in v5 (mostly according to the feedback from Ingo Molnar)
1. Introduced the CONFIG_STACKLEAK_METRICS providing STACKLEAK information
about tasks via the /proc file system. That information can be useful for
estimating the STACKLEAK performance impact for different workloads.
In particular, /proc/<pid>/lowest_stack shows the current lowest_stack
value and its final value from the previous syscall.
2. Introduced a single check_alloca() implementation working for both
x86_64 and x86_32.
3. Fixed coding style issues and did some refactoring in the STACKLEAK
gcc plugin.
4. Put the gcc plugin and the kernel stack erasing into separate (working)
patches.
Changes in v4
1. Introduced the CONFIG_STACKLEAK_TRACK_MIN_SIZE parameter instead of
hard-coded track-lowest-sp.
2. Carefully looked into the assertions in track_stack() and check_alloca().
- Fixed the incorrect BUG() condition in track_stack(), thanks to Tycho
Andersen. Also disabled that check if CONFIG_VMAP_STACK is enabled.
- Fixed the surplus and erroneous code for calculating stack_left in
check_alloca() on x86_64. That code repeats the work which is already
done in get_stack_info() and it misses the fact that different
exception stacks on x86_64 have different size.
3. Introduced two lkdtm tests for the STACKLEAK feature (developed together
with Tycho Andersen).
4. Added info about STACKLEAK to Documentation/security/self-protection.rst.
5. Improved the comments.
6. Decided not to change "unsigned long sp = (unsigned long)&sp" to
current_stack_pointer. The original variant is more platform independent
since current_stack_pointer has different type on x86 and arm.
Changes in v3
1. Added the detailed comments describing the STACKLEAK gcc plugin.
Read the plugin from the bottom up, like you do for Linux kernel drivers.
Additional information:
- gcc internals documentation available from gcc sources;
- wonderful slides by Diego Novillo
https://www.airs.com/dnovillo/200711-GCC-Internals/
- nice introduction to gcc plugins at LWN
https://lwn.net/Articles/457543/
2. Improved the commit message and Kconfig description according to the
feedback from Kees Cook. Also added the original notice describing
the performance impact of the STACKLEAK feature.
3. Removed the arch-specific ix86_cmodel check in stackleak_track_stack_gate().
It caused skipping the kernel code instrumentation for i386.
4. Fixed a minor mistake in stackleak_tree_instrument_execute().
First versions of the plugin used ENTRY_BLOCK_PTR_FOR_FN(cfun)->next_bb
to get the basic block with the function prologue. That was not correct
since the control flow graph edge from ENTRY_BLOCK_PTR doesn't always
go to that basic block.
So later it was changed to single_succ(ENTRY_BLOCK_PTR_FOR_FN(cfun)),
but not completely. next_bb was still used for entry_bb assignment,
which could cause the wrong value of the prologue_instrumented variable.
I've reported this issue to Grsecurity and proposed the fix for it, but
unfortunately didn't get any reply.
5. Introduced the STACKLEAK_POISON macro and renamed the config option
according to the feedback from Kees Cook.
Ideas for further work
======================
- Think of erasing stack on the way out of exception handlers (idea by
Andy Lutomirski).
- Think of erasing the kernel stack after invoking EFI runtime services
(idea by Mark Rutland).
- Try to port STACKLEAK to arm64 (Laura Abbott is working on it).
Alexander Popov (6):
x86/entry: Add STACKLEAK erasing the kernel stack at the end of
syscalls
gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack
x86/entry: Erase kernel stack in syscall_trace_enter()
lkdtm: Add a test for STACKLEAK
fs/proc: Show STACKLEAK metrics in the /proc file system
doc: self-protection: Add information about STACKLEAK feature
Documentation/security/self-protection.rst | 23 +-
arch/Kconfig | 53 ++++
arch/x86/Kconfig | 1 +
arch/x86/entry/common.c | 7 +
arch/x86/entry/entry_32.S | 69 +++++
arch/x86/entry/entry_64.S | 113 +++++++
arch/x86/entry/entry_64_compat.S | 11 +
arch/x86/include/asm/processor.h | 7 +
arch/x86/kernel/asm-offsets.c | 11 +
arch/x86/kernel/dumpstack.c | 20 ++
arch/x86/kernel/process_32.c | 8 +
arch/x86/kernel/process_64.c | 8 +
drivers/misc/Makefile | 3 +
drivers/misc/lkdtm.h | 4 +
drivers/misc/lkdtm_core.c | 2 +
drivers/misc/lkdtm_stackleak.c | 139 +++++++++
fs/exec.c | 33 ++
fs/proc/base.c | 18 ++
include/linux/compiler.h | 6 +
scripts/Makefile.gcc-plugins | 3 +
scripts/gcc-plugins/stackleak_plugin.c | 471 +++++++++++++++++++++++++++++
21 files changed, 1001 insertions(+), 9 deletions(-)
create mode 100644 drivers/misc/lkdtm_stackleak.c
create mode 100644 scripts/gcc-plugins/stackleak_plugin.c
--
2.7.4