[PATCH 1/3] kmemcheck: add the kmemcheck core - Kernel

This is a discussion on [PATCH 1/3] kmemcheck: add the kmemcheck core - Kernel ; Hi,
This is kmemcheck against 2.6.25 as of today. This should be more or less
what's in x86.git, the most notable exception being that x86.git contains the
development history of kmemcheck since v4 or so, while this is the condensed
...

[PATCH 1/3] kmemcheck: add the kmemcheck core

Hi,

This is kmemcheck against 2.6.25 as of today. This should be more or less
what's in x86.git, the most notable exception being that x86.git contains the
development history of kmemcheck since v4 or so, while this is the condensed
three-patch set.

General description: kmemcheck is a patch to the linux kernel that
detects use of uninitialized memory. It does this by trapping every
read and write to memory that was allocated dynamically (e.g. using
kmalloc()). If a memory address is read that has not previously been
written to, a message is printed to the kernel log.

diff --git a/Documentation/kmemcheck.txt b/Documentation/kmemcheck.txt
new file mode 100644
index 0000000..a3c9a83
--- /dev/null
+++ b/Documentation/kmemcheck.txt
@@ -0,0 +1,101 @@
+Technical description
+=====================
+
+kmemcheck works by marking memory pages non-present. This means that whenever
+somebody attempts to access the page, a page fault is generated. The page
+fault handler notices that the page was in fact only hidden, and so it calls
+on the kmemcheck code to make further investigations.
+
+When the investigations are completed, kmemcheck "shows" the page by marking
+it present (as it would be under normal circumstances). This way, the
+interrupted code can continue as usual.
+
+But after the instruction has been executed, we should hide the page again, so
+that we can catch the next access too! Now kmemcheck makes use of a debugging
+feature of the processor, namely single-stepping. When the processor has
+finished the one instruction that generated the memory access, a debug
+exception is raised. From here, we simply hide the page again and continue
+execution, this time with the single-stepping feature turned off.
+
+
+Changes to the memory allocator (SLUB)
+======================================
+
+kmemcheck requires some assistance from the memory allocator in order to work.
+The memory allocator needs to
+
+1. Request twice as much memory as would normally be needed. The bottom half
+ of the memory is what the user actually sees and uses; the upper half
+ contains the so-called shadow memory, which stores the status of each byte
+ in the bottom half, e.g. initialized or uninitialized.
+2. Tell kmemcheck which parts of memory should be marked uninitialized. There
+ are actually a few more states, such as "not yet allocated" and "recently
+ freed".
+
+If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
+memory that can take page faults because of kmemcheck.
+
+If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
+request memory with the __GFP_NOTRACK flag. This does not prevent the page
+faults from occurring, however, but marks the object in question as being
+initialized so that no warnings will ever be produced for this object.
+
+
+Problems
+========
+
+The most prominent problem seems to be that of bit-fields. kmemcheck can only
+track memory with byte granularity. Therefore, when gcc generates code to
+access only one bit in a bit-field, there is really no way for kmemcheck to
+know which of the other bits will be used or thrown away. Consequently, there
+may be bogus warnings for bit-field accesses. There is some experimental
+support to detect this automatically, though it is probably better to work
+around this by explicitly initializing whole bit-fields at once.
+
+Some allocations are used for DMA. As DMA doesn't go through the paging
+mechanism, we have absolutely no way to detect DMA writes. This means that
+spurious warnings may be seen on access to DMA memory. DMA allocations should
+be annotated with the __GFP_NOTRACK flag or allocated from caches marked
+SLAB_NOTRACK to work around this problem.
+
+
+Parameters
+==========
+
+In addition to enabling CONFIG_KMEMCHECK before the kernel is compiled, the
+parameter kmemcheck=1 must be passed to the kernel when it is started in order
+to actually do the tracking. So by default, there is only a very small
+(probably negligible) overhead for enabling the config option.
+
+Similarly, kmemcheck may be turned on or off at run-time using, respectively:
+
+echo 1 > /proc/sys/kernel/kmemcheck
+ and
+echo 0 > /proc/sys/kernel/kmemcheck
+
+Note that this is a lazy setting; once turned off, the old allocations will
+still have to take a single page fault exception before tracking is turned off
+for that particular page. Enabling kmemcheck on will only enable tracking for
+allocations made from that point onwards.
+
+The default mode is the one-shot mode, where only the first error is reported
+before kmemcheck is disabled. This mode can be enabled by passing kmemcheck=2
+to the kernel at boot, or running
+
+echo 2 > /proc/sys/kernel/kmemcheck
+
+when the kernel is already running.
+
+
+Future enhancements
+===================
+
+There is already some preliminary support for catching use-after-free errors.
+What still needs to be done is delaying kfree() so that memory is not
+reallocated immediately after freeing it. [Suggested by Pekka Enberg.]
+
+It should be possible to allow SMP systems by duplicating the page tables for
+each processor in the system. This is probably extremely difficult, however.
+[Suggested by Ingo Molnar.]
+
+Support for instruction set extensions like XMM, SSE2, etc.
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 702eb39..6256b2e 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -134,6 +134,81 @@ config IOMMU_LEAK
Add a simple leak tracer to the IOMMU code. This is useful when you
are debugging a buggy device driver that leaks IOMMU mappings.

+config KMEMCHECK
+ bool "kmemcheck: trap use of uninitialized memory"
+ depends on X86_32
+ depends on !X86_USE_3DNOW
+ depends on !CC_OPTIMIZE_FOR_SIZE
+ depends on !DEBUG_PAGEALLOC && SLUB
+ select FRAME_POINTER
+ select STACKTRACE
+ default n
+ help
+ This option enables tracing of dynamically allocated kernel memory
+ to see if memory is used before it has been given an initial value.
+ Be aware that this requires half of your memory for bookkeeping and
+ will insert extra code at *every* read and write to tracked memory
+ thus slow down the kernel code (but user code is unaffected).
+
+ The kernel may be started with kmemcheck=0 or kmemcheck=1 to disable
+ or enable kmemcheck at boot-time. If the kernel is started with
+ kmemcheck=0, the large memory and CPU overhead is not incurred.
+
+choice
+ prompt "kmemcheck: default mode at boot"
+ depends on KMEMCHECK
+ default KMEMCHECK_ONESHOT_BY_DEFAULT
+ help
+ This option controls the default behaviour of kmemcheck when the
+ kernel boots and no kmemcheck= parameter is given.
+
+config KMEMCHECK_DISABLED_BY_DEFAULT
+ bool "disabled"
+ depends on KMEMCHECK
+
+config KMEMCHECK_ENABLED_BY_DEFAULT
+ bool "enabled"
+ depends on KMEMCHECK
+
+config KMEMCHECK_ONESHOT_BY_DEFAULT
+ bool "one-shot"
+ depends on KMEMCHECK
+ help
+ In one-shot mode, only the first error detected is reported before
+ kmemcheck is disabled.
+
+endchoice
+
+config KMEMCHECK_QUEUE_SIZE
+ int "kmemcheck: error queue size"
+ depends on KMEMCHECK
+ default 64
+ help
+ Select the maximum number of errors to store in the queue. This
+ queue will be emptied once every second, so this is effectively a
+ limit on how many reports to print in one go. Note however, that
+ if the number of errors occuring between two bursts is larger than
+ this number, the extra error reports will get lost.
+
+config KMEMCHECK_PARTIAL_OK
+ bool "kmemcheck: allow partially uninitialized memory"
+ depends on KMEMCHECK
+ default y
+ help
+ This option works around certain GCC optimizations that produce
+ 32-bit reads from 16-bit variables where the upper 16 bits are
+ thrown away afterwards. This may of course also hide some real
+ bugs.
+
+config KMEMCHECK_BITOPS_OK
+ bool "kmemcheck: allow bit-field manipulation"
+ depends on KMEMCHECK
+ default n
+ help
+ This option silences warnings that would be generated for bit-field
+ accesses where not all the bits are initialized at the same time.
+ This may also hide some real bugs.
+
#
# IO delay types:
#
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4eb5ce8..e1fcc1e 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -86,6 +86,8 @@ endif
obj-$(CONFIG_SCx200) += scx200.o
scx200-y += scx200_32.o