From Heap to RIP

I know, I know. With every new paper/post/challenge around ptmalloc2, it feels
like everything has already been said. And yes, one rarely needs to understand the Malloc
Maleficarum to exploit a heap bug. Neither does one need to get PC control from
wizardly metadata overwrites to exploit applications where function pointers,
indexed arrays and the like already live on the heap.

Nowadays, exploiting the heap is often nothing more than finding
a way to overwrite interesting structures
residing on it, by having old chunks overwritten or new chunks allocated
within areas under attacker control. Pure heap exploits, on the other hand, have been
dismissed as nearly impossible for quite a few years now.

So, what about this application where you have a full unbounded overflow on the
heap but no interesting data structure worth targeting? Well, with glibc's
malloc it may still be generically exploitable for code execution if it provides
good enough heap massaging primitives.

Disclaimer

Of course, we are not talking about a one-shot unlink() anymore. The two example
techniques I will present here require strong control of the heap and several
malloc/free/metadata overwrites to reach our goal.

By strong control, I mean having initial (write) control over an area of the heap
spanning a few chunks, and being able to then perform arbitrary free/malloc sequences. Free can
be performed on any address previously returned by malloc, and we need to be able to
write normally into newly allocated chunks (but we don't require special
control over the size of allocations).

If we don't have a libc leak, we additionally need the ability to read from
a free chunk, or from an allocated chunk that wasn't initialized at some point.

A ptmalloc2 refresher

The global algorithm of ptmalloc2, the structure of heap chunks and the type
of chunks have been described many,
many,
many times (and more) so I'll
consider this as a pre-requisite to understanding what follows.
What I feel hasn't been described as many times is what goes on within the
blackbox of malloc, i.e. what the unsorted chunks list is, what the small and
large bins are, how malloc chooses chunks to return, etc.

As one may know, free() is a fairly quick operation. Its main purpose is
to place the newly freed chunk into the "unsorted chunks" doubly-linked
list. It may also merge that chunk with the chunks right before and after it, if
they were already free, after unlinking those from the freelists they
were in. The resulting chunk ends up in the unsorted chunks list in any case.
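As a toy model (illustrative Python, not glibc code), the head insertion into the unsorted list looks like this:

```python
# Minimal sketch of how free() links a chunk at the head of the unsorted
# list; class and field names are illustrative, modeled on malloc_chunk.
class Chunk:
    def __init__(self, size):
        self.size = size
        self.fd = None
        self.bk = None

class UnsortedList:
    """Doubly-linked list with a sentinel head, like the arena's bin 1."""
    def __init__(self):
        self.head = Chunk(0)
        self.head.fd = self.head.bk = self.head

    def insert(self, chunk):
        # freed chunks go in at the head of the list
        old_first = self.head.fd
        chunk.fd = old_first
        chunk.bk = self.head
        old_first.bk = chunk
        self.head.fd = chunk

unsorted = UnsortedList()
a, b = Chunk(0x310), Chunk(0x310)
unsorted.insert(a)
unsorted.insert(b)
# the most recently freed chunk is first in the list
assert unsorted.head.fd is b and b.fd is a and a.fd is unsorted.head
```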

The unsorted chunks list is a temporary list where all kinds of free chunks
await being "sorted" into bins. Bins are doubly-linked lists as well, and each bin
is tied to a size range. The bin associated with the requested size is generally
checked first by malloc to allow for an efficient allocation.

All of this is done in malloc(). When malloc is called, it tries different
strategies to get a big enough memory space to fit the allocation request. At a
high level, its strategy is, in order:

requested size is a fast chunk: try to get a chunk out of the free fast bins list

requested size is a small chunk: try to get a chunk out of the small bin of that size

until the unsorted list is empty, get the next unsorted chunk:

    if the requested size is small, if that chunk is the remainder of the chunk
    that was last split, and that chunk is big enough, return it ("last_remainder")

    if the chunk is of the same size as the size requested, return it

    otherwise insert the chunk in the bin associated with its size

try all the non-empty sorted bins for larger chunks and split them if required

allocate from the top chunk (last chunk of the heap, which is always free)
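The ordering above can be condensed into a toy dispatch function (names and flags are purely illustrative; glibc's real logic in _int_malloc interleaves these steps in one big loop):

```python
def pick_strategy(req_is_fast, fastbin_nonempty,
                  req_is_small, smallbin_nonempty,
                  unsorted_exact_fit, sorted_bin_fits):
    """Illustrative dispatch mirroring the high-level order above."""
    if req_is_fast and fastbin_nonempty:
        return "fastbin"
    if req_is_small and smallbin_nonempty:
        return "small bin"
    if unsorted_exact_fit:
        return "unsorted list (exact fit or last_remainder)"
    if sorted_bin_fits:
        return "sorted bin (split if larger)"
    return "top chunk"

# a small request whose small bin is empty falls through to the unsorted list
assert pick_strategy(False, False, True, False, True, True) == \
       "unsorted list (exact fit or last_remainder)"
```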

The "bins" are essentially 127 doubly-linked lists. The first one is the
unsorted chunks list, then half of the remaining lists are small bins (one
for each 0x10 size increment), and the rest service large bin requests (with a wider
range of sizes in each bin). Each sorted large chunk is also on a secondary doubly-linked
list, which sorts the chunks within a large bin by size (that way malloc only has to
check the first chunk of each bin to know whether the bin can satisfy the request).
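For reference, here is a Python transcription of the 64-bit bin index computation (modeled on the smallbin_index and largebin_index_64 macros from glibc's malloc.c; bin 1 is the unsorted list):

```python
def bin_index(sz):
    """Bin index for a chunk of size sz on 64-bit glibc (sketch of the
    smallbin_index / largebin_index_64 macros from malloc.c)."""
    if sz < 0x400:                   # in_smallbin_range on 64-bit
        return sz >> 4               # one small bin per 0x10 bytes
    if (sz >> 6) <= 48:
        return 48 + (sz >> 6)
    if (sz >> 9) <= 20:
        return 91 + (sz >> 9)
    if (sz >> 12) <= 10:
        return 110 + (sz >> 12)
    if (sz >> 15) <= 4:
        return 119 + (sz >> 15)
    if (sz >> 18) <= 2:
        return 124 + (sz >> 18)
    return 126

assert bin_index(0x20) == 2      # smallest small bin
assert bin_index(0x300) == 0x30  # the 0x300 small bin used later
assert bin_index(0x400) == 64    # first large bin
```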

Memory-wise, these structures live in the malloc_state of the current arena.
The main arena lives in libc's .data section and this will be the case for most non-threaded
applications.

Getting forged addresses into the freelists is easy, but getting them
out is a different story. Something interesting is that chunks need to satisfy
different conditions to be taken out of the different free lists. In particular:

a chunk out of the unsorted chunks list needs to have a size > 0x10 and
smaller than the total amount of memory allocated in the current heap

a chunk out of a sorted bin needs to have a valid linked list in its bk field
(i.e. chunk->bk->fd == chunk)

a chunk out of a "generic" freelist (which can be any of the above) needs both of its
links to form valid lists (i.e. chunk->bk->fd == chunk and chunk->fd->bk == chunk),
plus a coherent prev_size field in the next chunk (the latest hardening measure). This is the
infamous "unlink" macro.
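A toy model of those three checks (simplified from glibc's malloc.c; Python objects stand in for raw chunk memory):

```python
# Illustrative models of the three removal checks listed above.
class Chunk:
    def __init__(self, size=0, prev_size=0):
        self.size, self.prev_size = size, prev_size
        self.fd = self.bk = None

def unsorted_removal_ok(chunk, system_mem):
    # the size must be sane before the chunk leaves the unsorted list
    return 0x10 < chunk.size < system_mem

def sorted_bin_removal_ok(chunk):
    # only the bk link is validated on this path
    return chunk.bk.fd is chunk

def unlink_ok(chunk, next_chunk):
    # full unlink: both links plus the next chunk's prev_size
    return (chunk.fd.bk is chunk and chunk.bk.fd is chunk
            and next_chunk.prev_size == chunk.size)

a, b, c = Chunk(0x310), Chunk(0x310), Chunk(prev_size=0x310)
a.fd = a.bk = b
b.fd = b.bk = a
assert unsorted_removal_ok(a, 0x21000)
assert sorted_bin_removal_ok(a)
assert unlink_ok(a, c)
```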

I'll now outline two techniques which take advantage of these "inconsistencies"
to achieve arbitrary code execution. In both cases, our aim is to obtain control
of the memory area containing the sorted bin that will serve the next
allocation request.

Technique 1: returning the unsorted_chunks chunk

The main thing that I leverage in this post is the fact that getting
a chunk out of the sorted bins only requires that chunk to be part of
a valid linked list. This is particularly interesting when taking into account
that the malloc_state of the arena itself has many memory addresses with correctly
linked chunks in the form of bins.

In this first technique, we will also use the fact that unlinking a chunk from
the unsorted chunks list sets chunk->bk->fd to the unsorted list chunk itself
without further checks. We can leverage this to insert the unsorted list chunk
in a sorted bin and get it out with the next alloc (the linked list condition
will be satisfied unless the unsorted chunk is corrupted from previous allocs).

When controlling the unsorted chunks chunk and the first few small bins, we can
get another chunk farther away in the bss as long as we can set its size to
match the next allocation. By repeating this process we can control any bin.

Proof of Concept #1

I abstracted the vulnerable application, structure offsets, as well as
reading and writing memory into wrapper classes, so that the snippets
hereafter read as near pseudocode (hopefully). Also, for the sake of readability,
I assume that the initial heap bug (double free, use-after-free, overflow) plus
the ability to read/write into chunks and free them gets us free rein over
chunks' metadata (excluding the first chunk under our control). This can be done
in a variety of ways depending on the original bug, and I feel proving it would
unnecessarily contextualize the techniques presented here. However, we won't read/write out
of bounds of the initially controlled area or of new chunks. I use fixed-size allocations
of 0x300 bytes; this is not essential, but I feel it makes the technique more generic without
introducing significant hurdles.

For each snippet I also report debug traces to show what happens at a lower level.

First things first, we allocate 5 small chunks on the heap and free the 2nd and
4th ones to get libc and heap leaks. Note that, since no other chunk is free at that
point, the libc leak is the address of the "top" field of the arena's malloc_state.
This is also the chunk address of the unsorted chunks list (unsorted_chunks->fd - 0x10). We
don't actually need the heap leak, but since it's there...

When taking a chunk out of the unsorted bin, we have a chunk->bk->fd =
&unsorted_chunks write, which is why we can get the unsorted_chunks address written
almost anywhere. If we get it written to the right small bin, the next
allocation will return a pointer to the unsorted_chunks chunk's fd field.
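This partial unlink can be modeled with a few Python objects standing in for raw memory (an illustrative sketch, not the PoC code): removing victim from the unsorted list writes the list head's address through victim->bk without any check.

```python
# Sketch of glibc's removal of a chunk from the unsorted list:
#   bck = victim->bk; unsorted_chunks->bk = bck; bck->fd = unsorted_chunks;
class Chunk:
    def __init__(self, name):
        self.name, self.fd, self.bk = name, None, None

head = Chunk("unsorted_chunks")    # lives in the arena's malloc_state
victim = Chunk("victim")
target = Chunk("small_bin")        # where we want the arena address written

victim.bk = target                 # attacker-corrupted bk field

# what malloc does to take `victim` off the unsorted list:
head.bk = victim.bk
victim.bk.fd = head                # unchecked write: target.fd = &unsorted_chunks

assert target.fd is head
```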

So we have control of the sorted bin that will serve the next allocation,
as well as the memory right after it. We will actually end up
in a very similar state after the next technique, so we'll tackle this at the
end (and you won't believe what happens next).

Technique 2: unlinking into bins

A fairly well-known consequence of the current unlink() is that any address that
points to an attacker-controlled chunk can be rewritten to point to ~itself or
to another location that points to the same chunk. An example where this would be
directly exploitable is an application that stores a table of heap addresses
at a predictable location.

The idea of this 2nd technique is similar: the heap state itself
contains quite a few pointers to attacker-controlled locations, in the form of
the bin list heads. If we can leverage this to successfully unlink() a chunk pointing
to a bin, we then get back our libc .data pointer with the next allocation.

Note that in the following example I use small bins but this should be possible with
any chunk type.

Proof of Concept #2

So we'll resume the PoC exploit started in the previous section, right after the
initialization code (i.e. after the small_bin_0x320 definition).

To follow the idea explained above, we'll get the address of a chunk we control,
chunks[1], into the small bins of sizes 0x300 and 0x310. Something surprising
about how chunks are taken out of the unsorted chunks list is that it is possible
to have a chunk that will be put in the sorted bin matching the current allocation,
yet won't be returned when processing the unsorted chunks list. This is
possible because small bins have a 0x10 granularity, while the chunksize() used to
get a chunk's size out of the unsorted bin aligns sizes to 8-byte boundaries. Very
important if the targeted application can only allocate from one bin, like
in our imaginary case.
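The mismatch is easy to check numerically (a sketch using glibc's 64-bit constants; the sizes follow the PoC, where a 0x300-byte request normalizes to a 0x310 chunk size):

```python
SIZE_BITS = 0x7          # PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA

def chunksize(raw):
    return raw & ~SIZE_BITS      # flag bits stripped: 8-byte granularity

def smallbin_index(sz):
    return sz >> 4               # one small bin per 0x10 bytes (64-bit)

nb = 0x310                       # normalized size of a 0x300-byte request
crafted = 0x318                  # crafted chunk size

assert chunksize(crafted) != nb                       # never an exact fit...
assert smallbin_index(crafted) == smallbin_index(nb)  # ...yet same small bin
```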

The two bins do not have to be consecutive, but having two separate places holding our
attacker-controlled address allows us to perform an unlink that does not merely
empty the target bin. We craft chunks[1] to have its fields pointing to the
two bins and satisfy the unlink preconditions, and free chunks[0] to trigger
forward coalescing into chunks[1]. chunks[2]->size no longer has PREV_INUSE set, from
the last round of frees.
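Here is a toy model of that unlink (Python objects stand in for the two bin heads, viewed as chunks, and for chunks[1]; illustrative, not the PoC code):

```python
class Chunk:
    def __init__(self, name):
        self.name, self.fd, self.bk = name, None, None

bin300 = Chunk("bin_0x300")   # small bin head viewed as a chunk
bin310 = Chunk("bin_0x310")   # (fd/bk live in the arena's malloc_state)
c1 = Chunk("chunks[1]")

# chunks[1] sits alone in both small bins:
bin300.fd = bin300.bk = c1
bin310.fd = bin310.bk = c1
# ...and we craft its fields to point back at the two bin heads:
c1.fd, c1.bk = bin300, bin310

# unlink preconditions hold:
assert c1.fd.bk is c1 and c1.bk.fd is c1

# unlink(chunks[1]) performed during the forward coalescing:
c1.fd.bk = c1.bk
c1.bk.fd = c1.fd

# the 0x310 bin now hands out the 0x300 bin head (a libc .data address):
assert bin310.fd is bin300
```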

As we can see on the trace above, the first chunk in the bin (0x7f4935721e48
here) is the chunk address of the 0x300 bin (if we had only used the 0x310 small
bin, the bk field would have become 0x7f4935721e58 and the bin would have been
considered empty).

The small bin 0x310 will be checked first to satisfy 0x300 allocs, we can
therefore get the 0x7f4935721e48 chunk directly out:

We're back at the same level of control as with the first PoC: we control
the bin that will serve the next allocation and the surrounding memory.

From bin control to arbitrary code execution

There are probably many ways to finish up from this point; here I'll use the last_remainder
mechanism outlined above when explaining how chunks get out of the unsorted chunks list.

The idea I chose is to recreate fake chunks in the controlled section of libc memory and
to control which addresses are returned by malloc() through the sorted bins.
Taking advantage of the last_remainder heuristic, we can then allocate
indefinitely from libc's .data and then .bss sections. The bss holds many data
structures that yield direct PC control, among which __free_hook, which is
directly exploitable here.

Picking up where we left off in the two exploitation paths, we need to control a > 0x300 chunk
if we want the last remainder to come from a chunk we control. The snippets
hereafter are the end of PoC #2.

Since the data (fd field) of the chunk that was just returned points to the small bin 0x300
chunk (i.e. 0x10 bytes before the fd field of that bin), we can control
the address of the next chunk that will come out of the bin. We can set it to
an arbitrary bin farther ahead which will satisfy the chunk->bk->fd == chunk
precondition if the bin is empty.
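This works because an empty bin's head points to itself, so the chunk->bk->fd == chunk check passes trivially (toy model):

```python
class Bin:
    """An empty small bin: both links point back at the bin itself."""
    def __init__(self):
        self.fd = self.bk = self

b = Bin()
assert b.bk.fd is b    # the sorted-bin removal precondition holds while empty
```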

We can then craft a larger chunk that starts in bss_chunk and ends in
bss_chunk_2, and add that chunk to a small bin for larger allocations. We then
place that chunk into the unsorted chunks list. Upon the next allocation,
if the smaller bins are correctly linked and empty, malloc() will split the chunk, and its
second part will become the last_remainder.
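The split arithmetic itself is simple (a sketch with made-up addresses; MINSIZE is glibc's 64-bit minimum chunk size):

```python
MINSIZE = 0x20   # smallest valid chunk on 64-bit

def split(chunk_addr, chunk_size, nb):
    """Sketch of malloc's split: carve nb bytes off the front; the rest
    becomes last_remainder and goes back on the unsorted list."""
    assert chunk_size - nb >= MINSIZE   # remainder must be a valid chunk
    return chunk_addr + nb, chunk_size - nb

# made-up address: a 0x900-byte fake chunk serving a 0x310 request
remainder_addr, remainder_size = split(0x7f0000000000, 0x900, 0x310)
assert remainder_addr == 0x7f0000000310
assert remainder_size == 0x5f0
```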

To start from a clean state we first empty the unsorted chunks list which contains
chunks[0] from the previous unlink operation.

We can now allocate as much as we want in libc's .data and .bss until we reach
whatever data structure we target. For instance, we can overwrite __free_hook
and free an arbitrary chunk to gain code exec:

And that's it, we've been able to execute arbitrary code from a pure heap
exploit, independently of the data structures on the heap. To sum up, these
techniques require the ability to freely write to an area of the heap that spans a
few small bins, most likely through a repeatable heap bug, and then to malloc/free
chunks of our choosing.

This is unlikely to be useful when exploiting complex applications, which often
have numerous data structures and/or vtables on the heap already, but it could be
interesting if more common techniques cannot yield code execution. More generally,
it is a reminder that heap bugs are exploitable until proven otherwise.

NB: a small check has been introduced to harden the unsorted chunks list a bit. This ensures that one cannot go from a single controlled pointer to overwritten arena metadata (as shown in Technique 1), but it's obviously still possible to get the unsorted chunks list out with slightly more control.