15/06/2016

TrustZone Kernel Privilege Escalation (CVE-2016-2431)

In this blog post we'll continue our journey from zero permissions to code execution in the TrustZone kernel. Having previously elevated our privileges to QSEE, we are left with the task of exploiting the TrustZone kernel itself.

"Why?", I hear you ask.

Well... There are quite a few interesting things we can do solely from the context of the TrustZone kernel. To name a few:

We could hijack any QSEE application directly, thus exposing all of its internal secrets. For example, we could directly extract the stored real-life fingerprint or various secret encryption keys (more on this in the next blog post!).

We could disable the hardware protections provided by the SoC's XPUs, allowing us to read and write directly to all of the DRAM. This includes the memory used by the peripherals on the board (such as the modem).

As we've previously seen, we could blow the QFuses responsible for various device features. In certain cases, this could allow us to unlock a locked bootloader (depending on how the lock is implemented).

So now that we've set the stage, let's start by surveying the attack surface!

Attack Surface

Qualcomm's Secure Environment Operating System (QSEOS), like most operating systems, provides services to the applications running under it by means of system-calls.

As you know, operating systems must take great care to protect themselves from malicious applications. In the case of system-calls, this means the operating system mustn't trust any information provided by an application and should always validate it. This forms a "trust-boundary" between the operating system itself and the running applications.

So... This sounds like a good place to start looking! Let's see if the TrustZone kernel does, in fact, cover all the bases.

In the "Secure World", just like the "Normal World", user-space applications can invoke system-calls by issuing the "SVC" instruction. All system-calls in QSEE are invoked via a single function, which I've dubbed "qsee_syscall":

As we can see, the function is a simple wrapper which does the following:

Stores the syscall number in R0

Stores the arguments for the syscall in R4-R9

Invokes the SVC instruction with the code 0x1400

Returns the syscall result via R0
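
The calling convention listed above can be modelled in a few lines of Python. This is only a sketch of the register traffic: `svc_handler` and `toy_handler` are hypothetical stand-ins for the kernel side, and the real wrapper is a handful of ARM instructions rather than Python.

```python
SVC_CODE = 0x1400      # immediate used by the SVC instruction, per the steps above
MAX_ARGS = 6           # R4..R9 leaves room for six arguments


def qsee_syscall(svc_handler, number, *args):
    """Model of the wrapper: R0 = syscall number, R4-R9 = arguments, result in R0."""
    if len(args) > MAX_ARGS:
        raise ValueError("only R4-R9 are available for arguments")
    regs = {f"r{i}": 0 for i in range(10)}
    regs["r0"] = number & 0xFFFFFFFF
    for i, value in enumerate(args):
        regs[f"r{4 + i}"] = value & 0xFFFFFFFF
    svc_handler(SVC_CODE, regs)          # stands in for executing "SVC 0x1400"
    return regs["r0"]


def toy_handler(code, regs):
    """Hypothetical kernel side: syscall 1 adds its first two arguments."""
    assert code == SVC_CODE
    if regs["r0"] == 1:
        regs["r0"] = (regs["r4"] + regs["r5"]) & 0xFFFFFFFF
    else:
        regs["r0"] = 0xFFFFFFFF          # made-up "unknown syscall" value
```

Calling `qsee_syscall(toy_handler, 1, 2, 3)` walks through all four steps: the number lands in R0, the arguments in R4 and R5, the handler runs, and the result comes back in R0.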

So we know how syscalls are invoked; now let's look for the code in the TrustZone kernel which handles SVC requests. Recall that the "Secure World", just like the "Normal World", must register the address of the vector to which the processor will jump when an SVC instruction is executed.

Unlike SMC instructions (used to request "Secure World" services from the "Normal World"), which use the MVBAR (Monitor Vector Base Address Register) to provide the vector's base address, SVC instructions simply use the "Secure" version of the VBAR (Vector Base Address Register).

Accessing the VBAR is done using the MRC/MCR opcodes, with the following operands:

So this means we can simply search for an MCR opcode with these operands in the TrustZone kernel, and we should be able to find the address of the secure copy of the VBAR. Indeed, searching for the opcode in the TrustZone image returns the following match:
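
A search like this one can be sketched in Python. The sketch assumes the code is in ARM (A32) mode, little-endian and word-aligned, and that the VBAR write has the form `mcr p15, 0, <Rt>, c12, c0, 0`; Thumb-2 code would need a different pattern. The function names are made up for illustration.

```python
import struct


def encode_mcr(coproc, opc1, rt, crn, crm, opc2, cond=0xE):
    """Encode an A32 MCR instruction (cond=0xE means "always")."""
    return ((cond << 28) | (0xE << 24) | (opc1 << 21) | (crn << 16)
            | (rt << 12) | (coproc << 8) | (opc2 << 5) | (1 << 4) | crm)


# mcr p15, 0, r0, c12, c0, 0  (write VBAR from r0)
VBAR_WRITE = encode_mcr(coproc=15, opc1=0, rt=0, crn=12, crm=0, opc2=0)
RT_MASK = 0xF << 12   # the source register may be any of r0-r15


def find_vbar_writes(image, base=0):
    """Return (address, source register) for every VBAR write found in the image."""
    hits = []
    for offset in range(0, len(image) - 3, 4):
        (word,) = struct.unpack_from("<I", image, offset)
        if (word & ~RT_MASK & 0xFFFFFFFF) == VBAR_WRITE:
            hits.append((base + offset, (word >> 12) & 0xF))
    return hits
```

Masking out the Rt field keeps the search from depending on which register the compiler happened to pick.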

At this point we can start tracing the execution from the SVC handler in the vector table.

The code initially does some boilerplate preparations, such as saving the passed arguments and context, and finally gets to the main entry point which is used to actually handle the requested system-call. Qualcomm have helpfully left a single logging string in this function containing its original name "app_syscall_handler", so we'll use that name as well. Let's take a look at the function's high-level graph overview:

app_syscall_handler graph overview

...Okay... That's a lot of code.

However, on closer inspection, the graph seems very shallow, so while there are a lot of different code-paths, they are all relatively simple. In fact, the function is simply a large switch-case, which uses the syscall command-code supplied by the user (in R0) in order to select which syscall should be executed.

snippet from app_syscall_handler's switch-case

But something's obviously missing! Where are the validations on the arguments passed in by the user? app_syscall_handler makes no such effort, so this means the validation can only possibly be in the syscalls themselves... Time to dig deeper once more!

As you can see in the screenshot above, most of the syscalls aren't directly invoked, but rather indirectly called by using a set of globally-stored pointers, each pointing to a different table of supported system-calls. I've taken to using the following (imaginative) names to describe them:

Cross-referencing these pointers reveals the locations of the actual system-call tables to which they point. The tables' structure is very simple - each entry contains a 32-bit number representing the syscall number within the table, followed by a pointer to the syscall handler function itself. Here is one such table:

As you can see, there is some logic behind the "grouping" of each set of syscalls. For example, the sixth table (above) contains only syscalls relating to memory management (although, admittedly, most tables are more loosely cobbled together).
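
The table layout described above (a 32-bit syscall number followed by a 32-bit handler pointer) is easy to walk with Python's `struct` module. This is a rough sketch which assumes little-endian entries and an entry count known in advance; the numbers and addresses used with it are made up.

```python
import struct

ENTRY_SIZE = 8   # 32-bit syscall number + 32-bit handler pointer


def parse_syscall_table(blob, count):
    """Return a list of (syscall_number, handler_address) pairs."""
    return [struct.unpack_from("<II", blob, index * ENTRY_SIZE)
            for index in range(count)]


def lookup_handler(entries, number):
    """Find the handler address registered for a syscall number, or None."""
    for entry_number, handler in entries:
        if entry_number == number:
            return handler
    return None
```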

Finally, let's take a look at a simple syscall which must perform validation in order to function correctly. A good candidate would be a syscall which receives a pointer as an argument, and subsequently writes data to that pointer. Obviously, this is incredibly dangerous, and would therefore require extra validation to make sure the pointer is strictly within the memory regions belonging to the QSEE application.

Digging through the widevine application, we find the following syscall:

This syscall receives four arguments:

A pointer to a "cipher" object, which has previously been initialized by calling "qsee_cipher_init"

The type of parameter which is going to be retrieved from the cipher object

The address to which the read parameter will be written

An unknown argument

Of course, QSEE applications always play nice and set the output pointer to a sensible address, but what's actually going on under the hood in the TrustZone kernel? Well, we now know enough to pop the proverbial hood and check for ourselves. Going through app_syscall_handler's switch-case, we find the syscall table and offset of the kernel implementation of "qsee_cipher_get_param", leading us to the actual implementation of qsee_cipher_get_param:

This is our lucky day! Apparently the TrustZone kernel blindly trusts nearly all the parameters passed in by the user. Although the function does perform some sanity checks to make sure the given pointers are not NULL and the param_type is within the allowed range, it automatically trusts the user-supplied "output" argument. More importantly, we can see that if we use the parameter type 3, the function will write a single byte from our cipher to the supplied pointer!

Note that this was more than just a stroke of luck - taking a peek at the implementation of all the other syscalls reveals that the TrustZone kernel does not perform any validation on QSEE-supplied arguments (more specifically, it freely uses any given pointers), meaning that at the time all syscalls were vulnerable.
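
For contrast, a check of the kind that was missing might look roughly like this. It is a hypothetical sketch, not Qualcomm's actual fix: the region values passed in are whatever range the kernel has assigned to the calling application, and the function name is invented.

```python
ADDRESS_SPACE = 1 << 32   # 32-bit addresses


def pointer_is_inside(ptr, length, region_start, region_size):
    """True only if [ptr, ptr + length) lies fully inside the caller's region."""
    if length <= 0 or ptr < 0 or ptr >= ADDRESS_SPACE:
        return False
    end = ptr + length            # Python integers do not wrap, so this is the true end
    if end > ADDRESS_SPACE:       # on real hardware this sum would wrap around
        return False
    return region_start <= ptr and end <= region_start + region_size
```

Checking the end of the buffer as well as its start matters: a pointer that begins inside the application but runs past the end of the region is just as dangerous as one that starts outside it.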

For the sake of our exploit, we'll stick to qsee_cipher_get_param, since we've already started reviewing it.

Full Read-Write

As always, before we start writing an exploit, let's try and improve our primitives. This is nearly always worth our while; the more time we spend on improving the primitives, the cleaner and more robust our exploit will be. We might even end up saving time in the long-run.

Right now we have an uncontrolled-write primitive - we can write some uncontrolled data from our cipher object to a controlled memory location. Of course, it would be much easier if we were able to control the written data as well.

Intuitively, since "qsee_cipher_get_param" is used to read a parameter from a cipher object, it stands to reason that there would be a matching function which is used to set the parameter. Indeed, searching for "qsee_cipher_set_param" in the widevine application confirms our suspicion:

Let's take a look at the implementation of this syscall:

Great!

It looks like we can set the parameter's value by using the same param_type value (3), and supplying a pointer to a controlled memory region within QSEE which will contain the byte we would later like to write. The TrustZone kernel will happily store the value we supplied in the cipher object, allowing us to later write that value to any address by calling qsee_cipher_get_param with our target pointer.

Putting this together, we now have a relatively clean write-what-where primitive. Here's a run-down of our new primitive:

Initialize a cipher object using qsee_cipher_init

Allocate a buffer in QSEE

Write the wanted byte to our allocated QSEE buffer

Call qsee_cipher_set_param using our QSEE-allocated buffer as the param_value argument

Call qsee_cipher_get_param, but supply the target address as the output argument

You might have also noticed that we could use the inverse of this in order to get an arbitrary read primitive. All we would need to do is call qsee_cipher_set_param supplying the address we'd like to read as the param_value argument - this'll cause the TrustZone kernel to read the value at that address and store it in our cipher object. Then, we can simply retrieve that value by calling qsee_cipher_get_param.
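
Both primitives can be illustrated with a toy model. `ToyKernel` below is a made-up stand-in with a single flat memory and simplified syscall signatures (the real calls take more arguments); it only shows how the one-byte parameter is shuttled through the cipher object.

```python
PARAM_TYPE = 3   # the parameter type that moves a single byte, as described above


class ToyKernel:
    """Toy stand-in for the TrustZone kernel: flat memory, no pointer checks."""

    def __init__(self, size=0x100):
        self.memory = bytearray(size)
        self.ciphers = {}

    def qsee_cipher_init(self):
        handle = len(self.ciphers) + 1
        self.ciphers[handle] = 0
        return handle

    def qsee_cipher_set_param(self, handle, param_type, value_ptr):
        if param_type != PARAM_TYPE:
            return -1
        self.ciphers[handle] = self.memory[value_ptr]     # unchecked read
        return 0

    def qsee_cipher_get_param(self, handle, param_type, output_ptr):
        if param_type != PARAM_TYPE:
            return -1
        self.memory[output_ptr] = self.ciphers[handle]    # unchecked write
        return 0


def write_byte(kernel, cipher, scratch, target, value):
    """Write-what-where: stage the byte in our own buffer, then 'get' it to the target."""
    kernel.memory[scratch] = value
    kernel.qsee_cipher_set_param(cipher, PARAM_TYPE, scratch)
    kernel.qsee_cipher_get_param(cipher, PARAM_TYPE, target)


def read_byte(kernel, cipher, scratch, target):
    """Arbitrary read: 'set' from the target address, then 'get' into our own buffer."""
    kernel.qsee_cipher_set_param(cipher, PARAM_TYPE, target)
    kernel.qsee_cipher_get_param(cipher, PARAM_TYPE, scratch)
    return kernel.memory[scratch]
```

Since each round trip moves one byte, writing or reading a larger region is simply a loop over these two helpers.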

Writing an Exploit

Using the primitives we just crafted, we finally have full read-write access to the TrustZone kernel. All that's left is to achieve code-execution within the TrustZone kernel in a controllable way.

The first obvious choice would be to write some shellcode into the TrustZone kernel's code segments and execute it. However, there's a tiny snag - the TrustZone kernel's code segments in newer devices are protected by special memory protection units (called XPUs), which prevent us from directly modifying the kernel's code (along with many different protected memory regions). We could still modify the kernel's code (more information in the next blog post!), but it would be much harder...

...However, we have already come across a piece of dynamically allocated code in the "Secure World" - the QSEE applications themselves!

So here's a plan - if we could ignore the access-protection bits on the code pages of the QSEE applications (since they are all marked as read-execute), we should be able to directly modify them from the context of the TrustZone kernel. Then, we could simply jump to our newly-created code from the context of the kernel in order to execute any piece of code we'd like.

Luckily, ignoring the access-protection bits can actually be done without modifying the translation table at all, by using a convenient feature of the ARM MMU called "domains".

In the ARM translation table, each entry has a field which lists its permissions, as well as a 4-bit field denoting the "domain" to which the translation belongs.

Within the ARM MMU, there is a register called the DACR (Domain Access Control Register). This 32-bit register has 16 pairs of bits, one pair for each domain, which are used to specify how accesses to translations of the given domain are checked: every access faults ("no access"), the translation's permission bits are enforced ("client"), or the permission bits are ignored and no permission fault is generated ("manager").

Whenever the processor attempts to access a given memory address, the MMU first checks if the access is possible using the access permissions of the given translation for that address. If the access is allowed, no fault is generated.

Otherwise, the MMU checks if the bits corresponding to the given domain in the DACR are set. If so, the fault is suppressed and the access is allowed.

This means that simply setting the DACR's value to 0xFFFFFFFF will cause the MMU to enable access to any mapped memory address, for both read and write access, without generating a fault (and more importantly, without having to modify the translation table).
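
To make the bit layout concrete, here is a small model of the DACR. It assumes the ARMv7 encoding of each two-bit field (0b00 = no access, 0b01 = client, 0b11 = manager, 0b10 reserved); the helper names are invented, and the reserved value is simply treated as "no access" in this model.

```python
NO_ACCESS, CLIENT, MANAGER = 0b00, 0b01, 0b11   # 0b10 is reserved


def dacr_field(dacr, domain):
    """Extract the two-bit field for one of the 16 domains."""
    return (dacr >> (2 * domain)) & 0b11


def set_domain(dacr, domain, mode):
    """Return a new DACR value with one domain's field replaced."""
    shift = 2 * domain
    cleared = dacr & ~(0b11 << shift)
    return (cleared | (mode << shift)) & 0xFFFFFFFF


def access_allowed(dacr, domain, permission_bits_allow):
    """Model of the check: manager ignores permissions, client enforces them."""
    mode = dacr_field(dacr, domain)
    if mode == MANAGER:
        return True
    if mode == CLIENT:
        return permission_bits_allow
    return False   # no access (the reserved value is treated the same way here)
```

Setting every domain to manager yields 0xFFFFFFFF, which is why that single value is enough to make a read-execute page writable from the kernel's point of view.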

Moreover, the TrustZone kernel already has a piece of code that is used to set the value of the DACR, which we can simply call using our own value (0xFFFFFFFF) in order to fully set the DACR.

TrustZone kernel function which sets the DACR

All that said and done, we're still missing a key component in our exploit! All we have right now is read/write access to the TrustZone kernel; we still need a way to execute arbitrary functions within the TrustZone kernel and restore execution. This would allow us to change the DACR using the gadget above and subsequently write and execute shellcode in the "Secure World".

Hijacking Syscalls

As we've seen, most QSEE system-calls are invoked indirectly by using a set of globally-stored pointers, each of which points to a corresponding system-call table.

While the system-call tables themselves are located in a memory region that is protected by an XPU, the pointers to these tables are not protected in any way! This is because they are only populated during runtime, and as such must reside in a modifiable memory region.

This little tidbit actually makes it much simpler for us to hijack code execution in the kernel in a controllable manner!

All we need to do is allocate our own "fake" system-call table. Our table would be identical to the real system-call table, apart from a single "poisoned" entry, which would point to a function of our choice (instead of pointing to the original syscall handler).

It should be noted that since we don't want to cause any adverse effects for other QSEE applications, it is important that we choose to modify an entry corresponding to an unused (or rarely used) system call.

Once we've crafted the "fake" syscall table, we can simply use our write primitive in order to modify the global syscall table pointer to point to our newly created "fake" table.

Then, whenever the "poisoned" system-call is invoked from QSEE, our function will be executed within the context of the TrustZone kernel! Not only that, but app_syscall_handler will also conveniently make sure the return value from our executed code will be returned to QSEE upon returning from the SVC call.
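
The pointer swap can be illustrated with a toy dispatcher. Everything here is hypothetical: the addresses are invented, handlers are plain Python callables, and `kernel` is just a dictionary standing in for the global state.

```python
REAL_TABLE_ADDR = 0xFE82A000   # hypothetical address of the XPU-protected table
FAKE_TABLE_ADDR = 0x0D601000   # hypothetical address inside our QSEE application


def dispatch(kernel, number, *args):
    """Look the syscall up through the writable table pointer and call its handler."""
    table = kernel["tables"][kernel["table_ptr"]]
    for entry_number, handler in table:
        if entry_number == number:
            return handler(*args)
    raise LookupError(f"no syscall {number:#x}")


def make_fake_table(real_table, poisoned_number, replacement):
    """Copy the real table, swapping the handler of exactly one entry."""
    return [(number, replacement if number == poisoned_number else handler)
            for number, handler in real_table]
```

Note that the real table is never modified: only the pointer changes, so restoring the original pointer value undoes the hijack.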

Putting it all together

By now we have all the pieces we need to write a simple exploit which writes a chunk of shellcode in the "Secure World", executes that shellcode in the context of the TrustZone kernel, and restores execution.

Here's what we need to do:

Allocate a "fake" syscall table in QSEE

Use the write primitive to overwrite the syscall table pointer to point to our crafted "fake" syscall table

Set the single "poison" syscall entry in the "fake" syscall table to point to the DACR-modifying function in the TrustZone kernel

Invoke the "poison" syscall in order to call the DACR-modifying function in the TrustZone kernel - thus setting the DACR to 0xFFFFFFFF

Use the write gadget to write our shellcode directly to a code page in QSEE belonging to our QSEE application

Invalidate the instruction cache (to avoid conflicts with the newly written code)

Set the single "poison" syscall entry in the "fake" syscall table to point to the written shellcode

Invoke the "poison" syscall in order to jump to our newly-written shellcode from the context of the TrustZone kernel!
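
The ordering of these steps matters, which a toy simulation can show: the write to the code page only succeeds once the DACR has been changed. `ToySecureWorld` is an invented model, not the real exploit; "executing" the code page is modelled as returning its bytes, and the poisoned syscall number is made up.

```python
MANAGER_ALL = 0xFFFFFFFF   # every DACR domain set to "manager"
CLIENT_ALL = 0x55555555    # every domain set to "client": permissions enforced
POISON = 0x10              # hypothetical number of a rarely used syscall


class ToySecureWorld:
    def __init__(self):
        self.dacr = CLIENT_ALL
        self.code_page = bytearray(16)            # read-execute page of our QSEE app
        self.real_table = {POISON: self.sys_rarely_used}
        self.table = self.real_table              # stands in for the global table pointer

    def sys_rarely_used(self, *args):
        return 0

    def set_dacr(self, value):
        """Stand-in for the kernel's existing DACR-setting function."""
        self.dacr = value & 0xFFFFFFFF
        return 0

    def kernel_write(self, offset, data):
        """Write primitive aimed at the code page; faults unless permissions are bypassed."""
        if self.dacr != MANAGER_ALL:
            raise PermissionError("write to a read-execute page")
        self.code_page[offset:offset + len(data)] = data

    def run_code_page(self, *args):
        """'Executing' the page is modelled as returning its contents."""
        return bytes(self.code_page)

    def syscall(self, number, *args):
        return self.table[number](*args)


def exploit(world, shellcode):
    fake = dict(world.real_table)         # 1. allocate a fake table (copy of the real one)
    world.table = fake                    # 2. repoint the global pointer (write primitive)
    fake[POISON] = world.set_dacr         # 3. poison one entry with the DACR setter
    world.syscall(POISON, MANAGER_ALL)    # 4. DACR = 0xFFFFFFFF
    world.kernel_write(0, shellcode)      # 5. shellcode into our own code page
    # 6. instruction cache invalidation has no equivalent in this model
    fake[POISON] = world.run_code_page    # 7. poison the entry with the shellcode address
    return world.syscall(POISON)          # 8. run it in "kernel" context
```

Calling `kernel_write` on a fresh `ToySecureWorld` raises `PermissionError`, while the same write inside `exploit` goes through because step 4 has already run.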

Playing With The Code

The exploit builds upon the previous QSEE exploit, in order to achieve QSEE code-execution. If you'd like to play around with it, you might want to use the following two useful functions:

tzbsp_execute_function - calls the given function with the given arguments within the context of the TrustZone kernel.

tzbsp_load_and_exec_file - Loads the shellcode from a given file and executes it within the context of the TrustZone kernel.

I've also included a small shell script called "build_shellcode.sh", which can be used to build the shellcode supplied in the file "shellcode.S" and write it into a binary blob (which can then be loaded and executed using the function above).

Have fun!

Timeline

13.10.2015 - Vulnerability disclosed and minimal PoC sent

15.10.2015 - Initial response from Google

16.10.2015 - Full exploit sent to Google

30.03.2016 - CVE assigned

02.05.2016 - Issue patched and released in the Nexus public bulletin

As far as I know, this vulnerability has been present in all devices and all versions of QSEOS, until it was finally patched on 02.05.2016. This means that effectively up to that point, obtaining code-execution within QSEE was equivalent to having code-execution within the TrustZone kernel (i.e., fully controlling nearly every aspect of the device).

As there was no public research into QSEE up to that point, this issue wasn't discovered. Hopefully in the future further research into QSEE and TrustZone in general will help uncover similar issues and make the security boundary between QSEOS and QSEE stronger.

Thank you for reading the post. As for the shellcode - it's executed in the TZ kernel, which isn't a POSIX OS, but rather a proprietary OS written by Qualcomm. This means you don't have any commands like "execv", etc. Instead, you can directly execute assembly code in the kernel.

Just write the ARM assembly you want to execute under shellcode.S, run build_shellcode.sh, and execute the exploit with the generated payload.

And also, the exploit only works on shamu, right? How can we adapt it to other devices? Which parameters should be changed? I found only this device-specific parameter/address: https://github.com/laginimaineb/cve-2016-2431/blob/master/jni/symbols.h#L17

The code you posted would work when running an application *under the Linux Kernel*. In this case, we are executing shellcode directly in the TrustZone kernel - so no SWIs (because there are no syscalls to call - you're already in the kernel), also no public documentation available for whatever APIs are exposed in the TZ kernel.

I did post some neat stuff you could do from that context, like reading/writing QFuses, and hijacking the "Normal World" OS (see previous posts). I'm going to upload another post soon about more interesting stuff you can do using the TZ kernel.

As for the exploit - all the parameters that are device/version specific are under symbols.h (the file you linked). You'll have to follow the QSEE post closely to understand exactly which changes need to be made, but it's do-able :)

Hi laginimaineb. Sorry for spamming, but I have decided to put my questions here, as this is where they are most likely to be seen. The questions are:

1) How did you define the values SECURE_APP_REGION_START and SECURE_APP_REGION_SIZE? Are these values the same for different families of Qualcomm SoCs?

2) What is the memory management of the TZ kernel? While scanning the secapp region, the trustlet is crashed, I suppose by the TZ kernel, if it tries to access a memory region it doesn't own. What is the probability that the TZ kernel will load the crashed trustlet at the same memory location?

3) And a stupid question... Does the TZ kernel operate with virtual addresses, or with physical addresses by switching modes by means of special flags in the system register(s)?

1. These values are constant per-device. They are also a part of the kernel dtb. In any case, you can find the region by looking at dmesg when the device boots. You'll see something along the lines of:

2. That's a great question, but hard to answer. I've reversed some of the code responsible for loading applications in the secure region, but don't have a definitive answer... Sorry.

3. The MMU is always present, so we're always working with virtual addresses. But - most TZ kernel contexts simply have a "flat" translation table - that is, every virtual address is mapped to the corresponding physical address. You could change the mappings and map in whatever you like, just like you would in a regular kernel.

You replied "But - most TZ kernel contexts simply have a "flat" translation table - that is, every virtual address is mapped to the corresponding physical address." That was my doubt, thanks.

Another interesting thing is to estimate how the TZ kernel allocates memory ranges, to find out whether a memory range from the secapp region stays reserved for a specific TA once it has been run, for the whole cycle from one CPU reset to the next. In other words, to find out whether the TZ kernel retains metadata identifying the specific TA so that it can load it in a predefined place (where it was loaded the first time). Run the exploit once to find the memory location. Then modify the exploit so that it loads another TA after crashing the original TA, and compare the results.

...and another bundle of questions, if you don't mind:

1) When are the /persist/data/app_g/sfs/*.dat files decrypted? Immediately after invoking QSEE_sfs_open, or before read/write operations?

2) Is it possible to load an encrypted TA for security reasons? Does Qualcomm's secure kernel support such a feature? It could be useful to prevent analysis of the TAs. Did you ever come across an encrypted secure kernel which is decrypted at boot by the boot loader? In my opinion it would decrease the number of 0-day vulnerabilities.

3) I noticed that the data segment is used to allocate/deallocate dynamic memory through SVCs directed to the TZ kernel. But how can we find the bottom of the stack and its size? As I understand it, it is possible to dump the whole current state of the TA if one can somehow read the data segment. Right? Because that would dump both the static and the dynamic memory.

4) The R9 register is used to point to the data segment and it is initialized by sub_50(), but I didn't find any reference to this function. When is it invoked, and what invokes it?

For some reason blogger marked your comment as spam... I un-spammed it.

About the allocation pattern - I'm pretty sure there's randomization involved. Booting the device up normally twice results in two different load addresses (deduced on an older MSM8974 device, on which I have TZ kernel code exec without going through QSEE).

As for the questions:

1. The SFS always remains encrypted on the flash, it's only decrypted on a per-block basis in QSEE's memory, never on-disk.

2. This kind of model isn't used on QC devices, but you can find something rather similar in the Apple ecosystem. There's a Crypto Engine which has different GID keys accessible to each core, which are used to decrypt the firmware itself only on-chip. It might help, but one could argue that it's also doing some damage... On the one hand, it prevents researchers from looking for bugs. On the other hand, government agencies/people with access to the source code will be more likely to find bugs, since the code hasn't been audited by white-hats at all. Anyway, as far as I know, there's no support for such a feature.

3. Once you have code-execution in QSEE, you can dump the whole data segment and, as you said, you'll get the full state of the application (stack, heap, globals).

4. It's initialized by the TZ kernel when setting up the context for the QSEE application (before jumping in to the application's initialization function).

Can you also explain a bit about the loading of the TZ image at the boot stage? To me, it looks like the boot loader parses the ELF header of the tz image, finds the entry point there and jumps to 0xfe810000. But I am confused by the vector table stored at 0xfe810000. The disassembler listing says that the address of the symbol "start" is loaded into the secure vector base (VBAR):

ldr r0, =start
mcr p15, 0, r0, c12, c0, 0

but one of your blog posts (exploring-qualcomms-trustzone) claims that the address of the start symbol is loaded into the monitor vector base (MVBAR). Which statement is correct?

The segments that you pointed out refer to a special memory region which initially contains the TZ code, but only when the device is booting. Afterwards the TZ code isn't actually stored at that address but rather at the location of the special NULL segment you pointed out. I explained some of this in the very first TZ blog post, but in short you need to remove the dummy NULL segment from the ELF and relocate the third and fourth segments to their correct load address.

Hi laginimaineb, so in this scenario what are the correct addresses of the third and fourth segments? How do I find them?

I tried to give 0xfe840000 to the third segment with RWE permissions, 0xfe843c00 to the fourth segment and an invalid address (0xfe890000) to the NULL segment. However, I see a lot of unreferenced code. Could you please help me here?

When you change the load addresses does the binary load correctly in IDA? If so, which segment contains the unreferenced code? I would try and look for pointers to invalid memory locations in other segments and try and correlate those with the addresses in the incorrect segment.

Did you ever try to change the pointer to the SMC handler in the monitor vector table from the secure world user mode context? I mean, is it possible to change the address space of the TZ kernel at address 0xfe80de28 (pointer to the SMC handler) from the trustlet?

From what I recall from looking at the secure world user mode translation table, the only addresses that are mapped in are in the trustlet, so all the high TrustZone kernel addresses are inaccessible. Also, you can map them in using qsee_register_xxx

Thanks for the information! Your posts are all really good!! I have a few doubts though. Can you make a post about how it would be possible to blow fuses and get keys from QSEE in order to unlock the bootloader? Thanks :)

"There are quite a few interesting things we can do solely from the context of the TrustZone kernel... To name a few: We could hijack any QSEE application directly, thus exposing all of it's internal secrets..."

I guess this is not true... since in your next post (extracting qualcomm keys) you mention that one application cannot interact with another due to XPU constraints. Am I correct?

Trustlets can't interact with one another directly, but instead rely on the TrustZone kernel to do so. Therefore, as I wrote, the kernel can be used to hijack any trustlet. Read the next blog post in the series (breaking FDE) for more information.

I own an AT&T Galaxy S5 (MSM8974 SoC), so my bootloader is locked down tight. However, if we were to use this exploit to get full access to the TrustZone, wouldn't we be able to overwrite the public key used to verify firmware packages, and replace it with a key that has a publicly available private key? (Or better yet remove the verification step entirely, but that doesn't sound very possible). Let me know what your thoughts are, I'd be down to test anything you come up with! :)