CVE-2015-2783: Exploiting Buffer Over-read in PHP's Phar

The Phar extension is built into PHP >= 5.3.0. It allows developers to manipulate the following archive formats: tar, zip, and phar.

I found this vulnerability while I was assessing the security of the phar extension. When parsing a phar file, it is possible to trigger a buffer over-read condition and leak memory contents. This vulnerability is interesting because it is not the typical over-read that is exploited just by controlling the read size.

Affected versions are PHP < 5.6.8RC1.

Technical Details

Phar file metadata is stored in PHP's serialized format. When processing a phar file, PHP attempts to unserialize the metadata in phar.c at line 623:

if (!php_var_unserialize(metadata, &p, p + buf_len, &var_hash TSRMLS_CC)) {

p points to the start of the serialized metadata in the file buffer.

p + buf_len points to the end of the serialized metadata.

buf_len is a field specified in the phar file and is user controllable.

php_var_unserialize() is the same function that is called when unserialize() is invoked in PHP "user-land".

Within php_var_unserialize() there is a sanity check to ensure that p does not go beyond p + buf_len.

Line 894 looks like another sanity check, but it is just template code left over by re2c when it generated var_unserializer.c. A generated scanner typically uses it to refill the data buffer when it is running low. In PHP's case, however, php_var_unserialize() always has the full serialized data in the buffer from the very beginning, so YYFILL is just a while(0) loop that does nothing. Interestingly, the compiler optimizes this line out entirely, so it never gets executed.

There's another sanity check at line 908 to ensure that the length of the data doesn't go beyond p + buf_len.

When we start unserializing a string in the format s:<len>:"<Data>", lines 890-900 are essentially a loop that extracts <len>. As long as the next character is a digit, the loop keeps going, even if YYCURSOR goes beyond p + buf_len.
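The digit loop can be reproduced in isolation. The following is a minimal sketch, not the re2c output itself: scan_len_digits and TOY_YYFILL are hypothetical names, and the extra hard_end parameter exists only so the demo stays memory-safe. The backing buffer is deliberately larger than the declared limit, which is the situation in a phar file.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the no-op refill hook (hypothetical name). */
#define TOY_YYFILL(n) do { } while (0)

/*
 * Hypothetical helper: consume ASCII digits starting at `cursor`.
 * `limit` is the declared end of the serialized data (p + buf_len),
 * `hard_end` is the end of the real backing buffer and is used only
 * to keep this demo from reading out of bounds.
 * Returns a pointer to the first non-digit byte.
 */
static const unsigned char *scan_len_digits(const unsigned char *cursor,
                                            const unsigned char *limit,
                                            const unsigned char *hard_end)
{
    assert(limit <= hard_end);
    while (cursor < hard_end && *cursor >= '0' && *cursor <= '9') {
        /* The scanner asks for a refill near the limit, but the hook
         * does nothing, so scanning simply continues past `limit`. */
        if (limit - cursor < 1) TOY_YYFILL(1);
        cursor++;
    }
    return cursor;
}
```

Calling it on the bytes s:0012345:" with the limit placed right after s:00 returns a pointer to the ':' that sits well past the declared end.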

When YYCURSOR goes beyond max (that is, p + buf_len, or YYLIMIT), it results in an integer underflow on line 907, so the sanity check on line 908 always passes.
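To see why the check stops working, here is a small model of the arithmetic. It is a sketch under the assumption that the remaining length is held in an unsigned type such as size_t; toy_length_check and its offset parameters are hypothetical and stand in for the pointer values in the real code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the bounds check. Offsets are relative to the
 * start of the buffer. Returns 1 if a string of `len` bytes is accepted,
 * 0 if it is rejected.
 */
static int toy_length_check(size_t cursor_off, size_t limit_off, size_t len)
{
    /* Unsigned subtraction: wraps to a huge value when
     * cursor_off > limit_off instead of going negative. */
    size_t maxlen = limit_off - cursor_off;
    return !(maxlen < len);
}

/* Illustrative checks; call from main() to run them. */
static void toy_length_check_demo(void)
{
    assert(toy_length_check(10, 20, 5) == 1);       /* fits: accepted */
    assert(toy_length_check(10, 20, 11) == 0);      /* too long: rejected */
    assert(toy_length_check(21, 20, 1000000) == 1); /* cursor past limit: accepted */
}
```

The third case is the interesting one: the cursor is only one byte past the limit, yet a length of a million is accepted.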

There are also some format checks.

Line 900:

if (yych != '"') goto yy18;

....

Lines 915 to 920:

YYCURSOR += len;

if (*(YYCURSOR) != '"') {
    *p = YYCURSOR;
    return 0;
}

Lines 900 and 915 to 920 check the quote tokens in the format s:<len>:"<Data>".

In essence, this means that once YYCURSOR goes beyond max while extracting <len>, <len> can be as large as you want, and the data will still unserialize to a string successfully as long as it matches the format s:<len>:"<Data>".
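Putting the pieces together, the whole string path can be modelled in a few lines. This is a toy sketch and not PHP's code: toy_unserialize_string is a hypothetical function, everything lives in a single buffer, and a demo-only guard keeps the toy itself from reading out of bounds. It assumes the remaining length is computed with unsigned arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the s:<len>:"<Data>" path (hypothetical, not PHP code).
 * buf/buf_size: the whole file buffer; limit_off: declared end of metadata.
 * On success returns 1 and sets *out / *out_len to the extracted string.
 */
static int toy_unserialize_string(const unsigned char *buf, size_t buf_size,
                                  size_t limit_off,
                                  const unsigned char **out, size_t *out_len)
{
    size_t cur = 0, len = 0, maxlen;

    assert(limit_off <= buf_size);
    /* Entry check: the cursor must start inside the declared range. */
    if (cur >= limit_off) return 0;
    if (buf_size < 2 || buf[0] != 's' || buf[1] != ':') return 0;
    cur = 2;

    /* Digit loop: never compares the cursor against limit_off. */
    while (cur < buf_size && buf[cur] >= '0' && buf[cur] <= '9') {
        len = len * 10 + (size_t)(buf[cur] - '0');
        cur++;
    }
    /* Format checks for ':' and the opening quote. */
    if (cur + 1 >= buf_size || buf[cur] != ':' || buf[cur + 1] != '"') return 0;
    cur += 2;

    /* Length check: wraps when cur > limit_off, so it cannot fail then. */
    maxlen = limit_off - cur;
    if (maxlen < len) return 0;

    /* Demo-only guard so the toy never reads outside its own buffer. */
    if (len >= buf_size - cur) return 0;

    /* Closing-quote check at the attacker-chosen distance. */
    if (buf[cur + len] != '"') return 0;

    *out = buf + cur;
    *out_len = len;
    return 1;
}
```

With the buffer s:0008:"SECRETS!" and limit_off set to 4 (right after s:00), the call succeeds and returns the eight bytes SECRETS!, all of which lie beyond the declared end. With an honest limit, an oversized <len> is rejected as expected.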

This is what you want the state to look like during exploitation:

At this point one might ask: how do we ensure that the ending byte is " since it's in a memory region beyond our control? Well, on any single attempt you have roughly a 1 in 256 chance of striking gold, assuming the byte is uniformly random. But one possible way to exploit it is to keep "hammering" with various values of xxxxxxxxxxxx. Sooner or later you will encounter a " byte.
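The hammering idea can be illustrated offline. The sketch below is hypothetical (find_leak_len and NO_LEN are made-up names) and cheats by looking at the memory directly; a real attacker cannot see it and would instead submit one crafted file per candidate length and observe which one unserializes.

```c
#include <assert.h>
#include <stddef.h>

#define NO_LEN ((size_t)-1)

/*
 * Hypothetical helper: `mem` models the bytes that follow the opening
 * quote. Returns the smallest length >= min_len for which the byte at
 * mem[len] is a closing quote, or NO_LEN if no candidate works.
 */
static size_t find_leak_len(const unsigned char *mem, size_t mem_size,
                            size_t min_len)
{
    size_t len;

    assert(mem != NULL);
    for (len = min_len; len < mem_size; len++) {
        /* Each iteration stands for one attempt with a different <len>. */
        if (mem[len] == '"') return len;
    }
    return NO_LEN;
}
```

Every length that lands on a quote byte is a working <len>, and larger working values leak more memory.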

One might also ask: why can't I trigger this buffer over-read through a typical unserialize() call? For example, why can't we trigger it via unserialize("s:0010") plus some heap massaging? Why is it only exploitable when we unserialize through phar?

The reason is that a typical unserialize() call takes a string, and in PHP a string is always null-terminated. So essentially you are passing in s:0010\0. Even if we massage the heap so that :" appears immediately after the string, it still wouldn't unserialize properly, because the null byte breaks the unserializing process.
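The effect of the terminator is easy to check with a toy matcher. This is a sketch with a hypothetical function name (toy_header_matches); it only tests whether a buffer starts with s:<digits>:" and is not PHP's scanner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: does buf start with s:<digits>:" ? */
static int toy_header_matches(const unsigned char *buf, size_t buf_size)
{
    size_t cur = 2;

    assert(buf != NULL);
    if (buf_size < 2 || buf[0] != 's' || buf[1] != ':') return 0;
    while (cur < buf_size && buf[cur] >= '0' && buf[cur] <= '9') cur++;
    if (cur == 2) return 0; /* no digits at all */
    /* A NUL byte here is neither a digit nor ':', so matching fails. */
    return cur + 1 < buf_size && buf[cur] == ':' && buf[cur + 1] == '"';
}
```

The bytes s:0010 followed by \0 and then :" do not match, while the same bytes without the terminator do. That difference is why the groomed-heap approach fails and the phar approach works.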

In Phar, the attacker controls the data immediately past p + buf_len, as it is still part of the phar file.

Here's a screenshot of a successful exploitation leaking memory, Heartbleed style:

Afterthoughts

I exploited this by using a serialized string type. It would be interesting to see if we can get code execution using other types.