Saturday, May 2, 2015

Fuzzing prep

For the last 2 weeks I have been doing code coverage work on 366K PDFs that I downloaded. As the base for code coverage I used Foxit PDF reader (it's a single exe file, which makes it much simpler to break apart in IDA and find all the basic code blocks to monitor; simple tracing is too slow), and as the tool I used my own scripts built with Python and WinAppDbg.
Explaining all of the work of finding the PDFs, writing (and optimizing, which is very important when doing this on one home machine!) the code coverage tool, and finding the smallest subset of files would make for too long a post right now, but I thought some of the statistics would be interesting.
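
To give a rough idea of the "smallest subset of files" step, here is a minimal sketch using a greedy set-cover heuristic. This is an assumption for illustration, not the actual tool: the function name `minimize_corpus` and the input format (a dict from file name to the set of basic-block addresses that file hit) are hypothetical, and a real run over hundreds of thousands of files would need something much more optimized.

```python
def minimize_corpus(coverage):
    """Pick a small set of files that together cover every observed block.

    coverage: dict mapping file name -> set of covered basic-block addresses
              (hypothetical input format, assumed for this sketch).
    Returns the chosen file names in the order they were picked.
    """
    remaining = set().union(*coverage.values())  # blocks not yet covered
    candidates = dict(coverage)
    chosen = []
    while remaining:
        # Take the file that adds the most not-yet-covered blocks.
        # Iterating over sorted names makes ties deterministic.
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] & remaining))
        gain = candidates[best] & remaining
        if not gain:
            break  # nothing left that any file can add
        chosen.append(best)
        remaining -= gain
        del candidates[best]
    return chosen


if __name__ == "__main__":
    demo = {
        "a.pdf": {1, 2, 3},
        "b.pdf": {2, 3},
        "c.pdf": {3, 4},
        "d.pdf": {1},
    }
    print(minimize_corpus(demo))
```

Greedy selection does not guarantee the true minimum, but it is simple and usually gets close, which is good enough for picking fuzzing seeds.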

BASE INFO
Base software: Foxit
Executable and DLLs: Single exe file, size ~47MB
Basic code blocks found: 611927 (using IDA and my IDAPython script)
Files covered: 366027
Code blocks covered: 133661 (21.8%)
Final subset of files: 727 (0.2%)
Machines used: 5 VMs, each running a single instance (sadly this was the most stable setup when tested - I have to try some other approaches because it's just a waste of resources)
Time cost: ~2 weeks
My own time spent (not counting earlier tool development time): maybe a couple of hours total. The tools did not crash or stop working even once (damn proud of it)

STATISTICS (taken during the process)

ANALYSIS
It's clear that I should have downloaded more files to get the best code coverage possible with this method. New files were still being added to the resulting set even in the final batch - the rate was still 0.29 new files per 1000 input files covered. That means that for roughly every 3450 PDF files analyzed, I got one additional file in my final set. If the graph can be trusted, this trend should level off somewhere between 400K and 500K files. I will test this once I have downloaded additional files. Until then, I will start fuzzing the 727 files that I ended up with - let's see what happens.
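
The numbers quoted in this post can be double-checked with a few lines of arithmetic. This is only a sanity check on the figures given above; the variable names are made up for the example.

```python
# Figures taken from the BASE INFO and ANALYSIS sections of this post.
blocks_total = 611927      # basic code blocks found in the exe
blocks_covered = 133661    # blocks hit by at least one PDF
files_total = 366027       # PDFs run through the coverage tool
files_kept = 727           # PDFs in the final subset
rate_per_1000 = 0.29       # new subset files per 1000 inputs (final batch)

coverage_pct = 100.0 * blocks_covered / blocks_total
subset_pct = 100.0 * files_kept / files_total
files_per_new = 1000 / rate_per_1000  # inputs needed per extra subset file

print("block coverage: %.1f%%" % coverage_pct)
print("subset size:    %.1f%% of all files" % subset_pct)
print("one new file per ~%d inputs" % round(files_per_new))
```

So 0.29 per 1000 works out to one new file per about 3448 inputs, which is where the "roughly 3450" comes from.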

FIRST FUZZING RESULTS (first 10 hours of fuzzing Foxit)
Altogether 41 crashes, 10 of them unique (based on my tool, which sorts by type and relative EIP):

1 unique writeAV - could be exploitable, but a quick glance did not support that

6 unique readAV - all of them close to 0, so probably not exploitable

1 unique readAV where it tries to read from address 0xBAADF00D, so uninitialized heap content (the fill pattern of the DEBUG version of HeapAlloc) was used as a pointer. Could be interesting

2 unique crashes caused by unknown exceptions that were not caught by the handlers - did not have time to investigate further
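
The "sorts by type and relative EIP" idea can be sketched roughly like this. It is a simplified illustration and not the actual tool: the `Crash` record, the `kind` strings, the 64KB "near zero" window and the hint texts are all assumptions made for the example. "Relative EIP" is taken here to mean the crashing address minus the module load base, so the same bug lands in the same bucket even if the module loads at a different address.

```python
from collections import defaultdict, namedtuple

# Hypothetical crash record, assumed for this sketch.
Crash = namedtuple("Crash", "kind eip module_base fault_address sample")


def bucket_key(crash):
    """Crashes of the same type at the same module offset count as one."""
    return (crash.kind, crash.eip - crash.module_base)


def bucket_crashes(crashes):
    """Group crashes into unique buckets keyed by (type, relative EIP)."""
    buckets = defaultdict(list)
    for crash in crashes:
        buckets[bucket_key(crash)].append(crash)
    return buckets


def triage_hint(crash, null_window=0x10000):
    """Very rough first-glance hint, mirroring the reasoning in the list above."""
    if crash.kind == "writeAV":
        return "write AV: worth a closer look"
    if crash.kind == "readAV":
        if crash.fault_address == 0xBAADF00D:
            return "read of debug-heap fill pattern: uninitialized pointer"
        if crash.fault_address < null_window:
            return "read near 0: probably not exploitable"
        return "read AV: needs investigation"
    return "unknown exception: needs investigation"


if __name__ == "__main__":
    crashes = [
        Crash("readAV", 0x00401234, 0x00400000, 0x00000008, "s1.pdf"),
        Crash("readAV", 0x01401234, 0x01400000, 0x0000000C, "s2.pdf"),
        Crash("readAV", 0x00405678, 0x00400000, 0xBAADF00D, "s3.pdf"),
        Crash("writeAV", 0x00409ABC, 0x00400000, 0x41414141, "s4.pdf"),
    ]
    for key, group in sorted(bucket_crashes(crashes).items()):
        print(key[0], hex(key[1]), len(group), triage_hint(group[0]))
```

In the demo the first two crashes end up in one bucket because their offset from the module base is identical, so four crashes give three unique ones.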