Our recent USENIX Sec’16 paper on x86/x64 disassembly has been getting a fair amount of attention on Twitter and Reddit, which is great to see! I’ve also talked to a few people who shared interesting additional insights that are not included in the paper, and I thought it would be worth sharing them here.

Inline jump tables on ARM

One of our main findings is that on x86/x64, both gcc v5.1 and clang v3.6 are extremely well-behaved when it comes to jump tables. Rather than placing these inline in the .text section, both compilers place jump tables in .rodata. They emit no inline data at all, which means that linear disassembly produces 100% correct results.

It seems things are not quite so convenient on ARM. Apparently, arm-linux-gcc does produce inline jump tables (just like Visual Studio does for x86/x64). A quick check confirms this: the listing below shows a snippet of objdump output for lighttpd cross-compiled with arm-linux-gcc.

/home/dennis/servers/lighttpd-1.4.39/src/server.c:95
   14f1c:  e51b3008  ldr    r3, [fp, #-8]
   14f20:  e2433001  sub    r3, r3, #1
   14f24:  e3530010  cmp    r3, #16
   14f28:  979ff103  ldrls  pc, [pc, r3, lsl #2]
   14f2c:  ea000046  b      1504c <sigaction_handler+0x15c>
   14f30:  00014ffc  .word  0x00014ffc
   14f34:  00014fa0  .word  0x00014fa0
   14f38:  0001504c  .word  0x0001504c
   14f3c:  0001504c  .word  0x0001504c
   14f40:  0001504c  .word  0x0001504c
   14f44:  0001504c  .word  0x0001504c
   14f48:  0001504c  .word  0x0001504c
   14f4c:  0001504c  .word  0x0001504c
   14f50:  0001504c  .word  0x0001504c
   14f54:  0001504c  .word  0x0001504c
   14f58:  0001504c  .word  0x0001504c
   14f5c:  0001504c  .word  0x0001504c
   14f60:  0001504c  .word  0x0001504c
   14f64:  00014fec  .word  0x00014fec
   14f68:  00014f74  .word  0x00014f74
   14f6c:  0001504c  .word  0x0001504c
   14f70:  00015048  .word  0x00015048
/home/dennis/servers/lighttpd-1.4.39/src/server.c:97
   14f74:  e59f30e0  ldr    r3, [pc, #224]  ; 1505c <sigaction_handler+0x16c>
   14f78:  e3a02001  mov    r2, #1
   14f7c:  e5832000  str    r2, [r3]
/home/dennis/servers/lighttpd-1.4.39/src/server.c:98
   14f80:  e59f20d8  ldr    r2, [pc, #216]  ; 15060 <sigaction_handler+0x170>
   14f84:  e51b300c  ldr    r3, [fp, #-12]
   14f88:  e1a00002  mov    r0, r2
   14f8c:  e1a01003  mov    r1, r3
   14f90:  e3a03080  mov    r3, #128  ; 0x80
   14f94:  e1a02003  mov    r2, r3
   14f98:  ebfffe0a  bl     147c8 <memcpy@plt>
/home/dennis/servers/lighttpd-1.4.39/src/server.c:99
   14f9c:  ea00002a  b      1504c <sigaction_handler+0x15c>
/home/dennis/servers/lighttpd-1.4.39/src/server.c:101
   14fa0:  e59f30bc  ldr    r3, [pc, #188]  ; 15064 <sigaction_handler+0x174>
   14fa4:  e5933000  ldr    r3, [r3]
   14fa8:  e3530000  cmp    r3, #0
   14fac:  0a000003  beq    14fc0 <sigaction_handler+0xd0>

Indeed, we do see some inline data (the .word lines), and it looks like a jump table. You can see that it contains an array of valid addresses, presumably pointing to the case blocks of a switch. The DWARF information tells us the inline data is produced from somewhere near line 95 in server.c, shown in the following listing.

(Listing of server.c starting at line 89; the switch statement is on line 95.)

static void sigaction_handler(int sig, siginfo_t *si, void *context) {
    static siginfo_t empty_siginfo;
    UNUSED(context);

    if (!si) si = &empty_siginfo;

    switch (sig) {
    case SIGTERM:
        srv_shutdown = 1;
        last_sigterm_info = *si;
        break;
    case SIGINT:
        if (graceful_shutdown) {
            srv_shutdown = 1;
        } else {
            graceful_shutdown = 1;
        }
        last_sigterm_info = *si;
        break;
    case SIGALRM:
        handle_sig_alarm = 1;
        break;
    case SIGHUP:
        /**
         * we send the SIGHUP to all procs in the process-group
         * this includes ourself
         *
         * make sure we only send it once and don't create a
         * infinite loop
         */
        if (!forwarded_sig_hup) {
            handle_sig_hup = 1;
            last_sighup_info = *si;
        } else {
            forwarded_sig_hup = 0;
        }
        break;
    case SIGCHLD:
        break;
    }
}

Clearly, this is indeed a switch statement, and the inline data is a jump table containing the addresses of the case blocks. Fortunately, if the binary is not stripped, objdump can use symbols to differentiate the data from the code (on ARM ELF binaries, mapping symbols such as $a and $d mark code and data regions). This is why objdump, in our test, is able to accurately mark the data as .word lines. If the binary is stripped, however, this is no longer possible, and the inline data will cause disassembly errors.
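To make the dispatch concrete, here is a small Python sketch that mimics the sub/cmp/ldrls sequence from the objdump listing and looks up the branch target for each signal handled in the switch. The addresses and table entries are copied from the listing; the function and variable names are made up for illustration, the signal numbers assume the standard Linux numbering, and the value loaded from [fp, #-8] is assumed to be sig.

```python
# Values copied from the objdump listing; names are illustrative only.
LDRLS_ADDR = 0x14F28      # address of `ldrls pc, [pc, r3, lsl #2]`
DEFAULT_TARGET = 0x1504C  # target of the `b` at 0x14f2c, taken when the index is out of range

# The 17 `.word` entries at 0x14f30..0x14f70, in order.
JUMP_TABLE = [0x14FFC, 0x14FA0] + [0x1504C] * 11 + [0x14FEC, 0x14F74, 0x1504C, 0x15048]

# ASSUMPTION: standard Linux signal numbers.
SIGNALS = {"SIGHUP": 1, "SIGINT": 2, "SIGALRM": 14, "SIGTERM": 15, "SIGCHLD": 17}


def switch_target(sig):
    """Return the address the dispatch sequence would branch to for `sig`."""
    index = (sig - 1) & 0xFFFFFFFF   # sub r3, r3, #1 (32-bit wraparound)
    if index > 16:                   # cmp r3, #16; `ldrls` only executes if index <= 16 (unsigned)
        return DEFAULT_TARGET
    table_base = LDRLS_ADDR + 8      # in ARM state, reading pc yields the instruction address + 8
    entry_addr = table_base + (index << 2)
    return JUMP_TABLE[(entry_addr - table_base) // 4]


for name, number in SIGNALS.items():
    print(f"{name:8s} -> {switch_target(number):#x}")
```

With these values, SIGTERM resolves to 0x14f74, which objdump annotates as server.c:97 (the srv_shutdown assignment), and SIGINT resolves to 0x14fa0, annotated as server.c:101. That is consistent with the case blocks in the C source.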

(Thanks to Ammar Ben Khadra from Uni Kaiserslautern for bringing this to my attention.)

Detecting functions using the .eh_frame section

We also show in our paper that function detection (i.e., accurately identifying the start address and size of each function in the binary) is currently the most problematic primitive for disassemblers. False positive and false negative rates in excess of 20% are not uncommon, even though function detection is one of the most widely used and important primitives in virtually all areas of binary analysis.

It seems there is an interesting way to get around this problem, based on the .eh_frame section. This section contains information needed for DWARF-based stack unwinding. It’s primarily used for C++ exception handling, but also for various other applications such as backtrace(), and gcc extensions such as __attribute__((__cleanup__(f))) and __builtin_return_address(n) (more information in this StackOverflow post). Due to its many uses, .eh_frame is present by default not only in C++ binaries that use exception handling, but in all binaries produced by gcc, including plain C binaries.

The point of all this is that .eh_frame contains function boundary information that identifies all functions, and can thus be used to circumvent the function detection problem entirely. Here’s what a dump of the section looks like for 470.lbm (one of the SPEC CPU2006 benchmarks) compiled with gcc v5.1 at optimization level O0 for x64.

Note that the strip command will not strip the .eh_frame section. If you want to get rid of it (for anti-reversing or binary size reasons), you need to prevent it from being generated in the first place by passing -fno-asynchronous-unwind-tables to gcc.
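As a rough sketch of how function boundaries could be pulled out of .eh_frame in practice, the Python snippet below parses the FDE lines printed by readelf --debug-dump=frames and turns each pc=start..end range into a (start, size) pair. The helper names are hypothetical, and the sample text is hand-written to resemble readelf output; it is not taken from 470.lbm.

```python
import re
import subprocess

# Matches FDE header lines of the form
#   <offset> <length> <cie_ptr> FDE cie=<id> pc=<start>..<end>
FDE_RE = re.compile(r"\bFDE\b.*\bpc=([0-9a-fA-F]+)\.\.([0-9a-fA-F]+)")


def parse_fde_ranges(dump_text):
    """Return sorted (start, size) pairs for every FDE found in the dump."""
    ranges = set()
    for line in dump_text.splitlines():
        match = FDE_RE.search(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            ranges.add((start, end - start))
    return sorted(ranges)


def eh_frame_functions(binary_path):
    """Run readelf on a binary and parse its FDE ranges (requires binutils)."""
    result = subprocess.run(
        ["readelf", "--debug-dump=frames", binary_path],
        check=True, capture_output=True, text=True,
    )
    return parse_fde_ranges(result.stdout)


# Hand-written sample in the shape of readelf output (illustrative values).
SAMPLE = """\
00000000 0000000000000014 00000000 CIE
  Version:               1
00000018 0000000000000014 0000001c FDE cie=00000000 pc=0000000000400430..000000000040045a
00000030 000000000000001c 00000034 FDE cie=00000000 pc=0000000000400526..0000000000400551
"""

for start, size in parse_fde_ranges(SAMPLE):
    print(f"function at {start:#x}, {size} bytes")
```

Note that ranges for non-function code such as PLT stubs may also show up in the output, so some filtering may be needed before comparing against a function-level ground truth.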

We recently published a paper which is devoted entirely to exploring several aspects of x86/x64 disassembly. Among other things, we measured the prevalence of complex corner cases generated by modern compilers, and the precision with which disassemblers handle these cases. We released our complete data set, in part because there are too many results to fit in the paper, and also to allow others to compare their own results to ours.

Since we’ve received several questions asking for details on how to implement such a comparison, this post walks through an example. Assuming that you’ve already downloaded our data set and generated the ground truth (as detailed in ~/disasm/README in the provided VM), getting results for a new disassembler requires two steps.

1. Write a script that parses the output of the disassembler you want to evaluate, and puts it into a format useful for further processing.

2. Compare the disassembler output to the ground truth, using another script for the specific primitive you want to evaluate.

We give examples of both steps. Though at first it may look like lots of work to fit these scripts to your own evaluation requirements, this should actually be quite straightforward, since you can reuse much of the code verbatim regardless of the specific test setup.

Parsing disassembler output

Since every disassembler is different, we need to make a specifically tailored script that parses the output of the disassembler we want to test, and puts it into a normalized format that we can process further. To keep things simple, the example presented here is based on objdump, but to create a script for another disassembler you can use the exact same basic idea. Without further ado, here is the bash script we used in our paper to parse the instructions output by objdump for our SPEC CPU2006 test suite (the scripts for our other tests are nearly identical).

Lines 3-4 are simply lists of all the SPEC CPU2006 C and C++ test cases, which we later iterate over to disassemble each test. On lines 40-66, we call the main disassembly function (described next) with various parameters, for each of the compiler configurations we test.

The important bit is the disasm function declared on line 8. It starts by reading its parameters into named variables and making the directories where we will output our results. Then, on line 19, we begin a loop over all test cases for the given configuration.

For each test case, we loop over all the optimization levels (line 23), and determine the name of the binary for the current test case/optimization level, skipping an iteration and yielding a warning if the file does not exist (lines 25-33). Note that we assume a particular format for the directory and binary names. For instance, we assume that all the stripped C++ test binaries as compiled with gcc 5.1/64-bit are located in a directory called truth/gcc510-64/bin/stripped/C++, and that binaries generated with Visual Studio have the .exe extension. If you are using the ground truth provided by us, these requirements are all met.

So far, the entire script has been disassembler-agnostic; you can reuse those parts for any disassembler you want to test. Lines 34-35 are the only lines that need to be tailored to the specific disassembler that is being tested. These are the lines where the actual disassembler is run, and its output parsed and dumped to file. Moreover, both these lines are identical except that line 34 disassembles a binary with symbols, while line 35 disassembles a stripped binary. For our example, in both cases we simply run objdump, grep for all the disassembled addresses, give each address a 0x prefix, and write the results to an output file for the specific test case/configuration. We store instruction addresses instead of mnemonics because the addresses are much easier to compare to our ground truth (as discussed below).

As you can see, the script generalizes to other disassemblers in a very straightforward way. Some disassemblers, such as IDA Pro, have a more complicated user interface that we cannot just parse with grep. In such cases, we require that the disassembler is scriptable, and can be run in an automated way. For instance, for IDA Pro we created a simple IDAPython script that dumps all the primitives we are interested in to file, and then ran the script in the above loop using IDA Pro’s “autonomous mode” (requiring no user interaction). In our objdump example, we save only instruction output, but for disassemblers that support other primitives, these can be parsed and written to file in an analogous way.
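For readers who prefer Python over grep, the disassembler-specific step (extract every disassembled address and give it a 0x prefix) could be sketched roughly as follows. This is not the script from the paper: the function name and the skip_data option are assumptions made for illustration, and it relies on objdump separating the address, raw bytes and instruction text with tab characters.

```python
import re

# The first tab-separated field of an objdump instruction line, e.g. "   14f24:".
ADDR_FIELD = re.compile(r"^\s*([0-9a-fA-F]+):$")


def instruction_addresses(objdump_text, skip_data=True):
    """Return the 0x-prefixed addresses of all disassembled lines.

    ASSUMPTION: with skip_data=True, lines whose text starts with "."
    (data directives such as .word) are not counted as instructions.
    """
    addresses = []
    for line in objdump_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:          # symbol headers, source lines, byte continuations
            continue
        match = ADDR_FIELD.match(fields[0])
        text = fields[2].strip()
        if not match or not text:
            continue
        if skip_data and text.startswith("."):
            continue
        addresses.append("0x" + match.group(1).lower())
    return addresses


# A few lines in objdump's format, based on the ARM listing from lighttpd.
SAMPLE = (
    "00014ef0 <sigaction_handler>:\n"
    "   14f24:\te3530010 \tcmp\tr3, #16\n"
    "   14f28:\t979ff103 \tldrls\tpc, [pc, r3, lsl #2]\n"
    "   14f2c:\tea000046 \tb\t1504c <sigaction_handler+0x15c>\n"
    "   14f30:\t00014ffc \t.word\t0x00014ffc\n"
)

print("\n".join(instruction_addresses(SAMPLE)))
```

In a real run, the text would come from the output of objdump -d on each test binary, and the resulting addresses would be written one per line to the output file for that test case.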

Comparing to the ground truth

So far, we have created a bash script which uses our chosen disassembler (objdump) to disassemble all our test cases and save the instruction addresses to file. Now, we want to compare these addresses to the ground truth provided in our data set. For this, we use a Python script (called ins-cmp.py) that takes as input the ground truth file for a single test case (one of the *.truth.map files provided in our data set), and a disassembler output file as generated by the disassembler-specific bash script described above.

The script compares instruction addresses (as found by the disassembler) to the ground truth. To create scripts for other primitives, please refer to the README file provided in our data set. It completely describes our ground truth format, which is designed to be easily parseable by both humans and machines. The README file also describes the output format of our comparison scripts.

Let’s take a look at the main function, at line 93. It consists of three phases.

1. Read the instruction-level ground truth into the bounds dictionary (lines 98-113), using instruction addresses as keys, and mapping them to a descriptor of the instruction type (as described in the ground truth format section in the README file).

2. Load all the instruction addresses found by the disassembler into the ins dictionary (lines 118-127).

3. Compare the ground truth (bounds) to the disassembled instructions (ins), counting true positives, false positives and false negatives and then printing out the statistics (lines 129-160).

The certain_code and certain_data functions are used to parse a ground truth instruction descriptor, and find out if a particular address is code or data. To this end, both of these functions rely on insmap_byte, which is just a utility function that returns the type of a particular byte in the descriptor. (Each descriptor describes a single instruction, which may consist of multiple bytes.)
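The counting in the third phase boils down to set arithmetic. The sketch below is a deliberately simplified stand-in for ins-cmp.py: it treats the ground truth as a plain set of instruction start addresses and ignores the per-byte descriptors (and therefore the certain_code/certain_data distinction) entirely. All names and sample addresses are illustrative.

```python
def load_addresses(lines):
    """Parse one hex address per line (with or without a 0x prefix) into a set."""
    return {int(line.strip(), 16) for line in lines if line.strip()}


def compare_instructions(truth, found):
    """Count true positives, false positives and false negatives.

    truth: set of addresses that really are instruction starts
    found: set of addresses the disassembler reported as instructions
    """
    return {
        "tp": len(truth & found),   # real instructions the disassembler found
        "fp": len(found - truth),   # reported instructions that are not real
        "fn": len(truth - found),   # real instructions the disassembler missed
    }


# Illustrative data: a disassembler that decoded two jump table entries as
# code (0x14f30, 0x14f34) and missed the instruction at 0x14f74.
truth = load_addresses(["0x14f24", "0x14f28", "0x14f2c", "0x14f74"])
found = load_addresses(["0x14f24", "0x14f28", "0x14f2c", "0x14f30", "0x14f34"])

print(compare_instructions(truth, found))
```

For the real evaluation, the ground truth side should be built from the descriptors in the *.truth.map files as described in the README, since not every byte can be classified as code or data with certainty.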

As an example of how to evaluate a primitive other than instructions, suppose that we instead want to measure the correctness of function information. In that case, you would fill the bounds dictionary in a similar way, but this time loading the function-level ground truth instead of the instruction-level ground truth. This simply means that instead of loading the lines that start with an '@' symbol (instruction descriptors), you would load the lines that start with 'F ' (an F followed by a space), and then compare the ground truth addresses to those found by the disassembler (in this case you won’t even need the certain_code/certain_data functions, but can just compare addresses directly). To get an intuitive feeling of how to parse for each kind of primitive, it is a good idea to open up one of the *.truth.map files and skim/grep through it.
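A function-level variant could look roughly like the following Python sketch. The only detail taken from the ground truth format is that function lines start with 'F ' and instruction descriptors start with '@'. The rest of the line layout used here (a hex start address as the second field) is purely an assumption for illustration, so check the README for the actual fields before adapting it.

```python
def load_function_starts(truth_lines):
    """Collect function start addresses from lines beginning with 'F '.

    ASSUMPTION: the second whitespace-separated field is the function's
    start address in hex. The real layout is documented in the README.
    """
    starts = set()
    for line in truth_lines:
        if not line.startswith("F "):
            continue                      # skips '@' instruction descriptors etc.
        fields = line.split()
        starts.add(int(fields[1], 16))
    return starts


def compare_functions(truth_starts, found_starts):
    """Compare addresses directly; no certain_code/certain_data needed."""
    return {
        "tp": len(truth_starts & found_starts),
        "fp": len(found_starts - truth_starts),
        "fn": len(truth_starts - found_starts),
    }


# Hypothetical ground truth lines (the layout is made up, see above).
SAMPLE_LINES = [
    "@ instruction descriptor line (ignored here)",
    "F 0x400430 42",
    "F 0x400526 43",
]

truth_starts = load_function_starts(SAMPLE_LINES)
found_starts = {0x400430, 0x400600}   # hypothetical disassembler output
print(compare_functions(truth_starts, found_starts))
```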

Now that we can compare ground truth and disassembler output for one test case at a time, it would be convenient to automate the process of doing this for all test cases. For this, we use one last bash script, which is similar in structure to the script used for disassembly.

In essence, the output files created by this script combine the outputs of ins-cmp.py for all test cases given a particular compiler/architecture configuration, one test case per line. As before, we have a loop over all test cases and optimization levels. This time, we have an additional loop at line 38, which goes over an array containing all disassemblers we want to evaluate. This way, we don’t have to manually run the comparison script for each disassembler. Note that the disassembler names, as specified in the array, need to match those used in the output file names generated by our disassembly script.

The script first resets all output files (lines 24-29), and then begins its main loop. The main loop simply calls ins-cmp.py for each possible configuration, and saves the statistics to file, printing warnings for any test cases or ground truth files which cannot be found. After the script completes, you will find a collection of combined statistics files in the ins directory, with one file per combination of compiler/architecture/language/disassembler. The file contents should look something like this (truncated for brevity).