This blog post presents my solution to exercises 7 to 9 on page 78ff from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.

For the code in each exercise, do the following in order (whenever possible):

Determine whether it is in Thumb or ARM state.

Explain each instruction’s semantic. If the instruction is LDR/STR, explain the addressing mode as well.

Identify the types (width and signedness) for every possible object. For structures, recover field size, type, and friendly name whenever possible. Not all structure fields will be recoverable because the function may only access a few fields. For each type recovered, explain to yourself (or someone else) how you inferred it.

Recover the function prototype.

Identify the function prologue and epilogue.

Explain what the function does and then write pseudo-code for it.

Decompile the function back to C and give it a meaningful name.

Mystery 7

Figure 2-13 illustrates a common routine, but you may not have seen it implemented this way.

This is the code from Figure 2-13 (note that the listing in the book has a typo in line 13, 00 2B CMB R3, #0 should of course read 00 2B CMP R3, #0):

ARM or Thumb

The code is in Thumb state: the snippet mixes 16-bit and 32-bit instructions, and some instructions carry the .W suffix.

Instruction Semantic

The only uncommon instruction is BFC.W R0, #0x1E, #2, which performs a bit field clear. It clears two bits starting at bit 30 (0x1E), that is, the two most significant bits of R0, so 0xFFFFFFFF would become 0x3FFFFFFF.
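
To make the effect of the bit field clear concrete, here is a minimal C sketch of what BFC computes. The helper name bit_field_clear and its parameters are my own invention for illustration; they do not appear in the book.

```c
#include <stdint.h>

/* Hypothetical helper mirroring BFC Rd, #lsb, #width:
 * clears `width` bits of `value`, starting at bit `lsb`.
 * Assumes 1 <= width and lsb + width <= 32. */
static uint32_t bit_field_clear(uint32_t value, unsigned lsb, unsigned width)
{
    /* Build a mask of `width` one-bits; width == 32 is handled separately
     * because shifting a 32-bit value by 32 is undefined in C. */
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : (((uint32_t)1 << width) - 1u);
    return value & ~(mask << lsb);
}
```

With lsb = 0x1E and width = 2 the mask covers bits 30 and 31, so bit_field_clear(0xFFFFFFFF, 0x1E, 2) yields 0x3FFFFFFF, matching the example above.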

Types

The function takes only one argument, arg1 = R0. The loop in lines 9 to 14 iterates over the bytes of arg1 (LDRSB.W in line 11 loads a single, sign-extended byte, and ADDS R2, #1 in line 10 advances the array index by one byte). Furthermore, the loop ends when the comparison in line 13 finds an array element that is 0. This indicates that arg1 is a pointer to a null-terminated string. The function returns an unsigned int.
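
The choice of LDRSB over LDRB is what hints at the signedness of the elements. As a rough sketch of the difference, the two hypothetical helpers below (names are mine, not from the book) model a sign-extending and a zero-extending byte load in C.

```c
#include <stdint.h>

/* Models LDRSB: read one byte and sign-extend it to 32 bits. */
static int32_t load_signed_byte(const void *address)
{
    return (int32_t)*(const int8_t *)address;
}

/* Models LDRB for comparison: read one byte and zero-extend it to 32 bits. */
static uint32_t load_unsigned_byte(const void *address)
{
    return (uint32_t)*(const uint8_t *)address;
}
```

For the byte 0x80 the first helper gives -128 and the second gives 128. For the null check in this function both variants behave the same, since 0 extends to 0 either way.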

Function Prototype

The function prototype is:

unsigned int mystery7(char*);

Prologue and Epilogue

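There is no prologue or epilogue: the function writes only R0, R2 and R3, so there is nothing to save or restore. It returns with BX LR, which branches to the return address in LR and selects ARM or Thumb state according to the least significant bit of that address, so execution resumes in whatever state the caller was in.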

Purpose and Pseudo-code

The function searches for the null byte in string arg1. Line 15 SUBS R0, R2, R0 computes the difference between the address of the null byte and the address of the start of the string. This corresponds to the length of the string. The function implements strlen. I don’t understand the purpose of setting the two most significant bits of the difference to zero. Those bits shouldn’t be set in the first place for any reasonable strings.
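
The behaviour described above can be sketched in C as follows. This is a reconstruction from the description in this section, not a line-by-line decompilation of the listing; the name mystery7_sketch and the local variable names are my own.

```c
#include <stdint.h>

/* Sketch of the routine as described in the text: walk to the terminating
 * null byte, take the pointer difference, then clear the two top bits. */
static unsigned int mystery7_sketch(const char *str)
{
    const char *cursor = str;

    /* Advance until the null terminator is found. */
    while (*cursor != '\0') {
        cursor++;
    }

    /* Address of the null byte minus address of the string start. */
    uint32_t length = (uint32_t)(cursor - str);

    /* Equivalent of BFC.W R0, #0x1E, #2: clear bits 30 and 31. */
    return length & 0x3FFFFFFFu;
}
```

For any string shorter than 2^30 bytes the mask has no effect and the result equals what strlen would return.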

Mystery 8

Instruction Semantic

Nothing special, except perhaps LDR R6, =byteArray, a pseudo-instruction that sets R6 to point to the array {0,1,2,...,255}.

Types

The function takes three arguments. arg1 = R0 and arg2 = R1 are both used as arrays: lines 6 to 19 iterate over the bytes of those two arrays. The code also compares elements of arg1 to elements of arg2, so the two parameters are probably of the same type. Line 9 checks whether the current element of arg1 is 0, so arg1 and arg2 are null-terminated strings.

The third parameter arg3 acts as a limit counter (see lines 8 and 18/19). So the type is probably an unsigned int (or any other unsigned integer type).

Function Prototype

The function prototype is:

int mystery8(char*, char*, unsigned int);

Prologue and Epilogue

Lines 2 and 33 preserve registers R3-R6 and R11. Apart from these and the three function arguments R0 to R2, the function doesn’t write any other registers. The PUSH also saves the return address, and the matching POP loads it into PC, which returns to the caller.

Purpose and Pseudo-code

I found it easiest to start with the loop in lines 6 to 19. Here’s an almost one-to-one translation to pseudo-code:

Line 20 just decrements limit. We can eliminate the instruction by moving limit = limit-1 up and changing the comparison to a strict check, limit < 0.

Starting with line 21, the snippet checks whether the limit is not yet zero (meaning the code took the second BREAK). If so, the code returns a difference computed from the first pair of array elements that differ.

If the limit is zero, then the first BREAK must have been taken and the code returns 0.

From this it should be clear that the snippet implements strncmp, which compares two strings up to limit characters. If the strings are equal up to limit, the snippet returns 0. Otherwise it returns a negative number if the second string comes lexicographically after the first, and a positive number vice versa.
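
As a rough C sketch of the behaviour described in this section (not a faithful decompilation of the listing): the name mystery8_sketch is mine, the lookup through byteArray is omitted on the assumption that the table is the identity mapping {0,1,...,255}, and I assume bytes are compared as unsigned values, as the standard strncmp does. The return type is a signed int so that the negative results mentioned above can be represented.

```c
/* Sketch: compare at most `limit` characters of two null-terminated strings. */
static int mystery8_sketch(const char *s1, const char *s2, unsigned int limit)
{
    while (limit > 0) {
        /* Assumption: bytes are compared as unsigned values. */
        unsigned char c1 = (unsigned char)*s1;
        unsigned char c2 = (unsigned char)*s2;

        /* "Second BREAK": end of the first string or a mismatch.
         * If both strings end here, the difference is simply 0. */
        if (c1 == '\0' || c1 != c2) {
            return (int)c1 - (int)c2;
        }

        s1++;
        s2++;
        limit--;
    }

    /* "First BREAK": the limit ran out without finding a difference. */
    return 0;
}
```

For example, mystery8_sketch("abc", "abd", 3) is negative because 'c' sorts before 'd', while mystery8_sketch("abc", "abd", 2) is 0 because only the first two characters are compared.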