Buffer Overflow Attacks and Their Countermeasures

What is buffer overflow, why is it dangerous and how is it preventable?

Overwriting a Function's Return Address

Because we know it is easy to overwrite a function's return
address, an intelligent hacker might want to spawn a shell (with
root permissions) by jumping the execution path to such code. But,
what if there is no such code in the program to be exploited? The
answer is to place the code we are trying to execute in the
buffer's overflowing area. We then overwrite the return address so
it points back to the buffer and executes the intended code. Such
code can be inserted into the program using environment variables
or program input parameters. Example code that spawns a root
shell can be found in a classic paper written by Aleph One for
Phrack Magazine (see Resources).

Buffer Overflow Countermeasures

The solutions proposed for buffer overflow problems mainly
target the prevention of large-scale system attacks through the
loopholes described above. None of the methods described below can
claim to prevent all possible attacks. These methods, however, can
make it more difficult to exploit buffer overflows and, hence, to
destroy the consistency of the stack.

Write secure code: Buffer overflows are the result
of stuffing more data into a buffer than it is meant to hold. C
library functions such as strcpy (), strcat (), sprintf () and
vsprintf () operate on null terminated strings and perform no
bounds checking. gets () is another function that reads user input
(into a buffer) from stdin until a terminating newline or EOF is
found. The scanf () family of functions also may result in buffer
overflows. Hence, the best way to deal with buffer overflow
problems is to not allow them to occur in the first place.
Developers should be educated about how to minimize the use of
these vulnerable functions.
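
As a minimal sketch of what avoiding these functions can look like,
the fragment below uses the standard bounded calls fgets () and
snprintf () in place of gets () and sprintf (). The helper names
read_line and format_greeting are hypothetical and chosen only for
illustration; they are not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Bounded replacement for gets(): stores at most cap-1 characters in buf.
 * Returns buf on success, NULL on EOF or read error.
 * (read_line is a hypothetical helper name.) */
char *read_line(char *buf, size_t cap, FILE *in)
{
    if (fgets(buf, (int)cap, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline if one was read */
    return buf;
}

/* Bounded replacement for sprintf(): returns 0 if the whole greeting fit,
 * -1 if it was truncated or a formatting error occurred.
 * (format_greeting is a hypothetical helper name.) */
int format_greeting(char *dst, size_t cap, const char *name)
{
    int n = snprintf(dst, cap, "Hello, %s!", name);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Both helpers take the destination size as an explicit argument, so
the caller still has to pass the right value; sizeof works only when
the destination is a true array rather than a pointer.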

Stack execute invalidation: Because malicious code
(for example, assembly instructions to spawn a root shell) is an
input argument to the program, it resides in the stack and not in
the code segment. Therefore, the simplest solution is to forbid
the stack from executing any instructions. Any attempt to execute
code residing in the stack then causes a segmentation violation.
However, the solution is not easy to implement. Although it is
possible in Linux, some compilers (including GCC) use trampoline
functions (see Resources) to implement taking the address of a
nested function, and these rely on the system stack being
executable. A trampoline is a small piece of code created at
run-time when the address of a nested function is taken. It
normally resides in the stack, in the stack frame of the containing
function and thus requires the stack to be executable. However, a
version of the Linux kernel that enforces the non-executable stack
is freely available (see Resources).

Compiler tools: Over the years, compilers have
become more and more aggressive in optimizations and the checks
they perform. Various compiler tools already offer warnings on the
use of unsafe constructs such as gets (), strcpy () and the like.
For example, linking code that calls gets () produces a warning
such as this:

/tmp/cc203ViF.o: In function "main":
/tmp/cc203ViF.o(.text+0x1f): the "gets" function is dangerous and should
not be used.

Apart from offering warnings, modern compiler tools change
the way a program is compiled, allowing bounds checking to go into
compiled code automatically, without changing the source code.
These compilers generate the code with built-in safeguards that try
to prevent the use of illegal addresses. Any code that tries to
access an illegal address is not allowed to execute.

These kinds of tools, however, require the source code to be
recompiled with a newer compiler. This requirement may be a problem
if the application is not open source. Furthermore, it may affect
the application's performance to a great extent. In some cases,
executable size and execution time may increase two-fold.

A patch for GCC that does bounds checking is also available.
Recently, however, most of the tools have concentrated on
preventing the return address from being overwritten, as most
attacks occur this way. StackShield is a freely available tool
that copies the return address of a function to a safe place
(usually to the start of the data segment) at the start of the
function. When the function terminates, it compares the two
return addresses, the one in the stack and the one stored in the
data segment. In the case of a mismatch, the function aborts
immediately.
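
The save-and-compare step can be modeled in plain C. This is only a
sketch of the idea, not StackShield's implementation: the real tool
instruments the assembly the compiler emits, whereas here the return
address is passed around as an ordinary integer, and the names
shadow_push, shadow_check and SHADOW_DEPTH are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64   /* arbitrary limit chosen for this sketch */

/* Saved copies of return addresses, kept away from the call stack. */
static uintptr_t shadow[SHADOW_DEPTH];
static size_t shadow_top = 0;

/* Models the function prologue: remember the return address.
 * Returns 0 on success, -1 if the shadow stack is full. */
int shadow_push(uintptr_t ret_addr)
{
    if (shadow_top == SHADOW_DEPTH)
        return -1;
    shadow[shadow_top++] = ret_addr;
    return 0;
}

/* Models the function epilogue: pop the saved copy and compare it with
 * the value currently on the call stack.
 * Returns 1 if they match, 0 on a mismatch or if nothing was saved. */
int shadow_check(uintptr_t ret_addr_on_stack)
{
    if (shadow_top == 0)
        return 0;
    return shadow[--shadow_top] == ret_addr_on_stack;
}
```

Because entries are pushed on entry and popped on exit, nested calls
are handled in last-in, first-out order, just as the call stack
itself behaves.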

Because a function also can call another function, StackShield
needs to maintain a stack-like structure for storing return
addresses. Another tool available is StackGuard, which detects and
defeats stack smashing attacks by protecting the return address on
the stack from being altered. It places a canary word next to the
return address whenever a function is called. If the canary word
has been altered when the function returns, then some attempt has
been made to overflow the buffers. It responds by emitting an
alert and halting.
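
The canary idea can be illustrated without touching a real stack
frame. In the sketch below, a struct stands in for the frame, with
the canary placed directly after the buffer; the struct layout, the
constant value and the name copy_and_check are assumptions made for
illustration. StackGuard itself inserts the canary in compiler
generated prologue and epilogue code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fixed value for the sketch; a real tool would not use a guessable one. */
#define CANARY_VALUE 0xC0FFEE42UL

/* Stand-in for a stack frame: the canary sits right after the buffer. */
struct guarded_frame {
    char buffer[16];
    unsigned long canary;
};

/* Copies len bytes of input into the frame, starting at the buffer, and
 * reports whether the canary survived.
 * Returns 1 if the canary is intact, 0 if it was overwritten. */
int copy_and_check(const char *input, size_t len)
{
    struct guarded_frame frame;

    assert(len <= sizeof frame);          /* keep the sketch well defined */
    frame.canary = CANARY_VALUE;
    memcpy((unsigned char *)&frame, input, len);
    return frame.canary == CANARY_VALUE;
}
```

A copy of 16 bytes or fewer leaves the canary alone; anything longer
runs into it, which is the condition the check detects before the
function would return.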

Dynamic run-time checks: In this scheme, an
application has restricted access in order to prevent attacks. This
method primarily relies on the safety code being preloaded before
an application is executed. This preloaded component can either
provide safer versions of the standard unsafe functions, or it can
ensure that return addresses are not overwritten. One example of
such a tool is libsafe. The libsafe library provides a way to
secure calls to these functions, even if the program's source code
is not available. It makes use of
the fact that stack frames are linked together by frame pointers.
When a buffer is passed as an argument to any of the unsafe
functions, libsafe follows the frame pointers to the correct stack
frame. It then checks the distance to the nearest return address,
and when the function executes, it makes sure that address is not
overwritten.
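
The distance check can be sketched as follows. This is a model, not
libsafe's code: a struct stands in for a stack frame so the example
stays portable, and the names model_frame, span_to_frame_pointer and
checked_strcpy are hypothetical. The sketch returns NULL when the
copy would not fit, where a preloaded library would more likely stop
the program.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a stack frame: locals first, then the saved frame pointer
 * and the return address that must not be overwritten. */
struct model_frame {
    char locals[32];
    uintptr_t saved_fp;
    uintptr_t ret_addr;
};

/* Number of bytes between dst (somewhere inside locals) and the saved
 * frame pointer of the frame that contains it. */
size_t span_to_frame_pointer(const struct model_frame *f, const char *dst)
{
    return (size_t)((const char *)&f->saved_fp - dst);
}

/* strcpy() wrapper that refuses any copy that would reach the saved frame
 * pointer. Returns dst on success, NULL if src (with its terminator)
 * does not fit in the available span. */
char *checked_strcpy(struct model_frame *f, char *dst, const char *src)
{
    size_t span = span_to_frame_pointer(f, dst);

    if (memchr(src, '\0', span) == NULL)
        return NULL;
    return strcpy(dst, src);
}
```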

Hi Sandeep,
Thanks a lot for the fantastic effort. I was wondering how a hacker can launch a buffer overflow attack remotely.
It was quite clear from your article how a buffer overflow happens on a local system. In fact, I was reading the following PDF: "A Buffer Overflow Study: Attacks & Defenses", Pierre-Alain FAYOLLE, Vincent GLAUME, ENSEIRB, Networks and Distributed Systems, 2002,
which talks about the same and mostly shows ways and tools to protect against buffer overflow attacks.
But I want to know how the attack is actually launched.
Say a user is sitting at machine A and a hacker is sitting at machine B. They are in different cities. In such a case, how will the hacker make his malicious code run on the user's machine? I mean, how will a hacker sitting in a different geographical location overflow the memory buffer on the user's local system over the network/Internet?
It seems that the last response on your article was more than 40 weeks ago, so if you reply I would be very grateful.

> Write secure code: Buffer overflows are the result
> of stuffing more code into a buffer than it is meant
> to hold. C library functions such as strcpy (),
> strcat (), sprintf () and vsprintf () operate on null
> terminated strings and perform no bounds checking.
> gets () is another function that reads user input
> (into a buffer) from stdin until a terminating
> newline or EOF is found. The scanf () family of
> functions also may result in buffer overflows. Hence,
> the best way to deal with buffer overflow problems is
> to not allow them to occur in the first place.
> Developers should be educated about how to minimize
> the use of these vulnerable functions.

OK, but one clear way to avoid problems with these functions is to absolutely ban their use in the first place. That may seem heavy-handed or impractical, but it's not if you have an alternative that is ready to substitute for the functionality of all those functions. That is to say, you simply need a comprehensive string library that performs automatic buffer management and delivers sufficient functionality, and then you switch to it.

I think the two main alternatives that I would recommend for C programmers is either James Antil's "Vstr":

They attack the problem with different philosophical mindsets (as James says, his is a buffer manager for struct iovec *, while mine is a buffer manager for char *), but either one is sufficient to significantly reduce the occurrence of buffer overflows, while delivering all sorts of side features which you can read about at their respective home pages.

For C++, you can also use my library, however either STL's std::string or Microsoft's MFC based CString class are also good alternatives for automatically buffer managed strings.

The problem with using functions like snprintf/strncat/strlcpy, etc., is that they don't remove the problem of buffer overflow; they simply change it. Remember that buffer overflows fundamentally come from programmer errors. Just because you've reformulated the problem doesn't mean programmers are going to stop making errors. The advantage of a dynamic string library is that the problem of buffer management for strings (and/or other kinds of memory buffers) is completely solved for you. So these kinds of buffer overflow errors basically cannot happen.
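
To make "automatic buffer management" concrete, here is a minimal sketch of a growable string in C. It is not the API of Vstr or of any other library named in this thread; the type dynstr and its functions are hypothetical and exist only to show the idea that the buffer resizes itself, so the caller never supplies a destination size.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A string that owns its storage and grows on demand. */
struct dynstr {
    char *data;   /* always NUL-terminated while initialized */
    size_t len;   /* length without the terminator */
    size_t cap;   /* allocated bytes */
};

/* Returns 0 on success, -1 if memory could not be allocated. */
int dynstr_init(struct dynstr *s)
{
    s->data = malloc(16);
    if (s->data == NULL)
        return -1;
    s->data[0] = '\0';
    s->len = 0;
    s->cap = 16;
    return 0;
}

/* Appends text, enlarging the buffer first if needed. text must not point
 * into s->data. Returns 0 on success, -1 on overflow or allocation failure
 * (the string is left unchanged in that case). */
int dynstr_append(struct dynstr *s, const char *text)
{
    size_t add = strlen(text);

    if (add > SIZE_MAX - s->len - 1)
        return -1;
    size_t need = s->len + add + 1;
    if (need > s->cap) {
        size_t new_cap = s->cap;
        while (new_cap < need) {
            if (new_cap > SIZE_MAX / 2) { new_cap = need; break; }
            new_cap *= 2;
        }
        char *p = realloc(s->data, new_cap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, text, add + 1);   /* copies the terminator too */
    s->len += add;
    return 0;
}

void dynstr_free(struct dynstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

The size arithmetic lives in one place instead of at every call site, which is the property the comment above is arguing for. Allocation failure still has to be handled by the caller.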

Half of this article has just been ripped and paraphrased from Aleph1's original. It would have been easier just to paste the URL to Phrack, or create a .diff for 'Smashing the stack for fun and profit'. Apart from that, it's good.

"The standard C library provides a number of functions for copying or appending strings, that perform no boundary checking. They include: strcat(), strcpy(), sprintf(), and vsprintf(). These functions operate on null-terminated strings, and do not check for overflow of the receiving string. gets() is a function that reads a line from stdin into a buffer until either a terminating newline or EOF. It performs no checks for buffer overflows. The scanf() family of functions can also be a problem"

"C library functions such as strcpy (), strcat (), sprintf () and vsprintf () operate on null terminated strings and perform no bounds checking. gets () is another function that reads user input (into a buffer) from stdin until a terminating newline or EOF is found. The scanf () family of functions also may result in buffer overflows"

Very good articles...
But there is one thing that leaves me very surprised: why can we execute instructions in the data segment? Until now I believed the memory paging system marks memory pages with 3 bits: r, w, x... So if the program counter points to something in a data page (r or r,w), a page fault should occur... Where's the trick?

I was thinking the same thing. The author says that gcc places "trampoline code" in the stack (segment) making it executable "(in order) to implement taking the address of a nested function that works on the system stack being executable." huh? That would be a big mistake. Makes me wonder if Linux is any better than Windows.
Local C functions are non-standard --I know gcc allows them and I've used them, very nice-- but they are still "static". Only their names are hidden inside the enclosing function.
You don't need any code in the stack to follow C++ exception handling, so what are you talking about Sandeep????

Huh? You seem to be saying that keeping track of the size of the destination buffer doesn't work. Barring an error at the hardware level (or incorrect function implementation), how can a properly used *sn* function lead to an overflow?

Try adding the line below (defining badthing) and then see what happens when you pass 20 chars. You end up with a non-terminated string, which when printed, appends the string immediately following in memory. Woops.
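
The line this comment refers to did not survive, so the following is not a reconstruction of it. It is a separate, minimal sketch of the same effect, written so that it never reads past the buffer: it checks for a terminator inside the destination rather than printing it. The name copy_is_terminated is hypothetical.

```c
#include <string.h>

/* Copies src into dst with strncpy(), using the full capacity as the limit,
 * and reports whether dst ended up holding a terminated string.
 * Returns 1 if a NUL byte lies within the first cap bytes of dst, else 0. */
int copy_is_terminated(char *dst, size_t cap, const char *src)
{
    strncpy(dst, src, cap);   /* writes no NUL when strlen(src) >= cap */
    return memchr(dst, '\0', cap) != NULL;
}
```

With a 16-byte destination, a 5-character source gives 1 and a 20-character source gives 0. In the second case, passing dst to printf ("%s") would read beyond the array.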

I would just like to thank Linux Journal for updating the article to correct errors mentioned in the discussion. This is especially interesting, as they didn't feel the need to mention that they updated the article. Thank you for the shining example of journalism you present to the world.

Buffer overflows would not be nearly the security nightmare they are if the root account were not all-powerful and if network and system daemons like apache were not run from the root account. Fixing this turns out to be simpler than you might think with the LIDS kernel patch.

LIDS stands for Linux Intrusion Detection System. I think of it as a firewall that sits between a process and the kernel when system calls are made. With it, privileges can be granted to and taken away from individual programs (based on the inode of the executable program file) instead of user accounts.

If the only root privilege the apache program possessed was the ability to bind to port 80, and someone found a buffer overflow in apache and exploited it, and they managed to get a remote root shell from it, it would buy them little if the normal powers of the root account had been stripped away.

Although you have to spend some time configuring the permissions table for the applications you want to run, it is quite manageable and only has to be done once. I think this is the ultimate answer to buffer overflows: make it so that even when a buffer overflow is found, the attacker can't do anything with it.

Well, one solution to this problem could be the new OpenBSD 3.2 kernel policy, in which most of the stack programming is removed. You can also configure systrace, but it takes a lot of time if you want it to work well.

Those are my best answers to how sysadmins could protect themselves against buffer overflow attacks. And one more thing: PROGRAMMERS, DON'T write shitty code!!! Buffer overflows are really easy to avoid at the programming stage.

Performance is seldom important enough to require
accepting the risks of low-level errors such as buffer
overflows

Yes it is, in almost every application.
I despise the attitude that performance doesn't matter because memory and CPU are cheap. Well, they aren't cheap enough to warrant that attitude, which leads to software getting ever more bloated.

COBOL, Ada, Pascal, BASIC & Python pop instantly into my head as languages where such bugs will never occur.
Programmers that have such a bias towards C should get a real clue. It is a poor tool for many problems.

"COBOL, Ada, Pascal, BASIC & Python pop instantly into my head as languages where such bugs will never occur. Programmers that have such a bias towards C should get a real clue"

This is FUD. It is biased and untrue.

Pascal and BASIC have the same problems. I don't know if COBOL and Python do, so I refrain from arguing on this issue.

Ada, C# and Java are usually compiled into managed environments which solve the problem. Naturally, this comes at the expense of complete bounds checking at every buffer access (which I agree is often acceptable).

It is not REALLY an issue on the side of the language, as the poster seems to believe; it is an issue of
- is code running "live" or "managed"?
- if not managed, is every API and every possible way to express yourself within the language bounds checked?

I do not give much for the argument of Pascal syntax making code better than C. It's an old and biased argument which I do not believe is based on good statistics.

Bad statistics are based on bad comparisons, like what inexperienced programmers yield in different languages. Additionally, since most code is written in C, there are also a _lot_ of inexperienced programmers coding in C.

Pascal has _never_ been mainstream enough to have been tested with the "30% of the programmers cannot code, what do they produce" test C is experiencing every day. Pascal, Ada, Fortran, etc. programmers tend to be better educated because these languages are mostly used by researchers or people with university or college experience.

I have been trying to cite PAX, especially the ASLR work, but good references are hard to come by. http://pageexec.virtualave.net/ is just a home page with source code. Please provide canonical references to technical documentation, and PAX will be better cited in the future.

... Using strncpy would have caught the problem and set off a compile-time error.

---END QUOTE---

Um, no. Why don't you try that for yourself and see how wrong you are. Seriously. That's embarrassingly bad. And it makes me wonder how useful the rest of the coding tips in the article are.

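You want to write function() as follows:

    #include <string.h>

    void function( char *str )
    {
        char buffer[16];

        /* copy at most 15 bytes, leaving room for the terminator */
        strncpy( buffer, str, sizeof(buffer) - 1 );
        /* strncpy adds no terminator when it truncates, so add one */
        buffer[sizeof(buffer) - 1] = 0;
    }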

This will copy bytes from str to buffer, and still maintain a properly null-terminated string. If you're just moving raw bytes around, you can drop the -1 from the length and remove the last line of the function.

You know, it seems to me that the argument posed above is incorrect, as the following code does in fact compile. What was the argument? That it was insecure? If so, this is true, but the author of this article was expressing that fact, and that is the whole reason it was written: to demonstrate what an overflow is. Please note I don't consider myself a genius for knowing this...

If all you're doing is moving raw bytes around, you should be using memcpy(), not strncpy().

Just silently truncating the passed buffer is almost always incorrect. You check before you copy: if the source string is too long, it's either a programming error, in which case there should be an assert(), or the user needs to be notified that she exceeded hard-coded limits (or the code needs to be fixed so that it doesn't have a hard-coded limit).

Just to be fair, I'll nitpick on the original article too: The very first example is described as being "valid code". Wrong. It's invalid, and explicitly so by the ISO C standard. That it compiles doesn't make it "valid". But overall, a good overview of how buffer overflow exploitation works.

Just silently truncating the passed buffer is almost always incorrect. You check before you copy: if the source string is too long, it's either a programming error, in which case there should be an assert(), or the user needs to be notified that she exceeded hard-coded limits (or the code needs to be fixed so that it doesn't have a hard-coded limit.

I assume you mean he should check the length with strlen(). But strlen() has unbounded execution time, and if the string does not contain the terminating null character at all, strlen() will (potentially) run forever! That can have fatal consequences for a program with real-time requirements. In this situation it is much better to just truncate the buffer as a safeguard.
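
One way to reconcile the two positions is a length check that is itself bounded. The sketch below is an illustration rather than anything proposed in this thread, and the name copy_if_fits is hypothetical. It uses the standard memchr() to look for the terminator within the destination's capacity only, so the scan never runs longer than cap bytes, and it refuses the copy instead of truncating.

```c
#include <string.h>

/* Copies src into dst only if src, including its terminator, fits in cap
 * bytes. At most cap bytes of src are examined.
 * Returns 0 on success, -1 if src is too long or has no terminator within
 * cap bytes (dst is left untouched in that case). */
int copy_if_fits(char *dst, size_t cap, const char *src)
{
    const char *end = memchr(src, '\0', cap);

    if (end == NULL)
        return -1;
    memcpy(dst, src, (size_t)(end - src) + 1);   /* +1 copies the NUL */
    return 0;
}
```

The caller decides what a -1 means: an assert() for a programming error, or a message to the user, as the earlier comment suggests.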

I agree with the first poster. And the attitude is well deserved. If you're going to talk about secure programming, but get something obviously wrong like that, you deserve to get ripped.

Note that in the correction strncpy() is used, but it takes into consideration the size of the buffer that's being copied into. There is still a bug in it though, in that there isn't support for input strings that are shorter than 15 bytes in length.