Wednesday, July 21, 2010

What exactly are the base pointer and stack pointer? To what do they point?

From what I see, I'd say the stack pointer always points to the top of the stack, and the base pointer to the beginning of the current function? Or what?

[] One important thing to note is that the stack grows "downwards" in memory. This means that to move the stack pointer "upward" (that is, to grow the stack) you decrease its value. – Ben Strasser Sep 8 '09 at 19:07
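To see the "downward" growth concretely, here is a minimal C sketch that models the stack as a plain array and the stack pointer as a C pointer. The names `stack_mem`, `sp`, `push` and `pop` are hypothetical; x86 PUSH and POP on ESP follow the same order (decrement, then store; load, then increment).

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 16

/* Simulated stack memory; slot 0 is the lowest address. */
static uint32_t stack_mem[SLOTS];

/* Empty stack: the pointer sits just past the highest slot. */
static uint32_t *sp = stack_mem + SLOTS;

/* Like PUSH: move the pointer down first, then store. */
static void push(uint32_t v) {
    assert(sp > stack_mem);            /* refuse to overflow the array */
    *--sp = v;
}

/* Like POP: read the top value first, then move the pointer up. */
static uint32_t pop(void) {
    assert(sp < stack_mem + SLOTS);    /* refuse to pop an empty stack */
    return *sp++;
}
```

After `push(1); push(2);` the pointer has dropped two slots to `stack_mem + 14` and `*sp` is 2; popping twice returns 2, then 1, and leaves `sp` where it started.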

[] One hint to differentiate what EBP/ESP and EIP are doing: EBP & ESP deal with data, while EIP deals with code. – rstevens Sep 8 '09 at 19:19

[] You mean that if I called a new function, for example DrawPixel(), its frame would appear on top of the current stack and ESP would decrease, is that it? And after the function returned, ESP would increase again (so the picture would look just like it does right now)? – devoured elysium Sep 8 '09 at 19:20

[] In your graph, ebp (usually) is the "frame pointer" and esp the "stack pointer". This allows locals to be accessed via [ebp-x] and stack parameters via [ebp+x] consistently, independent of the stack pointer (which frequently changes within a function). Addressing could be done through ESP, freeing up EBP for other operations, but then debuggers can't determine the call stack or the values of locals. – peterchen Sep 8 '09 at 19:31
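To make the "esp frequently changes" point concrete, the sketch below simulates a 32-bit stack in a small C array (byte addresses are offsets into `mem`, not real memory; the `Cpu` struct and helper names are hypothetical). After the prolog, the first local is always `[ebp-4]`, but the `[esp+N]` needed to reach that same slot grows with every extra push.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 64

/* Simulated machine state: mem holds 4-byte words, addresses are byte offsets. */
typedef struct { uint32_t mem[WORDS]; uint32_t esp, ebp; } Cpu;

static void store(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);   /* stay inside the simulated stack */
    c->mem[addr / 4] = v;
}

static void push(Cpu *c, uint32_t v) {
    c->esp -= 4;                                 /* stack grows toward lower addresses */
    store(c, c->esp, v);
}

/* Returns how far above ESP the first local ([ebp-4]) sits after the
   function has pushed `extra_pushes` more values (e.g. outgoing args). */
uint32_t esp_offset_of_first_local(int extra_pushes) {
    Cpu c = { .esp = WORDS * 4, .ebp = 0 };
    push(&c, c.ebp);                  /* push ebp                         */
    c.ebp = c.esp;                    /* mov ebp, esp                     */
    c.esp -= 8;                       /* sub esp, 8 : two 4-byte locals   */
    uint32_t first_local = c.ebp - 4; /* always [ebp-4]                   */
    for (int i = 0; i < extra_pushes; i++)
        push(&c, (uint32_t)i);
    return first_local - c.esp;       /* the N in [esp+N] for that slot   */
}
```

`esp_offset_of_first_local(0)` gives 4 and `esp_offset_of_first_local(2)` gives 12: the same variable needs a different ESP-relative offset at each point in the function, which is the bookkeeping a compiler must do when EBP is not used as a frame pointer.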

[] @Ben: Not necessarily. Some compilers put stack frames into the heap. The concept of the stack growing down is just that, a concept that makes it easy to understand. The implementation of the stack can be anything (using random chunks of the heap makes hacks that overwrite parts of the stack a lot harder, as they are not as deterministic). – Martin York Sep 8 '09 at 20:06

[] esp is as you say it is, the top of the stack.

ebp is usually set to esp at the start of the function. Local variables are accessed by subtracting a constant offset from ebp. All x86 calling conventions define ebp as being preserved across function calls. ebp itself actually points to the previous frame's base pointer, which is what makes stack walking in a debugger, and viewing other frames' local variables, possible.
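The saved-EBP chain described here can be walked with a few loads. The C sketch below is a simulation, not a real debugger: it builds three nested frames in a hypothetical 256-byte `mem` array using the `call` + `push ebp` / `mov ebp, esp` pattern, then follows `[ebp]` from frame to frame, reading each return address at `[ebp+4]`. It assumes the outermost frame was entered with EBP = 0, which serves as the end marker; the return addresses are made-up values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS 64

typedef struct { uint32_t mem[WORDS]; uint32_t esp, ebp; } Cpu;

static uint32_t load(const Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    return c->mem[addr / 4];
}
static void store(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    c->mem[addr / 4] = v;
}
static void push(Cpu *c, uint32_t v) { c->esp -= 4; store(c, c->esp, v); }

/* Simulates `call` (push return address) followed by the standard prolog. */
static void enter_function(Cpu *c, uint32_t return_addr, uint32_t locals_bytes) {
    push(c, return_addr);     /* call: pushes the return address   */
    push(c, c->ebp);          /* push ebp: save caller's frame ptr */
    c->ebp = c->esp;          /* mov ebp, esp                      */
    c->esp -= locals_bytes;   /* sub esp, N                        */
}

/* Follows the saved-EBP chain, writing return addresses innermost first. */
size_t walk_stack(const Cpu *c, uint32_t *out, size_t max) {
    size_t n = 0;
    uint32_t frame = c->ebp;
    while (frame != 0 && n < max) {
        out[n++] = load(c, frame + 4);   /* [ebp+4] = return address */
        frame = load(c, frame);          /* [ebp]   = caller's ebp   */
    }
    return n;
}

/* Builds three nested frames with hypothetical return addresses. */
size_t demo_walk(uint32_t *out, size_t max) {
    Cpu c = { .esp = WORDS * 4, .ebp = 0 };
    enter_function(&c, 0x1000, 8);
    enter_function(&c, 0x2000, 12);
    enter_function(&c, 0x3000, 4);
    return walk_stack(&c, out, max);
}
```

Note that the walk never needs to know how large each frame's locals are; that is exactly the information lost under frame pointer omission.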

Most function prologs look something like:

push ebp ; Preserve current frame pointer
mov ebp, esp ; Create new frame pointer pointing to current stack top
sub esp, 20 ; allocate 20 bytes worth of locals on stack.

Then later in the function you may have code like this (presuming both local variables are 4 bytes):

mov [ebp-4], eax ; Store eax in first local
mov ebx, [ebp - 8] ; Load ebx from second local

FPO (frame pointer omission), an optimization you can enable, eliminates this: the compiler uses ebp as just another register and accesses locals directly off of esp. This makes debugging more difficult, since the debugger can no longer directly access the stack frames of earlier function calls.
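The relationship between the two addressing styles can be checked in a small simulation. This C sketch (hypothetical `Cpu` struct and helpers, simulated memory) runs the 20-byte prolog from the answer, stores a value with `mov [ebp-4], eax`, and reads the same slot back through ESP as `[esp+16]`, the ESP-relative form FPO code relies on.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 64

typedef struct { uint32_t mem[WORDS]; uint32_t esp, ebp; } Cpu;

static uint32_t load(const Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    return c->mem[addr / 4];
}
static void store(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    c->mem[addr / 4] = v;
}
static void push(Cpu *c, uint32_t v) { c->esp -= 4; store(c, c->esp, v); }

/* Prolog from the answer: push ebp / mov ebp, esp / sub esp, 20. */
static void prolog(Cpu *c) {
    push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= 20;
}

/* Stores `eax` into the first local via [ebp-4] and returns what an
   ESP-relative access to [esp+16] reads back (the same slot). */
uint32_t first_local_via_esp(uint32_t eax) {
    Cpu c = { .esp = WORDS * 4, .ebp = 0 };
    prolog(&c);
    store(&c, c.ebp - 4, eax);      /* mov [ebp-4], eax            */
    return load(&c, c.esp + 16);    /* ESP-relative view of it     */
}
```

The `[esp+16]` equivalence holds only while nothing else has been pushed; once ESP moves, FPO code has to use a different offset for the same variable.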

[] +1 and would do another one if possible. Nice and comprehensive explanation! – rstevens Sep 8 '09 at 19:14

[] Thanks for the explanation! But I am now kinda confused. Let's assume I call a function and I am in the first line of its prolog, still without having executed a single line from it. At that point, what is ebp's value? Does the stack have anything at that point besides the pushed arguments? Thanks! – devoured elysium Sep 9 '09 at 12:33

[] EBP is not magically changed, so until you've established a new EBP for your function you'll still have the caller's value. And besides the arguments, the stack will also hold the old EIP (the return address). – MSalters Sep 9 '09 at 13:34
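The state MSalters describes can be reproduced with a small simulation. This C sketch (simulated memory; the `Cpu` struct, helpers and the argument values are hypothetical) pushes two arguments right to left, as cdecl does, then performs the push that `call` does, and inspects the stack before any prolog instruction runs.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 64

typedef struct { uint32_t mem[WORDS]; uint32_t esp, ebp; } Cpu;

static uint32_t load(const Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    return c->mem[addr / 4];
}
static void store(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    c->mem[addr / 4] = v;
}
static void push(Cpu *c, uint32_t v) { c->esp -= 4; store(c, c->esp, v); }

/* What the callee sees on its very first instruction. */
typedef struct { uint32_t ebp, at_esp, at_esp4, at_esp8; } EntryState;

EntryState state_at_entry(uint32_t caller_ebp, uint32_t arg1,
                          uint32_t arg2, uint32_t ret_addr) {
    Cpu c = { .esp = WORDS * 4, .ebp = caller_ebp };
    push(&c, arg2);       /* arguments pushed right to left (cdecl-style) */
    push(&c, arg1);
    push(&c, ret_addr);   /* call: pushes the return address             */
    EntryState s = { c.ebp,                  /* still the caller's EBP  */
                     load(&c, c.esp),        /* [esp]   = return addr   */
                     load(&c, c.esp + 4),    /* [esp+4] = first arg     */
                     load(&c, c.esp + 8) };  /* [esp+8] = second arg    */
    return s;
}
```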

[] You have it right. The stack pointer points to the top item on the stack and the base pointer points to the "previous" top of the stack before the function was called.

When you call a function, its local variables are stored on the stack and the stack pointer is adjusted to make room for them (on x86 it is decremented, since the stack grows downward). When you return from the function, all the local variables on the stack go out of scope. You do this by setting the stack pointer back to the base pointer (which was the "previous" top before the function call).

Doing memory allocation this way is very, very fast and efficient.
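The speed comes from allocation being a subtraction and freeing a whole frame being a single assignment. The C sketch below is a hypothetical downward-growing arena, not how any particular compiler manages memory: `sp` plays the role of ESP, and a saved mark plays the role of EBP.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Downward-growing bump allocator over a fixed buffer. */
typedef struct {
    unsigned char buf[256];
    size_t sp;                      /* index of the current top */
} StackArena;

static void arena_init(StackArena *a) { a->sp = sizeof a->buf; }

static unsigned char *arena_alloc(StackArena *a, size_t n) {
    assert(n <= a->sp);             /* refuse to run past the start of buf */
    a->sp -= n;                     /* like sub esp, n */
    return &a->buf[a->sp];
}

static size_t arena_mark(const StackArena *a) { return a->sp; }        /* like mov ebp, esp */
static void arena_release(StackArena *a, size_t mark) { a->sp = mark; } /* like mov esp, ebp */

/* Allocates three "locals" in one frame, then frees them all at once. */
void demo_frame(size_t *top_inside, size_t *top_after) {
    StackArena a;
    arena_init(&a);
    size_t mark = arena_mark(&a);
    unsigned char *x = arena_alloc(&a, 4);
    unsigned char *y = arena_alloc(&a, 8);
    unsigned char *z = arena_alloc(&a, 16);
    memset(x, 1, 4);
    memset(y, 2, 8);
    memset(z, 3, 16);
    *top_inside = a.sp;             /* 256 - 28 bytes in use */
    arena_release(&a, mark);        /* every local goes out of scope together */
    *top_after = a.sp;
}
```

No matter how many locals were allocated, releasing the frame costs one store, which is why stack allocation is so cheap compared with a general-purpose heap.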

answered Sep 8 '09 at 18:48
Robert Cartaino

[] @Robert: When you say "previous" top of the stack before the function was called, you are ignoring both the parameters, which are pushed onto the stack just before the function is called, and the caller's EIP. This might confuse readers. Let's just say that in a standard stack frame, EBP points to the same place where ESP pointed just after entering the function. – wigy Sep 9 '09 at 13:24

[] EDIT: For a better description, see x86 Disassembly/Functions and Stack Frames in a WikiBook about x86 assembly. I'll try to add some info you might be interested in if you use Visual Studio.

Storing the caller's EBP as the first local variable is called a standard stack frame, and it can be used with nearly all calling conventions on Windows. Calling conventions differ in whether the caller or the callee deallocates the passed parameters, and in which parameters are passed in registers, but these issues are orthogonal to the standard stack frame.

Speaking of Windows programs, you probably use Visual Studio to compile your C++ code. Be aware that Microsoft uses an optimization called Frame Pointer Omission, which makes it nearly impossible to walk the stack without using the dbghelp library and the PDB file for the executable.

Frame Pointer Omission means that the compiler does not store the old EBP in a standard place and uses the EBP register for something else, so you have a hard time finding the caller's EIP without knowing how much space the local variables need for a given function. Of course, Microsoft provides an API that allows you to do stack walks even in this case, but looking up the symbol table database in PDB files takes too long for some use cases.

To avoid FPO in your compilation units, either avoid using /O2 or explicitly add /Oy- to the C++ compilation flags in your projects. You probably link against the C or C++ runtime, which uses FPO in the Release configuration, so you will still have a hard time doing stack walks without dbghelp.dll.
edited Sep 9 '09 at 9:09

[] I don't get how EIP is stored on the stack. Shouldn't it be a register? How can a register be on the stack? Thanks! – devoured elysium Sep 8 '09 at 22:36

[] The caller EIP is pushed onto the stack by the CALL instruction itself. The RET instruction just fetches the top of the stack and puts it into the EIP. If you have buffer overruns, this fact might be used to jump into user code from a privileged thread. – wigy Sep 9 '09 at 9:05
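That CALL/RET pairing can be simulated in a few lines of C. The sketch uses hypothetical names and simulated memory, and assumes a 5-byte `call rel32` instruction, so the saved return address is the call's own address plus 5.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 64

typedef struct { uint32_t mem[WORDS]; uint32_t esp, eip; } Cpu;

static uint32_t load(const Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    return c->mem[addr / 4];
}
static void store(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    c->mem[addr / 4] = v;
}
static void push(Cpu *c, uint32_t v) { c->esp -= 4; store(c, c->esp, v); }
static uint32_t pop(Cpu *c) { uint32_t v = load(c, c->esp); c->esp += 4; return v; }

/* call target: push the address of the next instruction, then jump. */
static void do_call(Cpu *c, uint32_t target, uint32_t call_len) {
    push(c, c->eip + call_len);
    c->eip = target;
}

/* ret: pop the saved return address into EIP. */
static void do_ret(Cpu *c) { c->eip = pop(c); }

typedef struct { uint32_t in_callee, saved_on_stack, after_ret; int esp_restored; } CallTrace;

CallTrace trace_call(uint32_t start, uint32_t target) {
    Cpu c = { .esp = WORDS * 4, .eip = start };
    uint32_t esp0 = c.esp;
    CallTrace t;
    do_call(&c, target, 5);             /* assumed 5-byte call rel32 */
    t.in_callee = c.eip;
    t.saved_on_stack = load(&c, c.esp); /* the "register on the stack" */
    do_ret(&c);
    t.after_ret = c.eip;
    t.esp_restored = (c.esp == esp0);
    return t;
}
```

This is also why a buffer overrun that overwrites the saved value is dangerous: RET will load whatever is in that slot into EIP.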

[] First of all, the stack pointer points to the bottom of the stack, since x86 stacks grow from high addresses toward lower addresses. The stack pointer holds the address of the most recently pushed value; the next push (or call) first decrements it and then stores the new value at that address. Its operation is equivalent to these C/C++ statements (treating esp as a pointer to 32-bit words):

// push eax: decrement first, then store
*--esp = eax;
// pop eax: load first, then increment
eax = *esp++;

; a function call; in this case, the caller must clean up the function parameters
mov eax, some_value
push eax
call some_address   ; pushes the address of the next instruction onto the
                    ; stack and sets the instruction pointer to "some_address"
add esp, 4          ; remove the pushed parameter from the stack

; a function
push ebp            ; save the old stack frame
mov ebp, esp        ; start the new stack frame
...                 ; do stuff
mov esp, ebp        ; discard any locals
pop ebp             ; restore the old stack frame
ret
The base pointer marks the top of the current frame. With the prolog above, [ebp] holds the caller's saved ebp, [ebp+4] holds your return address, and [ebp+8] is the first parameter of your function (or the this pointer of a class method, when it is passed on the stack). [ebp-4] is the first local variable of your function.
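These offsets can be checked with a simulation. This C sketch (hypothetical `Cpu` struct and helpers, simulated memory, cdecl-style right-to-left argument pushes assumed) builds one complete frame and reads back each slot relative to EBP.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 64

typedef struct { uint32_t mem[WORDS]; uint32_t esp, ebp; } Cpu;

static uint32_t load(const Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    return c->mem[addr / 4];
}
static void store(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    c->mem[addr / 4] = v;
}
static void push(Cpu *c, uint32_t v) { c->esp -= 4; store(c, c->esp, v); }

typedef struct { uint32_t saved_ebp, ret_addr, arg1, arg2; int32_t locals_bytes; } FrameView;

FrameView build_frame(uint32_t caller_ebp, uint32_t arg1, uint32_t arg2, uint32_t ret_addr) {
    Cpu c = { .esp = WORDS * 4, .ebp = caller_ebp };
    push(&c, arg2);                    /* caller: push args right to left */
    push(&c, arg1);
    push(&c, ret_addr);                /* call                            */
    push(&c, c.ebp);                   /* push ebp                        */
    c.ebp = c.esp;                     /* mov ebp, esp                    */
    c.esp -= 8;                        /* sub esp, 8: [ebp-4], [ebp-8]    */
    FrameView v = {
        .saved_ebp = load(&c, c.ebp),       /* [ebp]    */
        .ret_addr  = load(&c, c.ebp + 4),   /* [ebp+4]  */
        .arg1      = load(&c, c.ebp + 8),   /* [ebp+8]  */
        .arg2      = load(&c, c.ebp + 12),  /* [ebp+12] */
        .locals_bytes = (int32_t)(c.ebp - c.esp),
    };
    return v;
}
```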
answered Sep 8 '09 at 18:59
jmucchiello

[] That was indeed very helpful for me. – CDR Oct 4 '09 at 5:21

[] +1 High-to-low addressing and the ebp +/- offsets are very helpful to mention. – kolistivra Jun 22 at 23:09

[] ESP is the current stack pointer, which changes any time a word or address is pushed onto or popped off the stack. EBP gives the compiler a more convenient way to keep track of a function's parameters and local variables than using ESP directly.

Generally (and this may vary from compiler to compiler), all of the arguments to a function being called are pushed onto the stack (usually in the reverse order that they're declared in the function prototype, but this varies). Then the function is called, which pushes the return address (EIP) onto the stack.

Upon entry to the function, the old EBP value is pushed onto the stack and EBP is set to the value of ESP. Then the ESP is decremented (because the stack grows downward in memory) to allocate space for the function's local variables and temporaries. From that point on, during the execution of the function, the arguments to the function are located on the stack at positive offsets from EBP (because they were pushed prior to the function call), and the local variables are located at negative offsets from EBP (because they were allocated on the stack after the function entry). That's why the EBP is called the frame pointer, because it points to the center of the function call frame.

Upon exit, all the function has to do is set ESP to the value of EBP, pop the old EBP value, and return (which pops the return address into EIP).
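The whole life cycle in this answer (arguments, call, prolog, body, epilogue, caller cleanup) fits in one simulation. This C sketch is hypothetical: `callee_add` stands in for a compiled `int add(int a, int b)`, memory is a 256-byte array, and a 5-byte call instruction is assumed. It checks that ESP and EBP end up exactly where they started.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 64

typedef struct { uint32_t mem[WORDS]; uint32_t esp, ebp, eax, eip; } Cpu;

static uint32_t load(const Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    return c->mem[addr / 4];
}
static void store(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    c->mem[addr / 4] = v;
}
static void push(Cpu *c, uint32_t v) { c->esp -= 4; store(c, c->esp, v); }
static uint32_t pop(Cpu *c) { uint32_t v = load(c, c->esp); c->esp += 4; return v; }

/* Hypothetical compiled body of `int add(int a, int b)` with a frame pointer. */
static void callee_add(Cpu *c) {
    push(c, c->ebp);                                      /* push ebp        */
    c->ebp = c->esp;                                      /* mov ebp, esp    */
    c->esp -= 4;                                          /* sub esp, 4      */
    store(c, c->ebp - 4, load(c, c->ebp + 8));            /* local = a       */
    c->eax = load(c, c->ebp - 4) + load(c, c->ebp + 12);  /* eax = local + b */
    c->esp = c->ebp;                                      /* mov esp, ebp    */
    c->ebp = pop(c);                                      /* pop ebp         */
    c->eip = pop(c);                                      /* ret             */
}

typedef struct { uint32_t result; int esp_ok, ebp_ok; uint32_t eip; } CallResult;

CallResult call_add(uint32_t a, uint32_t b) {
    Cpu c = { .esp = WORDS * 4, .ebp = 0x40, .eip = 0x100 };
    uint32_t esp0 = c.esp, ebp0 = c.ebp;
    push(&c, b);              /* push b                                  */
    push(&c, a);              /* push a                                  */
    push(&c, c.eip + 5);      /* call add: push return address           */
    callee_add(&c);
    c.esp += 8;               /* add esp, 8: caller removes the arguments */
    CallResult r = { c.eax, c.esp == esp0, c.ebp == ebp0, c.eip };
    return r;
}
```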
answered Sep 8 '09 at 19:44
Loadmaster

[] Long time since I've done Assembly programming, but this link might be useful...

The processor has a collection of registers which are used to store data. Some of these hold direct values, while others point to an area within RAM. Registers do tend to be used for certain specific actions, and many assembly instructions expect their data in specific registers.

The stack pointer is mostly used when you're calling other procedures. With modern compilers, a bunch of data will be dumped first on the stack, followed by the return address so the system will know where to return once it's told to return. The stack pointer will point at the next location where new data can be pushed to the stack, where it will stay until it's popped back again.

Base registers or segment registers just point to the address space of a large amount of data. Combined with a second register, the base pointer divides memory into huge blocks, while the second register points at an item within a block. Base pointers therefore point to the base of blocks of data.

Do keep in mind that assembly is very CPU-specific. The page I've linked to provides information about different types of CPUs.
answered Sep 8 '09

[] ds is also a segment register, and in the days of MS-DOS and 16-bit code you definitely needed to change these segment registers occasionally, since a segment could never span more than 64 KB of RAM. Yet DOS could access memory up to 1 MB because it used 20-bit addresses. Later we got 32-bit systems, some with 36-bit addressing, and now 64-bit registers. So nowadays you won't really need to change these segment registers anymore. – Workshop Alex Sep 8 '09 at 19:17
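The 20-bit addressing mentioned here comes from real-mode segment:offset translation, where the physical address is segment * 16 + offset. A minimal C sketch follows; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode (8086-style) translation: shift the 16-bit segment left by
   4 bits (multiply by 16) and add the 16-bit offset. Each segment value
   starts a 64 KB window, and many segment:offset pairs alias the same
   physical byte. The unwrapped sum is returned here. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}
```

For example, `1000:0010` and `1001:0000` both map to 0x10010. The largest sum, `FFFF:FFFF`, is 0x10FFEF, just over 1 MB; an original 8086 had only 20 address lines, so such addresses wrapped around to the bottom of memory.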

[] No modern OS uses 386 segments – Paul Betts Sep 8 '09 at 19:32

[] @Paul: WRONG! WRONG! WRONG! The 16-bit segments are replaced by 32-bit segments. In protected mode, this allows the virtualization of memory, basically allowing the processor to map physical addresses to logical ones. However, within your application, things still seem to be flat, since the OS has virtualized the memory for you. The kernel operates in protected mode, allowing applications to run in a flat memory model. See also en.wikipedia.org/wiki/Protected_mode – Workshop Alex Sep 9 '09 at 8:30

[] @Workshop Alex: That's a technicality. All modern OSes set all segments to [0, FFFFFFFF]. That doesn't really count. And if you read the linked page, you'll see that all the fancy stuff is done with pages, which are much more fine-grained than segments. – MSalters Sep 9 '09 at 13:39

[] @MSalters, that's not completely true. They do this for the processes that they execute themselves, providing virtual memory for those processes so these segments aren't needed. The operating system just hides the segmentation of memory, but it still uses segments internally. Watcom C/C++ for 32-bit systems actually supports the use of segments when doing far calls! More at users.pjwstk.edu.pl/~jms/qnx/help/watcom/… Watcom C/C++ is now OpenWatcom: openwatcom.org – Workshop Alex Sep 10 '09 at 8:12

[] Btw, I just debugged a Delphi application. The segment registers are 16 bits: CS contains the value 001Bh, DS, ES and SS are all 0023h, FS is 003Bh, and only GS is NULL. They are different values and therefore must each have a special function. (Possibly related to exception handling.) – Workshop Alex Sep 10 '09 at 8:19

[] Edit: Yeah, this is mostly wrong. It describes something entirely different, in case anyone is interested :)

Yes, the stack pointer points to the top of the stack (whether that's the first empty stack location or the last full one I'm unsure of). The base pointer points to the memory location of the instruction that's being executed. This is on the level of opcodes, the most basic instructions you can get on a computer. Each opcode and its parameters are stored in a memory location. One C, C++ or C# line could be translated to one opcode, or to a sequence of two or more, depending on how complex it is. These are written into program memory sequentially and executed. Under normal circumstances the base pointer is advanced by one instruction. For program control (GOTO, IF, etc.) it can be advanced by multiple instructions or simply replaced with the next memory address.

In this context, the functions are stored in program memory at a certain address. When the function is called, certain information is pushed on the stack that lets the program find its way back to where the function was called from, as well as the parameters to the function; then the address of the function in program memory is loaded into the base pointer. On the next clock cycle the computer starts executing instructions from that memory address. Then at some point it will RETURN to the memory location AFTER the instruction that called the function and continue from there.
edited Sep 8 '09 at 19:11

answered Sep 8 '09 at 18:46
Stephen Friederichs
[] EBP does not point to current instruction, that's eip. – Michael Sep 8 '09 at 18:50
I'm having a bit of trouble understanding what ebp is. If we have 10 lines of MASM code, does that mean that as we go down running those lines, ebp will always be increasing? – devoured elysium Sep 8 '09 at 18:58

[] @Devoured - No. That is not true. eip will be increasing. – Michael Sep 8 '09 at 19:00
You mean that what I said is right, but for EIP rather than EBP, is that it? – devoured elysium Sep 8 '09 at 19:03

[] Yes. EIP is the instruction pointer and is implicitly modified after each instruction is executed. – Michael Sep 8 '09 at