Bloog Bot

Memory

What follows is a crash course in some important theory about the memory in your computer; specifically, the anatomy of a program once it's loaded into memory at runtime. This may seem dense, but I promise, it's important to the rest of the discussion.

Computers use Random Access Memory (RAM) to store the machine instructions and data of programs at runtime. Modern operating systems use a memory management technique called virtual memory, an abstraction that hides the underlying physical data storage of your computer. 32-bit operating systems have 2^32 (~4 billion) unique memory addresses, and 64-bit operating systems have 2^64 (a lot) unique memory addresses. On most modern systems, each unique memory address stores 8 bits, or 1 byte, of information (this is mostly a de facto standard, but it isn't always true. 8 is also conveniently a power of 2). Additionally, most modern architectures use a 32-bit or 64-bit memory word size (but not all).
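To put numbers on those address-space sizes, here's a minimal C++ sketch. The function names (address_count, highest_address, pointer_bits_here) are made up for illustration, and the math assumes one byte per address as described above.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Number of distinct addresses reachable with a pointer of the given width.
// Only meaningful for widths below 64, because 2^64 itself does not fit in
// a 64-bit integer.
constexpr std::uint64_t address_count(unsigned pointer_bits) {
    return std::uint64_t{1} << pointer_bits;
}

// Highest address for a given pointer width (also works for 64 bits).
constexpr std::uint64_t highest_address(unsigned pointer_bits) {
    return pointer_bits >= 64 ? UINT64_MAX
                              : (std::uint64_t{1} << pointer_bits) - 1;
}

// Pointer width, in bits, of the architecture this code is compiled for.
constexpr unsigned pointer_bits_here() {
    return static_cast<unsigned>(sizeof(void*) * CHAR_BIT);
}
```

Calling address_count(32) gives 4,294,967,296, which is where the "~4 billion" figure comes from. pointer_bits_here() tells you whether your own build is a 32-bit or a 64-bit one.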

When working with C++ and C#, the sizes of the built-in data types fit nicely with the memory word size of whichever architecture you're compiling for (for example, the size of an int in C# is 32 bits). When using things like char or byte, which are only 1 byte in size, compilers will often insert unused "padding" bytes before or after the memory addresses that store those values, as an optimization to align your data more efficiently. When you invoke a computer program, a piece of your operating system known as the loader loads your program's machine instructions and data into a chunk of virtual memory provided by the operating system, and that virtual memory is mapped to physical memory using techniques completely invisible to your program.
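You can see the padding for yourself with sizeof and offsetof. This is a small sketch, not anything from BloogsQuest: the Padded struct and padding_before_value are hypothetical names, and the exact numbers depend on your compiler and target.

```cpp
#include <cassert>
#include <cstddef>

// A 1-byte field followed by an int. The compiler typically inserts unused
// bytes after `flag` so that `value` starts at an address suitable for an int.
struct Padded {
    char flag;
    int value;
};

// Number of padding bytes the compiler placed between `flag` and `value`.
constexpr std::size_t padding_before_value() {
    return offsetof(Padded, value) - sizeof(char);
}
```

On mainstream x86 compilers, where an int is 4 bytes and 4-byte aligned, you'd typically see 3 bytes of padding and a total struct size of 8 bytes rather than 5.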

Why do we care? To answer that, let's use the game we created in the previous section as an example. Our Player is constantly taking damage whenever we hit the space bar (pretend he's fighting monsters or something). But we don't want our Player to die. So one possible hack would be to give the Player a really high health value. That way we could continue to achieve glory through spacebar mashing. We know that our program's machine instructions and data are loaded into memory when we start the program, and remain there until the program exits. If we can find the exact location the Player's health value is stored, we could poke around in the process's memory and modify the Player's health value.

Unfortunately it's not so simple. Let's say we inspect the process's memory while the game is running, and find the Player's health at memory address 0x0008CFA4. If we close the program, fire it up again, and find the Player's health again, it's not going to be at 0x0008CFA4. Why is this the case? The problem is two-fold. First, remember when I mentioned the operating system assigns a chunk of virtual memory for our program to use? The address space our program gets isn't always going to be the same. If you inspect the base address of a process at runtime, you'll notice the memory address changes each time you run the program. The second problem has to do with how memory is allocated on the heap, but before explaining that, we need to take a look at the memory layout of a program at runtime.

We won't be going too far down the rabbit hole here (yeah, it's a deep one). But bear with me while we discuss the image to the left. The virtual memory given to our process by the operating system is broken into a number of segments. You'll see that our virtual address space is 4 GB in size (we're assuming a 32-bit architecture, but the concepts discussed here can be extrapolated to 64-bit architectures). 1 GB is used by the kernel, and 3 GB are for our program. By default on Windows, it's actually split 2 GB / 2 GB unless you turn on the Large Address Aware flag in your program, but that's irrelevant for our discussion.

Starting at the bottom, we have the text segment (also known as the code segment). The text segment is read-only. This is where the machine instructions for your program are stored. Additionally, it stores things like string literals. Next is the data segment. This stores global and static variables that can be modified at runtime. The bss segment (also known as uninitialized data) is where global and static variables that haven't been initialized are stored.
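To make those segments a little more concrete, here's a rough sketch of which kinds of C++ declarations end up where. The variable and function names are made up for illustration, and the exact placement is up to your compiler and linker, so treat the comments as the typical case rather than a guarantee.

```cpp
#include <cassert>

// String literal: read-only data that ships with the program image.
const char* const kGreeting = "Welcome to BloogsQuest";

// Initialized global: typically placed in the data segment.
int g_startingHealth = 100;

// Uninitialized global: typically placed in bss, which is zero-filled
// before main runs.
int g_monstersSlain;

// A static local keeps its value between calls, so it is stored with the
// globals (data/bss) rather than in a stack frame.
int next_monster_id() {
    static int counter = 0;
    return ++counter;
}
```

Because g_monstersSlain is never explicitly initialized, it starts out as 0, and next_monster_id() returns 1, 2, 3, and so on across calls since counter doesn't live on the stack.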

The next two sections are the heap and the stack. The stack and the heap are immensely important in programming, and there are plenty of good resources out there to learn more about them (here's a good place to start). For our purposes, I'll give an enormously over-simplified explanation of how they work.

In C#, we have "value types" and "reference types". Examples of value types are int, bool, long, and structs. Examples of reference types are string and classes. Value types in your code live on the stack, and reference types live on the heap (usually). Each time a method is called in your program, a new "stack frame" is pushed onto the stack. Value types will exist in a predictable location within the stack frame. By contrast, when we create a new reference type variable, new memory is allocated on the heap, but we don't know where on the heap that memory will be. C++ gives you more explicit control over where your variables are stored in memory. Well, actually, C# does have mechanisms that allow you to override the default behavior, but the out-of-the-box C# experience that most developers are likely familiar with can be thought of as an abstraction over conventional best practices of C++ memory management.
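In C++ terms, the stack/heap distinction looks roughly like this. It's a simplified sketch: the function names are hypothetical, and the Player struct mirrors the one from BloogsQuest.

```cpp
#include <cassert>
#include <memory>

struct Player {
    int level;
    int health;
};

// `local` lives in this function's stack frame and is gone once the
// function returns. Only a copy of its health value is handed back.
int health_from_stack() {
    Player local{1, 100};
    return local.health;
}

// `new` allocates the Player on the heap. The object outlives this
// function, and the allocator decides its address at runtime.
std::unique_ptr<Player> make_heap_player() {
    return std::unique_ptr<Player>(new Player{1, 100});
}
```

The heap version is the interesting one for our purposes: the address held by the returned pointer can be different every time the program runs, which is exactly what makes heap objects hard to find again.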

Now, consider our game BloogsQuest we wrote in the previous section. We create a new Game object, which stores a reference to a Player object, which stores the health value we want to find the memory address for. How can we consistently find that memory address when the location of those objects on the heap is different each time we run the game?

Notice the pointer to our Game object is stored as a global variable outside the main function. Recall our discussion above about the anatomy of virtual memory at runtime. Global variables are stored in the data segment, and their memory addresses are essentially hard-coded. If you look at the assembly instructions executed by your CPU at runtime, you'll see the memory addresses for those global variables are hard-coded. If they changed, things would go very wrong. So we can feel confident that global variables like the pointer to our Game object exist in the same place in memory relative to the base address of our process every time we run the program.

But how do we find the memory address of our Game pointer? We could simply use C++'s & (address-of) operator, but that isn't an option for games where we don't have access to the source code (which is almost always the case). Instead, we turn to a memory scanner. The one I'm familiar with using is called Cheat Engine. Cheat Engine searches the program's memory at runtime, allowing us to pinpoint the memory addresses of values we're interested in. In our case, we're going to use it to find the memory address of our Player's health. What we actually want is the memory address of our Game pointer, because that's not going to change between program runs, but the way we're going to find that is by searching for the Player's health, then walking back the pointer chain to get to the Game pointer.

There's one more thing to address before our first Cheat Engine demonstration. We need to know how the fields of a struct are laid out in memory. Consider the Player struct from BloogsQuest:

struct Player
{
    int level;
    int health;
};

Let's say we create a variable of type Player that exists at the memory address 0x0000AA00. That memory address actually points to the first field on the Player object - in this case, level. Then at which memory address does the player's health exist? In C++, the fields of classes and structs are laid out sequentially in memory. So in order to determine the memory address of subsequent fields on a struct, we have to consider the size of the types of those fields. In this case, we're working with ints. The size of an int in C++ depends on the system, but on Windows it's going to be 4 bytes. So if our Player object exists at memory address 0x0000AA00, we'll find the level field at 0x0000AA00, and the health field at 0x0000AA04. Armed with this knowledge, we're ready to fire up Cheat Engine.
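If you do have the source, you can check this layout with offsetof instead of counting bytes by hand. Here's a minimal sketch. field_address is a hypothetical helper, and the 4-byte figure assumes a platform where int is 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Player {
    int level;
    int health;
};

// Address of a field, given the address where the struct starts and the
// field's offset from that start.
constexpr std::uintptr_t field_address(std::uintptr_t struct_address,
                                       std::size_t field_offset) {
    return struct_address + field_offset;
}
```

With the example address from above, field_address(0x0000AA00, offsetof(Player, level)) is 0x0000AA00, and field_address(0x0000AA00, offsetof(Player, health)) is 0x0000AA04 when int is 4 bytes.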

The first thing to do is start BloogsQuest.exe, then start Cheat Engine and attach Cheat Engine to the BloogsQuest process:

Our first goal is to find the memory address of our Player's health. We know that his health starts at 100, and we know that an int is stored as 4 bytes in memory, so we should enter 100 under Value, and we should leave the Value Type set to 4 bytes. Then click First Scan:

You'll notice Cheat Engine found three memory addresses. This isn't uncommon. Essentially, Cheat Engine scans every memory address in our process's memory space and tries to interpret the 4 bytes starting at that address as the value 100. It's very possible that value will exist in more than one location, so we need a way to determine which of these memory addresses actually holds our Player's health. To do that, let's hit spacebar in our game to drop the Player's health, then go back to Cheat Engine, change Value to 1, change Scan Type to Decreased value by..., and click Next Scan.
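To demystify what the scanner is doing, here's a toy version of the same idea that works on a plain byte buffer instead of a live process. This is only a sketch of the concept, not how Cheat Engine is actually implemented. The function names are made up, and it assumes the "before" and "after" snapshots are the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Interpret the 4 bytes starting at `offset` as a 32-bit integer.
std::int32_t read_i32(const std::vector<std::uint8_t>& memory, std::size_t offset) {
    std::int32_t value;
    std::memcpy(&value, memory.data() + offset, sizeof(value));
    return value;
}

// "First Scan": collect every offset whose 4 bytes equal `target`.
std::vector<std::size_t> first_scan(const std::vector<std::uint8_t>& memory,
                                    std::int32_t target) {
    std::vector<std::size_t> hits;
    for (std::size_t offset = 0; offset + sizeof(std::int32_t) <= memory.size(); ++offset) {
        if (read_i32(memory, offset) == target) hits.push_back(offset);
    }
    return hits;
}

// "Next Scan" with "Decreased value by...": keep only the candidates whose
// value dropped by exactly `amount` between the two snapshots.
std::vector<std::size_t> next_scan_decreased_by(const std::vector<std::uint8_t>& before,
                                                const std::vector<std::uint8_t>& after,
                                                const std::vector<std::size_t>& candidates,
                                                std::int32_t amount) {
    std::vector<std::size_t> hits;
    for (std::size_t offset : candidates) {
        if (read_i32(before, offset) - read_i32(after, offset) == amount) {
            hits.push_back(offset);
        }
    }
    return hits;
}
```

The first scan casts a wide net, and each follow-up scan throws away candidates that didn't change the way the real health value did. Repeat until only one address is left.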

And voila! We have the memory address that our Player's health is stored at: 00A2DDC4. Like I mentioned above, what we actually want to do is to walk the pointer chain back until we find the location of our Game pointer. But this is our starting place. The Player's health is on the Player object, and the Game stores a Player pointer that points to the Player object. And knowing how pointers work, we know that the value of the Player pointer on the game object is actually the memory address of the Player object.

But consider our Player struct. We have access to the source code, so we know that the memory address of the health field is actually 4 bytes after the start of the Player struct. We won't always have that knowledge. Thankfully, Cheat Engine gives us some help finding the starting address of the Player object.

Double click the address 00A2DDC4 in the upper left to add it to the table at the bottom, and give it a description if that's helpful. Next, we're going to right click that row in the table and click "Find out what accesses this address". This will prompt you to attach a debugger to your process. Click yes. Now you'll see a list of all opcodes that access the memory address that stores the Player's health. It should be empty because nothing is happening in our game right now. So go back to our game and hit spacebar to damage our Player again.

Now you should see some opcodes in the debugger. Cheat Engine is watching that memory address and making note of any opcodes that access it. Notice that in the assembly instructions there aren't any hard-coded memory addresses. That's because our program has dynamically allocated memory for our Game and Player objects, and those are passed around on the stack or in CPU registers, as opposed to our global Game pointer that exists in the data segment of memory.

Also notice the +04 appended to ecx in all three instructions? Consider the memory layout of the Player struct. The Player's health is offset 4 bytes from the start of the Player object. This looks promising. But which of these three do we care about? The answer is "it depends". In this case, it doesn't matter. But if you click on any of these opcodes, you'll see a little note under the "More information" on the right. You generally want to choose an instruction that shows "copy memory". In this case, choose the first opcode and double click it to open the "Extra info" window.

There are two important pieces of information here that we want to make note of. The first is the 4 byte offset that I mentioned above. The second is where it says "The value of the pointer needed to find this address is probably 00A2DDC0". Again, this is promising. Our Player's health is stored at 00A2DDC4, so it makes sense that the start of the Player object would be stored 4 bytes before that at 00A2DDC0. In hindsight, we could have figured that out ourselves without the help of Cheat Engine, but again, you won't always have access to the source code so you may not know the layout of the struct you're working with.

So, now we know the memory address of the start of the Player object, and we know that the Game object maintains a pointer to that Player object. The value of a pointer is the memory address of the object it points to, so we should be able to use Cheat Engine to search our process's memory again, this time searching for the memory address of our Player object. Close the Extra info window and the opcodes window. Click New Scan, check the Hex checkbox, then change the value to 00A2DDC0 (the memory address of our Player object). Scan type should stay as Exact Value, and Value Type should be 4 bytes (pointers are 4 bytes in a 32-bit process). Then click First Scan.

Cheat Engine found two results. Again, this is not uncommon. This part might require some trial and error, but the second hit is at address 00A2C1B0 which is a lot closer to the address of our Player object, so that's a good place to start. Double click that to add it to the table below and optionally give it a description. Now we're going to go through the same process as before.

Right click that entry in the table below, and click "Find out what accesses this address". Go back to BloogsQuest and hit spacebar to hurt the Player again. You should see 3 opcodes appear. Double click one of them to pull up the Extra info window. Notice there's no offset - this makes sense because the pointer to our Player object is the first field on the Game struct. Also notice that the suggested address to scan is 00A2C1B0, the same address where we found the value of our Player's memory address, which all makes sense when you think about the structure of our Player and Game structs.

Now we know where our Player is stored in memory, and we know where our Game is stored in memory. But we need to go one step further. Our game has a global variable that is a pointer to our Game object. Finding that is our goal. So once more, let's search our process's memory for the memory address 00A2C1B0, which is the memory address where the Game object lives.

You should get two hits. Notice that one of them is green. Cheat Engine has identified this as a static pointer. The memory address of this pointer, relative to the base address of your process, will not change between runs of your program. This allows us to consistently find the memory address of our Player's health by following the pointer chain. To find the correct offset from the base address of your process, double click that green address 00F6C2D0 to add it to the table at the bottom, then double click the entry at the bottom. The UI is sorta weird, but if you scroll around in that textbox you should see BloogsQuest.exe+1C2D0. 1C2D0 is the offset from the base address of our process that will allow us to consistently find our Game pointer:

You can think about it like this. Dereferencing the static pointer found at BaseProcessAddress+1C2D0 will give us the memory address of the Game object. Dereferencing that memory address will give us the memory address of the Player object. Dereferencing THAT memory address + 4 (we have to take into account the offset because health is the second field on the Player struct) will give us the Player's health, every single time. We can confirm this by adding a pointer chain in Cheat Engine. To do so, click "Add Address Manually" in the bottom right, make it look like this, and click Okay:
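Here's the same chain walk written out in C++. To keep it runnable anywhere, this sketch walks a chain inside its own process rather than reading BloogsQuest's memory from the outside, so the global g_game stands in for the static pointer at BloogsQuest.exe+1C2D0. The read_pointer, read_int, and read_health_via_chain helpers are hypothetical names. The Game struct is my assumption based on the description above (a Player pointer as its first field), and the code assumes a pointer's bytes can be read as a plain integer address, which holds on mainstream platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Player {
    int level;
    int health;
};

struct Game {
    Player* player;  // first field, so it sits at offset 0
};

// Global pointer, standing in for the static pointer Cheat Engine found.
Game* g_game = nullptr;

static_assert(sizeof(std::uintptr_t) == sizeof(void*),
              "this sketch assumes pointers and uintptr_t have the same size");

// Read the pointer-sized value stored at `address`.
std::uintptr_t read_pointer(std::uintptr_t address) {
    std::uintptr_t value;
    std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof(value));
    return value;
}

// Read the int stored at `address`.
int read_int(std::uintptr_t address) {
    int value;
    std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof(value));
    return value;
}

// Walk the chain: static pointer -> Game object -> Player object -> health.
int read_health_via_chain(std::uintptr_t static_pointer_address) {
    std::uintptr_t game_address = read_pointer(static_pointer_address);
    std::uintptr_t player_address = read_pointer(game_address + offsetof(Game, player));
    return read_int(player_address + offsetof(Player, health));
}
```

To try it, create a Player and a Game, point g_game at the Game, and call read_health_via_chain(reinterpret_cast<std::uintptr_t>(&g_game)). The only address you need to know up front is the address of the global. Everything else is discovered by following pointers, which is why the chain survives the heap objects moving around between runs.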

We can confirm it's working by noticing the value is 97, which is the same as our Player's health. Now, to demonstrate that this will stay consistent even after reopening the program, close BloogsQuest.exe, reopen it, and attach Cheat Engine to the new process. When prompted to keep the current address list / code list, click Yes. Check out the table at the bottom. Our pointer chain still points to the correct value (100 in this case, which is the Player's health after restarting the game), even though the memory address of our Game pointer is something completely different (000DC2D0 this time). This is excellent, because we can now use the static pointer at memory address BaseAddress+1C2D0 to get to the Player's health every single time.
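The arithmetic behind "same offset, different base" can be sketched like this. The two base addresses used in the note below are not shown anywhere in Cheat Engine's output above. I'm inferring them by subtracting the 1C2D0 offset from the two absolute addresses, on the assumption that 000DC2D0 is where the static pointer landed on the second run (it ends in the same C2D0 as 00F6C2D0). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Offset of the static Game pointer from the module base, as reported by
// Cheat Engine (BloogsQuest.exe+1C2D0).
constexpr std::uintptr_t kGamePointerOffset = 0x1C2D0;

// Where the static pointer lives for a given module base address.
constexpr std::uintptr_t absolute_address(std::uintptr_t module_base,
                                          std::uintptr_t offset) {
    return module_base + offset;
}

// Work backwards from an absolute address to the module base.
constexpr std::uintptr_t module_base_from(std::uintptr_t absolute,
                                          std::uintptr_t offset) {
    return absolute - offset;
}
```

Under that assumption, the first run's base works out to 00F50000 and the second run's to 000C0000. The base moved, but adding 1C2D0 to whatever the base happens to be lands on the Game pointer both times.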

You can see how knowledge about the data structures within the program you're working with is helpful when going through this process. Like I mentioned earlier, you will rarely have access to the source code of the game you're working with. However, there are tools that can help you. The one I'm familiar with using is called IDA. IDA is a disassembler, which means you can feed it a binary and it'll convert it into assembly language. Going one step further, IDA has a plugin called Hex-Rays that is a decompiler. It can take assembly instructions and create a very close approximation to the C code that would generate such assembly.

Obviously it will be missing any information that gets stripped out by the compiler, like variable names. It also has trouble disambiguating things like data types that have the same size. But there's still plenty of useful information it can show you. Remember the diagram showing the anatomy of a program in memory? String literals are stored in the text section, so those can be viewed in IDA. You can also view a call tree that shows a general structure of how the main function calls into other functions throughout the program. This makes it possible to reverse engineer entire applications, including video games.

I've only putzed around with IDA, and it's a beast. Trying to reverse engineer a program as complicated as the WoW client seems like an insurmountable task. Being familiar with C/C++ programming conventions certainly helps. But wizards on the internet have torn the WoW client apart and put it back together again, so the memory offsets for important pieces of game data are all very well documented. There's no point in reinventing the wheel, so it definitely makes more sense to build on the work of others. However, I wanted to go through the process of finding a consistent way to access the Player's health because it's a critical reverse engineering skill. Armed with Cheat Engine and IDA, there's all sorts of interesting stuff you can find.

Another thing to keep in mind is that our strategy relied on the Game pointer being defined as a global variable outside the main function. This won't always be the case, so sometimes things will be a little more complicated. For example, if the Game pointer was defined as a local variable in the main function, it would be created on the stack as opposed to the data segment, which is a bit harder to find. There's a series of symbols called THREADSTACK that starts at 0 and goes up for each thread running in your process. It's possible to get a reference to the memory address of THREADSTACK0, and you can then look around in memory near that address to find interesting variables defined near the top of the main function. You'll have to be flexible with your strategy, but you can usually find a way to locate the information you want in memory.

So, we can now consistently find the Player's health in memory. What can we do with that information? We'll explore that in the next section.