Disassembling, Decompiling and Modifying executables

Motivation for writing

As professional developers, we create products. We implement ideas, usually driven by a business that wants acceptance from its target group in the global market. We try to deliver elegant, fast and reliable solutions and, quite honestly, we hate it when someone uses our work without at least saying "thanks, you've really made a great thing". That is why we need to protect our work. And in order to do that, we should be aware of the common vectors crackers use to attack our software.

In this article, I'm gonna show you how to disassemble and decompile a native executable written in C++, among other interesting things related to managed and unmanaged environments.

First, we’ll need a little bit of theory so we can really understand what we are doing and why.

Difference between static and dynamic libraries

Historically, static libraries were the first type of library to appear. On Windows, static libraries use the .lib extension and dynamic libraries use .dll. The main difference between the two is that the code of a static library is embedded directly in the executable at link time, which increases its size. A dynamic library, on the other hand, is a separate file that is loaded into the address space of every process that uses it. There is one dll on disk, but each process works with its own image of it, and this way inter-process concurrency issues are avoided. This also enables more manageable updates, but implies a slight performance degradation, which is not considered a big issue.

In general, dynamic libraries are the more common approach for building applications, and managed (.Net) library projects in Visual Studio only produce dynamic libraries. For native C++ it is still possible to create statically linked libraries, either from a static library project or from the command line.

The CPU registers

The CPU registers are the fastest memory available, located in the CPU itself. They are used for practically every low-level operation; they are the super-fast data storage of the processor. The x86 architecture has eight 32-bit general-purpose registers, two of which (EBP and ESP) hold the base pointer and the stack pointer used to navigate the stack. A separate instruction pointer (EIP) holds the address of the next instruction to execute. The registers are even faster than static RAM (SRAM, used for the CPU caches) and, of course, dynamic RAM (DRAM, the main memory).

Quick overview of the Assembly language

For this article we need to know a few basic things about the assembly language so we can actually understand what we are doing. The assembly language is unstructured and is based on very primitive instructions, which are divided into the following general types (I’ll describe only the basic operations):

Data movement instructions

mov – copies data between registers, or between a register and a cell in memory
push/pop – operate on the stack, which is kept in memory

Arithmetic instructions

add/sub – add or subtract two operands
inc/dec – increment or decrement an operand by one

Control flow instructions

jmp – jump to a label or a cell in memory
jb – jump if below (unsigned less than)
je – jump when equal
jne – jump when not equal
jz – jump when the last result was zero
jg – jump when greater than
jge – jump when greater than or equal to
jl – jump when less than
jle – jump when less than or equal to
cmp – compare the values of the two specified operands
call/ret – these two implement the routine call and return

The Control flow instructions are what we are most interested in here. For a complete tutorial on the x86 assembly language, check this article.
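To make the interplay between cmp and the conditional jumps more concrete, here is a toy interpreter. It is only a sketch and not real x86: it has two registers, a single zero flag that only cmp touches, and a handful of operations. All names in it (Op, Instr, Cpu, run, counting_loop) are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// A toy model, not real x86: two registers, one zero flag, a few operations.
enum class Op { MovImm, Inc, CmpImm, Jne, Jmp, Halt };

struct Instr {
    Op op;
    int reg;    // register index (0 plays "eax", 1 plays "ecx"); unused by jumps
    int value;  // immediate value, or jump target (index into the program)
};

struct Cpu {
    int regs[2] = {0, 0};
    bool zero_flag = false;  // set by cmp when both operands are equal
    std::size_t ip = 0;      // instruction pointer: index of the current instruction
};

// Runs the program until Halt (or the end) and returns the final CPU state.
Cpu run(const std::vector<Instr>& program) {
    Cpu cpu;
    while (cpu.ip < program.size()) {
        const Instr& in = program[cpu.ip];
        std::size_t next = cpu.ip + 1;  // default: fall through to the next instruction
        switch (in.op) {
            case Op::MovImm: cpu.regs[in.reg] = in.value; break;
            case Op::Inc:    ++cpu.regs[in.reg]; break;
            case Op::CmpImm: cpu.zero_flag = (cpu.regs[in.reg] == in.value); break;
            case Op::Jne:    if (!cpu.zero_flag) next = static_cast<std::size_t>(in.value); break;
            case Op::Jmp:    next = static_cast<std::size_t>(in.value); break;
            case Op::Halt:   return cpu;
        }
        cpu.ip = next;
    }
    return cpu;
}

// A loop that counts "eax" from 0 up to 3:
//   0: mov eax, 0
//   1: inc eax
//   2: cmp eax, 3
//   3: jne 1
//   4: halt
std::vector<Instr> counting_loop() {
    return {
        {Op::MovImm, 0, 0},
        {Op::Inc,    0, 0},
        {Op::CmpImm, 0, 3},
        {Op::Jne,    0, 1},
        {Op::Halt,   0, 0},
    };
}
```

Calling run(counting_loop()) leaves 3 in the first register. The point to notice is that the jump never looks at the registers itself: cmp records the outcome in a flag, and jne only reads that flag. Changing a jump's target or condition therefore changes the program's behaviour without touching the comparison.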

Disassembling and modifying a C++ executable

For our example I’ve created a simple C++ application with basic I/O.

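    #include "stdafx.h"   // Visual Studio precompiled header

    #include <iostream>
    #include <sstream>
    #include <string>

    using namespace std;

    // Keeps asking for the code until the correct one is entered.
    void execute()
    {
        string numbers;
        int hold;

        for (;;)
        {
            cout << "Please enter the code: \n";
            getline(cin, numbers);

            if (numbers != "82634")
            {
                cout << "\nTry again.\n";
            }
            else
            {
                cout << "Code accepted";
                break;
            }
        }

        cin >> hold;   // wait for input so the console window stays open
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        execute();
        return 0;
    }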

We’ll need to disassemble, debug and optionally decompile our example. Download the following tools that will help us do that: OllyDbg and IDA.

I’ve compiled this example, which you can download from here. When we start it, we see a simple console application.

It asks for some predefined input. If the wrong code is entered, the following output is presented:

“Try again”

Let’s pretend that we don’t have the source code and we don’t know the code. So what can we do? Obviously, we have a loop here with some check inside which determines if the program should break from the loop or not.

We also have a few strings:

“Please enter the code:”
“Try again”

Debug the executable

Start the OllyDbg debugger (with administrator privileges) and open the exe.

What we see in the upper-left window is the disassembled machine code. In other words, you see the instructions written in assembly language. Below that we see the window with the binary code presented as hexadecimal values, and on the right we see the window with the CPU registers.

Locate the loop conditions

So now that our exe is loaded, started, and the debugger is attached, we have to find the exact place in the assembly code where the check is made. To do that we can use the strings that the UI shows us. Right-click on the assembly code view > Search For > All Referenced Strings. Find the “Try again” string and double-click it. The assembly view will jump to the exact instruction which prints that string on the console. We can also see the instructions related to “Code accepted” a few rows below. It is clear where the loop resides.
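The idea of hunting for human-readable strings inside a binary can be sketched in a few lines. Note that this is a simplification and an assumption on my side: OllyDbg lists strings that the code actually references, while the sketch below just scans raw bytes for runs of printable ASCII characters. The names extract_strings and sample_image are hypothetical, and sample_image only fakes a tiny "image" instead of reading a real exe.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns every run of at least min_len printable ASCII characters in data.
std::vector<std::string> extract_strings(const std::vector<unsigned char>& data,
                                         std::size_t min_len = 4) {
    std::vector<std::string> found;
    std::string current;
    for (unsigned char byte : data) {
        const bool printable = (byte >= 0x20 && byte <= 0x7E);
        if (printable) {
            current.push_back(static_cast<char>(byte));
            continue;
        }
        // A non-printable byte ends the current run.
        if (current.size() >= min_len) found.push_back(current);
        current.clear();
    }
    if (current.size() >= min_len) found.push_back(current);
    return found;
}

// Hypothetical stand-in for an executable: two strings surrounded by
// non-printable bytes, roughly how string constants sit in a data section.
std::vector<unsigned char> sample_image() {
    std::vector<unsigned char> image = {0x00, 0x8B, 0xEC, 0x00};
    const std::string first = "Try again.";
    const std::string second = "Code accepted";
    image.insert(image.end(), first.begin(), first.end());
    image.push_back(0x00);
    image.push_back(0x90);
    image.insert(image.end(), second.begin(), second.end());
    image.push_back(0x00);
    return image;
}
```

Running extract_strings(sample_image()) returns the two messages in the order they appear in the buffer. In a real executable the list would be much longer, which is why searching for a string you have already seen in the UI is such a convenient starting point.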

Modify the assembly instructions

The next step is to modify some assembly instructions. We see a lot of instructions, but we are most interested in the jump instructions, because they control where execution continues (the instruction pointer). If we scroll up a little, we can see the instruction that references the “Please enter the code…” string. In order to escape the loop, we need to change the target address of one of the jump instructions that execution passes through.

Let’s take the jb at “00D613A4”, double-click it and change the target memory address to “00D613C7”, the instruction just before the “Code accepted” ASCII text, which evidently starts writing to the output stream.

In order to save it, right-click on the assembly window and press “Copy to executable” -> “Selection” while you’re on the modified row.
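Under the hood, changing the target of a jump means rewriting its operand bytes. The following is a minimal sketch of that arithmetic, assuming the jb is encoded in its two-byte short form (opcode 0x72 followed by a signed 8-bit displacement). Whether the jb in our exe really uses the short form is an assumption; the near form (0F 82 followed by a 32-bit displacement) works the same way with a longer instruction. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Computes the rel8 operand for a two-byte short jump located at instr_addr
// so that it lands on target_addr. The displacement is measured from the
// address of the NEXT instruction, which is instr_addr + 2.
std::int8_t short_jump_displacement(std::uint32_t instr_addr, std::uint32_t target_addr) {
    const std::int64_t delta = static_cast<std::int64_t>(target_addr) -
                               (static_cast<std::int64_t>(instr_addr) + 2);
    if (delta < -128 || delta > 127) {
        throw std::out_of_range("target is too far for a short jump");
    }
    return static_cast<std::int8_t>(delta);
}

// Rewrites the operand byte of the short jump stored at code[offset].
// Assumes code[offset] is the opcode and code[offset + 1] is the rel8 operand.
void retarget_short_jump(std::vector<std::uint8_t>& code, std::size_t offset,
                         std::uint32_t instr_addr, std::uint32_t target_addr) {
    if (offset + 1 >= code.size()) {
        throw std::out_of_range("jump does not fit in the buffer");
    }
    code[offset + 1] =
        static_cast<std::uint8_t>(short_jump_displacement(instr_addr, target_addr));
}
```

With the addresses from the example, short_jump_displacement(0x00D613A4, 0x00D613C7) yields 0x21, so under the short-form assumption the patched instruction would be the bytes 72 21. This is the kind of change the debugger writes into the file when you copy the modification to the executable.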

An alternative to OllyDbg. What is IDA?

IDA is a debugger and a disassembler like OllyDbg. But it provides a more user-friendly view of the assembly code, and it can also act as a decompiler. Its assembly view is more structured: the various jumps are visualized as graph nodes, which makes navigation easier.

So, can we decompile a native image into understandable source code? It depends on your idea of "understandable". You have to devote a lot of time and you need to possess serious knowledge of the APIs your operating system uses, along with an understanding of C and assembly syntax.

Decompiling applications written in managed environments

Decompiling .Net apps is also done with dedicated debuggers and decompilers, such as Reflector (which has been a paid product for some time now).

But the exe or dll you see on your desktop contains intermediate code, not native machine code (assuming you do not use NGen). Decompiling C++ apps is hard because the compiler first produces assembly code targeted at the specific processor architecture, and then the assembler turns that code into the actual native image. And as we saw, decompiling assembly code is hard.

MSIL, on the other hand, is very close to the actual source code of your app, e.g. written in C#. You can use programs like Reflector to decompile it, along with some plugins to actually modify it.
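One way to tell whether an exe or dll carries MSIL is to look at its PE header: managed images declare a CLR runtime header in data directory entry 14. Below is a minimal sketch of that check over an in-memory copy of the file. It is a heuristic and not a full PE parser; the function name looks_like_dotnet_assembly is hypothetical, and images that mix native and managed code will also report true.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace {

// Little-endian readers; they return false when the read would leave the buffer.
bool read_u16(const std::vector<std::uint8_t>& b, std::size_t off, std::uint16_t& out) {
    if (off > b.size() || b.size() - off < 2) return false;
    out = static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
    return true;
}

bool read_u32(const std::vector<std::uint8_t>& b, std::size_t off, std::uint32_t& out) {
    if (off > b.size() || b.size() - off < 4) return false;
    out = static_cast<std::uint32_t>(b[off]) |
          (static_cast<std::uint32_t>(b[off + 1]) << 8) |
          (static_cast<std::uint32_t>(b[off + 2]) << 16) |
          (static_cast<std::uint32_t>(b[off + 3]) << 24);
    return true;
}

}  // namespace

// Returns true when the PE image declares a CLR runtime header.
bool looks_like_dotnet_assembly(const std::vector<std::uint8_t>& image) {
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') return false;

    std::uint32_t pe_offset = 0;  // e_lfanew: file offset of the PE signature
    if (!read_u32(image, 0x3C, pe_offset)) return false;

    std::uint32_t signature = 0;  // must be "PE\0\0"
    if (!read_u32(image, pe_offset, signature) || signature != 0x00004550) return false;

    // The optional header follows the 4-byte signature and the 20-byte COFF header.
    const std::size_t optional_header = static_cast<std::size_t>(pe_offset) + 4 + 20;
    std::uint16_t magic = 0;
    if (!read_u16(image, optional_header, magic)) return false;

    std::size_t count_off = 0;  // offset of NumberOfRvaAndSizes
    std::size_t dirs_off = 0;   // offset of the first data directory
    if (magic == 0x10B) { count_off = 92; dirs_off = 96; }          // PE32
    else if (magic == 0x20B) { count_off = 108; dirs_off = 112; }   // PE32+
    else return false;

    std::uint32_t dir_count = 0;
    if (!read_u32(image, optional_header + count_off, dir_count) || dir_count <= 14) {
        return false;
    }

    // Each data directory is 8 bytes (RVA + size); entry 14 is the CLR header.
    std::uint32_t clr_rva = 0;
    if (!read_u32(image, optional_header + dirs_off + 14 * 8, clr_rva)) return false;
    return clr_rva != 0;
}
```

To use it, read the whole file into a std::vector of bytes and pass it in. If the function returns true, a .Net decompiler is the right tool for the job; if it returns false, you are most likely looking at a native image and need a disassembler instead.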

So it is actually not so hard to crack an application

No, it’s not. The only difference is that with a real application the process will be more time-consuming. Do you know a single popular stand-alone application that has not been cracked? That is why you need to think of better ways of protecting your software. Understand one simple thing:

Every application can be cracked, if you have access to its native image, just like every computer password can be broken, if you have physical access to the machine.

Of course, there are techniques that allow us to slow an attacker down, which might or might not be enough. But "slowing" doesn't mean "preventing", and that's a topic for another article.

That's all from me on the topic of decompilation. I hope you learned something new today and, hopefully, this knowledge will help you to better protect your software. Know your enemy before going into battle. Because it's the battle for your own time.

Hi there ! My name is Kosta Hristov and I currently live in London, England. I've been working as a software engineer for the past 6 years on different mobile, desktop and web IT projects. I started this blog almost one year ago with the idea of helping developers from all around the world in their day to day programming tasks, sharing knowledge on various topics. If you find my articles interesting and you want to know more about me, feel free to contact me via the social links below. ;)

Depends on the compilation unit. If it's Java/.Net it will most probably be bytecode/MSIL so it'll be quite easy to decompile, unless obfuscated. You can use tools like Reflector for .Net. If it is C/C++, OllyDbg and IDA are among the best, although you don't have a lot of choices here.

The methods I've described in this tutorial are a little bit low-level for your task. I guess you're talking about PicturesToExe deluxe ?
In that case, you've probably tried to use the same program to extract the photos. You got two options:
1. Download this tool and try it on the executable with the pictures: http://www.picturestoexe.com/forums/index.php?app=core&module=attach&section=attach&attach_id=3655
2. Simply open the slideshow in fullscreen and take a screenshot of the screen. Then use Paint to paste it. You might have some quality loss but it depends on the picture itself.

Hello kosta, i want to know about a way to decompress an exe file… I have one and no software works on it..it is kind of combination of many compressing softwares like upx, windows generic…and something like netopsystems fead package on which there is no info that i have .. So kindly if u can help it would be great….thanks

Well…actually i want to do all the things but to bypass i need to unpack it(as i told its multipacked) so that i can know which part of source code takes the password for validation and nop’s it(as it is wrong, obviously)….. Thanx for your attention..

Feel free to take the source from the page, I don’t keep the project anymore. ;)

@Gnamu

Effectively decompiling an executable and extracting readable code is not an easy task and requires substantial amount of time and expertise related to the operating system, platform and language used during the development.

Disassembling is easier, but it’s limited enough. The example I’ve provided is very simple and forms the basics used when cracking a game, for example. But in a real situation this would not be that easy.

If you really want to decompile your exe, I would advise you to hire specialized professionals to do that.

Hi! Thanks for your tutorial! I have a couple of simple questions:
How do I exactly know if the application I am using is a .Net application (given you don’t have any info from the developer or application website) ?

Can I still try to decompile a .Net application with Ollydbg? What important parts of the code will I not be able to see? I’m asking because I’ve seen many Ollydbg tutorials not making any distinction on whether the application is a .Net application or not.

Well, the simplest way I can think of is to try opening it with a .Net decompiler like Reflector. Or ILDASM since it’s free and you already have it installed. That way you’ll know if it’s an assembly.

Regarding OllyDbg, no you can’t. And you don’t need to, OllyDbg is a disassembler, not a decompiler. If you open a .Net assembly with OllyDbg you’ll see absolutely the same as you would have seen with a native C++ application – assembly code.

Wait, I’ve probably missed something here. If decompiling means getting back to something close to the original source code, and disassemble means just looking at the assembly code, then in this case I am picturing I am not interested in decompiling but just disassembling.

So to this purpose, and to one of trying to alter a .Net application behaviour like you show for non .Net apps, is Ollydbg still the right tool?

To this purpose does it make a difference to use Ollydbg or say Reflector?
Hope this clarify my questions! Thanks

Yes decompiling usually means getting back to the original code. Disassembling means getting the assembly instructions.

“you’ll see absolutely the same as you would have seen with a native C++ application – assembly code.”

This was actually not entirely correct.

The .Net exe (or .Net assembly) does not contain assembly code. It contains something called MSIL (Microsoft Intermediate Language), the equivalent of the Bytecode in Java. So when you look at a .Net assembly you see a .Net compliant language (like C# and Visual Basic) translated into MSIL. When this assembly gets executed, the JIT (Just in time) compiler creates the actual assembler code and the binary instructions for the processor to execute. But this happens runtime.

However, you can still see assembly code if the .Net assembly is compiled using something called Native Image Generator (NGen). Then you skip the intermediate language step. But most of the .net assemblies are actually MSIL (the new name for which is Common Intermediate Language (CIL), but I prefer the old one).

Therefore you can’t “disassemble” a .Net exe with OllyDbg, because OllyDbg is a disassembler and the .Net assembly doesn’t contain assembler code but rather MSIL.

…I knew I’d have a second thought on this.. or I should rather call it a bigger doubt?

You say:

“When this assembly gets executed, the JIT (Just in time) compiler creates the actual assembler code and the binary instructions for the processor to execute. But this happens runtime.”

And the question is.. doesn’t Ollydbg operates at application runtime?
I’ve actually attached Ollydbg to a running .Net application, and of course I could see assembly code for it… and seeing the app calling modules, performing jumps and so on.., so isn’t it wrong to say that Ollydbg can’t be used to see and work with a .Net application assembly code?

Perhaps I should do more homework and read more, which I am trying to do.. but.. if you had a quick answer on this… I’d appreciate it again! (don’t have to be a long answer, just tell me “no you are wrong” (if you are 100% sure ;) ) I’ll figure out the details by myself then.
Thanks

You can see it, but you can’t work with it. Think about it. In a C++ application, the physical executable file can easily be read because OllyDbg can understand the binary instructions. You can modify the instructions and save an executable. That’s how cracks work.

In .Net, you have a managed environment. A virtual machine in between. So yes, OllyDbg works at runtime. But the assembler code for your .Net executable is digestible only at runtime, as opposed to a C++ exe, for instance.

Every time you start a .Net exe, you might get a different piece of assembler instructions. Or at least partially. And at least in theory.

So I said you can’t debug it, because you can’t actually do anything useful with it. If you modify the assembler instructions you see in OllyDbg, and even save them, they are of no use, because you need the virtual environment in order to run the code. And pure assembler instructions don’t give you much information.

Again, it depends what you want. If you simply want to “see” the produced assembler code from the CLR, that’s ok. But you don’t know if the virtual machine will produce the same code next time and you can’t modify anything. But if that’s what you want, then yes, you can use OllyDbg for that.

I found the issue, but to clarify I am using the native c++, specifically win32 console application in Visual Studio 2013 desktop edition. Both debug and release give me the software break exception, but I’m using release now.

I tried a normal crackme that I know has been run through IDA 5.0 before and it also received a breakpoint exception, so it seems to be normal.

I thought that because I was receiving this exception that it would not stop properly at the beginning of the module entry point but apparently that was a mistaken idea of mine. I needed to place a breakpoint at the interesting point of code, otherwise IDA will just plow through the program without stopping. Does this sound right to you?

Hello,
Thank you for the article. I have a few questions about the C code that I do not understand.
1) What do lines 36 – 42 do? It seems like a long if statement to compare 2 variables.
2) Why does line 53 have 2 conditions, one of them redundant? Is this in the assembly and if so, why would the compiler write both those conditions?

My company has an exe program that runs from the DOS prompt. The source code is long ago lost. Nobody knows what development language was. All this indicates pre-1993 or older. What could I use to extract the logic and formulas? How can I discern the platform? It is a small engineering program.

The company I am working with has an engineering program that needs to be modernized. It runs from the DOS prompt and is pre 1992. I need to decompile to figure out some formulas. The code is lost and there is nobody around from the build days. Is there a decompiler anyone can recommend? A methodology? Thank you.

@RMK
You are in a tough situation here. Believe me, depending on the length of the application it can take you weeks or even months. Consider whether it would take less time to find the original developer, to redevelop the algorithm, or to pay someone else to do it.

You will find that in all source coding is for life users mostly anything new now will be obsolete before you understand it . But this article did pass me good to see thanks for the simple lesson kudos for your knowledge . Not every one gets what you just posted just cause they will never try.

Thank you sir for sharing your knowledge with us. Your work is really amazing and it really helped me in a lot of ways.I wanted to learn about basics of processors and assembly programs ,and this is really helpful for me to start.Thank you sir…

I have a question and maybe you can answer it. We host a game online with a game client and it shows the launcher exe is infected with a virus in windows defender. We are positive there is no issues with the client but we cant extract the files from the .exe to find the flag. Any idea on how to do this. We want this game to grow but we cant do anything if windows defender scares our intended audience away.

IL / .net code is NOT assembly. it may look like assembly, it may act like assembly, but it object orientated in nature (assembly is not). It also exists only as higher level functions of your operating system and only as objects, no flats. And only with C++ and CSHARP. it is by no way means shape or form assembly and is not at all related to the question outside of the fact that it could share a namespace, tottally different. But most people think javascript is jquery and not the other way around so who am I to argue