The information provided in this tutorial must not be used for Reverse
Engineering any application.

THE TEXT HAS BEEN WRITTEN IN SUCH A WAY THAT THE READER CAN LEARN, AND NOT JUST
GAIN INFORMATION WITHOUT KNOWING HOW STUFF WORKS.

If the Reader still chooses to break Protection Mechanisms after reading this
tutorial, he/she alone shall be responsible for the damage caused, and not the
Author.

If you wish to post certain Sections of the Tutorial on a Website, you are free
to do so provided you inform the author and publish the Selected Text from the
Tutorial as it is without modification.

The Author has not copied text or any other information directly from a Source.
However, some information from some sources has been used to write this tutorial.
These Sources have been mentioned in the References Section.

You are permitted to continue reading the tutorial only if you agree to the text
given above.

II. READING THIS TUTORIAL [RTUT]

Each Section in this Tutorial has a specific Topic Code enclosed in square
brackets. This arrangement has been made so that you can jump to a specific
topic simply by searching for the topic code from your Browser.

At many places in the tutorial, I've explained a few things which are almost
unnecessary to know when dealing with Visual BASIC programs, but I've written
them for those who are interested in Hacking. (And I don't mean that
'hack-an-email-address' sort of a kid. The original meaning of Hacking has
been ruined by pathetic people like them.) Hacking in literal terms stands for
'curiosity'.
The original meaning of a Hacker is:
"A person who enjoys exploring the details of programmable systems, as opposed
to most users, who prefer to learn only the minimum necessary."

The extra details I've given in this tutorial are for those who want to be such
ethical hackers.

The Topic Codes [XTRA] and [/XTRA] mark the start and end of "extra-information"
sections, and you are free to skip any text between them.
You can search for the Extra Information using the Topic Code.

III. INTRODUCTION [ITRO]

As long as you know Assembly Language, it is easy to read disassembled listings
of executable files written in C/C++ or PASCAL, especially if you are using
IDA Pro as your disassembler. This is because C and C++ Compilers generate
(or at least try to generate) efficient code. Some Compilers like Borland C++ use
simple instructions for complex operations (though this is not always the
case), which makes their output easier to study. Implementations of Code
Constructs such as loops, IF statements, Ternary IF statements, switch constructs
etc. can be found very easily as each one is unique and distinct.

However, the same is not true for Applications written in Visual BASIC. VB
Programs are said to be very slow and hence deliver poor performance. There is a
reason for this. Visual BASIC programs, unlike those written in other languages,
don't use the Windows API directly. Instead they call functions in the VB Runtime
Files, which in turn call functions from the Windows API. Most of the Visual BASIC
functions are present in MSVBVM60.DLL (if you've got Runtime Files ver. 6.0).
So to study VB programs, we must disassemble and analyze the MSVBVM60.DLL file as
well.
Since VB programs use such an indirect API Function call procedure, programs tend
to run slower. (There are other reasons why VB programs run slower, but I
won't be covering them as they are off-topic.)

It becomes difficult to analyze VB Programs because they use functions which are
not part of the Windows API, and hence we are not acquainted with them.

My primary aim in this Tutorial is to teach the reader how to understand
disassembled listings of programs written in Visual BASIC.

My secondary aim is to help you realise why Visual BASIC is not suitable for
writing small, fast and efficient programs.

Almost all authors of Visual BASIC books mention that Visual BASIC does not
give you applications with good performance.
This tutorial tells you why.

The Tutorial will talk about executable files compiled in Visual BASIC in
Native Code ONLY and not p-code.

After reading this tutorial, you should be able to disassemble, debug and
understand Visual Basic Applications. You may also be able to reverse engineer
Protection Mechanisms written in Visual BASIC and that's where the next section
comes in.

IV. ASSUMPTIONS [ASPT]

You are required to have a basic understanding of Visual BASIC, C, the Windows API
and 80x86 Microprocessor Assembly Language.
It would be advisable to have a copy of Intel's 80x86 Instruction Set Manual.
Intel provides this manual free of charge. If you need this manual, contact me.

This Instruction Set Reference is Volume 2 of Intel Architecture Software
Developer's Manual.

The Software Developer's Manual consists of 3 volumes:
: Basic Architecture - Order Number 243190
: Instruction Set Reference - Order Number 243191
: System Programming Guide - Order Number 243192

You can provide these Order Numbers to get a copy of these manuals. For this
tutorial only Volume 2 is required.

VBDE is not required but it's always better to have it as it gives addresses of
entry-points of most VB procedures.

VBReformer is used to see the Property values of all objects in a Visual Basic
Form. It even allows you to change the value of Object Properties such as Forms,
Command Buttons etc. Import Libraries can also be seen. This Application is
not required for this Tutorial but it's better to have it.

NuMega SmartCheck is again not required but it is useful when we have no idea
what a particular procedure of VB does. You can run a program from it like a
Debugger and view its log files and find out which procedure is called and what
operations are carried out etc.

You can have API Documentation from MSDN or you can use the API Text Viewer Tool
supplied with Visual Studio or browse MSDN Online (msdn.microsoft.com). Certain
Applications like APIViewer will also do.

I have given the names of the Tools that I have used, but you are free to use
any disassembler and debugger you are comfortable with. Still, I advise you to
use the tools that I have used above. SoftIce is better than OllyDbg, but the
latter is good enough for VB Programs, so it doesn't matter which one you use.

Once you have the necessary knowledge and tools, you can proceed further.
Let's begin.

VI. STRUCTURE OF A VB PROGRAM [SVBP]

When you open a VB Program in IDA, you'll end up with the following code.

This doesn't make any sense, does it? If you keep scrolling further, you will see
sections of code and data. Each Section has a meaning in VB Programs, and you can
get a general idea of a Visual BASIC program's Section Map below.

00401000:
... IAT (First Thunk of APIs)

Next Section(NS):
... some data

NS:
... transfer area (Jumps to imported functions)

NS:
... lots of data

NS:
... local transfer area (for internal event handlers)

NS:
... other data

NS:
... code

NS:
... lots of data

NS:
... .data Section

Let us now start analysis from the entry point of the program.

push offset RT_Struct
call ThunRTMain

Its C equivalent would have been:
ThunRTMain(&RT_Struct);

A function ThunRTMain is called which accepts one parameter. We'll soon find out
that the parameter is a structure.
Simply stepping over the CALL instruction runs the entire Application.
Weird, isn't it?
For Pascal, C and C++ Programs there is always a start() function that gets the
Command Line Parameters, process and thread information, Module Handles etc. We
didn't see anything of the sort in a Visual BASIC Program.

But actually, VB does have a start function. Its code is placed in the
ThunRTMain Function. Let's verify that by disassembling MSVBVM60.DLL and
viewing the ThunRTMain Function. I've shown only a part of the ThunRTMain
Function Code.

As you can see, it does call all the Functions that the start() function does in
C and PASCAL programs. But what about the GetCommandLine() Function from
KERNEL32.DLL? MSVBVM60.DLL does call that function as well, but the call is placed
inside deeply nested function calls. You can open the Imports Window to see the
Imported Function and follow its cross-reference to a procedure in MSVBVM60.DLL.

The sub_Free_Memory procedure calls various API Functions but if you keep
reading the procedure, you'll soon come across the HeapFree() Function which is
imported from kernel32.dll.

I guess you now know the purpose of the ThunRTMain Function.
Let us now see what structure is passed to it.

If we double-click on the RT_Struct offset, we reach an address containing
certain values.
It is a huge structure and each part needs to be seen one at a time.

Explaining the Structure will take up a lot of time and since I want to focus on
the Code Constructs of Visual BASIC, I won't explain the Structure Passed to
ThunRTMain.

All I can tell you is that the structure is the VB Header of the program, not
the PE (Portable Executable) Header. It begins with the signature "VB5!" and
describes the project, its Forms and their Objects. VB-aware tools parse this
header to reconstruct the Forms.

I found a good source for understanding the structure that is passed to
ThunRTMain, and I suggest you read it if you are interested in the details of the
VB Header. The link to the Article is given in the References [REFR] Section. The
article is titled "VISUAL BASIC REVERSED - A decompiling approach" and is
written by Andrea Geddon.
If the link is dead by the time you are reading this, you can contact me on my
email address to get the article.

VII. OUR FIRST PROGRAM [OFPR]

Create a Form with a CommandButton. Click the CommandButton and add a simple
Msgbox Code as shown below:

Private Sub Command1_Click()
Msgbox "Ssup"
End Sub

Open the Compiled EXE File with IDA Pro.
Click the Strings Tab to find the "Ssup" String.
Double-Click the String to find its cross-reference.
Scroll up to the top of the procedure.
You should see something like this:

Simply by looking at the entire procedure you can't exactly figure out what the
hell happens when the whole subroutine is executed. If you know Assembly well
and have had the patience to read through the code, you should notice a few neat
things in the code.

[XTRA]

Before I begin explaining the procedure, I want to teach you how to recognise a
procedure in Visual BASIC. The following patterns can be called Procedure
Signatures:
1) A Procedure has the open and close Stack Frame instructions.
2) The First Procedure in a VB Program is always preceded by
12 0xCC Bytes (0xCC is the INT 3 Instruction), followed by
4 0xE9 bytes, followed by 12 more 0xCC bytes.
3) Procedures other than the first are preceded by 10 NOP (0x90) Instructions.

: 1) STACK FRAME:

The Open/Close Stack Frame Instructions are also found in C/C++ and Pascal
programs and hence can be considered a universal method of determining
procedures. However, that is not always the case.

--> Many compilers use JMP instructions to fake a CALL Instruction. Such a Jump
is at times effectively a CALL to a procedure. IDA Pro does not detect such
CALL-'emulating' instructions, but OllyDbg does recognise such code
patterns.

--> Visual C++ allows the programmer to write naked functions. For a naked
function, the compiler generates no prologue or epilogue code: it neither
allocates stack space for local variables nor emits the stack frame open and
close instructions.

But since we are dealing with Visual BASIC, we can ignore the second case. You
will see an example of the first case shortly.

: 2) THE 0xCC BYTE

The 0xCC Byte is the opcode of the INT 3 instruction, which Intel's manual
describes under "Call to Interrupt Procedure". It is used by Debuggers such as
OllyDbg and SoftIce to set software Breakpoints: the debugger overwrites the
first byte of the instruction it wants to break on with 0xCC, saving the original
byte so that it can be restored later. As soon as the INT 3 Instruction is
executed, Control is passed to the Debugger's Exception Handler.

Here is the description taken directly from Intel's Software Developers Manual
Volume 2 : Instruction Set Reference.
"The INT 3 instruction generates a special one byte opcode (CC) that is intended
for calling the debug exception handler. (This one byte form is valuable because
it can be used to replace the first byte of any instruction with a breakpoint,
including other one byte instructions, without over-writing other code). To
further support its function as a debug breakpoint, the interrupt generated with
the CC opcode also differs from the regular software interrupts as follows:
• Interrupt redirection does not happen when in VME mode; the interrupt is
handled by a protected-mode handler.
• The virtual-8086 mode IOPL checks do not occur. The interrupt is taken without
faulting at any IOPL level."

That's how debuggers work. That's also the basis of certain anti-debugging
techniques. Since debuggers write 0xCC over the first byte of an instruction,
the CRC (Cyclic Redundancy Check) value of the code also changes. Some
anti-debugging techniques encrypt the program with a key which is the CRC value
of the program. When a program is being debugged, its CRC value changes and, as
a result, the program doesn't get decrypted.
Such methods are effective in stopping amateur wannabe hackers from
understanding the code, but they are not foolproof, and an expert hacker can get
past this technique with ease.

So much for what '0xCC' is. But why is it placed before the First Procedure in
VB Programs?
I've found no answer to that so far. This wastes a lot of space in a program.

If you try to disassemble a Console Program written in Visual C++ (compiled as a
Debug build), you'll find many instructions which set parts of the stack to the
0xCC value. You will also find 0xCC bytes scattered across the disassembled
listing.

If only Visual Studio were Open Source, we could have studied its code generator,
come up with an answer, and improved the generated code too.
I hope you also realise why Open Source is slowly gaining momentum.

: 3) THE 0x90 BYTE

The 0x90 Byte is the NOP instruction. Intel's Instruction Set Reference
describes it as follows:
"Performs no operation. This instruction is a one-byte instruction that takes up
space in the instruction stream but does not affect the machine context,
except the EIP register.
The NOP instruction is an alias mnemonic for the XCHG EAX, EAX instruction."

Amateur hackers inject this byte into serial generation/checking procedures
where the protection mechanism is weak. This is known as bit-hacking.
Sadly enough, bit-hacking STILL works for defeating plenty of today's Commercial
Applications. Guess their developers never realised the importance of code
security.

While writing programs in Assembly Language, if you use Forward References in a
few situations or use the wrong Jump Instruction to jump to certain addresses,
chances are good that the Assembler will fill in some bytes with the NOP
instruction.
As a result, the presence of such 0x90 padding in your code is considered a sign
of bad programming.

But again, I see no reason why the 0x90 Bytes are present in Visual BASIC
programs. Removing such padding would reduce the executable size.

Programs like VBDE rely on such Procedure Signatures to identify where a
procedure is present.

[/XTRA]

Let us start by analyzing the procedure in portions.
First the Procedure opens the Stack Frame. Then it allocates 12 Bytes on the
stack for the Destructor and other variables. (We shall see the Destructor in
detail after a short while.)
Then it allocates Dynamic Resources and calls the Zombie_AddRef Function.

What does the Zombie_AddRef Function do? It takes a reference to the object.
The parent object (in this case the Form) is passed to this function as a
parameter, and the function uses AddRef to increment the object's reference
count.
Since COM objects are responsible for their own lifetime, the resources they use
stay allocated until the reference count drops to 0. When it reaches 0, the
object enters the zombie state and can be deallocated to free its resources.
Refer to the COM documentation on object lifetime management for more detailed
information.

Right after the call to the Zombie_AddRef Function there are MOV instructions
which assign values to many variables. These are followed by a reference to the
"Ssup" string and a call to the rtcMsgbox procedure.

Why does it seem so weird? Shouldn't it simply call the rtcMsgbox Function?

Let us find out why in a little more interesting manner.
Intuition tells us that no matter what the function does, it will end up calling
the MessageBoxA or the MessageBoxW Function. So let's set a breakpoint on the
MessageBoxA and MessageBoxW Functions.

To do that, start OllyDbg and load the Executable file by pressing F3.
After the program is loaded, press Alt+E to open the Executable Modules window.
Double-click USER32.DLL to open the disassembled listing of the User32.dll file.
From there press Ctrl+N to open the Imports/Exports window. Then scroll down
until you see the MessageBoxA and MessageBoxW Functions. Click them one at a time
and press F2 to set a breakpoint.

Now press F9 to run the program. The Application should open. Click the
CommandButton. Now instead of the Debugger halting at a breakpoint of MessageBox,
the MessageBox comes up without any halt to the Debugger.

Why does this happen? Does this mean that rtcMsgBox has a separate copy of the
MessageBox code within itself? Though that is possible, it is unlikely, since
Microsoft built the Windows API precisely so that such code is reused rather than
duplicated. So some other API Function must be displaying the MessageBox.
So let us try another experiment. In the same Imports/Exports window of
User32.dll we see 2 more MessageBox functions: MessageBoxIndirectA and
MessageBoxIndirectW. Let's try setting a breakpoint on both these functions.

After the breakpoint is set, press F9, and click the Command Button.
This time, the Debugger halts at the MessageBoxIndirectA function.
Interesting, isn't it? In Visual BASIC Applications, calls to the Msgbox()
Function actually end up in MessageBoxIndirectA and not in MessageBox as one
might think.

This is an important characteristic. So the next time you set a breakpoint on
the MessageBox function in a VB program and the debugger halts there, you can be
pretty sure that someone has called the MessageBox() API directly, using the VB
Declaration from the API Text Viewer.

Let us now see the prototype of the MessageBoxIndirect() API Function:

int MessageBoxIndirectA(const MSGBOXPARAMSA *lpmbp);

Only One Parameter? So then how are the Message Body and Title passed to the
Function? For that we'll need to see the declaration of the MSGBOXPARAMS
Structure.

Private Type MSGBOXPARAMS
cbSize As Long
hwndOwner As Long
hInstance As Long
lpszText As String
lpszCaption As String
dwStyle As Long
lpszIcon As String
dwContextHelpId As Long
lpfnMsgBoxCallback As Long
dwLanguageId As Long
End Type

This suggests that the required parameters are assigned to the members of a
structure, and the address of that structure is passed to the function.
So the many MOV instructions found before the rtcMsgbox call are probably used
to initialise the MSGBOXPARAMS Structure.

To test this hypothesis, let's compare the MOV instructions with the code found
before the MessageBoxIndirect function is called.

Next comes the __vbaFreeVarList Function. From its name we can guess that it
frees a certain number of variables. This function actually does no work except
call the __vbaFreeVar Function multiple times.

Let us see how both functions work.

__vbaFreeVar : Frees a Temporary Variable.

__vbaFreeVar accepts only 1 Argument, which is the address of the variable to be
deleted. This argument is ALWAYS passed through ECX.
It uses the API Function SysFreeString() [imported as __imp_SysFreeString,
Ordinal Number 6] from OLEAUT32.DLL to carry out the actual deallocation of a
variable.

__vbaFreeVarList : Frees a list of Temporary Variables.

The code is pretty easy to understand. This function frees the temporary
variables that are passed to it as arguments. Interestingly, each memory location
is 16 bytes wide, which is the size of a VARIANT.
This is an interesting function as it accepts a variable number of arguments,
the first of which is the count of variables to free.
Its equivalent function call in C would be:
__vbaFreeVarList(4,&var_24,&var_34,&var_44,&var_54);

As you can see, __vbaFreeVarList uses a while loop to free the variables one by
one using the __vbaFreeVar Function.
Notice that the address of the variable to be freed is always stored in ECX.
You can disassemble the __vbaFreeVar Function to confirm that.

Now let us see what happens after the MessageBox is shown.
This is the most interesting part.

After the MessageBox is displayed, a clean up code is executed that deallocates
all the variables used in the entire procedure. Have a look at these statements
in the Command1_Click() Code.

This is the 'CALL-Simulation' code. If you recall, when a CALL instruction is
executed, the processor pushes the return address, i.e. the address of the
instruction which is supposed to receive control once the called function
returns. VB, instead of issuing a CALL instruction, simulates it using the PUSH,
JMP and RETN instructions. It is small sections of code like this that reduce
Visual BASIC's efficiency and performance.

Let us see why this is done anyway.

The CALL-Simulation code calls the MSVBVM60.Zombie_Release function.
This is the destructor code. Doesn't that remind you of something?
The instruction
mov [ebp+var_8], offset destructor
contains the offset of the destructor code. But is this so?
Double-click on the 'offset destructor' text and you'll land up here.

Hmm... this contains more offsets? By simply double-clicking the offsets you land
up at the destructor code again. That's why the CALL-simulation code is used: so
that the destructor code looks like an inline function.

If you're more curious, you can also double-click the 'exception_handler' text
to see where that leads to.

Well, after a long journey into the Command1_Click() Procedure, we're finally
done analyzing it.

From this point onwards, I shall explain only the important sections of code
rather than explain such intricate details once again.

Let us proceed further.
This Time let us create a Visual BASIC Application using only a module.
We shall use the Main Subroutine.

Use this code:

Sub Main()
MsgBox "Ssup"
End Sub

You will realise that the Procedure code is an exact copy of the code we
dealt with earlier. This means that Form Procedures and Module Procedures are
treated alike. It also means that the Command Button procedure did not use any
information from the Form Object.

Let's take another example.

VIII. STRING COMPARISON [STR1]

Create a form without any controls. The code in the form module is as follows:

So much for what '0xCC' is. But why is it placed before the First Procedure in
VB Programs?

0xCC ('int 3' instruction) can be used for alignment purposes between functions. Because of Intel's x86 memory design, aligned memory access is faster than unaligned access, and instruction fetch is more efficient when a function starts on an aligned address. Before the start of a function, 'int 3' instructions are inserted until the address is aligned to some value (from what I've seen, 16 seems to be the most common on x86). When executing, if the instruction pointer invalidly reaches the 0xCC alignment bytes, the program will crash with an unhandled breakpoint exception, or break into the debugger if one is attached.

Instruction alignment is also useful for loops or segments of code that are repeatedly executed. Sometimes you'll see a seemingly useless 'nop' instruction or a one or two byte instruction that will always be jumped past. For small loop routines, the entire block of code can be cached and the physical memory won't need to be accessed in order to read the instruction stream.


Aah, I knew 0xCC served a purpose...Thanks.

deadbeefhash, on 17 Jan, 2007 - 01:07 AM, said:

Instruction alignment is also useful for loops or segments of code that are repeatedly executed. Sometimes you'll see a seemingly useless 'nop' instruction or a one or two byte instruction that will always be jumped past. For small loop routines, the entire block of code can be cached and the physical memory won't need to be accessed in order to read the instruction stream.

Yes, I am aware.
NOP Slides are also used for Stack and Buffer Overflow Attacks.