
Tutorial: Entering the world of x86 assembly.

Most of the tutorial has been written already, but I have not included it all, as I will be tidying bits of it up as I post them in the forum.
There will basically be two sections to this:

1.) Introducing assembly language on its own
2.) Integrating it into your VB6 apps.

Well, here is the start.

Introduction To x86 32-bit Assembly Language

A FAQ:

Why?

This is a common question asked by people who have not programmed in assembly language before. The common belief is that assembly programming is pretty much dead and no longer useful. This is entirely wrong: assembly programming has many benefits for a programmer, even one who doesn’t use it directly!

Compilers have become exceptionally good at optimising, but the developers behind them have to know assembly language well in order to implement new optimisation techniques. However, leaving assembly language to the compiler writers is folly, as the instruction sets grow with each new generation of processors, and the new instructions allow faster machine code on newer computers.

Now, compiler developers do not have all the time in the world to support every new instruction set and to provide optimisations that work with 100% efficiency, and some optimisations simply cannot be programmed into a compiler. This creates a gap between hand-written assembly and compiler output. Hand-written assembly by a moderately experienced assembly programmer can run 10% to 50% faster than compiled code.

Operating systems also need assembly language, as some functionality can only be provided by it. Speed is also of the essence in the time-critical portions of the system.

Faster? Aren’t computers fast enough already?

Many non-assembly programmers have brought this up.

If you check the market and look at the huge range of x86 processors for sale, you will notice that they may be as cheap as £50 or as expensive as £400+. The fact that there is great demand for CPUs that cost £400 shows that there is always a need for speed, to ensure that applications run as fast as possible.

There is always a ‘need for speed’ in programs, and if it is possible to obtain it without additional hardware costs, it should be done. Many scientific applications and large databases have time constraints, and even games have a major need for speed optimisation, as they must provide smooth game play in order to immerse a user in the game. For example, a batch job on a database file that takes 6 hours (i.e. one that contains millions and millions of records) will take just over 4 hours if assembly optimisations cut the running time by 30%. Likewise, just a 30% improvement in speed may determine the playability of a game on a slow computer.

A Beginning

For a ‘dead’ language, there does seem to be a startling number of assemblers. These days most of the assemblers are available free and some are even open-source. The current free x86 assemblers available are:
MASM, FASM, RosASM, NASM, GoASM, TASM, LZASM, YASM

MASM (Microsoft Macro Assembler) is definitely the most popular, as it has a good reputation as a fully featured assembler and has been available free for about 8 years.
However, FASM and YASM, which have been developed in recent years, possibly surpass the MASM feature set now. I personally use FASM, which is well known as the fastest assembler and has extremely good macro capability. Another interesting fact is that FASM itself is written in assembly language and can be assembled using itself.

The examples given use the FASM variant of the INTEL-syntax, which is slightly different from the MASM variant.

This tutorial will be focused upon 32-bit assembly programming, and the sample programs will be geared towards Windows. 16-bit assembly programming is relatively disused today, as it has extremely tight memory constraints and requires the use of difficult-to-manage segment registers.

The joy of hexadecimal numbers is that two hex digits can represent every possible value of a byte: for example, 00 in hex means 0 and FF in hex means 255.
You can manually convert hexadecimal to decimal numbers, but to save tedious maths it's easy enough to load up Windows Calc, which will do the conversion for you.
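
For anyone who prefers a script to Windows Calc, here is a minimal Python sketch of the same conversion. The function names hex_to_dec and dec_to_hex are made up for this illustration and are not part of the tutorial's tooling.

```python
# Sketch only: convert between hexadecimal text and decimal integers.
# hex_to_dec and dec_to_hex are hypothetical helper names.

def hex_to_dec(text):
    """Parse a hexadecimal string such as 'FF' into an integer."""
    return int(text, 16)


def dec_to_hex(value, digits=2):
    """Format an integer as upper-case hex, zero-padded to `digits` digits."""
    return format(value, "0{}X".format(digits))


if __name__ == "__main__":
    print(hex_to_dec("00"))  # smallest value a byte can hold
    print(hex_to_dec("FF"))  # largest value a byte can hold
    print(dec_to_hex(255))   # back to two hex digits
```

Two hex digits cover exactly one byte because each digit encodes 4 bits, which is why the default padding above is 2.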

Ambiguity

There are various terms used in this tutorial which have ambiguous definitions, so I will attempt to explain their usage.
Where two definitions are given, the first is the one used here.

Word: an unsigned 2-byte (16-bit) integer, or the number of bits treated as a single unit by a CPU.
Dword: an unsigned 4-byte (32-bit) integer, or double a word.

Statements

Assembly language makes use of mnemonics, each of which represents a machine code instruction. An example of a mnemonic is ‘add’. Operands are like the parameters that an instruction takes.

For example:
add destination, source

The instruction takes the number in ‘source’ and adds it to the number in ‘destination’, leaving the result in ‘destination’. ‘source’ and ‘destination’ are known as operands. Operands can be registers or memory locations, and the source can also be a constant value. However, in the x86 instruction set, there can generally be no more than one memory operand in an instruction.
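
To make the behaviour of ‘add destination, source’ concrete, here is a rough Python model for 32-bit operands. It is only a sketch: the name add32 is invented for this example, and the processor flags that a real add also updates (such as the carry flag) are ignored.

```python
# Sketch only: models the result of 'add destination, source' on 32-bit values.
# add32 is a hypothetical name; CPU flags are deliberately not modelled.

MASK32 = 0xFFFFFFFF  # a 32-bit register can hold 0 .. 0xFFFFFFFF


def add32(destination, source):
    """Return the new value of 'destination' after the add."""
    # The sum is truncated to 32 bits, just as it is in a 32-bit register.
    return (destination + source) & MASK32


if __name__ == "__main__":
    print(add32(5, 7))
    print(hex(add32(0xFFFFFFFF, 1)))  # wraps around instead of growing
```

Note that the result replaces the old value of the destination, so the original destination value is lost unless it was copied somewhere first.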

Unlike high-level languages, which translate each statement into a number of different machine code instructions, there is a one-to-one relationship between an assembly statement and a single machine code instruction, which gives the programmer full control. A mnemonic like add is translated into an opcode (the machine code representation of the ‘operator’ of an instruction). Note: a single mnemonic can be translated into different opcodes, either because a different opcode is needed to do the same operation on different kinds of operands, or because there is another opcode that does the same operation with a shorter overall code length.

Re: Tutorial: Entering the world of x86 assembly.

Registers

A register is a temporary storage location for a small amount of data to be processed on the CPU. Performing operations on values stored in registers is much faster than performing the same operation on a memory location, because the registers are located on the processor itself. Certain registers need to be preserved as they are global throughout a system, i.e. their contents must be saved before they are used for a different purpose and restored afterwards.

There are several different types of registers used in a 32-bit x86 computer as shown in the table below:

In general programming, EAX, ECX and EDX can be used anywhere and do not need to be preserved. EBX, ESI and EDI can be used as well, but they need to be preserved, as the Windows API (Application Programming Interface, the set of functions the Windows operating system makes available to programs) relies on them keeping their values.
EBP and ESP are generally used to manage the stack and to allocate space for data local to a routine. (More about this when we move on to memory and stacks.)

AX refers to the lower 16 bits of EAX, and likewise for all the other 16-bit registers. You cannot access the upper 16 bits of EAX (or of any other 32-bit register) directly; you have to manipulate the contents first, for example by shifting them down. AH and AL refer to the upper 8 bits and the lower 8 bits of the AX register respectively. This can be seen in the diagram below:

EAX  43 78 23 BB
AX         23 BB
AH         23
AL            BB
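
The same relationship can be checked with a little bit masking. The Python sketch below uses the value 437823BBh from the diagram; split_register and set_al are names invented for this illustration, and the code only models how the sub-registers overlap, it does not touch any real registers.

```python
# Sketch only: models how AX, AH and AL overlap the 32-bit EAX register.
# split_register and set_al are hypothetical helper names.

def split_register(eax):
    """Return (AX, AH, AL) for a given 32-bit EAX value."""
    ax = eax & 0xFFFF        # lower 16 bits of EAX
    ah = (ax >> 8) & 0xFF    # upper 8 bits of AX
    al = ax & 0xFF           # lower 8 bits of AX
    return ax, ah, al


def set_al(eax, value):
    """Model writing to AL: only the lowest byte of EAX changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


if __name__ == "__main__":
    eax = 0x437823BB
    print([hex(part) for part in split_register(eax)])
    # The upper 16 bits have no register name of their own,
    # so they have to be shifted down before they can be read.
    print(hex(eax >> 16))
    print(hex(set_al(eax, 0x11)))
```

Because AL is part of AX, which is part of EAX, changing any one of them changes the others too.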

In 32-bit programming the registers below are not used in general application programming, except for FS, which is used for structured exception handling (which is not going to be covered in this tutorial). Here they are for reference purposes only:

EIP contains a pointer to the instruction the processor is about to execute. So you can't use EIP as a general-purpose register, but you can change it indirectly (with jump, call and return instructions) to move to a different location in your program.

Memory

In Windows 32-bit assembly programming, the memory can be treated as a flat storage space (the flat memory model). Thus, with a 32-bit unsigned integer as an address, one can theoretically access 4 gigabytes of different memory locations. (However, the maximum number of memory locations actually available depends on how the operating system maps the memory addresses for each running process/program.)

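On the x86 architecture, data is stored in memory in the little-endian format. This means that the least significant byte of a value is stored first (at the lowest address), so when memory is listed from left to right, the rightmost byte of a value is the most significant.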

For example:
Offset  1  2  3  4  5  6  7
Data    4A F2 54 7F 3B 05 5C

If you read a 32-bit integer from offset 3 (the bytes 54 7F 3B 05), the hexadecimal value would be 053B7F54h.

This does NOT mean that all the bytes in memory are backwards; it just means that the bytes of each multi-byte integer are stored in reverse order. (I.e. when a value is read into a register, the byte that was rightmost in memory becomes the most significant.)
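
As a quick check of the example above, here is a small Python sketch that reads little-endian integers out of the same seven bytes. read_dword and read_word are names invented for this illustration; offsets start at 1 only to match the table.

```python
# Sketch only: reads little-endian integers from the example bytes above.
# read_dword and read_word are hypothetical helper names.

MEMORY = bytes([0x4A, 0xF2, 0x54, 0x7F, 0x3B, 0x05, 0x5C])  # offsets 1..7
FIRST_OFFSET = 1  # the table numbers its offsets from 1


def read_dword(data, offset):
    """Read a 32-bit little-endian integer starting at the given offset."""
    start = offset - FIRST_OFFSET
    return int.from_bytes(data[start:start + 4], "little")


def read_word(data, offset):
    """Read a 16-bit little-endian integer starting at the given offset."""
    start = offset - FIRST_OFFSET
    return int.from_bytes(data[start:start + 2], "little")


if __name__ == "__main__":
    print(format(read_dword(MEMORY, 3), "08X"))  # bytes 54 7F 3B 05
    print(format(read_word(MEMORY, 1), "04X"))   # bytes 4A F2
```

Reading single bytes is unaffected by endianness; only values that span more than one byte have their bytes reordered.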

Re: Tutorial: Entering the world of x86 assembly.

Will an ASM program work on both Windows and Linux machines? I have a few machines that run either Windows or Linux at various times, and it would be neat to have some tools I can share between Win/Unix.

Re: Tutorial: Entering the world of x86 assembly.

I really need to sort out the next instalment of this tutorial. I have it written, but the tedious bit is trying to get the tables in. Maybe I should make the tables images so they display properly.

Yes, x86 asm allows cross-platform development across different x86 computers.
However, you need to use cross-platform libraries such as GTK (for the GUI), just as you would in C. You would then need to build two different versions: an exe file for Windows and an ELF binary for Linux.

FASM does this quite easily; with MASM it is a bit harder (you need to convert its output to make Linux executables). I can't remember about the other assemblers as I have no experience in using them, but theoretically, if they can't do it directly (like MASM), you can still convert their output using other tools.

However, you can't use x86 asm on PowerPCs (Macintosh), because they use a different machine language. In this case, you have to use C and compile on the different computers (using various cross-platform toolkits), or use Java (which works on all machines with a single binary).
I recommend C for most larger projects in this case, as Java, despite being 'friendly', is very resource heavy and sluggish in response.
Personally, I have not met a Java program that does not have that inherent sluggish feel to its interface, and slow load times. Perhaps on fast machines it's not noticeable... but give me an asm program any day.

-----

If you just need a few tools to run on both Windows and Linux, look at Wine.

This allows Windows programs to run on Linux via an implementation of the Windows API. Despite popular belief, Wine is not an emulator, and some programs actually run faster on Linux than on Windows!
Wine hasn't implemented DirectX fully yet, but most programs run fine, with just the odd misalignment and misdrawing.