Instructions

Here are the most important instructions (in my opinion) that have been
available on all Intel processors since the 8086. Different assemblers
may have minor variations in how these instructions are represented in
assembly code; I give the NASM form here. Throughout this section, when
specifying the valid forms of operands, I will write reg8 to
stand for any 8-bit register, reg16 for any of the eight general-
and special-purpose 16-bit registers, mem8 for a memory reference
to a single byte, mem16 for a memory reference to a word (with
the low-order byte at the given address), imm8 for an 8-bit
immediate value, and imm16 for a 16-bit immediate value. If an
operand may be either a register or memory reference, I will write
r/m8 or r/m16; if it may also be an immediate value, then
I will write r/m/i8 or r/m/i16. A segment register as an
operand will be written segreg.

Data Movement Instructions
The fundamental data movement operation is MOV dest, source,
which copies a byte or a word from the source location to the
destination. In general, either the source or the destination must be a
register (you can't copy directly from one memory location to another
with MOV); the only exception is that an immediate value may be
moved straight to memory (however, there is no way to put an immediate
value into a segment register in one operation). Here are the accepted
forms:
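
Written in the operand notation introduced above, and going by the standard 8086 encodings, the forms run along these lines (BYTE and WORD are the NASM keywords that give the size of an immediate stored to memory):

```nasm
MOV reg8, r/m/i8
MOV mem8, reg8
MOV mem8, BYTE imm8

MOV reg16, r/m/i16
MOV mem16, reg16
MOV mem16, WORD imm16

MOV r/m16, segreg
MOV segreg, r/m16
```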

The CS segment register may not be used as a destination (you wouldn't
want to do this anyway, since it would change where the next instruction
comes from; to get this effect, you need to use a proper flow control
instruction such as JMP).

To perform a swap of two locations instead of a one-way copy, there is
also an exchange operation:

XCHG reg8, r/m8
XCHG reg16, r/m16

As a special case of this that does nothing except occupy space and take
up processor time, the instruction to exchange the accumulator with
itself (XCHG AX, AX) is given the special "no-operation"
mnemonic:

NOP

For the special purpose of copying a far pointer (that is, a pointer
that includes a segment address, so that it can refer to a location
outside the current segment) from memory into registers, there are the
LDS and LES instructions. Here are the accepted forms:

LDS reg16, mem32
LES reg16, mem32

For example, the instruction LDS SI, [200h] is equivalent to the
pair of instructions MOV SI, [200h] and MOV DS, [202h].
The 8086 only supports loading the pointer into the DS or ES segment
register.

An operation that is frequently useful when setting up pointers is to
load the "effective address" of a memory reference. That is, this
instruction does the displacement plus base plus index calculation, but
just stores the resulting address in the destination register, rather
than actually fetching the data from the address. Here is the only form
allowed on the 8086:

LEA reg16, mem
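
As a small illustration (the register values and the displacement 10h are arbitrary choices for this sketch), the last two instructions below use the same memory operand; LEA only computes the offset, while MOV goes on to read the word stored there:

```nasm
        MOV  BX, 2000h          ; base
        MOV  SI, 0004h          ; index
        LEA  DI, [BX+SI+10h]    ; DI = 2000h + 0004h + 10h = 2014h; memory is not read
        MOV  AX, [BX+SI+10h]    ; AX = the word stored at DS:2014h
```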

To push and pop data from the stack, the 8086 provides the following
instructions. The top of stack is located at offset SP within the stack
segment, so PUSH AX, for example, is equivalent to
SUB SP, 2 (recall that the stack grows downward) followed by
MOV [SS:SP], AX (except that [SS:SP] isn't a valid form of
memory reference).

PUSH r/m16
PUSH segreg
POP r/m16
POP segreg

As with MOV, you are not allowed to POP into the CS
register (although you may PUSH CS).
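
A short sketch of these instructions in use, assuming only that a valid stack has been set up. It saves AX, copies CS into DS by way of the stack (there is no MOV from one segment register to another), and then restores AX:

```nasm
        PUSH AX             ; SP = SP - 2, then the word at SS:SP = AX
        PUSH CS             ; pushing CS is allowed
        POP  DS             ; DS = CS, then SP = SP + 2
        POP  AX             ; AX restored; SP is back where it started
```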

The instructions to push and pop the FLAGS register (as mentioned
earlier) take no operands; they have been available since the original
8086:

PUSHF
POPF

Here are the other ways of reading or modifying the FLAGS register
(apart from setting flags as the result of an arithmetic operation, or
testing them with a conditional branch, of course). The Carry,
Direction, and Interrupt Enable flags may be cleared and set:

CLC
CLD
CLI
STC
STD
STI

The Carry flag may also be complemented, or "toggled" between 0 and 1:

CMC

Finally, the bottom eight bits of the FLAGS register (containing the
Carry, Parity, Auxiliary Carry, Zero, and Sign flags, as described
above) may be transferred to and from the AH register:

LAHF
SAHF

Arithmetic and Logical Instructions
All of the two-operand arithmetic and logical instructions offer the
same range of addressing modes. For example, here are the valid forms
of the ADD operation:
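
In the operand notation introduced earlier, and going by the standard 8086 encodings, the list runs along these lines:

```nasm
ADD reg8, r/m/i8
ADD mem8, reg8
ADD mem8, BYTE imm8

ADD reg16, r/m/i16
ADD mem16, reg16
ADD mem16, WORD imm16
```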

Just as with the MOV instruction, the first operand is the
destination and the second is the source; the result of performing the
operation on the two operands is stored in the destination (if it gets
stored anywhere). Unlike MOV, most of these instructions also
set or clear the appropriate status flags to reflect the result of the
operation (for some of the instructions, this is their only
effect).

To add two numbers, use the ADD instruction. To continue adding
further bytes or words of a multi-part number, use the ADC
instruction to also add one if the Carry flag is set (indicating a
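carry-over from the previous byte or word). For example, to add the
32-bit immediate value 12345678h to the 32-bit double word stored
at location 500h, do ADD WORD [500h], 5678h followed by
ADC WORD [502h], 1234h (the WORD keyword tells NASM the operand size,
which it cannot infer from a memory reference and an immediate alone).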

Subtraction is analogous: use the SUB instruction to subtract
a single pair of bytes or words, and then use the SBB ("Subtract
with Borrow") instruction to take the Carry into account for further
bytes or words.
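
Mirroring the addition example, a sketch that subtracts the 32-bit immediate value 12345678h from the double word stored at location 500h (the address is reused purely for illustration):

```nasm
        SUB  WORD [500h], 5678h   ; low word first; Carry is set if a borrow is needed
        SBB  WORD [502h], 1234h   ; high word, minus one more if the Carry was set
```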

An important use of subtraction is in comparing two numbers; in this
case, we are not interested in the exact value of their difference, only
in whether it is zero or negative, or whether there was a carry or
overflow. The CMP ("Compare") instruction performs this task;
it subtracts the source from the destination and adjusts the status
flags accordingly, but throws away the result. This is exactly what is
needed to get conditions such as LE to work; after doing
CMP AX, 10, for example, the status flags will be set in such a
way that the LE condition is true precisely when the value in AX
(treated as a signed integer) is less than or equal to 10.

The two-operand logical instructions are AND, OR,
XOR, and TEST. The first three perform the expected
bitwise operations; for example, the nth bit of the destination after
the AND operation will be 1 (set, true) if the nth bit of both the
source and the destination were 1 before the operation, otherwise it
will be 0 (clear, false). The TEST instruction is to AND
as CMP is to SUB; it performs a bitwise and operation, but
the result is only reflected in the flags. For example, after the
instruction TEST [321h], BYTE 12h, the Zero flag will be set if
neither bit 1 nor bit 4 (12h is 00010010 in binary,
indicating that bits 1 and 4 are to be tested) of the byte at address
321h was 1; otherwise it will be clear.

Multiplication and division are also binary operations, but the
corresponding instructions on the 8086 only allow one of the operands to
be specified (and it can only be a register or memory reference, not an
immediate value). The other operand is implicitly contained in the
accumulator (and sometimes also the DX register). The MUL and
DIV instructions operate on unsigned numbers, while IMUL
and IDIV operate on two's-complement signed numbers. Here are
the valid forms for MUL; the others are analogous:

MUL reg8
MUL BYTE mem8
MUL reg16
MUL WORD mem16

For 8-bit multiplication, the quantity in AL is multiplied by the given
operand and the 16-bit result is placed in AX. For 16-bit
multiplication, the 32-bit product of AX and the operand is split, with
the low word in AX and the high word in DX. In both cases, if the
result spills into the high-order byte/word, then the Carry and Overflow
flags will be set, otherwise they will be clear. The other flags will
have garbage in them; in particular, you will not get correct information
from the Zero or Sign flags (if you want that information, follow the
multiplication with CMP AX, 0, for example).

For division, the process is reversed. An 8-bit operand will be divided
into the number in AX, with the quotient stored in AL and the remainder
left in AH. A 16-bit operand will be divided into the 32-bit quantity
whose high word is in DX and whose low word is in AX; the quotient will
be in AX and the remainder will be in DX after the operation. None of
the status flags are defined after a division. Also, if the division
results in an error (division by zero, or a quotient that is too large),
the processor will trigger interrupt zero (as if it had executed
INT 0).

The CBW and CWD instructions, which take no operands, will
sign-extend AL into AX or AX into DX:AX, respectively, just as needed
before performing a signed division. For example, if AL contains
11010110, then after CBW the AH register will contain
11111111 (and AL will be unchanged).
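
A sketch of both sign extensions feeding a signed division; the dividends and divisors are arbitrary values chosen for the example. IDIV truncates the quotient toward zero, so the remainder takes the sign of the dividend:

```nasm
        MOV  AL, -53        ; signed byte dividend (11001011)
        CBW                 ; AX = -53 (AH = 11111111)
        MOV  BL, 10         ; 8-bit divisor
        IDIV BL             ; AL = quotient -5, AH = remainder -3

        MOV  AX, -1000      ; signed word dividend
        CWD                 ; DX:AX = -1000 (DX = FFFFh)
        MOV  BX, 7          ; 16-bit divisor
        IDIV BX             ; AX = quotient -142, DX = remainder -6
```

For an unsigned division with DIV, the high half would be cleared instead (for example with MOV DX, 0) rather than sign-extended.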

Multiplication and division by powers of two are frequently performed by
shifting the bits to the left or right. There are several varieties of
shift and rotate instructions, all of which allow the following forms:
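
Taking SHL as the representative, and going by the 8086 encodings (an immediate shift count other than 1 is a later addition to the instruction set), the forms run along these lines:

```nasm
SHL reg8, 1
SHL reg8, CL
SHL BYTE mem8, 1
SHL BYTE mem8, CL

SHL reg16, 1
SHL reg16, CL
SHL WORD mem16, 1
SHL WORD mem16, CL
```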

The second operand specifies how many bit positions the result should be
shifted by: either one or the number in the CL register. For example,
the accumulator may be multiplied by 2 with SHL AX, 1; if CL
contains the number 4, the accumulator may be multiplied by 16 with
SHL AX, CL.

There are three shift instructions: SAR, SHR, and
SHL. The "shift-left" instruction, SHL, shifts the
highest bit of the operand into the Carry flag and fills in the lowest
bit with zero. The "shift-right" instruction, SHR, does the
opposite, moving zero in from the top and shifting the lowest bit out
into the Carry; this is appropriate for an unsigned division, with the
Carry flag giving a 1-bit remainder. On the other hand, the
"shift-arithmetic-right" instruction, SAR, leaves a copy of the
highest bit in place as it shifts; this is appropriate for a signed
division, since it preserves the sign bit.

For example, -53 is represented in 8-bit two's-complement by the
binary number 11001011. After a SHL by one position, it will be
10010110, which represents -106. After a SAR, it will
be 11100101, which represents -27. After a SHR, it will
be 01100101, which represents +101 in decimal; this corresponds
to the interpretation of the original bits as the unsigned number 203
(which yields 101 when divided by 2).

When shifting multiple words by one bit, the Carry can serve as the
bridge from one word to the next. For example, suppose we want to
multiply the double word (4 bytes) starting at address 1230h by 2;
the instruction SHL WORD [1230h], 1 will shift the low-order word,
putting its highest bit into the Carry flag. Now we need an instruction
that will shift the Carry into the lowest bit of the word at 1232h;
if we wanted to continue the process, we would also need it to shift the
highest bit of that word back out into the Carry. The effect here is
that the bits in the operand plus the Carry have been rotated one
position to the left. The desired instruction is RCL WORD [1232h], 1
("rotate-carry-left"). There is a corresponding
"rotate-carry-right" instruction, RCR; there are also two
rotate instructions which directly shift the highest bit down to the
lowest and vice versa, called ROL and ROR.

There are four unary arithmetic and logical instructions. The increment
and decrement operations, INC and DEC, add or subtract one
from their operand; they do not affect the Carry bit. The negation
instruction, NEG, takes the two's-complement of its operand,
while the NOT instruction takes the one's-complement (flip each
bit from 1 to 0 or 0 to 1). NEG affects all the usual flags, but
NOT does not affect any of them. The valid forms of operand are
the same for all of these instructions; here are the forms for
INC:
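
In the operand notation introduced earlier, and going by the standard 8086 encodings, the forms run along these lines (BYTE or WORD is needed with a memory operand, since nothing else indicates its size):

```nasm
INC reg8
INC BYTE mem8
INC reg16
INC WORD mem16
```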

String Instructions
The string instructions facilitate operations on sequences of bytes or
words. None of them take an explicit operand; instead, they all work
implicitly on the source and/or destination strings. The
current element (byte or word) of the source string is at DS:SI, and the
current element of the destination string is at ES:DI. Each instruction
works on one element and then automatically adjusts SI and/or DI; if the
Direction flag is clear, then the index is incremented, otherwise it is
decremented (when working with overlapping strings it is sometimes
necessary to work from back to front, but usually you should leave the
Direction flag clear and work on strings from front to back).

To work on an entire string at a time, each string instruction can be
accompanied by a repeat prefix, either REP or one of REPE
and REPNE (or their synonyms REPZ and REPNZ).
These cause the instruction to be repeated the number of times in the
count register, CX; for REPE and REPNE, the Zero flag is
tested at the end of each operation and the loop is stopped if the
condition (Equal or Not Equal to zero) fails.

The MOVSB and MOVSW instructions have the following forms:

MOVSB
REP MOVSB
MOVSW
REP MOVSW

The first form copies a single byte from the source string, at address
DS:SI, to the destination string, at address ES:DI, then increments (or
decrements, if the Direction flag is set) both SI and DI. The second
form repeats this operation CX times, decrementing CX after each copy
until it reaches zero; if CX is already zero, nothing is copied. With
the Direction flag clear, the effect is equivalent to the following
pseudo-C code:

while (CX != 0) {
    *(ES*16 + DI) = *(DS*16 + SI);
    SI++;
    DI++;
    CX--;
}

(recall that ES*16 + DI is the physical address corresponding
to the segment and offset ES:DI). The remaining two forms move a word
at a time, instead of a single byte; correspondingly, SI and DI are
incremented or decremented by 2 each time through the loop.

The STOSB and STOSW instructions are similar to
MOVSB and MOVSW, except the source byte or word comes from
AL or AX instead of the memory address in DS:SI. For example, the
following is a very fast way to initialize the block of memory from
ES:1000h to ES:4FFFh with zeroes:
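
A sketch of such a fragment, assuming ES already holds the desired segment. The block is 4000h bytes long, so storing a word at a time takes 2000h repetitions:

```nasm
        MOV  AX, 0          ; the word to store
        MOV  DI, 1000h      ; ES:DI -> first byte of the block
        MOV  CX, 2000h      ; 4000h bytes = 2000h words
        CLD                 ; Direction flag clear: DI counts upward
        REP  STOSW          ; store AX at ES:DI, DI = DI + 2, repeated CX times
```

On exit CX is zero and DI is 5000h, one past the end of the block.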

Correspondingly, the LODSB and LODSW instructions are
variations on the move instructions where the destination is the
accumulator (instead of the memory address in ES:DI). These are not
very useful operations with the repeat prefix; instead, they are used as
part of larger loops to perform more complex string processing. For
example, here is a program fragment that will convert the NUL-terminated
string starting at the address in DX to be all lower-case (there is a
faster way to do the conversion of each character, using the
XLATB instruction, but that is not the point here):
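
A sketch of such a fragment, assuming that the string lives in the segment addressed by DS and is ASCII text. The label names are made up for the example, and each converted character is written back with an ordinary MOV so that ES does not need to be set up:

```nasm
        MOV  SI, DX         ; DS:SI -> first character
        CLD                 ; work from front to back
NextChar:
        LODSB               ; AL = byte at DS:SI, then SI = SI + 1
        CMP  AL, 0
        JE   Done           ; NUL terminator: stop
        CMP  AL, 'A'
        JB   NextChar       ; below 'A': not an upper-case letter
        CMP  AL, 'Z'
        JA   NextChar       ; above 'Z': not an upper-case letter
        OR   AL, 20h        ; set bit 5: 'A'..'Z' becomes 'a'..'z'
        MOV  [SI-1], AL     ; write it back over the original character
        JMP  NextChar
Done:
```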

None of the preceding string operations have any effect on the status
flags. By contrast, the remaining two string operations are executed
solely for their effect on the status flags, just like the
CMP operation on numbers. The CMPSB and CMPSW
operations compare the current bytes or words of the source and
destination strings by subtracting the destination from the source and
recording the properties of the result in FLAGS. The SCASB and
SCASW operations are the variants of this that use the
accumulator (AL or AX) for the source. Each of these may be preceded by
either of the repeat prefixes REPE or REPNE, which cause
the operation to be repeated up to CX times, as long as the condition
holds true after each iteration. Here is the corresponding pseudo-C for
REPE CMPSB:
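
A sketch in the same style as the REP MOVSB loop, assuming the Direction flag is clear. Here compare() is a stand-in for the flag-setting subtraction and ZF stands for the Zero flag; neither is real C:

```c
while (CX != 0) {
    compare(*(DS*16 + SI), *(ES*16 + DI));  /* set flags from source - destination */
    SI++;
    DI++;
    CX--;
    if (!ZF) break;                         /* stop at the first pair that differs */
}
```

For REPNE the final test is reversed: the loop stops as soon as ZF is set, that is, at the first pair that matches.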

Program Flow Instructions
All of the previous instructions execute sequentially; that is, when one
instruction finishes, the next instruction is taken from the very next
memory location. This is the default operation for the instruction
pointer, IP: after each byte of instruction is fetched, the IP is
incremented in preparation for the next fetch. The program flow
instructions provide the facilities to modify the course of execution,
allowing conditional execution (by jumping over parts of the code if
certain conditions are met) and looping (by jumping backwards in the
code).

The unconditional jump instruction, JMP, causes IP (and sometimes
CS) to be modified so that the next instruction is fetched from the
location given in the operand (the target). Here are the valid forms:

JMP SHORT imm8
JMP imm16
JMP imm16:imm16
JMP r/m16
JMP FAR mem32

The short version saves space when the target of the jump is within a
few dozen instructions forward or backward; the assembler computes the
difference between the target address and the address of the instruction
that follows the jump, and stores this difference as one signed byte
(so the target must lie within -128 to +127 bytes). The second (and
most common) version allows a jump to any location in the current code
segment, while the third allows a jump to any location in memory by also
specifying an immediate value to be loaded into CS. The fourth version
will take the target address from a register or memory location; since
this address is only 16 bits, the target has to be within the segment.
Finally, the far version fetches both the offset and the segment from
four consecutive bytes in memory (compare to the LDS and
LES instructions; JMP FAR mem32 could have been called
"LCS IP, mem32").

The conditional jump instructions, Jcc, where cc is one of
the condition codes listed earlier (E, NE, ...),
perform a short jump if the condition is true, based on the current
contents of the status flags. For example, the code sample that was
given in the discussion of LODSB, to convert a string to
lower-case, used the JA and JB instructions; these
made their jump if the result of the previous comparison
found that the current character was above 'Z' or below 'A'. Since a
conditional jump can only be to a nearby target, it is sometimes
necessary to combine conditional and unconditional jumps as follows:

JNLE NoJLE
JMP target
NoJLE:

This will have the same effect as JLE target, except there is no
restriction on how far away the target may be (within the code segment).

There are also specialized versions of the conditional jump that are
particularly useful when executing a loop a fixed number of times. The
looping statements

LOOP imm8
LOOPE imm8
LOOPNE imm8

(as usual, the synonyms LOOPZ and LOOPNZ are also
available) are very similar to the REP, REPE, and
REPNE prefixes from the string instructions. The LOOP
instruction decrements CX and makes a short jump if the count has not
reached zero. The LOOPE instruction adds the condition that it
will only take the jump if the Zero flag is set (usually indicating that
the last comparison had equal operands); the LOOPNE will only
take the jump if the Zero flag is clear. The string operation
REP MOVSB, for example, could have been performed with

Repeat: MOVSB
LOOP Repeat

(except this would have been considerably slower, since it requires
repeatedly fetching and decoding the two instructions instead of just
fetching and decoding the single REP MOVSB instruction once).

After looping or repetitive string operations, it is occasionally
necessary to test whether the count register reached zero (to check
whether the loop ran for the full count or whether it exited early
because the Zero flag changed). The instruction

JCXZ imm8

serves exactly this purpose; it takes a short jump if the CX register
contains zero. It behaves like CMP CX, 0 followed by JZ imm8, except
that JCXZ itself does not change any flags.

All of the above branching instructions are variations on the infamous
GOTO statement; they cause a permanent change in the course of
execution. To perform an operation more like a function or subroutine
call, where the flow of control will eventually return to pick up with
the next instruction, the 8086 provides two mechanisms:
CALL/RET and INT/IRET.

The CALL instruction offers a similar range of addressing modes
to the JMP instruction, except there is no "short" call:

CALL imm16
CALL imm16:imm16
CALL r/m16
CALL FAR mem32

A call is the same as a jump, except the instruction pointer is first
pushed onto the stack (in the second and fourth versions, which include
a new segment, the current CS register is also pushed).

To reverse the effect of a CALL, when the subroutine is done it
should execute a RET or RETF instruction; this pops the
return address off of the stack and back into IP (and RETF also
pops the saved value of CS, to return from a far call). After the
return, the next instruction that will be fetched will be from the next
location after the CALL. There is an optional 16-bit immediate
operand that may be specified with a return instruction; this value is
added to the stack pointer after popping off the return address, to
recover however many bytes had been pushed onto the stack with
parameters before the call. For example, here is one way to implement a
subroutine to print a character, where the calling code first pushes the
character (as the low byte of a word, since there is no option to push a
single byte) before making the call:
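
A sketch of such a subroutine and its caller, assuming a near call and DOS service 02h of INT 21h (write the character in DL). The name PrintChar and the use of BP to reach the parameter are choices made for this example, and the routine does not preserve AX or DX:

```nasm
PrintChar:
        PUSH BP
        MOV  BP, SP         ; [BP] = old BP, [BP+2] = return address, [BP+4] = parameter
        MOV  DX, [BP+4]     ; DL = the character (low byte of the pushed word)
        MOV  AH, 02h        ; DOS service 02h: write the character in DL
        INT  21h
        POP  BP
        RET  2              ; pop the return address, then discard the 2-byte parameter

        ; calling code, elsewhere in the program
        MOV  AX, 'A'
        PUSH AX             ; push the character as a word
        CALL PrintChar      ; on return the parameter is already off the stack
```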

This is just one of several common conventions for passing parameters to
subroutines; even more common is to just specify that, for example, the
character will be passed directly in the DL register.

The other function-call-like mechanism is the interrupt. We have
been using this all along to call the standard DOS services, such as
printing a character or a '$'-terminated string. The INT
instruction behaves much like the CALL FAR instruction except for two
things. First, it pushes the FLAGS register before pushing CS and IP.
The idea is that an interrupt handler should be able to completely
restore the state of the processor when it is finished, since this is
also the mechanism used for handling hardware interrupts from the rest
of the system; these can happen at any time, independent of what the
processor might be working on, and they should occur as transparently to
the current process as possible. Second, it gets the target address from
a standard table of interrupt handler vectors kept at the bottom of
memory. When the
processor executes INT n, where n is an 8-bit immediate
value, it fetches a far pointer (that is, a 4-byte combination of
segment and offset) from the memory address 0000:4n; this is the
target address for the interrupt call. For example, the address of the
DOS interrupt handler, the routine called when INT 21h is
executed, is stored at locations 0000:0084 through
0000:0087; the first two bytes give the offset, to load into IP,
and the second two bytes give the segment, to load into CS.
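
As a sketch of how this table can be read, the fragment below fetches the INT 21h vector with a single LES, since the entry has the same layout as any other far pointer in memory. The choice of BX as the offset register is arbitrary, and DS is overwritten, so a real program would save and restore it:

```nasm
        MOV  AX, 0
        MOV  DS, AX         ; DS = 0000h, the segment holding the vector table
        LES  BX, [21h*4]    ; BX = word at 0000:0084 (offset), ES = word at 0000:0086 (segment)
```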

To return from an interrupt handler, the IRET instruction is
used. It pops the IP, CS, and FLAGS registers, which causes the state
of the machine to return to where it left off when the interrupt
occurred.