samuel (SAMIGWE@worldnet.att.net) wrote:
: What are the steps an assembler takes during its pass(es)?

The front end of an assembler is much like any other compiler
front end. One has to perform the usual lexical and syntax analysis
steps. A recursive descent parser would be suitable in most cases.

Usually, assemblers make one or two passes. Two-pass assemblers
analyze the syntax and build the symbol table in the first pass and
generate object code in the second pass. Some errors, such as short
branches whose targets are out of range, may be detected during the
second pass, too.
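
The two passes described above can be sketched roughly as follows. This
is only a minimal illustration in Python, not a real assembler: the
two-instruction set (NOP, and JMP with a signed 8-bit offset), the
`label:` syntax and the names `parse` and `assemble` are all assumptions
made for the example. For brevity it keeps the parsed lines in memory
instead of reading the source file a second time.

```python
# Toy instruction set, assumed for illustration only:
# NOP is one byte, JMP <label> is an opcode byte plus a signed 8-bit offset.
SIZES = {"NOP": 1, "JMP": 2}
OPCODES = {"NOP": 0x90, "JMP": 0xEB}


def parse(line):
    """Split a line into (label, mnemonic, operand); missing parts are None."""
    line = line.split(";")[0].strip()          # drop comments
    label = None
    if ":" in line:
        label, line = (part.strip() for part in line.split(":", 1))
    parts = line.split()
    mnemonic = parts[0].upper() if parts else None
    operand = parts[1] if len(parts) > 1 else None
    return label, mnemonic, operand


def assemble(source):
    lines = [parse(raw) for raw in source.splitlines()]

    # Pass 1: check mnemonics and assign an address to every label.
    symbols, pc = {}, 0
    for label, mnemonic, _ in lines:
        if label is not None:
            if label in symbols:
                raise ValueError(f"duplicate label {label!r}")
            symbols[label] = pc
        if mnemonic is not None:
            if mnemonic not in SIZES:
                raise ValueError(f"unknown mnemonic {mnemonic!r}")
            pc += SIZES[mnemonic]

    # Pass 2: all labels are known now, so code can be generated directly.
    code = bytearray()
    for _, mnemonic, operand in lines:
        if mnemonic is None:
            continue
        code.append(OPCODES[mnemonic])
        if mnemonic == "JMP":
            if operand not in symbols:
                raise ValueError(f"undefined label {operand!r}")
            # The offset is relative to the end of the 2-byte instruction.
            offset = symbols[operand] - (len(code) + 1)
            if not -128 <= offset <= 127:      # range error found in pass 2
                raise ValueError(f"short branch to {operand!r} out of range")
            code.append(offset & 0xFF)
    return bytes(code)
```

Note that the out-of-range branch can only be reported in the second
loop, because the distance to a forward label is not known earlier.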

One-pass assemblers have to remember where unresolved references occur
and backpatch them later (when the previously undefined symbols get
defined). Personally, I find two-pass assemblers easier to implement,
but because they have to read the source program twice, they are
intrinsically slower.
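
Backpatching in a one-pass assembler might look like the sketch below.
It is a minimal Python illustration under assumptions of my own: a toy
instruction set with only NOP and a JMP taking a signed 8-bit offset,
and hypothetical names (`assemble_one_pass`, `rel8`, `fixups`).

```python
def rel8(target, pos):
    """Encode the distance from the byte after position pos to target."""
    offset = target - (pos + 1)
    if not -128 <= offset <= 127:
        raise ValueError("short branch out of range")
    return offset & 0xFF


def assemble_one_pass(source):
    code = bytearray()
    symbols = {}   # label -> address, for labels defined so far
    fixups = {}    # undefined label -> positions of placeholder bytes

    for raw in source.splitlines():
        line = raw.split(";")[0].strip()       # drop comments
        if ":" in line:
            label, line = (part.strip() for part in line.split(":", 1))
            symbols[label] = len(code)
            # The label is defined now: patch every earlier reference.
            for pos in fixups.pop(label, []):
                code[pos] = rel8(symbols[label], pos)
        if not line:
            continue
        parts = line.split()
        op = parts[0].upper()
        if op == "NOP":
            code.append(0x90)
        elif op == "JMP":
            target = parts[1]
            code.append(0xEB)
            pos = len(code)                    # where the offset byte goes
            if target in symbols:              # backward reference
                code.append(rel8(symbols[target], pos))
            else:                              # forward reference
                code.append(0)                 # placeholder, patched later
                fixups.setdefault(target, []).append(pos)
        else:
            raise ValueError(f"unknown mnemonic {op!r}")

    if fixups:
        raise ValueError(f"undefined labels: {sorted(fixups)}")
    return bytes(code)
```

The source is read only once; the price is the extra `fixups`
bookkeeping and the need to keep the generated code in memory until
all references are patched.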

: How is the mnemonic/opcode table arranged?

When writing an assembler for a processor with a large instruction set
like the 386, I would suggest keeping the mnemonics in an
alphabetically sorted list and performing a binary search.
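
As a small illustration of such a lookup, here is a sketch in Python
using the standard `bisect` module. The short mnemonic list and the
function name `find_mnemonic` are assumptions for the example; the
returned index would be used to select an entry in a parallel table of
encodings.

```python
from bisect import bisect_left

# A tiny sample table; a real one would hold the full instruction set.
MNEMONICS = sorted(["MOV", "ADD", "SUB", "JMP", "CALL",
                    "RET", "PUSH", "POP", "NOP", "CMP"])


def find_mnemonic(name):
    """Return the table index of name, or None if it is not a mnemonic."""
    key = name.upper()
    i = bisect_left(MNEMONICS, key)            # binary search, O(log n)
    if i < len(MNEMONICS) and MNEMONICS[i] == key:
        return i
    return None
```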

: How are the label name and address stored (for later references)?

Unless you want to perform some type checking in the style of TASM, it
should be sufficient to store the name of each label, its address, its
segment (data, code, bss, etc), and its type (public, external, ...).
In a one-pass assembler, a list of references which still have to be
resolved must be kept, too. When this list is not empty, the symbol is
implicitly undefined. No additional flag is required in this case.
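
A symbol table entry along these lines could be sketched as follows in
Python. The field names, the `Symbol` class and the helper functions
`reference` and `define` are hypothetical, chosen only to illustrate
how a non-empty list of pending references can stand in for an
"undefined" flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Symbol:
    name: str
    address: Optional[int] = None
    segment: Optional[str] = None              # "code", "data", "bss", ...
    kind: str = "local"                        # "public", "external", ...
    pending: List[int] = field(default_factory=list)  # unresolved uses

    @property
    def undefined(self):
        # No separate flag: a non-empty pending list means "not defined yet".
        return bool(self.pending)


def reference(table, name, location):
    """Record a use of name at location; return its symbol entry."""
    sym = table.get(name)
    if sym is None:                            # first time the name is seen
        sym = table[name] = Symbol(name)
        sym.pending.append(location)
    elif sym.undefined:                        # still waiting for a definition
        sym.pending.append(location)
    return sym


def define(table, name, address, segment):
    """Define name and return the locations that must now be backpatched."""
    sym = table.setdefault(name, Symbol(name))
    sym.address, sym.segment = address, segment
    resolved, sym.pending = sym.pending, []
    return resolved
```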

: How are strings processed -- meaning which does that assembler search
: for first: mnemonics? labels? pseudoops? etc?

Registers?

In any case, I would attempt to keep the patterns of these objects
totally different so that no conflicts can arise (for example, start
register names with a special character, like in the AT&T
notation). Where conflicts cannot be avoided in a simple way (as with
labels and mnemonics), the mnemonics have to be searched first, because
every mnemonic also has the form of a valid label.
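
The search order could be sketched like this in Python. Everything
concrete here is an assumption for illustration: the `%` prefix for
registers (as in AT&T notation), a leading dot for pseudo-ops, the
sample tables and the function name `classify`.

```python
import re

MNEMONICS = frozenset({"MOV", "ADD", "JMP"})       # sample entries only
PSEUDO_OPS = frozenset({".byte", ".word"})         # sample entries only
REGISTER = re.compile(r"%[a-z][a-z0-9]*")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify(token):
    """Return the kind of token: register, pseudo-op, mnemonic or label."""
    # Registers and pseudo-ops have their own patterns, so no table
    # search is needed to tell them apart from identifiers.
    if REGISTER.fullmatch(token):
        return "register"
    if token.startswith("."):
        if token.lower() in PSEUDO_OPS:
            return "pseudo-op"
        raise ValueError(f"unknown pseudo-op {token!r}")
    if IDENTIFIER.fullmatch(token):
        # Mnemonics look like labels, so the mnemonic table is tried first.
        return "mnemonic" if token.upper() in MNEMONICS else "label"
    raise ValueError(f"cannot classify {token!r}")
```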

: What are the best possible steps to take in order to maximize the amount
: of processing of asm source and minimize the amount of time wasted?

Since assembly language programs grow very quickly, I would
concentrate on an efficient lexical and syntax analysis
procedure. Because typical programs use a large number of symbol names
for local and global labels, an efficient symbol lookup routine is
important, too. When processing automatically generated code, for
example, one can make use of the common form of local labels. If all
local labels have the form

Lnnn

for example, where 'nnn' is a number, this number can be used for
directly indexing a separate symbol table. I have already used this
method successfully to speed up my own 8086 assembler.
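
To illustrate the idea (this is a sketch in Python, not the code of my
assembler), local labels of the form Lnnn can bypass the general
symbol table entirely. The class name, the `MAX_LOCAL` limit and the
rule that numbers with leading zeros go to the ordinary table are
assumptions made for the example.

```python
class SymbolTable:
    MAX_LOCAL = 1 << 16    # assumed cap so a huge number cannot exhaust memory

    def __init__(self):
        self.named = {}    # ordinary symbols: name -> address
        self.local = []    # local[n] is the address of label Ln, or None

    @classmethod
    def _local_index(cls, name):
        """Return n if name has the form Ln (no leading zeros), else None."""
        digits = name[1:]
        if (name[:1] == "L" and digits.isascii() and digits.isdigit()
                and str(int(digits)) == digits
                and int(digits) < cls.MAX_LOCAL):
            return int(digits)
        return None

    def define(self, name, address):
        n = self._local_index(name)
        if n is None:
            self.named[name] = address
            return
        if n >= len(self.local):               # grow the table on demand
            self.local.extend([None] * (n + 1 - len(self.local)))
        self.local[n] = address

    def lookup(self, name):
        """Return the address of name, or None if it is not defined."""
        n = self._local_index(name)
        if n is None:
            return self.named.get(name)
        return self.local[n] if n < len(self.local) else None
```

For example, after `t = SymbolTable()` and `t.define("L12", 0x40)`, the
call `t.lookup("L12")` is a plain list access with no hashing or string
comparison of the whole name.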