I have come across the source code for an HC08 assembler. The code
takes an assembly source file that contains things like lda and sta and
assembles it into an S19 record file that can be downloaded into an HC08.

I would like to upgrade and speed up the code a little, and since I am
working on a Mac, I thought I might try my hand at writing a small
IDE in Cocoa.

What I am looking for is help understanding how the assembler
works. For example, I know that multiple passes are made to locate
symbols and variables in memory, and that opcode lookups are performed.
I know that some of these things are generic to all assemblers/compilers
and some may be specific to the HC08.

Does anyone here have any resources that might be useful for something
like this?

The canonical assembler is two-pass. On the first pass, it builds a symbol
table. A basic symbol table entry has the name of the symbol, its type
(constant or address), and the address (or the value, for a constant). Of
course, to know what address to record for a symbol, you have to identify each
opcode and how many bytes it takes. You don't generate binary code on the
first pass. Once you have completed the first pass, all the symbols should be
known and in the symbol table. On the second pass, you generate the code.
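
To make the two passes concrete, here is a minimal sketch in Python. It is not
the HC08 assembler from the question: the three-entry OPCODES table, the
"LABEL: MNEMONIC OPERAND" line format, and the function names parse, pass1,
pass2 and assemble are all assumptions made up for illustration, and the
opcode values should be checked against the real opcode map before use.

```python
# Toy instruction table: mnemonic -> (opcode byte, operand size in bytes).
# Values are illustrative placeholders, not a verified HC08 opcode map.
OPCODES = {
    "NOP": (0x9D, 0),
    "LDA": (0xA6, 1),   # assumed: one-byte operand
    "JMP": (0xCC, 2),   # assumed: 16-bit absolute address
}

def parse(line):
    """Split a source line into (label, mnemonic, operand); each may be None."""
    line = line.split(";")[0].strip()          # drop comments
    label = None
    if ":" in line:
        label, line = line.split(":", 1)
        label, line = label.strip(), line.strip()
    parts = line.split()
    mnemonic = parts[0].upper() if parts else None
    operand = parts[1] if len(parts) > 1 else None
    return label, mnemonic, operand

def pass1(lines, origin=0):
    """Build the symbol table by sizing each instruction; emit no code."""
    symbols = {}
    loc = origin                               # location counter
    for line in lines:
        label, mnemonic, _ = parse(line)
        if label:
            if label in symbols:
                raise ValueError(f"duplicate symbol {label}")
            symbols[label] = {"type": "address", "value": loc}
        if mnemonic:
            loc += 1 + OPCODES[mnemonic][1]    # opcode byte + operand bytes
    return symbols

def pass2(lines, symbols):
    """Generate code; every symbol is already in the table."""
    code = bytearray()
    for line in lines:
        _, mnemonic, operand = parse(line)
        if not mnemonic:
            continue
        opcode, size = OPCODES[mnemonic]
        code.append(opcode)
        if size:
            if operand in symbols:
                value = symbols[operand]["value"]
            else:
                value = int(operand, 0)        # numeric literal such as 0x12
            code += value.to_bytes(size, "big")
    return bytes(code)

def assemble(lines, origin=0):
    symbols = pass1(lines, origin)
    return symbols, pass2(lines, symbols)
```

Calling assemble(["JMP LABEL399", "NOP", "LABEL399: LDA 0x12"], origin=0x8000)
resolves the forward jump because pass1 has already recorded LABEL399 before
pass2 emits any bytes.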

The reason for two passes is to handle forward references:

JMP LABEL399
.
.
LABEL399:
.

When the assembler gets to the JMP, it has no idea what the value of the label
is, so it has no idea what value to put in for the jump. You do know (usually)
how many bytes the JMP takes so you just update your location counter (which
keeps track of the current code address), add the label to the symbol table
(marked as a forward reference) and keep going. When you hit the label and
look it up, you can fill in the symbol value and unmark it as a forward
reference. At the end of pass 1, you scan the symbol table for entries still
marked as forward references. These symbols were used but never defined, so
they are errors. On pass 2, you can now generate the code.
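
The forward-reference bookkeeping described above might look something like
this in Python. The statement format (label, size in bytes, referenced symbol)
and the name pass1_with_forward_refs are hypothetical; a real assembler would
get the size from its opcode lookup rather than have it handed in.

```python
def pass1_with_forward_refs(statements, origin=0):
    """First pass that marks forward references.

    statements: list of (label, size_in_bytes, referenced_symbol) tuples,
    where label and referenced_symbol may be None.
    Returns (symbols, undefined), where undefined lists the symbols that
    are still marked as forward references at the end of the pass.
    """
    symbols = {}
    loc = origin                                  # location counter
    for label, size, ref in statements:
        if label is not None:
            entry = symbols.get(label)
            if entry is not None and not entry["forward"]:
                raise ValueError(f"duplicate symbol {label}")
            # Define the symbol, or resolve an earlier forward reference.
            symbols[label] = {"value": loc, "forward": False}
        if ref is not None and ref not in symbols:
            # Used before it is defined: remember it, value unknown for now.
            symbols[ref] = {"value": None, "forward": True}
        loc += size                               # size is known even if the value is not
    undefined = sorted(name for name, e in symbols.items() if e["forward"])
    return symbols, undefined
```

If undefined comes back non-empty, the assembler reports those names and skips
pass 2.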

Some assemblers use one pass. They generate the code but put placeholders
where forward reference values should go. They also have to keep track of
where each forward reference appeared. That forward reference might be part of
an expression, so you need to keep a reference to the expression (or copy the
expression somewhere) so you can evaluate it once the forward references in it
are defined. When you can evaluate the expression, you go back and fix up all
those places in the code where you put placeholders.
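
A rough sketch of the one-pass approach with placeholders and fixups, again in
Python. To stay short it only handles a bare symbol as a 16-bit operand, not a
general expression, and the statement format (label, opcode byte, referenced
symbol) and the name assemble_one_pass are assumptions for illustration.

```python
def assemble_one_pass(statements, origin=0):
    """One-pass assembly with backpatching.

    statements: list of (label, opcode, referenced_symbol) tuples; any field
    may be None. Every referenced symbol is emitted as a 16-bit big-endian
    address (an assumption of this sketch).
    """
    code = bytearray()
    symbols = {}     # name -> address
    fixups = {}      # name -> offsets in code holding a placeholder
    for label, opcode, ref in statements:
        if label is not None:
            addr = origin + len(code)
            symbols[label] = addr
            # Go back and patch every placeholder that was waiting on this label.
            for offset in fixups.pop(label, []):
                code[offset:offset + 2] = addr.to_bytes(2, "big")
        if opcode is None:
            continue
        code.append(opcode)
        if ref is not None:
            if ref in symbols:
                code += symbols[ref].to_bytes(2, "big")
            else:
                fixups.setdefault(ref, []).append(len(code))
                code += b"\x00\x00"               # placeholder
    if fixups:
        raise ValueError(f"undefined symbols: {sorted(fixups)}")
    return bytes(code)
```

Supporting expressions would mean storing the expression alongside each fixup
offset and evaluating it once all of its symbols are defined.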

Some CPUs have a long jump instruction which uses the full address of the
destination and a shorter jump which uses an offset to the destination. Some
assemblers have their own jump operation which uses the short one when
possible. Because the size of such a jump is not known until its target is,
this makes one-pass assembly harder (though probably not impossible, you just
have to fix up all the symbols that come after it in the symbol table and
backpatch any code that has already used those symbols).
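
One way to pick between short and long jumps is sketched below in Python. Note
that it is a different technique from the one-pass fixup just described: it
lays the program out repeatedly, starting with every jump short and growing
any whose target is out of range, until nothing changes. The sizes (2 and 3
bytes), the signed 8-bit range measured from the end of the jump, the item
format and the name choose_jump_sizes are all assumptions for illustration.

```python
SHORT, LONG = 2, 3   # assumed: opcode + 8-bit offset vs. opcode + 16-bit address

def choose_jump_sizes(items):
    """items: list of ("label", name), ("jump", target) or ("data", nbytes).

    Returns (sizes, labels): sizes maps the index of each jump to its chosen
    size, labels maps each label name to its final address.
    """
    sizes = {i: SHORT for i, item in enumerate(items) if item[0] == "jump"}
    while True:
        # Lay out the program using the current size of every jump.
        addr = 0
        labels, starts = {}, {}
        for i, (kind, arg) in enumerate(items):
            starts[i] = addr
            if kind == "label":
                labels[arg] = addr
            elif kind == "jump":
                addr += sizes[i]
            else:
                addr += arg
        # Grow any short jump whose target does not fit in a signed byte.
        changed = False
        for i, size in sizes.items():
            if size == SHORT:
                offset = labels[items[i][1]] - (starts[i] + SHORT)
                if not -128 <= offset <= 127:
                    sizes[i] = LONG
                    changed = True
        if not changed:
            return sizes, labels
```

Jumps only ever grow, so the loop is guaranteed to stop. Growing one jump can
push another target out of range, which is why a single check is not enough.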

As for books, the ones I have are probably out of print. You might try a search
on Amazon.

In addition to the basics that Gary provided, you may want to add
a preprocessor pass to provide a macro capability. In its simplest
form, a macro processor isn't much more than a piece of code that
replaces parameter names in the macro body with the argument text
from the invocation. If you're planning to use the assembler much,
then macros may save a substantial amount of time and effort.
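
As a sketch of that simplest form, here is a text-substitution macro expander
in Python. The invocation syntax (macro name followed by comma-separated
arguments), the macros dictionary layout and the name expand_macros are
assumptions; it does not handle nested macros or a macro definition syntax in
the source file.

```python
import re

def expand_macros(lines, macros):
    """Expand macro invocations by plain text substitution.

    macros: name -> (parameter_names, body_lines).
    A line whose first word is a macro name is replaced by the macro body,
    with each parameter name replaced by the matching argument text.
    """
    out = []
    for line in lines:
        parts = line.split(None, 1)
        if parts and parts[0] in macros:
            params, body = macros[parts[0]]
            args = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
            if len(args) != len(params):
                raise ValueError(f"{parts[0]} expects {len(params)} argument(s)")
            mapping = dict(zip(params, args))
            for body_line in body:
                # Replace whole words only, so a parameter named "src" does not
                # match inside a longer identifier.
                out.append(re.sub(r"\b\w+\b",
                                  lambda m: mapping.get(m.group(0), m.group(0)),
                                  body_line))
        else:
            out.append(line)
    return out
```

For example, with macros = {"COPY": (["src", "dst"], ["lda src", "sta dst"])},
the line "COPY $80, $81" expands to "lda $80" followed by "sta $81", and the
result is then fed to the assembler proper.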