well, basically, here is the issue:
my compiler architecture has somewhat drifted from what it was originally,
and I am essentially considering writing a new codegen (to hopefully
eliminate the current mess).

original architecture:
C was parsed into an XML-based AST;
this AST was translated into an RPN-based IL;
this RPN-based IL was translated into x86 assembler;
the ASM was assembled and linked.

the RPN->ASM stage was originally fairly direct, as most RPN ops mapped
fairly directly to the produced ASM (the RPN was actually largely laid out
in the order items would need to be in for x86, meaning arguments were
typically passed right-to-left, ...).

however, several changes have somewhat made a mess of things:
x86 and x86-64 are both supported, meaning that much internal hackery was
done to make the latter work.
as well, many internal changes have left much internal inconsistency, with
different parts of the codegen from different times in its development doing
things in different (and often incompatible) ways, often with kludges to
make the parts work together.

the frontend now deals with C, Java, C#, and a JavaScript variant, with many
HLL features partly hacked onto the existing codegen.

I have also gone to using a modified XML AST as the IL, meaning that now the
RPN stage is almost entirely pointless (some parts were migrated from the
compiler frontend to the codegen to make this work).

all of this has turned into a terrible mess in the codegen, and has
apparently moved beyond my ability to fully understand, much less debug, the
thing.

so, here is the current thinking:
likely, I can abandon the older codegen, and write a new codegen likely
starting from a clean slate...
note that the existing frontend will be kept, and most external code will
likely remain largely unchanged.

here is the considered architecture:
1. C/Java/C#/JS are parsed into language-specific ASTs;
2. the language-specific AST is converted into a Generalized AST (GAST),
which serves as the IL;
3. new codegen converts GAST to ASM;
4. the ASM was assembled and linked.

the goal in this case will not be for optimal performance in the output, but
rather for hopefully keeping the codegen relatively simple and flexible, so
that new features can be more easily added, and the thing can be debugged
without too much effort.

the AST's are still XML-based, and may be stored as either textual or binary
XML (I am mostly using a binary encoding internally, where textual XML will
be more considered as a debugging aide...).

most stuff within the compiler generally handles XML in a form similar to
that of DOM, and most compiler operations are done directly against this
XML-based form.

I am thinking it may be possible to structure the codegen as, essentially, a
top-down process driven directly by the AST.

I am thinking I may not need any additional IR stage (it is possible, but I
don't think may be necessary), nor will I need SSA or similar with the
considered design (something "similar" was going on in the RPN-based
codegen).

likely, this process will allocate "temporary variables" (tvars), and any
CPU registers will be mapped to the tvars behind the scenes (the AST
handling code will not likely directly see the CPU registers).

likely, there will be a partial split with the backend code more being an
API which is called directly by the front part of the codegen for emitting
things (such as adding 2 tvars and putting the result in a 3rd tvar, ...).
so, the frontend code will think mostly in terms of flattening out the AST
into a reasonable structural layout for the target, and for making calls.
the backend code will mostly think about things like mapping variables to
registers, managing stack locations, and producing the final ASM.

I am thinking that likely a single-pass code-generator will work, and my
current leaning is to always preserve any callee-save registers (mostly to
simplify a single-pass register allocator, and because the caller-save
registers are much more likely to need to be spilled into stack variables
whenever making calls and so will generally be reserved for things like
intermediate results of calculations, ...).

in this case, basically the codegen will maintain several print buffers (for
storing ASM code), and so when unfolding the AST, it will make calls which
essentially just print into these buffers.

there will be a buffer for the code which has already been processed and
gone into the ".text" section, as well others for ".data" and ".bss". all of
these will be merged into a single glob when generating the final ASM
(basically, these sections end-to-end with the appropriate ASM section
headers added, ...).

there will also be a buffer for the current function being compiled, but
will omit the prologue and epilogue, which will be emitted after the
function body is generated (the reason is so that I can generate the body
not having to know the final size of the stack frame, again as a
simplification measure).

function calls will also be handled (internally) as atomic operations, which
will take a list of tvars, put them into the correct places on the stack (or
into the correct registers), and call the function in question.

well, these is a lot more that could be written on this topic, but in all is
this a sane design?

is there anything in particular I should look at for maybe better ideas?