compiler and metadata, request opinions...

originally I started writing all this to the mono people, but it is OT and I
doubt they would care...

so, here is the status:
I have partial/incomplete frontend support for both Java and C#, as well as
good old C.
I am actually using the same parser for all 3 languages (and C++ as well,
but this is a lower priority), where an internal "lang" variable is used to
remember which language is being processed at the time (and adapt behavior
as appropriate).
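
to illustrate the shared-parser idea, here is a rough Python sketch (not the
actual code; the keyword sets are deliberately tiny, and names like Parser
and is_keyword are made up for the example). the point is only that one
parser object carries a "lang" field and consults it wherever the languages
disagree:

```python
from enum import Enum


class Lang(Enum):
    C = "c"
    CPP = "cpp"
    JAVA = "java"
    CSHARP = "cs"


# Keywords shared by all four languages (illustrative subset only).
CORE_KEYWORDS = {"if", "else", "while", "for", "return", "int", "void"}

# Per-language additions on top of the shared core (also only a subset).
EXTRA_KEYWORDS = {
    Lang.C: {"struct", "typedef"},
    Lang.CPP: {"struct", "typedef", "class", "namespace", "template"},
    Lang.JAVA: {"class", "interface", "package", "import"},
    Lang.CSHARP: {"class", "interface", "namespace", "using", "struct"},
}


class Parser:
    """One parser for every language; 'lang' selects the behavior."""

    def __init__(self, lang):
        self.lang = lang  # remembered for the whole parse

    def is_keyword(self, word):
        # Same code path for all languages, only the lookup set differs.
        return word in CORE_KEYWORDS or word in EXTRA_KEYWORDS[self.lang]


c_parser = Parser(Lang.C)
java_parser = Parser(Lang.JAVA)
```

with this, "class" is an ordinary identifier to c_parser but a keyword to
java_parser, without needing a second parser.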

a lot of the runtime machinery (classes/objects/interfaces, structs,
exception handling, ...) is in place.

a lot of the upper/middle compiler machinery is still lacking (such as
support for all the above features...). so, C is still the only language
which currently works...

generics currently fill me with dread (C# and Java have them, C++ calls them
templates, but they look to be a horrible pain if/when I have to implement
them...).

currently, the existing path used for C is being "widened" to facilitate the
newer features, leading to (thus far) much internal reworking of the
compiler.

I have determined that, due to semantic and architectural issues, I can't
embed the metadata directly into the object modules (the reason being that
COFF and ELF modules are loaded as needed, but for technical reasons all of
the metadata needs to be available to the runtime prior to the linking
process).

further context:
portions of the runtime may register themselves with the linker, where a
request for a particular piece of information is embedded in a symbol (sort
of like in HTTP CGI requests), and so when a module is linked, the runtime
may receive the request and generate any code or data necessary to fulfill
this request.
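
a minimal sketch of how such a request might be dispatched, in Python for
brevity. the symbol syntax ("name?key=value&key=value"), the request name
rt_static_field, and the Linker class are all assumptions invented for the
example, not the actual format:

```python
class Linker:
    """Toy linker that lets runtime components answer symbol requests."""

    def __init__(self):
        self.handlers = {}  # request name -> callback supplied by the runtime
        self.resolved = {}  # symbol -> previously generated code/data

    def register(self, name, handler):
        self.handlers[name] = handler

    def resolve(self, symbol):
        # Reuse whatever was generated the first time this symbol was seen.
        if symbol in self.resolved:
            return self.resolved[symbol]
        # Split "name?key=value&key=value" into request name and arguments.
        name, _, query = symbol.partition("?")
        handler = self.handlers.get(name)
        if handler is None:
            raise KeyError("unresolved symbol: " + symbol)
        args = dict(pair.split("=", 1) for pair in query.split("&") if pair)
        result = handler(args)
        self.resolved[symbol] = result
        return result


def static_field_request(args):
    # Stand-in for real code generation: just describe what would be emitted.
    return "address of static field %s.%s" % (args["class"], args["field"])


linker = Linker()
linker.register("rt_static_field", static_field_request)
stub = linker.resolve("rt_static_field?class=Foo&field=count")
```

the handler only runs the first time a given symbol is seen; later
references to the same symbol get the already-generated result.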

I had decided this approach was preferable to having masses of API calls in
the produced code, for reasons of both compiler implementation (having to
manually manage and generate thunks to call into the API is awkward) and
performance (the runtime can generate much lighter-weight code when
accessing static members or methods than when accessing instance members or
virtual methods, and this benefit would be lost when using API calls).

all this is because caching is used (most code is dynamically loaded at
runtime, and it is preferable to only recompile changed modules), and COFF
or ELF modules may be used for caching purposes, where collections of such
modules may be packaged in a GNU-AR library (ZIP is another possibility...).

after much mental debate, I settled on using a table-based structure for
storing metadata, although the exact contents of these tables differ
somewhat from .NET metadata (the structure and contents were more influenced
by Java .class files, but are organized in tables more like in .NET). these
tables are more or less based on the relational model (and are queried in a
similar manner), but differ in that rows may be referred to by index (not
technically allowed in relational databases, where one can usually only
refer to a row via its primary key or by its contents).
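
as a rough sketch of what "relational, but rows can be referenced by index"
could look like (Python, with made-up table and column names; the real
tables hold different contents):

```python
class Table:
    """Fixed columns, rows stored in insertion order, addressable by index."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = list(columns)
        self.rows = []

    def add(self, *values):
        if len(values) != len(self.columns):
            raise ValueError("wrong number of columns for " + self.name)
        self.rows.append(tuple(values))
        return len(self.rows) - 1  # the row index doubles as a reference

    def get(self, index):
        # Direct lookup by row index (the non-relational part).
        return dict(zip(self.columns, self.rows[index]))

    def select(self, **where):
        # Lookup by contents (the relational-style part).
        for index, row in enumerate(self.rows):
            record = dict(zip(self.columns, row))
            if all(record[key] == value for key, value in where.items()):
                yield index, record


classes = Table("classes", ["name", "super"])
fields = Table("fields", ["owner", "name", "sig"])

obj = classes.add("Object", "")
foo = classes.add("Foo", "Object")
fields.add(foo, "count", "i")      # 'owner' holds a row index into classes
fields.add(foo, "next", "LFoo;")

foo_fields = [record["name"] for _, record in fields.select(owner=foo)]
```

storing the owner as a plain row index keeps each row small, which is the
main attraction over a tree of heap-allocated nodes.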

my primary reason for choosing tables was mostly related to the amount of
information likely to be present, and concern for memory overhead, where my
other options were S-Expressions and DOM trees, which although more
convenient, would have a much higher memory overhead (S-Exps would take
easily 2x to 3x more memory, and DOM trees far more, and I have many more
things in need of RAM than metadata...).

externally, these tables are represented as line-oriented text files (these
could possibly be stored in the AR/ZIP libraries as well).
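
the exact file syntax is not given here, so the following is purely a guess
at what a line-oriented form could look like: a "table" line naming the
table and its columns, one "row" line per row, and an "end" line. the
sketch (in Python) assumes field values contain no whitespace, and
everything comes back as strings:

```python
def dump_tables(tables):
    """tables: dict of table name -> (list of columns, list of rows)."""
    lines = []
    for name, (columns, rows) in tables.items():
        lines.append("table " + name + " " + " ".join(columns))
        for row in rows:
            lines.append("row " + " ".join(str(value) for value in row))
        lines.append("end")
    return "\n".join(lines) + "\n"


def load_tables(text):
    """Inverse of dump_tables; every field is read back as a string."""
    tables = {}
    current_rows = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        if parts[0] == "table":
            current_rows = []
            tables[parts[1]] = (parts[2:], current_rows)
        elif parts[0] == "row" and current_rows is not None:
            current_rows.append(parts[1:])
        elif parts[0] == "end":
            current_rows = None
    return tables


example = {
    "classes": (["name", "super"], [["Object", "-"], ["Foo", "Object"]]),
    "fields": (["owner", "name", "sig"], [["1", "count", "i"]]),
}
text = dump_tables(example)
reloaded = load_tables(text)
```

a format like this can be written and read one line at a time, and stays
easy to inspect by hand when something goes wrong.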

when loading a library, this text file would probably be checked for, and
if present, loaded into an in-memory version of the database. as needed,
contents may be queried from the database and used to build other
structures (such as the in-memory class contexts, ...). the reason for not
building all of these structures outright is that not all of these
classes/interfaces/namespaces/... are likely to be needed, and an in-memory
class context will use more space than its representation in the tables.
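
the build-on-demand part might look something like this (a Python sketch
under assumed names; ClassContext, Runtime and the row layouts are invented
for the example). a class context is only constructed the first time
someone asks for that class, and is kept afterwards:

```python
class ClassContext:
    """The heavier in-memory form of a class, built from table rows."""

    def __init__(self, name, super_name, fields):
        self.name = name
        self.super_name = super_name
        self.fields = fields  # list of (field name, signature)


class Runtime:
    def __init__(self, class_rows, field_rows):
        self.class_rows = class_rows  # rows of (name, super)
        self.field_rows = field_rows  # rows of (owner row index, name, sig)
        self.contexts = {}            # class name -> ClassContext, built lazily

    def get_class(self, name):
        context = self.contexts.get(name)
        if context is not None:
            return context
        # Not built yet: query the tables and construct it now.
        for index, (class_name, super_name) in enumerate(self.class_rows):
            if class_name == name:
                fields = [(field, sig)
                          for owner, field, sig in self.field_rows
                          if owner == index]
                context = ClassContext(class_name, super_name, fields)
                self.contexts[name] = context
                return context
        raise KeyError("no such class: " + name)


runtime = Runtime(
    class_rows=[("Object", ""), ("Foo", "Object"), ("Bar", "Object")],
    field_rows=[(1, "count", "i"), (2, "label", "s")],
)
built_before = len(runtime.contexts)
foo_context = runtime.get_class("Foo")
```

here only Foo gets a context; Object and Bar stay as cheap table rows until
something actually needs them.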

so, yeah, the upper compiler when compiling an "assembly" may access
existing databases and query contents from them, and may produce a new
database representing the contents of the current assembly, which will be
stored along with the associated object modules.