"Jonathan Thornburg [remove -animal to reply]"
<jthorn@astro.indiana-zebra.edu> wrote in message> Philip Herron <herron.philip@googlemail.com> writes:>>From what i understand to Jit some kind of language, do your parsing>>into an IR maybe some optimizations to a code generator. So ok if we>>have successfully output target code for the processor would you use>>simply a system assembler or what i see some people have runtime>>assemblers to assemble this code, then where would this target code>>go?>>>>Then how do you link it in a useful way?>>>>And ultimately how would you execute this>> To reduce the reinvent-the-wheel overhead, you might consider using> a library to help with the mechanics of this. For example, I have> heard of (but not personally used) VCODE:> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.8634> http://pdos.csail.mit.edu/~engler/pldi96-abstract.html> http://www.stanford.edu/~engler/vcode-pldi.ps

<snip, blurb from site>

well, this does bring up the question of what the generally "best"
strategy is for general-purpose dynamic code generation.

personally, I chose to implement more or less all of the traditional
compiler stages in memory, since this seemed to make the most sense at the
time.

now, whether or not a textual assembler is too slow for practical use in a
JIT is debatable. I contend that it has not been much of an issue in my
case.

the vast majority of the time in my case tends to be spent in the main
compiler stages, particularly the preprocessor, parser, and upper compiler,
followed by the codegen.

admittedly, the situation might be different if the IL were generated and
JIT'ed at different times (rather than, as is typical, in series).

but, then again, I also do lookups via hashing (hell, maybe this is
"cheating" here or something).

so, rather than debating whether or not it is fast enough, it may be better
to debate the value added by using an assembler versus directly crafting
machine instructions in the codegen (or similar).

so, the questions would be:
how much cost is added or saved in writing the code generator?;
what is the relative cost of having written the assembler (in addition to
the codegen, ...);
how might this help or hurt portability?;
...

as for writing the code generator, I will assert that hand-crafting ASM is
much easier than working out the actual machine code, since at least in my
case I am much more familiar with ASM than with all of the possible opcode
encodings.
as well, using printf-like statements, a good number of opcodes can be
specified in the same space as would be needed for emitting the bytes
(although the Quake3 strategy of using hex strings would be an option).

the cost of writing the assembler likely consists of the cost of writing
logic for all of the potentially bizarre encoding rules of a given
architecture (thinking here partly of x86 and x86-64), as well as some basic
parsing logic, opcode listings, and (potentially) logic for serializing
and/or deserializing object modules (such as ELF or COFF).

however, if not done in the assembler, this complexity would likely have
been offloaded into the codegen.

in my case, some parts of my assembler are autogenerated via special-purpose
tools, which have also been adapted and used for other ISAs (such as JBC
and CIL), as well as in my x86 interpreter to help with the opcode
decoder (that part of the interpreter is essentially a somewhat modified
form of my disassembler logic).

admittedly, similar tools could be used to write parts of a codegen (which
produces raw machine code), although I personally suspect that this strategy
would be generally more complex.

as for portability:
well, it could be argued that one now has several components to retarget to
a new architecture: an assembler, which could require alterations and new
opcode listings in order to target a new architecture, as well as probably
alterations to the parsing (such as to more closely match the native ASM
syntax for a given target; well, it is that or GAS'ing everything...).

although, it is possible that a generic assembler could be written with
support for multiple archs, requiring the arch to be either inferred or
specified in the ASM code (as well as supporting multiple listings and
register sets, and possibly target-specialized instruction-encoding logic).
admittedly, this would add a little complexity to the assembler.

as well, the compiler may require that many or most of the particular
instruction fragments be modified.

I will contend, though, that it would probably still be easier to rewrite
chunks of textual ASM (which in many cases may amount to simply changing
mnemonics and other minor details) than it would be to rework the
particular binary sequences for the new arch (or possibly even rework the
entire structure given sufficiently different ISA encodings, such as moving
from a byte-oriented ISA such as x86 to a VLIW arch such as IA-64 or
similar).

all this may well matter more than the raw performance of the JIT backend.

people may disagree with me here, but I am more just speaking from my own
narrow experience.