Hello everybody.
Here comes the second round of inline asm discussion related to LDC,
the LLVM D Compiler.
Last time was about naked inline asm and the problems it poses for a
backend like LLVM.
Since revision 920 in our mercurial tree, naked inline asm support is
good enough that Don's Bigint code from Tango now works. This is a
great step forward...
I implemented it by using an LLVM feature that lets you insert raw
assembly at the module level, and modified the asm processor to
support generating that as well. It wasn't that big a job really,
though it isn't completely finished yet and still needs a lot of
testing of course ;)
Now Christian Kamm has also finished the last ABI / calling-convention
bits we were missing on x86-32 Linux. This naturally led us to try
out defining the "controversial" D_InlineAsm_X86 version identifier...
Now in Tango there's a bunch of code like the following (copied from
tango.math.IEEE.d):
    real ldexp(real n, int exp) /* intrinsic */
    {
        version(Really_D_InlineAsm_X86)
        {
            asm {
                fild exp;
                fld n;
                fscale;
                fstp ST(1), ST(0);
            }
        }
        else
        {
            return tango.stdc.math.ldexpl(n, exp);
        }
    }
This code assumes that the value of ST(0) is preserved after the asm
block ends and that the compiler simply inserts a return instruction
as appropriate.
This doesn't work with LLVM. For the function to be valid in codegen,
we must insert a return instruction in the LLVM IR after the block,
and the only value we can return there is an undefined value. This
kind of code usually works when the program isn't optimized. If
optimization is enabled, however, a caller of ldexp will most likely
notice that the return value is undefined or a constant, which gives
the optimizer a lot of freedom to do what it wants, breaking the way
the return value is received in the process.
This is almost exactly the same problem I had with naked inline asm,
and the only fix is to somehow generate an inline asm expression
(that's what LLVM has, rather than statements as in D) that produces
the right return value. Something a bit like:
    return asm { ... }
Since D has no way to express this directly, it means we would have to
analyze the inline asm and somehow capture the right registers etc.
This is not something I want to implement right now, if ever...
The LLVM people are not interested in adding some kind of feature to
allow this, since the inline asm expressions already suffice for
normal GCC (which has inline asm expressions) C/C++ code.
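For comparison, here is a minimal sketch of what such an asm expression looks like in GCC-style C, since that is the model LLVM follows. The function name ldexp_sketch and the portable fallback branch are assumptions added for illustration; they are not taken from Tango or LDC. The point is only that the output constraint names the register holding the result, so the compiler has a real value to return.

```c
#include <assert.h>

/* Hypothetical name: scales n by 2^exp, like the ldexp shown in the post. */
long double ldexp_sketch(long double n, int exp)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__)) \
    && (__LDBL_MANT_DIG__ == 64)
    long double result;
    /* "=t": the result is read from ST(0) after the asm.
       "0":  n is loaded into that same register beforehand.
       "u":  the scale factor is placed in ST(1).
       fscale leaves ST(1) on the stack; cleaning it up is left to the
       compiler, which also emits the actual return. */
    __asm__("fscale" : "=t"(result) : "0"(n), "u"((long double)exp));
    return result;
#else
    /* Portable fallback so the sketch still builds on other targets. */
    long double r = n;
    for (int i = 0; i < exp; ++i) r *= 2.0L;
    for (int i = 0; i > exp; --i) r /= 2.0L;
    return r;
#endif
}

/* Usage: scaling by a power of two only changes the exponent, so the
   comparisons below are exact. */
void ldexp_sketch_demo(void)
{
    assert(ldexp_sketch(1.5L, 4) == 24.0L);
    assert(ldexp_sketch(8.0L, -3) == 1.0L);
}
```

Unlike the D version, nothing here relies on a register surviving past the end of the asm block: the value leaves the asm through a declared output.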
Now the real question is: does this code even have well-defined
semantics in terms of the D spec? And if not, could we possibly
specify it as implementation-specific behaviour?
Everything is in place to specify the D_InlineAsm_X86 version
identifier in LDC, but a lot of asm still isn't going to work, for
reasons like this.
I hope to hear some feedback on how to move on from here.
Thank you all,
Tomas Lindquist Olsen and the LDC Team.

Is the inline assembling actually done by the LLVM back end, or the LDC
front end?

The frontend turns it into a GCC-style asm statement (with explicit
input and output constraints) that shows up as a function literal in the
IR (only valid as the target of a 'call' instruction).
The LLVM codegen then uses those constraints to allocate registers,
substitutes them into the asm string and emits it directly to the
assembler as part of the output[1]. (LLVM, like GCC, normally uses an
external assembler.)
[1]: With a few exceptions, IIRC. For example, some part of LLVM turns
single "bswap"s into an intrinsic llvm.bswap.i<bitsize>() call to help
analyses, optimizers and the JIT since they don't generally support
inline asm otherwise.

Is it really that hard? Can't you just detect this case (non-void
function without a 'return' at the end but with inline asm inside)?
Since the compiler should know the calling convention[1], the register
that will contain the return value of the function should be a simple
lookup (based on target architecture, cc and return type).
Just add that register as an output of the inline asm and return it...
It gets a bit trickier with things like
-----
if (cpu.hasFeatureX())
    asm { ... }
else
    asm { ... }
-----
of course, but storing the value of the register in question into a
hidden variable and returning its value at the end shouldn't be that hard...
In other words, change every inline asm in a qualifying function to add
an output of the "return register", store its value into an alloca'd
stack slot and load & return it at the end of the function.
[1]: Given that LLVM normally handles this, this probably requires an
extra lookup table in LDC that needs to be kept up-to-date.
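As a rough illustration of the hidden-variable idea, here is a sketch in GCC-style C rather than in LLVM IR. Every asm block declares EAX as an output, the value is stored into one local slot, and that slot is returned at the end. The names has_feature_x, pick_value and ret_slot are hypothetical, and the constants are placeholders for whatever the real asm would compute.

```c
#include <assert.h>

/* Hypothetical stand-in for a check such as cpu.hasFeatureX(). */
int has_feature_x(int flag) { return flag != 0; }

int pick_value(int flag)
{
    int ret_slot = 0;   /* plays the role of the hidden stack slot */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    if (has_feature_x(flag))
        /* "=a" says: after this asm, read the result from EAX. */
        __asm__("movl $42, %%eax" : "=a"(ret_slot));
    else
        __asm__("movl $7, %%eax" : "=a"(ret_slot));
#else
    /* Fallback for other targets, same observable behaviour. */
    ret_slot = has_feature_x(flag) ? 42 : 7;
#endif
    return ret_slot;    /* one load and return at the end of the function */
}
```

With even light optimization the slot should disappear again, leaving the value in EAX right up to the return.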

dmd doesn't attempt to figure out which register is the return value. It
just assumes that the registers specified by the ABI for the function's
return type have the proper return value in them.

That isn't an option for LDC, which is why I suggested another approach.

What's the difference? Walter's approach assumes there's a "return EAX;"
at the end of every function returning an int, for example; your
approach seems to be to add it.

His approach depends on DMD directly emitting x86 machine code, so it
can just emit 'RET' and be done with it.
LDC on the other hand needs to emit LLVM asm, which requires it to
specify an explicit return value. My approach is a way to extract that
return value from the inline asm, allowing it to emulate DMD behavior
within the LLVM IR.

I had really hoped I wouldn't have to do something like this, but I
can't come up with a better approach. I just hope it actually works
when I'm done ...
Also, I have no idea whether code quality is going to be optimal. I
imagine people write code like this for efficiency. If LLVM adds extra
instructions, there is little point in writing code like this for LDC,
and we'd want to version things in any case, providing a true naked
version for LDC. In that case I'm not sure it's worth actually doing
this work in the first place.

The only reason a function like this isn't written as naked is so that
it has a chance to be inlined. If that's impossible with this syntax on
all compilers, there doesn't seem to be much point; it might as well be
illegal.
If D provided a "return EAX,EDX;" fake asm instruction, would inlining
be possible?

The approach Fritz mentions should still allow inlining. Having a fake

Why do people keep performing s/s/z/ on my name? :(

Clearly, changing your name iz the eaziezt zolution.
--
Simen

The European Commission have just announced an agreement
whereby English will be the official language of the EU, rather than
German, which was the other possibility. As part of the
negotiations, Her Majesty's government conceded that English spelling
had some room for improvement and has accepted a five year phase in plan
that would be known as "EuroEnglish".
In the first year, "s" will replace the soft "c". Sertainly, this
will make the sivil servants jump for joy. The hard "c" will be dropped
in favour of the "k". This should klear up konfusion and keyboards
kan have 1 less letter.
There will be growing publik enthusiasm in the sekond year, when
the troublesome "ph" will be replaced with the "f". This will make
words like "fotograf" 20% shorter.
In the third year, publik akseptanse of the new spelling kan
be expekted to reach the stage where more komplikated changes are
possible. Governments will enkorage the removal of double letters,
which have always ben a deterent to akurate speling. Also, al wil agre
that the horible mes of the silent "e"s in the language is disgraseful,
and they should go away.
By the 4th year, peopl wil be reseptiv to steps such as replasing
"th" with "z" and "w" with "v".
During ze fifz year, ze unesesary "o" kan be dropd from
vords kontaining "ou" and similar changes vud of kors be aplid
to ozer kombinations of leters. After zis fifz year, ve vil hav a
realy sensibl riten styl. Zer vil be no mor trubls or difikultis and
evrivun vil find it ezi to understand each ozer
ZE DREAM VIL FINALI KUM TRU!
<http://lib.ru/ENGLISH/rekonstr.txt>

His approach depends on DMD directly emitting x86 machine code, so it
can just emit 'RET' and be done with it.
LDC on the other hand needs to emit LLVM asm, which requires it to
specify an explicit return value. My approach is a way to extract that
return value from the inline asm, allowing it to emulate DMD behavior
within the LLVM IR.

Ok, so why not, for a function that returns an int, simply have the
compiler silently tack on the LLVM equivalent of "return EAX"?

I've created bugzilla 2648 for one of the issues -- struct returns are a
particular nuisance. (I think you may encounter this on DMD-Mac).

Because LLVM doesn't allow specification of hardware registers in the
IR. Everything must be a virtual register. The way I proposed LDC
implement this is basically to tell the inline-asm IR "Put EAX in
%virtual-eax" and to then return that register.
It will in all likelihood have the same effect though, assuming a tiny
bit of optimization and a minimally competent register allocator.

LDC on the other hand needs to emit LLVM asm, which requires it to specify
an explicit return value. My approach is a way to extract that return
value from the inline asm, allowing it to emulate DMD behavior within the
LLVM IR.

Sorry, perhaps I'm missing something: Why should you have to deduce that
from the asm? Doesn't the function prototype give enough information? If the
function returns "int/uint/...", assume "eax"; if it returns
"float/double/...", assume "st(0)", etc.
L.

LLVM IR doesn't know about hardware registers, except when dealing with
inline asm. So if you need to know the value a hardware register has at
the end of some inline asm, you need to tell that asm to "return" it
into a virtual register that you can actually use in regular IR (such as
returning it from a function).

I think I might just sortof maybe kinda understand the problem now.
So I take some of many options and consider the consequences:
- Don't put any return statement into the IR. EAX/st(0)/etc has the
return value so don't bother. Consequence: LLVM errors, there MUST be
a return value represented in IR.
- Put some stupid return statement into the IR, like return 0.
Consequence: Programmer places result into EAX. LLVM generated code
places 0 into EAX. Function always returns 0. Whoops.
- Mark the function as returning void. Now we don't need to put a
return into the IR. Consequence: User writes something like auto bar =
foo();. foo contains inline ASM. But what is assigned to bar in the IR
code? There is no way to tell the IR to assign a value from
EAX/st(0)/whatever. EAX is set to the correct value when foo()
returns, but there is no way to USE it, so it just floats around
uselessly until it is overwritten by something else.
- Casting fun. So the function returns an integer. Well, return a
floating point value NaN instead. EAX still gets set to the correct
value due to the inline ASM. Consequences: Similar problem as before:
int bar = foo(); violates type safety since we've rewritten foo() so
that LLVM thinks it returns a float. I don't know if the IR has any
loopholes, but if it does then maybe there is some snowball's chance in
hell of making it work anyways.
I hope I understand this correctly. It seems like the problem at hand
is difficult to communicate and thus stomps useful dialog :(

That seems to be a pretty good summary of what's wrong with most of the
alternatives, yes.
You missed one though, that Lindquist mentioned: they could also return
a special "undefined value" (which LLVM supports, and means "I don't
care what it is") and the return value would (in practice) be whatever
was in the relevant register at the time *if no optimizations are run*.
The problem is that optimizations can see "Hey, that function only ever
returns one value (or returns either a normal value or an undefined
value)" and replace every use of the return value with that one value.
This would break the asm + ret undef, yet be a perfectly valid
optimization according to the semantics of LLVM IR.
Luckily, inline asm is treated as a function literal in LLVM, and it can
return one or more values to the caller if the constraints string
specifies which registers will contain them. So if LDC just specifies
(e.g.) EAX/EDX:EAX/ST(0) to contain the result of the inline asm, it can
get the value in the register(s) in question as an LLVM value that can
be returned without any problem.
The only really tricky bits are (a) figuring out how the constraints
string works, exactly[1] and (b) figuring out which register(s) the
return value should be in.
[1]: There's no documentation that I'm aware of (unless it was added
very recently) other than the LLVM(-GCC) source and llvm-gcc output when
compiling code containing extended asm (which is similar but not
identical to LLVM-style inline asm, and documented pretty well at
http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html). The similarity is
not an accident as the main requirement in the inline asm design for
LLVM was probably "support extended asm in llvm-gcc" :).
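To make point (b) concrete, the lookup could be as small as the table sketched below in C. The enum, the function name and the exact constraint strings are assumptions for illustration only: they cover just the three 32-bit x86 cases named above (EAX, EDX:EAX, ST(0)) and are not taken from LDC's source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of return types on 32-bit x86. */
enum ret_class { RET_VOID, RET_INT32, RET_INT64, RET_REAL };

/* Returns the output constraint to attach to the inline asm, or NULL when
   there is nothing to capture. The strings are written in LLVM's
   "={register}" style and are assumed, not verified against LDC. */
const char *return_constraint(enum ret_class rc)
{
    switch (rc) {
    case RET_INT32: return "={ax}";  /* result expected in EAX */
    case RET_INT64: return "=A";     /* result expected in EDX:EAX */
    case RET_REAL:  return "={st}";  /* result expected in ST(0) */
    default:        return NULL;     /* void: no output needed */
    }
}
```

A real table would also need an axis for target architecture and calling convention, which is where the maintenance cost comes from.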

You understand the LLVM better than anyone else here <g>, so I suggest
that you pick what you think will work best, and leave it at that for
now. I don't think there's a good reason to get too stuck on this. If a
better solution emerges during testing, it can be corrected.
For example, in my work on the OSX version of dmd, it turns out that OSX
requires that the stack be aligned on 16 bytes whenever a function gets
called. (If it isn't so aligned, the program crashes with a misaligned
stack fault exception.) This naturally affects all 'naked' inline
assembly, as well as all function calls made from inline assembly. I
don't think there's any hope for the compiler automatically fixing this,
and more importantly, the compiler *should not* automatically fix it.
When you use inline assembler, you've got to expect that it won't be
very portable.
I've gone through and corrected all the inline assembler in Phobos for
this. There isn't much of it, and the fixes aren't difficult. It just
comes with the territory of using inline assembler.
Another ABI difference is that on Windows, reals take up 10 bytes. On
Linux, it's 12. On OSX, it's 16. The hardware operands still take up
only 10; the rest is padding.
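The padding can be checked from C on a given platform with a sketch like the one below. It assumes that C's long double is the same x87 80-bit type as D's real on that target, which is true for the common x86 toolchains but is not guaranteed everywhere; the function name is hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* 80 bits of x87 extended precision are 10 bytes of actual data. */
enum { X87_DATA_BYTES = 10 };

/* Returns how many padding bytes the ABI adds to an 80-bit real, or -1
   when long double is not the x87 extended format on this target. */
int real_padding_bytes(void)
{
#if LDBL_MANT_DIG == 64
    return (int)sizeof(long double) - X87_DATA_BYTES;
#else
    return -1;
#endif
}
```

Inline asm that indexes into arrays of reals has to use the padded size, which is one more reason such code needs per-platform versions.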
What is expected from the inline assembler, however, is that the syntax
of the instructions remains the same, and it looks like you've got that
one nailed.

You understand the LLVM better than anyone else here <g>, so I suggest

I just happened to remember the broad strokes of how inline asm works in
LLVM[1], and saw the relevance to this discussion. I'm not sure if that
qualifies as knowing LLVM as a whole better than anyone else here.
[1]: i.e. "like gcc extended asm, but with different syntax"

that you pick what you think will work best, and leave it at that for
now. I don't think there's a good reason to get too stuck on this. If a
better solution emerges during testing, it can be corrected.

Lindquist says (on IRC) that he has already implemented this on his
local machine and it's working quite nicely (for x86).
He hasn't pushed it to the dsource repository yet; it requires people
compiling from there to update their LLVM since apparently with LLVM 2.4
the EDX:EAX 64-bit int constraint doesn't work. Updating to the 2.5
branch fixes it though, so it looks like LDC will be requiring v2.5 soon.
Porting to other architectures shouldn't be too hard either. The code
that generates the constraint string for x86 is pretty clean and short
so adding (e.g.) x86-64 support should only require a bit of
copy+paste+edit after some research into calling conventions.

The approach Fritz mentions should still allow inlining. Having a fake
asm instruction like that could make it a bit simpler to implement
this though, since it would be up to the programmer to know the ABI,
not our asm translator frontend. Otherwise it seems to me to be the
same thing really.
At the moment, LDC won't inline anything containing inline asm, but
this restriction could be loosened a bit. The reason we disable
inlining right now is that if the asm contains labels and the
function is inlined, LLVM doesn't rewrite the labels, so you might
get conflicting labels when you get to assembling. I asked on the
LLVM IRC channel about this, but it's probably not going to be
fixed. The argument was that GCC has the same restriction for extended
inline asm expressions: if you use labels, you must also manually mark
the function with a never-inline function attribute. This might change
when LLVM gets its own assembler, I guess.
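For what it's worth, one way C programmers sidestep the clash with the GNU assembler is to use numeric local labels, which may be defined any number of times in one file. The sketch below shows that convention; it is an aside under my own assumptions (the function name sum_to is made up, and nobody on the LLVM channel proposed this), not something LDC's asm translator does.

```c
#include <assert.h>

/* Hypothetical example: adds n + (n-1) + ... + 1 with a loop in asm. */
int sum_to(int n)
{
    int sum = 0;
    if (n < 0)
        return 0;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "1:" and "2:" are numeric local labels; "1b" means the nearest "1"
       looking backward and "2f" the nearest "2" looking forward, so a
       second copy of this asm in the same file does not conflict. */
    __asm__("1:\n\t"
            "testl %1, %1\n\t"
            "jz 2f\n\t"
            "addl %1, %0\n\t"
            "decl %1\n\t"
            "jmp 1b\n"
            "2:"
            : "+r"(sum), "+r"(n)
            :
            : "cc");
#else
    /* Fallback for other targets. */
    while (n > 0)
        sum += n--;
#endif
    return sum;
}
```

D's inline asm uses named labels, so this would only help if the translator renamed them on the way out.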
Another thing is that if inlining is the main reason for functions
like these, perhaps it would be better to somehow get this
optimization into LLVM itself? There is already a pass that tries to
lower common C library function calls...
Yet another thing about inlining with LDC is that the DMD inliner is
currently disabled. Some of the AST rewrites it does broke our
codegen last time I tried, and we simply haven't tried turning it back
on since. This means that LDC will only inline when it has access to
an LLVM IR representation of the function. Basically, only functions
from the same module will be inlined, or template functions, which
are always emitted. This is going to change once we get proper LTO
support into LDC. For now people can still compile to .bc files
instead of .o and link manually using LLVM tools to get this feature,
so it's not that critical imho.
I guess I'll investigate how much LLVM can help with providing me the
register details to implement something that works automagically... It
just feels wrong to have to duplicate all that information...
</end rant>

The approach Fritz mentions should still allow inlining. Having a
fake

Oh come now, Jaret.
I distinctly remember forgetting an 'r', so I'm not part of /that/ group! :D
The extra 'r' and 't' are superfluous anyway. They don't make any useful
sounds. Although, there is the slight possibility that they affect syllable
stress, I suppose. ;D
-JJR