http://llvm.org/bugs/show_bug.cgi?id=20049
Basically, when you have a closure in a closure and the whole
thing gets inlined, LLVM messes up, which results in the compiler
not being able to optimize the GC allocation away.
Probably worth pushing for. It probably affects other
functional languages as well, but I didn't check.
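
As a rough illustration (hypothetical D code, not the actual test case from
the bug report), the pattern in question looks something like this:

int outer()
{
    int a = 1;
    auto mid = ()
    {
        int b = 2;
        auto inner = () => a + b; // inner closure captures a and b
        return inner();
    };
    // Everything here is trivially inlinable, so ideally this collapses to
    // "return 3;" with no closure contexts allocated at all.
    return mid();
}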

Yeah, I did get that bit. I'm not sure of the optimisation though.
IMO, the closure/frame generation should occur *after* inlining.

How would that work if your inliner operates on some language-independent
IR?

I don't know LLVM well enough to comment. But GCC operates at a higher
level, so all the information is available to use (the inlined
function is just duplicated with all its parameters remapped into
variables, and the return expression is turned into an assignment to a
dedicated return-value variable).
Though the fact still is that the same is true of GDC: its IR is
generated before the optimisation passes run.
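
A rough sketch (in D, purely illustrative) of that style of high-level
inlining:

int square(int x) { return x * x; }

int caller() { return square(3); }

// After inlining, caller conceptually becomes:
int caller_inlined()
{
    int x = 3;          // parameter remapped to a local variable
    int result = x * x; // return expression turned into an assignment
    return result;      // dedicated return-value variable replaces the call
}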

You stated that closure/frame generation should occur after
inlining. I doubt that this is feasible to implement in the
current LDC architecture, and probably also in GDC (although I
don't know its internals well enough to be sure).
What we do in LDC, by the way, is just to optimize the closure GC
allocations into a stack allocation if we can prove the context
is not escaped after inlining. This happens in a custom
optimization pass on the IR level. deadalnix is presumably
talking about something very similar he is working on for SDC.
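
As a hypothetical before/after illustration (this is not LDC's actual pass,
which operates on LLVM IR, and apply() is just a made-up helper):

int apply(int delegate(int) dg) { return dg(3); }

int caller()
{
    int offset = 10;
    // Passing the delegate to apply() forces a GC-allocated closure context,
    // because the front end cannot prove the delegate does not escape.
    return apply(x => x + offset);
    // Once apply() has been inlined, the context provably stays local, so
    // the heap allocation can be demoted to a stack allocation.
}
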
David

Yes, but the problem is not limited to SDC. LDC exhibits the same
behavior (because it is an LLVM bug, not an SDC or LDC one).

Yes, certainly. To me, this looks like a limitation in GVN or so.
But coming back to the D side of things, do you have an actual D
test case showing the problem? The remaining load in your example
shouldn't be enough to trip up LDC's optimizer pass by itself,
but I'm rather certain that there might be more complex code with
missed optimization opportunities due to this.
David

Yeah, I did get that bit. I'm not sure of the optimisation though.
IMO, the closure/frame generation should occur *after* inlining.

That doesn't really work that way for LLVM. You generate
language-independent IR, and optimization passes run on it. The front end
can add passes of its own to the optimization pipeline to do
language-dependent optimizations.

That is the final goal. A first goal should be to get the code down to:
int *i = new int;
*i = 42;
return 42;
That first step is supposed to be done by the LLVM infrastructure itself
(and it does so for such a simple example, but once you have several
allocations, it gets confused). It is necessary because at that point the
language-specific pass will be able to detect that nobody ever reads from
the allocated memory and that it doesn't escape, so the allocation can be
optimized away.
If the first step does not happen, then the second step won't either, and
it cascades down to pretty stupid code generation.
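
Spelled out step by step (illustrative only, and assuming the original
body returned *i rather than the literal constant):

Original:
    int *i = new int;
    *i = 42;
    return *i;

Step 1, done by LLVM's generic passes (store-to-load forwarding):
    int *i = new int;
    *i = 42;
    return 42;

Step 2, done by the language-specific pass (the memory is never read and
never escapes, so the allocation and the store can be dropped):
    return 42;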

I just tried out doing something simple in gdc to see if I could
trigger this - I got the optimisation passes to compile it down to:
_d_allocmemory (16);
_d_allocmemory (16);
return 36;
Which is more than I expected... it managed to const-fold all the
operations into a single return; it just hasn't lost the (now) useless
GC allocations for the closures that were removed as dead code.