Re: [Sbcl-devel] Contributing, peephole optimizer

On 30 August 2010 16:46, Nathan Froyd <froydnj@...> wrote:
> However, if the above instructions come from VOPs that don't match
> that pattern, then we can't usefully combine them.
Right -- and I think running optimizations across VOP borders is
potentially a big win.
Though the last time I looked, IIRC most of the optimizable sequences
coming from multiple vops require constructing a flow-graph -- which
in turn requires annotating instructions with side-effects. Of course
that doesn't have to be a huge one-time effort: the default for
unannotated instructions can be "don't know, so don't assume
anything".
...but I would leave that till local sequences that don't need
liveness information have been implemented. :)
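A minimal sketch of the "don't know, so don't assume anything" default, in Python rather than Lisp to keep it short. The annotation table, the `Effects` record, and the destination-first operand convention are all assumptions made for illustration, and only register operands are modelled; none of this is existing SBCL code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effects:
    reads: frozenset
    writes: frozenset
    unknown: bool = False  # True means "may read or write anything"


# Conservative default used for every instruction without an annotation.
UNKNOWN = Effects(frozenset(), frozenset(), unknown=True)

# Hypothetical annotation table keyed by mnemonic (destination operand first).
ANNOTATIONS = {
    "mov": lambda dst, src: Effects(frozenset({src}), frozenset({dst})),
    "add": lambda dst, src: Effects(frozenset({dst, src}),
                                    frozenset({dst, "flags"})),
}


def effects_of(mnemonic, *operands):
    """Look up side effects, falling back to UNKNOWN when unannotated."""
    annotate = ANNOTATIONS.get(mnemonic)
    if annotate is None:
        return UNKNOWN
    return annotate(*operands)


def may_interfere(a, b):
    """True unless both instructions are annotated and provably independent."""
    if a.unknown or b.unknown:
        return True
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)
```

With this shape, annotations can be added one instruction at a time: anything not yet in the table simply blocks reordering or combining across it.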
Anyways, Red, hugely cool that you are working on this.
Cheers,
-- Nikodemus

Thread view

Hi,
I've been working on implementing a peephole optimizer for SBCL.
(http://sbcl-internals.cliki.net/Peephole%20Optimizer)
I've done a lot of Google searching for 'prior art' with regard to this
optimization, but I couldn't find much other than that it was supposed to be
not easy.
So far it is going pretty well, as I've got hooks in the code to buffer up
all of the assembly (as structs used by the 'scheduler', 'label' structs,
and functions that do the alignment) and then emit it to a code-vector
later.
This allows peephole optimization according to option 1 of the above link.
It was pretty straightforward after spending a couple of days figuring out
how things work in python...
The (extremely) preliminary version of the optimizer that I have does the
optimization mentioned in number 3 (seems to catch it about 400 times
compiling SBCL from source). It compiles fully (using vanilla 1.0.40 sbcl as
bootstrap and bootstrapping itself) and it seems to pass all of the tests in
the tests directory.
I am still working on an elegant interface to pattern matching (I'm thinking
a 'unification' style function is the way to go?) and the actual 'sliding
peephole window'... thing.
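A rough sketch of what a 'unification' style matcher could look like, in Python for brevity. Strictly it is one-way matching (only the pattern holds variables), which is the simple case of unification. The tuple representation of instructions and the `?name` variable syntax are assumptions for illustration, not anything taken from the optimizer itself.

```python
def is_var(term):
    """Pattern variables are strings starting with '?' (assumed syntax)."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, insn, bindings):
    """Match one pattern tuple against one instruction tuple.

    Returns an extended copy of `bindings`, or None on mismatch.
    """
    if len(pattern) != len(insn):
        return None
    out = dict(bindings)
    for p, x in zip(pattern, insn):
        if is_var(p):
            # A variable must bind to the same value everywhere it appears.
            if out.setdefault(p, x) != x:
                return None
        elif p != x:
            return None
    return out


def match_window(patterns, window):
    """Match a sequence of patterns against an equally long window."""
    if len(patterns) != len(window):
        return None
    bindings = {}
    for pattern, insn in zip(patterns, window):
        bindings = unify(pattern, insn, bindings)
        if bindings is None:
            return None
    return bindings
```

The returned bindings are what a rewrite rule would use to build the replacement sequence; the sliding window is then just a loop calling `match_window` at each position.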
I have a few questions about the process of contributing, (once it is a bit
further along):
1.) How do I contribute? The project page (link to sourceforge) on
sbcl.org is a dead link for me. "This
page has been deprecated and is now controlled by the consume team". I was
lucky that I was already subscribed to the SBCL-devel mailing list in my
gmail, as I had difficulty finding it on the site. I expect that the
procedure is to fork the main branch and merge in my changes, and that they
then get considered for addition? Where is the main branch?
It obviously needs a lot more work to be included, but I'd like to know
where to put the repository for downstream synchronization/bus-insurance.
2.) It seems like these types of changes are pretty important to really,
really test thoroughly.
(judging by the number of times I have landed myself in LDB working on this
stuff).
Is there a list of commonly-used applications that I could run to test the
compiler?
(I was thinking I would try MAXIMA for starters).
3.) These changes would be applicable to all architectures (I think), I've
made the changes only within the generic sections of the compiler (with the
notable exception that the optimizations are platform specific). I only have
an x86_64 machine currently, I can test regular x86 on it, but I cannot
test, for example, SPARC or MIPS on it. Is there a procedure for doing this?
(An emulator, perhaps?).
That's it for now, I think
Thanks,
-Jon

On Thu, Aug 26, 2010 at 10:46 PM, Jonathan Smith
<jonathansmith415@...> wrote:
> Hi,
> I've been working on implementing a peephole optimizer for SBCL.
> (http://sbcl-internals.cliki.net/Peephole%20Optimizer)
Wow, that's cool. I think people would be interested and perhaps even
provide some feedback if you publish what you have even if it's
preliminary.
> I have a few questions about the process of contributing, (once it is a bit
> further along):
> 1.) How do I contribute? The project page (link to sourceforge) on sbcl.org
> is a dead link for me. "This page has been deprecated and is now controlled
> by the consume team". I was lucky that i was already subscribed to the
> SBCL-devel mailing list in my gmail, as I had difficulty finding it on the
> site. I expect that the procedure is to fork the main branch and merge in my
> changes, and that they then get considered for addition? Where is the main
> branch?
The official cvs is on sourceforge.
http://sourceforge.net/projects/sbcl/develop
The link works for me, in case it still doesn't work for you:
cvs -d:pserver:anonymous@...:/cvsroot/sbcl login
cvs -z3 -d:pserver:anonymous@...:/cvsroot/sbcl co -P sbcl
That said, most devs use the git gateway these days:
git://sbcl.boinkor.net/sbcl.git
http://sbcl.boinkor.net/git/sbcl.git
> 2.) It seems like these types of changes are pretty important to really,
> really test thoroughly.
> (judging by the number of times I have landed myself in LDB working on this
> stuff).
> Is there a list of commonly-used applications that I could run to test the
> compiler?
> (I was thinking I would try MAXIMA for starters).
Anything with an extensive test suite is valuable. CL-PPCRE comes to
mind. Still, I think it may happen that an innocent transformation
changes semantics slightly by for instance optimizing away a piece of
code that acted as a memory barrier. Maybe there are less contrived
examples. Anyway, it would be nice to have a way to quickly glance
through differences in the assembly for a set of functions (maybe
functions that belong to a particular package).
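One way to get that quick glance, sketched in Python with the standard `difflib` module. It assumes the disassembly of each function has already been captured as text from a baseline build and from a peephole build, with addresses stripped so that unchanged code compares equal; how the listings are collected is left open here.

```python
import difflib


def diff_listings(before, after, name="function"):
    """Return a unified diff between two disassembly listings (strings)."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"{name} (baseline)",
            tofile=f"{name} (peephole)",
        )
    )


def changed_functions(baseline, optimized):
    """Map function name -> diff text, only for listings that changed.

    Both arguments are dicts of function name -> listing text.
    """
    diffs = {}
    for name in sorted(baseline.keys() & optimized.keys()):
        text = diff_listings(baseline[name], optimized[name], name)
        if text:  # empty string means the listings are identical
            diffs[name] = text
    return diffs
```

Restricting the input dicts to the functions of one package gives the per-package view mentioned above.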
> 3.) These changes would be applicable to all architectures (I think), I've
> made the changes only within the generic sections of the compiler (with the
> notable exception that the optimizations are platform specific). I only have
> an x86_64 machine currently, I can test regular x86 on it, but I cannot
> test, for example, SPARC or MIPS on it. Is there a procedure for doing this?
You may want to get an account on the gcc compile farm (as some of us did):
http://gcc.gnu.org/wiki/CompileFarm
> (An emulator, perhaps?).
> That's it for now, I think
> Thanks,
> -Jon
Cheers,
Gabor

Gábor Melis <mega@...> writes:
> On Thu, Aug 26, 2010 at 10:46 PM, Jonathan Smith
> <jonathansmith415@...> wrote:
>> 3.) These changes would be applicable to all architectures (I think), I've
>> made the changes only within the generic sections of the compiler (with the
>> notable exception that the optimizations are platform specific). I only have
>> an x86_64 machine currently, I can test regular x86 on it, but I cannot
>> test, for example, SPARC or MIPS on it. Is there a procedure for doing this?
>
> You may want to get an account on the gcc compile farm (as some of us did):
>
> http://gcc.gnu.org/wiki/CompileFarm
>
>> (An emulator, perhaps?).
For most architectures you could also probably use qemu if you prefer to
work locally. However, the compile farm will most likely be more efficient.
Regards
Christoph
--
9FED 5C6C E206 B70A 5857 70CA 9655 22B9 D49A E731
Debian Developer | Lisp Hacker | CaCert Assurer
A. Because it breaks the logical sequence of discussion
Q. Why is top posting bad?

2010/8/27 Gábor Melis <mega@...>
> On Thu, Aug 26, 2010 at 10:46 PM, Jonathan Smith
> <jonathansmith415@...> wrote:
> > Hi,
> > I've been working on implementing a peephole optimizer for SBCL.
> > (http://sbcl-internals.cliki.net/Peephole%20Optimizer)
>
> Wow, that's cool. I think people would be interested and perhaps even
> provide some feedback if you publish what you have even if it's
> preliminary.
>
>
Okay, I have put a TGZ of my current folder up on Sourceforge.
It is from a fork of sbcl 1.0.40
https://sourceforge.net/projects/sbcl-peep-opt/files/
The majority of the changes are in codegen.lisp and assem.lisp in the
compiler.
Currently it compiles with a *really, really* dumb implementation of a
single optimization.
There is also a file called 'peephole.lisp' which is intended to eventually
be used as the compile-time pattern matcher.
It is also in the compiler folder, but is not loaded during the compile
(mostly because of not-doneness).
----
Pretty much all we do is buffer assembly instruction structures, label
structures, and alignments (as lambdas), into lists,
and then using a closure of the 'instruction (or label) emitter' which is
contained within the structure, we can emit them to a segment at any
point...
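To make the buffering idea concrete, here is a small sketch in Python rather than Lisp. Every name in it (`Segment`, `instruction`, `label`, `align`, `flush`) is hypothetical and only mirrors the description above: each buffered item carries a closure that knows how to emit itself, and nothing reaches the segment until the buffer is flushed.

```python
class Segment:
    """Stand-in for the output code vector."""

    def __init__(self):
        self.bytes = bytearray()
        self.labels = {}


def instruction(name, encoding):
    def emit(segment):
        segment.bytes.extend(encoding)
    return {"kind": "instruction", "name": name, "emit": emit}


def label(name):
    def emit(segment):
        # A label resolves to the offset reached at emission time.
        segment.labels[name] = len(segment.bytes)
    return {"kind": "label", "name": name, "emit": emit}


def align(boundary, pad=0x90):
    def emit(segment):
        # Pad with x86 NOP bytes up to the requested boundary.
        while len(segment.bytes) % boundary:
            segment.bytes.append(pad)
    return {"kind": "align", "name": f"align {boundary}", "emit": emit}


def flush(buffer, segment, optimize=lambda items: items):
    """Run an optional pass over the buffered items, then emit them in order."""
    for item in optimize(buffer):
        item["emit"](segment)
    return segment
```

Because labels and alignment are resolved only at flush time, a pass that deletes or rewrites buffered instructions does not have to patch offsets afterwards.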
----
The main thing that I am having difficulty understanding now is how exactly
TNs map onto machine registers, memory locations, etc.
I have a general idea, but it seems that in certain situations, my
expectations are violated.
It is possible that using TNs as the basis for peephole optimization could
lead to a more powerful (different?) version of peephole optimization than
standard regex pattern matching (You may have information about what
registers are dead, and when, for example).
The other issue, is that this is my first foray into assembly language
programming,
so I am not exactly full of insight as to what optimizations are actually
optimizations...
----
I apologize for the primitive delivery of the code, I will see about setting
up something with git or CVS later today,
and possibly merge it with a more recent version of SBCL.
> > I have a few questions about the process of contributing, (once it is a bit
> > further along):
> > 1.) How do I contribute? The project page (link to sourceforge) on sbcl.org
> > is a dead link for me. "This page has been deprecated and is now controlled
> > by the consume team". I was lucky that i was already subscribed to the
> > SBCL-devel mailing list in my gmail, as I had difficulty finding it on the
> > site. I expect that the procedure is to fork the main branch and merge in my
> > changes, and that they then get considered for addition? Where is the main
> > branch?
>
> The official cvs is on sourceforge.
>
> http://sourceforge.net/projects/sbcl/develop
>
> The link works for me, in case it still doesn't work for you:
>
> cvs -d:pserver:anonymous@...:/cvsroot/sbcl login
> cvs -z3 -d:pserver:anonymous@...:/cvsroot/sbcl co -P
> sbcl
>
> That said, most devs use the git gateway these days:
>
> git://sbcl.boinkor.net/sbcl.git
> http://sbcl.boinkor.net/git/sbcl.git
>
>
> > 2.) It seems like these types of changes are pretty important to really,
> > really test thoroughly.
> > (judging by the number of times I have landed myself in LDB working on this
> > stuff).
> > Is there a list of commonly-used applications that I could run to test the
> > compiler?
> > (I was thinking I would try MAXIMA for starters).
>
> Anything with an extensive test suite is valuable. CL-PPCRE comes to
> mind. Still, I think it may happen that an innocent transformation
> changes semantics slightly by for instance optimizing away a piece of
> code that acted as a memory barrier. Maybe there are less contrived
> examples. Anyway, it would be nice to have a way to quickly glance
> through differences in the assembly for a set of functions (maybe
> functions that belong to a particular package).
>
> > 3.) These changes would be applicable to all architectures (I think), I've
> > made the changes only within the generic sections of the compiler (with the
> > notable exception that the optimizations are platform specific). I only have
> > an x86_64 machine currently, I can test regular x86 on it, but I cannot
> > test, for example, SPARC or MIPS on it. Is there a procedure for doing this?
>
> You may want to get an account on the gcc compile farm (as some of us did):
>
> http://gcc.gnu.org/wiki/CompileFarm
>
> > (An emulator, perhaps?).
> > That's it for now, I think
> > Thanks,
> > -Jon
>
> Cheers,
> Gabor
>
I forgot about the GCC compiler farm, that will certainly do the trick.
I'll sign up for it.
Thanks,
-Jon

On Sat, Aug 28, 2010 at 1:56 PM, Jonathan Smith
<jonathansmith415@...> wrote:
> Okay, I have put a TGZ of my current folder up on Sourceforge.
>
> It is from a fork of sbcl 1.0.40
> https://sourceforge.net/projects/sbcl-peep-opt/files/
Next time a diff would be preferable. Even if it's against an old
version (so long as the version is not ancient).
> The main thing that I am having difficulty understanding now is how exactly
> TNs map onto machine registers, memory locations, etc.
> I have a general idea, but it seems that in certain situations, my
> expectations are violated.
TNs represent machine registers and memory locations (such as stack
slots). It's hard to help you with your understanding/expectations
since you haven't explained exactly what they are.
> It is possible that using TNs as the basis for peephole optimization could
> lead to a more powerful (different?) version of peephole optimization than
> standard regex pattern matching (You may have information about what
> registers are dead, and when, for example).
Certainly. There are peephole patterns that are only valid if you
know that certain registers are dead after the sequence, for example.
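As an illustration of such a liveness-guarded pattern, here is a sketch in Python. The rule, the tuple format, and the `live_after` set are assumptions chosen for the example (destination operand first, register operands only); where the liveness information actually comes from is not addressed.

```python
def forward_copy(first, second, live_after):
    """Rewrite `mov tmp, src; mov dst, tmp` to `mov dst, src`.

    Only valid when `tmp` is dead after the pair. `live_after` is the set
    of registers still live at that point, assumed to be supplied by the
    caller. Returns the replacement list, or None if the rule does not apply.
    """
    op1, tmp, src = first
    op2, dst, via = second
    if op1 != "mov" or op2 != "mov":
        return None
    if via != tmp or dst == tmp:
        return None
    if tmp in live_after:
        return None  # something later still reads tmp, so the first mov stays
    return [("mov", dst, src)]
```

Without the `live_after` check the rewrite would silently drop the value left in `tmp`, which is exactly the kind of bug plain textual pattern matching cannot rule out.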
It's encouraging to see this work done. I personally would have done
it slightly differently; I know there are people who favored your
approach. However it gets done, it will be nice to groan slightly
less when looking at DISASSEMBLE output. :)
-Nathan

On Sat, Aug 28, 2010 at 3:07 PM, Nathan Froyd <froydnj@...> wrote:
> On Sat, Aug 28, 2010 at 1:56 PM, Jonathan Smith
> <jonathansmith415@...> wrote:
> > Okay, I have put a TGZ of my current folder up on Sourceforge.
> >
> > It is from a fork of sbcl 1.0.40
> > https://sourceforge.net/projects/sbcl-peep-opt/files/
>
> Next time a diff would be preferable. Even if it's against an old
> version (so long as the version is not ancient).
>
>
Ok, I will do it that way next time!
> > The main thing that I am having difficulty understanding now is how exactly
> > TNs map onto machine registers, memory locations, etc.
> > I have a general idea, but it seems that in certain situations, my
> > expectations are violated.
>
> TNs represent machine registers and memory locations (such as stack
> slots). It's hard to help you with your understanding/expectations
> since you haven't explained exactly what they are.
>
>
Right, that is pretty much my understanding up until this point.
I meant to indicate that I just need to read more of the source code.
> > It is possible that using TNs as the basis for peephole optimization could
> > lead to a more powerful (different?) version of peephole optimization than
> > standard regex pattern matching (You may have information about what
> > registers are dead, and when, for example).
>
> Certainly. There are peephole patterns that are only valid if you
> know that certain registers are dead after the sequence, for example.
>
> It's encouraging to see this work done. I personally would have done
> it slightly differently; I know there are people who favored your
> approach. However it gets done, it will be nice to groan slightly
> less when looking at DISASSEMBLE output. :)
> -Nathan
>
This seemed like the 'least resistance' approach to me, and I admit it is
kind of a hack at this stage.
What would be your preferred alternative?

On Sat, Aug 28, 2010 at 7:30 PM, Jonathan Smith
<jonathansmith415@...> wrote:
>> It's encouraging to see this work done. I personally would have done
>> it slightly differently; I know there are people who favored your
>> approach. However it gets done, it will be nice to groan slightly
>> less when looking at DISASSEMBLE output. :)
>
> This seemed like the 'least resistance' approach to me, and I admit it is
> kind of a hack at this stage.
>
> What would be your preferred alternative?
My alternative was to do the peephole matching on the VOPs themselves,
rather than the assembly emitted by the VOPs. I think this way is
slightly better because for the example:
mov [mem], reg
mov reg, [mem]
we'd only have to write the matcher once, rather than N times, once
for each backend. Assuming, of course, that those instructions come
from something like:
VOP MOVE #<memory TN> #<reg TN>
VOP MOVE #<reg TN> #<memory TN>
However, if the above instructions come from VOPs that don't match
that pattern, then we can't usefully combine them.
I can see pluses and minuses to each approach.
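For reference, the assembly-level version of the example above can be sketched in a few lines of Python. This is an illustration under assumptions, not either implementation discussed here: instructions are plain tuples with the destination operand first, memory operands are written in brackets, and the memory location is assumed to be ordinary memory rather than something another thread or device may change between the two instructions.

```python
def is_mem(operand):
    """Memory operands are written in brackets, e.g. '[rbp-8]' (assumed)."""
    return operand.startswith("[") and operand.endswith("]")


def drop_redundant_reload(insns):
    """Remove `mov reg, [mem]` when it directly follows `mov [mem], reg`."""
    out = []
    for insn in insns:
        if out:
            prev = out[-1]
            if (prev[0] == insn[0] == "mov"
                    and len(prev) == len(insn) == 3
                    and is_mem(prev[1])
                    and prev[1] == insn[2]    # same memory location
                    and prev[2] == insn[1]):  # same register
                continue  # the register already holds the stored value
        out.append(insn)
    return out
```

A label between the two instructions counts as its own item, so the pair is no longer adjacent and is left alone, which keeps the rewrite safe when the reload is a jump target.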
-Nathan
