Thursday, April 30, 2009

So, according to our jit
plan we're mostly done with point 1, that is to provide a JIT that compiles
Python code to assembler in the most horrible manner possible but doesn't
break. That meant mostly 4 weeks of staring at GDB and megabytes of assembler
generated by C code generated from Python code. The figure of 4 weeks proves
that our approach is by far superior to psyco's, since Armin says it's
"only 4 weeks" :-)

Right now, PyPy compiled with the JIT can run the whole CPython test suite
without crashing, which means we're done with the obvious bugs and the only
ones waiting for us are the really horrible ones. (Or they really don't exist.
At least they should never be about obscure Python corner cases: they can
only be in the 10'000 lines of relatively clear code that is our JIT
generator.)

But... the fun thing is that we can actually concentrate on optimizations!
So the next step is to provide a JIT that is correct *and* actually speeds
up Python. Stay tuned for more :-)

Cheers,
fijal, armin & benjamin

UPDATE: for those of you blessed with no knowledge of C, GDB stands for the GNU Debugger, a classic debugger for C. (It's also much more powerful than the Python debugger, pdb, which is kind of surprising.)

Tuesday, April 21, 2009

First a disclaimer. This post is more about plans for the future than about
current status. We usually try to write about things that we have done, because
it's much, much easier to promise things than to actually make them happen,
but I think this is important enough to warrant some sort of roadmap.

In recent months we came to the point where the 5th generation of the
JIT prototype was working as nicely as,
or even a bit better than, the 1st one back in 2007. Someone might ask "so why
did you spend all this time without going forward?". And indeed, we spent
a lot of time moving sideways, but as posted, we also spent a lot of time
doing other things, which are important as well.
The main advantage of the current JIT incarnation is that it is much, much
simpler than the first one. Even I can comprehend it, which is quite an improvement :-)

So, the prototype is working and gives very nice speedups, in the range of 20-30x
over CPython. We're pretty confident this prototype will work and will
eventually produce a fast Python interpreter. So we decided that we'll now
work towards turning the prototype into something stable and solid. This
might sound easy, but in fact it's not. Having a stable assembler backend
and optimizations that preserve semantics is not as easy as it might sound.

The current roadmap, as I see it, looks as follows:

1. Provide a JIT that does not speed things up, but produces assembler without
optimizations turned on, that is correct and able to run CPython's library
tests on a nightly basis.

2. Introduce simple optimizations that should make the above JIT a bit faster
than CPython. With optimizations disabled the JIT produces incredibly dumb
assembler, which is slower than the corresponding C code, even with the
interpretation overhead removed (which is not very surprising).

3. Backport optimizations from the JIT prototype, one by one, keeping an eye
on how they perform and making sure they don't break anything.

4. Create new optimizations, like speeding up attribute access.

5. Profit.
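
To give a flavour of what "speeding up attribute access" can mean, here is a toy sketch (all names hypothetical, not PyPy's actual code) of the shared-structure idea: instances that gain attributes in the same order share a single layout object and keep their values in a flat list, so lookups become an index fetch instead of a dict probe.

```python
class Map:
    """A shared layout: maps attribute names to storage indexes."""
    def __init__(self):
        self.indexes = {}
        self.transitions = {}

    def index_of(self, name):
        return self.indexes.get(name, -1)

    def with_attribute(self, name):
        # reuse the same successor Map for the same attribute-addition
        # order, so similar instances end up sharing one layout
        if name not in self.transitions:
            new = Map()
            new.indexes = dict(self.indexes)
            new.indexes[name] = len(self.indexes)
            self.transitions[name] = new
        return self.transitions[name]

EMPTY_MAP = Map()

class Instance:
    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []

    def setattr(self, name, value):
        idx = self.map.index_of(name)
        if idx >= 0:
            self.storage[idx] = value
        else:
            self.map = self.map.with_attribute(name)
            self.storage.append(value)

    def getattr(self, name):
        idx = self.map.index_of(name)
        if idx < 0:
            raise AttributeError(name)
        return self.storage[idx]
```

Two instances built the same way share one Map, so their attribute layout is known statically, which is what a JIT can exploit.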

This way, we can hopefully provide a working JIT which gives a fast Python
interpreter, which is a bit harder than just a nice prototype.

Tell us what you think about this plan.

Cheers,
fijal & others.

The Leysin sprint is nearing its end; as usual, here is an attempt at a summary
of what we did.

Release Work

Large parts of the sprint were dedicated to fixing bugs. Since the easy bugs
seem to have been fixed long ago, those were mostly very annoying and hard bugs.
This work was supported by our buildbots, which we tried to get free of
test-failures. This was worked on by nearly all participants of the sprint
(Samuele, Armin, Anto, Niko, Anders, Christian, Carl Friedrich). One
particularly annoying bug was the differences in the tracing events that PyPy
produces (fixed by Anders, Samuele and Christian). Some details about larger
tasks are in the sections below.
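
For context, the "tracing events" in question are the ones delivered through sys.settrace, which debuggers such as pdb rely on; PyPy has to emit the same event sequence as CPython. A minimal sketch of what such a trace looks like:

```python
import sys

events = []

def tracer(frame, event, arg):
    # record (event, function name) pairs; a conforming interpreter
    # must deliver the same sequence as CPython for pdb etc. to work
    events.append((event, frame.f_code.co_name))
    return tracer

def add(a, b):
    return a + b

sys.settrace(tracer)
add(1, 2)
sys.settrace(None)

# events now holds ('call', 'add'), per-line 'line' events,
# and ('return', 'add') for the traced call
```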

Stackless

A large number of problems came from our stackless features, which do some
advanced things and thus seem to contain advanced bugs. Samuele and Carl
Friedrich spent some time fixing tasklet pickling and unpickling. This was
achieved by supporting the (un)pickling of builtin code objects. In addition
they fixed some bugs in the finalization of tasklets. This needs some care
because the __del__ of a tasklet cannot run at arbitrary points in time, but
only at safe points. This problem was a bit subtle to get right, and popped up
nearly every morning of the sprint in form of a test failure.

Armin and Niko added a way to restrict the stack depth of the RPython-level
stack. This can be useful when using stackless, because without it an infinite
recursion can fill your whole heap with stack frames. Then they went on to make
stackless not segfault when
threads are used at the same time, or if a callback from C library code is in
progress. Instead you now get a RuntimeError, which is not good but better
than a segfault.
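
The RPython-level mechanism is internal to the translated interpreter, but the closest user-visible analogue in plain CPython is the recursion limit, which likewise turns runaway recursion into an exception instead of exhausting the heap or the C stack. An illustrative sketch (in modern CPython the exception raised is RecursionError, a subclass of RuntimeError):

```python
import sys

def recurse(n):
    # unbounded recursion; without a depth limit this would
    # keep allocating frames until memory runs out
    return recurse(n + 1)

old_limit = sys.getrecursionlimit()
sys.setrecursionlimit(200)
try:
    recurse(0)
    hit_limit = False
except RecursionError:  # subclass of RuntimeError
    hit_limit = True
finally:
    sys.setrecursionlimit(old_limit)
```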

Killing Features

During the sprint we discussed the fate of the LLVM and the JS backends. Both
have not really been maintained for some time, and are even partially untested
(their tests were skipped). Also, their usefulness appears to be limited. The JS
backend is cool in principle, but has some serious limitations due to the fact
that JavaScript is really a dynamic language, while RPython is rather static.
This made it hard to use some features of JS from RPython, e.g. RPython does not
support closures of any kind.

The LLVM backend had its own set of problems. For
a long time it produced the fastest form of PyPy's Python interpreter, by first
using the LLVM backend, applying the LLVM optimizations to the result, then
using LLVM's C backend to produce C code, then applying GCC to the result :-).
However, it is not clear that it is still useful to directly produce LLVM
bitcode, since LLVM has rather good C frontends nowadays, with llvm-gcc and
clang. It is likely that we will use LLVM in the future in our JIT (but that's
another story, based on different code).

Therefore we decided to remove these two backends from SVN, which Samuele and
Carl Friedrich did. They are not dead, only resting until somebody who is
interested in maintaining them steps up.

Windows

One goal of the release is good Windows support. Anders and Samuele set up a new
Windows buildbot which revealed a number of failures. Those were attacked by
Anders, Samuele and Christian as well as by Amaury (who was not at the sprint,
but thankfully did a lot of Windows work in the last months).

OS X

Christian, with some help from Samuele, tried to get translation working again
under Mac OS X. This was a large mess, because of different behaviours of some POSIX
functionality in Leopard. It is still possible to get the old behaviour back,
but whether that was enabled or not depended on a number of factors such as
which Python is used. Eventually they managed to successfully navigate that maze
and produce something that almost works (there is still a problem remaining
about OpenSSL).

Documentation

The Friday of the sprint was declared to be a documentation day, where (nearly)
no coding was allowed. This resulted in a newly structured and improved getting
started document (done by Carl Friedrich and Samuele, with some help from Niko) and
a new document describing differences to CPython (Armin, Carl Friedrich) as
well as various improvements to existing documents (everybody else). Armin
undertook the Sisyphean task of listing all talks, papers and related material
of the PyPy project.

Various Stuff

Java Backend Work

Niko and Anto worked on the JVM backend for a while. First they had to fix
translation of the Python interpreter to Java. Then they tried to improve the
performance of the Python interpreter when translated to Java. Mostly they did a
lot of profiling to find performance bottlenecks. They managed to improve
performance by 40% by overriding fillInStackTrace of the generated exception
classes. Apart from that they found no simple-to-fix performance problems.

JIT Work

Armin gave a presentation about the current state of the JIT to the sprinters as
well as Adrian Kuhn, Toon Verwaest and Camillo Bruni of the University of Bern
who came to visit for one day. There was a bit of work on the JIT going on too;
Armin and Anto tried to get closer to having a working JIT on top of the CLI.


Sunday, April 19, 2009

Today we are releasing a beta of the upcoming PyPy 1.1 release. There
are some Windows and OS X issues left that we would like to address
between now and the final release but apart from this things should be
working. We would appreciate feedback.

The PyPy development team.

PyPy 1.1: Compatibility & Consolidation

Welcome to the PyPy 1.1 release - the first release after the end of EU
funding. This release focuses on making PyPy's Python interpreter more
compatible with CPython (currently CPython 2.5) and on making the
interpreter more stable and bug-free.

Through a large number of tweaks, performance has been improved by
10%-50% since the 1.0 release. The Python interpreter is now between
0.8-2x (and in some corner cases 3-4x) slower than CPython. A large
part of these speed-ups comes from our new generational garbage
collectors.

Stackless improvements: PyPy's stackless module is now more
complete. We added channel preferences which change details of the
scheduling semantics. In addition, the pickling of tasklets has been
improved to work in more cases.

Classic classes are enabled by default now. In addition, they have
been greatly optimized and debugged.

Some effort was spent to make the Python interpreter more
memory-efficient. This includes the implementation of a mark-compact
GC which uses less memory than other GCs during collection.
Additionally there were various optimizations that make Python
objects smaller, e.g. class instances are often only 50% of the size
of those in CPython.
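
As a way to see what "instance size" means here, one can measure the footprint of a plain instance: the object itself plus its attribute dict. The absolute numbers vary between interpreters and versions; this only sketches how such a comparison is made.

```python
import sys

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
# rough per-instance footprint: object header plus attribute dict;
# this per-instance dict is exactly the kind of overhead the
# optimizations above attack
footprint = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
```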

The support for the trace hook in the Python interpreter was
improved to be able to trace the execution of builtin functions and
methods. With this, we implemented the _lsprof module, which is
the core of the cProfile module.
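
The cProfile module can also be driven programmatically; its Profile object is a thin wrapper over the _lsprof core, and thanks to the improved trace hook it attributes time to builtin functions as well. A minimal, self-contained example:

```python
import cProfile
import io
import pstats

def work():
    # calls the builtin sum(), which the improved trace hook can see
    return sum(range(1000))

profiler = cProfile.Profile()  # backed by the _lsprof core
profiler.enable()
work()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).print_stats()
report = buf.getvalue()
# the report lists both work() and the built-in sum among the calls
```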

A number of rarely used features of PyPy were removed since the previous
release because they were unmaintained and/or buggy. Those are: The
LLVM and the JS backends, the aspect-oriented programming features,
the logic object space, the extension compiler and the first
incarnation of the JIT generator. The new JIT generator is in active
development, but not included in the release.

What is PyPy?

Technically, PyPy is both a Python interpreter implementation and an
advanced compiler, or more precisely a framework for implementing dynamic
languages and generating virtual machines for them.

The framework allows for alternative frontends and for alternative
backends, currently C, Java and .NET. For our main target "C", we can
"mix in" different garbage collectors and threading models,
including micro-threads aka "Stackless". The inherent complexity that
arises from this ambitious approach is mostly kept away from the Python
interpreter implementation, our main frontend.

Socially, PyPy is a collaborative effort of many individuals working
together in a distributed and sprint-driven way since 2003. PyPy would
not have gotten as far as it has without the coding, feedback and
general support from numerous people.

Wednesday, April 15, 2009

The Leysin Sprint started today. The weather is great and the view is wonderful, as usual. Technically we are working on the remaining test failures of the nightly test runs and are generally trying to fix various long-postponed bugs. I will try to give more detailed reports as the sprint progresses.
