Monday, February 14, 2011

A few weeks ago I had the great fortune to attend the PyPy winter sprint in Leysin, Switzerland. I've wanted to contribute to PyPy for a long time, and I thought diving into a sprint might be a good way to get familiar with some of the code. What I wasn't expecting was to be using RPython to implement new methods on built-in Python objects on the first day. The main thing I took away from the sprint was just how easy it is to get involved in developing PyPy (well, some bits of it at least, and being surrounded by core developers helps). I wrote up a very short description of how to get started here, but I'll do a longer blog post with examples on my own blog soon(ish).

The sprint was kicked off by Armin merging the "fast-forward" branch of PyPy onto trunk. "fast-forward" brings PyPy from Python 2.5 compatibility to Python 2.7. Along with this it brought a large number of test failures, as the sterling work done by Benjamin Peterson and Amaury Forgeot d'Arc was not complete. This immediately set the primary sprint goal to reduce the number of test failures.

We made a great deal of progress on this front, and you can see how close PyPy is now from the buildbots.

Jacob Hallén and I started working alphabetically through the list of tests with failures. We made short work of test_asyncore and moved on to test_bytes, where I was stuck for the rest of the sprint. I spent much of the remaining days working with Laura Creighton on the PyPy bytearray implementation to make it more compatible with Python 2.7. This meant adding new methods, changing some of the Python protocol method implementations and even changing the way that bytearray is constructed. All in all, great fun and a great introduction to working with RPython.
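For readers who haven't used bytearray much, here is a small sketch of the kind of behaviour test_bytes exercises: the different ways a bytearray can be constructed, and the fact that it is mutable. This is plain Python for illustration only, not the RPython code from the sprint, and the variable names are made up.

```python
# Different ways of constructing a bytearray.
zeros = bytearray(3)                      # three zero bytes
from_ints = bytearray([80, 121])          # from an iterable of ints in range(256)
from_bytes = bytearray(b"PyPy")           # from a bytes-like object
from_text = bytearray(u"PyPy", "ascii")   # from text plus an explicit encoding

# Unlike bytes, a bytearray can be modified in place.
from_bytes[0] = ord("p")                  # item assignment takes an int
from_bytes.extend(b"!")                   # grows the array in place
```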

A big part of the Python 2.7 compatibility work was done by Laura and Armin, who basically rewrote the math module from scratch. This was needed to incorporate all the improvements made (mostly by Mark Dickinson) in CPython 2.7. That involved a lot of head-scratching about such subtleties as whether -0.0 should be considered almost equal to 0.0, and other fun problems.
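To give a flavour of why signed zero causes head-scratching, here is a minimal sketch. It is not code from the PyPy math module or its tests, and `is_negative_zero` is a hypothetical helper invented for this example. The two zeros compare equal, yet some math functions give different results depending on the sign.

```python
import math

neg_zero = -0.0
pos_zero = 0.0

def is_negative_zero(x):
    """Return True only for -0.0 (hypothetical helper, for illustration)."""
    # == cannot tell the zeros apart, so inspect the sign bit via copysign.
    return x == 0.0 and math.copysign(1.0, x) < 0

# The two zeros compare equal...
zeros_compare_equal = (neg_zero == pos_zero)

# ...but atan2 takes the sign of a zero argument into account.
angle_with_neg_zero = math.atan2(0.0, neg_zero)
angle_with_pos_zero = math.atan2(0.0, pos_zero)
```

A test helper that only uses `==` would therefore accept -0.0 where 0.0 was expected, which may or may not be what the test author intended.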

If you add on top of this the wonderful people, the beautiful scenery, the Swiss cheese fondues, managing not to kill myself with a day's skiing, and traditional PyPy card games, I can heartily recommend PyPy sprints as a close approximation of geek nirvana.

Working on 2.7 compatibility wasn't the only work that happened during the sprint. Other activities included:

Antonio Cuni worked on the "jittypes" branch. This is a reimplementation of the core of the PyPy ctypes code to make it jittable. The goal is that for common cases the JIT should be able to turn ctypes calls from Python into direct C level calls. This work was not completed, but it is very close and is great for the future of integrating C libraries with PyPy. As ctypes is also available in CPython and IronPython, and hopefully will be available in Jython soon, integrating C code with Python through ctypes is the most "implementation portable" technique.

David Schneider continued his work on the JIT backend for ARM. PyPy has been cross-compilable to ARM for a long time, but bringing the JIT to ARM will provide a *fast* PyPy for ARM, which includes platforms like Android. Again, David didn't complete this work, but he did complete the float support.

Håkan Ardo was present for two days and continued his crazy-clever work on JIT optimisations, some of which are described in the Loop invariant code motion blog entry.

Holger Krekel worked on updating the PyPy test suite to the latest version of py.test and also worked with me on the interminable bytearray changes for part of the sprint.

No one was sure what Maciej Fijałkowski worked on but he seemed to be quite busy.
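As an aside on the ctypes work mentioned above, this is roughly what calling a C function through ctypes looks like from the Python side. It is a minimal sketch, assuming a POSIX-like system where the C library's `strlen` can be loaded; the wrapper name `c_strlen` is made up for the example. A call like this is the kind of common case the jittypes branch aims to turn into a direct C level call.

```python
import ctypes
import ctypes.util

# Locate the C standard library; the file name is platform dependent.
# If find_library returns None, CDLL(None) falls back to the symbols
# already loaded into the current process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(data):
    """Return the length of a byte string as computed by C's strlen."""
    return libc.strlen(data)
```

The same script should run unchanged on any implementation that ships ctypes, which is the portability argument made above.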

I think that was most of the work done during the actual sprint. There was also a great deal of healthy discussion about the future of PyPy. Expect lots more interesting and exciting developments over the coming year.

PyPy is coming to the San Francisco Bay Area in the beginning of March with
a series of talks and a mini sprint.

As is usual for us, there is vastly more material available for us to
cover than there is time, especially when it comes to possible future
directions for PyPy. We want to reserve a certain amount of time at
each talk purely to discuss things that are of interest to audience
members. However, if you already know what you wish we would discuss,
and are attending a talk (or even if you aren't), please let us know.
You can either reply to this blog post, or mail Laura directly at
lac at openend.se.

Apart from getting more technical and project insight, our travel is
also a good opportunity for companies in the SF area to talk to us
regarding contracting. In September 2011 our current "Eurostars" research
project ends and some of us are looking for ways to continue working on
PyPy through consulting, subcontracting or hiring. The two companies,
Open End and merlinux, have successfully done a number of such contracts
and projects in the past. If you want to talk business or get together for
lunch or dinner, let us know! If you would like us to come to your company
and make a presentation, let us know! If you have any ideas about what
we should discuss in a presentation so that you could use it to convince
the powers-that-be at your place of employment that investing time and
money in PyPy would be a good idea, let us know!

On Tuesday March 8th we will be heading for Atlanta for the Python VM
and Language Summits before attending PyCon. Maciej Fijałkowski and
Alex Gaynor will be giving a talk entitled "Why is Python slow and how
can PyPy help?" Maciej will also be giving the talk "Running ultra large
telescopes in Python", which is partially about his experiences using
PyPy in the Square Kilometer Array project in South Africa. There will
be a PyPy Sprint March 14-17. All are welcome.

Friday, February 4, 2011

The recent round of optimizations, especially loop-invariant code motion,
has been very good for small to medium examples. Work is ongoing to make
them scale to larger ones; in the meantime, there are a few examples worth
showing to illustrate how well they perform. The following example, besides
benefiting from loop invariants, also shows a difference between static and
dynamic compilation. In fact, after applying all the optimizations C does,
only a JIT can use the extra bit of runtime information to run even faster.

Hence, PyPy is 50% faster than C on this carefully crafted example. The
reason is simple: a static compiler can't inline across file boundaries.
In C you can work around that to some extent, but it wouldn't work with
shared libraries anyway. In Python, however, even though the whole import
system is completely dynamic, the JIT can find out at runtime what can be
inlined. The same example would work equally well for Java and other decent
JITs; it's nevertheless good to see that we work in the same space :-)

Cheers,
fijal

EDIT: Updated GCC version
