Monday, November 21, 2011

We're pleased to announce the 1.7 release of PyPy. As became a habit, this
release brings a lot of bugfixes and performance improvements over the 1.6
release. However, unlike the previous releases, the focus has been on widening
the "sweet spot" of PyPy. That is, classes of Python code that PyPy can greatly
speed up should be vastly improved with this release. You can download the 1.7
release here:

The main topic of this release is widening the range of code which PyPy
can greatly speed up. On average on
our benchmark suite, PyPy 1.7 is around 30% faster than PyPy 1.6 and up
to 20 times faster on some benchmarks.

Highlights

Numerous performance improvements. There are too many examples which python
constructs now should behave faster to list them.

Bugfixes and compatibility fixes with CPython.

Windows fixes.

PyPy now comes with stackless features enabled by default. However,
any loop using stackless features will interrupt the JIT for now, so no real
performance improvement for stackless-based programs. Contact pypy-dev for
info how to help on removing this restriction.

NumPy effort in PyPy was renamed numpypy. In order to try using it, simply
write:

import numpypy as numpy

at the beginning of your program. There is a huge progress on numpy in PyPy
since 1.6, the main feature being implementation of dtypes.

JSON encoder (but not decoder) has been replaced with a new one. This one
is written in pure Python, but is known to outperform CPython's C extension
up to 2 times in some cases. It's about 20 times faster than
the one that we had in 1.6.

The memory footprint of some of our RPython modules has been drastically
improved. This should impact any applications using for example cryptography,
like tornado.

There was some progress in exposing even more CPython C API via cpyext.

Things that didn't make it, expect in 1.8 soon

There is an ongoing work, which while didn't make it to the release, is
probably worth mentioning here. This is what you should probably expect in
1.8 some time soon:

Specialized list implementation. There is a branch that implements lists of
integers/floats/strings as compactly as array.array. This should drastically
improve performance/memory impact of some applications

There are two brand new JIT assembler backends, notably for the PowerPC and
ARM processors.

Fundraising

It's maybe worth mentioning that we're running fundraising campaigns for
NumPy effort in PyPy and for Python 3 in PyPy. In case you want to see any
of those happen faster, we urge you to donate to numpy proposal or
py3k proposal. In case you want PyPy to progress, but you trust us with
the general direction, you can always donate to the general pot.

Cheers,Maciej Fijałkowki, Armin Rigo and the entire PyPy team

We're pleased to announce the 1.7 release of PyPy. As became a habit, this
release brings a lot of bugfixes and performance improvements over the 1.6
release. However, unlike the previous releases, the focus has been on widening
the "sweet spot" of PyPy. That is, classes of Python code that PyPy can greatly
speed up should be vastly improved with this release. You can download the 1.7
release here:

The main topic of this release is widening the range of code which PyPy
can greatly speed up. On average on
our benchmark suite, PyPy 1.7 is around 30% faster than PyPy 1.6 and up
to 20 times faster on some benchmarks.

Highlights

Numerous performance improvements. There are too many examples which python
constructs now should behave faster to list them.

Bugfixes and compatibility fixes with CPython.

Windows fixes.

PyPy now comes with stackless features enabled by default. However,
any loop using stackless features will interrupt the JIT for now, so no real
performance improvement for stackless-based programs. Contact pypy-dev for
info how to help on removing this restriction.

NumPy effort in PyPy was renamed numpypy. In order to try using it, simply
write:

import numpypy as numpy

at the beginning of your program. There is a huge progress on numpy in PyPy
since 1.6, the main feature being implementation of dtypes.

JSON encoder (but not decoder) has been replaced with a new one. This one
is written in pure Python, but is known to outperform CPython's C extension
up to 2 times in some cases. It's about 20 times faster than
the one that we had in 1.6.

The memory footprint of some of our RPython modules has been drastically
improved. This should impact any applications using for example cryptography,
like tornado.

There was some progress in exposing even more CPython C API via cpyext.

Things that didn't make it, expect in 1.8 soon

There is an ongoing work, which while didn't make it to the release, is
probably worth mentioning here. This is what you should probably expect in
1.8 some time soon:

Specialized list implementation. There is a branch that implements lists of
integers/floats/strings as compactly as array.array. This should drastically
improve performance/memory impact of some applications

There are two brand new JIT assembler backends, notably for the PowerPC and
ARM processors.

Fundraising

It's maybe worth mentioning that we're running fundraising campaigns for
NumPy effort in PyPy and for Python 3 in PyPy. In case you want to see any
of those happen faster, we urge you to donate to numpy proposal or
py3k proposal. In case you want PyPy to progress, but you trust us with
the general direction, you can always donate to the general pot.

Monday, November 14, 2011

In the past week, we have been busy hacking on PyPy at the Gothenburg sprint, the second of this 2011. The sprint was hold at Laura's and Jacob's place, and here is a brief report of what happened.

In the first day we welcomed Mark Pearse, who was new to PyPy and at his first sprint. Mark worked the whole sprint in the new SpecialisedTuple branch, whose aim is to have a special implementation for small 2-items and 3-items tuples of primitive types (e.g., ints or floats) to save memory. Mark paired with Antonio for a couple of days, then he continued alone and did an amazing job. He even learned how to properly do Test Driven Development :-).

Antonio spent a couple of days investigating whether it is possible to use application checkpoint libraries such as BLCR and DMTCP to save the state of the PyPy interpreter between subsequent runs, thus saving also the JIT-compiled code to reduce the warmup time. The conclusion is that these are interesting technologies, but more work would be needed (either on the PyPy side or on the checkpoint library side) before it can have a practical usage for PyPy users.

Then, Antonio spent most of the rest of the sprint working on his ffistruct branch, whose aim is to provide a very JIT-friendly way to interact with C structures, and eventually implement ctypes.Structure on top of that. The "cool part" of the branch is already done, and the JIT already can compile set/get of fields into a single fast assembly instruction, about 400 times faster than the corresponding ctypes code. What is still left to do is to add a nicer syntax (which is easy) and to implement all the ctypes peculiarities (which is tedious, at best :-)).

As usual, Armin did tons of different stuff, including fixing a JIT bug, improving the performance of file.readlines() and working on the STM branch (for Software Transactional Memory), which is now able to run RPython multithreaded programs using software transaction (as long as they don't fill up all the memory, because support for the GC is still missing :-)). Finally, he worked on improving the Windows version of PyPy. While doing so he discovered together with Anto a terrible bug which lead to a continuous leak of stack space because the JIT called some functions using the wrong calling convention.

Håkan, with some help from Armin, worked on the jit-targets branch, whose goal is to heavily refactor the way the traces are internally represented by the JIT, so that in the end we can produce (even :-)) better code than what we do nowadays. More details in this mail.

Andrew Dalke worked on a way to integrate PyPy with FORTRAN libraries, and in particular the ones which are wrapped by Numpy and Scipy: in doing so, he wrote f2pypy, which is similar to the existing f2py but instead of producing a CPython extension module it produces a pure python modules based on ctypes. More work is needed before it can be considered complete, but f2pypy is already able to produce a wrapper for BLAS which passes most of the tests under CPython, although there's still work left to get it working for PyPy.

Armin and Håkan with Laura's "5x faster" cake

Christian Tismer worked the whole sprint on the branch to make PyPy compatible with Windows 64 bit. This needs a lot of work because a lot of PyPy is written under the assumption that the long type in C has the same bit size than void*, which is not true on Win64. Christian says that in the past Genova-Pegli sprint he completed 90% of the work, and in this sprint he did the other 90% of the work. Obviously, what is left to complete the task is the third 90% :-). More seriously, he estimated a total of 2-4 person-weeks of work to finish it.

But, all in all, the best part of the sprint has been the cake that Laura baked to celebrate the "5x faster than CPython" achievement. Well, actually our speed page reports "only" 4.7x, but that's because in the meantime we switched from comparing against CPython 2.6 to comparing against CPython 2.7, which is slightly faster. We are confident that we will reach the 5x goal again, and that will be the perfect excuse to eat another cake :-)

In the past week, we have been busy hacking on PyPy at the Gothenburg sprint, the second of this 2011. The sprint was hold at Laura's and Jacob's place, and here is a brief report of what happened.

In the first day we welcomed Mark Pearse, who was new to PyPy and at his first sprint. Mark worked the whole sprint in the new SpecialisedTuple branch, whose aim is to have a special implementation for small 2-items and 3-items tuples of primitive types (e.g., ints or floats) to save memory. Mark paired with Antonio for a couple of days, then he continued alone and did an amazing job. He even learned how to properly do Test Driven Development :-).

Antonio spent a couple of days investigating whether it is possible to use application checkpoint libraries such as BLCR and DMTCP to save the state of the PyPy interpreter between subsequent runs, thus saving also the JIT-compiled code to reduce the warmup time. The conclusion is that these are interesting technologies, but more work would be needed (either on the PyPy side or on the checkpoint library side) before it can have a practical usage for PyPy users.

Then, Antonio spent most of the rest of the sprint working on his ffistruct branch, whose aim is to provide a very JIT-friendly way to interact with C structures, and eventually implement ctypes.Structure on top of that. The "cool part" of the branch is already done, and the JIT already can compile set/get of fields into a single fast assembly instruction, about 400 times faster than the corresponding ctypes code. What is still left to do is to add a nicer syntax (which is easy) and to implement all the ctypes peculiarities (which is tedious, at best :-)).

As usual, Armin did tons of different stuff, including fixing a JIT bug, improving the performance of file.readlines() and working on the STM branch (for Software Transactional Memory), which is now able to run RPython multithreaded programs using software transaction (as long as they don't fill up all the memory, because support for the GC is still missing :-)). Finally, he worked on improving the Windows version of PyPy. While doing so he discovered together with Anto a terrible bug which lead to a continuous leak of stack space because the JIT called some functions using the wrong calling convention.

Håkan, with some help from Armin, worked on the jit-targets branch, whose goal is to heavily refactor the way the traces are internally represented by the JIT, so that in the end we can produce (even :-)) better code than what we do nowadays. More details in this mail.

Andrew Dalke worked on a way to integrate PyPy with FORTRAN libraries, and in particular the ones which are wrapped by Numpy and Scipy: in doing so, he wrote f2pypy, which is similar to the existing f2py but instead of producing a CPython extension module it produces a pure python modules based on ctypes. More work is needed before it can be considered complete, but f2pypy is already able to produce a wrapper for BLAS which passes most of the tests under CPython, although there's still work left to get it working for PyPy.

Armin and Håkan with Laura's "5x faster" cake

Christian Tismer worked the whole sprint on the branch to make PyPy compatible with Windows 64 bit. This needs a lot of work because a lot of PyPy is written under the assumption that the long type in C has the same bit size than void*, which is not true on Win64. Christian says that in the past Genova-Pegli sprint he completed 90% of the work, and in this sprint he did the other 90% of the work. Obviously, what is left to complete the task is the third 90% :-). More seriously, he estimated a total of 2-4 person-weeks of work to finish it.

But, all in all, the best part of the sprint has been the cake that Laura baked to celebrate the "5x faster than CPython" achievement. Well, actually our speed page reports "only" 4.7x, but that's because in the meantime we switched from comparing against CPython 2.6 to comparing against CPython 2.7, which is slightly faster. We are confident that we will reach the 5x goal again, and that will be the perfect excuse to eat another cake :-)