Tuesday, August 27, 2013

This is the roadmap for the NumPy effort in PyPy, as discussed at the London sprint.
First, the highest item on our priority list is to finish the low-level part
of the numpy module. We will
finish the RPython part of numpy and provide a pip-installable
numpypy repository that includes the pure-Python part of NumPy. This would
contain the original NumPy with a few minor changes.

Second, we need to work on the JIT support that will make NumPy on PyPy
faster. In detail:

reenable the lazy loop evaluation

optimize bridges, which depends on optimizer refactorings

SSE support

On the compatibility front, there were some independent attempts at
making the following work:

In order to make all of the above happen faster, it would be helpful to raise
more funds. You can donate to PyPy's NumPy project on our website. Note
that PyPy is a member of SFC, which is a 501(c)(3) US non-profit, so donations
from US companies can be tax-deductible.

Cheers,
fijal, arigo, ronan, rguillebert, anto and others

Tuesday, August 20, 2013

We now have a preliminary agenda for the demo evening in London next week. It takes place on Tuesday, August 27, 2013, 18:30-19:30 (BST) at King's College London, Strand. The preliminary agenda is as follows:

All the talks are lightning talks. Afterwards there will be plenty of time for discussion.

There are still free spots; if you want to come, please register on the Eventbrite page. Hope to see you there!

Sunday, August 18, 2013

A quick update on Software Transactional Memory. We are
working on two fronts.

On the one hand, the integration of the "c4" C library with PyPy is done
and works well, but is still subject to improvements. The "PyPy-STM"
executable (without the JIT)
seems to be stable, as far as it has been tested. It runs a simple
benchmark like Richards with a 3.2x slow-down over a regular JIT-less
PyPy.

The main factor of this slow-down: the numerous "barriers" in
the code --- checks that are needed a bit everywhere to verify that a
pointer to an object points to a recent enough version, and if not, to
go to the most recent version. These barriers are inserted automatically
during the translation; there is no need for us to manually put 42 million
barriers in the source code of PyPy. But this automatic insertion uses a
primitive algorithm right now, which usually ends up putting more barriers than the
theoretical optimum. I (Armin) am trying to improve that --- and progressing:
last week the slow-down was around 4.5x. This is done in the branch
stmgc-static-barrier.
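
As a rough illustration of what one such barrier does, here is a toy sketch in plain Python. It is not the code of the "c4" library: the `Revision` class, the `newer` link and the helper names are all made up for this example. It only shows the idea of checking whether a pointer refers to a recent enough version and, if not, going to the most recent one.

```python
class Revision:
    """One version of an object; `newer` links to a more recent version.

    Hypothetical toy structure, not the real layout used by the STM library.
    """

    def __init__(self, value):
        self.value = value
        self.newer = None  # None means this is the most recent version


def barrier(obj):
    """Return the most recent version reachable from `obj`."""
    while obj.newer is not None:
        obj = obj.newer
    return obj


def add_revision(obj, value):
    """Append a newer version after the current latest one."""
    latest = barrier(obj)
    latest.newer = Revision(value)
    return latest.newer


# A pointer that still refers to the oldest version of the object.
stale = Revision(1)
add_revision(stale, 2)
add_revision(stale, 3)

# Going through the barrier first gives the up-to-date value.
print(barrier(stale).value)
```

Every access that goes through `barrier()` pays for the check, which is why putting fewer of them in the translated code matters for the slow-down.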

On the other hand, Remi is progressing on the JIT integration in
the branch stmgc-c4.
This has been working in simple cases for a couple of weeks now, but the
resulting "PyPy-JIT-STM" often crashes. This is because, while the
basics are not really hard, we keep hitting new issues that must be
resolved.

The basics are that whenever the JIT is about to generate
assembler corresponding to a load or a store in a GC object, it must
first generate a bit of extra assembler that corresponds to the barrier
that we need. This works fine now (but could benefit from the same
kind of optimizations described above, to reduce the number of barriers).
The additional issues are all more subtle. I will describe the current
one as an example: it is how to write constant pointers inside the assembler.
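
The basic scheme can be pictured with a small sketch. This is only a toy model under assumed names: the operation tuples, `read_barrier`, `write_barrier` and `add_barriers` are invented here and do not correspond to the JIT's real operations. It simply puts a barrier in front of every load and store on a GC object, with no attempt to remove redundant ones.

```python
def add_barriers(operations):
    """Return a new operation list with a barrier before each load or store.

    `operations` is a list of (kind, target) tuples; this format is
    hypothetical and only serves the illustration.
    """
    out = []
    for kind, target in operations:
        if kind == "load":
            out.append(("read_barrier", target))
        elif kind == "store":
            out.append(("write_barrier", target))
        # Operations that do not touch a GC object are copied unchanged.
        out.append((kind, target))
    return out


trace = [("load", "p0"), ("add", "i1"), ("store", "p0")]
for op in add_barriers(trace):
    print(op)
```

A smarter version would notice when a barrier on the same object has already been emitted and skip the second one, which is the kind of optimization mentioned above.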

Remember that the STM library classifies objects as either
"public" or "protected/private". A "protected/private" object
is one which has not been seen by another thread so far.
This is essential as an optimization, because we know that no
other thread will access our protected or private objects in parallel,
and thus we are free to modify their content in place. By contrast,
public objects are frozen, and to do any change, we first need to
build a different (protected) copy of the object. See this blog post for more details.
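
To make the distinction concrete, here is a minimal sketch in plain Python. All names in it (`StmObject`, `owner`, `public`, `publish`, `write`) are assumptions made for the example, and it ignores everything about transactions and commits. It only shows the rule stated above: a protected/private object is modified in place by the thread that owns it, while a write to a public object goes to a fresh protected copy.

```python
class StmObject:
    """Toy object; `owner` and `public` are hypothetical bookkeeping fields."""

    def __init__(self, payload, owner):
        self.payload = dict(payload)
        self.owner = owner    # id of the thread that created this copy
        self.public = False   # becomes True once other threads may see it


def publish(obj):
    """Mark the object as visible to other threads; it is frozen from now on."""
    obj.public = True


def write(obj, key, value, thread):
    """Write a field and return the object that actually holds the new value."""
    if obj.public:
        # Public objects are frozen: build a protected copy and change that.
        copy = StmObject(obj.payload, owner=thread)
        copy.payload[key] = value
        return copy
    if obj.owner != thread:
        raise RuntimeError("protected object used by a thread that does not own it")
    # No other thread can see this object, so changing it in place is safe.
    obj.payload[key] = value
    return obj


obj = StmObject({"x": 1}, owner=1)
same = write(obj, "x", 2, thread=1)   # in place, returns obj itself
publish(obj)
copy = write(obj, "x", 3, thread=2)   # obj stays frozen, copy gets the change
print(obj.payload, copy.payload)
```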

So far so good, but the JIT will sometimes (actually often) hard-code
constant pointers into the assembler it produces. For example, this is the
case when the Python code being JITted creates an instance of a known class;
the corresponding assembler produced by the JIT will reserve the memory for
the instance and then write the constant type pointer in it. This type
pointer is a GC object (in the simple model, it's the Python class object;
in PyPy it's actually the "map" object, which is
a different story).

The problem right now is that this constant pointer may point to a
protected object. This is a problem because the same piece of assembler
can later be executed by a different thread. If it does, then this
different thread will create instances whose type pointer is bogus: looking
like a protected object, but actually protected by a different thread.
Any attempt to use this type pointer to change anything on the class
itself will likely crash: the threads will all think they can safely change it
in place. To fix this, we need to make sure we only write pointers to
public objects in the assembler. This is a bit involved because we need
to ensure that there is a public version of the object to start with.
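
The intended fix can be sketched as follows. This is a toy model with invented names (`GcObject`, `owner_thread`, `public_version`, `ensure_public`, `emit_constant_pointer`), not the actual change in the stmgc-c4 branch. The assembler is just a Python list here, and the sketch leaves out how the public version is kept in sync with the protected one.

```python
class GcObject:
    """Toy GC object; the fields are hypothetical and only for illustration."""

    def __init__(self, name, owner_thread):
        self.name = name
        self.owner_thread = owner_thread  # None means the object is public
        self.public_version = None        # filled in on demand


def is_public(obj):
    return obj.owner_thread is None


def ensure_public(obj):
    """Return a public version of `obj`, creating one if none exists yet."""
    if is_public(obj):
        return obj
    if obj.public_version is None:
        obj.public_version = GcObject(obj.name, owner_thread=None)
    return obj.public_version


def emit_constant_pointer(assembler, obj):
    """Hard-code a pointer in the assembler, but only ever to a public object."""
    assembler.append(("CONST_PTR", ensure_public(obj)))


assembler = []
type_ptr = GcObject("MyClass", owner_thread=1)  # protected by thread 1
emit_constant_pointer(assembler, type_ptr)
emit_constant_pointer(assembler, type_ptr)
print(all(is_public(ptr) for _, ptr in assembler))
```

With this rule, a different thread running the same assembler never sees a pointer that looks protected but belongs to another thread.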

When this is done, we will likely hit the next problem, and the next one;
but at some point it should converge (hopefully!) and we'll give you our first
PyPy-JIT-STM ready to try. Stay tuned :-)

A bientôt,

Armin.

Thursday, August 1, 2013

We're pleased to announce PyPy 2.1, which targets version 2.7.3 of the Python
language. This is the first release with official support for ARM processors in the JIT.
This release also contains several bugfixes and performance improvements.

The first beta of PyPy3 2.1, targeting version 3 of the Python language, was
just released; more details can be found here.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7. It's fast (pypy 2.1 and cpython 2.7.2 performance comparison)
due to its integrated tracing JIT compiler.

This release supports x86 machines running Linux 32/64, Mac OS X 64 or Windows
32. This release also supports ARM machines running 32-bit Linux - anything with
ARMv6 (like the Raspberry Pi) or ARMv7 (like the Beagleboard,
Chromebook, Cubieboard, etc.) that supports VFPv3 should work. Both
hard-float armhf/gnueabihf and soft-float armel/gnueabi builds are
provided. The armhf builds for Raspbian are created using the Raspberry Pi
custom cross-compilation toolchain
based on gcc-arm-linux-gnueabihf and should work on ARMv6 and
ARMv7 devices running Debian or Raspbian. The armel builds are built
using the gcc-arm-linux-gnueabi toolchain provided by Ubuntu and
currently target ARMv7.

Windows 64 work is still stalled; we would welcome a volunteer
to handle that.

Highlights

JIT support for ARM, architecture versions 6 and 7, hard- and soft-float ABI
