Wednesday, February 20, 2008

As part of our effort to make PyPy's Python interpreter usable, we have put quite some work into interfacing with external libraries. In a fairly short time (starting, I think, at the Leysin sprint or slightly earlier) we were able to provide a prototype of the ctypes library. It is written in completely normal Python, at applevel, on top of a very thin wrapper around the libffi library. This makes development a lot easier, but it also makes the resulting ctypes implementation rather slow. The implementation is not complete yet, and it will still need quite some effort to make it feature-complete (ctypes has lots of details, special cases and do-what-I-mean magic). Making it faster is yet another task, but that's for much later.
The implementation is good enough to run those parts of Pyglet that don't depend on PIL (which PyPy doesn't have). Here are a few pictures of running Pyglet demos on top of compiled pypy-c.
To compile a version of PyPy that supports ctypes, use this highly sophisticated command line:
./translate.py --gc=generation ./targetpypystandalone.py --allworkingmodules --withmod-_rawffi
Note: this works on Linux only right now.
The list of missing small ctypes features is quite extensive, but I consider the current implementation to be usable for most common cases. I would love to hear about libraries written in pure Python (using ctypes), so that we can run them on top of PyPy and use them as test cases. If you know of such a library, please provide a link.
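
For readers who have not used ctypes, here is a minimal sketch of the kind of "common case" code meant above. It is only an illustration, not a PyPy test: it assumes a POSIX system where ctypes.CDLL(None) exposes the C library already loaded into the process, and the choice of strlen is arbitrary.

```python
import ctypes

# Assumption: POSIX only. CDLL(None) gives access to the symbols already
# loaded into the running process, which includes the C library.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello pypy")
print(length)
```

Declaring argtypes and restype is optional in ctypes, but it avoids relying on the default int conversions.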

This is an infinite loop in CPython: every time c is set to None in the
loop, the __del__ method resets it to the C instance again (this is
terribly bad programming style, of course, in case anybody was wondering
:-)). CPython can detect resurrection by checking whether the reference count
has gotten bigger after the call to __del__.
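
A minimal sketch of such a self-resurrecting object, using the names C and c from the text. The call counter and the bound of five rounds are assumptions added here so that the sketch terminates on every interpreter.

```python
c = None        # global slot that the finalizer resurrects into
del_calls = 0   # assumption: counter added for illustration only


class C(object):
    def __del__(self):
        global c, del_calls
        del_calls += 1
        if c is None:
            c = self  # resurrection: the dying object becomes reachable again


c = C()
rounds = 0
# Assumption: bounded to 5 rounds so the sketch always terminates.
while c is not None and rounds < 5:
    c = None    # drops the last reference, which triggers __del__
    rounds += 1

print("rounds: %d, __del__ calls: %d" % (rounds, del_calls))
```

On a reference-counting CPython that calls __del__ on every death, the loop only stops because of the bound. CPython 3.4 and later (PEP 442) call __del__ at most once per object, so there the object is resurrected once and then dies for good on the second round.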

There exist even worse examples of perpetual resurrection, in particular in
combination with the cycle GC. If you want to see a particularly horrible one,
see this discussion started by Armin Rigo. In the ensuing thread Tim Peters
proposes to follow Java's example and call the finalizer of every object at most
once.

In PyPy the resurrection problem is slightly more complex, since we have GCs
that run a collection from time to time and don't really get to know at which
precise time an object dies. If the GC discovers during a collection that an
object is dead, it will call the finalizer after the collection is finished. If
the object is then dead at the next collection, the GC does not know whether
the object was resurrected by the finalizer and then died in the meantime or
whether it was not resurrected. Therefore it seemed sanest to follow Tim's
solution and to never call the finalizer of an object a second time, which has
many other benefits as well.

Monday, February 18, 2008

Python's garbage collection semantics is very much historically grown and
implementation-driven. Samuele Pedroni therefore likes to call it the "'there
is no such thing as too much chocolate'-approach to GC semantics" :-). In this
two-part post series I am going to talk about the semantics of finalization
(__del__ methods) in CPython and PyPy.

The current behaviour is mostly a consequence of the fact that CPython uses
reference counting for garbage collection. The first consequence is that if
several objects die at the same time, their finalizers are called in a
so-called topological order, which is a feature that some GCs have and that
CPython offers by chance. This ensures that, when a __del__ method runs, none
of the object's attributes have had their own __del__ called yet. A simple
example:

If the instance of B dies now, both it and the logfile are dead. They will
get their __del__ methods called, and it's important that the file's __del__
gets called second, because otherwise the __del__ of B would try to
write to a closed file.
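
The situation described above can be sketched as follows. The class B and its logfile attribute follow the text. LogFile is a hypothetical stand-in for a real file object, and the events list is an assumption added only to record the order in which the finalizers run.

```python
events = []  # assumption: records the order in which finalizers run


class LogFile(object):
    """Hypothetical stand-in for a real file object."""

    def __init__(self):
        self.closed = False
        self.lines = []

    def write(self, text):
        if self.closed:
            raise ValueError("write to closed file")
        self.lines.append(text)

    def __del__(self):
        self.closed = True  # plays the role of closing the file
        events.append("LogFile.__del__")


class B(object):
    def __init__(self, logfile):
        self.logfile = logfile

    def __del__(self):
        # Only safe if the logfile has not been finalized yet.
        self.logfile.write("done doing stuff")
        events.append("B.__del__")


b = B(LogFile())
b = None  # drops the last reference to the B instance
print(events)
```

Under CPython's reference counting, the finalizer of B runs first and the one of LogFile second, so the write succeeds.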

The correct ordering happens completely automatically if you use reference
counting: Setting b to None will decref the old value of b. This reduces
the reference count of this instance to 0, so the finalizer will be called.
After the __del__ has finished, this object will be freed and all the
objects it points to decrefed as well, which decreases the reference count of
the file to 0 and calls its __del__ as well, which closes the file.

The behaviour of PyPy's semispace and generational GCs wasn't very nice so far:
it just called the finalizers in an essentially random order. Last week Armin
came up with a somewhat complicated algorithm that solves this by emulating
CPython's finalization order, which we subsequently implemented. So PyPy does
what you expect now! The Boehm GC does a topological ordering by default, so it
wasn't a problem there.

A small twist on the above arises when there is a cycle of objects involving
finalizers. In this case a topological ordering is not possible, so CPython
refuses to guess the finalization order and puts such cycles into gc.garbage.
This would be very hard for PyPy to do, since our GC implementation is
essentially independent from the Python interpreter. After all, the same GCs
work for our other interpreters too. Therefore we decided to break such a
cycle at an arbitrary place, which doesn't sound too insane. The insane thing
is for a Python program to create a cycle of objects with finalizers and
depend on the order in which the finalizers are called. Don't do that :-)
(After all, CPython wouldn't even call the finalizers in this case.)
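
A minimal sketch of such a cycle, for experimenting with how a given interpreter treats it. The class name Node and the bookkeeping lists are assumptions made for illustration. The outcome depends on the interpreter: a CPython of the kind described above leaves both objects in gc.garbage, while CPython 3.4 and later (PEP 442) finalize the cycle instead.

```python
import gc

finalized = []  # assumption: records which finalizers actually ran


class Node(object):
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


def make_cycle():
    a = Node("a")
    b = Node("b")
    a.other = b  # a -> b
    b.other = a  # b -> a, closing the cycle


make_cycle()   # both objects are now unreachable but still reference each other
gc.collect()   # ask the cycle collector to deal with them

# Objects the collector refused to finalize end up in gc.garbage.
uncollected = [obj for obj in gc.garbage if isinstance(obj, Node)]
print("finalized: %s" % sorted(finalized))
print("left in gc.garbage: %d" % len(uncollected))
```

Either way, the order of the two finalizer calls (if they happen at all) is not something a program should rely on.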

Tuesday, February 12, 2008

Hello! I will have the pleasure of presenting PyPy at various conferences in the near future. They are (in chronological order):

Studencki Festiwal Informatyczny in Krakow, POLAND, 6-8 March 2008. I think this might only be interesting for Polish people (website, in Polish).

PyCon, Chicago, IL, USA, 14-17 March 2008. There should also be a PyPy sprint afterwards, including a newbie-friendly tutorial; everybody is welcome to join us! (Provided that I get the US visa, which seems to be a non-trivial issue for a Polish citizen.)

RuPy, Poznan, POLAND, 13-14 April 2008 (website). This is a small but very friendly Ruby and Python conference. Last year was amazing, and I strongly recommend going there (Poznan is only 2h by train from Berlin and also has its own airport).

Hope to see you at those places!

Cheers,
fijal
