Prolog? Yes. To understand this slightly unusual choice of programming
language, here is some background about our JIT first.

PyPy contains not a JIT but a JIT generator, which means that we
only write an interpreter for a language (say, the complete Python
language), and we get a JIT "for free". More precisely, it's not for
free: we had to write the JIT generator, of course, as well as some
amount of subtle generic support code. The JIT generator preprocesses
the (complete Python) interpreter that we wrote and links the result
with the generic support code; the result is a (complete Python) JIT.

So far, this approach gives us a generated JIT that works very much like
Psyco. But Psyco has issues (and so the current PyPy JITs have the same issues):
it can sometimes produce too much machine code,
e.g. by failing to notice that two versions of the machine code are
close enough that they should really be one; and it can also sometimes
fail in the opposite way, by making a single inefficient version of
the machine code instead of several efficient specialized versions.
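To make this trade-off a bit more concrete, here is a toy sketch in plain Python. It is not Psyco's or PyPy's code; `specialize` and `versions` are hypothetical names. It models specialization keyed on the exact argument types, where each new type combination gets its own "version", standing in for a separately generated chunk of machine code.

```python
def specialize(func):
    """Toy model of type-keyed specialization (hypothetical, not a real API):
    keep one "compiled version" per tuple of exact argument types. Nothing is
    actually compiled; each version is just the generic Python function,
    stored under its type key so that the versions can be counted."""
    versions = {}

    def dispatcher(*args):
        key = tuple(type(a) for a in args)
        if key not in versions:
            # A real specializer would emit machine code specialized
            # for `key` at this point.
            versions[key] = func
        return versions[key](*args)

    dispatcher.versions = versions
    return dispatcher


@specialize
def add(a, b):
    return a + b


add(1, 2)      # new version for (int, int)
add(3, 4)      # reuses the (int, int) version
add(1.5, 2.5)  # new version for (float, float)
add(True, 2)   # new version for (bool, int), although it behaves like int + int
print(len(add.versions))  # 3
```

The `(bool, int)` entry illustrates the "too much code" failure: a separate version that could really have been merged with `(int, int)`. The opposite failure would be keying on too little information, for example one single version for all types, which then cannot exploit what is known about the operands.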

A few months ago we chose to experiment with improving this
instead of finishing and polishing what we had so far. The choice was
mostly because we were (and still are) busy finishing and polishing
everything else in PyPy, so it was more fun to keep at least the JIT on
the experimental side. Besides, PyPy is now getting to a rather good
and complete state, and it is quite usable without the JIT already.

Anyway, enough excuses. Why is this about Prolog?

In PyPy, both the (complete Python) interpreter and the JIT support code
are written in RPython. Now RPython is not
an extremely complicated language, but it is still far from the top on a
minimalism scale. In general, this is good in practice (or at least I
think so): it strikes
a reasonable balance, because it is convenient to write interpreters
in RPython, while not being so bloated that it makes our translation
toolchain horribly complicated (e.g. writing garbage collectors for
RPython, or even JIT generators, is reasonable). Still, it is not the
best choice for early research-level experimentation.

So what we did recently instead was to hand-write, in Prolog, a JIT that
looks similar to what we would like to achieve for RPython with our JIT
generator. This gave much quicker turnaround times than we were used to
when we played around directly with RPython. We wrote tiny example
interpreters in Prolog (of course not a complete Python interpreter).
Self-inspection is trivial in Prolog, and generating Prolog code at
runtime is very easy too. Moreover, many other issues are also easier
in Prolog: for example, all data structures are immutable "terms".
Languages other than Prolog would have worked too, but it happens to be
one that we (Carl Friedrich, Michael Leuschel and myself) are familiar
with -- not to mention that it's basically a nice small dynamic
language.

Of course, all this is closely related to what we want to do in PyPy.
The fundamental issues are the same. Indeed, in PyPy, the major goals
of the JIT are to remove, first, the overhead of allocating objects all
the time (e.g. integers), and second, the overhead of dynamic dispatch
(e.g. finding out that it's integers we are adding). The equivalent
goals in Prolog are, first, to avoid creating short-lived terms, and
second, to remove the overhead of dispatch (typically, the dispatching
to multiple clauses). If you are familiar with Prolog, you can find more
details about this in the paper. We have already played with many possible solutions
in the Prolog JIT, and the paper describes the most mature one; we have
more experimentation in mind. The main point here is that these are
mostly language-independent techniques (anything that works both in
Prolog and in RPython has to be language-independent, right? :-)
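A minimal sketch of the two overheads, in plain Python with hypothetical names (`W_Int`, `W_Str`, `generic_add` and the two `*_sum` functions are made up for illustration; this is not PyPy's code). The interpreter-style loop dispatches on types and allocates a box on every iteration; the specialized loop is the kind of code a JIT would ideally produce once it knows all operands are integers. Here it is written by hand; generating it automatically is the JIT's job.

```python
class W_Int:
    """Hypothetical boxed integer, standing in for an interpreter-level object."""
    def __init__(self, value):
        self.value = value


class W_Str:
    """Hypothetical boxed string, so that dispatch has more than one case."""
    def __init__(self, value):
        self.value = value


def generic_add(a, b):
    # Dynamic dispatch: inspect the runtime types on every single call.
    if isinstance(a, W_Int) and isinstance(b, W_Int):
        return W_Int(a.value + b.value)  # allocates a fresh box each time
    if isinstance(a, W_Str) and isinstance(b, W_Str):
        return W_Str(a.value + b.value)
    raise TypeError("unsupported operand types")


def interpreted_sum(n):
    # Interpreter behaviour: one dispatch and two allocations per iteration.
    total = W_Int(0)
    for i in range(n):
        total = generic_add(total, W_Int(i))
    return total


def specialized_sum(n):
    # What a JIT would ideally emit once every operand is known to be a
    # W_Int: no type checks inside the loop and no intermediate boxes;
    # only the final result is boxed.
    total = 0
    for i in range(n):
        total += i
    return W_Int(total)


print(interpreted_sum(100).value, specialized_sum(100).value)
```

In the Prolog JIT the same two costs show up as short-lived terms and clause dispatch, which is why the techniques are expected to carry over.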

In summary, besides the nice goal of speeding up Prolog, we are trying
to focus our Prolog JIT on the issues and goals that have equivalents in
the PyPy JIT generator. So in the end we are pretty convinced that it
will give us something that we can backport to PyPy -- good ideas about
what works and what doesn't, as well as some concrete algorithms.

Friday, June 27, 2008

Following the great success of code_swarm, I recently produced a
video that shows the commit history of the PyPy project.

The video shows the commits under the dist/ and branch/
directories, which is where most of the development happens.

In the first part of the video, you can clearly see our sprint-based
approach: the video starts in February 2003, when the first PyPy
sprint took place in Hildesheim. After a lot of initial activity, few
commits happened in the next two months, until the second PyPy sprint,
which took place in Gothenburg in late May 2003; around minute
0:15, you can see the high commit rate due to the sprint.

The next two years follow more or less the same pattern: very high
activity during sprints, followed by long pauses between them. The
most interesting breaking point is around minute 01:55:
it's January 2005, and when the EU project starts, the number of
commits just explodes, as does the number of people involved.

I also particularly appreciated minute 03:08 aka March 22, 2006: it's
the date of my first commit to dist/, and my nickname magically
appears; but of course I'm biased :-).

The soundtrack is NIN - Ghosts IV - 34: thanks to xoraxax for
having added the music and uploaded the video.

Thursday, June 26, 2008

As readers of this blog already know, PyPy development has
recently focused on getting the code base to a more usable state. One
of the most important parts of this work was creating an
implementation of the ctypes module for PyPy, which
provides a realistic way to interface with external libraries. The
module is now fairly complete (if somewhat slow), and has generated a
great deal of community interest. One of the main reasons this work
progressed so well was that we received funding from Google's Open
Source Programs Office. This is
really fantastic for us, and we cannot thank Google and Guido enough for helping PyPy progress
more rapidly than we could have with volunteer-only time!

This funding opportunity arose from the PyPy US road trip at the end
of last year, which included a visit to Google. You
can check out the video
of the talk we gave during our visit. We wrapped up our day with
discussions about the possibility of Google funding some PyPy work, and
soon after, we were at work on the proposal for improvements that we had
submitted.

One nice side-effect of the funding is that we can use some of
the money to fund contributors' travel to our sprint meetings.
The next scheduled Google funding proposal also aims at making our
Python interpreter more usable and compliant with CPython. This will be done by trying to
fully run Django on top of PyPy. With
more efforts like this one, we're hoping that PyPy can start to be used
as a CPython replacement before the end of 2008.

Many thanks to the teams at merlinux and Open End for making this development possible, including
Carl Friedrich Bolz, Antonio Cuni, Holger Krekel, Maciek Fijalkowski
at merlinux, Samuele Pedroni and yours truly at Open End.

We always love to hear feedback from the community, and you can get
the latest word on our development and let us know your thoughts here in the comments.

Bea Düring, Open End AB

PS: Thanks Carl Friedrich Bolz for drafting this post.

Sunday, June 22, 2008

When hacking on PyPy, I spend a lot of time inside pdb; thus, I tried
to create a more comfortable environment where I can pass my nights
:-).

As a result, I wrote two modules:

pdb.py, which extends the default behaviour of pdb by adding
some commands and some fancy features such as syntax highlighting and
powerful tab completion; pdb.py is meant to be placed somewhere in
your PYTHONPATH, in order to override the default version of pdb.py
shipped with the stdlib;

rlcompleter_ng.py, whose most important feature is the ability
to show coloured completions depending on the type of the objects.
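The code of rlcompleter_ng is not reproduced here. As a rough sketch of the idea only (not its actual implementation or colour scheme), type-dependent colouring of completions could look like this; `colour_completions` and the `COLOURS` table are hypothetical, and the colours are plain ANSI escape codes.

```python
import os
import types

# Hypothetical colour scheme (ANSI escape codes), chosen for illustration only.
COLOURS = {
    types.ModuleType: "\033[35m",           # magenta for modules
    types.FunctionType: "\033[34m",         # blue for Python functions
    types.BuiltinFunctionType: "\033[34m",  # blue for builtins too
    type: "\033[33m",                       # yellow for classes
}
RESET = "\033[0m"


def colour_completions(namespace, prefix):
    """Return the names in `namespace` that start with `prefix`, each wrapped
    in an ANSI colour picked from the exact type of the object it is bound to.
    Names whose type has no entry in COLOURS are returned uncoloured."""
    result = []
    for name in sorted(namespace):
        if not name.startswith(prefix):
            continue
        colour = COLOURS.get(type(namespace[name]))
        result.append(colour + name + RESET if colour else name)
    return result


namespace = {"os": os, "open_file": lambda path: path, "other": 42, "x": 1}
completions = colour_completions(namespace, "o")
print(completions)
```

Looking up the exact type keeps the sketch short; a real completer would probably also want to handle subclasses and instances of user-defined classes.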

To find more information about those modules and how to install them,
have a look at their docstrings.

It's important to underline that these modules are not PyPy-specific:
they work perfectly well on top of CPython too.

Friday, June 20, 2008

Another episode of the "Running Real Applications on Top of PyPy" series:

Today's topic: Divmod's Nevow. Nevow (pronounced as the French "nouveau", or "noo-voh") is a web application construction kit written in Python, which means it's just another web framework, but this time built on top of Twisted.
While, due to some small problems, we're not yet able to pass the full Twisted test suite on top of pypy-c, Nevow seems to be simple enough to work perfectly (959 out of 960 unit tests pass, and the last one has been recognized as pointless and is about to be deleted). Also, thanks to
exarkun, Nevow no longer relies on ugly details like refcounting.
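The post doesn't say which refcounting details Nevow relied on. A common example of this kind of reliance, shown here only as a general illustration, is expecting a file to be closed as soon as its last reference disappears. The function names below are hypothetical.

```python
import os
import tempfile


def read_relying_on_refcount(path):
    # Relies on CPython's reference counting: the file object becomes
    # unreachable right after .read() and CPython closes it immediately.
    # With a garbage collector that does not use refcounting (as in PyPy),
    # the file may stay open until the next collection.
    return open(path).read()


def read_explicitly(path):
    # Portable version: the with-statement closes the file deterministically,
    # independent of the garbage collection strategy.
    with open(path) as f:
        return f.read()


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("hello")
results = (read_relying_on_refcount(path), read_explicitly(path))
os.remove(path)
print(results)
```

Both functions return the same data; the difference only shows up in when the underlying file handle is released, which is exactly the kind of detail that differs between CPython and PyPy.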

Of course, the screenshot obligatory for the series:
This is Nevow's own test suite.

Cheers,
fijal

Sunday, June 15, 2008

During the Berlin Sprint, Holger was interviewed by Tim Pritlove for Tim's
podcast "Chaosradio Express". The whole thing is in German, so it is only
interesting to German speakers. The PyPy episode can be found here. The
interview touches on a lot of topics, starting with a fairly general intro
about what Python is and why it is interesting, and then moving on to explaining and
discussing PyPy. The bit about PyPy starts after about 45 minutes. There is also
a comment page about the episode.

The next episode of the "Running Real Applications on Top of PyPy" series:

Yesterday, we spent some time with Philip Jenvey tweaking Pylons and PyPy to cooperate with each other. While doing this we found some pretty obscure details, but in general things went well.

After resolving some issues, we can now run all (72) Pylons tests on
top of pypy-c compiled with the following command:

and run some example application. Here is the obligatory screenshot (of course
it might be fake, as usual with screenshots). Note: I broke the application on purpose to show off the cool debugger; the default screen is just boring.

Please note that we run the example application without DB access, since
we need some more work to get SQLAlchemy running on top of pypy-c together with
pysqlite-ctypes. Just one example of an obscure detail that SQLAlchemy
relies on in its test suite:

class A(object):
    locals()[42] = 98

Update: This is only about new-style classes.

This works on CPython and doesn't on PyPy.

Cheers,
fijal
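The `class A` quirk from the Pylons post can be reproduced with a small sketch. On CPython, a class body runs with a real dict as its namespace, and `locals()` returns that very dict, so writing into it defines a class attribute; the post reports that this does not work on PyPy. The original snippet uses the integer key 42; the sketch below uses a string key instead, so that the result is reachable as an ordinary attribute.

```python
class A(object):
    # On CPython, locals() in a class body is the class namespace dict itself,
    # so this assignment ends up as a class attribute. The original snippet
    # uses the key 42; "answer" is a string key chosen for this sketch.
    locals()["answer"] = 98


print(A.answer)  # 98 on CPython
```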

List comprehensions are a nice feature in Python. They are, however, just
syntactic sugar for for loops. E.g. the following list comprehension:

def f(l):
    return [i ** 2 for i in l if i % 3 == 0]

is sugar for the following for loop:

def f(l):
    result = []
    for i in l:
        if i % 3 == 0:
            result.append(i ** 2)
    return result

The interesting bit about this is that list comprehensions are actually
implemented in almost exactly this way. If one disassembles the two functions
above one gets sort of similar bytecode for both (apart from some details, like
the fact that the append in the list comprehension is done with a special
LIST_APPEND bytecode).
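To compare the bytecode yourself, something like the following works on a current CPython (the two functions are renamed `f_comp` and `f_loop` so they can coexist). The exact output varies between versions: before CPython 3.12 the comprehension is compiled into a separate nested code object, which `dis.dis` also disassembles, while 3.12 inlines it (PEP 709). Either way, LIST_APPEND should show up for the comprehension, while the loop version calls `append` as a method.

```python
import dis


def f_comp(l):
    return [i ** 2 for i in l if i % 3 == 0]


def f_loop(l):
    result = []
    for i in l:
        if i % 3 == 0:
            result.append(i ** 2)
    return result


# Print the bytecode of both versions; the output is version-dependent.
dis.dis(f_comp)
print("-" * 40)
dis.dis(f_loop)

# Both versions compute the same list.
print(f_comp(range(10)), f_loop(range(10)))
```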

Now, when doing this sort of expansion there are some classical problems: what
name should the intermediate list that is being built get? (I said classical
because this is indeed one of the problems of many macro systems.) What CPython
does is give the list the name _[1] (and _[2]... with nested list
comprehensions). You can observe this behaviour with the following code:

Now to the real reason why I am writing this blog post. PyPy's Python
interpreter implements list comprehensions in more or less exactly the same way,
with one tiny difference: the name of the variable:

Now, that shouldn't really matter for anybody, should it? Turns out it does. The
following way too clever code is apparently used a lot:

__all__ = [__name for __name in locals().keys()
           if not __name.startswith('_') or __name == '_']

In PyPy this will give you a "$list0" in __all__, which will prevent the
import of that module :-(. I guess I need to change the name to match CPython's.
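Here is a small simulation of why the name matters. Current Python 3 comprehensions no longer store the intermediate list under a name in the enclosing namespace, so the namespaces below are hand-written dicts mimicking what the idiom sees; `public_names` and the module name `fake_mod` are made up for this sketch. The CPython-style hidden name starts with `_` and is filtered out, while `$list0` slips through. Once the comprehension has finished the hidden list is gone, so `from ... import *` then fails on a name the module does not define.

```python
import sys
import types


def public_names(namespace):
    # The idiom quoted above: keep names not starting with "_", plus "_".
    return [name for name in namespace.keys()
            if not name.startswith('_') or name == '_']


# Hypothetical module namespaces as seen while the comprehension runs.
cpython_like = {'spam': 1, '_helper': 2, '_': 3, '_[1]': []}
pypy_like = {'spam': 1, '_helper': 2, '_': 3, '$list0': []}

print(public_names(cpython_like))  # ['spam', '_']
print(public_names(pypy_like))     # ['spam', '_', '$list0']

# After the comprehension the hidden list no longer exists, so the module
# advertises a name it does not define.
mod = types.ModuleType("fake_mod")
mod.spam = 1
mod._ = 3
mod.__all__ = public_names(pypy_like)
sys.modules["fake_mod"] = mod
try:
    exec("from fake_mod import *", {})
    import_failed = False
except AttributeError:
    import_failed = True
finally:
    del sys.modules["fake_mod"]

print(import_failed)  # True
```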

Lesson learned: no detail is obscure enough to not have some code depending
on it. Mostly problems on this level of obscurity are the things we are fixing
in PyPy at the moment.

Monday, June 9, 2008

As PyPy is getting more and more usable, we need better tools for working on applications running on top of PyPy. Out of this interest, I spent some time implementing the _lsprof module, which has been part of the standard library since Python 2.5. It is needed by the cProfile module, which can profile Python programs with high accuracy and much less overhead than the older, pure-Python profile module. Together with the excellent
lsprofcalltree script, you can display this data using kcachegrind, which gives you great visualization possibilities for your profile data.
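For reference, this is one common way to drive cProfile (which sits on top of _lsprof) with the standard library's `pstats` module. The recursive `fib` function is just a made-up workload; the report lists the five entries with the highest cumulative time.

```python
import cProfile
import io
import pstats


def fib(n):
    # Deliberately slow recursive workload, just something to profile.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


profiler = cProfile.Profile()
profiler.enable()
result = fib(20)
profiler.disable()

# Collect the report into a string instead of printing it directly.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

To keep the raw data for external tools, `profiler.dump_stats(filename)` or `python -m cProfile -o output.prof script.py` writes it to a file; how to get such data into kcachegrind is described in lsprofcalltree's own documentation.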

Cheers,
fijal
