Thursday, February 28, 2013

From a software engineering perspective, 10 years is indistinguishable
from infinity, so I don't care what happens 10 years from now -- as
long as you don't blame me. :-)

- Guido van Rossum, Python creator.

10 years is indeed a long time. PyPy was created approximately 10 years ago,
with the exact date being lost in the annals of the version control system.
We've come a long way during those 10 years, from a "minimal Python" that
was supposed to serve mostly as an educational tool, through a vehicle for
academic research, to a high-performance VM for Python and beyond.

Some facts from the PyPy timeline:

In 2007, at the end of the EU funding period, we promised the JIT was just around the corner.
It turned out we misjudged it pretty badly -- the first usable PyPy was released in 2010.

At some point we decided to have a JavaScript backend so one could compile RPython programs
to JavaScript and run them in a browser. Turned out it was a horrible idea.

Another option we tried was using RPython to write CPython C extensions. Again, it turned out RPython
is a bad language and instead we made a fast JIT, so you don't have to write C extensions.

We made N attempts to use LLVM. Seriously, N is 4 or 5. But we haven't fully given up yet :-)
They all ran into issues one way or another.

We were huge fans of ctypes at the beginning. Up to the point where we tried to make
a restricted subset with static types, called rctypes for RPython. Turned out to be horrible.
Twice.

We were very hopeful about creating a JIT generator from the beginning. But the first one failed miserably,
generating too much assembler. The second failed too. The third first burned down and then failed.
However, we managed to release a working JIT in 2010, against all odds.

Martijn Faassen used to ask us "how fast is PyPy" so we decided to name an option enabling all
optimizations "--faassen". Then "--no-faassen" was naturally added too. Later we
decided to grow up and renamed it to "-O2", and now "-Ojit".

The first time the Python interpreter successfully compiled to C, it segfaulted because the code generator used signed chars instead of unsigned chars...

To make it more likely to be accepted, the proposal for the EU project contained basically every feature under the sun a language could have. This proved to be annoying, because we had to actually implement all that stuff. Then we had to do a cleanup sprint where we deleted 30% of the codebase and 70% of the features.

At one sprint someone proposed a new software development methodology: 'Terminology-Driven Programming' means to pick a fancy name, then discuss what it could mean, then implement it. Examples: timeshifter, rainbow interpreter, meta-space bubble, hint annotations (all but one of these really existed).

There is a conspiracy theory that translation is so slow because time is stored away during it, and later retrieved when an actual program runs to make it appear faster.

Overall, it was a really long road. However, 10 years later we are in
good shape. A quick look at the immediate future: we are approaching
PyPy 2.0 with stackless+JIT and cffi support,
the support for Python 3 is taking shape, non-standard
extensions like STM are slowly getting ready (more soon), and there are
several non-Python interpreters around the corner (Hippy, Topaz and more).

Cheers,
fijal, arigo, hodgestar, cfbolz and the entire PyPy team.

The cppyy module
provides C++ bindings for PyPy by using the reflection information extracted
from C++ header files by means of the
Reflex package.
In order to support C++11, the goal is to move away from Reflex and instead use
cling, an interactive
C++ interpreter, as the backend.
Cling is based on LLVM's
clang.
The use of a real compiler under the hood has the advantage that it is now
possible to cover every conceivable corner case.
The disadvantage, however, is that every corner case actually has to be
covered.
Life is somewhat easier when calls come in from the Python interpreter, as
those calls have already been vetted for syntax errors and all lookups are
well scoped.
Furthermore, the real hard work of getting sane responses from and for C++
in an interactive environment is done in cling, not in the bindings.
Nevertheless, it is proving a long road (but for that matter clang does not
support all of C++11 yet), so here's a quick status update showing that good
progress is being made.

The following example is on CPython, not PyPy, but moving a third
(after Reflex and
CINT) backend into place
underneath cppyy is straightforward compared to developing the backend
in the first place.
Take this snippet of C++11 code
(cpp11.C):

As a practical matter, most usage of new C++11 features will live in
implementations, not in declarations, and is thus never seen by the bindings.
The above example is therefore somewhat contrived, but it will serve to show
that these new declarations actually work.
The new features used here are
constexpr,
auto, and
decltype.
Here is how you could use these from CPython, using the
PyROOT
package, which has more than a passing resemblance to cppyy, as one is based
on the other:

which, when entered into a file
(cpp11.py) and executed,
prints the expected results:

$ python cpp11.py
N = 5
1+1 = 2

In the example, the C++ code is compiled on-the-fly, rather than first generating
a dictionary as is needed with Reflex.
A deployment model that utilizes stored pre-compiled information is foreseen
to work with larger projects, which may have to pull in headers from many places.

Work is going to continue first on C++03 on cling with CPython (about 85% of
unit tests currently pass), with a bit of work on C++11 support on the side.
Once fully in place, it can be brought into a new backend for cppyy, after
which the remaining parts of C++11 can be fleshed out for both interpreters.

Cheers,
Wim Lavrijsen

Friday, February 22, 2013

We (Armin Rigo and Maciej Fijalkowski) are visiting San Francisco/Silicon Valley
for PyCon and beyond. Alex Gaynor, another core PyPy dev, lives there
permanently. I will be visiting on March 12-28, Armin on March 11-21.
If you want us to give a talk at your company or simply catch up with us
for dinner, please get in touch. Write to pypy-dev@python.org if you want this
publicly known, or simply send me a mail at fijall@gmail.com if you don't want
it public.

Cheers,
fijal

Tuesday, February 12, 2013

Last week, Alex Gaynor announced the first public release of
Topaz,
a Ruby interpreter written in RPython. This is the culmination of a
part-time effort over the past 10 months to provide a Ruby interpreter
that implements enough interesting constructs in Ruby to show that the
RPython toolchain can produce a Ruby implementation fast enough to
beat what is out there.

Disclaimer

Obviously the implementation is currently very incomplete in terms of the
available standard library. We are working on getting it usable. If
you want to try it, grab a
nightly build.

We have run some benchmarks from the
Ruby benchmark suite
and the
metatracing VMs experiment. The
preliminary results are promising, but at this point we are missing so
many method implementations that most benchmarks won't run yet. So instead of
performance, I'm going to talk about the high-level structure of the
implementation.

Architecture

Topaz interprets a custom bytecode set. The basics are similar to
Smalltalk VMs, with bytecodes for loading and storing locals and
instance variables, sending messages, and stack management. Some
syntactical features of Ruby, such as defining classes and modules,
literal regular expressions, hashes, ranges, etc., also have their own
bytecodes. A third kind of bytecode handles control flow constructs
in Ruby, such as loops, exception handling, break, continue, etc.
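
To make these groups of bytecodes a bit more concrete, here is a minimal sketch of a stack-based interpreter loop in plain Python. The opcode names, the (opcode, argument) tuple encoding and the `send` helper are all made up for illustration; they are not Topaz's actual bytecode set, and message sends are simply modelled with Python attribute lookup.

```python
# Hypothetical opcodes for illustration only; Topaz's real bytecodes differ.
(LOAD_CONST, LOAD_LOCAL, STORE_LOCAL,
 SEND, JUMP_IF_FALSE, JUMP, RETURN) = range(7)


def send(receiver, name, args):
    # A message send, modelled here with plain Python attribute lookup.
    return getattr(receiver, name)(*args)


def interpret(code, consts, num_locals):
    stack = []
    local_vars = [None] * num_locals
    pc = 0
    while True:
        op, arg = code[pc]
        pc += 1
        if op == LOAD_CONST:
            stack.append(consts[arg])
        elif op == LOAD_LOCAL:
            stack.append(local_vars[arg])
        elif op == STORE_LOCAL:
            local_vars[arg] = stack.pop()
        elif op == SEND:
            name, argc = arg
            # Arguments were pushed left to right, so restore their order.
            args = [stack.pop() for _ in range(argc)][::-1]
            receiver = stack.pop()
            stack.append(send(receiver, name, args))
        elif op == JUMP_IF_FALSE:
            if not stack.pop():
                pc = arg
        elif op == JUMP:
            pc = arg
        elif op == RETURN:
            return stack.pop()
        else:
            raise ValueError("unknown opcode %r" % (op,))


# Roughly: i = 0; while i < 5: i = i + 1; return i
CONSTS = [0, 5, 1]
PROGRAM = [
    (LOAD_CONST, 0), (STORE_LOCAL, 0),               # i = 0
    (LOAD_LOCAL, 0), (LOAD_CONST, 1),                # push i, 5
    (SEND, ("__lt__", 1)), (JUMP_IF_FALSE, 11),      # leave loop unless i < 5
    (LOAD_LOCAL, 0), (LOAD_CONST, 2),                # push i, 1
    (SEND, ("__add__", 1)), (STORE_LOCAL, 0),        # i = i + 1
    (JUMP, 2),                                       # back to the loop test
    (LOAD_LOCAL, 0), (RETURN, None),                 # return i
]
result = interpret(PROGRAM, CONSTS, 1)
```

The loads and stores, the send, and the jumps correspond to the basic and control flow groups described above; the syntax-specific bytecodes (class definitions, literals and so on) are left out of this sketch.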

In trying to get from Ruby source code to bytecode, we found that the
easiest way to support all of the Ruby syntax is to write a custom
lexer and use an RPython port of PLY
(fittingly called RPly) to create the
parser from the Ruby yacc grammar.

The Topaz interpreter uses an ObjectSpace (similar to how PyPy does
it) to interact with the Ruby world. The object space contains all
the logic for wrapping and interacting with Ruby objects from the
VM. Its __init__ method sets up the core classes, initial globals,
and creates the main thread (the only one right now, as we do not have
threading yet).

Classes are mostly written in Python. We use ClassDef objects to
define the Ruby hierarchy and attach RPython methods to Ruby via
ClassDef decorators. These two points warrant a little explanation.

Hierarchies

All Ruby classes ultimately inherit from BasicObject. However, most
objects are below Object (which is a direct subclass of
BasicObject). This includes objects of type Fixnum, Float,
Class, and Module, which may not need all of the facilities of
full objects most of the time.

Most VMs treat such objects specially, using tagged pointers to
represent Fixnums, for example. Other VMs (for example from the
SOM Family)
don't. In the latter case, the implementation hierarchy matches the
language hierarchy, which means that objects like Fixnum share a
representation with all other objects (e.g. they have class pointers
and some kind of instance variable storage).

In Topaz, implementation hierarchy and language hierarchy are
separate. The first is defined through the Python inheritance. The
other is defined through the ClassDef for each Python class, where the
appropriate Ruby superclass is chosen. The diagram below shows how the
implementation class W_FixnumObject inherits directly from
W_RootObject. Note that W_RootObject doesn't have any attributes,
specifically no storage for instance variables and no map (for
determining the class - we'll get to that). These attributes are
instead defined on W_Object, which is what most other implementation
classes inherit from. However, on the Ruby side, Fixnum correctly
inherits (via Numeric and Integer) from Object.

This simple structural optimization gives a huge speed boost, but
there are VMs out there that do not have it and suffer performance
hits for it.
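
The separation of the two hierarchies can be sketched in a few lines of plain Python. This is a toy version under our own assumptions: the `ClassDef` constructor taking a name and a parent, the `ruby_ancestors` helper, and the use of `__slots__` to show which attributes exist are illustrative, not Topaz's real code. Numeric and Integer are given only a Ruby-side description here to keep the sketch short.

```python
class ClassDef(object):
    """Toy description of the Ruby-side class of an implementation class."""

    def __init__(self, name, superclassdef=None):
        self.name = name
        self.superclassdef = superclassdef

    def ruby_ancestors(self):
        # Walk the language hierarchy, not the Python one.
        names = []
        cdef = self
        while cdef is not None:
            names.append(cdef.name)
            cdef = cdef.superclassdef
        return names


class W_RootObject(object):
    # No per-instance storage at all.
    __slots__ = ()
    classdef = ClassDef("BasicObject")


class W_Object(W_RootObject):
    # Full objects carry a map and instance variable storage.
    __slots__ = ("map", "storage")
    classdef = ClassDef("Object", W_RootObject.classdef)

    def __init__(self):
        self.map = None
        self.storage = []


# Ruby-side intermediate classes (no implementation class in this sketch).
numeric_def = ClassDef("Numeric", W_Object.classdef)
integer_def = ClassDef("Integer", numeric_def)


class W_FixnumObject(W_RootObject):
    # Implementation-wise a direct child of W_RootObject: just an int.
    __slots__ = ("intvalue",)
    classdef = ClassDef("Fixnum", integer_def)

    def __init__(self, intvalue):
        self.intvalue = intvalue
```

On the Python side `W_FixnumObject` never sees `W_Object`, so it pays for neither a map nor storage, while `W_FixnumObject.classdef.ruby_ancestors()` still walks through Integer, Numeric and Object up to BasicObject.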

Decorators

Ruby methods can have symbols in their names that are not allowed as
part of Python method names, for example !, ?, or =, so we
cannot simply define Python methods and expose them to Ruby by the
same name.

For defining the Ruby method name of a function, as well as argument
number checking, Ruby type coercion and unwrapping of Ruby objects to
their Python equivalents, we use decorators defined on ClassDef. When
the ObjectSpace initializes, it builds all Ruby classes from their
respective ClassDef objects. For each method in an implementation
class that has a ClassDef decorator, a wrapper method is generated and
exposed to Ruby. These wrappers define the name of the Ruby method,
coerce Ruby arguments, and unwrap them for the Python method.

This defines the method * on the Ruby String class. When this is
called, the first argument is converted into a Ruby Fixnum object
using the appropriate coercion method, and then unwrapped into a plain
Python int and passed as argument to method_times. The wrapper
method also supplies the space argument.
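
For readers who want to see the mechanism end to end, here is a small self-contained sketch in plain Python. It is a toy under stated assumptions: the decorator signature `method(ruby_name, **argspec)`, the `pending` list, and the `ObjectSpace.send` and `unwrap` helpers are our own illustrative names, and the "coercion" is reduced to a type check rather than Ruby's real coercion protocol.

```python
class ClassDef(object):
    """Toy ClassDef: collects decorated methods until the space builds them."""

    def __init__(self, name):
        self.name = name
        self.pending = []  # (ruby_name, python_function, argspec)

    def method(self, ruby_name, **argspec):
        # argspec maps a Python parameter name to an unwrap kind, e.g. "int".
        def decorator(func):
            self.pending.append((ruby_name, func, argspec))
            return func
        return decorator


class W_FixnumObject(object):
    def __init__(self, intvalue):
        self.intvalue = intvalue


class W_StringObject(object):
    classdef = ClassDef("String")

    def __init__(self, value):
        self.value = value

    @classdef.method("*", times="int")
    def method_times(self, space, times):
        # 'times' arrives here as a plain Python int.
        return W_StringObject(self.value * times)


class ObjectSpace(object):
    def __init__(self, classdefs):
        # Build the Ruby-visible method tables from the ClassDef objects.
        self.methods = {}
        for classdef in classdefs:
            for ruby_name, func, argspec in classdef.pending:
                key = (classdef.name, ruby_name)
                self.methods[key] = self._wrap(func, argspec)

    def _wrap(self, func, argspec):
        code = func.__code__
        argnames = code.co_varnames[2:code.co_argcount]  # skip self, space

        def wrapper(w_self, args_w):
            if len(args_w) != len(argnames):
                raise TypeError("wrong number of arguments")
            args = [self.unwrap(w_arg, argspec.get(name))
                    for name, w_arg in zip(argnames, args_w)]
            return func(w_self, self, *args)  # the wrapper supplies 'space'
        return wrapper

    def unwrap(self, w_obj, kind):
        if kind == "int":
            # Stand-in for Ruby's coercion to Fixnum.
            if not isinstance(w_obj, W_FixnumObject):
                raise TypeError("can't convert to Fixnum")
            return w_obj.intvalue
        return w_obj

    def send(self, w_receiver, ruby_name, args_w):
        key = (type(w_receiver).classdef.name, ruby_name)
        return self.methods[key](w_receiver, args_w)


space = ObjectSpace([W_StringObject.classdef])
w_result = space.send(W_StringObject("ab"), "*", [W_FixnumObject(3)])
```

The Python method keeps an ordinary name (`method_times`), while the Ruby-visible name `*` lives only in the table built by the space, which is what makes names containing `!`, `?` or `=` possible.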

Object Structure

Ruby objects have dynamically defined instance variables and may
change their class at any time in the program (a concept called
singleton class
in Ruby - it allows each object to have unique behaviour). To still
efficiently access instance variables, you want to avoid dictionary
lookups and let the JIT know about objects of the same class that have
the same instance variables. Topaz, like PyPy (which got it from
Self), implements instances using maps, which transform dictionary
lookups into array accesses. See the
blog post
for the details.
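
As a rough illustration of the idea, here is a minimal sketch of maps in plain Python. The class and method names (`Map`, `with_attribute`, `set_ivar`, `get_ivar`) are our own and not taken from Topaz, and the sketch only covers instance variables; it leaves out how the map also determines the class.

```python
class Map(object):
    """Layout shared by all objects with the same instance variables."""

    def __init__(self):
        self.indexes = {}      # ivar name -> index into the storage list
        self.transitions = {}  # ivar name -> Map with that ivar added

    def index_of(self, name):
        return self.indexes.get(name, -1)

    def with_attribute(self, name):
        # Reuse the same successor map so that objects built the same
        # way end up sharing one map.
        if name not in self.transitions:
            new_map = Map()
            new_map.indexes = dict(self.indexes)
            new_map.indexes[name] = len(self.indexes)
            self.transitions[name] = new_map
        return self.transitions[name]


EMPTY_MAP = Map()


class W_Object(object):
    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []

    def set_ivar(self, name, value):
        idx = self.map.index_of(name)
        if idx >= 0:
            self.storage[idx] = value
        else:
            self.map = self.map.with_attribute(name)
            self.storage.append(value)

    def get_ivar(self, name):
        idx = self.map.index_of(name)
        if idx < 0:
            return None  # reading an unset ivar gives nil in Ruby
        return self.storage[idx]


a = W_Object()
a.set_ivar("@x", 1)
a.set_ivar("@y", 2)
b = W_Object()
b.set_ivar("@x", 10)
b.set_ivar("@y", 20)
```

Here `a` and `b` end up sharing a single map. There is still a dictionary lookup inside `index_of`, but it depends only on the map and the name, so once the JIT knows which map an object has, the lookup can be treated as a constant and only the list access remains.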

Friday, February 8, 2013

A short notice to tell you that CFFI 0.5 was released. This
contains a number of small improvements over 0.4, but otherwise seems to
have been quite stable for a couple of months --- no change since January 10,
apart from the usual last-minute fixes for Python 3 and for Windows.

Have fun!

Armin
