Friday, April 9, 2010

PyPy is now able to load
and run CPython extension modules (i.e. .pyd and .so files) natively by using the new CPyExt
subsystem.
Unlike the solution presented in another blog post (where extension modules like
numpy etc. were run on CPython and proxied through TCP), this solution does not require
a running CPython anymore. We do not achieve full binary compatibility
yet (as Ironclad does), but recompiling the extension is generally enough.

The only prerequisite is that the necessary functions of the C API of CPython are already
implemented in PyPy. If you are a user or an author of a module and miss certain functions
in PyPy, we invite you to implement them. Up until now, a lot of people (including a lot of
new committers) have stepped up and implemented a few functions to get their favorite module
running. See the end of this post for a list of names.

Regarding speed, we tried the following experiment: even though there is a bit of overhead
when running these modules, we took the regular expression engine of CPython (_sre.so), ran
the spambayes benchmark of the Unladen Swallow benchmark suite (cf. speed.pypy.org) with it,
and measured a speedup: it became two times faster on pypy-c than with the built-in regular
expression engine of PyPy. From Amdahl's Law it follows that _sre.so must run several
times faster than the built-in engine.
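The Amdahl's Law step can be made concrete with a back-of-the-envelope calculation (the 60% figure below is purely illustrative; the post does not say what fraction of the benchmark's runtime is spent in the regex engine):

```python
def required_speedup(overall_speedup, fraction):
    """Amdahl's Law: the speedup the accelerated part needs so that the
    whole program gets overall_speedup, if that part only takes up
    `fraction` of the original runtime."""
    # 1/overall = (1 - fraction) + fraction / part_speedup
    return fraction / (1.0 / overall_speedup - (1.0 - fraction))

# If regex matching were 60% of spambayes' runtime, a 2x overall win
# would require the regex engine itself to be about 6x faster.
print(round(required_speedup(2.0, 0.6), 2))  # prints 6.0
```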

Modules we are currently pursuing include PIL, among others. Distutils support is nearly ready.
If you would like to participate or want information on how to use this new feature, come and join
our IRC channel #pypy on freenode.

Amaury Forgeot d'Arc and Alexander Schremmer

Further CPyExt Contributors:

Alex Gaynor

Benjamin Peterson

Jean-Paul Calderone

Maciej Fijalkowski

Jan de Mooij

Lucian Branescu Mihaila

Andreas Stührk

Zooko Wilcox-O'Hearn

@Anonymous I don't think anyone has started trying to test numpy or scipy yet; fundamentally, however, it's just a matter of implementing the missing functions. For me, starting on numpy is my next goal, after PIL.

This is very good news. JIT compiled Python can never fully replace extension modules (existing ones, or the need for new ones), so extension support should be a high priority for the PyPy project. I hope you can eventually get rid of that overhead.

Wow, just coming back from vacation and I have to say: great news and great work, guys! Historically speaking, this is the third approach to the "ext" module issue, and if the promise works out as it seems to, probably the last as far as leveraging CPython extension modules is concerned! I wonder: does it still make sense to have "native" extension modules, the ones we currently have as "mixed" modules?

Let me ask for a bit more detail. I depend on a module (http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html) that is currently unsupported, as far as I know. I'd really like to port it to PyPy. Where should I start?

Is it possible that the module runs without modifications? Can I check this simply by building pypy-trunk and writing "import cmaxent"?

@Anonymous: No, it's not in the PPA. We provide only the latest release (1.2 in this case) and weekly builds of trunk (which haven't been announced on the blog yet). The CPython extension module support lives in its own branch, which will be merged into the trunk sooner or later.

PS: The weekly builds are available at https://launchpad.net/~pypy

To test your module, you need to compile and load it. For compilation, you can use a compiled pypy binary and run setup.py build_ext with your setup file. For hints about manual compilation and module loading, visit our IRC channel.
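For the compilation step, a minimal setup.py along these lines is usually enough. This is a sketch only: the module name cmaxent and the source file name are assumptions taken from the question above, and your module's real setup file may differ.

```python
# Hypothetical setup.py for the cmaxent module discussed above.
# Run it with the pypy binary instead of CPython, e.g.:
#   pypy setup.py build_ext --inplace
from distutils.core import setup, Extension

cmaxent_ext = Extension(
    "cmaxent",              # name the module is imported under
    sources=["cmaxent.c"],  # assumed C source file name
)

if __name__ == "__main__":
    setup(name="cmaxent", ext_modules=[cmaxent_ext])
```

After a successful build, "import cmaxent" under pypy should pick up the recompiled module, provided the C API functions it needs are implemented.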

MixedModules allow you to implement modules in RPython (using the PyPy API) and Python at the same time. CPyExt is for modules written in C using the CPython API. So both solutions are for different needs.

The reason I ask is Blender. There were some security concerns among Blender developers recently. Blender uses embedded CPython for scripting. Normal scripts (like exporters), which have to be invoked by the user, aren't that much of a problem, but Blender also supports Python expressions for animation parameters. Without a sandbox, downloading and opening .blend files from unknown sources is kind of risky, since a malicious Python expression could theoretically wipe your hard disk.

PyPy, with its support for sandboxing, could be a very good replacement for CPython in Blender (also because of its speed), but if it isn't compatible with the CPython API, then a swap would probably be way too much effort.

@alexander True, mixed modules are implemented in RPython, need to be translated together with the pypy interpreter, and can make use of the JIT. My question was aimed more at which extension module mechanism makes sense for which use cases and goals. IOW, some discussion and a web page regarding rpy-ext/ctypes/cpy-ext would make sense, I guess. Or does that exist somewhere already?

@holger: "some discussion and web page regarding rpy-ext/ctypes/cpy-ext would make sense"

Yes, someone could write down guidelines. Using the C API makes your module run fast on CPython, and a bit slower on IronPython and PyPy.

Using ctypes makes your module work on all three interpreters as well, but it will run slower. One advantage is that you do not need to write C to create a wrapper around a library. If your objective is speed or lower memory usage, however, ctypes does not work either.
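As a concrete illustration of the ctypes approach (a sketch wrapping libc's strlen rather than any real extension module; no C compilation step is needed):

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. The fallback name is a
# Linux-specific assumption for systems where find_library fails.
libc_name = ctypes.util.find_library("c") or "libc.so.6"
libc = ctypes.CDLL(libc_name)

# Declaring argument and result types keeps ctypes from guessing.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(data):
    """Length of a byte string, computed by libc instead of Python."""
    return libc.strlen(data)

print(c_strlen(b"pypy"))  # prints 4
```

The same pattern (load the shared library, declare the signatures, call) works on CPython, PyPy, and IronPython, which is exactly the portability advantage described above.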

Mixed modules make your module work only on PyPy, and provide decent speed together with a mixture of a pleasant language (Python) and a somewhat harder to grasp one (RPython). This only makes sense as a platform if your users are also using PyPy.