Thursday, May 5, 2011

NumPy Follow up

Hi everyone. Since yesterday's blog post we have received a ton of feedback, so
we want to clarify a few things, as well as share some of the progress we've
made in just the 24 hours since the post.

Reusing the original NumPy

First, a lot of people have asked why we cannot just reuse the original NumPy
through cpyext, our CPython C-API compatibility layer. We believe this is
not the best approach, for a few reasons:

cpyext is slow, and always will be slow. It has to emulate far too many
details of the CPython object model that don't exist on PyPy (e.g.,
reference counting). Since people use NumPy primarily for speed, this
would mean that even if we could get NumPy working, no one would want to
use it. Also, as soon as execution crosses the cpyext boundary, it
becomes invisible to the JIT, which means the JIT has to assume the worst
and deoptimize.

NumPy uses many obscure documented and undocumented details of the CPython
C-API. Emulating these is often difficult or impossible (e.g. we can't fix
accessing a struct field, as there's no function call for us to intercept).

It's not much fun. Frankly, working on cpyext, debugging the crashes,
and everything else that goes with it is not terribly fun, especially when
you know that the end result will be slow. We've demonstrated we can build
a much faster NumPy, in a way that's more fun, and given that the people
working on this are volunteers, it's important to keep us motivated.

Finally, we are not proposing to rewrite the entirety of NumPy or, god
forbid, BLAS, or any of the low-level stuff that operates on C-level arrays,
only the parts that interface with Python code directly.

C bindings vs. CPython C-API

There are two separate issues with C code: one has a very nice story, and the
other not so much. The first is the case of arbitrary C code that isn't
Python-related: things like libsqlite, libbz2, or any random C shared library
on your system. PyPy will quite happily call into these, and bindings can be
developed either at the RPython level (using rffi) or in pure Python, using
ctypes. Writing bindings with ctypes has the advantage that they can run on
every alternative Python implementation, such as Jython and IronPython.
Moreover, once we merge the jittypes2 branch, ctypes calls will even be smoking
fast.
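
As a rough illustration of the pure-Python route, here is a minimal ctypes
sketch. It binds strlen from the C standard library as a stand-in for whatever
shared library you actually care about. The wrapper name c_strlen is made up
for this example, and the library lookup assumes a POSIX-like system.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system. find_library("c") may return None, in which
# case CDLL(None) exposes the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly:
#     size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Return the length of a NUL-terminated byte string, computed in C."""
    return libc.strlen(data)


print(c_strlen(b"numpy"))
```

Nothing here touches the CPython C-API, which is why the same file should also
be usable from other implementations that ship ctypes.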

On the other hand there is the CPython C-extension API. This is a very specific
API which CPython exposes, and PyPy tries to emulate. It will never be fast,
because there is far too much overhead in all the emulation that needs to be
done.

One of the reasons people write C extensions is for speed. Often, with PyPy
you can just forget about C, write everything in pure Python and let the JIT
do its magic.
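
To make that concrete, here is the kind of tight numeric loop that would
traditionally get pushed into a C extension, written as plain Python. The
function name dot is just an example, and how much the JIT actually helps will
depend on your workload, so treat this as a sketch rather than a benchmark.

```python
def dot(xs, ys):
    """Dot product of two equal-length sequences of numbers, in plain Python."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    total = 0.0
    # A simple indexed loop over numbers: the sort of code a tracing JIT
    # is designed to compile down to a tight machine-code loop.
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```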

If the PyPy JIT alone isn't fast enough, or you just want to use existing C
code, then it might make sense to split your C extension into two parts: one
which doesn't touch the CPython C-API and thus can be loaded with ctypes and
called from PyPy, and another which does the interfacing with Python for
CPython (where it will be faster).
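
One way the Python side of such a split could look is sketched below. The
module name _mycore_capi is hypothetical (it stands for the CPython-only
wrapper), and libc's abs stands in for a function from your own C core; the
library lookup again assumes a POSIX-like system.

```python
import ctypes
import ctypes.util


def _load_core_with_ctypes():
    """Bind the plain-C core through ctypes (no CPython C-API involved)."""
    # Assumption: libc plays the role of the C core library in this sketch.
    lib = ctypes.CDLL(ctypes.util.find_library("c"))
    lib.abs.argtypes = [ctypes.c_int]
    lib.abs.restype = ctypes.c_int
    return lib.abs


try:
    # Hypothetical C-API wrapper around the same core, built only for CPython.
    from _mycore_capi import core_abs
except ImportError:
    # Where the wrapper is not available (e.g. on PyPy), fall back to ctypes.
    core_abs = _load_core_with_ctypes()

print(core_abs(-7))
```

Callers only ever see core_abs, so which of the two parts is in use stays an
implementation detail.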

There are also libraries written in C to interface with existing C codebases,
for which performance is not the main goal. For these, the right solution is
to try cpyext first; if it works, that's great, but if it fails, the solution
is to rewrite the bindings using ctypes, where they will work on all Python
VMs, not just CPython.

And finally, there are rare cases where rewriting in RPython makes more sense.
NumPy is one of the few examples of these, because we need to be able to give
the JIT hints on how to appropriately vectorize all of the operations on an
array. In general, writing in RPython is unnecessary for almost all
libraries. NumPy is something of a special case because it is so ubiquitous
that every ounce of speed is valuable, and the way people use it leads to
code structure where the JIT benefits enormously from extra hints and the
ability to manipulate memory directly, which is not possible from Python.

Progress

On a more positive note, after we published the last post, several new people
came and contributed improvements to the numpy-exp branch. We would like to
thank all of them:

nightless_night contributed: An implementation of __len__, fixed bounds
checks on __getitem__ and __setitem__.

Those last two were technically an outstanding branch we finally merged, but
hopefully you get the picture. In addition there was some exciting work done by
regular PyPy contributors. I hope it's clear that there's a place to jump in
for people with any level of PyPy familiarity. If you're interested in
contributing please stop by #pypy on irc.freenode.net, the pypy-dev mailing
list, or send us pull requests on bitbucket.

Alex

Isn't there another fairly major drawback to implementing in RPython - that you can only use it if it is compiled (translated) at the same time as pypy. So effectively pypy *has* to be distributed with all the RPython extensions you will ever use, or you have to retranslate *everything* whenever you add a new extension.

Developing cross-platform, cross-architecture, stuff with ctypes can also be a lot more painful than writing extensions using the Python C API (and having the compiler make some decisions at compile time rather than having to do it all at runtime).

Most of python-dev's "antipathy" towards using ctypes is focused on using ctypes for stdlib modules, not on general principles. For security, stability, and portability reasons, many platforms need to disable ctypes when they build Python. Consequently, there is a policy that no stdlib module can use ctypes. They are not recommending against using ctypes in general.

thanks for the follow-up. I won't argue with points 1 and 3, but I think 2 can be reasonably addressed: I don't think the usage of internal details is pervasive in the code, and most of it is for historical reasons. We cannot remove them altogether from the numpy headers for backward compatibility reasons, but we can replace most of it inside numpy itself.

I am still a bit confused though: from your description, it seems that you intend to fork numpy to replace some pieces from C to RPython, but if I look at the numpy-ext branch, I see a rewrite of numpy in rpython. Maybe you are talking about another code ?

I think that the most important part of numpy is array operations (indexing, +-*/, broadcasting, etc). So it would be good enough to implement only array class in RPython and call to numpy using ctypes/cpyext for all other stuff. I've read somewhere about the plans to impose separation between numpy and scipy so numpy holds only implementation of fast arrays and scipy will hold all non-trivial operations on them. IMHO such separation will be ideal for pypy too.

I like the idea of reimplementing part of Numpy in pypy to leverage the JIT in pypy. The existence of numexpr demonstrates the deficiency of Numpy as a Python library. A JIT is much more appropriate for what effectively should be a DSL.

But I would recommend something grander, perhaps for the longer term. I think if pypy could do for Python what McVM and McJIT propose to do for Matlab, it would be game-changing for Python and pypy. It would make pypy not only competitive with Matlab in ways that Numpy and Scipy are not yet and may never be, but also with F#. The rapid uptake of F# in the financial industry in particular, despite the availability of Matlab, showcases the need for a fast prototyping language that does not rely on calling Fortran code for speed. I know I am looking for such a language; Numpy and Python simply don't offer enough power and flexibility. I hope I can choose pypy.

I wonder if an RPython/cython backend might be possible. cython is already my favorite way to write CExtensions and it generates code for both python 2.x and 3.x. It would be great if it could be adapted for PyPy extensions.

Thanks a lot for the previous post and the follow up! I really appreciate that you could find time to make a write up on the progress that you made so far on this extremely important feature.

This all sounds very cool, but also to me it seems that it's very important to work with NumPy / SciPy developers, so that the parts that have to be replaced would be isolated and maintained in parallel for RPython and C API, or rewritten in ctypes (not sure if this is even possible). This way this eternal catch-up trap that many seem to be afraid of will not happen.

Also, I wonder how much money this would actually translate into. Maybe Enthought could sponsor some development...

Regarding Cython... I also use it to write trivial extensions to implement computation kernels outside Python in C. It would be great if Cython were able to generate something that would work with PyPy as well...

I don't know why you decided to use ctypes. In the numpy community it has been considered obsolete for a long time (maybe several years) and is not under active development, and Cython is now the recommended default tool for this: