Pyston 0.5 released

Today we are extremely excited to announce the v0.5 release of Pyston, our high performance Python JIT. We’ve been a bit quiet for the past few months, and that’s because we’ve been working on some behind-the-scenes technology that we are finally ready to unveil. It might be a bit less shiny than some other things we could have worked on, but this change makes Pyston much more ready to use.

Pyston is now using reference counting.

Refcounting

Reference counting (“refcounting”) is a form of automatic memory management. It’s usually viewed as slower and less sophisticated than using a tracing garbage collector (a “GC”), the predominant technique in modern languages. All past versions of Pyston contained a tracing garbage collector, and much of our work from 0.4 to 0.5 was tearing it out in favor of refcounting.

Why did we do this? In short, because CPython (the main Python implementation) uses refcounting. We used a GC initially to try to get more performance. But applying a tracing GC to a refcounting C API, such as the one that Python has, is risky and comes with many performance pitfalls. And most challengingly, Pyston wants to support the large amount of code that has been written that relies on the special properties that refcounting provides (predictable immediate destruction). We found that we had to go to greater and greater lengths to support these programs, and there were also cases where we wouldn’t be able to support the applications in their current form.
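As a rough illustration of the “predictable immediate destruction” that existing code relies on, here is a minimal sketch. The class and function names (Resource, use_resource) are made up for this example, and the ordering shown is what a refcounting implementation such as CPython gives; a tracing GC would not guarantee it.

```python
class Resource:
    """Hypothetical resource that records when it is opened and closed."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        log.append("open " + name)

    def __del__(self):
        # With refcounting, this runs as soon as the last reference goes away.
        self.log.append("close " + self.name)


def use_resource(log):
    r = Resource("tmp", log)
    log.append("use " + r.name)
    # 'r' is the only reference; it is dropped when the function returns.


events = []
use_resource(events)
events.append("after call")

# Under refcounting the close happens before "after call".
print(events)
```

Programs that depend on the close happening before the next statement (for example, to release a file handle or a lock) are the ones that were hard to support with a tracing GC.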

So we decided to bite the bullet and convert to refcounting, with the goal of getting better application compatibility.

How did we do?

NumPy

We are very happy to announce: we can run NumPy, unmodified.

Specifically: on their latest release (v1.11), we run their entire test suite with one test failure, for which they’ve accepted our patch. For their latest trunk, we have three test failures. We do need to use a modified version of part of their build chain (Cython), and we are currently slower on the test suite than CPython.

Regardless, we are very happy with this result, especially because we will continue to improve both the compatibility and performance.

Other goodies

There are quite a few non-refcounting features that made it into this release as well:

Signal handling

Frame introspection of exited frames

Generator cleanup

Support for more C API functions, such as custom tracebacks

and many more small fixes than we can list here

These are a large part of our progress on NumPy, and they also help us run other tricky libraries such as py.test, lxml, and cffi. We’ve also greatly reduced the number of modifications that we maintain to the Python standard libraries and C extensions. Overall, refcounting was a big investment, but it’s bought us compatibility wins that we would have had a very hard time getting otherwise.

Performance

Unfortunately, since performance wasn’t our goal for this release, we did slide backwards a bit. v0.5 is about 10% slower than v0.4 was, largely due to the change to refcounting. We are okay with the regression since we explicitly focused on compatibility for the last six months, and our refcounting implementation still has many available optimizations.

As a side note, the “conventional wisdom” is that refcounting should have been even slower compared to using a GC. We attribute the smaller-than-expected gap mainly to the compatibility restrictions that hampered our GC implementation.

There is a lot of low-hanging performance fruit available to us right now which we have been explicitly avoiding while we finished refcounting. Now would be a great time to consider contributing since we have more ideas than we can implement ourselves. This is especially true when it comes to NumPy performance.

Currently, we take about twice as long to run the NumPy test suite as CPython does. We don’t know how this will translate to performance on real NumPy programs, but we do know that much of the slowdown falls into two categories. First, NumPy hits code paths that are otherwise rare in Pyston and are currently unoptimized. The second is a bit more subtle: NumPy frequently calls from C code back into the Python runtime, which is expensive for us because those calls don’t benefit from our JIT (in addition to being previously rare). We have techniques inside Pyston to handle these situations and invoke our JIT from C code, and we’d like to start exposing them so that NumPy and other libraries can use them.
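To give a feel for the “C code calling back into Python” pattern, here is a small sketch that does not use NumPy at all. It uses the builtin sorted, which is implemented in C in CPython, as a stand-in: the C loop calls a Python-level function once per element. The names key_func and calls are just for illustration.

```python
calls = []


def key_func(item):
    # Python-level callback; the C implementation of sorted() invokes it
    # once for every element of the input.
    calls.append(item)
    return -item


ordered = sorted([3, 1, 2], key=key_func)

print(ordered)     # sorted by the negated key, so largest first
print(len(calls))  # one callback per element
```

Each of those callbacks crosses from C into the Python runtime, which is the kind of transition described above, only at a much smaller scale than in NumPy.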

Looking forward

We apologize — again — for the lengthy release cycle. We didn’t expect refcounting to take this long, and we even knew that it would take longer than we expected. We’re planning on doing another blog post to talk about what the difficulties were with it and go into more of the technical details of our refcounting system.

Moving forward, our plan for 0.6 is to focus on performance. We would love help from the community on identifying what is important to make performant. We could work on making the NumPy test suite fast, but it may not end up translating to real NumPy workloads.

We’re at the point that trying out Pyston should be easy; it won’t benefit all workloads, but it should be easy to drop it in and see if it does. To test it out, try

docker run -it pyston/pyston

or check out our readme for other options for obtaining Pyston. To try NumPy, use the “pyston/pyston-numpy” image instead.

We have quite a few optimization ideas lined up, and the pressure has been strong to delay the 0.5 release “just one more week” so that we have time to include some of them. Expect to see a 0.5.1 release that improves performance.

Final words

Refcounting brings Pyston one step closer to being a drop-in replacement for CPython. There is still much more work to do, but we feel like with refcounting we’ve reached a threshold where we’d like to start getting Pyston into people’s hands. It’s still very much beta software, so there are many rough edges and unoptimized cases. But we want your feedback on what’s working and what’s not.

Finally, we would like to thank all of our open source contributors who have contributed to this release, and especially Nexedi for their employment of Boxiang Sun, one of our core contributors who helped greatly with the NumPy support.

Great news. I really want the Pyjion guys’ JIT interface for CPython so I can plug PyPy’s or Pyston’s JIT into CPython. But nonetheless, Pyston improving so fast is great news for the whole community.

It’s definitely a cool idea, and I’ve looked into it in the past. I just don’t think it’s that practical for Python. You have to take so much out of the language that it becomes something like “Go with Python syntax”, and then I think you’re better off just sticking with Go.

Unfortunately, “PEP 484 (MyPy) type hints” != “static types for the compiler”. That’s not to say that they’re not incredibly useful in other areas, but for performance 1) they only address part of what makes Python hard to run fast, and 2) unless you are willing to accept segfaults/memory corruption, we still need to check that the types are correct.
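To make the second point concrete, here is a minimal sketch (the function name add is only for illustration) showing that annotations are recorded but not enforced at runtime, so a compiler cannot rely on them without checking:

```python
def add(a: int, b: int) -> int:
    # The annotations say "int", but nothing stops other types at runtime.
    return a + b


result = add("py", "ston")  # accepted: strings also support '+'
hints = add.__annotations__  # the hints are just metadata on the function

print(result)
print(hints)
```

A JIT that compiled add as pure integer addition based on the hints alone would misbehave on this call, which is why the types still have to be checked.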

I think compiling Python is similar to trying to compile HTML — it’s not clear exactly what that means. Yes you can probably do it, but what does it get you if you have to just compile in all the expensive behavior? And besides, you can already compile your Python code using Cython, but by default it’s not that much faster since it still has to do all the expensive dynamic stuff.

Here’s one example of “expensive dynamic stuff”: converting to/from unicode. This is expensive because the runtime has to check the encoding registry to see how to encode/decode. And codecs are allowed to have arbitrary behavior, including not even returning something of the right type! So knowing that you have a string object and are calling str.decode is only the very first step. You could remove the codec registry but then we’re back into “not really Python since you can’t run Python programs” territory.
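As a sketch of how arbitrary a codec can be: the example below registers a made-up codec named “demo_len” whose decoder returns an integer instead of text. It is written for Python 3 and uses codecs.decode rather than the Python 2 style str.decode mentioned above, since that is the Python 3 entry point that allows arbitrary result types. All the helper names are hypothetical.

```python
import codecs


def _demo_decode(data, errors="strict"):
    # A decoder must return (result, length_consumed); the result here is
    # deliberately an int, not a string.
    return (len(data), len(data))


def _demo_encode(text, errors="strict"):
    return (text.encode("ascii"), len(text))


def _search(name):
    # Search functions are called with the normalized codec name.
    if name == "demo_len":
        return codecs.CodecInfo(
            encode=_demo_encode, decode=_demo_decode, name="demo_len"
        )
    return None


codecs.register(_search)

decoded = codecs.decode(b"hello", "demo_len")
print(type(decoded).__name__, decoded)
```

Nothing about the call site says that “decode” will hand back an integer, so the runtime has to go through the registry lookup and check the result every time.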

There are definitely more aggressive things one can do if you only want to support basic operations on primitive types — and there are many projects that target that usecase. I think there’s a reason that Shedskin/Nuitka/etc haven’t taken over, and my guess is that it’s due to a lack of demand for that sort of thing.

You are my only hope for a feature-complete Python JIT. Having a performant runtime will open up the language for many more use cases and hopefully make it possible to write more code in pure Python instead of using it as a crutch and glue for C code. I hope it will even be possible to get threads to work in a performant way. At least I think Python 3.x should be a primary goal for Pyston, as 2.7 still has many design flaws that were overcome with 3.x.