Pyston 0.6 released

We are excited to announce the v0.6 release of Pyston, our high performance Python JIT.

In this release our main goal was to reduce the overall memory footprint. It also contains a lot of additional smaller changes for improving compatibility and fixing bugs.

Memory usage reductions

One of the big items which reduced memory usage was moving away from representing the interpreter instructions in a tree and instead storing them as an actual bytecode. Now, each instruction follows each other in memory and does not involve pointer-chasing.

We are also much more aggressive in freeing unused memory now. For example for very hot functions which we compiled using the LLVM JIT (our highest tier) we now free the code which the baseline JIT emitted earlier-on. Additional bigger improvements were accomplished by making our analysis passes more memory efficient and fixing a few leaks.

This is a chart compares the maximal resident set size of several 64bit linux python implementations (lower is better) on a machine with 32GB of RAM.

While max RSS is not a very accurate memory usage number for various reasons, like not taking into account that pages can be shared between processes and only measuring peak usage, we think it shows nevertheless a very useful insight about how much (up to 2x) Pyston 0.6 improved over 0.5.1.

While we are happy that we were able to reduce the memory usage quite significantly in a few weeks we are not yet satisfied with it and will spend more time on reducing the memory usage further and developing better tools to investigate it. We have several ideas for this – some of the bytecode related ones are summarized here.

Additional changes

This release contains a lot of fixes for compatibility issues and other bugs. We also spent time on making it easier to replace CPython with Pyston, such as by more closely matching its directory structure and following its ‘dict’ ordering. We can now, for example, run pip and virtualenv unmodified, without requiring any upstream patches like other implementations do.

Aside: NumPy performance

NumPy hasn’t been a priority for us, but from time to time we check on how well we can run it. We’ve focused on compatibility in the past, but for this post we took a look into performance as well. We don’t have any NumPy-specific optimizations, so we were happy to see this graph from PyPy’s numpy benchmark runner:

As you can see, we closely match CPython’s performance on NumPy microbenchmarks, and are able to beat it on a few of the smaller ones. Our current understanding is that we are doing better on the left two benchmarks because they run much more quickly — in about 1000x less time than the right three. These shorter benchmarks spend a significant amount of time transitioning into and out of NumPy, which Pyston can help with, whereas the right three benchmarks are completely dominated by time inside NumPy.

As a side note, we the Pyston team don’t want to be in the business of picking what NumPy workloads are important. If you have a program that you think shows real-world NumPy usage, please let us know because we would love to start benchmarking real programs rather than simple matrix operations.