Tuesday, August 17, 2010

Call for Benchmarks

As you know, a lot of PyPy's recent development effort has gone into speeding up
execution of Python programs. However, an additional good property of PyPy's
Python interpreter is that most objects are represented in a much more compact
way than in CPython. We would like to investigate some more advanced techniques
to reduce the memory usage of Python programs further.

To do this it is necessary to investigate the memory behaviour of real programs
with large heaps. For speed measurements there are standard benchmarks, but for
memory improvements there is nothing comparable; the memory behaviour of large
programs is not that well understood. Therefore we are looking for programs that we
can study and use as benchmarks.

Specifically we are looking for Python programs with the following properties:

large heaps of about 10MB-1GB

should have non-trivial runtime as well (in the range of a few seconds), to
judge the speed impact of optimizations

ideally pure-Python programs that don't use extension modules so that they run
under both CPython and PyPy (this is optional, but makes my life much easier).
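To check whether a candidate program falls into the ranges above, something like the following could be used. It is only a minimal sketch: `measure` and `example_workload` are hypothetical names, the workload is a stand-in for a real program, and the `resource` module is only available on Unix-like systems.

```python
import resource
import sys
import time


def measure(workload):
    """Run workload() once; return (seconds, peak_rss) for this process."""
    start = time.time()
    workload()
    elapsed = time.time() - start
    # ru_maxrss is the peak resident set size of the process so far.
    # Its unit is platform-dependent: kilobytes on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return elapsed, peak


def example_workload():
    # Hypothetical stand-in: build a heap of many small string objects.
    words = ["word%d" % i for i in range(200000)]
    return len(" ".join(words))


if __name__ == "__main__":
    seconds, peak = measure(example_workload)
    sys.stdout.write("runtime: %.2fs, peak RSS: %d (platform units)\n"
                     % (seconds, peak))
```

Because the peak is measured for the whole process, it includes the interpreter's own footprint, so the same script run under CPython and PyPy gives numbers that are comparable only as totals.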

We are also rather interested in programs that do a lot of string/unicode
processing.

We would be grateful for all ideas. Telling us about a program also has the
advantage that we will work on optimizing PyPy for it :-).

You could run a program that determinizes a large NFA. Given an existing Python program that can determinize an NFA, you could give it an expanded version of the NFA on page 15 here: http://www.springerlink.com/content/cq16j1uv511g793g/fulltext.pdf. Another way is to take some complex NFAs, concatenate them, and determinize.
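For illustration, determinization by subset construction can be sketched as below. This is not the NFA from the linked paper: `nth_from_end_nfa` is a hypothetical generator for a textbook NFA (strings over {a, b} whose n-th symbol from the end is 'a'), chosen because its DFA needs 2^n states, so heap size can be scaled with n. The sketch assumes an NFA without epsilon moves.

```python
def determinize(nfa, start, accepting):
    """Subset construction for an NFA without epsilon moves.

    nfa maps (state, symbol) -> set of successor states.
    Returns (dfa_transitions, dfa_start, dfa_accepting), where each DFA
    state is a frozenset of NFA states.
    """
    alphabet = sorted({symbol for (_, symbol) in nfa})
    dfa_start = frozenset([start])
    dfa = {}
    seen = {dfa_start}
    todo = [dfa_start]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            target = frozenset(
                t for s in current for t in nfa.get((s, symbol), ())
            )
            if not target:
                continue  # no successor: leave the dead state implicit
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa, dfa_start, dfa_accepting


def nth_from_end_nfa(n):
    """Hypothetical example NFA: n-th symbol from the end is 'a'."""
    nfa = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        nfa[(i, "a")] = {i + 1}
        nfa[(i, "b")] = {i + 1}
    return nfa, 0, {n}
```

Calling `determinize(*nth_from_end_nfa(n))` with a growing `n` doubles the number of DFA states at each step, which makes it easy to pick a heap size for a benchmark run.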

The CDPedia creates and manipulates its index with a pure-Python inverted index implementation.

It could be extracted and made into a benchmark - there are other pure-Python inverted indices around, those could also work.

They do tend to use lots and lots of memory. The CDPedia's implementation uses the builtin array module for byte sequence manipulation and bare strings as data store (it's highly optimized for lowering CPython's memory usage), but there are still a few dict-heavy places.
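To give an idea of the kind of program meant here, a toy inverted index could look like the sketch below. It is not the CDPedia's code: `build_index` and `search` are hypothetical names, and the only thing borrowed from the description above is the use of the builtin array module to store postings compactly instead of lists of integer objects.

```python
from array import array
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to an array of ids of documents containing it."""
    index = defaultdict(lambda: array("I"))
    for doc_id, text in enumerate(documents):
        # set() so that a word repeated in one document is recorded once.
        for word in set(text.lower().split()):
            index[word].append(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every word in query."""
    words = query.lower().split()
    if not words:
        return []
    # index.get() avoids creating empty entries for unknown words.
    postings = [set(index.get(word, ())) for word in words]
    return sorted(set.intersection(*postings))
```

The dict that maps words to postings is still an ordinary Python dict, which is the sort of dict-heavy place where a more compact object representation would matter.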

From what I understand, PyExcelerator, a writer/reader for Excel files, takes huge amounts of memory for very large files. It uses pure Python objects for each cell, which kills memory use when you're writing many millions of cells.

I really didn't think about it much; I'm just trying to chew through my RSS backlog, and I ran into a post about pkgcore dealing with memory issues just a few minutes after I read this call for benchmarks.

I'm the author of a scientific application that might suit your needs. It runs with both Python 2.x and PyPy, so I bundled a distribution with some example benchmarks in case this interests you: http://dl.dropbox.com/u/7931953/pypy-bench.tar.bz2 (see bench.README)

An interesting observation in my opinion is that on small runs, CPython outperforms PyPy but this progressively reverses on longer runs.

Hello, I am the author of a chess program written entirely in Python. I haven't published it yet, because I am a bit ashamed of its poor quality. However, it should suffice for the sole purpose of benchmarking. Please drop me a note if you are interested. My email address is: larudwer at freenet dot de

Some notes: The program is console mode only (UCI), no GUI.

It eats up all the memory you have.

CPython is almost twice as fast as pypy-1.3 on this program, and Psyco accelerates it by another factor of two.

You could consider Tahoe-LAFS. A good reason to use it is that it is a practicality-oriented, widely deployed tool with significant memory usage that we routinely spend engineering effort to track and manage.

Here are some graphs of the memory usage of different versions of Tahoe-LAFS over time:

The reason not to use Tahoe-LAFS as a subject is that it uses several native-code libraries for the CPU-intensive inner loops (cryptography, erasure coding). I really want those libraries, and hence Tahoe-LAFS, to be usable with cpyext as soon as possible, but I haven't tried and I assume that cpyext isn't 100% there yet.

By the way, the easiest way to measure the performance of Tahoe-LAFS would be to run its unit tests and measure the memory usage and runtime. This is not only the easiest way, but it is also a pressing issue for us! Tahoe-LAFS unit tests take too long to run, and this causes problems for us, and we would very much like it if they could run to completion faster.

http://tahoe-lafs.org/trac/tahoe-lafs/ticket/20 (unit tests take too long)
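Measuring the memory usage and runtime of a whole test run from the outside could be sketched as follows. `measure_command` is a hypothetical helper, the command line in the example is a stand-in rather than the real Tahoe-LAFS test invocation, and the `resource` module limits this to Unix-like systems.

```python
import resource
import subprocess
import sys
import time


def measure_command(argv):
    """Run argv as a child process; return (exit_code, seconds, peak_rss)."""
    start = time.time()
    exit_code = subprocess.call(argv)
    elapsed = time.time() - start
    # RUSAGE_CHILDREN covers child processes that have been waited for.
    # ru_maxrss is the largest peak among them, in platform-dependent units
    # (kilobytes on Linux, bytes on macOS).
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return exit_code, elapsed, peak


if __name__ == "__main__":
    # Hypothetical stand-in for a real test-suite command line.
    command = [sys.executable, "-c",
               "data = ['x' * 100 for _ in range(100000)]"]
    print(measure_command(command))
```

Since the reported peak is the maximum over all children of the measuring process, each command is best measured from a fresh run of the script; swapping `sys.executable` for a PyPy binary gives the corresponding numbers for PyPy.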

rawdog (disclosure of bias: I wrote it) sounds like it might be of use. It's an RSS aggregator that generates static HTML. Pure Python 2, with lots of string processing, mostly in the feedparser module. Memory usage and runtime depend on how many feeds it's reading and how much history it keeps, since it does everything in memory at the moment, using pickle for persistent state. (With my 800-odd feeds and two-month history, writing the entire store to HTML will use a few hundred meg of memory and run for several minutes.)

A future redesign will use a more sensible database-backed approach...

Scapy would be a great one to benchmark. Depending on the size of the packet capture, it can consume quite a bit of CPU and memory when loading and dissecting large captures. I run it at work on CPython and would love to see it running/optimized under PyPy. The only problem is that I believe it uses some 2.6 pythonisms.

In my view, the natural competitors to PyPy (in the domain of fast interpreters for dynamic languages) are Tracemonkey and V8. Therefore, translations of the Sunspider, V8, and Dromaeo benchmarks would be appropriate.