Monday, March 1, 2010

Benchmarking twisted

Hello.

I recently did some benchmarking of Twisted on top of PyPy. For the very
impatient: PyPy is up to 285% faster than CPython. For more patient people,
there is a full explanation of what I did and how I performed the measurements,
so they can judge for themselves.

The benchmarks live in twisted-benchmarks and were mostly written
by Jean-Paul Calderone. Even though he called them "initial exploratory
investigation into a potential direction for future development resulting
in performance oriented metrics guiding the process of optimization and
avoidance of complexity regressions", they're still much, much better than
the average benchmarks found out there.

The methodology was to run each benchmark for
quite some time (about 1 minute), measuring the number of requests every 5s.
Then I looked at the dump of data and discarded the time it took
for JIT-capable interpreters to warm up (up to 15s), averaging
everything after that. Averages of requests per second are in the table below (higher is better):

benchname     CPython    Unladen Swallow       PyPy
names         10930      11940 (9% faster)     15429 (40% faster)
pb            1705       2280 (34% faster)     3029 (78% faster)
iterations    75569      94554 (25% faster)    291066 (285% faster)
accept        2176       2166 (same speed)     2290 (5% faster)
web           879        854 (3% slower)       1040 (18% faster)
tcp           105M       119M (7% faster)      60M (46% slower)
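
For readers who want to check the percentages, here is a minimal sketch of how they can be derived from the raw averages. The function name is made up for illustration; it simply expresses each interpreter's requests per second relative to CPython, using the "iterations" and "web" rows of the table as input.

```python
def relative_change(baseline, candidate):
    """Return the candidate's throughput change versus the baseline, in percent.

    Positive means faster than the baseline, negative means slower.
    (Hypothetical helper, not part of twisted-benchmarks.)
    """
    return (candidate - baseline) / baseline * 100.0


# Requests per second for the "iterations" benchmark, copied from the table.
cpython, unladen, pypy = 75569, 94554, 291066

print(f"Unladen Swallow: {relative_change(cpython, unladen):+.0f}%")  # +25%
print(f"PyPy:            {relative_change(cpython, pypy):+.0f}%")     # +285%

# The "web" row, where Unladen Swallow comes out slightly slower than CPython.
print(f"Unladen Swallow (web): {relative_change(879, 854):+.0f}%")    # -3%
```

Note that "285% faster" in this sense means PyPy handles roughly 3.85 times as many requests per second as CPython on that benchmark.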

To reproduce, run each benchmark with:

benchname.py -n 12 -d 5
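
The averaging step from the methodology can be sketched roughly as follows. This is only an illustration under assumptions: it supposes you have collected one request count per 5-second interval into a list, the function name is hypothetical, and the sample numbers are invented rather than taken from a real run.

```python
def steady_state_rate(samples, interval=5.0, warmup=15.0):
    """Average requests per second after discarding the warm-up period.

    samples  -- request counts, one per measurement interval
    interval -- length of each measurement interval, in seconds
    warmup   -- seconds of leading measurements to throw away
    """
    skip = int(round(warmup / interval))  # number of leading samples to drop
    kept = samples[skip:]
    if not kept:
        raise ValueError("warm-up period covers every sample")
    return sum(kept) / (len(kept) * interval)


# Invented data: 12 intervals of 5s each, with a slow start while the JIT warms up.
samples = [2000, 3500, 4800, 5000, 5100, 4900,
           5000, 5050, 4950, 5000, 5000, 5000]

print(steady_state_rate(samples))              # 1000.0, first 15s ignored
print(steady_state_rate(samples, warmup=0.0))  # lower, warm-up included
```

For an interpreter without a JIT the warm-up can be set to zero; the difference between the two results shows how much the warm-up phase would otherwise drag the average down.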

WARNING: running TCP-based benchmarks that open a new connection for each
request (web & accept) can exhaust some kernel structures.
Limit n, or wait before the next run, if you see drops in requests per second.

The first obvious thing is that the benchmarks differ in how amenable they are
to speedups by JIT compilation, with accept and tcp getting the smallest speedups,
if any at all. This is understandable, since a JIT is mostly about reducing interpretation
and frame overhead, which is probably not large when it comes to accepting
connections. However, if you actually loop around doing something, the JIT
can give you a lot of speedup.

The other obvious thing is that PyPy is the fastest Python interpreter
here, almost across the board (Jython and IronPython won't run Twisted),
except for raw TCP throughput. However, speedups can vary, and I expect
them to improve after the release, as there are places where PyPy can
be improved. Raw TCP throughput can be a problem for
some applications, and we're looking forward to improving this particular
bit.

The main reason to use Twisted for this comparison is the amount of support from
the Twisted team, and JP Calderone in particular, especially when it comes to
providing benchmarks. If some open source project wants to be looked at
by the PyPy team, please provide a reasonable set of benchmarks and infrastructure.

If, however, you're a closed source project fighting with performance problems
of Python, we offer contracting to investigate how
PyPy, and not only PyPy, can speed up your project.

Question: After having read many comments and posts from PyPy's developers lately, I got the impression (I might be wrong though) that you are betting everything on tracing for getting speedups (that the slow interpreter will eventually be compensated for by the magic of tracing). However, other projects that rely on tracing seem to favor a dual approach, which is a traditional method-at-a-time JIT (which can evenly speed up all kinds of code) plus tracing for getting the most out of highly numerical code (LuaJIT 2.0, Mozilla's JaegerMonkey, for example).

Is this accurate or am I wrong? Do you think that the current tracing strategy will eventually get speedups for those benchmarks that are currently on par with or way below CPython? Or will you have to add a more traditional approach for the baseline?

That's a very interesting question. I will try to answer a couple of your points, but feel free to move to the pypy-dev mailing list if you want to continue the discussion.

We indeed bet on tracing (or jitting in general) to compensate for interpretation that is slower than CPython's. However, our tracing is far more general than SpiderMonkey's: for example, we can trace a whole function from the start and do not require an actual loop. We hope to generalize tracing so it can eventually trace all constructs.

The main difference between ahead-of-time and tracing is that tracing requires an actual run, while ahead-of-time tries to predict what will happen. Results are generally in favor of tracing, although the variation will be larger (tracing does statistically correct branch prediction, not necessarily always the correct one).

Regarding benchmarks, most of the benchmarks where we're slower than CPython show that our tracing is slow (they don't contain warmup). For some of those we'll just include warmup (like twisted.web, which is a web server, so it makes sense in my opinion); for others we'll try to make tracing faster. And again, the speed of tracing is not a property of tracing, but rather PyPy's limitation right now.

Some other benchmarks are slow because we don't JIT regular expressions (spambayes). This should be fixed, but it's again unrelated to tracing.

To summarize: I don't expect us to try a dual approach (one JIT is enough fun, believe me), but instead to generalize tracing and make it more efficient. How this will go, we'll see; I hope pretty well.

Other than Maciek's points, which I subscribe to, it should be said that, since each language has different semantics, the efficiency of a traditional "method-at-a-time" JIT can vary dramatically. In particular, the dynamism of Python is so deep that a traditional JIT cannot win much: Jython and IronPython do exactly that, but for most use cases are slower than CPython. If you are interested, Chapter 2 of my PhD thesis explores these topics :-) http://codespeak.net/svn/user/antocuni/phd/thesis/thesis.pdf