Source

PyPy is faster than C, again: string formatting

String formatting is probably something you do just about every day in Python,
and never think about. It's so easy, just "%d %d" % (i, i) and you're
done. No thinking about how to size your result buffer, whether your output
has an appropriate NULL byte at the end, or any other details. A C
equivalent might be:

char x[44];
sprintf(x, "%d %d", i, i);

Note that we had to stop for a second and consider how big numbers might get
and overestimate the size (44 = length of the biggest number on 64bit (20) +
1 for the sign * 2 + 1 (for the space) + 1 (NUL byte)): it took the authors of
this post, fijal and alex, 3 tries to get the math right on this :-)

This is fine, except you can't even return x from this function because
it's allocated on the stack. A more fair comparison might be:

char *x = malloc(44 * sizeof(char));
sprintf(x, "%d %d", i, i);

x is slightly overallocated in some situations, but that's fine.

But we're not here to just discuss the implementation of string
formatting, we're here to discuss how blazing fast PyPy is at it, with
the new unroll-if-alt branch. Given the Python code:

Run under PyPy, at the head of the unroll-if-alt branch, and
compiled with GCC 4.5.2 at -O4 (other optimization levels were tested,
this produced the best performance). It took 0.85 seconds to
execute under PyPy, and 1.63 seconds with the compiled C binary. We
think this demonstrates the incredible potential of dynamic
compilation, GCC is unable to inline or unroll the sprintf call,
because it sits inside of libc.

Which as discussed above, is more comparable to the Python one, gives a
result of 1.96 seconds.

Summary of performance:

Platform

GCC (stack)

GCC (malloc)

CPython

PyPy (unroll-if-alt)

Time

1.63s

1.96s

10.2s

0.85s

relative to C

1x

0.83x

0.16x

1.9x

Overall PyPy is almost 2x faster. This is clearly win for dynamic
compilation over static - the sprintf function lives in libc and so
cannot be specializing over the constant string, which has to be parsed
every time it's executed. In the case of PyPy, we specialize
the assembler if we detect the left hand string of the modulo operator
to be constant.