A couple of weeks ago, I found the C++ wrapping benchmarks on the PyBindGen homepage. The author posted a short blog intro about them. I wondered why Cython wasn't used as a comparison at the time, until I found out that the wrapper was rather tricky to write in Cython back then due to the lack of good C++ language support (especially for overloaded functions/methods).

Cython has improved its C++ support considerably since then, due to the work of Danilo Freitas and Robert Bradshaw, which was recently merged into mainline and is now scheduled for Cython 0.13. This allowed me to provide a simple and short implementation of the wrapper module used in the above benchmark. The timings are rather unsurprising: Cython beats them all. This is mainly due to the fact that Cython uses highly optimised argument handling code, which greatly reduces the call overhead of a wrapper.

I also like how readable the Cython wrapper code is, especially compared to the rather unwieldy PyBindGen implementation. Obviously, this comparison is a bit unfair because Cython is a programming language with an optimising compiler, whereas the other tools are simply glue code generators. But the benchmark results certainly speak volumes.

"What?", I hear you think, "Cython? Isn't that just a tool for extending CPython?". Well, yes, it is a tool for extending CPython. However, when you think about it the other way round, it actually is a Python implementation that only falls back to CPython for stuff that it doesn't want to do itself, or that it doesn't support yet. Everything else runs in plain C code and only uses parts of CPython that are not worth reimplementing, namely the object model and implementation, the fast container types, and the standard library. It will switch to CPython's eval loop only for Python modules that are not compiled.

Cython even has an on-the-fly compilation mode (pyximport) that can be used to compile Python modules (e.g. standard library modules or external dependencies) into fast C modules transparently on import. This is basically a JIT compiler that automatically falls back to CPython's byte code interpretation if the compilation fails for some reason.
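A minimal sketch of switching pyximport on (the install() call and its pyimport flag are real pyximport API; the try/except is only there so the snippet degrades gracefully where Cython is not installed):

```python
try:
    import pyximport
    # pyimport=True also compiles plain .py modules on import,
    # falling back to the bytecode interpreter if compilation fails
    pyximport.install(pyimport=True)
    status = "pyximport active"
except ImportError:
    status = "Cython not installed"
print(status)
```

After install(), any `import somemodule` that finds a somemodule.pyx next to it will compile and load it as a C extension module transparently.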

The dependency on CPython (any version from 2.3 through 3.1) has many advantages for Python users. One is that you get 100% Python compatibility by definition, as CPython is always a part of Cython. This includes the complete standard library, all existing Python software, and all existing C extensions, some of which you can even interact with directly at the fast C level (e.g. NumPy, lxml.etree and others). Apart from CPython itself, no other Python implementation currently achieves this.

On top of that, it's trivial to optimise pure Python code into type-annotated Cython code (even in pure Python syntax) and speed up certain code sections by factors of several hundred (a factor of 1000 or more is not unheard of). Running cython -a generates a highlighted HTML representation of your code that shows where type annotations may lead to a speed-up. There is no need to change all your code to get that speed-up; just concentrate on exactly those sections that need raw speed - usually inner loops and tight algorithms. Or just call into a C, C++ or Fortran library that already does the job fast enough, even if you are not an expert in that language.
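As an illustration of the pure-Python-syntax route, here is a minimal sketch that types the loop variables with Cython's cython.locals decorator (the fallback shim in the except branch is hypothetical, only there so the snippet also runs on plain CPython without Cython installed):

```python
try:
    import cython
except ImportError:
    # hypothetical stand-in so the example also runs without Cython
    class cython:
        int = int
        @staticmethod
        def locals(**_types):
            return lambda func: func

@cython.locals(i=cython.int, x=cython.int)  # C ints when compiled by Cython
def run(max):
    x = 1
    for i in range(max):
        x = x + i
    return x

print(run(10))  # prints 46
```

Under CPython the decorator changes nothing; only when the module is compiled by Cython do i and x become native C integers.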

And another really cool feature: using Cython lets your code benefit from enhancements and optimisations in both CPython and Cython. Whenever either of the two projects finds a way to make the built-in types or the generated C code faster, it's your code that becomes faster. Whenever someone writes a new module or extension for CPython, you can just import it without fearing compatibility issues. Whenever the Python language or the Cython language adds a new syntax feature, you can start using it right away, without waiting for other implementations to catch up. And we do have tons of ideas about stunning features and optimisations that we want to add to the Cython compiler.

So, you can either sit and wait for your code to get optimised for you, or you can get your own hands dirty now and join a very dynamic, open and friendly project that constantly makes Cython faster, better and simpler to use.

I just noticed that lxml reached rank 5 on Google when you look for "elementtree", just after two links for ElementTree itself and another two for the Python standard library, so it's more of a rank 3!

There was a request on the Cython mailing list asking how to optimise Cython code. Here's how I do it.

We have some Cython code that we want to benchmark:

x = 1
for i from 0 <= i < max:
    x = x + i

Ok, obviously this is silly code, as the result can be computed much more easily without a loop. But let's say, for the sake of argument, that this is the best algorithm we can come up with, and that we suspect it might not run as fast as it should.

First thing to do is to turn that suspicion into evidence by benchmarking. So I copy the code over to a Cython module and wrap it in a Python function:

# file: bench.pyx
def run(max):
    x = 1
    for i from 0 <= i < max:
        x = x + i
    return x
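To actually get numbers, a quick timing harness along these lines works (a sketch using the timeit module; the pure-Python run() below is a stand-in for the compiled bench.run, since the point is only to compare timings before and after optimisation):

```python
import timeit

def run(max):
    # pure-Python stand-in for the compiled bench.run()
    x = 1
    for i in range(max):
        x = x + i
    return x

print(run(10))  # prints 46
# best of 3 repetitions of 100 calls each
best = min(timeit.repeat(lambda: run(10000), number=100, repeat=3))
print("100 calls took %.4fs" % best)
```

Taking the minimum over several repetitions filters out timing noise from other processes.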

Since I have no idea what to do better, I first look through the generated C code. That's not as hard as it sounds, as Cython copies the original Cython code into comments and marks the line that it generates code for. The loop code gets translated into this:

/* ".../TEST/bench.pyx":3
 * def run(max):
 *     x = 1
 *     for i from 0 <= i < max:             # <<<<<<<<<<<<<<
 *         x = x + i
 */
  __pyx_1 = __Pyx_PyInt_AsLong(__pyx_v_max); /* ... */
  for (__pyx_v_i = 0; __pyx_v_i < __pyx_1; __pyx_v_i++) {
    __pyx_2 = PyInt_FromLong(__pyx_v_i); /* ... */
    __pyx_3 = PyNumber_Add(__pyx_v_x, __pyx_2); /* ... */
    Py_DECREF(__pyx_2); __pyx_2 = 0;
    Py_DECREF(__pyx_v_x);
    __pyx_v_x = __pyx_3;
    __pyx_3 = 0;
  }

The code I stripped (/* ... */) is error handling code. It's emitted in one long line so that it's easy to ignore - which is the best thing to do with it.

What you can see here is that Cython is smart enough to turn the loop into a C loop with a C run variable (of type long), but then the innocuous-looking '+' operator uses Python API calls. So this is not what I had in mind when I wrote the code - I wanted it to be as fast and straightforward as it looks in Cython. However, Cython cannot know my intention here, as my code might well depend on the semantics of Python's '+' operator (which differs from the '+' operator in C).

Cython's way of dealing with Python/C type ambiguity is explicit static type declarations through cdefs. By default, every variable is declared as if I had written cdef object variable, but in this case, I want them to be plain C integers. So here is the straightforward way to tell Cython that I want the variables x and i to have C semantics rather than Python semantics:

# file: bench.pyx
def run(max):
    cdef int i, x
    x = 1
    for i from 0 <= i < max:
        x = x + i
    return x

And the resulting C code shows me that Cython understood what I wanted: