The coverage analysis tool for Python, coverage.py, has recently
gained a plugin API that allows external tools to provide source code
information. Cython has had line tracing support since release 0.19,
and the new API now allows it to plug into coverage.py for line
coverage reporting.

To test this, you need the latest developer versions of both
coverage.py (pre-4.0) and Cython (post-0.22.0).
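For example, you can install them straight from the projects' source
repositories (the repository URLs here are assumptions, adjust them to
wherever the projects are currently hosted):

pip install git+https://github.com/nedbat/coveragepy
pip install git+https://github.com/cython/cython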

Then, enable the Cython plugin in the .coveragerc config file of
your project:

[run]
plugins = Cython.Coverage

And compile your Cython modules with line tracing support. This can
be done by putting the following two comment lines at the top of the
modules that you want to trace:

# cython: linetrace=True
# distutils: define_macros=CYTHON_TRACE=1

That's a double opt-in. The first line instructs Cython to generate
verbose line tracing code (and thus increases the size of the resulting
C/C++ file), and the second line enables this code at C compile time,
which will most likely slow down your program. You can also configure
both settings globally in your setup.py script; see the Cython
documentation.
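For reference, a minimal setup.py sketch for the global variant might
look like this (the module name is illustrative):

from setuptools import setup, Extension
from Cython.Build import cythonize

extensions = [
    # define_macros enables the tracing code at C compile time
    Extension("mymodule", ["mymodule.pyx"],
              define_macros=[("CYTHON_TRACE", "1")]),
]

setup(
    ext_modules=cythonize(
        extensions,
        # the directive makes Cython generate the line tracing code
        compiler_directives={"linetrace": True},
    ),
)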

Then make sure you build your project in place (python setup.py
build_ext --inplace) so that the generated C/C++ code files can be
found right next to the binary modules and their sources.

That's it. Now your Cython modules should show up in your coverage
reports. Any questions, bug reports or suggestions regarding this new
feature can be discussed on the Cython-users mailing list.

Update: Coverage reporting is now also available for code that
frees the GIL if the C macro CYTHON_TRACE_NOGIL=1 is set, i.e.:
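# cython: linetrace=True
# distutils: define_macros=CYTHON_TRACE_NOGIL=1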

C arrays have become first-class citizens in Cython's type system.
Previously, they behaved mostly just like pointers, as they do in C.
The new release allows them to coerce to Python lists (by default) and
tuples (when requested by the context). They can also be assigned by
value, without the need for an explicit copy loop.
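A minimal sketch of what this looks like in practice (the function
name is made up):

def first_three():
    cdef int[3] values = [1, 2, 3]   # a plain C array, initialised from a list
    cdef int[3] copy
    copy = values    # assignment copies the array by value
    return copy      # coerces to a Python list by default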

C/C++ functions have learned to auto-coerce to callable Python
objects when used in an object context. Similarly, in order to
provide a flat wrapper to an external C function, it is now sufficient
to declare it as an external cpdef function in the module that
should export it (or in the corresponding .pxd file).
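For example, following the pattern from the Cython documentation, a
flat Python wrapper for the C library's sin() function now only takes
this declaration:

cdef extern from "math.h":
    cpdef double sin(double x)

This declares sin() for use at the C level and at the same time
generates a Python function of the same name in the module, so that
Python code can call it directly.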

My bicycle was recently stolen, and since I now have to get a new one,
here's a proposal.

From today until December 24th, I will divert all donations that I
receive for my work on lxml towards restoring my local mobility.

If you do not like this 'misuse', do not donate in this time frame. I
do hope, however, that some of you like the idea that the money you
give for something you value is used for something that is of value
to the receiver.

I spent some time optimising Python's "fractions" module. Fractions
(i.e. rational numbers) are great for all sorts of exact computations,
especially money calculations. You never have to worry about loss of
precision, and you can freely mix very large and very small numbers
any way you like in your computations - the result is always exact, as
it's all done in integers internally.
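A small example of the difference (binary floats cannot represent
decimal fractions exactly, Fractions can):

from fractions import Fraction

print(0.1 + 0.2 == 0.3)                                      # False, due to float rounding
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True, exact integer arithmetic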

But the performance used to suck. Totally. The main problem was the
type instantiation, which is really expensive. For example, simply
changing this code

f = n * Fraction(x, y)

to this

f = Fraction(n * x, y)

(which avoids the intermediate Fraction operations) could speed it up
severalfold. I provided some patches that streamline the common cases
(numerator and denominator will usually be Python ints), and this made
the implementation in CPython 3.5 twice as fast as before. It
actually starts being usable. :)

For those who can't wait for Python 3.5 to come out (in about a year's
time), and also for those who want even better performance (like me),
I dropped the implementation into Cython and optimised it further at
the C level. That gave me another factor of 5, so the result is
currently about 10x faster than what's in the standard library.

Compared to the Decimal type in Python 2.7, it's about 15x faster.
The hugely improved C reimplementation of the "decimal" module in
Python 3.3 is still about 5-6x faster - or less, if you often need to
rescale your values along the way. Plus, with decimal, you always
have to take care of using the right precision scale for your code to
prevent rounding errors, and playing it safe will slow it down.

I have released the module on PyPI under the name quicktions. I hope you like it.
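Since it is meant as a drop-in replacement for the standard library's
Fraction type, switching should only take a changed import:

from quicktions import Fraction   # instead of: from fractions import Fraction

print(Fraction(3, 10) - Fraction(1, 10))   # 1/5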

I'm giving an in-depth learn-from-a-core-dev Cython training at the
Python Academy in Leipzig (Germany) next month, October 16-17. In two
days, the course will cover everything from a Cython intro up to the
point where you bring Python code to C speed and use C/C++ libraries
and data types in your code.

What is Cython?

Cython is an optimising static compiler for Python that makes writing
C extensions as easy as Python itself. It greatly extends the limits
of the Python language and thus has found a large user base in the
Scientific Computing community. It also powers various well-known
extension modules in the Python package index. Cython is a great way
to extend the CPython runtime with native code when performance
matters.

A common pattern is to define a regular expression as a module-level
constant, e.g. _MATCH_DIGITS_RE = re.compile('[0-9]+'), and to then
use this regular expression in only one place, e.g. like this:

def func():
    numbers = _MATCH_DIGITS_RE.findall(input)
    ...

Python's re module actually uses expression caching internally, so
it's very unlikely that this is any faster in practice than just
writing this:

def func():
    numbers = re.findall('[0-9]+', input)
    ...

This is shorter and a much more straightforward way of expressing
what's going on. Now, for longer and more complex regular
expressions, the inline version can actually get out of hand, and it
does help to give them a readable name. However, all-upper-case
constant names tend to be pretty far from readable. So I always
wonder why people don't just write this, using a bound method:
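_find_numbers = re.compile('[0-9]+').findall

def func():
    numbers = _find_numbers(input)
    ...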

I find this much clearer to read. And it nicely abstracts the code
that uses the function-like callable _find_numbers() from the
underlying implementation, which (in case you really want to know)
happens to be a method of a compiled regular expression object.

I spent some time during the last two weeks reducing the call overhead
for Python functions and methods in Cython.
It was already quite low compared to CPython - about 30-40% faster -
but profiling then made me stumble over the fact that method calls in
CPython really just do one thing: they repack the argument tuple and
prepend the 'self' object to it. However, that happens right after
Cython has carefully packed up exactly that argument tuple in the
first place, so by simply inlining what PyMethodObject does, we can
avoid packing tuples twice.
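To illustrate what this repacking means, here is roughly what a method
call does in CPython, written out in plain Python (the class and names
are made up):

class Greeter:
    def greet(self, name):
        return "Hello, " + name

obj = Greeter()
args = ("world",)

# obj.greet(*args) is essentially executed as:
bound = obj.greet                                # attribute lookup creates a bound method object
result = bound.__func__(bound.__self__, *args)   # 'self' is prepended to a new argument tuple
assert result == "Hello, world"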

Avoiding the creation of a PyMethodObject altogether may also appear
to be an interesting goal, but it is not at all easy (it happens
during attribute lookup), and it is most likely not worth it either,
as method objects are created from a freelist, which makes their
instantiation very fast. Method objects also hold actual state that
the caller must receive: the underlying function and the self object.
So getting rid of them would severely complicate things without a
major gain to expect.

There is another obvious optimisation, however: Python code quite
often calls into C-implemented functions, and if those are implemented
as specialised functions that take exactly one or no argument
(METH_O/METH_NOARGS), then the tuple packing and unpacking can be
avoided completely. Together with the method call optimisation, this
means that Cython can now call very simple methods without creating an
argument tuple, and less simple ones without redundantly creating a
second argument tuple.

I implemented these optimisations, and they immediately pushed the
method call micro benchmarks in Python's benchmark suite from about
1/3 faster to 2-3 times faster than CPython 3.5 (pre-release). Those
are only simple micro benchmarks, so any real-world code will benefit
substantially less overall. However, it turned out that a couple of
benchmarks in the suite that are based on real production code ended
up losing 5-15% of their total runtime. That's quite remarkable,
given that the code they call actually does something (much) more
heavyweight than the call overhead itself. I'm still tuning it a bit,
but so far I am really happy with this result.

Hearing a talk about static analysis at EuroPython 2014 and meeting Christian Heimes there
(CPython core dev and member of the security response team) got us
talking about running Coverity Scan on
Cython-generated code. They provide a free
service for Open Source projects, most likely because there is a clear
benefit in terms of marketing visibility and distributed filtering
work on a large amount of code.

The problem with a source code generator is that you can only run the
analyser on the generated code, so you need a real-world project that
uses the generator. The obvious choice for us was lxml, as it has a
rather large code base: more than 230,000 lines of C code, generated
from some 20,000 lines of Cython code. The first run against the
latest lxml release got us about 1200 findings, but a quick glance
through them showed that the bulk of them were false positives, owed
to the way Cython generates code for some Python constructs. There
was also a large set of "dead code" findings that I had already worked
on in Cython a couple of months ago; it now generates substantially
less dead code. So I gave it another run against the current
developer versions of both lxml and Cython.

The net result is that the number of findings went down to 428. A
large subset of those relates to constant macros in conditions, which
is what I use in lxml to avoid the need for C-level #ifdefs. The C
compiler is happy to discard this code, so Coverity's dead code
finding is correct but not relevant. Other large sets of "dead code"
findings are due to Cython generating generic error handling code in
cases where an underlying C macro actually cannot fail, e.g. when
converting a C boolean value to Python's constant True/False objects.
So that's ok, too.
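The lxml pattern behind the constant-macro findings looks roughly like
this (header and macro names are hypothetical):

cdef extern from "project_defs.h":
    bint HAVE_FAST_PATH   # a constant C macro, set at build time

def process(items):
    # constant condition: the C compiler simply discards the dead branch,
    # no #ifdef needed in the Cython source
    if HAVE_FAST_PATH:
        return len(items)             # placeholder for the optimised code path
    else:
        return sum(1 for _ in items)  # placeholder for the generic code path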

It's a bit funny that the tool complains about a single "{" being
dead code, although it's followed immediately by a (used) label.
That's not really an amount of code that I'd consider relevant for
reporting.

On the upside, the tool found another couple of cases in the
try-except implementation where Cython was generating dead code, and I
was able to eliminate them. The nice thing here is that eliminating a
goto statement may leave its target label unused, which in turn allows
suppressing further code under that label that would otherwise be
generated later. Well, and generating less code is generally a good
thing anyway.

Overall, the results make a really convincing case for Cython.
Nothing of importance was found, and the few minor issues where Cython
still generated more code than necessary could easily be eliminated,
so that all projects that use the next version will simply benefit.
Compare that to manually written C extension code, where reference
counting is a large source of errors, and where the verbose C-API of
CPython makes the code substantially harder to get right and to
maintain than the straightforward Python syntax and semantics of
Cython. When run against the CPython code base for the first time,
Coverity Scan found several actual bugs and even security issues.
This also nicely matches the findings of David Malcolm and his
GCC-based analysis tool; he ended up using Cython-generated code for
eliminating false positives, rather than finding actual bugs in it.