
You can use NumPy from Cython exactly the same as in regular Python, but by
doing so you are losing potentially high speedups, because Cython has support
for fast access to NumPy arrays. Let’s see how this works with a simple
example.

The code below does 2D discrete convolution of an image with a filter (and I’m
sure you can do better; let it serve for demonstration purposes). It is both
valid Python and valid Cython code. I’ll refer to it as both
convolve_py.py for the Python version and convolve1.pyx for
the Cython version; Cython uses ".pyx" as its file suffix.

from __future__ import division
import numpy as np

def naive_convolve(f, g):
    # f is an image and is indexed by (v, w)
    # g is a filter kernel and is indexed by (s, t),
    #   it needs odd dimensions
    # h is the output image and is indexed by (x, y),
    #   it is not cropped
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # smid and tmid are number of pixels between the center pixel
    # and the edge, ie for a 5x5 filter they will be 2.
    #
    # The output size is calculated by adding smid, tmid to each
    # side of the dimensions of the input image.
    vmax = f.shape[0]
    wmax = f.shape[1]
    smax = g.shape[0]
    tmax = g.shape[1]
    smid = smax // 2
    tmid = tmax // 2
    xmax = vmax + 2*smid
    ymax = wmax + 2*tmid
    # Allocate result image.
    h = np.zeros([xmax, ymax], dtype=f.dtype)
    # Do convolution
    for x in range(xmax):
        for y in range(ymax):
            # Calculate pixel value for h at (x,y). Sum one component
            # for each pixel (s, t) of the filter g.
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

This should be compiled to produce convolve1.so (on Linux systems). We
run a Python session to test both the Python version (imported from the
.py-file) and the compiled Cython module.
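We can also sanity-check the results against a pure-NumPy reference. The sketch below (full_convolve2d is a hypothetical helper, not part of the tutorial code) computes the same "full" convolution by shifting scaled copies of f, which is easy to reason about even if it is not fast:

```python
import numpy as np

def full_convolve2d(f, g):
    # Reference "full" 2-D convolution: each kernel entry g[s, t]
    # adds a scaled copy of f shifted by (s, t) into the output.
    vmax, wmax = f.shape
    smax, tmax = g.shape
    h = np.zeros((vmax + smax - 1, wmax + tmax - 1), dtype=f.dtype)
    for s in range(smax):
        for t in range(tmax):
            h[s:s + vmax, t:t + wmax] += g[s, t] * f
    return h

f = np.arange(9).reshape(3, 3)
g = np.ones((3, 3), dtype=f.dtype)
h = full_convolve2d(f, g)
# The total mass of a full convolution is the product of the sums.
assert h.sum() == f.sum() * g.sum()
assert h.shape == (5, 5)
```

The output of naive_convolve should match this reference for any odd-sized kernel.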

There’s not such a huge difference yet, because the C code still does exactly
what the Python interpreter does (meaning, for instance, that a new object is
allocated for each number used). Look at the generated HTML annotation file to
see what is needed for even the simplest statements; you will get the point
quickly. We need to give Cython more information: we need to add types.

To add types we use custom Cython syntax, so we are now breaking Python source
compatibility. Consider this code (read the comments!):

from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h are typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
def naive_convolve(np.ndarray f, np.ndarray g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

After building this and continuing my (very informal) benchmarks, I get:

In [21]: import convolve2
In [22]: %timeit -n2 -r3 convolve2.naive_convolve(f, g)
2 loops, best of 3: 828 ms per loop

There’s still a bottleneck killing performance, and that is the array lookups
and assignments. The []-operator still uses full Python operations;
what we would like to do instead is access the data buffer directly at C
speed.

What we need to do then is to type the contents of the ndarray objects.
We do this with a special “buffer” syntax which must be told the datatype
(first argument) and number of dimensions (“ndim” keyword-only argument, if
not provided then one-dimensional is assumed).
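For our example, only a few declarations change; a sketch of the buffer-typed version, reusing DTYPE and DTYPE_t from above:

```cython
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    ...
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    ...
```

The rest of the function body stays exactly as before; the indexing expressions now compile to direct buffer access.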

In [18]: import convolve3
In [19]: %timeit -n3 -r100 convolve3.naive_convolve(f, g)
3 loops, best of 100: 11.6 ms per loop

Note the importance of this change.

Gotcha: This efficient indexing only affects certain index operations,
namely those with exactly ndim number of typed integer indices. So if
v for instance isn’t typed, then the lookup f[v,w] isn’t
optimized. On the other hand this means that you can continue using Python
objects for sophisticated dynamic slicing etc. just as when the array is not
typed.

Negative indices are checked for and handled correctly. The code above is
explicitly written so that it doesn’t use negative indices, and it
(hopefully) always accesses within bounds. We can add a decorator to disable
bounds checking:
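The decorator in question is cython.boundscheck, applied per function; a sketch:

```cython
cimport cython

@cython.boundscheck(False)  # turn off bounds-checking for this function
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    ...
```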

Now bounds checking is not performed (and, as a side effect, if you do
happen to access out of bounds you will in the best case crash your program
and in the worst case corrupt data). It is possible to switch bounds-checking
mode in many ways; see the compiler directives section of the Cython reference
for more information.

Negative indices are dealt with by assuring Cython that the indices will be
positive, by casting the variables to unsigned integer types (if you do have
negative values, then this casting will create a very large positive value
instead and you will attempt to access out-of-bounds values). Casting is done
with a special <>-syntax. The code below is changed to use either
unsigned ints or casting as appropriate:
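A sketch of the relevant changes: loop indices that are provably non-negative become unsigned int, while the expressions that can go negative are cast at the point of use (the exact split shown here is illustrative):

```cython
cdef unsigned int x, y, v, w   # always non-negative in this algorithm
cdef int s, t                  # s and t can be negative
...
                    value += g[<unsigned int>(smid - s), <unsigned int>(tmid - t)] * f[v, w]
```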

(Also this is a mixed benchmark as the result array is allocated within the
function call.)

Warning

Speed comes at some cost. In particular, it can be dangerous to set typed
objects (like f, g and h in our sample code) to
None. Setting such objects to None is entirely
legal, but all you can do with them is check whether they are None. All
other use (attribute lookup or indexing) can potentially segfault or
corrupt data (rather than raising exceptions as they would in Python).

The actual rules are a bit more complicated but the main message is clear:
Do not use typed objects without knowing that they are not set to None.
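A minimal defensive pattern, sketched here with a hypothetical function name:

```cython
def process(np.ndarray[DTYPE_t, ndim=2] f):
    if f is None:
        raise ValueError("f may not be None")
    # f is known not to be None here, so indexing it is safe
    ...
```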

A more generic alternative is to use object rather than np.ndarray
in the buffer declaration. Under Python 3.0 this can allow your algorithm to
work with any libraries supporting the buffer interface; and support for e.g.
the Python Imaging Library may easily be added if someone is interested, also
under Python 2.x.
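That is, the signature would be written along these lines (a sketch; the body stays the same):

```cython
def naive_convolve(object[DTYPE_t, ndim=2] f, object[DTYPE_t, ndim=2] g):
    ...
```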

There is some speed penalty to this, though (as more assumptions can be made
at compile time if the type is set to np.ndarray; specifically, it is
assumed that the data is stored in pure strided mode and not in indirect
mode).