Proposal for a new buffer syntax

What is the problem?

The current syntax is e.g.

cdef object[int, mode="fortran"] x
cdef np.ndarray[double, ndim=2] y

This comes from viewing the buffer syntax as an optimization of the Python [] operator in certain special cases. Any non-optimizable operations are passed to the underlying object. In addition, the typename controls the default access mode ("strided" vs. "indirect").

Advantages:

* It allows Python/NumPy syntax (except in variable declarations), so one can simply add types to existing code.

Disadvantages:

print b.foo() # crashes program
* One often forgets to type the indices, or indexes the wrong way (e.g. arr[i][j]), and gets no warning at all that the code will be slowed down by a factor of hundreds.
* The mechanism itself is rather crude (it only optimizes one specific case), yet the syntax doesn't show this, so it ends up with a "magic" feel.

Proposed solution

The proposed solution is to introduce buffers as a first-class native type with a new syntax.

The syntax would embed everything needed to optimize PEP 3118 buffer access, without knowing anything about the underlying object type (such as NumPy arrays) and without allowing operations directly on the object that owns the buffer.

PEP 3118 allows for a very wide class of buffer layouts. This class can be restricted in many ways, and almost any restriction can give a substantial speedup.
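To make the layout question concrete, here is a minimal sketch using Python's built-in memoryview, which exposes PEP 3118 buffers. It is not the proposed Cython syntax; the helper name strided_offset is made up for illustration. It shows the general strided address computation that a compiler must emit when nothing more is known about the layout.

```python
import array
import struct

# Six C ints in one flat block of memory.
data = array.array("i", range(6))

raw = memoryview(data).cast("B")        # 1-D view of the raw bytes
grid = raw.cast("i", shape=[2, 3])      # same memory seen as a 2x3 grid

def strided_offset(index, strides):
    """Byte offset of an element under plain strided access (hypothetical helper)."""
    return sum(i * s for i, s in zip(index, strides))

# General strided lookup: one multiply-add per dimension.
offset = strided_offset((1, 2), grid.strides)
value = struct.unpack_from("i", raw, offset)[0]

print(grid.shape, grid.strides, grid.c_contiguous, grid.f_contiguous)
print(value == grid[1, 2])
```

If the layout is known to be contiguous at compile time, the stride of the innermost dimension equals the item size and need not be loaded or multiplied at run time, which is the kind of restriction the paragraph above refers to.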

It could work like this. Assume from cython import strided, contig, full, ptr:

int[:] is a one-dimensional buffer, int[:,:,:] is three-dimensional, and so on. This is clearly distinct from the C array syntax, looks more Pythonic, and stays within the Python grammar.

int[:] accesses only the buffer, not the corresponding Python object. Coercion from an object acquires a buffer view, while coercion to an object is disallowed in earlier Python versions and gives a standard Python memoryview in newer versions (backports could also be done, though something like a __frombuffer__ operator in numpy.pxd for efficient numpy.ndarray(buf) construction would likely work better with less effort).
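The buffer-only behaviour can be previewed with Python's standard memoryview, which is what coercion back to an object would produce in newer Python versions. This is a rough analogy rather than the proposed Cython type; the variable names are illustrative.

```python
import array

owner = array.array("i", [10, 20, 30])

# "Coercion from an object": acquire a PEP 3118 view of the owner's memory.
view = memoryview(owner)

# Writes through the view land in the owner's memory; nothing is copied.
view[0] = 99

# The view carries buffer metadata only; the owner's own methods
# (such as array.append) are not reachable through it.
has_append = hasattr(view, "append")

print(owner[0], view.format, view.ndim, has_append, view.obj is owner)
```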

Read-only vs. read-write access is determined automatically, as it is today.
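As a small illustration of the read-only/read-write distinction at the PEP 3118 level, the sketch below uses plain Python memoryviews. How Cython decides which kind of buffer to request is not shown here.

```python
frozen = b"abc"                  # exports a read-only buffer
mutable = bytearray(b"abc")      # exports a writable buffer

ro_view = memoryview(frozen)
rw_view = memoryview(mutable)

# Writing through the writable view changes the bytearray in place.
rw_view[0] = ord("x")

# Writing through the read-only view is rejected with a TypeError.
try:
    ro_view[0] = ord("x")
    write_rejected = False
except TypeError:
    write_rejected = True

print(ro_view.readonly, rw_view.readonly, mutable, write_rejected)
```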

The mode is passed as a string, e.g. int[:,:,"fortran"] for a 2D array with Fortran-contiguous ordering. The default mode is "strided"; one can pass "indirect" to get indirect indexing.
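What a contiguous mode promises can be sketched by computing the strides it implies. The function below is a hypothetical helper in plain Python, not part of Cython. The mode name "fortran" is taken from the text, while "c" is assumed here as the name for C ordering.

```python
def contiguous_strides(shape, itemsize, mode="c"):
    """Byte strides implied by a contiguous layout (illustrative helper).

    mode="c":       last index varies fastest (row-major).
    mode="fortran": first index varies fastest (column-major).
    """
    # Walk the dimensions starting from the fastest-varying one.
    dims = list(shape) if mode == "fortran" else list(reversed(shape))
    strides = []
    step = itemsize
    for extent in dims:
        strides.append(step)
        step *= extent
    if mode != "fortran":
        strides.reverse()
    return tuple(strides)

print(contiguous_strides((2, 3), 4, mode="c"))
print(contiguous_strides((2, 3), 4, mode="fortran"))
```

With either contiguous mode the strides follow from the shape alone, so the fastest-varying dimension can be indexed without a stride multiplication. In "strided" mode every stride has to be read from the buffer at run time.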

Whether negative indices wrap around can be controlled per dimension, e.g. int[:,0:] disallows negative wraparound on the second dimension only. This is slightly more featureful than today.
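The cost being traded away here is the extra branch needed for Python-style negative indices. A rough sketch of the two behaviours in plain Python follows; resolve_index and its wraparound flag are made-up names for illustration, and real generated code may also drop the bounds check.

```python
def resolve_index(i, length, wraparound=True):
    """Map a user index to a non-negative position (illustrative helper)."""
    if wraparound and i < 0:
        i += length          # the extra test and add that wraparound costs
    if not 0 <= i < length:
        raise IndexError("index out of bounds")
    return i

# With wraparound, -1 means "last element".
last = resolve_index(-1, 5)

# Without wraparound, a negative index is simply out of bounds.
try:
    resolve_index(-1, 5, wraparound=False)
    negative_rejected = False
except IndexError:
    negative_rejected = True

print(last, negative_rejected)
```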

Main differences from today, in the context of NumPy:

def f():
    cdef int[:] a = ..., b = ..., c
    c = a + b # would not work before new features are implemented
    c = a[2:110] # would not work before new features are implemented
    print a.flags # nope

So, int[:] represents only the buffer and not the NumPy array object. Slicing and arithmetic on these