On 18.02.2012 23:54, Travis Oliphant wrote:
> Another factor: the decision to make an extra layer of indirection makes small arrays that much slower. I agree with Mark that in a core library we need to go the other way with small arrays being completely allocated in the data-structure itself (reducing the number of pointer de-references).
>
I am not sure there is much overhead in
double *const data = (double*)PyArray_DATA(array);
If C code calls PyArray_DATA(array) more than needed, the fix is not to
store the data inside the struct, but rather fix the real problem. For
example, the Cython syntax for NumPy arrays will under the hood unbox
the ndarray struct into local variables. That gives the fastest data
access. The NumPy core could e.g. have macros that take care of the
unboxing.
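To make the unboxing idea concrete, here is a minimal sketch in plain C. It
does not use the NumPy headers: toy_array, TOY_UNBOX and toy_sum are
hypothetical names I made up for illustration, and the struct is only a
stand-in for the real array object. The point is just that the struct is
dereferenced once, before the loop, and the loop then works on locals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an array object; NOT NumPy's PyArrayObject. */
typedef struct {
    double *data;   /* pointer to the element buffer */
    size_t  n;      /* number of elements */
} toy_array;

/* Hypothetical unboxing macro: read the struct fields once and keep
 * them in const locals, so the loop body never touches the struct. */
#define TOY_UNBOX(arr, ptr, len)          \
    double *const ptr = (arr)->data;      \
    const size_t len = (arr)->n

/* Sum the elements using only the unboxed locals. */
double toy_sum(const toy_array *a)
{
    assert(a != NULL);
    TOY_UNBOX(a, data, n);

    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += data[i];
    return s;
}
```

With the real C API the same pattern would presumably read the pointer via
PyArray_DATA(array) once, outside the loop, as in the line quoted above.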
But for the purpose of cache use, it could be smart to make sure the
data buffer is allocated directly after the PyObject struct (or at least
in vicinity of it), so it will be loaded into cache along with the
PyObject. That is, prefetched before dereferencing PyArray_DATA(array).
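A rough sketch of what "allocated directly after the struct" could look like,
using a C99 flexible array member so that header and elements come from one
malloc. Again this is plain C with hypothetical names (colocated_array,
colocated_new, colocated_get), not NumPy's actual layout or allocator. Note
that the data pointer is kept, so access still goes through it; only the
placement of the buffer changes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header; stands in for the object struct. */
typedef struct {
    size_t  n;             /* number of elements */
    double *data;          /* points at inline_buf here, could point elsewhere */
    double  inline_buf[];  /* C99 flexible array member, follows the header */
} colocated_array;

/* Allocate header and element storage in a single block, zero-filled. */
colocated_array *colocated_new(size_t n)
{
    /* Guard against overflow in the size computation. */
    if (n > (SIZE_MAX - sizeof(colocated_array)) / sizeof(double))
        return NULL;

    colocated_array *a = malloc(sizeof *a + n * sizeof(double));
    if (a == NULL)
        return NULL;

    a->n = n;
    a->data = a->inline_buf;   /* buffer sits right behind the header */
    for (size_t i = 0; i < n; i++)
        a->data[i] = 0.0;
    return a;
}

/* Element access still goes through the data pointer. */
double colocated_get(const colocated_array *a, size_t i)
{
    assert(a != NULL && i < a->n);
    double *const data = a->data;
    return data[i];
}
```

Since everything lives in one block, a single free(a) releases both the
header and the elements. Whether the buffer actually lands in the same cache
line as the header depends on the header size and the allocator's alignment.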
But with respect to placement we must keep in mind that the PyObject can
be subclassed. Putting e.g. 4 KB of static buffer space inside the
PyArrayObject struct will bloat every ndarray.
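The cost of a fixed inline buffer is easy to see with sizeof. The structs
below are hypothetical stand-ins (lean_header, fat_header, fat_subclass), not
the real PyArrayObject, and header_overhead is just a helper for the
comparison. Every instance of the fat variant pays for the buffer whether it
uses it or not, and a subclass that embeds the base struct inherits the cost.

```c
#include <assert.h>
#include <stddef.h>

#define INLINE_BYTES 4096   /* the 4 KB figure used as the example */

/* Hypothetical header with only a pointer to external storage. */
typedef struct {
    size_t  n;
    double *data;
} lean_header;

/* Hypothetical header with a fixed buffer embedded in the struct. */
typedef struct {
    size_t  n;
    double *data;
    char    inline_buf[INLINE_BYTES];
} fat_header;

/* Hypothetical subclass: base struct first, extra fields appended. */
typedef struct {
    fat_header base;
    int        extra;
} fat_subclass;

/* Extra bytes every instance carries because of the inline buffer. */
size_t header_overhead(void)
{
    assert(sizeof(fat_header) >= sizeof(lean_header));
    return sizeof(fat_header) - sizeof(lean_header);
}
```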
Sturla