On Thursday 10 June 2010 22:28:28 Pauli Virtanen wrote:
> Some places where Openmp could probably help are in the inner ufunc
> loops. However, improving the memory efficiency of the data access
> pattern is another low-hanging fruit for multidimensional arrays.
I was about to mention this when the thread started.. at least I want to raise
awareness of the current problems again, so they can be taken into account
while refactoring. Theoretically, this should be really low-hanging fruit,
but I have no idea how much work it is.
The problem is: NumPy uses C-order indexing by default. While it supports F-
order arrays, too, using F-oder arrays leads to unnecessary slow-downs, since
the dimensions are walked through from left to right, and not in memory order.
Ideally, algorithms would get wrapped in between two additional
pre-/postprocessing steps:
1) Preprocessing: After broadcasting, transpose the input arrays such that
they become C order. More specifically, sort the strides of one (e.g. the
first/the largest) input array decreasingly, and apply the same transposition
to the other arrays (in + out if given).
2) Perform the actual algorithm, producing a C-order output array.
3) Apply the reverse transposition to the output array.
This would
- fix the bug that operations (e.g. addition) on F-order arrays are slower
than on C-order arrays,
- fix the bug that the order of the output arrays differs from that of the
input arrays (this is very annoying when dealing with 3rd party libraries that
use and expect F-order everywhere), and
- be easy for elementwise operations, but
- need further thinking / special handling of operations where the number &
mapping of dimensions of the input & output arrays is more complex.
Last, but not least, here is a pointer to our previous discussion of the topic
in the thread "Unnecessarily bad performance of elementwise operators with
Fortran-arrays":
http://www.mail-archive.com/numpy-discussion@scipy.org/msg05100.html
Have a nice day,
Hans