On Wed, Dec 19, 2012 at 3:27 PM, Charles R Harris
<charlesr.harris@gmail.com> wrote:
> On Wed, Dec 19, 2012 at 8:10 AM, Nathaniel Smith <njs@pobox.com> wrote:
>> Right, my intuition is that it's like order="C" -- if you make a new
>> array by, say, indexing, then it may or may not have order="C", no
>> guarantees. So when you care, you call asarray(a, order="C") and that
>> either makes a copy or not as needed. Similarly for base alignment.
>>
>> I guess to push this analogy even further we could define a set of
>> array flags, ALIGNED_8, ALIGNED_16, etc. (In practice only power-of-2
>> alignment matters, I think, so the number of flags would remain
>> manageable?) That would make the C API easier to deal with too, no
>> need to add PyArray_FromAnyAligned.
>>
> Another possibility is an aligned datatype, basically an aligned structured
> array with floats/ints in chunks of the appropriate size. IIRC, gcc support
> for sse is something like that.
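A minimal Python sketch of the order="C" analogy quoted above: `np.asarray(a, order="C")` hands back `a` itself when it is already C-contiguous and copies only when it has to. The `pow2_alignment` and `alignment_flags` helpers are hypothetical (numpy has no `ALIGNED_8`/`ALIGNED_16` flags); they only illustrate how per-power-of-2 flags could be derived from an array's base data pointer.

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)

# Already C-contiguous: asarray(order="C") returns the same object, no copy.
same = np.asarray(a, order="C")

# Indexing with a step gives a non-contiguous view, so asarray has to copy.
s = a[:, ::2]
copied = np.asarray(s, order="C")

def pow2_alignment(addr, cap=64):
    """Largest power of two (capped at `cap`) that divides the byte address `addr`."""
    if addr == 0:
        return cap
    return min(addr & -addr, cap)

def alignment_flags(arr):
    """Hypothetical ALIGNED_n flags computed from the array's data pointer.
    These are NOT real numpy flags; they only illustrate the proposal."""
    align = pow2_alignment(arr.ctypes.data)
    return {f"ALIGNED_{n}": align >= n for n in (8, 16, 32, 64)}

print(same is a, s.flags["C_CONTIGUOUS"], copied.flags["C_CONTIGUOUS"])  # True False True
print(alignment_flags(a))
print(alignment_flags(a[0, 1:]))  # same buffer, base pointer moved by 8 bytes
```

Since only power-of-2 alignments are tracked, a handful of boolean flags covers every case, which is the point made in the quoted text.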

True; right now it looks like structured dtypes have no special alignment:

In [13]: np.dtype("f4,f4").alignment
Out[13]: 1

So for this approach we'd need a way to create structured dtypes with
.alignment == .itemsize, and we'd need some way to request
dtype-aligned memory from array allocation functions. I guess existing
NPY_ALIGNED is a good enough public interface for the latter, but
AFAICT the current implementation is to just assume that whatever
malloc() returns will always be ALIGNED. This is true for all base C
types, but not for more exotic record types with larger alignment
requirements -- that would require some fancier allocation scheme.
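One possible "fancier allocation scheme" is to over-allocate a byte buffer and slice off the misaligned head. The sketch below assumes a 16-byte word built from four float32 fields; `aligned_empty` is a hypothetical helper, not a numpy function. It also shows that `align=True` only raises a structured dtype's alignment to that of its largest field, not to its itemsize.

```python
import numpy as np

# A "SIMD word" of four float32 values as a structured dtype.
word = np.dtype("f4,f4,f4,f4", align=True)
print(word.itemsize, word.alignment)  # 16 4: alignment is the max field alignment

def aligned_empty(n, dtype, alignment):
    """Hypothetical helper: n uninitialized elements whose first byte is a multiple
    of `alignment`, obtained by over-allocating raw bytes and skipping the head."""
    dtype = np.dtype(dtype)
    nbytes = n * dtype.itemsize
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment  # bytes to skip to reach alignment
    return buf[offset:offset + nbytes].view(dtype)

arr = aligned_empty(10, word, 16)
print(arr.shape, arr.ctypes.data % 16)  # (10,) 0
```

The returned array is a view, so the over-allocated buffer stays alive through `arr.base`; the cost is up to `alignment` wasted bytes per allocation.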

Not sure which interface is more useful to users. On the one hand,
using funny dtypes makes regular non-SIMD access more cumbersome, and
it forces your array size to be a multiple of the SIMD word size,
which might be inconvenient if your code is smart enough to handle
arbitrary-sized arrays with partial SIMD acceleration (i.e., using
SIMD for most of the array, and then a slow path to handle any partial
word at the end). OTOH, if your code *is* that smart, you should
probably just make it smart enough to handle a partial word at the
beginning as well and then you won't need any special alignment in the
first place, and representing each SIMD word as a single numpy scalar
is an intuitively appealing model of how SIMD works. OTOOH, just
adding a single argument to np.array() is much simpler to explain than
some elaborate scheme involving the creation of special custom dtypes.
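Handling a partial word at both the beginning and the end amounts to splitting the data into a scalar head, a run of aligned full words, and a scalar tail. A minimal Python sketch, assuming 16-byte SIMD words and a 1-D contiguous array; `simd_split` and `simd_sum` are hypothetical names, and the `reshape(...).sum()` call merely stands in for a real SIMD kernel.

```python
import numpy as np

def simd_split(addr, n, itemsize, simd_bytes=16):
    """Split n elements starting at byte address `addr` into (head, body, tail)
    counts: scalar elements until simd_bytes alignment, whole words, the rest.
    Assumes simd_bytes is a multiple of itemsize and addr is itemsize-aligned."""
    per_word = simd_bytes // itemsize
    head = min(((-addr) % simd_bytes) // itemsize, n)
    body = (n - head) // per_word * per_word
    tail = n - head - body
    return head, body, tail

def simd_sum(a, simd_bytes=16):
    """Sum a 1-D contiguous array using the head/body/tail split."""
    head, body, tail = simd_split(a.ctypes.data, a.size, a.itemsize, simd_bytes)
    per_word = simd_bytes // a.itemsize
    total = 0.0
    for x in a[:head]:  # slow path: partial word at the beginning
        total += x
    words = a[head:head + body].reshape(-1, per_word)
    total += words.sum()  # stand-in for the SIMD path over aligned full words
    for x in a[head + body:]:  # slow path: partial word at the end
        total += x
    return total

print(simd_split(0x1004, 10, 4))  # (3, 4, 3)
print(simd_sum(np.arange(1, 12, dtype=np.float32)[1:]))  # sums 2 + 3 + ... + 11
```

With this split the caller needs no special alignment at all, at the cost of a scalar prologue and epilogue.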

-n