2008/6/6 Keith Goodman <kwgoodman@gmail.com>:
> I'd like to shift the columns of a 2d array one column to the right.
> Is there a way to do that without making a copy?
> This doesn't work:
> >>> import numpy as np
> >>> x = np.random.rand(2,3)
> >>> x[:,1:] = x[:,:-1]
> >>> x
> array([[ 0.44789223,  0.44789223,  0.44789223],
>        [ 0.80600897,  0.80600897,  0.80600897]])
> What I'm trying to do (but without the copy):
> >>> x2 = x.copy()
> >>> x[:,1:] = x2[:,:-1]
Aha, thank you. You have provided an example where numpy's behaviour
is subtle and effectively undefined:
What you saw:
In [34]: x = np.random.rand(2,3)
In [35]: x[:,1:] = x[:,:-1]
In [36]: x
Out[36]:
array([[ 0.21559212,  0.21559212,  0.21559212],
       [ 0.02812742,  0.02812742,  0.02812742]])
If we transpose the array in memory it works:
In [37]: x = np.random.rand(3,2).T
In [38]: x[:,1:] = x[:,:-1]
In [39]: x
Out[39]:
array([[ 0.76877954,  0.76877954,  0.2793666 ],
       [ 0.09269335,  0.09269335,  0.52361756]])
As a workaround you can use backwards slices:
In [40]: x = np.random.rand(2,3)
In [41]: x[:,:0:-1] = x[:,-2::-1]
In [42]: x
Out[42]:
array([[ 0.20183084,  0.20183084,  0.08156887],
       [ 0.30611585,  0.30611585,  0.79001577]])
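A quick sanity check (a small self-contained script, not from the
session above) that the backwards slice really performs the same shift
as the copy-based version:

```python
import numpy as np

x = np.random.rand(2, 3)

# Reference result: shift right using an explicit copy.
expected = x.copy()
expected[:, 1:] = x[:, :-1]

# Workaround: the same shift via reversed slices, no explicit copy.
y = x.copy()
y[:, :0:-1] = y[:, -2::-1]

assert np.array_equal(y, expected)
```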
I can't explain exactly why this behaves this way, but the reason it
depends on memory layout is that when you write A[<some slice>] =
<some array>, numpy tries to reorder the loop to be as efficient as
possible. This depends on shape, size, and memory layout.
Unfortunately when the slices overlap, the different loop orderings
give different results.
I don't know what can be done about this, since detecting overlapping
slices is basically infeasible, and loop reordering could be an
extremely valuable optimization (and could therefore change to become
even more unpredictable in future, what with SIMD and cache issues).
The least painful option would be for numpy to offer the assurance
that it walks arrays in "virtual C order", that is, it walks the
indices from right to left. Then users who want efficient looping can
(try to) force their arrays to be laid out nicely in memory.
(Optimizing loops could be done when safe, that is, when the arrays
are known not to overlap.)
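(Forcing a nice layout is already easy: `np.ascontiguousarray` returns
a C-ordered array, copying only when the input is not already
C-contiguous. For example:)

```python
import numpy as np

x = np.random.rand(3, 2).T        # Fortran-ordered view, shape (2, 3)
xc = np.ascontiguousarray(x)      # copy into C order (no-op if already C-ordered)

assert not x.flags['C_CONTIGUOUS']
assert xc.flags['C_CONTIGUOUS']
assert np.array_equal(x, xc)
```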
Less painful for numpy developers but more painful for users is to
warn them about the status quo: operations on overlapping slices can
happen in arbitrary order, depending on the memory layout of the input
arrays. This almost seems designed to make users tear their hair out
as their test cases work but real code fails. (I wonder how much of
numpy itself breaks on arrays with unusual memory layouts?)
I volunteer to write a quick check that can usually prove that two
arrays do not overlap (in fact I think I wrote one once). It won't
recognize every pair of non-overlapping arrays, but it should handle
the majority of cases.
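Such a check could look roughly like this (the names below are mine,
not an existing numpy API): compare the byte ranges spanned by each
array's data. Disjoint ranges prove the arrays cannot share memory;
intersecting ranges are merely inconclusive.

```python
import numpy as np

def byte_range(a):
    """Return (low, high) bounds of the byte addresses a's elements span."""
    low = high = a.__array_interface__['data'][0]
    for stride, n in zip(a.strides, a.shape):
        extent = stride * (n - 1)
        if extent >= 0:
            high += extent
        else:
            low += extent   # negative stride: data extends backwards
    return low, high + a.itemsize

def definitely_disjoint(a, b):
    """True means a and b certainly share no memory.
    False is inconclusive: the byte ranges intersect, but the arrays
    may still touch disjoint elements (e.g. interleaved strides)."""
    a_lo, a_hi = byte_range(a)
    b_lo, b_hi = byte_range(b)
    return a_hi <= b_lo or b_hi <= a_lo
```

With x = np.arange(10), definitely_disjoint(x[:5], x[5:]) is True,
while definitely_disjoint(x[::2], x[1::2]) is False even though those
two views touch disjoint elements -- that is exactly the conservative
case such a check cannot prove.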
Anne