Josef,
> * get a fast path through the function for (no nans, unmasked)
> np.arrays, that's why I didn't convert inputs automatically to masked
> arrays.
> * program basic statistical functions for np.arrays without nans. I
> would like to limit the handling of different types of arrays to the
> input and output stages, so that the statistical core part does not
> need to be special-cased.
>
Well, you can very well convert your inputs to MaskedArrays only (for
example through ma.fix_invalid), then get rid of the missing values to
work only w/ standard ndarrays.
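A minimal sketch of that conversion (the input values here are made up
for illustration):

```python
import numpy as np
import numpy.ma as ma

# hypothetical input where NaN marks the missing values
x = np.array([1.0, np.nan, 3.0, 4.0])
xm = ma.fix_invalid(x)    # NaNs (and infs) become masked entries
clean = xm.compressed()   # plain ndarray, missing values dropped
print(clean)
```

From there, `clean` can be fed to any standard ndarray routine.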
> * use compressed, not filled, to convert masked data because, in
> general, there is no neutral fill value for regressions. It's also
> easier to use existing functions; for example, my version can use the
> standard np.vander.
Indeed.
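For instance, something along these lines (a small sketch with a made-up
regressor):

```python
import numpy as np
import numpy.ma as ma

# hypothetical regressor with one missing observation
x = ma.array([0.0, 1.0, 2.0, 3.0], mask=[False, True, False, False])
# build the Vandermonde design matrix from the unmasked points only
V = np.vander(x.compressed(), 3)
print(V.shape)
```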
> I'm not yet very familiar with numpy details, for example when a view
> and when a copy or when intermediate arrays are created and what the
> performance overhead of casting back and forth is.
With a view, you don't create a new array, which is nice if you don't
intend to modify it. Creating a masked array version doesn't copy the
data either; an extra array (the mask) is sometimes created, but it
can be modified relatively safely, as modifications shouldn't be
propagated back.
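A quick illustration of that sharing behavior (a sketch):

```python
import numpy as np
import numpy.ma as ma

a = np.arange(4.0)
am = ma.array(a)     # by default, shares a's data buffer (no copy)
am[1] = ma.masked    # only the mask is touched; a's data is unchanged
print(a)
print(am)
```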
> If we get a general setting for handling different type of arrays,
> then this could be used to wrap standard statistical methods and
> functions without too much extra work.
That depends on the situation again. For regressions, your approach
works. In other cases, the masked values have to be taken into account
(because they should be counted as ties, for example). Using masked
arrays should make it easier to adapt the code to other objects
(TimeSeries, for example).
>> * if you need to mask an element, just mask it directly: you don't
>> have to set it to NaN and then use np.isnan for the mask. So, instead
>> of:
>> x_0 = x[:,0].copy()
>> x_0[0] = np.nan
>> x_0 = ma.masked_array(x_0, np.isnan(x_0))
>> just do:
>> x_0 = ma.array(x[:,0])
>> x_0[0] = ma.masked
> I followed the docs' examples. In your way, x_0.data still has the
> original value (?), so I wouldn't have run into the problem with the
> numpy.testing asserts? Would this hide some test cases?
I've never been happy with what was presented in the docs so far. Now
that a draft doc for numpy.ma is available, that should change.
In this example, yes, x_0.data[0] has the same value before and after
masking, but that's not a problem as the mask will hide it (and that
you'll drop it anyway later on). However, you'll want to use
numpy.ma.testutils for testing.
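For example, the assert_equal in numpy.ma.testutils compares masked
arrays while ignoring the data under the mask (a small sketch, with
made-up values):

```python
import numpy as np
import numpy.ma as ma
from numpy.ma.testutils import assert_equal

x_0 = ma.array([1.0, 2.0, 3.0])
x_0[0] = ma.masked   # x_0.data[0] still holds 1.0, but the mask hides it
# same mask, different hidden data: still compares equal
y = ma.array([99.0, 2.0, 3.0], mask=[True, False, False])
assert_equal(x_0, y)
```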
>> * To get rid of the missing data in x, use x.compressed() or emulate
>> it with x.data[~ma.getmaskarray(x)]. ma.getmaskarray(x) always returns
>> an ndarray with the same length as x, whereas ma.getmask(x) can
>> return nomask.
>> This makes shape manipulation and shape-preserving compression easier.
> I tried this:
> x_0[~ma.getmaskarray(x)]
> and got a masked array back, when I wanted this:
> x_0.data[~ma.getmaskarray(x)]
I saw that. .compressed flattens the data, which is an issue in your
case. Just selecting elements of .data is more convenient.
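The difference in a sketch (the arrays are hypothetical; in 2-D,
indexing .data can preserve the shape that .compressed() throws away):

```python
import numpy as np
import numpy.ma as ma

x_0 = ma.array([1.0, 2.0, 3.0, 4.0], mask=[True, False, False, False])
kept_ma = x_0[~ma.getmaskarray(x_0)]       # still a MaskedArray
kept_nd = x_0.data[~ma.getmaskarray(x_0)]  # plain ndarray

# in 2-D, .compressed() flattens, while indexing .data keeps rows intact
x2 = ma.array(np.arange(6.0).reshape(3, 2),
              mask=[[True, True], [False, False], [False, False]])
flat = x2.compressed()                       # 1-D, the masked entries dropped
rows_ok = ~ma.getmaskarray(x2).any(axis=1)   # rows with no masked entries
block = x2.data[rows_ok]                     # 2-D, shape preserved
print(flat.shape, block.shape)
```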
> Actually, after the discussion on 3D picture filling, I thought it
> would be possible to replace some of the missing values by their
> predicted value or their conditional expectation in a second stage. I
> think this would be the method-specific "neutral" fill value.
Except that it won't work, as .filled takes only one value (all the
masked data are filled w/ the same value). What you wanna do is use
np.putmask on your standard ndarray.
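A sketch of that second-stage fill with np.putmask (the "predicted"
values here are made-up placeholders for whatever the first stage
estimates):

```python
import numpy as np
import numpy.ma as ma

x = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, True])
filled = x.data.copy()
# hypothetical per-position predictions from a first-stage fit
predicted = np.array([0.0, 20.0, 0.0, 40.0])
# overwrite only the masked positions, each with its own value
np.putmask(filled, ma.getmaskarray(x), predicted)
print(filled)
```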