On Mon, Nov 22, 2010 at 10:35 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
> This thread started on the numpy list:
> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html
>
> I think we should narrow the focus of the package by only including
> functions that operate on numpy arrays. That would cut out date
> utilities, label indexing utilities, and binary operations with
> various join methods on the labels. It would leave us with three
> categories: faster versions of numpy/scipy nan functions, moving
> window statistics, and group functions.
>
> I suggest we add a fourth category: normalization.
>
> FASTER NUMPY/SCIPY NAN FUNCTIONS
>
> This work is already underway: http://github.com/kwgoodman/nanny
>
> The function signatures for these are easy: we copy numpy, scipy. (I
> am tempted to change nanstd from scipy's bias=False to ddof=0.)
scipy.stats.nanstd is supposed to switch to ddof, so don't copy
inconsistent signatures that are supposed to be deprecated.

I would like statistics (scipy.stats and statsmodels) to stick with
default axis=0.

I would be in favor of axis=None for nan extended versions of numpy
functions and axis=0 for stats functions as defaults, but since it
will be a standalone package with wider usage, I will be able to keep
track of axis=-1.
Josef
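
To make the ddof point concrete, here is a minimal sketch using numpy's own np.nanstd (an assumption for illustration; it is not the nanny or scipy.stats function under discussion). As I understand the old scipy signature, bias=False meant normalizing by N-1, which corresponds to ddof=1, while ddof=0 normalizes by N.

```python
import numpy as np

# Small example array with a couple of missing values.
arr = np.array([[1.0, np.nan, 3.0],
                [4.0, 5.0, 9.0],
                [7.0, 9.0, np.nan]])

# ddof=0: divide by the number of non-NaN values in each column.
pop_std = np.nanstd(arr, axis=0, ddof=0)

# ddof=1: divide by (number of non-NaN values - 1) in each column.
sample_std = np.nanstd(arr, axis=0, ddof=1)

# axis=None reduces over the flattened array and returns a scalar.
flat_std = np.nanstd(arr, axis=None)
```

The axis argument is passed explicitly here so the result does not depend on whichever default a package settles on.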
>
> I'd like to use a partial sort for nanmedian. Anyone interested in coding that?
>
> dtype: int32, int64, float64 for now
> ndim: 1, 2, 3 (need some recursive magic for nd > 3; that's an open
> project for anyone)
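
A rough sketch of the partial-sort idea for the 1d case, assuming np.partition as the selection step. The name nanmedian_1d is hypothetical and this is plain numpy, not the cython version being asked for.

```python
import numpy as np

def nanmedian_1d(arr):
    """Median of a 1d array ignoring NaNs, via a partial sort (hypothetical helper)."""
    a = np.asarray(arr, dtype=np.float64)
    a = a[~np.isnan(a)]          # boolean indexing returns a copy without NaNs
    n = a.size
    if n == 0:
        return np.nan            # nothing left to take a median of
    k = n // 2
    if n % 2:
        # Odd count: only the k-th order statistic has to be in place.
        return float(np.partition(a, k)[k])
    # Even count: the two middle order statistics have to be in place.
    part = np.partition(a, [k - 1, k])
    return float(0.5 * (part[k - 1] + part[k]))
```

For example, nanmedian_1d([4, 1, np.nan, 3, 2]) averages the two middle values of 1, 2, 3, 4. The partition only guarantees the requested positions, so it avoids sorting the whole array.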
>
> MOVING WINDOW STATISTICS
>
> I already have doc strings and unit tests
> (https://github.com/kwgoodman/la/blob/master/la/farray/mov.py). And I
> have a cython prototype that moves the window backwards so that the
> stats can be filled in place. (This assumes we make a copy of the data
> at the top of the function: arr = arr.astype(float))
>
> Proposed function signature: mov_sum(arr, window, axis=-1),
> mov_nansum(arr, window, axis=-1)
>
> If you don't like mov, then: move? roll?
>
> I think requesting a minimum number of non-nan elements in a window or
> else returning NaN is clever. But I do like the simple signature
> above.
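
A sketch of what the proposed signature could look like with the minimum-count idea bolted on. The min_count keyword is my assumption (it is not in the proposal), and the cumulative-sum approach is just one easy way to write it in plain numpy; it is not the backwards-moving cython prototype.

```python
import numpy as np

def mov_nansum(arr, window, axis=-1, min_count=1):
    """Moving-window sum ignoring NaNs (sketch; min_count is a hypothetical keyword)."""
    a = np.asarray(arr, dtype=np.float64)
    if not 0 < window <= a.shape[axis]:
        raise ValueError("window must be between 1 and the length of axis")
    a = np.moveaxis(a, axis, -1)             # work along the last axis
    valid = ~np.isnan(a)
    filled = np.where(valid, a, 0.0)         # NaNs contribute 0 to the sum
    zeros = np.zeros(a.shape[:-1] + (1,))
    # Cumulative sums with a leading zero so window totals are differences.
    csum = np.concatenate([zeros, np.cumsum(filled, axis=-1)], axis=-1)
    ccnt = np.concatenate([zeros, np.cumsum(valid, axis=-1)], axis=-1)
    sums = csum[..., window:] - csum[..., :-window]
    counts = ccnt[..., window:] - ccnt[..., :-window]
    sums[counts < min_count] = np.nan        # too few non-NaN elements
    out = np.full(a.shape, np.nan)           # incomplete leading windows stay NaN
    out[..., window - 1:] = sums
    return np.moveaxis(out, -1, axis)
```

With min_count=1 a window returns NaN only when it is all NaN; with min_count=window any NaN in the window gives NaN. Differencing cumulative sums can accumulate rounding error on long arrays, so treat this as a reference version for unit tests rather than the fast one.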
>
> Binary moving window functions: mov_nancorr(arr1, arr2, window, axis=-1), etc.
>
> Optional: moving window bootstrap estimate of error (std) of the
> moving statistic. So, what's the std of each estimate in the
> mov_median output? Too specialized?
>
> dtype: float64
> ndim: 1, 2, 3, recursive for nd > 3
>
> NORMALIZATION
>
> I already have nd versions of ranking, zscore, quantile, demean,
> demedian, etc in larry. We should rename to nanzscore etc.
>
> ranking and quantile could use some cython love.
>
> I don't know, should we cut this category?
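
For reference, a nan-aware zscore is only a few lines in plain numpy. This is a sketch under my own assumptions (the axis=-1 and ddof=0 defaults are placeholders, and it is not the larry implementation):

```python
import numpy as np

def nanzscore(arr, axis=-1, ddof=0):
    """Z-score along axis, ignoring NaNs (sketch; defaults are assumptions)."""
    a = np.asarray(arr, dtype=np.float64)
    # keepdims=True so the statistics broadcast back against the input.
    mean = np.nanmean(a, axis=axis, keepdims=True)
    std = np.nanstd(a, axis=axis, ddof=ddof, keepdims=True)
    return (a - mean) / std      # NaN inputs stay NaN in the output
```

Slices that are all NaN or have zero spread produce NaN or inf along with numpy warnings; a real version would have to decide how to handle those.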
>
> GROUP FUNCTIONS
>
> Input: array, sequence of labels such as a list, axis.
>
> For an array of shape (n,m), axis=0, and a list of n labels with d
> distinct values, group_nanmean would return a (d,m) array. I'd also
> like a groupfilter_nanmean which would return a (n,m) array and would
> have an additional, optional input: exclude_self=False.
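
A minimal sketch of the (n,m) to (d,m) reduction described above. Returning the distinct labels alongside the means, and ordering the groups by sorted label, are my assumptions; the proposal only specifies the output shape. exclude_self is left out.

```python
import numpy as np

def group_nanmean(arr, labels, axis=0):
    """Per-group mean ignoring NaNs (sketch; the return value is an assumption)."""
    a = np.asarray(arr, dtype=np.float64)
    labels = np.asarray(labels)
    if labels.ndim != 1 or labels.size != a.shape[axis]:
        raise ValueError("need exactly one label per element along axis")
    a = np.moveaxis(a, axis, 0)              # put the grouped axis first
    uniq, inverse = np.unique(labels, return_inverse=True)
    out = np.empty((uniq.size,) + a.shape[1:])
    for i in range(uniq.size):
        rows = a[inverse == i]               # all rows carrying label uniq[i]
        valid = ~np.isnan(rows)
        total = np.where(valid, rows, 0.0).sum(axis=0)
        count = valid.sum(axis=0)
        # 0/0 gives NaN where a group has no valid data; silence that warning.
        with np.errstate(invalid="ignore", divide="ignore"):
            out[i] = total / count
    return np.moveaxis(out, 0, axis), uniq
```

For a (4,2) array with labels ['a', 'b', 'a', 'b'] this returns a (2,2) array of means plus the array ['a', 'b'] giving the row order.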
>
> NAME
>
> What should we call the package?
>
> Numa, numerical analysis with numpy arrays
> Dana, data analysis with numpy arrays
>
> import dana as da (da=data analysis)
>
> ARE YOU CRAZY?
>
> If you read this far, you are crazy and would be a good fit for this project.
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user