>>>>> "travis" == Travis Oliphant <oliphant.travis at ieee.org> writes:
travis> Hello developers.
travis> What should we do about NaNs and the stats toolbox? Stats is one
travis> package where people may use nans to represent missing values.
Yech. This is a hard issue, but NaN isn't the solution. I believe
someone who used to be tracking this list, John Barnard, had written
some tools for implementing statistical procedures for handling
missing values (imputation-based) in Python. But I don't know the
state of that work.
(The issue is that one may want to distinguish between types of
missing values, which a single NaN marker does not allow.)
travis> There are two options that I see.
travis> 1) MATLAB option
travis> MATLAB defines 6 new functions nanmean, nanmedian, nansum, nanmin,
travis> nanmax, and nanstd that ignore nans properly. These can be used in
travis> place of the normal functions which don't use nans properly. Perhaps
travis> they did this as an afterthought.
travis> Note, this is an easy option and is (as of now) implemented in the CVS
travis> scipy.
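For concreteness, here is a rough pure-Python sketch of what the
"ignore NaNs" option amounts to. The names nansum and nanmean follow
the MATLAB names quoted above; drop_nans is a hypothetical helper made
up for this illustration, and none of this is meant to reproduce the
code that is in CVS.

```python
import math


def drop_nans(values):
    """Return the values that are not NaN, preserving order (hypothetical helper)."""
    return [v for v in values if not math.isnan(v)]


def nansum(values):
    """Sum of the non-NaN values."""
    return math.fsum(drop_nans(values))


def nanmean(values):
    """Mean of the non-NaN values; NaN if nothing is left after dropping."""
    kept = drop_nans(values)
    if not kept:
        return float("nan")
    return math.fsum(kept) / len(kept)


data = [1.0, float("nan"), 3.0]
print(sum(data) / len(data))  # nan: the ordinary mean is poisoned by the NaN
print(nanmean(data))          # 2.0: the NaN is silently dropped
```

Note that dropping is the only policy available here, and it is applied
without the caller ever asking for it.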
Would it be possible instead to take the R approach, which is to have
a missing-data handler? For example, mean(x, missing=missing.drop()),
where the default is to drop, and the other options might be "replace
with mean", "replace with random sample", "user-defined function",
"barf because we shouldn't compute with missings", etc.
Missing data handling is hard, and to be done right, needs to be
handled flexibly.
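
To make the handler idea concrete, here is a minimal sketch in plain
Python. Everything in it is an assumption made for illustration: the
handler names (drop, replace_with_mean, replace_with_random_sample,
fail), the "missing" keyword, and the choice to treat both None and NaN
as missing. It is not an existing scipy or R interface.

```python
import math
import random


def is_missing(v):
    """Assumption for this sketch: None and float NaN both count as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def drop(values):
    """Handler: discard missing entries."""
    return [v for v in values if not is_missing(v)]


def replace_with_mean(values):
    """Handler: fill each missing entry with the mean of the observed ones."""
    observed = drop(values)
    if not observed:
        raise ValueError("no observed values to impute from")
    m = math.fsum(observed) / len(observed)
    return [m if is_missing(v) else v for v in values]


def replace_with_random_sample(values, rng=random):
    """Handler: fill each missing entry with a random draw from the observed ones."""
    observed = drop(values)
    if not observed:
        raise ValueError("no observed values to impute from")
    return [rng.choice(observed) if is_missing(v) else v for v in values]


def fail(values):
    """Handler: refuse to compute when anything is missing."""
    if any(is_missing(v) for v in values):
        raise ValueError("missing values present")
    return list(values)


def mean(values, missing=drop):
    """Mean of values after the chosen handler has dealt with missing entries.

    Any callable that maps a sequence to a complete list can be passed
    as `missing`, which covers the "user-defined function" case.
    """
    complete = missing(values)
    if not complete:
        raise ValueError("no values left after handling missing entries")
    return math.fsum(complete) / len(complete)


data = [1.0, None, 3.0, float("nan")]
print(mean(data))                             # 2.0, default handler drops
print(mean(data, missing=replace_with_mean))  # 2.0, gaps filled with the mean
```

Calling mean(data, missing=fail) raises ValueError instead of returning
a number. The point of the design is that the statistic itself stays
ignorant of the policy, so a new way of treating missing values is one
new function rather than a new family of nan-prefixed routines.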
best,
-tony
--
A.J. Rossini Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics rossini at u.washington.edu
FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org
-------------- http://software.biostat.washington.edu/ ----------------
FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW: Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
(my tuesday/wednesday/friday locations are completely unpredictable.)