On Wed, Jan 14, 2009 at 11:24 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> On Jan 14, 2009, at 10:15 PM, josef.pktd@gmail.com wrote:
>> The functions in stats that I tested or rewrote are usually identical
>> to around 1e-15, but in some cases R has a more accurate test
>> distribution for small samples (option "exact" in R), while in
>> scipy.stats we only have the asymptotic distribution.
> We could try to reimplement part of it in C. In any case, it might
> be worth emitting a warning (or at least being very explicit in the doc)
> that the results may not hold for samples smaller than 10-20.
I am not a "C" person and I never went much beyond HelloWorld in C.
I just checked some of the doc strings, and I usually mention that
we use the asymptotic distribution, but there are still pretty vague
statements in some of the doc strings, such as
"The p-values are not entirely reliable but are probably reasonable for
datasets larger than 500 or so."
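To make the small-sample issue concrete, here is a minimal sketch in plain
Python (not scipy.stats code) that compares an exact and an asymptotic
two-sided p-value for a simple sign test. The function names and the cutoff
of 20 are assumptions for illustration only; the cutoff just echoes the
"10-20" range mentioned above.

```python
import math
import warnings


def sign_test_exact(k, n):
    """Two-sided exact p-value for k successes in n trials under p = 0.5."""
    tail = min(k, n - k)
    # P(X <= tail) for X ~ Binomial(n, 0.5), doubled for the two-sided test
    prob = sum(math.comb(n, i) for i in range(tail + 1)) / 2.0 ** n
    return min(1.0, 2.0 * prob)


def sign_test_asymptotic(k, n, small_n=20):
    """Two-sided p-value from the normal approximation (no continuity correction).

    small_n is a hypothetical cutoff below which a warning is issued.
    """
    if n < small_n:
        warnings.warn(
            "n=%d: asymptotic p-value may be inaccurate" % n, RuntimeWarning
        )
    z = abs(k - n / 2.0) / math.sqrt(n / 4.0)
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2.0))


# Print both p-values for a small and a larger sample.
for k, n in [(9, 10), (60, 100)]:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        print(n, sign_test_exact(k, n), sign_test_asymptotic(k, n))
```

For small n the two p-values can differ noticeably, which is the kind of
case where a warning or an explicit note in the doc string would help.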
>> Also, not all
>> existing functions in scipy.stats are tested (yet).
> We should also try to make sure missing data are properly supported
> (not always possible) and that the results are consistent between the
> masked and non-masked versions.
>
I added a ticket so we don't forget to check this.
> IMHO, the readiness to incorporate user feedback is here. The feedback
> is not, or at least not as much as we'd like.
That depends on the subpackage. Some problems in stats have been
reported and known for quite some time, and the expected lifetime of a
ticket can be pretty long. I was looking at different Python packages
that use statistics, and many of them are reluctant to use scipy, while
numpy looks very well established. But I suppose this will improve
with time and the user base will increase, especially with the recent
improvements in the build/distribution and the documentation.
Josef