On Mon, Jun 9, 2008 at 4:45 PM, Robert Kern <robert.kern@gmail.com> wrote:
> On Mon, Jun 9, 2008 at 18:34, Keith Goodman <kwgoodman@gmail.com> wrote:
>> Does anyone have a function that converts ranks into a Gaussian?
>>
>> I have an array x:
>>
>> >>> import numpy as np
>> >>> x = np.random.rand(5)
>>
>> I rank it:
>>
>> >>> x = x.argsort().argsort()
>> >>> x_ranked = x.argsort().argsort()
>> >>> x_ranked
>> array([3, 1, 4, 2, 0])
>
> There are subtleties in computing ranks when ties are involved. Take a
> look at the implementation of scipy.stats.rankdata().
Good point. I had to deal with ties and missing data. I bet
scipy.stats.rankdata() is faster than my implementation.
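
To make the tie issue concrete, here is a minimal sketch comparing the double argsort with scipy.stats.rankdata() on a small made-up array (the values and variable names are only for illustration). It assumes rankdata's default behaviour of giving tied values the average of their ranks, and it does nothing about missing data.

```python
import numpy as np
from scipy.stats import rankdata

# Made-up data with one tie (the two 0.3 values).
x = np.array([0.3, 0.1, 0.3, 0.2])

# Double argsort: 0-based ranks. Tied values still get distinct ranks,
# and which of the tied items ranks higher depends on the sort order.
argsort_ranks = x.argsort().argsort()

# rankdata: 1-based ranks. By default, tied values share the average
# of the ranks they would otherwise occupy.
tie_aware_ranks = rankdata(x)

print(argsort_ranks)
print(tie_aware_ranks)
```

Note the off-by-one between the two: argsort ranks start at 0, rankdata ranks start at 1.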
>> I would like to convert the ranks to a Gaussian without using scipy.
>
> No dice. You are going to have to use scipy.special.ndtri somewhere. A
> basic transformation (off the top of my head, I have no idea if this
> is statistically meaningful):
>
> scipy.special.ndtri((ranks + 1.0) / (len(ranks) + 1.0))
>
> Barring tied first or last items, this should give equal weight to
> each of the tails outside of the range of your data.

Nice. Thank you. It passes the never-wrong chi-by-eye test:

import numpy as np
import pylab
from scipy import special

r = np.arange(1000)  # 0-based ranks, no ties
g = special.ndtri((r + 1.0) / (len(r) + 1.0))
pylab.hist(g, 50)
pylab.show()
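
Putting the two suggestions together, a rough sketch of a rank-to-Gaussian helper might look like the following. The name ranks_to_gaussian is hypothetical, not something from numpy or scipy, and the sketch assumes tie-averaged ranks from rankdata() are acceptable and that the input has no missing values. Because rankdata() returns 1-based ranks, ranks / (n + 1) is the same quantity as (ranks + 1.0) / (len(ranks) + 1.0) with 0-based ranks.

```python
import numpy as np
from scipy.special import ndtri
from scipy.stats import rankdata


def ranks_to_gaussian(x):
    """Map values to normal scores via their ranks (hypothetical helper).

    Assumes no NaNs in x; tied values share an averaged rank.
    """
    x = np.asarray(x, dtype=float)
    ranks = rankdata(x)  # 1-based ranks, ties averaged
    # Dividing by n + 1 keeps every probability strictly inside (0, 1),
    # so ndtri never returns +/- infinity.
    return ndtri(ranks / (len(x) + 1.0))


scores = ranks_to_gaussian([0.3, 0.1, 0.5, 0.2, 0.4])
print(scores)
```

With no ties the scores are symmetric about zero, which matches the equal-tails remark above.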

I wasn't able to call scipy.special.ndtri after a bare "import scipy" the
way you did. I had to import the subpackage explicitly (but I'm new to
scipy):

from scipy import special
special.ndtri

I'm using scipy 0.6.0:

scipy.__version__
'0.6.0'

from Debian Lenny.
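
For anyone hitting the same thing, a small sketch of the explicit import. Importing the subpackage by its full name should make the scipy.special.ndtri spelling usable whether or not a bare "import scipy" exposes the subpackage; I have not checked how that behaviour differs between scipy versions.

```python
# Import the subpackage explicitly rather than relying on "import scipy"
# to load it as a side effect.
import scipy.special

# The median of the standard normal is 0, so this should be 0.0.
value = scipy.special.ndtri(0.5)
print(value)
```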