Chris Fonnesbeck wrote:
> I was surprised to find that the random number sampler in newcore is
> significantly slower than RandomArray in Numeric (at least for
> binomial sampling). A quick comparison showed the
> scipy.basic.random.binomial sampler to be over 8 times slower than
> RandomArray.binomial. I was surprised, since the newcore stuff is
> Pyrex-based (isn't it?).
It's a Pyrex wrapper to pure C code. See distributions.c for the actual
implementations of the distribution algorithms.
> Am I the only one observing such differences,
> or am I using the wrong method? If not, is the performance expected to
> improve significantly?
Can you show us the code you're using? In particular, the parameters p
and n. I use two different algorithms for the binomial distribution: a
waiting-time algorithm when p*n <= 30 (or (1-p)*n <= 30 if p > 0.5) and
the BTPE algorithm (the same algorithm that's in RANLIB, but
reimplemented) when p*n > 30 (or (1-p)*n > 30 if p > 0.5).
In [10]: tra = Timer('x=RA.binomial(n,p,size)', 'import RandomArray as RA; n=200; p=0.25; size=10000')
In [11]: tra.repeat(3,100)
Out[11]: [1.5208730697631836, 1.5047860145568848, 1.5198800563812256]
In [12]: tra = Timer('x=RA.binomial(n,p,size)', 'import RandomArray as RA; n=20; p=0.25; size=10000')
In [13]: tra.repeat(3,100)
Out[13]: [0.89270305633544922, 0.90227413177490234, 1.183967113494873]
In [14]: tmt = Timer('x=random.binomial(n,p,size)', 'from scipy import random; n=200; p=0.25; size=10000')
In [15]: tmt.repeat(3,100)
Out[15]: [1.922713041305542, 2.0582780838012695, 1.906635046005249]
In [16]: tmt = Timer('x=random.binomial(n,p,size)', 'from scipy import random; n=20; p=0.25; size=10000')
In [17]: tmt.repeat(3,100)
Out[17]: [3.1456100940704346, 2.9999620914459229, 2.9954609870910645]
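If anyone wants to rerun this comparison, a self-contained version of the
benchmark looks like the following. Note this assumes numpy.random as the
sampler module (on a 2005 install you would substitute RandomArray or
scipy.basic.random as above); the n and p values mirror the two cases in
the thread, one on each side of the p*n = 30 cutoff.

```python
from timeit import Timer

# n=200 (p*n = 50) exercises the BTPE path; n=20 (p*n = 5) exercises the
# small-p*n waiting-time path.
setup = 'import numpy.random as nr; n=%d; p=0.25; size=10000'
for n in (200, 20):
    t = Timer('x = nr.binomial(n, p, size)', setup % n)
    print(n, min(t.repeat(3, 100)))
```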
It looks like my implementation of BTPE is about 30% slower than RANLIB's,
and the waiting-time algorithm is 3 times slower than RANLIB's BTPE on my
machine.
Compare to normal():
In [21]: tra = Timer('x=RA.normal(0.0, 1.0, size)', 'import RandomArray as RA; size=10000')
In [22]: tra.repeat(3,100)
Out[22]: [0.70471906661987305, 0.76926708221435547, 0.7034919261932373]
In [23]: tmt = Timer('x=random.normal(size=size)', 'from scipy import random; size=10000')
In [24]: tmt.repeat(3,100)
Out[24]: [0.56431293487548828, 0.52272319793701172, 0.79108381271362305]
So the evidence suggests that the slowness is confined to individual
distributions, one of them being binomial. I'll futz with it.
--
Robert Kern
rkern at ucsd.edu
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter