#1389: scipy.stats distributions are slow
-------------------------+--------------------------------------------------
Reporter: jpaalasm | Owner: somebody
Type: enhancement | Status: new
Priority: normal | Milestone: 0.10.0
Component: Other | Version: 0.8.0
Keywords: |
-------------------------+--------------------------------------------------
Calling the methods of scipy.stats.distributions.rv_continuous is slow when
each call evaluates only a single variate.
In the case of the pdf-method of the normal distribution, only 10% of the
cumulative time taken by rv_continuous.pdf is spent in _norm_pdf, which
does the actual calculation, and 90% goes to generic parameter checking.
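A rough way to see this kind of overhead is to time the public pdf method against a bare closed-form evaluation. The sketch below is only an illustration, not the attached benchmark: direct_norm_pdf is a hypothetical helper written for this comparison (it is not SciPy's _norm_pdf), and the absolute timings will vary with machine and SciPy version.

```python
import math
import timeit

from scipy import stats

SQRT_2PI = math.sqrt(2.0 * math.pi)


def direct_norm_pdf(x, loc, scale):
    """Normal density from the closed form, with no argument checking.

    Hypothetical helper for this comparison only.
    """
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (SQRT_2PI * scale)


x, loc, scale = 0.3, 1.0, 2.0

# Both paths should agree on the value itself.
generic = float(stats.norm.pdf(x, loc=loc, scale=scale))
direct = direct_norm_pdf(x, loc, scale)

# Time many scalar calls, one variate per call.
n = 2000
t_generic = timeit.timeit(lambda: stats.norm.pdf(x, loc=loc, scale=scale), number=n)
t_direct = timeit.timeit(lambda: direct_norm_pdf(x, loc, scale), number=n)

print("generic: %.1f us/call" % (1e6 * t_generic / n))
print("direct:  %.1f us/call" % (1e6 * t_direct / n))
```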
This overhead is a real problem when the distribution parameters have to
change after every call, because the calls then cannot be batched into a
single call on an array of variates. Sequential Monte Carlo methods are one
example.
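To illustrate why such calls cannot be batched, here is a minimal sketch of a sequentially dependent sampler. It is not a full sequential Monte Carlo method, and random_walk is a hypothetical function that uses only the standard library: each draw's location is the previous draw, so every variate needs its own call with new parameters.

```python
import random


def random_walk(n_steps, scale, seed=0):
    """Gaussian random walk: each step is drawn around the previous state."""
    rng = random.Random(seed)
    state = 0.0
    path = []
    for _ in range(n_steps):
        # The location parameter of this draw is the previous draw, so the
        # draws cannot be requested together as one array.
        state = rng.gauss(state, scale)
        path.append(state)
    return path


path = random_walk(n_steps=5, scale=0.5)
print(path)
```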
I implemented the simplified version of rv_continuous below, covering pdf
and rvs, to show that performance improvements are possible. With it,
generating a single normal variate is about 50 times faster and calculating
a pdf value about 20 times faster than with rv_continuous.
{{{
class fast_distribution:
    """Thin loc/scale wrapper that skips rv_continuous argument checking."""

    def __init__(self, distribution, scale, loc):
        self.distribution = distribution
        self.scale = scale
        self.loc = loc

    def pdf(self, x):
        # Standardize without modifying the caller's x in place.
        z = (x - self.loc) / self.scale
        return self.distribution._pdf(z) / self.scale

    def rvs(self):
        # Draw one standardized variate, then apply scale and loc.
        variate = self.distribution._rvs()
        return variate * self.scale + self.loc
}}}
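As a sanity check on the pdf shortcut, the sketch below compares the standardize, evaluate, rescale arithmetic against the public method for the normal distribution. It assumes that the private _pdf method of scipy.stats.norm accepts a plain float, which may differ between SciPy versions. The parameter values are arbitrary.

```python
from scipy import stats

x, loc, scale = 0.3, 1.0, 2.0

# The shortcut: standardize, evaluate the standard density, rescale.
shortcut = stats.norm._pdf((x - loc) / scale) / scale

# The public method, which validates its arguments before evaluating.
reference = stats.norm.pdf(x, loc=loc, scale=scale)

print(abs(shortcut - reference) < 1e-12)
```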
See attached files for a benchmark and its results.
--
Ticket URL: <http://projects.scipy.org/scipy/ticket/1389>