[SciPy-dev] GLMs ?

On Sat, Aug 15, 2009 at 7:35 AM, <josef.pktd@gmail.com> wrote:
> On Sat, Aug 15, 2009 at 3:32 AM, Pierre GM<pgmdevlist@gmail.com> wrote:
>>>> On Aug 15, 2009, at 3:00 AM, David Warde-Farley wrote:
>>>>>>>> On 14-Aug-09, at 7:29 PM, josef.pktd@gmail.com wrote:
>>>>>>>> Fab'.
>>>>> FYI, I need to fit Tweedie distributions to precipitation series. I
>>>>> have already coded the distributions in the scipy standard, and now I
>>>>> need to estimate the parameters...
>>>>> Thanks again
>>>>>> As I understand it, the Tweedie distributions are a further
>>> generalization of the exponential family.
>>>> Indeed.
>>>>> Are you saying that your
>>> parametric assumption is that they are Tweedie but not any of the
>>> standard ones like Gaussian, Poisson, Gamma?
>>>> Yes, something intermediate between Poisson and Gamma, with a variance
>> proportional to the mean to a power 1<=p<=2.
>>>>>> Are you trying to estimate parameters of the distribution themselves,
>>>> or parameters of the distribution as function of some explanatory
>>>> variables? In the first case, GLM won't be of much help.
>>>>>> Is it that you have samples of a (nonstandard) Tweedie random variable
>>> that you want to regress on explanatory variables?
>>> You can probably do it by gradient descent but I don't foresee it
>>> being pretty and probably not even convex. Either way, a GLM package
>>> probably won't help.
>>>> I'm not sure yet whether GLMs are the way to go for my particular
>> problem. I'm trying to reproduce an approach to modelling precipitation
>> patterns (keeping track of both the number and intensities of rainfall
>> events) described in several papers. I know that eventually I'll have to
>> introduce extra variables, and then GLMs will be the way to go. I just
>> wanted to check what algorithms were already available.
>> Thanks a lot for your comments.
>> Using models.GLM could be as easy as adding a new distribution to the
> family. The main algorithm is (supposed to be) independent of the
> distribution, and all distribution specific code is supposed to be in
> family.
>> If Tweedie is like Poisson and Gamma, mainly with a different variance
> function, then I think it *should* work with very little work.
>> If you try this, then this would be a good check for how general our
> implementation is, and whether there are still some hidden,
> distribution specific assumptions left.
>> And it will be good if we soon have more eyes on the models code,
> because I don't think we have settled on a good API yet.
>> Josef
>
I should have done a bit of homework first.
The Tweedie family of distributions looks very interesting, and it
should fit in both the GLM and the maximum likelihood frameworks. R/S
has Tweedie in GLM, and ML in fBasics.
So I would be very interested in seeing it both in models.glm and in
scipy.stats.distributions. However, in the short term, I see two
potential issues:
Wikipedia: "Apart from the four special cases identified above, their
probability density function have no closed form. However, software is
available that enables the accurate computation of the Tweedie
densities (and probability distribution functions)"
Currently the models code is in pure Python, which makes distributing
it as a standalone package much easier until the dust has settled and
models is reintegrated into scipy. Do you have the numerical
calculations in Python or in compiled code (Fortran, C)? I didn't find
Tweedie in hydroclimpy.
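For what it's worth, for 1 < p < 2 the density can at least be
approximated in pure Python through the compound Poisson-gamma
representation (a Poisson number of events, each with a gamma
intensity). The following is only a rough sketch under that assumption:
the function names and the truncation point nmax are made up for
illustration, and none of it is taken from existing Tweedie code.

```python
import numpy as np
from scipy import stats


def tweedie_cp_params(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to compound Poisson-gamma parameters."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson rate (number of events)
    shape = (2.0 - p) / (p - 1.0)               # gamma shape of a single event
    scale = phi * (p - 1.0) * mu ** (p - 1.0)   # gamma scale of a single event
    return lam, shape, scale


def tweedie_prob_zero(mu, phi, p):
    """Probability mass at exactly zero, i.e. P(no events)."""
    lam, _, _ = tweedie_cp_params(mu, phi, p)
    return np.exp(-lam)


def tweedie_pdf_positive(y, mu, phi, p, nmax=200):
    """Density of the continuous part at y > 0.

    Sums P(N = n) * gamma_pdf(y; n * shape, scale) for n = 1..nmax;
    nmax is a hypothetical truncation point and must be large relative
    to the Poisson rate.
    """
    lam, shape, scale = tweedie_cp_params(mu, phi, p)
    y = np.atleast_1d(np.asarray(y, dtype=float))
    n = np.arange(1, nmax + 1)
    weights = stats.poisson.pmf(n, lam)
    # The sum of n iid gamma(shape, scale) variables is gamma(n * shape, scale).
    dens = stats.gamma.pdf(y[:, None], shape * n, scale=scale)
    return dens @ weights
```

With these parameters the mean is lam * shape * scale = mu and the
variance is phi * mu**p, so the power variance function comes out as
expected. Whether a truncated sum is accurate and fast enough for
fitting is a separate question.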
Wikipedia: "For 1 < p < 2, the distribution is continuous on the
positive reals, plus an added mass (exact zero) at Y = 0"
The generic framework for the distributions in stats.distributions
doesn't currently handle distributions that have both continuous and
discrete support (mass points). In some cases, this can be extended by
delegation, but we could think about how to handle the mixed case.
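To make the mixed support concrete, here is a small simulation sketch,
again assuming the compound Poisson-gamma representation for 1 < p < 2.
The function name tweedie_rvs is hypothetical and this sits outside the
stats.distributions framework; it only shows that the exact zeros come
from the "no events" case and that the rest is continuous.

```python
import numpy as np


def tweedie_rvs(mu, phi, p, size, rng):
    """Draw Tweedie variates for 1 < p < 2 as a Poisson sum of gammas."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson rate (number of events)
    shape = (2.0 - p) / (p - 1.0)               # gamma shape of a single event
    scale = phi * (p - 1.0) * mu ** (p - 1.0)   # gamma scale of a single event
    n = rng.poisson(lam, size=size)             # number of events per draw
    y = np.zeros(size)                          # draws with n == 0 stay exactly 0
    pos = n > 0
    # The sum of n iid gamma(shape, scale) variables is gamma(n * shape, scale).
    y[pos] = rng.gamma(shape * n[pos], scale)
    return y


rng = np.random.default_rng(0)
y = tweedie_rvs(mu=2.0, phi=1.0, p=1.5, size=200000, rng=rng)
frac_zero = np.mean(y == 0)   # compare with exp(-lam)
```

The sample mean should be close to mu, the sample variance close to
phi * mu**p, and the fraction of exact zeros close to exp(-lam). A
generic mixed distribution class would need to carry that point mass
alongside the pdf of the continuous part.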
Josef