Tag: standard deviation

Bounty: 100

If I remember correctly, I heard it mentioned that the standard deviation for means of precipitation sums is pretty useless due to the highly variable nature of precipitation quantities.

Let’s say that climatologists have calculated standard deviations for the means of monthly precipitation sums, for every month of the year, from 30 years of measurements. If the standard deviation of these mean values is bigger than the mean values themselves, it tells us there is a relatively high spread in the dataset. Now, when the difference between the average values of two 30-year periods is calculated, each period introduces its own standard deviation, and due to error propagation the resulting standard deviation of the difference is even bigger than the larger of the two periods’ standard deviations. If the resulting standard deviation is bigger than the difference of the mean values, the difference may lie very far from the true value: a pretty “non-accurate” quantity, which for quite many observations would yield fictitious differences of means. Is that right?
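The propagation step described here can be sketched numerically; the gamma-distributed samples below are made-up stand-ins for two 30-year sets of monthly precipitation sums.

```python
# Made-up example: the SD of a difference of two independent means adds
# in quadrature, so it exceeds the larger of the two individual SDs.
import numpy as np

rng = np.random.default_rng(1)
period1 = rng.gamma(shape=0.8, scale=100.0, size=30)  # hypothetical mm values
period2 = rng.gamma(shape=0.8, scale=110.0, size=30)

se1 = period1.std(ddof=1) / np.sqrt(period1.size)  # SE of each period's mean
se2 = period2.std(ddof=1) / np.sqrt(period2.size)

diff = period2.mean() - period1.mean()
se_diff = np.sqrt(se1**2 + se2**2)  # quadrature sum: max(se1, se2) <= se_diff
```

The quadrature sum is always at least as large as the larger of the two input uncertainties, but never larger than their plain sum.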

Precipitation can vary greatly in some regions of the world, for example due to large-scale weather fluctuations like ENSO or other natural variation. So perhaps 30 years is too short an averaging period for precipitation data in locations with high variability. But if the precipitation data are only available for 30 years, are there any alternatives to the standard deviation that would be recommended or considered more useful?

As a side question: would the mean value be more accurate, with a lower relative standard deviation, if one had one million years of measurements, even when each data point is highly variable (large spread)?
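For the side question, a quick simulation (an arbitrary skewed distribution standing in for precipitation) illustrates the standard result: the spread of the individual data points stays the same, but the standard error of the mean shrinks like $1/\sqrt{n}$ as measurements accumulate.

```python
# Standard error of the mean shrinks as the sample grows, even though
# each individual data point remains just as variable.
import numpy as np

rng = np.random.default_rng(2)
ses = []
for n in (30, 3_000, 300_000):
    sample = rng.gamma(shape=0.8, scale=100.0, size=n)  # highly variable data
    ses.append(sample.std(ddof=1) / np.sqrt(n))
# each hundredfold increase in n cuts the standard error roughly tenfold
```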

Bounty: 50

I quite often find myself testing hypotheses in which the standard deviation of one (Normally distributed) variable is linked to (the mean of) another variable. I would like to be able to express the strength of this association by means of an index between [-1, 1], similar in spirit to a correlation coefficient. I feel like I can’t be the first one with this problem, so my first question is: does something like this exist? My second question is whether something I’ve come up with myself seems reasonable.

To express the problem more precisely, let $Z$ be a normally distributed variable:
$$
Z \sim N\left(0, \sigma^2\right)
$$
where the standard deviation $\sigma$ is a linear function of some other variables:
$$
\sigma = X\beta + \varepsilon
$$
where $X=\{x_1, x_2, \ldots, x_p\}$ is a set of predictor variables, and $\beta$ is a vector of linear coefficients on these predictors. So compared to the familiar linear model, the difference is that we now have a linear prediction for the second, rather than the first, moment of the distribution of $Z$.

Given some observations of $Z$ and $X$, we can find the maximum likelihood estimate of $\beta$, which we’ll denote $\hat{\beta}$. Now the question is, how much of the ‘variance in variance’ of $Z$ is explained by this linear model? This leads me to the idea of using kurtosis. That is, because $Z$ is distributed as a mixture of Normals with different SDs and a common mean, it will be leptokurtic and thus have excess kurtosis w.r.t. a Normal distribution with constant variance. However, if we divide each observation of $Z$ by its SD (i.e. $\dot{Z}_i=\frac{Z_i}{\sigma_i}$, where $\sigma_i=X_i\beta$), we should be able to reduce its kurtosis (to the point where, if the changes in variance of $Z$ are perfectly predicted by our fitted model, we should be able to get rid of 100% of the excess kurtosis).

So the index I’m proposing (analogous to $R^2$) is:
$$
\xi^2 = 1 - \frac{\left|\text{Kurt}[Z/\hat{\sigma}]-3\right|}{\text{Kurt}[Z]-3}
$$
where $\hat{\sigma}=X\hat{\beta}$. If our model explains no “variance in variance” at all, then the kurtosis should be just as high after we transform $Z$ as before, in which case $\xi^2=0$. If we managed to explain away all the changes in variance, then $Z/\hat{\sigma}$ should be perfectly Normally distributed (with kurtosis of 3), and thus $\xi^2=1$.
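As a sanity check, the index can be computed on simulated data. For simplicity the sketch below plugs the true $\beta$ in where the MLE $\hat{\beta}$ would go, and uses `scipy.stats.kurtosis`, which returns *excess* kurtosis (Normal = 0) by default, so the $-3$ terms are already absorbed:

```python
# Simulated check of the proposed xi^2: Z is a scale mixture of Normals,
# so it is leptokurtic; rescaling by the (here: true) SD removes the excess.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(1.0, 5.0, n)   # single predictor
sigma = 0.8 * x                # SD is a linear function of x (true beta = 0.8)
z = rng.normal(0.0, sigma)

excess_before = kurtosis(z)          # excess kurtosis, > 0 for the mixture
excess_after = kurtosis(z / sigma)   # near 0 after rescaling by the SD

xi2 = 1 - abs(excess_after) / excess_before  # should land close to 1
```

With a real fitted $\hat{\beta}$, sampling noise in the kurtosis estimate would keep $\xi^2$ slightly below 1 even for a perfect model.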

Does that seem reasonable? Did I just re-invent the wheel (or a dumb version of a wheel)?

Bounty: 50

First of all, sorry for the sloppy terminology, but I am looking for the name of a statistical concept.

I was asked to calculate the “turnover” of the Facebook friends commenting on my posts, so I am looking for an indicator that has a high value if, say, always the same 10 friends are commenting on my posts, and a low value if different friends are commenting each time.

Obviously, the set of friends commenting on a given post forms a subset of all my friends, so I am looking for a kind of “standard deviation” or “variance” of these subsets over all my posts.

What is the proper name of this statistical concept? How do you calculate it?
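For what it’s worth, one way to sketch such an indicator (my own suggestion, not a standard named concept) is the average Jaccard similarity between the commenter sets of consecutive posts: it is high when the same friends keep commenting and low when the commenters keep changing. The names and example sets below are made up.

```python
# Hypothetical commenter sets per post; the metric choice is my own.
def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

posts = [{"anna", "ben"}, {"anna", "ben"}, {"anna", "carl"}]
sims = [jaccard(posts[i], posts[i + 1]) for i in range(len(posts) - 1)]
stability = sum(sims) / len(sims)  # 1.0 = identical commenters every time
```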

Bounty: 50

This is probably a silly question and I hope it will not be closed because of its topic.

I have $n$ sensors, each of them providing a value $y_i$ for a measurement with a corresponding standard deviation $\sigma_i$. I know that the most probable value $Y$ is given by
$$Y=\left(\sum_{i=1}^n \frac{y_i}{\sigma_i^2}\right)\left(\sum_{i=1}^n \frac{1}{\sigma_i^2}\right)^{-1}$$
My problem is that I do not remember how $\sigma_Y$ is computed.
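If I recall the standard inverse-variance weighting result correctly, the uncertainty of the weighted mean is $\sigma_Y = \left(\sum_{i=1}^n 1/\sigma_i^2\right)^{-1/2}$. A minimal sketch with made-up sensor readings:

```python
# Inverse-variance weighted mean and its propagated SD (textbook result).
import numpy as np

y = np.array([10.2, 9.8, 10.5])    # hypothetical sensor readings
sigma = np.array([0.3, 0.5, 0.4])  # their standard deviations

w = 1.0 / sigma**2
Y = np.sum(w * y) / np.sum(w)
sigma_Y = 1.0 / np.sqrt(np.sum(w))  # smaller than every individual sigma
```

Combining sensors this way always tightens the uncertainty: $\sigma_Y$ is below the smallest individual $\sigma_i$.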

Bounty: 50

I want to calculate the average ratio and SD for aow2/fow1. I cannot calculate it for each single replicate because the measurements are not paired and there are different numbers of them.
I could calculate the average and SD for each column (fow1 and aow2), but how can I calculate the average and SD of aow2/fow1? What scripts should I use?
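One common approach for unpaired samples is to take the ratio of the column means and propagate the relative SDs with a first-order delta-method approximation, assuming the two columns are independent; the replicate values below are made up.

```python
# Delta-method sketch for the mean and SD of a ratio of two unpaired,
# independent samples (hypothetical data).
import numpy as np

fow1 = np.array([2.1, 2.4, 1.9, 2.2])        # hypothetical replicates
aow2 = np.array([4.0, 3.6, 4.3, 3.9, 4.1])   # different number, unpaired

m_f, s_f = fow1.mean(), fow1.std(ddof=1)
m_a, s_a = aow2.mean(), aow2.std(ddof=1)

ratio = m_a / m_f
# delta method: relative variances of independent factors add
ratio_sd = abs(ratio) * np.sqrt((s_a / m_a)**2 + (s_f / m_f)**2)
```

The approximation is reasonable when the denominator mean is well away from zero relative to its SD, as in this example.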