ab_def at prontomail.com wrote:
> Suppose that we have n independent identically distributed random
> variables {u[1], ..., u[n]} and P[u[i] == u[j]] == 0 for i != j. We
> form another sequence {xi[1] = Boole[u[1] > u[2]], ..., xi[n - 1] =
> Boole[u[n - 1] > u[n]]} and we're looking for the variance of the sum
> of xi[i]:
>
> D[N[n]] == Variance[Sum[xi[i], {i, n - 1}]] ==
>
> Variance[Sum[xi[i], {i, n - 2}] + xi[n - 1]] ==
>
> Variance[Sum[xi[i], {i, n - 2}]] + Variance[xi[n - 1]] +
>
> 2*Covariance[Sum[xi[i], {i, n - 2}], xi[n - 1]] ==
>
> D[N[n - 1]] + 1/4 + 2*Sum[Covariance[xi[i], xi[n - 1]], {i, n - 2}]
>
> For any pair of adjacent elements we have
>
> Covariance[xi[1], xi[2]] ==
>
> P[xi[1] == 1 && xi[2] == 1] - P[xi[1] == 1]*P[xi[2] == 1] ==
>
> P[u[1] > u[2] > u[3]] - P[u[1] > u[2]]*P[u[2] > u[3]] ==
>
> 1/6 - 1/4 == -1/12
>
> because all permutations of {u[1], ..., u[n]} are equally probable. For
> any non-adjacent elements Covariance[xi[i], xi[j]] == 0, since xi[i] and
> xi[j] then depend on disjoint pairs of the independent u's. Therefore,
>
> D[N[n]] == D[N[n - 1]] + 1/4 + 2*(-1/12) == D[N[n - 1]] + 1/12, D[N[2]] == 1/4
>
> and D[N[n]] == (n + 1)/12 if n >= 2.
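The covariance step above can be checked exactly by brute force. This is a minimal Python sketch, not Maxim's code: the function name `indicator_covariances` is hypothetical, indices are 0-based, and the ranks produced by `itertools.permutations` stand in for the continuous u's (valid because ties have probability 0 and all orderings are equally likely).

```python
from fractions import Fraction
from itertools import permutations

def indicator_covariances(n):
    """Exact covariance matrix of xi[i] = [u[i] > u[i+1]] (0-based i = 0..n-2),
    averaging over all n! equally likely orderings of u[0..n-1]."""
    perms = list(permutations(range(n)))  # ranks stand in for the continuous u's
    total = len(perms)
    m = n - 1
    # One row of 0/1 indicators per ordering
    rows = [[1 if p[i] > p[i + 1] else 0 for i in range(m)] for p in perms]
    mean = [Fraction(sum(r[i] for r in rows), total) for i in range(m)]
    return [[Fraction(sum(r[i] * r[j] for r in rows), total) - mean[i] * mean[j]
             for j in range(m)]
            for i in range(m)]

cov = indicator_covariances(5)
print(cov[0][0], cov[0][1], cov[0][2])  # 1/4 -1/12 0
# Variance of the sum = sum of every entry of the covariance matrix
print(sum(sum(row) for row in cov))     # 1/2, i.e. (5 + 1)/12
```

The diagonal gives the 1/4 per term, the first off-diagonal the -1/12, and everything further out is 0, so summing the matrix reproduces (n + 1)/12.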
>
> Here is a check for n = 6:
>
> In[1]:= n = 6;
>
> Lvalfreq = {First@ #, Length@ #}& /@ Split@ Sort@
> (Count[Sign[Most@ # - Rest@ #], 1]& /@
> Permutations@ Range@ n)
>
> {Lval, Lp} = {Lvalfreq[[All, 1]], Lvalfreq[[All, 2]]/n!};
> mu = Lval.Lp
> sigma = ((Lval - mu)^2).Lp
>
> Out[2]= {{0, 1}, {1, 57}, {2, 302}, {3, 302}, {4, 57}, {5, 1}}
>
> Out[4]= 5/2
>
> Out[5]= 7/12
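For readers without Mathematica, the same exact check can be sketched in Python (a translation under the same equally-likely-orderings assumption; `descent_distribution` is a hypothetical name). It counts, over all 6! orderings, how many adjacent pairs satisfy u[i] > u[i+1], then computes the mean and variance with exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def descent_distribution(n):
    """Frequencies of the number of descents u[i] > u[i+1] over all n! orderings."""
    counts = Counter(
        sum(1 for a, b in zip(p, p[1:]) if a > b)
        for p in permutations(range(n))
    )
    return sorted(counts.items())

n = 6
freq = descent_distribution(n)
print(freq)  # [(0, 1), (1, 57), (2, 302), (3, 302), (4, 57), (5, 1)]
probs = [(k, Fraction(c, factorial(n))) for k, c in freq]
mu = sum(k * p for k, p in probs)
sigma2 = sum((k - mu) ** 2 * p for k, p in probs)  # this is the variance
print(mu, sigma2)  # 5/2 7/12
```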
>
> And a numerical test:
>
> In[6]:= Lcnt = Array[
> Count[Sign[Most@ # - Rest@ #]&@ Array[Random[]&, n], 1]&,
> 10^5];
>
> {Mean@ Lcnt, Variance@ Lcnt} - {mu, sigma} // N
>
> Out[7]= {0.00262, 0.0033856695}
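A rough Python counterpart of the numerical test, offered as a sketch: it uses `random.Random` with an arbitrarily chosen fixed seed so the run is reproducible, and `statistics.variance`, which (like Mathematica's `Variance`) is the sample variance with an n - 1 denominator. The helper name `simulate_descents` is hypothetical.

```python
import random
from statistics import mean, variance

def simulate_descents(n, trials, seed=0):
    """Monte Carlo counts of descents u[i] > u[i+1] in n uniform draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = []
    for _ in range(trials):
        u = [rng.random() for _ in range(n)]
        counts.append(sum(1 for a, b in zip(u, u[1:]) if a > b))
    return counts

n = 6
counts = simulate_descents(n, 10**5)
# Deviations from the exact mean (n - 1)/2 and variance (n + 1)/12
print(mean(counts) - (n - 1) / 2, variance(counts) - (n + 1) / 12)
```

Both printed deviations should be small, of the same order as those in Out[7]; the exact digits depend on the seed.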
>
> Maxim Rytin
> m.r at inbox.ru
>
> Darren Glosemeyer wrote:
>
>> For the variance quoted on the TimeSeries page, I initially thought the
>> same thing you did. Assuming the signs are independent and there are equal
>> probabilities of getting positive and negative signs (and 0 probability of
>> getting a 0 difference), the statistic would follow
>> BinomialDistribution[n-1, 1/2], which would have a variance of
>> (n-1)/4. Simulations give a variance that appears to be (n+1)/12 (which
>> would still indicate a typo in the TimeSeries documentation). I haven't
>> figured out why this should be the variance yet. My best guess is that the
>> assumption of independence is not valid given the differencing and as a
>> result the distribution is something other than BinomialDistribution[n-1, 1/2].
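Darren's guess can be made concrete by comparing the exact distribution of the statistic with BinomialDistribution[n-1, 1/2]. In this Python sketch (function names hypothetical), the exact distribution is obtained by enumerating all orderings; its counts 1, 57, 302, ... are the Eulerian numbers. The two distributions already differ at k = 0, and their variances are (n-1)/4 and (n+1)/12 respectively.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def descent_pmf(n):
    """Exact distribution of the number of descents over all n! orderings."""
    counts = [0] * n  # a series of length n has 0..n-1 descents
    for p in permutations(range(n)):
        counts[sum(a > b for a, b in zip(p, p[1:]))] += 1
    return [Fraction(c, factorial(n)) for c in counts]

def binomial_pmf(m):
    """BinomialDistribution[m, 1/2], the model that assumes independent signs."""
    return [Fraction(comb(m, k), 2 ** m) for k in range(m + 1)]

def pmf_variance(pmf):
    mu = sum(k * p for k, p in enumerate(pmf))
    return sum((k - mu) ** 2 * p for k, p in enumerate(pmf))

n = 6
print(binomial_pmf(n - 1)[0], descent_pmf(n)[0])                        # 1/32 1/720
print(pmf_variance(binomial_pmf(n - 1)), pmf_variance(descent_pmf(n)))  # 5/4 7/12
```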
>>
>>
>> Darren Glosemeyer
>> Wolfram Research
>>
>>
>> At 05:15 AM 3/10/2006 -0500, john.hawkin at gmail.com wrote:
>>
>>> Hello,
>>>
>>> I have two questions.
>>>
>>> 1. Are there any resources of .nb files available on the internet
>>> where I might find an implementation of the D'Agostino Pearson k^2 test
>>> for normal variates?
>>>
>>> 2. In the mathematica time series package (an add-on), the
>>> "difference-sign" test of residuals is mentioned (url:
>>> http://documents.wolfram.com/applications/timeseries/UsersGuidetoTimeSeries/1.6.2.html).
>>> It says that the variance of this test is (n+1) / 2. However, it
>>> would seem to me that a simple calculation gives a variance of (n-1)/4.
>>> It goes as follows:
>>>
>>> If the series is differenced once, then the number of positive and
>>> negative values in the difference should be approximately equal. If Xi
>>> is 1 when the i-th value in the differenced series is positive and 0
>>> otherwise, then
>>> Mean(Xi) = 0.5(1) + 0.5(0) = 0.5
>>> Var(Xi) = Expectation( (Xi - Mean(Xi))^2 )
>>> = Expectation( Xi^2 -Xi + 0.25 )
>>> = 0.5 - 0.5 + 0.25
>>> = 0.25
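The per-term arithmetic can be checked with exact fractions. This small Python sketch assumes, as John does, that Xi is 1 or 0 with probability 1/2 each, and evaluates both the definition of the variance and John's expanded form.

```python
from fractions import Fraction

# Xi is 1 for a positive difference and 0 otherwise, each with probability 1/2
dist = {1: Fraction(1, 2), 0: Fraction(1, 2)}
mean_xi = sum(x * p for x, p in dist.items())
var_direct = sum((x - mean_xi) ** 2 * p for x, p in dist.items())
# John's expansion E[Xi^2 - Xi + 1/4], valid because Mean(Xi) = 1/2
var_expanded = sum((x ** 2 - x + Fraction(1, 4)) * p for x, p in dist.items())
print(mean_xi, var_direct, var_expanded)  # 1/2 1/4 1/4
```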
>>>
>>> And assuming each sign is independent of the others, the total
>>> variance should be the sum of the individual variances over the n-1
>>> differences available from n data points, thus
>>>
>>> Variance = (n-1) / 4
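John's total is correct under his stated independence assumption, which this Python sketch confirms by enumerating all 2**(n-1) equally likely outcomes of independent fair indicators (`independent_sum_variance` is a hypothetical name). The difference-sign indicators do not satisfy that assumption: per Maxim's reply, adjacent ones have covariance -1/12, which is what lowers the variance to (n+1)/12.

```python
from fractions import Fraction
from itertools import product

def independent_sum_variance(m):
    """Exact variance of the sum of m independent fair 0/1 indicators,
    enumerating all 2**m equally likely outcomes."""
    sums = [sum(bits) for bits in product((0, 1), repeat=m)]
    mu = Fraction(sum(sums), len(sums))
    return sum((s - mu) ** 2 for s in sums) / len(sums)

n = 6
print(independent_sum_variance(n - 1))  # 5/4, i.e. (n - 1)/4
```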
>>>
>>> There is an equivalent problem in Lemon's "Stochastic Physics" about
>>> coin flips, for which the answer is listed, without proof, as (n-1)/8.
>>> Because of these three conflicting results, I am wondering if I have
>>> made an error in my calculation; if anyone can find one, please let me
>>> know.
>>>
>>> Thank you very much,
>>>
>>> -John Hawkin
>>>
>
When you define the sign test in this way, adjacent terms are indeed not
independent: consecutive differences share a data point, so a high
residual value is more likely to be followed by a lower one. However,
this is a strange way to do a sign test. Normally you would be interested
in the deviation of the model from the observation, i.e., the residual
itself. In that case it is the sign of the residual that matters, and
under the null hypothesis that your time series model is correct it is
equally likely to be positive or negative. Thus if you assign a value of
one or zero according to the sign of each residual, you have a series of
independent, identically distributed Bernoulli (one-trial binomial)
variables with p = 0.5. The covariance of any two terms is zero because
the variables are independent. The mean and variance of the sum then
follow John's calculation: 1/2 and 1/4 per term, so m residuals give a
Binomial(m, 1/2) count with mean m/2 and variance m/4.
This seems to me a more appropriate way to test the fit of a model.
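A quick Python simulation sketches this residual-sign version. It assumes the residuals of a correct model behave like i.i.d. N(0, 1) draws (any continuous distribution symmetric about 0 would serve); the helper name and the seed are hypothetical choices for illustration.

```python
import random
from statistics import mean, variance

def simulate_residual_sign_counts(m, trials, seed=1):
    """Number of positive values among m i.i.d. N(0, 1) draws, repeated `trials` times.
    The normal draws are an assumed stand-in for residuals of a correct model."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(m) if rng.gauss(0.0, 1.0) > 0)
            for _ in range(trials)]

m = 6
counts = simulate_residual_sign_counts(m, 10**5)
print(mean(counts), variance(counts))
```

The two printed values should land close to m/2 = 3 and m/4 = 1.5, the Binomial(m, 1/2) mean and variance, with no correction for dependence needed.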
LP