1.3 Confidence Interval Around the Mean

A confidence interval reflects the set of statistical hypotheses that won’t be rejected at a given significance level. So the confidence interval around the mean reflects all possible values of the mean that can’t be rejected by the data. It is computed as a multiple of the standard error added to and subtracted from the sample mean.
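As an illustration (a sketch using the normal approximation, not necessarily the text’s exact formula), a 95% interval can be computed with the standard library alone:

```python
import math
import statistics

def mean_confidence_interval(xs, alpha=0.05):
    """Normal-approximation CI: sample mean +/- z * standard error."""
    n = len(xs)
    mean = statistics.fmean(xs)
    se = statistics.stdev(xs) / math.sqrt(n)            # standard error of the mean
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    return mean - z * se, mean + z * se
```

For small samples a t-quantile would be more appropriate than the normal quantile used here.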

2.2 Multinomial Confidence Intervals

If you have more than two categories, a multinomial confidence interval supplies upper and lower confidence limits on all of the category proportions at once. The formula is nearly identical to the preceding one.
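As one hedged sketch of simultaneous limits (the text’s exact formula is not reproduced here), Bonferroni-adjusted normal-approximation intervals cover all k proportions at once; Goodman’s chi-square-based method is a common refinement:

```python
import math
from statistics import NormalDist

def multinomial_intervals(counts, alpha=0.05):
    """Simultaneous CIs on all category proportions.

    Bonferroni adjustment: split alpha across the k categories,
    then apply a normal-approximation interval to each proportion.
    """
    n = sum(counts)
    k = len(counts)
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))  # adjusted quantile
    intervals = []
    for c in counts:
        p = c / n
        half = z * math.sqrt(p * (1 - p) / n)
        intervals.append((max(0.0, p - half), min(1.0, p + half)))
    return intervals
```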

3. Formulas For Reporting Count Data

If the incoming events are independent, their counts are well-described by a Poisson distribution. A Poisson distribution takes a parameter λ, which is the distribution’s mean — that is, the average arrival rate of events per unit time.
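Concretely, the probability of observing exactly $c$ events in one unit of time is

$$\Pr(X = c) = \frac{\lambda^c e^{-\lambda}}{c!}$$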

3.1. Standard Deviation of a Poisson Distribution

The standard deviation of Poisson data usually doesn’t need to be explicitly calculated. Instead it can be inferred from the Poisson parameter:
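For a Poisson distribution the variance equals the mean, so the standard deviation is simply

$$\sigma = \sqrt{\lambda}$$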

3.2. Confidence Interval Around the Poisson Parameter

The confidence interval around the Poisson parameter represents the set of arrival rates that can’t be rejected by the data. It can be inferred from a single data point of c events observed over t time periods with the following formula:
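One standard exact form for this interval (Garwood’s, stated here as a hedged reconstruction rather than necessarily the text’s formula) expresses the limits with chi-square quantiles:

$$\frac{\chi^2_{\alpha/2}(2c)}{2t} \;\le\; \lambda \;\le\; \frac{\chi^2_{1-\alpha/2}(2c+2)}{2t}$$

where $\chi^2_p(d)$ denotes the $p$-quantile of a chi-square distribution with $d$ degrees of freedom, and the lower limit is taken to be 0 when $c = 0$.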

3.3. Conditional Test of Two Poisson Parameters

Please never do this:

From a statistical point of view, 5 events is indistinguishable from 7 events. Before reporting in bright red text that one count is greater than another, it’s best to perform a test of the two Poisson means.
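A sketch of the conditional test (function name is illustrative): conditional on the total $c_1 + c_2$, the first count is binomial under the null hypothesis of equal rates, so an exact two-sided binomial test applies:

```python
from math import comb

def conditional_poisson_test(c1, c2, t1=1.0, t2=1.0):
    """Conditional test of two Poisson rates.

    Under H0 (equal rates), given n = c1 + c2 total events,
    c1 ~ Binomial(n, t1 / (t1 + t2)).  Returns the exact
    two-sided p-value: the total probability of all outcomes
    no more likely than the observed one.
    """
    n = c1 + c2
    p = t1 / (t1 + t2)
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    observed = pmf[c1]
    return sum(q for q in pmf if q <= observed + 1e-12)
```

For the 5-versus-7 example above (equal observation times), the p-value is about 0.77 — nowhere near significance.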

4. Formulas For Comparing Distributions

If you want to test whether groups of observations come from the same (unknown) distribution, or whether a single group of observations comes from a known distribution, you’ll need a Kolmogorov-Smirnov test. A K-S test compares the entire distribution, not just the distribution mean.

4.1. Comparing An Empirical Distribution to a Known Distribution

The simplest version is a one-sample K-S test, which compares a sample of n points with observed cumulative distribution function $F$ against a known distribution with c.d.f. $G$. The test statistic is:

$$D_n = \sup_x \left| F(x) - G(x) \right|$$

In plain English, $D_n$ is the largest absolute difference between the two c.d.f.s over all values of $x$.
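A direct computation of $D_n$ from a sample and a callable c.d.f. (a sketch, not a library routine):

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n.

    Compares the empirical c.d.f. of `sample` against the known
    c.d.f. `cdf`.  Both sides of each step of the empirical c.d.f.
    must be checked, since the supremum can occur at either one.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        g = cdf(x)
        # the empirical c.d.f. jumps from (i-1)/n to i/n at x
        d = max(d, i / n - g, g - (i - 1) / n)
    return d
```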

The critical value of $D_n$ is given by $K_\alpha/\sqrt{n}$, where $K_\alpha$ is the value of $x$ that solves:

$$1 - \alpha = \frac{\sqrt{2\pi}}{x} \sum_{k=1}^{\infty} \exp\!\left(-\frac{(2k-1)^2 \pi^2}{8x^2}\right)$$

The critical value must be solved for iteratively, e.g. by Newton’s method. If only the p-value is needed, it can be computed directly by solving the above for α.
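For instance, the series above can be solved for $K_\alpha$ by simple bisection instead of Newton’s method (a sketch):

```python
import math

def kolmogorov_cdf(x, terms=100):
    """Asymptotic c.d.f. of the K-S statistic (the series in the text)."""
    if x <= 0:
        return 0.0
    s = sum(math.exp(-(2 * k - 1)**2 * math.pi**2 / (8 * x**2))
            for k in range(1, terms + 1))
    return math.sqrt(2 * math.pi) / x * s

def kolmogorov_critical(alpha, lo=0.2, hi=4.0, tol=1e-10):
    """Find K_alpha with kolmogorov_cdf(K_alpha) = 1 - alpha by bisection."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For α = 0.05 this gives $K_\alpha \approx 1.358$; the critical value of $D_n$ is then $K_\alpha/\sqrt{n}$.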

4.2. Comparing Two Empirical Distributions

The two-sample version is similar, except the test statistic is given by:

$$D_{n_1,n_2} = \sup_x \left| F_1(x) - F_2(x) \right|$$

where $F_1$ and $F_2$ are the empirical c.d.f.s of the two samples, having $n_1$ and $n_2$ observations, respectively. The critical value of the test statistic is $K_\alpha\big/\sqrt{n_1 n_2/(n_1+n_2)}$, with the same value of $K_\alpha$ as above.

To compute the critical value, this equation must also be solved iteratively. In the k-sample generalization of this test, the case k=2 reduces to the two-sample Kolmogorov-Smirnov test; the case k=4 can also be reduced to a simpler form, but for other values of k the equation cannot be reduced.
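Putting the two-sample pieces together, a sketch of the statistic (using `bisect` to evaluate each empirical c.d.f.):

```python
import bisect

def ecdf(sorted_xs, x):
    """Empirical c.d.f.: fraction of observations <= x."""
    return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)

def ks_two_sample(a, b):
    """Two-sample K-S statistic D_{n1,n2} = sup_x |F1(x) - F2(x)|.

    The supremum is attained at one of the observed data points,
    so it suffices to check the pooled sample.
    """
    sa, sb = sorted(a), sorted(b)
    return max(abs(ecdf(sa, x) - ecdf(sb, x)) for x in sa + sb)
```

The resulting statistic would then be compared against $K_\alpha\big/\sqrt{n_1 n_2/(n_1+n_2)}$.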