For example, one important parameter is where the centre of the distribution is
located, known as central tendency, i.e. the averages, notably the
arithmetic mean, median and mode amongst others.

We also need to know the dispersion of the data, that is, how far the values are spread out
from the mean (e.g. are they all closely clustered around the mean or are they well
scattered?). The three most usual measures of dispersion are:

range: the distance between the smallest value and the largest value.

variance: the most sophisticated and useful measure, leading to:

standard deviation: which is the square root of the variance.

Uses of standard deviation

The standard deviation is essential for:

assessing the degree of dispersion of the values around their mean,

assessing the error to which the mean of a sample is subject when estimating the
mean of a population from which the sample was taken (see the note at the end).

finding probabilities of events occurring in a given

Various forms of standard deviation

Standard deviation is usually denoted by the Greek symbol sigma (σ).
The calculation of σ depends on the format of the data or variables,
which can be divided into three categories:

Continuous variables, which are numerical values in units of length, mass, time, electrical
measures etc. on a continuous scale

Discrete variables, which are also numerical values but can only take particular numbers, such as
numbers of employees in companies or shoe sizes (7, 7½, 8, 8½ etc.),

Attribute variables, which are descriptive, like defective products, scratches or other damage
on a surface, proportions of people voting or not voting, or activity sampling
(e.g. "operator working" or "not working").

Calculating variance and standard deviation

Standard deviation for continuous variables and discrete variables

The variance is:
(sum of the squared deviations of the values from their mean) divided by (sample size)

In symbolic form this is:

var = Σ(x - mean)² ÷ n, hence standard deviation is:

σ = √[Σ(x - mean)² ÷ n] where n is the sample size

Example: Calculate the standard deviation for the following ten lengths:

Values: 12, 9, 3, 10, 12, 22, 7, 11, 15 and 19 cm.

Mean = 120 ÷ 10 = 12 cm

                 1    2    3    4    5    6    7    8    9   10   sum
Values (cm)     12    9    3   10   12   22    7   11   15   19   120
Deviations       0   -3   -9   -2    0   10   -5   -1    3    7     0
Dev. squared     0    9   81    4    0  100   25    1    9   49   278

Sum of the deviations squared = 278

so the variance = 278 ÷ 10 = 27.8 cm²

and standard deviation = √27.8 = 5.27 cm
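
The worked example can be reproduced in a few lines of Python. This is only a sketch using the ten lengths given above; the variable names are illustrative. It divides by n, as the formula in this article does, and cross-checks the result against the standard library's population functions.

```python
import math
import statistics

# The ten lengths from the worked example, in cm.
values = [12, 9, 3, 10, 12, 22, 7, 11, 15, 19]

n = len(values)
mean = sum(values) / n                    # 120 / 10 = 12.0

# Deviation of each value from the mean, then each deviation squared.
deviations = [x - mean for x in values]
squared = [d ** 2 for d in deviations]

variance = sum(squared) / n               # 278 / 10, dividing by n as in the text
std_dev = math.sqrt(variance)
value_range = max(values) - min(values)   # largest minus smallest value

print(f"range    = {value_range} cm")
print(f"variance = {variance:.1f} cm^2")
print(f"std dev  = {std_dev:.2f} cm")

# statistics.pvariance and statistics.pstdev also divide by n.
# (statistics.variance and statistics.stdev divide by n - 1 instead.)
assert math.isclose(variance, statistics.pvariance(values))
assert math.isclose(std_dev, statistics.pstdev(values))
```

Note that many calculators and spreadsheets divide by n - 1 by default when given sample data, which gives a slightly larger figure than the one calculated here.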

Standard deviation for attributes data:

Binomial: σ = √[p(1-p) x n] where p is the proportion of the sample having the attribute,
n is the sample size and σ is the absolute standard deviation (a number of observations)

also σ = √[p(1-p) ÷ n] where p and n are as above
and σ is the proportional standard deviation

Example: An activity sampling study shows that the number of times the subject
was observed to be working during the day was 36 out of a total of 50 random observations.
Estimate the probable proportion of the day the subject was actually working.

Using the second, proportional, formula:

p = 36 out of 50 = 0.72, (or 72%).

So, σ = √[0.72 x (1-0.72) ÷ 50] = √(0.2016 ÷ 50) = 0.063 or 6.3%

Therefore, our estimate of the proportion of a day the subject was working

= p ± standard deviation = 0.72 ± 0.063 or 72% ± 6.3%

i.e. somewhere between 65.7% and 78.3%.
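
The activity sampling calculation can be sketched in Python as follows. The function names proportion_sd and count_sd are hypothetical, chosen here for illustration; they simply implement the two binomial formulas given above.

```python
import math

def proportion_sd(p: float, n: int) -> float:
    """Proportional standard deviation: sqrt(p(1-p) / n). Hypothetical helper."""
    return math.sqrt(p * (1 - p) / n)

def count_sd(p: float, n: int) -> float:
    """Absolute standard deviation: sqrt(p(1-p) * n). Hypothetical helper."""
    return math.sqrt(p * (1 - p) * n)

# Figures from the example: observed working 36 times out of 50 observations.
working, total = 36, 50
p = working / total                       # 0.72

sd = proportion_sd(p, total)              # about 0.063
low, high = p - sd, p + sd                # one standard deviation either side

print(f"p = {p:.2f}, sd = {sd:.3f}")
print(f"estimate: between {low:.1%} and {high:.1%}")

# The absolute form expresses the same spread as a number of observations.
print(f"absolute sd = {count_sd(p, total):.2f} observations out of {total}")
```

The two forms differ only by a factor of n: dividing the absolute standard deviation by the 50 observations gives the proportional one.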

Note on significance: as we have only taken one standard deviation in the
calculation, this result is only reliable to about 68%. In other words, we are only 68% confident
that the result for a whole day actually is between 65.7% and 78.3%.
To be more accurate we need to take a larger sample; to be more confident
in the result, we need to take more standard deviations.
Statistical tables for the normal distribution tell us that for 95.4% confidence we must take
2 s.d. and for 99.7% confidence we must take 3 s.d.

So in the above calculations the estimates become, respectively:

95.4% confidence: estimated proportion = 0.72 ± (2 x 0.063) or 72% ± 12.6%

99.7% confidence: estimated proportion = 0.72 ± (3 x 0.063) or 72% ± 18.9%

It is clear that the more confident we wish to be that the result is reliable,
the bigger the possible error. (What you gain on the swings you lose on the roundabouts.)
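
The confidence figures for 1, 2 and 3 standard deviations can be computed rather than looked up in tables. The sketch below assumes the estimate follows a normal distribution (an approximation for a sampled proportion) and uses Python's statistics.NormalDist; the helper name coverage is hypothetical.

```python
from statistics import NormalDist

# Proportion and standard deviation from the activity sampling example.
p, sd = 0.72, 0.063

standard_normal = NormalDist()            # mean 0, standard deviation 1

def coverage(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    low, high = p - k * sd, p + k * sd
    print(f"{k} s.d.: {coverage(k):.1%} confidence, "
          f"between {low:.1%} and {high:.1%}")
```

Each extra standard deviation widens the interval by the same amount, while the gain in confidence gets smaller each time.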

Where n is not known

Example. A company calculates that the mean number of orders placed per week is 400,
but obviously it cannot know the number of orders not placed.

This is a case of the Poisson distribution, the standard deviation for which is
simply: σ = √mean. So in this example, σ = √400 = 20 orders.
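
As a short sketch of the same arithmetic in Python, taking the weekly mean of 400 orders from the example (the variable names are illustrative):

```python
import math

mean_orders = 400                         # mean orders per week, from the example

# For a Poisson distribution the variance equals the mean,
# so the standard deviation is the square root of the mean.
sd_orders = math.sqrt(mean_orders)

low, high = mean_orders - sd_orders, mean_orders + sd_orders
print(f"sd = {sd_orders:.0f} orders")
print(f"one s.d. either side of the mean: {low:.0f} to {high:.0f} orders per week")
```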

Extension of σ to other distributions

Each distribution (such as Beta, Gamma, exponential, Weibull among others) has its own
particular standard deviation formula.

The standard deviation of all other types of data, such as continuous and discrete data, can be used
similarly to assess errors on sample means. However, standard deviations must first be converted into
standard errors - but that is another story!