William Gosset was a brewer with an interest in statistics. He found that
estimating probability from the standard deviate z was unreliable when the
observations were few. He derived a correction to the probability estimate
according to sample size and called it t. Gosset published his papers under the
pseudonym Student, and so this became known as Student's t.

Student's t allows the use of a small number of measurements to estimate
what may be true of the whole population. This forms the basis of modern
inferential statistics, where a small number of observations are made, and the
results are generalized to the wider population.

The t distribution curve is wider than the normal one. The area beyond a
particular deviate (the probability of exceeding it) is therefore larger
than under the normal distribution. This difference varies with
sample size (degrees of freedom), such that the probability for t approaches that
for z as the sample size increases towards infinity. Conceptually, this is
represented by the diagram to the left.

With infinite degrees of freedom (in practice, a very large sample), the one tailed t and z have
the same value for a particular probability. With fewer cases, t must be
larger than z to obtain the same probability.

When calculating t, a one tail or two tail model needs to be specified. A one
tailed t is conceptually similar to z, and assumes all the excluded
values are on one side (tail) of the t distribution, as shown in the following diagram.

A two tailed t, however, assumes the excluded area is on both sides (tails) of the t distribution,
so that each side contains only half of the excluded area, as shown in the following
diagram.

In calculations involving the confidence interval, the two tailed t is usually used.

For example:

For a two tail model calculating the 95% confidence interval from a sample of 21 cases, the t value for
p=0.05 (p=0.025 excluded on each of the two tails) and degrees of freedom=20 is 2.09. In other words, the 95% confidence interval
of the mean is mean±2.09SE, where SE is the standard error of the mean (SD/√n).

For a one tail model with p=0.05 (all 5% excluded in one tail), the t value is 1.72. The 95% confidence
interval is either -∞ to mean+1.72SE, or mean-1.72SE to +∞, depending on which tail is used.
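
The t values quoted above, and the earlier point that t approaches z as the degrees of freedom increase, can be checked numerically. The following is a minimal sketch using only the Python standard library. It integrates the t density with Simpson's rule and finds the deviate by bisection. The function names (t_pdf, upper_tail, t_critical) are made up for this illustration, and in practice a statistical table or library would normally be used instead.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t, df, steps=2000):
    """Area above t (t >= 0): 0.5 minus Simpson's rule integral over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(p, df, tails=2):
    """Deviate t whose excluded area is p, all in one tail or split over two."""
    target = p / tails
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection: the tail area shrinks as t grows
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Sample of 21 cases (df=20), as in the examples above.
print("two tail:", round(t_critical(0.05, 20, tails=2), 2))
print("one tail:", round(t_critical(0.05, 20, tails=1), 2))

# One tailed t shrinks towards z as the degrees of freedom grow.
z = NormalDist().inv_cdf(0.95)
for df in (5, 20, 100, 1000):
    print(df, round(t_critical(0.05, df, tails=1), 3), "z =", round(z, 3))
```

With df=20 the sketch should reproduce the 2.09 and 1.72 used above, and the one tailed values in the loop fall steadily towards z.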

I made up some numbers to demonstrate the use of t. These are not real data.

We compare the birth weights of 10 boys and 10 girls to determine whether they differ in weight.
The results are as follows.

         n     mean(Kg)   SD(Kg)
Boys     10    3.8        0.350
Girls    10    3.2        0.345

Diff=0.6, SEDiff=0.16, df=18


We know that, when comparing 2 groups, t = Difference / SEDiff = 0.6/0.16 = 3.75.
The probability of having a t value of 3.75 or more, with 18 degrees of freedom, is less than 0.002
for a two tail test. We can therefore conclude that it is most unlikely that boys and girls
have the same weight (rejecting the null hypothesis), and accept that boys are heavier than girls.

We can also find that the t value for p=0.05, with 18 degrees of freedom, in a two tailed model,
is 2.1. We can calculate the 95% confidence interval of the difference as
95%CI = Diff ± t(SEDiff) = 0.6 ± 2.1(0.16) = 0.6 ± 0.34
= 0.26 to 0.94. We can therefore conclude that boys are 0.26 to 0.94Kg heavier than girls
at birth.
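
The arithmetic of this example can be retraced in a few lines of Python. This is only a sketch under stated assumptions: the SDs are taken as 0.350Kg and 0.345Kg (that is, 350g and 345g), which is the reading that reproduces SEDiff=0.16, and the critical value 2.101 is the two tailed table value for p=0.05 and 18 degrees of freedom that the text rounds to 2.1. The variable names are chosen for this illustration.

```python
import math

# Made-up summary data from the example (not real measurements).
n_boys, n_girls = 10, 10
mean_boys, mean_girls = 3.8, 3.2     # Kg
sd_boys, sd_girls = 0.350, 0.345     # Kg (assumed reading: 350 g and 345 g)

diff = mean_boys - mean_girls
# Standard error of the difference between two independent means.
se_diff = math.sqrt(sd_boys**2 / n_boys + sd_girls**2 / n_girls)
df = n_boys + n_girls - 2

t_stat = diff / se_diff

# Assumption: two tailed critical t for p=0.05, df=18, from a standard table.
t_crit = 2.101
lower = diff - t_crit * se_diff
upper = diff + t_crit * se_diff

print(f"Diff={diff:.2f}  SEDiff={se_diff:.3f}  df={df}")
print(f"t={t_stat:.2f}")
print(f"95%CI = {lower:.2f} to {upper:.2f}")
```

Because the code keeps SEDiff unrounded (about 0.155 rather than 0.16), its t value and interval differ slightly from the hand calculation, which works from the rounded figure. The conclusion is the same.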

Please note: These figures are made up to demonstrate the math, and do not reflect reality. On average, boys are
0.2Kg heavier than girls, and the standard deviation is about 0.5Kg, so it takes a sample size of more than 20 to detect a
significant difference.
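
To see why, the same calculation can be repeated with these more realistic figures. The sketch below assumes the observed difference is exactly 0.2Kg, both groups have an SD of 0.5Kg, and there are 10 babies in each group. The function name is hypothetical, and 2.101 is again the two tailed table value for p=0.05 and 18 degrees of freedom.

```python
import math


def t_for_two_groups(diff, sd, n_per_group):
    """t for two equal-sized groups that share the same SD."""
    se_diff = sd * math.sqrt(2 / n_per_group)
    return diff / se_diff


t_stat = t_for_two_groups(diff=0.2, sd=0.5, n_per_group=10)
t_crit = 2.101  # assumption: table value, two tail, p=0.05, df=18

print(f"t={t_stat:.2f}, significant at p=0.05: {t_stat > t_crit}")
```

Under these assumptions t falls well short of the critical value, so a total sample of 20 would not show a significant difference.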