double step = 1.; // in z
double range = 4; // min and max z = -range to +range.
int precision = 17; // traditional tables are only computed to much lower precision.
// but std::numeric_limits<double>::max_digits10; on new Standard Libraries gives
// 17, the maximum number of digits that can possibly be significant.
// std::numeric_limits<double>::digits10; == 15 is the number of guaranteed digits,
// the other two digits being 'noisy'.

// Construct a standard normal distribution s
normal s; // (default mean = zero, and standard deviation = unity)
cout << "Standard normal distribution, mean = " << s.mean()
  << ", standard deviation = " << s.standard_deviation() << endl;

And all this you can do with a nanoscopic amount of work compared to
the team of human computers toiling with Milton Abramowitz and Irene
Stegun at the US National Bureau of Standards (now NIST). Starting
in 1938, their "Handbook of Mathematical Functions with Formulas,
Graphs and Mathematical Tables" was eventually published in 1964,
and has been reprinted numerous times since. (A major replacement is
planned at the Digital Library of Mathematical Functions).

Pretty-printing a traditional 2-dimensional table is left as an exercise
for the student, but why bother now that the Math Toolkit lets you
write

double z = 2.;
cout << "Area for z = " << z << " is " << cdf(s, z) << endl; // to get the area for z.

Correspondingly, we can obtain the traditional 'critical' values for
significance levels. For the 95% confidence level, the significance
level, usually called alpha, is 0.05 = 1 - 0.95 (for a one-sided test),
so we can write

cout << "95% of area has a z below " << quantile(s, 0.95) << endl;
// 95% of area has a z below 1.64485

and for a two-sided test (a comparison between two levels, rather than
a one-sided test)

cout << "95% of area has a z between " << quantile(s, 0.975)
  << " and " << -quantile(s, 0.975) << endl;
// 95% of area has a z between 1.95996 and -1.95996

First, define a table of significance levels: these are the probabilities
that the true occurrence frequency lies outside the calculated interval.

It is convenient to have an alpha level for the probability that z
lies outside just one standard deviation. This will not be some nice
neat number like 0.05, but we can easily calculate it.

Notice the distinction between one-sided (also called one-tailed) where
we are using a > or < test (and
not both) and considering the area of the tail (integral) from z up
to +∞, and a two-sided test where we are using two > and
< tests, and thus considering two tails, from -∞ up to z low and z
high up to +∞.

So the 2-sided values alpha[i] are calculated using alpha[i]/2.

If we consider a simple example of alpha = 0.05, then for a two-sided
test, the lower tail area from -∞ up to -1.96 is 0.025 (alpha/2),
the upper tail area from +1.96 up to +∞ is also 0.025 (alpha/2), and
the area between -1.96 and +1.96 is 1 - alpha = 0.95. The sum of
the two tails is 0.025 + 0.025 = 0.05 = alpha.

Armed with the cumulative distribution function, we can easily calculate
the easy-to-remember proportions of values that lie within 1, 2 and
3 standard deviations of the mean.

cout.precision(3);
cout << showpoint << "cdf(s, s.standard_deviation()) = "
  << cdf(s, s.standard_deviation()) << endl; // from -infinity to 1 sd
cout << "cdf(complement(s, s.standard_deviation())) = "
  << cdf(complement(s, s.standard_deviation())) << endl;
cout << "Fraction 1 standard deviation within either side of mean is "
  << 1 - cdf(complement(s, s.standard_deviation())) * 2 << endl;
cout << "Fraction 2 standard deviations within either side of mean is "
  << 1 - cdf(complement(s, 2 * s.standard_deviation())) * 2 << endl;
cout << "Fraction 3 standard deviations within either side of mean is "
  << 1 - cdf(complement(s, 3 * s.standard_deviation())) * 2 << endl;

To a useful precision, the 1, 2 and 3 standard deviation percentages
are 68, 95 and 99.7, and these are worth memorising as useful 'rules
of thumb', as, for example, in standard deviation:

Fraction 1 standard deviation within either side of mean is 0.683
Fraction 2 standard deviations within either side of mean is 0.954
Fraction 3 standard deviations within either side of mean is 0.997

We could of course get some really accurate values for these confidence intervals
by using cout.precision(15);

Fraction 1 standard deviation within either side of mean is 0.682689492137086
Fraction 2 standard deviations within either side of mean is 0.954499736103642
Fraction 3 standard deviations within either side of mean is 0.997300203936740

But before you get too excited about this impressive precision, don't
forget that the confidence intervals of the standard
deviation are surprisingly wide, especially if you have
estimated the standard deviation from only a few measurements.

The mean lifespan of 100 W bulbs is 1100 h with a standard deviation
of 100 h. Assuming, perhaps with little evidence and much faith, that
the distribution is normal, we construct a normal distribution called
bulbs with these values:

Then we can use the cumulative distribution function to predict the
fractions (or percentages, if multiplied by 100) that will last various
lifetimes.

cout << "Fraction of bulbs that will last at best (<=) " // P(X <= 1000)
  << expected_life << " is " << cdf(bulbs, expected_life) << endl;
cout << "Fraction of bulbs that will last at least (>) " // P(X > 1000)
  << expected_life << " is " << cdf(complement(bulbs, expected_life)) << endl;
double min_life = 900;
double max_life = 1200;
cout << "Fraction of bulbs that will last between " << min_life
  << " and " << max_life << " is "
  << cdf(bulbs, max_life)    // P(X <= 1200)
   - cdf(bulbs, min_life)    // P(X <= 900)
  << endl;

Note

Real-life failures are often very ab-normal, with a significant number
that are 'dead-on-arrival' or fail very early in their life; the
lifetime of the survivors of this 'early mortality' may be well described
by the normal distribution.

A machine is set to pack 3 kg of ground beef per pack. Over a long
period of time it is found that the average packed weight is 3 kg with a
standard deviation of 0.1 kg. Assuming the packing is normally distributed,
we can find the fraction (or %) of packages that weigh more than 3.1
kg.

double mean = 3.; // kg
double standard_deviation = 0.1; // kg
normal packs(mean, standard_deviation);
double max_weight = 3.1; // kg
cout << "Percentage of packs > " << max_weight << " is "
  << cdf(complement(packs, max_weight)) << endl; // P(X > 3.1)

double under_weight = 2.9;
cout << "fraction of packs >= " << under_weight << " with a mean of " << mean
  << " is " << cdf(complement(packs, under_weight)) << endl;
// fraction of packs >= 2.9 with a mean of 3 is 0.841345
// This is 0.84 - less than the target 0.95
// Want 95% to be over this weight, so what should we set the mean weight to be?
// KK StatCalc says:
double over_mean = 3.0664;
normal xpacks(over_mean, standard_deviation);
cout << "fraction of packs >= " << under_weight << " with a mean of " << xpacks.mean()
  << " is " << cdf(complement(xpacks, under_weight)) << endl;
// fraction of packs >= 2.9 with a mean of 3.0664 is 0.950005

double under_fraction = 0.05; // so 95% are above the minimum weight mean - sd = 2.9
double low_limit = standard_deviation;
double offset = mean - low_limit - quantile(packs, under_fraction);
double nominal_mean = mean + offset;
normal nominal_packs(nominal_mean, standard_deviation);
cout << "Setting the packer to " << nominal_mean << " will mean that "
  << "fraction of packs >= " << under_weight << " is "
  << cdf(complement(nominal_packs, under_weight)) << endl;

Setting the packer to 3.06449 will mean that fraction of packs >=
2.9 is 0.95.

Setting the packer to 3.13263 will mean that fraction of packs >=
2.9 is 0.99, but will more than double the mean loss from 0.0644 to
0.133.

Alternatively, we could invest in a better (more precise) packer with
a lower standard deviation.

To estimate how much better (how much smaller a standard deviation) it
would have to be, we need the 5% quantile to be located at the
under_weight limit, 2.9.

Fraction of packs >= 2.9 with a mean of 3 and standard deviation
of 0.06 is 0.9522

Now we are getting really close, but to do the job properly we could
use a root-finding method, for example the tools provided, and used
elsewhere, in the Math Toolkit; see Root Finding Without Derivatives.

But in this normal distribution case, we could be even smarter and
make a direct calculation.

normal s; // For the standard normal distribution,
double sd = 0.1;
double x = 2.9; // Our required limit.
// then probability p = N((x - mean) / sd)
// So if we want to find the standard deviation that would be required to meet this limit,
// so that the p th quantile is located at x,
// in this case the 0.95 (95%) quantile at 2.9 kg pack weight, when the mean is 3 kg.
double p = 0.95; // The wanted p th quantile.
double prob = pdf(s, (x - mean) / sd);
double qp = quantile(s, p);
cout << "prob = " << prob << ", quantile(p) " << qp << endl;
// prob = 0.241971, quantile(p) 1.64485
// Rearranging, we can directly calculate the required standard deviation:
double sd95 = abs((x - mean)) / qp;
cout << "If we want the " << p << " th quantile to be located at " << x
  << ", would need a standard deviation of " << sd95 << endl;

normal pack95(mean, sd95); // Distribution of the 'ideal better' packer.
cout << "Fraction of packs >= " << under_weight << " with a mean of " << mean
  << " and standard deviation of " << pack95.standard_deviation()
  << " is " << cdf(complement(pack95, under_weight)) << endl;
// Fraction of packs >= 2.9 with a mean of 3 and standard deviation of 0.0608 is 0.95

Notice that these two deceptively simple questions (do we over-fill
or measure better) are actually very common. The weight of beef might
be replaced by a measurement of more or less anything. But the calculations
rely on the accuracy of the standard deviation - something that is
almost always less good than we might wish, especially if based on
a few measurements.

A bolt is usable if its length is between 3.9 and 4.1. From a large
batch of bolts, a sample of 50 shows a mean length of 3.95 with standard
deviation 0.1. Assuming a normal distribution, what proportion is usable?
The true population mean is unknown, but we can use the sample mean and
standard deviation to find approximate solutions.