[EX1] Examples of Results and Their Use

Consider the whole EU/EEA in the year 2040. Open the file under
2040 and copy it into a statistical or spreadsheet program. The
first row shows which columns belong to which group, by sex (M =
males, F = females) and age. In this case 5-year age groups are
available.
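
If the data are in a plain text (CSV) file with one header row, they
can also be read directly into Python; here is a minimal sketch, in
which the file name eea_2040.csv is only a placeholder for the actual
file under 2040:

    import pandas as pd

    # Each of the 40 columns holds 3,000 simulated values for one
    # sex-and-age group; the header row carries the group labels.
    sims = pd.read_csv("eea_2040.csv")   # placeholder file name
    print(sims.shape)                    # expect (3000, 40)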

There are 40 columns, each with 3,000 elements. We will do two
things. First, we will produce a histogram for the predictive
distribution of the total population. Second, we will compute
statistics for the age-dependency ratio.

To get the total population we sum the 40 columns row-wise; call
them C1, C2, ..., C40, and call the resulting vector S, so
S = C1 + C2 + ... + C40. It may be a good idea to divide the numbers
by 1,000,000 so that the results are in millions; that is, we
substitute S := S/1,000,000. The figure below shows what the
histogram of S then looks like.
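
In Python, assuming the data frame sims from the sketch above, these
steps might look as follows:

    import matplotlib.pyplot as plt

    # Row-wise sum over the 40 columns gives the total population,
    # one value per simulation round; rescale to millions.
    S = sims.sum(axis=1) / 1_000_000

    plt.hist(S, bins=50)
    plt.xlabel("Total population (millions)")
    plt.ylabel("Number of simulation rounds")
    plt.show()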

Let us define the age-dependency ratio, call it A, as the ratio of
the population in ages 0-19 and 65+ to the population in ages 20-64.
Let Y be the young, O the old, and W those in the middle. We first
compute Y = (C1 + C2 + C3 + C4) + (C21 + C22 + C23 + C24) to get the
young males and females; W = (C5 + ... + C13) + (C25 + ... + C33) to
get the ones in the middle; and O = (C14 + ... + C20) + (C34 + ... +
C40) to get the old. Here the parentheses are just for clarity. We
can then compute the age-dependency ratio simply as A = (Y + O)/W.
Typical statistics one might want to compute are the mean (0.9092),
the median (0.9054), the standard deviation (0.0639), the first
quartile (0.8716), and the third quartile (0.9424).
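
Continuing the Python sketch, and assuming the columns appear in the
order C1, ..., C40 described above, the same computation could be
written as:

    # 0-based column positions: C1-C4 and C21-C24 are the young,
    # C5-C13 and C25-C33 the middle group, C14-C20 and C34-C40 the old.
    Y = sims.iloc[:, 0:4].sum(axis=1) + sims.iloc[:, 20:24].sum(axis=1)
    W = sims.iloc[:, 4:13].sum(axis=1) + sims.iloc[:, 24:33].sum(axis=1)
    O = sims.iloc[:, 13:20].sum(axis=1) + sims.iloc[:, 33:40].sum(axis=1)

    A = (Y + O) / W   # age-dependency ratio, one value per round

    print(A.mean(), A.median(), A.std())   # mean, median, std dev
    print(A.quantile([0.25, 0.75]))        # first and third quartiles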

Some practical words of caution:

It is typically not meaningful to study a few of the smallest and a
few of the largest simulated values. The models used to derive the
predictive distributions are approximations, and while they perform
well in most cases, there can be rare parameter combinations that
produce population paths we would not consider realistic, even when
accounting for their small probability. The larger the number of
simulation rounds, the greater the chance that such outlying values
are observed. By putting more constraints into the models, such
values could be eliminated, but this would add complexity and has
not been done.

It is typically better to use medians, quartiles, deciles or
percentiles to summarize simulation results than means or standard
deviations, since the latter are sensitive to the outliers mentioned
above. In contrast, even the first percentile is determined by the
location of the 30th smallest observation in a simulation study of
3,000 runs, so it is not influenced by the values taken by the 29
observations below it.
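
This robustness is easy to verify with simulated data; a small
illustration (the normal sample here is arbitrary, chosen only to
have 3,000 values):

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.sort(rng.normal(size=3000))

    y = x.copy()
    y[:29] = -1e6   # replace the 29 smallest values by wild outliers

    print(np.percentile(x, 1), np.percentile(y, 1))  # unchanged
    print(x.mean(), y.mean())                        # mean is ruined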

A tradition in statistics is to compute 95% confidence intervals.
This derives historically from hypothesis testing in, e.g.,
agricultural experimentation, and is intended to guard against too
quick an acceptance of weakly tested methods or findings. In those
applications one can often (by spending more money) get more precise
data if the interval is too wide and more accuracy is needed. In
forecasting, we are dealing with uncertainty that is an order of
magnitude greater, and there seems to be relatively little we can do
about it. We have found it more practicable to present 80% prediction
intervals, or even 67% or 50% prediction intervals, to give the user
of a forecast an idea of how things might deviate from the point
forecast.
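
With the simulated values at hand, such intervals are simply
empirical quantiles; for example, for the total population S
computed earlier:

    # An 80% prediction interval runs from the 10th to the 90th
    # percentile of the simulated values; 50% would use the quartiles.
    lo, hi = S.quantile([0.10, 0.90])
    print(f"80% prediction interval: [{lo:.1f}, {hi:.1f}] million")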