Programmers in all of the above areas have to find ways to generate
random numbers. However, the difficulty to find generators that are
both efficient and have good quality is often underestimated, and so
ad-hoc implementations often fail to meet either or both of these
goals.

The C++ standard library includes std::rand, inherited
from the C standard library, as the only facility to generate
pseudo-random numbers. It is underspecified, because the generation
function is not defined, and indeed early C standard library
implementations provided surprisingly bad generators. Furthermore,
the interface relies on global state, making it difficult or
inefficient to provide for correct operation for simultaneous
invocations in multi-threaded applications.

There is a lot of existing practice in this area. A multitude of
libraries, usually implemented in C or Fortran, is available from the
scientific community. Some implement just one random number
engine, others seek to provide a full framework. I know of no
comprehensive C++ framework for generating random numbers that adheres
to the design principles put forth in section III.

Random number generators are appropriate for this TR because they fall
into one of the domains (numerics) identified in N1314 as a target for
the TR.

Target Audience

There are several different kinds of programmers that are assumed to use
the facilities provided in this proposal.

programmers that provide additional engines

programmers that provide additional distributions

programmers that provide generic add-on packages

programmers that need random numbers

This proposal specifies an infrastructure so that the needs of all
four groups are met. The first two groups benefit from a modular
design so that they can plug in their contributions. Providing add-on
packages benefits from a design that suits to generic programming
needs. Finally, users in need of random numbers benefit from an
interface to the package that is easy to use.

II. Impact On the Standard

What does it depend on, and what depends on it? Is it
a pure extension, or does it require changes to standard components?
Does it require core language changes?

This proposal is a pure library extension. It does not require
changes to any standard classes or functions. It does not require
changes to any of the standard requirement tables. It does not
require any changes in the core language, and it has been implemented
in standard C++ as per ISO 14882:1998.

The ISO C99 extension that specify integral types having a given
minimum or exact bitwidth (e.g. int32_t) aids in
implementing this proposal, however these types (or the equivalent
thereof under another name) can be defined with template
metaprogramming in standard C++, so these are not strictly necessary.

In case the ISO C99 extensions become part of the TR, section IV should
be reviewed whether some requirements could be reformulated with the
ISO C99 extensions.

In case a standard reference-counted smart pointer becomes part of
the TR, section IV should be reviewed and instances of the smart
pointer be added to the acceptable template parameters for a
variate_generator.

III. Design Decisions

Why did you choose the specific design that you did?
What alternatives did you consider, and what are the tradeoffs? What
are the consequences of your choice, for users and implementors? What
decisions are left up to implementors? If there are any similar
libraries in use, how do their design decisions compare to yours?

The design decisions are compared to those in the following libraries:

All of the requirements are revisited in detail in the following
sections.

B. Pseudo-Random vs. Non-Deterministic Random Numbers

This section tries to avoid philosophical discussions about randomness
as much as possible, a certain amount of intuition is assumed.

In this proposal, a pseudo-random number engine is defined as
an initial internal state x(0), a function f that
moves from one internal state to the next x(i+1) := f(x(i)), and an
output function o that produces the output o(x(i)) of the generator.
This is an entirely deterministic process, it is determined by the
initial state x(0) and functions f and o only.
The initial state x(0) is determined from a seed. Apparent randomness
is achieved only because the user has limited perception.

A non-deterministic random-number engine provides a
sequence of random numbers x(i) that cannot be foreseen. Examples are
certain quantum-level physics experiments, measuring the time
difference between radioactive decay of individual atoms or noise of a
Zehner diode. Relatively unforeseeable random sources are also (the
low bits of) timing between key touches, mouse movements, Ethernet
packet arrivals, etc. An estimate for the amount of
unforeseeability is the entropy, a concept from information theory.
Completely foreseeable sequences (e.g., from pseudo-random number
engines) have entropy 0, if all bits are unforeseeable, the entropy is
equal to the number of bits in each number.

Pseudo-random number engines are usually much faster than
non-deterministic random-number engines, because the latter require
I/O to query some randomness device outside of the computer. However,
there is a common interface feature subset of both pseudo-random and
non-deterministic random-number engines. For example, a
non-deterministic random-number engine could be employed to produce
random numbers with normal distribution; I believe this to be an
unlikely scenario in practice.

Other libraries, including those mentioned above, only provide
either pseudo-random numbers, suitable for simulations and games, or
non-deterministic random numbers, suitable for cryptographic
applications.

C. Separation of Engines and Distributions

Random-number generation is usually conceptually separated into
random-number engines that produce uniformly distributed
random numbers between a given minimum and maximum and
random-number distributions that retrieve uniformly
distributed random numbers from some engine and produce numbers
according to some distribution (e.g., Gaussian normal or Bernoulli
distribution).
Returning to the formalism from section A, the former can be identified
with the function f and the latter with the output function o.

This proposal honours this conceptual separation, and provides a class
template to merge an arbitrary engine with an arbitrary distribution
on top. To this end, this proposal sets up requirements for
engines so that each of them can be used to provide uniformly
distributed random numbers for any of the distributions. The
resulting freedom of combination allows for the utmost re-use.

Engines have usually been analyzed with all mathematical and empirical
tools currently available. Nonetheless, those tools show the absence
of a particular weakness only, and are not exhaustive. Albeit
unlikely, a new kind of test (for example, a use of random numbers in
a new kind of simulation or game) could show serious weaknesses in
some engines that were not known before.

This proposal attempts to specify the engines precisely; two different
implementations, with the same seed, should return the same output
sequence. This forces implementations to use the well-researched
engines specified hereinafter, and users can have confidence in their
quality and the limits thereof.

On the other hand, the specifications for the distributions only
define the statistical result, not the precise algorithm to use. This
is different from engines, because for distribution algorithms,
rigorous proofs of their correctness are available, usually under the
precondition that the input random numbers are (truely) uniformly
distributed. For example, there are at least a handful of algorithms
known to produce normally distributed random numbers from uniformly
distributed ones. Which one of these is most efficient depends on at
least the relative execution speeds for various transcendental
functions, cache and branch prediction behaviour of the CPU, and
desired memory use. This proposal therefore leaves the choice of the
algorithm to the implementation. It follows that output sequences for
the distributions will not be identical across implementations. It is
expected that implementations will carefully choose the algorithms for
distributions up front, since it is certainly surprising to customers
if some distribution produces different numbers from one
implementation version to the next.

Other libraries usually provide the same differentiation between
engines and distributions. Libraries rarely have a wrapper around
both engine and distribution, but it turns out that this can hide some
complexities from the authors of distributions, since some facitilies
need to be provided only once. A previous version of this proposal
had distributions directly exposed to the user, and the distribution
type dependent on the engine type. In various discussions, this was
considered as too much coupling.

Since other libraries do not aim to provide a portable specification
framework, engines are sometimes only described qualitatively without
giving the exact parameterization. Also, distributions are given as
specific functions or classes, so the quality-of-implementation
question which distribution algorithm to employ does not need to be
addressed.

D. Templates vs. Virtual Functions

The layering sketched in the previous subsection can be implemented by
either a template mechanism or by using virtual functions in a class
hierarchy. This proposal uses templates. Template parameters are
usually some base type and values denoting fixed parameters for the
functions f and o, e.g. a word size or modulus.

For virtual functions in a class hierarchy, the core language requires
a (nearly) exact type match for a function in a derived classes
overriding a function in a base class. This seems to be unnecessarily
restrictive, because engines can sometimes benefit from using
different integral base types. Also, with
current compiler technology, virtual functions prevent inlining when a
pointer to the base class is used to call a virtual function that is
overridden in some derived class. In particular with applications
such as simulations that sometimes use millions of pseudo-random
numbers per second, losing significant amounts of performance due to
missed inlining opportunities appears to not be acceptable.

The CLHEP library bases all its engines on the abstract base class
HepRandomEngine. Specific engines derive from this class
and override its pure virtual functions. Similarly, all
distributions are based on the base class HepRandom.
Specific distributions derive from this class, override operator(),
and provide a number of specific non-virtual functions.

The GNU Scientific Library, while coded in C, adheres to the
principles of object-structuring; all engines can be used with any of
the distributions. The technical implementation is by mechanisms
similar to virtual functions.

E. Parameterization and Initialization for Engines

Engines usually have a "base" type which is used to store its internal
state. Also, they usually have a choice of parameters. For example,
a linear congruential engine is defined by x(i+1) = (a*x(i)+c) mod m,
so f(x) = (a*x+c) mod m; the base type is "int" and parameters are a,
c, and m. Finding parameters for a given function f that make for
good randomness in the resulting engine's generated numbers x(i)
requires extensive and specialized mathematical training and
experience. In order to make good random numbers available to a large
number of library users, this proposal not only defines generic
random-number engines, but also provides a number of predefined
well-known good parameterizations for those. Usually, there are only
a few (less than five) well-known good parameterizations for each
engine, so it appears feasible to provide these.

Since random-number engines are mathematically designed with computer
implementation in mind, parameters are usually integers representable
in a machine word, which usually coincides nicely with a C++ built-in
type. The parameters could either be given as (compile-time) template
arguments or as (run-time) constructor arguments.

Providing parameters as template arguments allows for providing
predefined parameterizations as simple "typedef"s. Furthermore, the
parameters appear as integral constants, so the compiler can
value-check the given constants against the engine's base type. Also,
the library implementor can choose different implementations depending
on the values of the parameters, without incurring any runtime
overhead. For example, there is an efficient method to compute (a*x)
mod m, provided that a certain magnitude of m relative to the
underlying type is not exceeded. Additionally, the compiler's
optimizer can benefit from the constants and potentially produce
better code, for example by unrolling loops with fixed loop count.

As an alternative, providing parameters as constructor arguments
allows for more flexibility for the library user, for example when
experimenting with several parameterizations. Predefined
parameterizations can be provided by defining wrapper types which
default the constructor parameters.

Other libraries have hard-coded the parameters of their engines and do
not allow the user any configuration of them at all. If the user
wishes to change the parameters, he has to re-implement the engine's
algorithm. In my opinion, this approach unnecessarily restricts
re-use.

Regarding initialization, this proposal chooses to provide
"deterministic seeding" with the default constructor and the
seed function without parameters: Two engines constructed
using the default constructor will output the same sequence. In
contrast, the CLHEP library's default constructed engines will take a
fresh seed from a seed table for each instance. While this approach
may be convenient for a certain group of users, it relies on global
state and can easily be emulated by appropriately wrapping engines
with deterministic seeding.

In addition to the default constructor, all engines provide a
constructor and seed function taking an iterator range
[it1,it2) pointing to unsigned integral values. An engine initializes its state by successively consuming
values from the iterator range, then returning the advanced iterator it1.
This approach has the advantage that the user can completely exploit
the large state of some engines for initialization. Also, it allows
to initialize compound engines in a uniform manner. For example, a
compound engine consisting of two simpler engines would initialize the
first engine with its [it1,it2). The first engine returns a smaller
iterator range that it has not consumed yet. This can be used to
initialize the second engine.

The iterator range [it1,it2) is specified to point to unsigned
long values. There is no way to determine from a generic user
program how the initialization values will be treated and what range
of bits must be provided, except by enumerating all engines, e.g. in
template specializations. The problem is that a given generator might
have differing requirements on the values of the seed range even
within one seed call.

For example, imagine a

xor_combine<lagged_fibonacci<...>, mersenne_twister<...> >

generator. For this, seed(first, last) will consume
values as follows: First, seed the state of the
lagged_fibonacci generator by consuming one item from
[first, last) for each word of state. The values are reduced to
(e.g.) 24 bits to fit the lagged_fibonacci state
requirements. Then, seed the state of the
mersenne_twister by consuming some number of items from
the remaining [first, last). The values are reduced to 32 bits to fit
the mersenne_twister state requirements.

How does a concise programming interface for those increasingly
complex and varying requirements on [first, last) look like? I don't
know, and I don't want to complicate the specification by inventing
something complicated here.

Thus, the specification says for each generator how it uses the seed
values, and how many are consumed. Additional features are left to
the user.

In a way, this is similar to STL containers: It is intended that the user
can exchange iterators to various containers in generic algorithms,
but the container itself is not meant to be exchanged, i.e. having a
Container template parameter is often not adequate. That is analogous
to the random number case: The user can pass an engine around and use its
operator() and min and max
functions generically. However, the user can't generically query the
engine attributes and parameters, simply because most are entirely
different in semantics for each engine.

The seed(first, last) interface can serve two purposes:

In a generic context, the user can pass several integer values >= 1
for seeding. It is unlikely that the user explores the full state space
with the seeds she provides, but she can be reasonably sure that her
seeds aren't entirely incorrect. (There is no formal guarantee for that, except that
the ability to provide bad seeds usually means the parameterization of
the engine is bad, e.g. a non-prime modulus for a linear congruential
engine.) For example, if the user wants a seed(uint32_t)
on top of seed(first, last), one option is to use a
linear_congruential generator that produces the values
required for seed(first, last). When the user defines the
iterator type for first and last so that it
encapsulates the linear_congruential engine in
operator++, the user doesn't even need to know beforehand how
many values seed(first, last) will need.

If the user is in a non-generic context, he knows the specific
template type of the engine (probably not the template value-based
parameterization, though). The precise specification for seed(first,
last) allows to know what values need to be passed in so
that a specific initial state is attained, for example to compare one
implementation of the engine with another one that uses different
seeding.

If the user requires both, he needs to inject knowledge into (1)
so that he is in the position of (2). One way to inject the knowledge
is to use (partial) template specialization to add the knowledge. The
specific parameterization of some engine can then be obtained by
querying the data members of the engines.

I haven't seen the iterator-based approach to engine initialization in
other libraries; most initialization approaches rely on a either a
single value or on per-engine specific approaches to initialization.

An alternative approach is to pass a zero-argument function object
("generator") for seeding. It is trivial to implement a generator
from a given iterator range, but it is more complicated to implement
an iterator range from a generator. Also, the exception object that
is specified to be thrown when the iterator range is exhausted could
be configured in a user-provided iterator to generator mapping.
With this approach, some engines would have three one-argument constructors:
One taking a single integer for seeding, one taking a (reference?) to
a (templated) generator, and the copy constructor. It appears that the
opportunities for ambiguities or choosing the wrong overload are too
confusing to the unsuspecting user.

F. Parameterization and Initialization for Distributions

The distributions specified in this proposal have template parameters
that indicate the output data type (e.g. float,
double, long double) that the user desires.

The probability density functions of distributions usually have
parameters. These are mapped to constructor parameters, to be set at
runtime by the library user according to her requirements. The
parameters for a distribution object cannot change after its
construction. When constructing the distribution, this allows to
pre-compute some data according to the parameters given without risk
of inadvertently invalidating them later.

Distributions may implement operator()(T x), for
arbitrary type T, to meet special needs, for example a
"one-shot" mode where each invocation uses different distribution
parameters.

G. Properties as Traits vs. In-Class Constants

Users might wish to query compile-time properties of the engines and
distributions, e.g. their base types, constant parameters, etc. This
is similar to querying the properties of the built-in types such as
double using std::numeric_limits<>. However,
engines and distributions cannot be simple types, so it does not
appear to be necessary to separate the properties into separate traits
classes. Instead, compile-time properties are given as members types
and static member constants.

H. Which Engines to Include

There is a multitude of pseudo-random number engines available in both
literature and code. Some engines, such as Mersenne Twister, have an
independent algorithm ("base engine"). Others change the values or
order of output of other engines to improve randomness, for example
Knuth's "Algorithm B" ("compound engine"). The template mechanism
allows easy combination of base and compound engines.

Engines may be categorized according to the following dimensions.

integers or floating-point numbers produced (Some engines produce
uniformly distributed integers in the range [min,max], however, most
distribution functions expect uniformly distributed floating-point
numbers in the range [0,1) as the input sequence. The obvious
conversion requires a relatively costly integer to floating-point
conversion plus a floating-point multiplication by
(max-min+1)-1 for each random number used. To save the
multiplication, some engines can directly produce floating-point
numbers in the range [0,1) by maintaining the state x(i) in an
appropriately normalized form, given a sufficiently good
implementation of basic floating-point operations (e.g. IEEE
754).

quality of random numbers produced (What is the cycle length?
Does the engine pass all relevant statistical tests? Up to what
dimension are numbers equidistributed?)

speed of generation (How many and what kind of operations have to
be performed to produce one random number, on average?)

size of state (How may machine words of storage are required to
hold the state x(i) of the random engine?)

option for independent subsequences (Is it possible to move from
x(i) to x(i+k) with at most O(log(k)) steps? This allows to
efficiently use subsequences x(0)...x(k-1), x(k)...x(2k-1), ...,
x(jk)...x((j+1)k-1), ..., for example for parallel computation, where
each of the m processors gets assigned the (independent) subsequence
starting at x(jk) (0 <= k &lt m).)

According to the criteria above, the engines given below were chosen.
The quality and size indications were completed according to best
known parameterizations. Other parameterizations usually yield poorer
quality and/or less size.

engine

int / float

quality

speed

size of state

subsequences

comments

linear_congruential

int

medium

medium

1 word

yes

cycle length is limited to the maximum value representable in one
machine word, passes most statisticial tests with chosen
parameters.

mersenne_twister

int

good

fast

624 words

no

long cycles, passes all statistical tests, good
equidistribution in high dimensions

subtract_with_carry

both

medium

fast

25 words

no

very long cycles possible, fails some statistical tests. Can be
improved with the discard_block compound engine.

discard_block

both

good

slow

base engine + 1 word

no

compound engine that removes correlation provably by throwing away
significant chunks of the base engine's sequence, the resulting speed
is reduced to 10% to 3% of the base engine's.

xor_combine

int

good

fast

base engines

yes, if one of the base engines

compound engine that XOR-combines the sequences of
two base engines

Some engines were considered for inclusion, but left out for the
following reasons:

I. Which Distributions to Include

The following distributions were chosen due to their relatively
widespread use:

Integer uniform

Floating-point uniform

Exponential

Normal

Gamma

Poisson

Binomial

Geometric

Bernoulli

The GNU Scientific Library has a multitude of additional distributions
that are not mentioned in the table below.

Distribution

this proposal

CLHEP

crng

GNU Scientific Library

Swarm

Numerical Recipes

Knuth

uniform (int)

uniform_int

-

-

-

UniformIntegerDist

-

-

uniform (float)

uniform_real

RandFlat

UniformDeviate

flat

UniformDoubleDist

-

uniform

exponential

exponential_distribution

RandExponential

ExponentialDeviate

exponential

ExponentialDist

exponential

exponential

normal

normal_distribution

RandGauss*

NormalDeviate

gaussian

NormalDist

normal (gaussian)

normal

lognormal

-

-

-

lognormal

LogNormalDist

-

-

gamma

gamma_distribution

RandGamma

GammaDeviate

gamma

GammaDist

gamma

gamma

beta

-

-

BetaDeviate

beta

-

-

beta

poisson

poisson_distribution

Poisson

PoissonDeviate

poisson

PoissonDist

poisson

poisson

binomial

binomial_distribution

RandBinomial

BinomialDeviate

binomial

-

binomial

binomial

geometric

geometric_distribution

-

GeometricDeviate

geometric

-

-

geometric

bernoulli

bernoulli_distribution

-

BernoulliDeviate

bernoulli

BernoulliDist

-

-

random bit

-

RandBit

-

-

RandomBitDist

-

-

breit-wigner

-

RandBreitWigner

-

-

-

-

-

chi-square

-

RandChiSquare

-

chisq

-

-

chi-square

landau

-

Landau

-

landau

-

-

-

F

-

-

-

F

-

-

F (variance-ratio)

t

-

-

-

t

-

-

t

J. Taxonomy of Concepts

All of the engines support the number generator requirements,
i.e. they are zero-argument function objects which return numbers.
All of the distributions are one-argument function objects, taking a
reference to an engine and returning numbers. All of the engines and
some of the distributions return uniformly distributed random numbers.
This is reflected in the concept of the uniform random number
generator, which refines number generator. Engines for pseudo-random
numbers model the requirements for pseudo-random number engine, which
refines uniform random number generator.

K. Validation

How can a user have confidence that the implementation of a
random-number engine is exactly as specified, correctly taking into
account any platform pecularities (e.g., odd-sized ints)? After all,
minor typos in the implementation might not be apparent; the numbers
produced may look "random". This proposal therefore specifies for
each engine the 10000th number in the random number sequence that a
default-constructed engine object produces.

This is considered an important feature for library implementors and
serious users to check whether the provided library on the given
platform returns the correct numbers. It could be argued that a
library implementor should provide a correct implementation of some
standard feature in any case.

No other library I have encountered provides explicit validation
values in either their specification or their implementation, although
some of them claim to be widely portable.

Another design option for validation that was part of early drafts of
this proposal is moving the reference number (10000th value in the
sequence) from specification space to implementation space, thus
providing a validation(x) static member function for each
engine that compares the hard-coded 10000th value of the sequence with
some user-provided value x presumeably obtained by
actually invoking the random-number engine object 10000 times. Due to
the template-based design, this amounted to a "val" template value
parameter for each engine, and the validation(x) function
reduced to the trivial comparison "val == x". Handling validation for
floating-point engines required more machinery, because template value
parameters cannot be of floating-point type. Also, from a conceptual
perspective, it seemed odd to demand a validation decision from the
very entitiy which one wanted to validate.

L. Non-Volatile Storage of Engine and Distribution State

Pseudo-random number engines and distributions may store their state on a
std::ostream in textual form and recover it from an
appropriate std::istream. Each engine specifies how its
internal state is represented. The specific algorithm of a
distribution is left implementation-defined, thus no specifics about
the representation of its internal state are given. A store operation
must not affect the number sequence produced. It is expected
that such external storage happens rarely as opposed to producing
random numbers, thus no particular attention to performance is paid.

Engines and distributions use the usual idioms of operator<< and
operator>>. If the user needs additional
processing before or after storage on non-volatile media, there is
always the option to use a temporary std::stringstream.

Distributions sometimes store values from their associated source of
random numbers across calls to their operator(). For example, a
common method for generating normally distributed random numbers is to
retrieve two uniformly distributed random numbers and compute two
normally distributed random numbers out of them.
In order to reset the distribution's random number cache to a defined
state, each distribution has a reset member function. It
should be called on a distribution whenever its associated engine is
exchanged or restored.

M. Values vs. References

Compounded engines such as shuffle_output and
discard_block contain a base engine by value, because
compounding is not intended to be used by reference to an existing
(re-used) engine object.

The wrapper variate_generator can store engines either by
value or by reference, explicitly chosen by the template parameters.
This allows to use references to a single engine in several
variate_generator, but makes it explicit to the user that
he is responsible for the management of the lifetime of the engine.

N. Providing the Probability Density Function in Distributions

Some libraries provide the probability density function of a given
distribution as part of that distribution's interface. While this may
be useful occasionally, this proposal does not provide for such a
feature. One reason is separation of concerns: The distribution class
templates might benefit from precomputing large tables of values
depending on the distribution parameters, while the computation of the
probability density function does not. Also, the function
representation is often straightforward, so the user can easily code
it himself.

O. Implementation-defined behaviour

This proposal specifies implementation-defined behaviour in a number
of places. I believe this is unavoidable; this section provides
detailed reasoning, including why the implementation is required to
document the choice.

The precise state-holding base data types for the various engines are
left implementation-defined, because engines are usually optimized for
binary integers with 32 bits of word size. The specification in this
proposal cannot foresee whether a 32 bit quantity on the machine is
available in C++ as short, int, long, or not at all. It is up to the
implementation to decide which data type fits best. The
implementation is required to document the choice of data type, so
that users can (non-portably) rely on the precise type, for example
for further computation. Should the ISO C99 extensions become part of
ISO C++, the implementation-defined types could be replaced by
e.g. int_least32_t.

The method how to produce non-deterministic random numbers is
considered implementation-defined, because it inherently depends on
the implementation and possibly even on the runtime environment:
Imagine a platform that has operating system support for randomness
collection, e.g. from user keystrokes and Ethernet inter-packet
arrival timing (Linux /dev/random does this). If, in some
installation, access to the operating system functions providing these
services has been restricted, the C++ non-deterministic random number
engine has been deprived of its randomness. An implementation is
required to document how it obtains the non-deterministic random
numbers, because only then can users' confidence in them grow.
Confidence is of particular concern in the area of cryptography.

The algorithms how to produce the various distributions are specified
as implementation-defined, because there is a vast variety of
algorithms known for each distribution. Each has a different
trade-off in terms of speed, adaptation to recent computer
architectures, and memory use. The implementation is required to
document its choice so that the user can judge whether it is
acceptable quality-wise.

P. Lower and upper bounds on UniformRandomNumberGenerator

The member functions min() and max() return
the lower and upper bounds of a UniformRandomNumberGenerator. This
could be a random-number engine or one of the uniform_int
and uniform_real distributions.

Those bounds are not specified to be tight, because for some engines,
the bounds depend on the seeds. The seed can be changed during the
lifetime of the engine object, while the values returned by
min() and max() are invariant. Therefore,
min() and max() must return conservative
bounds that are independent of the seed.

Q. With or without manager class

This proposal presents a design with a manager class template,
variate_generator, after extensive discussion with some
members of the computing division of Fermi National Accelerator
Laboratory. User-written and library-provided engines and
distributions plug in to the manager class. The approach is remotely
similar to the locale design in the standard library, where
(user-written) facets plug in to the (library-provided) locale class.

Earlier versions of this propsoal made (potentially user-written)
distributions directly visible to (some other) user that wants to get
random numbers distributed accordingly ("client"), there was no
additional management layer between the distribution and the client.

The following additional features could be provided by the management
layer:

The management layer contains an adaptor (to convert the engine's
output into the distribution's input) in addition to the engine and
the distribution.

Adaptors and distributions do not store state, but instead, in
each invocation, consume an arbitrary number of input values and
produce a fixed number of output values. The management layer is
responsible for connecting the engine - adaptor - distribution chain,
invoking each part when more numbers are needed from the next part of
the chain.

On request, the management layer is responsible for saving and
restoring the buffers that exist between engine, adaptor, and
distribution.

On request, the management layer shares engines with another
instance of the management layer.

It is envisioned that user-written distributions will often be based
on some arbitrary algorithmic distribution, instead of trying to
implement a given mathematical probability density function. Here is
an example:

Retrieve a uniform integer with value either 1 or 2.

If 1, return a number with normal distribution.

If 2, return a number with gamma distribution.

Both in this case and when implementing complex distributions based on
a probability density function (e.g. the gamma distribution), it is
important to be able to arbitrarily nest distributions. Either design
allows for this with utmost ease: Compounding
distributions are contained in the compound by value, and each one
produces a single value on invocation. With the alternative design of
giving distributions the freedom to produce
more than one output number in each invocation, compound distributions
such as the one shown above need to handle the situation that each of
the compounding members could provide several output values, the
number of which is unknown at the time the distribution is written.
(Remember that it is unfeasible to prescribe a precise algorithm for
each library-provided distribution in the standard, see subsection O.)
That approach shifts implementation effort from the place where it
came up, i.e. the distribution that chose to use an algorithm that
produces several values in one invocation, to the places where that
distribution is used. This, considered by itself, does not seem to be
a good approach. Also, only very few distributions lead to a natural
implementation that produces several values in one invocation; so far,
the normal distribution is the only one known to me. However, it is
expected that there will be plenty of distributions that use a normal
distribution as its base, so far those known to me are lognormal and
uniform_on_sphere (both not part of this proposal). As a conclusion,
independent of whether the design provides for a management layer or
not, distributions should always return a single value on each
invocation, and management of buffers for additional values that might
be produced should be internal to the distribution. Should it become
necessary for the user to employ buffer management more often, a
user-written base class for the distributions could be of help.

The ability to share engines is important. This proposal makes
lifetime management issues explicit by requiring pointer or reference
types in the template instantiation of variate_generator
if reference semantics are
desired. Without a management class, providing this features is
much more cumbersome and imposes additional burden on the programmers
that produce distributions. Alternatively, reference semantics could
always be used, but this is an
error-prone approach due to the lack of a standard reference-counted
smart pointer. I believe it is impossible to add a reference-counted
sharing mechanism in a manager-class-free design without giving its precise
type. And that would certainly conflict in semantic scope with a
smart pointer that will get into the standard eventually.

The management layer allows for a few common features to be factored
out, in particular access to the engine and some member typedefs.

Comparison of other differing features between manager and non-manager
designs:

When passing a variate_generator as a an argument to
a function, the design in this proposal allows selecting only those
function signatures during overload resolution that are intended to be
called with a variate_generator. In contrast,
misbehaviour is possible without a manager class, similar to
iterators in function signatures: template<class BiIter>
void f(BiIter it) matches f(5), without regard to
the bidirectional iterator requirements. An error then happens when
instantiating f. The situation worsens when several competing
function templates are available and the wrong one is chosen
accidentally.

With the engine passed into the distribution's constructor, the
full type hierarchy of engine (and possibly adaptor) are available to
the distribution, allowing to cache information derived from the
engine (e.g. its value range) . Also, (partial) specialization
of distributions could be written that optimize the interaction with a
specific engine and/or adaptor. In this proposal's design,
this information is available in the variate_generator template
only. However, optimizations by specialization for the
engine/adaptor combination are perceived to possibly have high
benefit, while those for the (engine+adaptor) / distribution
combination are presumed to be much less beneficial.

Having distribution classes directly exposed to the client easily
allows that the user writes a distribution with an additional
arbitrary member function declaration, intended to produce random
numbers while taking additional parameters into account. In this
proposal's design, this is possible by using the
generic operator()(T x) forwarding function.

R. Add-on packages

This proposal specifies a framework for random number generation.
Users might have additional requirements not met by this framework.
The following extensions have been identified, and they are expressly not
addressed in this proposal. It is perceived that these items can be
seamlessly implemented in an add-on package which sits on top of an
implementation according to this proposal.

unique seeding: Make it easy for the user to provide a unique seed
for each instance of a pseudo-random number engine. Design idea:

class unique_seed;
template<class Engine>
Engine seed(unique_seed&);

The "seed" function retrieves some unique seed from the unique_seed
class and then uses the seed(first, last) interface of an
engine to implant that unique seed. Specific seeding requirements for
some engine can be met by (partial) template specialization.

runtime-replaceable distributions and associated save/restore
functionality: Provide a class hierarchy that invokes distributions by
means of a virtual function, thereby allowing for runtime replacement
of the actual distribution. Provide a factory function to reconstruct
the distribution instance after saving it to some non-volatile media.

S. Adaptors

Sometimes, users may want to have better control how the bits from the
engine are used to fill the mantissa of the floating-point value that
serves as input to some distribution. This is possible by writing an
engine wrapper and passing that in to the variate_generator as the
engine. The variate_generator will only apply automatic adaptations
if the output type of the engine is integers and the input type for
the distribution is floating-point or vice versa.

Z. Open issues

Some engines require non-negative template arguments, usually bit
counts. Should these be given as "int" or "unsigned int"? Using
"unsigned int" sometimes adds significant clutter to the
presentation. Or "size_t", but this is probably too large a type?

IV. Proposed Text

(Insert the following as a new section in clause 26 "Numerics".
Adjust the overview at the beginning of clause 26 accordingly.)

This subclause defines a facility for generating random numbers.

Random number requirements

A number generator is a function object (std:20.3
[lib.function.objects]).

In the following table, X denotes a number generator
class returning objects of type T, and u is
a (possibly const) value of X.

Number generator requirements (in addition
to function object)

expression

return type

pre/post-condition

complexity

X::result_type

T

std::numeric_limits<T>::is_specialized is
true

compile-time

In the following table, X denotes a uniform random number
generator class returning objects of type T,
u is a value of X and v is a
(possibly const) value of X.

Uniform random number generator
requirements (in addition to number generator)

expression

return type

pre/post-condition

complexity

u()

T

-

amortized constant

v.min()

T

Returns some l where l is less than or equal to all values
potentially returned by operator().
The return value of this function shall not change during the lifetime
of v.

constant

v.max()

T

If std::numeric_limits<T>::is_integer, returns
l where l is less than or equal to all values potentially returned by
operator(), otherwise, returns l where l is strictly less than all
values potentially returned by operator(). In any case,
the return value of this function shall not change during the lifetime
of v.

constant

In the following table, X denotes a pseudo-random number
engine class returning objects of type T, t
is a value of T, u is a value of
X, v is an lvalue of X,
it1 is an lvalue and it2 is a (possibly
const) value of an input iterator type It
having an unsigned integral value type, x, y
are (possibly const) values of
X, os is convertible to an lvalue of type
std::ostream, and is is convertible to an
lvalue of type std::istream.

A pseudo-random number engine x has a state x(i) at any given time.
The specification of each pseudo-random number engines defines the size of its
state in multiples of the size of its result_type, given
as an integral constant expression.

creates an engine with the same initial state as all other
default-constructed engines of type X in the
program.

O(size of state)

X(it1, it2)

-

creates an engine with the initial state given by the range
[it1,it2). it1 is advanced by the size of
state. If the size of the range [it1,it2) is insufficient, leaves
it1 == it2 and throws invalid_argument.

O(size of state)

u.seed()

void

post: u == X()

O(size of state)

u.seed(it1, it2)

void

post: If there are sufficiently many values in [it1, it2) to
initialize the state of u, then u ==
X(it1,it2). Otherwise, it1 == it2, throws
invalid_argument, and further use of u
(except destruction) is undefined until a seed member
function has been executed without throwing an exception.

O(size of state)

u()

T

given the state u(i) of the engine, computes u(i+1), sets the
state to u(i+1), and returns some output dependent on u(i+1)

amortized constant

x == y

bool

== is an equivalence relation. The current state x(i)
of x is equal to the current state y(j) of y.

O(size of state)

x != y

bool

!(x == y)

O(size of state)

os << x

std::ostream&

writes the textual representation of the state x(i) of
x to os, with
os.fmtflags set to
ios_base::dec|ios_base::fixed|ios_base::left and the fill
character set to the space character. In the output, adjacent numbers
are separated by one or more space characters.
post: The os.fmtflags and fill character are
unchanged.

O(size of state)

is >> v

std::istream&

sets the state v(i) of v as determined by reading its
textual representation from is.
post: The is.fmtflags are unchanged.

O(size of state)

In the following table, X denotes a random distribution
class returning objects of type T, u is a
value of X, x is a (possibly const)
value of X, and e is an lvalue of an
arbitrary type that meets the requirements of a uniform random number
generator, returning values of type U.

subsequent uses of u do not depend on values
produced by e prior to invoking reset.

constant

u(e)

T

the sequence of numbers returned by successive invocations with
the same object e is randomly distributed with some
probability density function p(x)

amortized constant number of invocations of e

os << x

std::ostream&

writes a textual representation for the parameters and additional
internal data of the distribution x to os.
post: The os.fmtflags and fill character are
unchanged.

O(size of state)

is >> u

std::istream&

restores the parameters and additional internal data of the
distribution u.
pre: is provides a textual representation that was
previously written by operator<<
post: The is.fmtflags are unchanged.

O(size of state)

Additional requirements: The sequence of numbers produced by
repeated invocations of x(e) does not change whether or
not os << x is invoked between any of the
invocations x(e). If a textual representation
is written using os << x and that representation
is restored into the same or a different object y of the
same type using is >> y, repeated invocations of
y(e) produce the same sequence of random numbers as would
repeated invocations of x(e).

In the following subclauses, a template parameter named
UniformRandomNumberGenerator shall denote a class that
satisfies all the requirements of a uniform random number generator.
Moreover, a template parameter named Distribution shall
denote a type that satisfies all the requirements of a random
distribution.
Furthermore, a template parameter named RealType shall
denote a type that holds an approximation to a real number. This type
shall meet the requirements for a numeric type (26.1
[lib.numeric.requirements]), the binary operators +, -, *, / shall be
applicable to it, a conversion from double shall exist,
and function signatures corresponding to those
for type double in subclause 26.5 [lib.c.math] shall be
available by argument-dependent lookup (3.4.2 [basic.lookup.koenig]).
[Note: The built-in floating-point types float
and double meet these requirements.]

The template argument for the parameter Engine shall be
of the form U, U&, or
U*, where U denotes a
class that satisfies all the requirements of a uniform random number
generator. The member engine_value_type shall name
U.

Specializations of variate_generator satisfy the
requirements of CopyConstructible. They also satisfy the requirements
of Assignable unless the template parameter Engine is of
the form U&.

The complexity of all functions specified in this section is constant.
No function described in this section except the constructor throws an
exception.

variate_generator(engine_type eng, distribution_type d)

Effects: Constructs a variate_generator
object with the associated uniform random number generator
eng and the associated random distribution
d.
Throws: If and what the copy constructor of Engine or
Distribution throws.

result_type operator()()

Returns:distribution()(e)Notes: The sequence of numbers produced by the
uniform random number generator e, se, is
obtained from the sequence of numbers produced by the associated
uniform random number generator eng, seng, as
follows: Consider the values of
numeric_limits<T>::is_integer for
T both Distribution::input_type and
engine_value_type::result_type. If the values for both
types are true, then se is identical to
seng. Otherwise, if the values for both types are
false, then the numbers in seng are divided by
engine().max()-engine().min() to obtain the
numbers in se. Otherwise, if the value for
engine_value_type::result_type is true and
the value for Distribution::input_type is
false, then the numbers in seng are divided by
engine().max()-engine().min()+1 to obtain the
numbers in se. Otherwise, the mapping from seng
to se is implementation-defined. In all cases, an implicit
conversion from engine_value_type::result_type to
Distribution::input_type is performed. If such a
conversion does not exist, the program is ill-formed.

template<class T> result_type operator()(T value)

Returns:distribution()(e, value). For
the semantics of e, see the description of
operator()().

engine_value_type& engine()

Returns: A reference to the associated uniform random
number generator.

const engine_value_type& engine() const

Returns: A reference to the associated uniform random
number generator.

Random number engine class templates

Except where specified otherwise, the complexity of all functions specified in
the following sections is constant. No function described in this
section except the constructor and seed functions taking an iterator
range [it1,it2) throws an exception.

The class templates specified in this section satisfy all the
requirements of a pseudo-random number engine (given in tables in
section x.x), except where specified otherwise. Descriptions are
provided here only for operations on the engines that are not
described in one of these tables or for operations where there is
additional semantic information.

All members declared static const in any of the following
class templates shall be defined in such a way that they are usable as
integral constant expressions.

The template parameter IntType shall denote an integral
type large enough to store values up to (m-1). If the template
parameter m is 0, the behaviour is
implementation-defined. Otherwise, the template parameters
a and c shall be less than m.

Class template mersenne_twister

A mersenne_twister engine produces random numbers
o(x(i)) using the following computation, performed modulo
2w. um is a value with only the upper
w-r bits set in its binary representation.
lm is a value with only its lower r bits set
in its binary representation. rshift is a bitwise right
shift with zero-valued bits appearing in the high bits of the result.
lshift is a bitwise left shift with zero-valued bits
appearing in the low bits of the result.

The template parameter IntType shall denote a signed
integral type large enough to store values up to m-1. The following
relation shall hold: 0<s<r. Let w the number of bits in the
binary representation of m.

The template parameter UniformRandomNumberGenerator shall
denote a class that satisfies all the requirements of a uniform random
number generator, given in tables in section x.x. r <= p. The size
of the state is the size of b plus 1.

The template parameters UniformRandomNumberGenerator1 and
UniformRandomNumberGenerator1 shall denote classes that
satisfy all the requirements of a uniform random number generator,
given in tables in section x.x . The size of the state is
the size of b1 plus the size of
b2.

xor_combine()

Effects: Constructs a xor_combine
engine. To construct each of the subobjects b1 and
b2, invokes their respective default constructors.

xor_combine(const base1_type & rng1, const base2_type & rng2)

Effects: Constructs a xor_combine
engine. Initializes b1 with a copy of rng1 and
b2 with a copy of rng2.

Class random_device

A random_device produces non-deterministic random
numbers. It satisfies all the requirements of a uniform random number
generator (given in tables in section x.x). Descriptions are provided
here only for operations on the engines that are not described in one
of these tables or for operations where there is additional semantic
information.

If implementation limitations prevent generating non-deterministic
random numbers, the implementation can employ a pseudo-random number
engine.

Effects: Constructs a random_device
non-deterministic random number engine. The semantics and default
value of the token parameter are implementation-defined.
[Footnote: The parameter is intended to allow an implementation to
differentiate between different sources of randomness.]
Throws: A value of some type derived from
exception if the random_device could not be
initialized.

result_type min() const

Returns:numeric_limits<result_type>::min()

result_type max() const

Returns:numeric_limits<result_type>::max()

double entropy() const

Returns: An entropy estimate for the random numbers
returned by operator(), in the range min() to
log2( max()+1). A deterministic random
number generator (e.g. a pseudo-random number engine) has entropy 0.
Throws: Nothing.

result_type operator()()

Returns: A non-deterministic random value, uniformly
distributed between min() and max(),
inclusive. It is implementation-defined how these values are
generated.
Throws: A value of some type derived from
exception if a random number could not be obtained.

Random distribution class templates

The class templates specified in this section satisfy all the
requirements of a random distribution (given in tables in section
x.x). Descriptions are provided here only for operations on the
distributions that are not described in one of these tables or for
operations where there is additional semantic information.

A template parameter named IntType shall denote a type
that represents an integer number. This type shall meet the
requirements for a numeric type (26.1 [lib.numeric.requirements]), the
binary operators +, -, *, /, % shall be applicable to it, and a
conversion from int shall exist. [Footnote: The
built-in types int and long meet these
requirements.]

Given an object whose type is specified in this subclause, if the
lifetime of the uniform random number generator referred to in the
constructor invocation for that object has ended, any use of that
object is undefined.

No function described in this section throws an exception, unless an
operation on values of IntType or RealType
throws an exception. [Note: Then, the effects are undefined,
see [lib.numeric.requirements]. ]

The algorithms for producing each of the specified distributions are
implementation-defined.

Class template uniform_int

A uniform_int random distribution produces integer random
numbers x in the range min <= x <= max, with equal probability.
min and max are the parameters of the distribution.

A uniform_int random distribution satisfies all the
requirements of a uniform random number generator (given in tables in
section x.x).

Returns: A uniform random number x in the range 0
<= x < n. [Note: This allows a
variate_generator object with a uniform_int
distribution to be used with std::random_shuffe, see
[lib.alg.random.shuffle]. ]