The class type binomial_distribution
represents a binomial
distribution: it is used when there are exactly two mutually
exclusive outcomes of a trial. These outcomes are labelled "success"
and "failure". The binomial distribution
is used to obtain the probability of observing k successes in N trials,
with the probability of success on a single trial denoted by p. The binomial
distribution assumes that p is fixed for all trials.

Note

The random variable for the binomial distribution is the number of
successes (the number of trials is a fixed property of the distribution),
whereas for the negative binomial the random variable is the number
of trials, for a fixed number of successes.

The PDF for the binomial distribution is given by:

$$ f(k; N, p) = \binom{N}{k}\, p^{k} (1-p)^{N-k} $$

The following two graphs illustrate how the PDF changes depending upon
the distribution's parameters. First we'll keep the success fraction
p fixed at 0.5, and vary the sample size:

Alternatively, we can keep the sample size fixed at N=20 and vary the
success fraction p:
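
The same trends can be reproduced numerically. The following is a minimal sketch in standard C++ only, not the library's implementation: it evaluates the PDF in log space with std::lgamma and reports where the curve peaks. The function names binomial_pdf, binomial_pdf_table and peak_location are hypothetical helpers introduced for illustration, and the sketch assumes 0 < p < 1.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Binomial PDF: C(n, k) * p^k * (1-p)^(n-k), evaluated in log space
// with std::lgamma so that large n does not overflow the factorials.
// Assumption of this sketch: 0 < p < 1 and 0 <= k <= n.
double binomial_pdf(int n, int k, double p)
{
    assert(n >= 0 && k >= 0 && k <= n);
    assert(p > 0.0 && p < 1.0);
    const double log_coeff = std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                           - std::lgamma(n - k + 1.0);
    return std::exp(log_coeff + k * std::log(p) + (n - k) * std::log1p(-p));
}

// All PDF values for k = 0..n: the data that a PDF plot would show.
std::vector<double> binomial_pdf_table(int n, double p)
{
    std::vector<double> table(n + 1);
    for (int k = 0; k <= n; ++k)
        table[k] = binomial_pdf(n, k, p);
    return table;
}

// Index of the largest PDF value, i.e. where the plotted curve peaks.
int peak_location(const std::vector<double>& table)
{
    int best = 0;
    for (int k = 1; k < static_cast<int>(table.size()); ++k)
        if (table[k] > table[best]) best = k;
    return best;
}
```

With the sample size fixed at N=20, peak_location(binomial_pdf_table(20, p)) moves from k=2 at p=0.1 to k=10 at p=0.5 and k=18 at p=0.9.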

Caution

The Binomial distribution is a discrete distribution: internally,
functions like the cdf and pdf are treated "as if" they are continuous
functions, but in reality the results returned from these functions only
have meaning if an integer value is provided for the random variate
argument.

The quantile function will by default return an integer result that
has been rounded outwards. That is to say lower
quantiles (where the probability is less than 0.5) are rounded downward,
and upper quantiles (where the probability is greater than 0.5) are
rounded upwards. This behaviour ensures that if an X% quantile is
requested, then at least the requested coverage
will be present in the central region, and no more than
the requested coverage will be present in the tails.
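
To make the outward rounding concrete, here is a small sketch in standard C++ only. It is one reading of the behaviour described above, not the library's code: a lower quantile is taken as the largest k whose cumulative probability does not exceed the requested probability, and an upper quantile as the smallest k whose cumulative probability reaches it. All function names are hypothetical, and the sketch assumes 0 < p < 1.

```cpp
#include <cassert>
#include <vector>

// Cumulative probabilities cdf[k] = P(X <= k) for k = 0..n, built from the
// recurrence pdf(k+1) = pdf(k) * (n-k)/(k+1) * p/(1-p).
// Assumption of this sketch: 0 < p < 1.
std::vector<double> binomial_cdf_table(int n, double p)
{
    assert(n >= 0 && p > 0.0 && p < 1.0);
    std::vector<double> cdf(n + 1);
    double term = 1.0;
    for (int i = 0; i < n; ++i) term *= (1.0 - p);   // pdf(0) = (1-p)^n
    double sum = 0.0;
    for (int k = 0; k <= n; ++k) {
        sum += term;
        cdf[k] = sum;
        if (k < n) term = term * (n - k) / (k + 1) * (p / (1.0 - p));
    }
    return cdf;
}

// Lower quantile, rounded downward: the largest k with P(X <= k) <= prob
// (0 if even k = 0 already exceeds prob).
int lower_quantile_rounded_down(int n, double p, double prob)
{
    const std::vector<double> cdf = binomial_cdf_table(n, p);
    int result = 0;
    for (int k = 0; k <= n; ++k)
        if (cdf[k] <= prob) result = k;
    return result;
}

// Upper quantile, rounded upward: the smallest k with P(X <= k) >= prob.
int upper_quantile_rounded_up(int n, double p, double prob)
{
    const std::vector<double> cdf = binomial_cdf_table(n, p);
    for (int k = 0; k <= n; ++k)
        if (cdf[k] >= prob) return k;
    return n;
}
```

For 4 trials with p = 0.5 the cumulative probabilities are 0.0625, 0.3125, 0.6875, 0.9375 and 1, so the 25% quantile rounds down to 0 and the 75% quantile rounds up to 3. The central region [0, 3] then holds 93.75% of the probability, comfortably more than the 50% requested.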

This behaviour can be changed so that the quantile functions are
rounded differently, or even return a real-valued result using Policies. It is
strongly recommended that you read the tutorial Understanding
Quantiles of Discrete Distributions before using the quantile
function on the Binomial distribution. The reference
docs describe how to change the rounding policy for these
distributions.

alpha

The largest acceptable probability that the true value of the success
fraction is less than the value returned.

method

An optional parameter that specifies the method to be used to compute
the interval (See below).

For example, if you observe k successes from n
trials the best estimate for the success fraction is simply k/n,
but if you want to be 95% sure that the true value is greater
than some value, pmin, then:
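
As an illustration of what such a lower bound means, the sketch below solves the one-sided Clopper-Pearson condition directly in standard C++: it looks for the success fraction at which observing k or more successes has probability exactly alpha (alpha = 0.05 for 95% confidence). This is an independent sketch using bisection, not the library's implementation, and upper_tail and lower_bound_on_p_sketch are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// P(X >= k) for X ~ Binomial(n, p), summed term by term in log space.
// Assumption of this sketch: 0 < p < 1 and 0 <= k <= n.
double upper_tail(int n, int k, double p)
{
    double sum = 0.0;
    for (int i = k; i <= n; ++i) {
        const double log_term = std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                              - std::lgamma(n - i + 1.0)
                              + i * std::log(p) + (n - i) * std::log1p(-p);
        sum += std::exp(log_term);
    }
    return sum;
}

// Hypothetical helper: one-sided Clopper-Pearson lower bound on the success
// fraction, i.e. the p at which P(X >= k) equals alpha.
double lower_bound_on_p_sketch(int n, int k, double alpha)
{
    assert(n > 0 && k >= 0 && k <= n);
    assert(alpha > 0.0 && alpha < 1.0);
    if (k == 0) return 0.0;   // with no successes the lower bound is zero
    double lo = 0.0, hi = 1.0;
    for (int iter = 0; iter < 200; ++iter) {
        const double mid = 0.5 * (lo + hi);
        if (mid <= lo || mid >= hi) break;   // interval can no longer shrink
        // upper_tail increases with p, so step toward the crossing point.
        if (upper_tail(n, k, mid) < alpha) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

A quick sanity check: when every trial succeeds (k = n) the condition reduces to p^n = alpha, so lower_bound_on_p_sketch(10, 10, 0.05) should agree with std::pow(0.05, 0.1). In every case the bound lies below the point estimate k/n.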

There are currently two possible values available for the method
optional parameter: clopper_pearson_exact_interval
or jeffreys_prior_interval. These constants are
both members of class template binomial_distribution,
so usage is for example:

The default method if this parameter is not specified is the Clopper
Pearson "exact" interval. This produces an interval that guarantees
at least 100(1-alpha)% coverage, but which is known to be
overly conservative, sometimes producing intervals with much greater
than the requested coverage.

The alternative calculation method produces a non-informative Jeffreys
Prior interval. It produces 100(1-alpha)% coverage only in the average
case, though it is typically very close to the requested coverage level.
It is one of the main methods of calculation recommended in the review
by Brown, Cai and DasGupta.

Please note that the "textbook" calculation method using a
normal approximation (the Wald interval) is deliberately not provided:
it is known to produce consistently poor results, even when the sample
size is surprisingly large. Refer to Brown, Cai and DasGupta for a full
explanation. Many other methods of calculation are available, and may
be more appropriate for specific situations. Unfortunately there appears
to be no consensus amongst statisticians as to which is "best":
refer to the discussion at the end of Brown, Cai and DasGupta for examples.

The two methods provided here were chosen principally because they can
be used for both one and two sided intervals. See also:

The greatest number of successes
that may be observed from n trials with success fraction p, at
probability P. Note that the value returned is a real number, and
not an integer. Depending on the use case you may want to take
either the floor or ceiling of the result. For example:

The smallest number of successes
that may be observed from n trials with success fraction p, at
probability P. Note that the value returned is a real number, and
not an integer. Depending on the use case you may want to take
either the floor or ceiling of the result. For example:

In the following table p is the probability that
one trial will be successful (the success fraction), n
is the number of trials, k is the number of successes,
and q = 1-p.

Function

Implementation Notes

pdf

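Implementation is in terms of ibeta_derivative:
if nCk is the binomial coefficient of n and k, then we have:

$$ f(k; n, p) = {}^{n}C_{k}\, p^{k} (1-p)^{n-k} = \frac{\Gamma(n+1)}{\Gamma(k+1)\,\Gamma(n-k+1)}\, p^{k} (1-p)^{n-k} = \frac{1}{n+1} \cdot \frac{p^{k} (1-p)^{n-k}}{B(k+1,\, n-k+1)} $$

Which can be evaluated as ibeta_derivative(k+1,n-k+1,p)/(n+1)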

The function ibeta_derivative
is used here, since it has already been optimised for the lowest
possible error - indeed this is really just a thin wrapper around
part of the internals of the incomplete beta function.

quantile

Since the cdf is non-linear in variate k,
none of the inverse incomplete beta functions can be used here.
Instead the quantile is found numerically using a derivative-free
method (TOMS Algorithm 748).

quantile from the complement

Found numerically as above.

mean

p*n

variance

p*n*(1-p)

mode

floor(p*(n+1))

skewness

(1-2*p)/sqrt(n*p*(1-p))

kurtosis

3-(6/n)+(1/(n*p*(1-p)))

kurtosis excess

(1-6*p*q)/(n*p*q)

parameter estimation

The member functions find_upper_bound_on_p, find_lower_bound_on_p
and find_number_of_trials
are implemented in terms of the inverse incomplete beta functions
ibetac_inv,
ibeta_inv,
and ibetac_invb
respectively.
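
The closed forms in the table can be cross-checked numerically. The sketch below, in standard C++ only, computes the listed formulas, compares them with moments obtained by direct summation over the PDF, and also evaluates the PDF through the beta-density route described in the pdf row. It assumes that ibeta_derivative(a, b, x) equals x^(a-1) (1-x)^(b-1) / B(a, b); every name in the code is a hypothetical helper, and none of it is the library's implementation.

```cpp
#include <cassert>
#include <cmath>

// Closed forms from the table; q = 1 - p.
struct BinomialMoments {
    double mean, variance, mode, skewness, kurtosis, kurtosis_excess;
};

BinomialMoments closed_form_moments(int n, double p)
{
    assert(n > 0 && p > 0.0 && p < 1.0);
    const double q = 1.0 - p;
    BinomialMoments m;
    m.mean = p * n;
    m.variance = p * n * q;
    m.mode = std::floor(p * (n + 1));
    m.skewness = (1.0 - 2.0 * p) / std::sqrt(n * p * q);
    m.kurtosis = 3.0 - 6.0 / n + 1.0 / (n * p * q);
    m.kurtosis_excess = (1.0 - 6.0 * p * q) / (n * p * q);
    return m;
}

// Binomial PDF C(n, k) p^k (1-p)^(n-k), evaluated in log space.
double binomial_pdf(int n, int k, double p)
{
    const double log_coeff = std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                           - std::lgamma(n - k + 1.0);
    return std::exp(log_coeff + k * std::log(p) + (n - k) * std::log1p(-p));
}

// Central moment E[(X - mean)^r] by direct summation over k = 0..n.
double central_moment(int n, double p, int r)
{
    const double mean = p * n;
    double sum = 0.0;
    for (int k = 0; k <= n; ++k)
        sum += std::pow(k - mean, r) * binomial_pdf(n, k, p);
    return sum;
}

// The same PDF via the beta density with a = k+1, b = n-k+1, divided by n+1.
// Assumption: this density is what ibeta_derivative(a, b, p) evaluates.
double pdf_via_beta_density(int n, int k, double p)
{
    const double a = k + 1.0, b = n - k + 1.0;
    const double log_beta = std::lgamma(a) + std::lgamma(b) - std::lgamma(a + b);
    const double density = std::exp((a - 1.0) * std::log(p)
                                  + (b - 1.0) * std::log1p(-p) - log_beta);
    return density / (n + 1.0);
}
```

For example, with n = 4 and p = 0.5 the closed forms give mean 2, variance 1, mode 2, skewness 0, kurtosis 2.5 and kurtosis excess -0.5. Dividing central_moment(n, p, 4) by the squared variance should reproduce the kurtosis for any valid n and p.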