The probability density function (PDF) of a continuous random variable gives the relative likelihood of any outcome in a continuum occurring. Unlike the case of discrete random variables, for a continuous random variable any single outcome has probability zero of occurring. Instead, probabilities are assigned to sets of values: integrating the probability density function over a set gives the probability that the variable takes a value in that set. The magnitude of the PDF therefore encodes the likelihood of finding a continuous random variable near a certain point.

Heuristically, the probability density function is just the distribution from which a continuous random variable is drawn; the normal distribution, for instance, is the PDF of a normally distributed continuous random variable.

Definition of the Probability Density Function

The probability that a random variable \(X\) takes a value in the interval \([a,b]\) (open or closed; the endpoints carry zero probability, so their inclusion does not matter) is given by the integral of a function called the probability density function \(f_X(x)\):

\[P(a\leq X \leq b) = \int_a^b f_X(x) \,dx.\]

If the random variable can be any real number, the probability density function is normalized so that:

\[\int_{-\infty}^{\infty} f_X(x) \,dx = 1.\]

This is because the probability that \(X\) takes some value between \(-\infty\) and \(\infty\) is one: \(X\) does take a value!
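
Both the interval probability and the normalization condition can be checked numerically. The following is a minimal Python sketch, using the standard normal density as an example PDF; the choice of distribution and the use of scipy.integrate.quad are illustrative assumptions, not part of the definition:

```python
import numpy as np
from scipy.integrate import quad

def f(x):
    """Standard normal PDF, an example density for illustration."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# P(a <= X <= b) is the integral of the PDF from a to b.
a, b = -1.0, 1.0
prob, _ = quad(f, a, b)
print(prob)  # ~0.6827: about 68% of the mass lies within one standard deviation

# Normalization: the PDF integrates to one over the whole real line.
total, _ = quad(f, -np.inf, np.inf)
print(total)  # ~1.0
```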

These formulas may make more sense in comparison to the discrete case, where the function giving the probabilities of events occurring is called the probability mass function \(p(x)\). In the discrete case, the probability of outcome \(x\) occurring is just \(p(x)\) itself. The probability \(P(a\leq X \leq b)\) is given in the discrete case by:

\[P(a\leq X \leq b) = \sum_{a\leq x \leq b} p(x),\]

and the probability mass function is normalized to one so that:

\[\sum_x p(x) = 1,\]

where the sum is taken over all possible values of \(x\). One can see that the analogous formulas for continuous random variables are identical, with the sums promoted to integrals.
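
For comparison, here is a minimal sketch of the discrete case in Python, using a fair six-sided die as an (assumed) example distribution:

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# P(a <= X <= b) is a sum of the PMF over the values in [a, b].
a, b = 2, 4
prob = sum(p for x, p in pmf.items() if a <= x <= b)
print(prob)  # 1/2

# Normalization: the PMF sums to one over all outcomes.
print(sum(pmf.values()))  # 1
```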

Mean and Variance of Continuous Random Variables

Recall that in the discrete case, the mean or expected value \(E(X)\) of a discrete random variable is the weighted average of the possible values \(x\) of the random variable:

\[E(X) = \sum_x x p(x).\]

This formula makes intuitive sense. Suppose there are \(n\) equally likely outcomes, each with probability \(\frac{1}{n}\). Then the expected value is just the arithmetic mean, \(E(X) = \frac{x_1 + x_2 + \ldots + x_n}{n}\). In cases where some outcomes are more likely than others, those outcomes should contribute more to the expected value.
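
A short Python sketch of this weighting (the biased die below is a made-up example, used only to show unequal weights at work):

```python
from fractions import Fraction

def mean(pmf):
    """Expected value: the sum of x * p(x) over all outcomes."""
    return sum(x * p for x, p in pmf.items())

fair = {x: Fraction(1, 6) for x in range(1, 7)}
print(mean(fair))  # 7/2 = 3.5, the arithmetic mean of 1..6

# A die biased toward 6 shifts the expected value upward.
biased = {1: Fraction(1, 12), 2: Fraction(1, 12), 3: Fraction(1, 12),
          4: Fraction(1, 12), 5: Fraction(1, 12), 6: Fraction(7, 12)}
print(mean(biased))  # 19/4 = 4.75
```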

In the continuous case, the generalization is again found just by replacing the sum with the integral and \(p(x)\) with the PDF:

\[E(X) = \int_{-\infty}^{\infty} x f(x) \,dx,\]

assuming the possible values of \(X\) span the entire real line. If \(X\) is instead constrained to \([0,\infty)\) or some other continuous interval, the integral limits should be changed accordingly.
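
For instance, here is a numerical sketch of this integral for an exponential density \(f(x) = \lambda e^{-\lambda x}\) on \([0,\infty)\); the rate \(\lambda = 2\) is an arbitrary example choice, and (as derived at the end of this section) the mean should come out to \(1/\lambda\):

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0  # rate parameter of the exponential density (example choice)

def f(x):
    """Exponential PDF, supported on [0, infinity)."""
    return lam * np.exp(-lam * x)

# E(X) = integral of x * f(x) dx; the lower limit is 0, not -infinity,
# because the PDF vanishes for negative x.
mean, _ = quad(lambda x: x * f(x), 0, np.inf)
print(mean)  # ~0.5, matching 1/lambda
```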

The variance is defined identically to the discrete case:

\[\text{Var} (X) = E(X^2) - E(X)^2.\]

Computing \(E(X^2)\) only requires inserting an \(x^2\) instead of an \(x\) in the formula above:

\[E(X^2) = \int_{-\infty}^{\infty} x^2 f(x) \,dx.\]
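
Extending the numerical sketch above to the variance of the same exponential example (again with the assumed rate \(\lambda = 2\), so the expected answer is \(1/\lambda^2 = 0.25\)):

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0  # rate parameter (example choice)

def f(x):
    """Exponential PDF, supported on [0, infinity)."""
    return lam * np.exp(-lam * x)

ex, _ = quad(lambda x: x * f(x), 0, np.inf)      # E(X)
ex2, _ = quad(lambda x: x**2 * f(x), 0, np.inf)  # E(X^2)
print(ex2 - ex**2)  # ~0.25, i.e. 1/lambda^2
```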

The mean and the variance of a continuous random variable need not be finite, or even exist. A Cauchy-distributed continuous random variable is an example of a continuous random variable for which both the mean and the variance are undefined.
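
This can be seen empirically. In the following NumPy sketch (with an arbitrary seed), sample means of standard Cauchy draws fail to settle down as the sample size grows, in contrast to distributions with a finite mean:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# Sample means of a Cauchy distribution do not converge as n grows,
# because the distribution has no mean for them to converge to.
for n in (10**3, 10**5, 10**7):
    samples = rng.standard_cauchy(n)
    print(n, samples.mean())  # the printed means jump around erratically
```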

Note that the exponential random variable, with PDF \(f(x) = \lambda e^{-\lambda x}\), is defined for \(x\) in the range \([0,\infty)\) (if it is not obvious why, note that the PDF is only normalized on this range). The expected values that define the mean and the variance can be computed using integration by parts:
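
\[E(X) = \int_0^{\infty} x \lambda e^{-\lambda x} \,dx = \Big[-x e^{-\lambda x}\Big]_0^{\infty} + \int_0^{\infty} e^{-\lambda x} \,dx = 0 + \frac{1}{\lambda} = \frac{1}{\lambda},\]

\[E(X^2) = \int_0^{\infty} x^2 \lambda e^{-\lambda x} \,dx = \Big[-x^2 e^{-\lambda x}\Big]_0^{\infty} + 2\int_0^{\infty} x e^{-\lambda x} \,dx = \frac{2}{\lambda} \int_0^{\infty} x \lambda e^{-\lambda x} \,dx = \frac{2}{\lambda^2}.\]

The variance is therefore

\[\text{Var}(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.\]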