A probability mass function differs from a probability density function (pdf) in that the latter is associated with continuous rather than discrete random variables. The values of a pdf are not probabilities as such: a pdf must be integrated over an interval to yield a probability.[2]
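
The distinction can be illustrated with a minimal Python sketch. The fair die, the uniform distribution on [0, 0.5], and the helper names `uniform_pdf` and `integrate` are assumptions chosen for illustration only. A value of the mass function is a probability, whereas a density value may exceed 1 and only becomes a probability after integration.

```python
from fractions import Fraction

# pmf of a fair six-sided die (illustrative): each value is itself a probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def uniform_pdf(x, lo=0.0, hi=0.5):
    """Density of a uniform distribution on [lo, hi] (hypothetical example)."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# The density at a point is 2.0, which cannot be a probability.
density_at_point = uniform_pdf(0.25)

# Integrating the density over [0, 0.25] gives a probability close to 0.5.
prob_interval = integrate(uniform_pdf, 0.0, 0.25)
```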

Thinking of probability as mass helps to avoid mistakes, since physical mass is conserved, as is the total probability over all hypothetical outcomes $x$:

$$\sum_{x\in A} f_X(x) = 1$$
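
As a rough check of this normalization condition, the sketch below stores a mass function as a Python dictionary and verifies that its values are non-negative and sum to 1. The example distribution (number of heads in two tosses of a fair coin) and the helper name `is_valid_pmf` are hypothetical.

```python
from fractions import Fraction

# Hypothetical pmf: number of heads in two tosses of a fair coin.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def is_valid_pmf(f):
    """Return True if all masses are non-negative and the total mass is 1."""
    return all(p >= 0 for p in f.values()) and sum(f.values()) == 1

# Exact rational arithmetic avoids floating-point rounding in the sum.
total_mass = sum(pmf.values())
```

Using `Fraction` rather than floats is a design choice made here so that the equality test against 1 is exact.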

When there is a natural order among the hypotheses $x$, it may be convenient to assign numerical values to them (or $n$-tuples in the case of a discrete multivariate random variable) and to consider also values not in the image of $X$. That is, $f_X$ may be defined for all real numbers, with $f_X(x) = 0$ for all $x \notin X(S)$, as shown in the figure.
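
One way to sketch this extension in Python is to store the masses only on the image of the random variable and return zero everywhere else. The die distribution and the names `masses_on_image` and `f_X` are assumptions made for this example.

```python
from fractions import Fraction

# Masses on the image of X only (hypothetical fair six-sided die).
masses_on_image = {face: Fraction(1, 6) for face in range(1, 7)}

def f_X(x):
    """pmf extended to all real numbers: zero outside the image of X."""
    return masses_on_image.get(x, Fraction(0))
```

For instance, `f_X(3)` is 1/6, while `f_X(2.5)` and `f_X(-1)` are both 0.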

Since the image of $X$ is countable, the probability mass function $f_X(x)$ is zero for all but a countable number of values of $x$. The discontinuity of probability mass functions is related to the fact that the cumulative distribution function of a discrete random variable is also discontinuous. Where it is differentiable, the derivative is zero, just as the probability mass function is zero at all such points.[citation needed]
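
The relationship between the mass function and the cumulative distribution function can be sketched as follows, again assuming a fair die for illustration. The function `cdf` is a hypothetical helper that sums the masses at or below a point; it is flat between the points of the image and jumps at each of them by exactly the mass at that point.

```python
from fractions import Fraction

# Hypothetical pmf of a fair six-sided die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """Cumulative distribution function: total mass at values <= x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

# The cdf is constant between consecutive points of the image of X ...
flat_between = cdf(3.2) == cdf(3.9)

# ... and jumps at a point of the image by the mass located there.
jump_at_3 = cdf(3) - cdf(2.999)
```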

Suppose that $(A, \mathcal{A}, P)$ is a probability space and that $(B, \mathcal{B})$ is a measurable space whose underlying σ-algebra is discrete, so in particular contains the singleton sets of $B$. In this setting, a random variable $X\colon A \to B$ is discrete provided its image is countable. The pushforward measure $X_{*}(P)$, called the distribution of $X$ in this context, is a probability measure on $B$ whose restriction to singleton sets induces a probability mass function $f_X\colon B \to \mathbb{R}$, since $f_X(b) = P(X^{-1}(b)) = [X_{*}(P)](\{b\})$ for each $b \in B$.
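
For a finite probability space, the construction can be sketched in Python by summing the probability of every outcome in each preimage. The two-toss coin space, the random variable `X` counting heads, and the helper `pushforward_pmf` are all hypothetical choices for illustration, not part of the formal definition.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space: two tosses of a fair coin.
P = {
    "HH": Fraction(1, 4),
    "HT": Fraction(1, 4),
    "TH": Fraction(1, 4),
    "TT": Fraction(1, 4),
}

def X(outcome):
    """Random variable: number of heads in the outcome."""
    return outcome.count("H")

def pushforward_pmf(prob, rv):
    """Compute f(b) = P(rv^{-1}(b)) by accumulating mass over each preimage."""
    f = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at zero
    for outcome, p in prob.items():
        f[rv(outcome)] += p
    return dict(f)

f_X = pushforward_pmf(P, X)
```

Here the outcomes "HT" and "TH" both lie in the preimage of 1, so their probabilities add up to a mass of 1/2 at that point.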

Now suppose that $(B, \mathcal{B}, \mu)$ is a measure space equipped with the counting measure $\mu$. The probability density function $f$ of $X$ with respect to the counting measure, if it exists, is the Radon-Nikodym derivative of the pushforward measure of $X$ (with respect to the counting measure), so $f = dX_{*}P/d\mu$ and $f$ is a function from $B$ to the non-negative reals. As a consequence, for any $b \in B$ we have

$$P(X = b) = P(X^{-1}(b)) = [X_{*}(P)](\{b\}) = \int_{\{b\}} f \, d\mu = f(b),$$

so $f$ is in fact a probability mass function.

Suppose that S is the sample space of all outcomes of a single toss of a fair coin, and X is the random variable defined on S assigning 0 to "tails" and 1 to "heads". Since the coin is fair, the probability mass function is