Every now and then, one of my software engineering colleagues will ask me to explain what a probability density function (PDF) is. PDFs occur in many parts of machine learning. A density function is just a graph where a.) the total area under the graph is 1.0, b.) X represents an outcome that varies probabilistically, and c.) the probability that X falls between two values is given by the area under the graph between those values.

In the figure I made a dummy PDF. Notice the graph is a triangle, and so total area = 1/2 * base * height = 1/2 * 4 * 0.5 = 1.0.

Imagine X is the score on some weird test and can only be between 3.0 and 7.0. Suppose you want the probability that a random person scores between 3.0 and 4.0 on the test.

For this dummy PDF, you could use geometry, but most PDFs have complex shapes. The bell-shaped curve, aka Gaussian distribution, is an example. So, in most cases you compute area under a graph by using the calculus integral. The dummy PDF equation is f(X) = 0.125X – 0.375. If you integrate that you get I(X) = 0.0625X^2 – 0.375X (you’ve probably forgotten your calculus, but how to do the calculation isn’t important here).
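If you want to play along in code, here is a minimal Python sketch of the dummy PDF and its integral (the function names f and F are my choices, not standard names):

```python
def f(x):
    # The dummy triangular PDF, defined on [3.0, 7.0]
    return 0.125 * x - 0.375

def F(x):
    # An antiderivative of f: I(X) = 0.0625*X^2 - 0.375*X
    return 0.0625 * x**2 - 0.375 * x

# Sanity check: the total area under a PDF must be 1.0
total_area = F(7.0) - F(3.0)
print(total_area)  # 1.0
```
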

So, the area from 3.0 to 4.0 = I(4.0) – I(3.0) = (–0.5000) – (–0.5625) = 0.0625. In other words, the probability that a random person will score between 3.0 and 4.0 on the test is 0.0625.
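When a PDF has no convenient closed-form integral, the same area can be approximated numerically. Here is a minimal Riemann-sum sketch in Python (the 100,000 step count is an arbitrary choice of mine, not anything special):

```python
def f(x):
    # The dummy triangular PDF
    return 0.125 * x - 0.375

# Approximate the area under f from 3.0 to 4.0 with thin rectangles
n = 100_000
width = (4.0 - 3.0) / n
area = sum(f(3.0 + (i + 0.5) * width) * width for i in range(n))
print(round(area, 4))  # 0.0625
```

The answer matches the result from the exact integral, which is a handy way to check your algebra.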