Statistics Tutorials: The Definitive Guide to Percentiles – All the Tricks in the Book

This is a good topic for a tutorial because the concept of a percentile tends to be confusing: students are sometimes given unclear information, and there are many conventions in use, some of which can be misleading or even plain wrong. In the following paragraphs we will outline the concept of a percentile in a precise way, so that you know exactly what we are talking about.

Cumulative Distribution

First of all, we need to be clear about the definition of a percentile, which is tied to the concept of the cumulative distribution. For a random variable X, the associated cumulative distribution function is defined as

\[{{F}_{X}}\left( x \right)=\Pr \left( X\le x \right)\]

That is, for a given value x, the cumulative distribution function gives the probability that the random variable is less than or equal to x. Notice that the symbol x used as the argument is a generic function argument. If we write \({{F}_{X}}\left( y \right)\) we mean the cumulative distribution at the value y (which corresponds to the probability that the random variable is less than or equal to y), and if we write \({{F}_{X}}\left( 4 \right)\) we mean the cumulative distribution at 4 (which corresponds to the probability that the random variable is less than or equal to 4).

With such a definition, it is clear that \({{F}_{X}}\) is a function that takes values between 0 and 1 (since it is a probability) and that it is non-decreasing (that is, it either increases or stays constant, but it never decreases). What is less obvious, but can be proven from the axioms of probability, is that any cumulative distribution function \({{F}_{X}}\) is quite well behaved: it is right-continuous (which very roughly means that the function is either continuous or it may have "jumps"; it is more complicated than that, but for now you can think of it that way). In general, random variables that take a continuous range of values will have a continuous cumulative distribution function \({{F}_{X}}\), whereas random variables that take a discrete range of values will have "jumps" in the graph of their associated \({{F}_{X}}\).
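As a rough illustration of the two behaviors just described, the following Python sketch evaluates one continuous and one discrete cumulative distribution function. The function names and the two example distributions (an exponential variable with rate 1 and a fair six-sided die) are assumptions chosen for illustration only.

```python
import math

def exponential_cdf(x, rate=1.0):
    """CDF of an exponential variable: F(x) = 1 - exp(-rate * x) for x >= 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-rate * x)

def die_cdf(x):
    """CDF of a fair six-sided die: F(x) = (number of faces <= x) / 6."""
    faces_at_or_below = min(max(math.floor(x), 0), 6)
    return faces_at_or_below / 6.0

# Continuous case: values just to the left of x = 1 and at x = 1 are nearly equal.
print(exponential_cdf(0.999), exponential_cdf(1.0))

# Discrete case: the CDF jumps at x = 3, and the value at 3 already
# includes the jump (this is what right-continuity looks like).
print(die_cdf(2.999), die_cdf(3.0), die_cdf(3.5))
```

Both functions stay between 0 and 1 and never decrease; only the die's function has jumps, one at each face value.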

What is a Percentile?

Now we can define a percentile. For \(\alpha \in \left[ 0,1 \right]\), we define an \(\alpha\) percentile as a value \({{P}_{\alpha }}\) such that

\[\Pr \left( X\le {{P}_{\alpha }} \right)=\alpha\]

In plain language, an \(\alpha\) percentile is a point such that the probability that the random variable is less than or equal to that point is exactly \(\alpha\). For example, a 0.10 percentile is a point in the distribution such that the probability that the random variable is less than or equal to that point is exactly 0.10. Typically, instead of being asked for the 0.10 percentile, you will be asked for the 10% percentile, or the 10th percentile. These are simply alternative notations that you should be aware of.

A percentile \({{P}_{\alpha }}\) for a random variable X is well defined when the cumulative distribution function \({{F}_{X}}\left( x \right)\) is continuous (and it is unique when \({{F}_{X}}\) is also strictly increasing around that point). If \({{F}_{X}}\left( x \right)\) has "jumps" in its graph, then it can be more difficult to define some percentile values. This is why percentiles are well defined for continuous random variables (such as those with a normal or exponential distribution), but can be difficult to define for discrete variables (such as Poisson or Binomial variables).
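To see the difficulty with jumps concretely, here is a small Python sketch for a Binomial variable with n = 4 and p = 0.5 (parameters chosen purely for illustration). It lists every value the cumulative distribution function can take and checks whether any point satisfies \(\Pr \left( X\le x \right)=0.5\) exactly. The last step uses one common convention, which is an assumption here and not something defined above: take the smallest x with \({{F}_{X}}\left( x \right)\ge \alpha\).

```python
import math

def binomial_cdf(x, n=4, p=0.5):
    """CDF of a Binomial(n, p) variable: sum of the pmf over integers k <= x."""
    if x < 0:
        return 0.0
    top = min(math.floor(x), n)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(top + 1))

# The CDF only changes at the integers 0..4, so these are all the
# positive values it can take.
cdf_values = [binomial_cdf(k) for k in range(5)]
print(cdf_values)  # [0.0625, 0.3125, 0.6875, 0.9375, 1.0]

alpha = 0.5
# No integer k gives F(k) = 0.5: the CDF jumps from 0.3125 to 0.6875 at k = 2.
exact_solutions = [k for k in range(5) if math.isclose(binomial_cdf(k), alpha)]
print(exact_solutions)  # []

# Assumed convention (not from the text): smallest k with F(k) >= alpha.
smallest_k = next(k for k in range(5) if binomial_cdf(k) >= alpha)
print(smallest_k)  # 2
```

Other conventions exist for this case, which is part of the reason percentile questions on discrete data can have more than one defensible answer.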

How to Compute a Percentile?

First, you need to know the cumulative distribution function \({{F}_{X}}\). Then, for \(\alpha\) between 0 and 1, we need to solve for \(x\):

\[\alpha ={{F}_{X}}\left( x \right)\]

Observe that solving the above equation for x is the same as intersecting the curve \( F_{X}(x)\) with the line \(y=\alpha\) (which is parallel to the x-axis). When \({{F}_{X}}\) is continuous, the intersection between the line \(y=\alpha\) and \({{F}_{X}}\left( x \right)\) exists for every \(\alpha\) strictly between 0 and 1, but that is not necessarily true for all values of \(\alpha\) when \({{F}_{X}}\left( x \right)\) is not continuous.
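The intersection can also be found numerically. The sketch below, written in Python, solves \(\alpha ={{F}_{X}}\left( x \right)\) by bisection for an exponential variable with rate 1 and compares the result with the closed-form solution \(x=-\ln \left( 1-\alpha \right)\). The helper name `percentile_by_bisection`, the search interval and the tolerance are all illustrative assumptions, and the method relies on \({{F}_{X}}\) being continuous and non-decreasing.

```python
import math

def exponential_cdf(x, rate=1.0):
    """CDF of an exponential variable: F(x) = 1 - exp(-rate * x) for x >= 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-rate * x)

def percentile_by_bisection(cdf, alpha, lo, hi, tol=1e-10):
    """Find x in [lo, hi] with cdf(x) = alpha.

    Assumes cdf is continuous and non-decreasing, and that
    cdf(lo) <= alpha <= cdf(hi), so the line y = alpha crosses the curve.
    """
    if not (cdf(lo) <= alpha <= cdf(hi)):
        raise ValueError("alpha is not bracketed by cdf(lo) and cdf(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < alpha:
            lo = mid   # crossing point lies to the right of mid
        else:
            hi = mid   # crossing point lies at or to the left of mid
    return (lo + hi) / 2.0

alpha = 0.10
numeric = percentile_by_bisection(exponential_cdf, alpha, lo=0.0, hi=50.0)
closed_form = -math.log(1.0 - alpha)  # inverse of 1 - exp(-x) when rate = 1
print(numeric, closed_form)
```

The two printed values should agree to many decimal places. For distributions with no closed-form inverse, a numerical search of this kind is one reasonable option.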

Is a Percentile a Parameter or a Statistic?

Under the definition we have provided, a percentile is a population parameter, as it depends strictly on the distribution function and not on sample data. That is where the confusion arises. Sometimes students are given sample data and are asked to compute a percentile. In reality, what they are being asked to compute is a sample percentile, a statistic that is computed using sample data, and which we hope will be a good estimate of the corresponding population percentile.
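The distinction can be made concrete with a short simulation. The Python sketch below draws a sample from an exponential distribution with rate 1 (an assumed example), computes the sample 10th percentile with the standard library function `statistics.quantiles`, and compares it with the population 10th percentile \(-\ln \left( 0.9 \right)\). The sample size and seed are arbitrary choices, and the default `method='exclusive'` is only one of several conventions for sample percentiles.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

# Population parameter: solves 0.10 = 1 - exp(-x) for the exponential(1) distribution.
population_p10 = -math.log(1.0 - 0.10)

# Sample statistic: estimated from simulated data.
sample = [rng.expovariate(1.0) for _ in range(20000)]
deciles = statistics.quantiles(sample, n=10)  # 9 cut points: 10%, 20%, ..., 90%
sample_p10 = deciles[0]

print(population_p10, sample_p10)
```

The sample value changes with the data (try a different seed), while the population value is fixed by the distribution. With a large sample the two should be close.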