In 2005, Graham Cormode, Flip Korn, S. Muthukrishnan, and Divesh Srivastava published a paper called Effective Computation of Biased Quantiles over Data Streams [CKMS05]. This paper took the Greenwald-Khanna algorithm [GK01] and made the following notable changes:

Generalized the algorithm to work with arbitrary targeted quantiles, where a targeted quantile is a combination of a quantile $\phi$ and a maximum allowable error $\epsilon$. The algorithm will maintain the summary data structure $\mathsf{S}$ such that it will maintain each targeted quantile with the appropriate error rate.

Simplified the Compress() operation by removing the concept of banding.

The paper also demonstrated that the new, more generalized algorithm could handle the following cases with ease:

The uniform quantile case, where each quantile is maintained with identical allowable error $\epsilon$.

Understanding the CKMS Algorithm

Much of the below will only make sense if you have read the paper, so I suggest trying to read [CKMS05] first, and then returning here.

The key to understanding the CKMS algorithm is understanding the invariant function $f(r_i, n)$. I found that the best way to understand $f(r_i, n)$ is to understand the targeted quantiles invariant function (Definition 5 in [CKMS05]) and the error function $f(\phi n, n) / n$ charts (Figure 2 in [CKMS05]).

Uniform Quantiles Invariant Function

Consider the uniform quantile case, where we want all quantiles to be maintained with identical allowable error $\epsilon$. In this case, the targeted quantile tuples set will become $\mathsf{T} = {(\frac{1}{n}, \epsilon), (\frac{2}{n}, \epsilon), \ldots, (\frac{n-1}{n}, \epsilon), (1, \epsilon)}$.

Given the above definition, below is a visualization of the error function $f(\phi n)/n)$ computed for $\epsilon$ = 0.1 as $n$ increases from 3 to 10:

Visually, the solid polyline can be interpreted as the amount of error allowed at a given value of $\phi$; the higher the line, the more error allowed at that particular percentile.

Notice how the solid polyline is converging towards a horizontal line at $f(\phi n, n)/n = 0.2$, which exactly corresponds to the GK invariant function $f(r_i, n) = 2 \epsilon n$, as found in [GK01].

Low-Biased Quantiles Invariant Function

Now consider the low-biased quantile case, where we want lower quantiles to have higher precision than higher quantiles. In this case, the targeted quantile tuples set will become $\mathsf{T} = {(\frac{1}{n}, \frac{\epsilon}{n}), (\frac{2}{n}, \frac{2\epsilon}{n}), \ldots, (\frac{n-1}{n}, \frac{(n-1)\epsilon}{n}), (1, \epsilon)}$.

Given the above definition, below is a visualization of the error function $f(\phi n)/n)$ computed for $\epsilon$ = 0.1 as $n$ increases from 3 to 10:

Notice how the solid polyline allows less error at lower percentiles and is converging towards the line $f(\phi n, n)/n = 0.2\phi$, which exactly corresponds to the low-biased quantiles invariant function $f(r_i, n) = 2 \epsilon r_i$, as found in Definition 3 of [CKMS05].

High-Biased Quantiles Invariant Function

Now consider the high-biased quantile case, where we want higher quantiles to have higher precision than lower quantiles. In this case, the targeted quantile tuples set will become $\mathsf{T} = {(\frac{1}{n}, \epsilon), (\frac{2}{n}, \frac{(n-1)\epsilon}{n}), \ldots, (\frac{n-1}{n}, \frac{2\epsilon}{n}), (1, \frac{\epsilon}{n})}$.

Given the above definition, below is a visualization of the error function $f(\phi n)/n)$ computed for $\epsilon$ = 0.1 as $n$ increases from 3 to 10:

Notice how the solid polyline allows less error at higher percentiles and is converging towards the line $f(\phi n, n)/n = 0.2(1 – \phi)$, which corresponds to the high-biased quantiles invariant function $f(r_i, n) = 2 \epsilon (n – r_i)$ (not found in the [CKMS05] paper).

Availability in Streaming Percentiles Library

All four variants (uniform quantile, low-biased quantile, high-biased quantile, and targeted quantile) of the CKMS algorithm are available in my streaming percentiles library. Here’s an example of what it’s like to use:

Steven Engelhardt is a Director of Engineering at Relativity, an Adjust Professor of Software Engineering at DePaul University, an avid board gamer, and a father of two. His interests include high-performance data processing systems, server-side and cloud systems architecture, and the Chicago Blackhawks.