Introductory algorithms courses encourage us to think of computers as perfect machines that calculate exact answers, and we typically design programs to deliver exactly that kind of perfection. However, relaxing the zero-error constraint makes it possible to construct far more efficient algorithms: the demand for space and time can be drastically reduced in exchange for a small, quantifiable probability of error.

In this lecture, we will follow the journey of MildlyInappropriateCatAppreciationSociety.com and its competitors as they try to tackle some of the problems of managing large amounts of cat-related data. Motivated by examples and terrible cat puns, you will learn five probabilistic techniques that allow you to do things such as:

efficiently test whether an item is already present in a gigantic distributed database

efficiently count the number of distinct items in said big database

efficiently tabulate the frequencies of different items in said big database

You will learn these techniques and their error bounds in sufficient detail that you will be able to implement them once the lecture is finished. They can all be implemented in a few dozen lines of code!
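To illustrate the "few dozen lines" claim for the first task, membership testing, here is a minimal Bloom filter sketch. It is our own illustrative version, not necessarily the variant the lecture develops. The class name `BloomFilter`, the parameters `capacity` and `error_rate`, and the SHA-256 double-hashing scheme are assumptions. The key trade-off is that the filter never reports a false negative, and when it holds at most `capacity` items its false-positive rate stays close to `error_rate`.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set membership: no false negatives, tunable false positives.

    Illustrative sketch; names and hashing scheme are assumptions.
    """

    def __init__(self, capacity: int, error_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest via double hashing:
        # position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 != 0
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # "Present" only if all k bits are set; may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


cats = BloomFilter(capacity=1000, error_rate=0.01)
cats.add("Mr. Whiskers")
print("Mr. Whiskers" in cats)     # always True: no false negatives
print("Sir Fluffington" in cats)  # almost certainly False
```

With `capacity=1000` and `error_rate=0.01`, the filter uses roughly 9,600 bits (about 1.2 KB) and 7 hash positions per item, regardless of how long the cat names are.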

The theme of this lecture was inspired by this talk by Christian Steinruecken.