Median of a Stream of Integers

Problem Statement

Here is a very interesting problem which has been asked in many interviews and coding competitions. The problem statement goes as follows:

Given a never ending stream of number, at any point in time, find the median of all the numbers received till that point.

This means, we need to recalculate the new median whenever there is a number available in the input stream. This problems seems to be a very practical one, imagine a lot of sensors in manufacturing plants are pushing data about temperature, pressure and other physical parameters. It is very important to keep track of the average and median of all such parameters. Hence, Median of a Stream of Integers becomes an interesting and useful problem.

Approaching the Problem

Before approaching the problem, we need to understand the meaning of median.

Meaning of Median

Median of a list of given numbers is defined as below:

If the list contains odd number of elements, it is the middle element of the sorted list.

If the list contains even number of elements, it is the average of the two middle elements of the sorted list.

For e.g for a list 3, 5, 2, 4, 1, 9, 8, 6, 7 . The median can be calculated after sorting the list into 1, 2, 3, 4, 5, 6, 7, 8, 9 and finding the middle element that is 5.

In case the list contains even number of elements 3, 5, 2, 4, 1, 9, 8, 6, 7, 10 . The median is calculated after sorting the list into 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and taking the average of the two middle elements. (5+6)/2 = 11/2 = 5.5

Approach

At any point of time, there can be two scenarios:

Total numbers in the stream are even

Here we need the two middle elements of the sorted array and take an average for the answer.

Total numbers in the stream are odd.

Here we need the middle element of the sorted array, which is the answer itself.

Let us assume that at any point the median is M

Before the start of the stream M would be 0.

At any time we just need two numbers

The largest number smaller than the median, and

The smallest number larger than the median.

But to find these two numbers we need to maintain two equal sized lists.

Let say the list with numbers smaller than median is LOW, and

The list with numbers bigger than median is HIGH

Algorithm

Let us try to solve the problem generically, for a given incoming number in the stream

If both the lists are of equal size

If the incoming number is smaller than M then add it into the LOW list, the highest number in the LOW list is new median.

Otherwise add the incoming number to the HIGH list, the smallest number in the HIGH list is new median.

If HIGH contains more elements than LOW

If the incoming number is smaller than M, add it to the LOW list

Otherwise remove the smallest number from HIGH list and add it to the LOW list and add the incoming number to the HIGH list.

Take the average of the smallest from HIGH and highest from LOW and that is the new median.

If HIGH contains lesser elements than LOW

If the incoming number is smaller than M, add it to the HIGH list

Otherwise remove the biggest number from the LOW list and add it to the HIGH list and add the incoming number in the LOW list

Take the average of the smallest from HIGH and highest from LOW and that is the new median.