There is an interesting series of programming job interview challenges proposed by Dev102.com, which is now at its tenth puzzle:

This week's question is pretty easy. Your input is an unsorted list of n numbers ranging from 1 to n+1; all of the numbers are unique, meaning that a number can’t appear twice in that list. According to those rules, one of the numbers is missing, and you are asked to provide the most efficient method to find that missing number. Notice that you can’t allocate another helper list; the amount of memory that you are allowed to allocate is O(1). Don’t forget to mention the complexity of your algorithm…

I'm not sure I correctly understood the constraint on memory allocation.

In my opinion, when they say we are limited to O(1), they mean that we can only allocate a single numeric variable and not any other data structure.

According to this interpretation, the solution is quite easy.

First of all, we take our only variable and store in it the sum of all the numbers between 1 and n + 1. Applying 1 + 2 + ... + m = m (m + 1) / 2 with m = n + 1, this sum is (n + 1)(n + 2) / 2.

Then we subtract each element of the array from this value; what remains at the end is the missing number. The algorithm runs in O(n) time and uses O(1) memory.

The functional implementation of this imperative algorithm is straightforward:
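For reference, here is a minimal sketch of the same sum-based algorithm in Python. It is not the author's functional code, and the function name `find_missing_sum` is hypothetical.

```python
def find_missing_sum(numbers):
    """Return the missing value from a list of n distinct ints drawn from 1..n+1."""
    n = len(numbers)
    # Sum of 1..n+1, using 1 + 2 + ... + m = m(m + 1)/2 with m = n + 1.
    remaining = (n + 1) * (n + 2) // 2
    # Subtract every element; what is left is the one value that never appeared.
    for x in numbers:
        remaining -= x
    return remaining


print(find_missing_sum([3, 1, 5, 2]))  # 4
```

For the example list, the expected total is 15, the elements add up to 11, and 15 - 11 = 4.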

O(1) doesn’t mean you can only allocate one variable; you could allocate a hundred or a thousand if you needed to. It just means the amount of memory used can’t be proportional to the length of the list. So you wouldn’t be allowed to solve this by allocating a boolean array, looping through the input list setting each corresponding boolean to true, and then checking the boolean array to see which one is still false. That boolean array would have length n + 1 for input length n, so you’d be using memory proportional to the input list length.
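For contrast, here is a sketch of the disallowed approach this comment describes (the function name is hypothetical). It runs in O(n) time, but the flag list makes its memory use O(n) as well.

```python
def find_missing_with_flags(numbers):
    """Find the missing value using one flag per possible value (O(n) memory)."""
    n = len(numbers)
    # n + 1 flags, one for each value 1..n+1: memory grows with the input.
    seen = [False] * (n + 1)
    for x in numbers:
        seen[x - 1] = True
    # The only flag still False belongs to the missing value.
    for index, flag in enumerate(seen):
        if not flag:
            return index + 1
    # Unreachable for input that follows the 1..n+1 rules.
    raise ValueError("input does not match the 1..n+1 specification")
```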

You’ve made the assumption that n (n + 1) / 2 is less than the maximum integer size. For an arbitrarily long list you would need an arbitrarily large sum variable which would require an arbitrarily large amount of storage space.
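To put a number on this objection, the sketch below assumes a signed 32-bit accumulator (such as a Java `int`). It finds the first list length n at which the expected total (n + 1)(n + 2) / 2 no longer fits. Python integers do not overflow, so the code simply compares against the limit. The helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer


def expected_total(n):
    """Sum of 1..n+1 for an input list of length n."""
    return (n + 1) * (n + 2) // 2


def first_overflowing_length(limit=INT32_MAX):
    """Smallest list length n whose expected total exceeds `limit`."""
    n = 0
    while expected_total(n) <= limit:
        n += 1
    return n


print(first_overflowing_length())  # 65535
```

So with a 32-bit signed sum, a list of only 65,535 numbers is already enough to overflow.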

If you read through the comments on the post for this question, you’ll see that Jesus DeLaTorre and Heiko Hatzfeld have made similar observations. I accept it’s a very pedantic point, though.

One way to do it with truly O(1) memory usage would be to sort the original array using an in-place sort like quicksort. Once the list was in order it would then be trivial to step through it and find the missing number.

Of course, the computational complexity of my solution is O(n^2) in the worst case (quicksort averages O(n log n)), which is less efficient than yours.
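A sketch of the sort-then-scan idea follows. It is an illustration, not the commenter's code. It swaps in heapsort for quicksort because a typical quicksort's recursion uses O(log n) or more stack space, while this heapsort needs only a few index variables. It runs in O(n log n) time and reorders the input list. All names are hypothetical.

```python
def _sift_down(a, start, end):
    """Restore the max-heap property for a[start..end] (inclusive), in place."""
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        # Pick the larger of the two children.
        if child + 1 <= end and a[child] < a[child + 1]:
            child += 1
        if a[root] < a[child]:
            a[root], a[child] = a[child], a[root]
            root = child
        else:
            return


def heapsort_in_place(a):
    """Sort list `a` ascending using O(1) extra memory and no recursion."""
    n = len(a)
    # Build a max-heap over the whole list.
    for start in range(n // 2 - 1, -1, -1):
        _sift_down(a, start, n - 1)
    # Repeatedly move the current maximum to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        _sift_down(a, 0, end - 1)


def find_missing_by_sorting(numbers):
    """Sort in place, then return the first value that is out of position."""
    heapsort_in_place(numbers)
    for i, x in enumerate(numbers):
        if x != i + 1:
            return i + 1
    # Every value 1..n is present, so n + 1 is the missing one.
    return len(numbers) + 1
```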

I think there is an O(n) solution. We should be able to sort the array in O(n) time, because a number's value tells us where it belongs in the sorted list (value v goes at index v - 1). As I said in my last post, once we have a sorted list, finding the missing number is easy.

Here’s a solution in Python that doesn’t have the overflow/linear space issue. It operates in linear time because the sorting works a bit like a keysort, and it operates in constant space because it sorts the original list in place.
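Here is a minimal sketch of that linear-time, in-place idea. It is our own illustration, and the function name is hypothetical. Each value v is swapped into index v - 1. Value n + 1 has no slot in a list of length n, so it ends up in the slot of the missing number. Because the values are distinct, every swap puts one value into its final slot, so there are at most n swaps in total. The input list is reordered.

```python
def find_missing_by_placement(numbers):
    """Swap each value v into index v - 1, then scan for the slot that is wrong."""
    n = len(numbers)
    for i in range(n):
        # Keep swapping until slot i holds its own value or the homeless n + 1.
        while numbers[i] != i + 1 and numbers[i] <= n:
            target = numbers[i] - 1
            numbers[i], numbers[target] = numbers[target], numbers[i]
    for i, x in enumerate(numbers):
        if x != i + 1:
            return i + 1
    # Every slot is correct, so the missing value is n + 1.
    return n + 1


print(find_missing_by_placement([3, 1, 5, 2]))  # 4
```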

A smaller chance of overflow doesn't make the method better; there is still a chance of an overflow. Hence I have provided a method that does NOT use any integer summing. Instead, I use a cumulative XOR so that matching values cancel out. Here is the third implementation:

/**
* @param sequence : Sequence of length N containing the integers 1 to (N + 1)
* with one number missing. The array is unsorted.
*
* @return The missing number
*
* Algorithm: In this case we do not calculate any integer sum at all. We XOR
* every element together with every index value from 1 to (N + 1). Since
* X XOR X == 0, each element cancels its matching index value, and the number
* left standing after the cumulative XOR is the missing number.
* For example, consider the sequence [1] [2] [3] [?].
* 1 XOR 1 == 0, 2 XOR 2 == 0 and 3 XOR 3 == 0, so the number left standing is 4.
* The order of the elements does not matter, because XOR is commutative and associative.
*/
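A minimal Python sketch of the cumulative-XOR algorithm described in that comment follows. It translates the idea rather than reproducing the commenter's code, and the function name is hypothetical. No intermediate value needs more bits than n + 1 itself, so nothing can overflow.

```python
def find_missing_xor(sequence):
    """XOR every element with every index value 1..N+1; equal pairs cancel."""
    n = len(sequence)
    # Start with N + 1, the one index value with no position in the list.
    acc = n + 1
    for i, x in enumerate(sequence):
        # Fold in the index value i + 1 and the element at that position.
        acc ^= (i + 1) ^ x
    return acc


print(find_missing_xor([3, 1, 5, 2]))  # 4
```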