Saturday, January 30, 2016

Find Missing Number

1. Given a set of positive numbers less than or equal to N, where one number is missing, find the missing number efficiently.
2. Given a set of positive numbers less than or equal to N, where two numbers are missing, find the missing numbers efficiently.
3. Given a sequence of positive numbers less than or equal to N, where one number is repeated and another is missing, find the repeated and the missing numbers efficiently.
4. Given a sequence of integers (positive and negative), find the first missing positive number in the sequence.
Solutions should use no more than O(n) time and constant space.

For example,
1. A=[2,1,5,8,6,7,3,10,9] and N=10 then, 4 is missing.
2. A=[2,1,5,8,6,7,3,9] and N=10 then, 4 and 10 are missing.
3. A=[2,1,5,8,3,6,7,3,10,4] and N=10 then, 3 is repeating and 9 is missing.
4. A=[1,2,0] then the first missing positive is 3; for A=[3,4,-1,1], the first missing positive is 2.

Single Number Missing

A trivial approach is to sort the array and loop over indices 0 to N-1, checking whether index i contains the number i+1. With an in-place sort this takes constant space but O(n lg n) time. A counting sort would bring the time down to O(n+k), but it needs O(k) extra space. We need O(n) time and constant space. How?
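The text leaves this question open at this point. As a minimal sketch, and not necessarily the author’s own listing, two common answers are to subtract the array sum from N*(N+1)/2, or to xor the array against 1..N so that everything except the missing number cancels. The function names `find_missing_by_sum` and `find_missing_by_xor` are illustrative assumptions.

```python
def find_missing_by_sum(a, n):
    """Return the single value from 1..n that is absent from a.

    Assumes a holds n - 1 distinct values from 1..n.
    """
    # Sum of 1..n minus the sum of the input leaves the missing value.
    return n * (n + 1) // 2 - sum(a)


def find_missing_by_xor(a, n):
    """Same result without large intermediate sums."""
    x = 0
    for i in range(1, n + 1):
        x ^= i          # xor of the full range 1..n
    for v in a:
        x ^= v          # every present value cancels itself out
    return x
```

Both run in O(n) time with constant extra space. The xor variant avoids the large intermediate sum, which matters in languages with fixed-width integers.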

Two Missing Numbers
We can solve this with the same kind of math as above. Let’s say p and q are the missing numbers among 1 to N. Then the sum S of the given input numbers is

S = N*(N+1)/2 - p - q
=> p + q = N*(N+1)/2 - S

Also, we know that the product of the numbers 1 to N is N!, so the product P of the input numbers is

P = N!/(p*q)
=> p*q = N!/P

Then we can solve these two equations to find the missing numbers p and q. However, this approach has a serious limitation: the product of a large number of values quickly overflows a fixed-width integer type. Using a long only delays the overflow, and multiplication is not a cheap operation either.
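To make the algebra concrete, here is a hedged sketch of solving the two equations. With p + q = s and p*q = m, p and q are the roots of x^2 - s*x + m = 0. The function name `find_two_missing_by_product` is an assumption for illustration. Python integers have arbitrary precision, so the overflow problem does not show up here, but the products still grow very large and the multiplications get slow.

```python
import math


def find_two_missing_by_product(a, n):
    """Return the two values (p, q), p < q, from 1..n that are absent from a.

    Assumes a holds n - 2 distinct values from 1..n.
    """
    s = n * (n + 1) // 2 - sum(a)            # s = p + q
    m = math.factorial(n) // math.prod(a)    # m = p * q
    # Roots of x^2 - s*x + m = 0 are (s -/+ sqrt(s^2 - 4m)) / 2.
    d = math.isqrt(s * s - 4 * m)            # d = q - p
    return (s - d) // 2, (s + d) // 2
```

For A=[2,1,5,8,6,7,3,9] and N=10 this gives s = 14 and m = 40, so d = 6 and the missing numbers are 4 and 10.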

Can we avoid multiplication? As the numbers are positive and in the range [1,N], we could use each element of the array as an index into the array and mark that position to record that the number exists. The positions for the missing elements will then stay unmarked. But this changes the array itself. How do we make sure that marking a position does not destroy the information stored at that position? For example, take A=[2,1,5,8,6,7,3,9]. If we mark A[A[0]-1], i.e. A[1], with a special value, let’s say 0, to record that 2 is not missing, then A becomes A’=[2,0,5,8,6,7,3,9]. We have lost the value 1 that was stored in A[1], so A[A[1]-1], i.e. A[0], will never get marked to tell us that 1 is not missing.

We can overcome the overwriting issue by negating the number at index abs(A[i])-1 for each i. This way we never lose a value: we only change its sign, and we always index by absolute value. Note that the array has only N-2 positions, so the values N-1 and N have no slot of their own and have to be tracked separately, for example with two flags. After marking for all the numbers, we make a second pass over the array and look for unmarked, i.e. positive, elements. In the same pass we can flip the negated elements back to positive and so restore the original array. Below is the implementation of this idea.
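A minimal Python sketch of the negation-marking idea follows. It assumes the array holds N-2 distinct values from 1..N. The function name `find_two_missing_by_marking` and the two flags for the values N-1 and N (which have no slot in an array of length N-2) are assumptions of this sketch.

```python
def find_two_missing_by_marking(a, n):
    """Return the two values from 1..n absent from a, in increasing order.

    Temporarily negates entries of a and restores them before returning.
    """
    size = len(a)                      # expected to be n - 2
    seen_n_minus_1 = seen_n = False    # values with no slot in the array
    for i in range(size):
        v = abs(a[i])                  # a[i] may already be negated
        if v <= size:
            a[v - 1] = -a[v - 1]       # mark v as present
        elif v == n - 1:
            seen_n_minus_1 = True
        else:
            seen_n = True
    missing = []                       # holds at most two values
    for i in range(size):
        if a[i] > 0:
            missing.append(i + 1)      # slot never marked: i + 1 is missing
        else:
            a[i] = -a[i]               # restore the original value
    if not seen_n_minus_1:
        missing.append(n - 1)
    if not seen_n:
        missing.append(n)
    return missing
```

The result list never holds more than two entries, so the extra space stays constant, and the input is back in its original state when the function returns.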

The above solution has a limitation: it assumes the input array is mutable. What if we can’t update the input array (i.e. it is immutable) and we still need to find the missing values in O(n) time and constant space?
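One well-known way to do this without touching the input is xor arithmetic. It is sketched here as an assumption, since the text does not spell out its procedure at this point. Xoring the array against 1..N leaves p xor q. Any set bit of that value is a bit where p and q differ, so restricting the xor to numbers with that bit set isolates one of them. The name `find_two_missing_by_xor` is illustrative.

```python
def find_two_missing_by_xor(a, n):
    """Return the two values (p, q), p < q, from 1..n absent from a.

    Assumes a holds n - 2 distinct values from 1..n. Does not modify a.
    """
    x = 0
    for i in range(1, n + 1):
        x ^= i
    for v in a:
        x ^= v                    # x is now p ^ q, nonzero because p != q
    low_bit = x & -x              # lowest bit in which p and q differ
    p = 0
    for i in range(1, n + 1):
        if i & low_bit:
            p ^= i
    for v in a:
        if v & low_bit:
            p ^= v                # only the missing value with this bit survives
    q = x ^ p
    return min(p, q), max(p, q)
```

For A=[2,1,5,8,6,7,3,9] and N=10 the first pass yields 4 xor 10 = 14, whose lowest set bit is 2; only 10 has that bit set, so the second pass isolates 10 and the other number is 14 xor 10 = 4.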

What if a single number appears twice and one number is missing? Note that a missing element and a repeated element are equivalent phenomena with respect to xor arithmetic: during the xor, the two copies of the repeated element cancel each other, which makes that element look missing in the xor. So we can use the same xor-based procedure as for two missing numbers to find the one missing and the one repeated element. One extra pass over the array then tells us which of the two is the repeated one.
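As a hedged sketch of this case: the xor of the array against 1..N equals repeated xor missing, the bit-partition step isolates one of the two, and a final scan checks whether that candidate occurs in the array. The function name `find_repeated_and_missing` is an assumption for illustration.

```python
def find_repeated_and_missing(a, n):
    """Return (repeated, missing) for a sequence of n values from 1..n
    in which one value appears twice and one value is absent.

    Does not modify a.
    """
    x = 0
    for i in range(1, n + 1):
        x ^= i
    for v in a:
        x ^= v                    # x is now repeated ^ missing
    low_bit = x & -x              # a bit where the two values differ
    c = 0
    for i in range(1, n + 1):
        if i & low_bit:
            c ^= i
    for v in a:
        if v & low_bit:
            c ^= v                # c is one of the two values
    other = x ^ c
    # The repeated value occurs in the array, the missing one does not.
    if any(v == c for v in a):
        return c, other
    return other, c
```

The extra scan keeps the total cost at O(n) time and constant space.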

First Missing Positive
For example, for A=[1,2,0] the first missing positive is 3, and for A=[3,6,4,-1,1] the first missing positive is 2. Can we use some of the techniques discussed above? Note that there may be more than one missing number, as well as negative numbers and zeros. If all numbers were positive, we could use the second method for finding two missing numbers, where each element serves as an index and we negate the value there to mark it as non-missing. In this problem, however, we may have non-positive numbers, i.e. zeros and negatives, so we can’t simply apply that algorithm. But if we think carefully, we notice that we don’t have to care about zeros and negative numbers at all, because we only care about the smallest positive number. That is, if we can put aside the non-positive numbers and consider only the positives, then we can apply the “element as index to mark non-missing by negating the value” method to find the missing positives.

How do we put aside the non-positive elements? We can do a partition, as in quicksort, so that all positive elements end up on the left and all zeros and negatives on the right. If q is the partition index, then A[0..q-1] contains all the positives. Now we just have to scan the positive partition of the array, i.e. from 0 to q-1, and for each value abs(A[i]) that is at most q, mark A[abs(A[i])-1] by negating it (only if it is still positive, so that duplicates do not undo the mark). After the marking phase we sweep through the partition again to find the first index i that holds a positive element. Then i+1 is the smallest, i.e. first, missing positive. If we do not find such an index, then there are no missing numbers between 1 and q (why?). In that case we return the next positive number, q+1 (why?). Below is the implementation of this algorithm, which assumes we can update the original array. It runs in O(n) time and constant space.
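A minimal Python sketch of the partition-then-mark steps follows. The function name `first_missing_positive` is an assumption, and unlike the two-missing-numbers sketch this version leaves the array reordered and partly negated rather than restoring it.

```python
def first_missing_positive(a):
    """Return the smallest positive integer not present in a.

    Reorders a and negates some entries in place.
    """
    # Partition: move all positive values to the front, a[0..q-1].
    q = 0
    for i in range(len(a)):
        if a[i] > 0:
            a[i], a[q] = a[q], a[i]
            q += 1
    # Mark: for each value v in 1..q, negate the entry at index v - 1.
    for i in range(q):
        v = abs(a[i])
        if v <= q and a[v - 1] > 0:    # skip slots already marked (duplicates)
            a[v - 1] = -a[v - 1]
    # Sweep: the first slot still positive gives the answer.
    for i in range(q):
        if a[i] > 0:
            return i + 1
    # Every value 1..q is present, so the answer is the next one.
    return q + 1
```

Values larger than q are skipped during marking: with only q positives in the array, at least one number in 1..q must be absent whenever such a value occurs, so it cannot affect the answer.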