There is an array containing all the integers from 1 to n in some order, except that one integer is missing. Suggest an efficient algorithm for finding the missing number.

A friend gave me the problem above as I was driving him from the airport. He had just been at a job interview where they gave him two problems. This one can be solved in linear time and constant space.

But my friend was really excited by the next one:

There is an array containing all the integers from 1 to n in some order, except that one integer is missing and another is duplicated. Suggest an efficient algorithm for finding both numbers.

My friend found an algorithm that also works in linear time and constant space. However, the interviewer didn’t know that solution. The interviewer expected an algorithm that works in n log n time.

The company claims that they are looking for the smartest people in the world, and my friend had presented them with an impressive solution to the problem. Despite his excitement, I predicted that they would not hire him. Guess who was right?

I reacted like this because of my own story. Many years ago I was interviewing for a company that also wanted the smartest people in the world. At the interview, the guy gave me a list of problems, but said that he didn’t expect me to solve all of them — just a few. The problems were so difficult that he wanted to sit with me and read them together to make sure that I understood them.

The problems were Olympiad style, which is my forte. While we were reading them, I solved half of them. During the next hour I solved the rest. The interviewer was stunned. He told me of an additional problem that he and his colleagues had been trying to solve for a long time and couldn’t. He asked me to try. I solved that one as well. Guess what? I wasn’t hired. Hence, my reaction to my friend’s interview.

The good news: I still remember the problem they couldn’t solve:

A car is on a circular road that has several gas stations. The gas stations are running low on gas and the total amount of gas available at the stations and in the car is exactly enough for the car to drive around the road once. Is it true that there is a place on the road where the car can start driving, stopping to refuel at each station, so that the car completes a full circle without running out of gas? Assume that the car’s tank is large enough not to present a limitation.

Leo:

Adam:

Three possibilities:
- The recruiters know that most people in the company aren’t particularly smart, but they want to maintain the illusion for morale’s sake, and hiring someone actually smart would dispel it.
- Most people can’t distinguish differences in intelligence more than one level above themselves, so by advertising “the smartest people in the world,” they get a reputation of having really smart people without having to find any.
- The people putting together the interview questions list were actually smart, and knew that those would be interesting and fairly quick problems, but they passed it on to the interviewers without the solutions and without saying how hard the problems were.

Not hiring because of correctly answering “those” questions seems petty. I am not an expert on hiring decisions, but it seems unlikely that one would reject an applicant on the basis of arithmetic ingenuity. In fact, most good companies look out for such capabilities. Also, finding patterns in chaos is apparently a human superpower.

ckuehne:

xil:

Companies are looking for people who are smart enough to do the job. Mathematics or coding hackers are hard to motivate and can’t stand doing filler jobs like admin. Therefore, HR tends not to hire them. However, in many cases people like you thrive in startups or underground teams, not in the corporate world. As a hint for your son, there are big data startups around looking for algo wizards. 🙂

Michael Watts:

Suppose there are n stations, s_1, …, s_n, arranged around my circle. I extend a colored arc clockwise along the circle from each station representing how far it’s possible to drive from that station on the amount of gas in stock there. By hypothesis, the total length of these arcs is the circumference of the circle. If no arc reached another station, then the total length would be less than that, so there is some station s_i which stocks enough gas to reach station s_{i+1}.

Now consider the problem where there are n-1 stations: t_1, …, t_i, t_{i+2}, …, t_n. Station t_i stocks the sum of the gas stocked by s_i and s_{i+1} in the previous stage of the problem; all other t_j stock the amount of gas at s_j. In effect, we’ve dumped all the gas from s_{i+1} into s_i and then removed s_{i+1}.

If we iterate this process, we’ll eventually be left with one station containing all the gas. That station will serve as the starting point for the original problem.
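The merging argument above can be simulated directly. This is only a rough sketch under some assumptions of mine: the names `gas` and `dist` are hypothetical, `dist[i]` is the fuel needed to drive from station i to the next one clockwise, the car starts empty, and amounts are integers so comparisons are exact. It is a plain quadratic simulation, not an optimized method.

```python
def start_by_merging(gas, dist):
    """Repeatedly dump a reachable station's gas into its predecessor.

    gas[i]  : fuel stocked at station i (hypothetical input name)
    dist[i] : fuel needed to drive from station i to the next station
    Returns the original index of the last surviving station.
    """
    # Each entry: [original index, gas held, fuel needed to reach next survivor]
    stations = [[i, g, d] for i, (g, d) in enumerate(zip(gas, dist))]
    while len(stations) > 1:
        for k, (_, g, d) in enumerate(stations):
            if g >= d:  # this station can reach its successor
                break
        else:
            raise ValueError("total gas is less than the circumference")
        nxt = (k + 1) % len(stations)
        stations[k][1] += stations[nxt][1]  # take over the successor's gas
        stations[k][2] += stations[nxt][2]  # and its stretch of road
        del stations[nxt]
    return stations[0][0]
```

For example, with `gas = [1, 2, 3, 4, 5]` and `dist = [3, 4, 5, 1, 2]` the surviving station is index 3, and starting there the tank never drops below zero.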

Christian Genco:

Mario:

Hm, I don’t see the XOR solution. Is there a property of the XOR of first N integers? I have the linear time and constant space solution involving some properties of sums and sums of squares for both problems.

jh:

The circle-road problem can be solved by sorting the gas stations by the ratio of ‘amount of gas available’ / ‘distance to next gas station’. Then any start point that puts you in reach of the gas station with the best ratio will be fine.

Very nice car problem:
You start from an arbitrary station, go in an arbitrary direction with an empty tank, and plot the evolution of the tank level (allowing it to go negative) until you come back to your origin. The station where the level reaches its minimum should be the starting point, since starting from there the level can never go negative afterward.
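Here is a small sketch of that lowest-tank idea, under assumptions that are mine rather than the commenter's: `gas[i]` is the fuel at station i, `dist[i]` is the fuel needed to get from station i to the next one in the chosen direction, the car starts empty, and the totals match. The names are hypothetical.

```python
def start_by_lowest_tank(gas, dist):
    """Return the station at which a hypothetical tank level is lowest.

    The tank starts at 0 at station 0 and is allowed to go negative.
    The level is sampled on arrival at each station, before refuelling.
    """
    tank = 0      # level on arrival at the current station
    lowest = 0    # lowest arrival level seen so far
    start = 0     # station where that lowest level occurred
    for i in range(len(gas)):
        if tank < lowest:
            lowest, start = tank, i
        tank += gas[i] - dist[i]  # refuel at i, then drive to the next station
    return start
```

With `gas = [1, 2, 3, 4, 5]` and `dist = [3, 4, 5, 1, 2]` the arrival levels are 0, -2, -4, -6, -3, so the sketch picks station 3.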

Fabio:

I assume by constant space you mean storing a constant amount of numbers, as opposed to a constant number of bits (storing n takes O(log n) space).
If that’s so, I think I arrived at the same solution as Evan; if not, I’m confused.

The car one works by a straightforward induction argument, unless I missed something (hint: assume the car has no fuel).

xxjthxx:

ckuehne:

Thinking about it, it comes down to the level of analysis you choose. If you assume constant space per integer, then it is indeed solvable with O(1) additional space. For example, this is usually assumed when analyzing sorting algorithms and I guess this is also what you had in mind.

OMF:

In Ireland, to say that someone is “smart”, means that they are only good at making fun of other people’s mistakes.

I sometimes think that this is what the word actually means.

Since the problem seems to allow access to the numbers in the list, and because they are ordered, you could also use a branched binary search to find the missing numbers. This would also work if there were several missing numbers. It still has O(N) complexity though.

From n=3 onwards the array itself is always going to be large enough to contain the answer no matter how many bits it takes which means the only space needed is a single pointer/counter to iterate the array.

The problem with the missing and the duplicate works similarly by special-casing n=2, and from n=3 onwards there are always sufficient bits in the array to store the full answer.

This is of course if we are allowed to destroy the contents of the input array, but since the problem does not state that isn’t permitted, I think it’s safe to say a constant-space solution is feasible in both cases even under the strictest interpretation of what is “constant space”.

Fabio:

@Jerry: If you store the array, you’re already storing O(n) numbers, so you can’t get sub-linear space.
The only way to get sub-linear space is by using some sort of streaming model: you get to see each number in the array once, then it goes away. You can store it if you want, but then that takes up space.
If you can re-use the space of the array, you can just look at a[0], then go to a[a[0]], read that number (move it to an additional space), replace a[a[0]] with -1 (to indicate you saw that number once), then keep iterating (going back to last unchecked location if cycles). You can easily detect duplicates, and you can check for the missing number in linear time once you’re done with this.
That works, but is not really constant space.
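A sketch of the reuse-the-array idea, for illustration only. It is a common variant and not exactly the traversal described above: instead of following a[a[0]] and writing -1, it flips the sign of the entry at position v-1 to record that value v has been seen. It assumes the array has n entries with values in 1..n, one missing and one duplicated, and it destroys the input.

```python
def find_by_marking(a):
    """Return (missing, duplicate), using the input list itself as scratch space.

    Warning: this mutates `a` (entries are negated as "seen" marks).
    """
    duplicate = missing = None
    for i in range(len(a)):
        v = abs(a[i])            # original value, ignoring any mark
        if a[v - 1] < 0:
            duplicate = v        # slot already marked: v was seen before
        else:
            a[v - 1] = -a[v - 1]
    for i, x in enumerate(a):
        if x > 0:                # never marked, so value i + 1 never appeared
            missing = i + 1
    return missing, duplicate
```

For `[1, 2, 2, 4]` this gives `(3, 2)`. As noted above, whether the borrowed sign bits count as "constant space" depends on how you read the problem.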

Foo:

Bob:

Assuming they are in fact in order, identify the order. Assuming ascending order and a 1-indexed array, array[n] == n is the test, and you can use a basic binary search to find a position where array[n] != n, then narrow down the starting and ending points of this phenomenon. If n < array[n], then the missing number is below and the duplicate is above. If n > array[n], the duplicate is below and the missing number is above. If n == array[n], we haven’t found the region yet. In all cases this can be accomplished in O(log n) time and linear space, assuming you decide to write the binary search in iterative form instead of recursive.

Make Monies:

The car problem reduces to the Generalized Pigeonhole Principle, I think. And Michael Watts has it right.

I think ckuehne is right, too, storing a constant number of values is not the same as constant space. Though I guess we’re assuming numbers are stored in something fixed-width that can store items with value at least n.

Leo:

@Mario: Y = 1 XOR 2 XOR … XOR n XOR a[1] XOR a[2] XOR … XOR a[n] will give you the missing number XOR the duplicated number. The 1 bits in Y show where they differ. Let’s find the position of the least significant 1 bit in Y, then traverse the array again summing only values with bit 1 in that position. By comparing the sum with the expected value (that can be computed in parallel with the array traversal) we’ll find one of the numbers, and whether it was missing or duplicated. The other number is Y XOR the first number.
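The XOR description above could be sketched roughly like this. The function name is made up for illustration, and the array is assumed to hold n values from 1..n with one missing and one duplicated.

```python
def find_missing_and_duplicate_xor(a):
    """Return (missing, duplicate) using two passes and a few counters."""
    # Pass 1: Y = 1 ^ 2 ^ ... ^ n ^ a[1] ^ ... ^ a[n] = missing ^ duplicate
    y = 0
    for i, v in enumerate(a, start=1):
        y ^= i ^ v
    low_bit = y & -y  # least significant bit where the two numbers differ

    # Pass 2: sum only the values with that bit set, alongside the expected sum
    actual = expected = 0
    for i, v in enumerate(a, start=1):
        if i & low_bit:
            expected += i
        if v & low_bit:
            actual += v

    delta = actual - expected
    if delta > 0:      # surplus: the number with this bit set is the duplicate
        duplicate = delta
        missing = y ^ duplicate
    else:              # deficit: the number with this bit set is the missing one
        missing = -delta
        duplicate = y ^ missing
    return missing, duplicate
```

For `[1, 2, 2, 4]` we get Y = 1, the expected sum of odd values is 4 and the actual one is 1, so 3 is missing and 3 XOR 1 = 2 is the duplicate.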

@Fabio: see @MakeMonies’ solution for an illustration of what I meant; if you destructively use the space of the passed-in input array (without making a local copy!) you need no more than constant space. Note that this generally works out, even with any form of variable-length encoding of the values in the input array, as long as we are free to pick a suitable encoding of the results outputs.

@MakeMonies: your implementation could be further improved if you add the indices to the result while at the same time subtracting the individual values; your final calculation using the multiplication (and the summation into array[0]) could overflow and potentially cause issues, whereas accumulating during the iteration is guaranteed to never exceed the final value of the result.

nikhil:

the first one is pretty obvious:
Sum the given series (with one value missing). The sum of the series 1:n is n(n+1)/2. The difference is the missing value.
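As a quick sketch of the sum approach (the function name is just for illustration, and the array is assumed to hold n-1 distinct values from 1..n):

```python
def find_missing(a):
    """Return the one integer from 1..n that is absent from `a`."""
    n = len(a) + 1                      # one value is missing, so n = len + 1
    return n * (n + 1) // 2 - sum(a)    # expected total minus actual total
```

For instance, `find_missing([3, 1, 4, 5])` gives 2. In a fixed-width language the running total can overflow, which Python's integers avoid.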

The second one involves a simple identity. x^2 – y^2 = (x-y)(x+y)
Consider series A with one value, a, replaced by b; we also have series B, 1:n, i.e. the correct series.
Calculate sum(A) – sum(B); this is equivalent to b-a.
Calculate sum(A^2) – sum(B^2); this is b^2 – a^2 = (b-a)(b+a).
Divide the second result by the first to get b+a, then combine it with b-a to get both a and b.
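A minimal sketch of this sums-and-squares calculation, assuming the array has n values from 1..n where a is missing and b appears twice (the function name is hypothetical):

```python
def find_missing_and_duplicate(arr):
    """Return (a, b): a is the missing value, b the duplicated one."""
    n = len(arr)
    diff = sum(arr) - n * (n + 1) // 2                     # b - a
    sq_diff = (sum(x * x for x in arr)
               - n * (n + 1) * (2 * n + 1) // 6)           # b^2 - a^2
    total = sq_diff // diff                                # b + a (diff != 0)
    b = (total + diff) // 2
    a = (total - diff) // 2
    return a, b
```

For `[1, 2, 2, 4]`: b-a = -1 and b^2-a^2 = -5, so b+a = 5, giving a = 3 (missing) and b = 2 (duplicated).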

Fabio:

@Jerry: Yes, I understood what you meant.
This assumes that you have a whole copy of the array somewhere you can write to. Having this copy requires you to have n numbers stored, and you are therefore using the space that storing those numbers takes.
By sublinear space I interpret it to mean we’re considering some sort of streaming model in which the input arrives in an online fashion: For example, you’re receiving the numbers one at a time through a network, and you can’t store things in whatever the source is.
Using the space the array is stored in is using linear space (or n log n bits).
In other words, the statement says constant space, not constant extra space.

@Fabio: the possibility of streaming input is a fair enough point, however I was working off the interpretation that “there is an array containing” in the problem statement implies that we have an addressable array to operate on.

maggie:

Yeah, could see how you or a friend wouldn’t get hired. Companies want people who are good, but not so good as to outshine the master. If you’re really good, you’ll be there for a little bit, soak in the best they have to offer, then will leave and form something better. Definitely a fear in that happening, they have.

Yeah, when they were hiring they didn’t mention the critical piece of information: they are hiring the smartest people in the world who are not smarter than the person hiring or the incumbent workforce.

Bob Bixler:

Would be cool if you gave your solutions to all three problems.
It’s sad but it’s true that employers usually don’t want their employees to be smarter than them. But seriously what are the names of those companies?

Steve Clamage:

If you are smarter than the company expected and still didn’t get hired, it does not *necessarily* mean you are too smart. They might have dropped you for other reasons. Just to list some that I’ve encountered when evaluating applicants:
– Personality is a bad fit for the organization (likely to be unable to work with others).
– Resume shows very frequent job changes (probably will quit this job too).
– Reasons for leaving past jobs: “the boss was an idiot.” (Really? All of them?)
– Applicant’s focus is on benefits, vacation time and the like, with no show of interest in the actual job.

If you were really dropped because you are too smart, you probably would hate working there anyway.
My advice is to act in the interview as who you are. That’s who will be working there if you get the job. If they hire you because they think you are someone else, you will all be unhappy.

Adam2:

@Steve
That’s all well and good, supposing that in the end everyone finds a job they are happy with, and doesn’t need money to live on in the meantime.
I’d rather they hire me and then decide for myself.
Unfortunately the job market is a game of musical chairs with about 10 percent of chairs missing.

mike:

Tarun Anand:

For the car problem
Imagine there are two cars, one goes clockwise and the other goes counter clockwise… There must be one point in the circle where they can go in opposite directions and meet back at the starting point.

dEmigOd:

Car problem.

I will assume that the car’s tank is empty in the beginning, but that there is a gas station at the starting point s_0 holding the amount that was in the car’s tank.
I assume the circle has length 1, and that the car needs an amount of gas equal to an arc’s length to move from the arc’s start to its end.
Now look at a gas station s_i: if a car with an empty tank is able to reach station s_{i+1} from it, then we can safely “transfer” the gas from s_{i+1} to s_i, making s_i hold the cumulative amount of gas (provided the two stations are different). When this process ends, either no station has enough gas to reach the next one, or exactly one station is left, in which case we are done. In the former case, summing the gas over all stations gives less than 1, a contradiction.

Frank Wolff:

Cars problem : let Si be the greatest gas station. Si, being the greatest, is greater than 1/n, hence will ensure access to Si+1 to anyone in Si. Merge Si and Si+1 (modulo n). A recursive hypothesis ensures the existence of an access to all gas stations (but Si+1 – obtained above). Left as an exercise : prove now that it works for a 2 stations circuit…

I’m no IMO player, and sadly far below that level. I’m not even in a computing-like job. The real wonder is that this employer, and his colleagues, were not able to solve this problem. Well, surely they should choose another job (or should I? 🙂 ).

But things are not so bad. You don’t hire your boss; it’s the other way round. Neither do you hire someone who will show you wrong all the time. And neither do you go for a boss with evidence of weak ability. Finally, bad reasons have led to the right decision… which is what an interview is designed for!

Frank Wolff:

That’s right. Actually the choice for Si is even simpler. Just take one station that provides enough gas to the next station. There’s at least one, otherwise there wouldn’t be gas to complete the loop.

Er, unless I’m mistaken again. This reminds me of my last math teacher. He was desperately looking for audience participation in class (in a country where students are mute). He would say “come on, propose some idea. Math class is the place in life to welcome wrong statements. Don’t miss the chance”. Guess who took it…

1. Note the solution can be a unique point (e.g. if exactly one gas station, car’s tank empty at start).

2. Also note, if we instead ask: does there exist a startpoint from which the car can
perform a full circle IN EITHER DIRECTION, then the answer can be “no” —
for example empty car tank, two stations located near to each other each with
half the gas.

3. What is the computational complexity of finding a startpoint for the gas/circle problem, given the locations and gas-content of each station as input?
If the gas stations are input in angular order, then it is possible to solve this
in O(N) steps and O(N) words of memory. To sketch the method: use a doubly-linked
list to store the gas stations in angular order. Scan through the list once as described by
Michael Watts, to find two stations to combine. (Accomplish combination by deleting
the further-forward station from the list and summing it and its predecessor’s gas-contents.) Each time you combine two, check to see if the new super-station can
“reach” its successor, if so combine them; otherwise continue scanning.
The problem will be solved (all combined into one) in linear total time.
But this is a “forward only” algorithm which never needed to “backtrack.” Consequently, it actually is possible to implement this with only a singly-linked list;
and even better with only O(1) words of memory-storage provided we have
circular-tape style access to the read-only scan-forward-only input.
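One well-known way to get a forward-only scan with O(1) words of memory is sketched below. It is not necessarily the exact combining procedure described in point 3, just a single-pass relative of it. Assumptions of mine: stations arrive in angular order, `gas[i]` and `dist[i]` (hypothetical names) are the fuel at station i and the fuel needed to reach the next station, the car starts empty, and the total gas covers the whole circle.

```python
def start_by_forward_scan(gas, dist):
    """Single forward pass; keeps only a candidate start and a running tank."""
    start = 0   # current candidate starting station
    tank = 0    # fuel accumulated since the candidate
    for i in range(len(gas)):
        tank += gas[i] - dist[i]
        if tank < 0:
            # No station from `start` through i can work as a start,
            # so restart the candidate just past i.
            start = i + 1
            tank = 0
    return start
```

With `gas = [1, 2, 3, 4, 5]` and `dist = [3, 4, 5, 1, 2]` the candidate is reset at stations 0, 1 and 2 and settles on station 3. When the total gas is sufficient, the final candidate is always a valid index.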

4. Now here is a NEW PROBLEM for you:
Idiots have dropped N fuel-depots onto the surface of the moon. Their locations
and fuel-contents were arbitrary. Now you, a smart astronaut, choose where you
want to land on the moon with your moon-buggy. Your goal is to
tour around the moon in the buggy.
Prove:
If the total fuel in the N depots is at least C*N^(3/2),
where C is a sufficiently large constant,
then there exists at least one point on the moon, such that, starting from there,
the moon-buggy will be able to reach every depot.
Try to find the least value of C that you can, such that this remains true.
On the other hand if C is too small then it becomes possible that no such startpoint
exists. Try to find the greatest value of C, such that this nonexistence can happen.

Abhishek:

I’m sorry i know this is old, but i wanted to present a theory on “Why they were not hired?”
So, they say, “We are searching for the smartest people in the world.” Now they have set a level of smartness they require by the questions they ask in the interview. Now if an applicant is unable to answer at least 50% (or whatever limit they have set, remember they say, “just a few”) of those, you are under-qualified for the job. And if you solve a lot more than that limit, you are rather over-qualified for the job.

Now think of it like this: the interviewers made the toughest problems they could when they wanted to hire someone. Now if you are able to solve all of them very easily, this means you are too fit for the job. They would have to create more problems for you more often than for other employees. Then there will be a limit, when they won’t be able to create any more problems that are challenging to you. This would result in you resigning from the job. They know this beforehand and just don’t hire you, to save YOUR precious time.

PS: This is a true story of a professor of mine, who is a recruiter himself.