The twelfth post in the programming job interview challenge series is out; 31 readers provided answers to job interview challenge #11. I have to admit that I probably failed to explain what I was looking for in challenge #11, because I asked you to provide the best algorithm in both respects: performance and memory. What I really meant is that performance is the most important, but don’t neglect the memory issue. Due to my little “embarrassing failure”, there are two groups of correct answers – the performance oriented and the memory oriented.

The correct answer which I was looking for (best at performance) was given by Alex, the first one to provide a detailed solution (that’s two times in a row):

The most time-efficient way would be to utilize a hashtable. (Or, simply a set if your language supports it)

Let’s take the example of searching for the sum 14 in the list 2, 6, 4, 9, 1, 12, 7. Iterate through the list. For each value x in the list, add it to the hashtable, then check if 14-x exists in the hashtable. If not, go to the next item. If it does, return true (you found your pair). That “add to hashtable before checking” thing is necessary in order to prevent a 14-7=7 situation.

If you reach the end of the list without finding a matching pair, return false.

This algorithm is O(n) time, but also O(n) space.

Those are the other readers who provided such a solution: Kimmen, Dejan Dimitrovski, Alessio Spadaro AS, Guido Domenici, Michael and lucas.

The first one to provide the best solution when it comes to memory consumption was Morgan Cheng:

Here is a time O(n*log(n)) and space O(1) solution:

The first step: sort the N-element list which takes O(n*log(n)) time complexity.

The second step: search for two elements whose sum is M. Since the list is already sorted (smaller to the left and larger to the right), we find one number from left to right, while the other number from right to left. The index of the left number is i, and the index of the right number is j. For each step, we calculate delta = M – Array[i] and compare it to Array[j]. If delta equals Array[j], we can return true; if delta < Array[j], we decrement j and continue; if delta > Array[j], we increment i and continue. When i >= j, we break and return false.

Those are the other readers who provided such a solution: Mark R, Maneesh, Christof Jans, Dan Sydner, Alessio Spadaro AS, Michael Mrozek and Michael Dikman.

What is wrong with this code and how would you fix it? There are no dirty tricks here; don’t try to catch me on a typo or something like that – you can assume that this code compiles. The correct answer is about an important issue which was neglected in the above sample code. Don’t catch me on the fact that I directly expose a member variable i, for instance. This is just a little sample code where a basic concept is violated.

Get updates by RSS to catch up with the next posts in this series and to get the correct answer for today’s question. As always you may post the solution in your blog or comment. Comments with answers will be approved only next week.

I have to “complain” about the hashtable solution… The hashtable solution is NOT O(N).

The optimal cost for retrieving an item from a hashtable is O(log(n)). And since we do this (worst case) N times, we still have to spend N*log(N) operations to achieve the same result as the sort-and-compare version.

This also ignores the cost for generating the hashtable. Just because it is one line of code, it does not mean it is one operation…

So after all, you have the “same cost” as sorting and finding, you just have an overhead of:
– Creating a hashtable
– O(2N) extra memory to store the data + hashes

So I think the solution might “look” smarter, but you are actually doing the same thing here, with the only difference that the hash solution would have to do N-1 comparisons in the worst case, while the sorting one will stop after N/2 comparisons.

“The optimal cost for retrieving an item from a hashtable is O(log(n))” – This would be true only if the values in the list are not unique, which is not the case. In our case, it would definitely be O(1).

Trinity said on July 14, 2008:

Hi,
I think the problem is with the line

(*i)++;

inside the incr function of the Unmanaged class. I assume that we are trying to increment the value held in the variable pointed to by i. But because it is a single reference it wouldn’t refer to the intended memory location and hence would not increment the integer at the right memory location. The way to fix this would be to declare i, the parameter to the incr function, as a pointer to a pointer.

A “normal” hashtable is usually implemented as a “balanced binary tree”. So in order to retrieve an element, you would have to do log(n) comparisons. If we have unique values, we just don’t have to deal with more than “1 item per bucket”…

If you want to have a retrieval with a speed of O(1), then you would have to allocate Int.Max slots.

Please explain to me how you would locate the item “2354357” in an O(1) operation if there are 4 billion possible items, but only 10 are located in your list…

By creating the hashtable you efficiently “sort the list”. The final step of locating the final element is just not as efficient as traversing the sorted list from both sides.

Another issue with the hashtable is that the context switching erases any gains in algorithmic efficiency (at least in Java). For instance, up until 10K records, both solutions run dead even. At 100K, the tight loop solution pulls ahead. At 1M it’s an order of magnitude. At 10M, it’s not even close (like 1500ms vs 23000ms on my dual-CPU machine). I don’t have enough memory to go beyond that with the hashtable solution. So for all practical purposes, the tight loop wins in performance and memory.

Aw man, and I was on a winning streak! Unfortunately my only exposure to C++ is a class I took back in 2003, so I don’t really have a shot at this one. The only things I notice out of hand, I’m afraid, will fall into the “nitpicky” category. Those two things being: the main method isn’t wrapped in a class, so I don’t know out-of-hand if it’s managed code or not, and the fact that you’re using unmanaged code to change state in managed code. Even if that’s legal, it just feels wrong.

The problem with the code is that the address of a managed object member is passed to unmanaged code without pinning the memory pointer first; specifically, in this line:

pUnmngd->incr(&(pMngdClass->i));

the &(pMngdClass->i) yields the address of a member of a managed object.

As the code stands, the value at the address may change as a result of the garbage collector reorganizing objects on the managed heaps, resulting in various Bad Things(tm) like access violations when the incr method of the UnmanagedClass attempts to dereference the pointer. Pinning the pointer will keep the .NET garbage collector from moving the object until the address is unpinned, preventing these Bad Things.

Fixing it requires pinning the pointer before passing it to the incr method:

pin_ptr<int> p = &(pMngdClass->i);
pUnmngd->incr( ( int * )p );

The pin will remain in effect (and the pMngdClass remains immobile in memory) until the pin_ptr goes out-of-scope.

I think the problem with challenge #12 is that the managed class instance is under control of the .net memory manager and thus it can be moved around in memory whenever the .net memory manager feels like it.
The managed instance should probably be “pinned” to a specific location in memory before it is passed to the unmanaged method call.

I think Heiko’s confusion arose from ambiguity around hashtables and dictionaries. Many people use the two terms interchangeably, but do so incorrectly. A dictionary can be implemented as a hashtable or binary tree (or anything else really that supports keyed lookup), but a hashtable refers to a very specific implementation.

Well, I guess the value incremented in UnmanagedClass will not be saved anywhere.
(*i) retrieves the value,
then (*i)++ increments it, but it is not saved anywhere, thus the managed i will still be the same.

>> “That “add to hashtable before checking” thing is necessary in order to prevent a 14-7=7 situation.” <<

>> “This algorithm is O(n) time, but also O(n) space.” <<

This is not exactly true. It’s either O(n) time OR O(n) space.

To achieve O(n) time, we’d need a perfect hash function: one that would always point a value to a distinct bucket. With no pre-knowledge of the values in the list (or even the size of the list), the only way you could do this is with m buckets and hash(n) = n. This would reduce the algorithm to Shams Mahmood’s boolean table (see last week’s comments). Hence O(n) time, but O(m) space.

To achieve O(n) space, we would need a 100% fill rate. Again lacking a perfect hash function, when allowed only n buckets, some collisions will occur, requiring a linear search. Taking the extreme of the completely imperfect hash function, hash(n) = 1, every lookup would itself be O(N), making the entire task O(N*N).

Finding the balance between the two is a bit tricky: normally space is cheaper than time, so the first would be preferred. However, there’s a practical limit on the size of n, while there isn’t one on m (i.e. N={1,10000000}, m=10000001 would need an awfully big boolean table).

hmmmm… It seems that the editor here swallowed my first comment in the above post. Let’s try it again:

[[That “add to hashtable before checking” thing is necessary in order to prevent a 14-7=7 situation.]]

I’m not sure what problem he’s trying to avoid. The only problem I see is actually caused by this technique. If you are looking for 14, and have a 7, if you add it first, and then check for 14-7, you’ll find the 7 and think you have two. For this to work, you’d need to look first, and THEN add the number to the hash.