Sunday, January 31, 2010

I was digging around trying to find what kind of questions Google interviewers ask potential candidates and found some guy story who was asked the following question:

"What is the complexity of searching for existence of all characters from one string into another?"

That guy answered "n*n", and after being rejected was wondering if it was "n*lg(n)". Poor little thing...

IMHO: Rule #1 when you're having an interview (I guess, especially at Google): "never trust the interviewer", it is their job and trade to mislead you and check how you actually think, rather if you know the correct answer.

So, here is how I would answer to that question. (yeah, I'm showing off, but please consider that as me simply exercising)

Okay. First of all the complexity vastly depends on an implementation. I can see at least three of them.

1. A blunt force solution, you just go through every character from the pattern string (p) and check if each of them exist in the text string (t). Strings are essentially arrays and search of any character in them have linear complexity. So that if we have n = |p| and m = |t|, then the overall complexity of the algorithm will be

O(n * m)

2. There might be another solution. If we pre sort the t string with say quicksort algorithm, then we can use the binary search, which has logarithmic complexity and faster than a blunt cell-by-cell search. In this case, considering that the quick-sort complexity is super-linear, the overall complexity will look about like that:

O(m * lg(m) + n * lg(m))

3. One more version. Considering that both strings consist of the same limited number of symbols, we can create two hash-tables, one will contain unique characters from the p string and another one will have characters from string t. As the alphabet is a limited list, finding of the intersection of those two tables will have a constant complexity. Plus we should add the complexity of building the hash-tables, which will be equal to the lengths of the strings. And the overall complexity will be

O(n + m + C) = O(n + m)

Roughly speaking, we can take a look at the case when n = m, then we can say the following about those three algorithms: