Monday, August 4, 2014

Leetcode: Implement strStr()

Implement strStr().

Returns a pointer to the first occurrence of needle in haystack, or null if needle is not part of haystack.

Analysis:As usual, before we come out a solution, we must understand the question very well. Ask questions if there is any ambiguities in it. In this question, we are actually asked for finding a substring from a string. The question looks simple at first glance but very tricky. First of all, let's look at the return value. The question asks for returning a pointer to the first occurrence of needle in haystack. And the method of the question defines a return value as a string. Therefore, we can return a substring(i) where i points to the first occurrence of the substring. For instance, for haystack = "abcde", needle = "cd", the return value is "cde"Then we can take several examples. The crux of the question is to handle boundary condition well. Let's take a look at it from null string. 1. If either of the string null. We should return null. Note that you should always check null string before calling length() else it will return NullPointerException. 2. If both strings are empty string, we should return empty string instead of null. That is because empty string is substring of any string

3. If haystack is empty, while needle string is not, we should return null. To make it more generic, if length of haystack is less than needle, it must return null. 4. If haystack is not empty while needle is empty, we should return haystack. Again, that is because empty string is a substring of all strings. Therefore, we may merge the condition check 2 and 4 into one together, i.e, only check if needle is empty, if yes, return haystack.

Solution:Use two pointers, one points i, the haystack string, while the other points j, to the need string. For each i, we check if charAt(i) == charAt(j) and its subsequent characters in j matches. Note that you should check if i is overflowed. For instance, in the condition haystack = "abcde", while needle = "efg", the first match happens at end of the haystack string, when we increase the pointer of i and j, we should make sure i is not overflow. Code (Java):

Discussion:For haystack length is n and needle length m, the worst case time complexity is O(mn). That is because for each character in haystack, we should check every character in needle until we find a match. In the worst case there is no matched substring, so the complexity in worst case is O(mn). Since we did not allocate additional space the space complexity is constant.

Summary:This question is very simple but quite error-prone. It is actually very hard to write a bug-free implementation at once. The crux is you should consider every possible conditions and check. The rule of thumb for string problems is first to check if the input string(s) is null, and consider the return values. In most cases, null string is the most unacceptable string because it points to a null address. Then check if it is an empty string and consider the return value. Take care if the string if is overflowed and make a condition to check. Update on 11/19/2014:

Implement strStr().

Returns the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Update (2014-11-02):
The signature of the function had been updated to return the index instead of the pointer. If you still see your function signature returns a char * or String, please click the reload button to reset your code definition.