Wildcard Matching

Dec 11 2017

After solving this leetcode problem using dynamic programming, I had a look at how
others solved it, and this approach caught my eyes. The C++
version is surprisingly neat, possibly due to utilizing the fact NULL is at the end of the char array.

1234567891011121314151617181920212223242526

// reprinted from above linkboolisMatch(constchar *s, constchar *p){constchar* star=NULL;constchar* ss=s;while (*s){//advancing both pointers when (both characters match) or ('?' found in pattern)//note that *p will not advance beyond its lengthif ((*p=='?')||(*p==*s)){s++;p++;continue;}// * found in pattern, track index of *, only advancing pattern pointerif (*p=='*'){star=p++; ss=s;continue;}//current characters didn't match, last pattern pointer was *, current pattern pointer is not *//only advancing pattern pointerif (star){ p = star+1; s=++ss;continue;}//current pattern pointer is not star, last patter pointer was not *//characters do not matchreturnfalse; }//check for remaining characters in patternwhile (*p=='*'){p++;}return !*p;}

The gist of the algorithm is to match empty string with * firstly, and backtrack so that * match one more char every time the matching fails later
on, basically depth-first search if we see how state graph is traversed.

Since I don’t know much about C++, I tried to re-implement it in Java, here is what ended up with. I had to say this Java code is rather ugly, for the
program is littered with arbitrary +1, which makes it hard to assess whether the program is correct or not.

Amazed by the resulting Haskell program, I started to ponder on this transcending difference. In the process, I realized that the two versions are
actually not equivalent. If we draw the state graph, we could see that the two versions don’t cover the same state graph. In order to make the two
programs the same, I had to change the Java program to the following. (Modification is marked with DELTA.)

The modified Java program visited more states, so does the Haskell version, compared with my original Java program. The complexity of three
implementations is all length(str) * length(pattern), but with different constant factor. On the other hand, the Haskell version is too abstract to
reveal any of this low level details. Therefore, the conclusion of this small adventure is rather mundane, abstraction and optimization are often
conflicting with each other.