License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Well, as direct as I could come up with anyway. Makes use of unsafe to enable pointer arithmetic. Unfortunately, because fixed is required to prevent the GC from moving the pointers, I had to change it to use incrementing indexers instead of directly manipulating the pointers. Alternatively, you could use stackalloc to instantiate two native char buffers and copy the values, but that seems contrary to this function's goals of a low memory footprint and high performance.

Has been tested against every test case presented in the comments section as well as some additional cases I threw in.

I am using this in Artistic Style, a popular multi-platform code formatter available at SourceForge.

http://astyle.sourceforge.net/

Release 1.22 added directory recursion to the project. Wildcard processing was made internal to the program. Linux has a glob function but Windows doesn't. I just used this for both of them. It let me process both platforms in a similar manner.

A minor change was made for Windows to make the comparison case insensitive. Linux was left case sensitive.

Thanks for making it available. Using this was a lot easier than writing my own. I doubt that mine would have been this sophisticated.

Boy do I feel stupid. I worked on an algorithm like this for days, and never got it quite right. Then, I see the wonderful, and simplistic work of someone like this, and it reminds me that sometimes we all are guilty of 'over-engineering'...

Ignore my last email.
As usual, the problem sits in front of the screen.
(I mixed a project built with multibyte chars with this code, which used only plain chars. And of course I used an umlaut instead of 'ue' in my tests. So no wonder it crashed after the '?'.)
I'm very sorry!

I got the overall flow of the program, but I didn't completely get the logic of the second loop. I understand that the second loop checks whether there is nothing after the *; if so, it is a match, but if there is something, it stores the positions in the two pointers and then goes on.
Also, the final else goes like this:
else
{
wild = mp;
string = cp++;
}
I'm sorry, but I'm not getting the logic totally.
Can someone please explain?

In the second loop it looks for the first character after the asterisk that is the same in the string. At first it matches "*" against "ab". mp = ".abc" during this. Now wild = ".abc" and string = ".de.abc". Obviously no match. On the next loop the first characters do match (both '.') and wild becomes "abc" and string "de.abc". On the next loop there is no match and it falls to the else. Here it resets wild to the last mp (mask pattern??) and string to the last cp (character pattern) WITHOUT THE FIRST CHARACTER. (It actually advances cp one position.)

Why does it do this? After matching the * against part of the string and encountering a possible position at which to match the remainder of the pattern, it continued comparing characters from both to each other. This failed. Since there was a * right before the position of mp, it is still allowed to add characters to the part that is matched against that *. Basically, it goes back to that position but decides that the character that occurs in both strings is not the next character in the pattern but part of the '*' wildcard.
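
For anyone following along, here is a sketch of the matcher as I understand it from the fragments quoted in this thread. The names wild, string, mp and cp are the ones used above; the first and last loops and all the comments are my own reconstruction, so treat it as an illustration rather than the article's exact listing.

```c
/* Returns 1 if `string` matches the pattern `wild`, 0 otherwise.
   '?' matches any single character, '*' matches any run (possibly empty). */
int wildcmp(const char *wild, const char *string)
{
    const char *cp = 0;  /* where to resume in string after a failed attempt */
    const char *mp = 0;  /* pattern position just after the most recent '*'  */

    /* Loop 1: match literal characters and '?' up to the first '*'. */
    while (*string && *wild != '*') {
        if (*wild != *string && *wild != '?')
            return 0;
        wild++;
        string++;
    }

    /* Loop 2: only entered with *wild == '*', so mp and cp are always
       set before the final else can use them. */
    while (*string) {
        if (*wild == '*') {
            if (!*++wild)
                return 1;          /* trailing '*' swallows the rest    */
            mp = wild;             /* remember the pattern after '*'    */
            cp = string + 1;       /* next place to retry in the string */
        } else if (*wild == *string || *wild == '?') {
            wild++;
            string++;
        } else {
            wild = mp;             /* mismatch: rewind the pattern ...     */
            string = cp++;         /* ... and let '*' absorb one more char */
        }
    }

    /* Loop 3: the string is exhausted; only trailing '*' may remain. */
    while (*wild == '*')
        wild++;
    return !*wild;
}
```

With this sketch, wildcmp("*.abc", "ab.de.abc") returns 1: the first '.' starts a partial match that fails at 'd', the else branch rewinds to mp, and the second '.' then matches through to the end.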

- put the whole shebang into a class with public static methods
- fixed a bug where the pattern '?' matches all strings
- added an early-exit test for patterns that don't actually contain wildcards so it just defaults to normal string comparison

// if we have reached the end of the pattern without finding a * wildcard,
// the match must fail if the string is longer or shorter than the pattern
if (j == pattern.Length)
    return s.Length == pattern.Length;

Going into the final "if" line shown here, the maximum value that j may have is (pattern.length-1), due to the first "if" test. Then we see (j++) compared. But, the value of (j++) is the value of "j" BEFORE being incremented and thus is a maximum of (pattern.length-1) and is therefore NEVER >= pattern.length. Only after the if test is completed is j actually incremented.
So the following return is never taken.

Perhaps it can be fixed by changing j++ to ++j... but I can't tell that until I complete the analysis.
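
To make the j++ versus ++j point concrete, here is a small C sketch (the same rule holds in C#). The two function names are hypothetical stand-ins for the comparison being discussed, not code from the port itself, and whether ++j is actually the right fix for the matcher is a separate question.

```c
#include <stddef.h>

/* Hypothetical stand-in: post-increment compares the OLD value of j. */
int reached_end_post(size_t j, size_t len)
{
    return j++ >= len;
}

/* Hypothetical stand-in: pre-increment compares the NEW value of j. */
int reached_end_pre(size_t j, size_t len)
{
    return ++j >= len;
}
```

With j at its maximum of len - 1, the post-increment form yields 0 (the branch is never taken), while the pre-increment form yields 1.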

On a slightly different topic, I will state my opinion as a professional programmer. This demonstrates the extreme importance of EXTENSIVE COMMENTS in code explaining NOT what the code does, but "what the code is supposed to do" in each section. If such comments were in place, this would be an easy maintenance fix. Without them, I am having to analyze what the code DOES and, from that, try to discern what the programmer INTENDED the code to do. And I have to consider all the possible wildcard permutations just like the original programmer did. I essentially have to reinvent the wheel... because the user manual is missing.

Everyone, especially gurus, should put extensive comments in their code on "what it is intended to do". The only downside is lack of job security, because now someone other than you can fix the code. If you have that low an opinion of your worth to your employer, and are also lacking all compassion for others, then don't comment your code.

David Patrick wrote: most C compare functions return zero when the values are equal, but this function returns non-zero.

You make a good point. I probably should have made it behave like the strcmp() type functions. I'm a bit afraid to change it at this point since it has been posted for so long. It should be an easy fix for you or anyone else who is used to C style string comparisons. The C++ people here probably like the current behavior I would imagine.

-Jack

There are 10 types of people in this world, those that understand binary and those who don't.

I disagree. The return value for strcmp() is more than simply a test for equality, it tells you which string is greater than the other. A zero return value for strcmp() makes sense, but not for wildcmp() since the return value is strictly boolean, match or no match. The current implementation is fine (although some people might be picky about the return type, int vs bool). Perhaps to avoid confusion with string comparison functions, the function should be renamed to wildmatch() or something similar.

This is really nice and useful code. I used to write something similar, but your example is simpler and shorter.
Because it lacks comments, I spent some time understanding it (before I saw the comment from Targys, a real tutorial), and it is clear now. Thanks to both of you!

To 'wise' guys, flamers, and other people who have nothing to do besides arguing:
- If the code has a bug, report but don't pretend you are a genius or a guru. If you can do it better, submit an article.
If you don't like the code, don't use it!

And about NULL pointers:
Idiot-proofing should be implemented at the level where data (function arguments) is acquired and prepared, not in such low-level function.
Besides that, I tested several functions from string.h with NULL parameters and every single one threw an exception. No further comments...

Maybe I posted too soon. I didn't think there was a way for cp to not equal string+1. But, after thinking about it some more I found a pattern type that would:

*???c*

It's interesting how the loop keeps shifting back and forth with this type of pattern.

However, using a test string of "testing" with the above pattern the match still failed (correctly) using both algorithms. But, there easily may be a pattern and string combination that wouldn't work without cp.

Just a thought:
the PathMatchSpec SHLWAPI function could provide similar functionality. I guess it does have some differences (e.g. allowing you to specify multiple specs, separated by semicolons), but it might be a simple alternative for many similar tasks.

we are here to help each other get through this thing, whatever it is Vonnegut jr.

peterchen wrote: Just a thought:
the PathMatchSpec SHLWAPI function could provide similar functionality. I guess it does have some differences (e.g. allowing you to specify multiple specs, separated by semicolons), but it might be a simple alternative for many similar tasks.

The wildcmp() function is meant to be lightweight and fast.

If the extra functionality of multiple specs is needed and you don't want to parse the input yourself, then you can go ahead and use the PathMatchSpec() API.

Just make sure you don't mind these limitations:

1. Adding another dependency to your executable by including the lib
2. Not portable (wildcmp() compiles fine under unix)
3. More memory overhead (larger code footprint)
4. The horrible slowness

I have run some benchmarks and pasted the results below. I can provide the .cpp file for the benchmarks if anyone is interested.

-Jack

10MM iterations.
Compiled as a console app using VC6 in release mode with /O2 optimization.
Ran on a pM 1.7 GHz.

The Windows Explorer search is not a straight wildcard match. It essentially adds *'s to either end of your input string, so "b?r" matches "foobar.txt". I take a more literal approach. This function does not presume to be smarter than the caller. It is not only meant for files; it is also very useful for checking hostmasks, for example.
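
If a caller does want the Explorer-style "contains" behavior described above, one option is to wrap the input in *'s before calling the matcher. The helper below is only a sketch; wrap_in_stars is a hypothetical name, not part of the article's code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes "*<input>*" into out.
   Returns 1 on success, 0 if out_size is too small (output truncated). */
int wrap_in_stars(const char *input, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "*%s*", input);
    return n >= 0 && (size_t)n < out_size;
}
```

Wrapping "b?r" this way gives "*b?r*", which a literal matcher would then accept for "foobar.txt", while the unwrapped "b?r" would still be rejected.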

If you think my function is producing bad results, can you paste an example of the string along with the wildcard?

Thanks,

Jack

Be careful when using toupper(): some of the CRT variations of this function _only_ work when the input is known to be lowercase. For example, the return value is invalid for _toupper('A'), and for some implementations of toupper('A') as well. Check the documentation.
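
For a case-insensitive comparison, a minimal sketch using the standard toupper() from <ctype.h> might look like this. The helper name chars_equal_nocase is hypothetical. The cast to unsigned char matters because passing a negative char value (an umlaut on a platform with signed char, for instance) to toupper() is undefined behavior.

```c
#include <ctype.h>

/* Hypothetical helper: compare two characters ignoring case.
   Standard toupper() returns its argument unchanged when there is no
   uppercase mapping, as long as the value fits in unsigned char (or is EOF). */
int chars_equal_nocase(char a, char b)
{
    return toupper((unsigned char)a) == toupper((unsigned char)b);
}
```

A case-insensitive matcher could use this in place of the plain character comparison; results for characters outside ASCII depend on the current locale.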