Converting Wildcards to Regexes

Introduction

Ever wondered how to do wildcards in .NET? It's not hard: all you have to do is use regular expressions. It isn't obvious either, though; I had to dig around for a while to figure out how to do it properly.

Even though regexes are a lot more powerful, wildcards are still good in situations where you can't expect the user to know or learn the cryptic syntax of regexes. The most obvious example is in the file search functionality of practically all OSs -- there aren't many that don't accept wildcards. I personally need wildcards to handle the HttpHandlers tag in web.config files.

Note: This method is good enough for most uses, but if you need every ounce of performance with wildcards, here is a good place to start.

Using the Code

There are three steps to converting a wildcard to a regex:

Escape the pattern to make it regex-safe. Wildcards use only * and ?, so the rest of the text has to be converted to literals.

Once escaped, * becomes \* and ? becomes \?, so we have to convert \* and \? to their regex equivalents: .* (any run of characters) and . (any single character), respectively.

Prepend ^ and append $ to specify the beginning and end of the pattern.
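The three steps above can be sketched in a few lines of C#. This is a minimal illustration rather than a definitive implementation; the class name WildcardSketch and the method name WildcardToRegex are chosen here for the example, while Regex.Escape and String.Replace are the standard .NET calls.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardSketch
{
    // Illustrative helper: converts a wildcard pattern into an anchored regex string.
    public static string WildcardToRegex(string pattern)
    {
        // Step 1: escape the whole pattern so every character is a literal.
        string escaped = Regex.Escape(pattern);

        // Step 2: Regex.Escape turned * into \* and ? into \?,
        // so map those back to their regex equivalents.
        string converted = escaped.Replace("\\*", ".*").Replace("\\?", ".");

        // Step 3: anchor the pattern so it must match the entire input.
        return "^" + converted + "$";
    }
}
```

With this sketch, Regex.IsMatch("abc.dll", WildcardSketch.WildcardToRegex("*.dll")) is true, while the same pattern rejects "abc.dll.tmp" because of the trailing $.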

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Comments and Discussions

According to Code Project, the license for this code is unspecified. Can you declare what license you would like this code to be considered under? We have a requirement to know the license for all code used in our projects.

To find in Word: Open Word 2010 and paste the text. On the Home tab at the far right, select the Find drop-down and choose Advanced Find. In the resulting dialog, type "s*a". Click the More button and check Use wildcards. Click Find Next. It will find "she sells sea" as the first match.

The regex pattern generated for s*a is "^s.*a$"

If you test that regex pattern, it comes back with 0 matches.

The current regex pattern looks like it will only match when 's' is at the beginning of the string or line and 'a' is at the end of the string or line. I'm not too good with regex and could use a solution that would find the pattern anywhere in the string. I've tried a few modifications to the existing regex pattern, but haven't reached the desired result yet.

**** Update **** Found what I was looking for. Changing the code to the following did the trick:

Removed '^' and '$', which require the match to span from the beginning to the end of the string or line. Changed ".*" to ".*?", turning the greedy quantifier into a lazy one. After that, it will still come back with only 2 matches, because the matches it returns cannot overlap. To compensate for that, you could search the string multiple times, bumping the start point of the search each time, like below:
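The commenter's code did not survive, so the following is only a sketch of the approach described, under stated assumptions: the names WildcardSearchSketch, WildcardToSearchRegex, and FindAllFromEveryStart are hypothetical, and the sample sentence in the note after the code is a guess at a suitable test text, not necessarily the one used in Word. It relies on the instance method Regex.Match(string, int), which starts searching at a given index.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WildcardSearchSketch
{
    // Unanchored, lazy variant: no ^ or $, and * becomes .*? instead of .*
    public static string WildcardToSearchRegex(string pattern)
    {
        return Regex.Escape(pattern).Replace("\\*", ".*?").Replace("\\?", ".");
    }

    // Collects matches that may overlap by restarting the search
    // one character after the start of the previous match.
    public static List<string> FindAllFromEveryStart(string input, string wildcard)
    {
        var regex = new Regex(WildcardToSearchRegex(wildcard));
        var results = new List<string>();
        int start = 0;

        while (start <= input.Length)
        {
            Match m = regex.Match(input, start);
            if (!m.Success)
            {
                break;
            }

            results.Add(m.Value);
            start = m.Index + 1; // bump the start point past this match's first character
        }

        return results;
    }
}
```

On the assumed sample "she sells sea shells by the sea shore" with the wildcard s*a, a plain Regex.Matches call on the lazy pattern yields two non-overlapping matches, while the loop above also reports the shorter matches that begin inside them, starting with "she sells sea".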

Thanks for the code. I liked it, especially the fact that it was derived from Regex. I use the Regex static methods a lot, so I added these methods to your Wildcard class to make its interface match .NET's Regex class more closely. Here are these methods:
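The methods posted with this comment are missing, so what follows is a hedged reconstruction of the idea, not the commenter's code. It assumes a Wildcard class that derives from Regex and converts the pattern in its constructor; the static IsMatch overloads are illustrative and simply forward to the real Regex.IsMatch static methods.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed shape of the article's class: a Regex subclass that accepts wildcards.
public class Wildcard : Regex
{
    public Wildcard(string pattern)
        : base(WildcardToRegex(pattern)) { }

    public Wildcard(string pattern, RegexOptions options)
        : base(WildcardToRegex(pattern), options) { }

    public static string WildcardToRegex(string pattern)
    {
        return "^" + Regex.Escape(pattern).Replace("\\*", ".*").Replace("\\?", ".") + "$";
    }

    // Static helpers mirroring Regex.IsMatch(string, string) and its options overload.
    // "new" is needed because they hide the inherited static methods of the same signature.
    public static new bool IsMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, WildcardToRegex(pattern));
    }

    public static new bool IsMatch(string input, string pattern, RegexOptions options)
    {
        return Regex.IsMatch(input, WildcardToRegex(pattern), options);
    }
}
```

A call such as Wildcard.IsMatch("ABC.DLL", "*.dll", RegexOptions.IgnoreCase) then works without creating an instance, just as with Regex.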

I have released a new version of the RegEx Tester tool. You can download it free from http://www.codeproject.com/KB/string/regextester.aspx and http://sourceforge.net/projects/regextester

With RegEx Tester you can fully develop and test your regular expression against a target text. Its UI is designed to aid you in developing regexes. It uses and supports ALL of the features available in the .NET Regex class.

I think that using regular expressions is better in some cases because you can declare your intentions exactly. For example, sometimes I want to check whether the pattern matches the whole text, and in that case you can use Exact. (*pattern* is not the same as Exact; with Exact I force the pattern to match from start to end by adding ^ and $.) But you are probably right about StartsWith and EndsWith; they are not very useful.

Let's say the input string is a*bcdef. The wildcard pattern I want to use is a\**d, and I want it to match a*bcd. The current code gives incorrect results. Can you suggest a way to make it work when the input string contains metacharacters?
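One possible way to support this, sketched here as an assumption rather than as the article's method, is to walk the pattern one character at a time and treat a backslash before * or ? as "take the next character literally". The class and method names and the anchored parameter are hypothetical. The question asks for a match inside a longer string, so the example searches without anchors.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class EscapedWildcardSketch
{
    // Converts a wildcard pattern in which \* and \? stand for a literal * and ?.
    public static string WildcardToRegex(string pattern, bool anchored)
    {
        var sb = new StringBuilder();

        for (int i = 0; i < pattern.Length; i++)
        {
            char c = pattern[i];
            bool escapesWildcard = c == '\\'
                && i + 1 < pattern.Length
                && (pattern[i + 1] == '*' || pattern[i + 1] == '?');

            if (escapesWildcard)
            {
                // Emit the wildcard character as a regex literal and skip it.
                sb.Append(Regex.Escape(pattern[i + 1].ToString()));
                i++;
            }
            else if (c == '*')
            {
                sb.Append(".*");
            }
            else if (c == '?')
            {
                sb.Append('.');
            }
            else
            {
                // Any other character, including a lone backslash, is a literal.
                sb.Append(Regex.Escape(c.ToString()));
            }
        }

        return anchored ? "^" + sb.ToString() + "$" : sb.ToString();
    }
}
```

For the pattern a\**d this produces the regex a\*.*d, and Regex.Match("a*bcdef", ...) with that regex returns a*bcd.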

You may wish to consider adding "$" to the end of the regex to get behavior that fully matches normal wildcard searching (and maybe "^" at the beginning as well). As presented, a pattern of "*.dll" will find files named "abc.dll.tmp", for example. Hmmmm ... I wonder why I have files of that form in my WINNT\System32 directory... TBD. Otherwise, a neat trick. I'd like to see more short, useful items on CodeProject.

Inquiring minds might like to know that performing the equivalent file name pattern matching using the VB.NET Like operator does the same match in 60 per cent of the time. Regexes are very powerful, but not without a cost. I assumed that there was a cost to using Regex for simple matches, so I got inspired to try it out. Do each over 2700 file names (about 1300 matches) 100 times, throw out the top and bottom 10 scores -- Like wins by 40 per cent.

Average times (ignoring case in both, compiling the Regex on initiation of the wildcard class):
Like = .0083 seconds
Regex = .0132 seconds

Either one gets through over 2700 comparisons very quickly.

Actually, I did use RegexOptions.Compiled. I was just idly curious, and had been for a while, about quantifying the difference. Seeing your article just prompted me to do that. Bottom line -- either approach is plenty fast enough for limited use. However, if you are not using Regex otherwise, why pay for the additional DLL loads? Each method has its place, but Regex will do so much more, if you need that functionality.