7.3 Regexp

Regular expressions ("Regexp") are a complicated but powerful tool for pattern matching and text manipulation. Although they are slower than plain text matching, they are far more flexible. With the right pattern, you can filter almost any kind of text out of your source content. If you need to collect data in web development, it's not difficult to use Regexp to retrieve the meaningful parts.

Go has the regexp package, which provides official support for regular expressions. If you've already used them in other programming languages, the package should feel familiar. Note that Go implements the RE2 syntax standard except for \C. For more details, follow this link: http://code.google.com/p/re2/wiki/Syntax.

Go's strings package can already handle many jobs such as searching (Contains, Index), replacing (Replace), and splitting and joining (Split, Join), and it's faster than Regexp. However, these are all simple operations. For something like a case-insensitive search, Regexp is usually the better fit. So, if the strings package is sufficient for your needs, just use it, since it's easy to use and read; if you need to perform more advanced operations, use Regexp.
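As a rough illustration of that trade-off, the sketch below compares a plain substring search with a case-insensitive Regexp search. The helper name containsFold and the sample text are made up for this example; the (?i) flag and regexp.QuoteMeta are part of the standard regexp package.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// containsFold is a hypothetical helper that reports whether s contains
// word, ignoring case. (?i) turns on case-insensitive matching, and
// QuoteMeta escapes any regexp metacharacters in word so that it is
// matched literally.
func containsFold(s, word string) bool {
	re := regexp.MustCompile("(?i)" + regexp.QuoteMeta(word))
	return re.MatchString(s)
}

func main() {
	text := "Build Web Application with GOLANG"

	// A plain substring search is case-sensitive.
	fmt.Println(strings.Contains(text, "golang")) // false

	// The Regexp version ignores case.
	fmt.Println(containsFold(text, "golang")) // true
}
```

Compiling the pattern on every call keeps the sketch short; in real code you would normally compile it once and reuse the resulting Regexp.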

If you recall form validation from previous sections, we used Regexp to verify the validity of user input information. Be aware that all characters are UTF-8. Let's learn more about the Go regexp package!

Match

The regexp package has 3 functions for matching: Match, MatchReader and MatchString. Each returns true if the pattern matches the input and false otherwise.

They differ only in the input source they accept: a slice of bytes, a RuneReader and a string, respectively. If your Regexp has a syntax error, the function returns an error as well.
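Here is a minimal sketch of the three package-level match functions. The helper isDottedQuad and its pattern are assumptions made for this example; the pattern only checks the shape of an IPv4-style address, not that each group is in the 0-255 range.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// isDottedQuad is a hypothetical helper: it reports whether s consists of
// four dot-separated groups of one to three digits. It checks the shape
// only, so "999.1.1.1" is accepted as well.
func isDottedQuad(s string) (bool, error) {
	return regexp.MatchString(`^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$`, s)
}

func main() {
	// MatchString takes a string.
	ok, err := isDottedQuad("192.168.1.1")
	fmt.Println(ok, err) // true <nil>

	// Match takes a slice of bytes.
	ok, err = regexp.Match(`^[0-9]+$`, []byte("12345"))
	fmt.Println(ok, err) // true <nil>

	// MatchReader takes an io.RuneReader; *strings.Reader implements it.
	ok, err = regexp.MatchReader(`^[0-9]+$`, strings.NewReader("12a45"))
	fmt.Println(ok, err) // false <nil>

	// A pattern with a syntax error (unclosed bracket) yields an error.
	_, err = regexp.MatchString(`[0-9`, "123")
	fmt.Println(err != nil) // true
}
```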

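To do more than test for a match, you first create a Regexp value with Compile, CompilePOSIX, MustCompile or MustCompilePOSIX. The difference between CompilePOSIX and Compile is that the former is restricted to POSIX syntax and uses leftmost-longest matching, while the latter uses leftmost-first matching. Both begin at the leftmost position where a match is possible; they differ only in which match they choose at that position. For instance, for the Regexp Go|Golang and the content "Golang", CompilePOSIX returns Golang but Compile returns Go, because Compile prefers the first alternative that matches. The Must variants panic when the Regexp syntax is not correct, whereas the plain variants return an error.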
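The difference in match selection can be sketched as follows. The helper firstMatch and the alternation pattern are chosen purely for illustration; Compile, CompilePOSIX and FindString are the standard regexp calls.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper: it compiles pattern with either
// Compile or CompilePOSIX and returns the first match found in s.
func firstMatch(pattern, s string, posix bool) (string, error) {
	var (
		re  *regexp.Regexp
		err error
	)
	if posix {
		re, err = regexp.CompilePOSIX(pattern)
	} else {
		re, err = regexp.Compile(pattern)
	}
	if err != nil {
		return "", err
	}
	return re.FindString(s), nil
}

func main() {
	// Leftmost-first: the first alternative that matches wins.
	m, _ := firstMatch(`Go|Golang`, "Golang", false)
	fmt.Println(m) // Go

	// Leftmost-longest: the longest match at that position wins.
	m, _ = firstMatch(`Go|Golang`, "Golang", true)
	fmt.Println(m) // Golang

	// The non-Must constructors report a bad pattern as an error;
	// MustCompile would panic on the same input.
	_, err := firstMatch(`[a-z`, "abc", false)
	fmt.Println(err != nil) // true
}
```

MustCompile is usually reserved for patterns that are fixed at compile time, such as package-level variables, where a panic at startup is preferable to a silently ignored error.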

Now that we know how to create a new Regexp, let's see how the methods provided by this struct can help us to operate on content:
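The sketch below shows a few of those methods at work: FindString, FindAllString, FindStringIndex and ReplaceAllString. It is not a complete tour of the API, and the helpers digitRuns and maskDigits are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// digits is compiled once and shared by the helpers below.
var digits = regexp.MustCompile(`[0-9]+`)

// digitRuns is a hypothetical helper that returns every run of digits in s.
// Passing -1 to FindAllString means "return all matches".
func digitRuns(s string) []string {
	return digits.FindAllString(s, -1)
}

// maskDigits is a hypothetical helper that replaces each run of digits
// in s with a single "#".
func maskDigits(s string) string {
	return digits.ReplaceAllString(s, "#")
}

func main() {
	s := "aa09aaa88aaaa"
	letters := regexp.MustCompile(`[a-z]{2,4}`)

	fmt.Println(letters.FindString(s))        // aa
	fmt.Println(letters.FindAllString(s, -1)) // [aa aaa aaaa]
	fmt.Println(letters.FindStringIndex(s))   // [0 2]

	fmt.Println(digitRuns(s))  // [09 88]
	fmt.Println(maskDigits(s)) // aa#aaa#aaaa
}
```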

As we've previously mentioned, Regexp also has 3 methods for matching. They do the exact same thing as the exported functions. In fact, those exported functions actually call these methods under the hood:
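A minimal sketch of the three matching methods on a compiled Regexp follows. The emailLike pattern is a deliberately loose, hypothetical example and not a full e-mail validator; looksLikeEmail is likewise an illustrative helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// emailLike is a loose, illustrative pattern. Compiling it once at package
// level avoids recompiling it on every call.
var emailLike = regexp.MustCompile(`^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$`)

// looksLikeEmail is a hypothetical helper built on the MatchString method.
func looksLikeEmail(s string) bool {
	return emailLike.MatchString(s)
}

func main() {
	// MatchString works on a string.
	fmt.Println(looksLikeEmail("gopher@example.com")) // true

	// Match works on a slice of bytes.
	fmt.Println(emailLike.Match([]byte("not an email"))) // false

	// MatchReader works on an io.RuneReader.
	fmt.Println(emailLike.MatchReader(strings.NewReader("gopher@example.com"))) // true
}
```

Unlike the package-level functions, these methods return only a bool, because any syntax error has already been reported when the Regexp was compiled.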