Working with regular expressions in Excel

Regular expressions can be very useful for validating and manipulating data. Unfortunately there’s no built in support for regular expressions in Excel. You can use VBA but I think that’s a bit too messy.

In my Excel plugin SeoTools, I’ve added four methods:

=RegexpIsMatch(string input; string expr)

Matches an input string with a regular expression and returns true if there’s a match.

=RegexpFind(string input; string expr; int group)

Takes a regular expression and retrieves the matched group from an input string.

=RegexpFindOnUrl(string url; string regexp; int group)

Same as RegexpFind but the input string is instead the contents of a webpage.

=RegexpReplace(string input; string expr; string replacement)

Within a specified input string, replaces all strings that match a specified regular expression with a specified replacement string.

All examples below assume that you have downloaded and added SeoTools to the document you’re working on.

Example1: Validating email addresses with RegexpIsMatch

Let’s say you have a column with a bunch of email addresses and you want to find all invalid addresses. The following formula will return FALSE for all invalid email addresses:

Example2: Find a particular part in a text

Now let’s say you have a column with text where you’d like to extract a particular part from, i.e. “Sports (id:4)”, where we like to extract the “4″ into a new column.

=RegexpFind(A1;”\(id:(\d+)\)”;1)

Example3: Get the number of pages indexed in Google for a set of keywords

For the last example we try some simple web scraping using the regular expression support in SeoTools. In our example we have a set of keywords and want to have the formula return the number of pages a search returns.

With functions in SeoTools that are marked with “(cached)” (such RegexpFindOnUrl) the URL is only fetched once per “session”. So if you have several pieces you want to extract from a webpage the source of the page is only downloaded once even if you reference the URL several times in different formulas.