Introduction

The .NET Framework provides a powerful class, Regex, for creating and using regular expressions. While regular expressions are a powerful way to parse, edit, and replace text, their complex syntax makes them hard to understand and prone to errors, even for experienced users. When developing code that uses regular expressions, I have found it very helpful to create and debug my expressions in a separate tool to avoid time-consuming compile/debug cycles. Expresso enables me to do the following:

Build complex regular expressions by selecting components from a palette

Test expressions against real or sample input data

Display all matches in a tree structure showing captured groups, and all captures within a group

Build replacement strings and test the match and replace functionality

Highlight matched text in the input data

Automatically test for syntax errors

Generate Visual Basic or C# code that can be incorporated directly into programs

Read or save regular expressions and input data

Background

Regular expressions are a sophisticated generalization of the "wildcard" syntax that users of Unix, MS-DOS, Perl, AWK, and other systems are already familiar with. For example, in MS-DOS, one can say:

dir *.exe

to list all of the files with the exe extension. Here the asterisk is a wildcard that matches any character string and the period is a literal that matches only a period. For decades, much more complex systems have been used to great advantage whenever it is necessary to search text for complex patterns in order to extract or edit parts of that text. The .NET Framework provides a class called Regex that can be used to do search and replace operations using regular expressions.

In .NET, for example, suppose we want to find all the words in a text string. The expression \w will match any alphanumeric character (and also the underscore). The asterisk can be appended to \w to match an arbitrary number of repetitions of \w, so \w* matches words of arbitrary length made up only of alphanumeric characters and underscores. Because the asterisk also allows zero repetitions, \w* can match the empty string as well; \w+ (one or more repetitions) is the usual choice when only non-empty words are wanted.
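To make the word-matching example concrete, here is a minimal C# sketch. The class and method names (WordFinder, FindAll) are made up for illustration and are not part of Expresso or the Regex class; only Regex.Matches and the Match type come from the framework.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Expresso or the Regex class.
public static class WordFinder
{
    // Returns the text of every match of 'pattern' in 'input', in order.
    public static string[] FindAll(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Calling WordFinder.FindAll(@"\w+", "Hello, regex_world 42!") returns the three words Hello, regex_world, and 42. Running the same input through \w* also returns empty strings at the positions where no word character is found, which is why \w+ is usually the better fit for this task.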

Expresso provides a toolbox with which one can build regular expressions using a set of tab pages from which any of the syntactical elements can be selected. After building an expression, sample data can be read or entered manually, and the regular expression can be run against that data. The results of the search are then displayed, showing the hierarchy of named groups that Regex supports. The tool also allows testing of replacement strings and generation of code to be inserted directly into a C# or Visual Basic .NET program.

The purpose of this article is not to give a tutorial on the use of regular expressions, but rather to provide a tool for both experienced and novice users of regular expressions. Much of the complex behavior of regular expressions can be learned by experimenting with Expresso.

The reader may also find it helpful to explore some of the code within Expresso to see examples of the use of regular expressions and the Regex class.

Using Expresso to Build and Test Regular Expressions on Sample Input Data

To use Expresso, download and run the executable file, which requires the .NET Framework. If you want to explore the code, download and extract the source files, open the solution within Visual Studio .NET, then compile and run the program. Expresso will start with sample data preloaded into the "Input Data" box (refer to the figure above). Create your own regular expression or select one of the examples from the list box. Click "Find Matches" to test the regular expression. The matches are shown in a tree structure in the "Results" box. Multiple groups and captures can be displayed by expanding the plus sign as shown above.

To begin experimenting with your own regular expressions, click the "Show Builder" button. It will display a set of tab pages as shown here:

In this example, the expression \P{IsGreek}{4,7}? has been generated by setting options that specify the following: match a run of four to seven characters, none of which is in the Greek Unicode block, using as few repetitions as possible. Clicking the "Insert" button inserts this expression into the text box that holds the regular expression being constructed, which is the expression used by the "Find Matches" and "Replace" buttons. Using the other tab pages, all of the syntactical elements of .NET regular expressions can be tested.
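The effect of the trailing question mark is easiest to see by comparing the lazy and greedy forms side by side. The sketch below is only an illustration; LazyQuantifierDemo and FirstMatch are hypothetical names, and the sample input is my own.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Expresso.
public static class LazyQuantifierDemo
{
    // Returns the text of the first match, or an empty string if there is none.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }
}
```

With the input "\u03B1\u03B2\u03B3abcdefghij" (three Greek letters followed by ten Latin letters), FirstMatch(@"\P{IsGreek}{4,7}?", input) returns "abcd", the shortest run that satisfies the quantifier. Dropping the question mark, FirstMatch(@"\P{IsGreek}{4,7}", input) returns "abcdefg", the longest run allowed. In both cases the Greek letters at the start are skipped because \P{IsGreek} excludes them.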

The Regex class supports a number of options such as ignoring the case of characters. These options can be specified using check boxes on the main form of the application.
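In code, these options correspond to values of the RegexOptions enumeration passed to the Regex constructor. The following sketch shows the effect of one of them, RegexOptions.IgnoreCase; the class and method names (OptionsDemo, CountMatches) are hypothetical and exist only for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Expresso.
public static class OptionsDemo
{
    // Counts the matches of 'pattern' in 'input' under the given options.
    public static int CountMatches(string pattern, string input, RegexOptions options)
    {
        Regex regex = new Regex(pattern, options);
        return regex.Matches(input).Count;
    }
}
```

For the input "Expresso, EXPRESSO, expresso" and the pattern expresso, CountMatches returns 1 with RegexOptions.None and 3 with RegexOptions.IgnoreCase. Several options can be combined with the | operator, for example RegexOptions.IgnoreCase | RegexOptions.Multiline.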

Replacement strings may be built using the various expressions found on the "Substitutions" tab. To test a replacement pattern, enter a regular expression, a replacement string, some input data, and then click the "Replace" button. The output will be shown in the "Results" box.
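As a rough idea of what a match-and-replace looks like in code, the sketch below uses the ${name} substitution syntax to refer to named groups in the replacement string. The pattern, the sample data, and the names ReplaceDemo and SwapNames are my own assumptions for illustration; only Regex.Replace is the framework's.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Expresso.
public static class ReplaceDemo
{
    // Rewrites each "First Last" pair as "Last, First" using named-group substitutions.
    public static string SwapNames(string input)
    {
        string pattern = @"(?<First>\w+) (?<Last>\w+)";
        string replacement = "${Last}, ${First}";
        return Regex.Replace(input, pattern, replacement);
    }
}
```

ReplaceDemo.SwapNames("John Smith; Jane Doe") returns "Smith, John; Doe, Jane". Text that is not part of a match, such as the semicolon and the space after it, is copied to the output unchanged.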

Using Expresso to Generate Code

Once a regular expression has been thoroughly debugged, code can be generated by selecting the appropriate option in the "Code" menu. For example, to generate code for a regular expression to find dates, create the following regular expression (or select this example from the drop-down list):

(?<Month>\d{1,2})/(?<Day>\d{1,2})/(?<Year>(?:\d{4}|\d{2}))

Selecting "Make C# Code" from the "Code" menu generates the following code. It can be saved to a file or copied and pasted into a C# program to create a Regex object that encapsulates this particular regular expression. Note that the regular expression itself and all the selected options are built into the constructor for the new object:
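The generated listing is not shown here. As a rough, hand-written stand-in (not Expresso's actual output), the sketch below shows the general shape of such code: a Regex object whose constructor carries both the pattern and the options. The names DateRegexSketch, DateRegex, and Describe are hypothetical, and RegexOptions.None is an assumed placeholder for whatever options were checked on the main form.

```csharp
using System.Text.RegularExpressions;

// Hand-written illustration; this is NOT the code that Expresso generates.
public static class DateRegexSketch
{
    // The pattern and the options are both passed to the constructor.
    // RegexOptions.None is an assumption; substitute the options you selected.
    public static readonly Regex DateRegex = new Regex(
        @"(?<Month>\d{1,2})/(?<Day>\d{1,2})/(?<Year>(?:\d{4}|\d{2}))",
        RegexOptions.None);

    // Returns the named groups of the first date found, or an empty string if none.
    public static string Describe(string input)
    {
        Match m = DateRegex.Match(input);
        if (!m.Success)
        {
            return string.Empty;
        }
        return "Month=" + m.Groups["Month"].Value
             + ", Day=" + m.Groups["Day"].Value
             + ", Year=" + m.Groups["Year"].Value;
    }
}
```

DateRegexSketch.Describe("Released on 2/17/03.") returns "Month=2, Day=17, Year=03", and an input with no date returns an empty string. The named groups here are the same ones Expresso displays in its results tree.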

Points of Interest

After creating this tool, I discovered the Regex Workbench by Eric Gunnerson, which was designed for the same purpose. Expresso provides more help in building expressions and gives a more readable display of matches (in my humble opinion), but Eric's tool has a nice feature that shows Tooltips that decode the meaning of subexpressions within an expression. If you are serious about regular expressions, try both of these tools!

History

Original Version: 2/17/03

Version 1.0.1148: 2/22/03 - Added a few additional features including the ability to create an assembly file and registration of a new file type (*.xso) to support Expresso Project Files.

Version 1.0.1149: 2/23/03 - Added a toolbar.

My brother John doesn't like the name Expresso, since he says too many people are already confused about the proper spelling and pronunciation of Espresso. Somehow, I doubt this program will have any impact on the declining literacy of America, but who knows. John prefers my earlier name "Mr. Mxyzptlk", after the Superman character with the unpronounceable name.

Have tried alternatives over the years but Expresso has always been my go-to tool for regex.
Not only brilliant in implementation but great fun to play around with - and it's not often anyone can say THAT about regex development!

In the Builder, under "Characters and Repetitions", select "Specific Character" and "Just once".
For the specific character, enter: .
You can swap between "Specific character" and "Any character" and the output doesn't change.

First of all, Expresso is a great piece of software. Now I'm wondering if there are any plans to include search & replace in (multiple) files? I've already tried RegexBuddy, but it's terribly slow with a regex I'm currently experimenting with (it takes several minutes to go through a >650 KB source file) whereas Expresso 3 only needs a few seconds to show me all matches! So, with S&R over files your program would be close to perfect.

Do you know you can do S&R with regex in Visual Studio? I'm using 2008, but I'm pretty sure it was in 2005 as well.

Click Edit in the menu, hover over Find and Replace, click on Replace in Files (my shortcut, which I think is default, is Ctrl + Alt + H for this dialog).

Make sure "Find Options" is expanded. There, you can check the "Use" box, and set the drop down to "Regular expressions."

Now, put a regular expression in the "find what" box at the top, and whatever you need in the "Replace with" box. Set the "Look in" to be any path you want, and decide which types of files you'll search with the "Look at these file types" drop down.

I love it and it's been very useful!

-Jeremy

P.S. You can first do a "Find in Files" to make sure you're getting the right matches, before you find and replace. And, "Keep modified files open after replace all" if you want the option of "undo" by not saving the open/changed files.

Yes, there is something wrong with the regex. I'm not sure what you were trying to do, but be aware that special characters like ( don't have their normal meaning inside a character class, delimited by [ and ]. The first part of your regex, [([^\\] , makes sense. It says: find a single occurrence of any of the following characters: ( or [ or ^ or \ . Following that correct character class, you have a closing parenthesis ) which is not balanced by an opening parenthesis, since the opening parenthesis inside the character class is interpreted literally, thus you have a syntax error. Was that clear as mud?

Why do you write [([^\\] as first part of my regex?
I don't think the first occurrence of [ is closed with the first occurrence of ]. Your parsing algorithm is probably taking the first ] for the first [ at the start of my regex. But I think the correct way is to consider the closing bracket ']' for the first occurrence of '[' before it. Meaning that the inner brackets are together.

What I'm trying to do is to find
the first occurrence of (
or the character inside character class: [^\\]
or )

This syntax is a bit tricky, but the .NET Framework treats the second [ literally, rather than as the beginning of another character class, unless it is preceded by a - , which is the syntax for character class subtraction. If you doubt this, I suggest you try it in code or another regex tester.
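If you want to check this in code, something like the following sketch should do. The helper names (CharClassDemo, FindAll, IsValidPattern) are made up for this reply, the sample inputs are mine, and [([^\\]) is only my guess at the relevant part of your regex.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helpers for illustration; not part of Expresso.
public static class CharClassDemo
{
    // Returns the text of every match of 'pattern' in 'input', in order.
    public static string[] FindAll(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    // Returns false if the Regex constructor rejects the pattern.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // The Regex constructor reports syntax errors with this exception type.
            return false;
        }
    }
}
```

CharClassDemo.FindAll(@"[([^\\]", @"a(b[c^d\e]f)") returns the four single characters ( [ ^ and \ , which shows that the second [ is just another member of the class. CharClassDemo.IsValidPattern(@"[([^\\])") returns false because of the unbalanced closing parenthesis. For comparison, CharClassDemo.FindAll(@"[a-z-[aeiou]]", "regex") returns r, g, and x, which is character class subtraction at work.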