Introduction

Regular expressions are an immensely powerful tool in a developer’s arsenal. The flavour of regexes and the classes surrounding them that are available in the .NET Framework is, in my humble opinion, excellent. One of the most useful methods in the System.Text.RegularExpressions namespace is the static CompileToAssembly method of the Regex class. The CompileToAssembly method allows you to compile one or more regexes to a standalone assembly - not much good, you might think. Well, ponder on the problem I was left with:

The Problem

I was developing, actually I still am developing, an application that does quite a bit of text processing and makes a lot of use of regular expressions. Now regular expressions are complex, mysterious creatures. You may think you know lots about them, but trust me, you never know as much as you can until you have read Mastering Regular Expressions, 2nd Edition a few hundred times.

I don't believe for one second that a regular expression that I write will be efficient the first time around. Sure, it will do exactly what I want it to do, but will it do it in as efficient a way as possible? Probably not. So, I needed to make extensive use of regular expressions and I knew that I didn't have the time to spend, or rather didn't want to spend the time, measuring the performance of every regex I wrote straightaway and then spending lots of time making it as efficient as possible... so I was left with a problem: how do I easily replace the regexes within my application with new, more efficient regexes when I need them? I didn't want to move them to a config file, as they would be too easy for the end user to modify, and loading and parsing XML is, relatively speaking, a slow operation.

The Solution

A "Regular Expression Library": CompileToAssembly came to my rescue. I wrote a tool that lets me quickly add multiple regular expressions to a "Regular Expression Library." I can create new libraries, load and modify existing ones, quickly add, remove and modify regular expressions, and then redistribute the "Regular Expression Library" assembly via an auto-update to the application. The "parent" application can get new über-efficient regular expressions and continue working without any user intervention if need be.
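For readers who haven't used CompileToAssembly, here is a minimal sketch of the kind of call the tool is built around. It is not the tool's source. The class name LibrarySketch, the assembly name MyRegexLibrary, the namespace MyCompany.Regexes and the pattern are all placeholders assumed for illustration (the pattern is a simplification, not a validated post code rule). Note that CompileToAssembly only works on the .NET Framework; on .NET Core and later it throws PlatformNotSupportedException.

```csharp
using System;
using System.Reflection;
using System.Text.RegularExpressions;

public static class LibrarySketch
{
    // Describes each regex: pattern, options, generated class name,
    // namespace, and whether the generated class is public.
    public static RegexCompilationInfo[] DescribeRegexes()
    {
        return new[]
        {
            new RegexCompilationInfo(
                @"^[1-9][0-9]{3}\s?[A-Z]{2}$",  // placeholder pattern (assumption)
                RegexOptions.IgnoreCase,
                "DutchPostCode",                 // name of the generated class
                "MyCompany.Regexes",             // hypothetical namespace
                true)                            // generate a public class
        };
    }

    // The assembly name is also used for the output file name;
    // the version number is set by hand here.
    public static AssemblyName DescribeAssembly()
    {
        AssemblyName name = new AssemblyName("MyRegexLibrary");
        name.Version = new Version(1, 0, 0, 0);
        return name;
    }

    // Emits MyRegexLibrary.dll (.NET Framework only).
    public static void Compile()
    {
        Regex.CompileToAssembly(DescribeRegexes(), DescribeAssembly());
    }
}
```

Keeping the descriptions separate from the Compile call means the regex list and version can be edited and inspected without writing an assembly each time.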

Manually set the version number of the assembly to help ensure compatibility with existing versions.

Much, much, more...

Seriously though, I wrote this tool because I really needed it. I'm not sure if other people will, but if you do, you can download it from above. Remember that this tool was put together quickly and for my own use, so don't expect any fancy exception handling, nice coding patterns, comments, unit tests or anything like that. It will have bugs... I'd appreciate it if you could leave a comment here or contact me to let me know if/when you find one/many.

If people are interested in the tool, I will develop it further and add things such as the ability to sign the Regular Expression Libraries that have been created, the ability to add custom attributes on a per regex basis, add some exception handling, code comments and so on. Also, if you use the tool, let me know! It's always good to know if something you've created for your own use is useful for others too.

Creating and Modifying a Regular Expression Library

The RegexLibrary Builder is an easy-to-use tool and most people should be able to figure it out without much difficulty. However, just in case, I'll write a quick (very quick) overview of how to create a Regular Expression Library and then how to open it again and modify it.

Creating a Regular Expression Library

Open up the RegexLibrary Builder application... obviously.

The next two steps (filling in the assembly details and adding a regular expression) are interchangeable - i.e. order doesn't matter.

Fill in the assembly details (Figure 2). You can use the "..." button to select the location where the Regular Expression Library will be saved to.

Figure 2: Fill in the Assembly Details

Create a new regular expression to add to the Regular Expression Library by filling in the details in the regex group box and then clicking on "Add" (Figure 3). Check out the Regulator and RegexDesigner.NET, two very cool regular expression testing and learning tools.

Figure 3: Add a Regular Expression to the Regular Expression Library

The regex will now appear in the list of regular expressions in the list box at the bottom of the application.

Finally, click on the "Save" button on the toolbar or select "Save Regex Library" from the "File" menu.

Modifying a Regular Expression Library

Load the Regular Expression Library by clicking on the "Open" button on the toolbar or by selecting "Load Regex Library" from the "File" menu.

When the library loads, all the regular expressions contained within will be displayed in the list box at the bottom of the application (Figure 4).

When you click on a regular expression in the list, its details will be shown in the regex group box. To modify the regex, simply change the details and click on "Add" again. You can delete a regex by highlighting it and then clicking on the "Delete" button, clicking on "Delete" on the toolbar or selecting "Delete Regex" on the "Regex" menu.

When you are happy with the changes you've made, you can save the changes to the Regular Expression Library by clicking on the "Save" button on the toolbar or selecting "Save Regex Library" from the "File" menu.
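The article doesn't show how the tool reads regexes back out of a saved library. One plausible approach, sketched here purely as an assumption, is to use reflection to find the public types that derive from Regex and instantiate each one. The class name LibraryReaderSketch and the method ReadRegexes are hypothetical and are not taken from the tool's source.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text.RegularExpressions;

public static class LibraryReaderSketch
{
    // Returns one instance per public, concrete Regex-derived type
    // in the given assembly that has a public parameterless constructor.
    public static List<Regex> ReadRegexes(Assembly library)
    {
        List<Regex> found = new List<Regex>();
        foreach (Type type in library.GetExportedTypes())
        {
            // Skip anything that is not a concrete subclass of Regex.
            if (!type.IsSubclassOf(typeof(Regex)) || type.IsAbstract) continue;
            // Skip types that cannot be created without arguments.
            if (type.GetConstructor(Type.EmptyTypes) == null) continue;
            found.Add((Regex)Activator.CreateInstance(type));
        }
        return found;
    }
}
```

Passing Assembly.LoadFrom(path) for a saved library should give one instance per regex; calling ToString() on an instance returns its pattern and the Options property returns its flags, which is the information a list box like the one described would need.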

Using a Regular Expression Library

Once you have created a Regular Expression Library, using it in one of your applications is straightforward:

Add a reference to the assembly, as you would any other assembly.

Add the using directive for the namespace of the regex you want. Remember that you can set the namespace on a per regex basis if you like.

Use the regex as you would the Regex class:

// A compiled regex is a class derived from Regex, so create an instance first
DutchPostCode postCode = new DutchPostCode();
// Check to see if some text matches
bool result = postCode.IsMatch("some text");
// Get all the matches
MatchCollection matches = postCode.Matches(System.IO.File.ReadAllText(@"c:\DATA.txt"));
// Basically, just use it as you would the Regex class

The Source

The source is straightforward. Nothing complicated. As I mentioned above, I wrote this tool quickly and for myself. I didn't expect to release the source, so you won't find any fancy exception handling, nice coding patterns, comments, unit tests or anything like that. It will have bugs... I'd appreciate it if you could leave a comment here or contact me to let me know if/when you find one/many.

The interesting source code file is RegexLibraryBuilder.cs, which contains the RegexLibraryBuilder class and the nested RegexLibraryLoader class. There is also a strongly typed RegexCompilationInfo collection class (cleverly named RegexCompilationInfoCollection) with your boilerplate strongly typed collection code. The RegexLibraryBuilderForm class is the implementation of the UI and, before someone points it out, I know that it isn't very modular or well-designed. I should have moved certain aspects to a custom control and provided property access to certain things, as well as moved certain things out to separate classes. However, the app was thrown together quickly.

Comments and Discussions

I have released a new version of the RegEx Tester tool. You can download it free from http://www.codeproject.com/KB/string/regextester.aspx and http://sourceforge.net/projects/regextester

With RegEx Tester you can fully develop and test your regular expression against a target text. Its UI is designed to help you develop regexes. It uses and supports ALL of the features available in the .NET Regex class.

I updated the code slightly so it takes the library name as a command line parameter. If it exists it will open the file automatically, if it doesn't it just prefills the file name in the form and that's it. It's useful because now I can set it as a tool in VS and not hunt for the file each time. If you want it say so and I'll post the code.

A problem I've been having is that it seems to ignore the path and always puts the output file in the working directory. It happens in the unmodified version too, and I can't seem to get around it. I've tried putting a file:/// prefix on AssemblyName.CodeBase like the docs say, changing slashes to backslashes, etc. and nothing works. It's not a huge deal as I've been working around it by making the working directory the same as the file's, just thought I'd point it out. I'm using Vista if it matters.
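The workaround described above (making the working directory the same as the target file's directory) can be wrapped in a small helper so the original directory is always restored. This is only a sketch of that workaround, not a fix for the underlying behaviour; the class name OutputDirectorySketch and the method RunIn are hypothetical.

```csharp
using System;
using System.IO;

public static class OutputDirectorySketch
{
    // Runs the action with the working directory temporarily switched
    // to targetDirectory, then restores the previous working directory
    // even if the action throws.
    public static void RunIn(string targetDirectory, Action action)
    {
        string previous = Directory.GetCurrentDirectory();
        Directory.SetCurrentDirectory(targetDirectory);
        try
        {
            action();
        }
        finally
        {
            Directory.SetCurrentDirectory(previous);
        }
    }
}
```

The call to Regex.CompileToAssembly would go inside the action, with the folder part of the chosen output path passed as targetDirectory.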

No way, man - that is a cool tool - part of what I am looking for - I was thinking to do this all manUwelly (by hand) - what a cool deal..

Now the only other part I need is the REGEx Creator - I put my pattern in and I get options for the info

For example
1XRay FoxTrot1 (1299)

Each digit is a column and I tab through each column with checkbox - Check if Only numbers / only this number / only Alpha / only this Alpha
Grouping character Range (or acceptable range) etc.. then based on that result I get the last of the question to further reduce the RegEx .. maybe harder to code than to do manually.. but the thought to place a string in the box and puff a useable Expression makes my mouth water.

Anyway, you mentioned that you have made "modifications" in several of the posts and I was trying to "nudge" you to post an update. I've been doing something similar with RegEx, but not nearly as slick. I have worked with numerous organizations/people and have had to deal with many spiderwebs of code changes because validation was not centralized, as I am sure many other users have.

I believe compiled and centralized libraries are the only way to go and I would like to have the assemblies be strongly named, use all of the reflection properties, etc.

You also mentioned that you have over 60 expressions that you have added to your library and I know that I would be interested in those as well (assuming you don't mind sharing them).

I must admit that I had put the update off as I didn't think anyone was interested in using the tool. I'm a little busy this weekend but I should be able to get the time to write and post an update sometime next week. When I do I will post here and let you know.

I have a feeling that most people that are interested, unfortunately, don't take the time to leave a message.

I would actually prefer not to suggest "new" features in order to get a copy of "where it's at now" sooner. (sorry - selfish here)

I am starting a new project from scratch and I would love to use a RegEx library as the validation core and I would be happy to suggest as I go (we all know that most suggestions come from the nuances of implementation).

Anyway, glad to know you are still out there and I don't expect perfection.

Hello Brian,
Great article! I also learnt to supply a Version via the AssemblyName to my Regex DLL.
But:
Does anybody know how the other assembly attributes like Title, Producer, Trademark, Copyright can be set?
I tried something with the CompileToAssembly overload that takes an array of CustomAttributeBuilder objects as input, but I didn't find a way to set assembly attributes there.

The above code was really useful to me. It's the only code sample that I managed to find anywhere that showed how to set the AssemblyInfo while calling CompileToAssembly. It was exactly what I needed!

This only sets the assembly properties, however. In Windows Explorer, the assembly's Properties Details remain empty. To set these, you also have to call the CompileToAssembly overload that takes a fourth parameter, which is the name of a Win32 resource file. Note that this is an old-style Win32 resource, not a .NET resx. To generate this resource file, you first have to create a .rc file containing the version information. (There are plenty of examples of .rc files on the internet.) Then, you can compile the .rc file to a Win32 resource file at the command line, similar to this:

The actual paths will vary depending on which version of Visual Studio or the Windows SDK you have; the above works for me with Visual Studio 2010 and 2013 installed. This creates a file called "Win32Resources.res"; this is the resource file name that you need to pass as the fourth parameter to CompileToAssembly.
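The code sample this comment refers to is not reproduced here. As a rough sketch of the two things being discussed (assembly attributes passed as CustomAttributeBuilder objects, plus the Win32 resource file as the fourth argument), something along these lines should work on the .NET Framework. The class name AttributeSketch, the helper StringAttribute and the attribute values are made up for illustration; only the "Win32Resources.res" file name comes from the comment above.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Text.RegularExpressions;

public static class AttributeSketch
{
    // Wraps an assembly attribute that takes a single string
    // (title, copyright, trademark, ...) in a CustomAttributeBuilder.
    public static CustomAttributeBuilder StringAttribute(Type attributeType, string value)
    {
        ConstructorInfo ctor = attributeType.GetConstructor(new[] { typeof(string) });
        if (ctor == null)
            throw new ArgumentException("No (string) constructor on " + attributeType.Name);
        return new CustomAttributeBuilder(ctor, new object[] { value });
    }

    public static void Compile(RegexCompilationInfo[] regexes, AssemblyName name)
    {
        CustomAttributeBuilder[] attributes =
        {
            StringAttribute(typeof(AssemblyTitleAttribute), "My Regex Library"),          // example value
            StringAttribute(typeof(AssemblyCopyrightAttribute), "Copyright (c) Example")  // example value
        };
        // Fourth argument: the compiled Win32 resource file (.NET Framework only).
        Regex.CompileToAssembly(regexes, name, attributes, "Win32Resources.res");
    }
}
```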

It should be easy enough to write a command-line version of this as all the primary functionality is in the RegexLibraryBuilder class and the RegexLibraryLoader class nested in it. It's just a matter of coming up with intuitive command-line options and switches... or do you mean to pass in the regexes in a file (XML/txt/etc.)?

I will put a command-line version together this evening if I get the chance. If not, I'll do it over the weekend.

RossDonald wrote: Yeah, I was going to have an XML configuration file that had all of the regexes defined then run the tool against it to build the dll.

Probably the handiest way to do it.

RossDonald wrote: There is no rush as I am not working with that project at the moment.

I had wanted to throw a command-line version together anyway so I'll probably get around to doing it this evening. If you have a suggestion for the schema of the XML files let me know or drop me an email.