Jeff Moser's How .NET Regular Expressions Really Work

Jeff Moser has done an in-depth study of how regular expressions work in .NET. His article covers the core operating principles of Microsoft’s implementation, such as the internal "machine code" that regular expressions are translated into.

The first thing he reveals is that the last 15 regular expressions used through the static Regex methods are cached. For those little utility applications that only use one or two expressions, this means explicitly creating a Regex object is probably not necessary.

When compiling a regular expression, the first step consists of a scanner that emits a RegexTree. Looking at just the leaf nodes, this resembles the source pattern to a fair extent. Next, the tree is translated into the machine code of the regular expression engine.

The bulk of the work is done by the 250-line switch statement that makes up the EmitFragment function. This function breaks up RegexTree "fragments" and converts them to the simpler RegexCode.

[…]

The reward for all this work is an integer array that describes the RegexCode "op codes" and their arguments. You can see that some instructions like "Setrep" take a string argument. These arguments point to offsets in a string table. This is why it was critical to pack everything about a set into the obscure string we saw earlier. It was the only way to pass that information to the instruction.
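To make the layout described above more concrete, here is a toy interpreter in Python. It is only a sketch under loud assumptions: the op code numbers, the `run` function, and the plain-text set encoding are all hypothetical, and the "Setrep" here is only loosely modeled on the instruction named in the article. What it does show is the general shape: a flat integer array holding op codes and their arguments, where a string argument is stored as an index into a separate string table.

```python
# Hypothetical op codes for a toy engine; the numbers are arbitrary.
ONE, SETREP, STOP = 0, 1, 2

# String table: instructions refer to entries by integer index.
strings = ["0123456789"]

# Flat integer program: each op code is followed by its arguments.
codes = [
    ONE, ord("x"),    # match the single character 'x'
    SETREP, 0, 3,     # match exactly 3 characters from the set strings[0]
    STOP,             # success
]


def run(codes, strings, text):
    """Return True if the program matches a prefix of text."""
    pc = 0    # index into the integer array
    pos = 0   # index into the input text
    while True:
        op = codes[pc]
        if op == ONE:
            if pos >= len(text) or ord(text[pos]) != codes[pc + 1]:
                return False
            pos += 1
            pc += 2
        elif op == SETREP:
            allowed = strings[codes[pc + 1]]   # look up the set by index
            count = codes[pc + 2]
            chunk = text[pos:pos + count]
            if len(chunk) != count or any(c not in allowed for c in chunk):
                return False
            pos += count
            pc += 3
        elif op == STOP:
            return True
        else:
            raise ValueError("unknown op code: %d" % op)
```

Because every slot in `codes` is an integer, the only way for an instruction to carry a character set is to point at a string stored elsewhere, which mirrors the reason the article gives for packing set information into a string.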
