Manipulate Text With Regular Expressions

Determine whether a given date is greater than or equal to a predefined date, and learn how to add to the list of Standard Expressions in the Regular Expression Editor.

by Bill Wagner

Posted April 9, 2004

Technology Toolbox: VB.NET, C#, ASP.NET

Q:Advanced Text Processing With RegEx
Is it possible to check whether a given date, in DD-MM-YYYY format, is greater than or equal to a predefined date (say 25-09-2003) using regular expressions?

A:
Regular expressions are a powerful tool for manipulating textual information, allowing you to perform the sophisticated processing you describe. Before I explain the solution, let's back up one step and discuss why you would use Regex for this problem. You would think that it would be much easier to create a System.DateTime structure with the date requested, then use System.DateTime.Compare() to determine if the given date is greater than your predefined limit. In fact, that is how I would solve this problem in almost all cases. The one case where it's worth the extra work to use regular expressions is when you want to perform the validation client-side in a Web application. ECMAScript supports regular expressions, which means you can push regular expression validation to the client. (The simpler-to-write System.DateTime comparison must take place at the server.)
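For comparison, here is a minimal sketch of the server-side approach described above. It assumes .NET 2.0 or later (for DateTime.TryParseExact) and the DD-MM-YYYY format from the question; the class name DateCheck and the helper IsOnOrAfterCutoff are hypothetical names chosen for illustration.

```csharp
using System;
using System.Globalization;

static class DateCheck
{
    // The predefined date from the question: 25-09-2003.
    static readonly DateTime Cutoff = new DateTime(2003, 9, 25);

    // Hypothetical helper: true when text is a real calendar date in
    // DD-MM-YYYY form that falls on or after the cutoff.
    public static bool IsOnOrAfterCutoff(string text)
    {
        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            text, "dd-MM-yyyy", CultureInfo.InvariantCulture,
            DateTimeStyles.None, out parsed);

        // DateTime.Compare returns a value >= 0 when parsed is not earlier than Cutoff.
        return ok && DateTime.Compare(parsed, Cutoff) >= 0;
    }

    static void Main()
    {
        Console.WriteLine(IsOnOrAfterCutoff("25-09-2003")); // True
        Console.WriteLine(IsOnOrAfterCutoff("24-09-2003")); // False
        Console.WriteLine(IsOnOrAfterCutoff("31-02-2004")); // False: not a real date
    }
}
```

Because the parser only accepts real calendar dates, this version rejects inputs such as 31-02-2004 without any extra logic.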

Here's how it works. The regular expression language processes text input by reading the string and comparing it against a matching sequence. Let's build the regular expression that answers your question; we'll start by matching any date between 01/01/1900 and 31/12/2099. This regular expression performs that test:

(0[1-9]|[12][0-9]|3[01])[-\./](0[1-9]|1[012])[-\./](19|20)\d\d

The expression is fairly easy to understand once you split it into sections. (0[1-9]|[12][0-9]|3[01]) matches a day of the month, expressed as a two-digit number: 01 to 31. 0[1-9] matches 01, 02 ... 09. It matches a zero, followed by any digit from one to nine. [12][0-9] matches any number from 10 to 29. The first character must be a one or a two, followed by any digit from zero to nine. 3[01] matches 30 or 31. If you OR all three of them using the vertical bar, you get every number from 01 to 31, expressed with two digits.

I added one small feature beyond your request. [-\./] matches any of the three characters I allowed as separators: (/), (.), or (-). The period is a metacharacter in regular expressions, so I escaped it with the backslash. (Inside a character class the escape is optional, but it is harmless.)

The month portion uses the same approach: (0[1-9]|1[012]) matches any two-digit number from 01 to 12. The year portion matches any four-digit number between 1900 and 2099. (Remember that regular expressions process text; you cannot process numbers as integers.) The year section, (19|20)\d\d, matches either 19 or 20, followed by two digits.

Put them all together and you're able to match any valid date, with a few minor problems. For example, this regular expression does not contain any logic to validate the date based on the month. It would erroneously allow for February 31st.
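To see the basic date pattern in action, here is a small C# sketch using System.Text.RegularExpressions. The ^ and $ anchors are my addition so that the whole input must be a date; the class name SimpleDatePattern and the helper LooksLikeDate are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class SimpleDatePattern
{
    // Day, separator, month, separator, year (1900-2099).
    // The ^ and $ anchors are an assumption added for whole-string validation.
    const string Pattern =
        @"^(0[1-9]|[12][0-9]|3[01])[-\./]" +
        @"(0[1-9]|1[012])[-\./]" +
        @"(19|20)\d\d$";

    // Hypothetical helper: true when text has the shape of a DD-MM-YYYY date.
    public static bool LooksLikeDate(string text)
    {
        return Regex.IsMatch(text, Pattern);
    }

    static void Main()
    {
        string[] samples = { "25-09-2003", "25/09/2003", "25.09.2003",
                             "32-09-2003", "25-13-2003", "31-02-2004" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0} -> {1}", s, LooksLikeDate(s));
        }
    }
}
```

The last sample, 31-02-2004, is accepted, which illustrates the February 31st problem mentioned above: the pattern checks the shape of the text, not the calendar.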

The techniques I'll show you to handle these specific validations follow the example in your question. You want to find whether a date is after 25-09-2003. This involves using two features of regular expressions known as grouping and alternation. A grouping is simply a match found earlier in the string. An alternation is the simplified form of if-then-else available in the regular expression language. Let's start with a simple example. Suppose you wanted to match a1 or b2. The right way to do that is simply (a1|b2), but that won't show you how to use grouping and alternation. Here's a sample that satisfies the requirements:

((?<foundA>a)|b)(?(foundA)1|2)

This regular expression builds a conditional that finds "a" or "b" and keeps track of whether "a" was found. The next character must be 1 if "a" was found. Otherwise, the next character must be 2.

(?<foundA>a) is a named group. The group foundA is TRUE if the first character is "a." Otherwise, the group foundA is FALSE. The alternation construct, (?(foundA)1|2), is read, "If foundA is TRUE, match 1; else, match 2."
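The a1/b2 example can be tried out with a short C# sketch like the one below. It assumes the .NET regular expression engine, since this conditional construct is not available in every regex dialect. The anchors, the class name ConditionalDemo, and the helper Matches are additions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ConditionalDemo
{
    // Named group foundA records whether "a" was matched;
    // the conditional then requires "1" after "a" and "2" after "b".
    // The ^ and $ anchors are an assumption added for whole-string matching.
    static readonly Regex Pattern =
        new Regex(@"^((?<foundA>a)|b)(?(foundA)1|2)$");

    // Hypothetical helper wrapping Regex.IsMatch.
    public static bool Matches(string text)
    {
        return Pattern.IsMatch(text);
    }

    static void Main()
    {
        Console.WriteLine(Matches("a1")); // True
        Console.WriteLine(Matches("b2")); // True
        Console.WriteLine(Matches("a2")); // False
        Console.WriteLine(Matches("b1")); // False
    }
}
```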

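You'll build a much more complicated expression using these constructs to match any day after 25-09-2003. In fact, here's the completed regular expression (a single expression, shown on several lines and assembled from the pieces explained below):

((?<smallDate>0[1-9]|1[0-9]|2[0-5])|
(?<largeDate>2[6-9]|3[01]))[-/\.]
((?<earlyMonth>0[1-8])|
(09)|
(?<lateMonth>1[0-2]))[-/\.]
20
(?(lateMonth)(0[3-9]|[1-9][0-9])|
(?(earlyMonth)(0[4-9]|[1-9][0-9])|
(?(smallDate)(0[4-9]|[1-9][0-9])|
(0[3-9]|[1-9][0-9]))))

I didn't say it was easy. I just said it was possible. Let's break it apart so you can see how it works and you can extend it in your own development activities.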

((?<smallDate>0[1-9]|1[0-9]|2[0-5])| matches any day of the month between 01 and 25. You'll keep track of that as "smallDate" to know which year is valid. (?<largeDate>2[6-9]|3[01]))[-/\.] matches any day of the month between 26 and 31. You'll keep track of that as "largeDate" to know which year is valid. The separator pattern [-/\.] is tacked onto the end of the date matches.

((?<earlyMonth>0[1-8])| matches any month from 01 through 08. Once again, you'll keep track of that as "earlyMonth" to evaluate which year is valid. (09)| matches September. (?<lateMonth>1[0-2]))[-/\.] matches any month from October through December. The separator [-/\.] is tacked on the end.

20 means I'm only allowing dates in this century. The last four lines are the hard part. These say:

If the month was 10-12
Then
    Match 03-99
Else if the month was 01-08
Then
    Match 04-99
Else (which means September) if the date was 01-25
Then
    Match 04-99
Else
    Match 03-99

Referring back to the code, it works like this: (?(lateMonth)(0[3-9]|[1-9][0-9])| says, "if the group named 'lateMonth' was a match, match any two-digit string from 03 through 99. Else ..."

(?(earlyMonth)(0[4-9]|[1-9][0-9])| says, "if the group named 'earlyMonth' was a match, match any two-digit string from 04-99. Else ..."

(?(smallDate)(0[4-9]|[1-9][0-9])| says, "if the group named 'smallDate' was a match, match any two-digit string from 04-99. Else ..."

(0[3-9]|[1-9][0-9])))) says, "match any two digit string from 03-99."
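Putting the pieces described above back together, the following C# sketch shows one way to exercise the full expression. The pattern is assembled from the fragments in the text; the ^ and $ anchors, the class name AfterCutoffPattern, and the helper IsAfterCutoff are my additions, so treat this as an illustration, not a definitive implementation.

```csharp
using System;
using System.Text.RegularExpressions;

static class AfterCutoffPattern
{
    // Assembled from the fragments discussed in the text.
    // The ^ and $ anchors are an assumption added for whole-string validation.
    const string Pattern =
        @"^" +
        // Day: 01-25 is remembered as smallDate, 26-31 as largeDate.
        @"((?<smallDate>0[1-9]|1[0-9]|2[0-5])|(?<largeDate>2[6-9]|3[01]))[-/\.]" +
        // Month: 01-08 is earlyMonth, 09 is unnamed, 10-12 is lateMonth.
        @"((?<earlyMonth>0[1-8])|(09)|(?<lateMonth>1[0-2]))[-/\.]" +
        // Century.
        @"20" +
        // Two-digit year, chosen by the nested conditionals.
        @"(?(lateMonth)(0[3-9]|[1-9][0-9])|" +
        @"(?(earlyMonth)(0[4-9]|[1-9][0-9])|" +
        @"(?(smallDate)(0[4-9]|[1-9][0-9])|" +
        @"(0[3-9]|[1-9][0-9]))))" +
        @"$";

    static readonly Regex Matcher = new Regex(Pattern);

    // Hypothetical helper: true when text matches the pattern above.
    public static bool IsAfterCutoff(string text)
    {
        return Matcher.IsMatch(text);
    }

    static void Main()
    {
        string[] samples = { "24-09-2003", "25-09-2003", "26-09-2003",
                             "01-10-2003", "31-08-2003", "01-01-2004",
                             "31-12-2002" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0} -> {1}", s, IsAfterCutoff(s));
        }
    }
}
```

With these samples, 26-09-2003, 01-10-2003, and 01-01-2004 match; the others do not. Note that 25-09-2003 itself is rejected, because smallDate covers 01 through 25. If you need "greater than or equal to," one option is to move the split by one day, so that smallDate is 0[1-9]|1[0-9]|2[0-4] and largeDate is 2[5-9]|3[01].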

Regular expressions are powerful, but the syntax is not particularly easy to follow. Aren't you glad you have more expressive tools and class libraries at your disposal for everyday validation tasks? I know I am. In fact, I started adding the logic to validate the date based on the month and year. The conditionals and alternations became so long and convoluted that I seriously recommend the server-based solution using the System.DateTime class instead. It is probably the wisest choice.

Q:Use ValidationExpression as a Code Snippet
Is there a way to add to the list of Standard Expressions in the Regular Expression Editor? I know how to use (Custom), but I want to add to the Validation Expression list that appears.

A:
It's really tempting after just coming up from that deep discussion of regular expression syntax to give you the simple answer: No. But that would leave you with that nagging feeling that I missed something.

Unfortunately, the answer is still no, at least not the way you want. The dialog box (see Figure 1) you're talking about is in the System.Design.dll assembly. It is not customizable, and you can't change its behavior.

This is where I use code snippets. Open up the code view of your ASPX page, and you can see ValidationExpression. Select the text and drag it to the toolbox in VS.NET. That adds it to the available code snippets. Simply drag it back into the page when you want to use it.

There are two drawbacks to this approach. First, you can't share code snippets easily. Each user must install his or her own snippets. Second, you can't drag snippets onto the property browser. Instead, you have to open the HTML view and drag the snippets onto it.

I agree that it would be better to add expressions to the dialog, but that's not possible. It strikes me as a bad tradeoff to write a custom control and a custom designer to get this feature. That's why I use VS.NET code snippets instead.

About the Author
Bill Wagner is a founder and consultant with SRT Solutions, specializing in scientific and engineering applications using .NET. He is the author of C# Core Language Little Black Book (The Coriolis Group), a contributing editor for VSM, and a regular contributor to .NET Insight. Bill has been developing software for more than 15 years and still enjoys learning new techniques for software development. Reach him at wwagner@srtsolutions.com.

Bill Wagner, author of Effective C#, has been a commercial software developer for the past 20 years. He is a Microsoft Regional Director and a Visual C# MVP. His interests include the C# language, the .NET Framework and software design. Reach Bill at wwagner@srtsolutions.com.