Java

Regular Expressions

Regular expressions are a mechanism for telling the Java Virtual Machine (JVM) how to find and manipulate text for you. Using regular expressions to do this is different from the traditional approach. This article compares the two approaches. It is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Thus far, you've worked almost exclusively with regular expressions, but not really with Java. Now it's time to consider how the two interact. The following examples differ from the preceding ones in that they incorporate Java code with regular expressions. They offer a more complete picture of how you can use some J2SE regex syntax.

Some of the regular expressions you'll see here are slightly more advanced than in the examples you've seen previously, as they build on the fundamentals discussed thus far in the chapter. For example, Listing 1-2 combines groups with quantifiers.

Don't be discouraged if the patterns themselves aren't completely clear to you right now. An intuitive understanding will develop as you continue to read this book. Focus on the concepts and become comfortable with how the Java code and the regex complement each other.

There are only two pieces of information you need to take full advantage of the following examples:

Any \-delimited regex metacharacter needs to be delimited once again when it's used in Java code. Thus, \d becomes \\d and \s becomes \\s in your Java code. Correspondingly, a more complex expression such as (\d-)?(\d{3}-)?\d{3}-\d{4}\s becomes (\\d-)?(\\d{3}-)?\\d{3}-\\d{4}\\s in Java code. All \ characters are doubled to produce \\ when they're written in a String literal.
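The doubling rule can be checked directly. The following is a minimal sketch, not one of the book's listings; the class name EscapeDemo and the sample input are assumptions made for illustration. It shows that the doubled backslashes exist only in the source code, and that the regex engine receives the single-backslash form.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample input are illustrative only.
class EscapeDemo {
    // The regex (\d-)?(\d{3}-)?\d{3}-\d{4}\s written as a Java string literal:
    // every backslash is doubled in the source code.
    static final String REGEX = "(\\d-)?(\\d{3}-)?\\d{3}-\\d{4}\\s";

    public static void main(String[] args) {
        // The compiler turns each \\ into a single \ character, so the
        // string held in memory is the plain regex.
        System.out.println(REGEX); // prints (\d-)?(\d{3}-)?\d{3}-\d{4}\s

        // Pattern.matches(regex, input) tests the whole input against the regex.
        // Note the trailing space, which the final \s accounts for.
        System.out.println(Pattern.matches(REGEX, "1-800-555-1234 ")); // prints true
    }
}
```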

In this book, when I talk about a regular expression in and of itself, I don't use the double delimiting mechanism. However, I do when working with specific coding examples.

The String.matches(String regex) method is a new method that has been added to the String class. It compares the String it's called on to the given regular expression, regex, and returns true if the regex pattern matches the String exactly. To match exactly means that the String in question can't contain any characters (not even invisible characters such as newlines and spaces) that aren't accounted for in the regex pattern.
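To make "matches exactly" concrete, here is a small sketch (the class name ExactMatchDemo, the helper isThreeDigits, and the three-digit pattern are assumptions chosen for illustration). A leading letter, a trailing space, or a trailing newline is enough to make String.matches return false.

```java
// Hypothetical demo class; names and sample strings are illustrative only.
class ExactMatchDemo {
    // Returns true only if the entire string is exactly three digits.
    static boolean isThreeDigits(String s) {
        return s.matches("\\d{3}");
    }

    public static void main(String[] args) {
        System.out.println(isThreeDigits("123"));   // prints true
        System.out.println(isThreeDigits("123 "));  // prints false (trailing space)
        System.out.println(isThreeDigits("123\n")); // prints false (trailing newline)
        System.out.println(isThreeDigits("a123"));  // prints false (leading letter)
    }
}
```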

Confirming Phone Number Formats Example

The code in Listing 1-2 simply determines if the given phone number meets the criteria of being well formatted. It takes advantage of two metacharacters introduced in Table 1-6. Specifically, it uses the range quantifier, {n,m}, indicating that the previous character or class must be repeated at least n times and no more than m times. It also uses the ? character, indicating the previous character or class must be present zero or one time.

The pattern as a whole checks for seven digits preceded by optional country and area codes. Output 1-2 shows the result of running the program, and Table 1-19 dissects the pattern.

In English: Look for a single digit followed by a hyphen. This is optional. Then, look for three digits followed by a hyphen. This is also optional. Next, look for three digits, followed by a hyphen, followed by four digits.
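As a rough illustration of the pattern just described, the following sketch applies it with String.matches. It is not a reproduction of Listing 1-2; the class name PhoneFormatCheck, the helper isWellFormatted, and the sample numbers are assumptions.

```java
// Hypothetical sketch; not the book's Listing 1-2.
class PhoneFormatCheck {
    // (\d-)?(\d{3}-)?\d{3}-\d{4} with each backslash doubled for the string literal:
    // optional country code, optional area code, then three digits, a hyphen, four digits.
    static final String PHONE_REGEX = "(\\d-)?(\\d{3}-)?\\d{3}-\\d{4}";

    static boolean isWellFormatted(String phone) {
        return phone.matches(PHONE_REGEX);
    }

    public static void main(String[] args) {
        // Sample inputs are made up for this sketch.
        String[] samples = {"1-999-585-4009", "999-585-4009", "585-4009", "5854009"};
        for (String s : samples) {
            System.out.println(s + " -> " + isWellFormatted(s));
        }
        // The first three print true; "5854009" prints false because the hyphen is missing.
    }
}
```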

Confirming Zip Codes Example

The code in Listing 1-3 determines if the zip code meets the criterion of being well formatted. It checks for five digits optionally followed by a hyphen and four digits. Output 1-3 shows the result of running the program. Table 1-20 dissects the pattern.

In English: Look for five digits, optionally followed by a hyphen and four digits.
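A pattern matching this description could be written as \d{5}(-\d{4})?. The sketch below is an assumption-based illustration rather than the book's Listing 1-3; the class name ZipFormatCheck, the helper isWellFormatted, and the sample values are invented for the example.

```java
// Hypothetical sketch; not the book's Listing 1-3.
class ZipFormatCheck {
    // Five digits, optionally followed by a hyphen and four more digits.
    static final String ZIP_REGEX = "\\d{5}(-\\d{4})?";

    static boolean isWellFormatted(String zip) {
        return zip.matches(ZIP_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isWellFormatted("45643"));       // prints true
        System.out.println(isWellFormatted("45643-4443"));  // prints true
        System.out.println(isWellFormatted("443"));         // prints false (too few digits)
        System.out.println(isWellFormatted("45643-44435")); // prints false (five digits after the hyphen)
    }
}
```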

Confirming Dates Example

The code in Listing 1-4 checks the format of a given date. It confirms that the given date consists of one or two digits followed by a hyphen, followed by one or two digits, followed by a hyphen, followed by four digits. Output 1-4 shows the result of running the program. Table 1-21 dissects the pattern.
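One way to express that description is \d{1,2}-\d{1,2}-\d{4}, which uses the {n,m} range quantifier. The following sketch is illustrative only and is not the book's Listing 1-4; the class name DateFormatCheck, the helper isWellFormatted, and the sample dates are assumptions. Note that it checks the format alone, so an impossible date such as 99-99-9999 still passes.

```java
// Hypothetical sketch; not the book's Listing 1-4.
class DateFormatCheck {
    // One or two digits, hyphen, one or two digits, hyphen, four digits.
    static final String DATE_REGEX = "\\d{1,2}-\\d{1,2}-\\d{4}";

    static boolean isWellFormatted(String date) {
        return date.matches(DATE_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isWellFormatted("01-01-1900")); // prints true
        System.out.println(isWellFormatted("1-1-1900"));   // prints true
        System.out.println(isWellFormatted("01/01/1900")); // prints false (slashes, not hyphens)
        System.out.println(isWellFormatted("99-99-9999")); // prints true (format only, not a real date)
    }
}
```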