Basic regular expression summary

These are the main regular expression characters that you should learn. For
the full set of regular expression characters, see Regular expression summary.
[Attribution: Much of this is copied from the Java API documentation. I'm
rewriting it and will replace Sun's text soon.]

Character classes

Character classes provide a way to specify a set of
characters. The set can be listed explicitly inside []. The set can also
be defined by what must not be in it by beginning the set with a
caret, "^". There are a number of predefined sets (e.g., \d, \s). The
hyphen, "-", indicates a range of character values. Although a
character class matches only one character, a quantifier following it
can be used to match multiple characters.

[abc]

a, b, or c (simple class)

[^abc]

Any character except a, b, or c
(negation)

[a-zA-Z]

a through z or A through Z,
inclusive (range)
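As a minimal sketch of the three forms above, the following uses java.util.regex.Pattern.matches, which succeeds only when the whole input matches. The class name CharacterClassDemo and the helper matches are names chosen for this illustration.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Returns true only if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[abc]", "b"));       // true: b is in the set
        System.out.println(matches("[^abc]", "b"));      // false: b is excluded
        System.out.println(matches("[a-zA-Z]", "Q"));    // true: Q is in the range A-Z
        System.out.println(matches("[a-zA-Z]", "Qx"));   // false: a class matches one character
        System.out.println(matches("[a-zA-Z]+", "Qx"));  // true: the quantifier repeats the class
    }
}
```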

Predefined character classes

.

Any character (may or may not match line terminators)

\d

A digit: [0-9]

\D

A non-digit: [^0-9]

\s

A whitespace character: [ \t\n\x0B\f\r]

\S

A non-whitespace character: [^\s]

\w

A word character: [a-zA-Z_0-9]

\W

A non-word character: [^\w]
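The predefined classes each begin with a backslash, and inside a Java string literal that backslash must itself be doubled. The sketch below shows a few typical uses with standard String methods. The class name PredefinedClassDemo and its helper methods are invented for this example.

```java
import java.util.Arrays;

class PredefinedClassDemo {
    // Removes every non-digit (\D) character.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace (\s+) after trimming the ends.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // True if the input consists only of word characters (\w).
    static boolean isWordCharsOnly(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 555-0199"));                           // 5550199
        System.out.println(Arrays.toString(words("  the quick\tbrown  fox ")));    // [the, quick, brown, fox]
        System.out.println(isWordCharsOnly("max_value2"));                         // true
        System.out.println(isWordCharsOnly("max value"));                          // false
    }
}
```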

Quantifiers (repeating the previous element)

Greedy quantifiers - Expand as much as possible

X?

X, once or not at all

X*

X, zero or more times

X+

X, one or more times

X{n}

X, exactly n times

X{n,}

X, at least n times

X{n,m}

X, at least n but not more than m times
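A short sketch of how these quantifiers decide whether a whole string matches. It uses String.matches from the standard library. The class name GreedyQuantifierDemo is an arbitrary choice for this illustration.

```java
class GreedyQuantifierDemo {
    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("colou?r", "color"));     // true: u appears zero times
        System.out.println(matches("colou?r", "colour"));    // true: u appears once
        System.out.println(matches("a*", ""));               // true: zero repetitions allowed
        System.out.println(matches("a+", ""));               // false: at least one a required
        System.out.println(matches("[0-9]{3}", "123"));      // true: exactly three digits
        System.out.println(matches("[0-9]{3,}", "12"));      // false: fewer than three digits
        System.out.println(matches("[0-9]{2,4}", "12345"));  // false: more than four digits
    }
}
```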

Reluctant quantifiers - Expand only if forced by later
failure to match

X??

X, once or not at all

X*?

X, zero or more times

X+?

X, one or more times

X{n}?

X, exactly n times

X{n,}?

X, at least n times

X{n,m}?

X, at least n but not more than m times
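The difference between greedy and reluctant quantifiers shows up only when a pattern is searched for inside a longer string. The sketch below returns the first match found by Matcher.find so the two forms can be compared side by side. The class name ReluctantQuantifierDemo and the helper firstMatch are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantQuantifierDemo {
    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> and <i>italic</i>";
        // Greedy: .+ runs to the end, then backs up to the last '>'.
        System.out.println(firstMatch("<.+>", html));      // <b>bold</b> and <i>italic</i>
        // Reluctant: .+? stops at the first '>' that lets the match succeed.
        System.out.println(firstMatch("<.+?>", html));     // <b>
        System.out.println(firstMatch("a{2,3}", "aaaa"));  // aaa
        System.out.println(firstMatch("a{2,3}?", "aaaa")); // aa
    }
}
```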

Boundary matchers - Zero-width matches.

^

The beginning of a line. Very useful.

$

The end of a line. Very useful. ^$ matches all empty
lines.

\b

A word boundary

\B

A non-word boundary

\A

The beginning of the input

\G

The end of the previous match

\Z

The end of the input but for the final terminator, if any

\z

The end of the input
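A sketch of some boundary matchers in use. One point worth knowing: in java.util.regex, ^ and $ match at the start and end of each line only when the pattern is compiled with Pattern.MULTILINE; without that flag they match at the start and end of the whole input. The class name BoundaryDemo and its helper methods are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times the pattern is found in the input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Counts empty lines using ^$ in MULTILINE mode.
    static int countEmptyLines(String text) {
        return count(Pattern.compile("^$", Pattern.MULTILINE), text);
    }

    // Counts occurrences of word as a whole word, using \b on both sides.
    static int countWord(String word, String text) {
        return count(Pattern.compile("\\b" + Pattern.quote(word) + "\\b"), text);
    }

    // True if the input is all digits from \A up to the given end anchor (\z or \Z).
    static boolean digitsUpTo(String endAnchor, String input) {
        return Pattern.compile("\\A[0-9]+" + endAnchor).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(countEmptyLines("a\n\nb"));                      // 1
        System.out.println(countWord("cat", "cat concat cat's category"));  // 2
        System.out.println(digitsUpTo("\\z", "123\n"));                     // false: \z needs the very end
        System.out.println(digitsUpTo("\\Z", "123\n"));                     // true: \Z ignores the final terminator
    }
}
```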

Other

Logical operators

XY

X followed by Y

X|Y

Either X or Y

Grouping - Parentheses both group and create a numbered
element that can be used later.

(X)

X. This capturing group is remembered so it can be
referenced later. Numbered starting at 1.
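The following sketch illustrates alternation and capturing groups. It reads groups with Matcher.group(n) and refers to them in a replacement string as $1 and $2; the $n replacement syntax is part of java.util.regex but is not listed in this summary. The class name GroupingDemo and its methods are names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupingDemo {
    // X|Y: true if the whole input is either "cat" or "dog".
    static boolean isPet(String s) {
        return s.matches("cat|dog");
    }

    // Returns the text captured by group 1 (the key in "key=value"), or null.
    static String keyOf(String s) {
        Matcher m = Pattern.compile("(\\w+)=(\\w+)").matcher(s);
        return m.find() ? m.group(1) : null;
    }

    // Rewrites "Last, First" as "First Last" using groups 1 and 2.
    static String swapNames(String s) {
        return s.replaceAll("(\\w+), (\\w+)", "$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(isPet("dog"));               // true
        System.out.println(isPet("bird"));              // false
        System.out.println(keyOf("size=42"));           // size
        System.out.println(swapNames("Lovelace, Ada")); // Ada Lovelace
    }
}
```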

Quotation

\

Nothing, but quotes the following character.
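A small sketch of quoting a metacharacter with a backslash so that it matches literally. In Java source the regex backslash is written as two backslashes inside the string literal. The class name QuotationDemo is an arbitrary choice for this illustration.

```java
import java.util.Arrays;

class QuotationDemo {
    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return input.matches(regex);
    }

    // Splits on a literal vertical bar; unquoted, | would mean "or".
    static String[] splitOnBar(String s) {
        return s.split("\\|");
    }

    public static void main(String[] args) {
        System.out.println(matches("3.14", "3x14"));            // true: . matches any character
        System.out.println(matches("3\\.14", "3x14"));          // false: \. matches only a period
        System.out.println(matches("3\\.14", "3.14"));          // true
        System.out.println(Arrays.toString(splitOnBar("a|b"))); // [a, b]
    }
}
```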

Characters

x

The character x

\\

The backslash character

\t

The tab character ('\u0009')

\n

The newline (line feed) character ('\u000A')

\r

The carriage-return character ('\u000D')

\f

The form-feed character ('\u000C')
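As a brief sketch of these escapes in Java source: the regex \t is written "\\t" in a string literal, and the regex for a single backslash is written "\\\\". The class name EscapeDemo and its methods are invented for this example.

```java
import java.util.Arrays;

class EscapeDemo {
    // Splits tab-separated fields using the regex escape \t.
    static String[] tabFields(String line) {
        return line.split("\\t");
    }

    // Replaces each backslash with a forward slash; the regex \\ matches one backslash.
    static String toForwardSlashes(String path) {
        return path.replaceAll("\\\\", "/");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tabFields("name\tage\tcity"))); // [name, age, city]
        System.out.println(toForwardSlashes("C:\\dir\\file.txt"));         // C:/dir/file.txt
    }
}
```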

Most of this material is copyright Sun Microsystems and is reproduced here
for educational purposes.