The Perl regular expression syntax is based on that used by the programming
language Perl. Perl regular expressions are the default behavior in Boost.Regex,
or you can pass the flag perl to the basic_regex constructor, for example:

// e1 is a case sensitive Perl regular expression:
// since Perl is the default option there's no need to explicitly specify the syntax used here:
boost::regex e1(my_expression);
// e2 is a case insensitive Perl regular expression:
boost::regex e2(my_expression, boost::regex::perl|boost::regex::icase);

A section beginning with ( and ending with )
acts as a marked sub-expression. Whatever matched the sub-expression is split
out into a separate field by the matching algorithms. Marked sub-expressions
can also be repeated, or referred to by a back-reference.
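As a minimal sketch of marked sub-expressions and back-references, the example below uses the C++ standard library's std::regex with its default ECMAScript grammar instead of boost::regex, so that it builds without Boost; it assumes that plain (...) groups and \1 behave the same way in both for patterns this simple. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Returns what marked sub-expression 1 of (\w+)@(\w+) matched, or "" if nothing matched.
std::string first_group(const std::string& text) {
    const std::regex re(R"((\w+)@(\w+))");
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        // m[0] is the whole match; m[1] and m[2] are the marked sub-expressions.
        return m[1].str();
    }
    return "";
}

// The back-reference \1 must match exactly the text captured by sub-expression 1,
// so this finds a word that is immediately repeated.
bool has_doubled_word(const std::string& text) {
    const std::regex re(R"(\b(\w+) \1\b)");
    return std::regex_search(text, re);
}
```

For instance, first_group("mail bob@example now") returns "bob", and has_doubled_word("it is is here") returns true.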

A marked sub-expression is useful to lexically group part of a regular expression,
but has the side effect of splitting out an extra field in the result. As
an alternative, you can lexically group part of a regular expression without
generating a marked sub-expression by using (?: and ); for example (?:ab)+ will repeat ab
without splitting out any separate sub-expressions.
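The difference can be observed through the number of marked sub-expressions a compiled pattern reports. This sketch uses std::regex (ECMAScript grammar, which also supports (?:...)) rather than boost::regex so it needs no Boost; the helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// How many marked sub-expressions a pattern defines.
unsigned marked_count(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}

// (?:ab)+ repeats "ab" as a unit without creating a marked sub-expression.
bool is_repeated_ab(const std::string& text) {
    const std::regex re("(?:ab)+");
    return std::regex_match(text, re);
}
```

Here marked_count("(ab)+") is 1 while marked_count("(?:ab)+") is 0, although both patterns match the same strings.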

Any atom (a single character, a marked sub-expression, or a character class)
can be repeated with the *, +, ?,
and {} operators.

The * operator will match the preceding atom zero or more
times, for example the expression a*b will match any of
the following:

b
ab
aaaaaaaab

The + operator will match the preceding atom one or more
times, for example the expression a+b will match any of
the following:

ab
aaaaaaaab

But will not match:

b

The ? operator will match the preceding atom zero or one
times, for example the expression ca?b will match any of the following:

cb
cab

But will not match:

caab
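The three repeat operators can be checked against the strings listed above with a small helper. This is a sketch using std::regex in place of boost::regex (so it builds without Boost), assuming *, + and ? behave identically in its ECMAScript grammar; full_match is a hypothetical name.

```cpp
#include <regex>
#include <string>

// True if the whole of `text` matches `pattern` (std::regex_match anchors both ends).
bool full_match(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    return std::regex_match(text, re);
}
```

With it, full_match("a*b", "b"), full_match("a+b", "ab") and full_match("ca?b", "cab") are true, while full_match("a+b", "b") and full_match("ca?b", "caab") are false.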

An atom can also be repeated with a bounded repeat:

a{n} Matches 'a' repeated exactly n times.

a{n,} Matches 'a' repeated n or more times.

a{n,m} Matches 'a' repeated between n and m times inclusive.

For example:

^a{2,3}$

Will match either of:

aa
aaa

But neither of:

a
aaaa
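A sketch of the three bounded forms, written with std::regex (ECMAScript grammar) instead of boost::regex so it compiles without Boost; the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// ^a{2,3}$ : between two and three 'a' characters, anchored at both ends.
bool two_or_three_as(const std::string& text) {
    const std::regex re("^a{2,3}$");
    return std::regex_search(text, re);
}

// a{3} (exactly three), tested against the whole string.
bool exactly_three_as(const std::string& text) {
    return std::regex_match(text, std::regex("a{3}"));
}

// a{2,} (two or more), tested against the whole string.
bool at_least_two_as(const std::string& text) {
    return std::regex_match(text, std::regex("a{2,}"));
}
```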

Note that the "{" and "}" characters will be treated as
ordinary literals when used in a context that is not a repeat: this matches
Perl 5.x behavior. For example in the expressions "ab{1", "ab1}"
and "a{b}c" the curly brackets are all treated as literals and
no error will be raised.

It is an error to use a repeat operator if the preceding construct cannot
be repeated, for example:

a(*)

Will raise an error, as there is nothing for the * operator
to be applied to.
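Such an error surfaces when the expression is constructed. The sketch below uses std::regex, which reports an invalid expression by throwing std::regex_error; it stands in for boost::regex here only so the example builds without Boost, and is_invalid is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Returns true if constructing a regex from `pattern` fails.
// std::regex signals an invalid expression by throwing std::regex_error.
bool is_invalid(const std::string& pattern) {
    try {
        const std::regex re(pattern);
        (void)re;
        return false;
    } catch (const std::regex_error&) {
        return true;
    }
}
```

is_invalid("a(*)") returns true, while is_invalid("a(b*)") returns false because the * there has b to repeat.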

The normal repeat operators are "greedy", that is to say they will
consume as much input as possible. There are non-greedy versions available
that will consume as little input as possible while still producing a match.

*? Matches the previous atom zero or more times, while
consuming as little input as possible.

+? Matches the previous atom one or more times, while
consuming as little input as possible.

?? Matches the previous atom zero or one times, while
consuming as little input as possible.

{n,}? Matches the previous atom n or more times, while
consuming as little input as possible.

{n,m}? Matches the previous atom between n and m times,
while consuming as little input as possible.
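The difference between greedy and non-greedy repeats is easiest to see by extracting the first match. This sketch uses std::regex (its ECMAScript grammar also supports *?, +? and {n,}?) as a stand-in for boost::regex so it builds without Boost; first_match is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Text of the first match of `pattern` in `text`, or "" if there is none.
std::string first_match(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

first_match("<.+>", "<b>bold</b>") returns the whole string, because the greedy .+ runs on to the last >, whereas first_match("<.+?>", "<b>bold</b>") stops at the first > and returns "<b>". Likewise first_match("a{2,}?", "aaaa") returns "aa".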

By default, when a repeated pattern does not match, the engine will backtrack
until a match is found. However, this behaviour can sometimes be undesirable,
so there are also "possessive" repeats: these match as much as
possible and do not then allow backtracking if the rest of the expression
fails to match.

*+ Matches the previous atom zero or more times, while
giving nothing back.

++ Matches the previous atom one or more times, while
giving nothing back.

?+ Matches the previous atom zero or one times, while
giving nothing back.

{n,}+ Matches the previous atom n or more times, while
giving nothing back.
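std::regex has no possessive repeats, so the sketch below does not use Boost's syntax at all. Instead, as an illustration of "giving nothing back", it uses a well-known ECMAScript workaround: (?=(a*))\1 behaves like the possessive a*+ (equivalently the independent sub-expression (?>a*)) because the lookahead captures the longest run of a's and is not re-entered when the rest of the pattern fails. The helper names are hypothetical, and `tail` is assumed to be a literal that does not start with a digit.

```cpp
#include <regex>
#include <string>

// Ordinary greedy a* followed by `tail`: a* may give characters back.
bool greedy_search(const std::string& tail, const std::string& text) {
    const std::regex re("a*" + tail);
    return std::regex_search(text, re);
}

// Emulates a possessive a*+ followed by `tail` as (?=(a*))\1tail:
// the lookahead captures the longest run of a's, the lookahead is not
// re-entered on backtracking, and \1 then consumes exactly that run.
bool possessive_search(const std::string& tail, const std::string& text) {
    const std::regex re("(?=(a*))\\1" + tail);
    return std::regex_search(text, re);
}
```

greedy_search("a", "aaa") is true because a* gives back one a for the final a to match; possessive_search("a", "aaa") is false, just as a*+a can never match, while possessive_search("b", "aaab") is true.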

A range of characters is written [x-y]: for example [a-c] will match any single character in the
range 'a' to 'c'. By default, for Perl regular expressions, a character x
is within the range y to z if the code point of the character lies within
the code points of the endpoints of the range. Alternatively, if you set the
collate flag when constructing the regular expression, then ranges are locale
sensitive.
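A sketch of a code-point range, using std::regex (ECMAScript grammar) in place of boost::regex so it builds without Boost. std::regex has its own std::regex::collate flag for locale-sensitive ranges, which is deliberately not set here; is_a_to_c is a hypothetical name.

```cpp
#include <regex>
#include <string>

// True if `text` is exactly one character in the range 'a' to 'c'.
bool is_a_to_c(const std::string& text) {
    const std::regex re("[a-c]");
    return std::regex_match(text, re);
}
```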

An expression of the form [[.col.]] matches the collating
element col. A collating element is any single character,
or any sequence of characters that collates as a single unit. Collating elements
may also be used as the end point of a range, for example: [[.ae.]-c]
matches the character sequence "ae", plus any single character
in the range "ae"-c, assuming that "ae" is treated as
a single collating element in the current locale.

As an extension, a collating element may also be specified via its symbolic name, for example [[.NUL.]] matches a \0 character.

An expression of the form [[=col=]] matches any character
or collating element whose primary sort key is the same as that for collating
element col; as with collating elements, the name col
may be a symbolic name.
A primary sort key is one that ignores case, accents, or locale-specific
tailorings; so for example [[=a=]] matches
any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately, the implementation
of this is reliant on the platform's collation and localisation support;
this feature cannot be relied upon to work portably across all platforms,
or even all locales on one platform.

All the escape sequences that match a single character, or a single character
class are permitted within a character class definition. For example [\[\]] would match either of [ or ]
while [\W\d]
would match any character that is either a "digit", or
is not a "word" character.

Any escaped character x, where x is
the name of a character class, shall match any character that is a member
of that class, and any escaped upper-case character X, where x
is the name of a character class, shall match any character not in that class.
For example \d matches any digit, and \D matches any character that is not a digit.
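The escapes described above can be exercised with std::regex, used here instead of boost::regex so the sketch builds without Boost; it assumes these class escapes behave the same in its ECMAScript grammar. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// [\[\]] : escaped brackets inside a character class match a literal [ or ].
bool is_bracket(const std::string& s) {
    return std::regex_match(s, std::regex(R"([\[\]])"));
}

// [\W\d] : a digit, or any character that is not a word character.
bool is_digit_or_nonword(const std::string& s) {
    return std::regex_match(s, std::regex(R"([\W\d])"));
}

// \d matches a member of the digit class; \D matches any character outside it.
bool is_digit(const std::string& s) {
    return std::regex_match(s, std::regex(R"(\d)"));
}

bool is_non_digit(const std::string& s) {
    return std::regex_match(s, std::regex(R"(\D)"));
}
```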

The following match only at buffer boundaries: a "buffer" in this
context is the whole of the input text that is being matched against (note
that ^ and $ may match embedded newlines within the text).

\` Matches at the start of a buffer only.

\' Matches at the end of a buffer only.

\A Matches at the start of a buffer only (the same as \`).

\z Matches at the end of a buffer only (the same as \').

\Z Matches a zero-width assertion consisting of an optional sequence of newlines
at the end of a buffer: equivalent to the regular expression (?=\v*\z).
Note that this is subtly different from Perl which behaves as if matching
(?=\n?\z).

The sequence \G matches only at the end of the last match
found, or at the start of the text being matched if no previous match was
found. This escape is useful if you're iterating over the matches contained
within a text, and you want each subsequent match to start where the last
one ended.

The escape sequence \Q begins a "quoted sequence":
all the subsequent characters are treated as literals, until either the end
of the regular expression or \E is found. For example the expression \Q*+\Ea+
would match either of:

*+a
*+aaa

\C Matches a single code point: in Boost regex this has
exactly the same effect as a "." operator.

\X Matches a combining character sequence: that is, any non-combining character
followed by a sequence of zero or more combining characters.

\K Resets the start location of $0 to the current text
position: in other words everything to the left of \K is "kept back"
and does not form part of the regular expression match. $` is updated accordingly.

For example foo\Kbar matched against the text "foobar"
would return the match "bar" for $0 and "foo" for $`.
This can be used to simulate variable width lookbehind assertions.

You can create a named sub-expression using (?<NAME>expression),
which can then be referred to by the name NAME. Alternatively
you can delimit the name using 'NAME' as in:

(?'NAME'expression)

These named subexpressions can be referred to in a backreference using either
\g{NAME} or \k<NAME> and can
also be referred to by name in a Perl
format string for search and replace operations, or in the match_results member functions.

(?imsx-imsx ... ) alters which of the perl modifiers are
in effect within the pattern. Changes take effect from the point that the
block is first seen and extend to any enclosing ). Letters
before a '-' turn that perl modifier on; letters after it turn it off.

(?|pattern) resets the subexpression count at the start
of each "|" alternative within pattern.

The sub-expression count following this construct is that of whichever branch
had the largest number of sub-expressions. This construct is useful when
you want to capture one of a number of alternative matches in a single sub-expression
index.

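For example, in the expression (a)(?|x(y)z|(p(q)r)|(t)u(v))(z) the
sub-expressions are numbered as follows: (a) is 1; (y), (p(q)r) and (t)
are each 2, because the count restarts in each alternative; (q) and (v) are
each 3; and the final (z) is 4, because the largest branch contains two
sub-expressions.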

Lookahead is typically used to create the logical AND of two regular expressions.
For example, if a password must contain a lower case letter, an upper case
letter, a punctuation symbol, and be at least 6 characters long, then the
expression:

(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}

could be used to validate the password.
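The password rule described above can be sketched with std::regex, whose ECMAScript grammar supports both lookahead and [[:class:]] names; it is used here instead of boost::regex only so the example builds without Boost. Each (?=.*X) checks one requirement without consuming input, and .{6,} then enforces the length. acceptable_password is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Lower case letter AND upper case letter AND punctuation AND length >= 6.
bool acceptable_password(const std::string& pw) {
    const std::regex re(
        "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}");
    return std::regex_match(pw, re);
}
```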

(?>pattern) The pattern is matched
independently of the surrounding patterns; the expression will never backtrack
into pattern. Independent sub-expressions are typically
used to improve performance: only the best possible match for pattern will
be considered, and if this doesn't allow the expression as a whole to match then
no match is found at all.

(?(condition)yes-pattern|no-pattern) attempts to match
yes-pattern if the condition is
true, otherwise attempts to match no-pattern.

(?(condition)yes-pattern) attempts to match yes-pattern
if the condition is true, otherwise matches the NULL
string.

condition may be one of: a forward lookahead assertion,
the index of a marked sub-expression (the condition becomes true if the sub-expression
has been matched), or an index of a recursion (the condition becomes true
if we are executing directly inside the specified recursion).

(?(<name>)yes-pattern|no-pattern)
Executes yes-pattern if named subexpression name
has been matched, otherwise executes no-pattern.

(?('name')yes-pattern|no-pattern)
Executes yes-pattern if named subexpression name
has been matched, otherwise executes no-pattern.

(?(R)yes-pattern|no-pattern) Executes yes-pattern
if we are executing inside a recursion, otherwise executes no-pattern.

(?(RN)yes-pattern|no-pattern)
Executes yes-pattern if we are executing inside
a recursion to sub-expression N, otherwise executes
no-pattern.

(?(R&name)yes-pattern|no-pattern)
Executes yes-pattern if we are executing inside
a recursion to named sub-expression name, otherwise
executes no-pattern.

(?(DEFINE)never-executed-pattern) Defines a block
of code that is never executed and matches no characters: this is usually
used to define one or more named sub-expressions which are referred to
from elsewhere in the pattern.

If you view the regular expression as a directed (possibly cyclic) graph,
then the best match found is the first match found by a depth-first-search
performed on that graph, while matching the input text.

Alternatively:

The best match found is the leftmost
match, with individual elements matched as follows:

Construct: What gets matched

AtomA AtomB: Locates the best match for AtomA that has a following match for AtomB.

Expression1 | Expression2: If Expression1 can be matched then returns that match, otherwise attempts to match Expression2.

S{N}: Matches S repeated exactly N times.

S{N,M}: Matches S repeated between N and M times, and as many times as possible.

S{N,M}?: Matches S repeated between N and M times, and as few times as possible.

S?, S*, S+: The same as S{0,1}, S{0,UINT_MAX}, S{1,UINT_MAX} respectively.

S??, S*?, S+?: The same as S{0,1}?, S{0,UINT_MAX}?, S{1,UINT_MAX}? respectively.

(?>S): Matches the best match for S, and only that.

(?=S), (?<=S): Matches only the best match for S (this is only visible if there are capturing parentheses within S).

(?!S), (?<!S): Considers only whether a match for S exists or not.

(?(condition)yes-pattern | no-pattern): If condition is true, then only yes-pattern is considered, otherwise only no-pattern is considered.
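The rule for Expression1 | Expression2 differs from POSIX "leftmost-longest" matching, and the contrast can be seen with std::regex, which offers both an ECMAScript (Perl-like) grammar and a POSIX extended grammar. This is a sketch using std::regex rather than boost::regex, assuming its ECMAScript mode follows the same first-alternative rule described above; leftmost_match is a hypothetical name.

```cpp
#include <regex>
#include <string>

// First match of a|ab in `text` under the given grammar, or "" if none.
std::string leftmost_match(const std::string& text, std::regex::flag_type grammar) {
    const std::regex re("a|ab", grammar);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

On "abc" the ECMAScript grammar returns "a", because the first alternative already matches, whereas std::regex::extended applies the POSIX leftmost-longest rule and returns "ab".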

There are a variety
of flags that may be combined with the perl option
when constructing the regular expression. In particular, note that the newline_alt
option alters the syntax, the collate and icase options modify how case and
locale sensitivity are applied, and the nosubs option causes marked
sub-expressions to be treated as non-marked groups.
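The standard library's std::regex has flags named icase and nosubs with a similar purpose. The sketch below uses them in place of the boost::regex flags so that it builds without Boost; it illustrates only the general effect of the two flags, not Boost's exact behaviour, and the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// icase: letters match regardless of case.
bool matches_hello_icase(const std::string& text) {
    const std::regex re("hello", std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(text, re);
}

// Number of marked sub-expressions in (a)(b), with or without nosubs.
unsigned groups_in_ab(bool use_nosubs) {
    const std::regex::flag_type flags =
        use_nosubs ? (std::regex::ECMAScript | std::regex::nosubs)
                   : std::regex::ECMAScript;
    return std::regex("(a)(b)", flags).mark_count();
}
```

matches_hello_icase("HeLLo") is true; groups_in_ab(false) is 2 and groups_in_ab(true) is 0, since with nosubs the parentheses only group.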