Earlier we noted that the sequences \1, \2, and so on are
available in the pattern, standing for the nth group matched so
far. The same sequences are available in the second argument of
sub and gsub.

"fred:smith".sub(/(\w+):(\w+)/, '\2, \1')   »   "smith, fred"
"nercpyitno".gsub(/(.)(.)/, '\2\1')         »   "encryption"

There are additional backslash sequences that work in substitution
strings: \& (last match), \+ (last matched group),
\` (string prior to match), \' (string after match), and
\\ (a literal backslash).
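As a rough illustration of three of these sequences, the sketch below applies each one to the same match. The sample string and the variable names are made up for this example.

```ruby
s = "hello world"

# \& inserts the text that was matched ("o w").
whole  = s.sub(/o w/, '[\&]')

# \` inserts the part of the string before the match ("hell").
before = s.sub(/o w/, '[\`]')

# \' inserts the part of the string after the match ("orld").
# Double quotes are used here so the single quote needs no escaping.
after  = s.sub(/o w/, "[\\']")

puts whole    # prints hell[o w]orld
puts before   # prints hell[hell]orld
puts after    # prints hell[orld]orld
```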
It gets confusing if you want to include a literal backslash in a
substitution. The obvious thing is to write

str.gsub(/\\/, '\\\\')

Clearly, this code is trying to replace each backslash in str
with two. The programmer doubled up the backslashes in the replacement
text, knowing that they'd be converted to ``\\'' in syntax
analysis. However, when the substitution occurs, the regular
expression engine performs another pass through the string, converting
``\\'' to ``\'', so the net effect is to replace
each single backslash with another single backslash. You need to write
gsub(/\\/, '\\\\\\\\')!
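One way to see the two passes is to count backslashes at each stage. The following is a small sketch; the variable names are chosen only for this illustration.

```ruby
str = 'a\b\c'   # single-quoted literal: the string holds two backslashes

two_in_source  = '\\\\'        # 2 backslash characters after syntax analysis
four_in_source = '\\\\\\\\'    # 4 backslash characters after syntax analysis

# Second pass: the engine reads each pair of backslashes as one literal backslash.
naive = str.gsub(/\\/, two_in_source)    # each backslash replaced by 1 backslash
fixed = str.gsub(/\\/, four_in_source)   # each backslash replaced by 2 backslashes

puts two_in_source.length      # prints 2
puts four_in_source.length     # prints 4
puts naive == str              # prints true: nothing changed
puts fixed.scan(/\\/).length   # prints 4: each backslash was doubled
```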

str = 'a\b\c'                    »   "a\b\c"
str.gsub(/\\/, '\\\\\\\\')       »   "a\\b\\c"

However, using the fact that \& is replaced by the matched
string, you could also write

str = 'a\b\c'                    »   "a\b\c"
str.gsub(/\\/, '\&\&')           »   "a\\b\\c"

If you use the block form of gsub, the string
for substitution is analyzed only once (during the syntax pass) and
the result is what you intended.
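A quick way to check this is to return a backslash sequence from the block: it is inserted verbatim rather than expanded. This is a minimal sketch with illustrative variable names.

```ruby
# Replacement-string form: the engine expands each \& to the matched text.
expanded = "abc".gsub(/b/, '\&\&')

# Block form: the block's return value is inserted as-is, with no
# second pass, so the four characters \&\& appear literally.
verbatim = "abc".gsub(/b/) { '\&\&' }

puts expanded   # prints abbc
puts verbatim   # prints a\&\&c
```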

str = 'a\b\c'                    »   "a\b\c"
str.gsub(/\\/) { '\\\\' }        »   "a\\b\\c"

Finally, as an example of the wonderful expressiveness of combining
regular expressions with code blocks, consider the following code
fragment from the CGI library module, written by Wakou Aoyama. The
code takes a string containing HTML escape sequences and converts it
into normal ASCII. Because it was written for a Japanese audience, it
uses the ``n'' modifier on the regular expressions, which turns off
wide-character processing. It also illustrates Ruby's case expression,
which we discuss starting on page 81.

def unescapeHTML(string)
  str = string.dup
  str.gsub!(/&(.*?);/n) {
    match = $1.dup
    case match
    when /\Aamp\z/ni           then '&'
    when /\Aquot\z/ni          then '"'
    when /\Agt\z/ni            then '>'
    when /\Alt\z/ni            then '<'
    when /\A#(\d+)\z/n         then Integer($1).chr
    when /\A#x([0-9a-f]+)\z/ni then $1.hex.chr
    end
  }
  str
end
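The same block-plus-case idea can be sketched in a table-driven style. This is not the CGI library's code: the names ENTITIES and unescape_html are hypothetical, the ``n'' modifier is dropped, and unknown entities are left untouched, which is an assumption of this sketch rather than the behavior of the fragment above.

```ruby
# Hypothetical names: ENTITIES and unescape_html are not part of the CGI library.
ENTITIES = { 'amp' => '&', 'quot' => '"', 'gt' => '>', 'lt' => '<' }.freeze

def unescape_html(string)
  string.gsub(/&(.*?);/) do
    whole = $&   # save the match now; the regexps in the case below overwrite it
    name  = $1
    case name
    when /\A#(\d+)\z/         then $1.to_i.chr   # decimal character reference
    when /\A#x([0-9a-f]+)\z/i then $1.hex.chr    # hexadecimal character reference
    else ENTITIES.fetch(name.downcase, whole)    # assumption: unknown entities stay as-is
    end
  end
end

puts unescape_html('&lt;b&gt;Tom &amp; Jerry&lt;/b&gt; &#65;&#x42;')
# prints <b>Tom & Jerry</b> AB
```

Saving $& and $1 into local variables before the case expression serves the same purpose as the match = $1.dup line in the original: each when clause runs another match and replaces the match variables.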