(Mathematicians don't typically put quotes around a string, preferring to let the fixed-width typewriter font distinguish it as one, but I'm guessing
that programmers are more comfortable with the quotes around strings.)

In regular language theory, there are two atomic languages:

$\epsilon$ -- the null language, which contains only the string of length zero; and

$\emptyset$ -- the empty language, which contains no strings at all.

In almost every programming language, the null string is written "".

Mathematicians are often sloppy with the notation for the null language, using $\epsilon$ to represent
both the null language, {""}, and the null string, "".

For each character c in the alphabet,
there is a corresponding one-character
primitive language, {"c"}.

(The alphabet is a set of characters, usually denoted $\Sigma$ or $A$.)

Once again, mathematicians are often sloppy in their notation, using the character c
to mean the language {"c"}.

Regular languages are those that can be obtained by composing, in any combination,
the operations union, concatenation and Kleene star on the atomic and primitive languages.
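As a rough illustration, here is how the three operations surface in everyday regex syntax, using grep -E (POSIX extended regular expressions). The helper name matches and the sample strings are my own inventions for this sketch, not part of any standard tool.

```shell
# matches REGEX STRING: succeed if the whole STRING is in the language of REGEX.
# (Hypothetical helper: -x forces a whole-line match, -q suppresses output.)
matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

matches 'a|b'   'b'    && echo 'union: "b" is in {"a"} union {"b"}'
matches 'ab'    'ab'   && echo 'concatenation: "ab" is in {"a"}{"b"}'
matches '(ab)*' 'abab' && echo 'star: "abab" is in {"ab"}*'
matches '(ab)*' ''     && echo 'star: the null string is in {"ab"}*'
matches '(ab)*' 'aba'  || echo '"aba" is not in {"ab"}*'
```

Note that the Kleene star always admits the null string, which is why the fourth check succeeds.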

But this script still breaks if there are nested body tags in the document.

If nesting in a pattern matters,
it's probably time to switch to a formalism
more powerful than regular languages,
such as context-free languages.

Useful operations

The group operation { operation1 ; ... ; operationN }
executes all of the specified operations, in order, on each line matched by the given address.
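A minimal sketch of grouping, assuming some made-up log-like input: on lines matching /error/, the group first substitutes and then prints.

```shell
# Made-up sample input.
input='ok: start
error: disk full
ok: done'

# Address: /error/. Group: substitute first, then print.
# -n turns off automatic printing, so only the grouped p produces output.
grouped=$(printf '%s\n' "$input" | sed -n '/error/{ s/disk/DISK/; p; }')

echo "$grouped"
```

The trailing semicolon before the closing brace is there because some sed implementations insist on it.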

The operation s/pattern/replacement/arguments
replaces matches of pattern in the current line
with replacement, as directed by the arguments.
With no arguments, only the first match is replaced.
In the replacement, \n stands for the nth submatch (\1, \2, and so on),
while & represents the entire match.
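A small sketch of both replacement features, on a made-up input string: \1 and \2 pick out parenthesized submatches, and & stands for the whole match.

```shell
# \1 and \2 refer to the first and second \( ... \) groups in the pattern.
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')

# & is replaced by everything the pattern matched.
wrapped=$(echo 'hello world' | sed 's/world/[&]/')

echo "$swapped"   # world hello
echo "$wrapped"   # hello [world]
```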

The operation b branches to a label; if none is specified,
it branches to the end of the script, so sed moves on to processing the next line.
Think of this as a break operation.
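For instance, a label-free b can shield some lines from the rest of the script. In this sketch (the two input lines are made up), lines starting with # skip the substitution entirely.

```shell
# Lines starting with # hit b, which jumps past the remaining commands.
# Each command gets its own -e so the branch is not mistaken for a label name.
result=$(printf '%s\n' '# foo stays' 'foo changes' |
    sed -e '/^#/b' -e 's/foo/bar/')

echo "$result"
```

The comment line still gets printed, since automatic printing happens at the end of the script either way.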

The operation y/from/to/
transliterates each character in from to the corresponding
character in to.
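A quick sketch on a made-up input: a becomes A, b becomes B, and c becomes C, wherever they occur in the line.

```shell
# from and to must be the same length; characters are paired up by position.
shouted=$(echo 'a cab' | sed 'y/abc/ABC/')

echo "$shouted"   # A CAB
```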

The operation q quits sed.

The operation d deletes the current line.
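Both q and d are usually paired with an address. A sketch with three made-up input lines:

```shell
# 2q: quit upon reaching line 2 (the line is still printed first),
# which makes this behave like head -n 2.
first_two=$(printf '%s\n' one two three | sed '2q')

# /two/d: delete every line matching the pattern.
without_two=$(printf '%s\n' one two three | sed '/two/d')

echo "$first_two"
echo "$without_two"
```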

The operation w file writes the
current line to the specified file.
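A sketch of w, assuming mktemp is available to provide a scratch file (the input lines are made up). Matching lines land in the file while -n keeps standard output quiet.

```shell
tmp=$(mktemp)

# The file name runs to the end of the command, so w comes last in the script.
printf '%s\n' 'keep me' 'skip me' | sed -n "/keep/w $tmp"

saved=$(cat "$tmp")
rm -f "$tmp"

echo "$saved"   # keep me
```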

Common arguments to the substitute operation

The most common argument to the substitute command
is g, which means replace "globally":
all matches on the current line, instead of just the first.

Sometimes, other arguments are useful:

n (a number) tells sed to replace only the nth match, instead of the first.

p prints out the result if there is a substitution.

i ignores case during the match (not part of POSIX sed; GNU sed accepts it).

w file writes the current line to file if there is a substitution.
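To make the differences concrete, here is a sketch that runs the same substitution with different arguments on a made-up line. I've left out i, since it isn't available in every sed.

```shell
line='cat cat cat'

first=$(echo "$line" | sed 's/cat/dog/')     # no argument: first match only
all=$(echo "$line" | sed 's/cat/dog/g')      # g: every match on the line
second=$(echo "$line" | sed 's/cat/dog/2')   # 2: only the second match

# p combined with -n: print only the lines where a substitution happened.
printed=$(printf '%s\n' cat bird | sed -n 's/cat/dog/p')

echo "$first"     # dog cat cat
echo "$all"       # dog dog dog
echo "$second"    # cat dog cat
echo "$printed"   # dog
```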

Useful flags

-n suppresses automatic printing of each result; to print a result, use command p.
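A sketch of what -n changes, using three made-up lines: without it, a line hit by p shows up twice (once from p, once from automatic printing).

```shell
# Without -n: every line is printed automatically, and line 2 is printed again by p.
doubled=$(printf '%s\n' alpha beta gamma | sed '2p')

# With -n: only what p explicitly prints appears.
only=$(printf '%s\n' alpha beta gamma | sed -n '2p')

echo "$doubled"
echo "$only"      # beta
```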

Next steps with sed

There are a label command (:) and branching commands (b,
t) that allow loops and, in theory, arbitrary
(Turing-equivalent) computation.
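To give a flavor of such a loop, here is a sketch that strips balanced parentheses from a made-up input, something no single regular expression can do. The label name again is arbitrary.

```shell
# :again   defines a label.
# s/()//   removes one innermost pair of parentheses (literal in basic regex syntax).
# t again  jumps back to the label if the substitution succeeded.
stripped=$(echo 'a((()))b' | sed -e ':again' -e 's/()//' -e 't again')

echo "$stripped"   # ab
```

Each pass removes one pair, so the loop runs until no adjacent pair remains.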

sed keeps track of both a pattern space (the current line) and
a hold space, and there are commands to manipulate both of them, e.g.,
g, G, h and H.
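The classic demonstration is a well-known one-liner that reverses the order of lines by accumulating them in the hold space. A sketch on three made-up lines:

```shell
# 1!G  on every line except the first, append the hold space to the pattern space.
# h    copy the pattern space (the lines so far, newest first) into the hold space.
# $p   on the last line, print the accumulated result.
reversed=$(printf '%s\n' 1 2 3 | sed -n '1!G;h;$p')

echo "$reversed"
```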

That said, you should probably never use these commands!

If you find yourself tempted to use these more advanced constructs, it's a
sign that you want to use a tool like awk or Perl instead.

AWK

The awk command provides a more
traditional programming language for text processing
than sed.

Those accustomed to seeing only hairy awk one-liners might not
even realize that AWK is a real programming language.
For example, here's a comprehensible
AWK program that prints the factorial of each line: