6. OTHER ISSUES

6.1. I have a certain problem that stumps me. Where can I get help?

Post your question on the "sed-users" mailing list (section 2.3.2),
where many sed users will be able to see your question. You will have
to subscribe to have posting privileges.

Your other alternative is one of these newsgroups:

alt.comp.editors.batch

comp.editors

comp.unix.questions

comp.unix.shell

6.2. How does sed compare with awk, perl, and other utilities?

Awk is a much richer language with many features of a programming
language, including variable names, math functions, arrays, system
calls, etc. Its command structure is similar to sed:

address { command(s) }

which means that for each line or range of lines that matches the
address, execute the command(s). In both sed and awk, an address
can be a line number or an RE matching somewhere on the line, or both.
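
A minimal sketch of the shared "address { command }" structure. The
sample input and variable names are invented for illustration; the
sed block is split across -e switches so that the closing brace ends
its own expression (see section 6.7.1).

```shell
# Sample input, invented for this example.
input=$(printf 'alpha\nfoo one\nbeta\nfoo two\n')

# RE address: act only on lines that contain "foo".
sed_re=$(printf '%s\n' "$input" | sed -n -e '/foo/{' -e p -e '}')
awk_re=$(printf '%s\n' "$input" | awk '/foo/ { print }')

# Line-number addresses: act only on lines 2 through 3.
sed_range=$(printf '%s\n' "$input" | sed -n '2,3p')
awk_range=$(printf '%s\n' "$input" | awk 'NR >= 2 && NR <= 3 { print }')

printf '%s\n' "$sed_re" "$sed_range"
```

In both pairs the sed and awk commands should print the same lines.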

In program size, awk is 3-10 times larger than sed. Awk has most of
the functions of sed, but not all. Notably, sed supports
backreferences (\1, \2, ...) to previous expressions, and awk does
not have any comparable syntax. (One exception: GNU awk v3.0
introduced gensub(), which supports backreferences only on
substitutions.)
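
As a brief illustration of sed backreferences (the sample strings are
invented), \1 and \2 can be used both in the replacement and inside
the pattern itself:

```shell
# \1 and \2 in the RHS refer to the first and second \( ... \) groups.
swapped=$(echo 'hello world' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')
echo "$swapped"    # world hello

# \1 inside the LHS matches a repeat of what the group just matched,
# so this collapses a doubled word.
deduped=$(echo 'the the cat' | sed 's/\([a-z][a-z]*\) \1/\1/')
echo "$deduped"    # the cat
```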

Perl is a general-purpose programming language, with many features
beyond text processing and interprocess communication, taking it
well past awk or other scripting languages. Perl supports every
feature sed does and has its own set of extended regular
expressions, which give it extensive power in pattern matching and
processing. (Note: the standard perl distribution comes with 's2p',
a sed-to-perl conversion script. See section 3.6 for more info.)
Like sed and awk, perl scripts do not need to be compiled into
binary code. Like sed, perl can also run many useful "one-liners"
from the command line, though with greater flexibility; see
question 4.41 ("How do I make substitutions in every file in a
directory, or in a complete directory tree?").

On the other hand, the current version of perl is from 8 to 35
times larger than sed in its executables alone (perl's library
modules and allied files not included!). Further, for most simple
tasks such as substitution, sed executes more quickly than either
perl or awk. All these utilities serve to process input text,
transforming it to meet our needs . . . or our arbitrary whims.

6.3. When should I use sed?

When you need a small, fast program to modify words, lines, or
blocks of lines in a textfile.

6.4. When should I NOT use sed?

You should not use sed when you have "dedicated" tools which can do
the job faster or with an easier syntax. Do not use sed when you
only want to:

print individual lines, based on patterns within the line itself.
Instead, use "grep".

print blocks of lines, with 1 or more lines of context above or
below a specific regular expression. Instead, use the GNU version
of grep as follows:

grep -A{number} -B{number} "regex"

remove individual lines, based on patterns within the line
itself. Instead, use "grep -v".

print line numbers. Instead, use "nl" or "cat -n".

reformat lines or paragraphs. Instead, use "fold", "fmt" or "par".
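
A short comparison of some of the cases above, using invented sample
input. The sed commands are shown only to confirm that the dedicated
tool gives the same result; the -A and -B options are the GNU grep
options named above.

```shell
fruit=$(printf 'apple\nbanana\ncherry\n')

# Print matching lines: sed -n '/RE/p' versus grep.
sed_print=$(printf '%s\n' "$fruit" | sed -n '/an/p')
grep_print=$(printf '%s\n' "$fruit" | grep 'an')

# Remove matching lines: sed '/RE/d' versus grep -v.
sed_delete=$(printf '%s\n' "$fruit" | sed '/an/d')
grep_delete=$(printf '%s\n' "$fruit" | grep -v 'an')

# One line of context above and below the match.
nums=$(printf 'one\ntwo\nthree\nfour\nfive\n')
context=$(printf '%s\n' "$nums" | grep -A1 -B1 'three')

printf '%s\n' "$grep_print" "$grep_delete" "$context"
```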

The tr utility is also more suited than sed to some simple tasks. For
example, to:

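delete individual characters. Instead of "s/[a-d]//g", use

tr -d "a-d"

(With a POSIX or GNU tr, the brackets in "[a-d]" are literal
characters and would be deleted as well.)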

squeeze sequential characters. Instead of "s/ee*/e/g", use

tr -s "{character-set}"

change individual characters. Instead of "y/abcdef/ABCDEF/", use

tr "[a-f]" "[A-F]"

Note, however, that tr does not support giving input files on the
command line, so the syntax is:

tr {options-and-patterns} < input-file

or, to process multiple files:

cat input-file1 input-file2 | tr {options-and-patterns}
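
The three tr replacements can be checked against their sed
equivalents as follows. This is a sketch with invented sample
strings; the ranges are written without brackets, which a POSIX tr
accepts, and the temporary file exists only to show input
redirection.

```shell
# Delete characters.
sed_del=$(echo 'abcdefg' | sed 's/[a-d]//g')
tr_del=$(echo 'abcdefg' | tr -d 'a-d')

# Squeeze runs of a repeated character.
sed_sq=$(echo 'feeeed' | sed 's/ee*/e/g')
tr_sq=$(echo 'feeeed' | tr -s 'e')

# Change individual characters.
sed_up=$(echo 'face' | sed 'y/abcdef/ABCDEF/')
tr_up=$(echo 'face' | tr 'a-f' 'A-F')

# tr reads only standard input, so a file must be redirected.
tmp=$(mktemp)
printf 'face\n' > "$tmp"
from_file=$(tr 'a-f' 'A-F' < "$tmp")
rm -f "$tmp"

printf '%s\n' "$tr_del" "$tr_sq" "$tr_up" "$from_file"
```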

If you have multiple files, using tr instead of sed is often more of
an exercise than a practical gain. Although sed can perfectly emulate
certain functions of cat, grep, nl, rev, sort, tac, tail, tr, uniq,
and other utilities, producing identical output, the native utilities
are usually optimized to do the job more quickly than sed.

6.5. When should I ignore sed and use awk or Perl instead?

If you can write the same script in awk or Perl and do it in less
time, then use Perl or awk. There's no reason to spend an hour
writing and debugging a sed script if you can do it in Perl in 10
minutes (assuming that you know Perl already) and if the processing
time or memory use is not a factor. Don't hunt pheasants with a .22
if you have a shotgun at your side . . . unless you simply enjoy
the challenge!

Specifically, use awk or perl if you need to:

count fields or words on a line. (awk)

count lines in a block or objects in a file.

check lengths of strings or do math operations.

handle very long lines or very large buffers. (or use gsed)

handle binary data (control characters). (perl: binmode)

loop through an array or list.

test for file existence, filesize, or fileage.

treat each paragraph as a line. (well, not always)
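
A few of these tasks, sketched in awk with invented sample input. NF,
length(), and the empty-RS "paragraph mode" are standard awk
features; the variable names are only for this example.

```shell
text=$(printf 'one two three\nfour five\n')

# Count fields (words) on each line: NF holds the field count.
fields=$(printf '%s\n' "$text" | awk '{ print NF }')

# Check string lengths and do arithmetic: total characters on all lines.
total=$(printf '%s\n' "$text" | awk '{ sum += length($0) } END { print sum }')

# Treat each paragraph as one record: an empty RS splits on blank lines.
paras=$(printf 'a\nb\n\nc\n' | awk 'BEGIN { RS = "" } { n++ } END { print n }')

printf '%s\n' "$fields" "$total" "$paras"
```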

6.6. Known limitations among sed versions

The limits below apply to the versions as distributed, although the
source code for most free versions of sed allows for modification and
recompilation. As used below, "no limit" means there is no "fixed"
limit. Limits are actually determined by one's hardware, memory,
operating system, and which C library is used to compile sed.

6.6.6. Limits on length of write-file names

6.6.7. Limits on branch/jump commands

GNU sed: no limit
ssed: no limit
HHsed v1.5: 50
sed v1.6: [pending]

As a practical consequence, this means that HHsed will not read
more than 50 lines into the pattern space via an N command, even if
the pattern space is only a few hundred bytes in size. HHsed exits
with an error message, "infinite branch loop at line {nn}".

6.7. Known incompatibilities between sed versions

6.7.1. Issuing commands from the command line

Most versions of sed permit multiple commands to be issued on the
command line, separated by a semicolon (;). Thus,

sed 'G;G' file

should triple-space a file. However, for non-GNU sed, some commands
require separate expressions on the command line. These include:

all labels (':a', ':more', etc.)

all branching instructions ('b', 't')

commands to read and write files ('r' and 'w')

any closing brace, '}'

If these commands are used, they must be the LAST commands of an
expression. Subsequent commands must use another expression
(another -e switch plus arguments). E.g.,

sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' files
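
Both forms can be tried on short input. In this sketch the centering
script is the one above with the width reduced from 77 to 9 (so lines
are padded to 10 columns) to keep the result easy to read; the sample
text is invented.

```shell
# Each G appends a newline plus the (empty) hold space, so two G
# commands leave two blank lines after every input line.
tripled=$(printf 'a\nb\n' | sed 'G;G')

# The label and the branch each end their own -e expression.
# Step 1 pads the line on the left to 10 columns; step 2 halves the
# leading spaces, which centers the text.
centered=$(echo 'abcd' | sed -e :a -e 's/^.\{1,9\}$/ &/;ta' -e 's/\( *\)\1/\1/')

printf '%s\n' "$tripled"
printf '[%s]\n' "$centered"    # [   abcd]
```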

GNU sed, ssed, sed15 and sed16 all permit these commands to be
followed by a semicolon, so the previous script can be written:

sed ':a;s/^.\{1,77\}$/ &/;ta;s/\( *\)\1/\1/' files

6.7.2. Using comments (prefixed by the '#' sign)

Most versions of sed permit comments to appear in sed scripts only
on the first line of the script. Comments on line 2 or thereafter
are not recognized and will generate an error like "unrecognized
command" or "command [bad-line-here] has trailing garbage".

GNU sed, HHsed, sedmod, and HP-UX sed permit comments to appear on
any line of the script, except after labels and branching commands
(b,t), provided that a semicolon (;) occurs after the command
itself. This syntax makes sed similar to awk and perl, which use a
similar commenting structure in their scripts. Thus,

# GNU style sed script
$!N; # except for last line, get next line
s/^\([0-9]\{5\}\).*\n\1.*//; # if first 5 digits of each line
# match, delete BOTH lines.
t skip
P; # print 1st line only if no match
:skip
D; # delete 1st line of pattern space and loop
#---end of script---

is a valid script for GNU-based versions of sed, but is
unrecognized for most other versions of sed.
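
For versions that reject such comments, one possible workaround is to
drop the comments and pass each command as its own expression, so
that the label and the branch end their expressions (section 6.7.1).
This is a sketch, not a tested recipe for every sed; the sample input
is invented.

```shell
# Lines 2 and 3 share their first five digits, so both are deleted.
input=$(printf '11111 x\n22222 y\n22222 z\n33333 w\n')

result=$(printf '%s\n' "$input" |
  sed -e '$!N' \
      -e 's/^\([0-9]\{5\}\).*\n\1.*//' \
      -e 't skip' \
      -e P \
      -e ':skip' \
      -e D)

printf '%s\n' "$result"
```

Only the lines "11111 x" and "33333 w" should remain.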

Finally, if the first two characters in a disk file script are
"#n", the output is suppressed, exactly as if -n were entered on
the command line. This is true for the following versions of sed:

ssed v3.57 and above

gsed

HHsed v1.5

sed v1.6

This syntax is not recognized by these versions of sed:

ssed v3.45 to v3.50 (other versions untested)

sedmod v1.0
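
A small sketch of the "#n" convention, assuming a sed that recognizes
it (GNU sed does). The script file is a temporary file created only
for this example.

```shell
# Write a two-line script whose first line is exactly "#n".
script=$(mktemp)
cat > "$script" <<'EOF'
#n
/keep/p
EOF

# No -n on the command line: "#n" suppresses the default output,
# so only the line printed by the p command appears.
out=$(printf 'drop\nkeep me\ndrop\n' | sed -f "$script")
rm -f "$script"

printf '%s\n' "$out"    # keep me
```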

6.7.3. Special syntax in REs

A. HHsed v1.5 (by Howard Helman)

The following expressions can be used for /RE/ addresses or in the
LHS of a substitution:

+ - 1 or more occurrences of previous RE: same as \{1,\}
\< - boundary between nonword and word character
\> - boundary between word and nonword character

The following expressions can be used for /RE/ addresses or on
either side of a substitution:

The following expressions can be used for /RE/ addresses or in the
LHS of a substitution:

+ - 1 or more occurrences of previous RE: same as \{1,\}
\a - any alphanumeric: same as [a-zA-Z0-9]
\A - 1 or more alphas: same as \a+
\d - any digit: same as [0-9]
\D - 1 or more digits: same as \d+
\h - any hex digit: same as [0-9a-fA-F]
\H - 1 or more hexdigits: same as \h+
\l - any letter: same as [A-Za-z]
\L - 1 or more letters: same as \l+
\n - newline (read as 2 bytes, 0D 0A or ^M^J, in DOS)
\s - any whitespace character: space, tab, or vertical tab
\S - 1 or more whitespace chars: same as \s+
\t - tab (ASCII 09, 0x09)
\< - boundary between nonword and word character
\> - boundary between word and nonword character

The following expressions can be used in the RHS of a substitution.
"Elements" refer to \1 .. \9, &, $0, or $1 .. $9:

When used with the -x (extended) switch on the command line, or
when '#x' occurs as the first line of a script, Whaley's gsed103
supports the following expressions in both the LHS and RHS of a
substitution:

In normal mode, with or without the -x switch, the following escape
sequences are also supported in regex addressing or in the LHS of a
substitution:

\` matches beginning of pattern space: same as /^/
\' matches end of pattern space: same as /$/
\B boundary between 2 word or 2 nonword characters
\w any nonword character [*BUG!* should be a word char]
\W any nonword character: same as /[^A-Za-z0-9]/
\< boundary between nonword and word char
\> boundary between word and nonword char

F. GNU sed v2.05 and higher versions

The following expressions can be used for /RE/ addresses or in the
LHS of a substitution:

\` - matches the beginning of the pattern space (same as "^")
\' - matches the end of the pattern space (same as "$")
\? - 0 or 1 occurrence of previous character: same as \{0,1\}
\+ - 1 or more occurrences of previous character: same as \{1,\}
\| - matches the string on either side, e.g., foo\|bar
\b - boundary between word and nonword chars (reversible)
\B - boundary between 2 word or between 2 nonword chars
\n - embedded newline (usable after N, G, or similar commands)
\w - any word character: [A-Za-z0-9_]
\W - any nonword char: [^A-Za-z0-9_]
\< - boundary between nonword and word character
\> - boundary between word and nonword character

In addition, GNU sed 4.0 can modify the way ^ and $ are interpreted,
so that ^ can also match an empty string after a newline character,
and $ can also match an empty string before a newline character (to
do this, add an "M" after the regular expression terminator, like
/^>/M -- see section 3.1.1). Even if you use this feature, \` and \'
still match the beginning and the end of the pattern space,
respectively.
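
A few of the GNU extensions listed above, tried on invented sample
strings. This sketch assumes GNU sed; other versions may treat \+,
\? and \| as literal characters.

```shell
# \+ is shorthand for the portable interval \{1,\}.
plus=$(echo 'aaa b' | sed 's/a\+/A/')
interval=$(echo 'aaa b' | sed 's/a\{1,\}/A/')

# \| matches the string on either side.
alt=$(echo 'foo bar baz' | sed 's/foo\|baz/X/g')

# \? makes the previous character optional.
opt=$(echo 'color colour' | sed 's/colou\?r/C/g')

printf '%s\n' "$plus" "$alt" "$opt"
```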

H. ssed

Everything that was said for GNU sed applies to ssed as well. In
addition, in Perl-mode (-R switch), these become active or inactive:

foo(?=bar) - match "foo" only if "bar" follows it
foo(?!bar) - match "foo" only if "bar" does NOT follow it
(?<=foo)bar - match "bar" only if "foo" precedes it
(?<!foo)bar - match "bar" only if "foo" does NOT precede it
(?<!in|on|at)foo
- match "foo" only if NOT preceded by "in", "on" or "at"
(?<=\d{3})(?<!999)foo
- match "foo" only if preceded by 3 digits other than "999"

In Perl mode, there are two new switches in /addressing/ or s///
commands. Switches may be lowercase in s/// commands, but must be
uppercase in /addressing/:

6.7.4. Word boundaries

GNU sed, ssed, sed16, sed15 and sedmod use certain symbols to define
the boundary between a "word character" and a nonword character. A
word character fits the regex "[A-Za-z0-9_]". Note: a word character
includes the underscore "_" but not the hyphen, probably because the
underscore is permissible as a label in sed and in other scripting
languages. (In gsed103, a word character did NOT include the
underscore; it included alphanumerics only.)

These symbols include '\<' and '\>' (gsed, ssed, sed15, sed16,
sedmod) and '\b' and '\B' (gsed only). Note that the boundary
symbols do not represent a character, but a position on the line.
Word boundaries are used with literal characters or character sets
to let you match (and delete or alter) whole words without
affecting the spaces or punctuation marks outside of those words.
They can only be used in a "/pattern/" address or in the LHS of a
's/LHS/RHS/' command. The following table shows how these symbols
may be used in HHsed and GNU sed. Sedmod matches the syntax of
HHsed.
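
A brief illustration of the boundary symbols, assuming GNU sed and an
invented sample line. Only the free-standing word "cat" is replaced;
"concat" fails the leading boundary, and "cat_1" fails the trailing
one because the underscore counts as a word character.

```shell
line='cat concat cat_1 cat'

# \< and \> anchor the match to the start and end of a word.
angle=$(echo "$line" | sed 's/\<cat\>/dog/g')

# \b matches at either kind of word boundary (gsed only).
slash_b=$(echo "$line" | sed 's/\bcat\b/dog/g')

printf '%s\n' "$angle" "$slash_b"
```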

In ssed, the symbols '\<' and '\>' lose their special meaning when
the -R switch is used to invoke Perl-style expressions. However,
the identical meaning of '\<' and '\>' can be obtained through
these nonmatching, zero-width assertions:

(?<!\w)(?=\w) and (?<=\w)(?!\w)

6.7.5. Commands which operate differently

A. GNU sed version 3.02 and 3.02.80

The N command no longer discards the contents of the pattern space
upon reaching the end of file. This is not a bug, it's a feature.
However, it breaks certain scripts which relied on the older
behavior of N.

'N' adds the Next line to the pattern space, enabling multiple
lines to be stored and acted upon. Upon reaching the last line of
the file, if the N command was issued again, the contents of the
pattern space would be silently deleted and the script would abort
(this has been the traditional behavior). For this reason, sed
users generally wrote:

$!N; # to add the Next line to every line but the last one.

However, certain sed scripts relied on this behavior, such as the
script to delete trailing blank lines at the end of a file (see
script #12 in section 3.2, "Common one-line sed scripts", above).
Also, classic textbooks such as Dale Dougherty and Arnold Robbins'
sed & awk documented the older behavior.

The GNU sed maintainer felt that despite the portability problems
this would cause, changing the N command to print (rather than
delete) the pattern space was more consistent with one's intuitions
about how a command to "append the Next line" ought to behave.
Another fact favoring the change was that, under the older behavior,
"{N;command;}" would delete the last line if the file had an odd
number of lines, but print the last line if the file had an even
number of lines.

To convert scripts which used the former behavior of N (deleting
the pattern space upon reaching the EOF) to scripts compatible with
all versions of sed, change a lone "N;" to "$d;N;".
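
The two portable idioms can be compared on a file with an odd number
of lines. This sketch uses invented input and joins each pair of
lines with a space:

```shell
# "$!N" skips N on the last line, so the unpaired line is kept.
kept=$(printf '1\n2\n3\n' | sed '$!N;s/\n/ /')

# "$d;N" deletes the unpaired last line before N can run, which
# reproduces the older behavior of N on every version.
dropped=$(printf '1\n2\n3\n' | sed '$d;N;s/\n/ /')

printf '%s\n' "$kept"
printf '%s\n' "$dropped"
```

The first command should print "1 2" and "3"; the second should print
only "1 2".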