This chapter shows useful Tcl commands that operate on strings:
basic string manipulation, string matching, regular expressions, and
conversion of strings to lists and vice versa. The set of string-related
commands in Tcl is large, as you can guess, because the string is
particularly important for the semantics of the language itself, and not
just a data type among others. Fortunately this is one of the
better organized parts of the language, so many commands are
not hard to remember.

5.1 The append command

The append command is very similar to lappend, but instead of
appending elements to a list, it appends strings to a string.
The command's structure is:

append varName ?value value ...?

Every argument following varName is appended to the current content
of the varName variable, and the new content of the variable is
returned. Example:
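A minimal sketch of append in use. The variable names a and fresh and the sample values are chosen here purely for illustration:

```tcl
set a "foo"

# Every extra argument is appended in order, and the new value is returned
set result [append a "bar" "!"]
puts $result   ;# prints foobar!
puts $a        ;# prints foobar!

# If the variable does not exist yet, append creates it
append fresh "x" "y"
puts $fresh    ;# prints xy
```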

The append command is very efficient: it's faster to write "append a $b"
than "set a $a$b", but both solutions work. Still, it's a good habit to
consider speed issues when programming with Very High Level Programming
Languages such as Tcl, because they are not as fast as lower level languages
like C.
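One way to check the difference on your own machine is the built-in time command. The sketch below assumes an iteration count of 1000 and a three-character sample string. Timings vary between systems and Tcl versions, so no output is shown:

```tcl
set b "xyz"

# Grow the string with append
set a ""
puts "append: [time {append a $b} 1000]"

# Grow the string by rebuilding it with set
set a ""
puts "set:    [time {set a $a$b} 1000]"
```

Both loops end with the same 3000-character string in a; only the time per iteration differs.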

5.2 The string command

Instead of having different commands to perform different string operations,
Tcl uses a single string manipulation command called string,
which takes the operation to perform as its first argument. The meaning of
the remaining arguments depends on the operation.
In Tcl slang the different operations are called subcommands.

For instance, to get the length of a string, the first argument to
provide to the string command is length, which is the name of
the operation to perform, or the subcommand if you prefer. The other
argument is the string itself.

% string length "Tcl is a string processor"
25
%

The number 25 is of course the number of characters that are inside
the string "Tcl is a string processor". It's important to know that
Tcl strings are binary safe, so every kind of character can be
inside a string, including the byte with value zero:

% string length "ab\000xy"
5

It's better to understand this concept now, because in Tcl programming
you will use strings not only when you need to read a text file, but
also for general programming when binary data is involved.

The string command has many other subcommands. This chapter shows a
subset that includes the most interesting ones.

5.3 string range

The range subcommand is used to extract parts of a string. It works
in a way very similar to the lrange command. Indexes can also be
in the form of end-<index>. The formal command structure is:

string range string first last
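A short sketch of string range in use. The sample string is reused from the string length example, and the indexes are picked here for illustration. As with lrange, both the first and the last index are inclusive and counting starts from zero:

```tcl
set s "Tcl is a string processor"

puts [string range $s 0 2]         ;# prints Tcl
puts [string range $s 9 14]        ;# prints string
puts [string range $s end-8 end]   ;# prints processor
```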

5.4 string index

The index subcommand extracts a single character from the
whole string. The command structure is:

string index string charIndex

Example:

% string index "foobar" 3
b
% string index "foobar" end
r

A more interesting real-world application of the string index
command is the following procedure, which reverses the
order of the characters in a string, transforming for example
"Tcl" into "lcT". Because the final string is reversed, the procedure
is called stringReverse.
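A minimal sketch of such a procedure, assuming it is saved in a file named stringReverse.tcl. The body is one possible implementation built on string length, string index and append, and not necessarily the original listing:

```tcl
# Return a copy of s with its characters in reverse order
proc stringReverse {s} {
    set result {}
    # Walk the string from the last character down to the first
    for {set i [expr {[string length $s] - 1}]} {$i >= 0} {incr i -1} {
        append result [string index $s $i]
    }
    return $result
}

puts [stringReverse "Tcl"]   ;# prints lcT
```

Recent Tcl versions also provide a built-in string reverse subcommand, so a hand-written procedure like this is mainly useful as an exercise.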

The source command tells Tcl to execute the content of the
specified file as if it were typed in its place. So after
the "source stringReverse.tcl" call, the procedure stringReverse
is defined and can be called.

5.5 string equal

Comparing two strings is an operation that occurs very frequently.
String equal does it by searching for an exact match, that is, the
strings must match character by character to be considered the same
by the command. The return value is 1 if the two strings passed
as arguments are the same, otherwise 0 is returned:
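A short sketch with sample strings chosen here for illustration:

```tcl
puts [string equal tcl tcl]   ;# prints 1
puts [string equal tcl tk]    ;# prints 0
puts [string equal tcl TCL]   ;# prints 0, the comparison is case sensitive
```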

"tcl" and "TCL" are not the same for string equal. If you want to
compare in a case insensitive way, there is a -nocase option
to change the behaviour and consider characters of different case
the same:

% string equal -nocase tcl TCL
1

Another interesting option is -length num, which limits the comparison
to the first num characters:
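A sketch of the -length option, using made-up strings that share only their first three characters:

```tcl
puts [string equal -length 3 foobar fooxyz]           ;# prints 1
puts [string equal -length 4 foobar fooxyz]           ;# prints 0
puts [string equal -nocase -length 3 FOObar fooxyz]   ;# prints 1
```

The last line shows that -nocase and -length can be combined.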

5.6 string compare

The compare subcommand is very similar to equal, but instead of returning
true or false depending on whether the strings are the same, the command
returns:

-1 if the first string is less than the second
0 if the first string is the same as the second
1 if the first string is greater than the second

This gives more information than string equal, which may be useful
for sorting or other tasks.
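A sketch of the three possible results, followed by one way to use them for sorting. The word list is invented for illustration, and lsort -command is used here only to show that a -1/0/1 result is exactly what a sorting comparison needs:

```tcl
puts [string compare apple banana]   ;# prints -1
puts [string compare banana apple]   ;# prints 1
puts [string compare apple apple]    ;# prints 0

# lsort appends the two elements to compare to the given command prefix
set sorted [lsort -command {string compare} {pear apple fig}]
puts $sorted   ;# prints apple fig pear
```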

5.7 string match

When more powerful string matching capabilities are needed,
string match can be used in place of string equal, because
instead of comparing two strings, the command compares a string against
a pattern.

String match supports patterns composed of normal characters and
the following special sequences:

*        Matches any sequence of characters, even an empty string.
?        Matches any single character.
[chars]  Matches any single character in the set chars. It's possible
         to specify a range in the x-y form, like [a-z], that
         will match every character from a to z.
\x       Matches exactly x, without interpreting it in a special way.
         This is used in order to match *, ?, [, ], \ as single
         characters.

Here are some examples of patterns and what they match, to
make it simpler to understand how matching works:
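The patterns and strings below are a sketch written for illustration. Each line prints 1 when the string matches the pattern and 0 otherwise:

```tcl
puts [string match {f*} foobar]       ;# prints 1
puts [string match {f*} bar]          ;# prints 0
puts [string match {f?o} foo]         ;# prints 1
puts [string match {f?o} foobar]      ;# prints 0, the whole string must match
puts [string match {[a-z]*} tcl]      ;# prints 1
puts [string match {[0-9]*} tcl]      ;# prints 0
puts [string match {a\*b} a*b]        ;# prints 1, \* matches a literal *
puts [string match {a\*b} axb]        ;# prints 0
puts [string match {???*} ab]         ;# prints 0
puts [string match {???*} abcd]       ;# prints 1
```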

Note that patterns containing the [x-y] form must be grouped using
braces, or quoted using \, to prevent Tcl from trying to substitute
them as a command.

The last pattern in the example shows how it's possible to
match everything that is at least N characters long, using N question
marks followed by an asterisk. "???*" will match at least 3 chars, and so on.
Tcl supports more advanced pattern matching using
regular expressions, but string match is still very interesting because
in most cases it's enough to express a pattern in a simpler way,
and it works much faster than the regular expression commands.

5.8 string map

String map is a powerful tool able to substitute occurrences of
strings with other strings. The substitution is driven by a list of
key-value pairs. For example the list {foo bar x {} y yy} will replace
every occurrence of "foo" with "bar", will remove every occurrence of
"x", and will duplicate every occurrence of "y".
The command structure is the following:

string map ?-nocase? mapping string

Substitutions are done in an ordered way: starting from the first character
of the original string, every key in the key-value pairs list is searched.
If there is no match, the character is appended to the result that
will be returned, and the process continues from the next character.
If instead there is a match, the value relative to the matching key
is appended to the result, and the process continues from the character
just after the matching key.
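The process described above can be sketched with a few calls. The first uses the key-value list from the text on an input string invented here. The other two use made-up mappings to show that keys are tried in list order at each position, and that text already substituted is not scanned again:

```tcl
set r1 [string map {foo bar x {} y yy} "foo xyz y"]
puts $r1   ;# prints bar yyz yy

# At each position the keys are tried in the order they appear in the list
set r2 [string map {abc 1 ab 2 a 3 1 0} 1abcaababcabababc]
puts $r2   ;# prints 01321221

# The "b" produced from "a" is not replaced again with "c"
set r3 [string map {a b b c} ab]
puts $r3   ;# prints bc
```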

The above description may appear pedantic and complex, but it's actually
not hard at all to understand how string map works: it turns
every occurrence of a key in the key-value pairs into the
corresponding value. Once you are comfortable with
string map, you will probably want to know the details of the substitution
process, so the above text will be more useful later, when you are
a more experienced Tcl programmer.

Similarly to many other string subcommands, map can take a
-nocase option in order to turn the matching process case insensitive.
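A brief sketch of the difference, with an input string made up for illustration:

```tcl
puts [string map {foo bar} "FOO Foo foo"]           ;# prints FOO Foo bar
puts [string map -nocase {foo bar} "FOO Foo foo"]   ;# prints bar bar bar
```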

5.9 string is

The string is subcommand tests whether a string is a member of a given
class, like integers, alphanumeric characters, spaces, and so on.
The structure of the command is:

string is class ?-strict? ?-failindex varname? string

By default the command returns 1 for empty strings, so
the -strict option is used to invert the behaviour and
return 0 on empty strings (i.e. to not consider the empty
string a member of the given class).

The class can be one of the following:

alnum     an alphabetic or digit character
alpha     an alphabetic character
ascii     any character in the 7-bit ASCII range
boolean   any form allowed for Tcl booleans (0, 1, yes, no, ...)
control   a control character
digit     a digit character
double    a valid Tcl double precision number
false     any form allowed for Tcl booleans with false value
graph     a printing character, except space
integer   any valid form of 32-bit integers
lower     a lowercase alphabetic character
print     a printing character, including space
punct     a punctuation character
space     any space character
true      any form allowed for Tcl booleans with true value
upper     an uppercase alphabetic character
wordchar  any word character: alphanumeric characters and connector
          punctuation such as the underscore
xdigit    a hexadecimal digit

As you can see, some classes are oriented to a single character
(like alnum), and some are useful for whole strings (like integer).
If strings composed of more than a single character are
tested against classes oriented to characters, every character
of the string must belong to the class for the command to return 1.
Some examples:
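The values tested below are a sketch chosen for illustration. They cover both character-oriented and string-oriented classes, and the effect of -strict on the empty string as described above:

```tcl
puts [string is integer 42]         ;# prints 1
puts [string is integer 4x2]        ;# prints 0
puts [string is alpha Tcl]          ;# prints 1
puts [string is alpha Tcl8]         ;# prints 0, the digit is not alphabetic
puts [string is alnum Tcl8]         ;# prints 1
puts [string is boolean yes]        ;# prints 1
puts [string is double 3.14]        ;# prints 1
puts [string is digit ""]           ;# prints 1, empty strings pass by default
puts [string is digit -strict ""]   ;# prints 0
```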

If the -failindex option followed by the name of a variable is used,
the command will store the index of the first character that failed
the test in the variable.
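A sketch of -failindex. The variable name pos and the tested string are assumptions made for this example:

```tcl
set ok [string is digit -failindex pos "123x5"]
puts $ok    ;# prints 0
puts $pos   ;# prints 3, the index of the offending "x"
```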

5.10 More string subcommands

There is a large number of string subcommands that we don't cover.
The reader may like to look at the string man page to check what's
available: it's very important to know what can be done with the
built-in Tcl functionality, to avoid reimplementing a feature that is
already available.

5.11 Advanced string matching

Tcl string matching capabilities include two powerful commands,
[regexp] and [regsub], that provide egrep-like regular expression
facilities. These commands will be explored in chapter FIXME
of this book.