DrDialog, or: How I learned to stop worrying and love REXX - Part 11

Welcome back to our series on programming with REXX and DrDialog. I had to take
a break to catch up on other things that had piled up. Sorry for making
you wait, but here we are again. As the last
article dealt with loops, there is one addendum I would like to make to that
subject:

Never change the loop (or counter) variable manually!
While this is sometimes used in other languages to force an early exit from the
loop, such "tricks" should be avoided because they can lead to unpredictable
behavior in some circumstances. Use the LEAVE instruction provided by REXX instead
(EXIT would end the whole program, not just the loop), or
think about using a different structure for your loops.
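
As a minimal sketch of leaving a loop early without touching the counter (the loop itself and the limit of 20 are made up for illustration):

```rexx
/* leave a loop early without changing the counter variable */
do i = 1 to 10
  if i * i > 20 then leave       /* LEAVE ends the loop right here */
  say i "squared is" i * i
end
say "Loop was left with i =" i   /* counter still holds its value at LEAVE */
```

Here the loop stops at i = 5, the first value whose square exceeds 20, and the program simply continues after END.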

Today we'll talk about REXX's wealth of functions for working with strings. We
won't cover the subject completely, because some of the functions are so specific
that you might hardly ever need them. As you go on writing your own stuff, you'll
find some of the functions will always be part of your code while others won't be.
It largely depends upon your approach to solving a specific problem. In order to provide
a structured overview, I have grouped them into functions for...

obtaining information about strings

substrings and search

creating or transforming strings

At the end of the article, you might wonder what happened to the PARSE keyword/function.
Well, PARSE is worth an article of its own, I guess. It is one of the most
powerful parts of REXX, in terms of both functionality and complexity.
We'll have one article dealing with the basic use of PARSE at a later moment.
There is much more to PARSE than what we will be dealing with in our series
of course, but as this series is intended to address beginners in REXX as well,
I don't think it would be such a great idea to confuse you by going into details.
Sure, knowing all of PARSE's behavior might provide you with powerful means
to solve your programming issues, but most of the time you'll only have to deal
with a basic subset of it and this is what we'll be dealing with as well.

Obtaining information about strings

When dealing with strings, it's very useful to know something about them before
messing around.

LENGTH will tell you the number of bytes (or characters) that
a string contains. Note that this also includes leading and trailing blanks:

/* length sample */
text = " I'm 97 years old. "
say length(text)

If run, this script would print 19.

VERIFY is used to check whether a string contains specific characters
or not. To accomplish this, you need to specify the string to be checked, a second
string which holds the "comparison characters" and additional options.
For example, you could check if a string is a valid phone number - that's to say
it shall only contain the digits 0 through 9, blanks, dash and plus sign (for international
dial prefix). The comparison string thus would look like

"0123456789 -+"

Now VERIFY can be used to check either whether the string contains ONLY
these characters or whether it contains NONE of them. Actually, VERIFY
will return the position of the first character in the test string which does or
doesn't match any of the characters in the comparison string. A little confusing
at first, hm? Here we go:

In the above example, if one would enter "+44-123 456 / 789", VERIFY would
actually return 13 because in the test string, character number 13 ("/")
is not part of the comparison string. Thus, VERIFY used with the NOMATCH
parameter will tell which character does NOT MATCH the comparison string characters.
If a 0 (zero) is returned, this means that there is no character that doesn't match,
thus the number is "okay" we might say.
If "MATCH" is used instead, VERIFY will tell you the position
of the first character in the test string which MATCHES one of those in the comparison
string. It depends on what's easier to code, but most of the time you might prefer
the NOMATCH parameter because it makes the program logic easier to read and understand.
Some notes about VERIFY: The full syntax is:

result = VERIFY( <string>, <reference> [, MATCH | NOMATCH ] [, START ] )

Only the first letter of the "MATCH" or "NOMATCH" parameter
is required and can be either upper or lower case. There's also an additional parameter,
START, which gives the position in the test string at which the comparison begins.
By default, comparison starts with the first character in the test string, but depending
on how your test string is constructed, you might want to skip
a certain number of leading characters. If your program uses a specific concatenation
of strings for your address book for example, this might result in something like
"Doe,John/555-6780". In order to test if the number is valid, you should
tell VERIFY to start at position 10 by coding

if VERIFY(phone, matchstring, "NOMATCH",10) = 0 then

or by using the abbreviated version

if VERIFY(phone, matchstring, "N", 10) = 0 then

As "Nomatch" is the default for the comparison type, you could even
just omit it. In this case, if you want to supply the optional parameter of START,
you'll still have to use the additional comma for the parser to understand
that you actually omitted the comparison type parameter:

if VERIFY(phone, matchstring, ,10) = 0 then

In case you just want to check right from the start using "Nomatch",
you could omit the whole rest and type

if VERIFY(phone, matchstring) = 0 then
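
Put together, the phone number check described above could be sketched like this. The variable name badpos is made up for illustration; the test string and the comparison string are the ones used in the text:

```rexx
/* phone number check with VERIFY - a sketch */
matchstring = "0123456789 -+"
phone = "+44-123 456 / 789"

badpos = verify(phone, matchstring)   /* "Nomatch" is the default */
if badpos = 0 then
  say phone "looks okay"
else
  say "Invalid character at position" badpos
```

For this test string the ELSE branch is taken and position 13 (the "/") is reported.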

For the example of John Doe you might wonder how to tell the position for START
if the name changes. Good point. We'll use another function that'll be discussed
in a few moments. But for now, let's have a look at the last informational string
function.

WORDS is a very useful function. The whole concept of "words"
in strings will feel like heaven if you're used to programming in BASIC, for example.
A WORD is a part of a string that is delimited by blanks or by the beginning or end
of the entire string. Among the WORD-functions, WORDS
is quite simple: It returns the number of words found in a string:

/* words example */
text = "This is a words() sample. "
say words(text)

WORDS would recognize the following substrings: "This", "is", "a", "words()" and "sample.".
Thus, WORDS in the above example would return a value of 5.
In order to explain what WORDS are about let me put it this way: If YOU
look at a string, there are words that you recognize, right? The WORDS() function
works in much the same way. As long as there is at least one space between
strings, they will be recognized as two words. It doesn't matter if there are - let's
say - twenty spaces between them. It's still two words. One exception is that you
might not recognize a single full stop as a word, but WORDS() does if the
dot is separated from the rest by spaces like in
"There were 57 channels and nothing on . "
(note the separated "." at the end). WORDS() would recognize
8 words.

Substrings and search functions

Among string functions, these are the ones used most - at least by me. Let's
start with a very basic one:

POS is used to find the starting position of a string within another
string. IBM uses the "needle and haystack" method to explain the syntax
- that's quite a good way to memorize the syntax scheme:

result = POS( <needle>, <haystack> [, START])

POS searches the <haystack> string for the first occurrence of <needle>.
It either returns the starting position (the character number, starting with 1 for
the first character) or ZERO if the <needle> wasn't found in the <haystack>.
Optionally, you can tell POS not to start its search from the first character
of <haystack> but from a different position. This is useful for identifying
special substrings - although the WORD-functions described later do a much better
job here.
As an example, imagine that you have a string named "record" containing
contact data such as

"firstname=Peter, lastname=Jones, phone=555-12345"

and you want to retrieve the phone number. Assuming that 'phone number' is always
the last entry of the contact data string, you would go like this:

phonestart = POS("phone=", record)

If phonestart contains something other than zero, it means that the string was
found. Next, you skip 6 characters (the length of 'phone=') and you know the starting
position of the actual number. Next, you would determine the length of the number
from the entire length of 'record' in order to retrieve the number from
the string. But this requires an additional function (SUBSTR) described later.
Personally, I use this function most of the time to check whether a string actually
is contained within another string or not - regardless of where exactly it
occurs.
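
Such a containment test could be sketched as follows. The second half picks up the John Doe question from the VERIFY discussion and uses POS to find where the number starts; the names entry and numstart are made up for illustration, and the sketch assumes the "/" is always present:

```rexx
/* POS as a simple "contains" test - a sketch */
record = "firstname=Peter, lastname=Jones, phone=555-12345"

if pos("phone=", record) > 0 then
  say "Record has a phone entry"
else
  say "No phone entry found"

/* finding the START position for VERIFY when the name length varies */
entry = "Doe,John/555-6780"
numstart = pos("/", entry) + 1     /* first character after the slash */
say "Number starts at position" numstart
if verify(entry, "0123456789 -+", "N", numstart) = 0 then
  say "Phone number is valid"
```

For the sample entry, numstart is 10, which matches the position used in the VERIFY calls earlier.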

LASTPOS does much the same, except that it searches the <haystack>
backwards. It uses the same options (START) and the same return value. LASTPOS
is the convenient way to make sure you find the LAST occurrence of <needle>
within the haystack. Of course, you could achieve the same by using a loop of POS
calls that each START after the last position found, but hey: Why worry?
Personally I use LASTPOS mostly when dealing with file names that include
drive and path information (so-called "fully qualified file names"). Once
I know the position of the last "backslash" character, I know that everything
else "behind" it must be the actual file name and - vice versa - the preceding
part is drive and path. Yes, I could use the FILESPEC() function as well,
but depending on the program needs, sometimes you might need to refer to such data...
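
A minimal sketch of that file name split, using LEFT and SUBSTR (both described further down). The path is an invented example value, and slash is a made-up variable name:

```rexx
/* splitting a fully qualified file name with LASTPOS - a sketch */
fullname = "C:\OS2\APPS\EPM.EXE"          /* example value, assumed */
slash = lastpos("\", fullname)            /* position of the last backslash */

say "Path:" left(fullname, slash)         /* everything up to the backslash */
say "File:" substr(fullname, slash + 1)   /* everything behind it */
```

With this value, slash is 12, so the path part is "C:\OS2\APPS\" and the file part is "EPM.EXE".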

The WORD-functions (word, wordpos, wordindex, wordlength and subword)
are extremely useful when dealing with parts of strings that are separated by one
or more blanks. If you ever tried to identify such parts "by hand" like
in vintage BASIC dialects or other programming languages that lack such functions,
you might agree that REXX feels like "programmer's heaven" ;)
As an example for the following set of functions, let's assume that you have a string
named "input" containing an unknown number of parts (or "words")
separated by an unknown number of blanks... for example "Mary  has    5  little lambs."
(with two blanks after "Mary", four after "has" and two after "5").

WORDS (as already discussed above) will tell you the number of
"parts" (or "words") that are contained in the string.

SAY WORDS(input)

would display 5

WORD is used to retrieve a single word from a string, "cleaning"
it by removing both leading and trailing blanks. In order to achieve this, you must
tell WORD which word to retrieve by specifying a "word number"
(1 for the first, 2 for the second and so on...). Thus

SAY WORD(input, 2)

would display has

WORDPOS works just like POS described above - except,
that it doesn't deal with character positions but words: It searches <haystack>
for the first occurrence of <needle> and returns the number of the word that
matches <needle>. Just like POS, an optional parameter can be used
to make WORDPOS start from a "later" position than the first
word. Again, in WORDPOS this refers to a word number.
The syntax is

result = WORDPOS(<needle>, <haystack> [ ,START ] )

Note that <needle> and <haystack> must match exactly for WORDPOS
to function correctly - that means, the case of characters must match as well.

SAY WORDPOS("HAS", input)

would give you 0
because "HAS" is not equal to "has", while

SAY WORDPOS("has", input)

would result in 2

Another fact worth mentioning is that you can use more than one "word"
for the <needle>. In this case, WORDPOS treats the <needle>
contents the same way as all WORD-functions treat the <haystack>: The contents
are internally parsed into words. Thus

SAY WORDPOS("has 5", input)

will display 2
as well, although 'input' contains "Mary  has    5  little lambs."
(which has 4 spaces between 'has' and '5') while the '<needle>'
uses only 1 space. By internally parsing both needle and haystack into separate
words, the match applies...

WORDINDEX is used to get the starting position of a certain word within
the entire string - that's to say including all leading characters, even if they
are blanks.

SAY WORDINDEX(input, 2)

would display 7

The SUBSTR function returns you a part of a string, specified
by starting character number and length.

SAY SUBSTR(input, 3, 17)

for example will return the part of 'input' that starts with the 3rd character
and is 17 characters long - which results in "ry  has    5  lit"
being displayed (without the quotes). In case you're familiar with BASIC's "MID$"-function,
note that SUBSTR cannot be used to set/change subparts of a string, but
only to retrieve them. Optionally, SUBSTR can be told to fill up "non-existent
parts" of the substring to retrieve with a specified character. "Non-existent"
in this case refers to a substring that extends beyond the end of the actual string. Example?
If your program retrieves characters number 3, 4 and 5 of a string and you accidentally
pass it a string of only 3 characters, you won't get an error message. Instead,
you will receive character number 3 along with two spaces - because by default
(if no explicit padding character was specified) blanks are used. If you use the
optional "padding" character, you'll get character number 3 and two padding
characters returned:

SAY SUBSTR(input, 25, 10)

would display "ambs.     "
(without the quotes - they're only used by me to show the trailing blanks)

SAY SUBSTR(input, 25, 10, "-")

would display "ambs.-----"

A handy feature of SUBSTR is that if you don't specify a length operand,
it'll return the entire rest of the string starting from the specified position:

SAY SUBSTR(input, 17)

would give you little lambs.

The two string functions that I use in almost every program are LEFT and
RIGHT. They're used to retrieve a substring of a given length
from another string, starting from either the left or the
right boundary of the string, depending on which function is called:

SAY LEFT(input, 7)

will display "Mary  h" whereas

SAY RIGHT(input, 7)

will display " lambs."
- both (again) without the quotes of course.
Just like SUBSTR, both left and right will use spaces
for padding of non-existent parts (beyond the start/end of the string) if you don't
explicitly specify another character for padding like in

SAY LEFT("abcdefg",10,"-")

This would display abcdefg---
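
One typical use of this padding is lining up text in columns. A minimal sketch, with item names, prices and column widths invented for illustration:

```rexx
/* aligning output in columns with LEFT and RIGHT - a sketch */
say left("Item", 12)          || right("Price", 8)
say left("Coffee", 12)        || right("2.50", 8)
say left("Sandwich", 12, ".") || right("11.00", 8, ".")
```

Every line comes out 20 characters wide: LEFT pads (or cuts) the first column to 12 characters, RIGHT right-aligns the second column in 8. The last line shows the same with dots as padding and reads Sandwich.......11.00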

SUBWORD acts in a similar way to SUBSTR. Besides the
fact that it deals with words instead of characters, there are quite some more differences
though: There is no padding for "exceeded" parts like in SUBSTR.
Remember that input contains 5 "words". If we try to retrieve words 4,
5 and 6 from input by

SAY SUBWORD(input, 4, 3)

it would simply give us "little lambs."
(again, without the quotes - I just use them here to show that the returned value
does not contain trailing blanks...)
Another fact worth mentioning is, that SUBWORD returns the separation blanks
exactly the way they're contained in the original string - that's to say, there
is no internal parse that removes additional separators:

SAY SUBWORD(input, 1, 3)

thus will display "Mary  has    5"
Just like with SUBSTR, the entire rest of the string is returned if no length
(number of words) was specified.

WORDLENGTH finally tells you how many characters a word in a
string is made up of:

SAY WORDLENGTH(input, 3)

would display 1
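
The WORD-functions combine nicely in a loop. The following sketch walks through all words of the sample string; the spacing inside the string is assumed to be the one implied by the results quoted above (two blanks after "Mary", four after "has", two after "5"):

```rexx
/* walking through all words of a string - a sketch */
input = "Mary  has    5  little lambs."

do n = 1 to words(input)
  say n || ":" word(input, n) "starts at" wordindex(input, n),
      "and is" wordlength(input, n) "characters long"
end
```

With that spacing, the words are reported as starting at positions 1, 7, 14, 17 and 24.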

Creating or transforming strings

Besides creating strings from subparts of other strings, there are of course more
ways to do so. Considering string "transformation" I must admit
that we actually don't really "transform" strings but rather create new
ones from existing ones. Sometimes, we might re-assign them directly back to the
source string variable like in

mystring = left(mystring, 5)

but basically we don't transform a string. But this is not so important right now
- let's conclude the article.

COPIES creates a string by concatenating multiple copies of a
specified string:

SAY COPIES("bla", 3)

for example would display blablabla
Great. COPIES is quite useful, for example, when you want to draw separator
lines in VIO mode that have to be of a specific length:

SAY COPIES("-", 18)

will give you ------------------

XRANGE is useful e.g. when preparing for character
translation. As you might know, each character has an "index number" within
the character table. We call that the "ASCII table". XRANGE makes
use of these numbers and creates a string that consists of a consecutive run of
characters (according to the table sequence) from the start character to the
end character, both included:

myalphabet = XRANGE("a", "z")
SAY myalphabet

will display abcdefghijklmnopqrstuvwxyz
Note that the ASCII table contains 256 entries (from #0 to #255). If you want to
display the whole table, you'll have to use hex notation because both #0 and #255
are non-printable characters (and thus cannot be typed in):

SAY XRANGE("00"x, "FF"x)

will display the entire ASCII table contents (as far as the entries are printable
characters...).
Note as well, that if the end value is smaller than the start value (according to
the table sequence), XRANGE will start with the start value, display every
entry to 255, then restart with 0 and display every entry up to the end value. Thus,
you won't get the "reverse" range but rather something you did not expect.
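
A small sketch of both points, assuming an ASCII-based system such as OS/2. It builds a comparison string for VERIFY out of two ranges and then shows the wrap-around effect by its length only; the variable names are made up for illustration:

```rexx
/* XRANGE as a comparison string for VERIFY - a sketch */
letters = xrange("a", "z") || xrange("A", "Z")
name = "DrDialog"
if verify(name, letters) = 0 then
  say name "contains letters only"

/* end value below start value: the range wraps around via "FF"x and "00"x */
say length(xrange("y", "b"))
```

On an ASCII system the second SAY shows 234 rather than the 24 characters of a "reverse" range from y down to b.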

We already know STRIP from a previous example: It removes leading
and/or trailing characters from a string. Or, like I said above, it rather creates
a new string with those characters removed. By default, it removes spaces but
can be used for other characters as well. Optionally, you can also specify which
end (leading, trailing or both) to strip. The default is both.

SAY STRIP(" Mary. ")

will return (display) Mary.
This is because none of the optional parameters was specified which defaults to
"remove leading and trailing spaces".

SAY STRIP("0012.850", "L", "0")

will give you 12.850 while

SAY STRIP("0012.850", , "0")

will display 12.85
Note that again we need to specify the comma in order to make sure that REXX's parser
understands that we actually omitted the first optional parameter and that "0"
is the character we want to remove. Writing

SAY STRIP("0012.850", "0")

would result in an error, because "0" will be interpreted as the leading/trailing
parameter - for which only "L", "T" or "B" are valid.

INSERT appears to be quite complex at first sight. The full syntax
diagram is

result = INSERT ( <what> , <into> [, START ] [, LENGTH ] [, PAD
] )

Basically, it inserts a string into another string by using a specified character
position:

SAY INSERT("123", "abcde", 3)

will display abc123de
Note that the START parameter defaults to ZERO which means, that <what> will
be put "in front" of <into>:

SAY INSERT("123", "abcde")

will display 123abcde
As long as you're happy with the defaults, there's nothing to take care of. If
you wish to have some more features, you'll need to know what LENGTH and PAD will
do to the function's behavior... LENGTH is used to fill up the <what>-string
to a given length before inserting it. By default, spaces will be used for filling
(or "padding")...

SAY INSERT("123", "abcde", 3, 5)

will display abc123  de
However, if you specify a padding character in PAD, the <what> string will
be filled with that character instead of spaces:

SAY INSERT("123", "abcde", 3, 5, "#")

will display abc123##de

DELSTR removes a substring from another string. It uses much
the same parameters as SUBSTR - the starting character position and length:

SAY DELSTR("abcde", 3, 2)

would display abe
Again, just like SUBSTR, a missing length operand equals "all
the rest":

SAY DELSTR("abcde", 3)

would thus display ab

DELWORD is the equivalent counterpart to DELSTR when
dealing with words. We'll return to Mary and her lambs to show how it works:

SAY DELWORD(input, 2, 2)

will remove two words from input, starting with word number two: "Mary  little lambs."

DELWORD does not internally parse the contents - this means that additional
spaces between the cutting edge words will not be removed; it removes exactly the
words with the spaces between them plus all spaces "behind" the last word
to remove. To show this in detail:

SAY DELWORD("abc   def ghi   jkl", 2, 2)

will result in "abc   jkl"
Why is there a 3-blank gap between the words? Because "def ghi"
are the two words to remove. Plus the three trailing blanks of "ghi" -
thus, it's "def ghi   " that's cut from the string.
The remaining parts are "abc   " in front of it and "jkl"
behind it. Glue them together and you'll have "abc   jkl".

CENTER is some kind of special "flavor" of INSERT.
It'll center a string within a new empty string of a given length and can be told
to fill up the boundary parts with a special character. This is great for doing
headlines in VIO mode for example:

SAY CENTER("Mary's lambs", "20", "-")

will display ----Mary's lambs----
Funny thing. Here's a little program to show you some possible use of COPIES
and CENTER:
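
A minimal sketch of what such a program might look like; this is not the original listing, and the names width and title as well as the layout are made up for illustration:

```rexx
/* a framed headline using COPIES and CENTER - a sketch */
width = 40
title = "Mary's lambs"

say copies("=", width)                        /* top border */
say center(title, width)                      /* headline padded with blanks */
say center(" " || title || " ", width, "-")   /* headline padded with dashes */
say copies("=", width)                        /* bottom border */
```

Because all four lines use the same width, the borders and the centered headlines line up on screen.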

Great, huh?
You might wonder what happens if you tell it to center a string into something smaller
than the string itself right? Naah, no errors - just truncations:

SAY CENTER("This is not funny!", 7)

will display "is not " (again without the quotes; note the trailing blank).
Whenever even and odd numbers of characters are involved in CENTER (thus, a "balanced"
centering with equal boundaries is not feasible), the right boundary gains
or loses one character more than the left in order to make the string fit the length specified.
Another fact worth mentioning is that this function can equally be called using "center"
as well as "centre". Now, this is what I call "IBM quality".

REVERSE is nothing tremendously abstract: It simply gives you
the reverse notation of the string passed. If you ever wanted to know what your
first name is looking "the other way round", give it a try with:

SAY REVERSE("thomas")

You must replace 'thomas' with your first name in order to make the program function
correctly. ;) Unless, of course, your first name is Thomas too.

SPACE is great when dealing with words. It can be used to separate all
words by the same number of characters. Did you ever try to first get the
"words" out of a string, then put them together, separated by one space
each? This can be done with a single function call in REXX:

SAY SPACE(input)

will display Mary has 5 little lambs.
Of course, we did again let the defaults save us work. Actually, we would have to
write

SAY SPACE(input, 1, " ")

to make it understand that we want the words to be separated by 1 blank each. Why
not separate them by two underscores each:

SAY SPACE(input, 2, "_")

would display Mary__has__5__little__lambs.
As you might have understood already, SPACE uses internal parsing of course
to "get the words right". This is a great function for "normalizing"
user input or data from other programs if you need it to be in a special way...
note that if you use 0 as the amount of separation, all blanks will be removed from
the string:

SAY SPACE(input, 0)

will thus display Maryhas5littlelambs.

I must admit that until today, I didn't ever mess with OVERLAY.
After looking into what it's used for... well, I might mess with it in the future.
What overlay actually does is working like INSERT (it even uses the same
syntax and parameter list) except that - let me put it that way - it uses "overwrite
mode" instead of "insert mode" while typing its text... know what
I mean?

SAY OVERLAY("=XYZ=", "01234567890", 4)

will display 012=XYZ=890
If you want to make use of the optional parameters, let's look at the syntax scheme
first:

result = OVERLAY ( <what> , <into> [, START ] [, LENGTH ] [, PAD
] )

If you specify a length parameter, <what> will be padded with PAD characters
to the specified length. The default for PAD is blanks. Thus,

SAY OVERLAY("=XYZ=", "01234567890", 4, 6)

will display 012=XYZ= 90
The default value for START is 1, which means that <into> will be overwritten
right from the first character.

Finally, TRANSLATE is another cool function for messing with
strings. With TRANSLATE, you set up two tables of characters which are
used to replace the characters of a string. For each character in the string, TRANSLATE
looks it up in the "input" table, then replaces it with the corresponding
character in the "output" table. Both tables are just strings with characters,
where the correspondence is derived from the character position within the string.
That's to say that character #1 in the input table corresponds to character #1 in
the output table and so on...
The syntax scheme looks like this:

result = TRANSLATE ( <input> [, <output-table> ] [, <input-table> ] [, PAD ] )

The PAD character will be used to fill up the <output-table> if its size
is smaller than that of <input-table>. If no PAD is specified, blanks are
used by default. This is to ensure that there is a match in <output-table>
for each entry of <input-table>. You might wonder what happens if all optional
parameters are omitted. What happens to <input> then? Quite simply: It will
be translated to upper case only:

SAY TRANSLATE("hello")

thus displays HELLO
Characters of <input> which are not found in <input-table> will be left
as they are. This is a good way of getting rid of special characters: Simply replace
them by spaces, then remove all spaces from the string using SPACE. For
example you might want to remove all vowels from a string:

outstring = TRANSLATE("hello", copies(" ", 5), "aeiou")

This will set up an <input-table> which contains all vowels and an <output-table>
which contains five spaces. Thus, each vowel found will be replaced with a space.
This will give us "h ll " as output-string. Next, we'll use the SPACE
function with a separation amount of zero:

SAY SPACE(outstring, 0)

which would display hll
Or to put it in one line of code:

SAY SPACE(TRANSLATE("hello", copies(" ", 5), "aeiou"), 0)

This might not be the perfect example. Just imagine that you have a multi-line text
entry field and want to count the lines in it. You only need to replace everything
with spaces except for "0A"x (which is "LF", "line feed"),
then strip all spaces off with SPACE() and get the length of the remaining
string. You're done. This method can even be used for counting lines in a text file,
once you have read the entire file into a variable. I couldn't believe how d**n fast this
is compared to what I had coded so far... until I tried it on my own. Want to give it a
try? Make sure your text files are not too large (I tried with up to approx. 52K).
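
The line counting idea could be sketched like this. The sample text and the variable names are made up for illustration, lines are assumed to end with CR LF ("0D0A"x), and the interpreter is assumed to treat only blanks as word separators, as OS/2 REXX does:

```rexx
/* counting lines by counting line feed characters - a sketch */
crlf = "0D0A"x
text = "first line" || crlf || "second line" || crlf || "third line" || crlf

/* input table: every character except LF ("0A"x) */
others = xrange("00"x, "09"x) || xrange("0B"x, "FF"x)

/* turn all of those into blanks, then let SPACE remove the blanks */
onlylf = space(translate(text, " ", others), 0)
say "Lines:" length(onlylf)
```

The output table is a single blank; since PAD defaults to a blank as well, every character in the input table ends up as a blank. Under the assumptions above, the three-line sample text should be reported as 3 lines.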