Steven Black

Text and String Handling in VFP

By Steven Black

Introduction

This article introduces, illustrates, and explores some of the great (and not so great) string handling capabilities of Visual FoxPro.

I always seem to be solving text-related problems in my VFP projects. On the surface, handling text isn't very sexy and seemingly not very interesting. I think otherwise, and I hope you'll agree.

This document is split into three sections: Inbound is about getting text into the VFP environment so you can work with it, Processing is about manipulating the text, and Outbound is about sending text on its way when you're done.

To illustrate text handling in VFP, I am using the complete text of Tolstoy's War And Peace, included on the conference CD as WarAndPeace.TXT. This text, along with thousands of other works of literature, is available on the web.

This article was originally written using Visual FoxPro version 6, and has since been updated for VFP 7 and VFP 8.

Some facts about VFP strings

Here are a few things you need to know about VFP strings:

In functional terms, there is no difference between a character field and a memo field. All functions that work on characters also work on memos.

The maximum number of characters that VFP can handle in a string is 16,777,184.

Inbound

This section is all about getting text into your processing environment.

Inbound text from table fields

To retrieve text from a table field, simply assign it to a memory variable.
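A minimal sketch of that assignment, assuming a throwaway cursor built just for the illustration. The names Books, cTitle, mText, and lcText are hypothetical, as is the sample text.

```foxpro
* Sketch: copy a memo field into a memory variable.
* The cursor and field names are illustrative, not from the article.
CREATE CURSOR Books ( cTitle C(40), mText M )
INSERT INTO Books ( cTitle, mText ) VALUES ( "Sample", "Some sample text" )

LOCAL lcText
lcText = Books.mText      && plain assignment; character fields work the same way
? LEN( lcText )           && number of characters retrieved

USE IN Books
```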

Inbound from text files

There are many ways to retrieve text from files on disk.

FILETOSTR( cFileName ) places the contents of a disk file into a string memory variable. This is among my favorite new functions in VFP 6. It's both useful and fast. For example, the following code executes in one-seventh of a second on my 220 MHz Pentium laptop.
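A minimal sketch of such a call, assuming WarAndPeace.TXT sits in the current directory. The variable names and the SECONDS()-based timing are illustrative, and the elapsed time will depend on your hardware.

```foxpro
* Sketch: load a whole file into a string and time the call.
* Assumes WarAndPeace.TXT is in the current directory.
LOCAL lcFile, lcText, lnStart
lcFile  = "WarAndPeace.TXT"
lnStart = SECONDS()
lcText  = FILETOSTR( lcFile )
? "Characters read:", LEN( lcText )
? "Elapsed seconds:", SECONDS() - lnStart
```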

In other words, on a very modest laptop (by today's standards) VFP can load the full text of Tolstoy's War And Peace in one-seventh of a second.

Low Level File Functions (LLFF) are somewhat more cumbersome but offer great control. LLFF are also very fast. The following example reads the entire contents of Tolstoy's War And Peace from disk into memory:
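A minimal sketch of an LLFF read, assuming WarAndPeace.TXT is in the current directory. Reading in 65,535-byte chunks is a conservative choice made for this sketch, not something the article prescribes, and the variable names are illustrative.

```foxpro
* Sketch: read an entire file with low-level file functions.
* Assumes WarAndPeace.TXT is in the current directory.
LOCAL lnHandle, lcText
lnHandle = FOPEN( "WarAndPeace.TXT" )     && opens read-only by default
IF lnHandle < 0
   ? "Could not open file, FERROR() =", FERROR()
   RETURN
ENDIF

lcText = ""
DO WHILE NOT FEOF( lnHandle )
   * Chunk size is an assumption chosen for safety
   lcText = lcText + FREAD( lnHandle, 65535 )
ENDDO
= FCLOSE( lnHandle )

? "Characters read:", LEN( lcText )
```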

Given the similar execution times, I think we can conclude that internally, LLFF and FILETOSTR() are implemented similarly. However, with the LLFF we also have fine control. For example, FGETS() allows us to read a line at a time. To illustrate, the following code reads the first 15 lines of War And Peace into array wpLines.
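A minimal sketch of line-at-a-time reading, assuming WarAndPeace.TXT is in the current directory. The array name wpLines comes from the text; the other names and the 8192-byte line limit passed to FGETS() are choices made for this sketch.

```foxpro
* Sketch: read the first 15 lines of a file with FGETS().
LOCAL lnHandle, lnLine
LOCAL ARRAY wpLines[15]
lnHandle = FOPEN( "WarAndPeace.TXT" )
IF lnHandle < 0
   ? "Could not open file, FERROR() =", FERROR()
   RETURN
ENDIF

lnLine = 0
DO WHILE lnLine < 15 AND NOT FEOF( lnHandle )
   lnLine = lnLine + 1
   wpLines[lnLine] = FGETS( lnHandle, 8192 )   && one line, without its terminator
ENDDO
= FCLOSE( lnHandle )

? "Lines read:", lnLine
```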

Inbound from text files, with pre-processing

Sometimes you need to pre-process text before it is usable. For example, you may have an HTML file from which you need to clean and remove tags. Or maybe you have the problem exhibited by our copy of War and Peace, which has embedded hard-returns at the end of each line. How can we create a streaming document that we can actually format?

Often the answer is to use the APPEND FROM command, which imports from a file into a table, and moreover supports a large variety of file formats. The strategy always works something like this: You create a single-field table, and you use APPEND FROM ... TYPE SDF to load it
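A minimal sketch of that strategy, assuming WarAndPeace.TXT is in the current directory. The cursor name WPLines, the field name cLine, and the 254-character width are illustrative choices; lines longer than the field width would be truncated.

```foxpro
* Sketch: load a text file one line per record.
* Cursor and field names are hypothetical.
CREATE CURSOR WPLines ( cLine C(254) )
APPEND FROM WarAndPeace.TXT TYPE SDF
? "Records loaded:", RECCOUNT( "WPLines" )

* Each record now holds one line, ready for per-line processing
GO TOP IN WPLines
? RTRIM( WPLines.cLine )
```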

The AT() and ATC() functions are also great for determining if a sub-string exists, the latter having the advantage of being case-insensitive. Moreover, their return values give you the exact position of the sub-string.

The OCCURS() function will also tell you if a sub-string exists, and moreover tell you how many times the sub-string occurs. This code will count the number of occurrences of a variety of sub-strings in War And Peace.
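A minimal sketch of counting with OCCURS(), assuming WarAndPeace.TXT is in the current directory. The search terms are illustrative picks, not the ones from the original listing, and OCCURS() is case-sensitive.

```foxpro
* Sketch: count occurrences of a few sub-strings.
LOCAL lcText
lcText = FILETOSTR( "WarAndPeace.TXT" )

* OCCURS( cSearchFor, cSearchIn ) returns 0 when the sub-string is absent
? "Russia:  ", OCCURS( "Russia", lcText )
? "Napoleon:", OCCURS( "Napoleon", lcText )
? "the:     ", OCCURS( "the", lcText )
```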

Locating sub-strings

One of the basic tasks in almost any string manipulation is locating sub-strings within larger strings. Four useful functions for this are AT(), RAT(), ATC(), and RATC(). These return the ordinal position of a sub-string, searching from the left (AT()) or from the right (RAT()); both have case-insensitive variants (ATC() and RATC()). All these functions are very fast and scale well with file size. For example, let's go look for THE END in War And Peace.
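A minimal sketch of the four functions, assuming WarAndPeace.TXT is in the current directory. Each call returns 0 if the sub-string is not found; the variable names are illustrative.

```foxpro
* Sketch: locate a sub-string from the left and from the right.
LOCAL lcText
lcText = FILETOSTR( "WarAndPeace.TXT" )

? "AT:  ", AT( "THE END", lcText )      && first match, case-sensitive
? "RAT: ", RAT( "THE END", lcText )     && last match, case-sensitive
? "ATC: ", ATC( "the end", lcText )     && first match, ignoring case
? "RATC:", RATC( "the end", lcText )    && last match, ignoring case
```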

That's pathetic performance. 20+ seconds to iterate through 767 lines! Fortunately, there's a trick to using MLINE(), which is to pass the _MLINE system memory variable as the third parameter. Like this.
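A minimal sketch of the _MLINE idiom, assuming WarAndPeace.TXT is in the current directory. Line boundaries depend on the current SET MEMOWIDTH setting, and the variable names are illustrative.

```foxpro
* Sketch: traverse a string line by line using the _MLINE offset.
LOCAL lcText, lnLines, lnI, lcLine
lcText  = FILETOSTR( "WarAndPeace.TXT" )
lnLines = MEMLINES( lcText )

_MLINE = 0                               && start at the beginning of the string
FOR lnI = 1 TO lnLines
   * Always ask for line 1, counted from the _MLINE offset;
   * MLINE() moves _MLINE past the line it just returned.
   lcLine = MLINE( lcText, 1, _MLINE )
ENDFOR
? "Lines traversed:", lnLines
```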

Another twenty-fold improvement in speed. I think the lesson is clear: If you are using MLINE() in your applications, and you are using VFP 6, then it's time to switch to ALINES(). There are just two major differences: First, ALINES() is limited by VFP's 65,000 array element limit, and second, successive lines with only CHR( 13 ) carriage returns are considered as one line. For example:
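A minimal sketch in two parts, assuming WarAndPeace.TXT is in the current directory. The first part simply prints how many elements ALINES() produces for a string with consecutive carriage returns, so you can check the behavior on your own version; the second part loads the file and walks the array. All names are illustrative.

```foxpro
* Sketch: ALINES() on a small probe string, then on a whole file.
LOCAL lcSample, lnCount, lnI, lcLine
LOCAL ARRAY laProbe[1], laLines[1]

* Part 1: consecutive carriage returns between two words
lcSample = "AAA" + CHR( 13 ) + CHR( 13 ) + CHR( 13 ) + "BBB"
? "Elements from probe string:", ALINES( laProbe, lcSample )

* Part 2: one array element per line of the file
lnCount = ALINES( laLines, FILETOSTR( "WarAndPeace.TXT" ) )
FOR lnI = 1 TO lnCount
   lcLine = laLines[lnI]
ENDFOR
? "Lines traversed:", lnCount
```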

Excuse me, but wow, considering we're creating a 54,337-element array from a file on disk, then we're traversing the entire array assigning each element's contents to a memory variable, and we're back in 3.4 seconds.

So, on my Pentium 233 laptop using VFP 6, we can load War and Peace from disk into a 54,000-item array in 2.2 seconds. On my newer desktop machine, a Pentium 500, this task is subsecond.

Traversing text word-by-word

You could recursively traverse a string word-by-word by using, among other things, the return value from AT( " ", x, n ) and SUBSTR() but, if you are doing that, you're missing a great and little-known feature of VFP.

Two new functions are great for word-by-word text processing. The GETWORDCOUNT() and GETWORDNUM() functions return the number of words and individual words, respectively.

Prior to VFP 7, use the Words() and WordNum() functions, which are available when you load the FoxTools.FLL library. They likewise return the number of words and individual words, respectively.

Let's see how they perform. Let's first count the words in War And Peace.
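A minimal sketch of a word count in VFP 7 or later, assuming WarAndPeace.TXT is in the current directory. Passing the delimiter list explicitly (space, tab, carriage return, line feed) is a choice made here so the sketch does not rely on default delimiters; the variable names are illustrative.

```foxpro
* Sketch: count words and fetch individual words (VFP 7+).
LOCAL lcText, lcDelims, lnWords, lnStart
lcText   = FILETOSTR( "WarAndPeace.TXT" )
lcDelims = " " + CHR( 9 ) + CHR( 13 ) + CHR( 10 )

lnStart = SECONDS()
lnWords = GETWORDCOUNT( lcText, lcDelims )
? "Words:", lnWords, "Elapsed seconds:", SECONDS() - lnStart

* Individual words are addressed by ordinal position
? "First word:", GETWORDNUM( lcText, 1, lcDelims )
? "Last word: ", GETWORDNUM( lcText, lnWords, lcDelims )
```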

Which isn't bad considering that there are 159,218 occurrences of the character "s" in War And Peace.

However, don't try to use CHRTRAN() to remove characters by passing an empty string as the third parameter (the replacement characters). The performance of CHRTRAN() in these circumstances is terrible. If you need to suppress sub-strings, use STRTRAN() instead.
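A minimal sketch of suppressing a sub-string with STRTRAN(), using a small literal so the result is easy to check. The sample word is illustrative.

```foxpro
* Sketch: remove every occurrence of a sub-string with STRTRAN().
LOCAL lcText, lcClean
lcText  = "Mississippi"
lcClean = STRTRAN( lcText, "s", "" )    && replace "s" with nothing
? lcClean                               && prints Miiippi
```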

String Concatenation

VFP has tremendous concatenation speed if you use it in a particular way. Since many common tasks, like building web pages, involve building documents one element at a time, you should know that string expressions of the form x = x + y are very fast in VFP. Consider this:
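A minimal sketch of that pattern. The loop count and the size of the appended piece are arbitrary values chosen for this illustration, and the timing will vary by machine.

```foxpro
* Sketch: build a large string with x = x + y and time it.
LOCAL lcResult, lcPiece, lnI, lnStart
lcResult = ""
lcPiece  = REPLICATE( "x", 10 )

lnStart = SECONDS()
FOR lnI = 1 TO 100000
   lcResult = lcResult + lcPiece        && the variable appends to itself
ENDFOR
? "Length:", LEN( lcResult ), "Elapsed seconds:", SECONDS() - lnStart
```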

This full optimization occurs as long as the string is adding something to itself and as long as the string concatenated is stored in a variable. Using class properties is somewhat less efficient. String optimization does not occur if the first expression on the right of the = sign is not the same as the string being concatenated. So:

x = "" + x + y

is not optimized in this fashion. The above line, placed in the example above, takes 25 seconds! So appending strings to strings is blazingly fast in most common situations.

Outputting text

So you've got text, maybe a lot of it. What are your options for writing it to disk?

First and foremost, there's the new STRTOFILE() function, which creates a disk file with the contents of a string. Let's write War And Peace to disk.
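A minimal sketch of a round trip, assuming WarAndPeace.TXT is in the current directory. The output file name WarAndPeaceCopy.TXT is hypothetical; STRTOFILE() overwrites an existing file unless told to append.

```foxpro
* Sketch: write a string to disk with STRTOFILE().
LOCAL lcText, lnBytes, lnStart
lcText = FILETOSTR( "WarAndPeace.TXT" )

lnStart = SECONDS()
lnBytes = STRTOFILE( lcText, "WarAndPeaceCopy.TXT" )   && returns bytes written
? "Bytes written:", lnBytes, "Elapsed seconds:", SECONDS() - lnStart
```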

You can also use Low Level File Functions (LLFF) to output text. The FWRITE() function dumps all or part of a string to disk. The FPUTS() function outputs a single line from the string, and moves the pointer.
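A minimal sketch of both functions. The file name Output.TXT and the sample strings are hypothetical. FPUTS() appends a carriage return and line feed after the text, while FWRITE() writes exactly the bytes it is given.

```foxpro
* Sketch: write text with low-level file functions.
LOCAL lnHandle
lnHandle = FCREATE( "Output.TXT" )      && creates or overwrites the file
IF lnHandle < 0
   ? "Could not create file, FERROR() =", FERROR()
   RETURN
ENDIF

= FPUTS( lnHandle, "First line" )       && text plus CR+LF
= FWRITE( lnHandle, "raw text" )        && text only, no line terminator
= FCLOSE( lnHandle )

? FILETOSTR( "Output.TXT" )
```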

Conclusion

So, there you have it, a cafeteria-style tour of VFP's text handling capabilities. I personally think that most of the code snippets I've shown here have amazing and borderline unbelievable execution speeds. I hope I've been able to show that VFP really excels at string handling.