Beginner Event 3: The Shot Put

In the shot put event, you must be able to handle a heavy load of text. To make it easier for you to carry the load, you will be asked to balance text between two files. The detailed event description was revealed last Wednesday.

Guest commentator: Alex K. Angelopoulos

Alex Angelopoulos is a former Microsoft scripting MVP. He maintains the Windows Scripting Web site, which is a veritable treasure trove of information related to scripting, with an emphasis on Microsoft Visual Basic Scripting Edition (VBScript).

VBScript solution

There are several ways to complete the Beginner’s Shot Put event in VBScript. I chose a particularly rapid technique that reads the entire file all at once, and then uses VBScript's split method to break the contents into two parts.

In its simplest form, the split method takes an arbitrary line of text and splits it at the spaces in the string. Therefore, you can do something like this:

words = Split("two words")
For Each word In words
    WScript.Echo word
Next

The previous code returns two words, each of which is on its own line. This is seen here:

two
words

The split method can also take an optional second argument that tells VBScript to look for something to split on other than just a space. If you change the initial line shown above to include the second parameter, you will get something that looks like this:

words = Split("two words", "o w")

The output is now completely different, and is shown here:

tw
ords

The split method works just as easily with displayed and nondisplayed characters. Embedded line terminations are characters that are not displayed when you open a text file but are located in the file. In Windows, the default line termination is a sequence of two characters—the carriage return and the line feed. VBScript even includes a built-in constant representing this two-character sequence, vbCrLf.

What is significant for the shot put data is that the boundary between the two paragraphs is a blank line. Even though there are no visible characters on that line, it is preceded by a carriage return and a line feed, and is immediately followed by a carriage return and a line feed. This character sequence written in VBScript is seen here:

vbCrLf & vbCrLf

It is also important to realize that this character sequence appears nowhere else in the data. Therefore, if we read the contents of the entire file in one step, we can split it on vbCrLf & vbCrLf, instead of reading the file line by line. This might seem a little confusing, so let us take a look at the solution.

Here is a complete solution using the handy split method, with embedded commentary and some commented-out code at the end that is usable for test cycling.

BeginnerEvent3Solution.vbs

Option Explicit

' We get a reference to the FileSystemObject (a.k.a. FSO)
Dim fso: Set fso = CreateObject("Scripting.FileSystemObject")

' Constants used to control the mode for opening files
Const ForReading = 1, ForWriting = 2

' Open the data file for reading.
' We don't need to explicitly specify ForReading because this is the
' default mode for opening a text file.
Dim file: Set file = fso.OpenTextFile("Shot Put.txt", ForReading)

' Read the entire file in one step.
Dim contents: contents = file.ReadAll

' Because we're done with the file now, we close it.
' If we don't, we'll get a "Permission denied" error later
' when we try to rename it.
file.Close

' Now split contents on the repeated line ending;
' this gives a 2-member array with multiline text blocks in each one.
Dim data: data = Split(contents, vbCrLf & vbCrLf)

' Now we create both files using OpenTextFile.
' The third argument (True) creates the file if it does not exist.
Dim FileA, FileB
Set FileA = fso.OpenTextFile("Shot Put A.txt", ForWriting, True)
FileA.Write data(0)
FileA.Close
Set FileB = fso.OpenTextFile("Shot Put B.txt", ForWriting, True)
FileB.Write data(1)
FileB.Close

' Rename the original file.
fso.MoveFile "Shot Put.txt", "Shot Put.old"

' The following lines were used to speed up the testing cycle;
' after moving the file, I make sure it exists from the script,
' then rename it back to the old name so I don't need to go
' in and change the file extension via Windows Explorer.
' FileExists returns a true/false value, which gets echoed
' as either -1 or 0. If we force it to string form using CStr,
' we get a readable "True" or "False" instead.
'WScript.Echo "file was renamed?", CStr(fso.FileExists("Shot Put.old"))
'fso.MoveFile "Shot Put.old", "Shot Put.txt"

When we run the script, two text files are created: Shot Put A.txt and Shot Put B.txt. The original file is renamed to Shot Put.old.

This is extremely wordy code, but it has the advantage of being readable. This also means it is easier to maintain. An experienced scripter could condense this down to the following five lines, which are not nearly as readable; I do not recommend doing so for anything but a throwaway script:

Guest commentator: Alex K. Angelopoulos

Scripting Guys Note: As it turns out, Alex is bilingual: he scripts in both VBScript and Windows PowerShell. When he told us that he did his prototype VBScript in Windows PowerShell, we asked him to go ahead and submit his Windows PowerShell solution to Beginner Event 3. In fact, Alex is not unusual in this regard. We also use Windows PowerShell to prototype a solution to a VBScript problem. We do this often when we get questions such as, “I am trying to do such and such with VBScript, but the Windows Management Instrumentation (WMI) class is not returning any data.” Testing a WMI class is so much faster in Windows PowerShell that we always check the class there before writing a VBScript answer to the problem. For more information about using Windows PowerShell to query WMI, check out the Hey, Scripting Guy! articles from the week of March 6, 2009.

Windows PowerShell solution

To solve the shot put problem with Windows PowerShell, you must understand how to use the Get-Content cmdlet. You provide Get-Content with the name of a file, and it automatically retrieves all of the data from the file. With the shot put problem, however, using Get-Content the easy way actually gives you a harder problem to solve. This is due in part to the design of Get-Content.

The design of Get-Content reflects the Windows PowerShell preference for pipeline processing. Instead of reading data in as a monolithic block of text, Get-Content returns the file line by line. This means that when you read the contents of a file and store the results in a variable, you get a collection of lines from the text file. This is seen here:

$data = Get-Content '.\Shot Put.txt'

Because there are multiple line breaks in the file, you end up having an array of nine elements, each containing one line of text from the file. You can see that the data is stored in an array by querying the count property. This is seen here:

$data.Count

Because the goal of the shot put event is to split the data file into two separate paragraph files by using the blank line in the file as a boundary marker, the obvious way to work with this problem is by locating the blank line. The following code checks the length of each line that is stored in the $data variable. When it finds the empty line, it exits with the index of the empty line stored in the variable $i. This is seen here:
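The loop itself is not reproduced in this text. A minimal sketch of the idea, assuming a small stand-in array in place of the real file contents (the sample lines are hypothetical), might look like this:

```powershell
# Hypothetical stand-in for: $data = Get-Content '.\Shot Put.txt'
$data = @('first paragraph, line 1', 'first paragraph, line 2', '',
          'second paragraph, line 1', 'second paragraph, line 2')

# Step through the lines until one with a length of zero is found.
# When the loop exits, $i holds the index of the empty line.
for ($i = 0; $i -lt $data.Count; $i++) {
    if ($data[$i].Length -eq 0) { break }
}

$i   # index of the blank line in the sample data
```

If the data contained no empty line, the loop would run to the end and leave $i equal to $data.Count, so a real script would probably want to check for that case.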

We can then use the Windows PowerShell array notation to select all of the lines before the empty line and put them into one file. We repeat this process with the lines of text after the empty line, which go into a second file. This is shown here:
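As a rough sketch of that array notation, assuming hypothetical sample lines and a temp folder for the output (neither comes from the original event files), the two slices could be taken with range indexes:

```powershell
# Hypothetical stand-in for the lines returned by Get-Content
$data = @('A1', 'A2', '', 'B1', 'B2')

# Locate the empty line by checking each line's length
for ($i = 0; $i -lt $data.Count; $i++) {
    if ($data[$i].Length -eq 0) { break }
}

# Range notation selects the lines before and after index $i
$partA = $data[0..($i - 1)]
$partB = $data[($i + 1)..($data.Count - 1)]

# Write each block to its own file (the temp folder is an assumption)
$folder = [System.IO.Path]::GetTempPath()
$partA | Set-Content (Join-Path $folder 'Shot Put A.txt')
$partB | Set-Content (Join-Path $folder 'Shot Put B.txt')
```

This sketch assumes the empty line is neither the first nor the last line of the data; otherwise the ranges would need extra checks.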

This is an effective and fairly compact solution. However, it makes the problem more complicated than necessary. The Get-Content delimiter parameter allows us to specify any arbitrary string as a delimiter to use when separating data chunks in a file. Because we must split the file at the point where there is an empty line, we can use the empty line as a delimiter. Although the line is empty, it marks a point where there are two line terminations in a row. Because a standard line termination in Windows is a carriage return followed by a line feed—represented with the escape character sequence `r`n in Windows PowerShell—we can use `r`n`r`n as our delimiter. This is shown here:

$data = Get-Content '.\Shot Put.txt' -Delimiter "`r`n`r`n"

In one step, we have imported the data from the Shot Put.txt file as an array of two elements, each containing data for a paragraph file. We just write the data to the files and we are finished. This is seen here:
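The write step is not reproduced in this text. One way it might be sketched, assuming a hypothetical sample file created in a temp folder, is shown below. Depending on the PowerShell version, each chunk returned by Get-Content may or may not still end with the delimiter, so this sketch strips trailing line breaks before writing.

```powershell
# Build a small sample file with Windows line endings (hypothetical content)
$folder = [System.IO.Path]::GetTempPath()
$source = Join-Path $folder 'Shot Put.txt'
[System.IO.File]::WriteAllText($source, "A1`r`nA2`r`n`r`nB1`r`nB2`r`n")

# Read the file as chunks separated by a blank line
$data = @(Get-Content $source -Delimiter "`r`n`r`n")

# Strip any trailing line breaks left on each chunk
$partA = $data[0] -replace '(\r\n)+\z', ''
$partB = $data[1] -replace '(\r\n)+\z', ''

# Write each paragraph to its own file
Set-Content (Join-Path $folder 'Shot Put A.txt') $partA
Set-Content (Join-Path $folder 'Shot Put B.txt') $partB
```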

Advanced Event 3: The Shot Put

The shot put event involves throwing a heavy, metal ball. Some people think decathlon events are all the same. For the shot put event, you are required to find words in a file that contain all the same vowels.

Guest commentator: Scott Hanselman

VBScript solution

In the Advanced Shot Put Event, we must search a file for words in which every vowel is the same. We will use the FileSystemObject object to read a text file. After we create the FileSystemObject and store it in a variable, we use the OpenTextFile method to open the text file, and the CreateTextFile method to create a new text file. This is seen here:

We then use a Do…While…Loop construct to walk through the text file, one line at a time. For each line, we set a variable, isUniVowel, to True. We use the Mid function to examine each letter in the string, and we call the isVowel function to see whether that letter is a vowel. If it is, we continue through the string to see whether there is an additional vowel that is not equal to the first vowel. If we find more than one distinct vowel, we set isUniVowel to False. This is seen here:

Windows PowerShell solution

In English, there are five letters that always represent a vowel when written: a, e, i, o, and u. This Advanced event requires that we get all of the univowel words from a given file and write them to a new file. There are two rules we must follow:

A word can have only one unique vowel (for example, "go").

The same vowel can appear more than once in the same word (for example, "good").

Our approach to this script is to break the solution into three tasks:

1. Get the file content.

2. Process the univowel words.

3. Export the univowel words to a new file.

The first thing we must do is get the file content and examine the words found in the file. Getting the file content is basically an easy task. We use the Get-Content cmdlet, which reads the file content one line at a time and returns an object for each line. In our task, each line represents a word. Here are the first 10 items of the WordList_Adv3.txt file:

As you can see, the first line contains the word “the”. The word is a univowel word because it contains only the vowel “e”. On the other hand, line number nine contains the word “have”; it is not a legal univowel word because it contains two vowels, “a” and “e”.
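To make the two rules concrete, here is a minimal sketch of a univowel check. The function name Test-UniVowel is hypothetical and is not taken from the event solution; it simply counts the distinct vowels in a word.

```powershell
# Hypothetical helper: returns $true when a word uses exactly one distinct vowel
function Test-UniVowel {
    param([string]$Word)

    # Collect every vowel in the word, ignoring case, then keep the distinct ones
    $distinct = [regex]::Matches($Word.ToLower(), '[aeiou]') |
        ForEach-Object { $_.Value } |
        Select-Object -Unique

    # @() guards against a single result (or none) not being an array
    @($distinct).Count -eq 1
}

Test-UniVowel 'the'    # one distinct vowel (e)
Test-UniVowel 'good'   # o appears twice, still one distinct vowel
Test-UniVowel 'have'   # two distinct vowels (a and e)
```

A word with no vowels at all is rejected by this sketch, because the count of distinct vowels is zero rather than one.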

Now let us skip to the third part of our task and see how we can export the words (that we have not found yet) to a file. We can redirect the output to a file with the Out-File cmdlet:

PS > Out-File .\univowel.txt

So we know how to get the content of the file and how to export the results to a new file. We can write a temporary command that looks like the following:
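The temporary command is not reproduced in this text. A sketch of what such a pipeline might look like, assuming a hypothetical three-word sample list in a temp folder, is:

```powershell
# Hypothetical sample word list standing in for WordList_Adv3.txt
$folder = [System.IO.Path]::GetTempPath()
$wordList = Join-Path $folder 'WordList_Adv3.txt'
Set-Content $wordList 'the', 'good', 'have'

# Temporary command: read every word and write it straight back out.
# The filtering step would later go between the two cmdlets.
$output = Join-Path $folder 'univowel.txt'
Get-Content $wordList | Out-File $output
```

At this stage the output file is just a copy of the input; the point is only to confirm that the first and third tasks connect through the pipeline.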

Finally, we must solve part number two, which processes each line and checks whether the current word is univowel. To accomplish this part of the script, we first discover whether a word contains vowel characters by using the -match operator and a regular expression pattern. This is seen here:

if ($word -match '[aeiou]') {
    Write-Host "Word: $word contains vowels"
}

The regular expression pattern [aeiou] is a "character class." A character class matches only one out of several characters. We can now filter words that contain vowels by using the Where-Object cmdlet. The Where-Object creates a filter that c