Exercise 3—GUI File Copy

Write a file copy program that uses a graphical user interface.
The user enters the name of the file to copy and the name
of the destination file in TextFields and clicks a button
to perform the copy.
Write error messages into another TextField.

For a nicer program,
look in your documentation for the
FileDialog class.
Include it in the GUI so the user
can choose the source file graphically.
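
As a starting point, here is a minimal sketch of how FileDialog might be used to pick the source file. The class name ChooseSourceSketch and both method names are made up for illustration; the sketch assumes you already have a Frame to act as the dialog's owner and leaves the rest of the GUI to you.

```java
import java.awt.FileDialog;
import java.awt.Frame;
import java.io.File;

// Hypothetical helper class; not part of the exercise statement.
class ChooseSourceSketch {

    // Combine the directory and file name that FileDialog reports.
    // A null file name means the user cancelled the dialog.
    static String fullPath(String directory, String file) {
        if (file == null) {
            return null;
        }
        return new File(directory, file).getPath();
    }

    // Show a modal "open file" dialog and return the chosen path,
    // or null if the user cancelled.
    static String chooseSource(Frame owner) {
        FileDialog dialog =
            new FileDialog(owner, "Choose the file to copy", FileDialog.LOAD);
        dialog.setVisible(true);   // blocks until the dialog is closed
        return fullPath(dialog.getDirectory(), dialog.getFile());
    }
}
```

The returned string could be placed in the source TextField so the user can still edit it before clicking the copy button.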

Exercise 4—Reading Random Integer Data

Write a class that reads data from a file
containing integers in character format.
This class will be a software tool that
other programs can use to simplify their
own input.
The constructor for the class will use
the name of the input file as a parameter,
and will create the appropriate streams.
Write a close() method
and a method int getNextInt()
that returns the value of the next integer
from the stream.
When an error is encountered,
these methods will write an error message
and stop the program.

The input file
may have none, one, or several integers per line.
A suitable file could be created by the
program for Exercise 1 of the previous chapter.
The class will work with any
number of integers total,
and a varying number of integers per line.

Use BufferedReader,
FileReader,
parseInt(),
and StringTokenizer.
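
A rough sketch of how these pieces might fit together is shown here. It is not the finished class: the name NextIntSketch is made up, the constructor takes any Reader (an assumption that makes it easy to try out; yours will take a file name and wrap a FileReader), and errors are thrown as exceptions rather than written out before stopping the program as the exercise requires.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical class name; a sketch of the reading loop only.
class NextIntSketch {
    private final BufferedReader in;
    // Tokens not yet consumed from the current line (initially none).
    private StringTokenizer tokens = new StringTokenizer("");

    // Assumption: accepts any Reader instead of a file name.
    NextIntSketch(Reader source) {
        in = new BufferedReader(source);
    }

    // Return the next integer, reading further lines as needed.
    int getNextInt() {
        try {
            // Skip lines that have no tokens left, including empty lines.
            while (!tokens.hasMoreTokens()) {
                String line = in.readLine();
                if (line == null) {
                    throw new NoSuchElementException("no more integers");
                }
                tokens = new StringTokenizer(line);
            }
            return Integer.parseInt(tokens.nextToken());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the StringTokenizer as an instance variable is what lets one line supply none, one, or several integers across successive calls.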

Test your class by using it in a program that
writes the integers from the input file on the monitor,
one per line.
Once the class is debugged there are many other
programs you can write that use it:

Compute the sum and average of all integers in a file (easy).

Compute the sum, average, and standard deviation of the integers (easy).

Find the minimum and maximum integer (easy).

Find the most and least frequently occurring integer, reporting all ties (medium).
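
For the last of these, one possible approach to reporting ties is sketched here. It assumes the integers have already been read into an array, and the class and method names are invented for illustration. Only the most frequent values are handled; the least frequent case follows the same pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; assumes the integers are already in an array.
class MostFrequentSketch {

    // Return every value that occurs the maximum number of times,
    // in ascending order. An empty array gives an empty list.
    static List<Integer> mostFrequent(int[] values) {
        Map<Integer, Integer> counts = new TreeMap<>();
        int best = 0;
        for (int v : values) {
            int c = counts.merge(v, 1, Integer::sum);  // new count for v
            best = Math.max(best, c);
        }
        List<Integer> winners = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() == best) {
                winners.add(e.getKey());
            }
        }
        return winners;
    }
}
```

A map is used because the integers in the file can be any size, so a plain array indexed by value would not work in general.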

Exercise 6 — HTML Filter

Any text editor, such as Notepad, can be used
to create web pages.
Unfortunately, these editors usually do not
check spelling.
Word processors can open a text file and check
its spelling.
But when a file is sprinkled with HTML tags
they all are flagged as errors and
the real spelling errors are hard to see.
This exercise is to write a utility that strips
the HTML tags from a text file.

Write a program that reads in a text file
and writes out another text file.
The input file may have any number of HTML tags
per line.
The output file will be a copy of the input file
but with spaces substituted for each HTML tag.
The program will not check HTML syntax; it looks
at the file as a stream of tokens and substitutes
spaces for each token that is a tag.
For this program,
an HTML tag is any token that looks like one of these:

<Word> </Word>

Assume that
Word is a single word (perhaps just one letter or no letters)
and that there are
no spaces between the left and right angle brackets.
With this definition, the following are tags:

<p> </p> <em> </em>
<rats> </1234> <blockquote> </>

With this definition, the following are NOT tags (although some are with real HTML):

Challenging Exercise:
Write the program to filter out
any tag that looks like one of these:

<Word .... > </Word ... >

Now Word is a single word that immediately
follows the left angle bracket, but may be followed by
more text which may include spaces.
A tag ends with a right angle bracket, which
might or might not be
preceded by a space.
Assume that a tag starts and ends on the same line.
With this definition,
the following are tags

Start by setting a boolean flag to false.
Now look at the input stream of tokens one by one.
When a token starts a tag, set the flag to true.
While the flag is true, discard tokens until encountering
a tag end (either stuff> or >),
then set the flag back to false.
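
The flag technique might be sketched as follows for a single line of input. This is only an outline under simplifying assumptions: tokens are separated by whitespace, a token that begins with < starts a tag, and a token that ends with > ends one. The kept tokens are joined with single spaces rather than substituting spaces for each tag, and the class and method names are made up.

```java
import java.util.StringTokenizer;

// Hypothetical sketch of the flag technique for one line of text.
class TagFlagSketch {

    static String stripTags(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        StringBuilder out = new StringBuilder();
        boolean inTag = false;                 // true while inside a tag

        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            if (!inTag && token.startsWith("<")) {
                inTag = true;                  // this token starts a tag
            }
            if (inTag) {
                // Discard the token; a trailing '>' closes the tag.
                if (token.endsWith(">")) {
                    inTag = false;
                }
            } else {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(token);
            }
        }
        return out.toString();
    }
}
```

Notice that a token such as <p> both starts and ends a tag, so the flag is set and cleared while handling the same token.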

File IO Project

Letters of the alphabet
occur in text
at different frequencies.
Write a program
that confirms this phenomenon.
Your program will be invoked from
the command line like this:

C:\mydir> java freqCount avonlea.txt avonlea.rept -all

It will then read through the first text file
on the command line (in this case "avonlea.txt")
accumulating the counts for each letter.
When it reaches the end of the file,
it will write a report (in this case "avonlea.rept")
that displays the total number of alphabetic
characters "a-zA-Z" and for each character
the number of times it occurred and the relative
frequency with which it occurred.
In counting characters,
regard lower case "a-z" and upper case "A-Z"
characters as identical.
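
The relative frequency of a letter is its count divided by the total number of alphabetic characters. A small sketch of that calculation, assuming the counts are already in an array of long (the class and method names are made up):

```java
// Hypothetical helper for turning counts into relative frequencies.
class RelativeFrequencySketch {

    static double[] relativeFrequencies(long[] count) {
        long total = 0;
        for (long c : count) {
            total += c;
        }
        double[] freq = new double[count.length];
        for (int i = 0; i < count.length; i++) {
            // Cast before dividing; long / long would discard the fraction.
            // A file with no letters gives all zeros.
            freq[i] = (total == 0) ? 0.0 : (double) count[i] / total;
        }
        return freq;
    }
}
```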

You will need an array of 26 long integers,
one per character.
To increment the count for a particular character
you will have to convert it into an index in the
range 0..25.
Do this by first determining which range the
character belongs in, "a-z" or "A-Z", and then
subtracting 'a' or 'A' from it, as appropriate:

int inx = (int)ch - (int)'A' ;   // when ch is in "A-Z"; subtract 'a' when ch is in "a-z"
count[inx]++ ;

Discard characters not in either range
without increasing any count.
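
Putting the range test and the subtraction together might look like the following sketch. The class and method names are invented, and the text is passed in as a String to keep the example short; your program will read characters from the file.

```java
// Hypothetical sketch of the index calculation and counting loop.
class LetterIndexSketch {

    // Map 'a'..'z' and 'A'..'Z' to 0..25; return -1 for anything else.
    static int indexOf(char ch) {
        if (ch >= 'a' && ch <= 'z') {
            return ch - 'a';
        }
        if (ch >= 'A' && ch <= 'Z') {
            return ch - 'A';
        }
        return -1;
    }

    // Count each letter of the text, ignoring case and non-letters.
    static long[] countLetters(String text) {
        long[] count = new long[26];
        for (int i = 0; i < text.length(); i++) {
            int inx = indexOf(text.charAt(i));
            if (inx >= 0) {
                count[inx]++;
            }
        }
        return count;
    }
}
```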

Second Part:

Do the relative frequencies of the initial
letters of words differ from the relative
frequencies for all letters in a text?
Add logic to the program so that it examines
only the first character in each word.
Allow the user to choose between the two options
with a switch on the command line:

C:\mydir>java freqCount avonlea.txt avonlea.rept -first
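
Checking the switch could be done along these lines. This is only a sketch: the class and method names are made up, and throwing an exception for bad arguments is one choice among several (printing a usage message and exiting is another).

```java
// Hypothetical sketch of checking the command line arguments.
class FreqArgsSketch {

    // Return true for -first, false for -all; reject anything else.
    static boolean firstLettersOnly(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "usage: java freqCount input report -all|-first");
        }
        if (args[2].equals("-first")) {
            return true;
        }
        if (args[2].equals("-all")) {
            return false;
        }
        throw new IllegalArgumentException("unknown option: " + args[2]);
    }
}
```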

For this option it will be convenient to use the
Java class StringTokenizer to deliver
individual words one at a time.
In the string of delimiters passed to StringTokenizer
include whitespace and all punctuation that might be at
the start or end of a word.
This is not quite good enough for an accurate count
because some words will be split between lines:

It is often true that handling the an-
noying details makes up the large maj-
ority of the statements in a pro-
gram.

So, if the last token
in a line (returned by StringTokenizer)
ends with '-', don't include
the first letter of the first token on the next line
in the count.
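
The hyphen rule might be sketched like this. The delimiter string is only an example and deliberately leaves out '-', since otherwise the last token of a line could never end with it. The lines are passed in as an array to keep the sketch short, and the class and method names are made up.

```java
import java.util.StringTokenizer;

// Hypothetical sketch of counting first letters with the hyphen rule.
class FirstLetterSketch {
    // Example delimiter set (an assumption); '-' is not included.
    static final String DELIMS = " \t\n\r.,;:!?\"()";

    static long[] countFirstLetters(String[] lines) {
        long[] count = new long[26];
        boolean continued = false;  // did the previous line end with '-'?

        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line, DELIMS);
            String last = null;
            boolean firstOnLine = true;
            while (tokens.hasMoreTokens()) {
                String word = tokens.nextToken();
                last = word;
                // Skip the first token when it finishes a split word.
                if (!(firstOnLine && continued)) {
                    char ch = word.charAt(0);
                    if (ch >= 'a' && ch <= 'z') {
                        count[ch - 'a']++;
                    } else if (ch >= 'A' && ch <= 'Z') {
                        count[ch - 'A']++;
                    }
                }
                firstOnLine = false;
            }
            continued = (last != null) && last.endsWith("-");
        }
        return count;
    }
}
```

With the four-line example above as input, the tokens noying, ority, and gram are skipped, so nothing is counted for n or g.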

Testing:

For testing, create some really simple files
that demonstrate that your program is working.
For instance:

The first draft of your program will write its count to
the monitor for easy debugging.
Add text file output later.
It is probably wise to write the first part of the program
and debug it before moving on to the second option.

Download a text file of a novel of at least 400K bytes from
Project Gutenberg.
Use a file that does not use HTML formatting tags
(which would confuse the count).
Delete the text at the beginning of the file that is
not part of the novel (the legalese and documentation).
Run both options of the program on the text.

Example:

Here is a sample run of my program with the text "Anne of Avonlea" from Project
Gutenberg.