The objective of this project is to practice using the
Java Collections Framework.

This homework is
considered a closed homework. Make sure you read the
open/closed policy before continuing to work on this project.

Project Clarifications

Any clarifications or corrections associated with this project will be available at: clarifications.html

Overview

For this project you must implement the methods of the
DenseBag and MarkovText classes. These methods will allow us to generate
random text. You will need to design and write additional classes in
order to help you implement the functionality of those two named classes. In
particular, you will need to write two classes that implement the
Iterator interface; one to provide an Iterator for DenseBag,
and one to implement the functionality of the
MarkovText.markovIterator(java.util.Random) method. You may use
inner classes if you wish.

Markov Chains

We already know about finite state machines. A first-order Markov chain is
a system in which states are chosen randomly, with the probability distribution
for the next state determined only by the current state. For example, you
could model weather prediction as the following Markov chain:

If it is sunny today, there is a 90% chance it will be sunny tomorrow,
otherwise it will be cloudy.

If it is cloudy today, there is a 50% chance it will be sunny tomorrow,
otherwise it will be cloudy.
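The two weather rules above can be sketched as a tiny simulation. This is purely illustrative; the class and method names here are made up for the example and are not part of the project API.

```java
import java.util.Random;

// A sketch of the weather example as a first-order Markov chain:
// the next state depends only on the current state.
public class WeatherChain {
    enum Weather { SUNNY, CLOUDY }

    // Pick tomorrow's weather from today's, using the stated probabilities.
    static Weather next(Weather today, Random rng) {
        double p = rng.nextDouble();
        if (today == Weather.SUNNY) {
            return p < 0.9 ? Weather.SUNNY : Weather.CLOUDY; // 90% stays sunny
        } else {
            return p < 0.5 ? Weather.SUNNY : Weather.CLOUDY; // 50% turns sunny
        }
    }

    public static void main(String[] args) {
        Random rng = new Random();
        Weather w = Weather.SUNNY;
        for (int day = 0; day < 7; day++) {
            System.out.println("Day " + day + ": " + w);
            w = next(w, rng);
        }
    }
}
```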

In a second-order Markov chain, the probability of each state is based on
the previous two states; in a third-order Markov chain, the probability of
each state is based on the previous three states. More pointers on Markov Chains
at:

DenseBag class

As part of this project, you will implement a "Dense Bag". A bag is like a
Set, but with repeats allowed. A dense bag is a bag where we expect to
get lots of repeats. In this project, bags will contain words, so a
typical "dense bag" might contain:

Note that in the example above you will not actually store 170
separate copies of the word "cow" in your DenseBag object (nor will you even
store 170 separate references to the word "cow") -- rather you will
create just one copy, and associate the number 170 with this copy to keep track
of the number of occurrences of "cow" that are supposed to be in the bag.
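The counting idea above can be sketched with a map from elements to occurrence counts. This is only a sketch of the representation, not the required DenseBag implementation (which must also support iteration, random selection, and the other methods in the javadoc); the class and method names below are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "one copy plus a count" representation behind DenseBag.
public class CountBag<T> {
    private final Map<T, Integer> counts = new HashMap<>();
    private int size = 0; // total number of occurrences, counting repeats

    public void add(T item) {
        counts.merge(item, 1, Integer::sum); // one stored copy, counted
        size++;
    }

    public int getCount(T item) {
        return counts.getOrDefault(item, 0);
    }

    public int size() {
        return size;
    }
}
```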

Read the javadoc for the DenseBag class to see what methods you must
implement and what they must do. Note that for the DenseBag class, you need to
go beyond the functionality needed for this project, and provide a good,
general-purpose implementation of a DenseBag. We will be testing features of
your DenseBag implementation that are not needed for the Markov chain text
generation.

Markov Chain Text Generation

We are going to read in a corpus of text (also referred to as "training
sequence"), and use that to build a Markov
chain transition table. For example, using a third-order Markov transition
table, we might find that the word sequence "what was the" occurs three times.
Twice it is followed by "matter" and once by "cause". In generating new text, if
we have a sequence of words ending in "what was the", then we
randomly choose which word to add next. One third of the time, we choose
"cause", and two thirds of the time we choose "matter".

Although lots of people have done this before, we are going to take our own,
unique OO approach to the problem.

Creating an Order-n Markov "Transition Table"

Let's suppose we are going to generate random text based on the following
input stream (also referred to as "training sequence").

This is a nice sentence. This is another sentence. This is another
sentence.

From the list of words described above, the program will build a "transition
table", which will be a map. For a Markov chain of Order-n, the map will use
Lists of n words as the keys. The
values that the map assigns to the keys will be
DenseBag objects containing words.

Here's how it works. Suppose we are generating a Markov Transition Table of
Order-2. Let's consider a typical sequence of two words taken
from the input in the example above. The sequence of two words [This, is]
occurs in three places in the input. In two places, "This is" is followed
by the word "another". In the other spot, "This is" is followed by "a". So in
the transition table there would be an entry mapping the List [This, is]
to a DenseBag containing [2 occurrences of "another", 1 occurrence of "a"].

As part of your design, you have to figure out how to define the
first and last map entries associated with a training sequence. One
possibility is to use the empty string (which never occurs in training
sequences) as a special marker. For example, for an Order-1 Markov
Transition Table trained on the sequence [ "a", "a", "b" ], we might generate
the map:

[""] = {"a" = 1}
["a"] = {"a" = 1, "b" = 1}
["b"] = {"" = 1}

For an Order-2 Markov Transition Table trained on the sequence [ "a",
"a", "a", "b" ], we would generate the map:

["", ""] = {"a" = 1}
["", "a"] = {"a" = 1}
["a", "a"] = {"a" = 1, "b" = 1}
["a", "b"] = {"" = 1}

For an Order-n Markov Transition Table, the first map entry will map
a list of n empty strings to the first word in the training
sequence. The map entry representing the end of the training
sequence will map to a dense bag that contains an empty string (regardless of
the order used). Keep in mind that other strings besides the empty string
can be associated with this entry (see the Order-2 Markov Chain example below).
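The construction described above can be sketched as follows. For brevity the sketch uses a plain Map<String, Integer> where the project requires a DenseBag, and the class and method names are illustrative, not the required API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of building an order-n transition table with empty-string markers.
public class TableSketch {
    static Map<List<String>, Map<String, Integer>> build(List<String> words, int order) {
        Map<List<String>, Map<String, Integer>> table = new HashMap<>();
        // The first key is a list of n empty strings.
        List<String> key = new ArrayList<>(Collections.nCopies(order, ""));
        for (String word : words) {
            record(table, key, word);
            key = shift(key, word);
        }
        // The final key maps to the empty-string end marker.
        record(table, key, "");
        return table;
    }

    // Add one occurrence of `next` to the bag associated with `key`.
    private static void record(Map<List<String>, Map<String, Integer>> table,
                               List<String> key, String next) {
        table.computeIfAbsent(new ArrayList<>(key), k -> new HashMap<>())
             .merge(next, 1, Integer::sum);
    }

    // Drop the oldest word from the key and append the new one.
    private static List<String> shift(List<String> key, String word) {
        List<String> shifted = new ArrayList<>(key.subList(1, key.size()));
        shifted.add(word);
        return shifted;
    }
}
```

Running build(List.of("a", "a", "b"), 1) reproduces the Order-1 map shown earlier.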

Example

Below is the Transition Table for the Order-2 Markov Chain generated
by the following text:

This is a nice sentence. This is another sentence. This is another
sentence.

["",""]
= {"This" = 1}

["", This]
= {"is" = 1}

[This, is]
= {"a" = 1, "another" = 2}

[is, a]
= {"nice" = 1}

[a, nice]
= {"sentence." = 1}

[nice, sentence.]
= {"This" = 1}

[sentence., This]
= {"is" = 2}

[is, another]
= {"sentence." = 2}

[another, sentence.] = {"This"
= 1, "" = 1}

Below is the transition table you would get if you were using Order-3.

["","",""]
= {"This" = 1}

["","", This]
= {"is" = 1}

["", This, is]
= {"a" = 1}

[This, is, a]
= {"nice" = 1}

[is, a, nice]
= {"sentence." = 1}

[a, nice, sentence.]
= {"This" = 1}

[nice, sentence., This]
= {"is" = 1}

[sentence., This, is]
= {"another" = 2}

[This, is, another]
= {"sentence." = 2}

[is, another, sentence.] = {"This"
= 1, "" = 1}

[another, sentence., This] = {"is" = 1"}

Your program should be able to generate a Markov Transition Table from a list
of words using any order specified (1, 2, 3, 17, whatever...).

Generating Random Text using the Transition Table

Once your transition table is complete, you can use it to generate random
text. Let's do an example using the Order-2
Transition Table described above.

To start things off, look up the List ["",""] in the map. We
get back a DenseBag containing [1 occurrence of "This"]. No choices here --
we will select the word "This" and now consider the list ["",
This].

Okay, look up the List ["", This] in the map. We get back the DenseBag
containing [1 occurrence of "is"]. Okay, still no choices. We
take the word "is" and now consider the list [This, is].

Okay, look up the List [This, is] in the map. We get back the DenseBag
containing [1 occurrence of "a", 2 occurrences of "another"]. This is
where the randomization comes in! Since there are 3 total things in the
bag, 1/3 of the time we will select the word "a" and 2/3 of the time we will
select "another". Let's imagine that in this example we happened to choose
"another". Now we'll consider [is, another].

Next, look up the List [is, another] in the map. We get back the DenseBag
containing [2 occurrences of "sentence."]. Again there are no choices: we
take "sentence." and now consider [another, sentence.].

Looking up [another, sentence.] yields a DenseBag containing [1
occurrence of "This", 1 occurrence of ""]. Randomization will be
used again! Since there are a total of 2 things in the bag, 1/2 of the
time we will select "This" and 1/2 of the time we will select "".
Let's imagine that in this example we happened to choose "". The
process stops. If instead of "" we had chosen "This", we would next
consider [sentence., This].
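The walk just described can be sketched as a loop. In the real project this logic belongs in the class that implements MarkovText.markovIterator(java.util.Random); the sketch below uses a plain loop instead of an Iterator, reuses the Map<String, Integer> stand-in for DenseBag from the earlier sketch, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of generating random text from an order-n transition table.
public class GenerateSketch {
    static List<String> generate(Map<List<String>, Map<String, Integer>> table,
                                 int order, Random rng) {
        List<String> out = new ArrayList<>();
        // Start from the key of n empty strings.
        List<String> key = new ArrayList<>(Collections.nCopies(order, ""));
        while (true) {
            String next = pickWeighted(table.get(key), rng);
            if (next.isEmpty()) break; // empty-string end marker: stop
            out.add(next);
            key = new ArrayList<>(key.subList(1, key.size()));
            key.add(next); // slide the window forward by one word
        }
        return out;
    }

    // Pick a word with probability proportional to its count in the bag,
    // e.g. {"a" = 1, "another" = 2} yields "a" 1/3 and "another" 2/3 of the time.
    static String pickWeighted(Map<String, Integer> bag, Random rng) {
        int total = bag.values().stream().mapToInt(Integer::intValue).sum();
        int target = rng.nextInt(total);
        for (Map.Entry<String, Integer> e : bag.entrySet()) {
            target -= e.getValue();
            if (target < 0) return e.getKey();
        }
        throw new AssertionError("unreachable for a non-empty bag");
    }
}
```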

Markov Text Applet

You can download and add to
your project a Java applet to help test your Markov text generator.
The applet provides a GUI for entering/editing training data (or downloading training data directly from a URL) and then displays the generated text. The GUI also allows you to specify the order of the Markov chain. Some training data you may use:

Please make sure you read the academic integrity section of the syllabus so
you understand what is permissible in our programming projects. We want to
remind you that we check your project against other students' projects, and any
case of academic dishonesty will be referred to the
University's Office of
Judicial Programs.