You may work alone or in a group of size 2 or 3 on this assignment.
However, in order to make sure you learn the material and are
well-prepared for the exams, you should work through non-coding
problems on your own before discussing them with your partner, should
you choose to work with someone. Also, be sure every programming task
is a team effort. In particular, the "you do these and I'll do
these" approach is sure to leave you unprepared for the exams.

All GitHub repositories must be created with all group members having
write access and all group member names specified in the
README.md file by 4:00 PM, Monday, December 3, 2018. This applies to those who choose
to work alone as well!

Please also note that the regular late policy does not apply to this
assignment, so we can go over some of the problems in class on our
last day. No late submissions will be accepted. Whatever is pushed
to your repository at the deadline will be considered your submission.

Getting Set Up

You will receive an email with the
link to follow to set up your GitHub repository for this problem set
(ps7-yourgitname). One member of the group should follow the
link to set up the repository on GitHub.
Other group members will receive a
subsequent email with a link to follow that will grant them the
rights to clone the repository and commit and push changes to the
origin on GitHub.

Binary Search Trees

For the first three questions, consider three-node integer-valued
binary trees whose nodes contain the elements "1", "2", and "3".

Question 1: Draw all valid trees whose in-order traversal would visit the
nodes in the order 1-2-3. Which of these are binary search trees?
Note that there are several valid tree "shapes" in each case. (2
points)

Question 2: Draw all valid trees whose preorder traversal would visit
the nodes in the order 1-2-3. Which of these are binary search
trees? (2 points)

Question 3: Draw all valid trees whose postorder traversal would
visit the nodes in the order 1-2-3. Which of these are binary
search trees? (2 points)

For the next two questions, please use the AVL tree rules as we
studied them in class.

Question 5: Now, starting with the original tree, insert 32 then 31. Show
any rotations needed to maintain the AVL condition. (3 points)

Copy your version of BinarySearchTree.java from Lab 9 into your
repository and then add, commit, and push that version
as your starting point for this section.

One could argue that two BinarySearchTree instances are equal if
they contain the same values in any shape tree, or that they must
contain the same values and have the same shape.

Question 6: Draw two small BinarySearchTrees which are equal by
the first definition, but not equal by the second. (2 points)

For the methods below, you may write them recursively or iteratively.
You may only call other methods if doing so does not cause a
significant efficiency degradation.

Practice Program: Write a method equals in the
BinarySearchTree class that considers trees equal if they
contain the same values. Write associated tests in the main
method of BinarySearchTree. (5 points)

Practice Program: Write a method shapeEquals in the
BinarySearchTree class that considers trees equal if they
contain the same values and have the same shape. Write
associated tests in the main method of BinarySearchTree. (5
points)

Practice Program: Write a method isAVL in the
BinarySearchTree class that returns a boolean indicating
if the tree satisfies the AVL condition. Write associated tests in
the main method of BinarySearchTree. (5 points)

Question 11: Complete the heapsort procedure on the array you just
"heapified" in 13.12. (2 points)

Question 12: Repeat the heapsort in the previous two questions, but
build a max heap. (3 points)

Back to Word Frequency

The Java API interface java.util.Map provides a mechanism by
which key-value pairs can be stored using different underlying
structures.

In a recent lab, you worked with a program that computed and reported
the frequency of words in input text. Data structures that implement
the Map interface are perfectly suited to perform this kind of
task. We will use the words as keys and counts as values.

In this part of the problem set, you will be doing a small empirical
analysis study using a program that computes word frequencies. The
program reads the words from the input, then computes the frequencies
five times: once with each of the structures we used previously for
this task, once for a TreeMap, and twice with a HashMap.

In the version of this program that you will find in
WordFrequency.java in your repository, the input is first read
into an ArrayList, and entries from the ArrayList are used
to populate the data structures that perform frequency counting. This
separates out the overhead of file I/O from the timings and can give
us a better idea of the relative efficiencies of the data structures
for this task.

Question 13: According to the Java API documentation, what is the
underlying data structure used to implement java.util.TreeMap?
(1 point)

Question 14: According to the Java API documentation, what is the
underlying data structure used to implement java.util.HashMap?
(1 point)

In each of the computations of the word frequencies, there are method
calls operating on the structure that stores the frequencies. For
example, for the ArrayList version, the indexOf, add,
and get methods are called. Each call to indexOf performs a
linear search, so has a best case running time of O(1), when an
entry for the word is found at the beginning of the ArrayList and
a worst case running time of O(n) when there is no entry for the
word yet or it's found near the end. Each call to get is O(1),
since the get method of the ArrayList class can access the
entry directly in its underlying array. The calls to add result
in the new entry being added at the end of the list. Most of these
are O(1), with the exception of the times when the underlying array
needs to be expanded, in which case it is O(n).

Question 15: For each of the method calls on each of the other
structures, give its running time in Big O notation and justify
your answer. Be specific, and if there is best and worst case
behavior, note that clearly. Follow the example in the description
of the ArrayList behavior above. (12 points)

In your repository, you will find a number of text files of varying
sizes. Run WordFrequency multiple times, redirecting your input
at the command line from each of these files. Run each case at least three
times.

Question 16: Present a table of the results of the runs described
above. Include a row for each input file, and a column for the name
of the file, the number of words in the file, the number of unique
words found, and the elapsed times for each of the 5 structures.
Use the fastest time you get for each case. Order your table's rows
in increasing numbers of words in the input file. (6 points)

Question 17: The last two tests each use a HashMap, but the
second is likely to be faster than the first. Using the Java
API documentation for reference, explain the difference between the
two HashMaps. Why would you not use an even larger value in
hopes of further timing improvements? (3 points)

Question 18: Analyze the overall results in your table. Relate the timings
back to the expected running times of the methods. Do they match
expectations? Be specific. (10 points)

Submitting

Please answer questions that require a text response in the
README.md file of your repository. Hand-drawn images may be
submitted electronically through your repository or in hard copy.
Only one submission per group is needed.

Your submission requires that all required code deliverables are
committed and pushed to the master for your repository's origin on
GitHub. That's it! If you see everything you intend to submit when
you visit your repository's page on GitHub, you're set.