The lexicon file just has the string of phones for the pronunciation of
each word, separated by spaces, with a pound-sign (#) at the end of each line.
Notice that there is a special phone for silence, called SIL:

The phone file gives the set of phones in order. The phone set is a version
of the ARPAbet. Unlike a real system which would represent phonetic information as
triphones or even up to quintaphones, this homework assumes a simple monophone system.

The likelihoods file has the output from the Gaussian models in the following format:

The first number is the frame number; there are 558 frames in this utterance.
Since frames are every 10 milliseconds, this means that the utterances was 5.58 seconds
long. The second symbol is the phonetic unit. The third number is the state
(this is a three-state HMM). The last number is the log-probability (base 10)
of observing the feature vector given the base phoneme and HMM state.

Your Viterbi implementation will need to read in the observations file, the phone
file, and the lexicon, create a Viterbi_prob matrix, and then fill it out from
left to right. You will need to keep an array of back-pointers so when the matrix
is complete, you can start from the end and find the most likely sequence of words.

Your output should look something like this:

The best path probability is -500000.00
The best path contains the following words: seven three four etc etc (or whatever)

Hint: In order to get the correct answer, you will need to
add a word transition penalty of -50 to each path
every time it transitions from word to word.

Hint 2: The HMM structure you should assume for each
word is technically a "left-to-right non-skip self-loop 3-state HMM".
That means that each phone has exactly three substates,
that you can't skip any of these states, and that they
all have a self-loop. In addition, I assume all the state transition probabilities
are 1.0. Technically this means that they aren't true probabilities
(if they were true probabilities, they would have to be 0.5, since
the two arcs leaving each state must sum to 1). But in practice,
since all that matters is that they are all equal,
it makes things simpler to just assign everything a probability of 1,
since the log of 1 is 0; that means you can just ignore the probabilities
altogether, except for the word transition penalty.
Anyhow here's an example for the word "two":

Hint 3:
You'll need to keep track, for each node of your lattice,
which state (subphone) of which model (phone) of which word
it corresponds to. This is automatically
available from the row number in the viterbi lattice.
There are exactly three transition cases you need to consider in
your Viterbi algorithm as you move from one frame (column) to the next

Staying in the same state

(only if you are not in the last state of the
current word) Transitioning to the next state in the same word

(only if you are in the last state of the current word)
Transitioning to the first state of every possible word.

Hint 4: Some frequently asked questions for this homework:

Q: should we create separate HMMs for each word, or one big HMM?

A: Mathematically it's one big HMM, but as mentioned
above you need to represent the separate words in
your state column in your lattice, since you
need to know which state of which word you are in.

Q: Is the "f" of "four" the same state as the "f" of "five"?

A: No. You keep a distinct state for each phone of each word.
Those two 'f's WILL share the same observation likelihood, i.e.
the same "b" matrix. BUT they will have different viterbi scores
because they correspond to different rows in the Viterbi lattice,
and will be in different paths.

Hint 5:
If you would like to check that your code is correct, we have provided
an additional output file, available here which
you may run your code on. The correct answer for this file is:silence
three
one
silence
one
five
eight
two
six
five
six
one
three
six
seven
five
silence
silence
silence
silence

log probability: -19202.51304840347

NOTE: Please note that in order to match this output, you must force your Viterbi implementation to end in an end state (instead of ending mid-word).

How to submit the homework:
Format the correct word sequence and
the path probability as shown above, i.e. each word or
silence goes into a separate line. Then put one newline
character between the words and the log probability line. Name the file as
result.txt. Here is an
example result.txt. Put the result.txt and
your code into a folder. Include a README file which tells
us how to run your code. Submit the folder using the submit script
from HW1. Remember to use pa2 when prompted.