All posts tagged HMM

Late last year, my thesis was approved by Virginia Tech. Fortunately, that meant I could graduate. Unfortunately, it probably means that my thesis will be socked away in a (virtual) drawer somewhere to gather (virtual) dust. I had the option to produce a vanity publication through some shady German-owned publishing house, but I opted out. Instead, I decided to publish my thesis on the Internet for anyone interested in such research. It can be found here:

I looked at Hidden Markov Models (HMMs) as a method to enhance functionality in cognitive radios. HMMs have been used previously for pattern recognition, such as handwriting and speech analysis, but they have seen limited use in the wireless world for things like spectrum sensing and analysis.

CPUs (especially the really small ones found in modern wireless devices like radios and cell phones) have trouble keeping up with the demands of most HMM implementations. Therefore, I created both C and CUDA implementations of the HMM algorithms in order to compare their execution times. From what I found, graphics cards (GPUs) can surpass CPUs only when many states or many models are used at the same time. If you’re interested in the results, check out my thesis. If you’re interested in trying out the code or replicating my results, you can find my code hosted here:

For the final project in my Computer Vision class (ECE 5554), I decided that the children’s book Where’s Waldo needed to be solved. For good. All those countless hours spent searching for the red and white striped man could be best spent elsewhere. You know, like solving mazes in Highlights.

After some research, I decided that I could construct a Hidden Markov Model based on the 2-dimensional Discrete Cosine Transform (2D DCT) of our friend, Waldo. Using the model, we can scan through a much larger image, looking for sections that closely match the model of Waldo. If a match is close enough (i.e., over a threshold), we highlight that section in blue. We can adjust the threshold to make the identification more or less strict. With the test images, we saw a 70% positive detection rate for Waldo. Just think, with a few more algorithm and speed optimizations, manually solving for that elusive figure will be a thing of the past!

You can find the report and MATLAB code below. The code is based on the HMM Toolbox.
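To give a feel for the feature-extraction step, here is a minimal, pure-Python sketch of the 2D DCT applied to an image patch. This is an illustration, not the project's MATLAB code; the 8x8 block size and orthonormal normalization are assumptions made for the example.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (list of lists of floats)."""
    n = len(block)

    def alpha(u):
        # Normalization factor making the transform orthonormal.
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

# A flat (constant) patch concentrates all of its energy in the DC
# coefficient; more generally, a handful of low-frequency coefficients
# form a compact feature vector for a patch.
patch = [[1.0] * 8 for _ in range(8)]
coeffs = dct2(patch)
```

In the actual project, features like these coefficients would be fed to the HMM to score how Waldo-like each section of the larger image is.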

Hidden Markov Models

[Edit: 5/2/2011] I realized that the B matrix was all sorts of messed up in the example. It has been fixed below.

Using our previous example of weather, let’s say that you wish to know the weather in a remote location but have no access to view it directly. For example, we’ll say that Alice lives in Seattle and Bob lives in Washington D.C. Bob calls Alice on the phone every day at 8 am and explains his intended activity for the day. Since Bob lives a simple life, his activities include walking outside, shopping, or cleaning his house. Additionally, Bob’s activities are highly dependent on the weather. Alice wants to know the weather in Washington D.C. but fails to ask Bob and has no other means of finding out. However, she knows that Bob has a 60% chance of taking a walk if it is sunny, cleans the house 70% of the time when it is raining, and goes shopping 30% of the time on sunny days. Using this information, she can construct a model for figuring out the weather in Washington D.C.

Because Alice cannot see the weather (or states) in our example, we consider the Markov Model to be Hidden. We know that the states exist, but we cannot directly observe them. Instead, we have to rely on a set of observables to model the system. We can construct another matrix to describe the likelihood of each observable occurring given the current state. Each element of the matrix is an individual probability. For example:

b11 = P[activity = walk | weather = sunny] = 0.6

which means that Bob has a 0.4 chance of either cleaning or shopping on a sunny day. We will follow this example and fill out the rest of the probabilities.

b12 = P[activity = walk | weather = rainy] = 0.1

Following this naming convention, we can re-write the other probabilities:

b21 = P[activity = clean | weather = sunny] = 0.1

b22 = P[activity = clean | weather = rainy] = 0.7

b31 = P[activity = shop | weather = sunny] = 0.3

b32 = P[activity = shop | weather = rainy] = 0.2

These can be compressed into a matrix, B, used to describe the output symbol probabilities:

B = | 0.6  0.1 |
    | 0.1  0.7 |
    | 0.3  0.2 |

where the rows correspond to activities (walk, clean, shop) and the columns to weather (sunny, rainy).
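As a quick sanity check, the output probabilities can be written out in Python and verified: given a weather state, Bob must do *something*, so the probabilities for each weather state must sum to 1.

```python
# Output (emission) probabilities from the example:
# rows are activities (walk, clean, shop),
# columns are weather states (sunny, rainy), matching the b_ij above.
B = [
    [0.6, 0.1],  # walk
    [0.1, 0.7],  # clean
    [0.3, 0.2],  # shop
]

# Each column (weather state) must sum to 1.
column_sums = [sum(row[j] for row in B) for j in range(2)]
```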

Taking the observations into account, we can view the HMM as a state diagram with observables:

The above figure shows how the state transitions from the Markov Chain remain the same but incorporates the probabilities of observables. These probabilities are appropriately captured in the A and B matrices.

In addition to the A and B matrices, we need to define the probabilities for the initial state. In many HMM applications, we cannot know the starting state with certainty; in many cases, however, we can define probabilities for it. Continuing our example, the first day that Bob calls Alice, Alice knows the general weather patterns of Washington D.C. and assumes that the weather has a 75% chance of being sunny and a 25% chance of being rainy. From this, she can construct the initial state probability matrix:

π = [ 0.75  0.25 ]

Using the A, B, and π matrices, we can fully define a Hidden Markov Model. We will use the following notation to describe the model:

λ = (A, B, π)

When considering the use of HMMs, three canonical problems arise:

1. Given a model, compute the probability of an output sequence, which can be expressed as P(O|λ). This can be accomplished using the Forward algorithm.

2. Given a model and an output sequence, find the most likely sequence of states. This is covered by the Viterbi algorithm.

3. Given an output sequence, find the model parameters (A, B, and π) that best explain it. Put differently, find the maximum likelihood estimate of the HMM parameters, which can be accomplished via the Baum-Welch algorithm.
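Problem 1 can be illustrated with a short Forward algorithm sketch in Python, using the A, B, and π values from the weather example in these posts. The three-day observation sequence is an assumption chosen just for illustration.

```python
# Model from the example: state 0 = sunny, state 1 = rainy.
A = [[0.8, 0.2],   # transition probabilities a_ij
     [0.6, 0.4]]
B = [[0.6, 0.1],   # emission probabilities: rows = walk, clean, shop
     [0.1, 0.7],
     [0.3, 0.2]]
pi = [0.75, 0.25]  # initial state probabilities

WALK, CLEAN, SHOP = 0, 1, 2

def forward(obs):
    """Compute P(O | lambda) with the Forward algorithm."""
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[obs[0]][i] for i in range(2)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[o][j]
                 for j in range(2)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)

# Probability that Bob reports walk, then shop, then clean.
p = forward([WALK, SHOP, CLEAN])
```

The Forward algorithm sums over all possible hidden state sequences in O(N²T) time instead of enumerating them explicitly, which is what makes Problem 1 tractable.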

HMMs play an important role in the statistical modeling of time- or space-varying systems and have seen widespread use in applications such as speech and handwriting recognition.

A Markov chain is a discrete random process that exhibits the Markov property. The process occupies one of a finite number of “states” or “positions,” and transitions between states randomly. The Markov property states that the probability of the next event or “state” depends only on the current state. Namely:

P[X(n+1) = x | X(1) = x1, X(2) = x2, …, X(n) = xn] = P[X(n+1) = x | X(n) = xn]

For those not familiar with probability-speak, this says that the probability of event x given the outcomes of x1, x2,…xn (all events up to the current event) is equal to the probability of event x given the occurrence of event xn (the current event). In other words, all previous events except for the last occurring event do not matter.

Let’s take the example of weather. We will use discrete-time measurements to make this simpler. Let’s say you wake up every day, and at exactly 8:00 AM you note the weather. Also, let’s say you only care whether it is sunny or rainy. Given this information, you want to predict what tomorrow’s weather will be like. After several months of careful observation, you construct a model that estimates the probabilities of transitioning between our two states: sunny or rainy. We will call “sunny” state 1 and “rainy” state 2.

For example, suppose our observations show an 80% chance that a sunny day is followed by another sunny day:

P[“Tomorrow is sunny” given that “Today is sunny”] = 0.8

which can be re-written:

a11 = P[W(n+1) = sunny | W(n) = sunny] = 0.8

Following that notation, we write the other probabilities:

a12 = P[W(n+1) = rainy | W(n) = sunny] = 0.2

a21 = P[W(n+1) = sunny | W(n) = rainy] = 0.6

a22 = P[W(n+1) = rainy | W(n) = rainy] = 0.4

Our “a” values indicate probabilities of state transitions: “a11” is the probability of transitioning from state 1 to state 1, and “a21” is the probability of transitioning from state 2 to state 1. Note that because this model has the Markov property, only today’s weather matters when predicting tomorrow’s weather. From this, we can construct the state transition probability matrix A:

A = | 0.8  0.2 |
    | 0.6  0.4 |

Notice that summing across the rows yields 1. One of the laws of probability states that the probabilities of all possible outcomes must sum to 1 (i.e. a 100% chance of occurrence). If it is rainy today, there are only two possibilities for tomorrow: sunny or rainy. This can be easily viewed as a state diagram:

If we wanted to know the percentage of days that are sunny or rainy, we can find the steady-state vector π, which satisfies πA = π with the elements of π summing to 1:

This can be approximated by raising the state transition probability matrix to a large power, for example A^100. This results in the following:

A^100 ≈ | 0.75  0.25 |
        | 0.75  0.25 |

This indicates that 75% of the days will be sunny and 25% of the days will be rainy.
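The matrix-power approximation is easy to check in a few lines of Python (a small sketch using plain lists; the numbers come from the weather example above):

```python
def matmul(X, Y):
    """Multiply two 2x2 matrices stored as lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# State transition matrix: state 0 = sunny, state 1 = rainy.
A = [[0.8, 0.2],
     [0.6, 0.4]]

# Raise A to the 100th power; every row converges toward the
# steady-state vector, regardless of the starting state.
P = A
for _ in range(99):
    P = matmul(P, A)
```

Both rows of P come out as approximately [0.75, 0.25], matching the steady-state result: in the long run, the starting state no longer matters.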

Markov chains see wide use in many applications, including statistics, economics, physics, chemistry, biology, queuing theory, and as we will see, cognitive radio.