Welcome to Data Science Learning Club

"You have found the Data Science Learning Club forums for the Becoming a Data Scientist Podcast. Everyone is welcome to join! Every two weeks, a learning activity is posted for us to work on together, share resources, and learn from one another. See the Welcome Message in the forums for more info!"

To understand Hidden Markov models we first need a short introduction to Markov chains. A Markov chain generally has a finite number of states (e.g. A, B, C) that are directly observable, with a certain transition probability between each pair of states (A->A 0.4, A->B 0.1, A->C 0.5), stored in a transition matrix P (sometimes called A, but since A is a state in my example I’ll stick to P). Usually there is a sequence of such states (e.g. AABACBA). When we have the complete transition matrix P (I only showed the first row) we can compute how likely such a sequence is, or generate a new sequence.
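To make this concrete, here is a small sketch of computing the probability of a state sequence from a transition matrix. Only the first row of P (the A->A 0.4, A->B 0.1, A->C 0.5 values above) comes from the text; the other rows and the uniform initial distribution are made-up placeholders:

```python
import numpy as np

states = ["A", "B", "C"]
P = np.array([
    [0.4, 0.1, 0.5],   # from A (values from the text above)
    [0.3, 0.3, 0.4],   # from B (assumed for illustration)
    [0.2, 0.6, 0.2],   # from C (assumed for illustration)
])
start = np.array([1 / 3, 1 / 3, 1 / 3])  # assumed uniform initial probabilities

def sequence_probability(seq, P, states, start_probs):
    """Probability of a fully observed state sequence like 'AABACBA'."""
    idx = [states.index(s) for s in seq]
    prob = start_probs[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        prob *= P[prev, cur]  # multiply the transition probability of each step
    return prob

print(sequence_probability("AABACBA", P, states, start))
```

Generating a sequence works the other way around: repeatedly sample the next state from the row of P belonging to the current state.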

In a hidden Markov model the sequence of states is “hidden”. We cannot directly observe the state sequence AABACBA, but each state emits an observation with a certain probability. There can be more, fewer, or the same number of different observations as states (e.g. the states are A, B, C and the possible observations are R, S, T, U). The unobserved and observed sequences have the same length, though, because there is one observation for each hidden state. Most of the time we say that each state is an observation at a certain point in time, but since Markov models are also used in text analysis, each state can also be a letter, word, sentence, … in a longer sequence of letters, words, sentences, …
Hidden Markov models can be used when we can observe part of a process but want to make inferences about something we can’t observe. In the example above we could observe RRUSTUR and want to know the underlying sequence of As, Bs and Cs. From previous data we might know the emission probabilities (A->R, B->R, …: how likely it is to observe R when the hidden state is A, and so on). From this emission matrix E (also sometimes called O), the transition matrix P, and the initial probabilities for each state A, B, C (often stored in a vector p), we can compute the most likely hidden state sequence.

What does this mean in reality? I’ll give you an example.

1) Let’s say we stay at home all the time and we only observe our cat as she comes in through the cat door. We see whether the cat’s fur is dry or wet, but we don’t know whether the weather is sunny or rainy. We can only infer it from the cat’s fur. We know that there are other reasons why the cat might be wet, so we cannot be sure it’s raining just because the cat is wet. From historical weather data we know that a transition matrix for rainy/sunny looks like this:

You might notice that the rows sum up to 1 but the columns don’t. Think about why that is!
We also know the emission/observation probabilities:

sunny: dry 0.9, wet 0.1
rainy: dry 0.01, wet 0.99

This means that when it is sunny, there is a 10% chance that the cat has wet fur. When it is rainy, there is a 99% chance she is wet (in the remaining 1% of cases she was sleeping in her house right outside the cat door before she came in).

If we observe the cat for a few hours (for simplicity, say she comes in once per hour) and we see this pattern: DDDWWDDWDD (D = dry, W = wet),
there are several possibilities for how the weather changed on this day. Maybe it didn’t change at all, and it was just chance that our cat’s fur got dry/wet.
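The standard way to find the most likely weather sequence behind DDDWWDDWDD is the Viterbi algorithm. Here is a minimal sketch: the emission probabilities come from the text (10% wet when sunny, 99% wet when rainy), but the transition matrix and initial probabilities are made-up placeholders, since the post’s actual transition table isn’t reproduced here:

```python
import numpy as np

# States: 0 = sunny, 1 = rainy. Observations: 0 = dry fur, 1 = wet fur.
P = np.array([[0.8, 0.2],     # sunny -> sunny/rainy (assumed values)
              [0.4, 0.6]])    # rainy -> sunny/rainy (assumed values)
E = np.array([[0.9, 0.1],     # sunny: P(dry), P(wet) -- from the text
              [0.01, 0.99]])  # rainy: P(dry), P(wet) -- from the text
p0 = np.array([0.5, 0.5])     # assumed initial distribution

def viterbi(obs, P, E, p0):
    """Most likely hidden state sequence for an observation sequence."""
    n_states, T = P.shape[0], len(obs)
    delta = np.zeros((T, n_states))           # best path probability so far
    psi = np.zeros((T, n_states), dtype=int)  # backpointers
    delta[0] = p0 * E[:, obs[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * P[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * E[j, obs[t]]
    # Backtrack from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

obs = [0, 0, 0, 1, 1, 0, 0, 1, 0, 0]  # DDDWWDDWDD
print("".join("S" if s == 0 else "R" for s in viterbi(obs, P, E, p0)))
# -> SSSRRSSRSS: under these assumed parameters, rain exactly when the fur was wet
```

With a stickier transition matrix or a higher chance of wet fur on sunny days, the decoded weather would not simply mirror the observations, which is the whole point of weighing transitions against emissions.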

So far we have assumed that we know the parameters (transition matrix P, emission matrix E, initial state probabilities p), but if we don’t, we can learn them from data using the Baum-Welch algorithm.
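Baum-Welch is an expectation-maximization procedure: given only the observations, it alternates between computing expected state occupancies and transitions (via forward/backward probabilities) and re-estimating P, E, and p from those expectations. Below is a bare-bones, unscaled sketch for the cat example (real implementations work in log space or with scaling to avoid underflow on long sequences; the number of states and the random initialization here are assumptions):

```python
import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=50, seed=0):
    """Estimate HMM parameters (P, E, p) from observations alone via EM.
    Unscaled sketch: suitable only for short sequences."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs)
    # Random row-normalised initial guesses.
    P = rng.random((n_states, n_states)); P /= P.sum(axis=1, keepdims=True)
    E = rng.random((n_states, n_symbols)); E /= E.sum(axis=1, keepdims=True)
    p = np.full(n_states, 1.0 / n_states)
    T = len(obs)
    for _ in range(n_iter):
        # E-step: forward (alpha) and backward (beta) probabilities.
        alpha = np.zeros((T, n_states))
        alpha[0] = p * E[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ P) * E[:, obs[t]]
        beta = np.zeros((T, n_states))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = P @ (E[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()
        gamma = alpha * beta / likelihood   # P(state at time t | observations)
        xi = np.zeros((n_states, n_states))  # expected transition counts
        for t in range(T - 1):
            xi += (alpha[t][:, None] * P * E[:, obs[t + 1]] * beta[t + 1]) / likelihood
        # M-step: re-estimate parameters from the expected counts.
        p = gamma[0]
        P = xi / gamma[:-1].sum(axis=0)[:, None]
        for k in range(n_symbols):
            E[:, k] = gamma[obs == k].sum(axis=0)
        E /= gamma.sum(axis=0)[:, None]
    return P, E, p

obs = [0, 0, 0, 1, 1, 0, 0, 1, 0, 0]  # DDDWWDDWDD, 0 = dry, 1 = wet
P, E, p = baum_welch(obs, n_states=2, n_symbols=2)
```

Note that Baum-Welch only finds a local optimum, so the result depends on the random initialization, and the learned states are unlabeled: the algorithm cannot know which of its two states to call “sunny”.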

Basics
This was the most basic introduction I could think of. There are many extensions of Hidden Markov models and Markov chains that make them work in continuous time (so far the time between observations was always the same) or with distributions of observations instead of discrete values.
You can find more here.

About Becoming A Data Scientist

BecomingADataScientist.com is a blog created by Renee Teate to track her path from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist". She created this club so participants can work together and help one another learn data science. See her other site DataSciGuide for more learning resources.