Classifying with Discrete Time Markov Chains

Discrete Time Markov Chain

A discrete time markov chain is a stochastic model describing a sequence of observed events. The Markov property is that the probability of observing the next state is only dependent on the current state the sequence is in. More formally,

A practical example is the probability of the weather being sunny tomorrow is only dependent on the weather observed today. These probabilities are expressed as a transition probability matrix. With the transition probabilities we can do a number of things, such as classify an unlabeled sequence.

Weather example

Consider a sequence of observed weather for a particular city e.g. sunny, sunny, cloudy, rainy, etc. How could we determine which city this weather pattern was observed?

I sourced the weather for Canberra and Darwin for January and converted the measurements to labels. From these sequences we can calculate the transition probabilities for weather states sunny, cloudy and rainy using the ‘markovchain’ package and fit by MLE.

## A 3 - dimensional discrete Markov Chain defined by the following states:

## cloudy, rainy, sunny

## The transition matrix (by rows) is defined as follows:

## cloudy rainy sunny

## cloudy 0.4444444 0.1111111 0.4444444

## rainy 0.3333333 0.5000000 0.1666667

## sunny 0.1333333 0.2000000 0.6666667

1

2

darwinFit$estimate

1

2

3

4

5

6

7

8

## MLE Fit

## A 3 - dimensional discrete Markov Chain defined by the following states:

## cloudy, rainy, sunny

## The transition matrix (by rows) is defined as follows:

## cloudy rainy sunny

## cloudy 0.2000000 0.6000000 0.2

## rainy 0.1428571 0.8571429 0.0

## sunny 0.2500000 0.2500000 0.5

1

2

3

# plot

plot(canberraFit$estimate)

Assume we have a sequence of observed weather states for a week but don’t know which city it was from, Canberra or Darwin. How could we determine which city this weather pattern was observed? We use the transition probabilities and calculate the log likelihood that the sequence was observed from each city.

Based on these results the pattern was most likely observed from the Darwin model.

I like this example because it is simple to understand and the ideas behind it are the building blocks for more advanced methods for analysing sequences such as hidden markov models, across a wide range of real-world processes. It is also good at explaining the distinction between probability and likelihood, being we have some observed data and we compute the likelihood of that data being produced from system 1 or system 2 (Canberra or Darwin).