Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

January 3, 2012

A Semigroup Approach to Finite Markov Chains

Posted by Tom Leinster

It can be encoded as a square matrix of nonnegative reals, each of
whose rows sums to 1 (a ‘stochastic matrix’). Here you’ll meet
the fundamental results on finite Markov chains, proved in a way that
appears to be new. — TL

I’m going to prove all the standard finite Markov chain convergence results
using the theory of compact semigroups. This approach may be known, but I
couldn’t find it. I don’t obtain any of the bounds on convergence, although I
suppose one can obtain them via this method with a little work.

Compact semigroups

By a compact semigroup we mean a non-empty semigroup SS with a compact
Hausdorff topology and
jointly
continuous multiplication S×S→SS\times S\to S. We
shall assume for simplicity that the topology on SS is metric since our main
examples have this property. The following simple exercise from topology will
be used without comment in the sequel.

Lemma 1
Let (X,d)(X,d) be a compact metric space and {xn}\{x_n\} a sequence in XX. Then
xnx_n converges to x∈Xx\in X if and only if every convergent subsequence of
{xn}\{x_n\} converges to xx.

For example, it is easy to prove from this lemma that if SS is a compact
semigroup which is algebraically a group, then the inverse is continuous.
Indeed, if sn→ss_n\to s and s˜\tilde{s} is a limit of some subsequence of
{sn−1}\{s_n^{-1}\}, then s˜s=1=ss˜\tilde{s}s=1=s\tilde{s} and so s˜=s−1\tilde{s}=s^{-1}. By
the lemma, it follows that sn−1→s−1s_n^{-1}\to s^{-1}.

A compact semigroup is called monothetic if it is topologically
generated by a single element, that is, S=⟨s⟩¯S=\overline{\langle s\rangle} for
some
s∈Ss\in S where, as usual, ⟨s⟩={sn∣n≥1}\langle s\rangle =\{s^n\mid n\geq 1\} is the
subsemigroup generated by ss. One easily checks that SS is commutative. In
fact, the structure of SS is well known and we record it in the following
theorem, due independently to Koch and Numakura without the metric assumption.

Theorem 2 (Koch, Numakura)
Let S=⟨s⟩¯S=\overline{\langle s\rangle} be a monothetic compact (metric) semigroup.
Then SS has a unique minimal
idealII, namely the set of all limits of
convergent subsequences of {sn}\{s^n\}. Moreover, II is a compact abelian group.
The identity sωs^{\omega} of II is the unique idempotent of SS. Furthermore,
I=sωSI=s^{\omega}S and the map x↦sωxx\mapsto s^{\omega}x is a surjective continuous
homomorphism S→IS\to I and so in particular II is a monothetic compact abelian
group.

Proof:
The sets snSs^n S with n≥1n\geq 1 form a decreasing chain of closed ideals and
hence have non-empty intersection by compactness. Put I=⋂n≥1snSI=\bigcap_{n\geq 1}
s^n S; clearly II is a closed ideal. We claim it consists precisely of all
limits of convergent subsequences of {sn}\{s^n\}.

Indeed, suppose that x=limsnkx=\lim s^{n_k} and n≥1n\geq 1. Then by passing to a
subsequence we may assume that nk>nn_k &gt; n for all kk and {snk−n}\{s^{n_k-n}\}
converges to an element y∈Sy\in S. Then x=snyx=s^n y by continuity of
multiplication. We conclude that x∈Ix\in I. For the converse, first observe
that every element of S∖{sk∣k≥1}S\setminus \{s^k\mid k\geq 1\} is a limit of a
subsequence of {sn}\{s^n\}. Thus it suffices to prove that if sm∈Is^m\in I for
some m≥1m\geq 1, then sms^m is a limit of a convergent subsequence of {sn}\{s^n\}.
But then we have sm=smys^m=s^m y for some y∈Sy\in S. If y=limsnky=\lim s^{n_k} with
nkn_k an increasing sequence, then sm=limsm+nks^m=\lim s^{m+n_k}. Otherwise,
sm=sm+ks^m=s^{m+k}
for some k≥1k\geq 1. But then sm+nk=ss^{m+n k}=s for all n≥1n\geq 1 and so
sm+nk→sms^{m+n k}\to s^m. Thus II consists of all limits of convergent subsequences
of {sn}\{s^n\}.

We next show that II is algebraically a group. It will then follow that it is
a compact abelian group.
To do this, it suffice to show (by commutativity) that ax=ba x=b has a solution
x∈Ix\in I for all a,b∈Ia,b\in I. Suppose a=limsnka=\lim s^{n_k} and b=limsmkb=\lim s^{m_k}.
Passing to a subsequence, we may assume that mk>nkm_k &gt; n_k for
all
kk. Extracting further subsequences, we may assume smk−nks^{m_k-n_k} converges to
an element x∈Sx\in S. Then
ax=limsnksmk−nk=limsmk=b.
a x=\lim s^{n_k} s^{m_k-n_k} = \lim s^{m_k} = b.
This completes the proof that II is a group.

Let us show that II is the unique minimal ideal. Let JJ be an ideal of
SS. Then ∅≠IJ⊆I∩J⊆J\emptyset\neq I J\subseteq I\cap J\subseteq J. But II is a group
and so has no proper ideals. Thus IJ=II J=I and so I⊆JI\subseteq J.

Suppose that f∈Sf\in S is an idempotent. From f=limfnf=\lim f^n, it is immediate
that ff is a limit of a subsequence of {sn}\{s^n\} and hence belongs to II.
But II is a group and therefore contains a unique idempotent, its
identity. Clearly sωS⊆Is^{\omega} S \subseteq I and so sωS=Is^{\omega}S=I by
minimality. For x,y∈Sx,y\in S, one has
sωxsωy=(sω)2xy=sωxys^{\omega} x s^{\omega} y = (s^{\omega})^2 x y = s^{\omega} x y and so
x↦sωxx\mapsto
s^{\omega}x is a continuous surjective homomorphism S→sωS=IS\to s^{\omega}S=I.
This completes the proof.

Let S=⟨s⟩¯S=\overline{\langle s\rangle} be a compact monothetic semigroup (which we
continue to assume to be metric) and let S×X→XS\times X\to X be a faithful
continuous left action of SS on a compact Hausdorff space XX. Then
sωXs^{\omega}X is a closed SS-invariant subspace, known as the eventual
range of ss, and SS acts on sωXs^{\omega}X by homeomorphisms. Moreover, the
action of SS on sωXs^{\omega}X factors through the homomorphism S→sωSS\to
s^{\omega}S from Theorem 2.

Notice that the fixed-point set of ss is contained in the eventual
range. Indeed, sx=xs x=x implies snx=xs^n x=x for all n≥1n\geq 1 and so
sωx=xs^{\omega}x=x. It follows that if the eventual range of ss is a singleton,
that is, if sωs^{\omega} acts as a constant map, then ss has a unique
fixed-point. Indeed, if sωX={x0}s^{\omega}X=\{x_0\}, then since the eventual range
of ss is SS-invariant, clearly sx0=x0s x_0=x_0. But the above discussion shows
that each fixed point of ss belongs to the eventual range and so x0x_0 is the
unique fixed-point of SS. In this case, sωS={sω}s^{\omega}S=\{s^{\omega}\} (by
faithfulness of the action) and hence sn→sωs^n\to s^{\omega} in light of
Theorem 2 and Lemma 1. We summarize this
discussion in the following proposition.

Proposition 3
Let S=⟨s⟩¯S=\overline {\langle s\rangle} be a monothetic compact semigroup acting
faithfully on the left on a compact Hausdorff space XX and suppose that
sωs^{\omega} acts as a constant map with image x0x_0, i.e., {x0}\{x_0\} is the
eventual range of ss. Then x0x_0 is the unique fixed-point of ss on XX,
sn→sωs^n\to s^{\omega} (uniformly) and snx→x0s^n x\to x_0 for all x∈Xx\in X. The dual
statement holds for right actions.

As a warmup exercise let us give a quick proof of the
Banach
fixed point theorem
for compact metric spaces. Let (X,d)(X,d) be a compact metric space. A
mapping f:X→Xf\colon X\to X is a weak contraction if d(f(x),f(y))≤d(x,y)d(f(x),f(y))\leq
d(x,y) for all x,y∈Xx,y\in X. If the inequality is strict for x≠yx\neq y, then
ff is said to be a (strict) contraction. The weak contractions form a
submonoid WCtr(X)WCtr(X) of the monoid of continuous self-maps of
XX, which is easily verified to be closed in the compact-open topology (and
hence is metric via the sup norm). Since the weak contractions clearly form an
equicontinuous family, the monoid WCtr(X)WCtr(X) is in fact
compact by the
Arzelà–Ascoli
theorem. The weak contractions form an
ideal JJ of WCtr(X)WCtr(X).

Let f:X→Xf\colon X\to X be a weak contraction such that fNf^N is a contraction for
some N≥1N\geq 1. Then S=⟨f⟩¯S=\overline{\langle f\rangle} is a compact monothetic
subsemigroup of WCtr(X)WCtr(X). Moreover, if II is the minimal
ideal of SS, then I⊆fNS⊆JI\subseteq f^N S\subseteq J and hence consists of
contractions. In particular, fωf^{\omega} is a contraction. But the image of
an idempotent is its fixed-point set and it is easily checked that a
contraction has at most one fixed-point. Thus fωf^{\omega} is a constant map
to some point x0x_0. By Proposition 3, x0x_0 is the unique
fixed-point of ff and fn(x)→x0f^n(x)\to x_0 for all x∈Xx\in X. This is the Banach
fixed-point theorem for compact metric spaces.

Stochastic matrices

Let 𝒫n\mathcal{P}_n be the space of all probability distributions on
{1,…,n}\{1,\ldots,n\}. We may identify it with the subspace of (row) vectors in
ℝn\mathbb{R}^n with non-negative entries summing to 11. That is,
𝒫n\mathcal{P}_n is the standard (n−1)(n-1)-simplex in ℝn\mathbb{R}^n and hence is
compact. Let 𝒮n\mathcal{S}_n be the set of all n×nn\times nstochastic
matrices, that is, the
set of all square matrices whose rows belong to 𝒫n\mathcal{P}_n. Then
𝒮n\mathcal{S}_n
is a compact monoid (homeomorphic to 𝒫nn\mathcal{P}_n^n) with respect to matrix
multiplication. Moreover, it is a partially ordered monoid with respect to the
usual ordering (A≥BA\geq B if A−B≥0A-B\geq 0) on non-negative matrices.

The natural right action ℝn×𝒮n→ℝn\mathbb{R}^n\times \mathcal{S}_n\to \mathbb{R}^n is
clearly continuous and 𝒫n\mathcal{P}_n is a compact invariant subspace.
Moreover, the action of 𝒮n\mathcal{S}_n on 𝒫n\mathcal{P}_n is faithful because
𝒫n\mathcal{P}_n contains the standard unit vectors. Thus we can also view
𝒮n\mathcal{S}_n as a compact monoid of transformations of both 𝒫n\mathcal{P}_n
and ℝn\mathbb{R}^n with respect to the compact-open topology.

Let JJ denote the matrix of all ones; note that PJ=JP J=J for any stochastic
matrix PP.

Consequently, the positive stochastic matrices form a left ideal of
𝒮n\mathcal{S}_n (topologically open). A probability distribution π∈𝒫n\pi\in
\mathcal{P}_n is stationary for a stochastic matrix PP if πP=π\pi P=\pi,
i.e., it is a fixed-point of PP.

Lemma 5
An idempotent positive stochastic matrix QQ has a unique stationary
distribution π\pi. Moreover, π\pi is strictly positive, each row of QQ is
π\pi, and all fixed vectors of QQ are multiples of π\pi. Consequently, QQ
has rank 11 and acts as a constant map on 𝒫n\mathcal{P}_n with image π\pi.

Proof:
Let QijQ_{i j} be the largest entry in column jj. Then
∑k=1nQik(Qij−Qkj)=∑k=1nQikQij−∑k=1nQikQkj=Qij−Qij=0
\sum_{k=1}^n Q_{i k}(Q_{i j}-Q_{k j}) =
\sum_{k=1}^n Q_{i k} Q_{i j}-\sum_{k=1}^n Q_{i k}Q_{k j} = Q_{i j}-Q_{i j}=0
and hence
Qij=QkjQ_{i j}=Q_{k j}, for k=1,…,nk=1,\ldots,n, as each Qik>0Q_{i k} \gt 0 and
Qij−Qkj≥0Q_{i j}-Q_{k j}\geq
0. Thus all rows of QQ are π\pi for some strictly positive probability
distribution π\pi. Idempotency immediately yields πQ=π\pi Q=\pi. Moreover,
since QQ is a rank 11 idempotent, every fixed vector of QQ must be a
multiple of π\pi. Consequently, π\pi is the unique stationary distribution
for QQ. Since the image of an idempotent mapping is its fixed-point set, we
conclude that QQ acts on 𝒫n\mathcal{P}_n as a constant map with image π\pi.

A stochastic matrix PP is said to be ergodic if Pm>0P^m \gt 0 for some
mm. Now we can prove one of the main theorems of finite Markov chain
theory. Recall that an eigenvalue of a matrix is simple if it has
multiplicity one as a root of the characteristic polynomial.

Theorem 6
Let PP be an ergodic stochastic matrix. Then:

PP has a unique stationary distribution π\pi;

π\pi is strictly positive;

Pn→ΠP^n\to \Pi where Π\Pi is the matrix all of whose rows are π\pi;

if μ\mu is any probability distribution, then μPn→π\mu P^n\to \pi;

the eigenvalue 11 of PP is simple and all other eigenvalues of PP have
modulus strictly less than 11.

Proof:
Let P∈𝒮nP\in \mathcal{S}_n with Pm>0P^m \gt 0. Put S=⟨P⟩¯S=\overline{\langle P\rangle}
and let II be the minimal ideal of SS. As I⊆SPmI\subseteq S P^m, it follows
from Lemma 4 that II consists of positive elements. Thus
Pω>0P^{\omega} \gt 0 and so Lemma 5 implies that PωP^{\omega} is a
constant map on 𝒫n\mathcal{P}_n with image π>0\pi \gt 0 its stationary
distribution. Proposition 3 then yields that π\pi is the unique stationary
distribution for PP, that Pn→PωP^n\to P^{\omega} and that μPn→π\mu P^n\to \pi for
all μ∈𝒫n\mu\in \mathcal{P}_n. Moreover, Pω=ΠP^{\omega}=\Pi by
Lemma 5.

For the final statement, observe that
ℝn=ℝnPω⊕kerPω=ℝπ⊕kerPω
\mathbb{R}^n
=
\mathbb{R}^n P^{\omega} \oplus \ker P^{\omega}
=
\mathbb{R}\pi \oplus \ker P^{\omega}
(the last equality by Lemma 5). Moreover, PP preserves this direct sum
decomposition because PP and PωP^{\omega} commute. Set V=kerPωV=\ker P^{\omega}.
From the fact that Pn→PωP^n\to P^{\omega}, it follows that the powers of the
restriction P|VP|_V of PP to VV converge to 00 and hence P|VP|_V has
spectral
radius
strictly less than 11. Since the characteristic polynomial of PP is the
product of x−1x - 1 and the characteristic polynomial of P|VP|_V, it follows that
11 is a simple eigenvalue of PP.

A stochastic matrix PP is called irreducible if, for
all i,ji,j, there exists k>0k \gt 0 such that Pijk>0P^k_{i j} \gt 0. It is easy to
verify that if PP is irreducible, then Q=(1/2)In+(1/2)PQ=(1/2) I_n + (1/2) P is ergodic,
where InI_n is the n×nn\times n identity matrix. Observe that vQ=vv Q=v if and
only if v=vPv=v P for any vector vv. This leads to the existence and uniqueness
of stationary distributions for irreducible Markov chains.

Corollary 7
Let PP be an irreducible stochastic matrix. Then PP has a unique stationary
distribution π\pi. Moreover, π\pi is strictly positive and every fixed
vector of PP is a multiple of π\pi.

Posted at January 3, 2012 10:31 PM UTC

TrackBack URL for this Entry: http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/2484

5 Comments & 0 Trackbacks

Re: A Semigroup Approach to Finite Markov Chains

This is very nice! Thanks!

I’ll make just a trivial terminological comment for the moment. When you started using the phrase “eventual range” in earlier threads, I didn’t realize that it was actually an accepted term. I thought you were just taking what I was calling the eventual image and adapting it to your preference for the word “range” over “image”. But I see now that “eventual range” is a term in use, e.g. it’s defined on page 242 of Lind and Marcus’s book on symbolic dynamics.

(As far as I was concerned, “eventual image” was my own coinage. I googled it a while ago, and discovered that a few other people had used it too, but I didn’t think of trying it with “range” in place of “image”.)

Re: A Semigroup Approach to Finite Markov Chains

Yes and no. If SS is any compact semigroup then the probability measures on S form a compact semigroup under convolution. The stochastic matrices essentially form a special case. Some of the convergence results of Diaconis and Brown can be understood from this viewpoint but you need semigroup representation theory for the bounds on rate of convergence.