The Cayley-Hamilton Theorem

Philip J. Erdelsky

June 8, 2006
Reformatted June 7, 2012

Please e-mail comments, corrections and additions to the webmaster at
pje@efgh.com.

1. The Theorem

Some mathematical theorems are especially beautiful. It's hard to say
what makes a theorem beautiful, but two properties come to mind:

It's simple to state.

It's hard to prove.

In my opinion, one very beautiful theorem is the Cayley-Hamilton
Theorem of matrix algebra. It states that if p(z) is the
characteristic polynomial of an n ⨉ n complex matrix A,
then p(A) is the
zero matrix, where addition and multiplication in its evaluation are the
usual matrix operations, and the constant term p_0 of
p(z) is
replaced by the matrix p_0 I.

Recall that the characteristic polynomial of A is given by

p(z) = det(A-zI).
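As a quick numerical check, the theorem can be verified for a sample matrix with NumPy (a hypothetical example; note that np.poly returns the coefficients of det(zI - A), which differs from det(A - zI) only by the factor (-1)^n, so both vanish at A):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# Coefficients of the characteristic polynomial, highest power first.
coeffs = np.poly(A)
deg = len(coeffs) - 1

# Substitute A for z: powers of z become matrix powers, and the
# constant term p_0 becomes p_0 * I (matrix_power(A, 0) is I).
pA = sum(c * np.linalg.matrix_power(A, deg - i) for i, c in enumerate(coeffs))
assert np.allclose(pA, np.zeros((3, 3)))
```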

2. The Simple But Invalid Proof

At this point, students sometimes naively substitute A for
z in the
definition of the characteristic polynomial and falsely conclude that
the Cayley-Hamilton Theorem is trivial:

Why isn't p(A) = det(A-AI) =
det(A-A) = det(O) = 0?

There are two obvious fallacies. First, zI in the definition of
the characteristic polynomial represents the multiplication of a matrix
by a scalar, but AI is the multiplication of two matrices. Second, the
determinant is a scalar, so the argument given above seems to prove
that a matrix is equal to a scalar.

Actually, this argument is valid in the special case where
A
is a 1 ⨉ 1 matrix, since there is then essentially no difference between
matrix and scalar operations; but in that case the Cayley-Hamilton
Theorem really is trivial.

3. An Analytic Proof

There are many proofs of the Cayley-Hamilton Theorem. I must admit that
I have trouble following most of them, because they involve rather
advanced algebra or combinatorics. A more analytic argument, like the
one presented here, is more suited to my own training and talents.

To prove the theorem in the general case, let λ be an eigenvalue of
A, and let x be a corresponding eigenvector (expressed as a column
vector). Then

Ax = λx.

Using elementary properties of scalar and matrix operations gives

A^2 x = (AA)x = A(Ax) = A(λx) = λ(Ax) = λ(λx) = λ^2 x.

It can be shown in general that

[1] A^k x = λ^k x, for k = 0, 1, 2, ..., n,

where A^0 = I.
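Equation [1] is easy to spot-check numerically (a hypothetical NumPy example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, vecs = np.linalg.eig(A)   # eigenvalues and column eigenvectors
l, x = lam[0], vecs[:, 0]
for k in range(5):
    # A^k x = l^k x; matrix_power(A, 0) is the identity, matching A^0 = I.
    assert np.allclose(np.linalg.matrix_power(A, k) @ x, l**k * x)
```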

Let the characteristic polynomial of A be

p(z) = p_n z^n + p_(n-1) z^(n-1) + p_(n-2) z^(n-2) + ... + p_0.

Then multiply each equation in [1] above by p_k and add them
to obtain

p(A)x = p(λ)x.

Now p(λ) is zero because λ is an eigenvalue
of A. Hence p(A)x = 0
for every eigenvector x of A.

If A has n linearly independent eigenvectors, they form a basis, so
p(A) annihilates every vector and must be the zero matrix.
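This step can be checked numerically for a matrix with a full set of eigenvectors (a hypothetical NumPy sketch; np.poly supplies the characteristic coefficients):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])     # symmetric, so its eigenvectors span the space
coeffs = np.poly(A)
deg = len(coeffs) - 1
pA = sum(c * np.linalg.matrix_power(A, deg - i) for i, c in enumerate(coeffs))

lam, vecs = np.linalg.eig(A)
for i in range(2):
    x = vecs[:, i]
    # p(A)x = p(lambda)x, and p(lambda) = 0 because lambda is a root of p.
    assert np.allclose(pA @ x, np.polyval(coeffs, lam[i]) * x)
    assert np.allclose(pA @ x, 0)

# The eigenvectors span the whole space, so p(A) itself is O.
assert np.allclose(pA, 0)
```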

If A does not have n linearly independent eigenvectors,
we construct
a sequence A_1, A_2, ...
of matrices whose limit is
A, each of which has n linearly independent eigenvectors.
Then if p_j(z) is the characteristic polynomial of
A_j, p_j(A_j) = O.
Since all entries of p_j(A_j) are continuous functions of the matrix
entries, the limit p(A) is also O.

To create such a sequence, it is sufficient to construct matrices
arbitrarily close to A, each of which has n linearly independent
eigenvectors.
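The perturbation argument can be illustrated with a defective matrix, i.e. one without n independent eigenvectors (a hypothetical NumPy sketch):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])     # Jordan block: only one independent eigenvector

def p_of(M):
    """Evaluate M's own characteristic polynomial at M."""
    c = np.poly(M)
    d = len(c) - 1
    return sum(ci * np.linalg.matrix_power(M, d - i) for i, ci in enumerate(c))

for eps in (1e-1, 1e-3, 1e-6):
    Aj = A + np.diag([0.0, eps])   # eigenvalues 1 and 1 + eps are distinct
    assert np.linalg.matrix_rank(np.linalg.eig(Aj)[1]) == 2  # full eigenbasis
    assert np.allclose(p_of(Aj), 0)                          # p_j(A_j) = O

assert np.allclose(p_of(A), 0)     # and in the limit, p(A) = O
```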

First, we need a simple lemma. The matrix A, like all complex matrices,
is similar to an upper triangular matrix, i.e., there is a nonsingular
matrix Q for which
Q^-1 AQ
is upper triangular. This result is well-known, but a simple proof
is given in Appendix A.

The eigenvalues of an upper triangular matrix appear along its principal
diagonal. There is an upper triangular matrix T
arbitrarily close to
Q^-1 AQ with n distinct eigenvalues.
Then QTQ^-1 is
arbitrarily close to A and has the same n distinct
eigenvalues as T.

A matrix with n distinct eigenvalues has n distinct eigenvectors. If
these eigenvectors were linearly dependent, they would span a space of
dimension less than n. The mapping defined by the matrix, restricted to
this space, would still have the same n distinct eigenvalues, which is
impossible, because a linear mapping on a space of dimension less than
n has fewer than n distinct eigenvalues. Hence the eigenvectors are
linearly independent.
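Both facts are visible in a small example (a hypothetical NumPy check):

```python
import numpy as np

T = np.array([[1.0, 2.0, 0.0],
              [0.0, 2.0, 3.0],
              [0.0, 0.0, 4.0]])   # upper triangular; eigenvalues on the diagonal

lam, vecs = np.linalg.eig(T)
assert np.allclose(sorted(lam.real), [1.0, 2.0, 4.0])  # three distinct eigenvalues
assert np.linalg.matrix_rank(vecs) == 3                # eigenvectors independent
```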

4. Proof for Commutative Rings

This proves the Cayley-Hamilton Theorem for complex matrices, but it
is also true for matrices over more general commutative rings.

The proof of this is actually fairly simple. Explicitly expanding a
small case, such as 2 ⨉ 2, shows the way. The expression for each entry in
p(A) is a
polynomial in n^2 variables, namely the entries of
A. It's
not just any polynomial, but one which takes on the value zero for all
values of the variables. That can happen only if all the coefficients
are zero when like terms are combined. (This seems to be an obvious
result, but it requires proof, so one is given in Appendix B.) Hence the
polynomial evaluates to zero in any other algebraic entity that has all
the necessary operations.

It might appear that the ring must have a unit. However, if we refrain
from combining like terms, we have a sum of monomials, each
preceded by a plus sign or a minus sign. Even in the complex field,
cancellation is possible only if every positive monomial has a
corresponding negative monomial. They will cancel in a ring, too, even
if the ring has no unit.
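As a concrete illustration of the ring case, the 2 ⨉ 2 theorem, p(z) = z^2 - tr(A)z + det(A), can be checked by direct arithmetic over Z/6Z, a commutative ring that is not a field (a hypothetical example in plain Python):

```python
M = 6                     # work modulo 6: Z/6Z has zero divisors (2 * 3 = 0)
A = [[1, 2],
     [3, 4]]

def mul(X, Y):
    """2 x 2 matrix product with entries reduced mod M."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % M for j in range(2)]
            for i in range(2)]

tr = (A[0][0] + A[1][1]) % M
det = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % M
A2 = mul(A, A)

# p(A) = A^2 - tr(A) A + det(A) I should reduce to the zero matrix.
pA = [[(A2[i][j] - tr * A[i][j] + det * (1 if i == j else 0)) % M
       for j in range(2)] for i in range(2)]
assert pA == [[0, 0], [0, 0]]
```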

Appendix A. Every Complex Matrix is Similar to an Upper Triangular
Matrix.

Let A be an n ⨉ n complex matrix. We show,
by induction on n, that it
is similar to an upper triangular matrix.

For n = 1 the assertion is trivial because A is already upper
triangular.

In other cases, let λ be an eigenvalue of
A and let x_1 be a
corresponding eigenvector. Extend this to a set x_1,
x_2, ..., x_n of linearly independent vectors,
and let Q be the matrix whose columns are x_1,
x_2, ..., x_n.

Then if e_1 is the column vector with 1 in its first
row and zeros elsewhere, Me_1
is the first column of M for any
n ⨉ n matrix M. Hence

Q^-1 AQe_1 = Q^-1 Ax_1 =
Q^-1 λx_1 = λQ^-1 x_1 =
λe_1,

and the first column of Q^-1 AQ is
λe_1. Hence it
can be partitioned as follows:

| λ  v |
| 0  B |,

where v is an (n-1)-element row vector,
B is an (n-1) ⨉ (n-1) matrix,
and 0 is all zeros. By inductive hypothesis,
R^-1 BR is upper
triangular for some nonsingular matrix R,
so the following matrix
is the required upper triangular matrix similar to A:

| 1  0    |               | 1  0 |
|         |  *  Q^-1 AQ * |      | .
| 0  R^-1 |               | 0  R |
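The induction above translates directly into a short recursive construction (a NumPy sketch of the proof, not a numerically robust algorithm; production code would use a Schur decomposition):

```python
import numpy as np

def triangularize(A):
    """Return Q such that Q^-1 A Q is upper triangular, following Appendix A."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)     # a 1 x 1 matrix is already triangular
    _, vecs = np.linalg.eig(A)
    x1 = vecs[:, [0]]                       # an eigenvector of A, as a column
    # Extend x1 to a basis: the 'complete' QR of the column x1 yields a
    # unitary Q whose first column is parallel to x1.
    Q, _ = np.linalg.qr(x1, mode='complete')
    M = np.linalg.inv(Q) @ A @ Q            # first column is lambda * e_1
    B = M[1:, 1:]                           # the (n-1) x (n-1) block
    R = triangularize(B)                    # inductive hypothesis
    S = np.eye(n, dtype=complex)
    S[1:, 1:] = R
    return Q @ S

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])                 # rotation; eigenvalues +i and -i
Q = triangularize(A.astype(complex))
T = np.linalg.inv(Q) @ A @ Q
assert np.allclose(np.tril(T, -1), 0)       # strictly lower part vanishes
```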

Appendix B. A Complex Polynomial Is Identically Zero Only If All
Coefficients Are Zero

If a polynomial is identically zero, all of its coefficients are zero
(when like terms are combined, of course).

The proof is by induction on the number of independent variables.

For a polynomial p(z) in one variable, we simply note that
p^(k)(0) is k! times the coefficient of
z^k.
If the polynomial is identically zero, all of its derivatives are also
identically zero, and all of its coefficients must be zero.

A polynomial in n variables can be rearranged so it is a polynomial
in one variable, and each of its coefficients is a polynomial in the
remaining n-1 variables.

By the result for one variable, every such coefficient is zero for all
values of its independent variables, i.e., it is an identically zero
polynomial in n-1 variables. Hence by inductive hypothesis all of its
coefficients are zero.

Although the coefficients of an identically zero complex polynomial are
all zero, this is not true over finite fields. If N is the number of
elements in a finite field, then z^N - z = 0 for every element
z of the field. (The proof given above breaks down in this case:
although formal derivatives can be defined, the factor N! appearing in
it is zero in the field.)
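The finite-field counterexample is easy to verify for the field with five elements, Z/5Z (plain Python):

```python
# By Fermat's little theorem, z^5 = z for every z in Z/5Z, so the
# polynomial z^5 - z vanishes identically although its coefficients
# (1 and -1) are not zero.
N = 5
assert all((z**N - z) % N == 0 for z in range(N))
```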