Section JCF: Jordan Canonical Form

We have seen in Section IS that generalized eigenspaces are invariant
subspaces that in every instance have led to a direct sum decomposition of the
domain of the associated linear transformation. This allows us to create a block
diagonal matrix representation (Example ISMR4, Example ISMR6). We also
know from Theorem RGEN that the restriction of a linear transformation to a
generalized eigenspace is almost a nilpotent linear transformation. Of
course, we understand nilpotent linear transformations very well from
Section NLT and we have carefully determined a nice matrix representation for
them.

So here is the game plan for the final push. Prove that the domain of a
linear transformation always decomposes into a direct sum of generalized
eigenspaces. We have unravelled Theorem RGEN at Theorem MRRGE so that
we can formulate the matrix representations of the restrictions on the
generalized eigenspaces using our storehouse of results about nilpotent
linear transformations. Arrive at a matrix representation of any linear
transformation that is block diagonal with each block being a Jordan
block.

Subsection GESD: Generalized Eigenspace Decomposition

In Theorem UTMR we were able to show that any linear transformation from
V to
V has
an upper triangular matrix representation (Definition UTM). We will now show
that we can improve on the basis yielding this representation by massaging the
basis so that the matrix representation is also block diagonal. The subspaces
associated with each block will be generalized eigenspaces, so the most general
result will be a decomposition of the domain of a linear transformation into a
direct sum of generalized eigenspaces.

Proof   Suppose that dim(V) = n and the n (not necessarily distinct) eigenvalues of T
are ρ_1, ρ_2, ρ_3, …, ρ_n. We begin with a basis of V that yields an upper triangular
matrix representation, as guaranteed by Theorem UTMR, B = {x_1, x_2, x_3, …, x_n}. Since
the matrix representation is upper triangular, and the eigenvalues of the linear
transformation are the diagonal elements, we can choose this basis so that there are
scalars a_{ij}, 1 ≤ j ≤ n, 1 ≤ i ≤ j − 1, such
that
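(The display is reconstructed here from the surrounding description, not quoted verbatim: T applied to x_j lands in the span of x_1 through x_j, with ρ_j as the diagonal coefficient.)

    T(x_j) = ρ_j x_j + a_{1j} x_1 + a_{2j} x_2 + ⋯ + a_{j−1,j} x_{j−1},    1 ≤ j ≤ n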

We now define a new basis for V which is just a slight variation in the basis B. Choose
any k and ℓ such that 1 ≤ k < ℓ ≤ n and ρ_k ≠ ρ_ℓ. Define the
scalar α = a_{kℓ}/(ρ_ℓ − ρ_k). The new basis is C = {y_1, y_2, y_3, …, y_n}
where
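(Reconstructed from the computations that follow, not quoted verbatim: every vector of B is carried over unchanged except the one in position ℓ, to which a multiple of x_k is added.)

    y_i = x_i for i ≠ ℓ,        y_ℓ = x_ℓ + α x_k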

We now compute the values of the linear transformation
T with inputs
from C,
noting carefully the changed scalars in the linear combinations of
C
describing the outputs. These changes will translate to minor
changes in the matrix representation built using the basis
C.
There are three cases to consider, depending on which column
of the matrix representation we are examining. First, assume
j < ℓ.
Then
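(Reconstructing the elided computations from the surrounding discussion.) For j < ℓ the vector x_ℓ does not appear in T(x_j), so nothing changes,

    T(y_j) = T(x_j) = ρ_j x_j + \sum_{i=1}^{j−1} a_{ij} x_i = ρ_j y_j + \sum_{i=1}^{j−1} a_{ij} y_i.

For the column in question, j = ℓ, we substitute x_ℓ = y_ℓ − α y_k (and x_i = y_i otherwise),

    T(y_ℓ) = T(x_ℓ) + α T(x_k)
           = ρ_ℓ x_ℓ + \sum_{i=1}^{ℓ−1} a_{iℓ} x_i + α ρ_k x_k + α \sum_{i=1}^{k−1} a_{ik} x_i
           = ρ_ℓ y_ℓ + \sum_{i=k+1}^{ℓ−1} a_{iℓ} y_i + (a_{kℓ} + α ρ_k − α ρ_ℓ) y_k + \sum_{i=1}^{k−1} (a_{iℓ} + α a_{ik}) y_i.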

So how different are the matrix representations relative to B and C in column ℓ?
For i > k, the coefficient of y_i is a_{iℓ}, as in the representation relative to B. It is a
different story for i ≤ k, where the coefficients of y_i may be very different. We are
especially interested in the coefficient of y_k. In fact, the whole first part of this proof
is about this particular entry of the matrix representation. The coefficient of y_k
is
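(reconstructing the elided simplification, using the definition of α above)

    a_{kℓ} + α ρ_k − α ρ_ℓ = a_{kℓ} − α(ρ_ℓ − ρ_k) = a_{kℓ} − \frac{a_{kℓ}}{ρ_ℓ − ρ_k}(ρ_ℓ − ρ_k) = a_{kℓ} − a_{kℓ} = 0.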

If the definition of α
was a mystery, then no more. In the matrix representation of
T relative to
C, the entry
in column ℓ,
row k
is a zero. Nice. The only price we pay is that other entries in column
ℓ, specifically
rows 1
through k − 1,
may also change in a way we can’t control.
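The remaining case is a column to the right of column ℓ. Assume j > ℓ; then (reconstructing the elided display, again substituting x_ℓ = y_ℓ − α y_k)

    T(y_j) = T(x_j) = ρ_j y_j + a_{ℓj} y_ℓ + (a_{kj} − α a_{ℓj}) y_k + \sum_{i<j,\ i≠k,ℓ} a_{ij} y_i.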

As before, we ask: how different are the matrix representations relative to
B and
C in
column j?
Only {y}_{k}
has a coefficient different from the corresponding coefficient when the basis is
B.
So in the matrix representations, the only entries to change are in row
k, for
columns ℓ + 1
through n.

What have we accomplished? With a change of basis, we can place a zero in a desired
entry (row k, column ℓ) of the matrix representation, leaving most of the entries untouched. The only
entries to possibly change are above the new zero entry, or to the right of the new
zero entry. Suppose we repeat this procedure, starting by “zeroing out” the
entry above the diagonal in the second column and first row. Then we move right
to the third column, and zero out the element just above the diagonal in the
second row. Next we zero out the element in the third column and first row. Then
tackle the fourth column, work upwards from the diagonal, zeroing out elements
as we go. Entries above, and to the right, will repeatedly change, but
newly created zeros will never get wrecked, since they are below, or just
to the left of, the entry we are working on. Similarly, the values on the
diagonal do not change either. This entire argument can be retooled in the
language of change-of-basis matrices and similarity transformations, and
this is the approach taken by Noble in his Applied Linear Algebra. It is
interesting to concoct the change-of-basis matrix between the bases B and C and
compute its inverse.
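For instance, if the new basis is as suggested above (y_ℓ = x_ℓ + α x_k, all other vectors unchanged), then the matrix whose columns are the representations of the vectors of C relative to B is the identity matrix with a single additional nonzero entry, α in row k of column ℓ. Writing E_{kℓ} for the matrix with a 1 in row k, column ℓ and zeros elsewhere (notation introduced here for illustration),

    I_n + α E_{kℓ},        (I_n + α E_{kℓ})^{−1} = I_n − α E_{kℓ},

since E_{kℓ}^2 = O when k ≠ ℓ.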

Perhaps you have noticed that we have to be just a bit more
careful than the previous paragraph suggests. The definition of
α has a
denominator that cannot be zero, which restricts our maneuvers to zeroing out entries in
row k and
column ℓ
only when ρ_k ≠ ρ_ℓ.
So we do not necessarily arrive at a diagonal matrix. More carefully we can
write
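(a reconstruction of the elided display, consistent with the description that follows)

    T(y_j) = ρ_j y_j + \sum_{i<j,\ ρ_i = ρ_j} b_{ij}\, y_i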

where the b_{ij} are our new coefficients after repeated changes, the y_j are the new basis vectors,
and the condition “i : ρ_i = ρ_j” means that the sum only involves vectors whose
diagonal coefficients are identical (the eigenvalues). Now reorder the basis
vectors carefully. Group together vectors that have equal diagonal entries in the
matrix representation, but within each group preserve the order of the precursor
basis. The grouping will create a block diagonal structure for the matrix
representation, while preserving the order within each group will retain the
upper triangular form of the representation. So we can arrive at a basis that yields
a matrix representation that is upper triangular and block diagonal, with the
diagonal entries of each block all equal to a common eigenvalue of the linear
transformation.

More carefully, employing the distinct eigenvalues of
T,
{λ}_{i},
1 ≤ i ≤ m, we can assert there is
a set of basis vectors for V ,
{u}_{ij},
1 ≤ i ≤ m,
1 ≤ j ≤ {α}_{T }\left ({λ}_{i}\right ), such
that
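(A reconstruction of the elided display, in the spirit of the block form above: each u_{ij} is sent into the span of the earlier vectors of its own group, with λ_i on the diagonal,)

    T(u_{ij}) − λ_i u_{ij} ∈ ⟨{u_{i1}, u_{i2}, …, u_{i,j−1}}⟩,

and, setting U_i to be the span of the vectors u_{i1}, …, u_{i,α_T(λ_i)}, the reordered basis gives the block diagonal, upper triangular representation described above.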

Whew. This is a good place to take a break, grab a cup of coffee,
use the toilet, or go for a short stroll, before we show that
{U}_{i} is a subspace of the
generalized eigenspace {G}_{T }\left ({λ}_{i}\right ).
This will follow if we can prove that each of the basis vectors for
{U}_{i} is a generalized
eigenvector of T for
{λ}_{i} (Definition GEV). We
need some power of T − {λ}_{i}{I}_{V }
that takes {u}_{ij}
to the zero vector. We prove by induction on
j (Technique I)
the claim that (T − λ_i I_V)^j(u_{ij}) = 0.
For j = 1
we have,
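(Reconstructing the elided computation.) The vector u_{i1} heads its block in the upper triangular, block diagonal representation, so presumably T(u_{i1}) = λ_i u_{i1}, and therefore

    (T − λ_i I_V)(u_{i1}) = T(u_{i1}) − λ_i u_{i1} = λ_i u_{i1} − λ_i u_{i1} = 0.

The inductive step, also elided here, presumably expands (T − λ_i I_V)^j(u_{ij}) = (T − λ_i I_V)^{j−1}\big((T − λ_i I_V)(u_{ij})\big) and applies the induction hypothesis to the vectors u_{i1}, …, u_{i,j−1} appearing in T(u_{ij}) − λ_i u_{ij}.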

This completes the induction step. Since every vector of the spanning set for
{U}_{i} is an element of
the subspace {G}_{T }\left ({λ}_{i}\right ),
Property AC and Property SC allow us to conclude that
{U}_{i} ⊆{G}_{T }\left ({λ}_{i}\right ). Then by
Definition S, {U}_{i} is
a subspace of {G}_{T }\left ({λ}_{i}\right ).
Notice that this inductive proof could be interpreted to say that every element of
{U}_{i} is a generalized
eigenvector of T for
{λ}_{i}, and the algebraic
multiplicity of {λ}_{i}
is a sufficiently high power to demonstrate this via the definition for each
vector.

We are now prepared for our final argument in this long
proof. We wish to establish that the dimension of the subspace
{G}_{T }\left ({λ}_{i}\right ) is the algebraic
multiplicity of {λ}_{i}. This will
be enough to show that {U}_{i}
and {G}_{T }\left ({λ}_{i}\right )
are equal, and will finally provide the desired direct sum decomposition.

We will prove by induction (Technique I) the following claim. Suppose that
T : V → V is a linear
transformation and B
is a basis for V
that provides an upper triangular matrix representation of
T. The number of
times any eigenvalue λ
occurs on the diagonal of the representation is greater than
or equal to the dimension of the generalized eigenspace
{G}_{T }\left (λ\right ).

We will use the symbol m
for the dimension of V
so as to avoid confusion with our notation for the nullity. So
dim(V) = m and our proof will proceed
by induction on m. Use
the notation {#}_{T }(λ) to count
the number of times λ
occurs on the diagonal of a matrix representation of
T. We
want to show that
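(restating the claim of the previous paragraph in this notation)

    #_T(λ) ≥ dim(G_T(λ))    for every eigenvalue λ of T.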

For the base case, dim(V) = 1. Every
matrix representation of T
is an upper triangular matrix with the lone eigenvalue of
T,
λ, as the diagonal entry.
So {#}_{T }(λ) = 1. The generalized
eigenspace of λ
is not trivial (since by Theorem GEK it equals the regular eigenspace), and is a subspace of
V . With Theorem PSSD
we see that dim(G_T(λ)) = 1.

Suppose that λ is
any eigenvalue of T.
Then suppose that v ∈ 𝒦((T − λI_V)^m).
As an element of V ,
we can write v
as a linear combination of the basis elements of
B, or more compactly,
there is a vector u ∈ U
and a scalar α
such that v = u + α{v}_{m}.
Then,

The final expression in this string of equalities is an element of
U since
U is invariant
relative to both T
and {I}_{V }.
The expression at the beginning is a scalar multiple of
{v}_{m},
and as such cannot be a nonzero element of
U without violating the
linear independence of B.
So

    α(λ_m − λ)^m v_m = 0

The vector {v}_{m} is
nonzero since B
is linearly independent, so Theorem SMEZV tells us that
α{\left ({λ}_{m} − λ\right )}^{m} = 0.
From the properties of scalar multiplication, we are confronted with two
possibilities.

Our first case is that λ ≠ λ_m.
Notice then that λ
occurs the same number of times along the diagonal in the representations of
T{|}_{U} and
T. Now
α = 0 and
v = u + 0{v}_{m} = u. Since
v was chosen as an
arbitrary element of 𝒦((T − λI_V)^m), Definition SSET says that 𝒦((T − λI_V)^m) ⊆ U.
It is always the case that 𝒦((T|_U − λI_U)^m) ⊆ 𝒦((T − λI_V)^m).
However, we can also see that in this case, the opposite
set inclusion is true as well. By Definition SE we have
𝒦((T|_U − λI_U)^m) = 𝒦((T − λI_V)^m).
Then

Subsection JCF: Jordan Canonical Form

Now we are in a position to define what we (and others) regard as an
especially nice matrix representation. The word “canonical” has at its root the
word “canon,” which has various meanings. One is the set of laws established by a
church council. Another is a set of writings that are authentic, important or
representative. Here we take it to mean the accepted, or best, representative
among a variety of choices. Every linear transformation admits a variety
of representations, and we will declare one of them as the best. Hopefully you will
agree.

Definition JCF  Jordan Canonical Form
A square matrix is in Jordan canonical form if it meets the following requirements:

The matrix is block diagonal.

Each block is a Jordan block.

If ρ < λ
then the block {J}_{k}\left (ρ\right )
occupies rows with indices greater than the indices of the rows occupied
by {J}_{ℓ}\left (λ\right ).

If ρ = λ
and ℓ < k,
then the block {J}_{ℓ}\left (λ\right )
occupies rows with indices greater than the indices of the rows occupied
by {J}_{k}\left (λ\right ).
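For instance (an illustrative matrix, not one of the book’s examples), a matrix built from the blocks J_2(3), J_1(3), and J_2(1) satisfies the ordering requirements exactly when the blocks appear in that order down the diagonal:

    [ 3 1 0 0 0 ]
    [ 0 3 0 0 0 ]
    [ 0 0 3 0 0 ]
    [ 0 0 0 1 1 ]
    [ 0 0 0 0 1 ]

The larger eigenvalue (3) comes first, and within the eigenvalue 3 the larger block J_2(3) precedes J_1(3).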

Theorem GESD gives us a decomposition of
V
into generalized eigenspaces, one for each distinct eigenvalue.
Since these generalized eigenspaces are invariant relative to
T, this provides
a block diagonal matrix representation where each block is the matrix representation of the
restriction of T
to the generalized eigenspace.

Restricting T
to a generalized eigenspace results in a “nearly nilpotent” linear transformation,
as stated more precisely in Theorem RGEN. We unravel Theorem RGEN in the
proof of Theorem MRRGE so that we can apply Theorem CFNLT about
representations of nilpotent linear transformations.

We know the dimension of a generalized eigenspace is the algebraic
multiplicity of the eigenvalue (Theorem DGES), so the blocks associated with the
generalized eigenspaces are square with a size equal to the algebraic multiplicity.
In refining the basis for this block, and producing Jordan blocks, the results of
Theorem CFNLT apply. The total number of blocks will be the nullity of
T{|}_{{G}_{T}\left (λ\right )} − λ{I}_{{G}_{T}\left (λ\right )}, which is the geometric
multiplicity of λ as
an eigenvalue of T
(Definition GME). The largest of the Jordan blocks will have
size equal to the index of the nilpotent linear transformation
T{|}_{{G}_{T}\left (λ\right )} − λ{I}_{{G}_{T}\left (λ\right )},
which is exactly the definition of the index of the eigenvalue
λ (Definition IE).
■

Before we do some examples of this result, notice how close Jordan canonical
form is to a diagonal matrix. Or, equivalently, notice how close we have
come to diagonalizing a matrix (Definition DZM). We have a matrix
representation which has diagonal entries that are the eigenvalues of a matrix.
Each occurs on the diagonal as many times as the algebraic multiplicity.
However, when the geometric multiplicity is strictly less than the algebraic
multiplicity, we have some entries in the representation just above the
diagonal (the “superdiagonal”). Furthermore, we have some idea how often
this happens if we know the geometric multiplicity and the index of the
eigenvalue.

We now recognize just how simple a diagonalizable linear transformation really
is. For each eigenvalue, the generalized eigenspace is just the regular eigenspace,
and it decomposes into a direct sum of one-dimensional subspaces, each
spanned by a different eigenvector chosen from a basis of eigenvectors for the
eigenspace.

Some authors create matrix representations of nilpotent linear transformations
where the Jordan block has the ones just below the diagonal (the “subdiagonal”).
No matter, it is really the same, just different. We have also defined Jordan
canonical form to place blocks for the larger eigenvalues earlier, and for blocks
with the same eigenvalue, we place the bigger ones earlier. This is fairly standard,
but there is no reason we couldn’t order the blocks differently. It’d be
the same, just different. The reason for choosing some ordering is to be
assured that there is just one canonical matrix representation for each linear
transformation.

For each eigenvalue, we can compute a generalized eigenspace. By Theorem GESD we
know that {ℂ}^{10}
will decompose into a direct sum of these eigenspaces, and we can restrict
T
to each part of this decomposition. At this stage we know that the
Jordan canonical form will be block diagonal with blocks of size
2,
3 and
5, since
the dimensions of the generalized eigenspaces are equal to the algebraic
multiplicities of the eigenvalues (Theorem DGES). The geometric multiplicities
tell us how many Jordan blocks occupy each of the three larger blocks,
but we will discuss this as we analyze each eigenvalue. We do not yet
know the index of each eigenvalue (though we can easily infer it for
λ = 2) and
even if we did have this information, it only determines the size of the largest
Jordan block (per eigenvalue). We will press ahead, considering each eigenvalue
one at a time.
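If you would like to check the computations that follow with software, SymPy can produce a Jordan canonical form directly. This is only a sketch: the 10 × 10 matrix A of this example is not reproduced here, so a small stand-in matrix is used, and SymPy's ordering of the blocks may differ from the convention of Definition JCF.

    import sympy as sp

    # Stand-in matrix; substitute the example's 10x10 matrix A to follow along.
    A = sp.Matrix([[2, 1, 0, 0],
                   [0, 2, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 0, 0]])

    # jordan_form returns (P, J) with A = P * J * P^{-1}
    P, J = A.jordan_form()

    print(J)                                  # the Jordan canonical form
    print(sp.simplify(P * J * P.inv() - A))   # zero matrix confirms the factorization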

The eigenvalue λ = 2
has “full” geometric multiplicity, and is not an impediment to diagonalizing
T.
We will treat it in full generality anyway. First we compute
the generalized eigenspace. Since Theorem GEK says that
G_T(2) = 𝒦((T − 2I_{ℂ^{10}})^{10}) we
can compute this generalized eigenspace as a null space derived from the matrix
A,

The restriction of T
to {G}_{T }\left (2\right )
relative to the two basis vectors above has a matrix representation that is a
2 × 2 diagonal matrix
with the eigenvalue λ = 2
as the diagonal entries. So these two vectors will be the first two vectors in our basis
for {ℂ}^{10},

Notice that it was not strictly necessary to compute the 10-th power of
A − 2{I}_{10}. With
{α}_{T }\left (2\right ) = {γ}_{T }\left (2\right ) the null space
of the matrix A − 2{I}_{10}
contains all of the generalized eigenvectors of
T for the
eigenvalue λ = 2.
But there was no harm in computing the 10-th power either. This
discussion is equivalent to the observation that the linear transformation
T|_{G_T(2)} − 2I_{G_T(2)}: G_T(2) → G_T(2) is nilpotent
of index 1. In other words, ι_T(2) = 1.
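A sketch of this computation in SymPy (the matrix A of the example is not reproduced here, so the usage line is only indicative):

    import sympy as sp

    def generalized_eigenspace_basis(A, lam):
        # Basis for the generalized eigenspace, computed as a null space via Theorem GEK.
        n = A.shape[0]
        return ((A - lam * sp.eye(n)) ** n).nullspace()

    # e.g. len(generalized_eigenspace_basis(A, 2)) should equal the algebraic multiplicity, here 2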

The eigenvalue λ = 0
will not be quite as simple, since the geometric multiplicity is
strictly less than the algebraic multiplicity. As before, we first
compute the generalized eigenspace. Since Theorem GEK says that
G_T(0) = 𝒦((T − 0I_{ℂ^{10}})^{10}) we
can compute this generalized eigenspace as a null space derived from the matrix
A,

So dim(G_T(0)) = 3 = α_T(0),
as expected. We will use these three basis vectors for the
generalized eigenspace to construct a matrix representation of
T{|}_{{G}_{T}\left (0\right )}, where
F is being defined
implicitly as the basis of {G}_{T }\left (0\right ).
We construct this representation as usual, applying Definition MR,
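One way to carry this out with SymPy is sketched below; the specific matrix A and basis vectors of F are not reproduced here, so treat this as a template. Since each A·f_j lies in the span of the columns of F, the pseudoinverse recovers its coordinates relative to F exactly.

    import sympy as sp

    def restriction_representation(A, F_basis):
        # Columns of F are the basis vectors of the invariant subspace (here, a basis of G_T(0)).
        F = sp.Matrix.hstack(*F_basis)
        # M satisfies A*F = F*M, so M = pinv(F)*A*F when the columns of F are independent.
        return F.pinv() * (A * F)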

By Theorem RGEN we can obtain a nilpotent matrix from this
matrix representation by subtracting the eigenvalue from the
diagonal elements, and then we can apply Theorem CFNLT to
M − (0)I_3. First check that
(M − (0)I_3)^2 = O, so we know that 2 is the
index of M − (0)I_3 as a nilpotent
matrix, and that therefore λ = 0
is an eigenvalue of T with index 2,
ι_T(0) = 2. To determine
a basis of ℂ^3
that converts M − (0)I_3
to canonical form, we need the null spaces of the powers of
M − (0)I_3. For
convenience, set N = M − (0)I_3.
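A small SymPy sketch of these two checks, finding the index and the null spaces of the powers (the 3 × 3 matrix M of this example is not reproduced here, so this is a template):

    import sympy as sp

    def nilpotency_index(N):
        # Smallest p with N**p equal to the zero matrix (None if N is not nilpotent).
        n = N.shape[0]
        for p in range(1, n + 1):
            if N ** p == sp.zeros(n, n):
                return p
        return None

    def null_space_chain(N):
        # Bases for the null spaces of N, N**2, N**3, ... up to the size of N.
        n = N.shape[0]
        return [(N ** p).nullspace() for p in range(1, n + 1)]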

Then we choose a vector from 𝒩(N^2) that is not an element of 𝒩(N^1).
Any vector with unequal first two entries will fit the bill, say

We need a vector to pair with z_{1,1}
that will make a basis for the two-dimensional subspace
𝒩(N^1). Examining the basis for 𝒩(N^1)
we see that a vector with its first two entries equal will do the job.

Now we add back the eigenvalue λ = 0
to the representation of N to
obtain a representation for M.
Of course, with an eigenvalue of zero, the change is not apparent, so we won’t display
the same matrix again. This is the second block of the Jordan canonical form for
T. However, the
three vectors in C
will not suffice as basis vectors for the domain of
T — they have the wrong
size! The vectors in C
are vectors in the domain of a linear transformation defined by the matrix
M. But
M was a matrix
representation of T{|}_{{G}_{T}\left (0\right )} − 0{I}_{{G}_{T}\left (0\right )}
relative to the basis F
for {G}_{T }\left (0\right ).
We need to “uncoordinatize” each of the basis vectors in
C to produce a linear
combination of vectors in F
that will be an element of the generalized eigenspace
{G}_{T }\left (0\right ).
These will be the next three vectors of our final answer, a basis for
{ℂ}^{10} that
has a pleasing matrix representation.
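To be concrete about the un-coordinatizing step: if we label the basis as F = {f_1, f_2, f_3} (labels introduced here for illustration), then a vector of C with entries z_1, z_2, z_3 corresponds, by Definition VR, to the vector

    z_1 f_1 + z_2 f_2 + z_3 f_3 ∈ G_T(0) ⊆ ℂ^{10}.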

Five down, five to go. Basis vectors, that is.
λ = −1 is
the smallest eigenvalue, but it will require the most computation. First
we compute the generalized eigenspace. Since Theorem GEK says that
G_T(−1) = 𝒦((T − (−1)I_{ℂ^{10}})^{10}) we
can compute this generalized eigenspace as a null space derived from the matrix
A,

So dim(G_T(−1)) = 5 = α_T(−1),
as expected. We will use these five basis vectors for the
generalized eigenspace to construct a matrix representation of
T{|}_{{G}_{T}\left (−1\right )}, where
F
is being recycled and defined now implicitly as the basis of
{G}_{T }\left (−1\right ). We
construct this representation as usual, applying Definition MR,

By Theorem RGEN we can obtain a nilpotent matrix from this
matrix representation by subtracting the eigenvalue from the
diagonal elements, and then we can apply Theorem CFNLT to
M − (−1)I_5. First check that
(M − (−1)I_5)^3 = O, so we know that 3 is the
index of M − (−1)I_5 as a nilpotent
matrix, and that therefore λ = −1
is an eigenvalue of T with index 3,
ι_T(−1) = 3. To determine
a basis of ℂ^5
that converts M − (−1)I_5
to canonical form, we need the null spaces of the powers of
M − (−1)I_5. Again, for
convenience, set N = M − (−1)I_5.

Then we choose a vector from 𝒩(N^3)
that is not an element of 𝒩(N^2). The
four basis vectors for 𝒩(N^2)
sum to a vector with all five entries equal to 1. We will mess with the first entry to create a
vector not in 𝒩(N^2),

We have a basis for the two-dimensional subspace
𝒩(N^1), and we can add to it the
vector z_{2,1} to obtain three
of the four basis vectors for 𝒩(N^2).
These three vectors span the subspace we call
Q_2. We need a fourth
vector outside of Q_2
to complete a basis of the four-dimensional subspace
𝒩(N^2).
Check that the vector

is an element of 𝒩(N^2) that lies
outside of the subspace Q_2.
This vector was constructed by getting a nice basis for
Q_2 and
forming a linear combination of this basis that specifies three of the five entries of the
result. Of the remaining two entries, one was changed to move the vector outside
of Q_2,
and this was followed by a change to the remaining entry to place the vector into
𝒩(N^2). The
vector z_{2,2}
is the lone basis vector for the subspace we call
R_2.

The remaining two basis vectors are easy to come by. They are the result of
applying N
to each of the two most recently determined basis vectors,

The basis C
yields a pleasant matrix representation for the restriction of the linear transformation
T − (−1)I to the generalized
eigenspace {G}_{T }\left (−1\right ).
However, we must remember that these vectors in
{ℂ}^{5} are representations
of vectors in {ℂ}^{10}
relative to the basis F.
Each needs to be “un-coordinatized” before joining our final basis. Here we
go,

If you are not inclined to check all of these computations, here are
a few that should convince you of the amazing properties of the basis
B. Compute the
matrix-vector products A{v}_{i},
1 ≤ i ≤ 10.
In each case the result will be a vector of the form
λ{v}_{i} + δ{v}_{i−1}, where
λ is one
of the eigenvalues (you should be able to predict ahead of time which one) and
δ ∈\left \{0, 1\right \}.

Alternatively, if we can write inputs to the linear transformation
T as linear combinations
of the vectors in B (which
we can do uniquely since B
is a basis, Theorem VRRB), then the “action” of
T is reduced to
a matrix-vector product with the exceedingly simple matrix that is the Jordan canonical
form. Wow! ⊠

Subsection CHT: Cayley-Hamilton Theorem

Jordan was a French mathematician who was active in the late 1800’s. Cayley
and Hamilton were 19th-century contemporaries of Jordan from Britain. The
theorem that bears their names is perhaps one of the most celebrated in
basic linear algebra. While our result applies only to vector spaces and
linear transformations with scalars from the set of complex numbers,
ℂ,
the result is equally true if we restrict our scalars to the real numbers,
ℝ. It
says that every matrix satisfies its own characteristic polynomial.

Proof Suppose B and
C are similar matrices
via the matrix S,
B = {S}^{−1}CS, and
q(x) is any
polynomial. Then q\left (B\right )
is similar to q\left (C\right )
via S,
q\left (B\right ) = {S}^{−1}q\left (C\right )S. (See
Example HPDM for hints on how to convince yourself of this.)
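For instance, for the monomial q(x) = x^2, the inverses telescope:

    B^2 = (S^{−1}CS)(S^{−1}CS) = S^{−1}C(SS^{−1})CS = S^{−1}C^2S,

and the same argument handles any power of B, and then any linear combination of powers.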

By Theorem JCFLT and Theorem SCB we know
A is similar to a matrix,
J, in Jordan canonical
form. Suppose λ_1, λ_2, λ_3, …, λ_m are the
distinct eigenvalues of A
(and are therefore the eigenvalues and diagonal entries of
J).
Then by Theorem EMRCP and Definition AME, we can factor the characteristic
polynomial as
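(a reconstruction of the elided factorization; there may be a leading sign, depending on the convention used for the characteristic polynomial)

    p_A(x) = (x − λ_1)^{α_A(λ_1)} (x − λ_2)^{α_A(λ_2)} ⋯ (x − λ_m)^{α_A(λ_m)}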

The matrix J − {λ}_{k}I
will be block diagonal, and the block arising from the generalized eigenspace for
{λ}_{k} will
have zeros along the diagonal. Suitably adjusted for matrices (rather than linear
transformations), Theorem RGEN tells us this matrix is nilpotent. Since
the size of this nilpotent matrix is equal to the algebraic multiplicity of
{λ}_{k}, the
power {\left (J − {λ}_{k}I\right )}^{{α}_{A}\left ({λ}_{k}\right )}
will be a zero matrix (Theorem KPNLT) in the location of this block.

Repeating this argument for each of the
m
eigenvalues will place a zero block in some term of the product at every location
on the diagonal. The entire product will then be zero blocks on the diagonal, and
zero off the diagonal. In other words, it will be the zero matrix. Since
A and
J are
similar, {p}_{A}\left (A\right ) = {p}_{A}\left (J\right ) = O.
■
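A quick computational check of the theorem with SymPy, on a small illustrative matrix (any square matrix will do):

    import sympy as sp

    x = sp.symbols('x')
    A = sp.Matrix([[2, 1, 0],
                   [0, 2, 0],
                   [0, 0, -1]])

    p = A.charpoly(x)           # characteristic polynomial of A
    coeffs = p.all_coeffs()     # coefficients, highest degree first
    n = len(coeffs) - 1

    # Evaluate the polynomial at the matrix A itself
    pA = sp.zeros(*A.shape)
    for k, c in enumerate(coeffs):
        pA += c * A ** (n - k)

    print(pA)                   # the zero matrix, as the Cayley-Hamilton theorem promises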

Annotated Acronyms R: Representations

Definition VR
Matrix representations build on vector representations, so this is the definition
that gets us started. A representation depends on the choice of a single basis for
the vector space. Theorem VRRB is what tells us this idea might be
useful.

Theorem VRILT
As an invertible linear transformation, vector representation allows
us to translate, back and forth, between abstract vector spaces
(V ) and concrete
vector spaces ({ℂ}^{n}).
This is key to all our notions of representations in this chapter.

Theorem CFDVS
Every vector space with finite dimension “looks like” a vector space of column
vectors. Vector representation is the isomorphism that establishes that these
vector spaces are isomorphic.

Definition MR
Building on the definition of a vector representation, we define a representation of
a linear transformation, determined by a choice of two bases, one for the domain
and one for the codomain. Notice that vectors are represented by columnar lists of
scalars, while linear transformations are represented by rectangular tables of
scalars. Building a matrix representation is as important a skill as row-reducing a
matrix.

Theorem FTMR
Definition MR is not really very interesting until we have this theorem. The
second form tells us that we can compute outputs of linear transformations via
matrix multiplication, along with some bookkeeping for vector representations.
Searching forward through the text on “FTMR” is an interesting exercise.
You will find reference to this result buried inside many key proofs at
critical points, and it also appears in numerous examples and solutions to
exercises.

Theorem MRCLT
Turns out that matrix multiplication is really a very natural operation: it is just
the chaining together (composition) of functions (linear transformations).
Beautiful. Even if you don’t try to work the problem, study Solution MR.T80 for
more insight.

Theorem KNSI
Kernels “are” null spaces. For this reason you’ll see these terms used
interchangeably.

Theorem RCSI
Ranges “are” column spaces. For this reason you’ll see these terms used
interchangeably.

Theorem NME9
The NMEx series has always been important, but we’ve held off saying so until
now. This is the end of the line for this one, so it is a good time to contemplate all
that it means.

Theorem SCB
Diagonalization back in Section SD was really a change of basis to achieve a
diagonal matrix representation. Maybe we should be highlighting the more general
Theorem MRCB here, but its overly technical description just isn’t as appealing.
However, it will be important in some of the matrix decompositions in
Chapter MD.

Theorem EER
This theorem, with the companion definition, Definition EELT, tells us that
eigenvalues, and eigenvectors, are fundamentally a characteristic of linear
transformations (not matrices). If you study matrix decompositions in
Chapter MD you will come to appreciate that almost all of a matrix’s secrets can
be unlocked with knowledge of the eigenvalues and eigenvectors.

Theorem OD
Can you imagine anything nicer than an orthonormal diagonalization? A basis
of pairwise orthogonal, unit norm, eigenvectors that provide a diagonal
representation for a matrix? Here we learn just when this can happen —
precisely when a matrix is normal, which is a disarmingly simple property to
define.

Theorem CFNLT
Nilpotent linear transformations are the fundamental obstacle to a matrix (or
linear transformation) being diagonalizable. This specialized representation
theorem is the fundamental expression of just how close we can come
to surmounting the obstacle, i.e. how close we can come to a diagonal
representation.

Theorem DGES
This theorem is a long time in coming, but perhaps it best explains our interest in
generalized eigenspaces. When the dimension of a “regular” eigenspace (the
geometric multiplicity) does not meet the algebraic multiplicity of the
corresponding eigenvalue, then a matrix is not diagonalizable (Theorem DMFE).
However, if we generalize the idea of an eigenspace (Definition GES), then we
arrive at invariant subspaces that together give a complete decomposition of the
domain as a direct sum. And these subspaces have dimensions equal to the
corresponding algebraic multiplicities.

Theorem JCFLT
If you can’t diagonalize, just how close can you come? This is an answer
(there are others, like rational canonical form). “Canonicalism” is in the
eye of the beholder. But this is a good place to conclude our study of a
widely accepted canonical form that is possible for every matrix or linear
transformation.