Mathematics for the interested outsider

Today I just want to point out a dual proposition to the one I refined last week. At that time we stated that any linearly independent set can be expanded to a basis. This followed from the fact that a basis is a maximal linearly independent set. But a basis is also a minimal spanning set, and that will lead to our flipped result.

First, a basis must be a spanning set, and if we remove any vector it can no longer be a spanning set. For if the remaining vectors spanned, then the removed vector could be written as a finite linear combination of them, and this would contradict the linear independence of the basis.

On the other hand, if a spanning set is minimal it must be linearly independent. For if it were not, we could write out a nontrivial finite linear combination

For some index — say — we have . Then we can rearrange this to write

and so we may omit the vector from the set and the remaining vectors still span. This contradicts the assumed minimality of the original spanning set.

So a basis is a minimal spanning set. This leads us to the proposition that any spanning set may be narrowed to a basis. And the proof is again the same technique we used to show that every vector space has a basis. It’s just that this time we flip the whole thing over. Now we consider the set of subsets of our spanning set which span the vector space, and we use Zorn’s lemma to show that this must contain a minimal spanning set. This will then be a basis contained in our original spanning set.

Of course, as usual, in the finite-dimensional case we don’t need Zorn’s lemma, so the squeamish can relax in that case.

So let’s take this and consider a linear transformation . The first isomorphism theorem says we can factor as a surjection followed by an injection . We’ll just regard the latter as the inclusion of the image of as a subspace of . As for the surjection, it must be the linear map , just as in any abelian category. Then we can set up the short exact sequence

So where do “rank” and “nullity” come in? Well, these are just jargon terms. The “rank” of a linear transformation is the dimension of its image — not the target vector space, mind you, but the subspace of vectors of the form . That is, it’s the size of the largest set of linearly independent vectors in the image of the transformation. The “nullity” is the dimension of the kernel — the largest number of linearly independent vectors that sends to the zero vector in .

So what does the direct sum decomposition above mean? It tells us that there is a basis of which is in bijection with the disjoint union of a basis for and a basis for . In the finite-dimensional case we can take cardinalities and say that the dimension of is the sum of the dimensions of and . Or, to use our new words, the dimension of is the sum of the rank and the nullity of . Thus: the rank-nullity theorem.

Now that we know the splitting lemma, we can show that every short exact sequence of vector spaces splits!

To see this, we’ll need to refine an earlier result. Remember how we showed that every vector space has a basis. We looked for maximal linearly independent sets and used Zorn’s lemma to assert that they existed.

Here’s how we’re going to refine this result: start with a collection of linearly independent vectors. Then we don’t just look for any maximal collection, but specifically for a maximal collection containing . Clearly if we have a linearly independent set containing which is maximal among such sets, it is also maximal among all linearly independent sets — it is a basis. On the other hand, the previous argument (with Zorn’s lemma) says that such a maximal linearly independent set must exist.

What does this mean? It says that any linearly independent set can be completed to a basis. If we start with the empty set (which is trivially linearly independent) then we get a basis, just like before. So we recover the same old result as before.

But look what we can do now! Take a short exact sequence

and pick any basis of (notice that we’re using a generic, possibly infinite, index set). Now hit this basis with to get with . I say that this is a linearly independent set in .

Why is this? Well, let’s say that there’s a linear combination (only finitely many of the can be nonzero here). This linear combination is in the image of , since we can write it as . But exactness tells us that is injective, and so we have . But then all the have to vanish, since the form a basis!

So we’ve got a linearly independent set in . We now complete this to a maximal linearly independent set with . This is now a basis for , which contains the image of all the basis elements from .

Now turn this around and define a linear transformation by specifying its values on this basis. Set for and for . Then the composition is the identity on (check it on our basis), and so the sequence splits, as we said.

Notice here that all the Zorniness only matters for infinite-dimensional vector spaces. Everything we’ve done here works in without ever having to worry about such set-theoretic problems. However, given my politics I have no problem with using the Axiom of Choice when push comes to shove. I’m just pointing this out in case you’re the squeamish sort.

Evidently I never did this one when I was talking about abelian categories. Looks like I have to go back and patch this now.

We start with a short exact sequence:

A large class of examples of such sequences are provided by the split-exact sequences:

where these arrows are those from the definition of the biproduct. But in this case we’ve also got other arrows: and that satisfy certain relations.

The lemma says that we can go the other direction too. If we have one arrow so that then everything else falls into place, and . Similarly, a single arrow so that will “split” the sequence. We’ll just prove the first one, since the second goes more or less the same way.

Just like with diagram chases, we’re going to talk about “elements” of objects as if the objects are abelian groups. Of course, we don’t really mean “elements”, but the exact same semantic switch works here.

So let’s consider an element and write it as . Clearly lands in . We can also check

so . That is, any element of can be written as the sum of an element of and an element of . But these two intersect trivially. That is, if and then , and so . This shows that . Thus we can write every uniquely as .

Now consider an element . By exactness, there must be some so that . That is, we have a unique with . This shows that . It’s straightforward to show that also . Thus we have split the sequence: .

To show this, we must show that it is full, faithful, and essentially surjective. The first two conditions say that given natural numbers and the linear transformations and the matrices over are in bijection.

But this is just what we’ve been showing! The vector spaces of -tuples come with their canonical bases, and given these bases every linear transformation gets a uniquely-defined matrix. Conversely, every matrix defines a unique linear transformation when we’ve got the bases to work with. So fullness and faithfulness are straightforward.

Now for essential surjectivity. This says that given any finite-dimensional vector space we have some so that . But we know that every vector space has a basis, and for it must be finite; that’s what “finite-dimensional” means! Let’s say that we’ve got a basis consisting of vectors.

Now we just line up the canonical basis of and define linear transformations by and . Remember that we can define a linear transformation by specifying its values on a basis (which can all be picked independently) and then extending by linearity. Thus we do have two well-defined linear transformations here. But just as clearly we see that for any we have

and a similar equation holds for every -tuple in . Thus and are inverses of each other, and are the isomorphism we need.

This tells us that the language of linear transformations between finite-dimensional vector spaces is entirely equivalent to that of matrices. But we gain some conceptual advantages by thinking in terms of finite-dimensional vector spaces. One I can point to right here is how we can tell the difference between a vector space and its dual. Sure, they’ve got the same dimension, and so there’s some isomorphism between them. Still, when we’re dealing with both at the same time they behave differently, and it’s valuable to keep our eye on that difference.

On the other hand, there are benefits to matrices. For one thing, we can actually write them down and calculate with them. A lot of people are — surprise! — interested in using mathematics to solve problems. And the problems that linear algebra is most directly applicable to are naturally stated in terms of matrices.

What the theorem tells us is that none of this matters. We can translate problems from the category of matrices to the category of vector spaces and back, and nothing is lost in the process.

Now here’s the really important thing: There’s a functor that assigns the finite-dimensional vector space of -tuples of elements of to each object of . Such a vector space of -tuples comes with the basis , where the vector has a in the th place and a elsewhere. In matrix notation:

and so on. We can write (remember the summation convention), so the vector components of the basis vectors are given by the Kronecker delta. We will think of other vectors as column vectors.

Given a matrix we clearly see a linear transformation from to . Given a column vector with components (where the index satisfies ), we construct the column vector (here ). But we’ve already established that matrix multiplication represents composition of linear transformations. Further, it’s straightforward to see that the linear transformation corresponding to a matrix is the identity on (depending on the range of the indices on the Kronecker delta). This establishes that we really have defined a functor.

But wait, there’s more! The functor is linear over , so it’s a functor enriched over . The Kronecker product of matrices corresponds to the monoidal product of linear transformations, so the functor is monoidal, too. Following the definitions, we can even find that our functor preserves duals.

So we’ve got a functor from our category of matrices to the category of finite-dimensional vector spaces, and it preserves all of the relevant structure.

We define the monoidal product on objects by multiplication — — and on morphisms by using the Kronecker product. That is, if we have an matrix and an matrix , then we get the Kronecker product

Here we have to be careful about what we’re saying. In accordance with our convention, the pair of indices (with and ) should be considered as the single index . It’s clear that this quantity then runs between and . A similar interpretation goes for the index pairs .

Of course, we need some relations for this to be a monoidal structure. Strict associativity is straightforward:

For our identity object, we naturally use , with its identity morphism . Note that the first of these is the object the natural number , while the second is the matrix whose single entry is the field element . Then we can calculate the Kronecker product to find

and so strict associativity holds as well.

The category of matrices also has duals. In fact, each object is self-dual! That is, we set . We then need our arrows and .

The morphism will be a matrix. Specifically, we’ll use , with and both running between and . Again, we interpret an index pair as described above. The symbol is another form of the Kronecker delta, which takes the value when its indices agree and when they don’t.

Similarly, will be an matrix: , using yet another form of the Kronecker delta.

Now we have compatibility relations. Since the monoidal structure is strict, these are simpler than usual:

But now all the basic matrices in sight are various Kronecker deltas! The first equation reads

which is true. You should be able to verify the second one similarly.

The upshot is that we’ve got the structure of a monoidal category with duals on .

What we’ve been building up to is actually the definition of a category. Given a field we define the category of matrices over .

Most of our other categories have been named after their objects — groups are the objects of , commutative monoids are the objects of , and so on — but not here. In this case, matrices will be the morphisms, and the category of matrices illustrates in a clearer way than any we’ve seen yet how similar categories are to other algebraic structures that are usually seen as simpler and more concrete.

Down to business: the objects of will be the natural numbers , and the morphisms in are the matrices. That is, a morphism is a collection of field elements where runs from to and runs from to .

The category of matrices is actually enriched over the category of vector spaces over . This means that each set of morphisms is actually a vector space over . Specifically, we add matrices of the same dimensions and multiply matrices by scalars component-by-component.

We have yet to speak very clearly about identities. The axioms of an enriched category state that for each object (natural number) there must be a linear function . Because of linearity, this function is completely determined by its value at : . We must pick this matrix so that it acts as an identity for matrix multiplication, and we choose the Kronecker delta for this purpose: . That is, we use an matrix whose entries are if the indices are equal and otherwise. It’s straightforward to check that this is indeed an identity.

Other properties I’ve skipped over, but which aren’t hard to check, are that matrix multiplication is bilinear and associative. Both of these are straightforward once written out in terms of the summation convention; sometimes deceptively so. For example, the associativity condition reads . But remember that there are hidden summation signs in here, so it should really read:

so there’s an implicit change in the order of summation here. Since we’re just doing finite sums, this is no problem, but it’s still worth keeping an eye on.

About this weblog

This is mainly an expository blath, with occasional high-level excursions, humorous observations, rants, and musings. The main-line exposition should be accessible to the “Generally Interested Lay Audience”, as long as you trace the links back towards the basics. Check the sidebar for specific topics (under “Categories”).

I’m in the process of tweaking some aspects of the site to make it easier to refer back to older topics, so try to make the best of it for now.