Post navigation

Computing Homology

Update: the mistakes made in the code posted here are fixed and explained in a subsequent post (one minor code bug was fixed here, and a less minor conceptual bug is fixed in the linked post).

In our last post in this series on topology, we defined the homology group. Specifically, we built up a topological space as a simplicial complex (a mess of triangles glued together), we defined an algebraic way to represent collections of simplices called chains as vectors in a vector space, we defined the boundary homomorphism as a linear map on chains, and finally defined the homology groups as the quotient vector spaces

.

The number of holes in was just the dimension of this quotient space.

In this post we will be quite a bit more explicit. Because the chain groups are vector spaces and the boundary mappings are linear maps, they can be represented as matrices whose dimensions depend on our simplicial complex structure. Better yet, if we have explicit representations of our chains by way of a basis, then we can use row-reduction techniques to write the matrix in a standard form.

Of course the problem arises when we want to work with two matrices simultaneously (to compute the kernel-mod-image quotient above). This is not computationally any more difficult, but it requires some theoretical fiddling. We will need to dip a bit deeper into our linear algebra toolboxes to see how it works, so the rusty reader should brush up on their linear algebra before continuing (or at least take some time to sort things out if or when confusion strikes).

Two Big Matrices

Recall our example simplicial complex from last time.

We will compute of this simplex (which we saw last time was ) in a more algorithmic way than we did last time.

Once again, we label the vertices 0-4 so that the extra “arm” has vertex 4 in the middle, and its two endpoints are 0 and 2. This gave us orientations on all of the simplices, and the following chain groups. Since the vertex labels (and ordering) are part of the data of a simplicial complex, we have made no choices in writing these down.

Now given our known definitions of as an alternating sum from last time, we can give a complete specification of the boundary map as a matrix. For , this would be

,

where the row labels are the basis for and the column labels are the basis for . Similarly, is

The reader is encouraged to check that these matrices are written correctly by referring to the formula for as given last time.

Remember the crucial property of , that . Indeed, the composition of the two boundary maps just corresponds to the matrix product of the two matrices, and one can verify by hand that the above two matrices multiply to the zero matrix.

We know from basic linear algebra how to compute the kernel of a linear map expressed as a matrix: column reduce and inspect the columns of zeros. Since the process of row reducing is really a change of basis, we can encapsulate the reduction inside a single invertible matrix , which, when left-multiplied by , gives us the reduced form of the latter. So write the reduced form of as .

However, now we’re using two different sets of bases for the shared vector space involved in and . In general, it will no longer be the case that . The way to alleviate this is to perform the “corresponding” change of basis in . To make this idea more transparent, we return to the basics.

Changing Two Matrices Simultaneously

Recall that a matrix represents a linear map between two vector spaces . The actual entries of depend crucially on the choice of a basis for the domain and codomain. Indeed, if form a basis for and for , then the -th column of the matrix representation is defined to be the coefficients of the representation of in terms of the . We hope to have nailed this concept down firmly in our first linear algebra primer.

Recall further that row operations correspond to changing a basis for the codomain, and column operations correspond to changing a basis for the domain. For example, the idea of swapping columns in gives a new matrix which is the representation of with respect to the (ordered) basis for which swaps the order of . Similar things happen for all column operations (they all correspond to manipulations of the basis for ), while analogously row operations implicitly transform the basis for the codomain. Note, though, that the connection between row operations and transformations of the basis for the codomain are slightly more complicated than they are for the column operations. We will explicitly see how it works later in the post.

And so if we’re working with two maps and , and we change a basis for in via column reductions, then in order to be consistent, we need to change the basis for in via “complementary” row reductions. That is, if we call the change of basis matrix , then we’re implicitly sticking in between the composition to get . This is not the same map as , but we can make it the same map by adding a in the right place:

Indeed, whenever is a change of basis matrix so is (trivially), and moreover the operations that performs on the columns of are precisely the operations that performs on the rows of (this is because elementary row operations take different forms when multiplied on the left or right).

Coming back to our boundary operators, we want a canonical way to view the image of as sitting inside the kernel of . If we go ahead and use column reductions to transform into a form where the kernel is easy to read off (as the columns consisting entirely of zeroes), then the corresponding row operations, when performed on will tell us exactly the image of inside the kernel of .

This last point is true precisely because . This fact guarantees that the irrelevant rows of the reduced version of are all zero.

Let’s go ahead and see this in action on our two big matrices above. For , the column reduction matrix is

And the product is

Now the inverse of , which is the corresponding basis change for , is

and the corresponding reduced form of is

As a side note, we got these matrices by slightly modifying the code from our original post on row reduction to output the change of basis matrix in addition to performing row reduction. It turns out one can implement column reduction as row reduction of the transpose, and the change of basis matrix you get from this process will be the transpose of the change of basis matrix you want (by ). Though the code is particularly ad-hoc, we include it with the rest of the code used in this post on this blog’s Github page.

Now let’s inspect the two matrices and more closely. The former has four “pivots” left over, and this corresponds to the rank of the matrix being 4. Moreover, the four basis vectors representing the columns with nonzero pivots, which we’ll call (we don’t care what their values are), span a complementary subspace to the kernel of . Hence, the remaining four vectors (which we’ll call ) span the kernel. In particular, this says that the kernel has dimension 4.

On the other hand, we performed the same transformation of the basis of for . Looking at the matrix that resulted, we see that the first four rows and the last row (representing ) are entirely zeros and so the image of intersects their span trivially. and the remaining three rows (representing ) have nonzero pivots. This tells us exactly that the image of is spanned by .

And now, the coup de grâce, the quotient to get homology is simply

And the dimension of the homology group is 1, as desired.

The General Algorithm

It is no coincidence that things worked out at nicely as they did. The process we took of simultaneously rewriting two matrices with respect to a common basis is the bulk of the algorithm to compute homology. Since we’re really only interested in the dimensions of the homology groups, we just need to count pivots. If the number of pivots arising in is and the number of pivots arising in is , and the dimension of is , then the dimension is exactly

And it is no coincidence that the pivots lined up so nicely to allow us to count dimensions this way. It is a minor exercise to prove it formally, but the fact that the composition implies that the reduced version of will have an almost reduced row-echelon form (the only difference being the rows of zeros interspersed above, below, and possibly between pivot rows).

As the reader may have guessed at this point, we don’t actually need to compute and . Instead of this, we can perform the column/row reductions simultaneously on the two matrices. The above analysis helped us prove the algorithm works, and with that guarantee we can throw out the analytical baggage and just compute the damn thing.

Indeed, assuming the input is already processed as two matrices representing the boundary operators with respect to the standard bases of the chain groups, computing homology is only slightly more difficult than row reducing in the first place. Putting our homology where our mouth is, we’ve implemented the algorithm in Python. As usual, the entire code used in this post is available on this blog’s Github page.

The first step is writing auxiliary functions to do elementary row and column operations on matrices. For this post, we will do everything in numpy (which makes the syntax shorter than standard Python syntax, but dependent on the numpy library).

This more or less parallels the standard algorithm for row-reduction (with the caveat that all the indices are swapped to do column-reduction). The only somewhat confusing line is the call to rowCombine, which explicitly realizes the corresponding row operation as the inverse of the performed column operation. Note that for row operations, the correspondence between operations on the basis and operations on the rows is not as direct as it is for columns. What’s given above is the true correspondence. Writing down lots of examples will reveal why, and we leave that as an exercise to the reader.

Then the actual algorithm to compute homology is just a matter of counting pivots. Here are two pivot-counting functions in a typical numpy fashion:

And there we have it! We’ve finally tackled the beast, and written a program to compute algebraic features of a topological space.

The reader may be curious as to why we didn’t come up with a more full-bodied representation of a simplicial complex and write an algorithm which accepts a simplicial complex and computes all of its homology groups. We’ll leave this direct approach as a (potentially long) exercise to the reader, because coming up in this series we are going to do one better. Instead of computing the homology groups of just one simplicial complex using by repeating one algorithm many times, we’re going to compute all the homology groups of a whole family of simplicial complexes in a single bound. This family of simplicial complexes will be constructed from a data set, and so, in grandiose words, we will compute the topological features of data.

If it sounds exciting, that’s because it is! We’ll be exploring a cutting-edge research field known as persistent homology, and we’ll see some of the applications of this theory to data analysis.

Post navigation

28 thoughts on “Computing Homology”

This post is really amazing. I liked other entries too.
On your comment related to complexity of evaluating homotopy groups, I would like to share this post which talked about latest best SODA 2013 paper on providing a way to get homotopy groups.

I’ve been following these results, and it appears there are also some hardness theorems. In particular, computing the n-th homotopy group for a general simply connected space Y is #P-hard (here is variable as well as ). So unfortunately that means, while it is possible to compute fundamental groups, there is no efficient algorithm to compute them. See this page for more (Marek Krcal’s page, the author of the SODA paper).

But even coming up with a decidable algorithm to do it in the first place is wonderful. Even most uninteresting problems in group theory (esp the kind that you see when you talk about fundamental groups, group presentations, etc.) are known to be undecidable, so coming up with a decidable algorithm to compute something important is a treasured accomplishment indeed. I want to get this series to persistent homology first, since I’m much more familiar with that, and perhaps then I can return to computing homotopy.

Great you are already on top of it.
Sure I would wait for your post on Persistent Homology. There I am following Ayasadi Corporation and their interesting algorithms to represent complex data in terms of complexes and running persistent over it.

This survey of Carlsson is the best overview, but it completely bypasses the details of computing the things. For that you have to go to the original paper (which I’ve found needs some additional deciphering).

I’ve looked around for internet resources that are not original research papers, but most surveys are IMO too broad, skip the important parts in favor of the more advanced sections, or copy directly from the original paper w.r.t. the finer details. There are full implementations out there, but they’re generally written in C++ and I find them difficult to follow and use. I intend to have a fully explicit (read: implemented program) explanation of the algorithm, and I’d like to boil away all of the more advanced details (at least at first) to get a clear idea of how the basic algorithm works. I have already implemented the algorithm in Python, so hopefully my post will give people the chance to experiment with small-dimensional examples before using someone else’s C++ implementation or making their own.

Jeremy, thanks for linking to Perseus. Yes, the source is in C++ and — I will readily confess — annoying/impossible to read. But what on Earth did you find tough about using it?

In any case, I’ve recently finished co-writing a survey article of the type that kz is asking for and whose lack you appear to be lamenting. There are explicit examples, connections with PL topology, and how-to guides for computing (persistent) homology from boundary matrix data. The last section highlights some recent biological applications, but of course it can be completely ignored given the current context.

Your (your team’s?) implementation has the honorable distinction of being an industrial-strength solution. I’m looking for an educational implementation, and Perseus simply has too many knobs and dials for someone to just jump in and start playing. For instance, I would like to just have a function that accepts a dataset as input and produces a set of barcodes as output. It doesn’t appear (from your documentation at least) that there are naive defaults to do this. That means I have to learn what the options are and pick appropriate ones, and format my input file correctly (and install Matlab or roll my own script to view the output). Even if my naive program is much slower (e.g., computing all possible epsilon transitions in the V-R complex to figure out how big the filtration is), I only intend to use it on toy examples to explain how the algorithm works.

In fact, I think if I make it to the more interesting applications of this algorithm on large datasets, then I will ditch my version and use the full power of Perseus. I believe my readers will have more of an incentive to do the same after such an experience than if they were looking at Perseus from a blank slate. Or at least they will be better equipped to decide for themselves.

1) Can this same algorithm be used to on the edge cases? I’m fairly sure that the 2nd Betti number is found using
d3 = [[-1],[1],[-1],[1]]
and consequently is 0. However, for the zeroth Betti number, can I define
d0 = [[0,0,0,0,0]]
and find the corresponding 0th Betti number? Plugging this in blindly yields 0. This is supposed to be the number of connected components, which I would guess should be 1. Similarly, can I say
d4 = [[0]]
and find that the third Betti number is 0? This result seems reasonable, but I used the same method that said there should be 0 connected components.

2) Is there a typo on line 18 of simultaneousReduce – should it be i += 1 to move to the next row?

3) I’m trying to repeat the calculation for the given triangulation of the torus. I’m worried I might have messed up the orientations. I used the fact that d_1 . d_2 = 0 as a cross check and I think I sorted out all of those issues, but perhaps something snuck through. In particular, I’m not sure about the orientations in the d_1 Anyways, I calculated as the Betti number from p1 and p2 (I’m not sure whether this is the 1st or 2nd Betti number) to be 0, while from I’ve read I believe it should be 2. For what its worth, I found the three Betti numbers (following what I did above) to be 0,0,1, and at least the last of those I believe to be correct.

For reference, here are the matrices d_1 and d_2 that I found. I numbered the vertices left to right, bottom to stop (starting from bottom left) so that all together (included identified vertices) the numbering looks like
0120
6786
3454
0120
After this, I followed the same lexicographic ordering for the edges and vertices that you did in the article. For reference, here are the resulting matrices d_1 and d_2 I came up with:

Is it something you choose when constructing the complex? as in, gluing this simplex in this way or that way? or is it something chosen for you by how you numbered the vertices, just to keep everything algebraically consistent?

It is both, in a way. Say you just have a single 2-dimensional simplex with vertices abc. By writing the vertices in that order I’m implying an orientation: a->b, b->c, and a->c, so that if there are any two edges you want to know the order of you look at their lexicographic ordering. But I could have just as easily picked the ordering of vertices differently, or (and this is the part you might be wondering about) imposed a different “convention” on how to get from a list of vertices to an orientation on the edges, say I could go backwards in the list. But it can’t be *any* convention, it has to be one which satisfies the transitivity property. So if a->b and b->c, then a->c is required. If you instead chose a->b and c->b, then you could pick either a->c or c->a and transitivity would hold.

If you draw lots of these pictures you’ll realize that we’re just asking for one particular orientation, but we don’t care if we rotate or reflect it across some axis. And of course, the homology does not depend on the choice or orientation.

In the file of your example, you use for the boundary operator C_3(X) —> C_2(X) the matrix [[-1],[1],[-1],[1]], while in the example your differential is zero. But you don’t have 3-cells at all in the online-example. So I was thinking, why? (I tried with, say, [[0],[0],[0],[0]] and got the homology of the (Z^1,Z^1,Z^1).)

Yes! I believe this would be a straightforward extension of the algorithm to keep track of which elementary row operations were performed at each step. I encourage you to fork the repository on github and implement this! Feel free to submit a pull request if you do.

The inputs to a matrix operator are linear combinations of the vectors represented by columns (i.e., the index of a column represents one basis vector in the domain), and the outputs are linear combinations of (vectors represented by indices of) the rows. That’s why I didn’t say “row” in the sentence you’re referring to, just the index of the row, i.e. .

How would you suggest keep tracking of the kernal elements that do not bound after the correction you posted in your second post? There you are no longer doing anything on the kernal but rather on the image and with the rwo operations that you do at the boundary_n+1 you lose the alignment one needs in order to know the kernal elements that do not bound.

IIRC I made a shortcut here, and I’m doing everything over the rationals. If you really want to maintain integers, you can, but you have to do a GCD computation instead of a simple division. Cf. the Smith Normal Form https://en.wikipedia.org/wiki/Smith_normal_form

I’m currently working on my Masters thesis, and part of it involves computing homology over some large complexes derived from images. I’m having trouble with the tractability of this implementation. In particular, the biggest slow down is the need to keep track of the specific row operations (so that I can perform them on the other boundary matrix, and so that I can understand the basis to read off the simplices involved in the holes we discover), which means I need to implement the reduction myself and can’t fall back on the highly optimized BLAS.

Any suggestions for tackling things? Feel free to reach out to me by email if you want more details.

Great post! I’ve a question, I don’t understand the orientation of the incidence matrices. Probably I’m not numbering correctly the vertices. It seems to me that with the orientation given, two adjacent faces might not have compatible orientations.
Given \delta_1 for example, the simplex [0,1,2] and the simplex [0,1,3], are `glued’ along the edge [0,1]. I would then expect the (0,0) and (0,1) entries in the matrix to be of opposite signs. What am I getting wrong.
Thx, Francois