Big Numbers

Reisefieber: literally “travel fever,” the mixture of excitement and nerves before leaving on a trip. I love it. I hope I never lose my enthusiasm for getting on a plane.

So, tomorrow I leave for Park City (going to the PCMI summer school). So excited to meet all the image processing people I only know from books and papers.

Technical travel reading: A Wavelet Tour of Signal Processing by Stephane Mallat. This book is incredible. Everything you ever wanted to know about wavelets, frames, basis pursuit, l^1 pursuit, all that good stuff. The style is discursive. The illustrations are ample. The proofs are lovely.

A module is a generalization of the notion of ideals. Rings, ideals, quotient rings, and vector spaces are all modules. If A is a ring, then an A-module is an abelian group M on which A acts linearly:
$1x = x$
Submodules are subgroups of M closed under multiplication by elements of A; the sum and intersection of submodules is again a submodule.

The annihilator of M is the set of all a in A such that aM = 0. A module is called faithful if its annihilator is zero.

Direct sum of two modules is just defined as an ordered pair (x, y), with componentwise addition and scalar multiplication. An A-module is called free if it’s isomorphic to the direct sum of modules isomorphic to A. (A finitely generated A-module is isomorphic to a quotient of for some n.)

Proposition.
Let M be a finitely generated A-module. Let a be an ideal of A and let be an A-module automorphism of M st . Then satisfies an equation of the form

.

Proof.
Let be a set of generators of M. Then for some . That is,
Left-multiplying by the adjoint of the matrix , we see that the determinant annihilates each , hence is the zero endomorphism of M. Expanding out the determinant gives us an equation of the desired form.

Exact sequences. Suppose we have a sequence of homomorphisms between modules,
This is exact at if $Im(f_i) = Ker(f_{i+1})$. The sequence is exact if it is exact at every module. is exact iff f is injective; is exact iff g is surjective.
A short exact sequence:.
Every long exact sequence can be split into these.
If the above is an exact sequence, then is also exact; note that the order reverses when we go to the dual space.

An ideal, a, of a ring, A is a subset which is an additive subgroup and such that . The cosets of this ideal, , are known as a quotient ring.

A basic, though special, example here is the ring of integers. Every ideal is the set of multiples of a single integer, n. The quotient rings are the integers mod n. The ideal generated by n is denoted (n), and it is called a principal ideal. Because all the ideals in the integers are multiples of a single element, the integers are known as a principal ideal domain.

Notice that . Multiples of six are also multiples of three. An ideal generated by a prime number is not contained in any other ideal (except the whole ring of integers.) This brings us to the notion of prime and maximal ideals.

A prime ideal p in a ring A is an ideal that is not equal to (1) and if then or . Equivalently, a prime ideal is one where is an integral domain.

A maximal ideal m in a ring A is an ideal that is not equal to (1) and there is no ideal a such that . Equivalently, a maximal ideal is one where is a field. (This last is true because A is a field iff the only ideals in A are 0 and (1).)

Maximal ideals are always prime, because a field is always an integral domain.

Every ring (not equal to the zero ring) has at least one maximal ideal. This is a result of Zorn’s lemma.

As a result, every non-unit of A is contained in a maximal ideal.

In a principal ideal domain, every non-zero prime ideal is maximal.
For if is a prime ideal and , then we have some so that hence , say . Then , so and so .

The set of nilpotent elements in a ring is called the nilradical and it is an ideal. The product of anything with a nilpotent element is also nilpotent; the sum of nilpotent elements is nilpotent by the binomial theorem.

The nilradical is the intersection of all the prime ideals; if f is nilpotent, then for some n, so f itself is in all the prime ideals. If f is not nilpotent, you can construct a prime ideal — the largest in the set of ideals a such that $f^n$ is not in a. (The product of two elements not in this ideal is not itself in this ideal.)

The Jacobson Radical is the intersection of all the maximal ideals in A. It (obviously) contains the nilradical.

You can get new ideals from old by summing them (that gives you a new ideal), intersecting them (also an ideal) or finding the product (the set of all finite sums of products of elements in X and Y). In the integers, is the ideal generated by their lcm, and $ab = (mn)$, the ideal generated by the product of their generators. If the generators are relatively prime, then the intersection and the product are the same ideal.

This is true for general rings, as can be proven by induction. The generalization of relatively primality is that two ideals are coprime if a + b = (1).

I’m reading Atiyah and MacDonald’s Introduction to Commutative Algebra. So, I thought, how about an experiment: when I finish a chapter, I’ll write a post on it. It might be a good study trick — I like blogging, and it forces me to develop insights and revisit concepts. Uniting the “just for fun” and “serious business” parts of my life just might work.

What is the Laplace-Beltrami operator? It’s a generalization of the Laplacian for use on surfaces. Like the Laplacian, it’s defined as the divergence of the gradient of a function.
But what does this really mean on a surface?
Given a vector field X, the divergence is defined as
in Einstein notation, where g is the metric tensor associated with the surface. (Compare to the ordinary divergence of a vector field, which is the sum of its partial derivatives.)
The gradient is defined as
Combining the definitions,
Euclidean space is a special case where the metric tensor is just the Kronecker delta.

Why do we care about this operator? Well, you can analyze surfaces with it; on the applied end this means it’s important for signal and image processing. (The Laplace-Beltrami operator is essential to diffusion maps, for instance.) The thing is, in computational applications we generally want a discrete version of the operator to converge to the real McCoy. Hence the importance of convergence algorithms.

Taubin’s discrete version of is defined as
an averaging operator over the neighbors. This is a weighted graph Laplacian. But this cannot be an approximation to the Laplace-Beltrami operator because it goes to 0 as the mesh becomes finer.

The authors use a different discretization. We assume a triangular mesh on the surface. The local tangiential polygon at a point v is defined to be the polygon formed by the images of all the neighbors upon projection onto the tangent space at v.

A function can be lifted locally to the local tangiential polygon, by defining it at the points on the surface, and making it piecewise linear in the natural way.

After a few lines of manipulation of the Taylor series for a function on the plane, it follows that

Now you can use this as a discrete Laplacian applied to the local lifting function. The authors show this converges to the Laplace-Beltrami operator by using the exponential map from the tangent space into the surface. The Laplace-Beltrami operator can be computed from the second derivatives of any two perpendicular geodesics; let the geodesics be the images under the exponential map of the basis elements of the tangent space.

We use the fact that the normal to a sufficiently fine triangular mesh approximates the normal to the surface according to
where r is the size of the mesh.
This shows that the basis for the approximating tangent plane is close to the basis for the true tangent plane with error only O(r^2).
Calculating the approximate Laplacian gives us then that the error is only O(r).
There are also some numerical simulations in the paper giving evidence that this approximation is faster and more accurate than a competing approximation.

I’m by no means the most fanatical of Joyce fans, but yeah, I am one. So it always makes me happy when June 16 rolls around. In preparation, fix yourself a Gorgonzola sandwich and check out Robert Berry’s graphic-novel adaptation of Ulysses. And take a little time to curse the boneheads at Apple for refusing to carry it on any iPad or iPhone applications, due to some pages of (cartoon) nudity. (Um, hasn’t this happened to Ulysses before?)

This is a method for using topological invariants to define data. In particular, we want to define the homology of a data set representing a point cloud in a high-dimensional Euclidean space. A topologist would replace the point cloud with a simplicial complex — one of the easiest to compute is the Rips Complex, whose k-simplices correspond to unordered k+1-tuples of points within pairwise Euclidean distance .

However, the resulting simplicial complex depends on the choice of . A very small leaves the complex a discrete set, while a very large results in the complex being one big simplex. As moves, holes in the simplicial complex are born, grow, and die; the picture over time provides a description of the data’s topology.

The “persistence complex” is a sequence of chain complexes together with chain maps , which are inclusions. This is motivated by an increasing sequence of s and the inclusions from one complex to the next. (The precision here goes from fine to coarse.)

Ghrist introduces the notion of a “barcode” — each “bar” is a generator of the homology group, and the length of the bar is the range of values of for which this particular element is a generator of the homology group. A barcode is the persistence analogue of a Betti number.

Now, what I always wondered here is what this has to do with geometry. Consider a finger-shaped projection sticking out of a 2-dimensional surface. At different resolutions, the projection can appear to break off into an island. (This is a practical problem for protein visualization.) This would be an example of a feature that could be captured with persistence homology; but it could also be explained directly by noticing that the tip of the projection is a region of high curvature. Could other persistence homology features be explained by geometrical properties?

This paper by Gunnar Carlsson et al, seems to provide an answer. The authors define a tangent complex of a space, which is the closure of the set of all tangents to points in the space. Then, they define the filtered tangent complex, which is the set of tangent vectors for which the osculating circle has a radius larger than some . We have an inclusion between filtered tangent complexes of different s. (For curves, there is only one osculating circle; for surfaces, there is one in each direction, so the tangent space is defined based on the maximum between them.)

Then we look at the homology groups of the filtered tangent spaces. This provides a barcode. Such barcodes can, for instance, distinguish a bottle from a glass. (The relationship of the tangent-space barcode to the Rips-complex barcode remains mysterious to me.)

Interesting paper by Chinmay Hegde, Michael Wakin, and Richard Baraniuk on manifold learning.
Given a set of data points which are suspected to lie on a low-dimensional manifold, how do you detect the dimension of the manifold? This method uses projection onto a random subspace of dimension if the high-dimensional space is in and $K$ is the true dimension of the manifold. The pairwise metric structure of the points turns out to be preserved with high accuracy. What this means is that we can do signal processing directly on the sorts of random measurements produced by devices that exploit compressed sensing (like the single-pixel camera, for instance.)

This work relies on the Grassberger-Procaccia algorithm. Given the pairwise distances between data points, this computes the scale-dependent correlation dimension of the data,
where
We can approximate K by fixing $r_1$ and $r_2$ to be the biggest range over which the plot is linear and calculating in that range.

Once K is estimated, there are various techniques to generate a K-dimensional representation of the data points. The key theorem here is that if the manifold has dimension K, volume V, and condition number , then fixing and and letting be a random orthoprojector from and

then with probability greater than ,

That is, this map almost preserves geodesic distances.

A priori, though, we have no way to estimate V and of the manifold — we don't know how big the curvature or volume are — so we need a learning procedure. Start with a small M, and estimate the dimension with Grassberger-Procaccia. Then use Isomap to identify a manifold of that dimension, using random projections. Then we calculate the residual variance (a measure of error) and increment M upwards until the residual variance is small enough for our own arbitrary criterion.

I’m going to be going to the Park City Math Institute’s summer program at the end of June. The topic is Image Processing and there are lecture courses on mathematical image processing and compressed sensing, as well as whatever graduate lectures I also have time for. I’m excited about the whole thing.

So, in preparation, have been reading about compressed sensing, and the nicest and most complete tutorial I could find is this one by Richard Baraniuk: (PDF here). Compressed sensing, in short, is how you beat the Shannon bound; you can collect very few measurements and nonetheless reconstruct the signal perfectly. This, to drastically understate the point, is useful. Now I’m tempted to do nothing but read papers. (However, there’s also Serious Business to be done — currently Stein’s Singular Integrals.)