Big Numbers

Last fall I had the opportunity to go to the Cryo-EM mini-symposium at the Columbia Medical Center, which was an interesting experience — a mix of biologists, computer scientists, and mathematicians. One comment from a biologist stuck with me: sometimes, given that equipment has only finite precision, microscopic images can lead to confusion as to the number of connected components. If we’re thinking about a molecule, it’s quite important to know whether we’re looking at a protrusion or a separate component. With imprecise equipment, there’s a chance of getting it wrong. But how do you know when?

The question bugged me for a while. Connected components seem like a “topological” property, but it seems that you ought to be able to talk about this more directly, without appealing to homology.

Posing this question more specifically, suppose you have a connected domain E. To make life simple, let’s suppose it’s in the plane. And suppose we approximate it at different levels of “coarseness” — perhaps we overlay a grid of squares of side length r, and let E_r be the domain that contains a square precisely when the intersection between E and the square has area at least 1/2 r^2. For some sets E, there is some scale E_r that becomes disconnected; for example, if E is dumbbell-shaped, eventually at some scale it will “split” into two components. How do you know when it’s going to do this?

I asked a professor about this, and it turns out it’s a classical question, and the relevant notion is something called the Caratheodory Kernel, developed in 1912.

Let B_n be a sequence of simply-connected domains of the z-plane containing a fixed point z_0. If there exists a disc belonging to all , then the kernel of the sequence with respect to is the largest domain B containing z_0 and such that for each compact set E belonging to B there is an N such that E belongs to all B_n, for n larger than N.

A largest domain is one which contains any other domain having the same property. If there is no such a disc, then by the kernel of the sequence B_n, one means the point z_0 (in this case one says that the sequence B_n has a degenerate kernel). A sequence of domains B_n, converges to a kernel B if any subsequence of B_n has B as its kernel.

Link.
It’s a theorem of Caratheodory that a sequence of functions f_n with positive derivatives at z_0, conformally mapping the unit disc to B_n respectively and are regular and univalent in the disc |z – z_0| < 1 then in order for the sequence f_n to converge in the disc to a finite function, it is necessary and sufficient that the kernels converge to either a point or a domain containing more than one boundary point.

As a novice with some interest in applied areas, it happens all too often that I fall in love with an idea about perception/cognition/data analysis, and find out only later “Oh wait, that wasn’t actually math! And I’d have to do so much background reading in a completely different field to understand it!” I’ve experienced “Oh wait, I’m not an electrical engineer!”, “Oh wait, I’m not a neuroscientist!”, and “Oh wait, I’m not a statistician!”

Well, today is the day I say “Oh wait, I’m not a computer scientist!”

The idea that caught my attention is probabilistic computing. We often want to build a machine that can make predictions and build models (a computer that can diagnose medical symptoms, or predict which creditors will default, or even, dare we whisper, an AI). This is essentially a Bayesian task: given some data, which probability functions best explain it? The trouble is, computers are bad at this. For the most part, they’re built to do the opposite task: given probability distributions and models, simulate some data. Generating probability functions and finding the best one can be prohibitively expensive, because the space of probability functions is so large. Also, while the computational complexity of evaluating f(g(x)) is just f + g, the computational complexity of composing two conditional probability distributions B|A and C|B is

ΣB P(C, B|A)

whose computational time will grow exponentially rather than linearly as we compose more distributions.

Church, a language developed at MIT is an attempt to solve this problem. (Apparently it’s a practical attempt, because the founders have already started a company, Navia Systems, using this structure to build probabilistic computers.) The idea is, instead of describing a probability distribution as a deterministic procedure that evaluates the probabilities of different events, represent them in terms of probabilistic procedures for generating samples from them. That is, a random variable is actually a random variable. This means that repeating a computation will not give the same result each time, because evaluating a random variable doesn’t give the same result each time. There’s a computational advantage here because it’s possible to compose random variables without summing over all possible values.

The nice thing about Church (which is based on Lisp, and named after Alonzo Church) is that it allows you to compose practically any query without significantly increasing runtime. “What is the probability of A and B or C and not D given that X and Y and not Z?” and so on.

The PhD dissertation that introduced Church is where I started, but it’s actually much clearer if you learn about it from the interactive tutorial. The tutorial is really a beautiful thing, well-written to the point of addictiveness. It is completely accessible to just about anyone.

The other day, in the graduate student talks, Subhojoy was talking about the Simultaneous Uniformization Theorem. It was a nice treat because I used to be really into geometric topology (or at least as much as an undergrad with way too little background could.)

The big reveal is
but most of the talk, naturally, goes into defining what those letters mean.

The Riemann Mapping Theorem says that any simply connected set is conformally equivalent to the disc . Conformal maps are angle-preserving; in practice, they’re holomorphic functions with derivative everywhere nonzero. Conformal maps take round circles to round circles if the circles are small enough.

A Riemann Surface is a topological surface with conformal structure: a collection of charts to the complex plane such that the transition maps are conformal.

The first version of the Uniformization Theorem says that any simply-connected Riemann surface is conformally equivalent to the Riemann sphere (or complex plane or unit disc; these are all equivalent.)

The second, more general version of the Uniformization Theorem says that any Riemann surface of genus is conformally equivalent to where is the hyperbolic plane and is a discrete subgroup of .

To understand this better, we should observe more about the universal cover of a Riemann surface. This is, of course, simply connected. Its deck transformations are conformal automorphisms of the disc. But it can be proven that conformal automorphisms of the disc are precisely the Mobius transformations, or functions of the form

This implies that the automorphism group of is .

Now observe that there’s a model of the hyperbolic plane on the disc, by assigning the metric

And, if you were to check, it would turn out that conformal transformations on the disc preserve this metric.

So it begins to make sense; Riemann surfaces are conformally equivalent to their universal covering space, modulo some group of relations, a subgroup of the group of deck transformations of the universal cover.

are called Fuchsian groups — these define which Riemann surface we’re talking about, up to a conformal transformation.

Now we can define Fuchsian space as

It’s the set of maps from the fundamental group of a surface to .

And we can define Teichmuller space as the space of marked conformal structures on the surface S.
This is less enormously huge than you might think, because we consider these up to an equivalence relation. If and are conformal structures, and there exists a conformal map such that then we consider equivalent structures.

In fact, Teichmuller space is not that enormously huge: . It turns out that Teichmuller space is completely determined by what happens to the boundary circles in a pair of pants decomposition of the surface.

Here’s a picture of a pair of pants (aka a three-punctured sphere):

Here’s a picture of a decomposition of a Riemann surface into pairs of pants:

(Here’s a nice article demonstrating the fact. It’s actually not as hard as it looks.)

Now we generalize to Quasi-Fuchsian spaces. For this, we’ll be working with hyperbolic 3-space instead of 2-space. The isometries of hyperbolic 3-space happen to be .
Instead of a Poincare Disc Model, we have a ball model; again, acts by Mobius transformations, functions of the form .

A quasiconformal function takes, infinitesimally, circles to ellipses. It’s like a conformal map, but with a certain amount of distortion. The Beltrami coefficient definds how much distortion:

Quasi-Fuchsian space, QF(S), is the set of all quasiconformal deformations of a Fuchsian representation. In other words, this is the space of all representations to preserving topological circles on the boundary.

Now, the Simultaneous Uniformization Theorem can be stated:
the Quasi-Fuchsian space of a surface is isomorphic to the product of two Teichmuller spaces of the surface.

One application of this theorem is to hyperbolic 3-manifolds.
If is a hyperbolic 3-manifold, and if then $M \simeq S \times \mathbb{R}$.

In other words, we can think of a hyperbolic three-manifold as the real line, with a Riemann surface at each point — you can only sort of visualize this, as it’s not embeddable in 3-space.

The Simultaneous Uniformization Theorem implies that there is a hyperbolic metric on this 3-manifold for any choice of conformal structure at infinity.

This contrasts with the Mostow Rigidity Theorem, which states that a closed 3-manifold has at most one hyperbolic structure.

Together, these statements imply that any hyperbolic metric on is determined uniquely by the choice of conformal structures at infinity.

Here’s a talk he gave in Chicago last year on manifold learning; it’s a lovely, complete overview of the problem and current algorithms (Isomap, LLE, Laplacian eigenmaps, diffusion distances.) Also at the MLSS conference site are many other talk videos about computational learning — Stephane Mallat, Emmanuel Candes, Guillermo Sapiro, etc. If I had infinite time I would just watch all of them.

It’s been on my mind since I read this Douglas Rushkoff article. It’s a call to arms of sorts, an expression of dismay that we’ve become consumers of electronics that we don’t understand.

For me, however, our inability and refusal to contend with the underlying biases of the programs and networks we all use is less a threat to our military or economic superiority than to our experience and autonomy as people. I can’t think of a time when we seemed so ready to accept such a passive relationship to a medium or technology.
…
And while machines once replaced and usurped the value of human labor, computers and networks do more than usurp the value of human thought. They not only copy our intellectual processes–our repeatable programs–but they often discourage our more complex processes–our higher order cognition, contemplation, innovation, and meaning making that should be the reward of “outsourcing” our arithmetic to silicon chips in the first place. The more humans become involved in their design, the more humanely inspired these tools will end up behaving.

It hit home, because of course I use lots of devices and programs that I don’t understand. I consume much more than I produce. I can program — I can write a simulation in Matlab any time, and I could probably refresh my memory of C++ if I took a week or two — but I don’t know how the internet works. The techie ideal of having a DIY relationship to the technology I use is inspiring, but daunting. Maybe more daunting than it was in the 70’s, when Rushkoff was a new computer enthusiast; his ideal may have been easier to achieve when technology was simpler. I looked at an old CS textbook written in the 70’s — it wouldn’t have gotten me one-tenth of the way towards the goal of “understand all the technology you use today.”

I did find this non-technical intro to the structure of the internet useful, as a start. But I still wouldn’t say I understand how the internet works.

The Poisson kernel, , is defined as
This is the real part of a holomorphic function. Now, what are the properties of that holomorphic function?
We define the Herglotz kernel as
where .
As (this is why the Poisson kernel solves the Dirichlet problem in the disc.) Similarly, as , some unknown function. This is the conjugate function of f, and convolution with is known as the Hilbert transform.

Now, we can write explicitly as
similar to the Poisson kernel, except for the sign function. Summing this series, while
This conjugate function of need not be bounded, even if is. In other words, the Hilbert transform is not bounded in the norm. It’s not bounded in the norm either. But it’s clearly bounded in the norm.

The Hilbert transform can be shown to be bounded for all , by first showing that it satisfies a weak-type inequality and then showing that all linear operators satisfying a weak-type inequality are bounded in (this is called the Marcinkiewicz Interpolation Theorem.) I might type that up another time.

If we let , then is sort of an envelope for , larger in absolute value and smoother, and having the property that it stretches and shrinks with the function.

Right now my “interests” are supposed to be limited to passing quals. Fair enough, but I’m also persistently trying to get a sense of what math can tell us about how humans think and how to model it. Except that I don’t actually know any neuroscience. So I’ve been remedying that.

Here’s one overview paper that goes over the state of the field, in terms of brain architecture and hierarchical organization. Neurons literally form circuits, and, in rough outline, we know where those circuits are. We can look at the responses of those circuits in vivo to observe the ways in which the brain clusters and organizes content: even to the point of constructing a proto-grammar based on a tree of responses to different sentences. I hadn’t realized that so much was known already — the brain is mysterious, of course, but it’s less mysterious than I had imagined.

Then here’s an overview paper by Yale’s Steve Zucker about image detection using differential geometry. In his model, detection of edges and textures is based on the tangent bundle. Apparently, unlike some approaches in computational vision, this differential geometry approach has neurological correlates in the structure of the connections in the visual cortex. The visual cortex is arranged in a set of columns; the hypothesis is that these represent , with the column representing position and the slices at different heights of the columns representing orientation.

I went to the Friday grad seminar — this one by Hyun Kyu on the quantum Teichmuller space. (Here’s his paper, which I haven’t read as of now.) I thought it might be helpful to learn some background about this whole “quantum” business.

One way of thinking about the ordinary plane is to consider it the algebra freely generated by the elements x and y subject to the commutation relationship yx = xy. Now, what if we alter this description to instead have yx = qxy? This defines something known as the quantum plane. Here, q is an element of the ground field. Obviously, except when q = 1, this is a non-commutative algebra. For any pair of integers i and j, we have
We can define quantum versions of lots of things — for example, is the group of 2-by-2 matrices with determinant 1, satisfying the relations

Because I am a geek, and because I love my hometown, I had a T-shirt made.
The inspiration, of course, is the storied Valois Cafeteria in Hyde Park, where you can “see your food.”
It’s a local institution, known for comfort food and more recently famous as an Obama favorite.

Via Nuit Blanche, I found a paper relating the Johnson-Lindenstrauss Theorem to the Restricted Isometry Property (RIP.) It caught my eye because the authors are Rachel Ward and Felix Krahmer, whom I actually know! I met Rachel when I was an undergrad and she was a grad student. We taught high school girls together and climbed mountains in Park City, where I also met Felix.

So what’s this all about?
The Johnson-Lindenstrauss Lemma says that any p points in n-dimensional Euclidean space can be embedded in dimensions, without distorting the distance between any two points by a distance of more than a factor between and . So it gives us almost-isometric embeddings of high-dimensional data in lower dimensions.

The Restricted Isometry Property, familiar to students of compressed sensing, is the property of a matrix that

for all sufficiently sparse vectors .
The relevance of this property is that these are the matrices for which -minimization actually yields the sparsest vector — that is, RIP is a sufficient condition for basis pursuit to work.

Now these two concepts involve equations that look a lot alike… and it turns out that they actually are related. The authors show that RIP matrices produce Johnson-Lindenstrauss embeddings. In particular, they produce improved Johnson-Lindenstrauss bounds for special types of matrices with known RIP bounds.

The proof relies upon Rademacher sequences (uniformly distributed in [-1, 1]), which I don’t know a lot about, but should probably learn.