Saturday, January 16, 2010

A Statement of the ProblemThe problem I ultimately want to solve, and its solution, is described in the paper Target Enumeration via Euler Characteristic Integrals. My goal here is to show how to implement that solution on a computer and make it accessible to a wider audience.

Suppose we have a set of targets we want to count. This could be anything from enemy tanks rolling over the plains to electronically tagged wildlife roaming the countryside. Each target has a region of influence which might simply be circular in shape, or might be more complex and depend on the target. Now suppose that we have a high density of sensors scattered over our domain and that each sensor can tell us how many regions of influence it lies in. Roughly speaking, each sensor counts how many targets are nearby. How do we compute how many targets we have in total?

Here's an illustration:

There are four targets. The region of influence for each one is coloured making it easy to see which region is which. I've labelled each region with an integer showing how many targets can be detected in that region. The idea is that we'd have a very dense scattering of sensors in our domain, each sensor reporting an integer. In effect we'd be getting an image like a rasterised version of that picture. But we wouldn't be getting the convenient colours, just an integer per pixel.

At first it seems like a trivial problem. The sensors can all count, and if every target is in range of a sensor, every target will be counted. But we can't simply add the numbers from all of the sensors as many sensors will be in the domain of influence of the same target. If we sum all of the numbers we'll be counting each target many times over. We need to be able to subtract off the targets that are counted twice. But some targets will be counted three times and so on. And how do we tell when a target has been counted twice when all we have are counts?

We'll make one simplifying assumption in solving this problem: that the regions of influence are simply connected. In other words, they are basically some kind of shape that doesn't have holes in it. That could mean anything from a square or disk to a shape like the letter 'W'. But it excludes shapes like annuli or the letter 'B'. If we make this assumption then we can solve this problem with a very simple algorithm that will work in almost all cases. In fact, the only time it fails will be situations where no algorithm could possibly work. But there's a little ground to cover before getting to the solution.

We'll make another simplifying assumption for now. That the sensors are arranged in a rectangular grid. So the data we get back from the sensors will be a grid filled with integers. That essentially turns our problem into one of image processing and we can think of sensor values as pixels. Here's a picture where I've drawn one domain of influence and I've indicated the values returned for three of the sensors.

Simple GridsSo lets assume the sensors have coordinates given by pairs of integers and that they return integer counts. The state of all the sensors can be represented by a function of this type:

> type Field = Int -> Int -> Int

We'll assume that we get zero if we try to read from beyond our domain. We can represent a grid of sensors, including the grid's width and height, using:

> data Grid = Grid Int Int Field

For efficiency something of type Field ought to read data from an array, but I'll not be assuming arrays here.

Ourultimate goal is to define some kind of count function with signature:

count :: Grid -> Int

Now suppose the function f gives the counts corresponding to one set of targets and g is the count corresponding to another. If the region of influence of these two sets of targets is separated by at least one 'pixel' then it should be clear that

count f + count g == count (f+g)

So at least approximately, count is additive. We also need it to be translation invariant. There's only one function that has these properties, summing up the values at all pixels:

Scaling up an `image' shouldn't change the number of targets detected. It should only correspond to the same number of targets with double-sized regions of influence. So we'd also like the following property:

count (n `scale` f) = count f

It's easy to see that that gsum actually has the following property for n>0:

gsum (n `scale` f) = n^2 * gsum f

(^ is the power function. For some reason lhs2TeX displays it as an up arrow.) These requirements are pretty tough to meet with an additive operation. But there's an amazing transformation we can perform on the data first. Instead of working on a grid with one value for each pixel we'll also store values for the 'edges' between pixels and for the 'vertices' at the corners of pixels.

Euler GridsSo lets define a new kind of grid to be a tuple of Field functions, one for faces (ie. the pixels), one for horizontal edges, one for vertical edges, and one for vertices.

The lower left vertex is (0,0) but we need to add an extra row and column of vertices on the right. Similarly we'll need an extra row and an extra column of edges. We can now `resample' our original grid onto one of these new style grids:

(Hint: you just need to try applying mystery_property1 to a few scalings of some chosen shape. You'll quickly find some simultaneous equations in a, b and c to solve. Solve them.)

2. Can you find a simple geometric interpretation for mystery_property1? Assume that the original input grid simply consists of zeros and ones, so that it's a binary image. It shouldn't be hard to find a good interpretation. It's a little harder if it isn't a binary image so don't worry too much about that case.

3. Now find some suitable arguments a, b and c to measure so that we get:

4. Can you find a simple interpretation for binary images? You might think you have it immediately so work hard to find counterexamples. Have a solid interpretation yet? And can you extend it to images that consist of more general integers?

5. Optimise the code for mystery_property2 assuming the image is binary and that the input is on a 2D array. Ultimately you should get some code that walks a 2D array doing something very simple at each pixel. Can you understand how it's managing to compute something that fits the interpretation you gave?

6. Define a version of scale called gscale that works on EGrids. Among other things we should expect:

g2e . (scale n) == gscale n . g2e

and that the invariance properties of the mystery properties should hold with respect to gscale.

I'll answer most of these questions in my next post. If you find mystery_property2 you've rediscovered one of the deepest notions in mathematics. Something that appears in fields as diverse as combinatorics, algebraic geometry, algebraic topology, graph theory, group theory, geometric probability, fluid dynamics, and, of course, image processing.

Part 2The Semi-perimeterLet's start with exercise 1 from my previous post. I allowed you to assume there was a solution. Knowing this we only need to try scaling up one shape. Here are three scalings of a single pixel:

We find a=0, c=-2b. I'll pick b = 1, c = -2. There's a straightfoward interpretation of measure 0 1 (-2) for binary images. It computes half of the perimeter, the semi-perimeter. There's a nice way to see this. We can try to count the number of edges in a shape starting from the number of faces in it. You can think of each face as being surrounded by 4 "half-thickness" edges. Where two faces meet we get a full thickness edge, so using half the number of faces counts internal edges correctly. But around the border of a shape we are left with contributions from only a face on one side. So we're only counting the perimeter edges by half. We get a shortfall of the semi-perimeter. Working backwards tells us how to compute the semi-perimeter from the total number of edges and faces.

For more general images, not just binary ones, we can roughly think of measure 0 1 (-2) as computing the sum of the semi-perimeters of all of the isocontours of our image.

The Euler CharacteristicThe interesting case is now exercise 3. This time our equation is:

4a+ 4b+ c = x 9a+12b+4c = x16a+24b+9c = x

Now we get a solution a = 1, b = -1, c = 1. If you try computing this for a few shapes it looks like it's counting the number of connected components of a binary image. However, once you realise the possibility of some holes in your image you find that it always turns out to be the total number of connected components minus the total number of holes.

Here's an example. Treat this as a single complete image:

It has two components and one hole so we expect measure 1 (-1) 1 to give us 1. We can count:

14 faces42 edges29 vertices

Giving 29-42+14 = 1.

I'm not sure which is the easiest proof that vertices-edges+faces counts the number of components minus the number of holes. One approach is this: we only need to consider one connected component at a time. Remove its holes. Now build up the shape one pixel at a time starting with one pixel and ensuring that you have a hole-free shape at each stage. It's not hard to enumerate all of the possible ways you can add one pixel to an existing shape and show that each such addition leaves measure 1 (-1) 1 unchanged. If you now make a single pixel hole in your shape you'll see that it lowers measure 1 (-1) 1 by 1. If you now continue to add pixels to the hole, in a way that doesn't change the number of holes, you'll see that measure 1 (-1) 1 remains unchanged again.

measure 1 (-1) 1 computes what is known as the Euler characteristic of a shape. I talked a little about this in one context earlier and showed how to compute it in another context here. The Euler characteristic is a topological invariant of a shape in the sense that a rubber sheet deformation of a shape leaves the number of holes and the number of components unchanged.

The above description shows that the Euler characteristic is particularly easy to compute. It simply requires a map-reduce operation over the entire grid. But what about the separate terms: the number of components and the number of holes? These seem like simpler notions and you might expect them to be just as easy to compute. Actually they are harder to compute. Compare also with flood fill algorithms which solve a related problem. Minsky and Papert show in their book Perceptrons that any topological invariant that can be learnt by a one layer neural network (with certain resonable restrictions) must be a function of the Euler characteristic. I find it quite amazing that this notion from topology is connected (no pun intended) to learnability.

We can define

> euler = measure 1 (-1) 1

I have sketched an argument that euler counts #components-#holes. If we assume that each of our connected components has no holes then it counts the number of components. But here's a neat thing: if we *add* two images that contain strictly overlapping shapes (ie. not just touching each other along their boundaries) then because of additivity, euler will still count the number of shapes. In other words, if you did the exercises then you solved the target enumeration problem. It's pretty miraculous. You could splat down thousands of geometric shapes into an image. They can overlap as much as you like. But as long as they don't touch along a boundary you can still compute the total number of shapes. If two shapes do touch along a common boundary then no algorithm can work, after all they'll be indistinguishable from a single connected shape. For a quick example, notice how

> test3 = euler test1

recovers that test1 is the sum of 4 shapes that don't touch.

Generalised IntegralsConsider our original measurement area. This sums the values at each pixel. It is a numerical approximation to the integral of a function sampled at each pixel. Likewise each of the measure functions is numerical approximation to a generalised type of integral. The original paper uses these integrals to solve its problem. I have simply used a discrete version.

I apologise for only sketching proofs. It takes considerably more work to provide rigorous proofs. But I encourage you to experiment with the code and attempt to find counterexamples. The history of the Euler characteristic is itself characteristed by a kind of back and forth between attempted proofs and counterexamples than in a strange way mirrors the innocent looking definition: vertices-edges+faces.

One last thing: I have implicitly shown that target counting is learnable by a certain type of one-layer neural network.

And thanks to @alpheccar for pointing out the original target enumeration paper.

Historical AddendumI'm repeating this as a possibly apocryphal story I have heard from other parties: Minsky and Papert demonstrated that the only learnable topological invariants for single layer network are functions of the Euler characteristic. In particular, they demonstrated the unlearnability of connectedness. This was a precisely stated no-go theorem that discouraged and slowed investment in neural network research for many years and helped contribute to the AI Winter. Is there a good historical book on this period of AI research?

Sunday, January 10, 2010

Using LaTeX was so much easier than HTML that I'm doing it again. The PDF is here and the literate Haskell source is here.

Abstract

The problem I ultimately want to solve, and its solution, is described in the paper Target Enumeration via Euler Characteristic Integrals by Baryshnikov and Ghrist. My goal here is to show how to implement that solution on a computer, and by doing so make it accessible to a wider audience.

Friday, January 01, 2010

This is an attempt to collect together, in tutorial form, a few of the things I've said about monads going back as far as my field guide. It's probably not a good first tutorial, but it contains things I'd wish I'd known immediately after reading my first tutorial.

Unfortunately, I used a few more LaTeX features than I ought to have while drafting it making it hard to get back into HTML form cleanly. So here is a PDF: Monads are Trees with Grafting.