August 19, 2005

Our New Filtering Techniques Are Unstoppable!

Attention conservation notice: 2800 words
and many large figures advertising our latest scientific paper. More than most
people will ever want to know about nonlinear filtering, cellular automata and
coherent structures.

A year and a day after we began working on the manuscript, here it is:

Abstract: Most current methods for identifying coherent structures
in spatially-extended systems rely on prior information about the form which
those structures take. Here we present two new approaches to automatically
filter the changing configurations of spatial dynamical systems and extract
coherent structures. One, local sensitivity filtering, is a modification of the
local Lyapunov exponent approach suitable to cellular automata and other
discrete spatial systems. The other, local statistical complexity filtering,
calculates the amount of information needed for optimal prediction of the
system's behavior in the vicinity of a given point. By examining the changing
spatiotemporal distributions of these quantities, we can find the coherent
structures in a variety of pattern-forming cellular automata, without needing
to guess or postulate the form of those structures. We apply both filters to
elementary and cyclic cellular automata (ECA and CCA) and find that they
readily identify particles, domains and other more complicated structures. We
compare the results from ECA with earlier ones based upon the theory of formal
languages, and the results from CCA with a more traditional approach based on
an order parameter and free energy. While sensitivity and statistical
complexity are equally adept at uncovering structure, they are based on
different system properties (dynamical and probabilistic, respectively), and
provide complementary information.

Rob and Kristina worked out the fundamental theory and algorithms for this
paper and its predecessor;
Rob also figured out the order parameter for cyclic cellular automata, and
Kristina did the actual statistical analysis. Jean-Baptiste implemented all
our ideas (in
Objective CAML), and ran all the
simulations. Cris came up with the idea of local sensitivity, and pushed for
harder examples. I pushed for local statistical complexity, and contributed a lot of
misconceptions.

OK, assuming anyone's still reading, let me give you an illustration of the
kind of thing we're talking about in the abstract.

That is a picture of the time-evolution of a one-dimensional cellular
automaton ("ECA rule 54", in the jargon), starting from a random initial
condition. Space goes up and down, and time advances from left to right. What
you can see is that most of the system is soon dominated by patches of a
single repeating regular pattern, called a "domain". Technically, each domain
is defined by a "regular language" (a
certain kind of rule describing the pattern), which can extend indefinitely
across the lattice, and persist indefinitely in time under the action of the
cellular automaton rule. ("The regular language is invariant under the time
evolution".) There are also things moving through the domains ("particles"),
which are another kind of structure. All this is, in this case, reasonably
easy to make out by eye. You'll also notice, if you look long enough, that
every once in a while the domains are disrupted seemingly out of nowhere.
Since the rule is deterministic, there has to be a reason, and it turns out, if
you look quite carefully, that there are multiple phases the domain could be
in, that the boundaries between regions of different phase act like particles,
and you're seeing the collision of those phase defects.
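If you want to generate this kind of space-time diagram for yourself, here is a minimal sketch in Python (the paper's own code was in Objective CAML, but the logic is the same); the lattice width, number of time steps, and random seed are arbitrary choices for illustration.

```python
import numpy as np

def eca_spacetime(rule=54, width=400, steps=200, seed=None):
    """Evolve an elementary cellular automaton from a random initial
    condition and return the space-time diagram as a (steps x width) array."""
    rng = np.random.default_rng(seed)
    # Wolfram coding: bit i of `rule` is the new value for the neighborhood
    # (left, center, right) read as the 3-bit number i.
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    row = rng.integers(0, 2, size=width, dtype=np.uint8)
    diagram = np.empty((steps, width), dtype=np.uint8)
    for t in range(steps):
        diagram[t] = row
        left, right = np.roll(row, 1), np.roll(row, -1)   # periodic boundaries
        row = table[4 * left + 2 * row + right]
    return diagram

# The rule 54 picture discussed above; transpose before plotting if you want
# space running up-down and time left-right, as in the figure.
st = eca_spacetime(rule=54, width=400, steps=200, seed=0)
```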

As I've discussed before,
understanding the particles and domains of such systems is important in
understanding their dynamics, and still more important in grasping their
computational properties
— particles and their collisions
are the components people use to
build computational circuits in cellular automata, and appear spontaneously in CA
evolved to do non-trivial computations. Accordingly, there's a fair amount
of theory now for the regular-language patterns of one-dimensional
deterministic cellular automata with known rules. (Important early
contributions were made by, inter alia, Wolfram,
Grassberger,
and Boccara; the most general
theory I know of was developed by Hanson
and Crutchfield. Ilachinski's textbook on CA has a
pretty good review, but it's still a live subject, witness Pivato's recent work on
particle kinematics.) But "general theory" here means a general framework,
where all the details still have to be filled in by hand, case-by-case, after
intensive communion with pictures like that figure, and with mathematical
objects like the regular-language evolution operator induced by the CA. In the
end, you can build a little filter which will scan over configurations produced
by the system and identify the domains and particles. (Here is a figure showing
the domain and the filter, from an old paper I wrote with Wim
Hordijk and Jim Crutchfield.)

And then you turn to another system, like this one (ECA rule 110), and you
have to do everything over again, because all the work you've done is
completely dependent on that particular system. Use your old filter on the new
data, and you get nonsense.

Now, unlike my co-authors, I am lazy, which means I don't like putting that
much work into figuring out the coherent structures in one system, let
alone many. This is the kind of thing I want a computer to do for me, with as
little input or insight on my part as possible. (As Larry Wall has said about
Perl programming, this sort of laziness implies a negative discount rate: you
do work now so your future self won't have to.) More seriously, the primate
visual cortex is a remarkable thing, and does a marvelous job of analyzing
the kinds of patterns needed to get East African Plains Apes through their
natural life-cycles, but it was never supposed to cope with massive collections
of high-dimensional, multivariate data, which is what science is increasingly
faced with. (Talk to, say, these
people if you don't believe me.) Something more automatic and principled
is deeply to be desired. We tried to find a generic way of locating the places
where interesting, important things were happening, on the grounds that the
most interesting and important things in the system are the coherent
structures. In fact, we found two ways of doing this, which turn out to be
quite distinct. (I was sure, before we actually had any results, that they'd
turn out to be two ways of getting at the same aspects of the system,
which shows it's a good thing wiser heads were involved.)

The first quantity, which we ended up calling "local sensitivity", tries to
quantify interest and importance in the sense of "having a lot of influence on
the rest of the system" and "small changes here make a big difference". In
classical dynamics, you quantify things like this with the Lyapunov exponents,
but for a number of reasons, explored in the paper, we ended up needing
something different. Basically, we perturb a small-ish region in the vicinity
of a given point, and then see how large an area is affected by the perturbation
over a certain interval of time; the bigger that area is, compared to how large
it possibly could be, the larger the sensitivity at the point in question.
Areas of high sensitivity are ones where small perturbations influence the
future evolution of large parts of the system; they tend to drive their
neighbors, rather than be driven by them.
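Here is a crude one-dimensional sketch of that procedure, under some illustrative assumptions: flip a patch of radius 1 around the chosen site, follow both copies for 20 steps, and compare the accumulated space-time region where they disagree to the largest region the perturbation could possibly have reached. The paper's own definitions and parameters differ in detail.

```python
import numpy as np

def eca_step(row, rule=54):
    """One synchronous update of an elementary CA with periodic boundaries."""
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    return table[4 * np.roll(row, 1) + 2 * row + np.roll(row, -1)]

def local_sensitivity(config, x, rule=54, radius=1, horizon=20):
    """Rough local sensitivity at site x of a 1D array of 0s and 1s: flip the
    cells within `radius` of x, run the perturbed and unperturbed copies
    forward `horizon` steps, and return the fraction of the maximum reachable
    space-time region in which the two evolutions actually disagree."""
    width = len(config)
    a, b = config.copy(), config.copy()
    patch = [(x + d) % width for d in range(-radius, radius + 1)]
    b[patch] = 1 - b[patch]          # the perturbation
    diff_area, max_area = 0, 0
    for t in range(1, horizon + 1):
        a, b = eca_step(a, rule), eca_step(b, rule)
        diff_area += int(np.sum(a != b))
        # Differences spread at most one cell per step in each direction,
        # which bounds the region the perturbation could have reached by time t.
        max_area += min(width, 2 * (t + radius) + 1)
    return diff_area / max_area
```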

The other quantity is the "local statistical complexity", in essence the
number of bits of information about the past of a given point needed to
optimally predict its future behavior. You might worry that this is not an
objectively well-defined quantity, but we'd earlier shown how to put those fears to
rest: we showed how to reconstruct the effective state space at each point
("causal state reconstruction"), and then use some information theory and the
idea of a minimal sufficient statistic to show this gives an objective
forecasting complexity. The details are too technical to go into here (though
the connection between physical complexity and statistical inference is
pleasing), so if you're really interested I'll refer you to the paper. Complex
regions, in this sense, are ones where a great deal of information about the
past is required for optimal prediction — where a lot of the past is
relevant to the future, and fine distinctions have to be drawn between similar
histories.
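The details are in the paper, but just to fix ideas, here is a heavily simplified Python sketch of the light-cone version of the calculation. It merges past light cones only when their empirical future-cone distributions coincide exactly, where the real reconstruction uses a proper statistical test, and the cone depths here are arbitrary illustrative choices; treat it as a cartoon of the method, not the method itself.

```python
import numpy as np
from collections import Counter, defaultdict

def light_cone_complexity(diagram, past_depth=2, future_depth=2):
    """Crude local statistical complexity for a 1D space-time diagram
    (array of shape time x space, periodic in space).  Past light cones are
    lumped together when their empirical future-cone distributions coincide,
    and the complexity at a point is -log2 of the probability of its lump."""
    T, W = diagram.shape

    def past_cone(t, x):
        # Cells that could have influenced (t, x) within past_depth steps.
        return tuple(int(diagram[t - d, (x + o) % W])
                     for d in range(1, past_depth + 1)
                     for o in range(-d, d + 1))

    def future_cone(t, x):
        # Cells that (t, x) can influence within future_depth steps,
        # including the present cell itself.
        return tuple(int(diagram[t + d, (x + o) % W])
                     for d in range(0, future_depth + 1)
                     for o in range(-d, d + 1))

    # Empirical counts of future cones conditional on each past cone.
    futures = defaultdict(Counter)
    points = []
    for t in range(past_depth, T - future_depth):
        for x in range(W):
            p, f = past_cone(t, x), future_cone(t, x)
            futures[p][f] += 1
            points.append((t, x, p))

    # Stand-in for causal-state reconstruction: identify past cones whose
    # empirical future distributions are identical.
    def signature(p):
        counts = futures[p]
        n = sum(counts.values())
        return tuple(sorted((f, c / n) for f, c in counts.items()))

    state_of = {p: signature(p) for p in futures}
    state_freq = Counter(state_of[p] for _, _, p in points)
    total = len(points)

    complexity = np.zeros((T, W))
    for t, x, p in points:
        complexity[t, x] = -np.log2(state_freq[state_of[p]] / total)
    return complexity
```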

In practical terms, what we did was take the original CA configurations, and
then compute the values of these two fields — local sensitivity and local
statistical complexity — at each point in space, at each moment in time,
over and over again, and then compare those results to the original field.
Here, for example, is what we get looking at the first example (ECA 54) —
in order, the original system (repeated here for comparison), the
system as filtered for sensitivity, and as filtered for complexity. (The
darker points are more sensitive or more complex, respectively.)
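(If you want to reproduce pictures of this kind, here is a hypothetical snippet that reuses the eca_spacetime and light_cone_complexity sketches from earlier; a reversed gray colormap gives the darker-is-more-complex convention.)

```python
import matplotlib.pyplot as plt

st = eca_spacetime(rule=54, width=300, steps=300, seed=0)
cplx = light_cone_complexity(st, past_depth=2, future_depth=2)

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(10, 4))
ax0.imshow(st.T, cmap="binary", interpolation="nearest")    # raw configurations
ax1.imshow(cplx.T, cmap="gray_r", interpolation="nearest")  # darker = more complex
ax0.set_title("ECA rule 54")
ax1.set_title("local statistical complexity")
plt.show()
```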

Looking at this, all the little defects just pop right out, even though the
filters don't know anything about the phase structure of the background domain,
or even that there is a background domain. Now, when we
apply exactly the same filters to the second example (ECA 110), this
is what we get:

In other words, we see the particles cleanly separated from their regular
domain backgrounds, and the particle collisions/interactions as well; they're
as complex as, or even more complex than, the particles. Since the interactions are
what you use to build a universal computer in this CA rule, that's pleasing,
but secondary.

Which is all very well, but there are remarkably few one-dimensional
pattern-forming systems in nature, and the regular-language story gets weird
and unsatisfying in two or more dimensions. Do our filters work in more
than one dimension? Well, it depends what you mean by work. The
defining math has no problem in higher dimensions, but how do we know
that the structures they find are the right, important ones? We needed a
two-dimensional dynamical system where people had already figured out
what the important structures were. For a number of reasons, we chose cyclic cellular automata
(CCA) — partly because they self-organize into cute spiral waves,
partly because some clever mathematicians had already thoroughly studied the system, and
partly because some of us had already written papers about them and felt
comfortable with them.

You really need a movie to appreciate what CCA do, but my skills don't
extend to creating one, so I'll just recommend that you download Mcell and see for yourself.
Here, in lieu of that, are a few snapshots. You can see the random-noise
initial conditions,

the first growth of spirals,

and their eventual domination of the whole lattice.
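For what it's worth, here is a compact Python sketch of a cyclic cellular automaton in the same family; I'm not reproducing the exact number of colors, neighborhood, or threshold from the paper (those are illustrative guesses here), and which regime you land in (frozen debris, turbulence, or spirals) depends on those choices.

```python
import numpy as np

def cca_run(k=8, threshold=3, size=200, steps=300, seed=None):
    """Cyclic cellular automaton on a periodic 2D lattice: a cell of color s
    advances to (s + 1) mod k whenever at least `threshold` of its eight
    Moore neighbors already carry color (s + 1) mod k.  Returns the final
    configuration; keep the intermediate grids if you want a movie."""
    rng = np.random.default_rng(seed)
    grid = rng.integers(0, k, size=(size, size))
    for _ in range(steps):
        successor = (grid + 1) % k
        count = np.zeros_like(grid)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx == 0 and dy == 0:
                    continue
                # How many neighbors already show each cell's successor color?
                count += (np.roll(np.roll(grid, dx, axis=0), dy, axis=1) == successor)
        grid = np.where(count >= threshold, successor, grid)
    return grid

spirals = cca_run(seed=0)
```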

If you're a statistical physicist, you're indoctrinated into a particular
way of thinking about coherent structures. You think of them as topological
defects in the order parameter field, places where the free energy associated
with the order parameter gets very high; and the order parameter is the
variable which measures the symmetry broken by the background pattern. This is
part of an elaborate, immensely-successful circle of ideas connecting broken
symmetry, order parameters, phase transitions, universality, thermodynamic
potentials, generalized elasticity and hydrodynamics — what have been
called the "basic
notions of condensed matter physics". But the application of
these ideas involves an immense amount of trial and error, too, and tradition.
When you're confronted with a new system, your first question is "which system
does it remind me of?", hoping that someone else has already identified the
right order parameter and effective free energy for that system. You
then try to make them work for yours, find that they don't, and start tinkering
with them until, if you are very good, things start to click. Making
everything work out of equilibrium is a bear, and even if the next
system you encounter looks similar, there's no guarantee that the same order
parameters will in fact work there. (While we were writing this, we ran
across this paper on a
system with a similar phenomenology to our CCA. We tried their order parameter
on our case, and the results were horrible.)

Nobody had worked out the order parameter for CCA before, but we wanted to
compare the results of our filters to more physical ideas, so we needed to come
up with one. This was basically the work of Rob Haslinger, who (unlike me) is
actually very good at real physics. After remarkably few tries, he came up
with something which fits the end-state spiral configurations very well (a kind
of discretized XY model, if
you're into this kind of thing). Here is a typical image of one of those
long-run configurations, on the left, and the free energy field corresponding
to the order parameter on the right.
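I won't reproduce the actual order parameter or free energy here (they're in the paper), but to give the flavor of "a kind of discretized XY model", here is a hypothetical toy version: map each of the k colors to a phase angle on the circle, and charge a local energy for phase mismatch between nearest neighbors, so that high-energy sites mark defects like spiral cores and inter-spiral boundaries.

```python
import numpy as np

def xy_like_energy(grid, k=8):
    """Toy 'discretized XY' free-energy density for a color configuration:
    color s is mapped to the angle 2*pi*s/k, and each site is charged
    1 - cos(phase difference) for every nearest neighbor it disagrees with.
    Purely illustrative; not the order parameter from the paper."""
    theta = 2 * np.pi * grid / k
    energy = np.zeros_like(theta)
    for axis in (0, 1):
        for shift in (-1, 1):
            energy += 1 - np.cos(theta - np.roll(theta, shift, axis=axis))
    return energy

# e.g. energy = xy_like_energy(spirals, k=8) on the CCA configuration above.
```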

Now here is what we get from the sensitivity and complexity filters. Let me
stress, again, that these are the same filters we applied to the 1D case
— we really, honest, cross our hearts didn't cheat, tweak or adapt at
all.

Clearly, both our filters pick out the right structures — though in
somewhat different orders. It turns out that you need a lot of information to
predict what's happening at the boundaries between spirals — essentially
because you've got out-of-phase regions right next to each other, and so you
need to think hard to see what things at the boundaries will do — but
they're not very sensitive. Make a small perturbation on a boundary, and it's
going to get erased quickly by the waves radiating out from the spiral cores.
The story with the cores is just the other way around: you need less
information to predict them (though more than background), but disrupting one
screws with the whole spiral, making them highly sensitive and autonomous.

I mentioned that the two filters give complementary information; they look
at different kinds of system properties, as we say in the abstract, and their
values are basically uncorrelated. (You can find correlation coefficients in
the paper.) Like I said, I'd expected there would be a strong correlation at
least — that places where optimal prediction needs a lot of information
would be places where small perturbations have a large effect, and vice versa.
My co-authors were not convinced, and, as it happens, they were right. Remarkably
enough, however, if you look at the pictures, both filters are manifestly identifying
the right structures, just in different ways, as I explained above for the
spiral case. (We go into the contrast between the two filters in some depth in
the paper.)
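(Checking the correlation is a one-liner once both fields are defined on the same space-time lattice; a hypothetical helper:)

```python
import numpy as np

def field_correlation(sens, cplx):
    """Pearson correlation between two filter fields (e.g. the sensitivity
    and complexity fields) defined on the same space-time lattice."""
    return float(np.corrcoef(sens.ravel(), cplx.ravel())[0, 1])
```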

The fact that the free energy field derived from the order parameter looks almost the same as the
complexity field is, in my humble opinion, especially noteworthy. Let's see
them again.

The first is the product of Rob's insight, and the accumulated wisdom of the
statistical-physical community; it is also a function of the current
lattice configuration alone. The second is an entirely automatic product of
our code, and the definition of local statistical complexity, and depends on
the past time-evolution of each point's local neighborhood. It is not
at all obvious that there should be any relationship between them. I
admit that they're not exactly the same — they're only strongly
correlated, not identical — but the estimates of statistical complexity
are subject to finite sample noise, and I'd be willing to bet the difference is
due to that. It would be grossly irresponsible at this point to claim that
reconstructing the causal states gives us a way of automatically finding good
order parameters, but the resemblance between those figures suggests something
like that might be worth looking at.

Even more worth looking into is actual data. Because the sensitivity
filter depends on making perturbations, and lots of them, it's probably not
suitable for many experimental situations. But the local complexity filter
just needs observations, so in principle any experiments producing sufficiently
fast movies could work. (Time resolution is important because we need to
ensure that a change at one point in one frame can only affect a finite region
in the next frame, or else our techniques break down. We'd also need to
somehow discretize space and color, but digital cameras do that to you anyway.)
For it to be really meaningful, you'd need either an effectively
two-dimensional system, or three-dimensional imaging data; we've got some ideas
on both fronts, but are open to suggestions. We also have some ideas about
weakening the requirement of having a movie, so we could work with just a
representative ensemble of snapshots, but they're not ready yet for public
consumption.