The Blog of Scott Aaronson

If you take just one piece of information from this blog: Quantum computers would not solve hard search problems instantaneously by simply trying all the possible solutions at once.

Update (June 3): A few days after we posted this paper online, Brent Werness, a postdoc in probability theory at the University of Washington, discovered a serious error in the “experimental” part of the paper. Happily, Brent is now collaborating with us on producing a new version of the paper that fixes the error, which we hope to have available within a few months (and which will replace the version currently on the arXiv).

To make a long story short: while the overall idea, of measuring “apparent complexity” by the compressed file size of a coarse-grained image, is fine, the “interacting coffee automaton” that we study in the paper is not an example where the apparent complexity becomes large at intermediate times. That fact can be deduced as a corollary of a result of Liggett from 2009 about the “symmetric exclusion process,” and can be seen as a far-reaching generalization of a result that we prove in our paper’s appendix: namely, that in the non-interacting coffee automaton (our “control case”), the apparent complexity after t time steps is upper-bounded by O(log(nt)). As it turns out, we were more right than we knew to worry about large-deviation bounds giving complete mathematical control over what happens when the cream spills into the coffee, thereby preventing the apparent complexity from ever becoming large!

But what about our numerical results, which showed a small but unmistakable complexity bump for the interacting automaton (figure 10(a) in the paper)? It now appears that the complexity bump we saw in our data is likely to be explainable by an incomplete removal of what we called “border pixel artifacts”: that is, “spurious” complexity that arises merely from the fact that, at the border between cream and coffee, we need to round the fraction of cream up or down to the nearest integer to produce a grayscale. In the paper, we devoted a whole section (Section 6) to border pixel artifacts and the need to deal with them: something sufficiently non-obvious that in the comments of this post, you can find people arguing with me that it’s a non-issue. Well, it now appears that we erred by underestimating the severity of border pixel artifacts, and that a better procedure to get rid of them would also eliminate the complexity bump for the interacting automaton.

Once again, this error has no effect on either the general idea of complexity rising and then falling in closed thermodynamic systems, or our proposal for how to quantify that rise and fall—the two aspects of the paper that have generated the most interest. But we made a bad choice of model system with which to illustrate those ideas. Had I looked more carefully at the data, I could’ve noticed the problem before we posted, and I take responsibility for my failure to do so.

The good news is that ultimately, I think the truth only makes our story more interesting. For it turns out that apparent complexity, as we define it, is not something that’s trivial to achieve by just setting loose a bunch of randomly-walking particles, which bump into each other but are otherwise completely independent. If you want “complexity” along the approach to thermal equilibrium, you need to work a bit harder for it. One promising idea, which we’re now exploring, is to consider a cream tendril whose tip takes a random walk through the coffee, leaving a trail of cream in its wake. Using results in probability theory—closely related, or so I’m told, to the results for which Wendelin Werner won his Fields Medal!—it may even be possible to prove analytically that the apparent complexity becomes large in thermodynamic systems with this sort of behavior, much as one can prove that the complexity doesn’t become large in our original coffee automaton.
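To make the tendril picture concrete, here is a toy sketch (purely illustrative, and not the model from the forthcoming revision): the tip of the tendril performs a nearest-neighbor random walk on a lattice, depositing cream at every site it visits.

```python
import random

def tendril(n=64, steps=3000, seed=1):
    """Tip of a cream tendril takes a nearest-neighbor random walk
    through an n x n cup, leaving a trail of cream (1s) in coffee (0s)."""
    rng = random.Random(seed)
    grid = [[0] * n for _ in range(n)]
    x = y = n // 2                      # start at the center of the cup
    for _ in range(steps):
        grid[y][x] = 1                  # deposit cream at the current site
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x = min(max(x + dx, 0), n - 1)  # stop at the cup's walls
        y = min(max(y + dy, 0), n - 1)
    return grid

g = tendril()
print(sum(map(sum, g)), "sites covered in cream")
```

The trace of such a walk is exactly the kind of object for which the probabilistic results alluded to above give quantitative control.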

So, if you’re interested in this topic, stay tuned for the updated version of our paper. In the meantime, I wish to express our deepest imaginable gratitude to Brent Werness for telling us all this.

For the background and context of this paper, please see my old post “The First Law of Complexodynamics,” which discussed Sean’s problem of defining a “complextropy” measure that first increases and then decreases in closed thermodynamic systems, in contrast to entropy (which increases monotonically). In this exploratory paper, we basically do five things:

We survey several candidate “complextropy” measures: their strengths, weaknesses, and relations to one another.

We propose a model system for studying such measures: a probabilistic cellular automaton that models a cup of coffee into which cream has just been poured.

We report the results of numerical experiments with one of the measures, which we call “apparent complexity” (basically, the gzip file size of a smeared-out image of the coffee cup). The results confirm that the apparent complexity does indeed increase, reach a maximum, then turn around and decrease as the coffee and cream mix.

We discuss a technical issue that one needs to overcome (the so-called “border pixels” problem) before one can do meaningful experiments in this area, and offer a solution.

We raise the open problem of proving analytically that the apparent complexity ever becomes large for the coffee automaton. To underscore this problem’s difficulty, we prove that the apparent complexity doesn’t become large in a simplified version of the coffee automaton.
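To make the "apparent complexity" measure concrete, here is a minimal sketch (illustrative only; the paper's actual pipeline, including its treatment of border pixels, is more careful): coarse-grain a binary cream/coffee grid by block-averaging, quantize to grayscale, and take the gzip-compressed size.

```python
import zlib
import numpy as np

def apparent_complexity(grid, block=16):
    """Coarse-grain a binary cream/coffee grid by averaging over
    block x block cells, quantize to 8-bit grayscale, and return the
    gzip-compressed size: a crude stand-in for Kolmogorov complexity."""
    n = grid.shape[0]
    coarse = grid.reshape(n // block, block, n // block, block).mean(axis=(1, 3))
    # Rounding the local cream fraction to a grayscale level is exactly
    # where "border pixel artifacts" can creep in.
    gray = np.round(coarse * 255).astype(np.uint8)
    return len(zlib.compress(gray.tobytes(), 9))

rng = np.random.default_rng(0)
n = 256
separated = np.zeros((n, n))
separated[: n // 2] = 1.0                          # cream sitting on top: simple
mixed = rng.integers(0, 2, (n, n)).astype(float)   # fully mixed (i.i.d. noise)

print(apparent_complexity(separated), apparent_complexity(mixed))
```

Note that on a finite grid, residual noise survives the coarse-graining of the fully mixed state, so its compressed size stays somewhat elevated; taming exactly this flavor of spurious complexity is what Section 6 of the paper is about.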

Anyway, here’s the abstract:

In contrast to entropy, which increases monotonically, the “complexity” or “interestingness” of closed systems seems intuitively to increase at first and then decrease as equilibrium is approached. For example, our universe lacked complex structures at the Big Bang and will also lack them after black holes evaporate and particles are dispersed. This paper makes an initial attempt to quantify this pattern. As a model system, we use a simple, two-dimensional cellular automaton that simulates the mixing of two liquids (“coffee” and “cream”). A plausible complexity measure is then the Kolmogorov complexity of a coarse-grained approximation of the automaton’s state, which we dub the “apparent complexity.” We study this complexity measure, and show analytically that it never becomes large when the liquid particles are non-interacting. By contrast, when the particles do interact, we give numerical evidence that the complexity reaches a maximum comparable to the “coffee cup’s” horizontal dimension. We raise the problem of proving this behavior analytically.

Questions and comments more than welcome.

In unrelated news, Shafi Goldwasser has asked me to announce that the Call for Papers for the 2015 Innovations in Theoretical Computer Science (ITCS) conference is now available.

This entry was posted on Tuesday, May 27th, 2014 at 11:55 am and is filed under Announcements, Complexity, Nerd Interest.

80 Responses to “Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton”

Alexander: No, I hadn’t seen that paper; thanks! From the abstract, it looks like it’s addressing unrelated questions, but I’ll take a look when I have time. (Or if you’d like to save me some time, can you define the Q2R automaton in a few sentences, and explain how it differs from the coffee automaton?)

Fascinating post! I’m curious if you all considered Charles Bennett’s concept of “logical depth” as a possible complexity measure in this context. I note that in his paper on the topic, he considers Kolmogorov complexity and mutual information as possible measures, and ultimately decides that logical depth has more of the phenomenological features of complexity, including “slow growth.” I have no idea if the measure can be meaningfully applied here, and perhaps the Kolmogorov measure is preferable for technical reasons, but the idea had a lot of intuitive appeal to me, and I’m surprised that I haven’t seen it discussed more elsewhere. Thanks again for being so stimulating and thoughtful!

Scott #2 – if I understand correctly, the coffee automaton is defined as a lattice-gas CA. Q2R is a “usual” CA on a checkerboard, and the two do not look similar at first sight; but in its very first definition it was given as Q2R2, a second-order CA on an ordinary lattice, which may be more convenient for comparison.
Such CAs conserve the sum of certain “interaction energies” on the links between cells, so if the energy grows in one place, it must become smaller nearby. This may be considered a transfer of energy between different places, so the comparison with a lattice gas, with its motion of particles, is more justified.
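To make the rule concrete, here is a minimal sketch of the standard Q2R update as I understand it (the checkerboard version; the second-order Q2R2 variant mentioned above differs in details):

```python
import random

def q2r_step(grid, parity):
    """One half-step of Q2R: on the checkerboard sublattice of the given
    parity, flip a spin iff its four neighbors sum to zero (two up, two
    down), so that the Ising bond energy is exactly conserved."""
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(n):
        for j in range(n):
            if (i + j) % 2 == parity:
                s = sum(grid[(i + di) % n][(j + dj) % n]
                        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))
                if s == 0:
                    new[i][j] = -grid[i][j]
    return new

def bond_energy(grid):
    """E = -sum of s_i * s_j over nearest-neighbor bonds (periodic)."""
    n = len(grid)
    return -sum(grid[i][j] * (grid[(i + 1) % n][j] + grid[i][(j + 1) % n])
                for i in range(n) for j in range(n))

rng = random.Random(0)
g = [[rng.choice([-1, 1]) for _ in range(16)] for _ in range(16)]
e0 = bond_energy(g)
g = q2r_step(q2r_step(g, 0), 1)   # one full step: update both sublattices
assert bond_energy(g) == e0       # deterministic, reversible, energy-conserving
```

Because sites of the same parity are never nearest neighbors, the simultaneous flips within a half-step are independent, and each flip changes no bond energy.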

Oh, glad to see something actually come of this! A little disappointing that most of it focuses on a measure requiring a choice of coarse-graining; I was really hoping to see more discussion of resource-bounded sophistication. Well — now the idea is out there and not just on your blog, at least. 🙂

By the way, on page 5, you write “Also, let K(x|S) be the length of the shortest program that outputs x given as input a description of x.” I assume that should be given as input a description of S?

Joseph #3: Yes, see Section 2.3 of our paper (entitled “Logical Depth”). There was also an interesting discussion in the comments of my original post about the strengths and weaknesses of logical depth compared to other complexity measures. For us, a decisive consideration was that we didn’t know a practical way to get even crude estimates or bounds for logical depth, for the purpose of numerical experiments. But in any case, logical depth can be shown to be closely related to sophistication, which does inspire the measure that we use.

Sniffnoy #5: As we discuss in the paper, the choice of coarse-graining that we use is far from arbitrary, because you could derive it from the causal structure of the automaton. But yes, I agree that it would be wonderful to be able to compute a complexity measure that didn’t need to “know” anything in advance, even about the causal structure of the automaton, and that simply figured out the right coarse-graining for itself! As I discussed in the earlier post, that’s exactly what the definition of resource-bounded sophistication tries to do. The problem, of course, is that you end up with something that, as far as we know, is exponentially-hard to compute or even approximate. So, as we again go into in the paper, you could think of apparent complexity as a cruder variant of resource-bounded sophistication, which is “helped along” by a “hint” about which is the right coarse-graining to use.

This is an area that really could use more rigor and general agreement on common definitions. Immense numbers of papers have been written on complexity, emergence, and self-organization over the last decades. So I am a bit surprised that your paper only cites six references. I would have expected something along the lines of this paper, which has five pages filled with references. And that’s still just scratching the surface.

quaz #8: Sorry! Your comment made me realize that I’d mistakenly posted a slightly-outdated version of the paper. The correct version has 13 references. 🙂

And yes, we do try to bring some order in Section 2 to the different ways one might think of to measure the thing that interests us. However, precisely because the literature on “Santa Fe complexity” is so immense (as you point out), we had no ambitions to survey the large swaths of the literature that weren’t relevant to our limited goals.

If there’s a particular reference we left out that’s relevant to what we’re doing, please let us know and we’ll add it.

Where is the complexity coming from when it’s rising, and where is it going when it’s decreasing?

I’m not sure that’s a meaningful question, because I don’t know of any conservation laws relevant to the evolution of complexity.

In the coffee automaton, the complexity “comes from” the complicated tendrils of milk that form as the milk starts to mix with the coffee, and it “goes away” because after the milk and coffee have fully mixed, the coarse-grained state again becomes simple to describe.

Extra references because I added my book, then added Gell-Mann’s book so as not to seem too obvious.

More seriously: the general point “as entropy rises in a closed system, complexity will first rise and then fall” seems pretty general and robust, but I don’t know of any quantitative examination of the idea before ours. Pointers very welcome!

If I may speculate, it seems to me that fred’s confusion stems from the relationship between information and complexity. You don’t need a lot of information to describe the macrostates at the beginning and the end; the usual thermodynamic measures P, V, T will do. Capturing anything in between requires storing lots of information at the level of micro-ensembles. Now, if entropy is preserved, so should information be; but in the case of the coffee mixing, that is obviously not the case.

If we have some measurement device, a computer that tracks the coffee, and we include it within the system boundary and then thermally isolate the whole thing, entropy will still increase as soon as the mixing sets in.

But Fred’s confusion triggers a very interesting follow-up question: what would a physical system have to look like in order to conserve complexity? The typical hand-wavy answer is that it needs to be a flow-through system far from thermodynamic equilibrium, yet if we had a proper analytical measure, shouldn’t Noether’s theorem allow us to translate this into a specific required symmetry?

Regarding the coarse-graining issue, it reminds me a bit of the dimensionality of fractals:
“A fractal dimension is an index for characterizing fractal patterns or sets by quantifying their complexity as a ratio of the change in detail to the change in scale”
Could there be some interesting relation?

Instead of H[f(x)] where f(x) “smooths out noise”, you could use an entropic measure like “total correlation”, which automatically ignores uncorrelated noise. TC is zero for both extremes (deterministic and totally mixed). Of course it shares the difficulty of being hard to estimate or even lower bound and it is maximized by some apparently non-complex distributions (which is all to shamelessly promote my soon-to-appear arxiv papers solving these problems, and introducing an assumption-free notion of coarse-graining through optimizing an information-theoretic objective :).
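For readers unfamiliar with it: total correlation is the sum of the marginal entropies minus the joint entropy. A toy plug-in estimator (illustrative only, and not from the forthcoming papers mentioned above):

```python
import math
from collections import Counter

def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())

def total_correlation(rows):
    """TC(X1..Xk) = sum_i H(Xi) - H(X1,...,Xk), estimated from samples,
    where each row is one joint sample (x1, ..., xk)."""
    k = len(rows[0])
    marginals = sum(entropy([r[i] for r in rows]) for i in range(k))
    return marginals - entropy([tuple(r) for r in rows])

print(total_correlation([(0, 0), (1, 1)] * 50))                  # fully correlated bits: 1 bit
print(total_correlation([(0, 0), (0, 1), (1, 0), (1, 1)] * 25))  # independent bits: 0 bits
```

Both extremes vanish, as the comment says: a deterministic distribution (all rows identical) has zero marginal and joint entropies, and an independent one has marginals summing exactly to the joint.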

As a software engineering application, I wonder if something along these lines could be used to quantify the complexity of general programs in a more rigorous way. Right now software engineering has pretty poor ways to quantify complexity, typically cyclomatic complexity and lines-of-code metrics, both of which are mildly correlated with bug counts.

Many algorithms are also closed systems, in that the inputs are simple to describe and the output is often similarly simple, but in computing the output, the algorithm builds structures with a great deal of complexity. Could a programming-language-specific model be used to infer the “apparent complexity” of any program written in that language? How useful could such a metric be?

With the idea that interestingness may have adaptive significance, attracting us to environments where we have a competitive advantage extracting energy.

Think of the fuel value of s. Random strings have no fuel value and the above expression = -1. Simple strings (pile of carbohydrate on the floor) have maximal fuel value, but bacteria have an advantage processing them. The most competitive environment is half-random (approaching zero net fuel value since it takes energy to process each bit).

If so, then unfortunately, it doesn’t even come close to capturing what we want. For one thing, it’s going to be maximized (sometimes even infinite!) for a string whose first n/2 bits are random and whose last n/2 bits are all 0’s. But such a string isn’t intuitively “complex,” any more than its left or right halves are when considered individually. For another, your measure has the bizarre property that, if I increase the size of the string’s “random” part just slightly beyond n/2, then the complexity flips and becomes negative. What’s the interpretation of that? And why should “half-random, half-deterministic” be any kind of special threshold, let alone a singularity in complexity? Qualitatively, what’s different about 51% random and 49% deterministic?

Scott #18 Re. mirrored strings, my comment from 2011 was set in terms of an online reinforcement learner. I replaced that with gzip here as a gloss. In that setting, you just note the behavior changes suddenly when you cross the midpoint (if our machine has enough memory to detect the mirroring). The difference between 51% random and 49% random is that our machine can survive in the latter case but not in the former, because the latter case still has 1% net fuel value. I mean fuel value in the Szilard/Landauer/Bennett sense. This assumes the energy cost of a wrong prediction is equal to the energy gain of a correct one. That might not be true but the expression can be adjusted for the correct ratio.

Scott, do you have an intuition about the complextropy of the Ising model near the critical temperature? On the one hand, the complexity of the low- and high-temperature states should be low, in accordance with the coffee-cup theory. On the other hand, the model at exactly the critical temperature has some regularities that intuitively make it more explicable than a system with maximal complextropy.

Carl #19: It sounds like n/(n-2K(x)) might be a reasonable measure for something completely orthogonal to what interested us. We weren’t interested in the rate of consumption of free energy, but rather in the formation of “structure.” And it’s easy to come up with examples separating the two, in either direction. If you look at the “non-interacting automaton” in our paper, that gives an example where the rate of free energy consumption (suitably defined) should go up, reach a maximum, and then go back down, yet interesting structure never forms (and indeed, we can rigorously prove that the “apparent complexity” remains small throughout). Conversely, if there is interesting structure present, then in some sense we don’t care how much free energy might also be sitting unused in the system (except insofar as that energy could’ve been used to create even more structure, but wasn’t).

Oleg #20: Is this a good picture of the Ising model near the critical temperature? (I found it on Google Images.) If it is, then yes, the apparent complexity in our sense looks like it’s almost certainly large. Obviously eyeballing isn’t a proof, 🙂 but in any case, that’s a very nice example—thanks! Given how much is known about the 2D Ising model, maybe it would be easier to prove a lower bound on apparent complexity for that than for the intermediate states of the coffee automaton.

“Could a programming language-specific model be used to infer the “apparent complexity” of any program written in that language? How useful could such a metric be?”

Sandro, I think this is a really good question, speaking as an engineer not a scientist. I would be extremely interested to read about the correlation between computational complexity and code complexity, if there is one.

I would add, though, that to me it naively seems like any static analysis of code is ultimately doomed to failure. The input-process-output cycle is just a simplification, a view along a single thread. To me, the real complexity of a modern program is the dynamic state of its entire object graph, and how that state evolves along multiple threads of execution across time. Complex behavior emerges when the system is viewed at this level. No static analysis could ever uncover or emulate that, short of actually running the program itself, could it?

Sandro #16 and Darrell #23: I think the problem with the word “complexity” is not so much that it’s undefined as that it’s overloaded—people use that same word to refer to many more-or-less clear but totally different things, so that when you talk about one of the meanings, people are liable to get excited and think you’re saying something about their favorite meaning of the word, when actually that would be a completely different research project (or even field).

Sean, Lauren, and I were interested in quantifying the “structure” that forms at intermediate times in closed thermodynamic systems, typified by the coffee cup. And my feeling is that that problem, while already complicated, could be completely solved without telling us anything about the vastly more complicated problem of quantifying the “complexity” of arbitrary computer programs.

For note that we do know something about the latter problem: namely, we know that almost any natural formalization of it is going to be undecidable in Turing’s sense! Or, to put the point more dramatically: there are incredibly “simple”-looking computer programs, like

n := 2
do
    n := n + 2
loop until n cannot be written as a sum of 2 primes

that are already so complicated in their behavior that no one can prove whether they halt or not.
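For concreteness, here is that loop in runnable Python, with a cutoff added so that this demo actually terminates (the original, unbounded loop halts if and only if Goldbach’s Conjecture is false):

```python
def is_prime(m):
    """Trial-division primality test, fine for small m."""
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))

def is_goldbach_sum(n):
    """True iff the even number n is a sum of two primes."""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

# The comment's four-line program, with a bound so the demo terminates.
# Without the bound, it halts iff Goldbach's Conjecture is false.
n = 2
while True:
    n += 2
    if n > 1000 or not is_goldbach_sum(n):
        break
print("stopped at n =", n)  # prints 1002: no counterexample below the cutoff
```

The point stands: four lines of code, yet deciding whether the unbounded version halts would settle a conjecture open since 1742.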

Now, I realize that in real-world software development, one isn’t typically concerned with programs whose behavior depends on the truth or falsehood of Goldbach’s Conjecture! And I realize that there could be measures of “complexity” (not computational complexity, but more like “complicatedness” in the intuitive sense) for real software that might be useful, despite the general undecidability of such questions. But that gets into questions that, while interesting, are very far from what Sean, Lauren, and I were thinking about.

The Ising model is a good example — a very good example, actually. At zero temperature and nonzero magnetic field, all spins will be aligned — “simple” but also “low-entropy.” At high temperature spins will be randomly distributed — simple but high-entropy. At the critical point there will be a scale-free distribution of domains, which is precisely what we’re characterizing as “complex.” Of course that’s as a function of temperature rather than time, but I think it’s a similar idea.

Sean, I completely agree that it’s a great example! And while it’s not an example of complexity increasing and then decreasing under time evolution, presumably one could make it into such an example, by imagining that the temperature slowly increases, and the distribution over spin configurations tracks the ground state adiabatically? (OK, this still wouldn’t be an example of a closed thermodynamic system, since the temperature increase would be coming from the outside.)

In any case, the question that interests me the most here is the one from my last comment: namely, can we use known results about scale-freeness to prove that the apparent complexity is large at the critical temperature?

I bet we could. Informally: think of the Fourier transform. At low temperature it is peaked at long wavelengths, and the entropy is low with or without coarse-graining. At high temperature it peaks at short wavelengths, and coarse-graining wipes everything out. Only near criticality is there substantial entropy that is not wiped out by coarse-graining.

Hand-waving, of course. I have to hop on a plane, but let’s keep thinking.

Scott #21 Looks like you only get that result in the adjusted case. I guess the proof in the slide deck should increase my faith in the result. But you know, I don’t see much difference between the middle and right slides in the top rows of figs 3&4. Nor is a lack of structure apparent in the top row of fig 4 vs the top row of fig 3.

At least you *have* pictures, and tried different things. The literature on this topic strikes me as data-poor. Somebody should have a bunch of people rate a bunch of images and release it as a test corpus.

“And I realize that there could be measures of “complexity” (not computational complexity, but more like “complicatedness” in the intuitive sense) for real software that might be useful, despite the general undecidability of such questions. But that gets into questions that, while interesting, are very far from what Sean, Lauren, and I were thinking about.”

Yeah, the complexity measure Sandro mentioned (that I’m also interested in) relates to how complex a set of code is, from a maintainability and readability standpoint. The idea is that the more ‘complex’ the code, the more likely it is to contain errors, and the less likely that subsequent developers will be able to modify it correctly.

The state of the art, at least in the commercial world, is language-aware static analysis, which ‘reads’ the code and does a best-effort calculation of all the paths through the code. Based on this analysis it can calculate a number of metrics, one of which is cyclomatic complexity. The hope is that we can use metrics like this to compare different code sets in an apples-to-apples way, to measure how we’re doing at keeping code simple and well written. It doesn’t work all that well in practice, but it’s better than nothing. This is an active area of research, so there may be better approaches under study than what I’m talking about.

Anyway like you said code complexity is clearly a completely different domain from computational complexity, but I was curious whether anyone had studied whether they had any relationship. You showed the answer is clearly no, and in retrospect I should have realized that a bad coder can take any algorithm, however simple, and make the code arbitrarily complex. 🙂 So yeah, they are clearly not related. My bad for getting excited.

Obviously homogeneity can be modelled recursively, thus reducing complexity. Furthermore, usually approximation in modeling (particularly strategic approximation which targets elements based upon size and unimportance) can be used to further reduce the complexity.

I am thinking (I don’t know if it is true) that the minimum-length program (the Kolmogorov complexity) representing a physical system (or a toy model) could be the differential equation, or in this case the finite-difference equation, that represents the data at some instant.
If this is true, then the fully separated and fully mixed states have simple finite-difference equations (a low number of parameters), while the intermediate mixing state has a large finite-difference equation (a complex surface, and hence a complex finite-difference equation).
The solution of the differential equation can be complex (each program can solve it in a different, lengthy way, and the differential equation is the gauge), but the minimization could be simple.

I have to confess that I am more intrigued by why this is somehow new territory. Mixing of two substances is not new science. However, if this is a new line of research, I would point you to mechanical engineering and the study of the thermodynamics of steam as a source of insight. The general issues of quantifying change as two substances, such as liquid water and steam, combine are well understood. Now granted, that is a forced system, but then, since the coffee model is in a gravitational field, it can be argued that it is forced as well.

Relating to the Ising model:
1. The Q2R CA is often considered an approximation of the model.
2. It does not have a temperature parameter.
3. A working rule for simulating Q2R may be found in this comment on a CA forum (in fact, I learned about Q2R from a comment on Shtetl-Optimized and prepared the rule afterward to try it), together with a couple of examples (the software used in the forum is fairly standard for CA simulation and freely available). I am not certain about non-monotonic behavior of the complexity with respect to time, yet it is interesting to observe in the second example how one corner of a rectangle remains unfilled for quite a long period after starting from a regular pattern with small noise.

Maybe I am missing something, but it seems to me that all you have shown is that when we mix cream and coffee, there are intermediate stages of mixing before the two are fully mixed. I am not sure how that can be regarded as complexity.

If, for example, the coffee and cream congealed into small masses consistently in each run during the intermediate stage, I might think of that as complexity.

What would happen if you applied some simple ordering rules? For example, whenever a cream cell becomes surrounded by coffee and/or vice versa then the structure is allowed to persist for some time. Would more order arise from simpler order?

Sean #25 and Scott #26: What troubles me is that when you know the system is scale-free, it’s almost trivial to write a program to generate a representative case of it. To put it in other words, it feels like any scale-free system has the same complextropy.

I have this toy app for a slightly modified Ising model, which has a 2D phase space and two transitions: one from low to high temperatures, and the other from 2 spin states to 4 spin states. I wonder how complextropy would change along the border of the phase-space domains, where the state is critical, but of a different sort than in the conventional Ising model.

Hal Swyers #32: You’re correct that “mixing of two substances is not new science.” On the other hand, the notion of “sophistication” that inspired our measure of apparent complexity only dates from the 1980s, and the use of numerical experiments with cellular automata to study these kinds of questions also only really took off in the 80s.

So, in short: while there are excellent reasons why this paper couldn’t have been written in the 19th century, I don’t have a good answer as to why it wasn’t written in the 1980s! And maybe the answer is that it was. As Sean said, if you have specific references we’d be grateful for them.

I’d especially be interested if you knew of any tools from physics or chemistry that would allow us to prove our mathematical conjecture, that the apparent complexity really does become large at intermediate times in the coffee automaton. My impression was that there are countless results in the literature about ergodicity, equilibrium distributions, and mixing times, but surprisingly or not, almost nothing about how to characterize what a Markov chain is doing in the era before it mixes, in a way that would allow one to answer questions like ours.

Scott #24:
I realize that the “complexity” discussed here isn’t immediately related to the various overloaded program complexities, but I was trying to see if some narrow subset of such complexities were.

What I was thinking of was some language-specific analysis that could generate a simplified automaton modeling the approximate information added or thrown away at each step, in a way that could quantify a program’s runtime complexity. Obviously such an analysis would have to return “don’t know” in some cases (or many cases?).

Or would your analysis be little better at this application than just comparing the gzipped program source code, perhaps after some sort of identifier renaming pass that normalizes the source code input?

Maybe I am missing something but it seems to me that all you have shown is that we mix cream and coffee there are intermediate stages of mixing before the two are fully mixed. I am not sure how that can be regarded as complexity.

No. Whatever else you think about our paper, it’s certainly more than that. Consider, for example, our “control experiment” of the coffee automaton with interpenetrable cream particles (what we called the non-interacting case). That also has intermediate stages of mixing before the coffee and cream are fully mixed, but in that case, we can prove that our complexity measure (the apparent complexity) never becomes large. That means that interesting structure (in our sense) forming is something “optional,” which can happen in certain thermodynamic systems but doesn’t happen in other systems. What we’d like is to be able to predict, from first principles, when such structure will form and when it won’t, using only the rules of the automaton and its initial conditions.

What would happen if you applied some simple ordering rules? For example, whenever a cream cell becomes surrounded by coffee and/or vice versa then the structure is allowed to persist for some time.

Could you explain more precisely what modification of the coffee automaton you’re proposing?

Although I need more time to understand your work in the context of the recent mixing and stirring literature, the following might be of interest:
1). Using multiscale norms to quantify mixing and transport
JL Thiffeault, Nonlinearity 2012

It appears that your complexity calculation is measuring, up to a monotonically increasing function, the derivative of the net number of swaps from coffee to cream. In the long term the net number of swaps increases at a rate proportional to the ratio of coffee to cream. Thus the long-term constant that the complexity approaches is a function of the ratio of coffee to cream.

(I noticed everyone uses their real names around here. I’m the poster formerly known as MadRocketSci)

Very interesting paper!

I have only had time to glance at it, but I think the light-cone complexity measure might be very interesting to look into.

Most mathematically generated objects strike us as being austere and simple. One class though that seems to jump out at us and hammer whatever part of our brain measures interestingness and complexity are fractals (even if they are, ultimately, fairly boring and self-repeating, there is something there that grabs our interest.) As far as I know, one of the common themes of these structures is that they are all generated by some sort of repeated process applied to a data-set. They aren’t spat out of one-step processes or simple functions, but rather the result of some algorithm applied ad infinitum to an initial condition or point set. This would imply a time dimension to these things (even if the final result everything converges to is static) to potentially apply the light-cone measure to.

Anyway, something to look into in that imaginary future where I have arbitrary amounts of time.

Maybe everyone knew this already, but it strikes me that our interpretation of entropy evolution through time and the second law is not quite correct.

Roughly, the textbook interpretation is that over time the micro-state has to look more average. That is, when comparing the current micro-state to the distribution over every possible micro-state, the trend over time is toward the most probable micro-state (this interpretation seems to be little more than a recasting of martingale convergence theorems).

So, I am conjecturing that entropy in a dynamically evolving system is actually measuring, for a given point in time, the number (or distribution) of micro-states accessible from the current micro-state. In this interpretation the second law of thermodynamics states that the number of accessible micro-states must increase.

But how does this relate to complexity? Through comparing successive accessible sets of micro-states. At both the beginning and end of the evolution of the cream in coffee, for any succession of micro-states, the sets of accessible micro-states overlap considerably; that is, the set of accessible micro-states is not changing. But at the point of maximum complexity, the set of accessible states is changing the fastest.

You could likely measure this through the amount of information from the previous accessible set contained in the next accessible set (conditional information); in this way you can account for shifts from non-uniform distributions as well.

Hmmm, this sounds awfully concrete and computable; I think I will have to write some software tonight…

Have you tried to stir your virtual coffee with a virtual spoon to check if that’d increase its “complextropy” as in real life?

We didn’t—but in any case, you would only expect stirring to increase the complextropy if the coffee and cream were well-separated to begin with, and then only temporarily. If the coffee and cream were already partly mixed, then stirring could decrease the complextropy, by speeding up the approach to equilibrium.

If we can think of entropy as a measure for the probability distribution P(x) over the possible states of x, can we think of ‘complexity’ as a similar measure for the probability distributions over the probability distributions, P(P(x))?

In a homogeneous, high-entropy system the number of possible states of every particle is high (the probability distribution is wide), but every particle will have a similar probability distribution, making the complexity low.
In an initial system such as the ones considered here, the possible states are few, and similar, so we have both low entropy and low complexity.
But in an intermediate system, the possible states are not similar for all particles (in the coffee example, the coffee particles near the milk particles have a chance to ‘mix’, while the ones surrounded only by similar particles do not), so we have high complexity!

Scott #48, do you think we could predict at which point the patterns will be favored/disfavored? Suppose we examine the c_tropy* from two or three resources separately (say phase separation, temperature gradient, active spoon): do you think we could predict the c_tropy we’ll get with these resources together?

If we consider the two liquids to be continuous, i.e. each point has a density of coffee dc(x,y,z) and a density of milk dm(x,y,z), each in [0.0, 1.0],
with dc(x,y,z) + dm(x,y,z) = 1.0,
then we should be able to compute a gradient field throughout the evolution of the mixing:

a) If the two liquids are not mixed much, there is a planar interface region where a gradient exists.

b) If the two liquids are starting to mix, they create some mixing zone/interface of more or less complex structure.
Here the continuous densities can account for all possible types of mixing interfaces, i.e. a smooth gradual mix with minimum surface (and no other discernible structure at any scale) vs. a more complex zone made of complicated interpenetrating dendrites (of different possible sizes relative to the size of the system).
Regardless of the type of mixing interface, there’s going to be more gradient than in a).

c) Eventually the mixing interface structure expands to the whole system (and existing dendrites dissolve) and the two liquids are completely mixed; there is no more gradient.
This assumes that the two liquids truly mix eventually with density 0.5/0.5, i.e. they don’t form a uniform mix of tiny bubbles/atoms (they behave like milk/coffee and not like a water/oil emulsion).

Seems like by doing some sort of integration of the gradient field it should be possible to come up with a complexity measure that’s always maximal in b) and minimal in a) and c), regardless of the scale factor of the mixing interface.
When we take the discrete version of this (finite elements), for a given snapshot of the mixing state, we start with a coarse grid, compute the gradient and complexity, and repeat until we hit a set minimum grid size. Either the complexity stays the same at each grid size (in the case of a continuous mixing interface), or it has a sharp sudden increase at a given grid size and then stays constant at smaller grid sizes (in the case of a more heterogeneous mixing interface).
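As a rough sketch of this multiscale idea (my own illustration, not anything from the paper; the function names are hypothetical), one can coarse-grain a density field at several grid sizes and sum the gradient magnitude at each scale:

```python
import numpy as np

def coarse_grain(d, factor):
    """Average the density field over factor x factor blocks."""
    n = (d.shape[0] // factor) * factor
    m = (d.shape[1] // factor) * factor
    return d[:n, :m].reshape(n // factor, factor,
                             m // factor, factor).mean(axis=(1, 3))

def gradient_content(d):
    """Total gradient magnitude of a density field dc(x, y) in [0, 1]."""
    gy, gx = np.gradient(d)
    return float(np.sqrt(gx ** 2 + gy ** 2).sum())

def multiscale_profile(d, factors=(1, 2, 4, 8)):
    # cases a)/c): profile near zero or flat; case b): substantial
    # gradient content appearing at the scale of the interface structure
    return {f: gradient_content(coarse_grain(d, f)) for f in factors}

# fully mixed field (case c): no gradient at any scale
mixed = np.full((64, 64), 0.5)
# sharp planar interface (case a): gradient concentrated on one line
split = np.zeros((64, 64))
split[:32, :] = 1.0
```

For the uniform field the profile is identically zero, while the sharp interface contributes gradient content at every scale; a dendritic interface would add extra content at the finer scales only.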

This is all very interesting! Great post and it’s good that someone is working on this stuff.

Regarding the “border pixels problem”, I kept wondering why you didn’t just apply a Gaussian blur function followed by a threshold function. Some home experiments using GIMP have confirmed that this does indeed work: take a picture of pure random 1-bit noise, apply a Gaussian blur with a blur radius of 50 pixels, and then threshold so that 50% of the pixels are black. The 7-zip file size goes down from 140 kB to 35 kB.

If you apply this process to an image with a feature size > 50 pixels, the file size hardly changes.
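Out of curiosity, here is the same experiment sketched in code rather than GIMP (my own version, using a repeated box filter to approximate the Gaussian blur and zlib in place of 7-zip, so the exact sizes will differ from the numbers above):

```python
import zlib
import numpy as np

def box_blur(img, passes):
    # repeated 3x3-cross box averaging approximates a Gaussian blur
    out = img.astype(float)
    for _ in range(passes):
        out = (out
               + np.roll(out, 1, axis=0) + np.roll(out, -1, axis=0)
               + np.roll(out, 1, axis=1) + np.roll(out, -1, axis=1)) / 5.0
    return out

def compressed_size(bits):
    # pack the 1-bit image into bytes and compress
    return len(zlib.compress(np.packbits(bits.astype(np.uint8)).tobytes(), 9))

rng = np.random.default_rng(0)
noise = rng.integers(0, 2, size=(128, 128))   # random 1-bit image
blurred = box_blur(noise, passes=200)
thresholded = blurred > np.median(blurred)    # ~50% of pixels set

raw_size = compressed_size(noise)
smooth_size = compressed_size(thresholded)
```

As in the GIMP experiment, `smooth_size` comes out substantially smaller than `raw_size`: the blur-plus-threshold step replaces the incompressible noise with large compressible blobs.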

Rationalist #53: No, think about a coffee cup that’s partly but not fully mixed. Along the border between cream and coffee, the thresholding will create “spurious complexity”—i.e., complexity that’s only due to the rounding and doesn’t reflect anything intrinsic about the coffee cup. So then, even if you see the “complexity” increase and then decrease in your numerical experiment, you can’t trust that the increase reflected anything intrinsic. That’s the problem.

I thought that the whole point of this was that the border between cream and coffee was more interesting than a uniform region of just cream or just coffee. Therefore thresholding complexity at that boundary is intended rather than spurious.

Rationalist #55, #56: Yes, the problem is that you get a messy border after thresholding, and that really is just an artifact of the thresholding procedure—something that tells you nothing whatsoever about the underlying system that you cared about. A simple way to see that, once again, is to consider our “control case,” the non-interacting coffee automaton. Informally, it’s obvious that the non-interacting version never forms interesting structure, at any point in its evolution: as we prove in the appendix, we have complete mathematical control over what’s going on at every place and every time. Yet naïve thresholding would generate a messy border and hence a large complextropy value even in the non-interacting case. Crucially, our fix (which is nothing more, really, than a further smoothing step) gets rid of the messy border in the non-interacting case, while still preserving the complextropy bump in the interacting case.

I’m actually glad you asked about this, since it reassures me that the need to eliminate “border pixel artifacts” is not quite as obvious/trivial as I might’ve feared! 🙂

“My impression was that there are countless results in the literature about ergodicity, equilibrium distributions, and mixing times, but surprisingly or not, almost nothing about how to characterize what a Markov chain is doing in the era before it mixes, in a way that would allow one to answer questions like ours.”

I am not entirely sure what you mean. If by Markov chain you mean the system follows a Markov process, then the characterization of the chain is that each event is conditionally independent of prior events given the current state. If we accept that statement, then the prediction of what will happen next depends only on current conditions. I would think that the proof of whether the measure K(C(I)) is O(n) or o(log n) depends on whether one can show the system is Markov and that K(C(I)) is always bounded by O(n) or o(log n) for any initial condition. http://en.wikipedia.org/wiki/Markov_process#Examples

Perhaps the most important averaging is the “coarse graining” involved in obtaining macroscopic variables. Two large numbers are involved in a typical measurement: the total number of degrees of freedom of the system and the number of degrees of freedom that are averaged together to obtain a macroscopic variable. The second number appears naturally in a system containing a large number of indistinguishable constituents. For instance, in determining the local density in a gas, one does not care about the trajectory of any single particle but rather about the average number of trajectories crossing a macroscopic volume at any time. Use of the laws of large numbers (see “A Tutorial on Probability, Measure, and the Laws of Large Numbers”) in this context guarantees that, in spite of the fact that the underlying dynamics may be time-reversal invariant, macroscopic variables (almost) always tend to relax to their equilibrium values. In other words, because of the large numbers involved in specifying macroscopic variables, the microscopically specified state of the system has overwhelming probability to evolve towards the equilibrium state, even if the microscopic dynamics is time-reversal invariant. Hence, an arrow of time exists at the macroscopic level even if it does not at the microscopic level. This frequently stated paradox of statistical mechanics is a straightforward consequence of the laws of large numbers.

There is some other useful analysis and discussion in the paper.

I emphasize this because I want to address this issue of scale dependence. Our world is not scale invariant at macroscopic levels. While many QFTs are scale invariant, as we transition from quantum to classical, scale invariance sort of evaporates. This brings up the other element of the discussion of the conjecture: it can only hold for coarse-grained macroscopic systems. So I think your proof would have to follow along these lines:

For coarse-grained systems, show that for Markov processes K(C(I)) is bounded by O(n) or o(log n) for all initial conditions.

I am thinking that there is the possibility of applying a Kolmogorov complexity operator to each function that performs a computation; then it is possible to apply the Kolmogorov operator iteratively to itself, and each nth power of the operator has an integer value, and this is fun.
I am thinking that in physics the program is a differential equation, so that there is an integer value associated with each differential equation; and each differential equation is a surface in the derivative space, so that the Kolmogorov complexity is an integer value on a topology, and it could be an index in topology (the grid dimension can be arbitrary, if there is a theorem that demonstrates a finite computation time).

The cream/coffee picture to first approximation shows the Navier-Stokes equations, which are deterministic but chaotic (have sensitive dependence on initial conditions, abbreviated SDIC). The mathematical NS equations push energy to arbitrarily fine scales, but for physical liquids this stops at the molecular scale. The mathematical equations are deterministic, but physical liquids experience random thermal motion of the molecules, which are then magnified to the visible scale by the SDIC of the NS equations, increasing the visible entropy. Eventually (long before molecular scale) stuff is too noised over to distinguish within the camera’s pixel resolution, so the visible entropy decreases.

Here is another example of a system where complexity first goes up and then down, although depending not on time but on temperature. Imagine a computer (0) simulation of a cellular automaton (1) which is in a configuration of a universal computer (1), programmed to simulate CA (2) with the same rules as CA (1) and having an initial configuration corresponding to computer (1) + some additional memory.

At zero temperature CA (1)’s behavior is strictly deterministic, and it has a lower bound on complextropy, corresponding to the entropy of computer (0). As the temperature of CA (1) goes up, the complextropy of computer (1) increases because a) entropy increases and b) to achieve exponentially bounded errors at level (1) or (2) some error-correcting code must be introduced, which complicates the structure of computer (1). At high temperatures no polynomial amount of error correction is sufficient to make the computation reliable (proof?), CA (1) becomes random, and the complextropy goes down.

Whenever a cream cell is surrounded by coffee cells, the group of cells is allowed to persist unchanged for some defined time period.

Let’s imagine a matrix of cells 10 x 10.

There is a cream cell in (6,2) and coffee cells in (5,2), (6,1), (6,3) and (7,2). This cluster of 5 cells is allowed to persist unchanged for some time period. In other words, any random selection of an operation relating to these cells is simply skipped until the time period expires. Or perhaps any operation that leaves the structure unchanged might be allowed: for example, swapping one of the coffee cells with another coffee cell.

This is just an example of a rule, not necessarily one I would recommend.

This is based on what to me is an intuitive sense that complexity needs some degree of persistence of structure to create greater complexity.
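If I’m reading the proposed rule correctly, a sketch might look like this (my own hypothetical implementation; the `freeze_time` parameter and the `surrounded` test are assumptions, not part of the paper’s automaton): whenever a chosen cell turns out to be surrounded by cells of the other type, the move is skipped and the cell is frozen for `freeze_time` further steps.

```python
import random

def surrounded(grid, x, y):
    # True if every orthogonal neighbor holds the other liquid
    n = len(grid)
    return all(grid[(y + dy) % n][(x + dx) % n] != grid[y][x]
               for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)])

def step_with_persistence(grid, frozen_until, t, freeze_time, rng):
    """One swap attempt; surrounded cells persist for freeze_time steps."""
    n = len(grid)
    x, y = rng.randrange(n), rng.randrange(n)
    dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    x2, y2 = (x + dx) % n, (y + dy) % n
    for cx, cy in ((x, y), (x2, y2)):
        if frozen_until.get((cx, cy), -1) > t:
            return                        # still inside the freeze window
        if surrounded(grid, cx, cy):
            frozen_until[(cx, cy)] = t + freeze_time
            return                        # structure persists this time
    grid[y][x], grid[y2][x2] = grid[y2][x2], grid[y][x]

# a lone cream cell in a sea of coffee keeps being re-frozen, so under
# this rule it never moves, however long we run
grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1
frozen = {}
rng = random.Random(1)
for t in range(2000):
    step_with_persistence(grid, frozen, t, freeze_time=50, rng=rng)
```

Whether such a persistence rule would produce a genuine complextropy bump, rather than just delaying mixing, seems like exactly the kind of question the paper’s measure could be used to test.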

On an unrelated topic, David Layzer had the idea that order could arise in an expanding universe because potential entropy was greater than actual entropy. Can your coffee cup expand?

Of course there’s been a huge amount of work on complexity in dynamical systems in the 70s and 80s. Regarding the automaton, there’s the classification scheme of Wolfram, so it would be interesting to understand just how the automaton defined here fits into that scheme. The new feature of cellular automata with respect to the dynamical systems of the 70s and 80s is that the latter are ultra-local in space and evolve in time only. The automata evolve in space and time, so they define lattice field theories. Some simple rules were studied in this way by ’t Hooft in the 90s.

Incidentally, a measure that is used to label the behavior of cellular automata is “damage spreading”: One takes two initial configurations that differ in a fraction of their site variables (so one computes what’s technically known as the “Hamming distance”) and monitors how this fraction evolves as a function of time. In particular, if it remains finite at “long times”, this is a good indicator that the space of configurations is “complex” (once one has averaged over the initial configurations, of course, to eliminate global symmetries). While there has been a lot of numerical work here and in the characterization of the spatial “fronts”, a deeper understanding of the critical behavior is still elusive.
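A minimal sketch of damage spreading for an elementary (1D, two-state) cellular automaton — my own toy example, not the specific systems studied in that literature: evolve two copies differing at a single site and track the Hamming distance. Rule 30 spreads the damage; rule 204 (the identity rule) does not.

```python
def ca_step(state, rule):
    """One step of an elementary CA with periodic boundary conditions."""
    n = len(state)
    return [(rule >> ((state[(i - 1) % n] << 2)
                      | (state[i] << 1)
                      | state[(i + 1) % n])) & 1
            for i in range(n)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def damage(rule, steps=20, n=101):
    a = [0] * n
    b = list(a)
    b[n // 2] = 1                 # single-site "damage"
    for _ in range(steps):
        a, b = ca_step(a, rule), ca_step(b, rule)
    return hamming(a, b)
```

Here `damage(30)` grows with the number of steps while `damage(204)` stays pinned at 1, which is the kind of contrast the damage-spreading measure is designed to pick out.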

Indeed, the idea that a system with “few” degrees of freedom should have “low” complexity, that with “many” degrees of freedom the complexity should be high, and that with “very many” it should decrease again as an averaging description becomes appropriate, has been central to the characterization of chaotic time evolution, precisely because deterministic chaos offers a sort of counterexample to this picture: it shows that “few” degrees of freedom *can* display “complex” behavior.

So the open issue is, indeed, how to characterize complex behavior of spatially extended systems.

I’m still struggling to understand what this post is about.
Is it something along the lines of:
Every game of chess starts with a well-defined configuration. Then, as the game progresses, there’s a point where the board looks pretty “messy”, meaning that at this point the number of possible valid moves is quite high on average (not sure that’s necessarily true), or that the pieces are all over the place. Then, as the game ends, usually there are fewer pieces on the board (and typically the number of end states can be listed explicitly).
But are we trying to compare the relative complextropy evolution of various board games inherent in their rule set? Like Checkers seems to have way less maximum complextropy than Chess as a system, and Go seems to have way more complextropy than Chess?

My examples above with board games implicitly assume that each game’s rule set also includes some additional rules/heuristics for deciding what move to make (either random or based on some objective function, etc.), so that they’re finite automata, like a Game of Life board, or the milk/coffee mixing automaton.
But my question is whether we’re trying to find a good “complextropy” metric to classify all those systems, and whether we’re also trying to predict the maximum complextropy of a board game given its rules (do chess games played by masters lead to more complextropy on average than games played by total noobs?)

Before you get too excited about the Ising model, you should know that it is such a good example that it has been studied quite extensively by the Santa Fe school since the 80s! The main object of study there has been the complexity-entropy curve of these systems. A very nice short review is here. A longer discussion can be found here.

The basic takeaway is that, for things like the statistical complexity or excess entropy, this behavior is very common, but it seems not to be universal, in the sense that different systems exhibit fundamentally different complexity-entropy curves, in contrast to the hopes of the early 80s. Of course, who knows what it looks like for these new ideas like resource-bounded sophistication!

Abstract
We define predictive information I_pred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: I_pred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then I_pred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of I_pred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
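This definition can be illustrated with a toy sketch (my own, using one-symbol “past” and “future” windows rather than the T → ∞ limit of the abstract): a plug-in estimate of the mutual information between consecutive symbols already separates a persistent binary Markov chain from an i.i.d. coin-flip sequence.

```python
import math
import random
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def markov_chain(length, stay_prob, rng):
    # binary chain that repeats its last symbol with probability stay_prob
    s = [rng.randrange(2)]
    for _ in range(length - 1):
        s.append(s[-1] if rng.random() < stay_prob else 1 - s[-1])
    return s

rng = random.Random(0)
persistent = markov_chain(50_000, 0.95, rng)
coin = [rng.randrange(2) for _ in range(50_000)]

i_persistent = mutual_information(list(zip(persistent, persistent[1:])))
i_coin = mutual_information(list(zip(coin, coin[1:])))
```

With these parameters `i_persistent` should come out roughly near 1 − H(0.95) ≈ 0.71 bits, while `i_coin` stays near zero; the abstract’s point is about how such quantities scale as the past/future windows grow.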

Relating the June 3 update to my comment #32: already after that comment I myself found some subtlety with the Q2R model (June 2 update here).
However, Q2R is an approximation of the Ising model, which is in turn related to some SLE models of W. Werner et al. So, quite possibly, the nonmonotonic behavior of the file sizes is not accidental. I suppose there is no border pixel problem in the Q2R2 model; some analogue of averaging there is due to the time step.
Yet the averaging may disappear after separation of the two lattices, and if the compression program cannot compress 2D chessboard-like structures, such effects could simply be missed. So the subtlety I mentioned may even have helped.

I wonder, would it make sense to define something like a Gini coefficient for apparent complexity over the smoothing parameter? That is, consider a family of smoothing functions such as Gaussians and integrate the apparent complexity (which should be a decreasing function of sigma) over sigma? That may also lead to a distinction between coarse-grained and fine-grained contributions to entropy, because fine-grained jitter would contribute to the Gini only for sigmas close to 0, while coarse-grained structure would contribute to the integral over a larger span of sigma values.

I’m a grad student at the University of Wisconsin and have seen him talk about this in person. The interesting thing about it is that it seems to empirically do pretty well in “paradigm cases” of consciousness, despite the shakiness of the theoretical foundation (as you have pointed out).

This is an interesting idea; the only problem is that the dynamics change with the input, so that there is a variation of the consciousness with the stimulus, or with the thought’s neural path, so that a human can have different levels of consciousness according to the type of thought (which seems a strange result).
A program that simulates human thought (a complete brain simulation) can be compressed to the minimal information (Kolmogorov complexity), which could be a measure of consciousness (complex thought like complex program).
I think that this is not sufficient to measure consciousness, because then every complex program would be conscious (and this is not true); so it would have to be a conscious program (with a biological or otherwise measurable conscious property) with a great complexity (the measure of consciousness complexity).

Domenico, that’s a super interesting idea about the Kolmogorov complexity, and it also (I think) relates to the consciousness discussion in the followup message. I’m not sure whether I should post this in this thread or that one!
The idea of Kolmogorov complexity of (a system that generates) consciousness as a measure of consciousness, has a certain intuitive appeal. But then you run into the problem of Chaitin’s incompleteness theorem, which says that it is impossible to prove that a particular string (i.e. system, in this case) is “very” complex. So that leaves you wondering, does this mean that:
a) the Kolmogorov-consciousness measure is just not very useful because of a failure of computability? After all, the LZW compression method has no problem being computed, so you can get *some* idea of complexity; or
b) the intuition is accurate, and the problem is actually that there *might* be *some* description language that concisely describes consciousness systems?
I have no idea what (b) would actually look like in any mathematically precise sense, but to me it’s somewhat intriguing just because the intuitions about consciousness that are the starting point (see Scott’s followup article) are, themselves, so simple.

There is a possible simple idea: try many possible descriptions of a simple conscious brain, in different programming languages, with an acceptable error in the dynamics; compress each program description with a usual compression program (for example Lempel-Ziv-Welch), and obtain an upper limit for the measurement of the consciousness.
The biological brain structure is universal, so that a single neuron can be modeled ever better, and the program can use an ever better dynamical description.
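In that spirit, a tiny sketch of the compression upper bound (my own, using zlib’s DEFLATE in place of LZW, though the argument is the same): the compressed length of a description, plus the fixed size of the decompressor, upper-bounds its Kolmogorov complexity.

```python
import random
import zlib

def complexity_upper_bound(description: str) -> int:
    """Compressed size in bytes: an upper bound on K, up to the
    additive constant for the fixed decompressor."""
    return len(zlib.compress(description.encode("utf-8"), 9))

rng = random.Random(0)
structured = "neuron.fire(); " * 1000     # highly regular description
random_text = "".join(chr(rng.randrange(32, 127)) for _ in range(15_000))

k_structured = complexity_upper_bound(structured)
k_random = complexity_upper_bound(random_text)
```

The regular description compresses to a few dozen bytes while the random string barely compresses at all, which already illustrates the worry in the next paragraph: by this measure a purely random string has near-maximal complexity.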

There is only one problem with a complexity measure: a purely random program, in an appropriate programming language (one that works for every program string), has maximal complexity; so what is the difference between a random program and a brain simulation with the same complexity? It could be that each random program can be transformed into a working program in the right language, because every compression program transforms a program into a random-looking string.

When is a program (a simulation of a brain) conscious? If a program doesn’t run (there is no mathematical operation) there is no consciousness; when the program runs (at maximum speed) with external (or internal) stimulus it is conscious, and there is a measure of intermediate states of computing power for each program (like a benchmark for different levels of consciousness): it could be that anesthesia is only a slowdown of the synaptic signals.
