Comments on active information

By E. Tellgren

Posted October 08, 2007

William Dembski has been one of the most influential contributors to the Intelligent Design (ID) movement. Among other things, his work has
added the terms specified information, specified complexity, and complex specified information to the basic vocabulary of the ID movement. These terms are all directly related to the logarithms of special types of probabilities, e.g. the probability of a pattern of
interest given that it was produced in some way that excludes the foresight and guidance of an intelligent agent.

In a recent draft manuscript, Dembski and his coauthor Marks extend the vocabulary with three new terms [1]: endogenous information, exogenous information, and active information. They consider as given a search space and a fixed subset, called a target, that makes up some fraction ps of the search space. An issue of interest to them is how to measure how well a search algorithm [2] exploits the structure of the search problem. Two possible candidates are the probability p that a search algorithm is successful and the ratio p/ps. Readers of Dembski's previous writings will not be surprised to discover that Marks and Dembski prefer to log-transform their probabilities and rename them 'information'. In equations, their definitions are
endogenous information = -log(ps),
exogenous information = -log2(p),
active information = -log2(ps/p).

(More generally, it is indicated that ps can be replaced by the probability that some reference algorithm finds the target. Since the draft manuscript focuses on the case when ps is the fraction of points in the target, I will do the same.) It sounds a little less
strange to say that active information has to come from a programmer than it does to say the same of a probability value larger than ps. We are more prone to reify something called 'information' and expect it to come from somewhere in particular than to doing the same for a probability or a probability ratio. For a given search problem, the three types of information typically have a rather mundane origin. They typically arise from the formulation of the search problem. If the formulation of the search problem is informative enough to enable us to estimate the size of the target, then it usually also implies something about which regions of the search space that are likely to be in the target and which search algorithms are most efficient. Of course, Marks and Dembski are not only concerned with the mere mathematical existence of successful search algorithms, they also want to imply something about how such search algorithms are implemented or realized. Naturally, for those search algorithms that are implemented in man-made computers, human programmers tend to play a central role. Even in this case, however, a large ratio p/ps cannot be entirely attributed to the intrinsic creativity and cleverness of a human programmer. In practice, programmers learn from each other and from experience, and many algorithms are directly inspired by observations of nature. Examples include simulated annealing, genetic algorithms, swarm optimization, etc. To the extent that search problems, targets and large ratios p/ps also arise in nature (e.g. minimization of free energy in thermodynamic systems or maximization of reproductive success in evolving biological populations), a different answer seems to be required. For one thing it is implausible to think that a human programmer had the means and opportunity to be involved. Additionally, any process that is capable of reaching a long-lived equilibrium or stationary state can be viewed as optimizing something (e.g. minimizing a function that describes the rate of change squared or the distance to the equilbrium state).

Because of the use of information terminology and the appeal to the Maximum Entropy (MaxEnt) principle, it is interesting to compare Marks
and Dembski's work to older work within the MaxEnt tradition. It has been shown that the MaxEnt principle can be used as a framework to derive the optimal way to distribute search efforts over different cells of a search space [3]. Although the motivation and assumptions
behind that work are different, it is similar in that it led to definitions of quantities measuring the amount of prior knowledge about the target and the amount of knowledge utilized during the search. These measures are, in different ways, both more general and more restricted than Marks and Dembski's measures. They are much more general because there is no restriction to uniform probability and complete lack of knowledge about the target location. They are more restricted in that they presuppose that the target consists of a single point. In the special case of uniform probability and a single-point target, they differ only by trivial additive constants from the endogenous and exogenous information. However, the older measures are directly related to optimal distribution of search efforts and therefore have a clearer meaning and significance. In comparison, it is less clear which questions Marks and Dembski's three quantities can answer.

On another note, Marks and Dembski are mistaken when they refer to Wolpert and Macready's original No-Free-Lunch (NFL) theorem as if it
were directly applicable also to the situation they are analyzing. In the current version of their draft, Marks and Dembski stipulate that (a) the target makes up some fixed fraction ps of the search space and (b) no other assumptions about the search problem are to be made. The NFL scenario analyzed by Wolpert and Macready [4] differs in that stipulation (a) is not made. When one makes use of the Principle of Insufficient Reason or the MaxEnt principle to determine a Bayesian prior, it is important to be clear about what, precisely, it is that
is totally unknown. In the NFL scenario it is a cost/objective/fitness function f defined on the search space that is taken to be totally unknown. Performance is evaluated as an average over all mathematically possible functions. If one wishes to define a target it must be defined in terms of the totally unknown function f, e.g. as the region {x | f(x) > c} of the search space with function values larger than some threshold c for what is satisficingly good. For some functions such a target is very small, for other functions it is very large (an example is the function which assigns the maximum value to every search space point). Because all functions contribute, the NFL-averaged performance will include contributions targets of varying sizes, in contrast to the scenario analyzed by Marks and Dembski. Arriving at the target in the NFL scenario is not prohibitively hard, not even for random i.i.d. sampling [5].

In some ways the mistake is not very serious. They don't really need the original NFL results for their own conclusions. Moreover, a generalization of the NFL theorem to the situation when the function is known to belong to a special class of functions (which is closed under permutation of the search space) is applicable to their own scenario [6]. If the function to be sampled is known to assign
satisficing values to a fraction ps of the search space, but is otherwise totally unknown, then both stipulations (a) and (b) are respected and the generalized NFL theorem is applicable.

What makes the difference between the NFL scenario and Marks and Dembski's stipulations significant is the far-reaching conclusions
inspired by the latter. They write [1]:

"To have integrity, all search algorithms, especially computer simulations of evolutionary search, should make explicit (1) a
numerical measure of the difficulty of the problem to be solved, i.e., the endogenous information, and (2) a numerical measure of the amount
of problem-specific information resident in the search algorithm, i.e., the active information."

While there is nothing unreasonable about the general idea of quantifying problem difficulty and problem-specific information, the particular assumptions Marks and Dembski rely on are neither realistic enough nor general enough to become some sort of integrity requirement. The assumptions (a) and (b) correspond to the NFL scenario with an extra restriction of the function to be sampled. Why is the NFL scenario with this extra knowledge a privileged set of assumptions? Why give knowledge of the target size special treatment compared to, say, knowledge about the smoothness of the function? In reply to criticism of Dembski's older work on the basis that real-world fitness functions exhibit clustering of similar values, Marks and Dembski claim that taking such knowledge into account is to commit the crime of smuggling in unwarranted assumptions [7]. Given that, it seems that they are equally guilty in smuggling in an unwarranted assumption about the target size. Without the target size assumption (a) smuggled in, the NFL scenario is so benign that any search algorithm, random i.i.d. sampling included, is reasonably effective [5]. A characteristic of hard search problems is that the difficulty grows fast with the size of the search space. In contrast,
the NFL-averaged performance is (almost completely) independent of the size of the search space. If this is counter-intuitive, it is only because the original NFL theorem is a formal version of the problem of induction, whereas the real-world search problems we build our experience and intution on seem to come from domains that enable inductive learning and generalizations.

Acknowledgement: Thanks to Pete Dunkelberg for commenting on earlier versions.

References and notes:

[2] Note that, following Wolpert and Macready and others, a search algorithm in this context is just a sampling distribution of the type P(next sampling point | data about previously sampled points). This differs a bit from the more common notion of an algorithm as a list of instructions, input to a Turing machine, or an implementation in a programming language.

[3] E. T. Jaynes. Entropy and Search-Theory, 1985. Available online at http://bayes.wustl.edu/etj/articles/search.pdf (Note that Jaynes frequently referred to MaxEnt Bayesianism as 'information theory' and to work within this tradition as 'information-theoretic'. This is different from both Shannon's
information theory and algorithmic information theory.)