Entropy Maximization and intelligent behaviour

Jul 6, 2017
• Aidan Rocke

Introduction:

Sergio Hernandez, a Spanish mathematician, recently shared some very interesting results on OpenAI Gym environments which are based on a relatively unknown paper
published by Dr. Alexander Wissner-Gross, a physicist trained at MIT. What is impressive about Wissner-Gross's meta-heuristic is that it is succinctly described by three equations which try to maximize the future freedom of action of an agent. In this analysis, I summarize the method, present its strengths and weaknesses, and attempt to improve it by making an important modification to one of the equations.

Causal entropic forces:

In the following summary of Wissner-Gross's meta-heuristic, it's assumed that the agent has access to an approximate or exact simulator of its environment. A close reading of
the original paper [1] shows that this assumption is in fact necessary.

Macrostates:

For any open thermodynamic system, we treat the phase-space paths x(t) taken by the system over the time interval 0 ≤ t ≤ τ as microstates
and partition them into macrostates using the equivalence relation [1]:

x(t) ~ x'(t)  if and only if  x(0) = x'(0)    (1)

As a result, we can identify each macrostate X with a unique present system state x(0). This defines a notion of causality over the time interval [0, τ].
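To make the equivalence relation concrete, here is a minimal sketch (the function and variable names are illustrative, not from the paper) that partitions a set of sampled paths into macrostates keyed by their initial state:

```python
# Sketch: partitioning sampled phase-space paths into macrostates.
# A "path" here is just a sequence of states; two paths belong to the
# same macrostate iff they share the same initial state x(0).
from collections import defaultdict

def partition_into_macrostates(paths):
    """Group paths by their initial state x(0)."""
    macrostates = defaultdict(list)
    for path in paths:
        x0 = path[0]  # the present system state identifying the macrostate
        macrostates[x0].append(path)
    return dict(macrostates)

paths = [(0, 1, 2), (0, 2, 4), (1, 3, 5)]
print(partition_into_macrostates(paths))
# {0: [(0, 1, 2), (0, 2, 4)], 1: [(1, 3, 5)]}
```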

Causal path entropy:

We can define the causal path entropy S_c of a macrostate X with the associated present system state x(0) as the path integral:

S_c(X, τ) = -k_B ∫ Pr(x(t) | x(0)) ln Pr(x(t) | x(0)) Dx(t)    (2)

where we have:

Pr(x(t) | x(0)) = ∫ Pr(x(t), x'(t) | x(0)) Dx'(t)    (3)

with x'(t) denoting the paths taken by the open system's environment.

In (3) we basically integrate over all possible paths x'(t) taken by the open system's environment. In practice, this integral is intractable
and we must resort to approximations and to a sampling algorithm such as Hamiltonian Monte Carlo [3].
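As a rough illustration of the sampling idea, here is a minimal sketch of a plug-in Monte Carlo estimator of the path entropy, assuming we have a stochastic one-step simulator. It is much cruder than Hamiltonian Monte Carlo, and all the names are my own:

```python
# Plug-in Monte Carlo estimate of causal path entropy: sample N
# rollouts from the present state, then compute the Shannon entropy
# of the empirical distribution over sampled paths. Illustrative only.
import math
import random
from collections import Counter

def sample_path(x0, horizon, step, rng):
    """Roll the one-step simulator `step` forward from x0."""
    path, x = [x0], x0
    for _ in range(horizon):
        x = step(x, rng)
        path.append(x)
    return tuple(path)

def causal_path_entropy(x0, horizon, step, n_samples=1000, seed=0):
    """Entropy of the empirical path distribution starting at x0."""
    rng = random.Random(seed)
    counts = Counter(sample_path(x0, horizon, step, rng) for _ in range(n_samples))
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

# Toy dynamics: a symmetric random walk on the integers.
def step(x, rng):
    return x + rng.choice([-1, 1])

print(causal_path_entropy(0, horizon=3, step=step))
# close to ln(8) ≈ 2.08: a 3-step walk has 8 equally likely paths
```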

Causal entropic force:

A path-based causal entropic force may be expressed as:

F(X_0, τ) = T_c ∇_X S_c(X, τ) |_{X = X_0}    (4)

where T_c and τ are two free parameters. This force basically brings us closer to macrostates that
maximize S_c. In essence, the combination of equations (2), (3) and (4) maximizes the number of future options
of our agent. This isn't very different from what most people try to do in life, but this meta-heuristic does have very important
limitations.

Limitations of the Causal Entropic approach:

The Causal Entropic Forces paper makes the implicit assumption that we have access to a reliable simulator of future states. In the
case of the OpenAI Gym environments this isn't a problem because environment simulators are provided, but in general learning such a simulator is a hard problem. Two useful approaches
to this problem, both based on recurrent neural networks, are suggested by [4] and [5].

Maximizing your number of future options is not always a good idea. Sometimes fewer options are better, provided that they are
more useful options. This is why, for example, football players don't always rush to the centre of the pitch, even though from
that position they would maximize their number of future states, i.e. possible positions on the pitch.

In the next section I show that it's possible to find a practical solution to the second limitation by modifying
(3).

Causal Path Utility:

Assuming that a recurrent neural network is used to define the potential macrostates X, it's reasonable to assume
that our agent's understanding of the future evolves with time and that the macrostates are therefore a function of time: we have
X(t) rather than a fixed X. In other words, our simulator, which might be an RNN, will probably change its parameters and
even its topology over time.

In order to resolve the second limitation and encourage the agent to make confident decisions,
I propose that we replace the causal path entropy with a causal path utility, where:

This not only has the added value of simplifying calculations, but it also allows us to disentangle the relative contributions of utility and uncertainty.
It must also be noted that the two expressions in (5) can be calculated in parallel, although the uncertainty calculation is more computationally
expensive.
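One hypothetical way to instantiate a causal path utility of this kind (not necessarily the exact form of (5)) is to score a macrostate's sampled futures by an expected-utility term minus an uncertainty penalty. The variance penalty and every name below are my own illustrative assumptions:

```python
# Hypothetical sketch of a causal path utility: an expected-utility
# term minus an uncertainty term, estimated from sampled future states.
# The decomposition into these two terms mirrors the discussion above;
# the specific choice of variance as the uncertainty term is assumed.
import statistics

def causal_path_utility(sampled_futures, utility, weight=1.0):
    values = [utility(s) for s in sampled_futures]
    expected_utility = statistics.mean(values)   # utility term
    uncertainty = statistics.pvariance(values)   # uncertainty term
    return expected_utility - weight * uncertainty

# Two candidate macrostates with the same mean utility: the agent
# prefers the one whose sampled futures it is more certain about.
consistent = [1.0, 1.1, 0.9]
volatile = [3.0, -1.0, 1.0]
u = lambda s: s
print(causal_path_utility(consistent, u) > causal_path_utility(volatile, u))  # → True
```

Note that the two terms are computed from the same sampled values, so they can indeed be evaluated independently, and the trade-off between them is controlled by a single weight.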

Discussion:

If we assume that the agent's perception of the future doesn't change much, it might perceive some future states to be ideal. This is
consistent with the empirical observation that many people believe certain accomplishments would bring them 'genuine happiness'. In other
words, if the state space is compact and approximately time-invariant, the agent's optimal future macrostate converges to a fixed point [6].

While the notion of Causal Path Utility only occurred to me today, I believe that it is a very promising approach, which I shall follow up with concrete implementations very soon.