There are many problems for which it is useful to keep a history. A few examples of such problems:

keep history of control-flow in an application for debugging purposes

keep history of updates to a document for undo purposes

keep history of a filesystem for backup and recovery purposes

keep history of a command and control system, sensor network, or cooperative work system for situational awareness and operator handoff

keep history of a database for queries and auditing

A complete history is too expensive: the space cost of recording it and the CPU cost of processing it are both too high. Fortunately, most problems that benefit from keeping history don’t need a complete history.

A common, naive approach to history is a simple linear ring buffer. When the buffer fills, we simply overwrite the oldest element. An advantage of this technique is constant space overhead. Unfortunately, a linear history too easily loses important contextual information about the deep past due to minute updates in the recent past.
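For concreteness, a linear ring buffer is essentially a fixed-capacity queue that overwrites its oldest entry. A minimal Python sketch (the capacity of 1000 is an arbitrary choice for illustration):

```python
from collections import deque

# A linear ring buffer: constant space, but the deep past is lost outright.
history = deque(maxlen=1000)      # hypothetical capacity of 1000 frames
for event in range(5000):
    history.append(event)         # once full, the oldest entry is overwritten

# Only the 1000 most recent events survive; everything earlier is gone.
```

This is the failure mode described above: the buffer retains nothing older than its fixed window, no matter how significant the lost frames were.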

The further into the past we look, the less information we need, but we don’t want to have any particular cutoff where we lose the history. What we need is a decay model with predictable computational properties. What we need is exponential decay or something similar – i.e. information with a half-life.

True exponential decay would, like the ring buffer, consume constant space. However, there are practical limits on how often we can cut information in half. We cannot readily cut one bit in half, for example, and we need to keep some metadata to make sense of any information we specify.

Simple Series of Ring Buffers

We can get something like exponential decay from a sequence of ring buffers. The naive technique is simple: every other frame from the first ring buffer goes to the second. Every other frame from the second buffer goes to the third. Etc. Each ring becomes a generational “half-life” for information. If 30 rings can run 1 year, then 40 rings can run 1000 years. This model achieves O(ln(T)) space, where T is the duration for which history is kept. Of course, it’s trivial to retain hard real-time guarantees by limiting history to a duration a couple orders of magnitude longer than the expected lifetime of the system.
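A minimal sketch of the naive technique, assuming each ring is a fixed-size deque and every other frame evicted from one ring is promoted to the next (the class and method names here are illustrative, not from the post):

```python
from collections import deque

class DecayingHistory:
    """A series of fixed-size rings. When a ring overflows, every other
    evicted frame is promoted to the next ring, so each ring holds frames
    at half the temporal resolution of the ring before it."""

    def __init__(self, num_rings, frames_per_ring):
        self.rings = [deque(maxlen=frames_per_ring) for _ in range(num_rings)]
        self._promote = [True] * num_rings   # toggle: keep every other eviction

    def add(self, frame, level=0):
        if level == len(self.rings):
            return                           # fell off the end of history
        ring = self.rings[level]
        if len(ring) == ring.maxlen:
            evicted = ring.popleft()
            if self._promote[level]:         # every other evicted frame survives
                self.add(evicted, level + 1)
            self._promote[level] = not self._promote[level]
        ring.append(frame)

h = DecayingHistory(num_rings=4, frames_per_ring=4)
for i in range(100):
    h.add(i)
```

After 100 inserts, the first ring holds the 4 newest frames, the second holds every 2nd older frame, the third every 4th, and so on; each ring covers twice the span of the one before it.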

The naive model can be useful for many situational awareness tasks (debugging, unmanned systems operations, black boxes), but isn’t all that useful as a history: the “interesting” events are, by nature, the rare ones – and have a high probability of being forgotten. Also, it’s awful for certain kinds of data, e.g. sound streams cannot handle being disrupted in this manner.

One can enhance this history by using a non-blind transfer function: examining multiple frames and preserving or summarizing information into the next ring. One still loses half of the information per ring, but we can distill the most interesting and useful information. It’s easiest to operate such a function at the side of the ring opposite where data is being added. Doing so supports access to both the future and past of a given frame, and offers better opportunities for temporal compression (e.g. recognizing what will be copied into past or future frames) and accumulation (integrals, counters, etc.).
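One possible shape for such a transfer function, sketched under the assumption that the two frames retiring from a ring are distilled into a single frame for the next ring (the `summarize` hook and class name are hypothetical):

```python
from collections import deque

class SummarizingHistory:
    """Variant of the ring series: instead of blindly keeping every other
    evicted frame, a user-supplied summarize function sees both frames
    being retired and distills them into one frame for the next ring.
    Half the information is still shed per ring, but we choose which half."""

    def __init__(self, num_rings, frames_per_ring, summarize):
        self.rings = [deque(maxlen=frames_per_ring) for _ in range(num_rings)]
        self._pending = [None] * num_rings   # hold one retiree until its pair arrives
        self.summarize = summarize

    def add(self, frame, level=0):
        if level == len(self.rings):
            return
        ring = self.rings[level]
        if len(ring) == ring.maxlen:
            retired = ring.popleft()
            if self._pending[level] is None:
                self._pending[level] = retired
            else:
                pair = (self._pending[level], retired)
                self._pending[level] = None
                self.add(self.summarize(*pair), level + 1)
        ring.append(frame)

# e.g. keep whichever frame has the larger "interest" score
h = SummarizingHistory(num_rings=3, frames_per_ring=4, summarize=max)
for score in [5, 1, 9, 2, 7, 3, 8, 6, 4, 0]:
    h.add(score)
```

With `summarize=max`, the deeper rings retain the most “interesting” frames rather than arbitrary ones; an averaging or counting function would instead accumulate, as the text suggests.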

With 16 frames per ring and 20 ring buffers (absolute size 16*20 = 320 frames) we could have useful, predictable context for well over a million observations. Or, to put that in perspective, a summary for a million frames of video (~9hrs at 30fps) would just barely start using the 20th buffer. At 30 rings we’d cover a year. At 40, a thousand. If our summary function identifies interesting frames or phenomena, then our history can potentially grow more useful the further back we look.
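The arithmetic behind those figures can be checked directly. Under the naive every-other scheme, ring k receives roughly one frame per 2^(k-1) observations (ignoring the fill delay of the intermediate rings):

```python
# Ring k (1-indexed) receives one frame per 2**(k-1) observations.

def first_arrival(ring):
    """Approximate number of observations before ring k sees its first frame."""
    return 2 ** (ring - 1)

def span(ring, frames_per_ring=16):
    """Observations covered by a full ring of F frames at level k."""
    return frames_per_ring * 2 ** (ring - 1)

print(first_arrival(20))            # 524288: ring 20 starts filling shortly
                                    # after half a million observations
print(1_000_000 // first_arrival(20))  # 1: a million frames barely uses ring 20
print(span(20))                     # 8388608 observations once ring 20 is full
```

So a million observations deposits only a frame or two in the 20th ring, matching the “just barely start using” claim above.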

This is a very simple, very powerful, and predictable abstraction that should work in a large variety of problem domains. In addition to the examples presented above, machine learning is another obvious application.

Summarizing with a Bigger Picture

The simple series of ring-buffers doesn’t make much information about the deep past readily accessible in the present. It might often be useful to pattern-match against much deeper histories when computing or comprehending current conditions, i.e. to lift information out of previous summaries. A simple variation is to make the next level of ring buffers available to the summarize function. This can provide a natural feedback: the summary at level L has access to the buffer at level L and the buffer at level L+1. (Patterns detected at level L+2 only feed back to level L if reported by the level L+1 summary function.)

Avoiding Temporal Aliasing

Here, I describe a technique to reduce temporal aliasing without hindering real-time properties. The naive decay model will blindly select every other frame, which may cause problems if the frequency of incoming frames happens to be an exact multiple of the frequency of some phenomenon. We can instead use a random selector – roll some virtual dice. On each frame committed, decide the next frame to commit – i.e. (K+R) mod N, where K is the current frame index, R is the random number, and N is the number of frames in the ring. If the random selector is weighted so that 2 frames is the average stride – e.g. R from a fair 3-sided die – then the system still propagates 50% of information to the next ring.

Usefully, one could tweak percentages (use weighted dice) to push more or less than 50% of information to the next ring. If more than 50% of frames are pushed to the next ring, then more rings will be used for an equal history (half-life for information is more than one ring). This may be a useful technique if there is some rich per-ring processing and thus some benefit to using more rings (compared to more frames).
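The effect of the dice, fair or weighted, on the propagation rate is easy to simulate. A small sketch (the helper name and weights are illustrative, not from the post): the fraction of frames promoted is simply 1 divided by the mean stride of the die.

```python
import random

def promotion_rate(strides, weights, steps=100_000, seed=42):
    """Estimate the fraction of frames promoted to the next ring when
    the next committed frame is (K + R) mod N, with stride R drawn from
    a weighted die. (Illustrative helper, not from the original post.)"""
    rng = random.Random(seed)
    pos = promoted = 0
    while pos < steps:
        pos += rng.choices(strides, weights)[0]   # roll the virtual die
        promoted += 1
    return promoted / pos

# Fair 3-sided die: mean stride 2, so ~50% of frames still propagate.
print(round(promotion_rate((1, 2, 3), (1, 1, 1)), 2))

# Weight the die toward stride 1: mean stride 1.75, so more than 50%
# propagate and the same history spreads across more rings.
print(round(promotion_rate((1, 2, 3), (2, 1, 1)), 2))
```

Randomizing the stride breaks any fixed phase relationship with a periodic phenomenon, while the weights give direct control over the per-ring half-life.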

History as Foundation for State

Most programming paradigms offer some models for state, e.g. mutable variables. History is stateful by nature. When developers need to represent history, they turn to these mutable variables and make it happen – albeit in an ad-hoc, problem-specific, error-prone manner. But what happens if we turn this concept on its head? Assume history is fundamental. Build state atop history.

The decisions we make NOW are based on the observations we made in the PAST, which are part of our history. The observations we make in the FUTURE are based on the decisions we make NOW. There is a natural feedback cycle that makes history very usable as state… at least so long as we don’t directly include queries on history among our observations (which may cause a computational explosion, or divergence).

This is not a new idea. History for state is the basis for Elephant and Datomic, for example. Less directly, it is applied in temporal logic programming, as beautifully modeled in Dedalus.

History as the basis for state has many nice properties. Decaying history is generic, robust, resilient, simple, predictable, incremental, real-time. In the face of race conditions, history allows us to make intelligent decisions, when computing summaries, about how things should have been ordered. Summary computations are embarrassingly easy to parallelize. History provides opportunity for ad-hoc applications that were not envisioned early in development, perhaps even opportunity for applications that could not be envisioned or would not be useful without the established history. Should we choose to upgrade an application, e.g. to add features, I believe history will make for a more generic and easier upgrade target than would be ad-hoc problem-specific state.

History also encourages developers towards a more declarative programming style, e.g. developing rules based on recent observations.

The cost?

Predictable, near-constant amortized overhead to store the history and to process it. Albeit, that is a VERY HIGH near-constant factor. But it is predictable, easy to control. And I believe (but have not proven) that the overhead might be mitigated by needing less state to do the same jobs. Eventful programming – i.e. programming in terms of sequential “updates” to an implicit external state – tends to require a lot of non-essential state. The more declarative approach to programming may allow us to forego state where we otherwise may have needed it. I doubt the state reduction will fully offset the imposed overhead. But, in combination with the simplicity and robustness advantages, decaying history may prove, in many scenarios, a superior model even compared to traditional state.

(Note: Clever orthogonal use of diffs, checkpoints, and compression could potentially save a 10x-100x factor in storage for deep histories, but those benefits cannot be guaranteed.)
