Category: Vensim

That was a conceptual model; this is a mathematical model. This is a Vensim replication of:

Marisa Eisenberg, Mary Samuels, and Joseph J. DiStefano III

Extensions, Validation, and Clinical Applications of a Feedback Control System Simulator of the Hypothalamo-Pituitary-Thyroid Axis

Background:We upgraded our recent feedback control system (FBCS) simulation model of human thyroid hormone (TH) regulation to include explicit representation of hypothalamic and pituitary dynamics, and up-dated TH distribution and elimination (D&E) parameters. This new model greatly expands the range of clinical and basic science scenarios explorable by computer simulation.

Methods: We quantified the model from pharmacokinetic (PK) and physiological human data and validated it comparatively against several independent clinical data sets. We then explored three contemporary clinical issues with the new model: …

I ran across the Nelson Rules in a machine learning package. These are a set of heuristics for detecting changes in statistical process control. Their inclusion felt a bit like navigating a 787 with a mechanical flight computer (which is a very cool device, by the way).

The idea is pretty simple. You have a time series of measurements, normalized to Z-scores, and therefore varying (most of the time) by plus or minus 3 standard deviations. The Nelson Rules provide a way to detect anomalies: drift, oscillation, high or low variance, etc. Rule 1, for example, is just a threshold for outlier detection: it fires whenever a measurement is more than 3 SD from the mean.

In the machine learning context, it seems strange to me to use these heuristics when more powerful tests are available. This is not unlike the problem of deciding whether a random number generator is really random. It’s fairly easy to determine whether it’s producing a uniform distribution of values, but what about cycles or other long-term patterns? I spent a lot of time working on this when we replaced the RNG in Vensim. Many standard tests are available. They’re not all directly applicable, but the thinking is.

In any case, I got curious how the Nelson rules performed in the real world, so I developed a test model.

This feeds a test input (Normally distributed random values, with an optional signal superimposed) into a set of accounting variables that track metrics and compare with the rule thresholds. Some of these are complex.

Rule 4, for example, looks for 14 points with alternating differences. That’s a little tricky to track in Vensim, where we’re normally more interested in continuous time. I tackle that with the following structure:

There’s a trick here. To count alternating differences, we need to know (a) the previous count, and (b) whether the previous difference encountered was positive or negative. Above, N Switched stores both pieces of information in a single stock (INTEG). That’s possible because the count is discrete and positive, so we can overload the storage by giving it the sign of the previous difference encountered.

Thus, if the current difference is negative (Is Positive < 0) and the previous difference was positive (N Switched > 0), we (a) invert the sign of the count by subtracting 2*N Switched, and (b) augment the count, here by subtracting 1 to make it more negative.

Similar tricks are used elsewhere in the structure.

How does it perform? Surprisingly well. Here’s what happens when the measurement distribution shifts by one standard deviation halfway through the simulation:

There are a few false positives in the first 1000 days, but after the shift, there are many more detections from multiple rules.

The rules are pretty good at detecting a variety of pathologies: increases or decreases in variance, shifts in the mean, trends, and oscillations. The rules also have different false positive rates, which might be OK, as long as they catch nonoverlapping problems, and don’t have big differences in sensitivity as well. (The original article may have more to say about this – I haven’t checked.)

However, I’m pretty sure that I could develop some pathological inputs that would sneak past these rules. By contrast, I’m pretty sure I’d have a hard time sneaking anything past the NIST or Diehard RNG test suites.

If I were designing this from scratch, I’d use machine learning tools more directly – there are lots of tests for distributions, changes, trend breaks, oscillation, etc. that can be used online with a consistent likelihood interpretation and optimal false positive/negative tradeoffs.

Archetypes are generic causal loop diagram (CLD) templates, with a particular behavior story. The Escalation and Eroding Goals archetypes have identical feedback loop structures, but very different stories. So, there’s no unique mapping from feedback loops to behavior. In order to predict what a set of loops is going to do, you need more information.

Here’s an implementation of Eroding Goals:

Notice several things:

I had to specify where the stocks and flows are.

“Actions to Improve Goals” and “Pressure to Adjust Conditions” aren’t well defined (I made them proportional to “Gap”).

Gap is not a very good variable name.

The real world may have structure that’s not mentioned in the archetype (indicated in red).

Here’s Escalation:

The loop structure is mathematically identical; only the parameterization is different. Again, the missing information turns out to be crucial. For example, if A and B start with the same results, there is no escalation – A and B results remain constant. To get escalation, you either need (1) A and B to start in different states, or (2) some kind of drift or self-excitation in decision making (green arrow above).

Even then, you may get different results. (2) gives exponential growth, which is the standard story for escalation. (1) gives escalation that saturates:

The Escalation archetype would be better if it distinguished explicit goals for A and B results. Then you could mathematically express the key feature of (2) that gives rise to arms races:

A’s goal is x% more bombs than B

B’s goal is y% more bombs than A

Both of these models are instances of a generic second-order linear model that encompasses all possible things a linear model can do:

Notice that the first-order and second-order loops are disentangled here, which makes it easy to see the “inner” first order loops (which often contribute damping) and the “outer” second order loop, which can give rise to oscillation (as above) or the growth in the escalation archetype. That loop is difficult to discern when it’s presented as a figure-8.

Of course, one could map these archetypes to other figure-8 structures, like:

How could you tell the difference? You probably can’t, unless you consider what the stocks and flows are in an operational implementation of the archetype.

The bottom line is that the causal loop diagram of an archetype or anything else doesn’t tell you enough to simulate the behavior of the system. You have to specify additional assumptions. If the system is nonlinear or stochastic, there might be more assumptions than I’ve shown above, and they might be important in new ways. The process of surfacing and testing those assumptions by building a stock-flow model is very revealing.

The model is a mixed discrete/continuous simulation of an individual sleeping, working and drinking. This started out as a multi-agent model, but I realized along the way that sleeping, working and drinking is a fairly ergodic process on long time scales (at least with respect to UFOs), so one individual with a distribution of behaviors over time or simulations is as good as a population of agents.

The model replicates the data somewhat faithfully:

The model shows a morning peak (people awake but out and about) and a workday dip (inside, lurking near the water cooler) but the data do not. This suggests to me that:

Alcohol is the dominant factor in sightings.

I don’t party nearly enough to see a UFO.

Actually, now that I’ve built this version, I think the interesting model would have a longer time horizon, to address the non-ergodic part: contagion of sightings across individuals.

Interestingly, this work dates from a time in which the very idea of a mathematical model was still questioned:

Contrary to the impression commonly held, mathematical methods properly employed, far from making economic theory more abstract, actually serve as a powerful liberating device enabling the entertainment and analysis of ever more realistic and complicated hypotheses.

Samuelson should be hailed as one of the early explorers of a very big jungle.

The limitations inherent in so simplified a picture as that presented here should not be overlooked. In particular, it assumes that the marginal propensity to consume and the relation are constants; actually these will change with the level of income, so that this representation is strictly a marginal analysis to be applied to the study of small oscillations. Nevertheless it is more general than the usual analysis.

Samuelson hand-simulated the model (it’s fun – once – but he runs four scenarios): Samuelson then solves the discrete time system, to identify four regions with different behavior: goal seeking (exponential decay to a steady state), damped oscillations, unstable (explosive) oscillations, and unstable exponential growth or decline. He nicely maps the parameter space:

So where’s the problem?

The first is not so much of Samuelson’s making as it is a limitation of the pre-computer era. The essential simplification of the model for analytic solution is;

This is fine, but it’s incredibly abstract. Presented with this equation out of context – as readers often are – it’s almost impossible to posit a sensible description of how the economy works that would enable one to critique the model. This kind of notation remains common in econometrics, to the detriment of understanding and progress.

Non-conservation of stuff leads to problem #2. When you do implement inventories and capital stocks, the period of multiplier-accelerator oscillations moves to about 2 decades – far from the 3-7 year period of the business cycle that Samuelson originally sought to explain. This occurs in part because the capital stock, with a 15-year lifetime, introduces considerable momentum. You simply can’t discover this problem in the original multiplier-accelerator framework, because too many physical and behavioral time constants are buried in the assumptions associated with its 2 parameters.

Low goes on to introduce labor, finding that variations in capacity utilization do produce oscillations of the required time scale.

I think there’s a third problem with the approach as well: discrete time. Discrete time notation is convenient for matching a model to data sampled at regular intervals. But the economy is not even remotely close to operating in discrete annual steps. Moreover a one-year step is dangerously close to the 3-year period of the business cycle phenomenon of interest. This means that it is a distinct possibility that some of the oscillatory tendency is an artifact of discrete time sampling. While improper oscillations can be detected analytically, with discrete time notation it’s not easy to apply the simple heuristic of halving the time step to test stability, because it merely compresses the time axis or causes problems with implicit time constants, depending on how the model is implemented. Halving the time step and switching to RK4 integration illustrates these issues:

It seems like a no-brainer, that economic dynamic models should start with operational descriptions, continuous time, and engineering state variable or stock flow notation. Abstraction and discrete time should emerge as simplifications, as needed for analysis or calibration. The fact that this has not become standard operating procedure suggests that the invisible hand is sometimes rather slow as it gropes for understanding.

The ban was pushed by light bulb makers eager to up-sell customers on longer-lasting and much more expensive halogen, compact fluourescent, and LED lighting.

More expensive? Only in a universe where energy and labor costs don’t count (Texas?) and for a few applications (very low usage, or chicken warming).

Over the last couple years I’ve replaced almost all lighting in my house with LEDs. The light is better, the emissions are lower, and I have yet to see a failure (unlike cheap CFLs).

I built a little bulb calculator in Vensim, which shows huge advantages for LEDs in most situations, even with conservative assumptions (low social price of carbon, minimum wage) it’s hard to make incandescents look good. It’s also a nice example of using Vensim for spreadsheet replacement, on a problem that’s not very dynamic but has natural array structure.

Vensim‘s answer to exploring ill-behaved problem spaces is either to do hill-climbing with random restarts, or MCMC and simulated annealing. Either way, you need to start with some initial distribution of points to search.

It’s helpful if that distribution is somehow efficient at exploring the interesting parts of the space. I think this is closely related to the problem of selecting uninformative priors in Bayesian statistics. There’s lots of research about appropriate uninformative priors for various kinds of parameters. For example,

If a parameter represents a probability, one might choose the Jeffreys or Haldane prior.

Indifference to units, scale and inversion might suggest the use of a log uniform prior, where nothing else is known about a positive parameter

However, when a user specifies a parameter in Vensim, we don’t even know what it represents. So what’s the appropriate prior for a parameter that might be positive or negative, a probability, a time constant, a scale factor, an initial condition for a physical stock, etc.?

On the other hand, we aren’t quite as ignorant as the pure maximum entropy derivation usually assumes. For example,

All numbers have to lie between the largest and smallest float or double, i.e. +/- 3e38 or 2e308.

More practically, no one scales their models such that a parameter like 6.5e173 would ever be required. There’s a reason that metric prefixes range from yotta to yocto (10^24 to 10^-24). The only constant I can think of that approaches that range is Avogadro’s number (though there are probably others), and that’s not normally a changeable parameter.

For lots of things, one can impose more constraints, given a little more information,

A time constant or delay must lie on [TIME STEP,infinity], and the “infinity” of interest is practically limited by the simulation duration.

A fractional rate of change similarly must lie on [-1/TIME STEP,1/TIME STEP] for stability

Other parameters probably have limits for stability, though it may be hard to discover them except by experiment.

A parameter with units of year is probably modern, [1900-2100], unless you’re doing Mayan archaeology or paleoclimate.

At some point, the assumptions become too heroic, and we need to rely on users for some help. But it would still be really interesting to see the distribution of all parameters in real models. (See next …)