Friday, March 28, 2014

America's Next Top Model -- Part III

Recall that in his Nobel laureate speech, "The Pretence of Knowledge," Friedrich August von Hayek (the father, we may suppose, of Freddy September) pointed to organized complexity as a major issue in economics and similar fields:

Organized complexity... means that the character of the structures showing it depends not only on the properties of the individual elements of which they are composed, and the relative frequency with which they occur, but also on the manner in which the individual elements are connected with each other. In the explanation of the working of such structures we can for this reason not replace the information about the individual elements by statistical information, but require full information about each element if from our theory we are to derive specific predictions about individual events. Without such specific information about the individual elements we shall be confined to what on another occasion I have called mere pattern predictions - predictions of some of the general attributes of the structures that will form themselves, but not containing specific statements about the individual elements of which the structures will be made up. [Emph. added]

In classical thought, the part that depends on "the properties of the individual elements of which they are composed" is the material cause, a/k/a "reductionism." (The term 'matter' means simply the elements of which a thing is composed. Bricks are the matter of a wall.)

The part that depends on "the manner in which the individual elements are connected with each other" is the formal cause. In modern parlance this is sometimes called "emergent properties" because the whole system has the property while the individual elements do not.

In other words, it's the form (pattern, organization) that is the key to intelligibility. Given a set of interconnected elements X1, X2,... Xn, we cannot legitimately replace the specific Xs with X-bar as we may in cases of disorganized complexity. At best we would obtain only statistical conclusions about the entire system -- as we do in fact regarding quantum mechanics.
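The point can be made concrete with a toy sketch (my own construction, not from the post): two "systems" built from identical elements, occurring with identical frequencies, but wired together differently, behave differently. Statistics of the parts alone cannot predict the outcome; the connections matter.

```python
# Toy illustration of organized complexity: same elements, same
# frequencies, different connection patterns -> different behavior.

def run(elements, edges, steps=5):
    """Each step, every node adds the values of the nodes wired into it."""
    state = list(elements)
    for _ in range(steps):
        state = [state[i] + sum(state[j] for j in edges.get(i, []))
                 for i in range(len(state))]
    return state

elements = [1.0, 2.0, 3.0]           # same parts each time...
chain = {1: [0], 2: [1]}             # ...wired 0 -> 1 -> 2
loop = {0: [2], 1: [0], 2: [1]}      # ...wired in a cycle

print(run(elements, chain))
print(run(elements, loop))           # different structure, different result
```

Replacing the specific Xs with X-bar would erase exactly the information (the `edges` dictionary) that distinguishes the two systems.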

When there are only a few elements in the system, the scientist introduces simplifications: infinite Euclidean space, ideal gasses, perfectly elastic collisions, and the like. Arrhenius' law relating CO2 to temperature assumes the atmosphere extends to infinity. TOF read a joke - he has forgotten where - about using models to predict the SuperBowl, which is a sort of football game sometimes (but not this past year) played by two teams. In the punchline, the physicist says, "consider each player to be a perfectly elastic sphere on an infinite Euclidean field..." Mathematics tends to become ornery when bumping up against boundary values* and it is precisely at the extremes where many models pop their suspenders.

(*) boundary values. TOF has his old college text, Fourier Series and Boundary Value Problems, by Ruel Churchill, which he will someday nerve himself to re-read.

Such simplifications may be illuminating precisely because they isolate and simplify certain aspects of the total system. (This is what we mean in Latin by abstractio.) No one ever dropped an actual cannon ball in an actual vacuum, but by thought-experimenting the motion of heavy objects in a theoretical vacuum, late medieval and early modern physicists came "close enough" to modeling the motion of heavy objects in the air, a plenum which provides little resistance to cannonballs, water balloons, or ex-boyfriends' suitcases. But when the plenum becomes thicker -- dropping a ball in Jell-O™ -- resistance becomes a non-trivial factor in the model. The key to the success of Early Modern Science was to shift the discourse from the real world to an ideal world.
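A minimal numerical sketch of the same point (the height and drag numbers are invented for illustration): for a cannonball-like object, the vacuum model is "close enough"; for a leaf-like object, drag dominates and the vacuum model fails badly.

```python
# Fall time from 50 m: vacuum model vs. a simple quadratic-drag term.

G = 9.81  # m/s^2

def fall_time(height, drag_per_mass=0.0, dt=1e-4):
    """Euler integration of dv/dt = g - (k/m) * v**2 until the object lands."""
    y, v, t = height, 0.0, 0.0
    while y > 0:
        v += (G - drag_per_mass * v * v) * dt
        y -= v * dt
        t += dt
    return t

vacuum = fall_time(50.0)             # the idealized model
cannonball = fall_time(50.0, 1e-5)   # dense object: nearly identical
leaf = fall_time(50.0, 0.5)          # light object: drag dominates
print(round(vacuum, 2), round(cannonball, 2), round(leaf, 2))
```

The cannonball lands within the model's "boundaries"; the leaf is outside them, taking several times longer to fall.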

But no one supposes that dropping leaves, birds, or air balloons from the Tower of Pisa invalidates Benedetti's law of falling bodies* precisely because we realize that these are outside the boundaries of the model.

(*) Benedetti. What, did you think Galileo was first to do this?

2. Statistical Uncertainty: Any uncertainty for which a statistical expression of the uncertainty can be formulated. This is what scientists usually mean by "uncertainty" and includes such things as measurement uncertainty, sampling, confidence intervals, et al. It is the stable from which the wild p-value rides by night, deceiving the unwary into unwarranted certainty. To rise to this level of uncertainty requires that:

the functional relationships in the model are good descriptions of the phenomena being simulated, and

the data used to calibrate the model are representative of circumstances to which the model will be applied.
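A standard-library sketch of what a statistical expression of uncertainty looks like (the data are invented): a 95% confidence interval for a mean. The interval is honest only if the two conditions above hold -- a correct functional form and representative calibration data.

```python
# 95% confidence interval for a mean, using only the standard library.
import statistics

data = [9.8, 10.1, 10.0, 9.7, 10.3, 9.9, 10.2, 10.0]  # invented sample
n = len(data)
mean = statistics.mean(data)
sem = statistics.stdev(data) / n ** 0.5  # standard error of the mean
half_width = 2.365 * sem                 # t-value for 95% CI, 7 d.f.

print(f"{mean:.2f} +/- {half_width:.2f}")
```

Note that the "+/-" quantifies only the sampling uncertainty; it says nothing about whether the model behind the data is right -- that failure mode belongs to the levels below.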

3. Scenario Uncertainty. There is a range of possible outcomes, but the mechanisms leading to these outcomes are not well understood and therefore, it's not possible to formulate the probability of any particular outcome. Statistical uncertainty sinks into scenario uncertainty where a continuum of outcomes expressed stochastically changes to a range of discrete possibilities of unknown likelihoods.

These scenarios are "plausible descriptions of how the system and/or its driving forces may develop in the future." Typically, they involve the context or "environment" of the model and include such things as future technology changes, public attitudes, commodity prices, etc. Many forecasts of the economic effects of legislation have come to grief because people changed their economic behavior in response to incentives built into the law. Scenario assumptions are usually unverifiable -- since they mostly involve the future. Climate models, for example, may use several scenarios for future CO2 emissions, ranging from "no further increase" through "continues as today" to "increases exponentially." How much temperature change to forecast will depend on which of these scenarios eventually unfolds. Unlike physical processes, behavioral processes do not follow mathematical laws.

For a continuous variable, we may apply various distributions as approximations, but this is not so easily done with discrete scenarios, which do not relate to one another by simple magnitude. Hence, the uncertainty of "which scenario will unfold" cannot be folded into the statistical uncertainty, and the reported statistical uncertainty will be smaller than the actual uncertainty. Instead of a projection of what will happen, the model produces a range of forecasts of what might happen under each of several discrete cases. Scenario uncertainty can be expressed:

as a range in the outcomes of an analysis due to different underlying assumptions (i.e., "If S, then X")

as uncertainty about which changes and developments are relevant for the outcomes of interest, (i.e., have we considered the right scenarios?)

as uncertainty about the levels of these relevant changes and developments
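The "If S, then X" structure can be sketched in a few lines (the model and all numbers are invented for illustration, not taken from any climate model): one toy warming model, run under three discrete emission scenarios, yields a range of conditional forecasts with no probability attached to any of them.

```python
# Scenario uncertainty: one model, three discrete "If S, then X" runs.

def warming(co2_growth_pct, years=50, sensitivity=0.01):
    """Hypothetical: temperature response proportional to cumulative CO2 rise."""
    level, total = 1.0, 0.0
    for _ in range(years):
        level *= 1 + co2_growth_pct / 100
        total += sensitivity * (level - 1.0)
    return round(total, 2)

scenarios = {"no further increase": 0.0,
             "continues as today": 0.5,
             "increases exponentially": 2.0}

for name, growth in scenarios.items():
    print(f"If '{name}': +{warming(growth)} deg")
```

Averaging the three outputs would be meaningless: no likelihood attaches to the scenarios themselves, so the spread between runs is exactly the uncertainty that cannot be folded into the error bars.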

4. Recognized Ignorance: Fundamental uncertainty about the mechanisms and functional relationships being studied. "Known unknowns." When neither functional relationships nor statistical properties are known, the scientific basis even for scenario development is very weak. For example: do clouds increase temperatures or decrease them? This sort of ignorance may be either:

Reducible ignorance: which can be resolved by further research

Irreducible ignorance: neither research nor development can provide sufficient knowledge about the essential relationships

5. Total Ignorance: We don't even know what we don't know. "Unknown unknowns." Newton, for example, did not include c (speed of light) in his equations of motion.

The Uncertainty of Kind. The third dimension of uncertainty is two-fold:

Epistemic uncertainty: Uncertainty due to imperfect knowledge. This might be reducible by further research.

Ontological uncertainty: Uncertainty due to inherent variability in the system being modeled. This is especially true of systems concerning social, economic, and technological developments.

Epistemic uncertainty is related to such things as limited and inaccurate data, selection error, measurement error, confounding, limited understanding, imperfect models, subjective judgement, ambiguities, etc. One way to come to terms with this is to understand the "pedigree" or "ancestry" of the data: a clear understanding of the "production process" that resulted in the data.

"The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases."
-- Sir Josiah Stamp

Error bars for #1 and #5 do not overlap any of the others.

Recall that any measurement is a product produced by a series of operations and the measurement is defined by these operations. There is no such thing as the speed of light; but there is a number delivered by applying a certain method of measurement. Different methods will in general deliver different results. Consider a series of determinations of the fine structure constant at left -- complete with what physicists are pleased to call "error bars" (set at one sigma so the estimates appear tighter than they are). Either the universe is erratic (ontological uncertainty) or our ability to measure it has been (epistemic uncertainty).
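Why one-sigma bars flatter the measurement can be checked with a quick simulation (a standard textbook exercise, not the post's data): an interval of plus or minus one sigma around an unbiased estimate covers the true value only about 68% of the time.

```python
# Coverage of +/- 1 sigma "error bars" around an unbiased estimate.
import random
random.seed(1)

TRUE_VALUE, SIGMA, TRIALS = 137.036, 0.05, 100_000  # invented numbers
covered = sum(
    abs(random.gauss(TRUE_VALUE, SIGMA) - TRUE_VALUE) <= SIGMA
    for _ in range(TRIALS)
)
print(f"1-sigma coverage: {covered / TRIALS:.1%}")  # roughly 68%, not 95%
```

So nearly a third of honestly-measured one-sigma intervals will fail to contain the true value, which is one reason a chart of such determinations can look "erratic" even when nothing is wrong.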

Millikan's Oil Drop experiment

In estimating the electron charge, successive estimates tended to creep upward. This was because later experimenters assumed that Millikan had nailed it. He had reported an estimate within ±0.5%. Hence, when they obtained higher values, they assumed that they had goofed and adjusted their numbers downward toward previous values. Eventually, the accumulated measurements of electron charge converged on the present value. Supposedly, Millikan had excluded some of his data for unstated reasons and obtained an error of ±0.5% instead of ±2%. This possible cherry-picking was then followed by confirmation bias on the part of later researchers who tried to duplicate Millikan's results, and not simply Millikan's experiment.
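The "creep" can be imitated with a crude simulation (entirely my own construction, with invented numbers): each lab measures honestly, but shades its published figure toward the previously published value. The published series then converges on the truth only slowly, from below -- just as the electron-charge estimates did.

```python
# Anchoring bias: honest measurements, shaded toward the prior publication.
import random
random.seed(42)

TRUE, START, NOISE, SHADE = 100.0, 99.0, 0.5, 0.7  # invented parameters

published = [START]  # the anchoring first result (Millikan's role here)
for _ in range(20):
    raw = random.gauss(TRUE, NOISE)                      # honest measurement
    shaded = SHADE * published[-1] + (1 - SHADE) * raw   # pulled toward prior
    published.append(shaded)

print([round(x, 2) for x in published])
```

Each individual shading looks like prudent deference to an established result; collectively they produce a systematic drift that takes many rounds of publication to wash out.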

Ontological uncertainty is related to what quality practitioners call "process capability." Process output will vary from time to time and from source to source. Sometimes this variation can be assigned to particular causes, but there is always a residuum of variation due to complex combinations of many causes that are impossible (or perhaps only impractical) to resolve further. This is sometimes called, for pragmatic reasons, "random variation."* This residual variation, remaining after all assignable causes have been accounted for, is called the process capability.

(*) random. But not because randomness is a cause of anything. It is because there is no one particular cause that accounts for the variation. The dice may show a 12 for many different reasons: orientation in the hand, angle of the throw, force of the throw, friction of the felt, etc., so that no matter how well-controlled any one causal factor is, the same results can occur from chance combinations of other causes.
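The residuum can be sketched in a few lines (causes and magnitudes invented): ten tiny independent causes, each tightly controlled, still combine to produce an appreciable spread in the output -- with no one cause to blame.

```python
# "Random variation": many small causes, no one particular cause.
import random
random.seed(7)

def one_unit():
    # ten small invented causes, each contributing at most +/- 0.1
    return 10.0 + sum(random.uniform(-0.1, 0.1) for _ in range(10))

outputs = [one_unit() for _ in range(1000)]
spread = max(outputs) - min(outputs)
print(f"spread from 'no particular cause': {spread:.2f}")
```

Tightening any single cause barely moves the spread; that irreducible remainder is the process capability.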

Sources of ontological uncertainty include

Inherent randomness of nature: the chaotic and unpredictable nature of natural processes. The ultimate example is (in the Copenhagen interpretation) quantum events.

Human behaviour (behavioural variability): non-rational behaviour, discrepancies between what people say and what they actually do (cognitive dissonance), deviations from standard behavioural patterns (micro-level behaviour)

Technological surprise: New developments or breakthroughs in technology or unexpected side-effects of technologies.

If the uncertainty drops to statistical uncertainty, the model may use a frequency distribution to represent it. However, it is important to distinguish uncertainty within a distribution and uncertainty about which distribution is in play ("between distributions"). That is, it is one thing to sample (as most do in college "stats" classes) from a "normal distribution with μ=0 and σ=1" and quite another to know whether the mean really is 0 and the standard deviation really is 1 -- or that the data are adequately modeled by a normal distribution at all!*

(*) adequately modeled. The normal distribution runs to infinity in both directions. No real-world process is known to do so. Therefore, the model will fail somewhere in the extremes, if nowhere else. In fact, darn few stochastic processes show "a state of statistical control" enough to merit any statistical distribution at all.
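The within-versus-between distinction is easy to demonstrate (all parameters invented): suppose the model asserts N(0, 1) while the process actually follows N(0.2, 1.3). The textbook 95% interval then misses far more often than its nominal 5% -- the reported uncertainty is smaller than the actual one.

```python
# Within-distribution vs. between-distribution uncertainty.
import random
random.seed(3)

TRIALS = 100_000
assumed_mu, assumed_sigma = 0.0, 1.0   # what the model asserts
true_mu, true_sigma = 0.2, 1.3         # what the process actually does

misses = sum(
    abs(random.gauss(true_mu, true_sigma) - assumed_mu) > 1.96 * assumed_sigma
    for _ in range(TRIALS)
)
print(f"nominal miss rate 5.0%, actual {misses / TRIALS:.1%}")
```

Everything inside the simulation is perfectly "statistical"; the extra misses come entirely from being wrong about which distribution is in play.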

For our (not) final installment -- Yes, TOF can hear the cries of relief -- we will take a tiptoe through the tulips, as it were, and look at some particular examples of uncertainties.

5 comments:

Is Part III the last of your series on model building? Part IV?: Possible comments on insurance rate/actuarial modeling? Medical diagnosis/treatment? Human choice modeling? War gaming? Difficult-to-achieve human-centered models in future SF scenarios and stories?

Can you explain further your statement -- "There is no such thing as the speed of light; but there is a number delivered by applying a certain method of measurement"? Is the notion of speed inherently metrical (if that's the word I want -- I guess I mean "using numbers")?

A process is a set of causes that work together to produce an effect. Different processes will in general produce different products, which is why FDA requires revalidation when a pharm or med dev is shifted from one production line to another.

A number is a product, produced by a process; viz., the measurement process. Two different methods of measurement will in general produce two different numbers; for example, measuring coefficient of friction using the inclined plane v. dynamic pull methods. TOF has seen different values delivered on the same item by two different dial indicators otherwise similar in set-up; and different values delivered by two different techs using the same gauge on the same piece. W.E.Deming made the observation about speed of light in this context in his Quality, Productivity, and Competitive Position. TOF notes that speeds of light measured by geodimeter differ from those delivered by Kerr cells or by rotating mirrors. The differences may be slight, but they are there. See the forthcoming ANALOG article (Jul/Aug 2014) entitled "Spanking Bad Data Won't Make Them Behave."

The way I learned in Architectural School is you break a big (unsolvable) problem down into multiple, smaller (solvable) problems. Meaning, you size the roof deck and joists > size the girders > size the columns > size the footings and then check these "verticals" against the "horizontals" i.e. seismic/wind.

The ability of the deck, joists, girders, columns and footings to withstand stresses are determined empirically and are listed in tables.

A structural engineer is not "predicting" that a beam can withstand a load. He knows that it can - so long as the stresses don't exceed the design parameters.
