Archive for the 'Probability theory' Category

The standard or classical model in decision theory is called Maximum Expected Utility (MEU) theory, which I have excoriated here and here (and which Cosma Shalizi satirized here). Its flaws and weaknesses for real decision-making have been pointed out by critics since its inception, six decades ago. Despite this, the theory is still taught in economics classes and MBA programs as a normative model of decision-making.

A key feature of MEU is that the decision-maker is required to identify ALL possible action options, and ALL consequential states of these options. He or she then reasons ACROSS these consequences by adding together the utilities of the consequential states, weighted by the likelihood that each state will occur.

However, financial and business planners do something completely contrary to this in everyday financial and business modeling. In developing a financial model for a major business decision or for a new venture, the collection of possible actions is usually infinite and the space of possible consequential states even more so. Making human sense of the possible actions and the resulting consequential states is usually a key reason for undertaking the financial modeling activity, and so cannot be an input to the modeling. Because of the explosion in the number of states and in their internal complexity, business planners cannot articulate all the actions and all the states, nor even usually a subset of these beyond a mere handful.

Therefore, planners typically choose to model just 3 or 4 states – usually called cases or scenarios – with each of these combining a complex mix of (a) assumed actions, (b) assumed stakeholder responses and (c) environmental events and parameters. The assumptions and parameter values are instantiated for each case, the model run, and the outputs of the 3 or 4 cases compared with one another. The process is usually repeated with different (but close) assumptions and parameter values, to gain a sense of the sensitivity of the model outputs to those assumptions.
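The case-based procedure just described can be sketched in a few lines of Python. This is only an illustration: the toy profit model, the parameter names (units_sold, price, unit_cost, fixed_costs) and all the numbers are assumptions invented for the example, not taken from any real plan.

```python
# Hypothetical toy model: every parameter name and value here is an
# illustrative assumption, not data from a real business plan.
def run_model(case):
    """Return first-year profit for one case (a dict of assumed parameters)."""
    revenue = case["units_sold"] * case["price"]
    variable_costs = case["units_sold"] * case["unit_cost"]
    return revenue - variable_costs - case["fixed_costs"]

# Three cases, each bundling assumed actions, responses and environment.
cases = {
    "Best Case": {"units_sold": 12000, "price": 50.0, "unit_cost": 30.0, "fixed_costs": 150000.0},
    "Base Case": {"units_sold": 10000, "price": 50.0, "unit_cost": 30.0, "fixed_costs": 150000.0},
    "Worst Case": {"units_sold": 7000, "price": 45.0, "unit_cost": 30.0, "fixed_costs": 150000.0},
}

# Run the model once per case so the outputs can be compared.
outputs = {name: run_model(case) for name, case in cases.items()}

def sensitivity(case, parameter, relative_change=0.05):
    """Re-run the model with one parameter nudged down and up by a small fraction."""
    results = {}
    for label, factor in (("low", 1 - relative_change), ("base", 1.0), ("high", 1 + relative_change)):
        perturbed = dict(case)  # copy, so the original case is not modified
        perturbed[parameter] = case[parameter] * factor
        results[label] = run_model(perturbed)
    return results

price_sensitivity = sensitivity(cases["Base Case"], "price")

for name, profit in outputs.items():
    print(name, profit)
print(price_sensitivity)
```

Note that nothing in this sketch weights the cases by probability or sums across them; the cases are simply run and compared side by side.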

Often the scenarios will be labeled “Best Case”, “Worst Case”, “Base Case”, etc., to identify the broad underlying principles that are used to make the relevant assumptions in each case. Actually adopting a financial model for (say) a new venture means assuming that one of these cases is close enough to current reality and its likely future development in the domain under study, i.e., that one case is realistic. People in the finance world call this adoption of one case “taking a view” on the future.

Taking a view involves assuming (at least pro tem) that one trajectory (or one class of trajectories) describes the evolution of the states of some system. Such betting on the future is the complete opposite cognitive behaviour to reasoning over all the possible states before choosing an action, which the protagonists of the MEU model insist we all do. Yet the MEU model continues to be taught as a normative model for decision-making to MBA students who will spend their post-graduation life doing business planning by taking a view.

Bayesians are so prevalent in Artificial Intelligence (and, to be honest, so strident) that it can sometimes be lonely being a Frequentist. So it is nice to see a critical review of Nate Silver’s new book on prediction from a frequentist perspective. The reviewers are Gary Marcus and Ernest Davis from New York University, and here are some paras from their review in The New Yorker:

Silver’s one misstep comes in his advocacy of an approach known as Bayesian inference. According to Silver’s excited introduction,

Bayes’ theorem is nominally a mathematical formula. But it is really much more than that. It implies that we must think differently about our ideas.

Lost until Chapter 8 is the fact that the approach Silver lobbies for is hardly an innovation; instead (as he ultimately acknowledges), it is built around a two-hundred-fifty-year-old theorem that is usually taught in the first weeks of college probability courses. More than that, as valuable as the approach is, most statisticians see it as only a partial solution to a very large problem.

A Bayesian approach is particularly useful when predicting outcome probabilities in cases where one has strong prior knowledge of a situation. Suppose, for instance (borrowing an old example that Silver revives), that a woman in her forties goes for a mammogram and receives bad news: a “positive” mammogram. However, since not every positive result is real, what is the probability that she actually has breast cancer? To calculate this, we need to know four numbers. The fraction of women in their forties who have breast cancer is 0.014, which is about one in seventy. The fraction who do not have breast cancer is therefore 1 – 0.014 = 0.986. These fractions are known as the prior probabilities. The probability that a woman who has breast cancer will get a positive result on a mammogram is 0.75. The probability that a woman who does not have breast cancer will get a false positive on a mammogram is 0.1. These are known as the conditional probabilities. Applying Bayes’s theorem, we can conclude that, among women who get a positive result, the fraction who actually have breast cancer is (0.014 x 0.75) / ((0.014 x 0.75) + (0.986 x 0.1)) = 0.1, approximately. That is, once we have seen the test result, the chance is about ninety per cent that it is a false positive. In this instance, Bayes’s theorem is the perfect tool for the job.
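[As an aside to the quoted review: the arithmetic in the mammogram example can be checked with a short Python sketch. The function and argument names are my own labels; the three input numbers are the ones given in the passage above.]

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(condition | positive test), computed with Bayes' theorem."""
    true_positives = prior * sensitivity                    # has cancer and tests positive
    false_positives = (1.0 - prior) * false_positive_rate   # no cancer but tests positive
    return true_positives / (true_positives + false_positives)

# Numbers from the review: prior 0.014, P(positive | cancer) 0.75,
# P(positive | no cancer) 0.1.
p_cancer = posterior_given_positive(prior=0.014, sensitivity=0.75, false_positive_rate=0.1)
p_false_alarm = 1.0 - p_cancer

print(round(p_cancer, 3))       # a little under 0.1, as the review says
print(round(p_false_alarm, 3))  # roughly nine in ten positives are false alarms
```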

This technique can be extended to all kinds of other applications. In one of the best chapters in the book, Silver gives a step-by-step description of the use of probabilistic reasoning in placing bets while playing a hand of Texas Hold ’em, taking into account the probabilities on the cards that have been dealt and that will be dealt; the information about opponents’ hands that you can glean from the bets they have placed; and your general judgment of what kind of players they are (aggressive, cautious, stupid, etc.).

But the Bayesian approach is much less helpful when there is no consensus about what the prior probabilities should be. For example, in a notorious series of experiments, Stanley Milgram showed that many people would torture a victim if they were told that it was for the good of science. Before these experiments were carried out, should these results have been assigned a low prior (because no one would suppose that they themselves would do this) or a high prior (because we know that people accept authority)? In actual practice, the method of evaluation most scientists use most of the time is a variant of a technique proposed by the statistician Ronald Fisher in the early 1900s. Roughly speaking, in this approach, a hypothesis is considered validated by data only if the data pass a test that would be failed ninety-five or ninety-nine per cent of the time if the data were generated randomly. The advantage of Fisher’s approach (which is by no means perfect) is that to some degree it sidesteps the problem of estimating priors where no sufficient advance information exists. In the vast majority of scientific papers, Fisher’s statistics (and more sophisticated statistics in that tradition) are used.

Unfortunately, Silver’s discussion of alternatives to the Bayesian approach is dismissive, incomplete, and misleading. In some cases, Silver tends to attribute successful reasoning to the use of Bayesian methods without any evidence that those particular analyses were actually performed in Bayesian fashion. For instance, he writes about Bob Voulgaris, a basketball gambler,

Bob’s money is on Bayes too. He does not literally apply Bayes’ theorem every time he makes a prediction. But his practice of testing statistical data in the context of hypotheses and beliefs derived from his basketball knowledge is very Bayesian, as is his comfort with accepting probabilistic answers to his questions.

But, judging from the description in the previous thirty pages, Voulgaris follows instinct, not fancy Bayesian math. Here, Silver seems to be using “Bayesian” not to mean the use of Bayes’s theorem but, rather, the general strategy of combining many different kinds of information.

To take another example, Silver discusses at length an important and troubling paper by John Ioannidis, “Why Most Published Research Findings Are False,” and leaves the reader with the impression that the problems that Ioannidis raises can be solved if statisticians use a Bayesian approach rather than following Fisher. Silver writes:

[Fisher’s classical] methods discourage the researcher from considering the underlying context or plausibility of his hypothesis, something that the Bayesian method demands in the form of a prior probability. Thus, you will see apparently serious papers published on how toads can predict earthquakes… which apply frequentist tests to produce “statistically significant” but manifestly ridiculous findings.

But NASA’s 2011 study of toads was actually important and useful, not some “manifestly ridiculous” finding plucked from thin air. It was a thoughtful analysis of groundwater chemistry that began with a combination of naturalistic observation (a group of toads had abandoned a lake in Italy near the epicenter of an earthquake that happened a few days later) and theory (about ionospheric disturbance and water composition).

The real reason that too many published studies are false is not because lots of people are testing ridiculous things, which rarely happens in the top scientific journals; it’s because in any given year, drug companies and medical schools perform thousands of experiments. In any study, there is some small chance of a false positive; if you do a lot of experiments, you will eventually get a lot of false positive results (even putting aside self-deception, biases toward reporting positive results, and outright fraud)—as Silver himself actually explains two pages earlier. Switching to a Bayesian method of evaluating statistics will not fix the underlying problems; cleaning up science requires changes to the way in which scientific research is done and evaluated, not just a new formula.
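[As an aside to the quoted review: the point about running many experiments can be made concrete with a small Python sketch. It rests on simplifying assumptions of my own: every hypothesis tested is in fact false, the tests are independent, and each uses the same significance threshold. The threshold of 0.05 and the experiment counts are illustrative choices, not figures from the review.]

```python
def expected_false_positives(n_experiments, alpha):
    """Expected number of 'significant' results when no real effect exists."""
    return n_experiments * alpha

def prob_at_least_one_false_positive(n_experiments, alpha):
    """Chance that at least one of n independent null experiments passes the test."""
    return 1.0 - (1.0 - alpha) ** n_experiments

# Illustrative values only: a 5% threshold and a range of experiment counts.
alpha = 0.05
for n in (1, 20, 1000):
    print(n,
          expected_false_positives(n, alpha),
          round(prob_at_least_one_false_positive(n, alpha), 3))
```

Under these assumptions the expected count of false positives grows in direct proportion to the number of experiments, whichever formula is later used to evaluate each one.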

It is perfectly reasonable for Silver to prefer the Bayesian approach—the field has remained split for nearly a century, with each side having its own arguments, innovations, and work-arounds—but the case for preferring Bayes to Fisher is far weaker than Silver lets on, and there is no reason whatsoever to think that a Bayesian approach is a “think differently” revolution. “The Signal and the Noise” is a terrific book, with much to admire. But it will take a lot more than Bayes’s very useful theorem to solve the many challenges in the world of applied statistics.” [Links in original]

It is also worth adding here that there is a very good reason experimental sciences adopted Frequentist approaches (what the reviewers call Fisher’s methods) in journal publications. That reason is that science is intended to be a search for objective truth using objective methods. Experiments are – or should be – replicable by anyone. How can subjective methods play any role in such an enterprise? Why should the journal Nature or any of its readers care what the prior probabilities of the experimenters were before an experiment? If these prior probabilities make a difference to the posterior (post-experiment) probabilities, then this is the insertion of a purely subjective element into something that should be objective and replicable. And if the actual numeric values of the prior probabilities don’t matter to the posterior probabilities (as some Bayesian theorems would suggest), then why does the methodology include them?

Many proponents of Bayesianism point to Cox’s theorem as the justification for arguing that there is only one coherent method for representing uncertainty. Cox’s theorem states that any representation of uncertainty satisfying certain assumptions is isomorphic to classical probability theory. As I have long argued, this claim depends upon the law of the excluded middle (LEM).

Mark Colyvan, an Australian philosopher of mathematics, published a paper in 2004 which examined the philosophical and logical assumptions of Cox’s theorem (assumptions usually left implicit by its proponents), and argued that these are inappropriate for many (perhaps even most) domains with uncertainty.

Although these papers are several years old, I mention them here for the record – and because I still encounter invocations of Cox’s Theorem.

IME, most statisticians, like most economists, have little historical sense. This absence means they will not appreciate a nice irony: the person responsible for axiomatizing classical probability theory – Andrei Kolmogorov – is also one of the people responsible for axiomatizing intuitionistic logic, a version of classical logic which dispenses with the law of the excluded middle. One such axiomatization is called BHK Logic (for Brouwer, Heyting and Kolmogorov) in recognition.

Normblog has a regular feature, Writer’s Choice, where writers give their opinions of books which have influenced them. Seeing this led me recently to think of the mathematical ideas which have influenced my own thinking. In an earlier post, I wrote about the writers whose books (and teachers whose lectures) directly influenced me. I left many pure mathematicians and statisticians off that list because most mathematics and statistics I did not receive directly from their books, but indirectly, mediated through the textbooks and lectures of others. It is time to make amends.

Here then is a list of mathematical ideas which have had great influence on my thinking, along with their progenitors. Not all of these ideas have yet proved useful in any practical sense, either to me or to the world – but there is still lots of time. Some of these theories are very beautiful, and it is their elegance and beauty and profundity to which I respond. Others are counter-intuitive and thus thought-provoking, and I recall them for this reason.

Euclid’s axiomatic treatment of (Euclidean) geometry

The various laws of large numbers, first proven by Jacob Bernoulli (which give a rational justification for reasoning from samples to populations)

The differential calculus of Isaac Newton and Gottfried Leibniz (the first formal treatment of change)

The Identity of Leonhard Euler: exp(i * \pi) + 1 = 0, which mysteriously links two transcendental numbers (\pi and e) and an imaginary number i (the square root of minus one) with the identity of the addition operation (zero) and the identity of the multiplication operation (1).

The epsilon-delta arguments for the calculus of Augustin Louis Cauchy and Karl Weierstrass

The non-Euclidean geometries of Janos Bolyai, Nikolai Lobachevsky and Bernhard Riemann (which showed that 2-dimensional (or plane) geometry would be different if the surface it was done on was curved rather than flat – the arrival of post-modernism in mathematics)

The diagonalization proof of Georg Cantor that the Real numbers are not countable (showing that there is more than one type of infinity) (a proof-method later adopted by Godel, mentioned below)

The axioms for the natural numbers of Giuseppe Peano

The space-filling curves of Giuseppe Peano and others (mapping the unit interval continuously to the unit square)

The axiomatic treatments of geometry of Mario Pieri and David Hilbert (releasing pure mathematics from any necessary connection to the real-world)

The algebraic topology of Henri Poincare and many others (associating algebraic structures to topological spaces)

The paradox of set theory of Bertrand Russell (asking whether the set of all sets that do not contain themselves contains itself)

The Fixed Point Theorem of Jan Brouwer (which, inter alia, has been used to prove that certain purely-artificial mathematical constructs called economies under some conditions contain equilibria)

The theory of measure and integration of Henri Lebesgue

The constructivism of Jan Brouwer (which taught us to think differently about mathematical knowledge)

The statistical decision theory of Jerzy Neyman and Egon Pearson (which enabled us to bound the potential errors of statistical inference)

The axioms for probability theory of Andrey Kolmogorov (which formalized one common method for representing uncertainty)

The BHK axioms for intuitionistic logic, associated to the names of Jan Brouwer, Arend Heyting and Andrey Kolmogorov (which enabled the formal treatment of intuitionism)

The incompleteness theorems of Kurt Godel (which identified some limits to mathematical knowledge)

The theory of categories of Sam Eilenberg and Saunders Mac Lane (using pure mathematics to model what pure mathematicians do, and enabling concise, abstract and elegant presentations of mathematical knowledge)

We noted before that one consequence of the rise of coffee-houses in 17th-century Europe was the development of probability theory as a mathematical treatment of reasoning with uncertainty. Ian Hacking’s history of the emergence of probabilistic ideas in Europe has a nice articulation of the key events, all of which took place a decade either side of 1664:

1654: Pascal wrote to Fermat with his ideas about probability

1657: Huygens wrote the first textbook on probability to be published, and Pascal was the first to apply probability ideas to problems other than games of chance

1662: The Port Royal Logic was the first publication to mention numerical measurements of something called probability, and Leibniz applied probability to problems in legal reasoning

1662: London merchant John Graunt published the first set of statistics drawn from records of mortality

Late 1660s: Probability theory was used by John Hudde and by Johan de Witt in Amsterdam to provide a sound basis for reasoning about annuities (Hacking 1975, p.11).

Developments in the use of symbolic algebra in Italy in the 16th-century provided the technical basis upon which a formal theory of uncertainty could be erected. And coffee-houses certainly aided the dissemination of probabilistic ideas, both in spoken and written form. Coffee houses may even have aided the creation of these ideas – new mathematical concepts are only rarely created by a solitary person working alone in a garret, but usually arise instead through conversation and debate among people each having only partial or half-formed ideas.

However, one aspect of the rise of probability in the mid 17th century is still a mystery to me: what event or phenomenon led so many people across Europe to be interested in reasoning about uncertainty at this time? Although 1664 saw the establishment of a famous brewery in Strasbourg, I suspect the main motivation was the prevalence of bubonic plague in Europe. Although plague had been around for many centuries, the Catholic vs. Protestant religious wars of the previous 150 years had, I believe, led many intelligent people to abandon or lessen their faith in religious explanations of uncertain phenomena. Rene Descartes, for example, was led to cogito, ergo sum when seeking beliefs which peoples of all faiths or none could agree on. Without religion, alternative models to explain or predict human deaths, morbidity and natural disasters were required. The insurance of ocean-going vessels provided a financial incentive for finding good predictive models of such events.

Hacking notes (pp. 4-5) that, historically, probability theory has mostly developed in response to problems about uncertain reasoning in other domains: In the 17th century, these were problems in insurance and annuities, in the 18th, astronomy, the 19th, biometrics and statistical mechanics, and the early 20th, agricultural experiments. For more on the connection between statistical theory and experiments in agriculture, see Hogben (1957). For the relationship of 20th-century probability theory to statistical physics, see von Plato (1994).

POSTSCRIPT (ADDED 2011-04-25):

There appear to have been major outbreaks of bubonic plague in Seville, Spain (1647-1652), in Naples (1656), in Amsterdam, Holland (1663-1664), in Hamburg (1663), in London, England (1665-1666), and in France (1668). The organist Heinrich Scheidemann, teacher of Johann Reincken, for example, died during the outbreak in Hamburg in 1663. Wikipedia now has a listing of global epidemics (albeit incomplete).

References:

Ian Hacking [1975]: The Emergence of Probability: a Philosophical study of early ideas about Probability, Induction and Statistical Inference. London, UK: Cambridge University Press.

How do companies make major decisions? The gurus of classical Decision Theory – people like economist Jimmie Savage and statistician Dennis Lindley – tell us that there is only one correct way to make decisions: List all the possible actions, list the potential consequences of each action, assign utilities and probabilities of occurrence to each consequence, multiply these numbers together for each consequence and then add the resulting products for each action to get an expected utility for each action, and finally choose that action which maximizes expected utility.
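For readers who have not met it, the classical recipe just described is short enough to write down in full. The following Python sketch is purely illustrative: the two actions, their probabilities and their utilities are numbers I have made up for the example, and the function names are my own.

```python
def expected_utility(consequences):
    """Sum of probability * utility over one action's consequences."""
    return sum(probability * utility for probability, utility in consequences)

def choose_action(actions):
    """Return the name of the action whose expected utility is largest."""
    return max(actions, key=lambda name: expected_utility(actions[name]))

# Hypothetical inputs: each action maps to (probability, utility) pairs,
# with the probabilities for each action summing to one.
actions = {
    "launch": [(0.6, 100.0), (0.4, -80.0)],
    "wait": [(1.0, 10.0)],
}

for name, consequences in actions.items():
    print(name, expected_utility(consequences))

best = choose_action(actions)
print(best)
```

Notice how much the recipe takes as given: a complete list of actions, a complete list of consequences, and a number for every probability and every utility. Notice also that the large loss attached to "launch" disappears into a single summary figure.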

There are many, many problems with this model, not least that it is not what companies – or intelligent, purposive individuals for that matter – actually do. Those who have worked in companies know that nothing so simplistic or static describes intelligent, rational decision making, nor should it. Moreover, that their model was flawed as a description of reality was known at the time to Savage, Lindley, et al, because it was pointed out to them six decades ago by people such as George Shackle, an economist who had actually worked in industry and who drew on his experience. The mute, autistic behemoth that is mathematical economics, however, does not stop or change direction merely because its utter disconnection with empirical reality is noticed by someone, and so – TO THIS VERY DAY – students in business schools still learn the classical theory. I guess for the students it’s a case of: Who are we going to believe – our textbooks, or our own eyes? From my first year as an undergraduate taking Economics 101, I had trouble believing my textbooks.

So what might be a better model of decision-making? First, we need to recognize that corporate decision-making is almost always something dynamic, not static – it takes place over time, not in a single stage of analysis, and we would do better to describe a process, rather than just giving a formula for calculating an outcome. Second, precisely because the process is dynamic, many of the inputs assumed by the classical model do not exist, or are not known to the participants, at the start, but emerge in the course of the decision-making process. Here, I mean things such as: possible actions, potential consequences, preferences (or utilities), and measures of uncertainty (which may or may not include probabilities). Third, in large organizations, decision-making is a group activity, with inputs and comments from many people. If you believe – as Savage and Lindley did – that there is only one correct way to make a decision, then your model would contain no scope for subjective inputs or stakeholder revisions, which is yet another of the many failings of the classical model. Fourth, in the real world, people need to consider – and do consider – the potential downsides as well as the upsides of an action, and they need to do this – and they do do this – separately, not merged into a summary statistic such as “utility”. So, if one possible consequence of an action-option is catastrophic loss, then no amount of maximum-expected-utility quantitative summary gibberish should permit a rational decision-maker to choose that option without great pause (or insurance). Shackle knew this, so his model considers downsides as well as upsides. That Savage and his pals ignored this one can only assume is the result of the impossibility of catastrophic loss ever occurring to a tenured academic.

So let us try to articulate a staged process for what companies actually do when they make major decisions, such as major investments or new business planning:

1. Describe the present situation and the way or ways it may evolve in the future. We call these different future paths scenarios. Making assumptions about the present and the future is also called taking a view.

2. For each scenario, identify a list of possible actions, able to be executed under the scenario.

3. For each scenario and action, identify the possible upsides and downsides.

4. Some actions under some scenarios will have attractive upsides. What can be done to increase the likelihood of these upsides occurring? What can be done to make them even more attractive?

5. Some actions under some scenarios will have unattractive downsides. What can be done to eliminate these downsides altogether or to decrease their likelihood of occurring? What can be done to ameliorate, to mitigate, to distribute to others, or to postpone the effects of these downsides?

6. In the light of what was learned in doing steps 1-5, go back to step 1 and repeat it.

7. In the light of what was learned in doing steps 1-6, go back to step 2 and repeat steps 2-5. For example, by modifying or combining actions, it may be possible to shift attractive upsides or unattractive downsides from one action to another.

8. As new information comes to hand, occasionally repeat step 1. Repeat step 7 as often as time permits.
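One way to picture the bookkeeping behind these steps is the rough Python sketch below. It is an assumption-laden illustration, not a formalization of the process: the record type, the function name and the example entries are all hypothetical. Its only purpose is to show upsides and downsides being held separately for each scenario and action, rather than merged into one number.

```python
from dataclasses import dataclass, field

@dataclass
class Assessment:
    """Upsides and downsides of one action under one scenario, kept apart."""
    scenario: str
    action: str
    upsides: list = field(default_factory=list)
    downsides: list = field(default_factory=list)

def needs_mitigation(assessments, catastrophic):
    """Return (scenario, action) pairs that have at least one catastrophic downside."""
    return [
        (a.scenario, a.action)
        for a in assessments
        if any(downside in catastrophic for downside in a.downsides)
    ]

# Hypothetical entries of the kind produced in the third step of the list.
assessments = [
    Assessment("Base Case", "build new plant",
               upsides=["lower unit cost"], downsides=["higher debt"]),
    Assessment("Worst Case", "build new plant",
               upsides=[], downsides=["higher debt", "insolvency"]),
    Assessment("Worst Case", "lease capacity",
               upsides=["flexibility"], downsides=["higher unit cost"]),
]

flagged = needs_mitigation(assessments, catastrophic={"insolvency"})
print(flagged)
```

The flagged pairs are the ones to take back into the fifth step; revising them, and then revisiting the scenarios and actions themselves, is what the later steps ask for.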

This decision process will be familiar to anyone who has prepared a business plan for a new venture, either for personal investment, or for financial investors and bankers, or for business partners. Having access to spreadsheet software such as Lotus 1-2-3 or Microsoft Excel has certainly made this process easier to undertake. But, contrary to the beliefs of many, people made major decisions before the invention of spreadsheets, and they did so using processes similar to this, as Shackle’s work evidences.

Because this model involves revision of initial ideas in repeated stages, it bears some resemblance to the retroflexive argumentation theory of philosopher Harald Wohlrapp. Hence, I call it Retroflexive Decision Theory. I will explore this model in more detail in future posts.

References:

D. Lindley [1985]: Making Decisions. Second Edition. London, UK: John Wiley and Sons.

Over at “This Blog Sits”, Grant McCracken has a nice post about a paradigm example often used in mainstream economics to chastise everyday human reasoners. A nice discussion has developed. I thought to re-post one of my comments, which I do here:

“The first point — which should be obvious to anyone who deals professionally with probability, but often seems not — is that the answer to a problem involving uncertainty depends very crucially on its mathematical formulation. We are given a situation expressed in ordinary English words and asked to use it to make a judgment. The probability theorists have arrived at a way of translating such situations from natural human language into a formal mathematical language, and using this formalism, to arrive at an answer to the situation which they deem correct. However, natural language may be imprecise (as in the example, as gek notes). Imprecision of natural language is a key reason for attempting a translation into a formal language, since doing so can clarify what is vague or ambiguous. But imprecision also means that there may be more than one reasonable translation of the same problem situation, even if we all agreed on what formal language to use and on how to do the translation. There may in fact be more than one correct answer.

There is much of background relevance here that may not be known to everyone. First, note that it took more than 250 years from the first mathematical formulations of uncertainty using probability (in the 1660s) to reach a sort-of consensus on a set of mathematical axioms for probability theory (the standard axioms, due to Andrei Kolmogorov, published in 1933). By contrast, the differential calculus, invented about the same time as Probability in the 17th century, was already rigorously formalized (using epsilon-delta arguments) by the mid-19th century. Dealing formally with uncertainty is hard, and intuitions differ greatly, even for the mathematically adept.

Second, even now, the Kolmogorov axioms are not uncontested. Although it often comes as a surprise to statisticians and mathematicians, there is a whole community of intelligent, mathematically-adept people in Artificial Intelligence who prefer to use alternative formalisms to probability theory, at least for some problem domains. These alternatives (such as Dempster-Shafer theory and possibility theory) are preferred to probability theory because they are more expressive (more situations can be adequately represented) and because they are easier to manipulate for some types of problems than probability theory. Let no one believe, then, that probability theory is accepted by every mathematically-adept expert who works with uncertainty.

Historical aside: In fact, ever since the 1660s, there has been a consistent minority of people dissenting from the standard view of probability theory, a minority which has mostly been erased from the textbooks. Typically, these dissidents have tried unsuccessfully to apply probability theory to real-world problems, such as those encountered by judges and juries (eg, Leibniz in the 17th century), doctors (eg, von Kries in the 19th), business investors (eg, Shackle in the 20th), and now intelligent computer systems (since the 1970s). One can have an entire university education in mathematical statistics, as I did, and never hear mention of this dissenting stream. A science that was confident of its own foundations would surely not need to suppress alternative views.

Third, intelligent, expert, mathematically-adept people who work with uncertainty do not even yet agree on what the notion of “probability” means, or to what it may validly apply. Donald Gillies, a professor of philosophy at the University of London, wrote a nice book, Philosophical Theories of Probability, which outlines the main alternative interpretations. A key difference of opinion concerns the scope of probability expressions (eg, over which types of natural language statements may one validly apply the translation mechanism). Note that Gillies wrote his book 70-some years after Kolmogorov’s axioms. In addition, there are other social or cultural factors, usually ignored by mathematically-adept experts, which may inform one’s interpretations of uncertainty and probability. A view that the universe is deterministic, or that one’s spiritual fate is pre-determined before birth, may be inconsistent with any of these interpretations of uncertainty, for instance. I have yet to see a Taoist theory of uncertainty, but I am sure it would differ from anything developed so far.

I write this comment to give some context to our discussion. Mainstream economists and statisticians are fond of castigating ordinary people for being confused or for acting irrationally when faced with situations involving uncertainty, merely because the judgements of ordinary people do not always conform to the Kolmogorov axioms and the deductive consequences of these axioms. It is surely unreasonable to cast such aspersions when experts themselves disagree on what probability is, to what statements probabilities may be validly applied, and on how uncertainty should be formally represented.