One of the few statisticians that I have on my blogroll is Andrew Gelman. Although not sharing his Bayesian leanings, yours truly finds his open-minded, thought-provoking and non-dogmatic statistical thinking highly recommendable. His plaidoyer for “reverse causal questioning” is typically Gelmanian:

When statistical and econometric methodologists write about causal inference, they generally focus on forward causal questions. We are taught to answer questions of the type “What if?”, rather than “Why?” Following the work by Rubin (1977) causal questions are typically framed in terms of manipulations: if x were changed by one unit, how much would y be expected to change? But reverse causal questions are important too … In many ways, it is the reverse causal questions that motivate the research, including experiments and observational studies, that we use to answer the forward questions …

Reverse causal reasoning is different; it involves asking questions and searching for new variables that might not yet even be in our model. We can frame reverse causal questions as model checking. It goes like this: what we see is some pattern in the world that needs an explanation. What does it mean to “need an explanation”? It means that existing explanations — the existing model of the phenomenon — does not do the job …

By formalizing reverse causal reasoning within the process of data analysis, we hope to make a step toward connecting our statistical reasoning to the ways that we naturally think and talk about causality. This is consistent with views such as Cartwright (2007) that causal inference in reality is more complex than is captured in any theory of inference … What we are really suggesting is a way of talking about reverse causal questions in a way that is complementary to, rather than outside of, the mainstream formalisms of statistics and econometrics.

In a time when scientific relativism is expanding, it is important to keep up the claim for not reducing science to a pure discursive level. We have to maintain the Enlightenment tradition of thinking of reality as principally independent of our views of it and of the main task of science as studying the structure of this reality. Perhaps the most important contribution a researcher can make is to reveal what this reality that is the object of science actually looks like.

Science is made possible by the fact that there are structures that are durable and are independent of our knowledge or beliefs about them. There exists a reality beyond our theories and concepts of it. It is this independent reality that our theories in some way deal with. Contrary to positivism, I would as a critical realist argue that the main task of science is not to detect event-regularities between observed facts. Rather, that task must be conceived as identifying the underlying structure and forces that produce the observed events.

In Gelman’s essay there is no explicit argument for abduction — inference to the best explanation — but I would still argue that it is de facto nothing but a very strong argument for why scientific realism and inference to the best explanation are the best alternatives for explaining what’s going on in the world we live in. The focus on causality, model checking, anomalies and context-dependence — although here expressed in statistical terms — is as close to abductive reasoning as we get in statistics and econometrics today.

Yours truly and people like Tony Lawson have for many years been urging economists to pay attention to the ontological foundations of their assumptions and models. Sad to say, economists have not paid much attention — and so modern economics has become increasingly irrelevant to the understanding of the real world.

Within mainstream economics internal validity is still everything and external validity nothing. Why anyone should be interested in those kinds of theories and models is beyond imagination. As long as mainstream economists do not come up with any export licenses for their theories and models to the real world in which we live, they really should not be surprised if people say that this is not science, but autism!

Studying mathematics and logic is interesting and fun. It sharpens the mind. In pure mathematics and logic we do not have to worry about external validity. But economics is not pure mathematics or logic. It’s about society. The real world. If it forgets that, economics is really in dire straits.

Mathematical axiomatic systems lead to analytic truths, which do not require empirical verification, since they are true by virtue of definitions and logic. It is a startling discovery of the twentieth century that sufficiently complex axiomatic systems are undecidable and incomplete. That is, the system of theorem and proof can never lead to ALL the true sentences about the system, and ALWAYS contains statements which are undecidable – their truth values cannot be determined by proof techniques. More relevant to our current purpose is that applying an axiomatic hypothetico-deductive system to the real world can only be done by means of a mapping, which creates a model for the axiomatic system. These mappings then lead to assertions about the real world which require empirical verification. These assertions (which are proposed scientific laws) can NEVER be proven in the sense that mathematical theorems can be proven …

Many more arguments can be given to explain the difference between analytic and synthetic truths, which corresponds to the difference between mathematical and scientific truths. As I have explained in greater detail in my paper, the scientific method arose as a rejection of the axiomatic method used by the Greeks for scientific methodology. It was this rejection of axiomatics and logical certainty in favour of an empirical and observational approach which led to dramatic progress in science. However, this did involve giving up the certainties of mathematical argumentation and learning to live with the uncertainties of induction. Economists need to do the same – abandon current methodology borrowed from science and develop a new methodology suited for the study of human beings and societies.

Advocates for choice-based solutions should take a look at what’s happened to schools in Sweden, where parents and educators would be thrilled to trade their country’s steep drop in PISA scores over the past 10 years for America’s middling but consistent results. What’s caused the recent crisis in Swedish education? Researchers and policy analysts are increasingly pointing the finger at many of the choice-oriented reforms that are being championed as the way forward for American schools. While this doesn’t necessarily mean that adding more accountability and discipline to American schools would be a bad thing, it does hint at the many headaches that can come from trying to do so by aggressively introducing marketlike competition to education.
There are differences between the libertarian ideal espoused by Friedman and the actual voucher program the Swedes put in place in the early ’90s … But Swedish school reforms did incorporate the essential features of the voucher system advocated by Friedman. The hope was that schools would have clear financial incentives to provide a better education and could be more responsive to customer (i.e., parental) needs and wants when freed from the burden imposed by a centralized bureaucracy …

But in the wake of the country’s nose dive in the PISA rankings, there’s widespread recognition that something’s wrong with Swedish schooling …

It’s the darker side of competition that Milton Friedman and his free-market disciples tend to downplay: If parents value high test scores, you can compete for voucher dollars by hiring better teachers and providing a better education—or by going easy in grading national tests. Competition was also meant to discipline government schools by forcing them to up their game to maintain their enrollments, but it may have instead led to a race to the bottom as they too started grading generously to keep their students …

Maybe the overall message is … “there are no panaceas” in public education. We tend to look for the silver bullet—whether it’s the glories of the market or the techno-utopian aspirations of education technology—when in fact improving educational outcomes is a hard, messy, complicated process. It’s a lesson that Swedish parents and students have learned all too well: Simply opening the floodgates to more education entrepreneurs doesn’t disrupt education. It’s just plain disruptive.

Ray Fisman is not the only critical international reviewer of the Swedish voucher experiment. This is what Henry M. Levin — distinguished economist and director of the National Center for the Study of Privatization in Education at Teachers College, Columbia University — wrote when he recently reviewed the evidence about the effects of vouchers:

In 1992 Sweden adopted a voucher-type plan in which municipalities would provide the same funding per pupil to either public schools or independent (private) schools. There were few restrictions for independent schools, and religious or for-profit schools were eligible to participate. In 1994, choice was also extended to that of public schools where parents could choose either a public or private school. In the early years, only about 2 percent of students chose independent schools. However, since the opening of this century, independent school enrollments have expanded considerably. By 2011-12 almost a quarter of elementary and secondary students were in independent schools. Half of all students in the upper secondary schools in Stockholm were attending private schools at public expense.

On December 3, 2012, Forbes Magazine recommended for the U.S. that: “…we can learn something about when choice works by looking at Sweden’s move to vouchers.” On March 11 and 12, 2013, the Royal Swedish Academy of Sciences did just that by convening a two-day conference to learn what vouchers had accomplished in the last two decades … The following was my verdict:

On the criterion of Freedom of Choice, the approach has been highly successful. Parents and students have many more choices among both public schools and independent schools than they had prior to the voucher system.

On the criterion of productive efficiency, the research studies show virtually no difference in achievement between public and independent schools for comparable students. Measures of the extent of competition in local areas also show a trivial relation to achievement. The best study measures the potential choices, public and private, within a particular geographical area. For a 10 percent increase in choices, the achievement difference is about one-half of a percentile. Even this result must be understood within the constraint that the achievement measure is not based upon standardized tests, but upon teacher grades. The so-called national examination result that is also used in some studies is actually administered and graded by the teacher with examination copies available to the school principal and teachers well in advance of the “testing”. Another study found no difference in these achievement measures between public and private schools, but an overall achievement effect for the system of a few percentiles. Even this author agreed that the result was trivial.

In evaluating these results, we must also keep in mind that the overall performance of the system on externally administered and evaluated tests used for international comparisons showed substantial declines over the last fifteen years for Sweden. For those who are interested in the patterns of achievement decline across subjects and grades, I have provided the enclosed PowerPoint presentation …

With respect to equity, a comprehensive, national study sponsored by the government found that socio-economic stratification had increased as well as ethnic and immigrant segregation. This also affected the distribution of personnel where the better qualified educators were drawn to schools with students of higher socio-economic status and native students. The international testing also showed rising variance or inequality in test scores among schools. No evidence existed to challenge the rising inequality. Accordingly, I rated the Swedish voucher system as negative on equity.

Among the industrialized countries, only three have a universal voucher or choice system: Chile, Holland, and Sweden. Some would also argue that Belgium qualifies in this category. The three countries have very different designs, with the Dutch system being the most highly regulated and devoting the most attention to equity. Even so, the tracking that takes place at age 12 in the Netherlands between vocational and academic secondary schools has important equity consequences in terms of socio-economic stratification. Although based upon choice, the choices available to a student are heavily dependent on her achievement test results. The Chilean system has witnessed an increasingly notable stratification of the population, both within and between public and private sectors. Students from more educated and wealthier families are found in the private schools which receive public funding, but can choose which students to accept from among applicants. The Chilean system allows schools to charge additional fees beyond the voucher, also favoring more advantaged families.

A recent Swedish study on the effects of school-choice concluded:

The results from the analyses made in this paper confirm that school choice is a more important factor determining variation in grades than is residential segregation.

The empirical analysis in this paper confirms the PISA-based finding that between-school variance in student performance in the Swedish school system has increased rapidly since 2000. We have also been able to show that this trend towards increasing performance gaps cannot be explained by shifting patterns of residential segregation. A more likely explanation is that increasing possibilities for school choice have triggered a process towards a more unequal school system. A rapid growth in the number of students attending voucher-financed, independent schools has been an important element of this process …

The idea of voucher-based independent school choice is commonly ascribed to Milton Friedman. Friedman’s argument was that vouchers would decrease the role of government and expand the opportunities for free enterprise. He also believed that the introduction of competition would lead to improved school results. As we have seen in the Swedish case, this has not happened. As school choice has increased, differences between schools have increased but overall results have gone down. As has proved to be the case with other neo-liberal ideas, school choice—when tested—has not been able to deliver the results promised by theoretical speculation.
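The “between-school variance” in the quoted study is the share of the total variance in student results that lies between school means rather than within schools. The Python sketch below (numpy assumed; scores, school count and school size are all hypothetical) computes that share for one and the same set of students under two allocations: random assignment to schools, and perfect sorting by score, an extreme stylised stand-in for selection through school choice. It illustrates the concept only and is not the method of the quoted paper.

```python
import numpy as np

def between_school_share(scores, school):
    """Share of total score variance that lies between school means."""
    grand_mean = scores.mean()
    between = 0.0
    for s in np.unique(school):
        group = scores[school == s]
        between += len(group) * (group.mean() - grand_mean) ** 2
    total = ((scores - grand_mean) ** 2).sum()
    return between / total

rng = np.random.default_rng(2)
n_schools, per_school = 50, 40
# Hypothetical student scores; the same students are used in both scenarios.
scores = rng.normal(500, 100, n_schools * per_school)
seats = np.repeat(np.arange(n_schools), per_school)

# Scenario 1: students spread over schools at random (no sorting).
random_school = rng.permutation(seats)

# Scenario 2: lowest scorers fill school 0, the next group school 1, and so on.
sorted_school = np.empty_like(seats)
sorted_school[np.argsort(scores)] = seats

share_random = between_school_share(scores, random_school)
share_sorted = between_school_share(scores, sorted_school)
print("between-school share, random allocation:", round(share_random, 3))
print("between-school share, sorted allocation:", round(share_sorted, 3))
```

The mean and the total variance are identical in both scenarios; only the allocation of students to schools changes, yet the between-school share moves from a few percent to almost all of the variance.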

When one works – as one must at an aggregate level – with quantities measured in value terms, the appearance of a well-behaved aggregate production function tells one nothing at all about whether there really is one. Such an appearance stems from the accounting identity that relates the value of outputs to the value of inputs – nothing more.
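The claim that a well-behaved fit can stem from the accounting identity alone can be checked numerically. The Python sketch below (numpy assumed; all numbers hypothetical) draws output, labour and capital independently, so that no production relation links them, imposes only a constant labour share a = 0.7 and the identity Y = wL + rK, and then fits a log-linear “Cobb-Douglas” regression in which a share-weighted index of factor prices stands in for total factor productivity. Because log A + a log L + (1 − a) log K = log Y holds here as pure algebra, the fit is perfect and the “output elasticities” simply reproduce the assumed factor shares.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
a = 0.7  # hypothetical constant labour share

# Output, labour and capital drawn independently: by construction there is
# no technological relation between them at all.
Y = np.exp(rng.normal(5.0, 0.5, T))
L = np.exp(rng.normal(3.0, 0.5, T))
K = np.exp(rng.normal(4.0, 0.5, T))

# Factor prices implied by the identity Y = wL + rK with a constant labour share.
w = a * Y / L
r = (1 - a) * Y / K
assert np.allclose(w * L + r * K, Y)

# "Total factor productivity" as a share-weighted index of factor prices.
log_A = a * np.log(w / a) + (1 - a) * np.log(r / (1 - a))

# Regress log Y on a constant, log L, log K and log A.
X = np.column_stack([np.ones(T), np.log(L), np.log(K), log_A])
coef, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
resid = np.log(Y) - X @ coef
r_squared = 1 - resid.var() / np.log(Y).var()
print("coefficients (const, log L, log K, log A):", coef.round(3))
print("R^2:", round(r_squared, 6))
```

The regression “finds” constant returns to scale and exponents equal to the factor shares, although the data were generated without any technology whatsoever.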

All these facts should be well known. They are not, or, if they are, their implications are simply ignored by macroeconomists who go on treating the aggregate production function as the most fundamental construct of neoclassical macroeconomics …

The consequences of the non-existence of aggregate production functions have been too long overlooked. I am reminded of the story that, during World War II, a sign in an airplane manufacturing plant read: “The laws of aerodynamics tell us that the bumblebee cannot fly. But the bumblebee does fly, and, what is more, it makes a little honey each day.” I don’t know about bumblebees, but any honey supposedly made by aggregate production functions may well be bad for one’s health.

Attempts to explain the impossibility of using aggregate production functions in practice are often met with great hostility, even outright anger. To that I say … that the moral is: “Don’t interfere with fairytales if you want to live happily ever after.”

Neoclassical marginal productivity theory is a collapsed theory from both a historical and – as shown already by Sraffa in the 1920s, and in the Cambridge capital controversy in the 1960s and 1970s – a theoretical point of view. As Joan Robinson wrote in 1953:

The production function has been a powerful instrument of miseducation. The student of economic theory is taught to write Q = f (L, K) where L is a quantity of labor, K a quantity of capital and Q a rate of output of commodities. He is instructed to assume all workers alike, and to measure L in man-hours of labor; he is told something about the index-number problem in choosing a unit of output; and then he is hurried on to the next question, in the hope that he will forget to ask in what units K is measured. Before he ever does ask, he has become a professor, and so sloppy habits of thought are handed on from one generation to the next.
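Robinson’s question about the units in which K is measured is what the reswitching debate turned on. Below is a minimal Python sketch of the standard numerical illustration, usually credited to Samuelson’s 1966 summing-up of the controversy: technique A uses 7 units of labour applied two periods before output; technique B uses 2 units three periods before and 6 units one period before. With the wage as numéraire, the cost of each technique is its labour compounded at the rate of interest r; exact fractions are used so that the switch points come out exactly.

```python
from fractions import Fraction

def cost_A(r):
    # Technique A: 7 units of labour applied two periods before output.
    return 7 * (1 + r) ** 2

def cost_B(r):
    # Technique B: 2 units three periods before, 6 units one period before.
    return 2 * (1 + r) ** 3 + 6 * (1 + r)

def cheaper(r):
    """Return the cost-minimising technique at interest rate r."""
    a, b = cost_A(r), cost_B(r)
    if a == b:
        return "switch point"
    return "A" if a < b else "B"

for pct in (0, 25, 50, 75, 100, 150):
    r = Fraction(pct, 100)
    print(f"r = {pct:3d}%: cost A = {float(cost_A(r)):7.3f}, "
          f"cost B = {float(cost_B(r)):7.3f} -> {cheaper(r)}")
```

Technique A is cheapest at low and at high interest rates, and B in between, with switch points at r = 50% and r = 100%. The same technique is thus chosen at two quite different interest rates, so no monotonic ranking of the techniques by “quantity of capital” can be read off the interest rate, which is what the textbook production function Q = f(L, K) with a marginal product of capital presupposes.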

For more on the inadequacy of marginal productivity theory this article of yours truly may perhaps be of some interest.

Kevin Hoover: Let me follow up on my question to make it more pointed so that Mike Lovell and others can comment on it. In 1986, Mike, you wrote a pretty well known paper in which you examined the empirical success of a variety of alternatives to rational expectations, including adaptive expectations, structural expectations, and implicit expectations [Lovell (1986)]. And in your paper, rational expectations does not dominate these alternatives. You even cite a paper by Muth, which comes down more or less in favor of implicit expectations. What I am wondering, then, is that, given the way that you have approached this empirically or the way it could be approached empirically, does this mean that we should find an alternative to rational expectations or are there other expectational approaches that are an empirical complement to rational expectations?

Michael Lovell: I wish Jack Muth could be here to answer that question, but obviously he can’t because he died just as Hurricane Wilma was zeroing in on his home on the Florida Keys. But he did send me a letter in 1984. This was a letter in response to an earlier draft of that paper you are referring to. I sent Jack my paper with some trepidation because it was not encouraging to his theory. And much to my surprise, he wrote back. This was in October 1984. And he said: “I came up with some conclusions similar to some of yours on the basis of forecasts of business activity compiled by the Bureau of Business Research at Pitt.” [Letter Muth to Lovell (2 October 1984)] He had got hold of the data from five business firms, including expectations data, analyzed it, and found that the rational expectations model did not pass the empirical test. He went on to say, “It is a little surprising that serious alternatives to rational expectations have never really been proposed. My original paper was largely a reaction against very naïve expectations hypotheses juxtaposed with highly rational decision-making behavior and seems to have been rather widely misinterpreted. Two directions seem to be worth exploring: (1) explaining why smoothing rules work and their limitations and (2) incorporating well known cognitive biases into expectations theory (Kahneman and Tversky). It was really incredible that so little has been done along these lines.” Muth also said that his results showed that expectations were not in accordance with the facts about forecasts of demand and production. He then advanced an alternative to rational expectations. That alternative he called an “errors-in-the-variables” model. That is to say, it allowed the expectation error to be correlated with both the realization and the prediction. Muth found that his errors-in-variables model worked better than rational expectations or Mills’ implicit expectations, but it did not entirely pass the tests.
In a shortened version of his paper published in the Eastern Economic Journal he reported: “The results of the analysis do not support the hypotheses of the naive, exponential, extrapolative, regressive, or rational models. Only the expectations revision model used by Meiselman is consistently supported by the statistical results. . . . These conclusions should be regarded as highly tentative and only suggestive, however, because of the small number of firms studied.” [Muth (1985, p. 200)] Muth thought that we should not only have rational expectations, but if we’re going to have rational behavioral equations, then consistency requires that our model include rational expectations. But he was also interested in the results of people who do behavioral economics, which at that time was a very undeveloped area.
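One common form of the empirical tests mentioned here asks whether forecast errors can be predicted from information the forecaster already had; under rational expectations they should not be. The Python sketch below (numpy assumed) simulates an AR(1) series with hypothetical persistence ρ = 0.9 and compares the rational forecast (the true conditional mean) with an adaptive-expectations forecast using a hypothetical adjustment speed λ = 0.3. It only illustrates the logic of such a test; it is not Muth’s or Lovell’s data or procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50_000
rho = 0.9  # hypothetical persistence of the variable being forecast
lam = 0.3  # hypothetical adjustment speed of adaptive expectations

# Simulate an AR(1) process x_t = rho * x_{t-1} + e_t.
e = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + e[t]

# f_adapt[t] is the adaptive forecast of x[t], formed with data up to x[t-1]:
# F_t = F_{t-1} + lam * (x_{t-1} - F_{t-1}).
f_adapt = np.zeros(T)
for t in range(1, T):
    f_adapt[t] = f_adapt[t - 1] + lam * (x[t - 1] - f_adapt[t - 1])

# The rational forecast of x[t] is its conditional mean given x[t-1].
f_rational = rho * x[:-1]

target = x[1:]
info = x[:-1]  # information available when each forecast was made
err_rational = target - f_rational
err_adaptive = target - f_adapt[1:]

# Under rational expectations the forecast error should be uncorrelated
# with anything the forecaster already knew.
corr_rational = np.corrcoef(err_rational, info)[0, 1]
corr_adaptive = np.corrcoef(err_adaptive, info)[0, 1]
print(f"corr(error, x_(t-1)), rational: {corr_rational:+.3f}")
print(f"corr(error, x_(t-1)), adaptive: {corr_adaptive:+.3f}")
```

The rational forecast error is just the new shock, so its correlation with past data is close to zero; the adaptive forecast lags the process, leaving a systematic, predictable error. Applied to survey expectations data, a clearly nonzero correlation of this kind is what it means for rational expectations to “not pass the empirical test”.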

Hoover: Does anyone else want to comment on the issue of testing rational expectations against alternatives and if it matters whether rational expectations stands up to empirical tests or whether it is not the sort of thing for which testing would be relevant?

Robert Shiller: What comes to my mind is that rational expectations models have to assume away the problem of regime change, and that makes them hard to apply. It’s the same criticism they make of Kahneman and Tversky, that the model isn’t clear and crisp about exactly how you should apply it. Well, the same is true for rational expectations models. And there’s a new strand of thought that’s getting impetus lately, that the failure to predict this crisis was a failure to understand regime changes … Omitting key variables because we don’t have the data history on them creates a fundamental problem. That’s why many nice concepts don’t find their way into empirical models and are not used more. They remain just a conceptual model …

Hoover: Bob, did you want to comment on that? You’re looking unhappy, I thought.

Robert Lucas: No. I mean, you can’t read Muth’s paper as some recipe for cranking out true theories about everything under the sun—we don’t have a recipe like that. My paper on expectations and the neutrality of money was an attempt to get a positive theory about what observations we call a Phillips curve. Basically it didn’t work. After several years, trying to push that model in a direction of being more operational, it didn’t seem to explain it. So we had what we call price stickiness, which seems to be central to the way the system works. I thought my model was going to explain price stickiness, and it didn’t. So we’re still working on it; somebody’s working on it. I don’t think we have a satisfactory solution to that problem, but I don’t think that’s a cloud over Muth’s work. If Jack thinks it is, I don’t agree with him. Mike cites some data that Jack couldn’t make sense out of using rational expectations. . . . There’re a lot of bad models out there. I authored my share, and I don’t see how that affects a lot of things we’ve been talking about earlier on about the value of Muth’s contribution.

Warren Young: Just to wrap up the issue of possible alternatives to rational expectations or complements to rational expectations. Does behavioral economics or psychology in general provide a useful and viable alternative to rational expectations, with the emphasis on “useful”? {laughter}

Shiller: Well, that’s the criticism of behavioral economics, that it doesn’t provide elegant models. If you read Kahneman and Tversky, they say that preferences have a kink in them, and that kink moves around depending on framing. But framing is hard to pin down. So we don’t have any elegant behavioral economics models. The job isn’t done, and economists have to read widely and think about these issues. I am sorry, I don’t have a good answer. My opinion is that behavioral economics has to be on the reading list. Ultimately, the whole rationality assumption is another thing; it’s interesting to look back on the history of it. Back at the turn of the century, around 1900, when utility-maximizing economic theory was being discovered, it was described as a psychological theory. Did you know that, that utility maximization was a psychological theory? … The idea about rational expectations, again, reflects insights about people: that if you show people recurring patterns in the data, they can actually process it, a little bit like an ARIMA model, and they can start using some kind of brain faculties that we do not fully comprehend. They can forecast; it’s an intuitive thing that evolved and it’s in our psychology. So, I don’t think that there’s a conflict between behavioral economics and classical economics. It’s all something that will evolve responding to each other, psychology and economics.

Lucas: I totally disagree.

Hoover: The Great Recession and the recent financial crisis have been widely viewed in both popular and professional commentary as a challenge to rational expectations and to efficient markets. I really just want to get your comments on that strain of the popular debate that’s been active over the last couple years …

Lucas: You know, people had no trouble having financial meltdowns in their economies before all this stuff we’ve been talking about came on board. We didn’t help, though; there’s no question about that. We may have focused attention on the wrong things; I don’t know.

Shiller: Well, I’ve written several books on that. … Another name that’s not been mentioned is John Maynard Keynes. I suspect that he’s not popular with everyone on this panel … To understand Keynes, you have to go back to his 1921 book, Treatise on Probability … He said—he’s really into almost this regime-change thing that we brought up before—that people don’t have probabilities, except in very narrow, special circumstances. You can think of a coin-toss experiment, and then you know what the probabilities are. But in macroeconomics, it’s always fuzzy.

According to some people there’s really no need for heterodox theoretical critiques of mainstream neoclassical economics, but rather challenges to neoclassical economics “buttressed by good empirical work.” Out with “big-think theorizing” and in with “ordinary empiricism.”

Although thought-provoking, the view on empiricism and experiments offered is too simplistic. There are several reasons for this, but mostly it is because the kind of experimental empiricism it favours is largely untenable.

Experiments are actually very similar to theoretical models in many ways. They have, e.g., the same basic problem that they are built on rather artificial conditions and have difficulties with the “trade-off” between internal and external validity. The more artificial the conditions, the greater the internal validity, but also the smaller the external validity. The more we rig experiments/models to avoid the “confounding factors”, the less the conditions are reminiscent of the real “target system”. The nodal issue is how economists using different isolation strategies in different “nomological machines” attempt to learn about causal relationships. I doubt the generalizability of both research strategies, because the probability is high that causal mechanisms are different in different contexts, and lack of homogeneity/stability/invariance doesn’t give us warranted export licenses to the “real” societies or economies.

If we see experiments as theory tests or models that ultimately aspire to say something about the real “target system”, then the problem of external validity is central.

Assume that you have examined how the work performance of Swedish workers (A) is affected by B (“treatment”). How can we extrapolate/generalize to new samples outside the original population (e.g. to the UK)? How do we know that any replication attempt “succeeds”? How do we know when these replicated experimental results can be said to justify inferences made in samples from the original population? If, for example, P(A|B) is the conditional density function for the original sample, and we are interested in doing an extrapolative prediction of E[P(A|B)], how can we know that the new sample’s density function is identical with the original? Unless we can give some really good argument for this being the case, inferences built on P(A|B) do not really say anything about the target system’s P'(A|B).

As I see it, this is the heart of the matter. External validity/extrapolation/generalization is founded on the assumption that we can make inferences based on P(A|B) that are exportable to other populations for which P'(A|B) applies. Sure, if one can convincingly show that P and P' are similar enough, the problems are perhaps not insurmountable. But arbitrarily just introducing functional specification restrictions of the type invariance/stability/homogeneity is, at least for an epistemological realist, far from satisfactory. And often it is, unfortunately, exactly this that we see when we look at neoclassical economists’ models/experiments.
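The gap between P(A|B) and P'(A|B) can be made concrete with a small simulation. In the Python sketch below (numpy assumed), the moderator M, the effect sizes 2.0 and -0.5, and the population shares are all hypothetical: the effect of B on A depends on a background factor M whose prevalence differs between the two populations. The randomised experiment is internally valid in each population, yet the estimate from the first tells us nothing reliable about the second.

```python
import numpy as np

rng = np.random.default_rng(3)

def run_experiment(n, share_moderator):
    """Randomised experiment in a population where the effect of B on A
    depends on a hypothetical moderator M held by a given share of people.
    Hypothetical model: A = 1.0 + B * (2.0 if M else -0.5) + noise."""
    m = rng.random(n) < share_moderator
    b = rng.random(n) < 0.5  # random assignment of the treatment B
    effect = np.where(m, 2.0, -0.5)
    a = 1.0 + b * effect + rng.normal(0.0, 1.0, n)
    # Difference in means: unbiased for the average effect in THIS population only.
    return a[b].mean() - a[~b].mean()

sweden = run_experiment(100_000, share_moderator=0.8)
uk = run_experiment(100_000, share_moderator=0.1)
print(f"estimated effect of B, population 1 (Sweden): {sweden:.2f}")
print(f"estimated effect of B, population 2 (UK):     {uk:.2f}")
```

With M present in 80 percent of population 1 and 10 percent of population 2, the average effects are about 1.5 and -0.25 respectively. Only knowledge of the causal structure involving M, not the experimental estimate itself, could license exporting the first result to the second population.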

By this I do not mean to say that empirical methods per se are so problematic that they can never be used. On the contrary, I am basically — though not without reservations — in favour of the increased use of experiments within economics as an alternative to completely barren “bridge-less” axiomatic-deductive theory models. My criticism is more about aspiration levels and what we believe we can achieve with our mediational epistemological tools and methods in social sciences.

Many ‘experimentalists’ claim that it is easy to replicate experiments under different conditions and therefore a fortiori easy to test the robustness of experimental results. But is it really that easy? If in the example given above, we run a test and find that our predictions were not correct – what can we conclude? That B “works” in Sweden but not in the UK? Or that B “works” in a backward agrarian society, but not in a post-modern service society? That B “worked” in the field study conducted in year 2005 but not in year 2014? Population selection is almost never simple. Had the problem of external validity only been about inference from sample to population, this would be no critical problem. But the really interesting inferences are those we try to make from specific labs/experiments to specific real-world situations/institutions/structures that we are interested in understanding or (causally) explaining. And then the population problem is more difficult to tackle.

Just like traditional neoclassical modelling, randomized experiments are basically a deductive method. Given the assumptions (such as manipulability, transitivity, separability, additivity, linearity etc.) these methods deliver deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. Real target systems are seldom epistemically isomorphic to our axiomatic-deductive models/systems, and even if they were, we still have to argue for the external validity of the conclusions reached from within these epistemically convenient models/systems. Causal evidence generated by randomization procedures may be valid in “closed” models, but what we usually are interested in is causal evidence in the real target system we happen to live in.

Ideally controlled experiments (still the benchmark even for natural and quasi experiments) tell us with certainty what causes what effects – but only given the right “closures”. Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems, is not easy. “It works there” is no evidence for “it will work here”. Causes deduced in an experimental setting still have to show that they come with an export-warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of “rigorous” and “precise” methods is despairingly small.

Many advocates of randomization and experiments want to have deductively automated answers to fundamental causal questions. But to apply “thin” methods we have to have “thick” background knowledge of what’s going on in the real world, and not in (ideally controlled) experiments. Conclusions can only be as certain as their premises – and that also goes for methods based on randomized experiments.

The claimed strength of a social experiment, relative to non-experimental methods, is that few assumptions are required to establish its internal validity in identifying a project’s impact. The identification is not assumption-free. People are (typically and thankfully) free agents who make purposive choices about whether or not they should take up an assigned intervention. As is well understood by the randomistas, one needs to correct for such selective compliance … The randomized assignment is assumed to only affect outcomes through treatment status (the “exclusion restriction”).

There is another, more troubling, assumption just under the surface. Inferences are muddied by the presence of some latent factor—unobserved by the evaluator but known to the participant—that influences the individual-specific impact of the program in question … Then the standard instrumental variable method for identifying [the average treatment effect on the treated] is no longer valid, even when the instrumental variable is a randomized assignment … Most social experiments in practice make the implicit and implausible assumption that the program has the same impact for everyone.
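A toy simulation can illustrate this latent-factor problem. In the Python sketch below, all numbers and take-up rules are hypothetical: each person’s gain from the program is known to them but not to the evaluator, and take-up depends on that gain. The randomized assignment still satisfies the exclusion restriction, but the standard IV (Wald) estimate recovers the average impact only for those whose take-up is switched by assignment, which under these assumptions is neither the average treatment effect (ATE) nor the average effect on the treated (ATT).

```python
import random
import statistics

def simulate(n=200_000, seed=0):
    """Toy social experiment with selective compliance and heterogeneous impacts.

    Hypothetical assumptions: each person's gain is drawn from N(1, 1) and is
    known to them but not to the evaluator; people with a gain above 1.5 take
    up the program whether assigned or not; others take it up only if they
    are assigned and their gain is positive.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gain = rng.gauss(1.0, 1.0)                       # latent individual-specific impact
        z = int(rng.random() < 0.5)                      # randomized assignment (the instrument)
        d = int(gain > 1.5 or (z == 1 and gain > 0.0))   # purposive take-up
        y = rng.gauss(0.0, 1.0) + d * gain               # observed outcome
        rows.append((z, d, y, gain))
    return rows

def wald_estimate(rows):
    """Standard IV (Wald) estimator with randomized assignment as the instrument."""
    y1 = statistics.fmean(y for z, d, y, g in rows if z == 1)
    y0 = statistics.fmean(y for z, d, y, g in rows if z == 0)
    d1 = statistics.fmean(d for z, d, y, g in rows if z == 1)
    d0 = statistics.fmean(d for z, d, y, g in rows if z == 0)
    return (y1 - y0) / (d1 - d0)

rows = simulate()
wald = wald_estimate(rows)
ate = statistics.fmean(g for z, d, y, g in rows)             # average impact, everyone
att = statistics.fmean(g for z, d, y, g in rows if d == 1)   # average impact, participants
print(f"IV estimate {wald:.2f}, ATE {ate:.2f}, ATT {att:.2f}")
```

With these assumed parameters the three numbers should come out roughly 0.79, 1.00 and 1.52: the instrument is perfectly randomized, yet the IV estimate answers a narrower question than the one usually asked of it.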

While internal validity … is the claimed strength of an experiment, its acknowledged weakness is external validity—the ability to learn from an evaluation about how the specific intervention will work in other settings and at larger scales. The randomistas see themselves as the guys with the lab coats—the scientists—while other types, the “policy analysts,” worry about things like external validity. Yet it is hard to argue that external validity is less important than internal validity when trying to enhance development effectiveness against poverty; nor is external validity any less legitimate as a topic for scientific inquiry.

25 Dec, 2014 at 14:38 | Posted in Varia | Comments Off on Perfect Day (private)

After almost forty years in Lund, yours truly has returned to the town where he was born and bred — Malmö. Taking a stroll in a sunny, snow-glittering Pildammsparken today further convinced me that returning was a good decision …

A chicken is an egg’s way of constructing another egg, and empirical research is a scientific theory’s way of uncovering the theory’s flaws.

It’s harder to make such clear statements about statistics or engineering or computer science, as these are essentially tools in the service of science rather than being the object of study themselves.

And there are fields where new paradigms don’t seem so apparent. Consider three areas with which I’m somewhat familiar: political science, psychology, and economics. In political science, I see persistent difficulties in integrating different perspectives coming from the studies of public opinion, institutions, and political maneuvering. It really feels like we’re not seeing the whole elephant at once. And I include my own research as an example of this incomplete perspective. Psychology seems to be undergoing a reforming process, in which various unsuccessful paradigms such as embodied cognition are being rejected, with no clear unification of the cognitive and behavioral approaches. Similarly in economics, although there it seems worse in that various incomplete perspectives are taken by their proponents as being all-encompassing …

How does statistics fit into all this? Statistics can (potentially) do a lot:
– Guidance in data collection and the assessment of measurements. And recall that “data collection” is not just about how to collect a random sample or assign treatments in an experiment; it also includes considerations of what to measure and how to measure it …
– Methods for calibrating variation by comparing to models of randomness. This is where I think that statistical significance and p-values fit in: not as a way to make scientific discoveries (“p less than .05 so we get published in the tabloids!”) but as a measuring stick when interpreting observed comparisons and variation.
– Tools for combining information. That to me is the most general way to think of “inference,” and it encompasses all sorts of things, from classical “iid” models to more complicated approaches …
– Methods for checking fit, for revealing the aspects of data that are not well explained by our models. To me this includes all of exploratory data analysis, which is about learning the unexpected …
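One concrete instance of “calibrating variation by comparing to models of randomness” is a permutation test, which compares an observed difference in means with the differences produced by randomly relabeling the pooled data. The Python sketch below illustrates that idea and is not a procedure taken from the quoted text; the function name and the sample data are hypothetical.

```python
import random
import statistics

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    The 'model of randomness' is random relabeling of the pooled
    observations; the p-value is the share of relabelings whose difference
    is at least as large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Hypothetical measurements from two groups
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
group_b = [4.2, 4.8, 4.5, 4.0, 4.6, 4.4]
print(permutation_p_value(group_a, group_b))
```

Here the two groups do not overlap, so only the observed split and its mirror image are as extreme among the 924 possible splits, and the p-value should land near 2/924 ≈ 0.002. That number is a measuring stick for the observed comparison, not a discovery in itself.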

Again, statistics is in the service of science, and I see statistics as a way of organizing science rather than as a way of making scientific discovery …

I think most of the real scientific heavy lifting is coming from existing substantive theories; the statistics is more of a way of rearranging the data or … of adjudicating between competing hypotheses or underlying models of reality.

In order to make causal inferences from simple regression, it is now conventional to assume something like the setting in equation (1) … The equation makes very strong invariance assumptions, which cannot be tested from data on X and Y.

$$Y = a + bX + \delta \qquad (1)$$

What happens without invariance? The answer will be obvious. If intervention changes the intercept a, the slope b, or the mean of the error distribution, the impact of the intervention becomes difficult to determine. If the variance of the error term is changed, the usual confidence intervals lose their meaning.

How would any of this be possible? Suppose, for instance, that — unbeknownst to the statistician — X and Y are both the effects of a common cause operating through linear statistical laws like (1). Suppose errors are independent and normal, while Nature randomizes the common cause to have a normal distribution. The scatter diagram will look lovely, a regression line is easily fitted, and the straightforward causal interpretation will be wrong.
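Freedman’s common-cause scenario is easy to reproduce. In this Python sketch the coefficients are arbitrary choices made for illustration: Z is standard normal, X and Y each depend linearly on Z with independent normal errors, and X has no effect on Y whatsoever. The observational regression of Y on X nevertheless yields a slope near cov(X, Y)/var(X) = (2 · 1.5)/(2² + 1) = 0.6, while setting X by intervention leaves Y unchanged and the slope falls to about zero.

```python
import random
import statistics

def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(42)
n = 50_000
# Hypothetical common cause Z, randomized by Nature to be standard normal
z = [rng.gauss(0, 1) for _ in range(n)]
# X and Y are both effects of Z through linear laws with independent normal errors;
# Y does not depend on X at all (coefficients chosen for illustration)
x_obs = [1.0 + 2.0 * zi + rng.gauss(0, 1) for zi in z]
y = [3.0 + 1.5 * zi + rng.gauss(0, 1) for zi in z]
observational_slope = ols_slope(x_obs, y)

# An intervention that sets X by fiat leaves Y untouched, because Y depends only on Z
x_set = [rng.gauss(1, 2) for _ in range(n)]
interventional_slope = ols_slope(x_set, y)
print(f"observational slope {observational_slope:.2f}, "
      f"slope under intervention {interventional_slope:.2f}")
```

The scatter of `x_obs` against `y` looks like a textbook regression, which is exactly why the causal reading of the 0.6 slope is tempting and wrong.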
