Saturday, December 20, 2014

Should theories be testable?

Lots of interesting philosophy-of-science arguments around the web these days.

In physics, there's an editorial in Nature complaining that string theory - and especially the "string theory landscape" - isn't falsifiable. (Personally I think the word "falsifiability" is a little silly, since it's just testability + strongish priors against your own hypothesis.) Brian Greene, a string theorist, has a response in Smithsonian magazine. On Twitter, Sean Carroll complains about the "falsifiability police". Personally, I think Chad Orzel has the best take on the whole thing.

I don't see why we should insist that any theory be testable. After all, most of the things people are doing in math departments aren't testable, and no one complains about those, do they? I don't see why it should matter if people are doing math in a math department, a physics department, or an econ department.

I think testability starts to matter when you start thinking about applying theories to the real world. This is why I get annoyed when people ignore the evidence in business cycle theory, but not when they do it in pure theory.

Suppose you're studying the properties of repeated games. Who cares if those games represent anything that really exists today? They might represent something we might implement with algorithms somewhere in the future. Or even if not, it's fun (i.e. valuable) to just know a bunch of cool stuff about how concepts fit together (i.e. math). The same is true about the kind of abstract "math of value functions" stuff that Miles Kimball taught me in grad school.

But when you start making models that claim to be about some specific real thing (e.g. monetary policy), you're implying that you think those models should be applied. And then, it seems important to me to have some connection to real data, to tell if the theory is a good one to use, or a crappy one to use. That's testability.

Anyway, this sort of seems very college-freshman-dorm-discussion-level when I write it out like this, but I think there are a surprising number of people who don't seem to agree with it...

Elsewhere, Kevin Bryan has a post up about "minimal model explanations" in economics, which basically echoes Friedman's "methodology of positive economics". Brad DeLong links to an Itzhak Gilboa paper about economic models as analogies. Moises Macias Bustos informs me that the Stanford Encyclopedia of Philosophy has updated its entry on "scientific explanation". And Robert Waldmann reminds me of this interesting post, in which he argues that Friedman's ideas and Lucas' ideas about economic methodology are mutually contradictory.

Update: I think it's worth pointing out once again that purely mathematical theories, which don't describe any pre-existing phenomenon (and hence are not "testable"), can be useful.

A good example is the Stable Matching Theory developed by Al Roth and Lloyd Shapley. When this theory was developed, it didn't describe anything that existed in the world. So you couldn't go out and test it. It was obvious that it "worked", in the sense that you could program computers that implemented it. That was trivial. You could know that just from working out the math. So this theory, when it was made, wasn't a "testable" theory like General Relativity. But then, eventually, people came up with a way to use Stable Matching Theory for assigning organ transplants. And it worked really well. So it turned out to be useful.

Now look at a lot of the stuff people are doing in math departments. How much of that stuff will eventually be useful? The answer is "We don't know, and we can't know." In 1896, in a letter discussing the new theory of vectors, Lord Kelvin - one of history's greatest physicists - said "'[V]ector' is a useless survival...and has never been of the slightest use to any creature." To put it mildly, he was extremely wrong.

So abstract, mathematical "theories" that can't be tested like science theories can still be useful. And we can't know which of them will be useful in the future. And it's cheap and harmless to have people sit around and work on those things. And I can't see how it matters whether those people are in math departments, or physics departments, or econ departments, or computer science departments, or statistics departments, or applied math departments, etc.

But as soon as people start saying - or even implying - that their theories describe real phenomena, then the ball game changes.

50 comments:

Theoretical physics is useful. The theoretical models often precede the verification/empirical results. Having these models, even if they are untestable with existing technology, is better than having none at all. Typically, a model must exhibit some mathematical characteristics that need not be proven with physical tests, such as invariance under various symmetries and transformations, conservation of energy, a path of least action, etc. Some on the left set the bar of falsifiability for physics too high, yet believe everything Keynes wrote.

I really think this comes down to personality more than anything. I agree that many experienced people don't agree on this, and I also believe that not everyone needs to agree to move forward. It is difficult for a person to throw themselves into work that demands falsifiability if they really like dreaming up new theories. The same goes for people who like falsifiability: they would probably feel uncomfortable dreaming up theories that they may never see tested to their desired level.

Just because we can't dream up a way to falsify something doesn't mean that it cannot be done eventually or that it cannot provide interesting and correct predictions. It is better to have a guide than to have nothing, and pure theory can provide that guide when we attempt to make progress.

Not everyone has the luxury of dealing with non-testable theories. Civil engineers who build structures have to use theories that have been tested and proven in the real world. Failure is too real an option and the consequences too great. If you are dealing with non-testable theories, you are either playing (and there certainly is serious playing) or dealing with theories that do not have any role in daily life (at least yet). Either case seems boring and/or lightweight.

Noah, I believe that one critical issue in current economic theory is the contradiction between theoretical equilibrium methods and economic conservation laws. This critical issue is related to many of your recent posts and is mishandled by most economists, from what I have seen.

Economic theories are actually temporal logic statements about time-series data. Temporal logic operators such as ALL, EXIST, FUTURE, GLOBAL, NEXT, etc. (http://en.wikipedia.org/wiki/Temporal_logic) must be used to assert whether or not economic statements are true.

Equilibrium methods implicitly assume the temporal operator is either EXIST or FUTURE in time-series equations. We often use narrative descriptions such as "in the long run", "in the short run", etc., in equilibrium methods without explicitly proving that such a specific time period t exists in the temporal logic statement.

We must recognize that equilibrium methods can contradict economic conservation laws, which are based on the temporal logic operator ALL and accounting identities. For example, one conservation law in macroeconomics is:

Exactly. But is matching theory "not even wrong"? When it was made, there existed no phenomena that matched the theory. But using the theory, algorithms were invented to assign people organ transplants, and those algorithms have proven superior to the old ones...

"When it was made, there existed no phenomena that matched the theory."

You are confusing two meanings of the word "theory". A theory in physics makes a claim about physical reality. A theory in math is a body of knowledge about a class of abstract entities: number theory, game theory. These "theories" are very different beasts from, say, "string theory", and for mathematicians, it's an occasional nicety when something in one of their bodies of knowledge happens to map onto something in the real world. Theories in physics are claims about how the world works.

So a claim about the way the real world works that makes no testable or usable predictions about what happens when you do things in said real world is very much "not even wrong". Mathematicians don't have that problem: they're playing a formal game with its own rules. String theory does have that problem.

I think the Nature editorial is right, and you’re resisting it for the wrong reason.

You’re concerned with defending the scienceyness of the social sciences, and especially economics, which involve positing hypotheses that can’t be definitively verified, because human behavior is so complex and so many people are involved in shaping each others’ behavior.

The Nature editorial has nothing to do with that. The authors are presumably reasonable people who would presumably agree that just because we can’t know and keep complete track of every last little factor that goes into the behavior of billions of people doesn’t mean that the explanations we attempt about human behavior aren’t scientific.

The Nature editorial is about something else entirely – hypotheses that can’t be tested because they don’t relate to the observable universe. If a theory of multiverses is scientific, is a theory that the universe is a bubble on the tongue of a cosmic turtle scientific?

Validity is important if you want to use a theory. Math has ways of checking validity without data, but once you start making assumptions on some basis other than "Good for proving interesting results" you need other ways to check validity.

It seems like untestable economics falls into a trap similar to one that caused problems in old fashioned AI. Theorists call some variable "utility" or "growth" and then think they are saying something related to the way that word is used in the real world. One fix is to make up meaningless symbols and see if the arguments are still interesting. Usually they are not. I think economic theorists (e.g. RBC folks) are still getting away with this.

Spinning theories without severe testing tends to lead to less and less interesting ideas. Coming up with results is too easy and unconstrained. Everything I hear about DSGE indicates that's a problem there.

Math is an exception because mathematicians organize themselves around hard problems with broad implications -- but I don't see that in economic theory. What are the equivalent of Hilbert's Problems in economics -- ones that have clear criteria for solution, and that if solved have major implications for how economic theory is done?

I think the entire premise of this article is based on a butchering of the word "theory". A theory is by definition testable; in fact, a theory has, by definition, been tested and has passed every empirical test that has been thrown at it. A hypothesis may not be testable, and I think that is what you are really getting at. String theory, and scores of "theories" in macroeconomics, are not really theories in the correct sense of the word, and calling them theories is a disservice to a public that does not understand the difference between a theory and a hypothesis, which inevitably leads people to say things like "well, evolution is JUST a theory". This is a misuse of the term, and many people in the hard sciences and the social sciences are guilty of misusing the term in this way.

Of course, one could sit around and hypothesize all day without any intention of ever testing those hypotheses, and in this case it would not matter at all if those hypotheses were testable. However, that person would not be doing science. He might be doing philosophy, but science is ultimately an empirical endeavor. If a hypothesis is not testable then it does not fall within the domain of science. Furthermore, there is a compelling reason to try to test our hypotheses regardless of whether we are trying to engage in science or in some other activity because testability is the only characteristic that allows us to determine with a reasonable level of accuracy whether or not our hypotheses actually correspond to reality.

I think string theorists would get a lot less flack if they (more correctly) labelled their work "models" instead of "theories". Mathematicians also do not prove "theories", but "theorems", which are a completely different thing.

Correct terminology makes all the difference here. An unfalsifiable theory (prediction of observable future behavior) is an oxymoron (we either observe the predicted behavior or not). An unfalsifiable model (cause or explanation for observable future behavior) is par for the course (you can observe behavior, but you cannot observe causation). An unfalsifiable theorem (logical deduction from a set of axioms) is a confusion of terms.

I don't see why we should insist that any theory be testable. After all, most of the things people are doing in math departments aren't testable, and no one complains about those, do they? I don't see why it should matter if people are doing math in a math department, a physics department, or an econ department.

It matters because people doing math aren't necessarily doing physics. It doesn't matter if they're doing math in a physics department, or doing math while calling themselves physicists, etc.

It doesn't matter how good or elegant the math is. If it disagrees with experiment, then it's wrong.

But with string theory, the theories are "not even wrong" - you can't test them. You could test them in principle, but only with equipment that humanity won't be able to construct anytime soon.

I don't see a problem with people in physics departments working on stuff like that. It's incredibly cheap for society, it keeps their skills sharp, and it's cool. It already won one, arguably two Fields medals.

"Anyway, this sort of seems very college-freshman-dorm-discussion-level when I write it out like this, but I think there are a surprising number of people who don't seem to agree with it..."

It is baffling to people outside the economics profession, but who come into contact with it, how people in a policy discipline come to think it is acceptable to have so much theorising that has nothing to do with evidence, especially when so much evidence is available. Thanks for trying to explain why, but we are still baffled.

A large portion of the population has come to believe as Peter Pan: Reality is whatever we really, really, REALLY want it to be! If we believe something hard enough, it is TRUE!

Obama is a socialist, global warming is a hoax, guns don't kill people, the Earth is 6000 years old, ...

People can have any reality they desire... as long as... they do NOT test their beliefs.

Untested beliefs should not be labeled with the same word as tested theories. Let's call them what they are: dogmas. They should carry the stigma of "pure speculation without any relation to reality" until they are TESTED.

I appreciate all the above well-stated defenses of testability. It may or may not be coincidental that one side of the debate writes with so much more clarity than the other. Even a lay reader like myself can tell, usually about 2/3 of the way through a popular science book, when physics is straying into metaphysics. No doubt it is brilliant stuff, if untethered from observation, but it can't but seem like the cause or effect of physics being in a long drought. I'll know it's over, if I live that long, when my physics books for dummies don't have to keep going back to the well that was the first half of the twentieth century. Meanwhile, the fields of biology and medicine, which don't seem similarly riven with doubt, or rife with never-to-be-tested ideas, get more mind-blowing every day.

Niels Bohr famously claimed to have proven the impossibility of observing the magnetic moment of a free electron. Then H. Richard Crane measured it. So although it doesn't look like experiments to test string theory are feasible - it ain't necessarily so.

When you use a non-standard definition for "scientific theory", you end up confused by editorials like the one in Nature. It seems abundantly clear to me that the writers of the editorial are using this standard definition to which you do not subscribe. Under the standard definition, your objections make no sense.

Thanks for some sanity here. There is a lot of reading by the author, and his reporting on the reading is good; his comprehension of what was read is, well, better left ungraded! It might still not be too late for the author to go back to physics. The issue is very simple: string theory is not a theory in the classical sense (per your link). However, it does extend our thinking down to the smallest energy scale of well-proven physics (the Planck energy), and to how this energy may manifest as a string that may be closed, open, twisted, stretched, etc. This formulation uses math to predict how particles and other stuff can come about from such a construct. Testing the physics of the small requires bigger and bigger machines, and such machines may not be feasible for a long time, except that the cosmos itself may provide the data...

Someday, Noah will realize that there is nothing scientific about economics - it is more like Alice in Wonderland.

Courtesy of wisgeek.com - A hypothesis attempts to answer questions by putting forth a plausible explanation that has yet to be rigorously tested. A theory, on the other hand, has already undergone extensive testing by various scientists and is generally accepted as being an accurate explanation of an observation. This doesn’t mean the theory is correct; only that current testing has not yet been able to disprove it, and the evidence as it is understood, appears to support it.

The term "testable" is defined in logic and computing as either verifiable (can be shown true) or falsifiable (can be shown false). Non-fiction scientific theories require testability; only fictional scientific theories do not.

A scientific hypothesis can be either an ALL or an EXIST logic statement. Examples: Does GOD exist? Is the General Relativity theory correct?

Due to our limitations in enumerating all instances in the real world: for an ALL statement, it is easier to be falsifiable but harder to be verifiable; for an EXIST statement, it is easier to be verifiable but harder to be falsifiable.

However, scientists keep open minds until they find an instance that tests whether a hypothesis is false or true. They modify their theories upon finding a new instance against them. I think this is how science makes progress in discovering real-world knowledge in a consistent way.

A valid scientific theory (fiction or non-fiction) means two things (two sides of one coin): (a) so far, we have not found any real-world instance against it; (b) all instances we have found so far are consistent with the theory.

The more instances consistent with a theory, the more valuable the theory is. Unfortunately, many economic theories miss this scientific principle and thus have less value (see my previous comment on economic theories in this topic).

It's all pretty much data dependent now, isn't it? How frequently are national data revised post hoc? As for the Fisher equation, this holds for both P (lots of controversy) and Q. When it comes to V, we have a very volatile variable as well. I left this so-called equilibrium approach a long time ago; I have been in financial markets for 14 years now.

TofGovaerts, in the real economy, V = (R - NIC)/Q for all time periods in the NIPA. This can be derived from the basic definition P·Q = GDP. In the QTM, V = GDP/Ms at some unknown "equilibrium" time period.

Thus, we can falsify the QTM for any time period in which the QTM's V value is not equal to (R - NIC)/Q. So far, we have not found these two values to be equal in any single year of the NIPA. Technically speaking, this QTM "equilibrium" time has not come yet. But please don't stay tuned! It may never come.

To be a bit picky, when Gale and Shapley published their paper on the deferred acceptance algorithm, it actually did describe something in the world, namely the method used already at that time to match new graduates of medical schools with residencies. They didn't know that, which is perhaps the important point.

For those who don't know, the boy-propose version of deferred acceptance begins with each boy sending a proposal to his favorite girl. Each girl who receives more than one proposal rejects all but her favorite, each rejected boy sends a proposal to his second favorite, each girl now holding more than one proposal rejects all but the best, the rejects send proposals to their favorites among those who have not rejected them, and so forth. Eventually each boy has been rejected by every girl he prefers to the one holding his proposal (if there is one) and outstanding proposals are accepted. This pairing is stable because whenever a girl prefers some other guy, that guy is paired with someone he prefers to her.
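The boy-propose procedure described above can be sketched in a few lines of code. This is a minimal illustration, not Gale and Shapley's original formulation; the function name and the tiny preference lists are my own invention.

```python
# Sketch of the boy-propose deferred acceptance algorithm described
# above. Preference lists are illustrative only.

def deferred_acceptance(boy_prefs, girl_prefs):
    """boy_prefs / girl_prefs: dict mapping each person to an ordered
    list of the other side, most-preferred first. Returns boy -> girl."""
    # rank[g][b] = position of boy b on girl g's list (lower is better)
    rank = {g: {b: i for i, b in enumerate(prefs)}
            for g, prefs in girl_prefs.items()}
    next_choice = {b: 0 for b in boy_prefs}  # index of next girl to try
    holding = {}                             # girl -> boy she tentatively holds
    free = list(boy_prefs)                   # boys with no proposal outstanding
    while free:
        b = free.pop()
        g = boy_prefs[b][next_choice[b]]     # propose to favorite not yet tried
        next_choice[b] += 1
        if g not in holding:
            holding[g] = b                   # first proposal: hold it
        elif rank[g][b] < rank[g][holding[g]]:
            free.append(holding[g])          # she rejects her current boy
            holding[g] = b
        else:
            free.append(b)                   # she rejects the newcomer
    return {b: g for g, b in holding.items()}

boys = {"A": ["x", "y"], "B": ["y", "x"]}
girls = {"x": ["B", "A"], "y": ["A", "B"]}
print(deferred_acceptance(boys, girls))
```

With complete preference lists and equal numbers on each side, the loop always terminates with everyone matched, and the proof sketched in the comment above guarantees the result is stable.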

There are postulations that aren't testable because too many moving parts are involved to fashion a controlled experiment. This is the situation for social sciences.

And there are theories that aren't testable because they don't relate to the observable universe. This is the situation for metaphysics.

These are two totally different issues. Trying to combine them just generates confusion. The Nature editorial is written against proponents of certain variants of string theory who have left the realm of scientific physics and ventured into conjectures about metaphysics, and yet still want their conjectures about metaphysics to be considered part of scientific physics. Nobody’s saying their conjectures about metaphysics have no value. They’re just not science.

Re: "most of the things people are doing in math departments aren't testable"

That's not true at all. They are testable for internal consistency and for consistency with the rest of mathematics in whole or in part. Even something as problematic as the axiom of choice gets its day in court as the usefulness, necessity and weaknesses of its various strong and weak forms are tested against what we know, don't know and can't know.

Mathematical concepts are also testable by experiment, but for mathematicians that is by calculation. For example, the planar four color theorem was proven by exhaustively exploring a set of graphs and their properties. The experiment has since been repeated and optimized, just as various physicists and biologists have repeated classic experiments over the years.

No, they aren't testing the Pythagorean theorem by drawing and measuring triangles, but mathematics is built on testability.

P.S. Gian-Carlo Rota in 'Indiscrete Thoughts' pointed out that one common method of testing a theory is to consider some of its theorems to be axioms and see if one can prove its axioms as theorems. Mathematics is about building logical linkages.

P.P.S. Fermat's last theorem was proven by demonstrating a logical linkage between modular forms and elliptic curves. Mathematicians are always testing things. It is one of the few ways they know to move forward.

You don't define "testable" here. Is the standard for "testable" just "consistent with data" -- in which case there's no claim that "testable" implies the ability to differentiate between the quality of different theories many of which are likely to be consistent with the same data? Or is the standard for "testable" that one can actually test the causality implied by the theory?

First, it is science that insists on experimental test-ability, not math. Math is not science, rather it is a logical relationship exercise that has proven valuable to use in science and everyday life.

So, if science is not testable, then how is its validity checked? That is, what would then differential science from philosophy and religion? It is the quality of experimental test-ability that makes science the unique expositor of physical truth.

The inverse is also true, that if something is not testable by experimental methods then it is not science.

You're playing with words, especially in the ambiguity between theorem and theory. In math departments, you prove theorems. You do not work with theories. A theory is the application of math to reality, and thus must be testable.

It's perfectly fine to do math in physics or econ and prove theorems and logical implications. It's a completely different thing to hypothesize that a beautiful mathematical structure has something to do with the real world.

"most of the things people are doing in math departments aren't testable" - wtf? You should have taken more math classes in college. Test the Pythagorean theorem all you want; we've proven you won't find any counterexamples. The Riemann hypothesis is more interesting: we haven't proven you can't find a counterexample, although a large number of tests have been performed.

Your link to "Stable Matching Theory" links to the "Stable Matching Problem". Stable Matching is proposed as a mathematical problem (with motivation drawn from the real world). It is then worked on as a mathematical problem until theorems can be proved; in this case, showing that there is an algorithm for solving the problem. This theorem is then testable. You can create input sets for the algorithm, run the algorithm, and see whether or not the resulting pairing is stable. Just because a proof exists showing that the algorithm will always produce a stable pairing doesn't mean you can't still run the test. In this particular case, it's reasonable to test the implementation of the algorithm for typos.

If you can take a real world problem, whether markets or organs, and cast it into the form in which the Stable Matching Problem is defined, then you can use Stable Matching Algorithms to come up with stable matches for your real-world problem.
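The check described in the comment above is easy to automate: a matching is stable exactly when no "blocking pair" exists, i.e. no boy and girl who both prefer each other to their assigned partners. A minimal sketch, with made-up preference lists and a function name of my own choosing:

```python
# Sketch of the stability test described above. A matching is stable
# iff it contains no blocking pair. Preferences are illustrative.

def is_stable(match, boy_prefs, girl_prefs):
    """match: dict boy -> girl. Returns True iff no blocking pair exists."""
    boy_of = {g: b for b, g in match.items()}  # invert the matching
    for b, g_assigned in match.items():
        prefs = boy_prefs[b]
        # every girl that boy b prefers to his assigned partner
        for g in prefs[:prefs.index(g_assigned)]:
            gp = girl_prefs[g]
            # does girl g also prefer b to her current partner?
            if gp.index(b) < gp.index(boy_of[g]):
                return False  # (b, g) is a blocking pair
    return True

boys = {"A": ["x", "y"], "B": ["x", "y"]}
girls = {"x": ["A", "B"], "y": ["A", "B"]}
print(is_stable({"A": "x", "B": "y"}, boys, girls))  # stable
print(is_stable({"A": "y", "B": "x"}, boys, girls))  # A and x block this
```

This is exactly the empirical test the commenter describes: generate inputs, run the matching algorithm, then verify the output independently of the proof.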

The important thing here is that a theory must make a prediction in order to be useful. If a theory is not falsifiable, then it does not make a prediction. Now, we already have a perfectly good non-falsifiable theory that explains absolutely everything: God.

"A good example is the Stable Matching Theory developed by Al Roth and Lloyd Shapley. When this theory was developed, it didn't describe anything that existed in the world."

It may not have existed on your home planet, but it most certainly did exist on Earth - i.e. any matching that is self-enforcing, such as certain types of job search & recruitment. IIRC, Shapley and Roth use the example of college admissions in their paper. Not only did the problem exist, they were formalizing a solution that had existed for at least a decade before they did, i.e. the National Resident Matching Program. A "good example" indeed.