Kremer urged Lipeyah to test his program using what’s called a randomized controlled trial: he would monitor and collect data for fourteen local schools, implementing the program in seven of them, while letting the other seven go about business as usual. By collecting data from all fourteen schools to see which fared better, he could find out if his program actually worked.
In hindsight, Kremer’s idea seems obvious. Randomized controlled trials are the gold-standard method of testing ideas in other sciences, and for decades pharmaceutical companies have used them to test new drugs. In fact, because it’s so important not to sell people ineffective or harmful drugs, it’s illegal to market a drug that hasn’t gone through extensive randomized controlled trials. But before Kremer suggested it, the idea had never been applied to the development world.
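The seven-and-seven design Kremer proposed can be sketched in a few lines of code. This is a minimal illustration only: the school names are placeholders and the outcome numbers are invented, not data from the actual study.

```python
import random
import statistics

def cluster_randomize(units, seed=0):
    """Shuffle the units and split them into equal-sized treatment and control arms."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def estimated_effect(outcomes, treatment, control):
    """Difference in mean outcomes between the two arms."""
    return (statistics.mean(outcomes[u] for u in treatment)
            - statistics.mean(outcomes[u] for u in control))

schools = [f"school_{i}" for i in range(14)]
treated, untreated = cluster_randomize(schools, seed=42)

# Placeholder outcome data (e.g. attendance rates); in a real trial these
# would be collected from all fourteen schools after the program had run.
outcomes = {s: 0.70 + 0.02 * i for i, s in enumerate(schools)}
effect = estimated_effect(outcomes, treated, untreated)
print(len(treated), len(untreated))
```

Because whole schools (not individual pupils) are assigned, this is what trialists call cluster randomization; the analysis compares arm means exactly as above.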

…

Robustness of evidence? Very robust. There have been multiple randomized controlled trials and two meta-analyses supporting the efficacy of bed nets.
Implementation? Extremely good. AMF is highly transparent and open in its communication.
Room for more funding? Very large. AMF could productively use $20 million in 2015.
Living Goods
What do they do? Run a network of community health promoters in Uganda who go door-to-door selling affordable health products such as treatments for malaria, diarrhea, and pneumonia; soap; menstrual pads; contraception; solar lanterns; and high-efficiency cookstoves, and providing health-care advice.
Estimated cost-effectiveness? Very cost-effective. According to the estimates from the randomized controlled trial they’re running on their project, $3,000 spent on their program would save a life and provide a number of other benefits; GiveWell estimates their cost per life saved at $11,000.

…

Knowing that, it’s even better if the charity has done its own independently audited or peer-reviewed randomized controlled evaluations of its programs.
Robustness of evidence is very important for the simple reason that many programs don’t work, and it’s hard to distinguish the programs that don’t work from the programs that do. If we’d assessed Scared Straight by looking just at before-and-after delinquency rates for individuals who went through the program, we would have concluded it was a great program. Only after looking at randomized controlled trials could we tell that correlation did not indicate causation in this case and that Scared Straight programs were actually doing more harm than good.
One of the most damning examples of low-quality evidence concerns microcredit (that is, lending small amounts of money to the very poor, a form of microfinance most famously associated with Muhammad Yunus and the Grameen Bank). Intuitively, microcredit seems like it would be very cost-effective, and there were many anecdotes of people who’d received microloans and used them to start businesses that, in turn, helped them escape poverty.

In 1990, Wyeth-Ayerst had requested that the FDA approve Premarin for the prevention of heart disease in postmenopausal women, notwithstanding the lack of evidence from randomized controlled trials documenting such a benefit. Cynthia Pearson, of the nonprofit, independent National Women’s Health Network, pointed out in an FDA hearing that the evidence supporting this claim was weak: “You couldn’t approve a drug for healthy men without a randomized clinical trial. Even aspirin [to prevent heart disease] had to have a randomized controlled trial with healthy men.”
Ms. Pearson’s argument—that the standard for the gander ought to apply to the goose—prevailed. The FDA ruled that a randomized controlled trial was necessary to justify the claim that HRT decreased a woman’s risk of heart disease. Wyeth-Ayerst agreed to perform the requisite study, confident that the results would come out in their favor.

…

Perhaps the strongest evidence supporting routine HRT was presented in a 1997 article published in NEJM showing that “mortality among women who use postmenopausal hormones is lower than among nonusers,” again overriding continuing concerns about the link to breast cancer.
HOW DID SO MANY PEOPLE GET IT SO WRONG?
It helps to take a step back and look at the methods used in medical research. The two most common types of medical studies are randomized controlled trials (RCTs) and observational studies. A simple example demonstrates how these types of studies differ and illustrates the inherent strengths and weaknesses of each. Imagine that researchers want to study the impact that running a 10-kilometer road race has on women’s health over a one-year period.
The simplest way to do this would be to set up an observational study. Researchers would wait at the finish line of a local 10K race and ask women if they would be willing to participate in the study.

…

But perhaps when the researchers designed the questionnaire, they weren’t smart enough to include a question that identified this belief, which could be the real reason why the runners were healthier one year after the race. Without being aware of this difference between the groups, the researchers might incorrectly attribute the runners’ better health to their having participated in the race.
The other way to do this study is a randomized controlled trial, the gold standard of medical research. This study design provides a much more precise way to identify the factors that contribute to a particular outcome. Continuing with the example of the 10K race, researchers would find 200 women who agreed to participate in a study about the health effects of running such a race. The women would then be randomly assigned to the treatment group (to run in the race) or the control group (not to run in the race).
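The contrast between the two designs can be made concrete with a toy simulation. Everything here is invented for illustration (a single "health-conscious" trait, the effect sizes, the sample size); the only point is that self-selection produces a spurious benefit that random assignment removes.

```python
import random

def simulate_study(n, randomized, seed=1):
    """Return (mean health of runners, mean health of non-runners) in a toy
    model where being 'health-conscious' raises both the chance of running
    the race and health a year later. All numbers are invented."""
    rng = random.Random(seed)
    effect_of_running = 0.0  # assume the race itself does nothing
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        conscious = rng.random() < 0.5
        if randomized:
            runs = rng.random() < 0.5  # assignment ignores lifestyle
        else:
            runs = rng.random() < (0.8 if conscious else 0.2)  # self-selection
        health = ((1.0 if conscious else 0.0)
                  + effect_of_running * runs
                  + rng.gauss(0.0, 0.1))
        sums[runs] += health
        counts[runs] += 1
    return sums[True] / counts[True], sums[False] / counts[False]

obs_runners, obs_rest = simulate_study(20_000, randomized=False)
rct_runners, rct_rest = simulate_study(20_000, randomized=True)
print(round(obs_runners - obs_rest, 2))  # spurious "benefit" from confounding
print(round(rct_runners - rct_rest, 2))  # near zero under randomization
```

In the observational version the runners look substantially healthier even though running has zero effect by construction; randomization drives the apparent difference to roughly nothing.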

Moreover, much of what economists do is to collect and analyze data about how markets work, analysis that is largely done with great care and statistical expertise, and importantly, most of this research does not depend on the assumption that people optimize. Two research tools that have emerged over the past twenty-five years have greatly expanded economists’ repertoire for learning about the world. The first is the use of randomized controlled trial experiments, long used in other scientific fields such as medicine. The typical study investigates what happens when some people receive some “treatment” of interest. The second approach is to use either naturally occurring experiments (such as when some people are enrolled in a program and others are not) or clever econometric techniques that manage to detect the impact of treatments even though no one deliberately designed the situation for that purpose.

…

One way to unfreeze people is to remove barriers that are preventing them from changing, however subtle those barriers might be.
2. We can’t do evidence-based policy without evidence. Although much of the publicity about the BIT has rightly stressed its use of behavioral insights to design changes in how government operates, an equally important innovation was the insistence that all interventions be tested using, wherever possible, the gold-standard methodology of randomized control trials (RCTs)—the method often used in medical research. In an RCT, people are assigned at random to receive different treatments (such as the wording of the letters in the tax study), including a control group that receives no treatment (in this case, the original wording). Although this approach is ideal, it is not always feasible.
Sometimes researchers have to make compromises in order to be able to run any sort of trial.

…

I have two reasons. First, I have never come across a better example of the Lewin principle of removing barriers. In this case, the removal is quite literal. Whether or not this specific implementation will ever be adopted on a large scale, remembering this example may provide someone with an inspiration for a powerful nudge in another situation.
Second, the example illustrates potential pitfalls of randomized controlled trials in field settings. Such experiments are expensive, and lots of stuff can go wrong. When a lab experiment gets fouled up, which happens all too often in labs run by Humans, a relatively small amount of money paid to subjects has been lost, but the experimenter can usually try again. Furthermore, smart experimenters run a cheap pilot first to detect any bugs in the setup. All of this is hard in large-scale field experiments, and to make matters worse, it is often not possible for the experimenters to be present, on site, at every step along the way.

But first we will ask: How is this possible? How can something be a failure when the statistics seem to show that it is a success? How can it be failing when virtually every expert is lining up to endorse it? To answer that question we will examine one of the most important scientific innovations of the last two hundred years, and one that takes us to the heart of the closed-loop phenomenon—and how to overcome it.
The randomized control trial.
II
Closed loops are often perpetuated by people covering up mistakes. They are also kept in place when people spin their mistakes, rather than confronting them head on. But there is a third way that closed loops are sustained over time: through skewed interpretation.
That was the problem that bedeviled bloodletting, practiced by medieval doctors. The doctors had what seemed like clear feedback on what worked and what didn’t.

You have to look at the non-digital context.
Similarly, to better understand our technology fixation, it’s important to recognize its larger social and historical context. As I began to doubt the hype around packaged interventions, I wanted to see if I could bypass their problems. Maybe there were other approaches to social change. So I engaged with three ideas that have growing support – randomized controlled trials, social enterprises, and happiness as a goal. These are largely unrelated efforts, but they all have great merit and are well-regarded within their specializations. Promisingly, each had a potential claim to exorcising the curse of packaged interventions.
The Randomista Revolution
In July 2011 I visited a school in Kotra, a little village in southern Rajasthan. The small hut had white plaster walls and a thatched roof.

…

What made this project unique was that world-renowned researchers had used a rigorous methodology to establish something that seemed to contradict the Law of Amplification. The research team was led by Esther Duflo, a brilliant MIT economist who counts among her honors a MacArthur “genius grant” as well as the John Bates Clark Medal, a good predictor of future Nobel laureates. As a pioneering member of the Abdul Latif Jameel Poverty Action Lab (J-PAL), Duflo has been a tireless advocate for the use of randomized controlled trials (RCTs) to verify the value of antipoverty programs. This is the methodology used in clinical medicine, whereby a control group establishes a baseline against which the effectiveness of a treatment can be compared. In applying the rigor of hard science to social questions, Duflo and her colleagues are revolutionaries. Rivals and supporters have nicknamed them “randomistas.”
In a paper describing the effort, Duflo and her colleagues reported dramatic results.

In doing so, we were following a long tradition of development economists who have emphasized the importance of collecting the right data to be able to say anything useful about the world. However, we had two advantages over the previous generations: First, there are now high-quality data from a number of poor countries that were not available before. Second, we have a new, powerful tool: randomized control trials (RCTs), which give researchers, working with a local partner, a chance to implement large-scale experiments designed to test their theories. In an RCT, as in the studies on bed nets, individuals or communities are randomly assigned to different “treatments”—different programs or different versions of the same program. Since the individuals assigned to different treatments are comparable on average (because they were chosen at random), any systematic difference between them is the effect of the treatment.

…

To get them started, BRAC designed a program in which they would be given an asset (a pair of cows, a few goats, a sewing machine, and so on), a small financial allowance for a few months (to serve as working capital and to ensure they would not be tempted to liquidate the asset), and a lot of hand-holding: regular meetings, literacy classes, encouragement to save a little bit every week. Variants of this program are currently being evaluated in six countries, using randomized control trials (RCTs). We were involved in one of these studies, in partnership with Bandhan, an MFI in West Bengal. We visited households before the program was started and heard, from each of the families that were selected for the program, stories of crisis and desperation: A husband was a drunkard and regularly beat his wife; another died in an accident, leaving a young family behind; a widow was abandoned by her children; and so forth.

…

More than half the schools got nothing at all. Inquiries suggested that a lot of the money most likely ended up in the pockets of district officials.
It is easy to get depressed by such findings (which have been corroborated by similar studies in several other countries). We are often asked why we do what we do: “Why bother?” These are the “small” questions. William Easterly, for one, criticized randomized control trials (RCTs) on his blog in these terms: “RCTs are infeasible for many of the big questions in development, like the economy-wide effects of good institutions or good macroeconomic policies.” Then, he concluded that “embracing RCTs has led development researchers to lower their ambitions.”3
This statement was a good reflection of an institutionalist view that has strong currency in development economics today.

Westman was the first physician researcher to take Atkins up on that offer to go through all those medical files. He visited Atkins’s office in New York City in the late 1990s and was impressed by his success in helping patients to lose weight and improve health. But he decided that the files weren’t good enough. “I need science,” he told Atkins. Westman knew that the only way to make sense of various anecdotal accounts was to do randomized controlled trials, the gold standard of medical evidence. So he, along with a few colleagues around the country, started conducting those trials.
This new group of researchers entering the field were young and relatively ignorant about the professional sandpit into which they’d be sinking. Gary Foster, for instance, a professor of psychology at Temple University who took part in a landmark trial comparing different diets in 2003, says he had no idea that including the Atkins regime in his study would be so contentious.

…

., “Low-Fat Dietary Pattern and Risk of Cardiovascular Disease: The Women’s Health Initiative Randomized Controlled Dietary Modification Trial,” Journal of the American Medical Association 295, no. 6 (2006): 655–666; Ross L. Prentice et al., “Low-Fat Dietary Pattern and Risk of Invasive Breast Cancer: The Women’s Health Initiative Randomized Controlled Dietary Modification Trial,” Journal of the American Medical Association 295, no. 6 (2006): 629–642; Ross L. Prentice et al., “Low-Fat Dietary Pattern and Cancer Incidence in the Women’s Health Initiative Dietary Modification Randomized Controlled Trial,” Journal of the National Cancer Institute 99, no. 20 (2007): 1534–1543.
In writing this book: The author has no conflicts of interest; she has never received any financial or in-kind support, either directly or indirectly, from any party with an interest related to any of the topics covered in this book.
1. The Fat Paradox: Good Health on a High-Fat Diet
Observers estimated that: Vilhjalmur Stefansson, The Fat of the Land, enl. ed. of Not by Bread Alone (1946; repr., New York: Macmillan, 1956), 31; calculated by the author from Hugh M.

This is in contrast to some American diets that may include as much as 60 grams of soy protein a day, from various processed forms of soy, soy supplements, soy milk, and so on.
So far, the proven benefit that soy proponents can offer is that substituting soy protein for animal protein can slightly reduce cholesterol levels. But can soy foods help hot flashes? The evidence is mixed. In fact, of eight different randomized controlled trials of soy foods, only one found a statistically significant reduction in the frequency of hot flashes, though several showed a slight, nonsignificant reduction. Generally, there’s little published evidence to support the idea that increasing soy isoflavone intake from food or supplements substantially improves hot flashes. At the same time, we know that Asian women, who traditionally have a higher amount of soy in the diet, have much lower rates of hot flashes than American women.

Reiner, Are You Willing to Be Nudged into Making the Right Decision, Slate (Aug. 13, 2013) (emphasis in original), available at http://www.slate.com/blogs/future_tense/2013/08/13/research_shows_when_nudging_works_and_when_it_doesn_t.html.
8. Susan Parker, Esther Duflo Explains Why She Believes Randomized Controlled Trials Are So Vital, The Center for Effective Philanthropy Blog (June 23, 2011), http://www.effectivephilanthropy.org/blog/2011/06/esther-duflo-explains-why-she-believes-randomized-controlled-trials-are-so-vital. Duflo develops these ideas in detail in her 2012 Tanner Lectures. See Esther Duflo, Abdul Latif Jameel Professor of Poverty Alleviation & Dev. Econ., Mass. Inst. of Tech., Tanner Lectures on Human Values and the Design of the Fight Against Poverty (May 2, 2012), http://economics.mit.edu/files/7904.

…

It makes far more sense to say that people display bounded rationality than to accuse them of “irrationality,” and for many purposes, bounded rationality is just fine, producing outcomes that are equal to or perhaps even better than what would emerge from efforts to optimize by assessing all costs and benefits.
With respect to errors, more is being learned every day. Some behavioral findings remain highly preliminary and need further testing. There is much that we do not know. Randomized controlled trials, the gold standard for empirical research, must be used far more to obtain a better understanding of how the relevant findings operate in the world.22 Even at this stage, however, the underlying findings have been widely noticed, and behavioral economics, cognitive and social psychology, and related fields have had a significant effect on policies in several nations, including the United States and the United Kingdom.

…

On the general point, see Andrei Shleifer, Psychologists at the Gate, 50 J. ECON. LITERATURE 1080 (2012).
21. See HEURISTICS: THE FOUNDATIONS OF ADAPTIVE BEHAVIOR (Gerd Gigerenzer et al. eds., 2011).
22. Michael Greenstone, Toward a Culture of Persistent Regulatory Experimentation and Evaluation, in NEW PERSPECTIVES ON REGULATION 111 (David Moss & John Cisternino eds., 2009). For a number of discussions of randomized controlled trials, including nudges, see ABHIJIT V. BANERJEE & ESTHER DUFLO, POOR ECONOMICS: A RADICAL RETHINKING OF THE WAY TO FIGHT GLOBAL POVERTY (2011).
23. See SUNSTEIN, supra note 9.
24. See, e.g., id.; see also Theresa M. Marteau et al., Changing Human Behavior to Prevent Disease: The Importance of Targeting Automatic Processes, 337 SCIENCE 1492 (2012) (exploring role of automatic processing in behavior in the domain of health).
25.


The Death of Cancer: After Fifty Years on the Front Lines of Medicine, a Pioneering Oncologist Reveals Why the War on Cancer Is Winnable--And How We Can Get There
by
Vincent T. DeVita, Jr., M.D.,
Elizabeth DeVita-Raeburn

Also bad news: Under the Kefauver-Harris Amendment, or “Drug Efficacy Amendment,” of 1962, proof of efficacy was to be determined in “adequate and well controlled trials.” The act mentions only the use of historical controls—that is, data from previous studies. The amendment did not require new randomized controlled trials, as people often think. The requirement for these new trials—often an unnecessary impediment in early drug trials—was added by the FDA in its interpretation of the regulations: another FDA grab.
Today we seem to be mindlessly wedded to the use of randomized controlled trials. They have their place. But randomized clinical trials can be unethical. Doctors sometimes have strong beliefs about the effectiveness of treatments being compared in a randomized trial—often with good reason. And if they truly believe that the treatments are effective—while a placebo given to some patients is not—then it is their duty as physicians to tell patients so.

…

Experienced investigators, following their instincts, had skipped trying the drug alone, because they knew it worked best as part of a cocktail of drugs, just as we had found in treating many other kinds of cancers. Fearing the FDA would approve this practice, he testified before the ODAC to protest his agency’s own positions. (He told me in a recent phone call that had he testified as an FDA employee, he would have had to support its position.)
Young wanted data on cisplatin tested alone in testicular cancer and in a randomized controlled trial against other treatments, as required in the Code of Federal Regulations, his bible. Our data in childhood leukemia and Hodgkin’s disease had already shown the need for drugs to be used in combination to cure cancers. What Young wanted to do—treating patients with a single agent, cisplatin—would have meant jeopardizing the lives of patients. (When cisplatin was ultimately approved, it proved part of the curative combination chemotherapy treatment for Lance Armstrong, who had very advanced metastatic testicular cancer in his lungs and even in his brain.)

…

It had been noted, anecdotally, that some patients who present with kidney cancer that has already spread to their lungs go into a remission when you remove the primary tumor—the involved kidney.
This was once considered a rash thing to do. Why subject patients to expensive major surgery when they already had widespread cancer? There was no proof it worked, just the observation of a few overzealous (or very astute) doctors. In 2001, we did get some evidence in the form of two randomized controlled trials.10 Half of patients who had a new diagnosis of metastatic kidney cancer were treated with interferon, a mediocre treatment for kidney cancer, while the other half were treated with removal of the diseased kidney plus interferon. In both studies, the survival of patients who had a kidney removed was significantly longer.11
What was the primary tumor doing to influence the growth of metastases?

But I was impressed by the fact that the article was in The Lancet, one of the world’s three most influential medical journals, along with The New England Journal of Medicine and The Journal of the American Medical Association. The other articles I had found were all in smaller, specialized journals. Moreover, the topiramate study was larger and longer than the baclofen studies in the other articles, and it was a randomized controlled trial, the gold standard of modern medicine. Last but not least, it was brand-new.
“This must be the cutting edge,” I thought. It seemed like my best hope yet for achieving complete abstinence from alcohol. I went to the medical library at the Pompidou Centre to read the entire article and make a photocopy.
Over a ten-day period, I tapered my baclofen dose down to zero. I used my doctor’s medical card to purchase topiramate, and then I followed the Lancet article’s protocol, taking topiramate for a total of twelve weeks and escalating the dose from 25 to 300 milligrams a day.

The Last Best Cure: My Quest to Awaken the Healing Parts of My Brain and Get Back My Body, My Joy, and My Life
by
Donna Jackson Nakazawa

Eating less saturated fat was one of the multiple interventions tested. When the disappointing results were published in 1982, The Wall Street Journal headline said it all: “Heart Attacks, a Test Collapses.”
*In September 2009, the World Health Organization and the Food and Agriculture Organization jointly published a reassessment of the data on dietary fat and heart disease. “The available evidence from [observational studies] and randomized controlled trials,” the report stated, “is unsatisfactory and unreliable to make judgment about and substantiate the effects of dietary fat on risk of CHD [coronary heart disease].”
*This was the trial of calorie-restricted diets, carried out by Frank Sacks and his colleagues at Harvard and the Pennington Biomedical Research Center, that I discussed in chapter 2. An editorial that accompanied the article in the NEJM explained the concept of HDL as a “biomarker for dietary carbohydrate” this way: “When fat is replaced isocalorically by carbohydrate, high-density lipoprotein (HDL) cholesterol decreases in a predictable fashion

The arguments on sick populations and preventive public health are compelling, but they come with four critically important caveats.
First, Rose’s logic does not differentiate between hypotheses. It would invariably be invoked to explain why studies failed to confirm Keys’s fat hypothesis, and would be considered extraneous when similar studies failed to generate evidence supporting competing hypotheses. It is precisely to avoid such subjective biases that randomized controlled trials are necessary to determine which hypotheses are most likely true.
Second, as Rose observed, all public-health interventions come with potential risks, as well as benefits—unintended or unimagined side effects. Small or negligible risks to an individual will also add up and can lead to unacceptable harm to the population at large. As a result, the only acceptable measures of prevention are those that remove what Rose called “unnatural factors” and restore “‘biological normality’—that is…the conditions to which presumably we are genetically adapted.”

…

As a result, a joint 1997 report of the World Cancer Research Fund and the American Institute for Cancer Research, entitled Food, Nutrition and the Prevention of Cancer, said this:
The degree to which starch is refined in diets, particularly when the intake of starch is high, may itself be an important factor in cancer risk, as may the volume of refined starches and sugars in diets. Epidemiological studies have not, however, generally distinguished between degrees of refining or processing of starches, and there are, as yet, no reliable epidemiological data specifically on the effects of refining on cancer risk.
Cleave’s saccharine-disease hypothesis may be intuitively appealing, but it is effectively impossible to test without a randomized controlled trial. If Cleave was right, then epidemiologists comparing populations or individuals with and without chronic disease have to take into account not just sugar consumption but flour, and whether that flour is white or whole-grain, and whether rice is polished or unpolished, white or brown, and even how much beer is consumed compared with, say, red wine or hard liquor. They might have to distinguish between table sugar and the sugar in soft drinks and fruit juices.

…

The pattern is precisely what would be expected of a hypothesis that simply isn’t true: the larger and more rigorous the trials set up to test it, the more consistently negative the evidence. Between 1994 and 2000, two observational studies—of forty-seven thousand male health professionals and the eighty-nine thousand women of the Nurses’ Health Study, both run out of the Harvard School of Public Health—and a half-dozen randomized controlled trials concluded that fiber consumption is unrelated to the risk of colon cancer, as is, apparently, the consumption of fruits and vegetables. The results of the forty-nine-thousand-women Dietary Modification Trial of the Women’s Health Initiative, published in 2006, confirmed that increasing the fiber in the diet (by eating more whole grains, fruits, and vegetables) had no beneficial effect on colon cancer, nor did it prevent heart disease or breast cancer or induce weight loss.

These arguments have led to a movement toward more careful evaluation, often with an emphasis on randomized controlled trials as the best way of finding out whether a given project worked and, beyond that, of finding out “what works” in general. (In randomized controlled trials, some “units”—people or schools or villages—get treated, and some—the controls—do not, with units assigned to one of the two groups at random.) According to this view, aid has been much less effective than it would have been had past projects been seriously evaluated. If the World Bank had subjected all of its projects to rigorous evaluation, the argument goes, we would by now know what works and what does not work, and global poverty would have vanished long ago. Those who favor randomized controlled trials—the randomistas—tend to be very skeptical of typical self-evaluations by NGOs, and they have worked with cooperative NGOs to help strengthen their evaluation procedures.

…

One key innovation in managing cardiovascular disease was the discovery that diuretics—cheap pills, sometimes called “water pills” because they increase the frequency of urination—are effective antihypertensives, meaning that they reduce high blood pressure, one of the major risk factors for heart disease. According to the Mayo Clinic, “Diuretics … help rid your body of salt (sodium) and water. They work by making your kidneys put more sodium into your urine. The sodium, in turn, takes water with it from your blood. That decreases the amount of fluid flowing through your blood vessels, which reduces pressure on the walls of your arteries.”7 An important randomized controlled trial from the U.S. Veterans Administration was published in 1970,8 and thereafter practice changed quickly in the United States.
One of the characteristics of the U.S. health-care system is that innovations tend to be introduced very quickly—not only the good ones like antihypertensives, but also many that are of dubious value. Britain, with its cash-constrained and centrally run National Health Service, tends to be much slower and more cautious about introducing medical innovations—today it has a National Institute for Clinical Excellence, with the splendid acronym NICE, to test new products and new procedures and make recommendations—so even the cheap and effective diuretics took a while to be adopted.

…

The randomistas have also persuaded the World Bank to use randomized controlled trials in some of its work.
Finding out whether a given project was or was not successful is important in itself but unlikely to reveal anything very useful about what works or does not work in general. Often, the experimental and control groups are very small (experiments can be expensive), which makes the results unreliable. More seriously, there is no reason to suppose that what works in one place will work somewhere else. Even if an aid-financed project is the cause of people doing well—and even if we were to be absolutely sure of that fact—causes usually do not operate alone; they need various other factors that help them to work.

Iain Chalmers was the first to raise TGN1412 and anti-arrhythmics as examples of the harm done when individual early trials are left unpublished. They are the best illustrations of this problem, but you should not imagine that they are unusual: the quantitative data shows that they are just two among many, many similar cases.
11 Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA. 1992 Jul 8;268(2):240–8.
12 Here is the classic early paper arguing this point: Chalmers Iain. Underreporting Research Is Scientific Misconduct. JAMA. 1990 Mar 9;263(10):1405–1408.
13 Sterling TD. Publication decisions and their possible effects on inferences drawn from tests of significance – or vice versa. J Am Stat Assoc. 1959;54(285):30–34.

The solution lay in statistics: Randomly assigning people to one group or the other would mean whatever differences there are among them should balance out if enough people participated in the experiment. Then we can confidently conclude that the treatment caused any differences in observed outcomes. It isn’t perfect. There is no perfection in our messy world. But it beats wise men stroking their chins.
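The balancing logic can be sketched in a few lines of code (a hypothetical simulation, not from the text): give people widely varying baseline outcomes, assign them to groups at random, add a known effect for the treated group, and the difference in group means recovers that effect.

```python
import random
from statistics import mean

random.seed(42)

# Hypothetical population: 10,000 people with very different baselines.
baselines = [random.gauss(50, 15) for _ in range(10_000)]

# Random assignment: shuffle, then split in half.
random.shuffle(baselines)
treated, control = baselines[:5_000], baselines[5_000:]

# Suppose the treatment truly raises the outcome by 5 points.
TRUE_EFFECT = 5.0
treated_outcomes = [b + TRUE_EFFECT for b in treated]

# Individual differences balance out across the two groups, so the
# difference in means estimates the treatment effect.
estimate = mean(treated_outcomes) - mean(control)
print(round(estimate, 1))  # close to 5.0
```

Shrink the sample to a few dozen people and the estimate gets noticeably noisier, which is why small trials are unreliable.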
This seems stunningly obvious today. Randomized controlled trials are now routine. Yet it was revolutionary because medicine had never before been scientific. True, it had occasionally reaped the fruits of science like the germ theory of disease and the X-ray. And it dressed up as a science. There were educated men with impressive titles who conducted case studies and reported results in Latin-laden lectures at august universities. But it wasn’t scientific.

…

The rate of the development of science is not the rate at which you make observations alone but, much more important, the rate at which you create new things to test.11
It was the absence of doubt—and scientific rigor—that made medicine unscientific and caused it to stagnate for so long.
Putting Medicine to the Test
Unfortunately, this story doesn’t end with physicians suddenly slapping themselves on their collective forehead and putting their beliefs to experimental tests. The idea of randomized controlled trials was painfully slow to catch on and it was only after World War II that the first serious trials were attempted. They delivered excellent results. But still the physicians and scientists who promoted the modernization of medicine routinely found that the medical establishment wasn’t interested, or was even hostile to their efforts. “Too much that was being done in the name of health care lacked scientific validation,” Archie Cochrane complained about medicine in the 1950s and 1960s, and the National Health Service—the British health care system—had “far too little interest in proving and promoting what was effective.”

…

If crime went up, that might show the policy was useless or even harmful, or it might mean crime would have risen even more but for the beneficial effects of the policy. Naturally, politicians would claim otherwise. Those in power would say it worked; their opponents would say it failed. But nobody would really know. The politicians would be blind men arguing over the colors of the rainbow. If the government had subjected its policy “to a randomized controlled trial then we might, by now, have known its true worth and be some way ahead in our thinking,” Cochrane observed. But it hadn’t. It had just assumed that its policy would work as expected. This was the same toxic brew of ignorance and confidence that had kept medicine in the dark ages for millennia.
Cochrane’s frustration is palpable in his autobiography. Why couldn’t people see that intuition alone was no basis for firm conclusions?

Over two million K–12 students take at least one class online.
At these online schools, the degree of contact with flesh-and-blood teachers varies. Instructors might answer questions by email, phone, or videoconference, supplemented by periodic meetings, class trips, and “live,” in-the-classroom exams. Tuition often runs less than half the price of a traditional K–12 schooling experience.
The world still awaits systematic, rigorous (randomized controlled trial) studies of all of these methods of learning, and it is too early to say what is working and what is not. Nonetheless, we do know two things for sure. First, very often the online methods are much cheaper and more flexible than the previous alternatives. Second, some learners—quite possibly a minority—love the online methods. We can thus expect that online education in its various manifestations will represent a fair-sized chunk of the future of the sector.

…

We still haven’t dispensed with models, because there are a few models we believe in pretty strongly, such as that when price goes up, people usually buy less of that good or service, all other things being held equal. But those are old theories, and the real action and value-add now come from the data and its handling, including data from field experiments, laboratory experiments, and randomized controlled trials. The underlying models just aren’t getting that much better, and when they are more complicated, they very often are not more persuasive to the typical research economist.
I would sum up the blend as follows: (a) much better data, (b) higher standards for empirical tests, and (c) lots of growth in complex theory but not matched by a corresponding growth in impact. Mathematical economics, computational economics, complexity economics, and game theory continue to grow, as we would expect of a diverse and specialized discipline, but they are if anything losing relative ground in terms of influence.

…

That included data about income, new jobs or businesses, failure to repay loans, and many other features of their daily economic lives. The basic question was a pretty simple one: did the group with access to microcredit do better? It turned out they were more likely to have started their own businesses, and thus a classic paper was born. Most people see this as the most important study of microcredit, alongside another large-scale randomized controlled trial from Dean Karlan at Yale University. It’s a long way from grabbing a publicly available database from a government agency, without much worrying about the quality or meaning of the numbers, and running some regressions. Setting up the entire field experiment is also a uniquely human contribution, and it does not approximate any task that is replicable with smart machines.
Outside of economics, a computer program will look at a lot of numbers, search for patterns in a more complex way than current empirical researchers can do, and report back the results.

Job-training programs, for example, often focus on inputs such as the number of participants and outputs such as the number of graduates from a scheme, rather than on the numbers who secure employment. That is like measuring the number of widgets a factory produces in an hour, but not how many of them are sold.
The result is that money—a lot of money—is being pumped into programs that are not actually delivering decent results. Between 1990 and 2010, ten federal government social programs in the United States were evaluated using randomized controlled trials (RCTs), a method of randomly assigning people between one group whose members receive certain services and another group whose members do not. Measuring the difference in outcomes between the two groups tells you how effective a specific program is. Nine of the ten federal programs were found to have weak or no positive effects: they were a waste of taxpayers’ money.4
Some initiatives actually end up doing harm.

How often did they run? One executive told us, with obvious pride, that the company had bought newspaper inserts every single Sunday for the past twenty years in 250 markets across the United States.
So how could they tell whether these ads were effective? They couldn’t. With no variation whatsoever, it was impossible to know.
What if, we said, the company ran an experiment to find out? In science, the randomized controlled trial has long been the gold standard of learning—but why should scientists have all the fun? We described an experiment the company might run. They could select 40 major markets across the country and randomly divide them into two groups. In the first group, the company would keep buying newspaper ads every Sunday. In the second group, they’d go totally dark—not a single ad.
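The random split described above takes only a couple of lines; here is a hypothetical sketch (the market names are invented placeholders):

```python
import random

random.seed(7)

# 40 hypothetical major markets.
markets = [f"market_{i:02d}" for i in range(1, 41)]

# Randomly assign half to keep buying Sunday ads and half to go dark.
keep_ads = set(random.sample(markets, 20))
go_dark = [m for m in markets if m not in keep_ads]

# Comparing sales in the two groups afterward isolates the ads' effect,
# since nothing else systematically differs between them.
print(len(keep_ads), len(go_dark))  # 20 20
```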

Comparisons made at 1–2-year and 3–4-year follow-ups with 12 control subjects who did not have food allergy and 5 subjects with food allergy who were noncompliant with their diet showed that the 17 food-allergic subjects with appropriate dietary restriction demonstrated highly significant improvement in their AD compared with the control groups. The amount of time for resolution of their food hypersensitivity was also reduced.
Lever et al. [42] performed a randomized, controlled trial of an egg exclusion diet in 55 children who presented to a dermatology clinic with AD and possible egg sensitivity identified by radioallergosorbent (RAST) testing before randomization. True egg sensitivity was confirmed by DBPCFC after the trial. The 55 children were randomized either to a 4-week regimen in which mothers received general advice on the care of AD and additional specific advice from a dietician about an egg elimination diet (diet group), or to a control group in which only general advice was provided.

A World Bank study of evaluation in 2000 began with the confession, “Despite the billions of dollars spent on development assistance each year, there is still very little known about the actual impact of projects on the poor.”22
After years of pressure, the IMF created an Independent Evaluation Office in 2001. The World Bank in 2004 laudably created a Development Impact Evaluation Task Force. The task force will use the randomized controlled trial methodology discussed in chapter 2 to assess the impact of selected interventions on the intended beneficiaries. The task force has started two dozen new evaluations in five areas (conditional cash transfers in low-income countries; school-based management; contract teachers; use of information as an accountability tool for schools; and slum upgrading programs). It remains to be seen if the evaluation results change the incentives to do effective programs in the operational side of the World Bank.

…

This includes the ten-step M&E program (step 3: “NAC [National AIDS Councils] and stakeholders engage in an intensive participatory process to build ownership and buy-in, particularly for the overall M&E system and programme monitoring”). There is also the list of thirty-four indicators (none of which involves monitoring “core transmitters”), the nineteen-point terms of reference for the M&E consultant to the NAC, and the “summary terms of reference for specialized programme activity monitoring entity.” The accepted scientific standard for any program evaluation, the randomized controlled trial, did not make it into the manual.
The Kitty Genovese Effect
Winston Moseley killed Kitty Genovese, a twenty-eight-year-old bar manager, in Queens, New York, in 1964. Her murder is the first news story I remember from my childhood. As Moseley first stabbed Kitty, neighbors heard her screams but didn’t call the police. Moseley drove away and then came back and stabbed her some more, till she died.

…

Then hold the aid agencies accountable for their results by having truly independent evaluation of their efforts.
Perhaps the aid agencies should each set aside a portion of their budgets (such as the part now wasted on self-evaluation) to contribute to an international independent evaluation group made up of staff trained in the scientific method from the rich and poor countries, who will evaluate random samples of each aid agency’s efforts. Evaluation will involve randomized controlled trials where feasible, less pure statistical analysis if not, and will at least be truly independent, even when randomized trials and statistical analysis are not feasible. Experiment with different methods of simply asking the poor if they are better off. Mobilize the altruistic people in rich countries to put heat on the agencies to make their money actually reach the poor, and to get angry when the aid does not reach the poor.

Instead a small but vocal group of doctors and patients refused to accept these results, refused even to accept the designation of Post–Lyme Disease syndrome. They clung, instead, to “chronic Lyme disease” and insisted that these symptoms did reflect an ongoing infection that warranted continuing treatment with antibiotics. They countered the randomized controlled trials with research of their own, which often showed improvement in patients given antibiotics. But none of these studies compared the antibiotics against a placebo. The randomized controlled trials showed that while patients getting antibiotics did improve, so did those getting the saltwater placebo. Studies done without the placebo had no way of telling whether the antibiotics were really effective or if the improvement was due to something in the normal ebbs and flows of any human condition.

…

That first look through the skin, into the inner structures of the living body, laid the groundwork for the computerized axial tomography (CT) scan in the 1970s and magnetic resonance imaging (MRI) in the 1990s. Blood tests have exploded in number and accuracy, providing doctors with tools to help make a definitive diagnosis in an entire alphabet of diseases from anemias to zoonoses.
Better diagnosis led to better therapies. For centuries, physicians had little more than compassion with which to help patients through their illnesses. The development of the randomized controlled trial and other statistical tools made it possible to distinguish between therapies that worked and those that had little to offer beyond the body’s own recuperative powers. Medicine entered the twenty-first century stocked with a pharmacopeia of potent and effective tools to treat a broad range of diseases.
Much of the research of the past few decades has examined which therapies to use and how to use them.

I could take up the next tens of pages of this book outlining similar studies that have further confirmed what I’ve long thought to be true: vitamins don’t live up to the hype. But for fear of inundating you with too many academically minded summaries, let me briefly mention just a couple more that are more recent than those already described:
• In 2010, the Agency for Healthcare Research and Quality published a review of sixty-three randomized, controlled trials (again, the gold-standard research method) on multivitamins, finding that multivitamins did nothing to prevent cancer or heart disease in most populations. The only exception occurred in developing countries where nutritional deficiencies are widespread.
• In 2009, scientists at the Fred Hutchinson Cancer Research Center in Seattle, Washington, published a paper after following 160,000 postmenopausal women for about ten years.

…

Cancerous cells could already have started to propagate, at which point the inherent DNA repair system is no longer effective. It cannot fix the cancer.
The clear association between inflammation and cancer is real and has many examples. One of the most exciting recent studies was published in the June 22, 2010, issue of the Journal of the American College of Cardiology. The analysis of two dozen randomized, controlled trials that were studying therapies for cholesterol found that each 10 mg/dl higher increment of HDL cholesterol (the good cholesterol) was associated with a relative 36 percent lower risk of cancer. The relationship persisted even after adjusting for LDL cholesterol (the bad cholesterol), age, body mass index (BMI), diabetes, sex, and smoking status. The researchers were quick to note that these association studies cannot prove cause and effect, although it’s been suggested that HDL may have anti-inflammatory and antioxidant properties that could potentially fight cancer.

In development economics, new evidence has led to policy innovations in health, education, and finance that have the potential to improve the lives of hundreds of millions of people.
Another way we can observe the transformation of the discipline is by looking at the new areas of research that have flourished in recent decades. Three of these are particularly noteworthy: behavioral economics, randomized controlled trials (RCTs), and institutions. What’s striking is that all these areas have been greatly influenced, and in fact stimulated, by fields from outside economics—psychology, medicine, and history, respectively. Their growth disproves the claim that economics is insular and ignores the contributions of other cognate disciplines.
In some ways, the rise of behavioral economics marks the greatest departure for standard economics because it undercuts the benchmark, almost canonical assumption of economic models: that individuals are rational.

Proving Effectiveness
Long before Holden Karnofsky and Elie Hassenfeld wondered which organizations would make the best use of their donations, Esther Duflo and Abhijit Banerjee at the Massachusetts Institute of Technology founded the Jameel Poverty Action Lab on the premise that we can and should use scientific methods to find out which aid projects work. As the gold standard of scientific rigor they take the randomized controlled trial used for testing the efficacy of new drugs. In such a trial, half the patients are randomly assigned to receive the new drug, while the other half get a placebo. Randomization ensures that the two groups do not differ in any way that could affect the course of their illness or the impact of the drug. We have just seen an example of these methods—the study of the effect of loans given by the South African microfinance organization—which was carried out by associates of the Poverty Action Lab.

…

Thanks to controlled trials, we know that providing drugs to kill parasitical worms in Kenyan children improves learning, that education in condom use reduces the likelihood of people getting AIDS, and that offering mothers in India a cheap bag of lentils means that more of them will bring in their children for immunization.12
So why don’t we test all poverty programs this way? One reason is the cost of administering the trials. Oxfam America found that a randomized controlled trial of one of its microcredit programs in West Africa would cost almost as much as the project itself. The money would have come out of the budget for the project, with the result that microcredit could be extended to only half as many villages as would otherwise be possible. Oxfam did not go ahead with the randomized trial. This is an understandable decision, but it would probably pay, over the long term, for organizations to set aside some money specifically for proper studies of the effectiveness of their programs.

A randomized study might make particular conclusions about the effectiveness of a medicine—but in truth it has only judged that effectiveness in the subset of people who were randomized. The power of the experiment is critically dependent on its strong limits—and this is the very thing that makes it limited. The experiment may be perfect, but whether it is generalizable is a question.
The reverential status of randomized, controlled trials in medicine is its own source of bias. The BCG vaccine against tuberculosis was shown to have a potent protective effect in a randomized trial, but the effectiveness of the vaccine seems to decrease almost linearly as we move in latitude from the North to the South—where, incidentally, TB is the most prevalent (we still don’t understand the basis for this effect, although genetic variation is the most obvious culprit).

This is the first written record of a comparative experiment in which a hypothesis is tested and a control group is used. A few centuries later, these events would be immortalized in the biggest bestseller ever: the Bible (see Daniel 1:1–16). But it would still be several hundred years before this kind of comparative research came to be considered the scientific gold standard. These days, we would call this a randomized controlled trial, or RCT. If you were a medical researcher, you would proceed as follows: Using a lottery system, you divide people with the same health problem into two groups. One gets the medicine you want to test and the other gets a placebo.7
In the case of bloodletting, the first comparative experiment was published in 1836 by the French doctor Pierre Louis, who had treated some pneumonia sufferers by immediately relieving them of a few pints of blood and others by holding off on the leeches for a few days.

…

This is nothing less than a whole new approach to economics. The randomistas don’t think in terms of models. They don’t believe humans are rational actors. Instead, they assume we are quixotic creatures, sometimes foolish and sometimes astute, and by turns afraid, altruistic, and self-centered. And this approach appears to yield considerably better results.
So why did it take so long to figure this out?
Well, several reasons. Doing randomized controlled trials in poverty-stricken countries is difficult, time-consuming, and expensive. Often, local organizations are less than eager to cooperate, not least because they’re worried the findings will prove them ineffective. Take the case of microcredit. Development aid trends come and go, from “good governance” to “education” to the ill-fated “microcredit” at the start of this century. Microcredit’s reckoning came in the form of our old friend Esther Duflo, who set up a fatal RCT in Hyderabad, India, and demonstrated that, all the heartwarming anecdotes notwithstanding, there is no hard evidence that microcredit is effective at combating poverty and illness.13 Handing out cash works way better.

Kate Hammer, “Winning Back Dropouts with a Simple Call,” Globe and Mail, May 31, 2012.
19. UNICEF, “Basic Education and Gender Equality: The Big Picture,” February 6, 2014, http://www.unicef.org/education/index_bigpicture.html.
20. Dana Burde and Leigh Linden, “The Effect of Village-Based Schools: Evidence from a Randomized Controlled Trial in Afghanistan,” NBER Working Paper 18039 (Cambridge, MA: National Bureau of Economic Research, 2012); Dana Burde and Leigh Linden, “Bringing Education to Afghan Girls: A Randomized Controlled Trial of Village-Based Schools,” Applied Economics 5, no. 3 (2013).
21. Though I review the evidence in more detail in Chapters 6 and 7, a comparison of effects for two meta-analyses—the first of interactive book reading to stimulate literacy, the second on literacy and academic outcomes from nine one-to-one laptop programs—showed effect sizes (d) ranging from 0.36 to 0.72 in the interaction study and 0.17 to 0.28 in the laptop study.

They showed that households with signs of self-discipline problems were more likely than others to borrow through microfinance institutions featuring enforced, regular weekly payments. Though taking the loans was costlier than saving, it provided the households with an effective way to accumulate.
11. For an excellent presentation of these issues, we refer readers to Mullainathan 2005.
12. See Ashraf, Karlan, and Yin 2006. They evaluated the impact of this “commitment” savings product using a randomized controlled trial, in which 1,800 customers of a bank were randomized either to receive an offer to open the new type of account or not. (Everyone already had access to a standard account.) Among those offered the new type of account, 28 percent opened one. After 12 months, average savings balances in the group offered the new account had increased by 80 percent relative to the control group. This translates into a roughly 300 percent increase among those who actually opened the accounts—a large and meaningful increase in savings.
13.
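The scaling in note 12 can be checked directly: if the 80 percent average increase among everyone offered the account came entirely from the 28 percent who opened one, the implied increase among openers is 0.80 / 0.28, or about 286 percent, consistent with the roughly 300 percent reported.

```python
# Intent-to-treat effect: average balances rose 80% across everyone offered.
itt_effect = 0.80
# Take-up: 28% of those offered actually opened an account.
take_up = 0.28

# Assuming non-openers were unaffected, the effect among openers scales up:
effect_on_openers = itt_effect / take_up
print(f"{effect_on_openers:.0%}")  # about 286%
```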

…

Interview by Stuart Rutherford with Shafiqual Haque Choudhury, ASA president, November 2007.
Chapter Seven
1. See Duflo, Kremer, and Robinson 2006.
2. See World Bank 2008, chap. 1.
3. Foreign investment in microfinance, for example, more than tripled between 2004 and 2006, to $4 billion. See Reille and Forester 2008.
4. For a review of early experiences with branchless banking, see Ivatury and Mas 2008.
5. New field research adapts methods from medical research, particularly the use of randomized controlled trials, to test the value and logic of financial innovations. Recently, the Financial Access Initiative, a consortium of researchers at New York University, Yale, Harvard, and Innovations for Poverty Action, has been formed to extend field trials in Latin America, Africa, and Asia. Working with microfinance providers, researchers are investigating, for example, how sensitive borrowers are to changes in interest rates, the value of structured savings devices, and the impact of business training alongside credit.

In 2004, Leon Hempel and Eric Töpfer, writing from the Center for Technology and Society in Berlin, analyzed studies of closed-circuit television (CCTV) use in Europe and found that many of the studies lacked control groups to compare crime trends in the areas where cameras were installed to crime trends in the wider areas without cameras, and lacked analysis of the displacement of crime from the target areas to other areas.
The few studies that have used control groups show little support for the theory that cameras can prevent crime. Another Urban Institute study from 2011 analyzing the impact of surveillance cameras on crime in parking lots—and using a randomized controlled trial method—showed that the cameras made no real difference. The study compared a year’s worth of car-related crime in twenty-five parking lots near Metro stations in Washington, D.C., that had installed motion-activated cameras with identical crimes in twenty-five similar “control” parking lots with no cameras installed. Although these were digital still cameras, researchers posted signs that gave the impression of constant camera surveillance of the parking lot.

…

In 2004, Leon Hempel and Eric Töpfer: Leon Hempel and Eric Töpfer, “On the Threshold to Urban Panopticon?: Analysing the Employment of CCTV in European Cities and Assessing Its Social and Political Implications” (Working Paper No. 1: Inception Report, Urban Eye, January 2002), 23, http://www.urbaneye.net/results/ue_wp1.pdf.
Another Urban Institute study from 2011: Nancy G. La Vigne and Samantha S. Lowry, “Evaluation of Camera Use to Prevent Crime in Commuter Parking Facilities: A Randomized Controlled Trial,” Urban Institute, September 2011, http://www.urban.org/publications/412451.html.
In 2004, the criminologists Brandon Welsh and David Farrington: Brandon C. Welsh and David Farrington, “Surveillance for Crime Prevention in Public Space: Results and Policy Changes,” Criminology and Public Policy 3, no. 3 (July 2004): 497.
The authors of the Urban Institute study: La Vigne and Lowry, “Evaluation of Camera Use.”

But COX-1 protects the lining of the stomach, and its inhibition causes ulcers.11 Overdosage of NSAIDs is thus a leading cause of death among the elderly.12 Merck had the bright idea (as did Searle) to create a drug targeted to block COX-2, but not COX-1.13 Merck developed such a drug; named it Vioxx; and got it approved by the FDA. But that approval carried with it the further stipulation of a more rigorous randomized controlled trial than had been so far conducted.14 Merck named that study VIGOR (the VIoxx Gastrointestinal Outcomes Research study). The events surrounding VIGOR will give us a feeling for why, despite our modern safeguards, we are still vulnerable to phishing by Pharma.
Like publishing houses bringing out a best-selling book, the Pharmaceuticals carefully orchestrate the rollout of a blockbuster drug.

…

For this reason the Pharmaceuticals with a new drug take special care to midwife such articles. In selecting the authors, who will receive the data from the experiments, the drug companies are not shooting in the dark. Their many connections (including those from the research support given by the company) clue them in: both regarding who will be influential and who will be favorable. The selectees are given easy access to the randomized controlled trials required by the FDA. They are also typically given “editorial support”—less graciously known as “ghostwriting”—for the article.15 It is thus no coincidence that a higher fraction of journal articles sponsored by pharmaceutical companies are favorable to the drugs reviewed than articles funded by other sources.16 Part of drug marketing is not just about the content of the articles published; it is also about their number.

It is the triage nurse’s job to match patients and doctors as best as possible. One doc may therefore get all the psychiatric cases on a shift, or all the elderly patients. Because an old person with shortness of breath is much more likely to die than a thirty-year-old with the same condition, we have to be careful not to penalize the doctor who happens to be good with old people.
What you’d really like to do is run a randomized, controlled trial so that when patients arrive they are randomly assigned to a doctor, even if that doctor is overwhelmed with other patients or not well equipped to handle a particular ailment.
But we are dealing with one set of real, live human beings who are trying to keep another set of real, live human beings from dying, so this kind of experiment isn’t going to happen, and for good reason.
Since we can’t do a true randomization, and if simply looking at patient outcomes in the raw data will be misleading, what’s the best way to measure doctor skill?

…

Semmelweis wondered if the women patients admitted to the doctors’ ward were sicker, weaker, or in some other way compromised.
No, that couldn’t be it. Patients were assigned to the wards in alternating twenty-four-hour cycles, depending on the day of the week they arrived. Given the nature of pregnancy, an expectant mother came to the hospital when it was time to have the baby, not on a day that was convenient. This assignment methodology wasn’t quite as rigorous as a randomized, controlled trial, but for Semmelweis’s purpose it did suggest that the divergent death rates weren’t the result of a difference in patient populations.
So perhaps one of the wild guesses listed above was correct: did the very presence of men in such a delicate feminine enterprise somehow kill the mothers?
Semmelweis concluded that this too was improbable. After examining the death rate for newborns in the two wards, he again found that the doctors’ ward was far more lethal than the midwives’: 7.6 percent versus 3.7 percent.


The two subsequently joined forces with Frank Moss, director of the MIT Media Lab, and out of that collaboration came thelamfoundation.org, a website that allows patients to report on their health. The data in the reports are aggregated and analyzed to aid researchers in mapping out new research scenarios. This crowdsourcing approach to research differs substantially from traditional randomized controlled trials used in conventional research, which are expensive and time consuming and conceived of and carried out by researchers from the top down, with patients serving as passive subjects. The LAM site, like other research efforts on the health-care Commons, starts with the patients’ collective wisdom, which helps determine the research protocols. Moss explains that “we’re really turning patients into scientists and changing the balance of power between clinicians and scientists and patients.”54
The Association of Cancer Online Resources (ACOR), founded by Gilles Frydman, has taken the idea of patient-driven health care a step further by creating a more comprehensive health Commons where over 600,000 patients and caregivers are actively engaged in 163 public online communities.

…

So professional research has a built-in lethal lag time—a period of delay between the time some people know about an important medical breakthrough and the time everyone knows.60
While double-blind, controlled clinical studies are extremely expensive, patient-initiated observational studies using Big Data and algorithms to discover health patterns and impacts can be undertaken at near zero marginal cost.
Still in its infancy, this open-source approach to research often suffers from a lack of verification that the slower, time-tested professional review process brings to conventional randomized controlled trials. Advocates are aware of these shortcomings but are confident that patient-directed research can begin to build in the appropriate checks, much as Wikipedia does in its shakeout process of verifying and validating articles. Today, Wikipedia has 19 million contributors. Thousands of users fact-check and refine articles, keeping the open-source encyclopedia's accuracy competitive with other encyclopedias.

Treating Fibromyalgia
Treatment for fibromyalgia usually involves a combination of medication and exercise, along with behavior modification techniques, like stress reduction, and other coping strategies.
“Regular analgesics don’t work very well in pain amplification syndromes. Things like acetaminophen and nonsteroidal anti-inflammatory drugs have virtually no effect in pain amplification syndromes. Opioids also do not seem to work well,” comments Dr. Clauw. “The best drugs for these syndromes are those that act on the central nervous system, like tricyclic antidepressants. Some randomized controlled trials indicate that medications that act on the neurotransmitters serotonin and norepinephrine are among the most effective.” Aerobic exercise acts as a natural painkiller and an antidepressant, he adds.
Antidepressant medications can help reduce pain signals from nerves and aid sleep. “While the major treatment of fibromyalgia is really physical exercise, for those patients who have sleep disturbances we often use a low dose of a tricyclic antidepressant, or the antiseizure medication gabapentin (Neurontin), which helps both chronic pain and sleep problems,” says Dr.

…

A small preliminary study at the Center for Integrative Medicine at Thomas Jefferson University Hospital in Philadelphia suggests that symptoms can be reduced by eliminating potential food allergens, including wheat, dairy, and citrus. “However, given the fact that fibromyalgia is a neural pain amplification syndrome, it’s unlikely that nutritional factors play a really prominent role,” remarks Dr. Clauw.
Cognitive behavioral therapy (CBT) teaches coping skills and behavioral changes to help you manage an often-frustrating illness. “Every randomized, controlled trial of cognitive behavioral therapy in any chronic illness has shown it to be effective,” says Dr. Clauw. For women with fibromyalgia, a pain-based CBT program can be especially effective. In CBT, you’ll learn relaxation techniques (such as deep breathing and positive visual imagery), how to reframe negative thoughts and behaviors that intensify pain responses, how to effectively solve problems, and how to pace activities to accommodate whatever limitations you may have.

But it wasn’t a human’s lifetime. They had no idea whether rats were good models for humans. Moreover, as other researchers had implied at the same conference, they couldn’t even know if the rats they used were good models for other rats, since some of the observations were what researchers would call “strain specific.” Eating sugar seemed to shorten the lives of some strains of rats but not others.
The kind of randomized controlled trials over the course of ten or twenty years that would truly test the hypothesis that sugar caused heart disease or diabetes, as Yudkin noted, were no different from the kind the NIH was then considering and would soon reject for the dietary-fat/cholesterol hypothesis. Such trials were certainly far beyond the budget of any single researcher or even collaboration of researchers; they required that the National Institutes of Health or the Medical Research Council in the U.K. or some other government agency create a concerted program to test the idea.

…

In 2005, Scottish researchers reported that diabetic patients who took a drug called metformin, which works to reduce insulin resistance and therefore lower circulating levels of insulin, also had a significantly reduced risk of cancer compared with diabetics on other medications. That association has been confirmed multiple times, and has led researchers to test whether metformin acts as an anti-cancer drug, preventing or inhibiting cancer’s recurrence in randomized controlled trials. These observations also served to focus the attention of cancer researchers further on the possibility that insulin and insulin-like growth factor are cancer promoters, and thus that abnormally elevated levels of insulin—caused by insulin resistance, for instance—would increase our cancer risk.
This was another area of research that had emerged in the 1960s, with laboratory work by some of the leading cancer researchers—including Howard Temin, who would later win the Nobel Prize—demonstrating that cancer cells require insulin to propagate; at least they do so outside the human body, growing as cell cultures in the laboratory.

Secondary education has been much less researched but informal evidence indicates a similar picture.9 The big question that arises is why expansion in educational inputs has not been translated into better educational outcomes. There has been quite a lot of high-quality research on this, based on econometric work with nationally representative data sets, and on ‘randomized control trials’. (Again, the rigorous scholarly studies are mostly about primary education.)10
My reading of what these studies show is that the poor educational outcomes are the product of inappropriate incentives. The faulty incentives relate partly to prevailing pedagogic practice. Teaching in Indian schools is curriculum-driven to an absurd degree; the over-riding objective of teachers is to ‘finish’ the curriculum of each year even if the majority of students are falling behind.

…

Scepticism about the cost-effectiveness of the technology package in making cash transfers is nevertheless quite natural. There are complex technical and logistical issues to be sorted out. It may also be feared that rent-losers from the process would stymie the operation of the package in one way or another. It is very pertinent, therefore, that the merits of harnessing the new technologies have been demonstrated by Karthik Muralidharan and his associates in a randomized control trial that examined the delivery of NREGS wage payments into bank accounts via the introduction of ‘smart cards’, in a setting large enough to be policy-relevant, viz. 158 sub-districts (mandals) of Andhra Pradesh with 19 million people.16 In the ‘treatment’ mandals, the new system was introduced two years before it was in the ‘control’ mandals (in which payments continued to be disbursed in the old way).

Third, new information flows and increasing transparency can help shift citizen behaviour on a large scale, as it becomes the path of least resistance within a new set of business and social norms for a sustainable circular system. Fruitful convergence between the fields of economics and psychology has been producing insights into how we perceive the world, behave and justify our behaviour, while a number of large-scale randomized control trials by governments, corporations and universities have shown that this can work. One example is OPower, which uses peer-comparison to entice people into consuming less electricity, thereby protecting the environment while reducing costs.
Fourth, as the previous section detailed, new business and organizational models promise innovative ways of creating and sharing value, which in turn lead to whole system changes that can actively benefit the natural world as much as our economies and societies.

pages: 198
words: 52,089

Dream Hoarders: How the American Upper Middle Class Is Leaving Everyone Else in the Dust, Why That Is a Problem, and What to Do About It
by
Richard V. Reeves

Specifically, they propose that $100 million a year of Title X money be invested through the Office of Population Affairs to state-led campaigns. On fairly conservative assumptions, they predict five dollars of savings from each dollar spent on well-crafted campaigns.6
The second problem is on the supply side, in particular a lack of knowledge or training among health professionals. Indeed, staff training alone seems to have a significant impact on the take-up of LARCs, according to a randomized control trial. The work of organizations like Upstream, which trains providers in states including Ohio, New York, Texas, and Delaware, is extremely promising.7 Other steps can be taken to broaden access, including ensuring sufficient supplies in health clinics, simplifying billing procedures, and providing same-day service.
It is worth noting, too, that if all states implemented Medicaid expansion—at a cost to the federal government of around $952 billion over ten years—millions more low-income women would be able to access family planning services more easily.8 Notably, Vice President Mike Pence, as governor of Indiana, was one of ten Republican governors accepting Medicaid expansion under Obamacare.

As recently as ten years ago, development stakeholders had no real way to test whether solutions like free school lunches actually get kids into the classroom—just counting the number of kids before and after doesn’t determine causality. And they definitely didn’t know whether free lunches were better and cheaper than alternative programs, such as conditional cash transfers, deworming medications to reduce illness, or free uniforms.
Kremer saw a solution in the randomized controlled trial (RCT), the research method used by pharmaceutical companies to determine whether a drug is effective or not. Kremer was one of the first social scientists to design an RCT to test a social program, helping start a movement that has since become the gold standard in social research.1 (As it turns out, school-based deworming programs are the cheapest and most effective way to get poor kids to go to school.)

In the world of CAM, evidence matters no more than compassion or belief. Weil spells it all out in Healthy Aging:
To many, faith is simply unfounded belief, belief in the absence of evidence, and that is anathema to the scientific mind. There is a great movement toward “evidence-based medicine” today, an attempt to weed out ideas and practices not supported by the kind of evidence that doctors like best: results of randomized controlled trials. This way of thinking discounts the evidence of experience. I maintain that it is possible to look at the world scientifically and also to be aware of nonmaterial reality, and I consider it important for both doctors and patients to know how to assess spiritual health. (Italics added.)
Evidence of experience? He is referring to personal anecdotes, and allowing anecdotes to compete with, and often supplant, verifiable facts is evidence of its own kind—of the denialism at the core of nearly every alternative approach to medicine.

When asked what she would do to change things, the wife responded, “Raise the minimum wage!”
HUD seeks to alleviate some of the burden of the high housing costs faced by low-income families through maintaining public housing developments and through the housing choice voucher program, colloquially known as Section 8. While these programs are far from perfect, there’s solid evidence from the gold standard of social science research—a randomized control trial—that they reduce housing instability considerably. Access to a Section 8 voucher, in particular, reduces the chances that a family will be homeless—either doubled up or out on the streets. It lessens by half the share of families living in overcrowded units, and it greatly diminishes the average number of moves a family makes over a five-year period.
But while the cost of housing has grown and wages have stagnated, the size of government housing programs has not kept pace, a trend of reduced investment that began in the 1980s during the Reagan administration.

See Craig Bowron, “A Simple Question Leads to Answers in Medical Mystery,” MinnPost, February 28, 2008; www.minnpost.com/politics-policy/2008/02/simple-question-leads-answers-medical-mystery.
5. To read the full details of the study, see Ann D. Bagchi, Stacy Dale, Natalya Verbitsky-Savitz, Sky Andrecheck, Kathleen Zavotsky, and Robert Eisenstein, “Examining Effectiveness of Medical Interpreters in Emergency Departments for Spanish-Speaking Patients with Limited English Proficiency: Results of a Randomized Controlled Trial,” Annals of Emergency Medicine 57, no. 3 (March 2011); the abstract is available at www.annemergmed.com/article/S0196-0644%2810%2900557-3/abstract.
6. To see a graphic that was published in the New York Times and traces the discovery of the disease in Mexico, visit www.nytimes.com/imagepages/2009/05/02/health/0502-health-timeline.ready.html?ref=health.
7. The examples from the Tampa Tribune can be found at www.amtaweb.org/AMTA2006/AMTA_2006-08-06.pdf.
8.

That tendency to ignore the need for statistical verification is only now beginning to change, with the slow and often grudging acceptance that we need more than a plausible anecdote (a single wave) before instituting a new policy for reoffenders, for teaching methods, for health care, or any other state function. Politicians are among the most recalcitrant. Some plead that the genuine pressure of time, expense, and public expectation makes impossible the ideal randomized controlled trials that could tell real stripes from fake; some apparently neither much care nor understand; but, one way or another, they often rest their policies on little more than luck and a good story, becoming as a result the willing or unwilling suckers of chance.
A politician with a taste for a calculated gamble is a disappointingly welcoming way in for chance to do its dirty work.

Willett and the paper’s authors both emphasized that nineteen people in a subsample were almost eight pounds below their initial weight thirty months after the start of the study. But even that favorable finding is offset by the fact that twenty-six of those who dropped out of the study and allowed themselves to be weighed after eighteen months had an average weight gain of nine pounds. Katherine McManus, Linda Antinoro, and Frank Sacks, “A Randomized Controlled Trial of a Moderate-Fat, Low-Energy Diet Compared with a Low-Fat, Low-Energy Diet for Weight Loss in Overweight Adults,” International Journal of Obesity 25 (2001): 1503–11.
50. Richard Klein, “Big Country,” New Republic, September 19, 1994, pp. 28–33; Sander Gilman, Fat Boys (Lincoln: University of Nebraska Press, 2004); Peter N. Stearns, Fat History (New York: New York University Press, 2002); Glassner, Bodies; Eric Oliver, Obesity: The Making of an American Epidemic (New York: Oxford University Press, 2005).
51.

If the patients are randomly assigned to groups, then it can be assumed that the groups will be broadly similar in terms of any factor, such as age, income, gender or the severity of the illness, which might affect a patient’s outcome. Randomization even allows for unknown factors to be balanced equally across the groups. Fairness through randomization is particularly effective if the initial pool of participants is large. In this case, the number of participants (366 patients) was impressively large. Today medical researchers call this a randomized controlled trial (or RCT) or a randomized clinical trial, and it is considered the gold standard for putting therapies to the test.
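The balancing effect described above is easy to demonstrate in a short simulation. This is a minimal sketch: the pool of 366 participants follows the passage, while the patient attributes, value ranges, and seed are invented for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

# A hypothetical pool of 366 participants, each with traits that might
# affect their outcome but play no role in how they are assigned.
patients = [{"age": random.randint(20, 80),
             "severity": random.uniform(0.0, 10.0)}
            for _ in range(366)]

# Randomize: shuffle the pool, then split it into two equal arms.
random.shuffle(patients)
treatment, control = patients[:183], patients[183:]

# With random assignment, the arms come out broadly similar on every
# trait, measured or not, so any difference in outcomes can be credited
# to the therapy itself.
for name, arm in (("treatment", treatment), ("control", control)):
    mean_age = statistics.mean(p["age"] for p in arm)
    mean_sev = statistics.mean(p["severity"] for p in arm)
    print(f"{name}: mean age {mean_age:.1f}, mean severity {mean_sev:.2f}")
```

Rerunning with different seeds moves both group averages together while the gap between the arms stays small, and the gap shrinks further as the pool grows, which is why a pool of 366 counted as impressively large.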
Although Hamilton succeeded in conducting the first randomized clinical trial on the effects of bloodletting, he failed to publish his results. In fact, we know of Hamilton’s research only because his documents were rediscovered in 1987 among papers hidden in a trunk lodged with the Royal College of Physicians in Edinburgh.

Afterward the head of the committee pointedly noted that I was in Scios’s speakers’ bureau. Though he didn’t say it explicitly, the clear implication was that my assessment was being influenced by money. In the end, the committee decided to restrict Natrecor use to me (the resident heart failure specialist) and a handful of other cardiologists. By then I’d tapered off the talks anyway, and soon afterward I quit the speakers’ bureau. (A randomized, controlled trial later showed that Natrecor was safe but no more effective than existing, cheaper therapies.)
During the two years I gave these talks, I often thought of what Jacob Hirsch, a cardiologist at NYU, once told me when we were sitting in the echo reading room during my fellowship. He was eating a sandwich that had been brought in by a drug rep. “It’s not the doctors at the academic centers that they should be policing,” he said.

But even he warns that the current “misguided use of statistical knowledge” in medicine “systematically excludes the individualized knowledge and data essential to patient care.”52
Gary Klein, a research psychologist who studies how people make decisions, has deeper worries. By forcing physicians to follow set rules, evidence-based medicine “can impede scientific progress,” he writes. Should hospitals and insurers “mandate EBM, backed up by the threat of lawsuits if adverse outcomes are accompanied by any departure from best practices, physicians will become reluctant to try alternative treatment strategies that have not yet been evaluated using randomized controlled trials. Scientific advancement can become stifled if front-line physicians, who blend medical expertise with respect for research, are prevented from exploration and are discouraged from making discoveries.”53
If we’re not careful, the automation of mental labor, by changing the nature and focus of intellectual endeavor, may end up eroding one of the foundations of culture itself: our desire to understand the world.

The science of deflation
The ability of individuals to ‘strive’ and ‘grow’ came under a somewhat different scientific spotlight between 1957 and 1958, due to accidental and coincidental discoveries made by two psychiatrists, Roland Kuhn and Nathan Kline, working in Switzerland and the United States respectively. As with so many major scientific breakthroughs, it is impossible to specify who exactly got there first, for the simple reason that neither quite understood where exactly they had got to. The era of psychopharmacology was still very young, with the discovery of the first drug effective against schizophrenia in 1952 and the running of the first successful ‘randomized control trials’ (whereby a drug is tested alongside a placebo, with the recipients not knowing which one they’ve received) on Valium in 1954. These breakthroughs opened up a new neurochemical terrain for psychiatrists to explore.
Unlike the developers of those anti-anxiety and anti-schizophrenia drugs, Kline and Kuhn were not sure precisely what disorder they were seeking to target. Kline began experimenting with a drug called iproniazid, which had first been used against tuberculosis, while Kuhn was trialling imipramine in the hope that it might target psychosis.

As a starting point, see Eduardo Sabaté, ed., Adherence to Long-Term Therapies: Evidence for Action (Geneva: World Health Organization, 2003). This book also contains adherence data for a wide variety of diseases.
more than 28 percent of total yield: December 15, 2009. The benefits of weeding for any one farmer may be hard to generalize from these studies, which rely on model plots or on cross-sectional data. A careful randomized control trial of the benefits to farmers of weeding would be particularly useful in this area. For the current estimates in Africa, see L. P. Gianessi et al., “Solving Africa’s Weed Problem: Increasing Crop Production and Improving the Lives of Women,” Proceedings of “Agriculture: Africa’s ‘engine for growth’—Plant Science and Biotechnology Hold the Key,” Rothamsted Research, Harpenden, UK, October 12–14, 2009 (Association of Applied Biologists, 2009).

Politicians and civil servants too seldom appreciate how tools drawn from both the natural and social sciences can be used to design more effective policies, and even to win votes.
In education and criminal justice, for example, interventions are regularly undertaken without being subjected to proper evaluation. Both fields can be perfectly amenable to one of science’s most potent techniques—the randomized controlled trial—yet these are seldom required before new initiatives are put into place. Pilots are often derisory in nature, failing even to collect useful evidence that could be used to evaluate a policy’s success.
Sheila Bird, of the Medical Research Council, for instance, has criticized the UK’s introduction of a new community sentence called the Drug Treatment and Testing Order, following pilots designed so poorly as to be worthless.

The program uses the same basic principles found in deliberate practice: breaking learning down into a series of well-specified skills, designing exercises to teach each of those skills in the correct order, and using feedback to monitor progress. According to teachers who have used the curriculum, this approach has allowed them to teach the relevant math skills to essentially every student, with no one left behind. Jump was evaluated in a randomized controlled trial in Ontario with twenty-nine teachers and approximately three hundred fifth-grade students, and after five months the students in the Jump classes showed more than twice as much progress as the others in understanding mathematical concepts as measured by standardized tests.
Unfortunately, the results of the trial have not appeared in a peer-reviewed scientific journal, so it is hard to judge them objectively, and we will need to see them reproduced in other school districts before we can trust them completely. Still, they agree with what I have observed generally in a variety of fields: not just singing and math, but writing, drawing, tennis, golf, gardening, and games such as Scrabble and crossword solving. People do not stop learning and improving because they have reached some innate limit on their performance; they stop because, for whatever reason, they stopped practicing, or never started.

Most of the data we have on file are observational. Scientists look at a large group of people, some of whom practiced one behavior and others another, and then study the outcomes, attempting to make the groups comparable with respect to other variables, with men and women of roughly the same age in each group who share similar lifestyles in terms of diet and exercise habits. Large randomized controlled trials, in which behaviors are assigned rather than merely observed, are the best resource we have to identify behaviors that can alter our risk for disease. The problem is that it is very hard to dictate behavior to a group of people, expect them to be compliant for years, and then study an outcome with a very long lag time, meaning the time until the desired effect is seen. Few, if any, scientists want to stake their efforts and careers on an experiment that won’t yield a result for a decade or more.

pages: 435
words: 95,864

Childhood Disrupted: How Your Biography Becomes Your Biology, and How You Can Heal
by
Donna Jackson Nakazawa

Some of the problems of health and education the poor countries face today are specific to their situation and cannot really be addressed by drawing on the past experience of today’s developed countries (think of the problem of AIDS, for example). Hence new experiments, perhaps in the form of randomized controlled trials, may be justified. See, for example, Abhijit Banerjee and Esther Duflo, Poor Economics (New York: Public Affairs, 2012). As a general rule, however, I think that development economics tends to neglect actual historical experience, which, in the context of this discussion, means that too little attention is paid to the difficulty of developing an effective social state with paltry tax revenues. One important difficulty is obviously the colonial past (and therefore randomized controlled trials may offer a more neutral terrain).
50. See Thomas Piketty and Nancy Qian, “Income Inequality and Progressive Income Taxation in China and India: 1986–2015,” American Economic Journal: Applied Economics 1, no. 2 (April 2009): 53–63.

The baby seems to act like a heart-softening magnet. … ‘Empathy can’t be taught, but it can be caught,’ Gordon often says—and not just by children. ‘Programmatically my biggest surprise was that not only did empathy increase in children, but it increased in their teachers,’ she added. ‘And that, to me, was glorious, because teachers hold such sway over children.’ Scientific studies with randomized control trials have shown extraordinary reductions in ‘proactive aggression’—the deliberate and cold-blooded aggression of bullies who prey on vulnerable kids—as well as ‘relational aggression’—things like gossiping, excluding others, and backstabbing.”
David Bornstein, “Fighting Bullying with Babies,” Opinionator, The New York Times, November 8, 2010. For more information, see www.rootsofempathy.org.
74 Parker Palmer, A Hidden Wholeness (San Francisco: Jossey-Bass, 2009), 58-59.

But their conclusions provide little help for people in today’s developing countries, as they suggest that their fate is tied to decisions and actions taken centuries ago or factors outside their control. They do not help us understand the recent acceleration of development progress or the reasons why so many developing countries began to turn at roughly the same time in the 1990s.
The second field of research has been the opposite: microlevel studies on the effectiveness of specific actions and programs in particular contexts, often evaluated through rigorous randomized controlled trials (RCTs).III These studies focus on questions such as the impact of pricing on the uptake of insecticide-treated malaria bed nets, whether identity cards reduce theft and improve the delivery of subsidized rice to the poor, and the impact of shouting at bus drivers to get them to drive more safely. (It turns out that it helps, a lot.) RCTs have been brought to prominence through the pathbreaking work of Abhijit Banerjee and Esther Duflo at the Massachusetts Institute of Technology (MIT), among others.6 These studies offer insights into the nature of poverty at the individual and family levels, the constraints and incentives people face, and the reasons they make the decisions they do.

Any single-assignment language is easier to reason about, but that doesn’t make the programs easier to write, nor is there persuasive evidence that they are. In fact, most comparative questions about languages, coding techniques, development methodologies, and software engineering in general are appallingly unscientific.
Here’s a quote from R. Bausell’s Snake Oil Science [Oxford University Press]:
Carefully controlled research (such as randomized, controlled trials) involving numerical data has proved more dependable for showing us what works and what does not than has reliance upon expert opinions, experience, hunches, or the teachings of those we revere.
Software is still a craft, rather like furniture making. There are Chippendales, there are craftsmen, and there are lesser practitioners. I’m a little far off your original question here.

In 1994, the mathematician Don Coppersmith revealed that he had purposefully built the S-boxes to be resistant to differential cryptanalysis, which had been anticipated by IBM and the National Security Agency decades before.26
D. E. Shaw & Co.’s commanding lead did not come cheaply. In a virtuous cycle, Shaw used his profits to fund further research. Newer strategies were built on previous findings and funded the next cycle of innovation, neatly paralleling the modern growth of information technologies. As Shaw explained, “We were taking profit and paying for experimentation. We were able to run randomized controlled trials, for example, in which we could compare two models or parameter values to see which one performed better in actual trading. Analyzing the results of live trading taught us things that couldn’t be learned by studying historical data. We were doing a lot of trading, and the data we accumulated during one round of trading was helping us to increase our returns in the next round.”
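A live comparison of the kind Shaw describes can be sketched in miniature. Nothing here is D. E. Shaw’s actual methodology: the two parameter values, the signal model, and all numbers are invented to show the shape of the experiment, in which each trade is randomly routed to one of two model variants and realized results, not backtests, decide the winner.

```python
import random
import statistics

random.seed(7)

# Two hypothetical variants of one strategy, differing in a single parameter.
PARAMS = {"A": 0.5, "B": 0.7}

def trade(variant, signal, noise):
    """Simulated realized P&L for one trade under a given parameter value."""
    return PARAMS[variant] * signal + noise

results = {"A": [], "B": []}
for _ in range(20_000):
    signal = random.gauss(0.02, 1.0)   # noisy per-trade forecast
    noise = random.gauss(0.0, 0.5)     # execution noise seen only in live trading
    arm = random.choice("AB")          # the randomization step
    results[arm].append(trade(arm, signal, noise))

for arm in "AB":
    pnls = results[arm]
    print(f"model {arm}: {len(pnls)} trades, mean P&L {statistics.mean(pnls):.4f}")
```

The point of the randomization is the same as in a drug trial: because trades reach A or B by chance, market regime, time of day, and order size are balanced across the two arms, so a persistent gap in realized P&L can be attributed to the parameter change rather than to when each model happened to trade.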
“As we continued to discover new anomalies,” said Shaw, “we also benefited from a sort of a second-order effect: if the profit that could be gained from a given single effect was exceeded by the transaction cost that would be incurred to exploit it, it would be a mistake for anybody to bet on that effect in isolation.