...research suggests that Tetris can ease us through periods of anxiety by getting us to a blissfully engrossed mental state that psychologists call "flow."

"The state of flow is one where you're completely absorbed or engaged in some kind of activity," Sweeny explains. "You lose your self-awareness, and time is just flying by."

Here's more detail:

Sweeny and her collaborators gathered a group of more than 300 college students and told them their peers would be evaluating how attractive they were. "I know, it's kind of cruel, but we found it's a really effective way to get people stressed out," Sweeny says. While the participants awaited their attractiveness scores, the researchers had them play Tetris.

Some played a painfully slow, easy version of the game — which bored them. Some played an extremely challenging, fast version — which frustrated them. And everyone else played the classic version, which adapts to each player's individual skill level and gets them into that state of flow. [People were randomly assigned to the three groups.]

In the end, everyone experienced a degree of worry. But the third group reported slightly higher levels of positive emotions (on average, about a quarter of a point higher on a five-point scale) and slightly lower levels of negative emotions (half a point lower on a five-point scale).

"It wasn't a huge difference, but we think it's noticeable," Sweeny says. "And over time, it can add up."

Questions:

a) In this study, they decided to manipulate the conceptual variable, "degree of flow." How did they operationalize this variable?

b) What were the dependent variables in this study? (there seem to be two DVs here)

c) What was the independent variable? What were its levels?

d) Does this seem to be an experiment or a correlational study? How do you know?

e) Sketch a graph of the results.

f) The journalist mentions details about the results (e.g., "about a quarter of a point higher on a five-point scale" and "half a point lower on a five-point scale"). Which aspect of statistical validity is being discussed here?

g) What questions would you ask to decide if this study was internally valid? Which of the internal validity threats in Table 11.1 could you rule out? Which could you ask about?

h) What about the external validity of this study? How might you see if this effect might generalize to other flow-related activities (other than Tetris)?

05/10/2018

I'm standing at my desk as I compose this post....could that make my writing go better? Yes, according to an editorial entitled, "Standing up at your desk could make you smarter." The editorial leads with a strong causal claim and then describes three studies, each with a different design. Here's one of the studies:

A study published last week...showed that sedentary behavior is associated with reduced thickness of the medial temporal lobe, which contains the hippocampus, a brain region that is critical to learning and memory.

The researchers asked a group of 35 healthy people, ages 45 to 70, about their activity levels and the average number of hours each day spent sitting and then scanned their brains with M.R.I. They found that the thickness of their medial temporal lobe was inversely correlated with how sedentary they were; the subjects who reported sitting for longer periods had the thinnest medial temporal lobes.

a) What were the two variables in this study? Were they manipulated or measured? Was this a correlational or experimental study?

b) The author writes that the study "showed that sedentary behavior is associated with reduced thickness of the medial temporal lobe." Did he use the correct verb? Why or why not?

Here's a second study described in the editorial:

Intriguingly, you don’t even have to move much to enhance cognition; just standing will do the trick. For example, two groups of subjects were asked to complete a test while either sitting or standing [randomly assigned]. The test — called Stroop — measures selective attention. Participants are presented with conflicting stimuli, like the word “green” printed in blue ink, and asked to name the color. Subjects thinking on their feet beat those who sat by a 32-millisecond margin.

c) What are the two variables in this study? Were they manipulated or measured? Was this a correlational or experimental study?

d) Does this study support the author's claim that "you don't have to move much to enhance cognition; just standing will do the trick"? Why or why not?

e) Bonus: What kind of experiment was being described here? (Posttest only, pretest/posttest, repeated measures, or concurrent measures?) Comment, as well, on the effect size.

It’s also yet another good argument for getting rid of sitting desks in favor of standing desks for most people. For example, one study assigned a group of 34 high school freshmen to a standing desk for 27 weeks. The researchers found significant improvement in executive function and working memory by the end of the study.

f) What are the variables in this study? Were they manipulated or measured?

g) Do you think this study can support a causal claim about standing desks improving executive function and working memory?

The author added the following statement to the third study on high school freshmen:

True, there was no control group of students using a seated desk, but it’s unlikely that this change was a result of brain maturation, given the short study period.

h) What threat to internal validity has the author identified in this statement?

i) What do you think of his evaluation of this threat?

j) Of the three studies presented, which provides the strongest evidence for the claim that "standing up at your desk could make you smarter"? What do you think? On the basis of this evidence, should I keep standing here?

01/20/2018

Do people with wider faces (left) show more antisocial tendencies than those with narrow faces (right)? Photo: iStockphoto

According to several previous studies in psychological science, men with wider faces--a greater ratio of width to height (like in the photo on the left, compared to the right)--tend to show antisocial tendencies such as racial bias, exploitation, and even aggression. Researchers attributed this link to exposure to testosterone during development, which, they say, causes both wider facial structure and antisocial behavior.

Kosinski found that previous studies often had methodological shortcomings such as small sample sizes. Half of the previous studies that he identified involved fewer than 25 participants and the average sample size was 40. And seven out of ten of the studies only just crossed the conventional threshold for significance of p=.05.

These factors led Kosinski to conduct a large-scale study of face measurements and behavioral tendencies. His research, published in Psychological Science, finds no relationship between facial width-to-height ratios (fWHR) and behavioral tendencies in a large sample of over 135,000 participants.

Questions

a) Review the material in Chapters 11 and 14, and explain why studies based on small samples can lead to results that are difficult to replicate. (You might also want to review the "kindergarten height" example in this recent blog post).

b) Why is it a problem that, in 7 out of 10 studies, the results "only just crossed the conventional threshold for significance?"

Now read a bit more about the "big data" methods that Kosinski employed in his research:

Kosinski turned to a very large dataset collected via a Facebook app called MyPersonality.org. The app comprised a collection of psychometric tests and surveys that Facebook users could take and then see how they scored — they could also volunteer their scores and Facebook profile data to be used in research projects. Using this bank of over 800,000 users’ surveys and over 2 million profile pictures, Kosinski tested his research question: Do broad faces indicate antisocial tendencies? [...]

After a preliminary experiment with 1,692 users showed that a computer could measure width-to-height ratios with the same accuracy that humans could, Kosinski analyzed 173,241 photos from 137,163 male and female participants (some users had multiple profile pictures and their measurements were averaged before analysis).

The results showed that facial broadness didn’t substantially correlate with any of the 55 personality measures tested.... “For example, broader-faced people reported themselves to be more prosocial, sympathetic, trusting, and cooperative,” says Kosinski. “Also, broader-faced people reported less interest in drug use, weapons, piercing, and tattoos. Moreover, broader-faced people did not score significantly higher on any of the traits positively related to antisocial and aggressive behavioral tendencies, including the personality facets of excitement-seeking and anger, impulsiveness, and militarism (i.e., interest in paramilitary groups, the armed forces, bodybuilding, martial arts, and survivalism).”
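
Kosinski's analysis boils down to a long series of bivariate correlations, each pairing facial width-to-height ratio with one self-reported measure. Here is a toy, stdlib-only Python sketch of that idea; the data are simulated (the fWHR mean and SD and the trait scale are made-up values for illustration, not Kosinski's data), generated independently so the true correlation is zero:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)
n = 100_000  # roughly the scale of Kosinski's sample

# Hypothetical data: fWHR and a self-reported trait, drawn independently,
# so there is no real relationship between them.
fwhr = [random.gauss(1.9, 0.15) for _ in range(n)]
trait = [random.gauss(3.0, 1.0) for _ in range(n)]

print(round(pearson_r(fwhr, trait), 3))  # hovers very close to 0
```

With a sample this large, the estimated r lands very close to the true value of zero; in a sample of 25, the same procedure can produce sizable correlations by chance alone.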

c) According to this description, Kosinski is basically running a series of bivariate correlations. Each one was between a self-reported trait and _________?

d) Pick one of the personality variables tested in the study. Now sketch a scatterplot of the result, labelling your axes carefully.

e) Kosinski's sample included more than a hundred thousand users. Why might this lead to a more stable estimate of the true relationship between facial broadness and personality? (This is the complement to question a), above)

f) Kosinski's study is an example of a "failure to replicate." Review the concepts in Table 14.1 and indicate which elements might apply in this case.

g) What questions might you ask about the construct validity of the personality measures used in Kosinski's study?

Suggested answers

a) and e) Small samples are more likely to be affected by one or two extreme scores, whereas in very large samples, the extreme scores are much more likely to be balanced out by other scores. The gifs in this blog post show the principle dynamically.

b) Some researchers have proposed that when a manuscript reports p-values very close to the conventional cutoff of .05 (p-values of .04 or .03), it's a sign that a researcher might have "p-hacked" the study. P-hacking is when a researcher goes through a series of options when analyzing the data, such as eliminating outliers, adding covariates, or testing multiple dependent measures, stopping analysis only when p just crosses under the .05 threshold. Therefore, when, in a body of literature, most of the p-values are just below .05, we might suspect that the underlying finding is a fluke, not a real result.

c) Facial broadness, as measured by width-to-height ratio.

d) One axis should be labelled "facial broadness" and the other might be labelled "interest in drug use." The cloud of points should be extremely spread out, showing no pattern or discernible slope.

e) see a) answer above.

f) The concepts in Table 14.1 that seem to apply best are the third (the original study's sample was very small) and perhaps the fourth (the original study may have tried multiple statistical analyses). (We cannot be sure without more investigation into the original studies, but these are the two issues raised in the APS summary of Kosinski's work.)

g) Indeed, we don't know much about the personality measures used in the study. The full manuscript might report more about whether data collected with these personality measures shows that they are reliable and valid.

01/10/2018

When the general public critiques research, I often hear them say that the samples are "too small." It's true that sample sizes (N) in psychology research should be large. One of the outcomes of the so-called "replication crisis" is that large samples are more and more important in psychology. But why?

A common misconception--held by both students and the general public--is that large samples are important because they ensure external validity. This misconception is incorrect. External validity (that is, the ability to generalize from a sample to a population of interest) is about how a sample has been recruited, not how many people are in it (see Chapters 7 and 14). For example, say you recruited a sample of 1000 fans attending the national championship college football game. You'd have a pretty large sample, but you couldn't generalize from that sample to college students in the U.S. (for example). In fact, unless the 1000 fans were selected at random from the 70,000 fans at the game, you couldn't even generalize from this sample to "people attending the national championship football game."

If not external validity, why are large samples important? It's about accuracy of our statistical estimates. When estimating values in the population such as means or differences between means, large samples are less likely to be influenced by chance variability. For example, imagine you're estimating the mean height of kindergarteners in your local school. Now imagine that you select 5 kindergarteners at random, one of whom, by chance, turns out to be extremely tall for her age. That tall kindergartener is going to "pull" the mean estimate upwards when combined with only 4 other kids. But what if you select 25 kindergarteners instead? Now the tall kindergartener is going to be balanced out by 24 other scores, and her height will have less influence on the mean estimate.

Below is a pair of animations that illustrate this principle. They come from the data science blog R Explorations. The animations were produced with the program R, which ran a simulation study over and over and over. First, they created a very large population of scores whose mean was known to be 10.0 and whose standard deviation was known to be 1.0. Then they asked the computer to draw a random sample (of size 10 for the top animation; of size 1000 for the bottom one), compute the mean of the sampled scores, and plot them. You can watch the samples appear in real time on the animations below. Here, xbar is the sample's mean and s is the sample's standard deviation. The red line represents the mean for each sample as it is drawn:
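
If you'd like to try the simulation yourself without R, here is a rough stdlib-only Python sketch of the same idea (the population parameters match the blog's: mean 10.0, SD 1.0; the number of repeated draws is my own choice). It prints the range and spread of the sample means (xbar) for N = 10 versus N = 1000:

```python
import random
import statistics

random.seed(42)

def sample_means(n, draws=200):
    """Draw `draws` random samples of size n from a population with
    mean 10.0 and SD 1.0, and return each sample's mean (xbar)."""
    return [statistics.fmean(random.gauss(10.0, 1.0) for _ in range(n))
            for _ in range(draws)]

for n in (10, 1000):
    xbars = sample_means(n)
    print(n, round(min(xbars), 2), round(max(xbars), 2),
          round(statistics.stdev(xbars), 3))
# The means from N = 10 swing widely around 10.0; the means from
# N = 1000 barely move. This is the "red line" steadying as N grows.
```

The spread of the xbar values across repeated samples is exactly what the standard error describes, which connects this sketch to question d) below.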

Questions

a) First, watch the top animation, where N = 10. What do you notice about the movement of the vertical red line representing the mean in the top animation? What is it doing, and what does that represent?

b) Now watch the bottom animation, where N = 1000. What do you notice about the movement of the vertical red line representing the mean in this second animation? What is it doing, and what does that represent?

c) What do you notice about the s values of the two animations? Which animation has a steadier estimate of s?

d) Answer this one only if you've had a statistics course: Which of the two animations will have a smaller standard error? How is the standard error represented in the two animations?

e) Given the behavior of the two animations, explain why a large sample is important for research.

f) Which validity does sample size best address, if not external validity?

g) Let's tie this concept back to the "replication crisis" (or, as some are now calling it, "credibility revolution"*). When a finding in psychology has not replicated in a direct replication study, one reason might be that the original study used a small sample. Another reason might be that the replication study used a small sample. Why might the sample size of a study be linked to its replicability? Explain in your own words.

05/10/2017

If you have purchased a pair of TOMS shoes, you may remember a tag promising that with each pair purchased, TOMS will donate a pair to a child who needs them. This kind of charity is called "aid-in-kind." Does this kind of aid help to improve people's lives? Or might it have no effect or even a negative effect? This Economist article summarizes some potential downsides to aid-in-kind philanthropy:

Handing out aid in kind gives plenty to worry about. It could suck life from local markets, and foster a culture of aid-dependency. Handing out goods rather than cash runs the risk of spending money on things people neither need nor want.

The Economist article points out that TOMS took the bold step of deciding to test whether their free shoes program was having positive or negative effects on communities. They conducted a randomized experiment with several dependent variables. The journalist describes how the company

...asked a group of academics to investigate and gave them assurances that they could publish whatever they liked. In late 2012 they randomly picked which of 1,578 children across 18 rural communities in El Salvador would receive pairs of TOMS’ black-canvas, rubber-soled shoes. By comparing the places and children who received the shoes with ones that did not, they could work out how much these boots really gave back.

a) What was the independent variable in this study? Was the experiment independent groups or within groups, and how do you know?

b) Now, read the following description and identify three of the dependent variables they measured:

[they] found that TOMS was not wrecking local markets. On average, for every 20 pairs of shoes donated, people bought just one fewer pair locally—a statistically insignificant effect. The second study also found that the children liked the shoes. Some boys complained they were for “pregnant women” and some mothers griped that they didn’t have laces. But more than 90% of the children wore them.

c) Here are some more findings--name these dependent variables, too (there are at least six more mentioned here):

Unfortunately, the academics failed to find much other good news. They found handing out the free shoes had no effect on overall shoelessness, shoe ownership (older shoes were presumably thrown away), general health, foot health or self-esteem. “We thought we might find at least something,” laments Bruce Wydick, one of the academics. “They were a welcome gift to the children…but they were not transformative.”

More worrying, whereas 66% of the children who were not given the shoes agreed that “others should provide for the needs of my family”, among those who were given the shoes the proportion rose to 79%. “It’s easier to stomach aid-dependency when it comes with tangible impacts,” says Mr Wydick.

d) The results for general health, foot health, and self-esteem were null effects, which you can read about in Chapter 11. One probable reason the study found a null effect is that giving out free shoes doesn't actually impact people's health or self-esteem. What are some other possible reasons for this null effect? (Use Table 11.2 to guide your answer).

Coda:

You might be interested to learn how TOMS has reacted to these results. They have used the research to refine their program:

The findings have prompted TOMS to change its strategy. It is adopting approaches more likely to have a big impact, such as matching purchases of sunglasses with free sight-correction. Increasingly it gives shoes as rewards for children who join community-building projects.

11/20/2016

Responding to texts while you are working can cause work quality to suffer. Photo credit: Amble Design/Shutterstock

Do you study and write with your phone buzzing away nearby? Do you work in a location where roommates and friends can stop by to chat? Have you ever wondered what the impact of such distractions might be on the quality of your work? As you prepare for your final papers, projects, and exams, here is an experimental study of the effects of distractions on your work quality.

Lead author Cyrus Foroughi did his own anecdotal study on how many interruptions he experienced on a quiet Monday morning: “In those two hours, I received five text messages, one phone call, about a dozen messages on Gchat and six emails. A fellow graduate student wandered into my room twice to strike up a conversation...."

With this experience in mind, the researchers were interested in how interruptions affect the overall quality of a person’s work.

“To study this topic, we needed a task whereby quality could be defined beyond the number of errors made or time to complete the task,” Foroughi and colleagues explain. “We selected a complex, creative thought task that mirrors a common real-world task, outlining and writing an essay.”

Before reading on, reflect: What do you think the dependent variable will be in their study? What will the independent variable be? Read on to see if you were correct.

[In one of two studies], around 50 college students were asked to write three essays based on standard college essay prompts created by the College Board. Participants were given 12 minutes to plan and outline their essays on paper, and then were given 12 minutes to actually write their essays using a computer and keyboard.

While they were working on their essays, the students were interrupted at random intervals with sets of unrelated puzzle tasks, like solving math problems or unscrambling words. Participants were instructed to complete as much of the interruption task as possible during each of the 60-second interruptions before switching back to working on their essays. These interruptions occurred during two of the three essays so that each participant completed an essay under each of the three conditions (i.e., no interruptions, interruptions during the planning phase, and interruptions during the writing phase).

The essays were then assessed by two trained graders based on a 0-6 scale drawn from the College Board Essay Scoring Guide. The researchers also analyzed the total number of words written and participants’ accuracy on the interruption tasks.

Now that you've read the summary, answer these questions:

a) What is the independent variable? How many levels are there and what are they? What was the dependent variable? (Note--there were a few DVs, as is true in most experiments)

b) Was the independent variable manipulated as independent groups or within groups? What keywords helped you figure this out?

In both of the interruption conditions the essays received significantly lower ratings compared with the control condition — on average, interrupted students received scores that were about half a point lower on the rating scale.

c) What kind of experiment was being described here? (Posttest only, pretest/posttest, repeated measures, or concurrent measures?)

d) Sketch a graph of the results they describe above.

Think about what you can do to minimize distractions as you prepare your final work at the end of the semester. Your GPA will thank you!

Suggested answers

a) The independent variable was "type of distraction" with three levels: No distraction, distraction during planning, and distraction during writing. The dependent variables included number of words in the essay, and coded quality of the essay on a 0 to 6 scale.

b) The independent variable was manipulated as within-groups. One cue is that the author states students "were asked to write three essays" (so you know they saw all three levels of the IV). Another cue is this statement: "each participant completed an essay under each of the three conditions."

c) This is a repeated measures design.

d) Your graph should have "essay quality" or "essay words total" on the y-axis. On the x-axis, three bars: one for "no distractions," one for "distractions during planning," and one for "distractions during writing."

11/10/2016

Which snack do the big food companies want you to pick? Photo: Shutterstock

Like everyone, teenagers need nutritious food to stay healthy. Most American teens probably consume too much soda, chips, and other junk food. But have you ever tried convincing a teen to eat better? Perhaps you remember back to your high school health classes: You may not have been convinced when the adults in charge tried to persuade you to eat less junk food and more fruits and vegetables.

One new study tried a different approach, as summarized in this journalist's report. Researchers tried to take advantage of teenagers' disregard for authority figures, making healthy eating seem like an act of defiance.

According to a new study from the University of Texas (UT), ... teenagers can be encouraged to eat healthy foods if they see doing so as a way to rebel against authority....the research suggests that informing teenagers about the manipulative tricks used by big companies to sell junk food reduces their taste for unhealthy snacks.

Study co-author David Yeager [said] “If the normal way of seeing healthy eating is that it is lame, then you don’t want to be the kind of person who is a healthy eater. But if we make healthy eating seem like the rebellious thing that you do, you make your own choices, you fight back against injustice, then it could be seen as high status.”

To test their hypothesis, the researchers conducted two studies on teens aged 13 to 15.

The teens were randomly split into two groups, with one being assigned an article to read on the long-term health benefits of eating a balanced diet. The other group read an exposé on the manipulative techniques used by the food industry to sell unhealthy products, such as deceptive labelling and advertising that targets children from poor backgrounds.

The next day, in an unrelated class, the teenagers were each given a snack-pack as a reward from their headteacher for hard work. They could choose the contents of the pack themselves, deciding between snacks like Oreos and Doritos, or healthier options such as carrots and trail mix.

...Just 43 percent of the teens who had learned about the sneaky tricks of the food industry chose unhealthy options, compared to 54 percent of those who had received the health-related information the day before. There was a smaller decrease in preference for sugary drinks but overall, the sugar content of the food-industry-woke teens’ snacks was reduced by 9 percent.

b) What are the IVs and DVs in the design? (Hint: There is more than one dependent variable.) Is the independent variable manipulated as independent groups or within groups? What keywords in the description tell you how the independent variable was manipulated?

c) Sketch a small graph of the study's results. Pick one of the dependent variables to graph.

d) What do you think? Can this study support the causal claim that "reading an article about the manipulative techniques of the food industry causes teens to select more healthy snacks"? Apply the three causal criteria of covariance, temporal precedence, and internal validity.

10/10/2016

Playing checkers won't make you smarter in general, but it will make you smarter at checkers!

Photo: Shutterstock / Jaren Jai Wicklun

A variety of research methods concepts are illustrated in this NPR story, Brain Games Fail a Big Scientific Test. Let's go through the journalist's story, connecting the material to course concepts.

The article starts with the conclusion: Brain games do not make you smarter.

That's the conclusion of an exhaustive evaluation of the scientific literature on brain training games and programs. It was published Monday in the journal Psychological Science in the Public Interest.

"It would be really nice if you could play some games and have it radically change your cognitive abilities," Simons says. "But the studies don't show that on objectively measured real-world outcomes."

a) When the journalist writes, "exhaustive evaluation of the scientific literature," what kind of journal article is he probably describing--an empirical journal article, or a review journal article (Ch 2)?

Next, here's the backstory of the journal article in question:

In October 2014, more than 70 scientists published an open letter objecting to marketing claims made by brain training companies.

Pretty soon, another group, with more than 100 scientists, published a rebuttal saying brain training has a solid scientific base. ...

In an effort to clarify the issue, Simons and six other scientists reviewed more than 130 studies of brain games and other forms of cognitive training. The evaluation included studies of products from industry giant Lumosity....

After combining the results of all the studies (and, notably, after separating the well-conducted studies from the poorly-conducted ones), the researchers concluded that brain games do not make you smarter. Here's a statement of the results:

There were some good studies, Simons says. And they showed that brain games do help people get better at a specific task. "You can practice, for example, scanning baggage at an airport and looking for a knife," he says. "And you get really, really good at spotting that knife."

But there was less evidence that people got better at related tasks, like spotting other suspicious items, Simons says. And there was no strong evidence that practicing a narrow skill led to overall improvements in memory or thinking.

That's disappointing, Simons says, because "what you want to do is be better able to function at work or at school."

b) This example shows the theory-data cycle in action (Ch. 1). What were the two competing theories this literature review was testing? How do the data support one theory and lead us to reject the other one?

Next, as the researchers collected the studies, they found that some of them were flawed in design:

The scientists found that "many of the studies did not really adhere to what we think of as the best practices," Simons says. Some of the studies included only a few participants. Others lacked adequate control groups or failed to account for the placebo effect, which causes people to improve on a test simply because they are trying harder or are more confident.

c) How might you design a really strong study to test the effects of a brain training game? What controls would be necessary?

d) What is the downside to including only a few participants in a study? (Ch 10)

Suggested Answers

a) This was almost certainly a review journal article.

b) One theory said that brain-training games improve your general intelligence; the other theory said that brain-training games do not improve general intelligence. The review of all the studies combined is consistent with the second theory.

c) Your study designs will vary.

d) The main downside to conducting a study with only a few participants is that small samples are more likely to produce unusual or extreme results, because a few outliers in the data can have an undue influence on the overall pattern. Studies based on larger samples are usually more stable, and we can be more confident that their results will replicate.

06/10/2015

Do you think this was the kind of kindergarten play the authors had in mind? Photo: gmutlu/ iStockphoto

Kindergarten is in the news. Specifically, teachers, parents, and school administrators are discussing how much time kids should spend in kindergarten just playing. This story tells how some schools are re-introducing water and sand tables into their classrooms. And this story describes some research about whether kindergarten is too early to teach formal skills in reading or math.

There are three studies described in this article--all quasi-experimental. Interestingly, there are three different patterns of results, but all are used to support the argument that kindergarteners should play more. Let's present each one in turn.

Here's the first study:

A 2009 study by Sebastian P. Suggate, an education researcher at Alanus University in Germany, looked at about 400,000 15-year-olds in more than 50 countries and found that early school entry provided no advantage.

a) What seem to be the key independent and dependent variables in this study? What makes this one quasi-experimental, and which design would it likely be?

b) Could you sketch a graph of this result?

c) This study shows a null result. In Chapter 11, you learned how to interrogate a null result to decide if it's a problem of a weak manipulation, insensitive measures, and so on. Apply the tools from Table 11.2 to decide if this study really does support a null effect.

Here's the second study:

Advocates say that starting formal education earlier will help close [achievement] gaps. But these moves, while well intentioned, are misguided. Several countries, including Finland and Estonia, don’t start compulsory education until the age of 7. In the most recent comparison of national educational levels, the Program for International Student Assessment, both countries ranked significantly higher than the United States on math, science and reading.

d) What seem to be the key independent and dependent variables in this study? What makes this one quasi-experimental, and which design would it likely be?

e) Could you sketch a graph of this result?

f) Can this second study support the claim that "starting school entry later causes kids to score higher on math, science, and reading"?

And here is the third study:

Other research has found that early didactic instruction might actually worsen academic performance. Rebecca A. Marcon, a psychology professor at the University of North Florida, studied 343 children who had attended a preschool class that was “academically oriented,” or one that encouraged “child initiated” learning. She looked at the students’ performance several years later, in third and fourth grade, and found that by the end of the fourth grade those who had received more didactic instruction earned significantly lower grades than those who had been allowed more opportunities to learn through play.

g) What seem to be the key independent and dependent variables in this study? What makes this one quasi-experimental, and which design would it likely be?

h) Could you sketch a graph of this result?

i) Can this third study support the claim that "receiving didactic instruction causes kids to get lower grades in fourth grade"?

Suggested/selected answers:

a) This sparse description makes it difficult to determine, but the variables seem to be "early school entry or not" (as the IV) and "academic performance" (as the DV). This study is apparently a non-equivalent control groups post-test only design.

b) A graph of this result might look like this (data are fabricated for the purpose of illustration):

c) You might conclude that with 400,000 kids, there's plenty of power in this study to find an effect of early school entry. So apparently, there is no covariance between the variables of "early school entry" and "school success at age 15."
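To see why a sample of 400,000 gives so much power, here's a back-of-the-envelope calculation in Python. The group sizes, effect size, and standardized scale are all made up for illustration (the article doesn't report them):

```python
import math

sd = 1.0  # assume scores are standardized (SD = 1); hypothetical
n_per_group = 200_000  # ~400,000 kids, imagined as split into early vs. later entry

def z_for_effect(d):
    """z statistic for a true standardized mean difference d between two groups."""
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    return d / se

# Even a tiny effect (d = 0.05, far below a "small" effect of 0.20)
# would produce a z statistic well beyond the ~1.96 needed for significance.
print(round(z_for_effect(0.05), 1))
```

With groups this large, even a trivially small true effect would easily reach significance, so a null result here is fairly convincing evidence of no real effect.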

d) The IV could be stated as "country" (levels being Finland, Estonia, and U.S.) or the IV could be stated as "age at compulsory education" (levels being age 7 or age 5). The DV would be math, science, and reading scores on the Program for International Student Assessment (PISA). This study is apparently a non-equivalent control groups post-test only design.

e) Here's one possible graph (data are fabricated for the purpose of illustration):

f) No; although the covariance is in the appropriate direction and although temporal precedence is established (because the age of starting school comes before grades on the PISA), there is no internal validity. As a non-equivalent control groups post-test only design, this study has selection effects, because as the article points out, Estonia and Finland "are smaller, less unequal and less diverse than the United States."

g) The IV is type of preschool learning (levels were "didactic" vs. "child initiated"). The DV would be grades in 4th grade. This study is also a non-equivalent control groups post-test only design.

h) A graph of this result might look like this (data are fabricated for the purpose of illustration):

i) Once again, we have covariance and temporal precedence here (kids in child initiated instruction did better, and the type of instruction came before the 4th grade grades). However, the study does not mention that students were randomly assigned to the two types of instruction, so we have no internal validity. There is a likely selection effect.

07/20/2014

July is turning out to be Ethics Month at EverydayResearchMethods blog. Here's a great piece about the damage that was done by a case of data fabrication.

In 1998, a physician named Dr. Andrew Wakefield published a study on a sample of 12 patients whose autism symptoms, he claimed, had appeared right after they received the vaccine for measles, mumps, and rubella (MMR). At the time, the study was criticized because it had some methodological problems:

Researchers subsequently sought to rigorously test the study's main, disturbing claim: that vaccines cause autism. And every study that tested this claim with rigorous methods and large samples found absolutely no connection between the two.

Even after the journal The Lancet retracted Wakefield's paper, saying that Wakefield had fabricated the results of his original study, and even after all other studies showed a null result, a small percentage of parents still fear vaccination and refuse to vaccinate their children. When parents refuse to vaccinate their healthy children, it makes it more likely that diseases will spread in a community, potentially harming people who are not healthy enough to receive vaccinations.

a) One lesson from this story is about how people come to believe what they believe. Chapter 2 explains that some people base their beliefs on intuition, authority, or experience. But good research is the only basis for belief that is truly data-driven and free of bias.

Reflect here on the impact that both good and bad science seems to have on public opinion. Why do you think Wakefield's original study was so influential to some parents? Why are some parents so readily convinced that vaccines are dangerous? And why do you think the subsequent studies, all of which had null results, seem to have had so little influence on these same parents?

b) This story also illustrates why data fabrication is so harmful. It harms not only the progress of science but also people's trust in science. In this case, fabricated data also led thousands of parents to put their children, and other people's children, in harm's way by not vaccinating them. What might scientists and physicians do to repair the harm caused by this falsified study?

If you’re a research methods instructor or student and would like us to consider your guest post for everydayresearchmethods.com, please contact Dr. Morling. If, as an instructor, you write your own critical thinking questions to accompany the entry, we will credit you as a guest blogger.