“We know that in humans there’s a strong correlation between cognitive health and social connections, but we don’t know if it’s having a group of friends that’s protecting people or if it’s that people with declining brain health withdraw from their human connections,” [Study researcher] Kirby said.

[The n]ew research ...found that mice housed in groups had better memories and healthier brains than animals that lived in pairs.

a) Before reading on, reflect: Why would a researcher probably need an animal model to test this question experimentally?

Here's some more detail about the experiment:

Some mice lived in pairs, which Kirby refers to as the “old-couple model.” Others were housed for three months with six other roommates, a scenario that allows for “pretty complex interactions.”

The mice were 15 months to 18 months old during the experiment – a time of significant natural memory decline in the rodent lifespan.

In tests of memory, the group-housed mice fared better.

One test challenged the mice to recognize that a toy, such as a plastic car, had moved to a new location. ...“With the pair-housed mice, they had no idea that the object had moved. The group-housed mice were much better at remembering what they’d seen before and went to the toy in a new location, ignoring another toy that had not moved,” Kirby said.

In another common maze-based memory test, mice are placed on a well-lit round table with holes, some of which lead to escape hatches. Their natural tendency is to look for the dark, unexposed and “safe” escape routes.

The “couples” mice didn’t get faster at the test when it was repeated over the course of a day. “But over the course of many days, they developed a serial-searching strategy where they checked every hole as quickly as possible. It’d be like walking as quickly as possible through each row of a parking lot to look for your car rather than trying to remember where your car actually is and walk to that spot,” Kirby said.

The group-housed mice improved with each trial, though. “They seemed to try to memorize where the escape hatches are and walk to them directly, which is the behavior we see in healthy young mice,” Kirby said. “And that tells us that they’re using the hippocampus, an area of the brain that is really important for good memory function.”

b) What was the independent variable in this study? How was it operationalized?

c) What was the dependent variable? What were the two ways it was operationalized?

d) How does this experiment help us decide which comes first--social life or better memory? (note: This is temporal precedence!)

e) Do you think the journalist is justified in generalizing this study's results from mice to older adult humans? Why or why not?

f) Chapter 3 explains how internal validity and external validity are often in a trade-off. Describe how this study with mice illustrates that trade-off.

07/10/2018

Her school achievement later in life can be predicted from her ability to wait for a treat (or by her family's SES). Photo: Manley099/Getty Images

There's a new replication study about the famous "marshmallow study", and it's all over the popular press. You've probably heard of the original research: Kids are asked to sit alone in a room with a single marshmallow (or some other treat they like, such as pretzels). If the child can wait for up to 15 minutes until the experimenter comes back, they receive two marshmallows. But if they eat the first one early, they don't. As part of the original study, kids were tracked over several years. One of the key findings was that the longer children were able to wait at age 4, the better they were doing in school as teenagers. Psychologists have often used this study as an illustration of how self-control is related to important life outcomes.

The press coverage of this year's replication study illustrates at least two things. First, it's a nice example of multiple regression. Second, it's an example of how different media outlets assign catchy--but sometimes erroneous--headlines to the same study.

First, let's talk about the multiple regression piece. Regression analyses often try to understand a core bivariate relationship more fully. In this case, the core relationship they start with is between the two variables, "length of time kids waited at age 4" and "test performance at age 15." Here's how it was described by Payne and Sheeran in the online magazine Behavioral Scientist:

The result? Kids who resisted temptation longer on the marshmallow test had higher achievement later in life. The correlation was in the same direction as in Mischel’s early study. It was statistically significant, like the original study. The correlation was somewhat smaller, and this smaller association is probably the more accurate estimate, because the sample size in the new study was larger than the original. Still, this finding says that observing a child for seven minutes with candy can tell you something remarkable about how well the child is likely to do in high school.

a) Sketch a well-labelled scatterplot of the relationship described above. What direction will the dots slope? Will they be fairly tight to a straight line, or spread out?

b) The writers (Payne and Sheeran) suggest that a larger sample size leads to a more accurate estimate of a correlation. Can you explain why a large sample size might give a more accurate statistical estimate? (Hint: Chapter 8 talks about outliers and sample size--see Figures 8.10 and 8.11.)
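Related to question b): if you'd like to see this principle for yourself, here is a small simulation sketch in Python. The sample sizes and the assumed "true" correlation of .20 are invented for illustration; they are not the marshmallow studies' actual numbers.

# Hypothetical illustration: sample size and the stability of a correlation estimate.
import numpy as np

rng = np.random.default_rng(seed=1)
true_r = 0.20                      # assumed population correlation (invented for this sketch)
cov = [[1, true_r], [true_r, 1]]

def simulate_r(n, reps=1000):
    # Draw `reps` samples of size n and return the correlation estimated in each one.
    estimates = []
    for _ in range(reps):
        data = rng.multivariate_normal([0, 0], cov, size=n)
        estimates.append(np.corrcoef(data[:, 0], data[:, 1])[0, 1])
    return np.array(estimates)

for n in (50, 900):                # roughly "small original study" vs. "larger replication"
    rs = simulate_r(n)
    print(f"n = {n}: estimated r ranges from {rs.min():.2f} to {rs.max():.2f}")
# Small samples produce estimates that stray far from .20 (sometimes even negative);
# large samples cluster tightly around the true value.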

Now here's more about the study:

The researchers next added a series of “control variables” using regression analysis. This statistical technique removes whatever factors the control variables and the marshmallow test have in common. These controls included measures of the child’s socioeconomic status, intelligence, personality, and behavior problems. As more and more factors were controlled for, the association between marshmallow waiting and academic achievement as a teenager became nonsignificant.

c) What's proposed above is that social class is a third variable ("C") that might be associated with both waiting time ("A") and school achievement ("B"). Using Figure 8.15, draw this proposal. Think about it, too: Why does it make sense that lower SES might go with lower waiting time (A)? And why might lower SES go with lower school achievement (B)?

d) Now create a mockup regression table that might fit the pattern of results being described above. Put the DV at the top (what is the DV?), then list the predictor variables underneath, starting with Waiting time at Age 4, and including things like Child's Socioeconomic Status and Intelligence. Which betas should be significant? Which should not?

Basically, here we have a core bivariate relationship (between wait time and later achievement), and then a critic suggests a possible third variable (SES). They used regression to see if the core relationship was still there when the third variable was controlled for. The core relationship went away, suggesting that SES was a third variable that can help explain why kids who wait longer do better in school later on.
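To see how "controlling for" a variable works mechanically, here is a minimal Python sketch using simulated data (all of the numbers are invented; this is not the replication study's dataset). It builds a toy world in which SES drives both waiting time and later achievement, then compares the regression with and without SES as a control:

# Hypothetical third-variable (confound) illustration, not the real marshmallow data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=0)
n = 900

ses = rng.normal(size=n)                      # child's family SES (simulated)
wait = 0.5 * ses + rng.normal(size=n)         # SES partly drives waiting time
achieve = 0.5 * ses + rng.normal(size=n)      # SES also partly drives achievement
# Note: in this toy world, waiting time has no direct effect on achievement.

m1 = sm.OLS(achieve, sm.add_constant(wait)).fit()                          # no controls
m2 = sm.OLS(achieve, sm.add_constant(np.column_stack([wait, ses]))).fit()  # SES controlled

print("Wait-time beta, no controls:    %.2f (p = %.3f)" % (m1.params[1], m1.pvalues[1]))
print("Wait-time beta, SES controlled: %.2f (p = %.3f)" % (m2.params[1], m2.pvalues[1]))
# The first coefficient is sizable and significant; the second shrinks toward zero,
# mirroring the pattern reported in the replication study.

Of course, in the real data SES is measured rather than simulated, and the coefficient dropped to nonsignificance rather than exactly zero; the sketch only shows the logic of the technique.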

Next let's talk about some of the hype around this replication study. The Behavioral Scientist piece (quoted above) is one of the more balanced descriptions. Its headline was "Try to Resist Misinterpreting the Marshmallow Test." It emphasized that the core relationship was replicated. It also explained in some detail why SES is related to self-control, and how the two probably cannot be meaningfully separated--it's a nuanced report. But other press coverage had a doomsday feel:

One person on Twitter even wrote, "The marshmallow/delayed gratification study always felt 'wrong' to me - this year it was reported to be hopelessly flawed."

Are these headlines and comments fair? Probably not. As Payne and Sheeran write in Behavioral Scientist,

The problem is that scholars have known for decades that affluence and poverty shape the ability to delay gratification. Writing in 1974, Mischel observed that waiting for the larger reward was not only a trait of the individual but also depended on people’s expectancies and experience. If researchers were unreliable in their promise to return with two marshmallows, anyone would soon learn to seize the moment and eat the treat. He illustrated this with an example of lower-class black residents in Trinidad who fared poorly on the test when it was administered by white people, who had a history of breaking their promises. Following this logic, multiple studies over the years have confirmed that people living in poverty or who experience chaotic futures tend to prefer the sure thing now over waiting for a larger reward that might never come. But if this has been known for years, where is the replication crisis?

QRock 100.7 was one of several news outlets that had fun describing this study for its readers.

Let's find out what kind of study was conducted to test the claim. The Q100.7 journalist wrote:

A new study found loud music makes us more likely to order unhealthy food when we’re dining out. A new study in Sweden found loud music in restaurants makes us more likely to choose unhealthy menu options. And we’re more likely to go with something healthy like a salad when the music ISN’T so loud.

Researchers went to a café and played music at different decibel levels to see how it affected what people ordered. Either 55 decibels, which is like background chatter or the hum from a refrigerator . . . or 70 decibels, which is closer to a vacuum cleaner.

And when they cranked it up to 70, people were 20% more likely to order something unhealthy, like a burger and fries.

They did it over the course of several days and kept getting the same results. So the study seems pretty legit.

a) OK, go: What seems to be the independent variable in this study? What were its levels? How was it operationalized?

b) What seems to be the dependent variable? How was it operationalized? Think specifically about how they might have operationalized the concept "unhealthy."

c) Do you think this study counts as an experiment or a quasi-experiment? Explain your answer.

d) This study can be called a "field study" or perhaps a "field experiment". Why?

e) To what extent can this study support the claim that loud music makes you eat bad food? Apply covariance, temporal precedence, and internal validity to your response.

f) If you were manipulating the loudness of the music for a study like this, how might you do so in order to ensure that the music, and not other restaurant factors, was responsible for the increase in ordering "unhealthy" food?

g) The Q100.7 journalist argues that the study seems "pretty legit." What do you think the journalist meant by this phrase?

h) The study on food and music volume is summarized in an open-access conference abstract, published here. You might be surprised to read, contrary to the journalist's report, that the field study was conducted on only two days--with one day at 50 dB and the other at 70 dB. How does this change your thoughts about the study?

i) Conference presentations are not quite the same as peer-reviewed journal publications. Take a moment (and use your PsycINFO skills) to decide if the authors, Biswas, Lund, and Szocs, have published this work yet in a peer-reviewed journal. Why might journalists choose to cover a story that has only been presented at a conference rather than published in a peer-reviewed journal? Is this a good practice in general?

As you can see, these journalists (or their editors) attached extremely strong titles to their science articles! An actual scientist wouldn't describe the results of a study with such strong terms as "prove" or "They work." That's because research in science is a steady accumulation of evidence--each study teaches us a little bit more, but no study can "prove" a theory or a claim.

The "study" mentioned in the three headlines above was actually a meta-analysis of 522 clinical trials (that is, randomized controlled studies) of antidepressants. Here's a summary and interpretation according to the Neuroskeptic blog:

...the authors, Andrea Cipriani et al., conducted a meta-analysis of 522 clinical trials looking at 21 antidepressants in adults. They conclude that “all antidepressants were more effective than placebo”, but the benefits compared to placebo were “mostly modest”. Using the Standardized Mean Difference (SMD) measure of effect size, Cipriani et al. found an effect of 0.30, on a scale where 0.2 is considered ‘small’ and 0.5 ‘medium’.
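For reference (this is standard statistical background, not a calculation from the article): the standardized mean difference expresses the drug-placebo difference in standard deviation units, much like Cohen's d:

SMD = (M_antidepressant - M_placebo) / SD_pooled

So an SMD of 0.30 means that, on average, the antidepressant groups improved by about three-tenths of a standard deviation more than the placebo groups did.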

a) Review: What does a meta-analysis do? Why might we value a meta-analysis over a single study?

b) When the journalist describes the Standardized Mean Difference (SMD), they are referring to a statistic very much like Cohen's d. As you can see, the conventions for SMD are the same as for Cohen's d. Do you agree that the effect size of .30 could be considered "modest" according to these conventions?

c) I wrote above that "no study can 'prove' a theory or a claim." But what about a meta-analysis--do you think meta-analyses are more likely to be able to prove a theory? Are they definitive? (Why or why not?).

The Neuroskeptic criticized the media's coverage of this meta-analysis on a couple of grounds. First, they pointed out how the results of the new study are almost exactly the same as several old studies, suggesting that the new study is not particularly groundbreaking:

The thing is, “effective but only modestly” has been the established view on antidepressants for at least 10 years. Just to mention one prior study, the Turner et al. (2008) meta-analysis found the overall effect size of antidepressants to be a modest SMD=0.31 – almost exactly the same as the new estimate.

Second, the Neuroskeptic cleverly points out that, a few years ago, the media assigned the opposite headline to virtually the same result:

Cipriani et al.’s estimate of the benefit of antidepressants is also very similar to the estimate found in the notorious Kirsch et al. (2008) “antidepressants don’t work” paper! Almost exactly a decade ago, Irving Kirsch et al. found the effect of antidepressants over placebo to be SMD=0.32, a finding which was, inaccurately, greeted by headlines such as “Anti-depressants ‘no better than dummy pills’”.

d) What is a placebo, and why might it be important to use one in a study of antidepressants?

e) Why do you think the media wrote such different headlines about similar meta-analytic results?

Finally, here are some important additional comments from the Neuroskeptic article:

I’m not criticizing Cipriani et al.’s study, which is a huge achievement. It’s the largest antidepressant meta-analysis to date, including an unparalleled number of difficult-to-find unpublished studies (although both Turner et al. and Kirsch et al. did include some.) It includes a broader range of drugs than previous work, although it’s not quite comprehensive: there are no MAOIs, for instance, and in general older drugs are under-represented.

Even so, Cipriani et al. meta-analyzed the evidence on all of the most commonly prescribed drugs, and they were able to produce a comparative ranking of the different medications in terms of effectiveness and side-effects, which is likely to be useful.

f) Explain why Neuroskeptic is praising Cipriani's study on its use of "difficult-to-find unpublished studies." Why is this important in meta-analysis?

01/20/2018

Do people with wider faces (left) show more antisocial tendencies than those with narrow faces (right)? Photo: istockphoto

According to several previous studies in psychological science, men with wider faces--a greater ratio of width to height (like in the photo on the left, compared to the right)--tend to show antisocial tendencies such as racial bias, exploitation, and even aggression. Researchers attributed this link to exposure to testosterone during development, which, they say, causes both wider facial structure and antisocial behavior.

Kosinski found that previous studies often had methodological shortcomings such as small sample sizes. Half of the previous studies that he identified involved fewer than 25 participants and the average sample size was 40. And seven out of ten of the studies only just crossed the conventional threshold for significance of p=.05.

These factors led Kosinski to conduct a large-scale study of face measurements and behavioral tendencies. His research, published in Psychological Science, finds no relationship between facial width-to-height ratios (fWHR) and behavioral tendencies in a large sample of over 135,000 participants.

Questions

a) Review the material in Chapters 11 and 14, and explain why studies based on small samples can lead to results that are difficult to replicate. (You might also want to review the "kindergarten height" example in this recent blog post).

b) Why is it a problem that, in 7 out of 10 studies, the results "only just crossed the conventional threshold for significance?"

Now read a bit more about the "big data" methods that Kosinski employed in his research:

Kosinski turned to a very large dataset collected via a Facebook app called MyPersonality.org. The app comprised a collection of psychometric tests and surveys that Facebook users could take and then see how they scored — they could also volunteer their scores and Facebook profile data to be used in research projects. Using this bank of over 800,000 users’ surveys and over 2 million profile pictures, Kosinski tested his research question: Do broad faces indicate antisocial tendencies? [...]

After a preliminary experiment with 1,692 users showed that a computer could measure width-to-height ratios with the same accuracy that humans could, Kosinski analyzed 173,241 photos from 137,163 male and female participants (some users had multiple profile pictures and their measurements were averaged before analysis).

The results showed that facial broadness didn’t substantially correlate with any of the 55 personality measures tested. ... “For example, broader-faced people reported themselves to be more prosocial, sympathetic, trusting, and cooperative,” says Kosinski. “Also, broader-faced people reported less interest in drug use, weapons, piercing, and tattoos. Moreover, broader-faced people did not score significantly higher on any of the traits positively related to antisocial and aggressive behavioral tendencies, including the personality facets of excitement-seeking and anger, impulsiveness, and militarism (i.e., interest in paramilitary groups, the armed forces, bodybuilding, martial arts, and survivalism).”

c) According to this description, Kosinski is basically running a series of bivariate correlations. Each one was between a self-reported trait and _________?

d) Pick one of the personality variables tested in the study. Now sketch a scatterplot of the result, labelling your axes carefully.

e) Kosinski's sample included more than a hundred thousand users. Why might this lead to a more stable estimate of the true relationship between facial broadness and personality? (This is the complement to question a), above)

f) Kosinski's study is an example of a "failure to replicate." Review the concepts in Table 14.1 and indicate which elements might apply in this case.

g) What questions might you ask about the construct validity of the personality measures used in Kosinski's study?

Suggested answers

a) and e) Small samples are more likely to be affected by one or two extreme scores, whereas in very large samples, the extreme scores are much more likely to be balanced out by other scores. The gifs in this blog post show the principle dynamically.

b) Some researchers have proposed that when a manuscript reports p-values very close to the conventional cutoff of .05 (p-values of .04 or .03), it's a sign that a researcher might have "p-hacked" the study. P-hacking is when a researcher goes through a series of options when analyzing the data, such as eliminating outliers, adding covariates, or testing multiple dependent measures, stopping the analysis only when p just crosses under the .05 threshold. Therefore, when, in a body of literature, most of the p-values are just below .05, we might suspect that the underlying finding is a fluke, not a real result. (A small simulation sketch of this idea appears after these answers.)

c) Facial broadness, as measured by width-to-height ratio.

d) One axis should be labelled "facial broadness" and the other might be labelled "interest in drug use." The cloud of points should be extremely spread out, showing no pattern or discernible slope.

e) see a) answer above.

f) The concepts in Table 14.1 that seem to apply best are the third (the original study's sample was very small) and perhaps the fourth (the original study may have tried multiple statistical analyses). (We cannot be sure without more investigation into the original studies, but these are the two issues raised in the APS summary of Kosinski's work.)

g) Indeed, we don't know much about the personality measures used in the study. The full manuscript might report more about whether data collected with these personality measures shows that they are reliable and valid.
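Here, as promised, is a minimal p-hacking simulation sketch in Python. The numbers of participants, dependent measures, and simulated studies are invented for illustration; this is not a reanalysis of any of the fWHR studies. It assumes a "null world" in which the manipulation truly does nothing, but the researcher measures several dependent variables and reports whichever one happens to cross p < .05:

# Hypothetical illustration of why p-hacking inflates false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
n_per_group, n_dvs, n_studies = 30, 5, 2000   # made-up values for this sketch

false_positives = 0
for _ in range(n_studies):
    # In this simulated world there is NO true effect on any dependent variable.
    group_a = rng.normal(size=(n_per_group, n_dvs))
    group_b = rng.normal(size=(n_per_group, n_dvs))
    pvals = [stats.ttest_ind(group_a[:, j], group_b[:, j]).pvalue for j in range(n_dvs)]
    if min(pvals) < .05:          # "report whichever DV worked"
        false_positives += 1

print(f"Proportion of null studies with at least one p < .05: {false_positives / n_studies:.2f}")
# Expect roughly .20 or more -- far above the nominal .05 false-positive rate.

Notice that it is the flexibility (five chances to find significance) that does the damage, not any single dishonest step.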

01/10/2018

When the general public critiques research, I often hear them say that the samples are "too small." It's true that sample sizes (N) in psychology research should be large. One of the outcomes of the so-called "replication crisis" is that large samples are more and more important in psychology. But why?

A common misconception--held by both students and the general public--is that large samples are important because they ensure external validity. This misconception is incorrect. External validity (that is, the ability to generalize from a sample to a population of interest) is about how a sample has been recruited, not how many people are in it (see Chapters 7 and 14). For example, say you recruited a sample of 1000 fans attending the national championship college football game. You'd have a pretty large sample, but you couldn't generalize from that sample to college students in the U.S. (for example). In fact, unless the 1000 fans were selected at random from the 70,000 fans at the game, you couldn't even generalize from this sample to "people attending the national championship football game."

If not external validity, why are large samples important? It's about accuracy of our statistical estimates. When estimating values in the population such as means or differences between means, large samples are less likely to be influenced by chance variability. For example, imagine you're estimating the mean height of kindergarteners in your local school. Now imagine that you select 5 kindergarteners at random, one of whom, by chance, turns out to be extremely tall for her age. That tall kindergartener is going to "pull" the mean estimate upwards when combined with only 4 other kids. But what if you select 25 kindergarteners instead? Now the tall kindergartener is going to be balanced out by 24 other scores, and her height will have less influence on the mean estimate.
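Here's the arithmetic behind that example, using made-up heights. Suppose the typical kindergartener is 42 inches tall and the unusually tall child is 50 inches:

With N = 5: (4 x 42 + 50) / 5 = 218 / 5 = 43.6 inches
With N = 25: (24 x 42 + 50) / 25 = 1058 / 25 = about 42.3 inches

The same extreme score pulls the small sample's mean up by 1.6 inches, but the large sample's mean by only about 0.3 inches.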

Below is a pair of animations that illustrate this principle. They come from the data science blog R Explorations. The animations used the program R to run a simulation study over and over and over. First, they created a very large population of scores whose mean was known to be 10.0 and whose standard deviation was known to be 1.0. Then they asked the computer to repeatedly draw a random sample (of size 10 in the top animation and size 1,000 in the bottom one), compute the mean of that sample, and plot the scores. You can watch the samples appear in real time in the animations below. Here, xbar is the sample's mean and s is the sample's standard deviation. The red line represents the mean for each sample as it is drawn:
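If the animations aren't handy, you can reproduce the same idea with a few lines of Python. This is my own sketch of the kind of simulation described above, not the R Explorations code itself:

# My own sketch of the sampling simulation described above: population mean = 10.0, SD = 1.0.
import numpy as np

rng = np.random.default_rng(seed=7)
population = rng.normal(loc=10.0, scale=1.0, size=1_000_000)

for n in (10, 1000):                  # top animation vs. bottom animation
    xbars = [rng.choice(population, size=n).mean() for _ in range(200)]
    print(f"N = {n}: sample means (xbar) ranged from {min(xbars):.2f} to {max(xbars):.2f}")
# With N = 10, the sample means (the red line) jump around quite a bit;
# with N = 1000, they barely budge from 10.0.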

Questions

a) First, watch the top animation, where N = 10. What do you notice about the movement of the vertical red line representing the mean in the top animation? What is it doing, and what does that represent?

b) Now watch the bottom animation, where N = 1000. What do you notice about the movement of the vertical red line representing the mean in this second animation? What is it doing, and what does that represent?

c) What do you notice about the s values of the two animations? Which animation has a steadier estimate of s?

d) Answer this one only if you've had a statistics course: Which of the two animations will have a smaller standard error? How is the standard error represented in the two animations?

e) Given the behavior of the two animations, explain why a large sample is important for research.

f) Which validity does sample size best address, if not external validity?

g) Let's tie this concept back to the "replication crisis" (or, as some are now calling it, "credibility revolution"*). When a finding in psychology has not replicated in a direct replication study, one reason might be that the original study used a small sample. Another reason might be that the replication study used a small sample. Why might the sample size of a study be linked to its replicability? Explain in your own words.

12/20/2017

Most research methods instructors hope their course will teach students to be better consumers of information. They want not only to help students read empirical journals but also to help them become critical thinkers about anything they encounter in the "real world" of the Internet.

Maybe you'd like to start next semester with a few outside readings on spotting fake news. If so, here are some resources you might use. I got these from Morton Ann Gernsbacher's wonderful online, open-access methods course, which focuses on identifying and critically reading online news. Check it out for other useful resources.

10/10/2017

Most people's advice for success in life? Try again after failure. It almost always pays off to try again, work on a new strategy, or think through things differently. But how do people acquire the motivation to keep trying after they've hit a snag? One answer might be through social modeling: watching others around us who have succeeded after retrying.

Science journalist Ed Yong describes a study in which 1-year-old babies played with an adult under two conditions. One of the adult models performed two simple tasks easily, and the other succeeded at the same tasks only after failing multiple times. A team of researchers led by MIT graduate student Julia Leonard conducted the study with 103 infants who visited a children's museum:

As the babies watched, Leonard tried to retrieve a toy from a container, and detach some keys from a carabiner, narrating her efforts along the way. In front of some babies, she succeeded at each task immediately, performing each three times in the span of 30 seconds. In front of others, she spent the same period struggling, and only retrieved the toy and keys just before the time ran out.

What happened next?

“Now it’s your turn to play with a toy,” she said to the infants. She then handed them a music box that she had already activated. The box came with a large, conspicuous, and completely useless button. Pressing it did nothing, but it was the act of pressing that mattered. Leonard found that babies who had seen her struggling with her own objects prodded the button more often than those who had seen her succeed effortlessly.

First some questions about the study:

a) Is this study experimental or correlational? (and why?) What are the independent and dependent variables?

b) If you had to guess, would you say this study was between subjects or within subjects?

c) How long do you think it might have taken for the researcher who conducted this study to get over 100 babies to participate?

Now that you've considered question c above, you'll have a greater appreciation for what happened next: The graduate student, Julia Leonard, was asked by her advisor to conduct the whole study all over again! As you read this next quoted passage, look for themes introduced in Chapter 14:

Her results came in just as psychologists were starting to grapple with their reproducibility crisis—a deep concern that many of the results in published papers might be unreliable due to poorly-designed studies and sloppy practices. To weed out such results, many psychologists have said that their field should put more emphasis on replication—repeating studies to check if their findings hold up. Others believe that more experiments should be preregistered—that is, scientists should specify their research plans ahead of time. [...]

So after Leonard had spent a year studying the value of persistence, her advisor Laura Schulz told her to do the experiment again. “It was a very meta moment,” she says. She recruited another 120 infants, and she preregistered her plans. And to her delight, she got exactly the same results.

Review the information in Chapter 14 and consider these questions:

d) Why might it have been important for Leonard to conduct a replication of her original study? Give three reasons why replication is important.

e) Did Leonard conduct a direct replication, a conceptual replication, or a replication-plus-extension study?

f) Would it have been better for the replication to have been conducted by a different scientist? Why or why not?

g) Not all replication studies are preregistered, but this one was. What are two of the main benefits of preregistering a study--either a replication study or an original study?

Devoting more attention to your smartphones than to your children could mean that they'll have improper brain development and emotional disorders later in life.

That sounds serious. Put down this blog right now and pay attention to your kiddos! On the other hand, keep reading, and you'll see that the study in question was done on ... rats.

a) Before reading the description of the study, what are the conceptual variables (constructs) that the journalist wants you to believe are linked? (Hint: What are the three variables in the red quote above?)

Now, read this excerpt from the Time article and decide how each of those conceptual variables was operationalized in the study:

Dr. Tallie Baram, professor of pediatrics and anatomy-neurobiology at University of California, Irvine, and her colleagues used a rat model to study how good but disrupted attention from mothers can affect their newborns. Baram placed some mothers and their pups in modified cages that did not have sufficient material for nesting or bedding. This was enough to distract the mothers into running around looking for better surroundings and end up giving their babies interrupted and unreliable attention. Baram and her team compared the development of newborns raised in this environment to those raised in the normal cages where mothers had enough material to create a comfortable home.

When the offspring grew older, the researchers tested them on how much sugar solution they ate, and how they played with their peers, two measures of how much pleasure the animals were feeling and a proxy for their emotional development. The rats raised in the modified environments consistently ate less of the sugar solution and spent less of their time playing and chasing their peers than the rats raised in the normal setting.

b) How was "devoting attention to smartphones" operationalized in this study?

c) How were "emotional disorders later in life" operationalized in this study?

d) How was "Improper brain development" operationalized?

e) What do you think? To what extent is it reasonable to generalize from rat models of parenting to human parenting?

f) When the journalist (and, indeed, the scientist) go beyond the rat model and apply these results to human parents, which validity are they working with?

You might have concluded that a study on rats has a way to go before it can be applied to human kids, and you'd have a good point. But before you dismiss the entire study, you should know that there's a great deal of experimental, behavioral evidence on real human children on the topic of responsive parenting. Studies typically find that attentive, responsive parenting when kids are young can lead to improved outcomes, as I have blogged about here. Many of these studies are conducted by my colleague Mary Dozier and her students and collaborators.

Many thanks to Dr. Barbara Sarnecka of University of California-Irvine for bringing this example to my attention!

Selected answers

a) The variables are "devoting attention to smartphones," "brain development," and "emotional disorders later in life."

08/30/2017

My textbook describes several studies from Dr. Brian Wansink's lab. They make excellent teaching examples because students are able to understand the theory and hypotheses almost immediately and therefore focus on the methodological details. For example, in Chapter 10 of the 2nd and 3rd editions, I feature studies in which pasta was served from either large or small serving bowls (van Kleef, Shimizu, & Wansink, 2012). And in the Supplemental Chapters on statistics, I feature a study involving stale and fresh popcorn serving sizes (Wansink & Kim, 2005).

Instructors and students should know that Wansink's lab has come under intense scrutiny over the past year. First, he was attacked for publishing a (seemingly innocent) blog post admitting to questionable research practices (including HARKing--see Chapter 14 of the 3rd edition). Second, some researchers have alleged impossible values in his data tables, which suggest some sloppy statistical reporting. Third, he has admitted to using the same wording in more than one publication (sometimes called self-plagiarizing) and to publishing some of his data in two places. According to reports, Wansink appears open to checking all past work and publishing corrections as needed. This story in The Chronicle summarizes the issues in a fairly balanced report (from March, 2017), and this story explains the results of an inquiry by Wansink's institution, Cornell University (from April, 2017).

In two of my own classes, I conducted demonstration versions of portion size studies, and I have obtained the predicted pattern, with large effect sizes, both times. In my opinion, the portion size effect is real. However, it's definitely worth telling students about the alleged problems with Wansink's work.

So far, the study in Chapter 10 (van Kleef et al., 2012) has not been identified as problematic. However, the popcorn study (Wansink & Kim, 2005) was alleged to have reported impossible values in a key table; the problems were described as "relatively minor" (Source). I changed that table for Figure S1.7 of the 3rd Edition in order to delete the ANOVA values that were found to be problematic. The entire table and discussion will be omitted from the 4th edition. However, despite my changes to the table, I think instructors should use Figure S1.7 only as an example of how to read data tables, and not endorse it as a replicable scientific finding.

Dr. Wansink has agreed to resign from Cornell University after six of his articles were retracted from the journal JAMA and after his university concluded that he had engaged in academic misconduct. Here's a CNN story on the situation.

If you’re a research methods instructor or student and would like us to consider your guest post for everydayresearchmethods.com, please contact Dr. Morling. If, as an instructor, you write your own critical thinking questions to accompany the entry, we will credit you as a guest blogger.