a) The study was an experiment. What was the independent variable (IV)?

b) Was the IV manipulated as between-groups (independent groups) or within-groups? What keywords in the video description helped you answer this question?

c) The dependent variable (DV) was operationalized in two ways. What were they?

d) One of the DVs did not support the hypothesis, but the other DV did. Explain the results that they found. (You can sketch two little bar graphs, too.)

e) What do you think--does "opening a door to release a crying owner" indicate "empathy"? (P.S., that's a construct validity question.)

f) Does the study support the claim that "hearing their owners ask for help while crying causes dogs to help their owners faster"? Apply the three causal criteria to support your answer.

Suggested answers:

a) The IV was whether the owners said "help" while pretending to cry, or said "help" in a neutral tone while humming "Twinkle, Twinkle, Little Star."

b) This was independent-groups--the reporter used keywords such as "some owners" and "other owners". This was a posttest-only design.

c) One operationalization of the DV was whether the dogs opened the door, or not. The other operationalization of the DV was how long each dog took to open the door.

d) Only the "time taken" DV showed the predicted effect: dogs opened the door at similar rates in the two conditions, but dogs whose owners were crying opened it faster.

e) Answers will vary

f) The results show covariance because dogs whose owners were crying opened the door three times as fast as dogs whose owners were calm. Temporal precedence is ensured by the methodology--by randomly assigning owners to cry (vs. sing), they ensured that this condition came before opening the door. The study would have good internal validity if they randomly assigned owner/dog pairs to the two conditions--this would take care of selection threats such as having more "already helpful" dogs in the crying condition, or having owners who are better at acting in one condition or the other. As far as design confounds, we might ask about whether the owners in the two conditions acted exactly the same in all ways except their assigned conditions.

08/20/2018

What evidence would it take to convince us that it's the schools, not other factors, that are responsible for the outcomes of private school students? Photo: Image Source / Alamy Stock Photo

A large study has compared the outcomes of children who've attended private schools to those who've attended public schools. A journalist summarized the report in the Washington Post. The study provides a nice example of how multivariate regression can be used to test third variable hypotheses.

When we look simply at the educational achievement of students in private schools vs. public schools, private school students have higher achievement scores. However, all such studies are correlational, because the two variables--Type of School and Level of Achievement--are measured.

Therefore, such studies show covariance, because the results depict a relationship. The study may even show temporal precedence, because attending school presumably precedes the measure of achievement. However, such studies are weak on internal validity. We can think of several alternative explanations for why children in private schools are scoring higher.

One major alternative explanation is socioeconomic status. Children from wealthier families are more likely to afford private schools. And in general, children from wealthier families tend to score higher on achievement tests.

The Washington Post journalist quoted one of the study's authors, Robert Pianta, who summed up the study's results this way:

“You only need to control for family income and there’s no advantage,” Pianta said in an interview. “So when you first look, without controlling for anything, the kids who go to private schools are far and away outperforming the public school kids. And as soon as you control for family income and parents’ education level, that difference is eliminated completely.”

Questions

a) Draw little diagrams similar to those in Figure 8.15 (in the 3rd ed.) to depict the arguments being made in this study. What would A be? What about B? In the quote from Pianta, above, what would the C variable(s) be?

b) The researchers used type of school (private vs. public), which is a categorical variable. But in some analyses, the researchers also used "number of years in private school" as an alternative version of this variable. Is "number of years in private school" categorical, ordinal, interval, or ratio data?

c) Sketch a mock-up regression table with the criterion variable at the top and predictors below (Use Table 9.1 as a model). Which variable do you think the researchers selected as the criterion (dependent) variable in their analyses? Which variable(s) would have been the predictors?

d) Now that you know what the results were, think about how the beta associated with "number of years in private school" would change when parental SES is added and removed from the regression analyses.

a) A and B would be Type of School and Level of Achievement. It doesn't really matter which one is called A and which one is called B. C would be Family Income and/or Parental Education.

b) Ratio data (zero is meaningful in this scale because you could attend zero years of private school)

c) The criterion variable would be Achievement, and the predictors would be Number of Years of Private School, Family Income, and Parental Education.

d) When the Number of Years of Private School is in the table (in the analysis) by itself, its beta is likely to be positive and significant (more years of private school goes with higher achievement). When Family Income and Parental Education are added to the table, the beta for Number of Years of Private School should drop to zero. This pattern of results is consistent with the argument that Family Income and Parental Education are the alternative explanation for the original relationship.
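This pattern can be sketched with simulated data. Everything below is invented for illustration (the variable names and effect sizes are mine, not the study's); the point is only that a predictor's beta can collapse once the third variable it proxies for enters the regression.

```python
# Hypothetical simulation: the private-school beta vanishes once SES is controlled.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Family SES drives BOTH private schooling and achievement in this simulation;
# years of private school has no direct effect on achievement at all.
ses = rng.normal(size=n)
years_private = 2 * ses + rng.normal(size=n)
achievement = 3 * ses + rng.normal(size=n)

def ols_betas(predictors, y):
    """Ordinary least squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_alone = ols_betas([years_private], achievement)[1]            # positive, "significant-looking"
b_controlled = ols_betas([years_private, ses], achievement)[1]  # near zero

print(f"beta alone: {b_alone:.2f}; beta controlling for SES: {b_controlled:.2f}")
```

Running it shows a substantial positive beta in the first model that drops to essentially zero in the second, mirroring Pianta's description.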

07/20/2018

Is social media use responsible for depressed mood? Photo: Ian Allenden/Alamy stock

Do smartphones harm teenagers? If so, how much? In this blog, I've written before about the quasi-experimental and correlational designs used in research on screen time and well-being in teenagers. In that post you can practice identifying the different designs we can use to study this question.

Today's topic is more about the size of the effect in studies that have been published. A recent Wired story tried to put the effect size in perspective.

One side of the argument, as presented by Robbie Gonzalez in Wired, scares us into seeing social media as dangerous.

For example, first

...there were the books. Well-publicized. Scary-sounding. Several, really, but two in particular. The first, Irresistible: The Rise of Addictive Technology and the Business of Keeping Us Hooked, by NYU psychologist Adam Alter, was released March 2, 2017. The second, iGen: Why Today's Super-Connected Kids are Growing Up Less Rebellious, More Tolerant, Less Happy – and Completely Unprepared for Adulthood – and What That Means for the Rest of Us, by San Diego State University psychologist Jean Twenge, hit stores five months later.

In addition,

...Former employees and executives from companies like Facebook worried openly to the media about the monsters they helped create.

But is worry over phone use warranted? Here's what Gonzalez wrote after talking to more researchers:

When Twenge and her colleagues analyzed data from two nationally representative surveys of hundreds of thousands of kids, they calculated that social media exposure could explain 0.36 percent of the covariance for depressive symptoms in girls.

But those results didn’t hold for the boys in the dataset. What's more, that 0.36 percent means that 99.64 percent of the group’s depressive symptoms had nothing to do with social media use. Przybylski puts it another way: "I have the data set they used open in front of me, and I submit to you that, based on that same data set, eating potatoes has the exact same negative effect on depression. That the negative impact of listening to music is 13 times larger than the effect of social media."

In datasets as large as these, it's easy for weak correlational signals to emerge from the noise. And a correlation tells us nothing about whether new-media screen time actually causes sadness or depression.

There are several things to notice in the extended quote above. First, let's unpack what it means to "explain 0.36% of the covariance". Sometimes researchers will square the correlation coefficient r to create the value R2. The R2 tells you the percentage of variance in one variable explained by the other (incidentally, they usually say "percent of the variance" instead of "percent of covariance"). In this case, it tells you how much of the variance in depressive symptoms is explained by social media time (and, by elimination, it tells you what percentage is attributable to something else). We can take the square root of 0.0036 (that's the proportion version of 0.36%) to get the original r between depressive symptoms and social media use. It's r = .06.
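That last conversion is simple enough to check in a couple of lines:

```python
# Converting "explains 0.36 percent of the variance" back into r.
import math

r_squared = 0.36 / 100      # 0.36% expressed as a proportion: 0.0036
r = math.sqrt(r_squared)    # undo the squaring to recover r
print(round(r, 2))          # 0.06
```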

Questions

a) Based on the guidelines you learned in Chapter 8, is an r of .06 small, medium, or large?

b) Przybylski claims that the effect of social media use on depression is the same size as eating potatoes. On what data might he be basing this claim? Illustrate your answer with two well-labelled scatterplots, one for social media and the other for potatoes. Now add a third scatterplot, showing listening to music.

c) When the article states that the correlation held for the girls, but not the boys, what kind of model is that? (Here are your choices: moderation, mediation, or a third variable problem.)

d) Finally, Przybylski notes that in large data sets, it's easy for weak correlation signals to appear from the noise. What statistical concepts are being applied here?

e) Chapter 8 presents another example of a large data set that found a weak (but statistically significant) correlation. What is it?

f) The discussion above between Gonzalez and Przybylski concerns which of the four big validities?

a) An r of .06 is probably going to be characterized as "small" or "very small" or even "trivial." That's what the "potatoes" point is trying to illustrate, in a more concrete way.

b) One scatterplot should be labeled with "potato eating" on the x axis and "depression symptoms" on the y axis. The second scatterplot should be labeled with "social media use" on the x axis and "depression symptoms" on the y axis. These first two plots should show a positive slope of points with the points very spread out--to indicate the weakness of the association. The spread of the first two scatterplots should be almost the same, to represent the claim the two relationships are equal in magnitude. The third scatterplot should be labeled with "listening to music" on the x axis and "depression symptoms" on the y axis, and this plot should show a much stronger, positive correlation (a tighter cloud of points).
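If you want data to plot rather than a freehand sketch, simulated scores work fine. The target r values below are assumptions: .06 for social media and potatoes (matching the article), and .06 × 13 for music, reading Przybylski's "13 times larger" on the r scale.

```python
# Simulated data (not the actual survey) for the three scatterplots.
import numpy as np

rng = np.random.default_rng(1)

def correlated(r, n=2000):
    """Return (x, y) whose population correlation is r."""
    x = rng.normal(size=n)
    y = r * x + np.sqrt(1 - r ** 2) * rng.normal(size=n)
    return x, y

social_media = correlated(0.06)    # weak: a diffuse cloud of points
potatoes = correlated(0.06)        # equally weak, per Przybylski's point
music = correlated(0.06 * 13)      # much tighter, steeper cloud

for name, (x, y) in [("social media", social_media),
                     ("potatoes", potatoes),
                     ("music", music)]:
    print(name, round(float(np.corrcoef(x, y)[0, 1]), 2))
```

Feeding each (x, y) pair to a scatterplot function (e.g., matplotlib's scatter) reproduces the two diffuse clouds and one tight cloud described above.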

c) It is a moderator. Gender moderates (changes) the relationship between screen use and depression.

d) Very large data sets have a lot of statistical power. Therefore, large data sets can show statistical significance for even very, very small correlations--even correlations that are not of much practical interest. A researcher might report a "statistically significant" correlation, but it's essential to also ask about the effect size and its practical value (the potatoes argument). Note: you can see the r = .06 value in the original empirical article here, on p. 9.
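The power point can be made concrete with the standard t statistic for testing a correlation against zero, t = r√((n − 2)/(1 − r²)): holding r fixed at .06, significance is entirely a function of sample size.

```python
# Same tiny r, wildly different conclusions depending on n.
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

print(round(t_for_r(0.06, 100), 2))      # ~0.6: nowhere near significant
print(round(t_for_r(0.06, 100_000), 2))  # ~19: highly "significant", same tiny r
```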

e) The example in Chapter 8 is the one about meeting one's spouse online and having a happier marriage--that was a statistically significant relationship, but r was only .03. That didn't stop the media from hyping it up, however.

f) Statistical validity

g) The research on smartphones and depressive symptoms is correlational, making causal claims (and causal language) inappropriate. That means that we can't be sure if social media is leading to the (slight) increase in depressive symptoms, or if people who have more depressive symptoms end up using more social media, or if there's some third variable responsible for both social media use and depressive symptoms. As the Wired article states,

...research on the link between technology and wellbeing, attention, and addiction finds itself in need of similar initiatives. They need randomized controlled trials, to establish stronger correlations between the architecture of our interfaces and their impacts; and funding for long-term, rigorously performed research.

04/10/2018

Legalizing marijuana is associated with lower rates of opioid prescriptions in those U.S. states. Photo: Gina Kelly/Alamy Stock Photo

Opioid addiction is a major health crisis in the United States. Deaths from overdose have increased dramatically in the last five years. Opioid addiction sometimes starts when a person in pain is prescribed legal opioid drugs by a physician. Prescription opioids can also be sold illegally. For these reasons, opioid prescription rates are an indicator of opioid abuse in a particular region.

Some public health researchers have investigated whether legalizing marijuana can reduce rates of opioid use and abuse. Marijuana is an alternative for controlling chronic pain that, according to many experts, carries a lower addiction risk. Recently, researchers published two studies, both with quasi-experimental designs, that tested whether legalized marijuana could lower the rates of opioid prescriptions. As in many quasi-experiments, the researchers took advantage of a real-world situation: some U.S. states have legalized marijuana and other states have not.

One looked at trends in opioid prescribing under Medicaid, which covers low-income adults, between 2011 and 2016. It compared the states where [medical] marijuana laws took effect versus states without such laws....

Results showed that laws that let people use marijuana to treat specific medical conditions were associated with about a 6 percent lower rate [over the years studied] of opioid prescribing for pain. That's about 39 fewer prescriptions per 1,000 people using Medicaid.

And when states with such a law went on to also allow recreational marijuana use by adults, there was an additional drop averaging about 6 percent.

Questions:

a) What is the "independent" variable in this quasi-experiment? What is the dependent variable? Was the independent variable independent groups or within groups?

b) What makes this a quasi-independent variable?

c) Of the four quasi-experimental designs, which seems to be the best fit: Non-equivalent control group posttest-only? Non-equivalent control group pretest-posttest? Interrupted time series? Non-equivalent control group interrupted time series?

d) How might you graph the results described above?

e) To what extent can these data support the causal claim that "legalizing marijuana, either for medical use or recreational use, can lower the rates of opioid prescriptions in the Medicaid system"?

a) The independent variable was whether a state had legalized marijuana or not. It was independent groups (states either had, or had not, legalized the drug). The dependent variable was the rate of opioid prescriptions through Medicaid. Another variable, somewhat difficult to discern from the journalist's description, was year of study (from 2011 to 2016).

b) This IV was not manipulated/controlled by the experimenter. The researcher did not decide which states could legalize marijuana or not.

c) This is probably best characterized as a non-equivalent control group, pretest-posttest design. There were two types of states (legalized and not) and one main outcome variable: opioid prescriptions. The prescription rate was compared over time (from 2011 to 2016), making it pretest-posttest.

d) Your y-axis should have "opioid prescriptions" and the x-axis should include the years 2011 to 2016. You could then have "States with legalization" and "States without legalization" as two different colored lines.

e) The results of the study show covariance (states with legalized marijuana had lower opioid prescription rates). The fact that they compared opioid prescriptions over time (2011 to 2016) suggests that the design is able to establish temporal precedence. Presumably (although this is not clear from the articles), 2011 represents a year before many of the marijuana laws took effect and the 2016 data were collected after the laws had been active. As for internal validity, it's possible that states that legalize are systematically different from states that do not. For example, states that legalize marijuana are more likely to be in the North and West, have lower poverty rates, and so on. However, the pretest-posttest design, in which they studied the "drop in opioid prescriptions over time" rather than the "overall rate of opioid prescriptions", helps minimize some of these concerns. As with most quasi-experiments, causation is not a slam-dunk, because the experimenter does not have full control over the independent variable.
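The pretest-posttest logic here is essentially a difference-in-differences comparison, which can be sketched with made-up numbers (only the ~6% drop and the "39 fewer prescriptions per 1,000" echo the article; the baseline rates are invented):

```python
# Hypothetical opioid prescription rates per 1,000 Medicaid enrollees.
prescriptions = {
    ("legalized", 2011): 650, ("legalized", 2016): 611,  # ~6% relative drop
    ("no_law", 2011): 640, ("no_law", 2016): 640,        # flat over time
}

change_legal = prescriptions[("legalized", 2016)] - prescriptions[("legalized", 2011)]
change_no_law = prescriptions[("no_law", 2016)] - prescriptions[("no_law", 2011)]

# Comparing CHANGES rather than levels absorbs stable differences between states.
did = change_legal - change_no_law
print(did)  # -39
```

Because each state group serves as its own baseline, stable selection differences (region, poverty rate) cancel out of the comparison, which is exactly why the pretest-posttest version is stronger than posttest-only.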

01/20/2018

Do people with wider faces (left) show more antisocial tendencies than those with narrow faces (right)? Photo: istockphoto

According to several previous studies in psychological science, men with wider faces--a greater ratio of width to height (like in the photo on the left, compared to the right)--tend to show antisocial tendencies such as racial bias, exploitation, and even aggression. Researchers attributed this link to exposure to testosterone during development, which, they say, causes both wider facial structure and antisocial behavior.

Kosinski found that previous studies often had methodological shortcomings such as small sample sizes. Half of the previous studies that he identified involved fewer than 25 participants and the average sample size was 40. And seven out of ten of the studies only just crossed the conventional threshold for significance of p=.05.

These factors led Kosinski to conduct a large-scale study of face measurements and behavioral tendencies. His research, published in Psychological Science, finds no relationship between facial width-to-height ratios (fWHR) and behavioral tendencies in a large sample of over 135,000 participants.

Questions

a) Review the material in Chapters 11 and 14, and explain why studies based on small samples can lead to results that are difficult to replicate. (You might also want to review the "kindergarten height" example in this recent blog post).

b) Why is it a problem that, in 7 out of 10 studies, the results "only just crossed the conventional threshold for significance?"

Now read a bit more about the "big data" methods that Kosinski employed in his research:

Kosinski turned to a very large dataset collected via a Facebook app called MyPersonality.org. The app comprised a collection of psychometric tests and surveys that Facebook users could take and then see how they scored — they could also volunteer their scores and Facebook profile data to be used in research projects. Using this bank of over 800,000 users’ surveys and over 2 million profile pictures, Kosinski tested his research question: Do broad faces indicate antisocial tendencies? [...]

After a preliminary experiment with 1,692 users showed that a computer could measure width-to-height ratios with the same accuracy that humans could, Kosinski analyzed 173,241 photos from 137,163 male and female participants (some users had multiple profile pictures and their measurements were averaged before analysis).

The results showed that facial broadness didn’t substantially correlate with any of the 55 personality measures tested.... “For example, broader-faced people reported themselves to be more prosocial, sympathetic, trusting, and cooperative,” says Kosinski. “Also, broader-faced people reported less interest in drug use, weapons, piercing, and tattoos. Moreover, broader-faced people did not score significantly higher on any of the traits positively related to antisocial and aggressive behavioral tendencies, including the personality facets of excitement-seeking and anger, impulsiveness, and militarism (i.e., interest in paramilitary groups, the armed forces, bodybuilding, martial arts, and survivalism).”

c) According to this description, Kosinski is basically running a series of bivariate correlations. Each one was between a self-reported trait and _________?

d) Pick one of the personality variables tested in the study. Now sketch a scatterplot of the result, labelling your axes carefully.

e) Kosinski's sample included more than a hundred thousand users. Why might this lead to a more stable estimate of the true relationship between facial broadness and personality? (This is the complement to question a), above)

f) Kosinski's study is an example of a "failure to replicate." Review the concepts in Table 14.1 and indicate which elements might apply in this case.

g) What questions might you ask about the construct validity of the personality measures used in Kosinski's study?

Suggested answers

a) and e) Small samples are more likely to be affected by one or two extreme scores, whereas in very large samples, the extreme scores are much more likely to be balanced out by other scores. The gifs in this blog post show the principle dynamically.
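The balancing-out idea can also be shown with a quick simulation: estimates of the same true correlation (set to .10 here, an arbitrary choice) swing far more wildly at n = 25 than at n = 1,000.

```python
# Sampling variability of r at small vs. large n (simulated, for illustration).
import numpy as np

rng = np.random.default_rng(2)
true_r = 0.10

def sample_rs(n, reps=2000):
    """Estimate r from `reps` independent samples of size n."""
    out = []
    for _ in range(reps):
        x = rng.normal(size=n)
        y = true_r * x + np.sqrt(1 - true_r ** 2) * rng.normal(size=n)
        out.append(np.corrcoef(x, y)[0, 1])
    return np.array(out)

small, large = sample_rs(25), sample_rs(1000)
print(f"spread of r at n=25:   {small.std():.3f}")   # wide: extreme rs are common
print(f"spread of r at n=1000: {large.std():.3f}")   # narrow: estimates cluster near .10
```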

b) Some researchers have proposed that when a manuscript reports p-values very close to the conventional cutoff of .05 (p-values of .04 or .03), it's a sign that a researcher might have "p-hacked" the study. P-hacking is when a researcher goes through a series of options when analyzing the data, such as eliminating outliers, adding covariates, or testing multiple dependent measures, stopping analysis only when p just crosses under the .05 threshold. Therefore, when, in a body of literature, most of the p-values are just below .05, we might suspect that the underlying finding is a fluke, not a real result.
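One of those p-hacking mechanisms (testing multiple dependent measures and reporting whichever one "works") is easy to simulate. The simulation below is mine, not drawn from any of the studies; with five DVs and no true effects anywhere, the chance of at least one p < .05 is about 1 − .95^5 ≈ .23.

```python
# False-positive inflation from testing 5 DVs per study and keeping the best p.
import math
import numpy as np

rng = np.random.default_rng(3)

def p_value(x):
    """Two-sided p for a one-sample z test of mean 0 (sigma known to be 1)."""
    z = x.mean() * math.sqrt(len(x))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

studies = 2000
false_positives = 0
for _ in range(studies):
    ps = [p_value(rng.normal(size=30)) for _ in range(5)]  # 5 null DVs per study
    if min(ps) < 0.05:  # "report whichever DV crossed the threshold"
        false_positives += 1

print(round(false_positives / studies, 2))  # roughly 0.23, not the nominal 0.05
```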

c) Facial broadness, as measured by width-to-height ratio.

d) One axis should be labelled "facial broadness" and the other might be labelled "interest in drug use." The cloud of points should be extremely spread out, showing no pattern or discernible slope.

e) see a) answer above.

f) The concepts in Table 14.1 that seem to apply best are the third (the original study's sample was very small) and perhaps the fourth (the original study may have tried multiple statistical analyses). (We cannot be sure without more investigation into the original studies, but these are the two issues raised in the APS summary of Kosinski's work.)

g) Indeed, we don't know much about the personality measures used in the study. The full manuscript might report more about whether data collected with these personality measures shows that they are reliable and valid.

11/20/2017

The sun sets in Amarillo, TX an hour later than it does in Huntsville, AL, though they are in the same time zone. Amarillo residents get less sleep and earn more money: Is there a causal connection? Photo: Creativeedits/Wikimedia Commons

Sleep is an essential human function, and getting more sleep is associated with improved mood, cognitive performance, and physical performance. Therefore, it might make sense that sleep would improve people's productivity and ability to earn money. That's the topic of a Freakonomics episode on the "Economics of Sleep." You can read the transcript or listen to the 45-minute episode here. (The section I focus on starts around minute 10.)

Freakonomics' hosts interviewed a set of economists (including Matthew Gibson, Jeff Shrader, Dan Hamermesh, and Jeff Biddle) about their research on sleep, work hours, and income. The economists mentioned that, in order to establish a causal link between sleep and income:

What we need is something like an experiment for sleep. Almost as though we go out in the United States and force people to sleep different amounts and then watch what the outcome is on their wages.

While it is theoretically possible to conduct such an experiment, it is practically difficult to assign people to different sleep conditions for a long enough period of time to notice an impact on their wages. So the economists took an alternative path and used quasi-experimental data. In a creative twist, they compared wages at two ends of a single American time zone. The example they gave is Huntsville, AL and Amarillo, TX. Here's why. Gibson stated:

It turns out that ever since we’ve put time zones into place, we’ve basically been running just that sort of giant experiment on everyone in America.

The story continued. You'll see the transcript version quoted below:

Consider two places like Huntsville, Alabama — which is near the eastern edge of the Central Time Zone — and Amarillo, Texas, near the western edge of the Central zone. [...]

...even though Amarillo and Huntsville share a time zone, the sun sets about an hour later in Amarillo, according to the clock, and since the two cities are at roughly the same latitude as well, they get roughly the same amount of daylight too.

So you’ve got two cities on either end of a time zone, roughly the same size — just under 200,000 people each — where, according to the clock time, sunset is an hour apart. Now, what good is that to a pair of economists interested in sleep research?

GIBSON: It turns out that the human body, our sleep cycle responds more strongly to the sun than it does to the clock. People who live in Huntsville and experience this earlier sunset go to bed earlier.

GIBSON: If we plot the average bedtime for people as a function of how far east they are within a time zone, we see this very nice, clean nice straight line with earlier bedtime for people at the more eastern location.

But since Huntsville and Amarillo are in the same time zone, people start work at roughly the same time, which means alarm clocks go off at roughly the same time.

GIBSON: That means if you go to bed earlier in Huntsville, you sleep longer.

The economists didn't use only Huntsville and Amarillo--they also conducted multiple comparisons of cities around the U.S. that were similarly on each end of a single time zone. Using "city of residence" as their quasi-experimental operationalization of "amount of sleep", the economists were ready to report the results for wages:

So now Gibson and Shrader plugged in wage data for Huntsville vs. Amarillo and other pairs of cities that had a similar sleep gap.

GIBSON: We find that permanently increasing sleep by an hour per week for everybody in a city, increases the wages in that location by about 4.5 percent.

Four and a half percent — that’s a pretty good payout for just one extra hour of sleep per week. If you get an extra hour per night, Gibson and Shrader discovered — here, let me quote you their paper: “Our main result is that sleeping one extra hour per night on average increases wages by 16%, highlighting the importance of restedness to human productivity.”

Questions:

a) What is the independent variable in this time zone and wages study? What is the dependent variable?

b) Is the IV independent groups or within groups?

c) Which of the four quasi-experimental designs is this? Non-equivalent control group posttest only, Non-equivalent control group pretest-posttest, Interrupted time series, or Non-equivalent control group interrupted time series?

d) The economists asserted, "sleeping one extra hour per night on average increases wages by 16%" (italics added). What do you think? Can their study support this claim? Apply the three causal rules, especially taking note of internal validity issues that this study might have.

e) If you consider only one pair of cities, there are multiple alternative explanations, besides sleep, that can account for wage differences. Name two or three such threats (considering Huntsville and Amarillo as an example). Now consider, how might many of these internal validity threats be reduced by conducting the same analysis over many other city pairs?

f) This Freakonomics episode was aired in 2015, but the study (about time zones) they reviewed is not yet published. What do you think about that?

Answers to selected questions

a) The IV is "Hours of sleep" (but you could also call it "location on the time zone: East or West") and the DV is "Wages".

b) The IV is independent-groups.

c) Non-equivalent control group posttest only.

d & e) The results of the study support covariance: People in cities in the eastern portion of time zones get more sleep and have higher wages than people in the western portions. Temporal precedence is unclear, I think: Because the data were collected at the same time, it's not clear if the time zone came first, leading to more sleep and higher wages, or if people began to earn higher wages first and then systematically moved eastward. (However, the second direction certainly seems less plausible than the first.)

As for internal validity, if we consider only the city pair of Huntsville and Amarillo, we could come up with several alternative explanations. The two cities have different historical trajectories and different ethnic diversities; they are in two different states that have different fiscal policies and industry bases. Perhaps Amarillo has poorer wages in general and people are losing out on sleep there because they are working more than one job. However, these internal validity threats become less of an issue when you consider multiple pairs of cities. It is less plausible that internal validity threats that apply to one city pair would also, coincidentally, apply to all the other city pairs that are at opposite ends of a time zone.

Even though the method is fairly strong, psychologists would be unlikely to make a strong causal claim simply from quasi-experimental data like these, because the independent variable is not truly manipulated. Nevertheless, the method and results of this quasi-experiment are certainly consistent with the argument that getting more sleep may be a factor in earning higher wages.

Devoting more attention to your smartphones than to your children could mean that they'll have improper brain development and emotional disorders later in life.

That sounds serious. Put down this blog right now and pay attention to your kiddos! On the other hand, keep on listening, and you'll hear that the study in question was done on....rats.

a) Before reading the description of the study, what are the conceptual variables (constructs) that the journalist wants you to believe are linked? (Hint: What are the three variables in the red quote above?)

Now, read this excerpt from the Time article and decide how each of those conceptual variables was operationalized in the study:

Dr. Tallie Baram, professor of pediatrics and anatomy-neurobiology at University of California, Irvine, and her colleagues used a rat model to study how good but disrupted attention from mothers can affect their newborns. Baram placed some mothers and their pups in modified cages that did not have sufficient material for nesting or bedding. This was enough to distract the mothers into running around looking for better surroundings and end up giving their babies interrupted and unreliable attention. Baram and her team compared the development of newborns raised in this environment to those raised in the normal cages where mothers had enough material to create a comfortable home.

When the offspring grew older, the researchers tested them on how much sugar solution they ate, and how they played with their peers, two measures of how much pleasure the animals were feeling and a proxy for their emotional development. The rats raised in the modified environments consistently ate less of the sugar solution and spent less of their time playing and chasing their peers than the rats raised in the normal setting.

c) How was "Emotional disorders later in life" operationalized in this study?

d) How was "Improper brain development" operationalized?

e) What do you think? To what extent is it reasonable to generalize from rat models of parenting to human parenting?

f) When the journalist (and, indeed, the scientist) goes beyond the rat model and applies these results to human parents, which kind of validity are they working with?

You might have concluded that a study on rats has a way to go before it can be applied to human kids, and you'd have a good point. But before you dismiss the entire study, you should know that there's a great deal of experimental, behavioral evidence on real human children on the topic of responsive parenting. Studies typically find that attentive, responsive parenting when kids are young can lead to improved outcomes, as I have blogged about here. Many of these studies are conducted by my colleague Mary Dozier with her colleagues and students.

Many thanks to Dr. Barbara Sarnecka of University of California-Irvine for bringing this example to my attention!

Selected answers

a) The variables are "devoting attention to smartphones," "brain development," and "emotional disorders later in life."

08/10/2017

To what extent does the evidence support a causal influence of vacations on happiness and stress? Photo: Syda Productions/Shutterstock

Here are some quasi-experimental and correlational studies on vacations, just in time for the end of summer. The APS website describes a few studies consistent with the argument that vacations can be good for your mental health. Here's one study by researchers Sabine Sonnentag and Jana Kühnel:

The researchers surveyed 131 teachers before and after a two-week break from school.

First, they had the teachers complete a measure of exhaustion—how emotionally drained and burned out they felt the day before heading out for vacation. The teachers then completed weekly surveys on how engaged they were with their work and how relaxed and stressed they felt for four weeks after returning from vacation.

As predicted, the results indicated that vacationing had a beneficial effect. Not only did the teachers report feeling less tired and emotionally burned out, they also reported feeling more engaged and positive about their work.

a) This is a quasi-experiment. What is the study's "independent" variable? What is/are its dependent variable(s)?

b) Would you call the design a non-equivalent groups posttest-only design? A non-equivalent groups pretest/posttest design? An interrupted time series? Or a non-equivalent groups interrupted time series?

c) Consider the 12 internal validity threats in Table 11.1. Which threats can this study rule out? Which threats might still apply?

d) Sketch a graph of the results of the study, incorporating this (more negative) message:

But, these benefits were fairly short-lived, particularly for those teachers who came back to especially difficult students and heavy workloads. Within four weeks, the vacation’s positive benefits had faded and teachers were back to their initial levels of stress and emotional exhaustion.

The article also suggests that when it comes to spending money, money spent on vacations is associated with more happiness than money spent on material goods.

...psychological scientists Amit Kumar and Thomas Gilovich of Cornell University and Matthew Killingsworth of University of California, San Francisco tracked moment-to-moment data from 2,266 adults as part of a large-scale experience-sampling project. Participants received notifications from the researchers on their iPhones at random times throughout the day.

Comparing data from individual participants across different times, Gilovich and colleagues found that people were happier at times when they were thinking about a future experiential purchase, like a ski trip, than they were at times when they weren’t thinking about a purchase at all. There was no relative increase or decrease in happiness when people were thinking about a future material purchase.

e) The above study is a correlational one, with a twist. The researchers computed a correlation for each individual person, using "experience" as the unit of analysis. Given the results described above, what might a bar graph depict for a typical person in this study? (What would be on each axis, and what pattern would the results show?)
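To make the within-person analysis concrete, here is a minimal sketch of how one participant's momentary data might be summarized by thought type. The ratings and category labels below are entirely invented for illustration; the actual study used its own experience-sampling measures.

```python
# Hypothetical experience-sampling records for ONE participant.
# Each record: (what the person was thinking about, happiness rating 0-100).
# All numbers are made up for illustration only.
moments = [
    ("experiential", 78), ("none", 60), ("material", 62),
    ("experiential", 82), ("none", 58), ("material", 59),
    ("none", 61), ("experiential", 75),
]

def mean_happiness(records, thought_type):
    """Average happiness across all moments of a given thought type."""
    ratings = [h for t, h in records if t == thought_type]
    return sum(ratings) / len(ratings)

# One bar per thought type: this is what a per-person bar graph would show.
for kind in ("experiential", "material", "none"):
    print(kind, round(mean_happiness(moments, kind), 1))
```

In a bar graph, the x-axis would be the thought type (future experiential purchase, future material purchase, no purchase) and the y-axis the person's average momentary happiness, with the "experiential" bar highest and the "material" and "none" bars at similar heights.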

02/10/2017

The death of a fellow officer on duty was rated as one of the most stressful events for police officers. Credit: W. Smith/Epa/REX/Shutterstock

What happens when people are exposed to very stressful events? A study has investigated this question using a sample of police officers. The study found a correlation between exposure to stressful job events and cortisol change over time. This is a journalist's take on the study.

Here's an overview:

For most people, cortisol, the vital hormone that controls stress, increases when they wake up. It's the body's way of preparing us for the day....[Now, a] study of more than 300 members of the Buffalo Police Department suggests that police events or conditions considered highly stressful by the officers may be associated with disturbances of the normal awakening cortisol pattern. That can leave the officers vulnerable to disease, particularly cardiovascular disease, which already affects a large number of officers.

The study's two main variables were the experience of major stress and cortisol patterns. First, read how they measured stress in the sample:

For this study, participating officers assessed a variety of on-the-job stressors using a questionnaire that asks officers to rate 60 police-related events with a "stress rating." Events perceived as very stressful are assigned a higher rating.

Exposure to battered or dead children ranked as the most stressful event, followed by: killing someone in the line of duty; having a fellow officer killed on duty; a situation requiring the use of force; and being physically attacked.

Identifying the five most intense stressors police can face was significant, Violanti said. "When we talk about interventions to help prevent disease, it's tricky because these stressors are things that can't be prevented," he said. ... The survey showed that the officers experienced one of the five major stressors, on average, 2.4 times during the month before the survey was completed.

Second, read how they measured cortisol patterns. Notice how in this case, the variable is operationalized not as a single outcome, but as a pattern over four time periods:

Cortisol was measured using saliva samples taken upon waking up, and 15, 30 and 45 minutes thereafter.

Here's how the journalist described and interpreted the result:

Officers who weren't as stressed showed a steep and steady, or regular, increase in cortisol from baseline. However, officers with a moderate and high major stress index had a blunted response over time.

That's because stress affects a system in the body known as the hypothalamic pituitary adrenal axis, or HPA Axis. When you're stressed, the HPA Axis elicits cortisol, a hormone that gets the body going and activates against the stressor, Violanti explained. Under normal circumstances, the body's cortisol pattern looks like a normal bell curve: It rises when we wake up, peaks around midday and comes back down at bed time.

"If you experience chronic stress or high stress situations, the cortisol can no longer adjust normally like this. So what happens with people under a lot of stress, the cortisol flattens out. For some people it goes down and others it goes up and stays up. That's called the dysregulation of the HPA axis," said Violanti, who served with the New York State Police for 23 years before shifting into academia.

Questions to answer:

1. Draw a well-labeled figure depicting the study's main result. (Will it be a bar graph, line graph, or a scatterplot? What are the best labels?)

2. What makes this a correlational study?

3. Evaluate the construct validity of a) the operationalization of stress and b) the operationalization of cortisol patterns. In your opinion, how well did they measure these two variables?

4. Consider the external validity of this study. What are the characteristics of the sample? Can we assume that the results will generalize to other cops? Do we know if the results generalize to other professions or to other people who have experienced stress? Why or why not?

5. Can the study support the causal claim that "exposure to stress causes cortisol dysregulation in cops"? Consider temporal precedence (the directionality problem) as well as internal validity (the third-variable problem).

Question 4 is about external validity. You might be interested to read the authors' take on this question:

While the current study focused on Buffalo officers, the findings have implications for cops around the country, said paper co-author Michael Andrew, PhD, chief of the Biostatistics and Epidemiology Branch of the CDC/NIOSH Health Effects Laboratory Division in Morgantown, West Virginia.

"These findings show that exposure to major events inherent to police work may lead to a temporary reduction in the biological ability to respond to further stressful events. Since the major stressor events in this study were originally developed to reflect events that can apply to any police department, these results should generalize, more or less, to any police department in the U.S.," Andrew said, adding, "This points to the need for continued focus on supporting police officer health."

For Question 5, you already know that this study establishes covariance. However, temporal precedence is not very clear. It's possible that cops with poor cortisol regulation are more likely to be involved in future stressful events (for some reason). Internal validity is more of a problem, because, at least based on what's presented here, we don't know if they controlled for third variables such as what type of neighborhood the cops usually patrolled, or for personality characteristics such as impulsiveness, Type A personality, or other traits. For example, an impulsive personality might be associated with more stressors on the job, and might also be associated with cortisol patterns.

[H]azard perception...involves visually scanning the road ahead for clues that a dangerous situation may be developing, such as a pedestrian getting ready to cross the street or cars up ahead starting to brake. This sounds simple enough, but research suggests that a knack for this kind of visual scanning actually takes years – even decades – to learn.

Here's a research finding quoted in the article:

[N]ovice drivers, particularly teens, are so much more accident prone compared to older, more experienced drivers. Eye-tracking studies have shown that less experienced drivers tend to look at the road right in front of them, while more experienced drivers tend to automatically look far ahead, scanning all around the road for signs of trouble.

a) Is the finding above from a correlational or experimental study? What are the two main variables in the result? If it's an experimental study, what is its design?

Here is a second research finding quoted in the article:

...research has also demonstrated that even very short interventions can lead to major improvements in driving safety.

In one California study, drivers who had just passed an on-road driving test were randomly assigned to either receive a 17-minute hazard perception training or to receive no additional training. Over the course of the following year, male drivers who received the training had an accident rate that was nearly 25% lower than the group of untrained males. However, there was no such drop in accidents for female drivers who had received the training.

b) Is the finding above from a correlational or experimental study? What are the two main variables in the result? If it's an experimental study, what is its design?

Here's a final research result:

However, unlike other driving skills, hazard perception has been empirically linked to crash risk.

c) Is the finding above from a correlational or experimental study? What are the two main variables in the result? If it's an experimental study, what is its design?

a) This is a correlational study, and the two measured variables are driver experience (or driver age) and how far ahead drivers train their eyes while driving.

b) This is an experimental study. It appears to be a posttest-only design. The independent variable is whether drivers received the 17-minute training or no training. The dependent variable is accident rate. This study also included a participant variable, gender. You read that the training affected males but not females; therefore, you could also consider this a factorial (IV × PV) design with an interaction.
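The IV × PV interaction can be sketched with a tiny table of cell means. The accident rates below are hypothetical, chosen only to match the pattern described in the article (roughly 25% lower rates for trained males, no effect for females).

```python
# Hypothetical yearly accident rates (per 100 drivers) for the four cells of
# the training (IV) x gender (PV) design. Numbers are invented to illustrate
# the interaction pattern reported in the article.
rates = {
    ("male",   "trained"):   6.0,
    ("male",   "untrained"): 8.0,  # trained males ~25% lower than untrained
    ("female", "trained"):   5.0,
    ("female", "untrained"): 5.0,  # no training effect for females
}

def training_effect(gender):
    """Difference in accident rate: untrained minus trained, within a gender."""
    return rates[(gender, "untrained")] - rates[(gender, "trained")]

print(training_effect("male"))    # training effect present
print(training_effect("female"))  # no effect: the hallmark of an interaction
```

Because the effect of training differs across levels of the participant variable (present for males, absent for females), the simple effects are unequal, which is what "interaction" means in a factorial design.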

c) This is a correlational study, and the two measured variables are skill at hazard perception and crash risk.

If you’re a research methods instructor or student and would like us to consider your guest post for everydayresearchmethods.com, please contact Dr. Morling. If, as an instructor, you write your own critical thinking questions to accompany the entry, we will credit you as a guest blogger.