psych research methods, stats, pedagogy, and more


A friend posted a question to a group of research colleagues recently:

“Three weeks ago, I ran a 100 person two-condition study on Mturk. Result: t = 2.95, p = .004. Today I ran another 100 person two condition study on Mturk, using the identical measure. No differences in what came before that measure. Result? t = 0.13, p = .89.”

The friend was exasperated and didn’t know what to do. What are the best practices for adjudicating conflicting study results like these? I wrote my friend a long response, but I realized that my advice might be useful to others too.

The group had several suggestions for courses of action. I list the options below and explain my preferred option.

1. Drop the project. This is an unsatisfactory choice because, as we will see below, the first two studies were likely underpowered. Abandoning the research question too soon risks missing a true effect (i.e., a Type II error).

2. Report the significant study and ignore the non-significant one. OK, no one actually recommended this choice, but I think it is what a mentor might have recommended back in the old days. We now know that file-drawering the non-significant study substantially inflates the Type I error rate of the published literature, which would be dishonest and not cool.

3. Look for a moderator. Perhaps the first study was run on a Tuesday, and the effect only shows up on Tuesdays. Or, more interestingly, perhaps the first study had more women participants, and the effect is stronger for women. Post-hoc moderators like these could explain why the effect shows up in one study but not the other. However, there is an infinite number of potential moderators, and we have no way of knowing which one, if any, is actually responsible. The most likely explanation is simple sampling error.

4. Meta-analyze and use the meta-analytic confidence interval to test the significance of the effect. This is not a terrible choice, and without the resources for further research, it is probably a researcher’s best bet. But without additional data, we can’t be very confident whether Study 1 was a false positive or Study 2 was a false negative.

5. Use the meta-analytic effect size estimate to determine the sample size needed for a third study with 80% power. This is my recommended, best-practice option, for the reasons outlined in point 4. Note that this third study should not be viewed as a tiebreaker, but rather as a way to get a more precise estimate of the actual effect size.

What follows is a step-by-step guide using the R statistics software package to conduct the meta-analysis and estimate the number of participants needed for Study 3.

Step 0 – Install the compute.es, metafor, and pwr packages if you don’t have them already, e.g., with install.packages(c("compute.es", "metafor", "pwr")). This step only needs to be completed once per computer.
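In R, compute.es converts each study’s t statistic into a standardized effect size. The minimal Python sketch below shows that conversion using only the standard library; it is not the post’s R code. It applies the usual formulas for Cohen’s d and its sampling variance from an independent-samples t. The function name d_from_t is hypothetical. Splitting each 100-person study into two cells of 50 is an assumption.

```python
import math


def d_from_t(t, n1, n2):
    """Cohen's d and its sampling variance from an independent-samples t."""
    d = t * math.sqrt(1 / n1 + 1 / n2)
    # Large-sample variance of d for two independent groups.
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, var_d


# Assumption: each 100-person study was split evenly into cells of 50.
study1 = d_from_t(2.95, 50, 50)
study2 = d_from_t(0.13, 50, 50)
print(f"Study 1: d = {study1[0]:.2f}, var = {study1[1]:.3f}")
print(f"Study 2: d = {study2[0]:.2f}, var = {study2[1]:.3f}")
```

Rounded to two decimals, this gives d ≈ 0.59 for Study 1 and d ≈ 0.03 for Study 2. Each has a sampling variance of about 0.04.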

Step 4 – Look at the estimate from the random-effects meta-analysis. In this case it is 0.31 (in standardized units, i.e., d). Its 95% CI is [-0.23, 0.86]. There is significant heterogeneity (Q = 3.92, p = .048), but who cares? Here, it just means that the two estimates are pretty far apart.
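The hedged Python sketch below shows what a random-effects model is doing. It uses the DerSimonian-Laird estimator of between-study variance (τ²). metafor’s rma() defaults to REML instead, although with two studies of equal variance the two estimators should give the same τ². The function random_effects is hypothetical. The inputs are each study’s d and variance rounded to two decimals (0.59 and 0.03, each with variance 0.04), assuming 50 participants per cell.

```python
import math
from statistics import NormalDist


def random_effects(effects, variances, level=0.95):
    """DerSimonian-Laird random-effects meta-analysis of k effect sizes."""
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, floored at 0
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    estimate = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # With df = 1 the chi-square tail equals a two-sided normal tail, so this
    # shortcut p-value for Q only covers the two-study case.
    p_q = 2 * (1 - NormalDist().cdf(math.sqrt(q))) if df == 1 else None
    return {"estimate": estimate, "ci": (estimate - z * se, estimate + z * se),
            "q": q, "p_q": p_q, "tau2": tau2}


res = random_effects([0.59, 0.03], [0.04, 0.04])
lo, hi = res["ci"]
print(f"d = {res['estimate']:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], "
      f"Q = {res['q']:.2f}, p = {res['p_q']:.3f}")
```

With these inputs the estimate and Q should land very close to the Step 4 values.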

Step 5 – Run a post-hoc power analysis to see what the combined power of the first two studies was. The n is per cell, so we have n = 100 per cell across the two studies. The d is the estimate from the meta-analysis.
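pwr.t.test() computes power from the noncentral t distribution. A standard-library Python sketch can swap in the normal approximation instead. That slightly overstates power at these sample sizes, but it is close enough to show the logic. The function name power_two_sample is hypothetical.

```python
from statistics import NormalDist


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of effect size d,
    using the normal approximation in place of the noncentral t."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected test statistic under d
    # Probability of landing beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print(f"{power_two_sample(0.31, 100):.2f}")
```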

Step 6 – The post-hoc power is .59, based on a true effect size (ES) of 0.31. This means that if the true ES were 0.31, we would expect the combined estimate from the two studies to be statistically significant 59% of the time. Now we’ll run an a priori power analysis to see how many participants a researcher needs to get 80% power for d = 0.31.
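The a priori calculation can be approximated the same way. This sketch uses the normal-approximation sample-size formula plus a commonly used small-sample correction that adds z²/4 to each group. pwr.t.test() solves the noncentral-t equation directly, so the two can occasionally disagree by a participant. The function name n_per_group is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate per-group n for a two-sided, two-sample t-test.

    Normal-approximation formula plus a z_alpha**2 / 4 small-sample
    correction; rounded up to the next whole participant.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) / d) ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


print(n_per_group(0.31))
```

For d = 0.31 this returns 165 per group.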

Conclusion: The test says my friend needs 165 participants per group to get 80% power for d = 0.31. Of course, if researchers want to be more efficient, they could also try out sequential analysis.

I hope this guide is useful for researchers looking for a practical “what to do” guide in situations involving conflicting study results. I’m also interested in feedback – what would you do in a similar situation? Drop me a line on Twitter (@katiecorker), or leave a comment here.

I have tried out various brief writing assignments to inject a little personality into my Personality Theories course. This semester I decided to try another new one, and so far it is working out pretty well! I asked the students to act as science reporters and write about a recent finding in personality psychology. In my research methods course, students write a lot of article critiques. I wanted this assignment to be distinct from those, so I asked students to write for a more general audience. To raise the stakes even further, I told the students that their pieces would be posted here, on my blog. I’m tagging these posts “Personality Science Student Guest Posts.” I’ll have the first few up shortly, and there’ll be more to come in a few weeks. I’m also providing the instructions I gave to students below, in case anybody else wants to try out the assignment. Enjoy!

I’ve decided to quit academia.edu and ResearchGate and put all of my preprints/manuscripts on PsyArXiv. I deleted any manuscript copies that I had uploaded to academia.edu and RG and removed my accounts from them. I’m writing to you because you posted a copy of our collaborative work on ResearchGate. It is of course your prerogative as to how you share our work, but I thought I might ask you to consider taking that copy of our paper down. I’m trying to streamline access points for our work and also to redirect traffic away from these commercial sites. PsyArXiv is indexed by Google Scholar, so the work remains freely accessible in a space backed by a non-profit entity (the Open Science Framework). Another benefit of OSF is that it is backed by a large preservation grant, so the works on PsyArXiv will be supported in perpetuity even if OSF grows or changes.

I doubt you need this info, but just in case, here’s a bit more about PsyArXiv and its mission:

Will Gervais just posted a really, really cool simulation showing differences in the number of findings discovered by Dr. Power and Dr. Wide Net. Dr. Power runs 100-person-per-condition studies, all day, every day. Dr. Wide Net runs 25-person-per-condition pilot studies and follows up on promising (i.e., statistically significant) ideas. Both researchers have access to a limited number of participants (4,000) in a given year. The question is, which strategy is better for netting creative new ideas?
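The rough Python sketch below makes the two strategies concrete. It is not Will’s code, and several details are assumptions:

* A true effect of d = 0.4 whenever an idea is real.
* Each study is treated as a z-test on the standardized mean difference.
* Pilots use 25 per cell at alpha = .10.
* Significant pilots get a 100-per-cell follow-up at alpha = .05.

All function names and defaults are illustrative.

```python
import random
from statistics import NormalDist

Z = NormalDist()


def run_study(true_d, n_per_cell, alpha, rng):
    """Simulate one two-group study as a z-test on the standardized mean
    difference (SD = 1 assumed known). Returns True if significant."""
    se = (2 / n_per_cell) ** 0.5
    observed = rng.gauss(true_d, se)
    return abs(observed) / se > Z.inv_cdf(1 - alpha / 2)


def classify(counts, effect_is_real, significant):
    """Tally one idea as a hit, miss, false positive, or correct rejection."""
    if effect_is_real:
        counts["hit" if significant else "miss"] += 1
    else:
        counts["false_pos" if significant else "correct_rej"] += 1


def new_counts():
    return {"hit": 0, "miss": 0, "false_pos": 0, "correct_rej": 0}


def dr_power(prior, budget=4000, n=100, d=0.4, alpha=0.05, seed=1):
    """Test every idea once with a full-size study of n per cell."""
    rng = random.Random(seed)
    counts = new_counts()
    for _ in range(budget // (2 * n)):
        real = rng.random() < prior
        classify(counts, real, run_study(d if real else 0.0, n, alpha, rng))
    return counts


def dr_wide_net(prior, budget=4000, pilot_n=25, full_n=100, d=0.4,
                pilot_alpha=0.10, alpha=0.05, seed=1):
    """Pilot every idea; run a full study only after a significant pilot."""
    rng = random.Random(seed)
    counts = new_counts()
    remaining = budget
    # Only start a pilot if a full follow-up could still be paid for.
    while remaining >= 2 * (pilot_n + full_n):
        real = rng.random() < prior
        true_d = d if real else 0.0
        remaining -= 2 * pilot_n
        if run_study(true_d, pilot_n, pilot_alpha, rng):
            remaining -= 2 * full_n
            classify(counts, real, run_study(true_d, full_n, alpha, rng))
        else:
            classify(counts, real, False)  # idea dropped after the pilot
    return counts


for prior in (0.25, 0.50, 0.75):
    print(prior, "Power:", dr_power(prior), "Wide Net:", dr_wide_net(prior))
```

Each call returns counts of hits (true positives), misses, false positives, and correct rejections for one simulated year. Rerunning with different seeds or effect sizes shows how much the comparison depends on those assumptions.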

Luckily for me, Will shared his code. The code is amazing, and Will is modest. It was easy to modify and add a few pieces to find out a few things I wanted to know. Specifically, Will presents the rate of “findings” (aka true positives) that each approach yields. But what about false positives? Missed effects (aka false negatives)? Correct rejections? Are there any differences for these other findings for Dr. Power vs. Dr. Wide Net? My results are below – as figures instead of tables, sorry Will!

Dr. Power is on the right, and Dr. Wide Net is on the left. I ran the simulation at 3 different prior levels (.25, .50, .75), because I’m even lazier than Will claims to be (he’s obviously not, given this awesome sim). The green line represents the total number of ideas tested (I replicate Will’s finding that for Dr. Wide Net, the number of ideas tested goes down as the prior goes up, whereas for Dr. Power, the number of ideas tested is a direct function of n/cell and total N).

The yellow-y line is the number of true positives (“findings”) identified. Just as Will found, I find that as the prior goes up, Dr. Power finds more findings. (Note that my simulation is done with the alpha for Dr. Wide Net’s pilot studies set at .10, so the same as Will’s Table 2).

The purple line is the number of findings that represent true negatives (i.e., no effect exists, and the test returns non-significant). These go down as the prior goes up, definitionally.

The blue line represents the number of misses – true effects that go undetected. Dr. Wide Net has a ton of these! Dr. Power barely misses out on any effects. This makes sense, because Dr. Wide Net is sacrificing power for the ability to test many ideas. Lower power means that there will be more missed true effects, by definition. (However, for both Drs., misses increase as the prior increases. I don’t actually know why this is. Why should power decrease as the prior increases? Readers?)

Now here’s where it gets really strange. It’s almost imperceptible in the graph above, but the rate of false positives is higher for Dr. Power than it is for Dr. Wide Net. Neither doctor has a particularly high false positive rate, but Dr. Power’s rate is higher. What’s going on? My hunch is that Dr. Wide Net’s filtering of the effects she studies (via pilot testing) is helping to lower the overall false positive rate of her studies.

Let’s look at these results another way:

Here we can see more clearly that false positives make up a larger share of Dr. Power’s studies than of Dr. Wide Net’s (this figure shows the percentage of studies that yield each type of result). Recall that Dr. Wide Net runs way, way more studies.

Another way to think about this is the False Discovery Rate (FDR), the proportion of statistically significant findings that are false positives. We can also consider the False Omission Rate (FOR), the proportion of non-significant findings that are actually real effects (i.e., misses, or false negatives). Here’s a graph:

Dr. Power does have a higher false discovery rate (but the FDR decreases as the prior increases). Dr. Wide Net’s false discovery rate is almost zero. So this is a little weird, because it almost seems like a win for Dr. Wide Net.

BUT – and there’s always a but!

Dr. Wide Net’s False Omission Rate is off the charts. With a 50-50 prior, about 40% of Dr. Wide Net’s non-significant results are actually real effects. By contrast, with the same prior, only about 18% of Dr. Power’s non-significant results are real effects. Take this finding together with efficiency (again, Dr. Wide Net has to run tons more studies than Dr. Power), and I’m pretty sure the lower false discovery rate isn’t worth it.
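Both rates are simple ratios of the four outcome counts. Here is a minimal sketch; the function name is hypothetical, and the counts in the usage line are made up for illustration, not simulation results.

```python
def error_rates(hit, miss, false_pos, correct_rej):
    """Return (false discovery rate, false omission rate) from outcome counts."""
    significant = hit + false_pos          # all significant results
    nonsignificant = miss + correct_rej    # all non-significant results
    fdr = false_pos / significant if significant else float("nan")
    f_or = miss / nonsignificant if nonsignificant else float("nan")
    return fdr, f_or


# Hypothetical counts, for illustration only.
fdr, f_or = error_rates(hit=8, miss=2, false_pos=1, correct_rej=9)
print(f"FDR = {fdr:.2f}, FOR = {f_or:.2f}")
```

The keyword names match a dictionary with those four keys, so you can pass a tally in directly with error_rates(**counts).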

My code (a slightly modified version of Will’s) is here. I welcome corrections and comments!

SPSP 2016 has just wrapped up and with it another year of fantastic meetings and discussion. This year, I (together with Jordan Axt, Erica Baranski, and David Condon) hosted a professional development session on daily open science practices – little things you can do each day to make your work more open and reproducible. You can find all of our materials for the session here, but I wanted to elaborate on my portion of the session concerning pre-registration.

A person approached me after the session and told me the following:

“I want to give this pre-registration thing a try, but I don’t know where to start. How can I show an editor that my work is pre-registered?”

So here it is: a how-to guide to pre-registration. As I said at SPSP, there is no single perfect way to pre-register. Scientists can choose to pre-register in one of three ways:

* Locally: nothing online, just some documentation for themselves.
* Privately: the pre-registration plan is posted online, but with closed access.
* Publicly: the pre-registration plan is posted online, in a registry, and free for all to see.

The key ingredient across all of these approaches is that pre-specifying the researcher’s plan constrains flexibility in analysis and design (more on that in a bit). For now, let’s consider the options one by one.

1. Internal-only pre-registration: You document your plans locally, for yourself and perhaps your team, with nothing posted online.

Pros: Pre-registration in any form helps you slow down and be more sure that your project can test the question you want it to. I would argue that the quality of science improves as a result. You have protection, even if only to yourself and your team, against over-interpreting an exploratory finding (by decreasing hindsight bias or reducing hypothesizing after the results are known, aka HARKing).

Cons: An editor or reviewer doesn’t have evidence, apart from your word, that the pre-registration actually happened. A scientist’s word is worth a lot, but when it comes to convincing a skeptic, you might have a tough time.

Options: Your imagination is the limit when it comes to internal documentation. You could go old-school and write long-hand in ink in a lab notebook. You could use Evernote, Google Docs, or some other kind of cloud-based document storage. The key is that you make your notes to yourself (and perhaps your local team), and those notes don’t get edited later on. They are just a record of your plans. You would also benefit from using a standard template (more on templates in a minute), if only so that you don’t forget to think through the most important factors in your study (trust me, forgetting happens to the best of us).

2. Private pre-registration: Same as internal-only pre-registration, except that you post the pre-registration privately to a repository. Private pre-registrations can be selectively shared with editors and reviewers to prove that a pre-registration occurred as specified.

Pros: You cannot be “scooped,” meaning your ideas stay private until you choose to share them. You can still definitively prove that your (perhaps unorthodox) analysis was the plan all along.

Cons: You cannot attract collaborators, either. Others working in a similar area don’t know what you’re up to, and you might miss out on a valuable collaboration. For the field writ large, this isn’t a very attractive long-term option, because we also don’t get a record of abandoned projects: studies that for whatever reason don’t make it past the data collection stage and into the published literature.

Options: For easy private pre-registration, you can’t beat aspredicted.org. One author on the team simply answers 9 questions about the planned project, and a .pdf of the pre-registration is generated. Pre-registrations can stay private indefinitely on aspredicted, but authors do have the option to generate a web link to share with editors/reviewers.

Another option is the Open Science Framework (osf.io). The OSF has a pre-registration function that researchers can choose to keep private for up to 4 years (at which point the pre-registration becomes public). The pre-registration function freezes the content of an OSF project so that a record of the project is preserved and can no longer be edited. As an alternative to the pre-registration function, OSF timestamps all researcher activity on the site and allows researchers to keep their (non-registered) projects private indefinitely. A researcher could therefore post a document containing a pre-registration to their private project. The OSF timestamps would then prove to an outside party when the pre-registration occurred, relative to when data were collected. Because this workaround is clunky, researchers who want indefinitely private pre-registrations will likely prefer aspredicted.org. Alternatively, they can use OSF and accept that their pre-registrations will become public after the researcher-determined embargo period of up to 4 years.

Again, the public vs. private distinction has downstream consequences for the field, because public pre-registrations allow researchers to understand the magnitude of the file drawer problem in a given area of the literature.

3. Public pre-registration: Same as private pre-registration, except that researchers post their plans publicly on the web.

Pros: Fully open, complete with mega-credibility points. Your work is fully verifiable to an outside party. Outside parties can contact you and ask to collaborate. As a side note, we all have projects that are interesting and potentially fruitful, but that get left by the wayside due to lack of time or other constraints. To me, pre-registration (or really any form of transparent documentation) is a way of keeping track of these projects and letting others pick them up as the years go on. I have this fantasy that when a student joins my lab, I’ll be able to direct them to the documentation of an in-progress, but stalled, project, and they’ll pick it right back up where the previous student left off. So increased transparency and better record keeping have potential benefits beyond the Type I error control that proponents of pre-registration are so quick to note.

Cons: Scooping? I’m not sure this is a real concern, but insofar as people have anxiety about it, it needs to be addressed. If you make your whole train of logic/program of research fully transparent, there is always the risk that someone better/smarter/faster/stronger than you will swoop in and run off with the idea. To me, the potential for fruitful collaborations far outweighs the risk of scooping, and actually both are trumped by a third possibility, which is that all this documentation won’t attract much attention at all. In my own experience, a handful of people are interested, but mostly my work goes on as usual. Others have noted that public pre-registration actually could help you stake a claim on a project, insofar as you are able to demonstrate the temporal precedence of the idea relative to the alleged scoop-er. A final con is that there is a time cost to getting the study materials up to snuff for public consumption. However, as I noted before, the quality of the work likely increases, and the project is less likely to get shelved if a collaborator loses interest or there are other hiccups down the road. I’m a big fan of designing studies so that they are informative, null results or not, so that there is (ideally) no such thing as a “failed” study, and instead only limitations in our time, motivation, and fiscal resources to publish every (properly executed) study. Doing a good job of documentation on the front end of a project means that even if you never get around to publishing a boring/null/whatever result, a future meta-analyst could, with some ease, find your project and incorporate it into their work.

Options: The OSF is likely to be your best bet at this point. Although OSF is a powerful, flexible system, it is not the most user-friendly for beginners. However, the time invested in learning the system more than pays for itself down the road. Anna van’t Veer and Roger Giner-Sorolla have a nice step-by-step guide that explains how to create and pre-register a new project on OSF. The Center for Open Science pre-registration challenge also has a bunch of materials that will help you get started. And if you want to do the pre-registration challenge, and you’re an R user, you’ll definitely want to check out Frederik Aust’s prereg package for R.

Regardless of which option you choose to pursue, I would encourage you to think about using a template (either make your own or use someone else’s) so that you get all of the most important details of your project ironed out ahead of time. It will definitely happen that once you have your data in hand, you realize that you’ve forgotten to specify something important. That’s OK, and you ought to just honestly report such discrepancies and move on. Don’t let perfect be the enemy of done.

Regardless of occupation, age, or social status, all people can relate to the difficulty of achieving ideal performance under challenging circumstances. Being able to brush off the stress of a demanding situation and produce desirable results is often referred to as ‘mental toughness.’

Mental toughness is commonly invoked in a sports context, like when the main character in the stereotypical feel-good sports movie overcomes all odds to win the big game. However, mental toughness as a concept applies to a broad range of contexts, including education. The average college student uses mental toughness when they forgo going out with friends on a Friday night to study for a difficult test instead. Examples of mental toughness can also be found in both workplace and military environments. Despite pervasive mentions of mental toughness, the term lacks a substantive definition.

A recent study by Gucciardi et al. in the Journal of Personality aimed to produce a working definition of mental toughness. The study also sought to characterize features of mental toughness, including whether or not it could be recognized as a trait or a product of certain situations. Additionally, the researchers examined if the traditional positive association between mental toughness and successful performance, as well as the negative relationship between mental toughness and stress levels, would be affirmed. The study consisted of five smaller studies, each aimed at addressing a subcomponent of mental toughness.

The first study focused on creating a composite definition of mental toughness that incorporated definitions and concepts from previous research. The researchers organized focus groups and polls with a combined 30 experts in fields related to mental toughness, including researchers, students, athletes, coaches, and businesspeople. The researchers used this consultation and sampling of experts to eliminate terms unrelated to mental toughness, and create a working definition of the term that was both face and content valid (meaning that it both seemed valid, and covered all of the theoretically relevant material). Ultimately, Gucciardi et al. (2015) defined mental toughness as a “personal capacity to produce consistently high levels of subjective (e.g. personal goals or strivings) or objective performance (e.g. sales, race time, GPA) despite everyday challenges and stressors as well as significant adversities” (p. 28).

The second study developed an eight-item measure of mental toughness. This study found that mental toughness is unidimensional rather than multidimensional. In other words, the items capture a single underlying characteristic rather than several distinct subcomponents. The third study used the newly developed measure to evaluate whether mental toughness was correlated with stress or workplace performance. The researchers surveyed the stress levels of friends, and then had the participants’ work supervisors report on their performance.

Ultimately, the researchers found that mental toughness was directly associated with positive reports from supervisors. Those who had higher levels of mental toughness were also less likely to be stressed and more likely to have better methods of coping with stress. Apparently, the commonly held belief that mental toughness breeds success has some statistical basis.

The fourth study explored the relationship between mental toughness and psychological health. Researchers surveyed both the presence of positive emotions and the absence of negative symptoms of mental health in order to test their prediction that mental toughness would be positively related to psychological health. Ultimately, mental toughness emerged as a good predictor of not only negative emotional states, but also positive emotions. Additionally, researchers asserted that both differences between and within people contribute to the level of mental toughness realized in a given situation. This finding is consistent with the notion that it is neither the person nor the situation that determines a person’s behavior, but rather the interaction of the two factors. Furthermore, the researchers indicated that mental toughness operates on a continuum, rather than being a dichotomous variable. Thus some people have greater mental toughness than others, as opposed to either having or not having mental toughness.

Having already shown that mental toughness is positively correlated with successful performance, the final study analyzed whether mental toughness predicted sustained performance. Interestingly enough, the researchers framed this study within the context of a military selection test. The results indicated a significant association between mental toughness and passing the selection test. That association held even after accounting for additional factors like self-efficacy (an individual’s belief in their ability to perform the behaviors needed to succeed).

Overall, this study provided a wealth of knowledge on mental toughness, although it was not without its flaws. Weaknesses of note include a dependence on self-report data as well as the lack of a causal framework. Although self-report data are easy and cheap to obtain, asking individuals about their own characteristics can be subject to bias or dishonesty. Regarding causality, the researchers used exclusively correlational designs: none of the five studies actively manipulated a variable or randomly assigned participants to conditions. This is no fault of the researchers, since you cannot manipulate participants’ mental toughness. It does, however, prevent them from claiming, for example, that being high in mental toughness causes individuals to be successful in the workplace.

For all five of the studies, the samples consisted entirely of ‘white collar’ workers. This sampling choice omits a significant portion of the population, notably individuals who work in physically demanding occupations. Future research could examine variations in mental toughness across job types or socio-economic status. Cross-cultural differences in mental toughness would also be worth examining.

However, this study did generate a straightforward definition of mental toughness, which is no small feat. More than anything, the researchers demonstrated that mental toughness is not just a term used to describe a composition of traits. So the next time a friend questions your choice not to go out for drinks, tell them you are exercising mental toughness and point them in the direction of this article.

The language you speak affects many aspects of your life, including, according to recent research, your personality. Psychologists Chen, Benet-Martínez, and Ng looked at whether the language that Chinese-English bilinguals speak affects how they perceive personality.

Much of this study relies on the idea of “dialectical thinking,” so let’s define that first. Essentially, dialectical thinking is the acceptance of contradictory, ambiguous, or inconsistent information. It is largely tied to Eastern philosophy, and it pops up again and again when looking at cultural differences between East and West. From proverbs to arguments to self-descriptions, Easterners tend to be okay with things not quite lining up. Westerners, on the other hand, tend to be low in dialectical thinking: they like everything to make sense and stay the same.

The researchers predicted that speaking Chinese would draw out these dialectical thinking tendencies – the tendency not to force everything to fit together into one cohesive whole. That means that they thought Chinese speakers would notice more differences in personality and behavior (both in themselves and in others).

In order to test this, the researchers first had to test whether speaking a different language really does elicit different levels of dialectical thinking. They did so by recruiting college students who could speak both English and Chinese. They gave these participants a test measuring their dialectical thinking in both languages. Lo and behold, higher levels were found when responding in Chinese. When different participants were randomly assigned to respond in either Chinese or English, the Chinese group once again showed higher dialectical thinking.

Previous research has shown a cultural difference in dialectical thinking: Chinese people tend to be more tolerant of contradictions than Americans. This study goes one step further. In the exact same people, the level of dialectical thinking changes depending on which language they are speaking.

This study also looked at whether the language of the questions affected how participants rated personalities. In both Chinese and English, participants rated their own personality, as well as the personalities of “typical” native Chinese and English speakers. Researchers then calculated how different all these ratings were from each other. They found that differences were significantly larger in Chinese than in English. Participants responding in Chinese were more likely than those responding in English to assign different personalities to different people.

So the researchers had it pretty locked down that these differences exist, on paper at least. But what about in actual interactions between people? Do these results carry over into behavior?

To test this, participants spoke with research assistants in English and in Chinese. They were then asked if they thought they behaved differently when speaking one language or the other. Those who were higher in dialectical thinking were more likely to report that they acted differently in the two situations. The research assistants, the other half of the conversation, were also more likely to report large behavioral differences in participants high in dialectical thinking. The same was true of observers who just watched a video of the participant speaking.

Now that seems like a lot of ratings, but hear me out. Not only do participants think that they are acting differently in different situations, but strangers, people watching these random conversations, also see the participant acting differently. They’re actually changing in some significant, noticeable way depending on what language they are speaking.

All this discussion of language and behavior and “dialectical thinking” circles around one main idea – culture affects how we act. It’s as simple as that. Well, sorta.

Using a certain language evokes aspects of its associated culture. When you speak Chinese, you’re more likely to act in accordance with Eastern culture (higher dialectical thinking, more comfort with contradictions). And when you speak English, you’re more likely to act in accordance with Western culture (lower dialectical thinking, wanting things to be consistent).

Now this study is not without its faults. All of the participants were bilinguals, which may in and of itself account for higher perception of differences. Bilingualism alone does not, however, explain away the differences within this group of bilinguals. There is also the fact that the participants were Chinese. They did have to know English well to be selected for this study, but their (presumably) higher fluency in Chinese may still have produced the more complex and varied reports of personality. Maybe they simply didn’t have as firm a grasp of English, and therefore couldn’t account for its nuances.

Either way, this study looks at how you see yourself, how you see others, even how you act – and it finds that culture, as drawn to the surface by what language you’re speaking, affects all of those things. Your culture has a lot to say about who you are, and language is a big part of that.

It’s fairly clear that an individual’s personality changes over the course of their lifetime, but most of the studies demonstrating that change only cover certain ages. Previous studies that have been quite successful in tracking personality change over the lifespan did not include substantial numbers of older adults in their samples (e.g., Lucas & Donnellan, 2011; Roberts & DelVecchio, 2000). Researchers Kandler, Kornadt, Hagemeyer, and Neyer decided to try to fill this age gap. Their work attempted to answer some of the underlying psychological questions behind phenomena everyone witnesses, such as “Why does Grandma hate everything from the 21st century?” Kandler and his colleagues ran a longitudinal study of twin pairs aged 64 to 89. Previous studies with restricted age ranges might suggest that personality stops changing late in life, but the researchers found that adults in later life still experience significant personality change.