Acquiescence and Social Desirability: Psychometric Bogeymen

Just as young children can be easily led to believe in non-existent monsters that hide under beds, ready to attack at night, it seems that many persons have been easily led to believe in psychometric bogeymen that lurk in the shadows, waiting to destroy the validity of personality tests. The most well-known such spectre is social desirability bias. Social desirability bias is the alleged tendency to claim socially-valued personality characteristics that you do not possess and to deny the possession of socially undesirable personality traits. Obviously if people respond to personality items only in ways they believe make them look good rather than the way they really are, this will invalidate the results of the personality test.

Another (and less well-known) hobgoblin that allegedly destroys personality test validity is acquiescent response bias (sometimes called yea-saying). Acquiescence refers to the tendency to agree with all personality items, regardless of the content of the item. If acquiescent response bias is operating, it is easy to see how this would invalidate the results of a test. Instead of selectively endorsing statements that actually apply to them, acquiescent respondents indiscriminately agree with all statements, therefore failing to describe their distinctive personality characteristics.

After over 30 years in the personality business, I normally don't think much about psychometric bogeymen such as social desirability bias and acquiescent response bias. After all, I was taught in graduate school that researchers such as Jack Block (The Challenge of Response Sets, 1965) and Leonard Rorer ("The great response style myth," Psychological Bulletin, 1965) long ago laid to rest the rumor of response bias bogeymen. Those who continue to believe in response biases today are like children who cling to irrational fears about monsters.

But very recently the response bias issue reared its ugly head in a personal way. I had submitted for publication a manuscript on the development of a shorter version of the IPIP-NEO personality inventory. The IPIP-NEO is a 300-item questionnaire that measures constructs similar to the most widely-used commercial measure of the Five-Factor Model, the NEO Personality Inventory, Revised (NEO PI-R). The IPIP-NEO is a great free version of the NEO PI-R. It uses items from the public domain website, the International Personality Item Pool. If you like, you can take an online version of the 300-item IPIP-NEO and receive a narrative report describing your standing on the five broad major domains of personality and on 30 narrower facets of personality.

The only problem with the original IPIP-NEO is that it is long. Very long. At 300 items, it is even longer than the 240-item NEO PI-R on which it was based. That is why I constructed a shorter version by identifying 120 of the original 300 items that seemed capable of representing the five broad domains and 30 facets without much drop-off in reliability. The reviewers and editor for my manuscript thought that this shorter inventory, which I call the IPIP-NEO-120, had good potential, but they had one serious reservation: I had not made an effort to identify equal numbers of positively-worded and negatively-worded items for all 30 facets. For example, the Anxiety facet scale had all positively-worded items, where agreement represents the presence of anxiety: "Worry about things," "Fear for the worst," "Am afraid of many things," and "Get stressed out easily." I could have included negatively-worded items from the original IPIP-NEO such as "Am not easily bothered by things" and "Am relaxed most of the time," but I did not. (I did not because I was more concerned about creating scales with the highest degree of internal-consistency reliability.)

The problem with not including equal numbers of positively- and negatively-worded items, according to the reviewers, is that individuals who are prone to acquiescent response bias will score higher than they should on scales with all positively-worded items and lower than they should on scales with all negatively-worded items. (The converse would be true for people with a bias to disagree with personality items, sometimes called nay-saying bias.)
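To make the reviewers' concern concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything taken from the IPIP-NEO-120: a 1-to-5 response scale, a hypothetical respondent whose acquiescence pushes every raw answer one step toward "agree," and made-up four-item scales. It shows only the arithmetic of why an all-positively-worded scale would be inflated while a balanced scale would not.

```python
# Illustrative only: 1-5 Likert responses and a hypothetical acquiescent
# respondent who shifts every raw answer one step toward "agree".

def score_scale(responses, keys, scale_max=5):
    """Sum item scores, reverse-scoring negatively keyed items (key == -1)."""
    return sum(r if k == 1 else (scale_max + 1 - r)
               for r, k in zip(responses, keys))

def add_acquiescence(responses, shift=1, scale_min=1, scale_max=5):
    """Shift every raw response toward agreement, staying within the scale."""
    return [min(scale_max, max(scale_min, r + shift)) for r in responses]

honest = [3, 3, 3, 3]                    # raw answers without any bias
acquiescent = add_acquiescence(honest)   # the same person, yea-saying

all_positive = [1, 1, 1, 1]              # every item positively worded
balanced = [1, 1, -1, -1]                # two positive, two negative items

# All-positive scale: the yea-sayer's score is inflated.
print(score_scale(honest, all_positive), score_scale(acquiescent, all_positive))
# Balanced scale: the shift on positive items cancels the shift on negative ones.
print(score_scale(honest, balanced), score_scale(acquiescent, balanced))
```

Whether real respondents actually behave like this hypothetical yea-sayer is, of course, the empirical question.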

Fortunately, the editor and reviewers gave me an out. They suggested a procedure for examining whether or not acquiescent response bias plus imbalance between positively- and negatively-worded items was adversely affecting the IPIP-NEO-120. If you are interested in the technical details of the procedure, you can find a description in the appendix of an article by Soto, John, Gosling, and Potter, published in 2008 in the Journal of Personality and Social Psychology (Vol. 94, pp. 718-737). But, basically, the procedure involves creating an Acquiescence Index (AI), a measure of individuals' tendencies toward acquiescent response bias, and then seeing whether correcting for AI produces higher validity coefficients and a clearer factor structure.
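For readers who want a feel for the idea, the following is a simplified sketch and not the exact procedure in the Soto et al. appendix. It assumes that the index is the mean raw response across pairs of items with opposite implications, and that the correction subtracts each person's index from his or her raw responses before scoring. The item labels, the pairs, and the responses are all hypothetical.

```python
# Simplified sketch of an acquiescence index; item labels, pairs, and
# responses are hypothetical, and the published procedure may differ in detail.

def acquiescence_index(responses, opposite_pairs):
    """Mean raw response over items belonging to pairs with opposite meaning.

    A content-driven respondent agrees with one item of a pair and disagrees
    with the other, so the mean sits near the scale midpoint. A yea-sayer
    agrees with both, which pulls the mean upward.
    """
    values = [responses[item] for pair in opposite_pairs for item in pair]
    return sum(values) / len(values)

def center_responses(responses, index):
    """Subtract the person's own index from every raw response."""
    return {item: value - index for item, value in responses.items()}

# One hypothetical respondent on a 1-5 scale.
responses = {"worry": 5, "relaxed": 3, "talkative": 4, "quiet": 4}
pairs = [("worry", "relaxed"), ("talkative", "quiet")]

ai = acquiescence_index(responses, pairs)
corrected = center_responses(responses, ai)
print(ai)
print(corrected)
```

Scale scores and factor analyses are then recomputed from the centered responses and compared with the uncorrected versions.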

So, I followed the procedure and guess what? IPIP-NEO-120 scales that, when scored normally, had correlated significantly with relevant acquaintance ratings, now failed to correlate with the ratings when scores were adjusted for acquiescent response bias. And the average factor congruence between the IPIP-NEO-120 and the parent NEO PI-R was .93 when scales were scored normally; after correction for acquiescence, average congruence decreased to .70. In short, correcting for acquiescent response bias decreased the validity of the IPIP-NEO-120 scores. My conclusion? Acquiescent response bias is a psychometric bogeyman. Like any imaginary monster, it is best ignored.
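For anyone unfamiliar with factor congruence, a common way to compute it is Tucker's congruence coefficient: the sum of the products of corresponding loadings, divided by the square root of the product of the two sums of squared loadings. The sketch below assumes that coefficient, and the loadings in it are invented purely for illustration; they are not values from the IPIP-NEO-120 or the NEO PI-R.

```python
import math

def tucker_congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two columns of factor loadings."""
    if len(loadings_a) != len(loadings_b):
        raise ValueError("both factors need loadings for the same variables")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm = math.sqrt(sum(a * a for a in loadings_a) *
                     sum(b * b for b in loadings_b))
    return cross / norm

# Hypothetical loadings of the same five scales on one factor in two inventories.
short_form = [0.72, 0.65, 0.58, 0.10, -0.05]
parent_form = [0.70, 0.60, 0.62, 0.15, 0.02]

print(round(tucker_congruence(short_form, parent_form), 3))
```

A value of 1 means the two loading patterns are proportional, and values near 0 mean they are unrelated, so a higher average congruence indicates that the short form reproduces the parent inventory's structure more faithfully.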

The issue of social desirability, I believe, is a little more complicated than the alleged problem of acquiescent responding, and I am not going to present data here to discredit the existence of social desirability responding. Instead, I would simply like to discuss the alleged problem of social desirability and to make a few points as food for thought. First, let me describe the position of researchers and practitioners who see social desirability bias as a threat to personality scale validity.

The principal worry of those who take social desirability bias seriously is that some people care more about making a good impression on a personality test than about objectively describing their actual thoughts, feelings, and behaviors. Different researchers have different opinions about how many people are affected by this bias. One of the original social desirability theorists, Allen Edwards, was fond of pointing out that the frequency of agreement with personality items correlated .80 or more with the independently rated social desirabilities of the items, which might imply that for everyone the probability of agreeing with an item was almost completely determined by the social desirability of the item.

Yet most researchers recognize social desirability bias as an individual-differences variable, with some people strongly affected by it and others less so. Edwards himself created a Social Desirability Scale from MMPI items to measure differences in the bias. Another widely-used measure of social desirability bias is the Marlowe-Crowne Social Desirability Scale. Whereas the Edwards scale is heavily laden with psychopathology, the Marlowe-Crowne items refer to uncommon but socially desirable behaviors. Still another well-known measure is the Good Impression scale from the California Psychological Inventory. This scale was created by having a group of subjects respond to a set of items twice: first as they ordinarily would, and then with instructions to respond as if they were trying to create as favorable an impression as possible. Items answered differently between the two occasions became the Good Impression scale. Finally, probably the most sophisticated effort at measuring social desirability today is a set of scales authored by Del Paulhus, called the Balanced Inventory of Desirable Responding (BIDR). The BIDR contains two scales, one to assess unintentional tendencies to present an exaggerated positive image of oneself, and a second to assess intentional, deliberate tendencies to create a positive impression. Paulhus describes his measure, as well as the Edwards Social Desirability Scale, the Marlowe-Crowne, and Good Impression, in a 1991 book chapter.

As you see, there has been a considerable amount of research over the past 50 years on social desirability bias. Many researchers still take social desirability bias as a serious threat to personality scale validity. Often when a new personality measure is introduced, people will want to know how strongly the scale correlates with one of the aforementioned measures of social desirability bias. If the correlation is too high, researchers will say that the scale is "contaminated" by social desirability, which means that respondents may be answering in terms of the desirability rather than the content of the items.

Practitioners are also gravely concerned about respondents providing desirable rather than honest answers, because such bias could cause incorrect decisions to be made about the respondents. A prime example of this would be the use of personality measures in personnel selection. Companies are generally looking for individuals with socially positive traits, individuals who are friendly, cooperative, conscientious, and free from anxiety, depression, and hostility. If job applicants exaggerate their positive traits and minimize their negative traits, the wrong kind of people might be hired for the job.

Now you have heard the concerns of researchers and practitioners about social desirability bias. If you had not thought about this issue before, you might find yourself agreeing that this bias represents a serious threat to the validity of personality scores. And you should at least consider the possibility that such a threat exists because it is reasonable to think of social desirability bias as a potential threat. It is easy to imagine job applicants answering questions in ways that they think will make them look like the kind of person who would get hired, rather than the kind of person they are. And what would stop them from doing this?

Let me now raise some points that might make you think twice about whether the potential for biased responding actually occurs in real-life testing situations. (I won't discuss artificial laboratory settings, where it is easy to obtain artificial results.) Let's begin with that situation where many psychologists think that people would be most likely to bias their answers in a socially desirable way: personnel selection. Here's the question I would like to raise: when job applicants engage in a face-to-face interview, do they respond to the interviewer's questions with blunt, disinterested, objective honesty, or do they try to tailor their answers in ways that they think will make them look good? Obviously any interviewee with at least a modicum of social intelligence will try to look good. So the real question is not whether job applicants are trying to create a good impression on personality tests (of course they are), but whether their responses are any more or less valid than the responses of a job applicant in a face-to-face interview.

Those who worry about social desirability bias on tests might speculate that it is easier to say untrue things on a personality test because in a live interview the interviewer can compare the applicant's verbal and nonverbal behavior to determine how truthful the applicant is being. But that is just speculation. One can just as easily hypothesize that being the focus of an interviewer's few pointed questions might cause an applicant to be even more concerned about saying what he or she thinks the interviewer wants to hear. On the other hand, when taking a personality inventory (especially a long one) there are so many questions to answer that trying to figure out the "desirable" answer becomes burdensome. It is much easier to go with your automatic reactions, which reflect your habitual (i.e., typical, valid) personality tendencies.

You might want to consider the possibility that responding to items on a personality inventory is not so different from conversing with people in everyday life. When we interact with people in everyday life, our main goal is usually not to communicate lists of objective, scientific facts about ourselves. Instead, we are trying to achieve a variety of interpersonal goals such as entertaining others so they will like us, impressing potential romantic partners, eliciting help when we feel weak or unwell, convincing rivals that we are smarter or tougher than they are, and so forth. A person's "personality" represents his or her habitual, typical strategies—both verbal and nonverbal—for pursuing goals that are important to the person. For several decades I have been suggesting that personality item responses are an extension of a person's habitual interpersonal style, not a detached, disinterested, objective reporting of behavior. Here is an early article on the topic; here is a more recent one.

Another point I think you should consider is that, in everyday life, not everyone is equally interested in creating a socially desirable impression. Some people (e.g., professional critics) strike us as cold, harsh, and unfeeling. Other individuals (e.g., delinquents) come across as tough, aggressive, and hostile. Still others (e.g., hypochondriacs) whine about their inability to cope with life. Depressed people express sorrow and lack of pleasure. Anxious people constantly talk about their unrealistic worries. None of these interpersonal styles would be described as "socially desirable" (although the styles do accomplish certain interpersonal goals). And guess what? On valid personality inventories, critics score low on agreeableness, delinquents score low on both agreeableness and conscientiousness, and the scores of neurotic people reflect the specific dysphorias they express in everyday life. Fact: people do not always answer personality items in socially desirable ways. Personality scores predict both desirable and undesirable real-life behaviors.

The conclusion I have drawn about social desirability is that people vary widely in the social desirability of their everyday behavior, and these differences tend to be reflected correspondingly in the social desirability of their responses to personality items. (There could be exceptions to this, discussed intelligently by Del Paulhus in the Future Research Directions section of his book chapter.) This means that if we try to remove social desirability statistically, we will decrease the validity of personality scores, and that is exactly what McCrae and Costa (1983) found. Just like when I removed acquiescent response bias from the IPIP-NEO-120. Acquiescence and social desirability? Bogeymen.

I am really glad that you believe that Social Desirability is a Psychometric Bogeyman because I have always wanted to use personality tests as an additional aid in selecting staff but believed implicitly that "surely it would be easy to cheat on a personality questionnaire; one just has to choose the answers that will make him/her appear good to others." But you say that this can’t be done! I would like now to make two comments arising from reading your post and from personal experiences, bearing in mind that my knowledge and expertise in psychology do not go beyond my reading books and articles written for the general public.

1. I would think that one way of testing for the validity and for improving a personality test would be to take a representative sample of those who have done it and then track their performance in their jobs as observed by colleagues and line managers after a few years of working together. Has such a study been undertaken?

2. Personal face-to-face interviews are dynamic and allow for more freedom of exploration compared with a test questionnaire. One forms and reforms questions according to the responses and body language one receives and perceives. For example, I remember interviewing a person for a senior position at an institute who was very good on paper, but I noticed that he had lied in answering a question (he turned a bit red, sweated and scratched the back of his neck), which prompted me to ask some more specific questions and made me conclude that he was not the candidate we were looking for. But since we had no alternative, we employed him after engaging in some doubt about our perception of him and a bit of wishful thinking. Unfortunately we had to ask him to leave within six months because his behaviour confirmed our initial face-to-face assessment of him and contradicted what he had put down in his CV. What evidence is there that such a person couldn’t cheat a personality test? And should I rely more on a personality test than on my impressions in a face-to-face interview?

Ibn, it is standard procedure to follow up those who have taken a personality test to see how well it predicts performance. Normally we choose tests that have already been demonstrated to predict job behavior, or we choose tests that measure differences in performance among incumbents. Properly chosen tests predict future behavior well.

The validity of interviews depends on the perceptiveness of the interviewer, and interviewers often overrate their own perceptiveness. The flexibility of interviews could be considered an advantage, but it is also a problem in that the procedure then becomes non-standardized. Fortunately we do not have to choose between personality tests and interviews, and I would not want to rely totally on one or the other.

I'm not saying that cheating on personality tests never occurs. Neither would I say that people cannot lie convincingly during interviews. We do have validity scales that can detect attempts to dissemble, if you are worried about that. One kind of validity scale is called Unlikely Virtues. Each item describes a generally desirable characteristic that is probably rare in the general population. A person who endorses too many of these items is almost surely exaggerating his or her desirable traits.
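In case it helps to see how simple such a scale is to score, here is a minimal sketch. The five true/false answers and the cutoff of four endorsements are hypothetical placeholders that I made up for illustration; a real Unlikely Virtues scale has its own items and an empirically chosen cutoff.

```python
# Minimal sketch of scoring an Unlikely Virtues-style validity scale.
# The answers and the cutoff below are hypothetical, not from any real scale.

def unlikely_virtues_check(endorsements, cutoff):
    """Count endorsed items and flag the protocol if the count reaches the cutoff."""
    count = sum(1 for endorsed in endorsements if endorsed)
    return count, count >= cutoff

# One respondent's answers to five rare-virtue items (True = endorsed).
answers = [True, False, True, True, True]

count, flagged = unlikely_virtues_check(answers, cutoff=4)
print(count, flagged)
```

A flagged protocol is a signal to interpret the person's other scores with caution, not proof that any particular answer is false.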