Thursday, November 2, 2017

Statistical Sins: Hello from the 'Other' Side

I'm currently analyzing data from our job analysis survey (in fact, look for a post in the near future about why it's important for psychometricians to remember how to solve systems of equations), and saved the analysis of demographics for last. Why? Because I'm fighting a battle with responses in the 'Other' category for some of these questions. I think I'm winning. Maybe.

I've blogged about survey design before. But I've never really discussed this concept of having an 'Other' category in your items. You assume when you write a survey that people will find a selection that matches their situation, and for those few cases where no options match, you have 'Other' to capture those responses. You then ask people to specify their 'Other' response, so you can create additional categories to capture them in tables.

Or, you know, you could get what people actually do: a bunch of respondents whose situations perfectly fit one of the existing categories select 'Other' anyway, then write an almost word-for-word version of an existing category in the specify box. For instance, on one item on our survey, 36 people selected 'Other'. After I looked at their responses and moved the ones that fit an existing category into that category, I had 7 'Other's left.

Actually, that's the second-best outcome to hope for when allowing for 'Other.' More often, you get vaguely worded responses that could fit any number of existing categories, if only you had the proper details. For example, on another item on our survey, I have 17 'Other's. I have no idea where to put two-thirds of them, because they lack enough detail for me to choose among 2-3 existing options.

Fortunately, those two items are the standouts; for the remaining questions with 'Other' options, only 4-5 people selected them. Even when those few 'Other' responses are gibberish, I'm not losing a lot of cases by calling them unclassifiable. But still, going through 'Other' responses is time consuming and requires a lot of judgment calls.
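For what it's worth, the recoding step I described above can be sketched in a few lines of pandas. Everything here is hypothetical (the column names, the category labels, and the lookup table you'd build by hand while reading each 'Other' response), but the pattern is the same: map the cleaned free text back to an existing category where it matches, and leave the rest as 'Other.'

```python
import pandas as pd

# Hypothetical survey data: one item with fixed categories,
# plus the free text from the 'Other, please specify' box
df = pd.DataFrame({
    "role": ["Manager", "Other", "Other", "Analyst", "Other"],
    "role_other_text": [None, "Team Manager", "data analysis", None, "herding cats"],
})

# Hand-built lookup from cleaned free text to existing categories --
# assembled while reading each 'Other' response, judgment calls and all
recode = {
    "team manager": "Manager",
    "data analysis": "Analyst",
}

is_other = df["role"].eq("Other")
cleaned = df["role_other_text"].str.strip().str.lower()

# Recode the matches; anything not in the lookup stays 'Other'
df.loc[is_other, "role"] = cleaned[is_other].map(recode).fillna("Other")

print(df["role"].tolist())
```

The nice part of keeping the lookup table as an explicit dictionary is that your judgment calls are documented and reproducible, instead of buried in a spreadsheet edit.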

Obviously, what you want to do is minimize the number of 'Other' responses from the beginning. I know this is far easier said than done. But there are some tricks.

Get experts involved in the development of your survey. And I don't just mean experts in survey design (yes, those too, but...); I mean people with expertise in the topic being surveyed. Ask them what terms they use to describe these categories. And ask them what terms their coworkers and subordinates use to describe these categories. Find potential responses that are widely used and as unambiguous as possible. You'll still have a few stragglers who don't know the terms you're using, but you'll hopefully minimize your stragglers.

If possible, pilot test your survey with people who work in the field. And if your survey is very complex, consider doing cognitive interviews (look for a future blog post on that).

Find a balance in the number of options. What this really comes down to is balancing cognitive effort. You want enough options to cover the relevant situations, because that requires less cognitive effort from you when analyzing your data. You just run your descriptives and away you go.
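That "just run your descriptives" step really is this short when every response lands in a predefined category. A minimal pandas sketch, with made-up category labels:

```python
import pandas as pd

# Hypothetical responses to a single closed-ended item
responses = pd.Series(
    ["Manager", "Analyst", "Manager", "Other", "Analyst", "Manager"]
)

# A frequency table with percentages -- the entire 'analysis'
# when no free-text recoding is needed
counts = responses.value_counts()
pcts = responses.value_counts(normalize=True).mul(100).round(1)
print(pd.DataFrame({"n": counts, "%": pcts}))
```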

But you also need to limit response options to a number people can hold in memory at one time. The question above with the 17 'Other' responses was also the question with the most response options. More isn't always better. Sometimes it's worse. I think for this item, we just got greedy about how much information and delineation we wanted in our responses. But if your response options become TL;DR, people will skip right to 'Other,' because that requires less cognitive effort from them.

Balancing cognitive effort won't be 50/50. Someone is always going to pay with more cognitive effort than they'd like to exert and that someone should almost always be you. If you instead make your respondents pay with more effort than they'd like to exert, you end up with junk data or no data (because people stop completing the survey).

And of course, decide whether you care about 'Other' at all. If there are only a fixed number of situations and you think you have all of them addressed, you could try dropping the 'Other' category altogether. Know that you'll probably have people skip the question as a result, if their situation isn't addressed. But if you only care to know whether situation X, Y, or Z applies to respondents, that might be okay. This comes down to knowing your goal for every question you ask. If you're not really sure what the goal of a question is, maybe you don't need that question at all. As with the number of response options, you also want to minimize the number of questions by dropping any items that aren't essential. It's better to have 100 complete responses on fewer questions than 25 complete and 75 partial responses on more questions.