Mar

19

Last week I mentioned a problem with scoring questions when each of dozens of true/ false questions had not been scored true or false (as one might think) or 1 or 0 (as one might think in mathematical terms) but, no, in some bizarre Alice in Wonderland mushroom-eating logic, each question was recorded as the answer to TWO VARIABLES. The first was 1 if the person answered true, and missing otherwise. The second variable was coded 1 if the person responded false, and missing otherwise.

Last week, I also gave the solution to scoring these in Normal World, where we ended up with variables scored 0 or 1.

Just because that way of recording data was not fucked up enough, this little data problem presented itself:

Respondents were asked to rate their ability to read, write and speak a second language on a scale from 1 = None to 5 = Native speaker.

You might assume that these would be scored on a scale of 1 to 5 for three variables. You think that my friend, because you are not stupid.

You might assume that if these were for some unfathomable reason scored as fifteen variables that the first variable would be 1 if the respondent answered for the first question and missing otherwise. The second variable would be 1 if the respondent answered 2 for the first question and missing otherwise. You think this because you noted a pattern above and are logical.

Neither of those assumptions are true. In this case, the data were coded as so:

If that wasn’t enough to make you pull your hair out, after I scored it, I found out that some people had a score of 9 on a 1 to 5 scale.

In examining the data, it turned out that a few people had checked both 4 = Advanced ability and 5= Native speaker. While I understand how people could see those as not mutually exclusive categories and check both, the researcher wanted these people to have a score of 5.

Simply stated, the problem is this:

Take these 15 variables and code them into three questions. When respondents selected two choices, assign the the larger value.

I could have used the SUM function if it wasn’t for the people who checked both 4 and 5. Using the MAX function gives those people a score of 5. Also, we had a discussion with the research team about (hypothetically) people who checked both 2 and 3, for example, because they felt their reading ability fell between basic and intermediate. In that case, their score would be rounded up to the next whole number. The MAX function then, would give a 3, so also working in that case, which didn’t actually occur in these data yet, but we like to be prepared.

[…] few days ago, I gave a solution when you have, for some bizarro reason, questions on a one to five scale coded into five different variables. Before that, I had a similar problem when true / false questions were coded into two […]