I'm working with the results of a survey which has multiple questions. All answers (in this case) are categorical and ordinal (such as very unhappy, unhappy, neutral, happy, very happy).

I'm looking for a way to sort the questions from those with "worst results" to those with "best results". Getting the extremes is somewhat easy visually. If I plot the distribution of answers for each question, I can identify which questions have lots of 'good' answers (distribution is negatively skewed) or those with lots of 'bad' answers (positively skewed histogram). So picking the extremes is easy but this is also dependent on the data.

Quantitatively, however, I don't know what to do. Since the answers are on an ordinal scale, not an interval scale, I don't know how to calculate an aggregate number for each question. Perhaps assigning a numerical value to each category (such as -2, -1, 0, 1, 2) and summing the results might work if there's nothing better, but I realize this is not mathematically sound because the scale is not an interval scale.

Oh, I'm not a statistician, just a programmer. I hope there is a reasonable option to this, I can imagine it's a fairly common question with categorical data.

5 Answers

If all your questions have the same response scale and they are standard Likert items, scoring the items 1, 2, 3, 4, 5 and taking the mean is generally fine.

You can investigate the robustness of the rank ordering by experimenting with different scaling procedures (e.g., 0, 0, 0, 1, 1 is common where you want to assess the percentage happy or very happy, or agreeing or strongly agreeing). In my experience, such variants in scaling will give you almost identical question orderings. You could also explore optimal scaling principal components or some form of polytomous IRT approach if you wanted to be sophisticated.
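As a rough sketch of that robustness check: the question names and counts below are made up for illustration, and `mean_score` and `rank_questions` are hypothetical helpers rather than functions from any library. The idea is to score the same response counts two ways and compare the resulting orderings.

```python
# Counts per category, ordered from "very unhappy" to "very happy".
# These numbers are invented purely for illustration.
responses = {
    "Q1": [10, 20, 30, 25, 15],
    "Q2": [5, 10, 20, 40, 25],
    "Q3": [30, 30, 20, 15, 5],
}

def mean_score(counts, scores):
    """Mean of the assigned scores, weighted by how often each category was chosen."""
    return sum(c * s for c, s in zip(counts, scores)) / sum(counts)

def rank_questions(data, scores):
    """Question names sorted from worst (lowest mean score) to best."""
    return sorted(data, key=lambda q: mean_score(data[q], scores))

likert = [1, 2, 3, 4, 5]    # equally spaced scores
top_two = [0, 0, 0, 1, 1]   # proportion "happy" or "very happy"

print(rank_questions(responses, likert))
print(rank_questions(responses, top_two))
```

If the two printed orderings agree (as they do for these invented counts), the ranking is not very sensitive to the choice of scores.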

A table with three columns would be fine: rank, item text, mean. You could also do the same thing with question on the x axis and mean on the y axis.

When validating a questionnaire, we often provide the usual numerical summaries (mean $\pm$ sd, range, quartiles) to highlight ceiling/floor effects, that is, a higher response rate at the extremes of the scale. Dotplots are also a great tool to summarize such data.

This is just for visualization/summary purposes. If you want to get into more statistical stuff, you can use a proportional odds model or ordinal logistic regression for ordinal items, and multinomial regression for discrete ones.

Without assuming your ordinal data is interval, you can compare any convenient quantiles - e.g. medians.
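A minimal sketch of a quantile computed directly from category counts, without assigning numeric scores. The helper `quantile_category` is hypothetical, the counts are invented, and the "first category whose cumulative share reaches q" convention is an assumption (other quantile conventions exist).

```python
LABELS = ["very unhappy", "unhappy", "neutral", "happy", "very happy"]

def quantile_category(counts, q=0.5):
    """Index of the first category whose cumulative proportion reaches q.

    With q=0.5 this is the (lower) median category.
    """
    total = sum(counts)
    running = 0
    for index, count in enumerate(counts):
        running += count
        if running >= q * total:
            return index
    raise ValueError("counts must contain at least one response")

# Invented counts for two questions, ordered as in LABELS.
q1 = [10, 20, 30, 25, 15]
q2 = [5, 10, 20, 40, 25]

print(LABELS[quantile_category(q1)])
print(LABELS[quantile_category(q2)])
```

Questions can then be sorted by the returned index, with ties broken by another quantile if needed.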

Or, when comparing X vs Y which are both ordered categorical, you can estimate something like P(Y>X) - P(X>Y) or P(Y>X) + 0.5 * P(Y=X) (etc.), where you estimate probabilities by proportions of course.

If I plot the distribution of answers for each question, I can identify which questions have lots of 'good' answers (distribution is negatively skewed) or those with lots of 'bad' answers (positively skewed histogram). So picking the extremes is easy but this is also dependent on the data.

Is an absolute ranking necessary? Like you point out, things may be fuzzier in the middle, so is it relevant to your investigation to distinguish between rank 8 and 9 (or whatever) based on some scoring method?

One approach would be to continue with what you stated above -- look at the distributions and categorize questions based on proportions of good/ok/bad based on the data. You might start with a mosaic plot (with questions as factors) to explore your data. This may help reveal criteria for collapsing questions into groups. Instead of piecemeal rankings, they get classified into categories (e.g. what might have been ranks 1-5 become category 1, etc).

It's not absolutely necessary; I was hoping there was such a technique so there's more rigour than eye-balling. If nothing exists, well, this is it, but perhaps there is.
– wishihadabettername Sep 11 '10 at 21:04

Oh sorry, I didn't mean to imply there weren't any techniques, only that you might be better served with collapsing into categories rather than being misled by a distinction between ranks x and x+1. For example, I think the scoring method you propose is a fine way to start.
– ars Sep 11 '10 at 22:01

You can rank ordinal distributions by means of an intuitive dominance criterion: the answers to one question are better than the answers to another when it is more likely than not that a randomly chosen answer to the first will be better than a randomly chosen answer to the second.

In more detail: put all the answers to question $X$ into one hat and all the answers to question $Y$ into another hat. Draw one answer from each hat at random. We will compare these answers, which we can do because they are on an ordinal scale. Let's also agree to resolve any ties by flipping a fair coin. Let $p(X,Y)$ be the probability that the answer to $X$ is better than the answer to $Y$. Rank $X$ ahead of $Y$ when $p$ exceeds $1/2$ and rank $X$ behind $Y$ when $p$ is less than $1/2$. If $p$ equals $1/2$, declare a tie between $X$ and $Y$. (By virtue of our tie-resolution procedure, $p(X,Y) + p(Y,X) = 1$, implying the ranking does not depend on the sequence in which we draw the two answers.)

The calculation is a simple exercise for "just" a programmer (and a fun one if you are interested in efficient calculation, although that's unlikely to matter here). To make this proposal clear, though, I will illustrate it. Suppose all answers are on an integral scale from one to four, with four best. Write the answer distributions in the form $(k_1, k_2, k_3, k_4)$ where $k_3$ counts the number of "3"'s among the answers to a question, for example. For this example suppose $X$ has distribution $(4, 2, 0, 4)$ and $Y$ has distribution $(1, 6, 1, 2)$ (ten answers each). (Stop for a moment to consider which of these distributions ought to be considered "best" and note that they have identical means of 2.4 and identical medians of 2, suggesting this is a difficult comparison to make.) Then:

There is a 4/10 chance of drawing a "4" for $X$. In this case,

There is a 2/10 chance of drawing a "4" for $Y$, a tie;

There is an 8/10 chance of drawing less than "4" for $Y$, a win for $X$.
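The remaining cases (drawing a "3", "2", or "1" for $X$) follow the same pattern. A minimal Python sketch of the whole calculation, assuming each distribution is given as a list of counts ordered from worst to best category; the function name `dominance` is made up for illustration:

```python
from fractions import Fraction

def dominance(x_counts, y_counts):
    """p(X, Y): chance a random X answer beats a random Y answer,
    with ties resolved by a fair coin (each tie counts one half)."""
    wins = ties = 0
    for i, cx in enumerate(x_counts):
        for j, cy in enumerate(y_counts):
            if i > j:        # X's answer is in a better category
                wins += cx * cy
            elif i == j:     # same category: a tie
                ties += cx * cy
    pairs = sum(x_counts) * sum(y_counts)
    return Fraction(2 * wins + ties, 2 * pairs)

x = [4, 2, 0, 4]
y = [1, 6, 1, 2]
print(dominance(x, y))  # 23/50
print(dominance(y, x))  # 27/50
```

Here $p(X,Y) = 0.34 + 0.24/2 = 0.46$, which is less than $1/2$, so under this criterion $Y$ ranks ahead of $X$ even though the means and medians agree.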

I should have pointed out that this ordering is not necessarily transitive! E.g., on a nine-point scale (0,2,0,2,0,0,0,0,2) beats (2,0,0,0,0,2,0,2,0) beats (0,0,2,0,2,0,2,0,0) which beats the first one! See en.wikipedia.org/wiki/Nontransitive_dice .
– whuber♦ Nov 4 '10 at 20:17
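The cycle in this comment can be checked by brute force. A small sketch, assuming each distribution is a list of counts over the nine categories (worst to best); `beats` is a hypothetical helper that compares the number of winning and losing pairs:

```python
def beats(a_counts, b_counts):
    """True when a random answer from A is more likely to exceed
    a random answer from B than the other way round."""
    wins = losses = 0
    for i, ca in enumerate(a_counts):
        for j, cb in enumerate(b_counts):
            if i > j:
                wins += ca * cb
            elif i < j:
                losses += ca * cb
    return wins > losses

a = [0, 2, 0, 2, 0, 0, 0, 0, 2]
b = [2, 0, 0, 0, 0, 2, 0, 2, 0]
c = [0, 0, 2, 0, 2, 0, 2, 0, 0]

print(beats(a, b), beats(b, c), beats(c, a))  # True True True
```

All three comparisons come out in favour of the first argument (each wins 5 of every 9 pairings), so no consistent overall order exists for these three distributions.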

I've already +1, but I wonder (maybe a naive question) how it would generalize to more than two items? (btw, shouldn't "integral" (3rd §) actually read "integer"?)
– chl♦ Nov 4 '10 at 22:52

@chl ("Integral" is the adjectival form of "integer".) With more than two items you can of course still rank them in pairs this way. If you have intransitivity you can resolve it in some ad hoc ways, such as computing the percentage of time one distribution is the "winner" among a group of three or more distributions. As you know, the subject of comparing distributions gets complex because it includes most of the paradoxes of voting theory. But I like this approach because it forces us to confront such inherent difficulties rather than letting us stay in ignorance of them.
– whuber♦ Nov 5 '10 at 1:18