Cognitive Sciences Stack Exchange is a question and answer site for practitioners, researchers, and students in cognitive science, psychology, neuroscience, and psychiatry.

Is there a good way to analyze reaction times and accuracy together, other than MANOVA?

I have data from an experiment in which participants had to respond to stimuli in two different conditions in a within-subjects design. My impression is that some subjects differ in accuracy between the two conditions, while others differ in reaction time.
How can I test such a hypothesis?

MANOVA is definitely a bad idea given that one DV is continuous and the other is binomial. After exploring a number of different approaches to combining RT and accuracy data, I've come to the conclusion that the best current approach is to use the linear ballistic accumulator (LBA) model (e.g., see Donkin et al., 2011).

The LBA is a simple (structurally and computationally) framework that lets you speak to the different processes (info processing efficiency vs response criterion) that jointly contribute to RT and error data.

These days I prefer the LBA because it's computationally simpler/faster to fit, and because I haven't seen any demonstrations of cases where the diffusion model can fit data that the LBA can't.
–
Mike Lawrence, Mar 30 '12 at 11:00

There are a variety of models that jointly handle accuracy and RT that have been pretty well tested, and the LBA is probably fine (I haven't used it). If you don't want to go that far, there is a rather simple way to analyze data controlling for the speed-accuracy tradeoff (SAT) that has much better mathematical properties than IE scores (which, as Mike said, were named by me, but offhandedly proposed by Townsend & Ashby, easily conceptualized as related to older rate-of-information scores holding information constant, and probably popularized most by Shore).

The first problem with the IE transformation (RT in ms ÷ proportion correct) is that it assumes a linear relationship between RT and accuracy. That's clearly not the case. While one can often achieve a linear relationship between a predictor and RT, the relationship between a predictor and accuracy is invariably an ogive. One can make it much more linear by transforming accuracy to logit (log-odds) scores (keep in mind that accuracy, and in most cases even RT, are completely arbitrary representations of what they measure). Furthermore, RT has much better statistical properties represented as responses per second than as seconds per response, so taking 1/RT in seconds makes the data more normal. Therefore, logit / inverse-RT scores might be a better transform. But it's still a transform into some unknown score... I think we could call it Linearized Inverse Efficiency (or L.I.E.) :)... OK, it's not actually inverse efficiency anymore, since it's not an inverse rate but an accuracy-corrected rate... how about A.C.E., accuracy-corrected efficiency?
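As a concrete sketch, the component transforms described above might look like this in Python. The summary numbers are invented, and the text leaves open exactly how the logit and rate components should be combined into one score, so the product shown at the end is just one plausible reading, not a definitive formula:

```python
import numpy as np

def logit(p):
    # log-odds transform: linearizes the ogive relationship for accuracy
    return np.log(p / (1.0 - p))

# made-up per-condition summaries: mean correct RT (ms) and proportion correct
rt_ms = np.array([450.0, 520.0])
acc = np.array([0.95, 0.88])

ie = rt_ms / acc            # classic inverse efficiency: ms per correct response
rate = 1000.0 / rt_ms       # inverse RT as responses per second (more normal than raw RT)
acc_logit = logit(acc)      # accuracy on the logit scale (more linear than proportions)

# one plausible "accuracy corrected efficiency": rate weighted by logit accuracy
# (an assumption for illustration; the combination rule isn't pinned down above)
ace = acc_logit * rate
```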

But... if you're going to go that far, why not just model accuracy with a logistic regression on RT in each condition? You could then hold RT constant for each condition (maybe at the grand mean) and look at the changes in predicted accuracy across conditions. That would be a reasonable way to combine the two.
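A minimal sketch of that suggestion, using simulated trial-level data and scikit-learn. The simulated speed-accuracy relationship and all the numbers are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def simulate(n, rt_mean, slope, intercept):
    # fake trials: gamma-distributed RTs (s) and a logistic speed-accuracy curve
    # (slower responses are more accurate)
    rt = rng.gamma(shape=8.0, scale=rt_mean / 8.0, size=n)
    p = 1.0 / (1.0 + np.exp(-(intercept + slope * rt)))
    return rt, rng.binomial(1, p)

rt_a, acc_a = simulate(500, 0.45, 6.0, -1.5)
rt_b, acc_b = simulate(500, 0.55, 6.0, -2.5)

grand_mean = np.concatenate([rt_a, rt_b]).mean()

# separate logistic regression of accuracy on RT in each condition,
# then compare predicted accuracy at the same (grand mean) RT
preds = {}
for name, rt, acc in [("A", rt_a, acc_a), ("B", rt_b, acc_b)]:
    m = LogisticRegression().fit(rt.reshape(-1, 1), acc)
    preds[name] = m.predict_proba(np.array([[grand_mean]]))[0, 1]
```

With RT held at the grand mean, any remaining difference in predicted accuracy between conditions reflects a difference in the speed-accuracy tradeoff rather than a difference in speed alone.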

The only issue I've run into with that last one is that it's all about the leading edge of your RT distribution. You need to hack off everything after accuracy asymptotes. If what you intended to measure is the immediate response to a stimulus then that's perfectly fine. If you want to capture something about the tails of the distributions it might not be well represented, but you could look at that separately. You could keep that later data by just making the logistic regression quadratic. On the flip side, one advantage you get is that you actually make use of all of the early low accuracy RTs.

This method does require those low accuracy RTs so you do, in general, have to encourage speed in the experiment. That should also be done with any transform or model of RT and accuracy because you have to have some accuracy variance to work with.

(One thing I haven't tried, which would probably work, is just entering RT into a multilevel logistic regression of accuracy. If you include it as an interaction term you can then examine the predicted scores holding it constant.)
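For the interaction-term idea, a single-level sketch with statsmodels might look like the following (a true multilevel version would add by-subject random effects; the simulated data and numbers are invented):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# simulated trial-level data: two conditions with different speed-accuracy curves
n = 400
rt = np.concatenate([rng.gamma(8.0, 0.45 / 8.0, n), rng.gamma(8.0, 0.55 / 8.0, n)])
cond = np.array(["A"] * n + ["B"] * n)
true_logit = np.where(cond == "A", -1.5, -2.5) + 6.0 * rt
acc = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))
df = pd.DataFrame({"acc": acc, "rt": rt, "cond": cond})

# logistic regression of accuracy with an RT x condition interaction
fit = smf.logit("acc ~ rt * C(cond)", data=df).fit(disp=0)

# hold RT constant at the grand mean and compare predicted accuracy across conditions
new = pd.DataFrame({"rt": [df.rt.mean()] * 2, "cond": ["A", "B"]})
pred = fit.predict(new)
```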

Looking at the pros/cons of using a binomial mixed-effects model of accuracy as a function of RT and other predictors is one of the projects I had planned, before this discussion, for our summer stats student, though I'm thinking of looking at both linear and generalized additive models of the effect of RT.
–
Mike Lawrence, Mar 30 '12 at 14:19

I didn't quite understand which of the mentioned approaches is the one proposed by Townsend & Ashby. The one you dubbed L.I.E? :)
–
Pavel, Mar 31 '12 at 22:56

As for your suggestion of modelling accuracy with a logistic regression on RT in each condition: do you mean regressing response on RT (or some transformation thereof), condition, and their interaction? This should capture differences in the speed-accuracy tradeoff functions, but it seems that such a model would miss differences in RT if the SATF is the same in both conditions.
–
Pavel, Mar 31 '12 at 23:16

Pavel, the LIE is my own suggestion for an improved implementation of the transformation suggested by T&A, which is now just done as RT in ms / accuracy (proportion correct; the IE score). The logistic regression suggestion will be sensitive to RT because you analyze the accuracy at a fixed RT using predicted values from the regression. If RT varied but the SAT remained the same, that will be reflected in varying accuracy scores at that RT.
–
John, Mar 31 '12 at 23:44

Yes, right. One could come up with theoretical situations involving non-monotonic SATFs, but I guess a logistic regression of this kind should work fine for most practical purposes. Thanks.
–
Pavel, Mar 31 '12 at 23:57

I recently had a similar problem and used inverse efficiency (IE) scores. These scores are derived by dividing mean response time by the correct-response rate, separately for each condition, so that the higher the score, the worse the performance. You get something like "corrected reaction time" scores. Here is an example of a paper that uses it; check Experiment 2 on page 144:
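For concreteness, the IE computation described above amounts to just this (toy per-condition summaries for one hypothetical subject; condition names and numbers are invented):

```python
# mean correct RT (ms) and proportion correct per condition, made-up data
mean_rt_ms = {"congruent": 430.0, "incongruent": 505.0}
prop_correct = {"congruent": 0.97, "incongruent": 0.90}

# inverse efficiency: mean correct RT divided by proportion correct;
# a higher IE score means worse performance (slower and/or less accurate)
ie = {cond: mean_rt_ms[cond] / prop_correct[cond] for cond in mean_rt_ms}
```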

Unfortunately, inverse efficiency assumes a particular scaling of RT and error rate that is completely unsupported. Even Dr. John Christie, a colleague of mine who gave IE its name, has since come to completely oppose its use.
–
Mike Lawrence, Mar 30 '12 at 11:06

Good to know @Mike, thank you. I will give LBA a shot when I get my head around it. It's interesting, though, that in a few papers investigating multisensory integration of emotional information (the paper above, and this other paper) you can find the use of IE as a standard procedure...
–
Geek On Acid, Mar 30 '12 at 11:24

IE has become lamentably standard in some fields partly, I suspect, because it feels somewhat intuitive and is certainly an easy "solution" to the vexing problem of combining speed and accuracy. Admittedly, I don't have data explicitly demonstrating the invalidity of the speed-accuracy scaling assumed by IE (though I now think I'll task a student with generating such data through simulations this summer), but it would seem rather remarkable if the simple IE scaling, which rather questionably models accuracy as a proportion (see Dixon, 2008, "Models of accuracy..."), came out as valid.
–
Mike Lawrence, Mar 30 '12 at 12:16

I think that, after having been in the presence of numerous people using it, including myself, it's become common because it can make your effects bigger and make the very-difficult-to-explain SAT go away. Neither of those is a great reason to do anything.
–
John, Mar 30 '12 at 12:24

This model is extremely simple, with just one source of variability in evidence accumulation (within-trial randomness) and simple linear accumulation (although evidence for one response does count against the other). The EZ-diffusion model is even simpler than the LBA, but it is incomplete. Wagenmakers et al. proposed the EZ-diffusion as a descriptive rather than process model, with the aim of adequately describing data as simply as possible. The tradeoff in developing such a simple model was that it could not account for some of the empirical phenomena in choice RT, such as the relative speed of correct vs. incorrect responses.

As mentioned in the other comments, ANOVA is problematic when mixing types of dependent variables. (Generalized) mixed-effects models are gaining popularity these days and actually provide a very convenient way to model such data. A paper demonstrating the efficacy of this approach, as well as giving a tutorial-like introduction, is: