A pathologist friend came to me for help with the following question for a research project. The goal is to compare the effectiveness of three different diagnostic techniques. The data set is as follows: there are 50 different specimens; each specimen was evaluated by 4 pathologists using 3 different instruments (i.e. 600 total diagnoses). Each case has a possible diagnosis of positive or negative, and the true results are known, as they have been independently determined. The success rate depends on both the quality of the instrument and the skill of the pathologist, and we cannot assume that the four pathologists have the same proficiency. Finally, even though each person measured the same specimen 3 times (once per instrument), the measurements can be treated as independent.

What are the appropriate tests for comparing effectiveness among the instruments?

Thanks.

ADDED:
Lots of good info in the answers, thanks to both. Any thoughts on how ROC and randomized block compare/contrast?

I'm not sure I've digested it enough to know which method is "better". Since the results need to be communicated to a certain audience, it probably depends on which is more widely used among that audience.

Good first question! I'm not convinced each person's three measurements on the same specimen can really be treated as independent though. Did they know (or were they likely to realise) that the specimen was the same? Did they know the 'true' results when they made the measurements? Also do you know how many of the 'true' results were positive and how many were negative?
– onestop, Mar 29 '11 at 12:06

You're right to worry about the independence of the measurements, but I think they did a pretty good job designing the experiment to minimize any memory between measurements (they spread out the diagnoses over several sessions and would only run a randomized subset of specimens on a single instrument during a session). They did not know the true results at the time. I can find out what the positive result rate was, but I think it's roughly 50%.
– Gregg L, Mar 29 '11 at 23:00

2 Answers

As described in the original post, the experiment is a randomized block design.

Pathologist (4 levels) is a blocking factor; the experiment is repeated within each pathologist.

Instrument (3 levels) and the true result of the test (2 levels) are the two treatment factors, which I assume were assigned randomly.

Consider the different specimens to be replications of each treatment combination.

The one response variable is whether the pathologist's diagnosis is correct (2 levels).

Because the response is categorical, the link function will need to be something like the logit or probit. Here's some R code that does that; it may need to be extended depending on your friend's hypotheses.
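A minimal sketch of such a model, assuming the data are in a long-format data frame with one row per diagnosis; the data frame and column names here (`diagnoses`, `correct`, `instrument`, `truth`, `pathologist`) are illustrative assumptions, not from the original post:

```r
## Assumed layout: 600 rows, one per diagnosis, with factor columns
## pathologist (4 levels), instrument (3 levels), truth ("pos"/"neg"),
## and a 0/1 column correct indicating whether the diagnosis matched truth.
fit <- glm(correct ~ instrument + truth + pathologist,
           family = binomial(link = "logit"),
           data = diagnoses)

summary(fit)                 # Wald z-tests on individual coefficients
anova(fit, test = "Chisq")   # sequential likelihood-ratio tests per term
```

Pathologist enters as a fixed blocking factor here; with many more pathologists, a random effect for pathologist (e.g. via `lme4::glmer`) would be the usual alternative.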

The coefficients from a logit model can be interpreted in terms of odds ratios. For a particular combination of predictors, the model estimates the odds of a correct diagnosis; each individual coefficient is a log odds ratio, indicating how those odds change when its predictor changes.
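As an invented illustration: if the coefficient for one instrument versus the baseline instrument were 0.69, then exp(0.69) ≈ 2, meaning the odds of a correct diagnosis on that instrument are estimated to be about twice the odds on the baseline, holding pathologist and true result fixed. Assuming `fit` holds the fitted logistic GLM described above, the odds ratios can be extracted like so:

```r
## Odds ratios and 95% profile-likelihood confidence intervals
## for a fitted logistic GLM stored in fit
exp(cbind(OR = coef(fit), confint(fit)))
```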

If your friend doesn't care about distinguishing between Type I and Type II errors, he or she can drop the true-result predictor from the model.
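Whether dropping that predictor actually costs anything can itself be checked with a likelihood-ratio test between the nested models. A sketch, again using the hypothetical data frame and column names assumed earlier:

```r
## Does the true-result predictor improve fit? Compare nested logit models.
full    <- glm(correct ~ instrument + truth + pathologist,
               family = binomial, data = diagnoses)
reduced <- glm(correct ~ instrument + pathologist,
               family = binomial, data = diagnoses)
anova(reduced, full, test = "Chisq")   # likelihood-ratio test
```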