Tagged Questions

Cohen's kappa is a measure of the degree to which two raters agree. There is also a test of inter-rater agreement based on kappa. Use [inter-rater] if you are interested in other aspects of inter-rater agreement but not this specific measure.
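
For orientation, a minimal sketch of the statistic itself in base R, on made-up labels from two hypothetical raters; the row and column margins of their contingency table give the chance-expected agreement.

```r
# Cohen's kappa by hand for two raters (hypothetical labels)
r1 <- c("yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes", "yes")
r2 <- c("yes", "no", "no",  "yes", "no", "yes", "yes", "no", "yes", "yes")
tab <- table(r1, r2)                                   # raters' contingency table
po  <- sum(diag(tab)) / sum(tab)                       # observed agreement
pe  <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2   # agreement expected by chance
(po - pe) / (1 - pe)                                   # kappa
```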

I have 10 subjects who filled in a questionnaire of 28 items. Each item has 5 options (scale 1 to 5). However, when I compute in SPSS using weighted kappa with quadratic weighting versus a two-way mixed ICC ...
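
As an illustration of the two quantities being compared (toy 1-5 ratings, not the asker's SPSS data; irr package assumed), where kappa2's "squared" weights are the quadratic weighting and icc's two-way consistency model is one reading of "two-way mixed":

```r
library(irr)
set.seed(1)
ratings <- data.frame(rater1 = sample(1:5, 28, replace = TRUE),
                      rater2 = sample(1:5, 28, replace = TRUE))
kappa2(ratings, weight = "squared")                                    # quadratic-weighted kappa
icc(ratings, model = "twoway", type = "consistency", unit = "single")  # two-way ICC, single measure
```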

Can someone tell me when it is appropriate to use the kappa statistic? Also, why use it when one can use the area under the ROC curve, or even the area under the precision-recall curve? So what are the ...

I understand the formula behind the kappa statistic and how to calculate the O and E values from a confusion matrix.
My question is: what is the intuition behind this measure? Why does it work so ...
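
For reference, with $O$ the observed agreement and $E$ the agreement expected by chance, both read off a confusion matrix with cell counts $n_{ij}$, row totals $n_{i+}$, column totals $n_{+i}$ and grand total $n$,
$$\kappa = \frac{O - E}{1 - E}, \qquad O = \frac{1}{n}\sum_i n_{ii}, \qquad E = \frac{1}{n^2}\sum_i n_{i+}\, n_{+i},$$
so $\kappa$ is the agreement achieved beyond chance, expressed as a fraction of the agreement that was available beyond chance.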

I have collected three human raters' scores for an essay question (N=54). I am building an automated scoring algorithm, so I need to compare my machine scores with the human raters' scores. All are using a 0-10 ...
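
A sketch of one way such a comparison is often set up (toy scores, irr package assumed): quadratic-weighted kappa of the machine score against each human rater in turn.

```r
library(irr)
set.seed(2)
human   <- replicate(3, sample(0:10, 54, replace = TRUE))   # columns = 3 human raters
machine <- sample(0:10, 54, replace = TRUE)                 # machine scores
sapply(1:3, function(j)
  kappa2(cbind(human[, j], machine), weight = "squared")$value)
```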

A Bland-Altman plot measures the agreement between two different methods that measure the same variable. As far as I understand, Bland-Altman can be used only if the two methods have the same unit of ...
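
For readers unfamiliar with the plot, a minimal base-R sketch on made-up measurements in the same units: the mean of the two methods against their difference, with the bias and the 1.96 SD limits of agreement.

```r
set.seed(3)
method_a <- rnorm(50, mean = 100, sd = 10)
method_b <- method_a + rnorm(50, mean = 2, sd = 5)            # same units as method_a
avg   <- (method_a + method_b) / 2
diffs <- method_a - method_b
plot(avg, diffs, xlab = "Mean of the two methods", ylab = "Difference")
abline(h = mean(diffs), lty = 2)                              # bias
abline(h = mean(diffs) + c(-1.96, 1.96) * sd(diffs), lty = 3) # limits of agreement
```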

I have a data set where four coders are rating 800 items on various attributes. This is achieved by reporting a count of the prevalence of each attribute, e.g. rater 1 thinks attribute A appears in ...

I have two different medical diagnostic tests, both test the same condition (binary outcome). The condition in question is rather vaguely defined, so they don't always agree. What would be the best ...
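
One option often discussed for this setting, sketched on toy data (psych package assumed): Cohen's kappa with its confidence interval for the two binary test results.

```r
library(psych)
set.seed(7)
test1 <- rbinom(80, 1, 0.3)                           # test A: condition present/absent
test2 <- ifelse(runif(80) < 0.8, test1, 1 - test1)    # test B mostly agrees with test A
cohen.kappa(cbind(test1, test2))                      # kappa with confidence bounds
```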

I am starting a project comparing standard ways of creating a classifier with some heuristic methods. The heuristic methods should result in faster training for the classifier but should result in ...

I'm reading a data mining book, and it mentioned the kappa statistic as a means for evaluating the prediction performance of classifiers. However, I just can't understand this. I also checked Wikipedia ...
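
A toy sketch of what kappa adds over raw accuracy when classes are imbalanced: a classifier that always predicts the majority class gets high accuracy but a kappa of zero.

```r
set.seed(4)
truth <- factor(sample(c("pos", "neg"), 200, replace = TRUE, prob = c(0.1, 0.9)))
pred  <- factor(rep("neg", 200), levels = levels(truth))   # always predicts "neg"
tab <- table(pred, truth)
po  <- sum(diag(tab)) / sum(tab)                           # accuracy = observed agreement
pe  <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2       # agreement expected by chance
c(accuracy = po, kappa = (po - pe) / (1 - pe))
```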

A group of raters (about 20) will be watching a series of videos and classifying them into 4 categories. I will be running Fleiss' kappa to measure the agreement. How does one compute for ...
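
For the computation itself, a sketch with the irr package on made-up ratings (rows = videos, columns = the ~20 raters, values = the 4 categories):

```r
library(irr)
set.seed(5)
ratings <- matrix(sample(1:4, 30 * 20, replace = TRUE), nrow = 30, ncol = 20)
kappam.fleiss(ratings)   # Fleiss' kappa across all 20 raters
```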

I have a data set, with each variable taking multiple values on a nominal scale. Separate raters could rate a given unit using more than one value per variable. That is, there are multiple ratings per ...

Are there any limitations for using Cohen's kappa with sparse data? I need inter-rater agreement between 2 raters for ~15 items, and the data in the contingency table is quite sparse (0 in some cells, ...

Much has been written on the ICC and kappa, but there seems to be disagreement on the best measures to consider.
My purpose is to identify a measure that shows whether there was agreement between ...

I found a related question here, but it doesn't really get at what I want to know.
I found a couple of papers using the kappa statistic from 2006 and 2010, but afterwards I found other authors ...

The kappa ($\kappa$) test is a kind of Z-test. If I am not very wrong, to compute the $\kappa$ test we can just estimate the appropriate variance $\widehat{\mathrm{var}}(\hat\kappa)$ of the kappa statistic ...
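
For concreteness, the usual large-sample test compares
$$ z = \frac{\hat\kappa}{\sqrt{\widehat{\mathrm{var}}_0(\hat\kappa)}} $$
to a standard normal, where $\widehat{\mathrm{var}}_0(\hat\kappa)$ is the variance of $\hat\kappa$ estimated under the null hypothesis of no agreement beyond chance ($\kappa = 0$).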

I am trying to calculate kappa scores for present/absent decisions made by two raters, and I have heard that kappa can be adjusted for the prevalence of the object of measurement.
Can anyone advise on how ...
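
One common adjustment is the prevalence- and bias-adjusted kappa (PABAK) of Byrt, Bishop and Carlin (1993), which for a 2x2 table reduces to $2p_o - 1$; a sketch on toy counts:

```r
tab <- matrix(c(40, 5,
                 8, 7), nrow = 2, byrow = TRUE)        # rater 1 (rows) vs rater 2 (cols)
n  <- sum(tab)
po <- sum(diag(tab)) / n                               # observed agreement
pe <- sum(rowSums(tab) * colSums(tab)) / n^2           # chance-expected agreement
c(kappa = (po - pe) / (1 - pe), pabak = 2 * po - 1)    # ordinary kappa vs PABAK
```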

I have data on two processes, where each process assigns elements to ordered bins. I am interested in testing for agreement between the processes. What is the best way to do this (R code)? Here is ...
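
Since the excerpt is truncated, only a generic sketch of one possibility (toy data, irr package assumed): linear-weighted kappa for the ordered bins, whose output includes a z test of $\kappa = 0$.

```r
library(irr)
set.seed(6)
process1 <- sample(1:4, 100, replace = TRUE)                               # ordered bins 1-4
process2 <- pmin(pmax(process1 + sample(-1:1, 100, replace = TRUE), 1), 4) # mostly nearby bins
kappa2(cbind(process1, process2), weight = "equal")   # "equal" spacing = linear weights
```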

Which inter-rater reliability methods are most appropriate for ordinal or interval data?
I believe that "joint probability of agreement" and "Kappa" are designed for nominal data, whilst "Pearson" and ...