This function computes Cohen’s kappa [1], a score that expresses the level
of agreement between two annotators on a classification problem. It is
defined as

\[\kappa = \frac{p_o - p_e}{1 - p_e}\]

where \(p_o\) is the observed agreement ratio, i.e. the empirical
probability that both annotators assign the same label to a sample,
and \(p_e\) is the agreement expected by chance if both annotators
assigned labels at random; \(p_e\) is estimated using a per-annotator
empirical prior over the class labels [2].