art: approximate randomization testing
This package performs approximate randomization testing for corpus-wide differences in F1 score or accuracy. It can easily be extended to other metrics. It also ships with a script that converts the output of the CoNLL scorer for coreference resolution into the format this package expects.
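
For readers unfamiliar with the technique, the following is a minimal sketch of an approximate randomization test for a difference in accuracy between two systems scored on the same items. It is not this package's API: the names `accuracy` and `approximate_randomization`, the `trials` and `seed` parameters, and the 0/1 input representation are all assumptions made for illustration. The p-value uses the common (count + 1) / (trials + 1) convention.

```python
import random


def accuracy(outcomes):
    """Fraction of correct items; each outcome is 1 (correct) or 0 (wrong)."""
    return sum(outcomes) / len(outcomes)


def approximate_randomization(a, b, metric=accuracy, trials=10000, seed=0):
    """Estimate a two-sided p-value for the difference metric(a) - metric(b).

    `a` and `b` hold the per-item outcomes of two systems on the same items,
    in the same order. All names here are hypothetical, not the package's API.
    """
    if len(a) != len(b):
        raise ValueError("both systems must be scored on the same items")

    rng = random.Random(seed)
    observed = abs(metric(a) - metric(b))

    at_least_as_extreme = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        # Under the null hypothesis the two systems are interchangeable,
        # so swap each paired outcome with probability 0.5.
        for x, y in zip(a, b):
            if rng.random() < 0.5:
                x, y = y, x
            shuffled_a.append(x)
            shuffled_b.append(y)
        if abs(metric(shuffled_a) - metric(shuffled_b)) >= observed:
            at_least_as_extreme += 1

    # Add one to numerator and denominator so the estimate is never zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

Passing a different function as `metric` is one way such a test could be extended to other measures. A corpus-wide F1 score would need per-document counts rather than 0/1 outcomes, but the swapping step would stay the same.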