One of the key requirements for demonstrating the validity and reliability of an assessment method is that annotators are able to apply it consistently. This paper explores inter-annotator agreement (IAA) for an error classification task and investigates some of the factors that contribute to low IAA.