Contents

Test collections

Performance measure: per token accuracy. (The convention is for this to be measured on all tokens, including punctuation tokens and other unambiguous tokens.)

English

Penn TreebankWall Street Journal (WSJ). The splits of data for this data set were not standardized early on (unlike for parsing) and early work uses various data splits defined by counts of tokens or by sections. Most work from 2002 on adopts the following data splits, introduced by Collins (2002):