We analyze the performance of the Berkeley parser on OntoNotes WSJ and the English Web Treebank.

Page 1, “Abstract”

Second, we use a set of regular expressions (henceforth “regexes”) that categorize the possible structures in the treebank.

Page 1, “Introduction”

After describing in more detail the basic framework, we show some aspects of the resulting analysis of the performance of the Berkeley parser (Petrov et al., 2008) on three datasets: (a) OntoNotes WSJ sections 2-21 (Weischedel et al., 2011)¹, (b) OntoNotes WSJ section 22, and (c) the “Answers” section of the English Web Treebank (Bies et al., 2012).

Page 1, “Introduction”

¹ We refer only to the WSJ treebank portion of OntoNotes, which is roughly a subset of the Penn Treebank (Marcus et al., 1999) with annotation revisions including the addition of NML nodes.

Page 1, “Framework for analyzing parsing performance”

We derived the regexes through an iterative process of inspecting the tree decomposition of dataset (a), drawing on the treebanking experience of some of the coauthors.

Page 2, “Framework for analyzing parsing performance”

The high coverage (%) reinforces the point that there is a limited number of core structures in the treebank.
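
As a rough illustration of what "coverage" could mean here, the sketch below computes the share of one-level structures matched by at least one regex. It is a minimal sketch under stated assumptions, not the authors' procedure: the representation of a structure as a (parent label, child labels) pair, the function name `coverage`, and the two toy patterns are all hypothetical, and the toy data is invented purely for the example.

```python
import re
from collections import Counter

# Hypothetical toy patterns, keyed by parent label. The excerpts do not
# give the paper's actual regexes; these only stand in for them.
TOY_PATTERNS = {
    "NP": [("NP-modr", re.compile(r"NP(?: (?:PP|SBAR|VP|ADJP))+"))],
    "VP": [("VP-crd", re.compile(r"VP(?: , VP)*(?: ,)? CC VP"))],
}

def coverage(structures, patterns):
    """Return (percent of structures matched by some regex, per-name counts).

    Each structure is a (parent_label, [child_labels]) pair; unmatched
    structures are counted under the key None.
    """
    counts = Counter()
    for parent, children in structures:
        seq = " ".join(children)
        name = next(
            (n for n, p in patterns.get(parent, []) if p.fullmatch(seq)),
            None,
        )
        counts[name] += 1
    total = sum(counts.values())
    matched = total - counts[None]
    percent = 100.0 * matched / total if total else 0.0
    return percent, counts

# Invented toy input, only to show the call shape.
toy = [
    ("NP", ["NP", "PP"]),
    ("NP", ["DT", "NN"]),
    ("VP", ["VP", "CC", "VP"]),
    ("S", ["NP", "VP"]),
]
percent, counts = coverage(toy, TOY_PATTERNS)
```

With a realistic regex inventory, the per-name counts would also show which structure types account for most of the treebank.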

recursive

Appears in 5 sentences as: recursive (5)

In Parser Evaluation Using Derivation Trees: A Complement to evalb

As described above, we are also interested in the type of linguistic construction that each one-level structure represents; each instantiates one of a few types, such as recursive coordination or simple head-and-sister.

Page 2, “Framework for analyzing parsing performance”

(c) NP-modr is a regex for a recursive NP with a right modifier.

Page 2, “Framework for analyzing parsing performance”

(d) VP-crd is also a regex for a recursive structure, in this case for VP coordination, picking out the leftmost conjunct as the head of the structure.

Page 2, “Framework for analyzing parsing performance”
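
The two regexes described in (c) and (d) can be pictured with a small sketch. It assumes, hypothetically, that a one-level structure is written as its parent label plus a space-separated string of child labels; the pattern bodies, the set of right-modifier labels, and the function names `categorize` and `head_index` are illustrative assumptions, since the excerpts do not show the paper's actual regex syntax.

```python
import re

# Hypothetical patterns over the space-joined child labels of a node.
PATTERNS = {
    # Recursive NP: an NP child followed by one or more right modifiers.
    "NP": [("NP-modr", re.compile(r"NP(?: (?:PP|SBAR|VP|ADJP))+"))],
    # VP coordination: VP conjuncts joined by commas and a conjunction.
    "VP": [("VP-crd", re.compile(r"VP(?: , VP)*(?: ,)? CC VP"))],
}

def categorize(parent, children):
    """Return the name of the first regex matching this one-level structure."""
    seq = " ".join(children)
    for name, pattern in PATTERNS.get(parent, []):
        if pattern.fullmatch(seq):
            return name
    return None

def head_index(name, children):
    """Pick the head child; for VP-crd this is the leftmost conjunct."""
    if name == "VP-crd":
        return children.index("VP")
    return None  # head rules for other regexes are not specified here

np_name = categorize("NP", ["NP", "PP"])
vp_children = ["VP", ",", "VP", ",", "CC", "VP"]
vp_name = categorize("VP", vp_children)
vp_head = head_index(vp_name, vp_children)
```

Matching on the full child sequence (`fullmatch`) keeps each structure in exactly one category per pattern, which is one plausible way to make the categorization deterministic.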

Also, the attachment score is not relevant for regexes that already express a recursive structure, such as NP-modr.

Page 3, “Framework for analyzing parsing performance”

The attachment score does not apply to the recursive categories, as mentioned above.