In this paper, we examine the structural learning problem in supertagging. Supertagging is the task of assigning the most probable lexical entry to each word in a sentence. A supertagger is extremely important for a lexicalized grammar parser because an accurate supertagger can greatly reduce lexical ambiguity in the downstream parser. Supertagging is more challenging than conventional sequence labeling tasks (e.g., part-of-speech tagging) for two reasons. First, the supertags are numerous: they are the lexical entries defined in a lexicalized grammar, and they carry rich syntactic and semantic information. Second, the inter-supertag relations are more complex: a proper supertag assignment must be compatible with the other supertag assignments in the sentence so that a parse tree can be constructed. Commonly used adjacent-label features (e.g., first-order edge features) in a sequence labeling model are too coarse for the supertagging task, for which long-range information is extremely important. We propose two approaches to incorporating long-range information in a supertagger's training stage. In the dependency-informed supertagger, we use word-to-word dependencies derived from a dependency parser to generate long-range features that act as soft constraints during training. In the forest-guided supertagger, we constrain the classifier to learn in a grammar-satisfying space and use a CFG filter to impose grammar constraints on the updates of the model parameters. The experiments show that the proposed structure-guided supertaggers perform significantly better than the baseline supertaggers. With the improved supertaggers, the F-score of the final parser also improves. Using the forest-guided supertagger in a shift-reduce HPSG parser, we achieved a competitive parsing performance of 89.31% F-score with higher parsing speed than that of a state-of-the-art HPSG parser.
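To make the idea of dependency-derived long-range features concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sentence has already been parsed into a list of head indices (with -1 marking the root, a convention chosen here for illustration). The function name `dependency_features` and the feature templates are hypothetical. The sketch shows how a token's feature set can include its head and dependents, however distant, alongside the usual adjacent-word features.

```python
from typing import List


def dependency_features(words: List[str], heads: List[int], i: int) -> List[str]:
    """Return string features for token i.

    heads[j] is the index of token j's head, or -1 for the root
    (an assumed convention for this sketch).
    """
    feats = [f"w0={words[i]}"]

    # Local context: the adjacent words, as in a conventional sequence labeler.
    feats.append(f"w-1={words[i - 1] if i > 0 else '<s>'}")
    feats.append(f"w+1={words[i + 1] if i + 1 < len(words) else '</s>'}")

    # Long-range context: the head word and its direction, at any distance.
    h = heads[i]
    if h < 0:
        feats.append("head=ROOT")
    else:
        feats.append(f"head={words[h]}")
        feats.append(f"head_dir={'L' if h < i else 'R'}")

    # Long-range context: every dependent of token i, marked left or right.
    for j, hj in enumerate(heads):
        if hj == i:
            feats.append(f"dep_{'L' if j < i else 'R'}={words[j]}")

    return feats


# Hypothetical example sentence with hand-written head indices.
words = ["The", "dog", "that", "I", "saw", "barked"]
heads = [1, 5, 4, 4, 1, -1]
feats = dependency_features(words, heads, 1)  # features for "dog"
```

In this example the features for "dog" include its head "barked", which is four tokens away and invisible to a first-order edge feature. Because such features only contribute weighted evidence to the classifier and do not rule out any supertag, they behave as soft constraints in the sense the abstract describes.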

This article appears in Natural Language Engineering Vol. 18, Issue 2, which you can read on Cambridge's site.