Article Structure

Abstract

This paper presents the first dependency model for a shift-reduce CCG parser.

Introduction

Combinatory Categorial Grammar (CCG; Steedman (2000)) is able to derive typed dependency structures (Hockenmaier, 2003; Clark and Curran, 2007), providing a useful approximation to the underlying predicate-argument relations of “who did what to whom”.

Shift-Reduce with Beam-Search

This section describes how shift-reduce techniques can be applied to CCG, following Zhang and Clark (2011).

The Dependency Model

Categories in CCG are either basic (such as NP and PP) or complex (such as (S[dcl]\NP)/NP).
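Not from the paper: a minimal sketch of one way to represent this basic/complex distinction as a recursive data type (the class names `Basic` and `Complex` are hypothetical).

```python
class Basic:
    """An atomic category such as NP, PP, or S[dcl]."""
    def __init__(self, atom, feature=None):
        self.atom, self.feature = atom, feature
    def __str__(self):
        return self.atom + (f"[{self.feature}]" if self.feature else "")

class Complex:
    """A functor category: a result sought via a forward (/) or backward (\\) slash."""
    def __init__(self, result, slash, argument):
        assert slash in ("/", "\\")
        self.result, self.slash, self.argument = result, slash, argument
    def __str__(self):
        return f"({self.result}{self.slash}{self.argument})"

# The transitive-verb category (S[dcl]\NP)/NP, built bottom-up
# (the printed form carries an extra pair of outer parentheses):
tv = Complex(Complex(Basic("S", "dcl"), "\\", Basic("NP")), "/", Basic("NP"))
print(tv)  # ((S[dcl]\NP)/NP)
```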

Experiments

We implement our shift-reduce parser on top of the core C&C code base (Clark and Curran, 2007) and evaluate it against the shift-reduce parser of Zhang and Clark (2011) (henceforth Z&C) and the chart-based normal-form and hybrid models of Clark and Curran (2007).

Conclusion

We have presented a dependency model for a shift-reduce CCG parser, which fully aligns CCG parsing with the left-to-right, incremental nature of a shift-reduce parser.

Topics

CCG

Appears in 31 sentences as: CCG (34)

In Shift-Reduce CCG Parsing with a Dependency Model

This paper presents the first dependency model for a shift-reduce CCG parser.

Page 1, “Abstract”

Modelling dependencies is desirable for a number of reasons, including handling the “spurious” ambiguity of CCG; fitting well with the theory of CCG; and optimizing for structures which are evaluated at test time.

Page 1, “Abstract”

Standard CCGBank tests show the model achieves up to 1.05 labeled F-score improvements over three existing, competitive CCG parsing models.

Page 1, “Abstract”

Combinatory Categorial Grammar (CCG; Steedman (2000)) is able to derive typed dependency structures (Hockenmaier, 2003; Clark and Curran, 2007), providing a useful approximation to the underlying predicate-argument relations of “who did what to whom”.

To achieve its expressiveness, CCG exhibits so-called “spurious” ambiguity, permitting many nonstandard surface derivations which ease the recovery of certain dependencies, especially those arising from type-raising and composition.

Page 1, “Introduction”

But this raises the question of what is the most suitable model for CCG: should we model the derivations, the dependencies, or both?

Page 1, “Introduction”

Modelling dependencies, as a proxy for the semantic interpretation, fits well with the theory of CCG, in which Steedman (2000) argues that the derivation is merely a “trace” of the underlying syntactic process, and that the structure which is built, and predicated over when applying constraints on grammaticality, is the semantic interpretation.

Page 1, “Introduction”

And third, it has been argued that dependencies are an ideal representation for parser evaluation, especially for CCG (Briscoe and Carroll, 2006; Clark and Hockenmaier, 2002), and so optimizing for dependency recovery makes sense from an evaluation perspective.

Page 1, “Introduction”

In this paper, we fill a gap in the literature by developing the first dependency model for a shift-reduce CCG parser.

Page 1, “Introduction”

Shift-reduce parsing applies naturally to CCG (Zhang and Clark, 2011), and the left-to-right, incremental nature of the decoding fits with CCG’s cognitive claims.

gold-standard

A challenge arises from the fact that the oracle needs to keep track of exponentially many gold-standard derivations, which is solved by integrating a packed parse forest with the beam-search decoder.

Page 1, “Abstract”

and Curran, 2007) is to model derivations directly, restricting the gold-standard to be the normal-form derivations (Eisner, 1996) from CCGBank (Hockenmaier and Steedman, 2007).

Page 1, “Introduction”

Clark and Curran (2006) show how the dependency model from Clark and Curran (2007) extends naturally to the partial-training case, and also how to obtain dependency data cheaply from gold-standard lexical category sequences alone.

Page 1, “Introduction”

A challenge arises from the potentially exponential number of derivations leading to a gold-standard dependency structure, which the oracle needs to keep track of.

Page 2, “Introduction”

The derivations are not explicitly part of the data, since the forest is built from the gold-standard dependencies.

Page 2, “Introduction”

We refer to the shift-reduce model of Zhang and Clark (2011) as the normal-form model, where the oracle for each sentence specifies a unique sequence of gold-standard actions which produces the corresponding normal-form derivation.

Page 2, “Shift-Reduce with Beam-Search”

In the next section, we describe a dependency oracle which considers all sequences of actions producing a gold-standard dependency structure to be correct.

Page 2, “Shift-Reduce with Beam-Search”

However, the difference compared to the normal-form model is that we do not assume a single gold-standard sequence of actions.

Page 3, “The Dependency Model”

Similar to Goldberg and Nivre (2012), we define an oracle which determines, for a gold-standard dependency structure, G, what the valid transition sequences are (i.e.

Page 3, “The Dependency Model”

The dependency model requires all the conjunctive and disjunctive nodes of Q that are part of the derivations leading to a gold-standard dependency structure G. We refer to such derivations as correct derivations and the packed forest containing all these derivations as the oracle forest, denoted as Q0, which is a subset of Q.

Page 4, “The Dependency Model”

The main intuition behind the algorithm is that a gold-standard dependency structure decomposes over derivations; thus gold-standard dependencies realized at conjunctive nodes can be counted when Q is built, and all nodes that are part of Q0 can then be marked out of Q by traversing it top-down.
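This is not the paper's actual algorithm, but a toy sketch of that intuition: count gold dependencies bottom-up over a packed forest of conjunctive/disjunctive nodes, then mark top-down the nodes lying on derivations that realize every gold dependency (the node and field names are invented).

```python
from dataclasses import dataclass, field

@dataclass
class Conj:                 # conjunctive node: one rule application
    gold_deps: int          # gold dependencies realized here when Q is built
    children: list = field(default_factory=list)   # disjunctive children

@dataclass
class Disj:                 # disjunctive node: alternative analyses of a span
    options: list = field(default_factory=list)    # conjunctive alternatives

def best(dnode):
    """Max number of gold dependencies over derivations rooted at dnode."""
    return max(c.gold_deps + sum(best(d) for d in c.children)
               for c in dnode.options)

def mark(dnode, oracle):
    """Top-down: keep nodes on derivations realizing best(dnode) gold deps."""
    oracle.add(id(dnode))
    for c in dnode.options:
        if c.gold_deps + sum(best(d) for d in c.children) == best(dnode):
            oracle.add(id(c))
            for d in c.children:
                mark(d, oracle)

def oracle_forest(root, n_gold):
    """Return ids of nodes kept in the oracle forest, or empty if G is unreachable."""
    oracle = set()
    if best(root) == n_gold:    # some derivation realizes all of G
        mark(root, oracle)
    return oracle

leaf = Disj([Conj(0)])
root = Disj([Conj(2, [leaf]), Conj(1, [leaf])])   # only the first option is correct
print(len(oracle_forest(root, 2)))  # 4: root, its correct option, leaf, leaf's option
```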

shift-reduce

This paper presents the first dependency model for a shift-reduce CCG parser.

Page 1, “Abstract”

In this paper, we fill a gap in the literature by developing the first dependency model for a shift-reduce CCG parser.

Page 1, “Introduction”

Shift-reduce parsing applies naturally to CCG (Zhang and Clark, 2011), and the left-to-right, incremental nature of the decoding fits with CCG’s cognitive claims.

Page 1, “Introduction”

Results on the standard CCGBank tests show that our parser achieves absolute labeled F-score gains of up to 0.5 over the shift-reduce parser of Zhang and Clark (2011); and up to 1.05 and 0.64 over the normal-form and hybrid models of Clark and Curran (2007), respectively.

Page 2, “Introduction”

This section describes how shift-reduce techniques can be applied to CCG, following Zhang and Clark (2011).

Page 2, “Shift-Reduce with Beam-Search”

First we describe the deterministic process which a parser would follow when tracing out a single, correct derivation; then we describe how a model of normal-form derivations — or, more accurately, a sequence of shift-reduce actions leading to a normal-form derivation — can be used with beam-search to develop a nondeterministic parser which selects the highest scoring sequence of actions.

Page 2, “Shift-Reduce with Beam-Search”

Note this section only describes a normal-form derivation model for shift-reduce parsing.

Page 2, “Shift-Reduce with Beam-Search”

The shift-reduce algorithm adapted to CCG is similar to that of shift-reduce dependency parsing (Yamada and Matsumoto, 2003; Nivre and McDonald, 2008; Zhang and Clark, 2008; Huang and Sagae, 2010).

subtrees

Appears in 7 sentences as: Subtrees (1) subtrees (6)

In Shift-Reduce CCG Parsing with a Dependency Model

Following Zhang and Clark (2011), we define each item in the parser as a pair (s, q), where q is a queue of remaining input, consisting of words and a set of possible lexical categories for each word (with q0 being the front word), and s is the stack that holds subtrees s0, s1, … (with s0 at the top).

Page 2, “Shift-Reduce with Beam-Search”
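As a rough illustration (not the paper's code), the item (s, q) and the two core actions might look like this; the helper names and the `combine` callback are invented:

```python
class Item:
    """A parser item (s, q): a stack of subtrees plus a queue of remaining input."""
    def __init__(self, stack, queue):
        self.stack = stack      # s, with stack[-1] = s0 on top
        self.queue = queue      # q, with queue[0] = q0 at the front

def shift(item, category):
    """SHIFT: push the front word onto the stack with one of its lexical categories."""
    word, cats = item.queue[0]
    assert category in cats
    return Item(item.stack + [(category, word)], item.queue[1:])

def reduce_binary(item, combine):
    """REDUCE: combine the top two subtrees s1 and s0 with a CCG rule."""
    s1, s0 = item.stack[-2], item.stack[-1]
    return Item(item.stack[:-2] + [combine(s1, s0)], item.queue)

# Hypothetical queue: each element pairs a word with its candidate categories.
item = Item([], [("Mr.", {"N/N"}), ("President", {"N"})])
item = shift(item, "N/N")
item = shift(item, "N")
item = reduce_binary(item, lambda l, r: ("N", l, r))   # forward application
print(item.stack)  # [('N', ('N/N', 'Mr.'), ('N', 'President'))]
```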

Subtrees on the stack are partial derivations

Page 2, “Shift-Reduce with Beam-Search”

If |s| > 0 and the subtrees on s can lead to a correct derivation in Φ_G using further actions, we say s is a partial-realization of G, denoted as s ∼ G. And we define s ∼ G for |s| = 0.

Page 5, “The Dependency Model”

2a; then a stack containing the two subtrees in Fig. 3a is a partial-realization, while a stack containing the three subtrees in Fig. 3b is not.

Page 5, “The Dependency Model”

Note that each of the three subtrees in Fig. 3b is present in Φ_G; however, these subtrees cannot be combined into the single correct derivation, since the correct sequence of shift-reduce actions must first combine the lexical categories for Mr. and President before shifting the lexical category for visited.

Page 5, “The Dependency Model”

F-score

Standard CCGBank tests show the model achieves up to 1.05 labeled F-score improvements over three existing, competitive CCG parsing models.

Page 1, “Abstract”

Results on the standard CCGBank tests show that our parser achieves absolute labeled F-score gains of up to 0.5 over the shift-reduce parser of Zhang and Clark (2011); and up to 1.05 and 0.64 over the normal-form and hybrid models of Clark and Curran (2007), respectively.

Page 2, “Introduction”

On both the full and reduced sets, our parser achieves the highest F-score.

Page 8, “Experiments”

In comparison with C&C, our parser shows significant increases across all metrics, with 0.57% and 1.06% absolute F-score improvements over the hybrid and normal-form models, respectively.

Page 8, “Experiments”

While our parser achieved lower precision than Z&C, it is more balanced and gives higher recall for all of the dependency relations except the last one, and higher F-score for over half of them.
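For reference, labeled precision, recall and F-score over CCG dependencies can be computed by set intersection; the dependency tuples below are illustrative, not taken from the paper's data:

```python
def labeled_prf(predicted, gold):
    """Precision, recall and F-score over labeled dependency sets.

    Each dependency is a hashable tuple, e.g. (head, category, slot, dependent).
    """
    correct = len(predicted & gold)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = {("visited", "(S[dcl]\\NP)/NP", 1, "President"),
        ("visited", "(S[dcl]\\NP)/NP", 2, "Taiwan"),
        ("Mr.", "N/N", 1, "President")}
pred = {("visited", "(S[dcl]\\NP)/NP", 1, "President"),
        ("Mr.", "N/N", 1, "President")}
p, r, f = labeled_prf(pred, gold)
print(round(p, 2), round(r, 2), round(f, 2))  # 1.0 0.67 0.8
```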

highest scoring

First we describe the deterministic process which a parser would follow when tracing out a single, correct derivation; then we describe how a model of normal-form derivations — or, more accurately, a sequence of shift-reduce actions leading to a normal-form derivation — can be used with beam-search to develop a nondeterministic parser which selects the highest scoring sequence of actions.

Page 2, “Shift-Reduce with Beam-Search”

An item becomes a candidate output once it has an empty queue, and the parser keeps track of the highest-scoring candidate output, returning it as the final output.

Page 2, “Shift-Reduce with Beam-Search”
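A generic sketch of this decoding loop (the `expand`, `score` and `is_final` callbacks are placeholders, not the parser's real interface): keep the `width` highest-scoring items at each step, and remember the best item seen whose input is exhausted.

```python
import heapq

def beam_search(start, expand, score, is_final, width=4):
    """Beam-search decoder that tracks the best finished candidate output."""
    beam, best = [start], None
    while beam:
        candidates = [nxt for item in beam for nxt in expand(item)]
        beam = heapq.nlargest(width, candidates, key=score)
        for item in beam:
            if is_final(item) and (best is None or score(item) > score(best)):
                best = item
        beam = [item for item in beam if not is_final(item)]
    return best

# Toy demo: choose a tag 'a' or 'b' per word; the score rewards alternation.
def expand(item):
    tags, rest = item
    return [(tags + [t], rest[1:]) for t in ("a", "b")] if rest else []

def score(item):
    tags, _ = item
    return sum(1 for x, y in zip(tags, tags[1:]) if x != y)

result = beam_search(([], ["w1", "w2", "w3"]), expand, score,
                     is_final=lambda it: not it[1], width=4)
print(result[0])  # an alternating sequence, e.g. ['a', 'b', 'a']
```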

In Algorithm 3 we abuse notation by using H_G[0] to denote the highest scoring gold item in the set.

Page 7, “The Dependency Model”

We choose to reward the highest scoring gold item, in line with the violation-fixing framework; and penalize the highest scoring incorrect item, using the standard perceptron update.

Page 7, “The Dependency Model”
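The update described above can be sketched as follows; the feature templates and item representation are invented for illustration, and the real model's violation-fixing machinery is omitted:

```python
from collections import Counter

def perceptron_update(weights, gold_items, all_items, features, score):
    """Reward the highest-scoring gold item; penalize the highest-scoring
    incorrect item (a standard structured-perceptron update)."""
    best_gold = max(gold_items, key=score)          # H_G[0] in the paper's notation
    incorrect = [it for it in all_items if it not in gold_items]
    if not incorrect:
        return
    best_wrong = max(incorrect, key=score)
    for feat, value in features(best_gold).items():
        weights[feat] += value
    for feat, value in features(best_wrong).items():
        weights[feat] -= value

# Hypothetical toy items: the items are feature dicts themselves.
weights = Counter()
feats = lambda item: item
sc = lambda item: sum(weights[f] * v for f, v in item.items())
gold = [{"s0=NP": 1}]
items = gold + [{"s0=S": 1}]
perceptron_update(weights, gold, items, feats, sc)
print(dict(weights))  # {'s0=NP': 1, 's0=S': -1}
```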

Again, our parser achieves the highest scores across all metrics (for both the full and reduced test sets), except for precision and lexical category assignment, where Z&C performed better.