Topics

discourse parsing

Previous research on text-level discourse parsing has mainly made use of constituency structure to parse a whole document into one discourse tree.

Page 1, “Abstract”

In this paper, we present the limitations of constituency-based discourse parsing and first propose to use dependency structure to directly represent the relations between elementary discourse units (EDUs).

Research in discourse parsing aims to acquire such relations in text, which is fundamental to many natural language processing applications such as question answering and automatic summarization.

Page 1, “Introduction”

One important issue behind discourse parsing is the representation of discourse structure.

EDUs

Appears in 23 sentences as: EDUs (26)

In Text-level Discourse Dependency Parsing

In this paper, we present the limitations of constituency-based discourse parsing and first propose to use dependency structure to directly represent the relations between elementary discourse units (EDUs).

Page 1, “Abstract”

Rhetorical Structure Theory (RST) (Mann and Thompson, 1988), one of the most influential discourse theories, posits a hierarchical generative tree representation, as illustrated in Figure 1. The leaves of a tree correspond to contiguous text spans called Elementary Discourse Units (EDUs).

Page 1, “Introduction”

The adjacent EDUs are combined into

Page 1, “Introduction”

We assume EDUs are already known.

Page 1, “Introduction”

EDUs or larger text spans) occurring in the generative process are better represented with different features, and thus a uniform framework for discourse analysis is hard to develop.

Page 1, “Introduction”

Here is the basic idea: the discourse structure consists of EDUs which are linked by binary, asymmetrical relations called dependency relations.

Page 1, “Introduction”

Now, we can analyze the relations between EDUs directly, without worrying about any interior text spans.

Page 2, “Introduction”

Then, discourse dependency structure can be formalized as a labeled directed graph, where nodes correspond to EDUs and labeled arcs correspond to labeled dependency relations.

Page 2, “Discourse Dependency Structure and Tree Bank”
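
To make this formalization concrete, here is a minimal sketch of the labeled directed graph as a data structure. The class names, fields, and the relation label in the usage example are ours for illustration; the paper defines the structure abstractly.

```python
from dataclasses import dataclass, field

@dataclass
class DepArc:
    head: int        # index of the head EDU (0 = artificial root e0)
    dependent: int   # index of the dependent EDU
    relation: str    # relation label, e.g. "ROOT"

@dataclass
class DiscourseDepGraph:
    edus: list                       # edus[0] is the artificial e0
    arcs: list = field(default_factory=list)

    def add_arc(self, head, dependent, relation):
        self.arcs.append(DepArc(head, dependent, relation))

# usage: a two-EDU text; the relation label is hypothetical
g = DiscourseDepGraph(edus=["<e0>",
                            "the match was cancelled",
                            "because it was raining"])
g.add_arc(0, 1, "ROOT")
g.add_arc(1, 2, "Explanation")
```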

We assume that the text T is composed of n+1 EDUs, including the artificial e0.

Page 3, “Discourse Dependency Structure and Tree Bank”

Let R = {r1, r2, ..., rm} denote a finite set of functional relations that hold between two EDUs.

Page 3, “Discourse Dependency Structure and Tree Bank”

The third condition ensures that each EDU has one and only one head, and the fourth that only one kind of dependency relation holds between two EDUs.
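
A small checker makes the quoted conditions concrete. It verifies the third condition (exactly one head per EDU) and the fourth (at most one relation per EDU pair), plus attachment and acyclicity so the arcs form a tree; the encoding is our own.

```python
def is_well_formed(n, arcs):
    """Check a candidate arc set over EDUs e1..en, with e0 = 0.
    arcs: list of (head, dependent, relation) triples."""
    heads, pairs = {}, set()
    for h, d, r in arcs:
        if d in heads:           # third condition: one and only one head
            return False
        heads[d] = h
        if (h, d) in pairs:      # fourth condition: one relation per pair
            return False
        pairs.add((h, d))
    if set(heads) != set(range(1, n + 1)):
        return False             # every EDU must be attached
    for d in range(1, n + 1):    # acyclicity: every EDU reaches e0
        seen, cur = set(), d
        while cur != 0:
            if cur in seen:
                return False
            seen.add(cur)
            cur = heads[cur]
    return True
```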

dependency trees

Appears in 22 sentences as: dependency tree (8), dependency trees (14)

In Text-level Discourse Dependency Parsing

The state-of-the-art dependency parsing techniques, the Eisner algorithm and the maximum spanning tree (MST) algorithm, are adopted to parse an optimal discourse dependency tree based on the arc-factored model and the large-margin learning techniques.

Page 1, “Abstract”

Since dependency trees contain far fewer nodes and are on average simpler than constituency-based trees, current dependency parsers can have a relatively low computational complexity.

Page 2, “Introduction”

In our work, we adopt the graph based dependency parsing techniques learned from large sets of annotated dependency trees .

Page 2, “Introduction”

and maximum spanning tree (MST) algorithm are used respectively to parse the optimal projective and non-projective dependency trees with the large-margin learning technique (Crammer and Singer, 2003).

Page 2, “Discourse Dependency Structure and Tree Bank”

According to the definition, we illustrate all 9 possible unlabeled dependency trees for a text containing three EDUs in Figure 2.

Page 3, “Discourse Dependency Structure and Tree Bank”
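
The count of 9 can be verified by brute force. The sketch below assumes, consistently with the single-head condition and our reading of Figure 2, that the artificial e0 governs exactly one EDU; with three EDUs, exactly 9 head assignments survive the tree constraints.

```python
from itertools import product

def count_unlabeled_trees(n=3):
    """Enumerate head assignments for EDUs 1..n (head 0 = e0) and keep
    those forming a tree in which e0 governs exactly one EDU."""
    count = 0
    choices = [[h for h in range(n + 1) if h != d] for d in range(1, n + 1)]
    for heads in product(*choices):
        if sum(1 for h in heads if h == 0) != 1:
            continue  # assumption: the artificial root has one dependent
        ok = True
        for d in range(1, n + 1):   # every EDU must reach e0 (no cycles)
            seen, cur = set(), d
            while cur != 0 and cur not in seen:
                seen.add(cur)
                cur = heads[cur - 1]
            ok = ok and cur == 0
        count += ok
    return count

print(count_unlabeled_trees(3))  # -> 9, matching Figure 2
```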

The dependency trees 1’ to 7’ are projective while 8’ and 9’ are non-projective with crossing arcs.

Page 3, “Discourse Dependency Structure and Tree Bank”
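
Crossing arcs can be detected directly, which gives a compact projectivity test. The head-array encoding below is our own convention.

```python
def is_projective(heads):
    """heads[d] = head of EDU d for d = 1..n (heads[0] unused, e0 = 0).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    for (a, b) in arcs:
        for (c, e) in arcs:
            # two arcs cross if exactly one endpoint of one arc
            # lies strictly inside the span of the other
            if a < c < b < e:
                return False
    return True

print(is_projective([0, 2, 0, 1]))  # False: arc e1->e3 crosses e0->e2
print(is_projective([0, 0, 1, 2]))  # True: the chain e0->e1->e2->e3
```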

We use dependency trees to simulate the headed constituency-based trees.

dependency parsing

The state-of-the-art dependency parsing techniques, the Eisner algorithm and the maximum spanning tree (MST) algorithm, are adopted to parse an optimal discourse dependency tree based on the arc-factored model and the large-margin learning techniques.

Fortunately, the RST Discourse Treebank (RST-DT) (Carlson et al., 2001) is an available resource to help with this.

Page 3, “Discourse Dependency Structure and Tree Bank”

With this kind of conversion, we can get our discourse dependency treebank.

Page 3, “Discourse Dependency Structure and Tree Bank”
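
The paper derives its treebank by converting RST-DT constituency trees into dependencies. The sketch below shows one standard nuclearity-based conversion (our reading, not necessarily the paper's exact procedure): the head of a span is the head of its nucleus child, and each satellite's head depends on the sibling nucleus's head. Multinuclear relations are not handled here, and the relation labels in the example are hypothetical.

```python
class RSTNode:
    """Binary RST constituency node; leaves carry an EDU index."""
    def __init__(self, left=None, right=None, relation=None,
                 nucleus="left", edu=None):
        self.left, self.right = left, right
        self.relation = relation   # relation labeling the child pair
        self.nucleus = nucleus     # which child is the nucleus
        self.edu = edu             # EDU index if this is a leaf

def to_dependencies(node, arcs):
    """Return the head EDU of `node`, adding one dependency arc per
    internal node: the satellite's head depends on the nucleus's head."""
    if node.edu is not None:
        return node.edu
    lh = to_dependencies(node.left, arcs)
    rh = to_dependencies(node.right, arcs)
    if node.nucleus == "left":     # right child depends on left head
        arcs.append((lh, rh, node.relation))
        return lh
    else:                          # left child depends on right head
        arcs.append((rh, lh, node.relation))
        return rh

# usage: [e1 [e2 e3]] with e2 the nucleus of Elaboration and e1 the
# nucleus of the top-level Cause span (labels hypothetical)
tree = RSTNode(left=RSTNode(edu=1),
               right=RSTNode(left=RSTNode(edu=2), right=RSTNode(edu=3),
                             relation="Elaboration", nucleus="left"),
               relation="Cause", nucleus="left")
arcs = []
root = to_dependencies(tree, arcs)
arcs.append((0, root, "ROOT"))     # attach the artificial e0
print(arcs)  # [(2, 3, 'Elaboration'), (1, 2, 'Cause'), (0, 1, 'ROOT')]
```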

It is worth noting that the non-projective trees like 8’ and 9’ do not exist in our dependency treebank, though they are eligible according to the definition of discourse dependency graph.

Page 3, “Discourse Dependency Structure and Tree Bank”

We use the syntactic trees from the Penn Treebank to find the dominating nodes.

Page 5, “Add arc <eC,ej> to GC with”

But we think the MST algorithm has more potential in discourse dependency parsing, because our converted discourse dependency treebank contains only projective trees, which prevents the MST algorithm from exhibiting its advantage in parsing non-projective trees.

Page 8, “Add arc <eC,ej> to GC with”

In fact, we observe that some non-projective dependencies produced by the MST algorithm are even more reasonable than the corresponding ones in the dependency treebank.

Page 8, “Add arc <eC,ej> to GC with”

Thus, it is important to build a manually labeled discourse dependency treebank, which will be our future work.

dependency relation

Here is the basic idea: the discourse structure consists of EDUs which are linked by binary, asymmetrical relations called dependency relations.

Page 1, “Introduction”

A dependency relation holds between a subordinate EDU called the dependent, and another EDU on which it depends, called the head.

Page 1, “Introduction”

Similar to the syntactic dependency structure defined by McDonald (2005a, 2005b), we insert an artificial EDU e0 at the beginning of each document and label the dependency relation linking from e0 as ROOT.

Page 2, “Discourse Dependency Structure and Tree Bank”

A labeled directed arc is used to represent the dependency relation from one head to its dependent.

Page 2, “Discourse Dependency Structure and Tree Bank”

Then, discourse dependency structure can be formalized as a labeled directed graph, where nodes correspond to EDUs and labeled arcs correspond to labeled dependency relations.

Page 2, “Discourse Dependency Structure and Tree Bank”

The third condition ensures that each EDU has one and only one head, and the fourth that only one kind of dependency relation holds between two EDUs.

Page 3, “Discourse Dependency Structure and Tree Bank”

For example, the sixth feature in Table 5 represents that the dependency relation is preferred to be labeled Explanation when “because” is the first word of the dependent EDU.
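
A sketch of such arc features, with templates of our own in the spirit of Table 5: each feature pairs an observation about the arc with the candidate relation label, so the learner can discover, for instance, that a dependent EDU starting with “because” favors Explanation.

```python
def arc_features(edus, head, dep, relation):
    """A few illustrative arc-feature templates (ours, not the
    paper's exact feature set)."""
    return [
        f"dep_first_word={edus[dep].split()[0].lower()}|rel={relation}",
        f"head_first_word={edus[head].split()[0].lower()}|rel={relation}",
        f"arc_direction={'right' if head < dep else 'left'}|rel={relation}",
        f"distance={abs(head - dep)}|rel={relation}",
    ]

edus = ["<e0>", "The match was cancelled", "because it was raining ."]
print(arc_features(edus, 1, 2, "Explanation")[0])
# dep_first_word=because|rel=Explanation -- the kind of feature that
# would get a high weight for the Explanation relation
```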

spanning tree

Appears in 7 sentences as: Spanning Tree (1), spanning tree (6)

In Text-level Discourse Dependency Parsing

The state-of-the-art dependency parsing techniques, the Eisner algorithm and the maximum spanning tree (MST) algorithm, are adopted to parse an optimal discourse dependency tree based on the arc-factored model and the large-margin learning techniques.

Page 1, “Abstract”

and maximum spanning tree (MST) algorithm are used respectively to parse the optimal projective and non-projective dependency trees with the large-margin learning technique (Crammer and Singer, 2003).

Page 2, “Discourse Dependency Structure and Tree Bank”

The goal of discourse dependency parsing is to parse an optimal spanning tree from V × R × V_0.

Page 3, “Discourse Dependency Parsing”

Thus, the optimal dependency tree for T is a spanning tree with the highest score, obtained through the function DT(T, w): DT(T, w) = argmax_{GT ⊆ V × R × V_0} score(T, GT)

Page 3, “Discourse Dependency Parsing”

score(T, GT) = Σ_{<ei, r, ej> ∈ GT} λ(ei, r, ej), where GT means a possible spanning tree with score score(T, GT) and λ(ei, r, ej) denotes the score of the arc <ei, r, ej>.

Page 3, “Discourse Dependency Parsing”
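
The arc-factored model above is straightforward to express in code: the tree score is the sum of per-arc scores, and each λ(ei, r, ej) is a linear combination of feature weights (as the “feature weights” excerpts below state). The representations are our own.

```python
def arc_score(w, feats):
    """lambda(ei, r, ej): a linear combination of sparse feature
    weights, with binary features represented as strings."""
    return sum(w.get(f, 0.0) for f in feats)

def tree_score(w, tree, feature_fn):
    """score(T, GT) = sum of lambda(ei, r, ej) over the arcs
    <ei, r, ej> in GT. `tree` is a list of (head, dependent, relation)
    triples and `feature_fn` maps one arc to its feature strings."""
    return sum(arc_score(w, feature_fn(h, d, r)) for h, d, r in tree)

# usage with toy weights and a one-template feature function
w = {"rel=ROOT": 0.5, "rel=Elaboration": 1.2}
tree = [(0, 1, "ROOT"), (1, 2, "Elaboration")]
print(tree_score(w, tree, lambda h, d, r: [f"rel={r}"]))  # 1.7
```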

3.3 Maximum Spanning Tree Algorithm

Page 4, “Discourse Dependency Parsing”

Following the work of McDonald (2005b), we formalize discourse dependency parsing as searching for a maximum spanning tree (MST) in a directed graph.
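
For illustration, the MST search itself can be delegated to an off-the-shelf Chu-Liu/Edmonds implementation. The sketch below uses networkx rather than the Tarjan-style implementation the paper adopts, and it does not enforce that e0 keeps a single dependent.

```python
import networkx as nx

def mst_parse(n, score_fn):
    """Highest-scoring dependency tree over EDUs 1..n with artificial
    root e0 = 0, via networkx's Chu-Liu/Edmonds implementation.

    score_fn(h, d) -> score of arc h -> d, already maximized over
    relation labels. No arcs enter e0, so every spanning arborescence
    is rooted there."""
    G = nx.DiGraph()
    for h in range(n + 1):
        for d in range(1, n + 1):
            if h != d:
                G.add_edge(h, d, weight=score_fn(h, d))
    return sorted(nx.maximum_spanning_arborescence(G).edges())

# toy scores preferring the chain e0 -> e1 -> e2
print(mst_parse(2, lambda h, d: 1.0 if d == h + 1 else 0.0))
# [(0, 1), (1, 2)]
```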

feature weights

In fact, the score of each arc is calculated as a linear combination of feature weights.

Page 5, “Add arc <eC,ej> to GC with”

Following McDonald et al. (2005a; 2005b), we use the Margin Infused Relaxed Algorithm (MIRA) to learn the feature weights based on a training set of documents annotated with dependency structures yi, where yi denotes the correct dependency tree for the text Ti.
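
A simplified single-best MIRA update, sketched under our own assumptions (the paper follows McDonald's setup): the weights move just enough for the gold tree yi to outscore the current best parse by the loss.

```python
def mira_update(w, gold_feats, pred_feats, loss, C=1.0):
    """One single-best MIRA update in the Crammer-and-Singer style.

    gold_feats / pred_feats: feature-count dicts for the gold tree y_i
    and the current best parse; loss is e.g. the number of wrong heads.
    The weight dict w is updated in place."""
    diff = dict(gold_feats)
    for f, v in pred_feats.items():
        diff[f] = diff.get(f, 0.0) - v
    margin = sum(w.get(f, 0.0) * v for f, v in diff.items())
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return                      # prediction equals gold: no update
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    for f, v in diff.items():
        w[f] = w.get(f, 0.0) + tau * v
```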

fine-grained

One is composed of 19 coarse-grained relations and the other of 111 fine-grained relations.

Page 5, “Add arc <eC,ej> to GC with”

From Table 3 and Table 4, we can see that the addition of more feature types, except the 6th feature type (semantic similarity), can improve the performance of relation labeling, whether using the 19 coarse-grained relations or the 111 fine-grained relations.

Page 7, “Add arc <eC,ej> to GC with”

Table 5 selects 10 features with the highest weights in absolute value for the parser which uses the coarse-grained relations, while Table 6 selects the top 10 features for the parser using the fine-grained relations.

Page 7, “Add arc <eC,ej> to GC with”

From Table 3 and Table 4, we can see that fine-grained relations are more helpful to building unlabeled discourse

Page 7, “Add arc <eC,ej> to GC with”

We can also see that the labeled accuracy using the fine-grained relations can achieve 0.4309, only 0.06 lower than the best labeled accuracy (0.4915) using the coarse-grained relations.

maximum spanning

Appears in 4 sentences as: Maximum Spanning (1), maximum spanning (3)

In Text-level Discourse Dependency Parsing

The state-of-the-art dependency parsing techniques, the Eisner algorithm and the maximum spanning tree (MST) algorithm, are adopted to parse an optimal discourse dependency tree based on the arc-factored model and the large-margin learning techniques.

Page 1, “Abstract”

and maximum spanning tree (MST) algorithm are used respectively to parse the optimal projective and non-projective dependency trees with the large-margin learning technique (Crammer and Singer, 2003).

Page 2, “Discourse Dependency Structure and Tree Bank”

3.3 Maximum Spanning Tree Algorithm

Page 4, “Discourse Dependency Parsing”

Following the work of McDonald (2005b), we formalize discourse dependency parsing as searching for a maximum spanning tree (MST) in a directed graph.

syntactic parsing

Appears in 4 sentences as: syntactic parsing (4)

In Text-level Discourse Dependency Parsing

Since such a hierarchical discourse tree is analogous to a constituency-based syntactic tree except that the constituents in the discourse trees are text spans, previous research has explored different constituency-based syntactic parsing techniques (e.g.

Page 1, “Introduction”

First, it is difficult to design a set of production rules as in syntactic parsing, since there are no determinate generative rules for the interior text spans.

Page 1, “Introduction”

The other two types of features, which are related to length and syntactic parsing, only improve the performance slightly.

Page 7, “Add arc <eC,ej> to GC with”

Since the RST tree is similar to the constituency-based syntactic tree except that the constituent nodes are different, syntactic parsing techniques have been borrowed for discourse parsing (Soricut and Marcu, 2003; Baldridge and Lascarides, 2005; Sagae, 2009; Hernault et al., 2010b; Feng and Hirst, 2012).

time complexity

Third, to reduce the time complexity of the state-of-the-art constituency-based parsing techniques, the approximate parsing approaches are prone to being trapped in local maxima.

Page 1, “Introduction”

It is well known that projective dependency parsing can be handled with the Eisner algorithm (1996), which is based on bottom-up dynamic programming with a time complexity of O(n³).

Page 4, “Discourse Dependency Parsing”
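
A score-only sketch of that O(n³) dynamic program, in our own notation; recovering the actual tree additionally requires backpointers, which are omitted to keep the sketch short.

```python
import numpy as np

def eisner_best_score(scores):
    """Best projective tree score via the Eisner (1996) algorithm.
    scores[h, d] is the (label-maximized) score of arc h -> d;
    index 0 is the artificial root e0."""
    n = scores.shape[0]
    NEG = float("-inf")
    # [i, j, dir]: dir 0 = head on the right (j), dir 1 = head on the left (i)
    C = np.full((n, n, 2), NEG)  # complete spans
    I = np.full((n, n, 2), NEG)  # incomplete spans (one arc pending)
    for i in range(n):
        C[i, i, 0] = C[i, i, 1] = 0.0
    for k in range(1, n):
        for i in range(n - k):
            j = i + k
            # attach an arc between i and j over two complete halves
            best = max(C[i, r, 1] + C[r + 1, j, 0] for r in range(i, j))
            I[i, j, 0] = best + scores[j, i]   # arc j -> i
            I[i, j, 1] = best + scores[i, j]   # arc i -> j
            # extend incomplete spans into complete ones
            C[i, j, 0] = max(C[i, r, 0] + I[r, j, 0] for r in range(i, j))
            C[i, j, 1] = max(I[i, r, 1] + C[r, j, 1]
                             for r in range(i + 1, j + 1))
    return C[0, n - 1, 1]  # everything ultimately headed by e0

# toy usage: three EDUs where the chain e0->e1->e2->e3 scores best
s = np.zeros((4, 4))
for h in range(3):
    s[h, h + 1] = 1.0
print(eisner_best_score(s))  # 3.0
```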

Following McDonald et al. (2005b), we adopt an efficient implementation of the Chu-Liu/Edmonds algorithm proposed by Tarjan (1977) with O(n²) time complexity.

Page 4, “Discourse Dependency Parsing”

Based on dependency structure, we are able to directly analyze the relations between the EDUs without worrying about the additional interior text spans, and apply the existing state-of-the-art dependency parsing techniques which have a relatively low time complexity.

learning algorithm

As we employed the MIRA learning algorithm, it is possible to identify which specific features are useful by looking at the weights learned for each feature using the training data.

Page 7, “Add arc <eC,ej> to GC with”

Other text-level discourse parsing methods include: (1) Percep-coarse: we replace MIRA with the averaged perceptron learning algorithm and the other settings are the same as Our-coarse; (2) HILDA-manual and HILDA-seg are from Hernault (2010b)’s work, and their input EDUs are from RST-DT and their own EDU segmenter respectively; (3) LeThanh indicates the results given by LeThanh et al.

Page 8, “Add arc <eC,ej> to GC with”

We can also see that the averaged perceptron learning algorithm, though simple, can achieve a comparable performance, better than HILDA-manual.
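
For reference, a sketch of the averaged structured perceptron behind the Percep-coarse baseline, under our own interface assumptions (feats_fn and parse_fn are hypothetical callables): each mistake adds the gold tree's features and subtracts the prediction's, and the returned weights are averaged over all updates.

```python
def perceptron_train(docs, feats_fn, parse_fn, epochs=10):
    """Averaged structured perceptron (a sketch, not the paper's code).

    docs: list of (text, gold_tree); feats_fn(text, tree) -> feature
    count dict; parse_fn(w, text) -> best tree under weights w."""
    w, w_sum, t = {}, {}, 0
    for _ in range(epochs):
        for text, gold in docs:
            pred = parse_fn(w, text)
            if pred != gold:
                for f, v in feats_fn(text, gold).items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in feats_fn(text, pred).items():
                    w[f] = w.get(f, 0.0) - v
            t += 1
            for f, v in w.items():   # naive running sum for averaging
                w_sum[f] = w_sum.get(f, 0.0) + v
    return {f: v / t for f, v in w_sum.items()}
```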