We present novel adaptations of two major shift-reduce dependency parsing algorithms to character-level parsing.

Character-Level Dependency Tree

We differentiate intra-word and inter-word dependencies by arc type, so that our work can be compared with conventional word segmentation, POS tagging, and dependency parsing pipelines under a canonical segmentation standard.

Introduction

Such annotations enable dependency parsing on the character level, building dependency trees over Chinese characters.

Introduction

Character-level dependency parsing is interesting in at least two respects.

Introduction

In this paper, we investigate character-level Chinese dependency parsing using Zhang et al.

Unlike popular shallow dependency parsing, which focuses on tree-shaped structures, our GR annotations are represented as general directed graphs that express not only local but also various long-distance dependencies, such as coordination, control/raising constructions, topicalization, relative clauses, and many other complicated linguistic phenomena that go beyond shallow syntax (see Fig.

Introduction

Previous work on dependency parsing mainly focused on structures that can be represented in terms of directed trees.

Transition-based GR Parsing

The availability of large-scale treebanks has contributed to the blossoming of statistical approaches to build accurate shallow constituency and dependency parsers.

Several supervised dependency parsing algorithms (Nivre and Scholz, 2004; McDonald et al., 2005a; McDonald et al., 2005b; McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010; Ma and Zhao, 2012; Zhang et al., 2013) have been proposed and achieved high parsing accuracies on several treebanks, due in large part to the availability of dependency treebanks in a number of languages (McDonald et al., 2013).

Introduction

(2011) proposed an approach for unsupervised dependency parsing with nonparallel multilingual guidance from one or more helper languages, in which parallel data is not used.

Our Approach

The focus of this work is on building dependency parsers for target languages, assuming that an accurate English dependency parser and some parallel text between the two languages are available.

Our Approach

The probabilistic model for dependency parsing defines a family of conditional probabilities p(y | x) over all dependency structures y given a sentence x, with a log-linear form:

p(y \mid x; w) = \frac{\exp(w \cdot f(x, y))}{\sum_{y' \in \mathcal{Y}(x)} \exp(w \cdot f(x, y'))}

Our Approach

One of the most common training methods for supervised dependency parsers is maximum conditional likelihood estimation.
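
Spelled out, maximum conditional likelihood training of the log-linear model above maximizes the conditional log-likelihood of the treebank, whose gradient is the familiar difference between observed and expected feature counts (a standard reconstruction; the excerpt does not preserve the paper's own equations):

\ell(w) = \sum_i \log p(y_i \mid x_i; w) = \sum_i \big( w \cdot f(x_i, y_i) - \log Z(x_i) \big)

\nabla_w \ell(w) = \sum_i \big( f(x_i, y_i) - \mathbb{E}_{p(y \mid x_i; w)}[ f(x_i, y) ] \big)

For arc-factored models, the expectations are computable exactly, e.g. with inside-outside dynamic programs for projective trees or the matrix-tree theorem for non-projective trees.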

The state-of-the-art dependency parsing techniques, the Eisner algorithm and the maximum spanning tree (MST) algorithm, are adopted to parse an optimal discourse dependency tree based on the arc-factored model and large-margin learning techniques.
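
To make the MST view concrete, here is a minimal Python sketch of arc-factored non-projective decoding using networkx's Chu-Liu/Edmonds implementation; the sentence and arc scores are hypothetical stand-ins for a trained model's outputs, not the discourse parser described above.

import networkx as nx

words = ["ROOT", "economic", "news", "had", "little", "effect"]
# arc_score[(h, d)]: model score for an arc from head h to dependent d (made-up values)
arc_score = {
    (0, 3): 10.0, (3, 2): 8.0, (2, 1): 7.0, (3, 5): 6.0, (5, 4): 5.0,
    (0, 2): 2.0, (1, 2): 3.0, (2, 3): 1.0, (3, 4): 2.0, (2, 5): 1.5,
}

G = nx.DiGraph()
for (h, d), s in arc_score.items():
    G.add_edge(h, d, weight=s)

# Under an arc-factored model, the maximum spanning arborescence over the
# arc scores is exactly the highest-scoring non-projective dependency tree.
tree = nx.maximum_spanning_arborescence(G)
for h, d in sorted(tree.edges(), key=lambda e: e[1]):
    print(f"{words[h]} -> {words[d]}")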

(2013) extend this idea by coupling predictions of a dependency parser with predictions from a semantic role labeler.

Related Work

(2012) marginalize over latent syntactic dependency parses.

Related Work

Recent work in fully unsupervised dependency parsing has supplanted these methods with even higher accuracies (Spitkovsky et al., 2013) by arranging optimizers into networks that suggest informed restarts based on previously identified local optima.

In this paper, we investigate various strategies to predict both syntactic dependency parsing and contiguous multiword expression (MWE) recognition, testing them on the dependency version of the French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013).

Architectures for MWE Analysis and Parsing

The architectures we investigated vary depending on whether the MWE status of sequences of tokens is predicted via dependency parsing or via an external tool (described in section 5), and this dichotomy applies both to structured MWEs and flat MWEs.

Architectures for MWE Analysis and Parsing

• IRREG-BY-PARSER: the MWE status, flat topology and POS are all predicted via dependency parsing, using representations for training and parsing, with all information for irregular MWEs encoded in topology and labels (as for in vain in Figure 2).

In some experiments, we make use of alternative representations, which we refer to later as "labeled representations", in which the MWE features are incorporated into the dependency labels, so that MWE composition and/or the POS of the MWE are fully contained in the tree topology and labels, and thus predictable via dependency parsing.

Related work

It is a less language-specific system that reranks the n-best dependency parses from three parsers, informed by features from predicted constituency trees.

Use of external MWE resources

Both resources help to predict MWE-specific features (section 5.3) to guide the MWE-aware dependency parser.

Use of external MWE resources

MWE lexicons are exploited as sources of features for both the dependency parser and the external MWE analyzer.

Use of external MWE resources

Flat MWE features: MWE information can be integrated as features to be used by the dependency parser.
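
As a sketch of how such features might look, the following Python snippet derives token-level BIO tags from an MWE lexicon; both the lexicon entries and the BIO encoding are illustrative assumptions rather than the paper's exact feature scheme.

mwe_lexicon = {("in", "vain"), ("as", "well", "as")}   # hypothetical entries

def mwe_features(tokens):
    """Tag each token B (begins), I (inside), or O (outside) a lexicon MWE."""
    feats = ["O"] * len(tokens)
    for i in range(len(tokens)):
        for mwe in mwe_lexicon:
            if tuple(tokens[i:i + len(mwe)]) == mwe:
                feats[i] = "B"
                for j in range(i + 1, i + len(mwe)):
                    feats[j] = "I"
    return feats

print(mwe_features(["he", "tried", "in", "vain", "to", "leave"]))
# ['O', 'O', 'B', 'I', 'O', 'O']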

This paper introduces a novel pre-ordering approach based on dependency parsing for Chinese-English SMT.

Dependency-based Pre-ordering Rule Set

Figure 1 shows a constituent parse tree and its Stanford typed dependency parse tree for the same sentence.

Dependency-based Pre-ordering Rule Set

As shown in the figure, the number of nodes in the dependency parse tree (i.e.

Dependency-based Pre-ordering Rule Set

Because dependency parse trees are generally more concise than constituent trees, they can perform long-distance reorderings in a finer-grained way.

Introduction

Since dependency parsing is more concise than constituent parsing in describing sentences, some research has used dependency parsing in pre-ordering approaches for language pairs such as Arabic-English (Habash, 2007), and English-SOV languages (Xu et al., 2009; Katz-Brown et al., 2011).

Introduction

In contrast, we propose a set of pre-ordering rules for dependency parsers.
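
As an illustration of the mechanism rather than the paper's actual rule set: a dependency-based pre-ordering rule can be implemented as a recursive re-linearization of the parse tree. In the hypothetical Python sketch below, a single rule moves right-side dependents with certain labels in front of their head; because whole subtrees move as units, such a rule yields the long-distance reordering discussed above.

MOVE_BEFORE_HEAD = {"prep"}   # hypothetical rule: pre-pose prepositional modifiers

def linearize(node, rules=MOVE_BEFORE_HEAD):
    """node = {'word': str, 'label': str, 'pos': int, 'children': [nodes]}.
    Emit the subtree's words, moving rule-matched right dependents before the head."""
    left = [c for c in node["children"] if c["pos"] < node["pos"]]
    right = [c for c in node["children"] if c["pos"] > node["pos"]]
    for c in list(right):
        if c["label"] in rules:
            right.remove(c)
            left.append(c)            # the whole subtree is re-attached on the left
    words = []
    for c in left:
        words.extend(linearize(c, rules))
    words.append(node["word"])
    for c in right:
        words.extend(linearize(c, rules))
    return words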

Introduction

(2007) exist, it is almost impossible to automatically convert their rules into rules that are applicable to dependency parsers.

The news articles have been processed with a tokenizer, a sentence splitter (Gillick and Favre, 2009), a part-of-speech tagger and dependency parser (Nivre, 2006), a coreference resolution module (Haghighi and Klein, 2009), and an entity linker based on Wikipedia and Freebase (Milne and Witten, 2008).

Heuristics-based pattern extraction

(2013), who built well-formed relational patterns by extending minimum spanning trees (MST) which connect entity mentions in a dependency parse.

To solve the latter problem, we introduce an apparently novel O(|V|^2 log |V|) algorithm that is similar to the maximum spanning tree (MST) algorithms that are widely used for dependency parsing (McDonald et al., 2005).

Notation and Overview

The relation identification stage (§4) is similar to a graph-based dependency parser.

Notation and Overview

Each stage is a discriminatively trained linear structured predictor with rich features that make use of part-of-speech tagging, named entity tagging, and dependency parsing.

However, the failure to uncover gains when searching across a variety of possible mechanisms for improvement, training procedures for embeddings, hyperparameter settings, tasks, and resource scenarios suggests that these gains (if they do exist) are extremely sensitive to these training conditions, and not nearly as accessible as they seem to be in dependency parsers.

Conclusion

Indeed, our results suggest a hypothesis that word embeddings are useful for dependency parsing (and perhaps other tasks) because they provide a level of syntactic abstraction which is explicitly annotated in constituency parses.

Introduction

Dependency parsers have seen gains from distributional statistics in the form of discrete word clusters (Koo et al., 2008), and recent work (Bansal et al., 2014) suggests that similar gains can be derived from embeddings like the ones used in this paper.

Introduction

The fact that word embedding features result in nontrivial gains for discriminative dependency parsing (Bansal et al., 2014), but do not appear to be effective for constituency parsing, points to an interesting structural difference between the two tasks.

Introduction

We hypothesize that dependency parsers benefit from the introduction of features (like clusters and embeddings) that provide syntactic abstractions; but that constituency parsers already have access to such abstractions in the form of supervised preterminal tags.

A tweet-specific tokenizer (Gimpel et al., 2011) is employed, and the dependency parsing results are computed by the Stanford Parser (Klein and Manning, 2003).

Experiments

The POS tagging and dependency parsing results are not precise enough for the Twitter data, so these handcrafted rules are rarely matched.

Introduction

(2011) combine the target-independent features (content and lexicon) and target-dependent features (rules based on the dependency parsing results) together in subjectivity classification and polarity classification for tweets.

Our Approach

We use the dependency parsing results to find the words syntactically connected to the target of interest.

Our Approach

In Section 3.1, we show how to build a recursive structure for the target using the dependency parsing results.

Since our system uses an off-the-shelf dependency parser, and semantic representations are obtained by simple rule-based conversion from dependency trees, there will be only one (right or wrong) interpretation in the face of ambiguous sentences.

Generating On-the-fly Knowledge

For a TH pair, apply dependency parsing and coreference resolution.

Generating On-the-fly Knowledge

Perform rule-based conversion from dependency parses to DCS trees, which are translated to statements on abstract denotations.

The Idea

To obtain DCS trees from natural language, we use Stanford CoreNLP5 for dependency parsing (Socher et al., 2013), and convert Stanford dependencies to DCS trees by pattern matching on POS tags and dependency labels.6 Currently we use the following semantic roles: ARG, SUBJ, OBJ, IOBJ, TIME and MOD.
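
A minimal sketch of this kind of conversion, with an illustrative mapping from Stanford dependency labels to the semantic roles listed above (the actual patterns also consult POS tags, and these particular entries are guesses rather than the authors' rules):

ROLE_PATTERNS = {              # hypothetical label-to-role mapping
    "nsubj": "SUBJ",
    "dobj": "OBJ",
    "iobj": "IOBJ",
    "tmod": "TIME",
    "amod": "MOD",
    "advmod": "MOD",
}

def to_dcs_role(dep_label):
    """Map one Stanford dependency label to a DCS semantic role; ARG is the fallback."""
    return ROLE_PATTERNS.get(dep_label, "ARG")

print(to_dcs_role("nsubj"))    # SUBJ
print(to_dcs_role("prep_in"))  # ARG (no more specific pattern)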

The grammar for ASP contains the annotated lexicon entries and grammar rules in Sections 02-21 of CCGbank, and additional semantic entries produced using a set of dependency parse heuristics.

Experiments

These entries are instantiated using a set of dependency parse patterns, listed in an online appendix.2 These patterns are applied to the training corpus, heuristically identifying verbs, prepositions, and possessives that express relations, and nouns that express categories.

Following an assumption often used in compression systems, the compressed output in this corpus is constructed by dropping tokens from the input sentence without any paraphrasing or reordering.1 A number of diverse approaches have been proposed for deletion-based sentence compression, including techniques that assemble the output text under an n-gram factorization over the input text (McDonald, 2006; Clarke and Lapata, 2008) or an arc factorization over input dependency parses (Filippova and Strube, 2008; Galanis and Androutsopoulos, 2010; Filippova and Altun, 2013).

Introduction

Maximum spanning tree algorithms, commonly used in non-projective dependency parsing (McDonald et al., 2005), are not easily adaptable to this task since the maximum-weight subtree is not necessarily a part of the maximum spanning tree.

Multi-Structure Sentence Compression

C. In addition, we define bigram indicator variables y_ij ∈ {0, 1} to represent whether a particular order-preserving bigram2 ⟨t_i, t_j⟩ from S is present as a contiguous bigram in C, as well as dependency indicator variables z_ij ∈ {0, 1} corresponding to whether the dependency arc t_i → t_j is present in the dependency parse of C. The score for a given compression C can now be defined to factor over its tokens, n-grams, and dependencies as follows.
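
A minimal Python sketch of this factored score, with hypothetical weight tables standing in for the learned, feature-based weights:

def compression_score(x, y, z, tok_w, bigram_w, dep_w):
    """x maps token index i, and y/z map index pairs (i, j), to 0/1 indicators.
    The score sums the weights of all kept tokens, bigrams, and dependency arcs."""
    score = sum(tok_w[i] for i, on in x.items() if on)
    score += sum(bigram_w[i, j] for (i, j), on in y.items() if on)
    score += sum(dep_w[i, j] for (i, j), on in z.items() if on)
    return score

In the actual system these indicators are jointly constrained (e.g. by an integer linear program or a dynamic program) so that the kept tokens, bigrams, and arcs describe one consistent compression.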

Features

While spanning trees are familiar from non-projective dependency parsing, features based on the linear order of the words or on lexical identities or syntactic word classes, which are primary drivers for dependency parsing, are mostly uninformative for taxonomy induction.

Structured Taxonomy Induction

Note that finding taxonomy trees is a structurally identical problem to directed spanning trees (and thereby non-projective dependency parsing), for which belief propagation has previously been worked out in depth (Smith and Eisner, 2008).
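
One standard tool for sum-product computation over directed spanning trees is the matrix-tree theorem, a closed-form counterpart to the belief propagation mentioned above. The numpy sketch below, with random illustrative arc weights and node 0 as the root, computes the partition function over all spanning arborescences as a determinant.

import numpy as np

n = 5                                 # nodes; node 0 is the designated root
rng = np.random.default_rng(0)
W = np.exp(rng.normal(size=(n, n)))   # W[h, d]: multiplicative weight of arc h -> d
np.fill_diagonal(W, 0.0)              # no self-arcs

# Root-weighted Laplacian: L[d, d] = sum_h W[h, d]; off-diagonal L[h, d] = -W[h, d].
L = np.diag(W.sum(axis=0)) - W
# Deleting the root's row and column, the determinant sums the product of arc
# weights over all spanning arborescences rooted at node 0 (the partition function).
Z = np.linalg.det(L[1:, 1:])
print(f"partition function over spanning trees: {Z:.4f}")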