For all the tests, we used a perceptron POS-tagger (Collins, 2002), trained on WSJ sections 2-21, to assign POS tags automatically to both the training (using 10-way jackknifing) and test data, obtaining a POS tagging accuracy of 97.32% on the test data.
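The 10-way jackknifing scheme mentioned above can be sketched as follows; `train_tagger` and `tag` are hypothetical stand-ins for the actual perceptron tagger's training and decoding routines.

```python
# Sketch of 10-way jackknifing: each fold of the training data is tagged
# by a model trained on the other nine folds, so the training set receives
# automatic (rather than gold) POS tags, matching test-time conditions.
# train_tagger/tag are hypothetical stand-ins for the real tagger API.

def jackknife(sentences, train_tagger, tag, k=10):
    folds = [sentences[i::k] for i in range(k)]
    tagged = []
    for i, held_out in enumerate(folds):
        # Train on all folds except the i-th ...
        model = train_tagger([s for j, f in enumerate(folds) if j != i for s in f])
        # ... and use it to tag the held-out fold.
        tagged.extend(tag(model, s) for s in held_out)
    return tagged
```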

Results

Overall, we see that the small improvements do not confirm the previous results on Penn2Malt, MaltParser and gold POS tags.

Results

One of the obstacles of automatic parsers is the presence of incorrect POS tags due to auto-

First, to resolve the error propagation problem of the traditional pipeline approach, we incorporate POS tagging into the syntactic parsing process.

Introduction

First, POS tagging is typically performed separately as a preliminary step, and POS tagging errors will propagate to the parsing process.

Introduction

This problem is especially severe for languages where the POS tagging accuracy is relatively low. This is the case for Chinese, where there are fewer contextual clues to inform the tagging process, and some tagging decisions are actually influenced by the syntactic structure of the sentence.

Introduction

First, we integrate POS tagging into the parsing process and optimize the two tasks jointly.

Joint POS Tagging and Parsing with Nonlocal Features

To address the drawbacks of the standard transition-based constituent parsing model (described in Section 1), we propose a model to jointly solve POS tagging and constituent parsing with nonlocal features.

Joint POS Tagging and Parsing with Nonlocal Features

3.1 Joint POS Tagging and Parsing

Joint POS Tagging and Parsing with Nonlocal Features

POS tagging is often taken as a preliminary step for transition-based constituent parsing; therefore, the accuracy of POS tagging greatly affects parsing performance.

Baseline features: For word-level nodes that represent known words, we use the symbols w, p and l to denote the word form, POS tag and length of the word, respectively.

Chinese Morphological Analysis with Character-level POS

Proposed features: For word-level nodes, the function CPpal-T(w) returns the pair of the character-level POS tags of the first and last characters of w, and CPau(w) returns the sequence of character-level POS tags of w. If either the pair or the sequence of character-level POS tags is ambiguous, which means there are multiple paths in the sub-lattice of the word-level node, then the values on the current best path (with local context) during the Viterbi search are returned.

Evaluation

To evaluate our proposed method, we have conducted two sets of experiments on CTB5: word segmentation, and joint word segmentation and word-level POS tagging.

Evaluation

The results of the word segmentation experiment and the joint experiment of segmentation and POS tagging are shown in Table 5(a) and Table 5(b), respectively.
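Segmentation and joint segmentation-and-tagging results of this kind are conventionally reported as F1 over exactly matching items (character spans for segmentation; span-plus-tag pairs for the joint task). A minimal sketch of that metric, with illustrative helper names:

```python
# Minimal F1 scorer over predicted vs. gold items (character spans for
# segmentation; (span, POS) pairs for joint segmentation and tagging).

def f1(gold, pred):
    gold, pred = set(gold), set(pred)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    p = correct / len(pred)
    r = correct / len(gold)
    return 2 * p * r / (p + r)

def word_spans(words):
    # Convert a segmented sentence into character-offset spans.
    spans, start = [], 0
    for w in words:
        spans.append((start, start + len(w)))
        start += len(w)
    return spans
```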

Introduction

with Character-level POS Tagging

Introduction

We propose the first tagset designed for the task of character-level POS tagging, based on which we manually annotate the entire CTB5.

The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.

Abstract

While ideally we would want to use the word information in decoding as well, much of the syntax of a sentence is determined by the POS tags, and a relatively high level of accuracy can be achieved by learning, for example, a supervised parser from POS tag sequences.

In this paper, we address the problem of web-domain POS tagging using a two-phase approach.

Abstract

The representation is integrated as features into a neural network that serves as a scorer for an easy-first POS tagger.

Introduction

However, state-of-the-art POS taggers in the literature (Collins, 2002; Shen et al., 2007) are mainly optimized on the Penn Treebank (PTB), and when shifted to web data, tagging accuracies drop significantly (Petrov and McDonald, 2012).

Introduction

We integrate the learned encoder with a set of well-established features for POS tagging (Ratnaparkhi, 1996; Collins, 2002) in a single neural network, which is applied as a scorer to an easy-first POS tagger.

Introduction

We choose the easy-first tagging approach since it has been demonstrated to give higher accuracies than the standard left-to-right POS tagger (Shen et al., 2007; Ma et al., 2013).

Learning from Web Text

This may partly be due to the fact that unlike computer vision tasks, the input structure of POS tagging or other sequential labelling tasks is relatively simple, and a single nonlinear layer is enough to model the interactions within the input (Wang and Manning, 2013).

Neural Network for POS Disambiguation

The main challenge in designing the neural network structure is the following: on the one hand, we hope that the model can take advantage of the information provided by the learned WRRBM, which reflects general properties of web text, so that the model generalizes well in the web domain; on the other hand, we also hope to improve the model's discriminative power by utilizing well-established POS tagging features, such as those of Ratnaparkhi (1996).

Neural Network for POS Disambiguation

Under the output layer, the network consists of two modules: the web-feature module, which incorporates knowledge from the pre-trained WRRBM, and the sparse-feature module, which makes use of other POS tagging features.

Neural Network for POS Disambiguation

For POS tagging, we found that a simple linear layer yields satisfactory accuracies.
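As a rough illustration of the two-module scorer described above (a nonlinear layer over the dense web-text representation plus a simple linear layer over sparse tagging features, combined under a shared output layer), here is a pure-Python sketch; all dimensions, weight initializations, and names are invented for the example and are not the paper's.

```python
import math
import random

# Illustrative two-module scorer for one tagging decision: a nonlinear
# hidden layer over dense web-text features (standing in for the WRRBM
# representation) plus a plain linear layer over sparse binary features,
# summed under a shared output layer. All sizes/weights are made up.

random.seed(0)
N_TAGS, D_WEB, N_SPARSE, D_HIDDEN = 5, 8, 20, 4

def mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

W_web = mat(D_HIDDEN, D_WEB)       # web-feature module (nonlinear)
W_out = mat(N_TAGS, D_HIDDEN)
W_sparse = mat(N_TAGS, N_SPARSE)   # sparse-feature module (linear)

def score(web_feats, sparse_idx):
    hidden = [math.tanh(sum(w * x for w, x in zip(row, web_feats))) for row in W_web]
    return [
        sum(w * h for w, h in zip(W_out[t], hidden))   # web module's score
        + sum(W_sparse[t][i] for i in sparse_idx)      # linear sparse score
        for t in range(N_TAGS)
    ]

tag_scores = score([0.3] * D_WEB, [1, 7, 13])
best_tag = max(range(N_TAGS), key=lambda t: tag_scores[t])
```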

In contrast, assume we take the cross-product of the auxiliary word vector values, POS tags and lexical items of a word and its context, and add the crossed values into a normal model (in gbhmm).

Introduction

This low dimensional syntactic abstraction can be thought of as a proxy for manually constructed POS tags.

Introduction

For instance, on the English dataset, the low-rank model trained without POS tags achieves 90.49% on first-order parsing, while the baseline gets 86.70% if trained under the same conditions, and 90.58% if trained with 12 core POS tags.

Problem Formulation

pos, form, lemma and morph stand for the fine POS tag, word form, word lemma and the morphology feature (provided in CoNLL format file) of the current word.

Problem Formulation

For example, pos-p means the POS tag to the left of the current word in the sentence.

Problem Formulation

Other possible features include, for example, the label of the arc h → m, the POS tags between the head and the modifier, boolean flags which indicate the occurrence of in-between punctuations or conjunctions, etc.
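Such in-between features can be made concrete with a small sketch; the template strings and tag sets below are illustrative, not taken from any particular parser.

```python
# Sketch of arc features for a head h and modifier m over a POS-tagged
# sentence: the set of POS tags strictly between them, plus boolean flags
# for in-between punctuation/conjunctions. Template names are illustrative.

PUNCT = {",", ".", ":", "``", "''"}
CONJ = {"CC"}

def between_features(pos, h, m):
    lo, hi = sorted((h, m))
    between = pos[lo + 1:hi]
    feats = {"bet-pos=%s" % t for t in between}
    feats.add("has-punct=%s" % any(t in PUNCT for t in between))
    feats.add("has-conj=%s" % any(t in CONJ for t in between))
    return feats
```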

Results

The rationale is that given all other features, the model would induce representations that play a similar role to POS tags.

Results

Table 4: The first three columns show parsing results when models are trained without POS tags.

system, each word is initialized by the action SHW with a POS tag, before being incrementally modified by a sequence of intra-word actions, and finally being completed by the action PW.

Character-Level Dependency Tree

L and R denote the two elements over which the dependencies are built; the subscripts lc1 and rc1 denote the leftmost and rightmost children, respectively; the subscripts lc2 and rc2 denote the second leftmost and second rightmost children, respectively; w denotes the word; t denotes the POS tag; c denotes the head character; ls_w and rs_w denote the smallest left and right subwords, respectively, as shown in Figure 2.

Character-Level Dependency Tree

Since the first element of the queue can be shifted onto the stack by either SH or AR, it is more difficult to assign a POS tag to each word by using a single action.

The first stage, ASR, yields an automatic transcription, which is followed by the POS tagging stage.

Experimental Setup

The steps for automatic assessment of overall proficiency follow an analogous process (either including the POS tagger or not), depending on the objective measure being evaluated.

Experimental Setup

5.3.2 POS tagger

Related Work

The idea of capturing differences in POS tag distributions for classification has been explored in several previous studies.

Related Work

In the area of text-genre classification, POS tag distributions have been found to capture genre differences in text (Feldman et al., 2009; Marin et al., 2009); in a language testing context, they have been used in grammatical error detection and essay scoring (Chodorow and Leacock, 2000; Tetreault and Chodorow, 2008).

Shallow-analysis approach to measuring syntactic complexity

Consider the two sentence fragments below taken from actual responses (the bigrams of interest and their associated POS tags are boldfaced).

The method is almost free of linguistic resources (except POS tags), and requires no elaborate linguistic rules.

Conclusion

almost knowledge-free (except POS tags) framework.

Conclusion

The method is almost free of linguistic resources (except POS tags), and does not rely on elaborate linguistic rules.

Introduction

This framework is fully unsupervised and purely data-driven, and requires very lightweight linguistic resources (i.e., only POS tags).

Methodology

In order to obtain lexical patterns, we can define regular expressions with POS tags² and apply the regular expressions to POS-tagged texts.
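As a toy illustration of applying a regular expression over POS-tagged text, one can serialize each token as `word/TAG` and match on the tags; the pattern below is invented for the example and is not one of the paper's actual patterns.

```python
import re

# Sketch: encode a POS-tagged sentence as "word/TAG" tokens and apply a
# regular expression over the tags to extract candidate words. The pattern
# ("a word between an adverb AD and an aspect/sentence-final particle")
# is illustrative only.

def encode(tagged):
    return " ".join("%s/%s" % (w, t) for w, t in tagged)

PATTERN = re.compile(r"\S+/AD (\S+)/\S+ \S+/(?:AS|SP)")

def extract(tagged):
    return PATTERN.findall(encode(tagged))
```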

Methodology

²Such expressions are very simple and easy to write because we only need to consider the POS tags of adverbial and auxiliary words.

Methodology

Our algorithm is similar in spirit to double propagation (Qiu et al., 2011); however, the differences are apparent: firstly, we use very lightweight linguistic information (except POS tags); secondly, our major contributions are to propose statistical measures to address the following key issues: first, to measure the utility of lexical patterns; second, to measure the possibility of a candidate word being a new word.

In this study, we had between 2 and 10 individual annotators with degrees in linguistics annotate different kinds of English text with POS tags, e.g., newswire text (PTB WSJ Section 00), transcripts of spoken language (from Talkbank, a database containing transcripts of conversations), as well as Twitter posts.

Annotator disagreements across domains and languages

We instructed annotators to use the 12 universal POS tags of Petrov et al.

Annotator disagreements across domains and languages

²Experiments with variation n-grams on WSJ (Dickinson and Meurers, 2003) and the French data lead us to estimate that the fine-to-coarse mapping of POS tags disregards about 20% of observed tag-pair confusion types, most of which relate to fine-grained verb and noun distinctions, e.g. past participle versus past tense in “[..] criminal lawyers speculated/VBD vs. VBN that [..]”.

Related work

(2014) use small samples of doubly-annotated POS data to estimate annotator reliability and show how those metrics can be implemented in the loss function when inducing POS taggers, to reflect the confidence we can put in annotations.

Related work

They show that not biasing the theory towards a single annotator but using a cost-sensitive learning scheme makes POS taggers more robust and more applicable for downstream tasks.

We build a CRF-based bigram part-of-speech (POS) tagger with the features described in (Li et al., 2012), and produce POS tags for all train/development/test/unlabeled sets (10-way jackknifing for training sets).

Experiments and Analysis

(2012) and Bohnet and Nivre (2012) use joint models for POS tagging and dependency parsing, significantly outperforming their pipeline counterparts.

Experiments and Analysis

Our approach can be combined with their work to utilize unlabeled data to improve both POS tagging and parsing simultaneously.

Supervised Dependency Parsing

ti denotes the POS tag of wi. b is an index between h and m. dir(i, j) and dist(i, j) denote the direction and distance of the dependency (i, j).
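The dir and dist notation can be made concrete; bucketing the distance is a common choice in graph-based parsers, and the particular buckets below are illustrative rather than taken from the paper.

```python
# Sketch of the dir(i, j) and dist(i, j) notation for a dependency (i, j):
# direction records whether the head precedes the modifier, and distance
# is bucketed (a common choice; the exact buckets here are illustrative).

def dep_dir(i, j):
    return "R" if i < j else "L"   # head-left vs. head-right arc

def dep_dist(i, j, buckets=(1, 2, 3, 4, 5, 10)):
    d = abs(i - j)
    for b in buckets:
        if d <= b:
            return b
    return buckets[-1] + 1  # everything farther collapses into one bucket
```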

o the set of dependency labels of the predicate’s children
o dependency path conjoined with the POS tag of a’s head

Experiments

Before parsing the data, it is tagged with a POS tagger trained with a conditional random field (Lafferty et al., 2001) with the following emission features: word, the word cluster, word suffixes of length 1, 2 and 3, capitalization, whether it has a hyphen, digit and punctuation.
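The emission feature set listed above can be sketched as a per-token feature extractor; `clusters` is a hypothetical word-to-cluster lookup standing in for the word-cluster resource, and the feature-name strings are invented.

```python
# Sketch of the emission features listed above for one token: word form,
# word cluster, suffixes of length 1-3, capitalization, and hyphen/digit/
# punctuation indicators. `clusters` is a hypothetical word->cluster map.

def emission_features(word, clusters):
    feats = {
        "w=" + word,
        "cluster=" + clusters.get(word, "OOV"),
        "cap=%s" % word[:1].isupper(),
        "hyphen=%s" % ("-" in word),
        "digit=%s" % any(c.isdigit() for c in word),
        "punct=%s" % any(not c.isalnum() and c != "-" for c in word),
    }
    for n in (1, 2, 3):
        if len(word) >= n:
            feats.add("suf%d=%s" % (n, word[-n:]))
    return feats
```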

Frame Identification with Embeddings

Let the lexical unit (the lemma conjoined with a coarse POS tag) for the marked predicate be ℓ.

⁶We use the version available in the POS tagger MElt (Denis and Sagot, 2009).

Use of external MWE resources

The MWE analyzer is a CRF-based sequential labeler which, given a tokenized text, jointly performs MWE segmentation and POS tagging (of simple tokens and of MWEs), both tasks mutually helping each other⁹.

Use of external MWE resources

The MWE analyzer integrates, among others, features computed from the external lexicons described in section 5.1, which greatly improve POS tagging (Denis and Sagot, 2009) and MWE segmentation (Constant and Tellier, 2012).

For a pair (at, c), we also consider as candidate associations the set B (represented implicitly), which contains token pairs (at′, ct′) such that at and at′ share the same lemma, the same POS tag, or are linked through a derivation link on WordNet (Fellbaum, 1998).

Following is a list of features adopted in the two baselines, for both BaselineC4.5 and BaselineSVM:
> Basic features: the first token of the focus candidate and its part-of-speech (POS) tag; the number of tokens in the focus candidate; the relative position of the focus candidate among all the roles present in the sentence; the negated verb of the negative expression and its POS tag;

Baselines

> Syntactic features: the sequence of words from the beginning of the governing VP to the negated verb; the sequence of POS tags from the beginning of the governing VP to the negated verb; whether the governing VP contains a CC; whether the governing VP contains an RB.

Baselines

> Semantic features: the syntactic label of semantic role A1; whether A1 contains POS tag DT, JJ, PRP, CD, RB, VB, or WP, as defined in Blanco and Moldovan (2011); whether A1 contains token any, anybody, anymore, anyone, anything, anytime, anywhere, certain, enough, full, many, much, other, some, specifics, too, or until, as defined in Blanco and Moldovan (2011); the syntactic label of the first semantic role in the sentence; the semantic label of the last semantic role in the sentence; the thematic role for A0/A1/A2/A3/A4 of the negated predicate.

While such feature learning approaches have proven to increase robustness for parsing, POS tagging, and NER (Miller et al., 2004; Koo et al., 2008; Turian et al., 2010), they would seem to have an especially promising role for discourse, where training data is relatively sparse and ambiguity is considerable.