The fourth SPMRL workshop hosted the first shared task on parsing morphologically rich languages.
The previous page is available at http://www.spmrl.org/shared_task_old.html
(note: this webpage is in heavy editing, will be up to date before the workshop)

Goals

The primary goal of the shared task on parsing morphologically rich languages (MRLs) was to bring forward work on parsing morphologically ambiguous input in both dependency and constituency parsing, and to show the state of the art for MRLs. In the longer term, we aim to provide streamlined data sets and evaluation metrics, thus improving the comparability of cross-linguistic work on parsing MRLs. The shared task featured tracks in constituency parsing and in dependency parsing, in gold as well as in realistic scenarios (the realistic scenario has no gold tokenization, no gold part-of-speech tags and no gold morphological features).

Data Set

The participants were provided with data from 9 different languages (Arabic, Basque, French, German, Hebrew, Hungarian, Korean, Polish, Swedish). The data were available in Penn Treebank bracketing format, CoNLL-X format and optionally in TiGerXML.
To ease cross-linguistic comparisons, the data sets have also been released in a common size setting (i.e., treebanks of 5000 sentences).
All treebanks (dep. and const.) are aligned at the sentence, token and POS levels.
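
As an illustration of what sentence, token and POS alignment means in practice, here is a minimal Python sketch that compares one CoNLL-X sentence with one Penn Treebank style tree. It is not part of the shared task tools. It assumes that the POS tag to compare sits in the POSTAG column (column 5) of the CoNLL-X file, that preterminal labels in the bracketed file are bare tags, and that tokens are written identically in both files; the function names are hypothetical.

```python
import re

def read_conllx(text):
    """Yield sentences as lists of (FORM, POSTAG) pairs from CoNLL-X text."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        # CoNLL-X columns:
        # ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        sentence.append((cols[1], cols[4]))
    if sentence:
        yield sentence

# A preterminal is a bracket holding exactly a tag and a token: "(NN cat)".
PRETERMINAL = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")

def read_ptb_leaves(tree):
    """Return the (token, POS) pairs of a bracketed tree, left to right."""
    return [(word, pos) for pos, word in PRETERMINAL.findall(tree)]

def aligned(conll_sentence, tree):
    """True if both views have the same tokens with the same POS tags."""
    return conll_sentence == read_ptb_leaves(tree)

conll = (
    "1\tShe\tshe\tPRP\tPRP\t_\t2\tnsubj\t_\t_\n"
    "2\tsleeps\tsleep\tVBZ\tVBZ\t_\t0\troot\t_\t_\n"
)
tree = "(ROOT (S (NP (PRP She)) (VP (VBZ sleeps))))"
sentences = list(read_conllx(conll))
print(aligned(sentences[0], tree))
```

Running the same check over a whole treebank, sentence by sentence, is a cheap sanity test before training a parser on one view and evaluating on the other.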

Metrics

Gold Tokens Scenarios:

We used two metrics: Parseval (Evalb; Black et al., 1991) and LeafAncestor (Sampson and Babarczy, 2003). For the former, we used a modified version from SANCL 2012 (Petrov and McDonald, 2012) that penalises unparsed trees; for the latter, an implementation from Wagner (2012).

LeafAncestor: parse_la.py (please read the disclaimer on top of the file)

Note: as opposed to the common usage in the parsing communities, all constituency results are given for sentences of all lengths, and all tokens are evaluated (including punctuation tokens). For both Evalb and LeafAncestor, the labels {TOP, S1, ROOT, VROOT} are stripped off.
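
The conventions above can be sketched in a few lines of Python. This is not Evalb and does not reproduce its parameter files; it is a simplified labeled-bracket scorer under stated assumptions: every bracket carries a label, the labels TOP, S1, ROOT and VROOT are skipped wherever they occur, punctuation counts as a normal token, there is no sentence length cut-off, and an unparsed sentence (given as None) contributes no test brackets while its gold brackets still count.

```python
from collections import Counter

STRIPPED_ROOTS = {"TOP", "S1", "ROOT", "VROOT"}

def parse(tree):
    """Parse a bracketed tree into (label, children); leaves are strings."""
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()

def brackets(tree):
    """Count labeled spans (label, start, end) of phrasal nodes."""
    spans = Counter()

    def walk(n, start):
        label, children = n
        if len(children) == 1 and isinstance(children[0], str):
            return start + 1          # preterminal: one token, no bracket
        end = start
        for child in children:
            end = walk(child, end)
        if label not in STRIPPED_ROOTS:
            spans[(label, start, end)] += 1
        return end

    walk(parse(tree), 0)
    return spans

def parseval(gold_trees, test_trees):
    """Return (precision, recall, F1) over all sentences and all tokens."""
    matched = gold_total = test_total = 0
    for gold, test in zip(gold_trees, test_trees):
        g = brackets(gold)
        t = brackets(test) if test else Counter()   # unparsed: no brackets
        matched += sum((g & t).values())
        gold_total += sum(g.values())
        test_total += sum(t.values())
    precision = matched / test_total if test_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

gold = "(ROOT (S (NP (DT the) (NN cat)) (VP (VBZ sleeps)) (. .)))"
test = "(TOP (S (NP (DT the)) (NP (NN cat)) (VP (VBZ sleeps)) (. .)))"
print(parseval([gold], [test]))
```

In this toy pair the root wrappers differ (ROOT versus TOP) but neither is scored, so only the split NP costs the test tree precision and recall.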

Multi Word Expressions evaluation:

The French data set contains MWEs annotated at the morphosyntactic level. We evaluated them for the dependency track only (see the wiki page).

Predicted Tokens Scenarios:

Dependency and Constituent Structures

We used TedEval (Tsarfaty et al., 2010, 2011, 2012) in its realistic framework (namely, a test file with its own mapping between predicted tokens and source tokens is evaluated against a gold file and the gold token mapping). TedEval is available here: TedEval 2.2.

We developed a set of wrappers that use MaltParser's reprojectiver (Nivre & Nilsson, 2005). Wrappers are available here: TedWrappers_20131015.tar.gz

Dependency Parsing Track

We used the same protocol as in CoNLL 2007 (Nivre et al., 2007) in two settings for 4 scenarios:

Full train set size ⇒ with gold or predicted morphology (POS tag and features)
5k sentences train set size ⇒ with gold or predicted morphology (POS tag and features).
Note that the predicted data were provided as a baseline; participants were free to use their own. The French, Hebrew and Arabic predicted train sets have not been subjected to cross-fold jackknifing, so participants were encouraged to do it themselves (only a few used their own predicted morphology, though: Alpage-IGM and Alpage-Dyalog for French, Cadim for Arabic and IMS_SGZEDED_CIS for all languages).
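
For readers unfamiliar with jackknifing, the idea is to tag each part of the training set with a tagger that has never seen that part, so that the predicted tags in the training data carry the same kind of errors as those in the test data. The Python sketch below illustrates this under stated assumptions: the most-frequent-tag tagger is a toy stand-in for a real morphological tagger, the fold assignment (every k-th sentence) is arbitrary, and all function names are hypothetical. It is not the procedure used by any participating team.

```python
from collections import Counter, defaultdict

def train_tagger(sentences):
    """Toy most-frequent-tag tagger, a stand-in for a real tagger."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
            all_tags[tag] += 1
    default = all_tags.most_common(1)[0][0]   # fallback for unseen words

    def tag(words):
        return [counts[w].most_common(1)[0][0] if w in counts else default
                for w in words]

    return tag

def jackknife(sentences, folds=10, train=train_tagger):
    """Re-tag every sentence with a tagger trained on the other folds."""
    if folds < 2 or len(sentences) < 2:
        raise ValueError("jackknifing needs at least 2 folds and 2 sentences")
    predicted = [None] * len(sentences)
    for k in range(folds):
        held_out = range(k, len(sentences), folds)   # every folds-th sentence
        held_set = set(held_out)
        training = [s for i, s in enumerate(sentences) if i not in held_set]
        tagger = train(training)
        for i in held_out:
            words = [w for w, _ in sentences[i]]
            predicted[i] = list(zip(words, tagger(words)))
    return predicted

gold = [
    [("the", "DT"), ("cat", "NN")],
    [("the", "DT"), ("dog", "NN")],
    [("a", "DT"), ("cat", "NN")],
]
predicted = jackknife(gold, folds=3)
for sent in predicted:
    print(sent)
```

A parser trained on the jackknifed tags rather than on gold tags sees tagging errors during training, which is the condition it will face in the predicted scenarios.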

Multi Word Expression Evaluation

Non Gold Token Evaluation

Arabic and Hebrew data sets were provided with generated lattices (disambiguated and non-disambiguated for Hebrew, disambiguated only for Arabic; the non-disambiguated Arabic data exist, though, and should be made available at some point).

Results on the predicted tokens scenarios are evaluated using TedEval 2.2 (Tsarfaty et al., 2011, 2012) in two modes:

A fully labeled mode (where edges, either from const. trees or dependencies, are decorated by their original labels). This mode allows for a full comparison between dependency parses produced on gold tokens and on predicted tokens from the raw source text.
An unlabeled mode, which allows for easier cross-framework comparison (between const. and dep. parsers). In order to perform a fully labeled evaluation of a const. tree, each edge needs to bear a function label. Please see the overview paper for full details on the cross-framework scenarios.

Getting the Shared Task Data Set

All data but Arabic are freely available under the same conditions as during the shared task.
Unless stated otherwise by their original licenses, any commercial exploitation of the treebank data
or of derived parsing or tagging models is prohibited. These data sets are made available for
reproducibility's sake and in the hope that this shared task data will provide inspiration
for the design and evaluation of future parsing systems for these languages.

The Arabic data we provided is based on the LDC's ATB 4.1, 3.1 and 3.2, then converted to
both Columbia's CaTib Dependency Treebank (Habash & Roth, 2009) and to Stanford's preprocessed version
of the ATB (Green & Manning, 2010).
It is to be made available soon by the LDC via its usual channels. Contact us at spmrl.sharedtask@gmail.com
if you absolutely need the data urgently; we'll make available the (huge) set of scripts we developed
to create the data.

Acknowledgements

For their precious help preparing the SPMRL 2013 Shared Task and for
allowing their data to be part of it, we warmly thank the Linguistic
Data Consortium, the Knowledge Center for Processing Hebrew (MILA),
the Ben Gurion University, Columbia University, Institute of Computer
Science (Polish Academy of Sciences), Korea Advanced Institute of
Science and Technology, University of the Basque Country, University
of Lisbon, Uppsala University, University of Stuttgart, University of
Szeged and University Paris Diderot (Paris 7).
We are also very grateful to the Philosophical Faculty of the Heinrich-Heine
Universität Düsseldorf for hosting the shared task data via their dokuwiki.

We take advantage of this page to warmly and publicly thank once more all
the people involved in the preparation of this shared task (original data
sets, scripting, website, institutional and moral support):