VOICE POS XML

VOICE POS XML 2.0 is the first downloadable XML version of VOICE that is
annotated with part-of-speech tags and lemmas. VOICE thus constitutes the
first publicly available corpus of spoken ELF to be annotated in this way. VOICE
POS XML was published in January 2013 and, like VOICE XML, is made available as
a free-of-charge resource under a Creative Commons
Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) License
(http://creativecommons.org/licenses/by-nc-sa/3.0/). It is based on
the same source data as VOICE 2.0 Online and VOICE 2.0 XML, though there are a
number of differences with regard to the encoding scheme (cf. the README file in
the download package for a list of these). VOICE POS XML can be downloaded as part of the
current download package of VOICE.

This version of VOICE was created by Barbara Seidlhofer (project director),
Stefan Majewski, Ruth Osimk-Teasdale, Marie-Luise Pitzl, Michael Radeka (project
researchers) and Nora Dorn between June 2009 and January 2013 at the University
of Vienna. The tokenisation was conducted by Stefan Majewski and Michael Radeka,
lemmatization by Michael Radeka, and part-of-speech tagging by Ruth
Osimk-Teasdale and Michael Radeka. Nora Dorn and Marie-Luise Pitzl contributed
to the development of parts of the tagging methodology, and Leopold Lippert
developed a categorisation scheme for the tagging of spelt items. The conversion
to XML and TXT formats was done by Stefan Majewski and Michael Radeka. Henry
Widdowson substantially contributed to the POS-tagging process with valuable
ideas and helpful comments in many meetings and discussions, and through helping
with the editing of numerous texts.

For the tokenisation, lemmatization and part-of-speech tagging of VOICE,
available state-of-the-art tools and methodologies were considered and used.
However, the unique data required novel combinations and extensions of these,
and sometimes the development of a completely new, unconventional methodology.
For a more detailed account of the tagging procedures and methodology used for
VOICE POS, please consult the VOICE Part-of-Speech Tagging and Lemmatization Manual.

Before working with VOICE POS XML, we strongly recommend consulting the README
file in the download package.

We would be interested to learn about any work based on VOICE POS XML. Please
send us a message at voice@univie.ac.at.