VOICE XML

As of January 2013, an updated XML version of VOICE is available for download. The download
package now includes VOICE 2.0 XML, an updated version of the corpus with minor
revisions to some of the corpus texts, as well as VOICE POS XML 2.0, the first
part-of-speech tagged and lemmatized version of VOICE, which is based on the
same source code as VOICE 2.0 XML. The previously released versions VOICE 1.0
XML (which corresponds to the first release of VOICE 1.0 Online in May 2009) and
VOICE 1.1 XML (an updated version of VOICE, released 5 May 2011) are also
included. VOICE XML is made available under a Creative Commons
Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) License
(http://creativecommons.org/licenses/by-nc-sa/3.0/).

The Vienna-Oxford International Corpus of English (VOICE) was created by Barbara
Seidlhofer (project director) and Angelika Breiteneder, Theresa Klimpfinger,
Stefan Majewski, Marie-Luise Pitzl (project researchers) at the University of
Vienna in a time-, labour- and cost-intensive process. Revisions and corrections
to the corpus texts were made by Ruth Osimk-Teasdale. The tokenisation was
carried out by Stefan Majewski and Michael Radeka. The corpus was lemmatized by
Michael Radeka and part-of-speech tagged by Ruth Osimk-Teasdale and Michael
Radeka. We are making VOICE available for non-commercial research purposes, and
it is in this spirit that access to VOICE XML is provided free of charge to all
those interested in using the corpus for such purposes.

VOICE XML broke new ground in 2011 as the first corpus of English as a lingua
franca (ELF) to become publicly available for download. It is now also the first
ELF corpus for which a lemmatized, part-of-speech tagged version can be
downloaded. In compiling our corpus of just over one million words, we have
taken great care to meet the qualitative and technical standards of
state-of-the-art corpus linguistics in data collection, transcription and
encoding.

VOICE XML is released with a considerable amount of corpus documentation and
additional materials for users' convenience (see the README file in the download
package for details). Similarly, the corpus itself contains a substantial amount
of meta-information on speech events and speakers in the corpus header as well
as in individual text headers. We strongly encourage all users of VOICE to take
note of this documentation and the meta-information when working with the
corpus.

We would also like to encourage you to let us know when you publish or present
work based on VOICE XML. Please send us a message at voice@univie.ac.at.