TüPP-D/Z is a collection of German newspaper articles that has been automatically annotated with clause structure, topological fields, chunks, and low-level information: part-of-speech (POS) tags, morphological ambiguity classes, and some regular types of named entities, including numerical expressions such as dates, numbers and units. The raw text of the corpus comprises more than 200 million words.

Price
*This metadata is intended only as a guide.

License fee for scientific use: 50 euros. This covers the cost of the DVD and postage; it does not cover the fee for the taz newspaper article CD.

Users must also buy the taz data CD or obtain a license for the taz scientific edition from contrapress media gmbh (http://www.taz.de).

For commercial use: contact Yannick Versley, SfS, University of Tübingen, at versley@sfs.uni-tuebingen.de.

collection: The root element. A collection consists of one or more days.
day: A day holds all articles available for one day of taz data.
article: An article groups the text of an article with bibliographical information.
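
As a rough illustration of the hierarchy described above, the following Python sketch walks a small hand-written XML fragment and lists the articles found under each day. The tag names `collection`, `day` and `article`, the `date` and `title` attributes, and the sample text are assumptions made for this example only; the actual schema should be taken from the corpus documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element and attribute names are assumed for
# illustration and are not taken from the corpus schema.
SAMPLE = """
<collection>
  <day date="1999-01-02">
    <article title="Beispiel eins">Erster Text.</article>
    <article title="Beispiel zwei">Zweiter Text.</article>
  </day>
  <day date="1999-01-04">
    <article title="Beispiel drei">Dritter Text.</article>
  </day>
</collection>
"""


def articles_per_day(xml_text):
    """Return a dict mapping each day's date to the titles of its articles."""
    root = ET.fromstring(xml_text.strip())  # root is the assumed <collection>
    result = {}
    for day in root.findall("day"):
        titles = [article.get("title") for article in day.findall("article")]
        result[day.get("date")] = titles
    return result


overview = articles_per_day(SAMPLE)
for date, titles in overview.items():
    print(date, len(titles), "article(s)")
```

For a corpus of this size, a streaming parser such as `xml.etree.ElementTree.iterparse` would probably be a better fit than loading a whole file into memory; the in-memory version is used here only to keep the sketch short.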

[Topological field types]:
CF: Complementizer field.
VCL_: Left part of the sentence bracket. Contains one finite verb of the categories lexical verb, auxiliary verb or modal verb.
VCR_: Right part of the sentence bracket.
VF: Vorfeld. Enclosed by the beginning of the sentence on the left-hand side and VCL_ on the right.
MF: Mittelfeld. Enclosed by the left part of the sentence bracket and the right part of the sentence bracket.
NF: Nachfeld. The topological field after the right part of the sentence bracket.
LV: Linksversetzung. For annotating resumptive constructions.
KOORDF: Coordination field.
PARORDF: Coordination field.
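
The label list above can be kept as a small lookup table when processing the annotation. The sketch below is one possible way to do that in Python. It assumes that the trailing underscore in `VCL_` and `VCR_` stands for a label suffix that is not listed here, so labels beginning with `VCL` or `VCR` are matched by prefix; that reading, the function name `describe_field` and the example label `VCLXY` are hypothetical.

```python
# Descriptions copied from the list of topological field types above.
FIELD_TYPES = {
    "CF": "Complementizer field",
    "VCL_": "Left part of the sentence bracket",
    "VCR_": "Right part of the sentence bracket",
    "VF": "Vorfeld",
    "MF": "Mittelfeld",
    "NF": "Nachfeld",
    "LV": "Linksversetzung",
    "KOORDF": "Coordination field",
    "PARORDF": "Coordination field",
}


def describe_field(label):
    """Return the description for a field label, or None if it is unknown."""
    if label in FIELD_TYPES:
        return FIELD_TYPES[label]
    # Assumption: a key ending in "_" is a placeholder for a longer label,
    # so any label starting with the part before the underscore matches it.
    for key, description in FIELD_TYPES.items():
        if key.endswith("_") and label.startswith(key[:-1]):
            return description
    return None


print(describe_field("MF"))
print(describe_field("VCLXY"))  # "VCLXY" is a made-up label for this example
```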