Multi-linguality type: Comparable (the corpus consists of a subset of Greek fairy tales with their translations into Bulgarian, a subset of Bulgarian fairy tales with their translations into Greek, and a subset of comparable Greek and Bulgarian texts in the same domain)

Text Format

text/txt

Size

700,000 Words

Character encoding

UTF-8

ISO-8859-7

Domains

fiction

Modalities

Written Language

Classification

Text type: Poems

Conformance to classification scheme: Other

Text type: Fiction

Conformance to classification scheme: Other

Text type: Fairy tales

Conformance to classification scheme: Other

Annotation

Lemmatization

Tagset: ILSP tagset

StandOff: False

Format: XML

Standard practices conformance: XCES

Annotation Mode: Mixed

Semantic Annotation - Named Entities

Tagset: ACE-extended

StandOff: False

Format: XML

Standard practices conformance: Other

Annotation Tools:

MENER

Segmentation

StandOff: False

Segmentation level: Sentence

Format: TIPSTER

Annotation Mode: Automatic

Morphosyntactic Annotation - B POS Tagging

Tagset: ILSP tagset

StandOff: False

Format: XML

Standard practices conformance: XCES

Geographic coverage

Thrace

Creation

Creation mode details: Scanning & OCR followed by manual checks

Creation mode: Mixed

Original Sources

Scanned texts

Resource Creation

Creation lasted: 10/01/2005 - 09/30/2007

Funding Project

Cultural Parallels: Study and promotion of the cultural inheritance of the neighbouring areas in Greece and Bulgaria through internet technology (Cultural Parallels)

Funding Type: National Funds

Funder: INTERREG IIIA / PHARE CBC GREECE – BULGARIA

Funding Country: GR

Project duration: 10/01/2005 - 09/30/2007

Metadata

Created: 02/02/2012

Last Updated: 01/21/2013

Source: META-SHARE/ILSP

Validation

Validated

Type of Validation: Content

Validation Mode: Mixed

Mode Details: Manual correction of the annotations, mainly in the mixed-language texts