Helsinki Corpus of English Texts

The Helsinki Corpus of English Texts is a structured multi-genre diachronic corpus that includes text samples, organized by period, from Old, Middle and Early Modern English. Each sample is preceded by a list of parameter codes giving information on the text and its author. The Corpus is particularly useful for studying how linguistic features change over a long time span. It can be used as a diagnostic corpus giving general information on the occurrence of forms, structures and lexemes in different periods of English. This information can be supplemented by evidence from more specialized and focused historical corpora.

An XML version of the Helsinki Corpus is also available.

Project leader: Matti Rissanen, University of Helsinki
Project secretary: Merja Kytö, Uppsala University
Time of compilation: 1984–1991
Size: 1,572,800 words
Language: English (Old, Middle, Early Modern)
Number of texts/samples: c. 450
Period: c. 730–1710
Released: 1991
Funding: The University of Helsinki; The Academy of Finland

Student assistants

File format

The coding system is based on the set of ASCII codes (96 printable characters).
The names of the 242 files follow MS-DOS conventions, which limit each name to eight characters.
Each file name begins with the character C (for `Corpus'),
followed by O (for `Old English'), M (for `Middle English') or E
(for `Early Modern English'). The file names reflect, by and
large, the names of authors or texts in Old and Middle English
sections of the Corpus. In the Early Modern English section the
file names are based on the systematic coverage of different text
types.
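
The naming convention described above can be illustrated with a short Python sketch that maps a file name to its period. This is only an illustration under stated assumptions: the function name period_of and the sample names used with it are hypothetical, and the handling of an optional file extension is an assumption, since the text above does not say whether the files carry one.

```python
from pathlib import PurePath

# Second character of the file name -> period, as described in the text.
PERIODS = {
    "O": "Old English",
    "M": "Middle English",
    "E": "Early Modern English",
}


def period_of(file_name: str) -> str:
    """Return the period encoded in a Helsinki Corpus style file name.

    Hypothetical helper: it checks only the two-letter prefix and the
    eight-character limit, not whether the file actually exists in the Corpus.
    """
    # Assumption: any extension (e.g. ".txt") is not part of the name proper.
    stem = PurePath(file_name).stem.upper()

    if not 2 <= len(stem) <= 8:
        raise ValueError(f"expected a name of 2 to 8 characters, got {stem!r}")
    if stem[0] != "C":
        raise ValueError(f"name must begin with 'C' (for Corpus): {stem!r}")
    if stem[1] not in PERIODS:
        raise ValueError(f"second character must be O, M or E: {stem!r}")

    return PERIODS[stem[1]]
```

For instance, with the made-up names COSAMPLE, CMSAMPLE and CESAMPLE, the function returns the Old, Middle and Early Modern English labels respectively, and it raises ValueError for a name that does not start with C or that exceeds eight characters.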