Text files have the extension .txt. Besides the text, they contain
Helsinki text level codes, converted into HTML type codes, as
outlined in Text markup. The
original page layout is not retained. Rather, the text is divided into
tokens, which generally correspond to a main clause together with any
subordinate clauses that it contains. Each token is associated with a
token ID, enclosed in parentheses, which contains the name of the file,
a page reference to the printed text (possibly including a volume
reference), and a running token number that locates the token within the
computer file. Tokens may also consist entirely of text level codes
(for example, <P_2> or </heading>). Such tokens do not have IDs, but
they are still counted by the token counter, which can lead to gaps in
the running token numbers. Punctuation in text files is separated from
the words by spaces in order to simplify searches.

<P_2>
<heading>
I . (CMMALORY,2.3)
Merlin (CMMALORY,2.4)
</heading>
HIT befel in the dayes of Uther Pendragon , when he was kynge of all
Englond and so regned , that there was a myghty duke in Cornewaill that
helde warre ageynst hym long tyme . (CMMALORY,2.6)
and the duke was called the duke of Tyntagil . (CMMALORY,2.7)
And so by meanes kynge Uther send for this duk chargyng hym to brynge
his wyf with hym . (CMMALORY,2.8)
for she was called a fair lady and a passynge wyse . (CMMALORY,2.9)
and her name was called Igrayne . (CMMALORY,2.10)
So whan the duke and his wyf were comyn unto the kynge , by the meanes
of grete lordes they were accorded bothe . (CMMALORY,2.11)
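
The token ID layout described above can be read mechanically. The following is a minimal sketch, not part of the corpus tools: the function names and the regular expression are assumptions made for illustration. It assumes that an ID always closes the last line of its token, and it keeps the page reference as an opaque string because the exact form of a volume reference is not specified here. The second function lists running token numbers that carry no ID, which is where tokens made up only of text level codes would be expected.

```python
import re

# Assumed ID shape: "(FILE,PAGE.NUMBER)" at the end of a line. PAGE is kept as
# an opaque string because it may include a volume reference.
TOKEN_ID = re.compile(
    r"\((?P<file>[^\s,()]+),(?P<page>[^\s()]+)\.(?P<num>\d+)\)\s*$"
)


def parse_token_id(line):
    """Return (file, page, number) for a line ending in a token ID, else None."""
    match = TOKEN_ID.search(line)
    if match is None:
        return None
    return match.group("file"), match.group("page"), int(match.group("num"))


def missing_token_numbers(lines):
    """List running token numbers without an ID between the first and last ID seen."""
    seen = [parsed[2] for parsed in map(parse_token_id, lines) if parsed]
    if not seen:
        return []
    present = set(seen)
    return [n for n in range(min(seen), max(seen) + 1) if n not in present]


# Lines taken from the sample above.
sample = [
    "<heading>",
    "I . (CMMALORY,2.3)",
    "Merlin (CMMALORY,2.4)",
    "</heading>",
    "helde warre ageynst hym long tyme . (CMMALORY,2.6)",
]
print(parse_token_id(sample[1]))      # ('CMMALORY', '2', 3)
print(missing_token_numbers(sample))  # [5]
```

In the sample, number 5 is missing because the token consisting only of </heading> is counted but has no ID.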

In general, it has not been possible to retain the markup conventions of
the Helsinki Corpus in their original form because of conflicts with the
annotation system. The major changes made are as follows:

The representation of the text as printed on the page is not
retained. The text is presented in main clause units, as described
under File formats, rather than line by
line.

All text level codes in the text have been changed to HTML
type codes or omitted as follows:

Editor comments are either omitted or enclosed in
{ED:...}. Comments added by Helsinki or Penn are enclosed in
{COM:...}.
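
A search script may need to separate such comments from the running text. The sketch below is one possible way to do that, under the assumptions that comments do not nest and never contain a closing brace; the function names are hypothetical, and the sample line and its comment contents are invented for illustration rather than taken from the corpus.

```python
import re

# Assumption: comments look like {ED:...} or {COM:...}, do not nest,
# and contain no "}" inside the comment body.
COMMENT = re.compile(r"\{(ED|COM):([^}]*)\}")


def extract_comments(text):
    """Return (kind, body) pairs, where kind is 'ED' or 'COM'."""
    return [(m.group(1), m.group(2)) for m in COMMENT.finditer(text)]


def strip_comments(text):
    """Remove the comments and normalise the remaining whitespace."""
    return " ".join(COMMENT.sub(" ", text).split())


# Invented example line; the comment contents are placeholders.
line = "the kynge {ED:sic} send for this duk {COM:word_supplied} ."
print(extract_comments(line))  # [('ED', 'sic'), ('COM', 'word_supplied')]
print(strip_comments(line))    # the kynge send for this duk .
```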

Clear errors in the printed text are sometimes corrected, generally
following a suggestion by the editor, but occasionally without outside
support, especially in cases involving an item's part of speech.