XML employs a strongly hierarchical document model. At various points, these Guidelines discuss problems that arise when using XML to encode textual features that either do not naturally lend themselves to representation in a strictly hierarchical form or conflict with other hierarchies represented in the markup. Examples of such situations include:

Conflict between the hierarchy established by the physical structure of a document (e.g., volume, page, column, line) and its rhetorical or linguistic structure (e.g., chapters, paragraphs, sentences, acts, scenes, etc.)

Conflict between a verse text's metrical structure (e.g., its arrangement in stanzas and metrical lines) and its rhetorical or linguistic structure (e.g., phrases, sentences, and, for plays, acts, scenes, and speeches).

Conflict between metrical, rhetorical, or linguistic structure and the representation of direct speech, especially if the quoted speech is interrupted by other elements (e.g., ‘What’, she asked, ‘was that all about’) or crosses metrical, rhetorical, or linguistic boundaries.

Conflict between different analytical views or descriptions of a text or document, e.g., markup intended to encode diplomatic information about a word's appearance in a manuscript with markup intended to describe its morphology or pronunciation.

Non-nesting information poses fundamental problems for any XML-based encoding scheme, and it must be stated at the outset that no current solution combines all the desirable attributes of formal simplicity, capacity to represent all occurring or imaginable kinds of structures, suitability for formal or mechanical validation. The representation of non-hierarchical information is thus necessarily a matter of trade-offs among various sets of advantages and disadvantages.

These Guidelines support several methods for handling non-hierarchical information:

stand-off markup: the annotation of information by pointing at it, rather than by placing XML tags within it (discussed in 20.4 Stand-off Markup)

Some of these methods can be used in TEI-conformant or -conformable documents. Others require extension.

In the sections which follow these techniques are described and their advantages and disadvantages are briefly discussed. The various solutions to the problem will be exemplified using extracts from two poems. The first is the opening quatrain from William Wordsworth's Scorn not the sonnet:

Scorn not the sonnet; critic, you have frowned,

Mindless of its just honours; with this key

Shakespeare unlocked his heart; the melody

Of this small lute gave ease to Petrarch's wound.

The second example is the third stanza from the fourth section of Robert Pinsky's Essay on Psychiatrists:

Catholic woman of twenty-seven with five children

And a first-rate body—pointed her finger

at the back of one certain man and asked me,

"Is that guy a psychiatrist?" and by god he was! "Yes,"

She said, "He looks like a psychiatrist."

Grown quiet, I looked at his pink back, and thought.

These two texts can be analysed in various ways. The first, which we might describe as the ‘Metrical View’, encodes the text according to its metrical features: line divisions (as here), stanzas or cantos in larger poems, and perhaps prosodic features like stress or syllable patterns, alliteration, or rhyme. A second view, which we might describe as the ‘Grammatical’, encodes linguistic and rhetorical features: phonemes, morphemes, words, phrases, clauses, and sentences. A third view, the ‘Dialogic’, might concentrate on narrative voice: distinguishing between the narrator and their interlocutors and identifying individual segments as direct quotations. In our examples, we will restrict ourselves to relatively simple conflicts: for the Metrical View we will encode only metrical lines and line groups; for the Grammatical View we will restrict ourselves to encoding sentences; and for the Dialogic View, we only will distinguish direct quotation from other narration.

Conceptually, the simplest method of disentangling two (or more) conflicting hierarchical views of the same information is to encode it twice (or more), each time capturing a single view.

Thus, for example, the Metrical View of Scorn not the sonnet might be encoded as follows, using the l element to encode each metrical line:

<l>Scorn not the sonnet; critic, you have frowned,</l><l>Mindless of its just honours; with this key</l><l>Shakespeare unlocked his heart; the melody</l><l>Of this small lute gave ease to Petrarch's wound.</l>

The Grammatical View would be encoded by taking the same text and replacing the metrical markup with information about its sentence structure:

<p><seg>Scorn not the sonnet;</seg><seg>critic, you have frowned, Mindless of its just honours;</seg><seg>with this key Shakespeare unlocked his heart;</seg><seg>the melody Of this small lute gave ease to Petrarch's wound.</seg></p>

Likewise, the more complex passage from Pinsky could be encoded in three different ways to reflect the different metrical, grammatical, and dialogic views of its text:

<lg><l>Catholic woman of twenty-seven with five children</l><l>And a first-rate body—pointed her finger</l><l>at the back of one certain man and asked me,</l><l>"Is that guy a psychiatrist?" and by god he was! "Yes,"</l><l>She said, "He <emph>looks</emph> like a psychiatrist."</l><l>Grown quiet, I looked at his pink back, and thought.</l></lg>

<p><seg>Catholic woman of twenty-seven with five children And a first-rate body—pointed her finger at the back of one certain man and asked me, "Is that guy a psychiatrist?" and by god he was!</seg></p><p><seg>"Yes," She said, "He <emph>looks</emph> like a psychiatrist."</seg></p><p><seg>Grown quiet, I looked at his pink back, and thought.</seg></p>

<ab>Catholic woman of twenty-seven with five children And a first-rate body—pointed her finger at the back of one certain man and asked me,<said>Is that guy a psychiatrist?</said> and by god he was!<said>Yes,</said> She said, <said>He <emph>looks</emph> like a psychiatrist.</said> Grown quiet, I looked at his pink back, and thought.</ab>

This method is TEI-conformant. Its advantages are that each way of looking at the information is explicitly represented in the data and that the individual views are simple to process. The disadvantages are that the method requires the maintenance of multiple copies of identical textual content (an invitation to inconsistency) and that there is no explicit indication that the various views, which might be in separate files, are related to each other: it might prove difficult to combine the views or access information from one view while processing the file that contains the encoding of another.80

A second method for accommodating non-hierarchical objects in an XML document involves marking the start and end points of the non-nesting material. This prevents textual features that fall outside the privileged hierarchy from invalidating the document while identifying their beginnings and ends for further processing. The disadvantage of this method is that no single XML element represents the non-nesting material and, as a result, processing with XML technologies is significantly more difficult.

The empty elements used at each end are called segment-boundary elements or segment-boundary delimiters. There are several variations on this method of encoding.

For some common structural features, the TEI provides milestone elements that can be used to mark the beginning of a textual feature. These include lb, pb, cb, handShift, and the generic milestone. Using lb, for example, it is possible to indicate both the physical lineation of a poem on the page and its grammatical division into sentences:

<p><seg><lb n="1"/>Scorn not the sonnet;</seg>; <seg>critic, you have frowned, <lb n="2"/>Mindless of its just honours;</seg><seg>with this key <lb n="3"/>Shakespeare unlocked his heart;</seg><seg>the melody<lb n="4"/>Of this small lute gave ease to Petrarch's wound.</seg></p>

The use of these elements is by definition TEI-conformant. Care should be taken, however, that the meaning of the milestone elements is preserved: semantically, for example, lb is used to mark the start of a new (typographical) line. While in much modern poetry, typographical and metrical line divisions correspond, lb does not itself make a metrical claim: in encoding verse from sources, such as Old English manuscripts, where physical line breaks are not used to indicate metrical lineation, the correspondence would break down entirely.

The segment boundaries also may be delimited by the generic anchor element. Attributes can then be used to indicate the type of feature being delimited and whether a given instance opens or closes the feature.

<l><anchor subtype="sentenceStart"type="delimiter"/> Scorn not the sonnet;<anchor subtype="sentenceEnd"type="delimiter"/><anchor subtype="sentenceStart"type="delimiter"/> critic, you have frowned,</l><l>Mindless of its just honours; <anchor subtype="sentenceEnd"type="delimiter"/><anchor subtype="sentenceStart"type="delimiter"/> with this key</l><l>Shakespeare unlocked his heart; <anchor subtype="sentenceEnd"type="delimiter"/><anchor subtype="sentenceStart"type="delimiter"/> the melody</l><l>Of this small lute gave ease to Petrarch's wound. <anchor subtype="sentenceEnd"type="delimiter"/></l>

Another approach is to design custom elements that provide richer information about the feature being delimited or its boundaries. This information can be included as attribute values or as part of the element name itself: e.g., <boundaryStart element="sentence"/>... <boundaryEnd element="sentence"/>, <sentenceBoundary position="start"/>... <sentenceBoundary position="end"/>, or <sentenceBoundaryStart/>... <sentenceBoundaryEnd/>:

<l><n:sentenceBoundaryStart/>Scorn not the sonnet;<n:sentenceBoundaryEnd/><n:sentenceBoundaryStart/>critic, you have frowned,</l><l>Mindless of its just honours; <n:sentenceBoundaryEnd/><n:sentenceBoundaryStart/>with this key</l><l>Shakespeare unlocked his heart; <n:sentenceBoundaryEnd/><n:sentenceBoundaryStart/>the melody</l><l>Of this small lute gave ease to Petrarch's wound. <n:sentenceBoundaryEnd/></l>

If the custom elements can be replaced by TEI elements and attributes without loss of information, this method is TEI-conformable (see 23.4 Conformance); if the custom elements introduce information or distinctions that cannot be captured using standard TEI elements, the method is an extension.

Finally, elements that are normally used to encode nesting textual features (e.g., said, seg, l, etc.) can be adapted so that they serve as empty segment boundary delimiters when the features they encode cross-hierarchical boundaries. Additional attributes (sID and eID in the example below) are added to these elements in order to allow the unambiguous correlation of start and end points. This method has been introduced in the markup literature under various names, including Trojan milestones, HORSE markup, CLIX, and COLT. It is described in detail by DeRose (2004)):

<lg><l><seg>Scorn not the sonnet;</seg><hr:s sID="s02"/>critic, you have frowned, </l><l>Mindless of its just honours; <hr:s eID="s02"/><hr:s sID="s03"/>with this key </l><l>Shakespeare unlocked his heart; <hr:s eID="s03"/><hr:s sID="s04"/>the melody </l><l>Of this small lute gave ease to Petrarch's wound. <hr:s eID="s04"/></l></lg>

Depending on how the modifications are carried out, this method may be TEI-conformable, represent an extension of the TEI, or produce a non-conformant document.

The method is TEI-conformable if the modified elements are placed in a distinct, non-TEI namespace (see 23.4.4 Use of the TEI Namespace), and if the modified elements and attributes can be mapped without loss of information to existing TEI markup structures such as milestone or anchor elements automatically (see 23.4 Conformance).

The method represents an Extension if the modified elements are placed in a distinct, non-TEI namespace, but contain information or distinctions that cannot be algorithmically translated to existing TEI elements without loss of information (see 23.4 Conformance).

In each of the above examples (except the last), the relationship between the start and end delimiters (where these exist) of a given feature is implicit: it is assumed that "end" delimiters close the nearest preceding "start" delimiter, or, in the case of milestones, that the milestone marks both the end of the preceding example and the beginning of the next. Complications arise, however, when the non-nesting text overlaps with other non-nesting text of the same type, as, for example, in a grammatical analysis of the various possible interpretations of the noun phrase fast trains and planes. In this case, the adjective fast can be understood as either modifying trains and planes or just trains:

Figure 20.1. Two interpretations of the phrase Fast trains and planes

In order to encode the possible analyses of this phrase, an unambiguous method of associating opening and closing segment boundary delimiters is required:

In this encoding, the first interpretation, in which fast modifies the NP trains and planes, the NP trains and planes is opened using an anchor tag with the xml:id value NPInterpretationA and closed with an anchor with the same value on corresp; in the second interpretation, in which fast forms a NP with trains, the NP fast cars is opened using an anchor tag with the xml:id value NPInterpretationB and closed with an anchor tag that has the same value on corresp.

Despite their advantages, segment boundary delimiters incur the disadvantage of cumbersome processing: since the elements of the analysis (e.g., the sentences in the poems, or phrases in the above example) are not uniformly represented by nodes in the document tree, they must be reconstituted by software in an ad hoc fashion, which is likely to be difficult and may be error prone.

Most important for some encoders, the method also disguises the relationship between the beginning and the ending of each logical element. This makes it impossible for standard validation software to provide the same kind of validation possible elsewhere in the encoding. When using grammar-based schema languages it is not possible to define a content model for the range limited by empty elements.81

A third method involves breaking what might be considered a single logical (but non-nesting) element into multiple smaller structural elements that fit within the dominant hierarchy but can be reconstituted virtually. For example, if a passage of direct discourse begins in the middle of one paragraph and continues for several more paragraphs, one could encode the passage as a series of said elements, each fitting within a p element. The resulting encoding is valid XML, but the text in each said element represents only a portion of the complete passage of direct discourse. For this reason these elements are sometimes called ‘partial elements’.

In the case of our selection from Pinsky's poem, for example, the second passage of direct quotation, which crosses a line boundary and is broken up by a She said in the narrator's voice, can be made to fit within the hierarchy established by the metrical lineation by using two said elements:

<lg><l>Catholic woman of twenty-seven with five children</l><l>And a first-rate body—pointed her finger</l><l>at the back of one certain man and asked me,</l><l><said n="quotation1">Is that guy a psychiatrist?</said> and by god he was!<said n="quotation2">Yes,</said></l><l>She said, <said n="quotation2">He <emph>looks</emph> like a psychiatrist.</said></l><l>Grown quiet, I looked at his pink back, and thought.</l></lg>

Similarly, the sentences in our example from Wordsworth could be encoded:

<l><seg n="sentence1">Scorn not the sonnet;</seg><seg n="sentence2">critic, you have frowned,</seg></l><l><seg n="sentence2">Mindless of its just honours;</seg><seg n="sentence3">with this key</seg></l><l><seg n="sentence3">Shakespeare unlocked his heart;</seg><seg n="sentence4">the melody</seg></l><l><seg n="sentence4">Of this small lute gave ease to Petrarch's wound.</seg></l>

There are two main problems with this type of encoding. The first is that it invariably means that the encoding will have more elements claiming to represent a feature than there are actual instances of that feature in the text. Thus, for example, the passage from Scorn not the sonnet marks seven spans of text using seg, even though there are only four linguistic sentences in the passage.

The second problem is that it can be semantically misleading. Although they are tagged using the element for sentence, for example, very few of the textual features encoded using seg in this example represent actual linguistic sentences: with this key, for example, is a prepositional phrase, not a sentence; Of this small lute gave ease to Petrarch's wound is a string corresponding to no single grammatical category.

Taken together, these problems can make automatic analysis of the fragmented features difficult. An analysis that intended to count the number of sentences in Wordsworth's poem, for example, would arrive at an inflated figure if it understood the seg elements to represent complete rhetorical sentences; if it wanted to do an analysis of his syntax, it would not be able to assume that seg delimited linguistic sentences.

The technique of fragmentation is often complemented by the technique of virtual joins. Virtual joins may be used to combine objects in the text to a new hierarchy. Here is Scorn not the sonnet again; this time the relationship between the parts of the fragmented sentences is indicated explicitly using the next and prev attributes described in 16.7 Aggregation.

<l><seg>Scorn not the sonnet;</seg><seg next="#s2b" xml:id="s2a">critic, you have frowned,</seg></l><l><seg prev="#s2a" xml:id="s2b">Mindless of its just honours;</seg><seg next="#s3b" xml:id="s3a">with this key</seg></l><l><seg prev="#s3a" xml:id="s3b">Shakespeare unlocked his heart;</seg><seg next="#s4b" xml:id="s4a">the melody</seg></l><l><seg prev="#s4a" xml:id="s4b">Of this small lute gave ease to Petrarch's wound.</seg></l>

This method of virtually joining partial elements is sometimes called ‘chaining’.

For fragments encoded using ab, l, lg, div, or elements that belong to the att.segLike class, an even simpler mechanism for virtually joining fragments exists: the use of the part attribute with the value I (Initial), M (Medial), or F (Final) as described in 16.3 Blocks, Segments, and Anchors. Here is the above example recoded to reflect this method:

<l><seg>Scorn not the sonnet;</seg><seg part="I">critic, you have frowned,</seg></l><l><seg part="F">Mindless of its just honours;</seg><seg part="I">with this key</seg></l><l><seg part="F">Shakespeare unlocked his heart;</seg><seg part="I">the melody</seg></l><l><seg part="F">Of this small lute gave ease to Petrarch's wound.</seg></l>

This method is TEI-conformant and simple to use. Its disadvantage is that it does not work well for cases of self-overlap, or if there are nested occurrences of the same element type, as it can become difficult to ascertain which initial, medial, or final partial element should be combined with which others or in which order. This problem becomes evident if we attempt to combine a detailed Grammatical view of the Pinsky example with its metrical encoding:

<lg><l><seg part="I">Catholic woman of twenty-seven with five children</seg></l><l><seg part="M">And a first-rate body—pointed her finger</seg></l><l><seg part="M">at the back of one certain man and asked me,</seg></l><l><seg part="F">"<seg>Is that guy a psychiatrist?</seg>" and by god he was!</seg><seg part="I">"<seg part="I">Yes,</seg>"</seg></l><l><seg part="F">She said, "<seg part="F">He <emph>looks</emph> like a psychiatrist.</seg>"</seg></l><l><seg>Grown quiet, I looked at his pink back, and thought.</seg></l></lg>

A third method for aggregating fragmented partial elements involves using markup that is not directly part of the encoding, e.g., the join element. In this method, a join element is used elsewhere in the document to indicate explicitly the members of the virtual element:

The major advantage of fragmentation and virtual joins is that it allows all the hierarchies in the text to be handled explicitly: both the privileged one directly represented and the alternate hierarchy that has been split up and rejoined. The major disadvantages are that (like most of the other methods described here) it privileges one hierarchy over the others, requires special processing to reconstitute the elements of the other hierarchies, and, except in the case of join, can be semantically misleading.

Most markup is characterized by the embedding of elements in the text. An alternative approach separates the text and the elements used to describe it. This approach is known as stand-off markup (see section 16.9 Stand-off Markup). It establishes a new hierarchy by building a new tree whose nodes are XML elements that do not contain textual content, but rather links to another layer: a node in another XML document or a span of text. This approach can be subdivided according to different criteria. A first distinction concerns the link base, i.e. the content to which annotations are to be applied. Sometimes the link target contains markup that can be referred to explicitly, as in the following example where the offset markup uses the xml:id values on w to provide targets for <xi:include>82:

Note that the layer that uses XInclude to build another hierarchy might well be in another document, in which case the value of href of <xi:xinclude> would need to be the URL of the document that contains the base layer, in this case the w elements.

This is very similar to the use of join discussed above. The main advantages of the stand-off method are that it is possible to specify attributes on the aggregate seg elements, and that there exists off-the-shelf software that will perform appropriate processing. Stand-off markup may be used even when the base text being annotated is plain text, i.e. does not have any XML encoding. In this case, the range of text to be marked up is indicated by character offsets (see 16.2.4 TEI XPointer Schemes, in particular 16.2.4.7 string-range()). Another distinction concerns the number of files which can serve as link targets. Often, one (dedicated) annotation is used as the link target of all the other annotations. It is also possible to freely interlink several layers.

It has been noted that stand-off markup has several advantages over embedded annotations. In particular, it is possible to produce annotations of a text even when the source document is read-only. Furthermore, annotation files can be distributed without distributing the source text. Further advantages mentioned in the literature are that discontinuous segments of text can be combined in a single annotation, that independent parallel coders can produce independent annotations, and that different annotation files can contain different layers of information. Lastly, it has also been noted that this approach is elegant.

But there are also several drawbacks. First, new stand-off annotated layers require a separate interpretation, and the layers—although separate—depend on each other. Moreover, although all of the information of the multiple hierarchies is included, the information may be difficult to access using generic methods.

Inasmuch as it uses elements not included in the TEI namespace, stand-off markup involves an extension of the TEI.

There exist many non-XML methods of encoding a text that either solve or do not suffer the problem of the inability to encode overlapping hierarchies. These include, but are not limited to, the following proposals.

Applying the notion of concurrent markup to XML (Hilbert et al. (2005)). This reintroduces the CONCUR feature of SGML, which was omitted from the XML specification.

Designing a form of document representation in which several trees share all or part of the same frontier, and in which each individual view of the document has the form of a tree (see Dekhtyar and Iacob (2005)).

The ‘colored XML’ proposal (Jagadish et al. (2004)), which stores a body of information as a set of intertwined XML trees. This approach eliminates unnecessary redundancy and makes the database readily updatable, while allowing the user to exploit different hierarchical access paths.

The MultiX proposal (Chatti et al. (2007)) , which represents documents as directed graphs. Because XML is used to represent the graph, the document is, at least in principle, manipulable with standard XML tools.

The Just-In-Time-Trees proposal (Durusau and O'Donnell (2002)), which stores documents using XML, but processes the XML representation in non-standard ways and allows it to be mapped onto data structures that are different from those known from XML.

The Layered Markup and Annotation Language (LMNL) proposal. This offers alternatives to the basic XML linear form as well as its data and processing models. It uses an alternative notation to XML and a data structure based on Core Range Algebra (Tennison and Piez (2002)).

Markup Languages for Complex Documents (MLCD). This provides a notation (TexMECS) and a data structure (Goddag) as well as a draft constraint language for the representation of non-hierarchical structures; see Huitfeldt and Sperberg-McQueen (2001).

These approaches are based either on non-standard XML processing or data models, or not based on XML at all. Since TEI is currently based on XML they are not described any further in these Guidelines. Use of these methods with the TEI will certainly involve extensions; in most cases the documents will also be non-conformant.

It has been shown, however, that it is possible to relate the different annotations in an indirect way: if the textual content of the annotations is identical, the very text can serve as a means for linking the different annotations, as described in Witt (2002).

Grammar based schema languages (e.g., DTD, W3C Schema, and RELAX NG) are used to define markup languages (e.g., XHTML or TEI). Rule-based schema languages (e.g., Schematron) can be used to define further constraints. Such a rule-based schema language permits a sequence of certain elements between empty elements to be legitimized or prohibited.