Using XML based on the Guidelines of the TEI (Text Encoding Initiative) to Create a Digital Archive of Family Correspondence: a Sample Letter

March 2009, revised/edited April 2012

This is the first page of a letter written in December 1938 by my maternal great-uncle, Raymond Higgins, to his sister Claire Ward. My great-uncle was then serving in Habbaniya, Iraq, as a Leading Aircraftsman and nursing orderly with the Royal Air Force. My grandmother, Claire Ward (née Higgins), saved the letter and numerous other items of correspondence and memorabilia dating between 1920 and 1990; my mother and her sister are currently organizing and transcribing the material. The letter refers to several other members of my family, including my grandfather, Arthur Ward, and another great-uncle, Anthony Higgins, who was killed during an RAF training mission in World War II.

As an assignment for a course in Digital Humanities with Alan Galey at the University of Toronto, I began planning for a digital archive of family correspondence by marking up this letter in XML based primarily on the Guidelines of the Text Encoding Initiative (TEI). My aim in this pilot markup was to explore how future users (both within my own family and without) might use a digital version of manuscript correspondence to explore a wide range of questions about genealogy, military history, geography, historical linguistics and other topics. The project also complements the efforts of one of my sisters, a high school teacher, who has prepared a teaching module based on the wartime experience of my great-uncle Tony, using correspondence and personal effects saved by my grandmother.

In marking up the letter, I have attempted to capture the semantic complexity of the text while maintaining as much information as possible about the physical characteristics of the original document. During the markup process, I encountered problems and questions in trying to represent information in three major areas:

the structure and physical form of a manuscript letter

the identity of and relationships between named individuals

the classification of distinct forms of language such as slang and abbreviations

The structure of the sample letter is quite simple, and follows the normal conventions of letter writing. The writer’s name, address and the date are given in a heading; in this case, Raymond Higgins’ military identification number is also included, likely because it was required by military regulation. The body of the letter opens with a standard salutation, continues with several paragraphs of prose, and closes with a standard sign-off. The front matter or heading of the letter was distinguished from the main body using the tags <front> and <body>, and each paragraph was tagged with the <p> tag. The address and date contained in the front matter were also tagged as such. A standard form of the date and tags distinguishing sub-components of the address were included so that future users would be able to search letters by date, city and country of writing.
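A minimal sketch of the structure described above might look like the following. The <div> wrapper inside <front>, the attribute values and the placeholder comments are assumptions made for illustration; this is not a copy of the actual encoding, and the wording of the letter is deliberately left out.

```xml
<text>
  <front>
    <!-- Heading: writer's address and date. The div wrapper is an assumption. -->
    <div type="heading">
      <p>
        <address>
          <addrLine><!-- station and settlement as written --></addrLine>
          <addrLine><country>Iraq</country></addrLine>
        </address>
        <!-- @when holds a normalized date so letters can be searched by date;
             only the month is known here, so a year-month value is used. -->
        <date when="1938-12"><!-- date as written in the letter --></date>
      </p>
    </div>
  </front>
  <body>
    <p><!-- salutation --></p>
    <p><!-- first paragraph of prose --></p>
    <p><!-- closing --></p>
  </body>
</text>
```

The TEI also provides <opener>, <salute> and <closer> elements intended for letter-like material, which could replace some of the generic <p> elements in a later revision.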

As a book historian, I feel that information about the physical form of the documents included in any future digital archive would be essential to understanding their content, and vice versa. For example, the fact that Raymond Higgins wrote his letters home on cheap, lightweight foolscap and on small scraps of waste paper can tell us more about his social and class status, the cost of postage, and his attitude to letter writing. In reviewing other digital humanities archives, I had also found that the emotional immediacy of personal letters was lessened when idiosyncratic handwriting was transformed into standard characters on screen.

I therefore represented the physical characteristics of the letter by:

using the <facs> tag to include digital facsimiles of each page of the letter, allowing the user to view Raymond’s handwriting and ink smudges (and to check the accuracy of my transcription)

including <lb> and <pb> tags to preserve information about the location of line breaks and page breaks in the original

using the <fw> (forme work) tag to indicate the page numbers Raymond had included at the top of the final two pages of the letter

using the <del> tag with varying attributes to represent the method used to delete words

including information about the material of the manuscript by using the <material> tag nested within the <physDesc> and <msDesc> tags

(Further investigation is needed into how to use TEI tags to represent physical aspects such as paper type and collation.)
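The items in this list can be sketched as two fragments, one from the transcription and one from the header. This is an illustration under stated assumptions rather than the project's actual markup: TEI P5 links page images through the facs attribute (or a separate <facsimile> element), so the sketch uses the attribute form; the file name, page number, rend values, identifier and transcribed words are all placeholders.

```xml
<!-- Transcription fragment: page break, forme work, line breaks, deletions. -->
<pb n="2" facs="letter-1938-12-p2.jpg"/>
<fw type="pageNum" place="top">2</fw>
<p>
  <lb/>first line of the page as written
  <lb/>second line, with a <del rend="strikethrough">cancelled</del> word
  <lb/>third line, with an <del rend="overstrike">overwritten</del> word
</p>

<!-- Header fragment: physical description of the source document. -->
<sourceDesc>
  <msDesc>
    <msIdentifier>
      <idno><!-- hypothetical archive identifier --></idno>
    </msIdentifier>
    <physDesc>
      <p>Written on <material>lightweight foolscap paper</material>.</p>
    </physDesc>
  </msDesc>
</sourceDesc>
```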

To increase the utility of the archive for genealogists and historians, I then turned to the problem of rendering searchable information on named individuals and their family relationships. The first step was identifying all of the named individuals within the letter; a phone call to my mother and great-aunt was all that was necessary to establish the facts, but one can imagine the challenges of verifying this kind of information when the individuals mentioned in personal letters are long dead or unknown to those working on the project. Following the TEI Guidelines, I then tagged all personal names using the <persName> tag and associated tags such as <forename> and <surname>. I also identified each named individual with a unique number using the @key attribute, which “provides an externally-defined means of identifying the entity (or entities) being named, using a coded value of some kind” (13.1.1).

I imagine that the planned digital archive would be linked to a database containing biographical information about all named individuals, extending perhaps to photographs, and voice and video recordings where available. The @key attribute also provides a way of linking nicknames or honorifics to canonical forms of names; for example, the “Aunt Rose” mentioned in this letter was Rose Cook, the sister of Raymond’s stepmother.
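The use of @key described here can be sketched as follows. The key values are hypothetical identifiers in the planned database, and the second reference to Rose Cook stands in for an occurrence in some other letter; neither is taken from the actual encoding.

```xml
<!-- A full name, split into forename and surname. -->
<persName key="p002"><forename>Arthur</forename> <surname>Ward</surname></persName>

<!-- A familiar form carries the same key as the canonical name,
     so a search on p007 would find both references. -->
<persName key="p007">Aunt <forename>Rose</forename></persName>
<persName key="p007"><forename>Rose</forename> <surname>Cook</surname></persName>
```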

The use of pronouns emerged as an interesting problem when encoding the letter. A researcher using a body of letters to track references to a single individual will want to search for all letters in which that individual is mentioned, even when that individual’s name does not appear. For example, Raymond refers in this letter to the fact that his older brother Anthony is “knocking about with a girl,” but never mentions her by name. She is not mentioned by name in any of the other surviving letters from this period, and the only member of my family who remembered this woman was my ninety-year-old great-aunt, who recalled that her name was Coreen McGillicudy. To preserve this information, I have chosen to tag the reference to “her” using <persName> and to include an entry on Coreen McGillicudy within the database of biographical information. The decision to tag this individual in this way serves one very specific purpose: researchers or students interested in the wartime experience of my grandmother’s late brother Tony might be interested in any and all references to his girlfriend and her reaction to his death, especially since the experience of women during wartime is a topic of interest.
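A sketch of this decision, with the surrounding words of the letter omitted and a hypothetical key value, might be written as shown in the first line below. The second line shows an alternative that was not used in this project: the TEI's general-purpose referring-string element, <rs>, which accepts the same @key attribute.

```xml
<!-- p012 is a hypothetical key for the database entry on Coreen McGillicudy. -->
<persName key="p012">her</persName>

<!-- Alternative sketch (an assumption, not the project's markup). -->
<rs type="person" key="p012">her</rs>
```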

However, the problem of identifying Coreen McGillicudy can also illuminate some broader questions about genealogical and historical research. How many individuals, particularly women, are absent from written history because they were never named in documents? How much information about family history is never recorded and is lost when older family members die? How might we use text encoding to stem the loss of information and track individuals who are in danger of disappearing from the record?

The letter also presented encoding challenges because it contains a substantial quantity of distinct language, including slang and abbreviations. Before I began the encoding process, most of these distinctive linguistic features had remained invisible to me; the “voice” used throughout the letter is almost as familiar to me as my own Granny’s, and her conversation was always filled with British working-class slang and usage.

Once I had identified words or phrases that would be unfamiliar to most users of a planned digital archive, I tagged them as <distinct> and used the associated attributes @type and @space to specify that some terms were associated with the military or with the U.K. I also identified all abbreviations in the text and provided expansions for them; I used the <choice> tag so that future users would be able to choose to view the document with abbreviations in condensed or expanded form.
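The two techniques in this paragraph can be sketched as below. The attribute values are illustrative assumptions, as is the choice of terms: “knocking about” is quoted from the letter, but whether it was among the tagged phrases is a guess, and “RAF” is used only as a familiar abbreviation.

```xml
<!-- Distinct language, classified by kind and by region. -->
<distinct type="slang" space="UK">knocking about</distinct>
<distinct type="military" space="UK"><!-- service term as written --></distinct>

<!-- An abbreviation paired with its expansion; a stylesheet can show either form. -->
<choice>
  <abbr>RAF</abbr>
  <expan>Royal Air Force</expan>
</choice>
```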

Due to time constraints, I decided not to undertake the amount of etymological research that would have been necessary to tag each distinct term with information about the time, place, and social context in which the term would have been used. The TEI does provide the tools to do so, however, and an archive of family letters tagged in this way would provide an invaluable source of information to historical linguists about how slang usage changed over time and according to the place of residence, income or occupation of individuals. Within the time available, I tried to locate information on slang usage and abbreviations from credible sources, primarily the Oxford English Dictionary, the Shorter Slang Dictionary and a list of abbreviations compiled by Royal Air Force Associations.
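The fuller tagging that was left for later could use the @time and @social attributes that <distinct> offers alongside @type and @space. In the sketch below every attribute value is a placeholder that has not been researched; it shows only where such information would go.

```xml
<!-- Placeholder values only: no etymological research stands behind them. -->
<distinct type="slang" time="1930s" space="UK" social="working-class">knocking about</distinct>
```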

The process of encoding this letter showed me the level of expertise and research that would be necessary to tag a digital archive of family letters with rich and accurate information that would be useful to a range of researchers. The ideal team for such a project would contain individuals skilled in material bibliography, in biographical and genealogical research, and in historical linguistics and etymology. Although I have chosen to use fairly simple methods of tagging geographic and military aspects of this letter, it would also be possible to create very detailed mark-up about the locations mentioned in each letter, and the military units and personnel involved in the operations described. Although the labour involved in creating such an archive would be substantial, I am convinced that the process would be satisfying and useful.
