Her first point is that the letters are a form of persona, which seems obvious yet also slightly counterintuitive to me. Although the letters are private, they afford the writer a space in which to create a character for the reader: a small space to escape from social norms and to experiment, boldly or otherwise. It is that space that social media now occupies to some degree: part play, part communication.

The staging of personae is fluid, though. It changes over time and as relationships change, which is worth bearing in mind when editing collections, or even when reading them. Although the author is not necessarily playing with the reader, it is not the intended reader who now reads or edits the letters. Different selves, different voices?

The issues of reconstructing letters after nineteenth-century editing (sometimes a rather destructive process), of lost letters, and of an authorial wish for the post-mortem destruction of material make the editing process almost one of rebuilding and repair. The minutiae of the letters help reconstruct the timeline, since dates might have been altered during earlier editing, or the original dating might be no more precise than "Thursday".

Part of this makes me query the practice of building collections of letters from sources like Project Gutenberg. The question is not so much about the text itself as about the information that might be lost in creating that text. As Pip Willcox reminded me, such material is useful for some of my interests in network analysis and for experimentation, but it troubles me now.

If we want the data to be useful and open, how do we get high-quality data into the open arena, and how do we annotate or develop the data so that it retains the same integrity? One answer might be the Text Encoding Initiative (TEI), but the encoding is not easy, and it ties the text to XML rather than to something more malleable such as JSON. I think this is one of the questions to be raised at the Open Humanities Hack on November 28th.
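To make the XML-versus-JSON point concrete, here is a minimal sketch of how a TEI-style letter might be flattened into JSON using only the Python standard library. The fragment below uses a small, illustrative subset of TEI-like element names, not a schema-valid TEI document, and the field choices are my own assumptions:

```python
import json
import xml.etree.ElementTree as ET

# A minimal, illustrative TEI-style fragment (not a complete or valid TEI document).
tei = """<TEI>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Letter to an unknown recipient</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="letter">
        <dateline>Thursday</dateline>
        <p>My dear friend, ...</p>
      </div>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(tei)

# Flatten just the fields we care about into a JSON-friendly dictionary.
letter = {
    "title": root.findtext(".//title"),
    "dateline": root.findtext(".//dateline"),
    "paragraphs": [p.text for p in root.findall(".//p")],
}

print(json.dumps(letter, indent=2))
```

The trade-off is visible even at this scale: the JSON is easier to pass around and query, but the flattening step silently discards structure (the header, the `div` type, any inline markup) that the XML preserved.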

The second query is how we deal with the provenance of the data. Who has edited what in proofreading, and what is the editorial history of the text? At first glance, the W3C PROV-DM model helps with this, and the TEI header does as well. How can we add this to data sets to inform users of changes and edits?
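PROV-DM models provenance as entities, activities, and agents linked by relations such as derivation and attribution. As a rough sketch of what an editorial history could look like in those terms, here is a PROV-flavoured record expressed as plain Python dictionaries; all the identifiers (`ex:letter-12-v1`, `ex:editor-jane`, and so on) are hypothetical:

```python
# A minimal, PROV-DM-flavoured provenance record as plain dicts.
# Identifiers are hypothetical; relation names mirror PROV-DM terms.
provenance = {
    "entity": {
        "ex:letter-12-v1": {"prov:label": "Transcribed letter, first pass"},
        "ex:letter-12-v2": {"prov:label": "Proofread letter"},
    },
    "activity": {
        "ex:proofreading-1": {"prov:label": "Proofreading against the manuscript"},
    },
    "agent": {
        "ex:editor-jane": {"prov:type": "prov:Person"},
    },
    # (derived, source): v2 of the letter came from v1.
    "wasDerivedFrom": [("ex:letter-12-v2", "ex:letter-12-v1")],
    # (entity, activity): v2 was produced by the proofreading pass.
    "wasGeneratedBy": [("ex:letter-12-v2", "ex:proofreading-1")],
    # (activity, agent): the proofreading was carried out by the editor.
    "wasAssociatedWith": [("ex:proofreading-1", "ex:editor-jane")],
}

# A reader of the data set can then trace the editorial history.
for derived, source in provenance["wasDerivedFrom"]:
    print(f"{derived} was derived from {source}")
```

Something of this shape, attached to each text in a collection, would let a user see not just the text but who changed it and when; the TEI header's revision description records similar information inside the document itself.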

Although this is somewhat outside my current area, the series has made me think about texts in a different way. Rather than being pieces of data that are essentially pliable, they take on lives and personae of their own.