Though scholars often balk at the perceived connotations of the digital part of Digital Humanities, one of the discipline’s most interesting and useful outcomes is the way in which it sheds new light on centuries-old practices. In so doing, Digital Humanities encourages us to examine afresh the basic assumptions, conventions, and practices of an activity like transcription, strengthening our understanding of the process and allowing us to better imagine how technology might play a role therein.

At Transcribe Bentham, one particular challenge involves reconciling the use of wiki technology to transcribe Jeremy Bentham’s manuscripts with the transcription methods that have been used for the same purpose in the decades that have passed since the publication of the first volumes of the new Bentham Collected Works in 1968. In recent years, Bentham’s manuscripts have been transcribed directly into Microsoft Word, with another member of the Bentham Project then checking a hard copy of the transcript and making corrections in pencil.

Scholars at the Bentham Project, like Professor Philip Schofield, have developed specific methods for transcribing various features of Bentham’s manuscripts, such as additions, deletions, marginal notes and summaries, and illegible text. For example, where Bentham provides an alternative reading above a word in the text, it is recorded between forward slashes:

[ . . . ] in this way /manner/ it is possible [ . . . ]

This amounts to a shorthand notation system, where forward slashes are used to identify alternative text. Indeed, in Microsoft Word, it would be possible to transcribe the appearance of the manuscript by formatting the text to make the word ‘manner’ appear in superscript. However, neither of these methods identifies the meaning or function of the word ‘manner’ – instead, they provide a visual key for someone who is familiar with the transcription methods of the Bentham Project. By moving these transcriptions into TEI-compliant XML, the meaning of this authorial operation can be explicitly encoded into the text:

in this way <add type="alternative">manner</add> it is possible
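The conversion from the Project’s slash shorthand to TEI markup could in principle be partly automated. The following is a minimal sketch, not the Project’s actual tooling: it assumes a naive rule that any text between a pair of forward slashes is an alternative reading, which real manuscripts would certainly complicate (fractions, dates, and other uses of the slash would need more careful handling).

```python
import re

def slashes_to_tei(text):
    """Convert the /alternative/ shorthand shown above into a TEI
    <add type="alternative"> element. Naive illustrative rule only:
    any run of characters between two forward slashes is treated as
    an alternative reading."""
    return re.sub(r"/([^/]+)/", r'<add type="alternative">\1</add>', text)

print(slashes_to_tei("in this way /manner/ it is possible"))
# in this way <add type="alternative">manner</add> it is possible
```

The point of the exercise is the one made above: the shorthand is a visual key for insiders, whereas the TEI element makes the function of the word ‘manner’ explicit and machine-readable.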

The purpose of the Bentham transcripts which have already been completed (some 20,000 of about 60,000) was to aid editors in the preparation of printed volumes for the Collected Works; thus, much deleted text which would not have appeared in the printed volumes was left out of the transcriptions. It will be the aim of Transcribe Bentham to reinstate this deleted text (which may be of interest to some scholars) into the transcriptions, and encode it as such. Once completed, this type of encoding will facilitate much more refined searching than a simple full-text search: for instance, a user may choose to see every instance in which Bentham deleted the word ‘panopticon’.
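To illustrate the kind of refined query this encoding makes possible, here is a small sketch using Python’s standard XML library. The `<del>` element is the standard TEI element for deleted text; the manuscript fragment itself is invented for illustration and is not a real Bentham transcript.

```python
import xml.etree.ElementTree as ET

# Invented fragment for illustration; <del> marks deleted text as in TEI.
fragment = """<p>
  the <del>panopticon</del> inspection-house offers
  a <del>new</del> novel mode of confinement
</p>"""

root = ET.fromstring(fragment)

# Collect every deletion in the fragment...
deletions = [d.text for d in root.iter("del")]
print(deletions)  # ['panopticon', 'new']

# ...then filter for a particular deleted word, which a plain
# full-text search over the transcript could not distinguish
# from the word's ordinary occurrences.
print([d for d in deletions if d == "panopticon"])  # ['panopticon']
```

A full-text search finds every occurrence of ‘panopticon’; only the markup lets a user ask specifically for the occurrences Bentham struck out.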

Eventually, these TEI-encoded transcriptions of Bentham’s manuscripts will increase the utility of the already-invaluable Bentham Papers Database by providing facsimile images and encoded transcriptions of the manuscripts that it currently lists. In this way, it will become an important complementary resource to the printed volumes of the Collected Works.

As we’ve already begun discussing, this flags up a key tension in our community-collaborative project. To what extent is it essential to the project that our users capture the full richness of each manuscript in their transcripts, if doing so raises the bar to user participation, engagement and productivity?

A rich marked-up typescript – deletions, corrections, underlinings, interlinear insertions – is a valuable resource, but a plain text representation (that may take a fraction of the time and training to input) is still a valuable contribution to a wide range of research activities, since it instantly makes the manuscript susceptible to powerful searching and indexing tools.

If, for example, it takes three times as long to transcribe a manuscript in rich TEI as in plain text, and if the extra skills required were to alienate about a third of potential contributors, then adopting the full-markup approach reduces the number of manuscripts that can be transcribed in the lifetime of the project by a factor of four or five (each transcript takes three times as long, and only two-thirds as many people contribute: 3 × 3/2 = 4.5). Digitisation projects frequently face this kind of decision.

I would be interested to know what research is available into these kinds of factors around advanced digitisation, particularly with TEI markup or similar. Perhaps our experience will also be able to contribute to understanding this.

One approach we could consider is permitting users to adopt either approach, according to their preference, by providing guidance on best practice for each method – ‘simple’, using plain text only, and ‘advanced’, using an appropriate subset of TEI. This would have the merit of allowing virtually anyone to create a useful typescript (albeit at the cost of more editorial work). In this case it might be possible to use the project’s experience to assess the relative merits – to the user and to the project – of each approach.

[…] Rather than paying to outsource manuscript transcription, some projects encourage voluntary participation (or crowdsourcing) using the TEI, like Transcribe Bentham – (you may read a blog post about their use of the TEI). […]

Well, we do see some differences, but whichever way we do it, the work is still work, and quality is still the thing to sustain. So we should always pay attention to the quality of the transcribed content.