Describing Handwriting, Part II: Terminology

Continuing from Part I of this series on describing handwriting, we have outlined a 'high-level' conceptual system for describing letters in a computer from a palaeographical viewpoint. We can now start to add labels to these entities: that is, to decide on terminology. This is the point that has caused many problems in the past, and I do not suggest that what follows will even begin to solve them. Rather, as a starting-point, and to help elucidate the examples in Part I, I give the terminology used in the precursor to this project, namely P.A. Stokes, English Vernacular Script, Ca 990 - Ca 1035 (unpubl. PhD dissertation, Cambridge, 2005). This is available from the Cambridge UL and is discussed in part in an article in Digital Medievalist (see esp. §§24-25). The dissertation included detailed descriptions of about 500 scribal hands, comprising both a prose catalogue and a database record containing just over 17,000 distinct features, and so seems a useful starting-point developed from 'real-world' experience.

Graphemes

[Edit, 17 October 2011: I now realise that Grapheme is incorrect in this context and prefer 'character'. See the discussion under 'Terminology (again)' in Part IV of this series.]

The Old English alphabet is very similar to the modern English one, with two letters missing (j and v), and three additional letters (æ, þ and ð). The letter wynn (ƿ - not visible on all browsers) is here considered an allograph of w. For further details, see the Old English Alphabet tutorial in the LangScape project.

Although not considered a letter as such, the tironian 'nota' (an abbreviation for the word 'and' or its equivalent in the relevant language) must also be considered. This looks very much like a modern 7. Although it is a symbol of abbreviation, not unlike &, the Anglo-Saxons seem to have included it in their alphabet, as we can see in manuscripts containing syntactic glosses. Punctuation symbols are also graphemes; those normally listed for script of this period are punctus, punctus versus, punctus elevatus, and punctus interrogativus, but others are found, particularly decorative ones at the end of sections.

For the purposes of late Anglo-Saxon script, then, we find 33 distinct graphemes:

If we include Latin then we have several more, particularly the sequence of abbreviation symbols. An initial list includes & (strictly a ligature but often considered an abbreviation) and abbreviations for (-us), (-bus) and (est). This still leaves the question of ligatures, and potentially accents as well. For the purposes of DigiPal, ligatures will probably need to be classified as graphemes, or perhaps as a level above that. Accents seem best considered components of letters.

Graphs, Allographs and Idiographs

To progress, we need first to introduce a new term and refine an old one. The new term is idiograph, which is 'the way (or one of the ways) in which a given writer habitually writes' a given grapheme (Davis 2007: 255). This is distinct from an allograph in that the latter is 'an accepted version of that grapheme' (ibid; my italics). So allographs function at the level of script, and idiographs (as well as graphs) at the level of scribal hands. To describe scribal hands, then, as opposed to scripts, we therefore need to add two new entities to our list: IDIOGRAPH and GRAPH.
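As a rough illustration only, the four levels discussed so far might be sketched as nested records. The class and field names here are assumptions made for this example and are not the DigiPal schema; in particular, linking each idiograph to a single allograph, and recording a graph's position as free text, are simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grapheme:
    """An abstract letter or symbol, e.g. 'a' or 's'."""
    name: str


@dataclass(frozen=True)
class Allograph:
    """An accepted version of a grapheme: a property of the script."""
    grapheme: Grapheme
    name: str  # e.g. 'Caroline' or 'Insular'


@dataclass(frozen=True)
class Idiograph:
    """The way a given scribe habitually writes a letter: a property of a hand."""
    allograph: Allograph  # assumption: each idiograph realises one allograph
    scribe: str


@dataclass(frozen=True)
class Graph:
    """A single instance of a letter as written on the page."""
    idiograph: Idiograph
    location: str  # hypothetical free-text locator


a = Grapheme("a")
caroline_a = Allograph(a, "Caroline")
scribe_x_a = Idiograph(caroline_a, scribe="Scribe X")
one_graph = Graph(scribe_x_a, location="fol. 1v, line 3")
```

Here the first two records describe script and the last two describe a particular hand, which is the distinction drawn above.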

The Glossary includes a number of allographs and idiographs that proved useful in the description of English Vernacular minuscule. A short list of these includes 'Insular', 'Caroline' and 'cc' a; 'squinting' e; 'long', 'low', 'round' and 'tall' s; 'f-shaped' y; and so on. These examples raise questions, however, regarding the difference between an allograph, an idiograph and a graph, and what terminology we use for them. In particular, do we need names (and distinct entities) for each different allograph and idiograph, or can we represent them by their features? In other words, if we use the phrase 'squinting e', for example, then we imply that this is a distinct entity, a form of e distinct from others, and so we need a name for it. On the other hand, it could rather be conceived of as e which happens to have a given feature, namely a squinting eye. If we follow the second route then we need relatively few entities and correspondingly few names for them. Long, low, round and tall s seem genuine allographs, insofar as the difference between them is not one of a single specific feature but rather the entire shape is different, and so we would reasonably want distinct entities (and so names) for each. Much the same might be said for Insular and Caroline a, but it could also be argued that Caroline a is simply a with a hook. (Indeed, a strict definition of Caroline a has proved very difficult to give, and even more so with Square a.)

With these caveats in mind, it seems reasonable to propose that distinct names are required for all examples in a given corpus which (a) are sufficiently different from one another that this difference cannot easily be given as a list of features, and/or which (b) are already recognised in the literature or are of sufficient discriminatory interest that it is useful to identify them as such and not as a list of features. In other words, Caroline a could be abandoned and described instead as Insular a with a hook, but to do so is verbose and unnecessarily confusing.
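The criterion just proposed can be written as a small predicate, given here only as a sketch. The function and parameter names are invented for this example, and the judgements passed in remain the palaeographer's, not the computer's.

```python
def needs_distinct_name(
    hard_to_list_as_features: bool,
    recognised_in_literature: bool,
    of_discriminatory_interest: bool,
) -> bool:
    """Return True if a form should be a named entity rather than a feature list.

    Condition (a): the difference cannot easily be given as a list of features.
    Condition (b): the form is already recognised in the literature, or is of
    sufficient discriminatory interest to be worth identifying as such.
    """
    condition_a = hard_to_list_as_features
    condition_b = recognised_in_literature or of_discriminatory_interest
    return condition_a or condition_b


# Long s: the entire shape differs, so condition (a) applies.
long_s = needs_distinct_name(True, True, True)

# Caroline a: arguably 'a with a hook', but established in the literature,
# so condition (b) applies.
caroline_a = needs_distinct_name(False, True, True)

# A hypothetical form that differs by one easily stated feature, is not an
# established term, and is of no particular interest: describe it by features.
minor_variant = needs_distinct_name(False, False, False)
```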

On this basis, I propose the following list of allographs for English Vernacular minuscule:

Once again, these are either common terms or described in the Glossary. Many are also described by N.R. Ker in his Catalogue of Manuscripts Containing Anglo-Saxon (Oxford, 1957), pp. xxvii-xxxiii.

The question remains which of these are really allographs and which idiographs, and whether idiographs also need their own names, but this is up to the person defining the script rather than being a question of how to model it. The list of possible idiographs is also more or less infinite and dependent entirely on the scribes, so I will not try to give one here. There are some examples below, however.

Components

What about components of letters, then? How do we describe these? In principle, components should be more or less uniform across all Western Latin (and Latin-based) scripts. In practice, however, this is where the difficulties have been greatest. I think the reasons for this are two-fold. Part is simply that we tend not to agree with each other, and so have many different names for the same thing. This is relatively easy to overcome with a glossary as long as there is a direct match between terms; one person's 'body' may be another's 'bowl', and so on. Other cases are much more complex, since the terms do not directly match. This is not surprising, since palaeographers of different scripts will want to focus on different details, and will find different ways in which their various components can vary.

Again, it is impossible to even begin doing justice to this question here, so I simply provide a list of components which I found useful when describing the English Vernacular minuscule script:

Ascender, descender, minim, suspension stroke, and baseline are standard terms and will not be explained here. Almost all the remainder are described in the Glossary from Stokes, English Vernacular Script, cited above. Exceptions are a-component and e-component, which refer to the first and second part of æ, respectively.

Descriptions and Features

Given this separation of letters into components, it then remains to list the different ways in which any given allograph, idiograph or graph, and the components of the same, can be described. There are potentially thousands of possible descriptions if one considers the full corpus of Western medieval handwriting, let alone other scripts and hands, so it is barely possible to begin listing them here. Examples can be found in the Glossary, though, such as 'angled', 'round', 'teardrop-shaped', 'flat-topped', and so on. These descriptions can then be combined with components or allographs to create scribal features, such as 'ascender - split', 'south-west quadrant - angular', and so on.
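To make the combination concrete, a feature can be sketched as a simple pairing of a component with a description. The class below is an assumption made for illustration, not the structure of the original database; only the two example labels are taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A scribal feature: one component paired with one description."""
    component: str    # e.g. 'ascender'
    description: str  # e.g. 'split'

    def label(self) -> str:
        # Render in the 'component - description' form used in the text.
        return f"{self.component} - {self.description}"


features = [
    Feature("ascender", "split"),
    Feature("south-west quadrant", "angular"),
]
labels = [feature.label() for feature in features]
```

Keeping components and descriptions as separate short lists, and forming features only by pairing them, means that the vocabulary grows far more slowly than the number of features it can express.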

Again, this is just a starting-point, and barely one at that. To continue the discussion, though, I want to return to our formal model and consider how we can represent not script in general but particular scribal hands and the features they contain. This, then, is the subject of Part III.
