Tools and techniques for empirical stemmatology

A HOWTO for using Stemmaweb

I have been asked for a guide to using the tools on Stemmaweb – there is documentation under ‘About/Help’, of course, but it would be useful to give an overview for where to start and how you go on from there. So this guide is meant to be an introduction, of sorts.

The first step is to create yourself a user account. You can do this by clicking ‘Sign In/Register’ at the top, where you will have three options:

Use an account created especially for Stemmaweb. To get one of these you must first click the ‘Sign in with Stemmaweb’ bar, and then follow the ‘Register’ link. Once you are registered you can sign in back on the ‘Sign in with Stemmaweb’ tab.

Once you have done one of these three things, you will find that Stemmaweb looks just the same, but the ‘Sign In’ link will be replaced with a greeting to you (or to your email address anyway), and you will see a new button:

Now you can upload texts of your own to work with!

Stemmaweb operates on collated text. Someday, I hope, there will be an integrated collation tool that will do this first step for you, but as of today we are not there. So the first thing you need, if you want to work on Stemmaweb, is a collation of some text. You can provide this collation in, broadly speaking, one of three ways:

Do it yourself. Align the text in a spreadsheet, one witness per column, with the sigla in the first row. The spreadsheet can be saved as comma-separated format with the extension .csv, tab-separated format with the extension .txt, or as a Microsoft Excel spreadsheet (either XLS or XLSX).

Do it yourself, TEI style. Create a TEI parallel-segmentation file for your text and its witnesses. This is somewhat less recommended because there are a million different ways to apply the TEI guidelines and I have only had time and energy to support a few of them. PLEASE review these guidelines if you want to use this option.

Do it yourself, CTE style. If you have been preparing your witnesses in Classical Text Editor, you can export your work in TEI double-endpoint-attachment format, and Stemmaweb will make a best-effort attempt at reconstructing the witnesses from your apparatus. There are a lot of caveats about doing this; you can read more here. The upload may well fail due to some confusion encountered by the Stemmaweb parser; I am working on a tool to validate CTE input, but it is not yet generally available.

Get a collation tool to do it for you. CollateX is a good option for this; I recommend that you request CollateX results in its GraphML format, in order to preserve any detected reading transpositions. Do NOT use CollateX’s TEI-style parallel-segmentation output.

Now that you have a collation uploaded, if you click on that text in the list you will see some extra buttons. Let’s look at what each of them do.

This is where you should probably start. Clicking on this button will load your text into the ‘relationship mapper’; you will see the collation in the form of a graph, running in one direction from beginning to end. Each reading in the text is a node in the graph somewhere; each witness is essentially a single long string of these reading nodes, collecting its words from beginning to end. Wherever witnesses agree, they are ‘strung’ through the same node. Wherever they disagree, their ‘strings’ will diverge, and the variant readings will appear stacked roughly on top of each other in the graph.

The purpose of this tool is not only to look at the pretty picture made by your variant graph, but also to annotate the variants in relation to each other. Are the variant readings synonyms? Spelling variants? Link them as such. Has a word been shifted to a different location in one witness? Link the two occurrences of the word as a transposition. Do you think the variation in question is stemmatically significant? Make a note of it in the dialog when you create the link. More instructions for using the tool can be found here.

At the moment the available types of links are mostly limited to syntactical relationships between words. Someday, the users of Stemmaweb (that’s you, the scholar) will be able to define their own relationship categorization, but that day is not today. If none of the syntactical categories apply, you are very welcome to make liberal use of the ‘Other’ categorization and leave yourself a note in the ‘Annotation’ field.

If you click this button, you will find a fairly arcane (but hopefully well-explained) way of defining a stemma for your witnesses. This is meant to be used for the definition of any stemma at all, so long as it has a root (an archetype) and doesn’t have a cycle (e.g. A->B, B->C, C->A. Unless you are working on the New Testament, you probably won’t have this.)

Once you add a stemma, there will be a button labelled ‘Edit this stemma’; that brings up the same stemma definition box, but with the stemma in question pre-loaded there. The left- and right-arrow buttons will allow you to page through the stemmata you have defined. There is no limit to how many you can define!

Stemmaweb is connected to the Stemweb service provided from the Helsinki Institute of Information Technology, which can run one of a variety of phylogenetic algorithms on a collated text tradition. If you click this button, you will get a dialog box that asks you which algorithm you want to run. Clicking on ‘What is this?’ will bring up a description of the selected algorithm. Some of the algorithms take parameters; if you choose one of these, you will be asked to fill in the parameter.

If you have marked up the relationships between variants in the graph viewer / relationship mapper, then you will also be able to discount selected categories of relationship, if you wish – for example, it is fairly common to want to disregard spelling variation, and this is the option that lets you do it.

The algorithms offered by Stemweb all return unrooted trees; depending on the algorithm you select and the size or amount of variation of your tradition, you may have multiple trees returned. (At present there is no good way to delete a tree, or to reorder the trees that are returned.)

An unrooted tree is not, by this definition, a stemma until it has been oriented by selecting a root. In Stemmaweb you can orient/root (or re-root!) a tree by clicking on the witness node that you wish to treat as the archetype, and selecting the green checkbox to ‘Use this node to root the stemma’.

Now you have your text uploaded and marked, and you have your stemma hypothesis (or maybe several hypotheses) – you are ready to click the most exciting button!

The Stexaminer is a program and algorithm that was developed in concert with the DTAI group at KU Leuven; its job is to tell you, for each location in the text where variation occurs, whether that variation fits the stemma. In the case of stemmas without contamination / conflation this is pretty easy to calculate, but when you have traditions where contamination is known or suspected to have occurred, up to now it has been difficult to say for sure whether a given pattern of variation can be explained by the stemma. The Stexaminer can handle as complex a stemma as you care to throw at it.

Like the graph viewer / relationship mapper tool, the Stexaminer has its own help documentation for you to consult. The basic idea is that you can generate an overview of how well your stemma seems to match the textual evidence, and you can also drill down variant-by-variant to see which witnesses carry which reading. Where the pattern of variants does not match the stemma, the Stexaminer deduces where in the stemma the change might have been introduced, so that the number of ‘coincidences’ is kept to a minimum. It will also try to detect reading reversion – that is, when a scribe might have altered a reading in the exemplar to restore an ancestral reading. This is a highly experimental feature, and not one to rest a philological argument on without a lot of caution.

So there you have it! An overview of how Stemmaweb’s tools fit together and pointers to how you might use them. Go wild, and if something goes wrong, get in touch!