dedicated to DATA: digitally assisted text analysis

...the broad circumference
Hung on his shoulders like the Moon, whose Orb
Through Optic Glass the Tuscan Artist views
At Ev’ning from the top of Fesole,
Or in Valdarno, to descry new Lands,
Rivers or Mountains in her spotty Globe.
(Paradise Lost, 1. 286-91)

Whither TEI? The Next Thirty Years

In the next fifty years the entirety of our inherited archive of cultural works will have to be re-edited within a network of digital storage, access, and dissemination (Jerome McGann, 2001)

You have to put the corn where the hogs can get at it (Bill Clinton)

Only the paranoid survive (Andrew Grove)

Introduction

At the Digital Humanities 2016 meeting in Cracow it was announced that the triennial Antonio Zampolli Prize will be awarded in 2017 to the TEI Community on the occasion of the 30th anniversary of the Text Encoding Initiative, which began its life at a conference at Vassar College in 1987. This is a timely and well-deserved birthday present to the scholars, librarians, and IT professionals who have made the TEI an important tool for the digital remediation of print and manuscript materials on a global scale. The 1994 publication of the TEI Guidelines, edited by Lou Burnard and Michael Sperberg-McQueen, is surely among the more consequential contributions to humanities scholarship of that decade.

Laurels are a good thing to rest on, but not for long. In the following pages I take a critical look at the TEI, focusing on shortcomings and on what should be done if the TEI is to do well in the next thirty years. I served as program chair for this year’s annual meeting of the TEI in Vienna, but it became clear in the summer that I would not be able to make it across the Atlantic. So I decided to put down some of my thoughts on paper. They are very much my thoughts, with all the advantages and disadvantages of personal observations unconstrained by the compromises of committee work. Here is a four-point summary of those thoughts:

Beware of complacency.

Worry about the lack of wide recognition and buy-in from the academic disciplines that are the main consumers of TEI-encoded documents.

Pay much more attention to the needs of average and novice users.

Broaden the revenue base and reform an ineffective governance structure.

I served briefly as chair of the TEI Consortium in 2011 and have served on its Board for the last five years. Nothing of what I write below will be news to the members of the TEI Technical Council and Board, where I have played a role somewhere between Cassandra and a Socratic gadfly. Much of it is similar to an analysis of the TEI that I offered in a public letter of August 2011 to the TEI Board and Council. I have reused some of its language, not always with attribution.

The TEI Consortium is an international body, but in practice 95% or more of its business takes place in Europe or North America. I know more about the North American than the European scene, and this report shows it. It seems plausible to me that in the years to come the balance of activities may shift towards Europe. TEI is mostly a technology for scholarly editing. In the prestige economy of the North American academy the status of scholarly editing has declined. This does not seem to be the case in Europe—or so it looks from this side of the Atlantic. But such things change, and it is difficult for me to envisage a future in which the TEI thrives without thriving in North America as well.

Because this is a long document, the following section offers a summary of each of its seven distinct sections.

A summary of findings and recommendations

Success and failure: TEI and the remediation of cultural heritage texts

The TEI schema can be and has been used for a variety of purposes, but it is mostly a tool for the remediation of cultural heritage texts, and its long-term success depends on its widespread acceptance as a critical tool in that enterprise. It has exceeded expectations from one perspective and fallen short of them from another. It is the lingua franca of digital scholarly editing on a global basis, well recognized by editors and major funding agencies. But there is surprisingly little awareness of it among the literary scholars, historians, etc. and their students who are the most common end users of TEI-encoded texts. This may be truer of North American than of European users, but the fact that a high percentage of end users do not know or care much about the TEI is not just “a” problem for the TEI. I think that it will be the most important problem for the leadership of the TEI to confront in the years to come.

Libraries as financial supporters of the TEI

In the North American context, humanities departments have no budgets for stuff, tools, and services. That is the responsibility of the library, the “laboratory of the humanities” in the words of a former Northwestern provost.

More than a million dollars went into the creation of the TEI schema during its development years. Much of the TEI Consortium’s revenue in the early years followed something like a “20 x 5000” formula where four host institutions and a group of libraries provided the lion’s share of support in the form of $5,000 annual membership fees. Libraries for a while thought of the TEI as a tool for mass-digitization, but their interest faded because of technological and financial developments conveniently summarized by “Google Books”, “Hathi Trust”, and “2008”.

Libraries have increasingly become the “go to” place for Digital Humanities. They are very sensitive to what their patrons want. If they hear from faculty and their chairs in history, literature, linguistics, and other disciplines that the TEI matters to their scholarly and pedagogical work, they will invest in it. If they don’t, they won’t. They are not hearing this from them now.

Much text-based work for which TEI is a better solution ends up being done as HTML-based web sites, for which there is typically enough locally available technical support. Does the TEI want to cede a large chunk of editorial work to HTML? The Germans have the useful term Orchideenfach or orchid subject. I interpret the current trend as moving towards the TEI as an Orchideenfach. Is that where we want to go?

From 20 x 5000 to 200 x 500

The TEI should seek to broaden its base and persuade its ultimate core audience, the not particularly digitally savvy faculty and students in humanities departments, that any cultural heritage text worth encoding in the first place should be encoded in some form of TEI, which may be quite simple and will typically not involve the rigor or complexity of a critical edition.

Academic libraries remain the most promising site for maintaining the TEI competency of a particular institution, but they need to be persuaded that there is a market for such a competency and that it is technically and financially feasible. The TEI should develop a “200 x 500” campaign that would seek to triple the current institutional membership. A credible commitment to lowering entry barriers for TEI-based work should overcome the free-rider problem and persuade enough institutions to make modest contributions.

Heretical asides about money

The four American libraries that are still “sustaining partners” of the TEI would do the consortium a long-term favour if they reduced their contributions to $500.00 or at most $1,000.00 a year. This would not cause an immediate financial crisis but would be a strong incentive to work towards a more broadly based revenue model.

Individual subscriptions make up only 5% of TEI revenue. It is unlikely that they will ever add up to more than 10%. I would argue for going out of the business of individual subscriptions altogether and focusing the Consortium’s limited time and energy exclusively on what it can do for institutions and what in return it can get from them in the form of modest but broadly based contributions.

Back to the future: a new version of TEI Lite

TEI Lite, an 80/20 TEI schema published in 1996, was very successful. It has had half-hearted support from the Consortium in recent years, but the time has come to:

Revise TEI Lite in the context of the two decades that have passed since it was first released in 1996.

Add the Processing Model that was developed by the late Sebastian Rahtz.

Make the maintenance of this version a continuing first-order item of business for the TEI.

Document this version in a manner that is independent of the Guidelines.

Offer concrete help with the integration of encoded texts into existing Web environments.

Claim that version as a central property of the TEI and build around it a strategy for extending its user base and staying in touch with the novice or average users without whose awareness and support the TEI is unlikely to thrive, whether from a scholarly or financial perspective.

This entry-level version should be the text-encoding equivalent of a family practice in medicine or a Subaru Outback in transportation.

Outreach, outreach, and outreach

With the rapid and quite spectacular rise of the DH conference in recent years, the tradition of annual TEI meetings should be re-examined from the perspective of “outreach, outreach, and outreach” as the most important challenge. There is much to be said in favour of joining the DH conference and seeking a greater presence at other meetings. In the North American context, the TEI should seek “allied organization” status at the MLA, which would guarantee it at least one session at every meeting. Similar arrangements should be explored with the professional societies of historians, classicists, and linguists. There are opportunities for working with the Society for Textual Scholarship. Workshops at regional or national library meetings are something worth looking into. A survey of the European scene might come up with comparable solutions.

Governance issues

There are serious governance issues that need addressing. The TEI Council works well within its current understanding of its role. But I question whether it still makes sense to restrict the work of the Council to the maintenance of the TEI schema and leave the rest to the “community.” The Board has not worked well. It has not been good at raising money, and it does not consist of people who either have money to give or have access to people with money. It has not developed effective and asynchronous ways of getting business done in a timely manner. It has largely failed to think strategically about long-term goals. The Board has been well aware of its problems for some time. Solutions are less obvious. The biggest governance failure, however, has been the lack of any significant engagement between Board and Council. They are like ships passing in the night. I still see considerable virtue in my 2011 proposal for a unicameral board of directors, composed of technical and nontechnical people. In such an arrangement the relationship between the technology and its nontechnical end users would be written into the sovereign body of the organization as an explicit and continuing challenge.

Success and failure: TEI and the remediation of cultural heritage texts

I learned from Laurent Romary that the two biggest users of TEI are the European Patent Office and the French ISTEX project, which respectively have 200 million and seven million documents encoded in TEI. In principle, the TEI schema could be used in the publication of secondary materials in many disciplines. The TEI journal uses a reduced TEI schema, but I am not aware of other environments in which TEI is used widely to create new documents. It is mainly a tool for encoding texts that originated in a print or manuscript world, and more specifically texts of interest to scholars in literature departments and cognate humanities disciplines, including linguistics. This emerges very clearly from the most casual look at TEI documentation. The excellent introduction to TEI Lite picks a passage from Jane Eyre as its first example. The majority of examples in the Guidelines are literary, mainly English, with sprinklings of French and Latin.

The TEI is thus a technology in the service of the monumental task that Jerome McGann defined when he wrote in 2001 that “in the next fifty years the entirety of our inherited archive of cultural works will have to be reedited within a network of digital storage, access, and dissemination.” Its success or failure is measured by the degree of its wide acceptance as the tool of choice for the digital “remediation” of that inherited archive.

In my 2011 letter I wrote about both success and failure:

From one perspective, the TEI has exceeded expectations. Virtually all digital editions of primary texts with any claim to scholarly standards use it. TEI is the lingua franca of digital scholarly editing on a global basis. You find it in editions of Buddhist sutras, New Zealand and Pacific island texts, Greek inscriptions, French manuscripts of the Roman de la Rose, the Hengwrt manuscript of the Canterbury Tales, slave narratives of the American South, or the historical records of the State Department. TEI has been used in all the large-scale library-based digitization projects of primary texts at Indiana, Michigan, North Carolina, Virginia, and the Library of Congress. The same is true of European encoding projects.

That is the success story. But now consider a thought experiment where you ask the chairs of history, literature, linguistics, philosophy, and religion departments of the world’s 100 top universities to write a sentence or short paragraph about the TEI. These would be very short sentences or paragraphs. The one message you would not get from them is the recognition that the TEI offers an important enabling technology for work in their disciplines.

I wish I could say that today’s sentences or paragraphs would be a little longer or firmer. But I doubt it. The formal Agreement that led to the establishment of the TEI Consortium envisaged the TEI as “an international community-based standard for scholarly text encoding,” and it argued that such a standard “will ultimately live or die on political and social rather than technical or financial grounds.” It is important to envisage this community not only as the group of people who are engaged in text encoding, but also as the much larger group of scholars who use these encoded texts and have come to appreciate the value that is added by the encoding for presentation or analysis. Department chairs or program directors (I was both for 15 years) on average do a pretty good job of articulating the priorities of their faculty. For most of those chairs TEI is not on their radar screen at all.

I can hear the objection “Who cares about English departments?” or less polite versions of it. But if there is any truth to the view that “political and social” grounds will determine the ultimate fate of the TEI standard, the audience whose favour must be won are the literary scholars, historians, etc. and their students who are the most common end users of TEI-encoded texts. The fact that most of these users do not know or care much about the TEI is not just “a” problem for the TEI. I think that it will be the most important problem for the leadership of the TEI to confront in the years to come.

Libraries as financial supporters of the TEI

What I say next applies mostly to the North American context. I am not sure to what extent it applies to Europe. A Chemistry department may have multi-million dollar budgets for stuff, tools, and services, for which the department chair bears direct responsibility. The situation in the humanities is quite different. There are no departmental budgets for stuff, tools, and services. A former provost at Northwestern told me that “the library is the laboratory of the humanities.” In the days before computers, the Library was the provider of stuff (books) and a lot of “free” and very generous services in the form of reference and subject librarians. There are not yet any firmly established patterns in the humanities about who provides and pays for expensive digital tools and services. Libraries and IT departments wrangle about this and typically agree that the other should pay for it. The modal chair of a North American humanities department knows very little about these matters and is unlikely to consider worrying about it as part of his or her job description.

Before the establishment of the Consortium, the TEI was funded by grants and had annual budgets of $100K-250K. Well over a million dollars went into its making. The funding model for the Consortium established in 2001 was a pay-to-play club or benevolent oligarchy of four host institutions and of members (mainly American libraries), who paid $5,000 a year and had the privilege of electing the members of the TEI Board and Council. Some places were reserved for representing the host institutions, but the large majority of Board and Council members could be elected from anywhere. It was part of the original TEI constitution that its services would always be free. It was hoped that over time the TEI would accumulate an endowment sufficient to generate its operating costs, which were estimated at $100K a year.

The plans for an endowment never materialized, but in its early years the TEI ran on a low six-figure budget, the lion’s share of which came from between 12 and 15 big donors. Libraries thought of relatively coarse TEI encoding as a good tool for library-driven mass-digitization projects. The Text Creation Partnership has been the most important of these projects. But the interest of libraries in the TEI waned quickly. “Google Books”, “Hathi Trust” and “2008” summarize the reason for that decline. Between 2013 and 2015 the annual income of the TEI hovered around $75,000. In 2015 there were only five $5K donors.

The original Agreement thought of libraries rather than academic departments as the likeliest source of support. It remains highly unlikely that North American humanities departments will have the budgetary resources to meet the digital needs of their faculty, TEI or otherwise. In many universities, including my own, and in colleges like Amherst, Swarthmore, or Smith, the Library has become the “go to” place for “DH.” A recent piece about Jacob Heil as a Mellon Digital Scholar at the Five Colleges of Ohio is a good example of this trend. If the library continues its role as the “laboratory of the humanities”, it makes a lot of sense for it to provide one-stop shopping for stuff, tools, and services. It makes little sense for faculty to divide their work into “print humanities” and “digital humanities”, going to the Library for one and IT for the other (if they can find it).

Libraries are service organizations and as such very sensitive to what their patrons want. If they hear from faculty and their chairs in history, literature, linguistics, and other disciplines that the TEI matters to their scholarly and pedagogical work, they will invest in it. If they don’t, they won’t. They are not hearing this from them now. To quote a colleague from a distinguished university:

There’s very little faculty interest—as far as I can tell, the English faculty are actively hostile to the notion of text encoding, for example. The Library is infinitely more likely to do a WordPress site for you if you want to put something online. It’s rather depressing.

Returning to the ways in which the TEI may be said to have failed: at the age of 30 it has not succeeded in gaining wide acceptance as an indispensable technology among its end users. If there is a fault it may lie more with the end users than with the TEI. But the fact remains, and the question is how to deal with it.

You could take Milton’s “fit audience though few” approach. Accept the fact that the TEI is for the few who do critical editions and cede the ground of common garden variety editorial work to HTML, Omeka, WordPress, or various LAMP technologies, for which there is usually enough local competence. Something like that seems to be happening anyhow. It is accurately described in an email from Kevin Hawkins, a librarian who has served on the TEI Council and is the current TEI webmaster:

While the scholar may think they want to use some particular technology, a skilled librarian will try to help to understand what they actually need and steer them toward technology appropriate for that. Except for those creating scholarly editions, I don’t often hear researchers describe something that requires going beyond fulltext searching in some way that would require TEI encoding.

A few scholars are involved in creating scholarly editions of manuscripts (ancient or modern), and these are a prime audience for use of TEI. My impression is that most of this work is either at an institution (library or not) that is already a TEI member, or it’s done through a publisher with its own workflow that may or may not involve XML.

Choosing that option has benefits and risks. It would be harder to raise money, but the TEI could live on less, and if it could make do with less, “fit audience though few” has pleasures of its own. But there are real risks. In both intellectual and financial terms, an institution is better off if it has a pyramidal shape where few but complex operations or products rest on a broad base of many but simpler operations or products. If you think of HTML as “English”, it is OK for TEI to be seen as “Latin.” But it may be in trouble if it is seen as something between ancient Greek and West Tocharian. The Germans have the useful term Orchideenfach or “orchid subject”. I interpret the current trend as moving towards the TEI as an Orchideenfach. Is that where we want to go?

From 20 x 5000 to 200 x 500

I do not think the TEI should go that way. Instead it should seek to broaden its base and persuade its ultimate core audience, the not particularly digitally savvy faculty and students in humanities departments, that any cultural heritage text worth encoding in the first place should be encoded in some form of TEI, which may be quite simple and will typically not involve the rigor or complexity of a critical edition.

A key figure in this effort is the digital humanities librarian in some college or university. At the moment the modal DH librarian does not know or care much about TEI. Neither does his or her colleague in the IT department. “Too complicated and I’m too busy with other stuff right now.” It does not matter whether the perception of “too complicated” is deserved or not. In some ways it is, in others it isn’t. But perceptions are social facts and often harder than rocks.

We want to be in a situation where DH librarians in college or university libraries think of TEI as an important and manageable part of their tool kit. So when a faculty member or student comes with any text-based project that has a time frame beyond next month, doing it in TEI should be the default option. Some competency in TEI should be an essential component of the text-processing toolkit on a par with competency in audio, video, geospatial mapping, visualization, and other things that are perceived as essential ingredients of a basic digital shop.

It would take a lot of work to get there, and the effort may fail. Libraries will not respond to direct lobbying by the TEI. They will over time respond if they hear from enough faculty and students. So the work of persuasion has to be directed at the academic disciplines. Such persuasion is very much the work of the TEI Community and will occur as a conversation here and a conversation there. But there are also ways in which the Consortium can help. I remember a conversation with a businessman who said that “a manager’s greatest tool is his attention.” If there is something to my analysis of the TEI’s most critical problem, the TEI leadership should make paying attention to it a first order of business. If you keep paying attention, different solutions tend to emerge.

Paying sustained attention to this task of persuasion for several years is an intellectual or scholarly goal, but it maps nicely to a funding model that replaces reliance on a few donors with a broader base of modest donors. If we have a plan and message that persuades 200 institutions to support the TEI at $500 because they see it as valuable tech support for a basic scholarly function, that would count as broad-based “buy-in” in many senses of the word.

Heretical asides about money

Compared with other small humanities organizations, the TEI is relatively well-off. It does not have an endowment, but over the past few years it has ended each budget year with a balance of about $200,000. It has been quite prudent in managing its resources. That said, the long-term revenue trend has been downward. Revenue from subscriptions is considerably less today than it was in the early years of the consortium. The shrinking number of “sustaining partners” with $5,000.00 contributions is the chief reason. In 2015 there were just five, including the French CNRS and four American libraries. The number of subscribing institutions at any level has fluctuated. I do not see a clear and long-term upward trend.

I can see why a national organization like the CNRS would contribute $5,000.00. I would find it very difficult to tell the chief librarian of a public university under considerable budget pressure why s/he should contribute $5,000.00 to the TEI and what the library would get in return. Some time ago I argued on the Board that we should unilaterally reduce our maximum subscription level because we were likely to lose most of the sustaining partners anyhow, and forgoing the revenue now would force us to do something about developing a broader revenue base. That argument didn’t get anywhere. I think, however, that the four library sustaining partners would do the TEI a favour in the longer term if they now reduced their contributions to $500.00 or at most $1,000.00. Given the TEI’s comfortable cash cushion, this would not create an immediate financial crisis. It would, however, put the question of a sustainable budget high on the agenda.

In an ideal world, a scholarly community would support itself through the financial contributions of its members. In the real world this does not happen. Even the mighty MLA gets less than half of its income from dues. The majority of it comes from the sale of the MLA bibliography to libraries. Until recently the TEI rarely had more than two dozen individual members in a given year. Such memberships have increased recently: I count 223 individual membership payments in the three budget years 2013-2015: 45 in 2013, 66 in 2014, and 112 in 2015. They add up to 4.8% of the estimated membership revenue for 2013-2015. But you would need a fivefold increase or more to make up for the likely loss of sustaining partners. The more closely you look at the figures, the less promising they are. There are just 29 individuals who were members in each of the three years, and I doubt whether at any one time there are more than 100 individuals who see themselves as long-term and paying members of the Consortium, however much the others feel themselves to be eager and contributing members of the TEI community.

Individual membership fees are a trivial source of revenue. From a financial perspective one may even ask whether they are worth the administrative overhead of collecting and keeping track of them. You would get the same financial effect by raising the conference fee. If the TEI were successful in attracting between 150 and 200 institutional members and went out of the business of individual memberships altogether, the assembly of institutional electors would probably be a good enough proxy of the general will of the larger TEI community. The Consortium has a very limited amount of time and energy that can be spent on household matters and strategic planning. It would make sense to focus that time and energy on the institutions (and individuals in them) that are the critical source of support.

Back to the future: a new version of TEI Lite

What would it take to persuade the scholar with some text-based digital project, the busy DH librarian, and the just as busy web person in the IT department that doing some project in TEI is not only best in theory but that you can do it in practice, within the current budget, and more or less on time?

The short answer goes like this:

Revise TEI Lite in the context of the two decades that have passed since it was first released in 1996.

Add the Processing Model that was developed by the late Sebastian Rahtz.

Make the maintenance of this version a continuing first-order item of business for the TEI.

Document this version in a manner that is independent of the full Guidelines.

Offer concrete help with the integration of encoded texts into existing Web environments.

Claim that version as a central property of the TEI and build around it a strategy for extending its user base and staying in touch with the novice or average users without whose awareness and support the TEI is unlikely to thrive, whether from a scholarly or financial perspective.

The longer answer involves some history as well as touching the sacred cow of “customization.” It is a great virtue of the TEI schema that you can customize and extend it in various ways. But your weakness is typically the flip side of your strength. The TEI is widely perceived as being too complicated, and the TEI has done little to counter that perception. The rhetoric of customization has been unhelpful for those users who want to have something that they can “just use.” In most cases customization is for “later”. “Standardize where you can, customize where you must” would be a good mantra.

With any technology, users divide into two groups: the ones who just want to use it, and the ones who get interested in how it does what it does, take it apart, and tinker with it. The second group is the group that pushes the technology forward. But there are always many more people in the first than in the second group. Taking care of the former should be a first order of business for any technology that wants to survive (Machiavelli’s first rule for keeping the prince in office).

In 1996 Lou Burnard and Michael Sperberg-McQueen published TEI Lite as a schema that would meet “90% of the needs of 90% of the TEI user community.” In other words, an 80/20 or Pareto Principle version of the entire schema with its ~550 elements. This was a very successful publication. A lot of people have used TEI Lite out of the box without seeing any need for modification.

In the fall of 1996 Michael Sperberg-McQueen, after consultation with Mark Olsen, John Wilkin, and Perry Willet, prepared a Set of Rules for Use of TEI Lite in CIC EText Projects. This document looks to me like one of the sources for the “Best Practices for TEI in Libraries” (BPTL) that has gone through several editions since 1999. “Level 4” has been the most important product of the BPTL. A progressively modified version of Level 4 was used for the EEBO TCP projects. A decade later the experience of that project led to some nontrivial modifications of the P5 schema.

The “base format” developed by the German Text Archive and subsequently adopted by CLARIN stays more or less within the confines of TEI Lite. So does the schema of TEI Simple, whose main goal was not to add yet another entry level schema, but to develop a Processing Model that would let a web programmer process TEI encoded texts without knowing much about them.
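To give a flavour of what the Processing Model adds: an ODD customization can declare, alongside the schema rules for an element, how that element should be processed for display. The sketch below is illustrative rather than normative; the particular behaviour values and rendition shown are examples of the kind published with TEI Simple, not a transcript of its actual declarations.

```xml
<!-- Sketch of Processing Model declarations inside an ODD customization.
     Each <model> pairs an element with a generic processing behaviour,
     so a web programmer can render the text without knowing the encoding
     in detail. Behaviours and rendition here are illustrative. -->
<elementSpec ident="head" mode="change">
  <model behaviour="heading"/>
</elementSpec>
<elementSpec ident="hi" mode="change">
  <model behaviour="inline">
    <outputRendition>font-style: italic;</outputRendition>
  </model>
</elementSpec>
```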

I go through this tedious detail to make the point that the original TEI Lite was a very consequential “TEI for many purposes” and directly shaped quite a few schemas that differ very little from each other or from their source. This strikes me as a source of confusion unlikely to assuage the doubts of the scholar and Digital Humanities librarian who are thinking about whether to use TEI. So I come back to the idea that the “one thing needful” for the TEI in 2016 is to focus on an entry-level version that the Consortium can claim as a central property, to which it pays loving and continuing attention, and around which it builds a strategy for future growth.

I would not call this version either “Lite” or “Simple”, both of which names underestimate the capaciousness of TEI Lite and its various derivatives. “Base”, “Core”, or “Default” might be more appropriate names. The name, however, matters less than the commitment to get behind this version as a central product rather than treat it as a piece of legacy environment, which has been the prevailing attitude towards the current TEI Lite.

While not especially “lite” or “simple”, an updated entry-level version should support plain vanilla encodings of moderately complex print and manuscript materials. It should work both for editions of individual texts and for corpus-based projects, where relatively coarse encoding across a large body of texts may be accompanied by lightweight linguistic annotation. Such projects are by definition not “simple”. It should also meet the needs of documenting or writing about such projects.
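To make concrete what such a plain vanilla encoding looks like, here is a minimal sketch in standard TEI markup. The document itself is invented for illustration, but every element used (<teiHeader>, <div>, <pb/>, <hi>) belongs to TEI Lite and its derivatives:

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A Hypothetical Pamphlet</title>
      </titleStmt>
      <publicationStmt>
        <p>Encoded for illustration only.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Transcribed from an imagined 1650 quarto.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="chapter" n="1">
        <head>Chapter the First</head>
        <pb n="1"/>
        <p>Opening paragraph of the chapter, with a
          <hi rend="italic">highlighted</hi> phrase.</p>
      </div>
    </body>
  </text>
</TEI>
```

An entry-level schema should let an encoder produce a document of this shape without ever touching ODD or customization machinery.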

Perhaps an entry-level version should even include some provisions for a basic encoding of textual variants. Alternatively, one can think of bundles of elements that can be attached to the schema in one step. As I understand it, such standardized “customizations” have been part of recent TEI development work. Whatever the final details of a new entry-level schema, the average or novice user should find it and support for it as a first-order item on the TEI site, and s/he should get something like this message from it:
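As an illustration of what such a basic provision for textual variants might cover: the Guidelines already define critical-apparatus elements that could be attached as one such bundle. A minimal sketch, using the parallel-segmentation method on a line from this essay’s Milton epigraph, with invented witness sigla and an invented variant reading:

```xml
<!-- two hypothetical witnesses, A and B, disagree on a single word -->
<l n="291">Rivers or Mountains in her
  <app>
    <lem wit="#A">spotty</lem>
    <rdg wit="#B">spotted</rdg>
  </app> Globe.</l>
```

Parallel segmentation is only one of the apparatus methods the Guidelines define, but it is the simplest to encode and process, which makes it the natural candidate for an entry-level bundle.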

You don’t need to worry about ODD, and you don’t need to worry about XSLT. You do need to understand the rules that govern elements and attributes, and a tool like oXygen will be useful in helping you understand and apply them. But unless your needs are very special, you are unlikely to encounter situations that can’t be met by our “core” or “default” entry-level schema. If and when you bump up against the limits of the default schema, there are extensions that are themselves standardized.

A useful book in this context is Richard Thaler and Cass Sunstein’s Nudge (2008), which makes a powerful case for the argument that people often choose wrongly when confronted with too many options and that they do better with default options as long as they are given a chance to opt out. Pension plans are a good example.

An entry-level version that follows the principles of Nudge would be a “customization” that meets the needs of the many users for whom the task of creating a customization is daunting or seems irrelevant. This in no way constrains the expressive liberty of encoders who do not think that it is either possible or desirable to follow such a path. It does, however, promise to make life easier for those who think there is some virtue in travelling that path as far as it will take you, which for quite a few projects will be far enough. Some users will never feel the need to move beyond it, others will outgrow it, and when they do they will have learned enough to do so.

Continuing attention to good documentation will be the key to a good entry-level version. The Best Practices in Libraries group has paid continuing attention to documentation. I have joined this group, more as a lurker than an active contributor. It is a good and serious group. What strikes me, though, is that most of the things we talk about are not specific to libraries. They are general TEI matters: how to deal with hyphens at the end of a line, or whether to put <pb/> tags between <l> or <p> elements or inside them. So I wonder whether the Best Practices group would be more effective if it assumed responsibility for documenting an entry-level version. The role of “TEI in Libraries” has changed a great deal since the late nineties, and such a move would be in keeping with those changes.
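To illustrate the kind of question at issue, the <pb/> placement problem looks like this. Both encodings are well-formed, and a best-practices document has to pick one (page numbers invented for illustration):

```xml
<!-- Case 1: the page break falls mid-paragraph,
     so <pb/> goes inside the <p> at the point of the break -->
<p>A sentence that begins on one page <pb n="43"/>and ends on the next.</p>

<!-- Case 2: the break coincides with a paragraph boundary,
     so <pb/> goes between the <p> elements -->
<p>Last paragraph of page 42.</p>
<pb n="43"/>
<p>First paragraph of page 43.</p>
```

Questions of this sort recur for every milestone element, which is why they belong in general entry-level documentation rather than in a libraries-specific document.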

Outreach, outreach, and outreach

The current TEI annual conference and membership meeting follows a pattern established in 2001. Over the past five years, the DH conference has grown by leaps and bounds. It is still a small meeting compared with the MLA, but it is to Digital Humanities what the MLA is to Modern Languages. I have organized one TEI meeting and served as program chair for another. I have doubts about the viability of the current and very leisurely format. It is very expensive in terms of time for members who have other commitments. It certainly provides a familiar and comfortable space for members to meet, and there is a pleasure in talking with old friends in Lyons and reminiscing about the meeting in Pisa, London, or Victoria. But does it do much to reach out to new audiences? Is the pool of interesting papers large enough to support a three-day conference?

I could imagine a scenario in which the annual members’ meeting and a smaller program become part of the DH meeting, somewhat in the same way in which the Milton Society or the Slavic Association meet at the MLA. I recognize the objection that submerging the TEI meeting in the big and impersonal DH meeting is a great loss. But I question its force. Nobody goes to the “MLA” or any other large meeting as a whole. People who go to the MLA go to particular sessions and meet with their network of friends and colleagues.

There is a lot to be said for actively seeking a TEI presence in other venues. I recently corresponded with Kathleen Fitzpatrick, the MLA’s Director of Scholarly Communications. She told me that the MLA “would welcome special session proposals on the TEI, particularly in the context of digital scholarly editing, an area into which our Committee on Scholarly Editions has recently put a great deal of work.” She also told me that the TEI could seek “allied organization” status, which carries the privilege of one guaranteed convention session each year. A TEI workshop at the MLA might attract a new audience. Similar arguments can be made for a presence at the American Historical Association or the Society for Classical Studies (formerly the APA). Workshops and paper sessions could also be welcome additions to regional meetings of the MLA and similar associations.

From a disciplinary perspective, the TEI could and should cultivate relations with the Society for Textual Scholarship, “an international organization of scholars working in textual studies, editing and editorial theory, electronic textualities, and issues of textual culture across a wide variety of disciplines.” They are not nearly as well off as the TEI, but it’s an interesting group as I discovered when I gave a paper there a couple of years ago.

The Chicago Colloquium on Digital Humanities and Computer Science has rotated among Chicago universities for a decade. Chicago is within easy driving distance of an extraordinary number of excellent universities and colleges, and it is a great place to visit. It is easy for me to imagine a continued and (in)formal TEI presence at that meeting.

I am not as familiar with European meetings, but I suspect that they offer many similar opportunities. All this is in the spirit of “putting the corn where the hogs can get at it.”

Governance issues

During the first decade of its life as a Consortium, editorial work on the Guidelines and some administrative work was recognized as the responsibility of individuals who were paid directly or had TEI responsibilities written into their job descriptions. Between a quarter and a third of the TEI budget was committed to such arrangements. This worked very well in some circumstances (I remember Sarah Wells as a superb executive secretary), and I am told that it did not work so well in other cases. Since 2012, all work in the TEI has been done on a volunteer basis.

The results have not been great, especially in the case of the Board, which has come to the reluctant conclusion that it is not very good at getting things done. Volunteer organizations have well-known problems: people want to be involved, but they are also busy, and the organization’s business never quite makes it high enough on this week’s priority list. If the organization is international, the problems are compounded. Different time zones and different administrative customs make everything a little harder. None of this is insuperable, but it is enough to keep things from getting done.

Collectively the Board probably has not spent enough time on its job. It certainly has spent it in the wrong way. The Board meets once a year in person and on a monthly basis via Google hangout. For both of these genres, the success of a given meeting is pretty much a linear function of written preparation, and the written preparation has usually been poor. In so geographically scattered and international a body as the TEI, it would be much better to make written and asynchronous communication the “meat and potatoes” of business. People who stand for election to the Board should recognize that it requires continuing rather than sporadic attention.

I don’t think there is much disagreement on the Board about this analysis, but there is less agreement on solutions. The TEI is not big or rich enough to hire a full-time executive secretary/treasurer or director. There are different options, each with known advantages and disadvantages, and each with some support on the Board. One is to stick to the volunteer model, remind candidates that standing for election means committing real time to the task, and hope that the next round of Board members will live up to their promises a little better. A second option is to return to some version of the original host model, where an institution takes the TEI under its wing, receives some money from the TEI but also provides some in-kind support. There are many examples of this. Thus the executive director of the Shakespeare Association of America is funded by Georgetown University through a substantial reduction in her teaching load. In budgetary and administrative terms, the TEI is not unlike a midsize scholarly journal, and quite a few journals are supported in this way. The “home” aspect of such an arrangement is a real advantage, unless things go wrong. The third option is farming out back-office tasks to an online agency specializing in managing not-for-profit organizations. This is bureaucratically the cleanest option, and it is likely to work very well for procedures that can be routinized. It is another question whether the responsible person has any “local knowledge” or cares about it. The matters that clearly can be routinized involve collecting revenue of ~$75,000 from ~125 institutions or individuals and preparing tax returns. But various online association management groups seem to want something like $20K for those services, which seems like a lot of money.

It is not an easy choice. According to Sarah Wells’ rough estimate, TEI office work adds up to a day a week or 20% of an FTE, somewhere between $20k and $25k including fringe benefits. “Office work” here is defined more generously as anything that counts as “minding the store”. Dividing those tasks among members of an international committee does not work very well. Communication and coordination may end up taking more time than it would take for one person to just do it. Putting the burden on the chair is not a good option either. Individuals who make good chairs typically have their hands full already.

The Council does get its work done. Perhaps they are superior people, but it helps that there are well and internationally established routines for dividing technical work and communicating asynchronously.

Both the Board and the Council can be faulted for not talking to each other, articulating long-term intellectual goals, and relating them to a strategy for finding appropriate resources. I have served on the Board for five years, and I cannot remember a single occasion where we had a strategic discussion with appropriate preparation or follow-up. The chair of the Council is an ex officio member of the Board, but communication has been limited to money and other mundane matters. By and large the Board and Council have behaved as if they had little to do with each other and should not meddle in each other’s affairs. This is odd but has a long history and is compatible with the original statement about the mission of the TEI “Directorate” (the name for the Board in the original Agreement):

The TEI Directorate will administer the intellectual property in TEI (after a transition period described below) on behalf of the Consortium, will administer the funds collected from TEI Hosts and Members, will select new Hosts, and will provide the TEI seal of approval for TEI Developer sites and for services offered by TEI Hosts or Members. All other prerogatives with respect to the management of TEI and its editorial direction will be delegated by the TEI Directorate to the TEI Council.

I recently thumbed through the minutes of the TEI Council for the last four years. I certainly got the sense of a technically competent and responsible group. But I was particularly interested in discussions about outreach and education. It came up once in the minutes of a meeting some years ago. There was an inconclusive discussion about whether this was the Council’s or the Board’s job, and there was no follow-up. The Council believes that it is their job to develop the schema and everything else is the task of the “community”. I understand this principle, but I don’t think it works very well.

I conclude from this that there is a fundamental flaw in the governance structure of the TEI. There is no mechanism that encourages or enforces a continuing dialogue about the relationship of the technology to its end users. To put it crudely, there is a sandbox for the techies, and there is a Board that is supposed to pay the bills but otherwise keep quiet. This is not a good environment for strategic planning. If the TEI looks today like a somewhat rudderless organization, its peculiar governance structure may have something to do with it.

In 2011 I made a recommendation for a unicameral board of directors, composed of technical and non-technical people so that the relationship between the technology and its non-technical end users would be written into the sovereign body of the organization as an explicit and continuing challenge. There were two responses at the time. The Council said that they found the Board’s work boring and didn’t want to do it. This I understand because the Board also finds it boring and doesn’t want to do it. A Board member said that the Board and Council called for different skills. This is true in one way but not true in another. I looked through the membership of the Council and the Board, and discovered two things. First, quite a few people have served on both the Board and the Council. Secondly, some people on the Council have had day jobs with budgetary responsibilities that exceed the budget of the TEI by orders of magnitude, and many of them have had considerable administrative experience.

Why then does the TEI have a governance structure that systematically excludes from sustained discussion the most important questions worth asking? My experience on the Board has confirmed my sense that the TEI would do better with a Board of Directors that brings technical staff and end users together in a single body responsible for thinking about the TEI schema and its uses. Such an approach would require a rethinking of the ways in which technical work is organized and recognized. There are different ways of going about this. But something needs to be done. As it now stands, the TEI leadership (the Board and Council) can be described as honest and prudent. But they are not thinking ahead in a systematic fashion, and I would not describe the Consortium as well-governed.

2 comments

In response to your comment about a) libraries not being as engaged with the TEI as before and b) TEI uptake being dependent on technical support: in libraries especially, we have other techniques and tools for preservation of content that are not restricted to XML. We have repository frameworks, and for DH we are often leveraging solutions like GitHub and other versioning systems that are more manageable for scholars (not working exclusively on digital editions).

This is why I find the work TAPAS is doing a real benefit for the TEI community from the repository (data curation, preservation, and storage) and publishing (tools) perspectives.

In response to needing an entry-level version of the TEI: in many ways this is what the Best Practices for TEI in Libraries (http://purl.org/TEI/teiinlibraries) strove to do in Version 3. I know the Libraries SIG is actively updating the current version of the best practices, which I imagine will address TEI Simple more directly, but I never really understood the purpose of Simple (aside from the processing model) when the Best Practices attempts to address various levels of encoding, from simplest to scholarly, with ODDs (and therefore schemas) accompanying levels 1-4.