Electronic lexicography: what a difference a quarter-century makes — or does it?

November 7, 2011
by Toma Tasovac.

I have become very interested in the (pre)history of electronic lexicography — the dark ages before the Internet, Twitter and Justin Bieber shook us to the core and made us who we are today: polycentered, posthuman and still desperately trying to get our hands on Justin’s One Less Lonely Girl nail-polish collection for Nicole by OPI (based on a novel by Sapphire).

I just finished reading Lexicography: An Emerging International Profession (hereafter, LAEIP), edited by Robert Ilson and published by Manchester University Press in 1986. It contains a selection of papers from the 1984 colloquium organized by the Fulbright Commission to commemorate the bicentennial of Samuel Johnson’s death.

LAEIP is a book from a very different age. In 1986, Margaret Thatcher was still practicing the art of dropping curtseys for the Queen during their weekly tête-à-têtes, while Ronald Reagan (from Irish riagan, “little king”?) had made the effortless transition from Hollywood to the White House, where his official schedule was color-coded for good and bad days based on the advice his wife Nancy solicited from her astrologer, Joan Quigley. Around that time, just as CD-based walkmans were replacing the ones based on cassette tapes, the coolest gadgets in the realm of electronic lexicography were “portable vade-mecums” such as Langenscheidt’s “ALPHA 8 electronic dictionary, an 8,000 word German-English bilingual lexicon in a form resembling a hand-held calculator” (LAEIP, p. 126). Mirabile dictu!

Twenty-five years ago, academics announcing the final reception at the end of a conference would still say things like “may the priests and priestesses of Minerva and those of Mercurius join in common libations to Bacchus” (p. 146). It was the time when dictionary makers proclaimed that lexicography was an “emerging international discipline” but published books which discussed only various Englishes, and, really, mainly, only the British and the American flavors. It was the time when books got off to a slow start (LAEIP had a Foreword, a Preface, an Introduction and Opening Remarks, each by a different author), and were not over till the fat lady sang (cf. LAEIP’s “Summation” by Ladislav Zgusta, Final Recommendations by Gabriele Stein and, as if that hadn’t been final enough, “Concluding Remarks” by Stein & John Herrington, who — for no apparent reason — wrote about himself in the third person).

It was the time when the question of the future of lexicography could not be discussed without an obligatory paean to the OED’s supreme divinity, Murray (which is priceless, if for no other reason than for reminding us of his quote from the “Evolution of English Lexicography” that it is trite, but, alas, sometimes also true to say “they do these things better in France”). It was the time when the answer to the question about who articulated the last really new idea in lexicography could include as contenders Samuel Johnson (1709–1784; etymologies + attributed quotations), Franz Passow (1786–1833; and his idea that “every word should be made to tell its own story”), and Philip Gove (the editor of Webster’s controversial Third, for his inclusion of evidence from TV broadcasts). It was the time when it was still possible to say something like:

No one here present, I suppose, regards the application of computers in lexicography as an innovation, though some may regard it with trepidation. (p.125)

So it was a different time all right. It is nonetheless striking that 25 years ago, serious lexicographers were still asking the question:

Should citation-gathering be limited to famous authors, or should other sources be used as well, such as currently popular personalities, periodicals and their advertisements, spoken language? (p. xiii).

Linguistic corpora were, of course, already there: the Brown Corpus and the Lancaster-Oslo/Bergen corpus had around one million words at the time. But it was not until 1987 that the first ever fully corpus-based dictionary — the Collins COBUILD — would be published. (I bought my first Collins COBUILD in a bookstore in Dublin, as a teenager, during a short visit in 1989 or 1990. The fact that I am writing these lines also in Dublin, almost a quarter of a century later, fills me with a Nancy-Reaganesque sense of cosmic destiny.)

It is all too easy to look back and feel superior with the benefit of hindsight, to pat oneself on the back and think: oh, we’ve come such a long way since then. A lot has changed, but not everything. Consider what Richard Bailey had to say in his chapter on the “Dictionaries of the next century”:

A few moments ago I spoke of the need for an organised plan that will move us towards the better dictionaries of the next century. An essential component of that plan, I believe, is an effort to consolidate the reference information we have already accumulated in a great variety of places. (p. 133)

Back in 1973, at the conference on lexicography at the New York Academy of Sciences, Bailey proposed something he called an “Index to the English Vocabulary” — a guide to the entry words from different dictionaries. Ten years later, Laurence Urdang published his Idioms and Phrases Index (1983), which included over 140,000 phrases in 400,000 entries taken from some 40 English-language dictionaries. The need for indexing and cross-referencing lexicographic resources has, in my mind, only gained in importance over the past twenty years. Yet we don’t seem to have advanced in this respect as much as we could have or should have. Google’s algorithmic celebration of raw data is NOT a solution to all our information needs.

lexicographers should see themselves as offering interpretative indexes to data of varying kinds and degrees of selectivity… Only through a multiplicity of such kinds of indices can we begin to gain a perspective on the lexicon as a multi-dimensional array of information, and only through a well organised computer data base can we provide a single access point to these multiple dimensions of the vocabulary. (p. 128)

I have spoken elsewhere about why I believe that lexicography in the age of digital humanities can help us reclaim the notion of “the dictionary” even though there is not, and never was, such a thing as a singular and uniquely authoritative source of information about words and their meanings. That, however, should not prevent us from trying to imagine what that “thing” — the dictionary — could be: whether a global web-service, a multilingual, heavily cross-referenced index, a layer of the Semantic Web, or a utopian (Borgesian?) meta-dictionary, the dictionary of all dictionaries.

Most of what we say today will be considered useless in twenty-five years. But this is an exciting time to be working on and thinking about dictionaries. We are yet to see what can be done when we bring text mining to large collections of dictionaries (or, as Julianne Nyhan would say: What do you do with a million dictionaries?). We are yet to truly put into practice something that Richard Bailey — in his fascinating 1973 “Reflections on Technology in Lexicography” — called “socialized lexicography.”¹ And, perhaps most importantly: we are yet to write a new chapter on lexicographic complexity and ambiguity in the electronic age. Just because dictionaries and XML-based technologies seem like such a natural fit doesn’t mean that we should not think about what gets left out by language models derived from structured data: what are the “remainders” of the dictionary? What can dictionaries learn from poetry? Would a deconstructed dictionary still be a dictionary? How would we model it? And could we actually produce it?

A history of electronic lexicography — a history yet to be written — will have a lot to teach us: not only about how far we have come, but also how far we still have to go.

¹ Bailey’s “Reflections on Technology in Lexicography” is a very thoughtful paper about the way tools and technology condition lexicographic research — I’m planning to write more about it on this blog.
