After a short introduction explaining the rationale of the dictionary, this volume falls into two parts: first, a lemmatised alphabetical list of Latin words and their frequency in a range of Latin authors (details below); and secondly, a list of the same words in descending order of overall frequency in the corpus of texts used. ('Frequency' is here the number of occurrences of the word in all its forms.) This new frequency dictionary of Latin differs in several ways from its predecessor, produced by Denooz and others thirty years ago (L. Delatte, E. Évrard, E. Govaerts, J. Denooz, Dictionnaire fréquentiel et Index inverse de la langue latine, Liège, Université de Liège, Laboratoire d'Analyse Statistique des Langues Anciennes, 1981). First, the corpus of Latin texts used has expanded to more than twice the size of the earlier one (over 1.7 million words compared to 794,662).1 Secondly, in the alphabetical list of word frequencies, the earlier dictionary gave just three figures for each word – the overall frequency in the prose texts used, the overall frequency in the verse texts, and the sum of the two – but the new dictionary gives separate figures for the frequency of the word in each of the nineteen authors included (and for the younger Seneca, there are two separate figures, for the philosophical works, and for the tragedies), plus the total frequency for the whole corpus. Thirdly, the old dictionary contained tables with a variety of grammatical information (including data on the relative frequency of different parts of speech, on the frequencies of nouns and adjectives by declension, case and number, and on the frequency of verbs by conjugation, voice, mood and tense); there are no such tables in the new dictionary (but such information could be retrieved from the online L.A.S.L.A. database, on which see below). Fourthly, the old dictionary contained a reverse index of all the word-forms found in the corpus, but the new dictionary does not.

The new dictionary, like its predecessor, is based on the Latin database of the Laboratoire d'Analyse Statistique des Langues Anciennes (L.A.S.L.A.) at Liège. The corpus comprises the complete works of Caesar, Cato, Catullus, Curtius Rufus, Horace, Juvenal, Lucretius, Persius, Petronius, Propertius, Sallust, Tacitus, Tibullus and Virgil, and some works of Cicero (all the speeches, and a few philosophical works, namely De officiis, De amicitia, De natura deorum and De senectute, but no letters or rhetorica), Ovid (the elegiac works but not the Metamorphoses), Plautus (eight plays, from Amphitruo to Epidicus in the standard alphabetical order, but omitting Cistellaria), the younger Pliny (letters, including those from Trajan, but not the Panegyricus), and Seneca (the tragedies, and the prose works apart from the Naturales Quaestiones). Full details are online at http://www.cipl.ulg.ac.be/Lasla/tlatins.html. The database, which contains morphological and syntactical as well as lexicological information, can be searched online, and from it one can retrieve the passages that lie behind the bare numbers in the dictionary.2

A dictionary such as this is a tool: so what can this one be used for? The introduction says '[o]n observe que les principaux auteurs antérieurs au 2e siècle de notre ère sont intégrés au lexique, soit dans leur totalité, soit partiellement' (p. VII). But there are obvious gaps in the coverage: no Terence, no Livy, no post-Virgilian epic, to go no further. Also, much of the information in the dictionary could be retrieved from other sources, including printed concordances and word lists to individual authors, the Thesaurus Linguae Latinae, and electronic databases such as PHI and BTL. Nevertheless the volume has its uses, particularly in parts of the alphabet that the Thesaurus has not yet reached, and particularly where lemmatisation is crucial (e.g. in distinguishing tempus 'time' from tempus 'temple', or adverbial from relative qui, or interrogative from indefinite quis): here the dictionary can give a useful, though incomplete, first impression of the distribution of a word. Furthermore, the dictionary facilitates ready comparisons between the authors whom it does include, e.g. between the usage of the various elegiac poets, or between Seneca the tragedian and the philosopher. As for the list of lemmas in descending order of frequency, there are no surprises here, and such things exist already; though they all differ in detail, because they use different databases, and some of them are lemmatised, some not.3

Although the volume is handsomely produced in a large format, the table of frequencies in different authors is not easy on the eye, with its 21 columns of figures in small print. The authors are arranged alphabetically by name (except that, unaccountably, Senecan tragedy and Pliny appear out of sequence after Virgil), and the names are repeated at the heads of the columns on each page; but it is still not easy to keep track of, say, Catullus, Ovid, Propertius and Tibullus in columns 4, 10, 14 and 18, or of Seneca's prose and verse in columns 16 and 20. Which brings me on to the irony, not unfamiliar, of a work generated by computer being made available solely in book form. If one had the dictionary in electronic form (whether on CD or online), it should be easy to allow the reader to hide columns, and so get over practical difficulties of the kind just mentioned. More significantly, it could allow frequencies to be displayed not just as total numbers of occurrences in an author, but also as numbers of occurrences per 1000 words. This is in some respects a more useful figure, given that the works of the authors used in the dictionary range widely in length, from Persius' at 4,633 words to Cicero's at 467,665. If one looks at the entry for, say, ego, and sees that Cicero uses ego in its various forms 4,672 times and Persius uses it 44 times, it is not readily apparent that ego is in fact almost as frequent in Persius as in Cicero (on average 9.50 occurrences per 1000 words, compared to Cicero's 9.99). So one may hope that the dictionary will soon be published in electronic form, which would make it potentially a more user-friendly and versatile tool.
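The normalisation suggested above is simple arithmetic: divide the number of occurrences by the size of the author's corpus and multiply by 1000. A minimal sketch in Python, using only the figures for ego quoted in this review; the function name and the layout of the data are my own illustration, and nothing here is supplied with the dictionary or the L.A.S.L.A. database:

```python
def per_thousand(occurrences: int, total_words: int) -> float:
    """Return the number of occurrences per 1000 running words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 1000 * occurrences / total_words


# Hypothetical layout: author -> (occurrences of ego, size of that author's corpus).
# The four numbers are those quoted in the review; everything else is illustrative.
ego_counts = {
    "Persius": (44, 4_633),
    "Cicero": (4_672, 467_665),
}

# Relative frequency of ego for each author.
ego_rates = {
    author: per_thousand(occurrences, total_words)
    for author, (occurrences, total_words) in ego_counts.items()
}

for author, rate in ego_rates.items():
    print(f"{author}: {rate:.2f} per 1000 words")
```

An electronic edition could in principle offer such a figure alongside each raw count, so that authors with corpora of very different sizes could be compared directly.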

Notes:

1. However, some texts included in the earlier dictionary have disappeared from the new one, namely: Cicero Tusc. 5; extracts (unspecified) from Livy and from Ovid, Metamorphoses; and Vitruvius books 1, 9, and 10.

2. There are, however, some differences between the online database and the corpus used in the dictionary: on the date when I checked (2 March 2011) the online database included Pliny's Panegyricus but not his letters, whereas the dictionary includes just the letters; and the dictionary includes Sallust's Histories (besides the Catiline and Jugurtha), but the online database does not. Also, the word counts of each author differ slightly between the dictionary and the database; the reason for this seems to be that the dictionary includes forms of the auxiliary sum in the count (see p. VII n. 6).

3. For instance, online one can find a non-lemmatised list of high-frequency Latin word forms at http://www.slu.edu/colleges/AS/languages/classical/latin/tchmat/grammar/vocabulary/hif1-ed2.html, or the dissertation of P. B. Diederich on The Frequency of Latin Words and their Endings (Chicago, 1939) at http://users.erols.com/whitaker/freq.htm.

About BMCR

Bryn Mawr Classical Review (BMCR) publishes timely reviews of current scholarly work in the field of classical studies (including archaeology). The authoritative archive can be found at http://bmcr.brynmawr.edu.

This site was established to allow responses to reviews through the comments feature; all reviews from August 2008 have been posted.