Greek Historiography Through Dependency Syntax Treebanking

Abstract: We are collaborating on a digital ancient Greek historiography project focusing on dependency syntax treebanking. As we collect and categorize syntactic data from a growing body of treebanked authors, we hope to identify syntactic thumbprints that distinguish one author from another, especially where surface features such as vocabulary may be misleading. Our aim is to determine the accuracy of the quotations attributed to early authors by later sources such as Athenaeus and various Byzantine epitomizers. The first step, which is well advanced, is to prepare syntactic trees for representative extant Greek prose writers, particularly historians. These trees, together with the material already available in the Ancient Greek Dependency Treebank, will constitute our working corpus.

The second step is to extract syntactic information from this corpus in a usable form. The most straightforward approach seems to be to convert dependency relationships into “syntactic words” (swords). To do this, one may trace the dependency path from each leaf node to the sentence root, recording the dependency label on each edge; the sequence of labels along that path forms one sword. As an example, sentence 1 of Athenaeus Book 12:
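The path-tracing procedure can be sketched in a few lines of Python. This is a minimal sketch, not the project's own code, and it rests on several assumptions: each word is a hypothetical `Token` record holding its position, the position of its head (0 for attachment to the sentence root), and its dependency label; labels are read from the leaf upward and joined with a dash; and the sample sentence is invented, using AGDT-style labels (PRED, SBJ, OBJ, ATR, AuxK) purely for illustration.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    """Hypothetical minimal word record (not the AGDT file schema)."""
    id: int        # position of the word in the sentence, starting at 1
    head: int      # id of the governing word; 0 means attached to the sentence root
    relation: str  # dependency label on the edge from this word to its head


def syntactic_words(tokens: List[Token], sep: str = "-") -> List[str]:
    """Return one sword per leaf node, reading edge labels from the leaf up to the root."""
    by_id: Dict[int, Token] = {t.id: t for t in tokens}
    governors = {t.head for t in tokens}  # ids that have at least one dependent
    swords: List[str] = []
    for token in tokens:
        if token.id in governors:
            continue  # not a leaf: another word depends on it
        labels: List[str] = []
        seen = set()
        node = token
        while True:
            if node.id in seen:
                raise ValueError(f"cycle in dependency tree at word {node.id}")
            seen.add(node.id)
            labels.append(node.relation)
            if node.head == 0:
                break  # reached the root, so the path is complete
            node = by_id[node.head]
        swords.append(sep.join(labels))
    return swords


# Hypothetical four-word sentence plus final punctuation (not real treebank data).
sample = [
    Token(1, 2, "ATR"),   # attribute modifying word 2
    Token(2, 3, "SBJ"),   # subject of the predicate
    Token(3, 0, "PRED"),  # main predicate, attached to the root
    Token(4, 3, "OBJ"),   # object of the predicate
    Token(5, 0, "AuxK"),  # sentence-final punctuation
]

print(syntactic_words(sample))  # ['ATR-SBJ-PRED', 'OBJ-PRED', 'AuxK']
```

Reading the path from the leaf upward is an arbitrary choice here; the root-to-leaf order would serve equally well, provided it is applied consistently across the corpus.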

The chief advantage of recasting dependencies as syntactic words is that they are immediately usable: with trivial modifications, texts recast as swords can be fed into standard text-processing software to produce type-token ratios, word frequency histograms, and similar measures, providing detailed syntactic information about individual authors. All data, and the algorithms that generate them, will be made available under Open Source conventions. We will use this output, which in itself represents significant new evidence on ancient Greek usage, to discover whether we can computationally distinguish between the different authors of established texts. Once we have this proof of concept, we can use the methods developed to compare directly transmitted and epitomized work by the same author, particularly Polybius and Diodorus Siculus. This work could have significant ramifications for Roman historiography.
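Once a text is a stream of swords, ordinary word-counting applies unchanged. The sketch below, which assumes swords are stored as plain strings and uses an invented passage rather than real data, computes a type-token ratio and the frequency table behind a histogram with Python's standard `collections.Counter`; the function names are illustrative, not part of any project tooling.

```python
from collections import Counter
from typing import List, Tuple


def type_token_ratio(swords: List[str]) -> float:
    """Distinct swords divided by total swords; 0.0 for an empty text."""
    return len(set(swords)) / len(swords) if swords else 0.0


def frequency_table(swords: List[str], top: int = 10) -> List[Tuple[str, int]]:
    """The `top` most frequent swords with their counts, ready for a histogram."""
    return Counter(swords).most_common(top)


# Hypothetical sword stream for a short passage (not real data).
passage = ["ATR-SBJ-PRED", "OBJ-PRED", "AuxK", "OBJ-PRED", "AuxK", "ADV-PRED"]

print(round(type_token_ratio(passage), 3))  # 0.667
print(frequency_table(passage, top=2))      # [('OBJ-PRED', 2), ('AuxK', 2)]
```

Because the type-token ratio falls as a text grows longer, comparisons between authors are fairer when made on samples of equal size.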

We can then apply the same queries to the complex text of Athenaeus. He cites thousands of fragments by hundreds of authors, usually naming the author, but it is not clear what is quotation and what is paraphrase, and the extent of a citation is frequently ambiguous. If we can clearly distinguish quoted text from cover-text, then we can argue for the legitimacy of these early quotations. But if the opposite is true, and the prose attributed to others more nearly resembles the writing of Athenaeus and his contemporaries, then as a profession we will have to rethink our approaches to Greek, particularly Hellenistic, historiography.
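One simple way to make "more nearly resembles" measurable is to compare sword-frequency profiles. The sketch below uses cosine similarity, a generic technique chosen here for illustration rather than the project's stated method; the reference names `cited_author` and `cover_text` and all sword data are hypothetical, and real attribution would need far larger samples than these toy lists.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sword-frequency vectors (1.0 = same proportions)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_reference(fragment: List[str], references: Dict[str, List[str]]) -> str:
    """Name of the reference corpus whose sword profile is most similar to the fragment."""
    frag_counts = Counter(fragment)
    scores = {
        name: cosine_similarity(frag_counts, Counter(swords))
        for name, swords in references.items()
    }
    return max(scores, key=scores.get)


# Hypothetical profiles: the cited author's securely attested prose vs. the citing author's own prose.
references = {
    "cited_author": ["OBJ-PRED", "OBJ-PRED", "AuxK", "ATR-SBJ-PRED"],
    "cover_text": ["ADV-PRED", "ADV-PRED", "AuxK", "ATR-SBJ-PRED"],
}
fragment = ["OBJ-PRED", "OBJ-PRED", "AuxK"]

print(closest_reference(fragment, references))  # cited_author
```

A fragment whose profile sits closer to the citing author's own prose than to the cited author's would be the warning sign described above.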