Welcome to .txtLAB, a laboratory for cultural analytics at McGill University directed by Andrew Piper. We explore the use of computational and quantitative approaches towards understanding literature and culture in both the past and present. Our aim is to engage in critical and creative uses of the tools of network science, machine learning, or image processing to think about language, literature, and culture at both large and small scale.

The Sweep of History

This is the second in a series of posts by .txtLAB interns. This post is authored by Magdalene Klassen.

Many if not most contemporary historians would probably agree that “the typical mode of explanation used by historians [is] narrative” (Roberts 2001). Storytelling, then, is not what separates history from fiction. Instead, we could say that the scope of the story is what differentiates historical and fictional writing. For the past four months, I have been comparing a corpus of historical texts with a corpus of novels in English, French, and German. My results suggest that fictional texts have a smaller scope than histories, thematically, structurally, and lexically.

I considered works published between 1770 and 1930. All of the novels were in the third person, for comparative purposes. My results should be taken with caution, as my data included more novels than histories.[1] Few nineteenth-century histories have been well digitized because the historical narrative has changed, and these texts have become primary rather than secondary sources. For example, Edward Gibbon’s The Decline and Fall of the Roman Empire is no longer an authoritative account; we now think of this “fall” as a transformation. His text is now a means to understand how late-eighteenth-century historians understood both their task and the Roman Empire.

I have defined a “history” as an account of people and their actions in wider societal events in the past. This definition is meant to exclude:

histories of disciplines (science/philosophy/art)

speculative evolutionary histories (dawn of time/state of nature)

memoirs/histories written by those who took part

I ran five main tests on the data sets, studying sentence length, type-token ratio, corpus homogeneity, vocabulary likelihood, and dictionary frequency. The first two were primarily structural tests, and the last three had a greater focus on the words of the corpora.

Sentence Length

I was only able to run the sentence length test on English and German texts, due to limitations of the software, and in both languages the historical texts had longer sentences than the novels. This is far more pronounced in English: sentences in novels are on average 21.47150 words long, while in histories the average sentence is 25.69378 words long (p-value 5.137e-07). The shorter sentences of novels are likely a result of the frequent use of dialogue, which is often composed of short interjections. Alternatively, these results may confirm that historical, academic writing is denser than fiction. A further test that excluded dialogue might yield more conclusive results.

The difference in German is much smaller and not statistically significant: sentences in novels are on average 21.53037 words long, while the average for histories is 22.62373 words (p-value 0.2505). Again, a test that excluded dialogue would help to better understand the genre difference in sentence length in German. My results may suggest that the difference between functional and literary language is smaller in German, or that the difference between dialogue and description is less pronounced.
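As a rough illustration of what a sentence length measurement involves, here is a minimal Python sketch. It is not the software used for this project: the splitting rule (a break after a period, exclamation mark, or question mark followed by whitespace) and the function names are my own assumptions, and a proper sentence tokenizer would handle abbreviations and quoted dialogue far better.

```python
import re
from statistics import mean

def sentence_lengths(text):
    """Return the word count of each sentence in `text` (naive splitter)."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are simply whitespace-separated chunks.
    return [len(s.split()) for s in sentences]

def mean_sentence_length(text):
    """Average number of words per sentence, or 0.0 for empty input."""
    lengths = sentence_lengths(text)
    return mean(lengths) if lengths else 0.0

sample = "He paused. Then, after a long silence, he spoke again. Why?"
print(sentence_lengths(sample), mean_sentence_length(sample))
```

Computing one such average per text gives two lists of values (one for novels, one for histories) that can then be compared with a significance test.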

Type-Token Ratio

Histories also have a higher type-token ratio than novels, in all three languages.

TTR        Novel        History      P-value
English    0.2158489    0.2404620    9.53E-09
German     0.2786909    0.3232653    8.802E-16
French     0.2549798    0.2666072    0.005711

These results suggest that, although history describes the past, it does so with more novel language than the novel, especially in English and German. These nineteenth-century histories may have introduced new vocabularies because they were often about exotic others, whether ancient or far away. In contrast, the vast majority of the French histories, though I did try to maintain a variety, were about the French Revolution. One might think that conjuring up an entire literary world is harder than writing about one everybody agrees existed: compare, for example, Fontane describing Berlin in Irrungen, Wirrungen with Franz Kugler describing the Berlin of more than a century earlier in his Friedrich der Große. Yet it may be that in a fictional effort, the author repeats the same words in order to solidify the characteristics of that world in a reader’s mind.
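For readers unfamiliar with the measure, the type-token ratio is the number of distinct words (types) divided by the total number of words (tokens). The Python sketch below is illustrative only and is not the code behind the figures above. The tokenizer, the function names, and the window size of 1000 are assumptions; the windowed variant is included because raw type-token ratio falls as texts get longer, so texts of different lengths are usually compared on equal-sized samples.

```python
import re

def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters,
    # including accented letters such as those in French and German.
    return re.findall(r"[^\W\d_]+", text.lower())

def type_token_ratio(tokens):
    """Distinct words divided by total words (0.0 for an empty list)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mean_windowed_ttr(tokens, window=1000):
    """Average TTR over consecutive non-overlapping windows.

    A trailing partial window is dropped; if the text is shorter than
    one window, the plain TTR of the whole text is returned instead.
    """
    chunks = [tokens[i:i + window]
              for i in range(0, len(tokens) - window + 1, window)]
    if not chunks:
        return type_token_ratio(tokens)
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)

tokens = tokenize("The cat saw the other cat")
print(type_token_ratio(tokens))
```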

Homogeneity

Novels as a genre are significantly more linguistically homogeneous than histories. I determined this by a series of correlation measures, in which each text was compared to every other text in the same genre corpus, to determine the overall similarity of the corpus to itself.

The stark difference in homogeneity across all three languages is difficult to explain, but once again these results suggest that novels have a smaller scope than histories. Whereas histories argue a thesis or perspective that must be defended, novels seek to convey a recognizable social reality.

Homogeneity    Novel        History      P-value
English        0.6700383    0.4467730    < 2.2e-16
French         0.6843885    0.4686732    < 2.2e-16
German         0.7715615    0.6196034    < 2.2e-16
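One way to picture a self-similarity score of this kind is sketched below in Python. This is an assumption-laden illustration rather than the procedure actually used: I assume each text is represented by its relative word frequencies, that the correlation is Pearson's, and that the corpus score is the mean over all pairs of texts. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def corpus_homogeneity(token_lists):
    """Mean pairwise correlation of word-frequency profiles.

    `token_lists` holds one non-empty list of word tokens per text.
    """
    if len(token_lists) < 2:
        raise ValueError("need at least two texts to compare")
    counts = [Counter(tokens) for tokens in token_lists]
    # Shared vocabulary: every word that occurs in any text.
    vocab = sorted(set().union(*counts))
    # Relative frequencies, so that long and short texts are comparable.
    vectors = [[c[w] / sum(c.values()) for w in vocab] for c in counts]
    pairs = list(combinations(vectors, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

A score near 1 would mean that the texts in a corpus use words in very similar proportions; lower scores indicate a more varied corpus.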

Distinctive Words

I measured a word’s likelihood in a given corpus with a nonparametric test for two independent samples: the Wilcoxon rank-sum test. The results strongly confirmed the thematic differences between nineteenth-century novels and histories: the words most characteristic of novels focused on individual emotions and bodies, while histories tended to use larger-scale words about war, diplomacy, and geopolitics. My results confirm earlier impressions about the difference between these genres. Yet the degree to which they validate conceptions of nineteenth-century historical methodologies is fascinating. For example, in all three languages, the name Alexander (the Great, most likely) indicated histories. In German, 53 of the words characteristic of histories were either first or last names. This is consistent with the influence of the so-called Great Man Theory, which was first popularized by the Scottish historian Thomas Carlyle in the 1840s. In many ways, history has a broader scope than fiction, but certain common threads still remain. Significantly, these commonalities confirm what we understand about the established historiography of the nineteenth century.
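To make the test concrete, here is a minimal Python sketch of a Wilcoxon rank-sum test for a single word. It assumes that the two samples are the word's per-text relative frequencies in novels and in histories, which is my reading of the setup rather than a documented detail. It uses the normal approximation without a tie correction, so it is only indicative for small samples or heavily tied data, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (z, two-sided p) for samples x and y (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2        # mean of w under the null
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard deviation under the null
    z = (w - expected) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

A negative z means the word tends to be rarer in the first sample than in the second; running the test for every word and sorting by z (or p) yields a list of the words most characteristic of each genre.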

Self-Reflexivity

I based the final test on several thematic dictionaries I collated, but here I focus on only one. This dictionary consisted of approximately 100 words in each language that referenced the document’s status as a text. Words included historian, archive, metaphor, and storytelling, as well as “neutral” words that applied to both genres, such as editor and page. I found that histories are much more reflexive than novels, though the dictionary was meant to represent equally “history words” and “novel words.”

Reflexivity    Novel        History      P-value
English        0.2158489    0.2404620    9.53E-09
German         0.2786909    0.3232653    8.802E-16
French         0.2549798    0.2666072    0.005711
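A dictionary-based measure of this kind can be sketched in a few lines of Python. The example below is a simplified illustration, not the project's code: it assumes the score for a text is the share of its tokens that appear in the dictionary, uses only the six example words mentioned above instead of the full list of approximately 100, and the function and variable names are hypothetical.

```python
import re

# Only the example words named in the post; the real dictionary was larger.
reflexive_terms = {"historian", "archive", "metaphor",
                   "storytelling", "editor", "page"}

def dictionary_share(text, dictionary):
    """Fraction of the tokens in `text` that belong to `dictionary`."""
    # Assumption: lowercase the text and treat runs of letters as tokens.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in dictionary)
    return hits / len(tokens)

print(dictionary_share("The historian opened the archive.", reflexive_terms))
```

This matches exact word forms only, so plurals and inflected forms (especially in French and German) would need to be listed in the dictionary or handled by lemmatizing the texts first.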

Conclusion

Historical texts draw their legitimacy from their ability to interact with other texts, and so references to themselves as texts, surrounded by other texts, are crucial to their authoritative function. In contrast, novels attempt to immerse the reader in the world of the novel, despite the occasional “dear reader” moment. Where novels pull the reader inward, histories draw them outward, offering an explicitly broader scope and frequent examples.

[1] English: 86 histories/108 novels, French: 83 histories/100 novels, German: 75 histories/110 novels. I also tried not to use individual volumes from multivolume works, which severely limited my choice of histories, and so some first or second volumes were included.