Crunching Literature

By

“A poem,” wrote William Carlos Williams toward the end of World War II, “is a small (or large) machine of words.” I’ve long wondered if the good doctor (Williams was a general practitioner in New Jersey who did much of his writing between appointments) might have come up with this definition out of weariness with the flesh and all its frailties. Traditional metaphors about “organic” literary form usually imply a healthy and developing organism, not one infirm and prone to messes. The poetic mechanism is, in Williams’s vision, “pruned to a perfect economy,” and there is “nothing sentimental about a machine.”

Built for efficiency, built to last. The image this evoked 70 years ago was probably that of an engine, clock, or typewriter. Today it’s more likely to be something with printed circuits. And a lot of poems in literary magazines now seem true to form in that respect: The reader has little idea how they work or what they do, but the circuitry looks intricate, and one assumes it is to some purpose.

I had much the same response to the literary scholarship Matthew L. Jockers describes and practices in Macroanalysis: Digital Methods & Literary History (University of Illinois Press). Jockers is an assistant professor of English at the University of Nebraska at Lincoln. The literary material he handles is prose fiction (mostly British, Irish, and American novels of the 18th and 19th centuries) rather than poetry, although some critics apply the word “poem” to any literary artifact. In the approach Jockers calls “macroanalysis,” the anti-sentimental, technophile attitude toward literature shapes how scholars understand the literary field rather than how authors imagine it. The effect, in either case, is both tough-minded and enigmatic.

Following Franco Moretti’s program for extending literary history beyond the terrain defined by the relatively small number of works that remain in print over the decades and centuries, Macroanalysis describes “how a new method of studying large collections of digital material can help us to understand and contextualize the individual works within those collections.”

Instead of using computer-based tools to annotate or otherwise explore a single work or author, Jockers looks for verbal patterns across very large reservoirs of text, including novels that have long since been forgotten. The author notes that only “2.3 percent of the books published in the U.S. between 1927 and 1946 are still in print” (even that figure sounds high, and may be inflated by the recent efforts of shady print-on-demand “publishers” playing fast and loose with copyright) while the most expansive list of canonical 19th-century British novels would represent well under 1 percent of those published.

Collections such as the Internet Archive and the HathiTrust Digital Library now make many of those forgotten books available for analysis. Add to this the capacity to analyze metadata about when and where the books were published, as well as available information on their authors, and you have a new, turbocharged sort of philology: one covering wider swaths of literature than even the most diligent and asocial researcher could ever read.

Or would ever want to, for that matter. Whole careers have been built on rescuing “unjustly neglected” authors, of course, but oblivion is sometimes the rightful outcome of history and a mercy for everyone involved. At the same time, the accumulation of long-unread books is something like a literary equivalent of the kitchen middens that archeologists occasionally dig up: communal dumps full of leftovers, garbage, and broken or outdated household items. The composition of what was discarded, and its various strata, reveal aspects of everyday life long ago.

Jockers uses his digital tools to analyze novels by, essentially, crunching them: determining which words appear in each book, tabulating how often each is used, quantifying the punctuation marks in the same way, and then working out patterns among the results according to the novel’s subgenre or publication date, or to biographical data about the author such as gender, nationality, and regional origin.
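For readers curious what “crunching” involves at the most basic level, here is a minimal Python sketch, assuming each novel is available as plain text paired with a small metadata dictionary (genre, year, and so on). It is not Jockers’s code: the names `crunch`, `relative_frequencies`, and `pool_by`, the set of punctuation marks tallied, and the deliberately crude tokenizer are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical choice of which punctuation marks to tally.
PUNCTUATION = set(".,;:!?\"'()-")

def crunch(text: str) -> tuple[Counter, Counter]:
    """Return (word counts, punctuation counts) for one text."""
    # Crude tokenizer: lowercase runs of letters, keeping simple contractions like "don't".
    words = Counter(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    marks = Counter(ch for ch in text if ch in PUNCTUATION)
    return words, marks

def relative_frequencies(counts: Counter) -> dict[str, float]:
    """Convert raw counts to proportions so books of different lengths are comparable."""
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}

def pool_by(novels, key):
    """Sum word counts across all novels sharing a metadata value (e.g. genre or decade)."""
    pooled = {}
    for meta, text in novels:
        words, _ = crunch(text)
        pooled.setdefault(meta[key], Counter()).update(words)
    return pooled

# Usage with two invented snippets standing in for whole novels.
novels = [
    ({"genre": "gothic", "year": 1794}, "Over the stairs and under the vault she crept."),
    ({"genre": "bildungsroman", "year": 1850}, "The young man was like a little boy again."),
]
by_genre = pool_by(novels, "genre")
print(relative_frequencies(by_genre["gothic"]))
```

Normalizing to relative frequencies is the step that lets a long novel and a short one be compared at all; grouping by a metadata key is what turns counts from individual books into claims about a genre, period, or population of authors.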

The findings the author reports tend to be of a very precise and delimited sort. The words “like,” “young,” and “little,” for example, are “overrepresented in Bildungsroman novels compared to the other genres in the test data.” There is a “high incidence of locative prepositions” (over, under, within, etc.) in Gothic fiction, which may be “a direct result of the genre’s being ‘place oriented.’” That sounds credible, since Gothic characters tend to find themselves moving around dark rooms in ruined castles with secret passageways and whatnot.
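One simple way to make a word like “overrepresented” concrete is to compare a word’s relative frequency in one genre with its relative frequency across all the others. The sketch below is an assumption, not the statistic Jockers reports: the hypothetical `overrepresentation` function computes a smoothed ratio, where a value above 1 means the word turns up proportionally more often in the target genre. The counts in the usage lines are invented toy numbers, not data from the book.

```python
from collections import Counter

def overrepresentation(target: Counter, others: Counter, word: str,
                       smoothing: float = 1.0) -> float:
    """Ratio of a word's relative frequency in `target` to its relative frequency in `others`.

    Additive (Laplace-style) smoothing keeps the ratio finite when the word
    never appears in one of the two collections.
    """
    vocab_size = len(set(target) | set(others))
    in_target = (target[word] + smoothing) / (sum(target.values()) + smoothing * vocab_size)
    in_others = (others[word] + smoothing) / (sum(others.values()) + smoothing * vocab_size)
    return in_target / in_others

# Toy counts: pooled word counts for Gothic novels vs. all other genres.
gothic = Counter({"over": 12, "under": 9, "the": 200})
rest = Counter({"over": 5, "under": 4, "the": 210})
for prep in ("over", "under"):
    print(prep, round(overrepresentation(gothic, rest, prep), 2))
```

A ratio like this only flags a difference; whether it is meaningful (rather than an accident of which books happen to be in the sample) is a further statistical question, and the reading of “place oriented” fiction is still left to the critic.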

After about 1900, Irish-American authors west of the Mississippi began writing more fiction than their relations on the other side of the river, even though they were fewer in number and more thinly spread. Irish-American literature is Jockers’s specialty, so this statistically demonstrable trend is of particular interest, given that “the history of Irish-American literature has had a decidedly eastern bias…. Such neglect is surprising given the critical attention that the Irish in the West have received from American and Irish historians.”

As the familiar refrain goes: More research is needed.

Macroanalysis is less a report on discoveries than a showcase for the range and potential of what the author calls “big data” literary study. His larger claim for this broad-sweep combination of lexometric and demographic correlation-hunting (what Moretti calls “distant reading”) is that it can help frame new questions about style, thematics, and influence that can then be pursued through more traditional varieties of close reading.

And he’s probably right about that, particularly if the toolkit includes methods for identifying and comparing semantic and narrative elements across huge quantities of text. (Or rather, when it includes them, since that’s undoubtedly a matter of time.)

Text-crunching methodologies offer the possibility of verifiable, quantifiable, exact results in a field where otherwise everything is interpretive, and hence interminably disputable. This sounds either promising or menacing. What will be more interesting, if we ever get it, is technology that can recognize and understand a metaphor and follow its implications beyond the simplest level of analogy: a device capable of, say, reading Williams’s line about the poem as machine and then suggesting something interesting about it, or formulating a question about what it means.