Friday, November 1, 2013

Taking Google’s Ngram For a Spin

Eric Schultz

As
part of a recent parents’ weekend at a fine New England educational institution,
I had the pleasure of watching Peter Norvig, director of research at Google,
demonstrate Google’s Ngram Viewer (Ngram section starts around 9:30). Released in December 2010, Ngram
(Wikipedia tells us) is a “phrase-usage graphing tool” that charts the yearly
count of selected n-grams (letter combinations), or words and phrases.
The Google database currently includes 5.2 million books published between 1500
and 2008 containing 500 billion words in American English, British English,
French, German, Spanish, Russian, and Chinese.
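
For readers who like to see the gears turn, here is a minimal sketch of what "charting the yearly count" of a phrase could look like. It is not Google's code; the function names (`ngrams`, `yearly_counts`) and the toy corpus are my own inventions for illustration, and the matching is deliberately case-sensitive.

```python
from collections import Counter

def ngrams(text, n):
    """Return the word n-grams in text, preserving capitalization."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def yearly_counts(corpus, phrase):
    """Count how often phrase appears in each year of a {year: [texts]} corpus."""
    n = len(phrase.split())
    counts = {}
    for year, texts in corpus.items():
        tally = Counter()
        for text in texts:
            tally.update(ngrams(text, n))
        counts[year] = tally[phrase]  # a Counter returns 0 for unseen phrases
    return counts

# Toy corpus, invented for illustration only (not real Ngram data)
toy_corpus = {
    1850: ["the United States are a young republic"],
    1900: ["the United States is a world power", "the United States is growing"],
}
is_counts = yearly_counts(toy_corpus, "United States is")
are_counts = yearly_counts(toy_corpus, "United States are")
```

Plotting `is_counts` against `are_counts` by year would give a tiny, homemade cousin of the kind of graph described in this post.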

Peter
Norvig’s example that day illustrated the power of the Ngram database,
comparing the word combinations “The United States is” and “The United States
are.”

The
result seems logical: the singular “is” becomes the dominant verb after the
American Civil War. This was a relatively straightforward example, however,
and I should warn as you read through this article that you’ll need to channel
a little of your inner art historian as the graphs become more complicated and
require a longer look, often informed by a refresh on dates.

I
also need to warn that the Ngram tool is a little like a chain saw in the hands
of a beginner; my graph does not look exactly like Peter’s (and I’m not sure
of the reason), everything is case-sensitive, and there’s no simple facility yet
(of which I am aware) for combining terms. So, for example, one cannot
search on “Franklin Delano Roosevelt” and “FDR” combined.
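
If one had the yearly numbers for each term in hand, combining them offline would be simple enough. The sketch below assumes each term's results are already available as a plain year-to-value mapping; the helper name `combine_series` and the numbers are hypothetical, made up only to show the idea.

```python
def combine_series(*series):
    """Sum several {year: value} series year by year; a missing year counts as 0."""
    years = sorted(set().union(*(s.keys() for s in series)))
    return {year: sum(s.get(year, 0) for s in series) for year in years}

# Invented numbers for illustration only (not real Ngram data)
full_name = {1935: 10, 1940: 14}   # stand-in for "Franklin Delano Roosevelt"
initials = {1940: 6, 1945: 9}      # stand-in for "FDR"
combined = combine_series(full_name, initials)
```

The same helper could fold together different capitalizations of a term, since each one is just another series to add.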

As
my kindly doctor says, he won’t prescribe for an illness he cannot
diagnose. So, as a complete novice, I am not endorsing Ngram for serious
historical research. I have concluded, however, that in a competition between
Angry Birds and Ngram, the latter is a far more fascinating diversion.

Here’s
a comparison I ran on the terms “one nation under God” and “one nation
indivisible.” Given that “under God” became a pressing issue in the
1950s and was added to the Pledge of Allegiance by a law President Eisenhower
signed in 1954, this graph again makes good sense.

Now
let me offer something a little more nuanced, comparing the terms “George
Washington” and “Abraham Lincoln” (and remembering that “President Washington”
or “Abe Lincoln” might be good terms to one day combine in a total
search). The results are below.

What
to make of this? I would have bet that Lincoln had more sheer volume of
mentions than Washington, at least in the last generation, but it turns out to
be the opposite. We can see increases at the time of Lincoln’s 100th
birthday in 1909, and Washington’s 200th in 1932. Beyond that,
it appears that Washington really remains first in the hearts (or at least the
publications) of his countrymen. (To complicate the picture, when John
Adams is added, he dominates both Washington and Lincoln for most of the 19th
century before falling behind permanently around 1900.)

Another
graph shows the comparison of four wartime events and seems more
straightforward, with the emotional force of Pearl Harbor clearly reflected in
literature.

Having
written about the history of air conditioning for United Technologies
(“Weathermakers to the World,” 2012), I was curious to see what would happen
when I tested the term. Sure enough, the history of the technology was
plotted on the screen, from its introduction to the public in movie theaters
and department stores beginning around 1925, to its hyped status in the 1930s
as a technology capable of pulling America out of the Great Depression, to its
growth as veterans returned from WWII and, with their Baby Boom families, invaded suburbia.

When
the New York Times called America’s 1970 census “the Air-Conditioned
Census,” it resulted in a decade of torrid press until air conditioning became
a mature, more mundane topic. As climate change becomes a persistent
topic (and Google updates its Ngram data beyond 2008), we might well see
another upsurge in “air conditioning” literature.

I
graphed a small sample of American historians, just to get a sense for the push
and pull of various interpretations. (I could see this as the dreaded
final exam in a History class, with the simple instructions:
“Comment.”)

Remembering
that every “John Fiske” (historian, philosopher and other) ever written about
is contained in the Ngram results, I leave it to my professional historian
friends to make sense of this chart. I might add only that, knowing the
world a bit as an entrepreneur, the emphasis on Frederick Jackson Turner’s
frontier thesis beginning in the 1980s is not surprising, as he is the adopted
historian of the high-tech crowd.

Finally,
as some of this writing was done with the World Series raging, I wanted to
compare the Cardinals and Red Sox. Being a lifelong Sox fan, I was overjoyed to learn that the
recent separation in press shown by Ngram accurately foretold the
results of the on-field competition.