The Google Books Ngram Viewer, launched in December 2010, displays a graph showing how often phrases have occurred in a corpus of 5.2 million digitized books (about 4% of all books ever printed) from the 16th century until 2008. It is fun, but not only that: it can also reveal cultural trends based on language usage. The Ngram of the phrases “nursery school”, “kindergarten” and “child care” given on the Google website is a good example.

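Originally, the name n-gram comes from a sequence model that has been used in statistics for some 60 years: an n-gram is simply a sequence of n consecutive items, such as words. The concept is not new, but the capabilities of the internet have given it a second life.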
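To make the idea concrete, here is a minimal sketch of how word n-grams can be extracted and counted. The function name `ngrams` and the sample sentence are my own, made up for illustration; this is not how Google builds its corpus, only the basic principle.

```python
from collections import Counter

def ngrams(text, n):
    """Return the n-grams (tuples of n consecutive words) found in text."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Made-up sample sentence, for illustration only.
sample = "the cat sat on the mat and the cat slept"

# Count every 2-gram (pair of consecutive words) in the sample.
bigram_counts = Counter(ngrams(sample, 2))
print(bigram_counts.most_common(1))  # [(('the', 'cat'), 2)]
```

With n = 1 you count single words, with n = 2 pairs of words, and so on.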

The game of the name

Now, just try it. Type words and look at the curve. A plethora of websites have already exploited the tool, just for fun or for scientific purposes.

A hint before playing with it: Ngram is case-sensitive. For instance, one gets more hits when the words “facebook” and “google” are written with a capital letter. Other issues concerning the limits of the model are addressed in “When OCR Goes Bad: Google’s Ngram Viewer & The F-Word”.
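The effect of case sensitivity is easy to reproduce on a toy example. The token list below is made up and has nothing to do with the real Ngram data; it only shows why “Google” and “google” end up as two separate curves unless the text is lowercased first.

```python
from collections import Counter

# Made-up token list standing in for a corpus; not real Ngram data.
tokens = ["Google", "google", "Google", "Facebook", "facebook", "Google"]

# Case-sensitive counting: "Google" and "google" are different keys.
case_sensitive = Counter(tokens)

# Case-insensitive counting: fold everything to lowercase first.
case_insensitive = Counter(t.lower() for t in tokens)

print(case_sensitive["Google"], case_sensitive["google"])  # 3 1
print(case_insensitive["google"])                          # 4
```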

To be sure we can use Google Ngram with confidence, my scientific education reminds me that we first have to check that the model correctly depicts language usage. One simple way to do it: let’s look at the Ngram of the most common words. A priori, the curves should be relatively flat (see next figure). Most of them are, but some aren’t!
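One rough way to put a number on “relatively flat”, assuming the curve is a relative frequency (occurrences of the word divided by all words printed that year): compute the coefficient of variation of the yearly values. The helper names and all the counts below are made up for illustration; a value close to 0 means a flat curve.

```python
from statistics import mean, pstdev

def relative_frequency(word_counts, total_counts):
    """Share of all words in each year that are the word of interest."""
    return {year: word_counts[year] / total_counts[year] for year in word_counts}

def coefficient_of_variation(values):
    """Standard deviation divided by the mean; 0 for a perfectly flat curve."""
    values = list(values)
    return pstdev(values) / mean(values)

# Made-up yearly counts, not taken from the real Ngram corpus.
word_counts = {1900: 50, 1950: 100, 2000: 200}
total_counts = {1900: 1000, 1950: 2000, 2000: 4000}

freq = relative_frequency(word_counts, total_counts)
print(coefficient_of_variation(freq.values()))  # 0.0
```

In this toy case the raw count quadruples, but so does the size of the corpus, so the relative curve stays perfectly flat.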

I am not a native English speaker, nor a linguist. But I think it can be explained why the occurrence of the words “the” and “of” decreases. Look at the next figure, for instance. I will let you do the rest of the job yourself.

So, we have (more or less) checked the model. Now, we can start using it.

Ngram usage for business

My intention here is to show some examples of how Google Ngram can be used to improve business and marketing analysis. The method is based on the following axioms:

The Google Ngram model correctly depicts language usage.

Language and culture strongly influence each other.

Culture and business are strongly related.

The Ngram occurrence of the name of a company is proportional to its business visibility.

Ngram helps marketers to choose the right word among synonyms.

Maybe these axioms are not always true, or not completely true. But let’s imagine they are and see what that implies.

Note that visibility can be negative or positive. Ngram does not distinguish between the two.

Example 1 – The rise of Goldman Sachs

The next figure shows that the occurrence of the names of top consulting firms rose to a peak between 2001 and 2006. In contrast, the occurrence of the words “Goldman Sachs” made an outstanding leap in 2008. Can it be interpreted as the rise of the investment management firm, which succeeded in taking over political power in some European countries?

Example 2 – Microsoft

The following figure shows the occurrence of the word “Microsoft” along with its share price on the stock market. The two curves present a similar pattern, both indicating the slowdown of Microsoft.
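If you want more than an eyeball comparison of two such curves, one common option is the Pearson correlation coefficient, which is close to 1 when two series move together. Below is a minimal sketch; the function name `pearson` is mine and both series are made-up placeholders, not Microsoft’s real Ngram values or share prices.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up placeholder series, one value per year; not real data.
ngram_share = [1.0, 2.0, 3.0, 2.5, 2.0]
share_price = [10.0, 22.0, 31.0, 24.0, 21.0]

print(round(pearson(ngram_share, share_price), 3))
```

Keep in mind that a high correlation only says the curves have a similar shape; it says nothing about which one drives the other.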

The comparison with Google and Facebook displayed in the next figure completes the analysis. But one has to mention that Microsoft still leads the race. For how much longer? It is a shame that Google Ngram data is available only until 2008; otherwise we would have a better overview. Note that in 2008 the word “IBM” was still more frequently used in books than “Facebook”. That is certainly no longer the case today.

Example 3 – The fleeting NetPC

In 1996, I attended a conference on the NetPC, a PC without local storage devices. We were told it was the future. It didn’t happen, as the following Ngram tragically demonstrates.

Example 4 – Be smart: write “smart”!

Words play a major role in marketing. Google Ngram helps marketers to choose them.

In the electricity sector, the term “smart grid” is used excessively. Nobody knows exactly what it means, nor where it comes from. Have you ever heard of a dumb grid before? It doesn’t matter. Every company active in the electricity sector is now widely using it. But it is no accident that the word “smart” was used. The trend was already there. According to the following Ngram, it was top-notch to use “smart grid” instead of “intelligent grid” or “clever grid”. Why not “cunning grid” or “ingenious grid”?

This Ngram reveals a more global fashion: do not hesitate to use the words “intelligent”, “smart” and other synonyms. They have been really trendy since 2000. Look at the number of companies and products using the word “smart”: smartphone, Smartbook (not a success story), SmartBank, SmartLaw, Smart Client Software Factory from Microsoft, SmartFuel, SmartCellar, SmartDivorce (they dared!), smart bomb (is there anything smart about bombs?), etc.

Type “smart” into Google or Bing and you get twice as many hits as for “Obama”. Vote “smart”!

A last one before leaving

I will let you meditate upon the next Ngram. Chicken or egg?

There is no moral of the story

I am not trying to pretend that the examples demonstrate any general theory. But many curves produced with Ngram, like the previous ones, trigger an “I knew it!” Indeed, many Ngrams confirm what many of us already know. But sometimes they don’t. There is clearly something in it, but what exactly?