How to Search More Effectively with Google Ngram Viewer

Google maintains a multilingual database of published language. By scanning books en masse, Google can process their text and provide statistical data on how frequently words appear. With the Google Ngram Viewer search tool, you can search through that voluminous data rapidly and effectively. By comparing the relative popularity of words, you can map how language and culture have changed over time. The Ngram Viewer can do much more than simply report word frequency within Google’s vast textual corpus, however.

Advanced Search (2- through 5-grams)

An “ngram” is a sequence of n words, so a single word is a 1-gram and a five-word phrase is a 5-gram. By searching for several ngrams at once, you can create complex comparisons across time. You can also refine a search with special operators, much like the advanced operators in Google Search.

Separate multiple search terms with commas.

The Ngram Viewer will display the relative frequency of your search terms in a single graph. Hover over the graph’s lines to see precise data points.
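As a rough sketch of how a comma-separated query could be assembled programmatically, the Python snippet below joins several terms and encodes them into a graph URL. The base URL and the parameter names (content, year_start, year_end, smoothing) are assumptions based on how the viewer's address bar typically looks, not a documented API, and the helper name build_ngram_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL and parameter names; check them against the viewer's address bar.
NGRAM_GRAPH_URL = "https://books.google.com/ngrams/graph"


def build_ngram_url(terms, year_start=1800, year_end=2000, smoothing=3):
    """Join search terms with commas and encode them into a graph URL."""
    for term in terms:
        # Simple phrases are limited to five words (1- to 5-grams).
        if len(term.split()) > 5:
            raise ValueError(f"phrase longer than five words: {term!r}")
    params = {
        "content": ",".join(terms),
        "year_start": year_start,
        "year_end": year_end,
        "smoothing": smoothing,
    }
    return f"{NGRAM_GRAPH_URL}?{urlencode(params)}"


url = build_ngram_url(["kindergarten", "nursery school", "child care"])
print(url)
```

Opening the printed URL in a browser should show all three terms on one graph, provided the assumed parameters are still current.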

Wildcard Search

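Use the asterisk (“*”) in place of a word as a wildcard. The Ngram Viewer then graphs the ten most frequent substitutions. For example, “Bachelor of *” returns the most common phrases beginning with “Bachelor of,” which covers many bachelor’s degrees.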
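To make the wildcard behavior concrete, here is a toy Python model of it: the asterisk stands in for exactly one word, and the most frequent matches are listed first. The counts are invented for illustration and the function expand_wildcard is hypothetical; this is not how Google implements the search.

```python
# Invented counts for illustration only; these are not real Ngram data.
TOY_COUNTS = {
    "Bachelor of Arts": 900,
    "Bachelor of Science": 750,
    "Bachelor of Laws": 120,
    "Bachelor of Music": 60,
    "Master of Arts": 800,
}


def expand_wildcard(pattern, counts, top_n=10):
    """Return the top_n phrases in which "*" stands in for exactly one word."""
    pattern_words = pattern.split()
    matches = []
    for phrase, count in counts.items():
        words = phrase.split()
        if len(words) != len(pattern_words):
            continue
        if all(p == "*" or p == w for p, w in zip(pattern_words, words)):
            matches.append((phrase, count))
    # Most frequent substitutions first.
    matches.sort(key=lambda item: item[1], reverse=True)
    return [phrase for phrase, _ in matches[:top_n]]


top_three = expand_wildcard("Bachelor of *", TOY_COUNTS, top_n=3)
print(top_three)
```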

Inflection Search

To find all the inflections of a term, append “_INF” to it. The search then covers every inflected form of that word, like the various forms of “to be” in English. For example, “book_INF a hotel” matches “book,” “booked,” “books,” and “booking” in that position.

Parts of Speech

If a word can serve as more than one part of speech, you can append a tag to specify which one you mean. The part-of-speech tags in Google’s database include the following:

_NOUN_: noun (book, city, idea)

_VERB_: verb (run, write, is)

_ADJ_: adjective (fast, large, smart)

_ADV_: adverb (quickly, later, always)

_PRON_: pronoun (their, it, we)

_DET_: determiner or article (a, an, the)

_ADP_: adposition (prepositions and postpositions)

_NUM_: numeral (one, two, 42)

_CONJ_: conjunction (and, nor, but)

_PRT_: particle, a rarely used category for small function words such as “up” in “give up”

Each of these tags can stand alone in place of a word within a phrase. For example, “_ADJ_ boy” matches any adjective followed by “boy.”

To restrict a single search term to one part of speech, append the tag to the word without a trailing underscore, e.g., “water_VERB”.

To include every part of speech for a given word, use the wildcard operator after the underscore, e.g., “cook_*”.
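The tagging rules above are easy to get wrong by an underscore, so a small helper can be handy. The Python sketch below only builds query strings; the function names tag_word and pos_placeholder are hypothetical, and the tag set assumes that NOUN and VERB are accepted alongside the other tags in this section (VERB appears elsewhere in this article).

```python
# Assumed tag set: the tags discussed in this section plus NOUN and VERB.
POS_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "NUM", "CONJ", "PRT"}


def tag_word(word, pos):
    """Restrict a word to one part of speech, e.g. water_VERB (no trailing underscore).

    Passing "*" as the tag asks for every part of speech, e.g. cook_*.
    """
    if pos != "*" and pos not in POS_TAGS:
        raise ValueError(f"unknown part-of-speech tag: {pos!r}")
    return f"{word}_{pos}"


def pos_placeholder(pos):
    """Build a stand-alone tag that takes the place of a word, e.g. _ADJ_."""
    if pos not in POS_TAGS:
        raise ValueError(f"unknown part-of-speech tag: {pos!r}")
    return f"_{pos}_"


print(tag_word("water", "VERB"))
print(tag_word("cook", "*"))
print(pos_placeholder("ADJ") + " boy")
```

Validating against a fixed tag set catches typos such as “VRB” before the query ever reaches the viewer.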

Using Functional Variables

Functional variables let you search by the function or placement of words.

_ROOT_ is a placeholder for the root of the sentence’s parse tree. This is typically the main verb of the sentence, the word on which the rest of the sentence depends.

_START_ indicates the beginning of a sentence (“_START_ President Obama” returns only sentences that start with the phrase “President Obama”).

_END_ indicates the end of a sentence (“_ADP_ _END_” returns sentences that end in prepositions).
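One way to picture the _START_ and _END_ markers is as extra tokens placed around each sentence before it is cut into ngrams. The Python sketch below is a toy model built on that assumption; the function ngrams_with_boundaries is hypothetical, it splits on whitespace only, and it is not a description of Google’s actual pipeline.

```python
def ngrams_with_boundaries(sentence, n):
    """Split a sentence into n-grams after adding sentence-boundary markers."""
    tokens = ["_START_"] + sentence.split() + ["_END_"]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


opening = ngrams_with_boundaries("President Obama spoke", 3)
middle = ngrams_with_boundaries("We met President Obama", 3)

query = "_START_ President Obama"
print(query in opening)  # the phrase opens this sentence
print(query in middle)   # the phrase appears here, but not at the start
```

Under this model, the query matches the first sentence only, because the second sentence contains “President Obama” without the boundary marker in front of it.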

Compositions

By combining search terms with arithmetic operators, you can perform simple mathematical analysis with values for term frequency:

+ sums the expressions on either side, combining multiple ngrams into a single series

- subtracts the expression on the right from the expression on the left, providing a quick way to compare the relative use of two search terms

/ divides the expression on the left by the expression on the right

* multiplies the expression on the left by the number on the right, which makes it easier to compare ngrams of widely varied frequency. Make sure to enclose the whole ngram in parentheses to avoid having the asterisk parsed as a wildcard character.

: applies the ngram on the left to the corpus on the right, so you can compare the same ngram across different corpora
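The arithmetic operators act on the frequency series year by year. The Python sketch below works through that on a few invented numbers, so the values are purely illustrative and the helper names combine and scale are hypothetical. The corpus operator is left out because it selects data and does no arithmetic.

```python
from operator import add, sub, truediv

# Invented relative frequencies per year, for illustration only (not real Ngram data).
YEARS = [1990, 2000, 2010]
SERIES = {
    "cat": [0.0040, 0.0050, 0.0060],
    "kitten": [0.0010, 0.0010, 0.0020],
    "axolotl": [0.000002, 0.000003, 0.000004],
}


def combine(left, right, op):
    """Apply an arithmetic operator year by year to two frequency series."""
    return [op(a, b) for a, b in zip(left, right)]


def scale(series, factor):
    """Multiply every yearly value by a constant, as in (axolotl * 1000)."""
    return [value * factor for value in series]


summed = combine(SERIES["cat"], SERIES["kitten"], add)       # (cat + kitten)
difference = combine(SERIES["cat"], SERIES["kitten"], sub)   # (cat - kitten)
ratio = combine(SERIES["cat"], SERIES["kitten"], truediv)    # (cat / kitten)
scaled = scale(SERIES["axolotl"], 1000)                       # (axolotl * 1000)

for year, values in zip(YEARS, zip(summed, difference, ratio, scaled)):
    print(year, [round(v, 6) for v in values])
```

Scaling the rare term by 1000 brings it into the same range as the common ones, which is the point of the multiplication operator: both lines become readable on one graph.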

Dependencies

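Finally, you can search for grammatical relationships with the dependency operator, “=>”. The head word goes on the left and the dependent word on the right, so “car=>fast” returns results where “fast” is grammatically dependent on, or modifying, the word “car,” even when the two words are not adjacent. Dependencies can be combined with other operators such as wildcards and part-of-speech tags, although not every combination is accepted.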
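The difference between a dependency search and an ordinary phrase search is easiest to see on a parsed sentence. The Python sketch below uses a hand-annotated toy parse (an assumption made for illustration, not real parser output) and the hypothetical helpers dependency_pairs and matches to check head=>modifier queries against it.

```python
# Hand-annotated toy parse (an assumption for illustration, not parser output).
# Each entry is (word, index of its head); the sentence root has head None.
SENTENCE = "She bought a fast red car"
PARSE = [
    ("She", 1),
    ("bought", None),
    ("a", 5),
    ("fast", 5),
    ("red", 5),
    ("car", 1),
]

ROOT = "_ROOT_"


def dependency_pairs(parse):
    """Yield (head, modifier) pairs in the head=>modifier order of the operator."""
    for word, head_index in parse:
        head = ROOT if head_index is None else parse[head_index][0]
        yield head, word


def matches(parse, query):
    """Check whether a head=>modifier query holds in the toy parse."""
    head, modifier = query.split("=>")
    return (head, modifier) in set(dependency_pairs(parse))


print(matches(PARSE, "car=>fast"))       # "fast" modifies "car"
print("fast car" in SENTENCE)            # the plain 2-gram is not in the sentence
print(matches(PARSE, "_ROOT_=>bought"))  # the main verb hangs off the root
```

In this sentence “red” sits between “fast” and “car,” so a plain search for “fast car” would miss it, while the dependency query still matches.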

Conclusion

When working with multi-word ngrams, your search can quickly get complicated. Some of these search techniques play well together, while others are incompatible. The best way to find out if something works is to simply try it. For example, the _INF tag is highly flexible, while _VERB is picky. You’ll quickly learn the quirks as you delve into the Ngram Viewer’s toolkit.

One comment

I can no longer see the original printed sources for phrases that I am searching for, not even in the meagre snippet view. This was easily the most useful feature of the old version, and compared to that, this new frequency graph version is completely pointless and useless.