While data analytics and visualization tools have accumulated a significant historical record of accomplishments, this technology is now, in turn, being applied to significant historical accomplishments themselves. Let’s have a look.

Every year, usually in January or February, the President of the United States gives the State of the Union speech before both houses of the U.S. Congress. The speech addresses the condition of the nation, the President’s legislative agenda and other national priorities. The requirement for this presentation appears in Article II, Section 3 of the U.S. Constitution.

The researchers developed custom algorithms for their study and applied them to the full text of all of the addresses from 1790 to 2014, a total of 1.8 million words. By identifying “how often words appear jointly” and “mapping their relation to other clusters of words”, the team was able to highlight “dominant social and political” issues and their relative historical time frames. (See Figure 1 at the bottom of Page 2 of the full report for this lexical mapping.)
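As a rough illustration of what counting “how often words appear jointly” can mean, here is a minimal Python sketch. It is not the researchers’ custom algorithm, which this post does not describe. The function name `cooccurrence_counts`, the choice of the sentence as the unit of co-occurrence, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import re

def cooccurrence_counts(documents):
    """Count how often each unordered pair of distinct words shares a sentence.

    Assumption: a sentence is the co-occurrence window. The study may have
    used a different window or weighting.
    """
    pair_counts = Counter()
    for text in documents:
        for sentence in re.split(r"[.!?]+", text):
            # Unique lowercase words, sorted so each pair has one canonical order.
            words = sorted(set(re.findall(r"[a-z]+", sentence.lower())))
            pair_counts.update(combinations(words, 2))
    return pair_counts

# Toy sentences invented for illustration; these are not quotations.
sample = [
    "The treasury reported the expenditures. The navy protects commerce.",
    "The treasury reduced expenditures.",
]
counts = cooccurrence_counts(sample)
print(counts[("expenditures", "treasury")])  # pairs are keyed in alphabetical order
```

Pairs with high counts could then be treated as links in a word network, and groups of densely linked words as candidate topic clusters. A real analysis would also need to filter out common words such as “the”.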

One of the researchers’ key findings was that although the topics of “industry, finance, and foreign policy” were predominant and persisted throughout all of the addresses, following World War II the recurring keywords focused further upon “nation building, the regulation of business and the financing of public infrastructure”. While it is well known that these emergent terms concern modern topics, the researchers were able to pinpoint the time frames when they first appeared. (See Page 5 of the full report for the graphic charting these data trends.)

Foreign Policy Patterns

The year 1917 struck the researchers as a critical turning point because it marked a dramatic shift in the data toward words indicative of more modern times. This was the year that the U.S. sent its troops into battle in Europe in WWI. It was then that new keywords, including “democracy,” “unity,” “peace” and “terror”, started to appear and recur in the State of the Union addresses. Later, by the 1940s, word clusters concerning the Navy appeared, possibly indicating emerging U.S. isolationism. However, they suddenly disappeared again as the U.S. became far more involved in world events.

Domestic Policy Patterns

The researchers also identified changes over time in the terminology used when addressing domestic matters. These concerned the government’s size, economic regulation, and equal opportunity. Although the focus of the State of the Union speeches remained constant, new keywords appeared: “tax relief,” “incentives” and “welfare” replaced “Treasury,” “amount” and “expenditures”.

An important issue facing this project was that, over the more than two centuries being studied, keywords could change substantially in meaning. To address this, the researchers applied new network analysis methods developed by Jean-Philippe Cointet, a team member, co-author and physicist at the University of Paris. These methods were intended to identify changes whereby “some political topics morph into similar topics with common threads” as others fade away. (See Figure 3 at the bottom of Page 4 of the full paper for this enlightening graphic.*)

As a result, the researchers were able to parse the relative meanings of words as they appear with each other and, on a more macro level, in the “context of evolving topics”. For example, it was discovered that the word “Constitution” was:

closely associated with the word “people” in early U.S. history

linked to “state” following the Civil War

linked to “law” during WWI and WWII, and

linked again to “people” during the 1970s

Thus, the meaning of “Constitution” must be assessed in its historical context.
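To make the idea of a word’s shifting associations concrete, the following Python sketch tallies which word most often shares a sentence with a target word in each historical period. It is only a simplified stand-in for the network analysis methods mentioned above, not a reproduction of them. The function names, the period boundary, the stopword list and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict
import re

def top_associate_by_period(dated_texts, target, period_of, stopwords=frozenset()):
    """For each period, return the word that most often shares a sentence with target."""
    tallies = defaultdict(Counter)
    for year, text in dated_texts:
        for sentence in re.split(r"[.!?]+", text):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            if target in words:
                # Count every other non-stopword in the same sentence once.
                tallies[period_of(year)].update(words - {target} - stopwords)
    return {period: tally.most_common(1)[0][0]
            for period, tally in tallies.items() if tally}

def period_of(year):
    # Hypothetical two-period split chosen only for this example.
    return "early" if year < 1861 else "post-Civil War"

# Toy sentences invented for illustration; these are not quotations.
sample = [
    (1800, "The constitution belongs to the people. The people ratified the constitution."),
    (1870, "Each state is bound by the constitution. The constitution limits every state."),
]
stopwords = frozenset({"the", "to", "by", "is", "each", "every"})
result = top_associate_by_period(sample, "constitution", period_of, stopwords)
print(result)  # {'early': 'people', 'post-Civil War': 'state'}
```

On real text, ties and sparse periods would need more careful handling than `most_common(1)` provides, and raw counts would likely be normalized by how common each word is overall.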

My own questions are as follows:

Would this analytical approach yield new and original insights if other long-running historical records, such as the Congressional Record, were likewise subjected to the research team’s algorithms and analytics?

Could companies and other commercial businesses derive any benefits from having their historical records similarly analyzed? For example, might it yield new insights and recommendations for corporate governance and information governance policies and procedures?

Could this methodology be used as an electronic discovery tool for litigators as they parse corporate documents produced during a case?