What happens when we begin to think of all information as data that can be explored to yield new insights into our world? What would it look like to take nearly a decade of CNN, Fox News, and MSNBC television broadcasts and two years of BBC News broadcasts and run them through sophisticated natural language processing algorithms to identify every mention of a location on earth in their coverage and then create a series of maps that visualize the places we hear about when we turn to the news? What would those maps look like and what might they tell us about what we see when we turn on our televisions each day?

Half a decade ago I began working with the Internet Archive’s incredible Television News Archive to explore how powerful computer algorithms could allow us to “see” the news in entirely new ways. From simple longitudinal keyword searches to mass emotion mining to geographic mapping to the most powerful deep learning algorithms watching political ads, television has an incredible amount to teach us as we explore it through the modalities and lenses of massive data mining.

Geography offers a particularly powerful and yet underutilized lens through which to understand the news. In fact, it was an animated map of 400,000 hours of television that was the very first major visualization I created from the Television News Archive. Over the years this has been followed with city and country-level aggregations and animations, each exploring a different dimension of television geography.

As we approach a decade of television news coverage monitored by the Internet Archive, it is worth looking back to see just what the world has looked like through the lens of television news.

Mapping television news starts with the raw textual closed captioning streams of each station. These captioning streams are essentially verbatim transcriptions of the spoken audio of each broadcast, allowing us to apply textual data mining tools to this visual medium. The textual captioning streams are fed through specialized algorithms known as “fulltext geocoders” that identify all mentions of geographic locations, ranging from a country or city name to a remote hilltop, and use the immediate context of the broadcast to disambiguate each mention and separate Paris, Illinois from Paris, France. The final result is a massive archive of latitude/longitude coordinates of the centroids of all of the locations mentioned in the television news broadcasts monitored by the Internet Archive over the past decade.
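To make the disambiguation step concrete, here is a minimal sketch in Python of how a context-based geocoder might choose between two places that share a name. It is not the geocoder used for these maps: the tiny gazetteer, the hint words, the window size and the function name geocode are all assumptions made for illustration, and the coordinates are approximate centroids.

```python
import re

# Hypothetical toy gazetteer (an assumption for illustration): each place name
# maps to candidate locations with approximate centroids and context hints.
GAZETTEER = {
    "paris": [
        {"label": "Paris, France", "lat": 48.8566, "lon": 2.3522,
         "hints": {"france", "french", "europe", "eiffel"}},
        {"label": "Paris, Illinois", "lat": 39.6111, "lon": -87.6961,
         "hints": {"illinois", "chicago", "midwest", "county"}},
    ],
}


def geocode(caption, window=8):
    """Return (label, lat, lon) for each gazetteer name found in the caption."""
    # Captions often lack capitalization and punctuation, so lowercase
    # everything and keep only alphabetic tokens.
    tokens = re.findall(r"[a-z]+", caption.lower())
    results = []
    for i, token in enumerate(tokens):
        candidates = GAZETTEER.get(token)
        if not candidates:
            continue
        # Immediate context: a few words on either side of the mention.
        context = set(tokens[max(0, i - window): i + window + 1])
        # Choose the candidate whose hint words overlap the context most.
        # On a tie, max() keeps the first candidate listed.
        best = max(candidates, key=lambda c: len(c["hints"] & context))
        results.append((best["label"], best["lat"], best["lon"]))
    return results
```

Calling geocode("the mayor of paris illinois spoke at the county fair") picks the Illinois candidate because two of its hint words appear nearby, while a caption with no helpful context falls back to the first candidate listed. A production geocoder would draw on a far larger gazetteer and many more contextual signals.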

It is important to understand that automated textual geocoding of raw closed captioning streams will necessarily incur a certain degree of error from the captioning and geocoding processes, as is inherent in all fully automated data mining. Raw television captioning data is especially difficult to work with: it is filled with typographical errors and rapid-fire contextual changes and lacks capitalization, punctuation and refined grammatical structure, meaning geocoding algorithms have fewer high-confidence contextualizing features to help guide their selection and disambiguation processes. In short, mapping television news through closed caption geocoding will contain a certain level of error but offers a powerful glimpse into the geography of attention of television news over the past decade.

Using this data created by my open data GDELT Project and a single line of SQL with Google’s BigQuery platform, visualized through Carto’s mapping platform, we can create a map of the geography of each television station over the entire period the Internet Archive has monitored it.
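The aggregation behind such a map amounts to counting how often each station mentions each location. The sketch below expresses that idea in plain Python rather than SQL; it is not the query used here, and the row layout (station, latitude, longitude), the rounding precision and the function name aggregate_mentions are assumptions for illustration.

```python
from collections import Counter


def aggregate_mentions(rows, precision=1):
    """Count mentions per (station, rounded lat, rounded lon) grid cell.

    rows is an iterable of (station, lat, lon) tuples, a hypothetical layout
    standing in for the geocoded mention records.
    """
    counts = Counter()
    for station, lat, lon in rows:
        # Rounding snaps nearby centroids into the same map cell.
        cell = (station, round(lat, precision), round(lon, precision))
        counts[cell] += 1
    return counts


# Made-up example rows, not real broadcast data.
sample = [
    ("CNN", 38.9072, -77.0369),
    ("CNN", 38.9, -77.04),
    ("BBCNEWS", 51.5074, -0.1278),
]
cell_counts = aggregate_mentions(sample)
```

Each resulting cell count can then be handed to a mapping tool as a weighted point, with one layer per station.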

The Archive’s BBC News archive spans just under two years, 2017-present. Immediately clear is its near-saturation coverage of the United Kingdom and heavy coverage of former colonies and countries of interest such as India, South Africa, Australia and Zimbabwe. Its coverage of the US is strong, but not exhaustive.

Geography of BBC News coverage 2017-2018 (Kalev Leetaru)

In contrast, CNN unsurprisingly has offered near saturation coverage of the United States over the past nine and a half years (2009-2018), with far less comprehensive, though still relatively strong, coverage of the UK. Afghanistan and Iraq also feature more prominently, likely due to the enormous US military investment in those countries over the time span.

Geography of CNN coverage 2009-2018 (Kalev Leetaru)

There appears to be little substantive difference between CNN’s geographic focus and that of Fox News and MSNBC over the period 2009-2018.

Geography of Fox News coverage 2009-2018 (Kalev Leetaru)

Geography of MSNBC coverage 2009-2018 (Kalev Leetaru)

Overlaying all three US stations on one map (CNN in purple, Fox News in blue and MSNBC in green, with black indicating locations covered by all three), slight differences can be observed, but no obvious systematic differences are immediately apparent.
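The overlay logic can be sketched as a simple set operation: collect which stations mention each location, then color the location accordingly. This is only an illustration of the color scheme described above, not how the map itself was rendered; the station keys, the cell tuples and the function name overlay_colors are assumptions, and because the text does not say how locations covered by exactly two stations are drawn, the sketch leaves those as None.

```python
# Colors as described in the text; the dictionary keys are assumed labels.
STATION_COLORS = {"CNN": "purple", "FOXNEWS": "blue", "MSNBC": "green"}


def overlay_colors(cells_by_station):
    """Map each location cell to a display color based on station coverage."""
    coverage = {}
    for station, cells in cells_by_station.items():
        for cell in cells:
            coverage.setdefault(cell, set()).add(station)

    colors = {}
    for cell, stations in coverage.items():
        if stations == set(STATION_COLORS):
            colors[cell] = "black"  # covered by all three stations
        elif len(stations) == 1:
            colors[cell] = STATION_COLORS[next(iter(stations))]
        else:
            colors[cell] = None  # two-station overlap: not specified in the text
    return colors


# Made-up example cells (rounded lat, lon), not real coverage data.
example = overlay_colors({
    "CNN": {(38.9, -77.0), (33.7, -84.4)},
    "FOXNEWS": {(38.9, -77.0), (40.7, -74.0)},
    "MSNBC": {(38.9, -77.0), (40.7, -74.0)},
})
```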

Zooming into the United States, it is clear that all three stations cover the eastern half of the country far more extensively than the West and that Seattle, San Francisco and Los Angeles as well as Salt Lake City and Denver are the most heavily covered of the Western cities. There does not appear to be any difference in rural/urban coverage or North/South.

What about the difference between American and British news? The following map overlays the three American stations in yellow (orange/red indicates areas of heavy overlap among the American stations) and BBC News in blue. As suggested earlier, BBC is the clear winner in the UK and former British colonies, with CNN, Fox News and MSNBC the clear winners in the US and Middle East. The rest of the world is a fairly even mix. Surprisingly, American stations appear to have slightly better coverage in Europe, though the comparison is not quite fair given that nearly ten years of American news is being compared to just under two years of BBC News coverage.

Of course, often the most interesting stories are told not through static snapshots, but rather through the patterns in how those points have moved over time. To explore this in more detail, two animations were created that show the day-by-day geographic focus of the stations. The first shows the BBC’s daily coverage 2017-2018 and the second shows the combined daily focus of CNN, Fox News and MSNBC. Each frame shows all of the locations mentioned on those stations on that day. Watch closely and you can see a wealth of world events result in bursts and waves moving across the maps. Just as quickly, watch how fast the media lose interest in each story and focus their attention elsewhere.
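Building such an animation comes down to grouping the geocoded mentions by broadcast date so that each day becomes one frame. The following is a minimal sketch of that grouping, assuming mentions arrive as (date, latitude, longitude) tuples; the layout and the function name daily_frames are hypothetical, and rendering the frames is left to the mapping tool.

```python
from collections import defaultdict
from datetime import date


def daily_frames(mentions):
    """Group (day, lat, lon) mentions into one frame of unique points per day."""
    frames = defaultdict(set)
    for day, lat, lon in mentions:
        frames[day].add((lat, lon))
    # Return frames in chronological order, with points sorted for stability.
    return [(day, sorted(frames[day])) for day in sorted(frames)]


# Made-up example mentions, not real broadcast data.
frames = daily_frames([
    (date(2018, 3, 2), 51.5, -0.1),
    (date(2018, 3, 1), 48.9, 2.4),
    (date(2018, 3, 1), 51.5, -0.1),
    (date(2018, 3, 1), 51.5, -0.1),
])
```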

Putting this all together, when we begin to think of television as data and geography as a lens through which to explore it, we are able to “see” the news in an entirely new light. Most importantly, we are able to quantify just how little of the world we actually hear about each day and the importance of editorial decisions and agenda setting in our understanding of the world around us. In the end, perhaps the most important story of all is that by bringing together powerful data mining algorithms with the analytic capabilities of BigQuery and the visualization power of platforms like Carto and applying those to archives as incredible and unique as the Internet Archive’s Television News Archive, we now have the tools to explore our world in ways we could never before dream of.

I would like to thank Google for the use of Google Cloud resources including BigQuery and Carto for the use of its online mapping platform. I would also like to extend a very special thanks to the Internet Archive’s Television News Archive and the team behind it for creating such an incredible resource, without which these explorations would simply be impossible.
