In this work, we analyze more than two million news photos published in January 2016. We demonstrate i) which objects appear the most in news photos; ii) what the sentiments of news photos are; iii) whether the sentiment of news photos is aligned with the tone of the text; iv) how gender is treated; and v) how differently political candidates are portrayed. To our best knowledge, this is the first large-scale study of news photo contents using deep learning-based vision APIs.

Not that bias-free news is possible, but deep learning appears to be useful in foregrounding bias against particular candidates:

…
We then conducted a case study of assessing the portrayal of Democratic and Republican party presidential candidates in news photos. We found that all the candidates but Sanders had a similar proportion of being labeled as an athlete, which is typically associated with a victory pose or a sharp focus on a face with blurred background. Pro-Clinton media recognized by their endorsements show the same tendency; their Sanders photos are not labeled as an athlete at all. Furthermore, we found that Clinton expresses joy more than Sanders does in the six popular news media. Similarly, pro-Clinton media shows a higher proportion of Clinton expressing joy than Sanders.
…

If the requirement is an “appearance” of lack of bias, the same techniques enable the monitoring/shaping of your content to prevent your bias from being discovered by others.

Data scientists who can successfully wield this framework will be in high demand for political campaigns.

I have written quite a bit about GDELT (the Global Database of Events, Languages and Tone) over the past year, because I think it’s a great example of the type of ambitious project only made possible by the advent of cloud computing and big data systems. In a nutshell, it’s a database of more than 250 million socioeconomic and geopolitical events and their metadata dating back to 1979, all stored (now) in Google’s cloud and available to analyze for free via Google BigQuery or custom-built applications.

On Thursday, version 2.0 of GDELT was unveiled, complete with a slew of new features — faster updates, sentiment analysis, images, a more-expansive knowledge graph and, most importantly, real-time translation across 65 different languages. That’s 98.4 percent of the non-English content GDELT monitors. Because you can’t really have a global database, or expect to get a full picture of what’s happening around the world, if you’re limited to English language sources or exceedingly long turnaround times for translated content.
…

The GDELT homepage reports:

We’ll be releasing a new “Getting Started With GDELT” user guide in the next few days to walk you through the incredibly vast array of new capabilities in GDELT 2.0,…

Awesome, simply awesome!

Bear in mind that the data presented here isn’t “cooked.” That is, it hasn’t been trimmed and merged with your client’s internal knowledge of “…socioeconomic and geopolitical events…” and how those events affect the client’s interests.

For example, labor strikes in a shipping port on one continent may delay on-time shipments from a manufacturer on another continent for delivery to still a third. The information that ties all those items together is held by your client, not by any public source.

There is a vast sea of client data, relationships and interests to be mapped to a resource like GDELT, and the 2.0 version simply ups the possible rewards.
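To make the shipping example concrete, here is a minimal Python sketch of that kind of mapping. Everything in it is an assumption for illustration: the `Event` and `Shipment` records, their field names, and the port and city names are hypothetical and are not GDELT's actual schema or any real client data. It shows only the general idea of joining public events to client-held records on a shared key (here, a port).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A public event record. Fields are illustrative, not GDELT's schema."""
    event_id: str
    kind: str       # hypothetical event type, e.g. "STRIKE"
    location: str   # hypothetical place name, e.g. a port


@dataclass(frozen=True)
class Shipment:
    """A client-held record that no public source contains."""
    shipment_id: str
    manufacturer: str
    transit_port: str
    destination: str


def shipments_at_risk(events, shipments, kinds=("STRIKE",)):
    """Pair each disruptive event with the shipments routed through its location."""
    # Index the client's shipments by transit port for quick lookup.
    by_port = {}
    for shipment in shipments:
        by_port.setdefault(shipment.transit_port, []).append(shipment)

    hits = []
    for event in events:
        if event.kind in kinds:
            for shipment in by_port.get(event.location, []):
                hits.append((shipment, event))
    return hits


# Toy data, invented purely for the example.
events = [
    Event("e1", "STRIKE", "Port A"),
    Event("e2", "PROTEST", "Port B"),
    Event("e3", "STRIKE", "Port C"),
]
shipments = [
    Shipment("s1", "Maker X", "Port A", "City Y"),
    Shipment("s2", "Maker X", "Port B", "City Z"),
]

for shipment, event in shipments_at_risk(events, shipments):
    print(f"{shipment.shipment_id}: {event.kind} at {event.location} "
          f"may delay delivery to {shipment.destination}")
```

The public side supplies only the event and its location; the link from port to manufacturer to destination exists only in the client's records, which is the point of the example above.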

Just in case you are curious:

Terms of Use

What can I do with GDELT and how can I use it in my projects?

Using GDELT

The GDELT Project is an open platform for research and analysis of global society and thus all datasets released by the GDELT Project are available for unlimited and unrestricted use for any academic, commercial, or governmental use of any kind without fee.

Redistributing GDELT

You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to this website (http://gdeltproject.org/).

It is hard to imagine a data resource getting any better than this!

PS: By late Spring 2015, the backfiles to 1979 will be available in GDELT 2.0 format. Maybe it can get better. 😉

“The idea of GDELT is how do we create a catalog, essentially, of everything that’s going on across the planet, each day,” Leetaru explained in a recent interview.

And now all of it is available in the cloud, for free, for anybody to analyze as they desire. Leetaru has partnered with Google, where he has been hosting GDELT for the past year, to make it available as a public dataset that users can analyze directly with Google BigQuery. Previously, anyone interested in the data had to download the 100-gigabyte dataset and analyze it on their own machines. They still can, of course, and Leetaru recently built a catalog of recipes for various analyses and a BigQuery-based method for slicing off specific parts of the data.
…

See Derrick’s post for additional details.

When I previously wrote about GDELT it wasn’t available for querying with Google’s BigQuery. That should certainly improve access to this remarkable resource.

Perhaps intelligence gathering/analysis will become a cottage industry.