Fruit or mobile device: learning concepts through connections

A preview of insights from an upcoming session at Strata Santa Clara

Social media gives us the power to share content and engage with a wide range of internet users. Whether we are individuals or brands, we often want to know who we are talking to and how we can better serve our viewers. Traditional demographics such as ‘female’ and ‘25-30’ are no longer sufficient in this arena. For example, even Google has a hard time getting gender and age right for its ad preferences. It is more interesting to observe what content people consume and how their attention changes over time.

Bitly, which is used to shorten and share links, can offer insight into this space. Its data has an unprecedented, holistic view of what people are sharing and what they are concerned about on the internet.

We use bitly's data to look into how we can define the audience of different content. The simplest example is this: given a group of users who click on “oreilly.com”, what other websites do they engage with? Connecting the domains that share visitors gives us what bitly calls a co-click graph. Domains are represented as nodes, while the edge between two nodes represents the number of people who have clicked on both domains. A co-click graph can be made to represent any number of attributes, but for now we are going to stay with topics and keywords.
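As a minimal sketch of the idea, assuming click records arrive as simple (user, domain) pairs, a co-click graph can be built by grouping domains per user and counting every pair of domains that shares a user. The function names and the record layout here are illustrative assumptions, not bitly's actual pipeline or schema.

```python
from collections import defaultdict
from itertools import combinations


def build_coclick_graph(clicks):
    """Build co-click edge weights from (user_id, domain) pairs.

    Returns a dict mapping a sorted (domain_a, domain_b) tuple to the
    number of distinct users who clicked on both domains.
    The input layout is an assumption made for this sketch.
    """
    # Collect the distinct domains each user clicked on.
    domains_by_user = defaultdict(set)
    for user, domain in clicks:
        domains_by_user[user].add(domain)

    # Every pair of domains seen for one user adds 1 to that edge.
    edges = defaultdict(int)
    for domains in domains_by_user.values():
        for a, b in combinations(sorted(domains), 2):
            edges[(a, b)] += 1
    return dict(edges)


def neighbors(edges, domain):
    """List (other_domain, weight) for one domain, heaviest edge first."""
    result = {}
    for (a, b), weight in edges.items():
        if a == domain:
            result[b] = weight
        elif b == domain:
            result[a] = weight
    return sorted(result.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `neighbors(edges, "oreilly.com")` on the result answers the question above: which other sites the oreilly.com audience also clicks on, ranked by shared users. Repeated clicks by the same user on the same domain count once, since the edge weight is a number of people rather than a number of clicks.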

In the previous graph, you can see where people who visit oreilly.com go on the internet. There are fun, playful websites, but also more serious news and tech outlets like “theatlantic.com” and “wired.com”. This is not the full co-click graph for O’Reilly, but it shows the largest attention getters so far for the month of February. It is great to see that we get our tech news from different sites and still enjoy some creative freedom in what we look at. From this graph we can determine the audience of oreilly.com and help it find partners to support and coordinate news with.

This is already interesting, and it’s the most basic kind of analysis that we can do with this data. Next, we’ll show you some interesting connections we found. For example, Russian readers mostly stick to Russian websites, while German readers are more integrated with English-language tech topics. In the following graph we can see how keywords cluster together when the edge weight becomes the number of URLs that two phrases have in common.
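The keyword edge weight described above can be sketched the same way. This example assumes each URL has already been tagged with a set of keywords, which is a hypothetical input format since the extraction step is not described here. The weight between two phrases is the number of URLs on which both appear.

```python
from collections import defaultdict
from itertools import combinations


def keyword_edge_weights(url_keywords):
    """Count, for each keyword pair, how many URLs contain both.

    url_keywords maps a URL to an iterable of keywords found on it
    (an assumed input layout for this sketch).
    """
    weights = defaultdict(int)
    for keywords in url_keywords.values():
        # Deduplicate so a keyword repeated on one URL counts once.
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Made-up example data: "apple" co-occurs with device words and with fruit.
pages = {
    "http://example.com/a": ["iphone", "apple", "ios"],
    "http://example.com/b": ["apple", "iphone"],
    "http://example.com/c": ["apple", "fruit"],
}
weights = keyword_edge_weights(pages)
print(weights[("apple", "iphone")])  # 2
print(weights[("apple", "fruit")])   # 1
```

In this toy data the heavier edge pulls "apple" toward the mobile-device words rather than toward "fruit", which is the kind of signal the clustering relies on.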

This clustering highlights different concepts that are represented on the internet. This data is from just one day, but it shows social networks sticking together and sitting close in the graph to mobile phones and to media such as videos, games, and photos. Groups that represent news stories also show up: Egypt and Syria end up close to all the technology clusters. If you click through to the larger groups, you will see even more clusters that represent different sports (both rumors and matches), President Obama, and even porn.
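The clustering method behind these graphs is not specified in this post. As a simple stand-in, one could drop weak edges and treat each connected component of what remains as a cluster. The sketch below assumes edge weights are given as a dict keyed by keyword pairs; `min_weight` is an illustrative parameter, and a real system would likely use a more robust community detection method.

```python
from collections import defaultdict


def cluster_keywords(weights, min_weight=2):
    """Group keywords into clusters via thresholded connected components.

    weights maps (keyword_a, keyword_b) to a co-occurrence count.
    Edges lighter than min_weight are ignored; keywords left with no
    edges are not returned. This is a stand-in technique, not the
    method used to produce the graphs in the post.
    """
    # Build an undirected adjacency map from the edges that survive.
    adjacency = defaultdict(set)
    for (a, b), weight in weights.items():
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Depth-first search from each unvisited keyword finds one component.
    seen = set()
    clusters = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(component)
    return clusters
```

With weights such as `{("apple", "iphone"): 3, ("ios", "iphone"): 2, ("apple", "fruit"): 1, ("egypt", "syria"): 4}`, the default threshold yields one device cluster and one news cluster, and "fruit" drops out because its only edge is too light.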

This is a step towards getting at major themes that can change with current events and reflect the attention of the internet’s eye. Over time, the words and clusters grow, migrate, and even disappear as they stop being relevant.

To learn more about the process behind creating clusters and using them in products, come see me speak at Strata!
