Is Twitter Starting to Make a Grab for the Interest Graph?

Over the last couple of years, I’ve dabbled with mapping parts of the interest graph defined by friend and follower relationships on Twitter. But with a couple of recent Twitter announcements, I’m starting to wonder whether my ability to keep producing such maps will, to all intents and purposes, soon be curtailed?

Here’s the story so far: the technique I’ve been using to generate interest maps relies on finding how the people followed by a particular individual follow each other, who the common friends of the followers of a particular user are, or who the common friends of a set of hashtag or search term users are. This allows us to generate maps that show common interests of the friends of a given user, the followers of a user, or the users of a hashtag. Note that we interpret ‘interests’ loosely, as named Twitter accounts that we can associate with a particular sort of interest. So, for example, following @OrdnanceSurvey may be indicative of an interest in maps or mapping, following @BBCr4today of an interest in politics and current affairs, and so on.
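The sampling step can be sketched roughly as follows. This is a minimal illustration, not my actual harvesting code; `get_friends` is a hypothetical helper standing in for a call to the Twitter friends-list API endpoint:

```python
from collections import Counter

def interest_signature(followers_sample, get_friends, min_support=0.1):
    """Find accounts commonly followed by a sample of followers.

    followers_sample: list of user IDs sampled from a target
    account's followers. get_friends(uid): assumed to return the
    accounts that user follows (hypothetical wrapper around the
    Twitter friends-list endpoint).
    """
    counts = Counter()
    for uid in followers_sample:
        # set() guards against duplicate entries in a friends list
        counts.update(set(get_friends(uid)))
    threshold = min_support * len(followers_sample)
    # Keep only accounts followed by at least min_support of the sample
    return {acct: n for acct, n in counts.items() if n >= threshold}
```

The surviving accounts, and the co-following relationships between the sampled users, are then what gets laid out and clustered in a tool such as Gephi to produce the maps.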

As an example, here’s a map from a year or so ago of some of the accounts commonly followed by a sample of followers of the RedBullRacing account on Twitter:

If you look closely, you can spot clusters (that is, groups of names close to each other) relating to Formula One teams, F1 drivers, performance cars, and so on…

The pattern of friends and followers in a social graph may thus be interpreted as some sort of interest graph. I personally find these maps to be, in and of themselves, interesting, in much the same way some folk take interest in, and pleasure from looking at, cartographic maps.

Compare that with Twitter’s recent announcement of interest targeting for its Promoted products:

> Today we’re taking an important next step by allowing you to target your Promoted Tweets and Promoted Accounts campaigns to a set of interests that you explicitly choose. By targeting people’s topical interests, you will be able to connect with a greater number of users and deliver tailored messages to people who are more likely to engage with your Tweets.
>
> …
>
> There are two flavors of interest targeting. For broader reach, you can target more than 350 interest categories, ranging from Education to Home and Garden to Investing to Soccer.
>
> …
>
> If you want to target more precise sets of users, you can create custom segments by specifying certain @​usernames that are relevant to the product, event or initiative you are looking to promote. Custom segments let you reach users with similar interests to that @​username’s followers; they do not let you specifically target the followers of that @​username. If you’re promoting your indie band’s next tour, you can create a custom audience by adding @​usernames of related bands, thus targeting users with the same taste in music. This new feature will help you reach beyond your followers and users with similar interests, and target the most relevant audience for your campaign.

Let’s try to unpick that: “[c]ustom segments let you reach users with similar interests to that @​username’s followers”. Naively, my first attempt to implement this would go something along the lines of: custom segments let you reach users who tend to follow similar people to the people followed en masse by that @​username’s followers. That is, I would position each @username in an interest space defined by the people commonly followed by the followers of @username, and then target users who tend to follow the accounts disproportionately represented in that interest space compared to some sort of “baseline” representation of the interests of a more general population. I’ve no idea how Twitter do the targeting, but that would be my first step.
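That naive “disproportionately represented compared to a baseline” idea can be sketched as a simple lift calculation. To be clear, everything below is my hypothetical reading of the announcement, not Twitter’s actual method:

```python
def overrepresented_interests(segment_counts, segment_size,
                              baseline_counts, baseline_size,
                              min_lift=2.0):
    """Naive sketch of the custom-segment idea described above.

    segment_counts: maps accounts to how many of a sampled
    @username-follower population follow them. baseline_counts:
    the same for a general-population sample. Accounts whose
    follow-rate in the segment is at least min_lift times the
    baseline rate are taken as the segment's defining interests.
    """
    interests = {}
    for acct, n in segment_counts.items():
        seg_rate = n / segment_size
        # Smooth zero baseline counts to avoid division by zero
        base_rate = baseline_counts.get(acct, 0.5) / baseline_size
        lift = seg_rate / base_rate
        if lift >= min_lift:
            interests[acct] = lift
    return interests
```

Users who follow several of the high-lift accounts would then be candidate targets for the segment; the choice of lift threshold (and of baseline population) is exactly the sort of tuning an ad platform would keep to itself.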

If targeted advertising is Twitter’s money play, then it’s obviously in their interest to keep hold of the data juice that lets them define audiences by interest. Which is to say, they need to keep the structure of the graph as closed as possible. [UPDATE – it seems as if Twitter is starting/continuing to block other social networks’ access to its social graph data…]

Unlike Facebook – which limits users to seeing friendships among their own friends (or, painfully in terms of API calls, testing whether a friendship connection exists between two specified individuals) – or LinkedIn – which doesn’t let you get hold of any data about how your friends connect other than in graphical form via InMaps (Visualize your LinkedIn network with InMaps) – Twitter makes friend and follower data publicly available (unless an account is protected). (If you want to visualise your own Facebook network, here’s a recipe for doing so: Getting Started With The Gephi Network Visualisation App – My Facebook Network.) Google+ also makes an individual’s connection data public in a roundabout way, although not via an easily accessed API (to access the graph data as data, you need to scrape it, as described here: So Where Am I Socially Situated on Google+?).

However, some changes have recently been announced relating to the Twitter API that look likely to limit the extent to which we can sample anything other than small, fragmentary snapshots of the interest graph: Changes coming in Version 1.1 of the Twitter API. In particular, the new 1.1 version of the Twitter API will apply differential rate limiting to different Twitter API endpoints, compared to the current limit of 350 API calls per hour summed across all API endpoints. For the last few years I’ve been lucky enough to benefit from a whitelisted Twitter API key (granted for research purposes) that has allowed me 20,000 calls per hour. (I’ve only ever used a fraction of the total possible number of API calls I could have made over that period, but on occasion have made several thousand calls over a short period to grab the friends or followers lists of hundreds or low thousands of users that I then use to generate the social interest maps.)

What my whitelisted API key allowed – and what the original 350-calls-per-hour limit allowed for users not grandfathered in to the whitelisted limit – was the ability to grab the friends (or followers) lists of at least hundreds of users per hour. My own ad hoc experiments suggest that sampling the friends of 500 or so followers of a target account (one that may itself have tens or hundreds of thousands of followers), and then looking for accounts followed by 10-20% of that follower sample, gives an idea of the social interest positioning of the target account. Which means that with a 350 API call limit per hour, you could generate at least one guesstimate interest positioning map every couple of hours. (With my 20k limit, I could generate several, much larger sample-size maps per hour.) However, if the whitelisted key limit is not carried over to version 1.1 of the Twitter API, and if a rate limit of 60 calls per hour on the friends/followers list endpoints is enforced (as looks likely?), we’d only be able to generate one or two small-ish sample interest maps per day; to run larger sample-size maps, we’d have to max out the hourly friends/followers list collection API calls 24 hours a day for several days in order to collect the data. Which is a pain. And a de facto block on the harvesting of graph data for the purposes of generating interest maps. (Which is why I’m hoping against hope my whitelisted API access continues! Though I am starting to feel as if I have squandered it, and either should have built a proper business around it and milked it for all it was worth, or sold the key on;-)
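The back-of-envelope arithmetic behind those estimates is simple enough to spell out (assuming one friends-list call per sampled user, which holds for users following no more than one page’s worth of accounts):

```python
import math

def collection_hours(n_users, calls_per_user=1, rate_limit_per_hour=60):
    """Hours needed to fetch friends lists for n_users sampled followers,
    given an hourly rate limit on the friends-list endpoint."""
    return math.ceil(n_users * calls_per_user / rate_limit_per_hour)

# A 500-follower sample at 60 calls/hour: 9 hours of maxed-out collection,
# versus 2 hours at the old blanket 350 calls/hour limit.
```

At the rumoured 60-calls-per-hour limit, even a modest 500-user sample becomes an overnight job, and a few-thousand-user sample a multi-day one, which is where the “de facto block” claim above comes from.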

PS This looks like it could lead to the second major loss of “interesting” functionality relating to “derived” services around Twitter that I can think of, the first being the disappearance of services like the Twist search tool, which used to show volume trends for keywords used on Twitter over the previous week:

We can still pick up webrhythms from tools such as Google Insights for Search, cultural rhythms from things like Google Books NGrams, and possible correlates using tools like Google Correlate, but the Twitter rhythms gave a much more visceral account of daily life than the trends that things like Google Trends can detect.

One of the things I find really disheartening about the web, even in its short lifetime to date, is the way that access to these large-scale behavioural patterns is promised as services start to scale up, but then disappears again as companies start to lock down their services or rein in access to the data, or as the ad hoc third party services fail to scale (perhaps because there isn’t widespread enough interest to keep them ticking over?). Such is life, I suppose…

The odd thing is that Twitter doesn’t want client apps, and yet the new limits suit a client more than a specific analysis tool. We do similar analysis to you and need lots of calls to only a few methods: we want to be able to use 350/h on one method, not 60/h for method 1, 60/h for method 2, etc. That is more in line with what a Tweetbot would need.