IBM can find you using non-geotagged tweets

A team of IBM researchers has developed an
algorithm that can infer the hometown of Twitter users by
analysing their last 200 non-geotagged tweets.

The vast majority of people do not use the geotagging function
on Twitter -- in fact less than one percent of all tweets are
geotagged. But do not be fooled into thinking that means your
location is secret. At least not to IBM researchers.

A team at IBM Research in Almaden, California -- headed up by
Jalal Mahmud -- has found a way to analyse anyone's last 200 tweets
to identify their home city with up to 68 percent accuracy. To do
this, the team collected geotagged tweets posted in July and August
2011, focussing on those located in the top 100
cities in the US. They then selected 100 different Twitter users
from each city and downloaded the last 200 tweets they had posted.
Once users with private accounts were removed, this left a sample
of 1,524,522 geotagged tweets generated by 9,551 users. They then
divided this data set in two, using 90 percent of the tweets to
train their algorithm and the remaining ten percent to test it.
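
As a rough illustration of that 90/10 division, here is a minimal
sketch in Python. The function name split_90_10, the fixed seed and
the placeholder tweets are assumptions made for this example. The
article does not say how the IBM team actually shuffled or
partitioned its data.

```python
import random


def split_90_10(items, seed=0):
    """Return (train, test): a shuffled 90/10 split of items.

    Hypothetical helper for illustration only, not the IBM team's code.
    """
    shuffled = list(items)                  # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle, repeatable across runs
    cut = len(shuffled) * 9 // 10           # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for the collected tweets.
tweets = [f"tweet {i}" for i in range(1000)]
train, test = split_90_10(tweets)
```

Shuffling before cutting keeps the test portion from being, say,
only the most recent tweets, and holding it back entirely means the
reported accuracy reflects data the algorithm has not seen.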

The algorithm was designed to search for information relating to
the user's location, piecing together clues in order to predict the
most likely location. This included looking for references to
particular cities within tweets (cross-referencing with a database
of US cities) and scanning links generated by Foursquare (6.6
percent of tweets), which would pinpoint the location of users. The
algorithm also looked for specific "local terms" that might help
locate the user. These included references to city-based sports
teams such as "Red Sox" (Boston-based) or references to a
particular state, such as "another sunny day in California", which
narrows down the candidate cities. The team also factored in a heuristic
element to deduce which time zone the person was in, assuming that
tweeting behaviour follows certain patterns over the course of the
day.
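
To make the idea of piecing together clues concrete, here is a
small Python sketch that lets city names, local terms and state
mentions each cast votes for candidate cities. It is a simple
voting scheme written for this article, not the team's actual
model. The lookup tables (CITY_NAMES, LOCAL_TERMS, STATE_CITIES),
the vote weights and the function guess_city are all hypothetical,
and it leaves out the Foursquare links and the time-zone heuristic.

```python
from collections import Counter

# Hypothetical toy lookup tables; a real system would need a full gazetteer.
CITY_NAMES = {
    "boston": "Boston",
    "san francisco": "San Francisco",
    "los angeles": "Los Angeles",
}
LOCAL_TERMS = {"red sox": "Boston", "lakers": "Los Angeles"}
STATE_CITIES = {"california": ["San Francisco", "Los Angeles"]}


def guess_city(tweets):
    """Return the city with the most votes, or None if no clue was found."""
    votes = Counter()
    for tweet in tweets:
        text = tweet.lower()
        # A direct mention of a city name counts as one full vote.
        for name, city in CITY_NAMES.items():
            if name in text:
                votes[city] += 1.0
        # A local term (such as a sports team) also counts as one full vote.
        for term, city in LOCAL_TERMS.items():
            if term in text:
                votes[city] += 1.0
        # A state mention only narrows the field, so its vote is shared
        # equally among that state's candidate cities.
        for state, cities in STATE_CITIES.items():
            if state in text:
                for city in cities:
                    votes[city] += 1.0 / len(cities)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


sample = ["Go Red Sox!", "another sunny day in California", "Back home in Boston"]
print(guess_city(sample))  # prints "Boston"
```

Here a state mention is worth less than a city or team mention
because it is weaker evidence. How the real algorithm weighs its
clues is not described in this article.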

Having trained the algorithm with 90 percent of the tweets, they
then tested it using the remaining ten percent. The team found that
they could predict -- in less than a second -- the hometown of
tweeters with 64 percent accuracy, rising to 68 percent accuracy
when they eliminated Twitter users who were obviously travelling.
They could predict a Twitter user's time zone accurately 80 percent
of the time.
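
An accuracy figure of this kind is simply the share of test users
whose predicted city matches their known one. A minimal sketch of
that calculation follows. The function name accuracy and the
example cities are made up for illustration and are not taken from
the team's results.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the known answers."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0  # nothing to score
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up example: two of the three guesses are right.
predicted = ["Boston", "Austin", "Miami"]
actual = ["Boston", "Austin", "Denver"]
score = accuracy(predicted, actual)
```

Dropping users who were travelling, as the team did, would
correspond to filtering both lists before scoring them.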

The team thinks that it could improve upon the granularity of
the results to locate people within a particular neighbourhood, but
this will require the incorporation of more knowledge into the
prediction models, for example by adding a landmark database to the
mix. The plan is to improve the model to the point that it could be
used in streaming analytics applications. It could be useful, for
example, in disaster relief.