Anatomy of a Tweet

Are you interested in capturing social trends over Twitter? So are we! To give you a head start, in this post, we’ll look at how Twitter handles its own data.

Twitter uses the JSON format to store data and organizes it into meaningful fields. Attached to every tweet is its metadata, that is, information about the tweet, such as who tweeted it and when. In fact, the 140 characters are just the tip of the iceberg. By capturing this metadata, you can gain additional insight, for example whether a tweet was sent from a smartphone, and sometimes even the user’s location. This and other associated information can support your analysis significantly.

In what follows, we’ll use one of my tweets as an example, and take a brief tour of the tweet metadata.

Sample tweet

When you fetch this tweet through the Twitter API, the data you’ll get will be something like this:

Anatomy of a tweet

Yes, Twitter uses a lot of fields! I’ve tried to group them together based on similarity. All right, let’s go through some of the major fields that can be useful in your research.

text: This field actually stores the tweet, so you definitely want to grab it.

'text': '@VasuNadella , see you tomorrow! #macromeasures',

id: This is a unique number that serves as an identifier. Keep in mind that JavaScript will choke on such a large number: it stores every number as a 64-bit float, so integers above 2^53 can silently lose precision. So, if you’re getting rounding or overflow problems, use the alternative ‘id_str’ field, which gives you the same thing, but as a string.

Twitter doesn’t approve of publishing people’s tweets outside the platform, but one way to identify them (say, for the sake of reproducibility) is to give the tweet id.

'id': 524019628847419392L,
'id_str': '524019628847419392',
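
To see why ‘id_str’ is the safer key, here is a minimal Python sketch that imitates what a 64-bit float does to tweet-sized integers. The helper name survives_float is made up for this illustration. Python integers themselves are exact, so the float round trip only stands in for an environment such as JavaScript.

```python
# The id from the sample tweet, in both forms the API provides.
tweet = {"id": 524019628847419392, "id_str": "524019628847419392"}

# Largest integer a 64-bit float can hold without gaps (2**53 - 1).
MAX_SAFE_INTEGER = 2**53 - 1


def survives_float(n):
    """Return True if n is unchanged after a round trip through a float.

    Hypothetical helper: it mimics storing the id as a JavaScript number.
    """
    return int(float(n)) == n


print(tweet["id"] > MAX_SAFE_INTEGER)       # True: beyond the exact range
print(survives_float(tweet["id"]))          # True: this id happens to survive
print(survives_float(tweet["id"] + 1))      # False: its neighbour does not
print(int(tweet["id_str"]) == tweet["id"])  # True: the string is always exact
```

At this magnitude, floats can only represent every 64th integer, so whether a given id survives is a matter of luck. Keying your records on ‘id_str’ sidesteps the problem entirely.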

created_at: Not surprisingly, this is the timestamp at which the tweet was created. By the way, Twitter stores all the timestamps in UTC and not in your local time.

'created_at': 'Mon Oct 20 02:09:49 +0000 2014',
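
If you want to work with this timestamp rather than just store it, you will need to parse it. The sketch below uses Python's standard datetime module. The format string is my reading of the sample above, and the fixed UTC-4 offset (Montreal's daylight time on that date) is an assumption made only for the example.

```python
from datetime import datetime, timedelta, timezone

# Format matching the sample 'created_at' value shown above.
# Note: %a and %b expect English day and month names (the default C locale).
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Parse a 'created_at' string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


created = parse_created_at("Mon Oct 20 02:09:49 +0000 2014")

# Assumed fixed offset for the example; use a real time zone database
# if you need daylight-saving rules handled for you.
eastern_daylight = timezone(timedelta(hours=-4))
local = created.astimezone(eastern_daylight)

print(created.isoformat())  # 2014-10-20T02:09:49+00:00
print(local.isoformat())    # 2014-10-19T22:09:49-04:00
```

Notice that the tweet falls on a different calendar day in local time, which matters if you are counting tweets per day.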

entities: Twitter stores certain important features of a tweet separately to save you from having to extract them yourself. For example, hashtags, user mentions, URLs and symbols like the $ sign can be conveniently found in the ‘entities’ field. You’ll probably find yourself using hashtags and other entities often, so getting them from ‘entities’ rather than the text saves a lot of time.
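
As a rough illustration, here is how you might pull hashtags and mentions out of ‘entities’ in Python. The sample dictionary is hand-built from the tweet text above, not copied from the real API response, which carries extra keys for each entity. The helper entity_values is a made-up name.

```python
def entity_values(tweet, kind, key):
    """Collect one key from every entity of the given kind.

    Hypothetical helper; returns an empty list if the tweet has no
    'entities' field or no entities of that kind.
    """
    entities = tweet.get("entities") or {}
    return [item[key] for item in entities.get(kind, [])]


# Hand-built approximation of the sample tweet (an assumption, trimmed
# down to the keys this example needs).
sample = {
    "text": "@VasuNadella , see you tomorrow! #macromeasures",
    "entities": {
        "hashtags": [{"text": "macromeasures"}],
        "user_mentions": [{"screen_name": "VasuNadella"}],
        "urls": [],
        "symbols": [],
    },
}

hashtags = entity_values(sample, "hashtags", "text")
mentions = entity_values(sample, "user_mentions", "screen_name")

print(hashtags)  # ['macromeasures']
print(mentions)  # ['VasuNadella']
```

Compared with running regular expressions over the text, this avoids edge cases such as punctuation stuck to a hashtag.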

Being able to find the geographical location of a tweet can be key for some studies, since it lets you analyze local trends. Twitter saves this information in several fields.

geo: This contains the latitude and longitude pinpointing where the tweet came from. But the field has now been deprecated and replaced with ‘coordinates’.

'geo': None,

coordinates: This field contains the same information as ‘geo’ but in GeoJSON format, which lists longitude before latitude. Unfortunately, this doesn’t show up in my tweet because I used a desktop machine. In fact, less than 1% of tweets have proper location information, which limits their usefulness…

'coordinates': None,
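
Since most tweets carry no location at all, any code that reads these fields has to cope with None. Below is a small Python sketch of one way to do that. The function name lat_lon and the Montreal-area point are invented for the example. It relies on GeoJSON ordering a point as longitude, latitude, while the older ‘geo’ field uses latitude, longitude.

```python
def lat_lon(tweet):
    """Return (latitude, longitude) for a tweet, or None if it has no point.

    Hypothetical helper: prefers 'coordinates' and falls back to the
    deprecated 'geo' field.
    """
    point = tweet.get("coordinates")
    if point and point.get("type") == "Point":
        lon, lat = point["coordinates"]  # GeoJSON order: longitude first
        return (lat, lon)

    legacy = tweet.get("geo")
    if legacy and legacy.get("coordinates"):
        lat, lon = legacy["coordinates"]  # 'geo' order: latitude first
        return (lat, lon)

    return None


# The sample tweet from this post: no location information.
sample = {"geo": None, "coordinates": None}

# A made-up geotagged tweet, for illustration only.
geotagged = {
    "coordinates": {"type": "Point", "coordinates": [-73.5673, 45.5017]},
}

print(lat_lon(sample))     # None
print(lat_lon(geotagged))  # (45.5017, -73.5673)
```

Returning a single, consistently ordered pair makes it harder to swap latitude and longitude by accident later in your analysis.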

place: Twitter has a database with place names and the associated coordinates. When you add a location to your tweet, Twitter retrieves this information using a place_id and tags it with the tweet.

Alright, now let us have a look at some of the user-level information in the metadata.

lang: This user-level ‘lang’ is different from the tweet-level ‘lang’ field, which holds the language detected for the tweet text itself. It stores the language in which a user views the Twitter website.

location: This is the location that a user provides in their profile. We all know how inaccurate that can be, with blanks and silliness. It is still worth looking at, as you might find more information here than in the geo fields alone.

time_zone: One of 141 local time zones that a user has in their profile. While not giving you an accurate geographical location, time zones can help you to restrict your analysis to a certain region.

'lang': 'en',
'location': 'Montreal',
'time_zone': None,

Of course, you’ll also find other basic information about the user, like their follower count, following count, user name, user id, user screen name, etc.
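
To give an idea of how these user-level fields might be collected for analysis, here is a short Python sketch. The function user_summary and its output column names are my own invention. The input keys are the ones I'd expect under the tweet's ‘user’ object, so check them against your own data. The sample contains only the three values shown above.

```python
def user_summary(tweet):
    """Flatten a few user-level fields into one record.

    Hypothetical helper: missing fields come back as None, and a blank
    profile location is treated as missing.
    """
    user = tweet.get("user") or {}
    location = (user.get("location") or "").strip()
    return {
        "user_id": user.get("id_str"),
        "screen_name": user.get("screen_name"),
        "followers": user.get("followers_count"),
        "following": user.get("friends_count"),
        "lang": user.get("lang"),
        "location": location or None,
        "time_zone": user.get("time_zone"),
    }


# Only the user fields shown in this post; everything else is left out.
sample = {"user": {"lang": "en", "location": "Montreal", "time_zone": None}}

summary = user_summary(sample)
print(summary["lang"], summary["location"], summary["time_zone"])
```

Using .get throughout means a tweet with a sparse profile produces a row of None values instead of raising an error halfway through a large collection.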

That concludes our tour of tweet metadata! Now that we’ve seen some of the most important fields, you should be ready to tackle challenges with the help of Twitter! Twitter uses more than 30 fields and the metadata structure is evolving, so keep an eye out for new fields. And do remember to keep all the data anonymized and to work within ethical boundaries.

For more information, you can visit the Twitter documentation of the metadata here. Happy mining!