Day 10: Exploring folklore Thursday using rtweet

by Danielle Navarro, 06 May 2018

I have several colleagues and friends who use social media as a data source, and I’ve always wondered how they get the data and if I could do the same using R. The Twitter API, for instance, allows you some (limited) access to tweets, and in the past I have played around with the twitteR package to set up an R-based Twitter client. The rtweet package seems to be a slightly more recent version of the same thing? So let’s try it out!

First off, I created a twitter app. There are good instructions on how to do this in the package vignette, or alternatively there’s this post that provides a quick setup guide. It did require me to enter my mobile phone number into twitter, and I found that I had to disable callback locking (in the check boxes) in order to get an access token. But after I’d done that, this command worked perfectly:
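As a rough sketch of what a token setup can look like (not necessarily the exact command used here): in the rtweet versions of this period, create_token() takes an app name plus the consumer key and secret from the app page. The function name make_token, the app name, and the environment-variable names below are all placeholders assumed for illustration, and newer rtweet releases have changed their authentication functions, so check the documentation for the version you have installed.

```r
# Sketch only: make_token(), the app name and the environment-variable
# names are placeholders, not values from this post.
make_token <- function(app = "my_rtweet_app") {
  key    <- Sys.getenv("TWITTER_CONSUMER_KEY")
  secret <- Sys.getenv("TWITTER_CONSUMER_SECRET")

  # Fail early with a readable message if the credentials are not set
  if (key == "" || secret == "") {
    stop("Set TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET first")
  }

  # create_token() may open a browser window to authorise the app
  rtweet::create_token(
    app = app,
    consumer_key = key,
    consumer_secret = secret
  )
}

# token <- make_token()
```

Reading the key and secret from environment variables keeps them out of the script itself, which matters if the code ends up in a public post.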

Once you have the token you can save it locally so that you don’t have to keep looking up your consumer key and secret in every session. I didn’t bother because I’m just playing around with this and not likely to use it again in the near future.
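For anyone who does want to keep the token around, one simple option is base R’s saveRDS() and readRDS(). This is a minimal sketch rather than the approach the rtweet vignette recommends; the helper names and the file location are hypothetical.

```r
# Hypothetical helpers: `token` is whatever object the token-creation
# step returned, and the default path is just an example location.
save_token <- function(token, path = "~/.rtweet_token.rds") {
  saveRDS(token, file = path)
  invisible(path)
}

load_token <- function(path = "~/.rtweet_token.rds") {
  if (!file.exists(path)) {
    stop("No saved token found at ", path)
  }
  readRDS(path)
}

# save_token(token)       # once, after creating the token
# token <- load_token()   # in later sessions
```

The saved file contains credentials, so it should live somewhere that is not shared or committed to version control.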

Searching twitter for folklore

Besides #Rstats, my favourite hashtag on twitter is #FolkloreThursday. So I used the search_tweets function to find 1000 tweets that reference the hashtag.

library(rtweet)

# search recent tweets that mention the hashtag
tw <- search_tweets(
  q = "#FolkloreThursday",
  n = 1000
)

The result is a tibble with 1000 rows and 42 columns summarising the tweets:

The media_url variable in the tw tibble contains links to any images included in the tweets. So if I wanted to pull all of the images from the tweets, I could do something like this. First find all of the image URLs:

# flatten the list column of image links and drop the missing values
urls <- unlist(tw$media_url)
urls <- urls[!is.na(urls)]

Next, I created a vector of locations to save the images inside a convenient “tweets” folder.
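One way to sketch this step with base R alone: build each destination from the file name at the end of the URL, then loop over the URLs with download.file(). The helper names image_paths and download_images are made up for illustration, and this is not necessarily the code originally used here.

```r
# Hypothetical helper: map each image URL to a file inside `dir`,
# reusing the file name at the end of the URL.
image_paths <- function(urls, dir = "tweets") {
  file.path(dir, basename(urls))
}

# Hypothetical helper: create the folder if needed and fetch every image.
# mode = "wb" writes binary files correctly on Windows as well.
download_images <- function(urls, dir = "tweets") {
  dir.create(dir, showWarnings = FALSE)
  dest <- image_paths(urls, dir)
  for (i in seq_along(urls)) {
    download.file(urls[i], destfile = dest[i], mode = "wb")
  }
  invisible(dest)
}

# download_images(urls)
```

Because several retweets can point at the same image, passing unique(urls) instead of urls would avoid downloading the same file more than once.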

This produces a bunch of images, some prosaic, some wonderful, but – as is always the case with Folklore Thursday – all kind of interesting. To give the original tweeters credit for their posts, instead of showing the automatically downloaded images, here are two lovely tweets by Alexandra Epps that I discovered by browsing the images…

A puzzle

I feel like I’m still missing something important about how to pull information from the tweets though. One of the more endearing tweets that I found in my data set was this one:

Fretrúnir - FART-RUNES One of the strangest (& most horrible) curses recorded in Icelandic sorcery involves casting a #FartRunes spell. In 1654 one man was burnt at the stake after admitting casting Fretrúnir on a local girl. (Lots more info in comments thread) #FolkloreThursday pic.twitter.com/jjb6ngo0MQ

None of the URL fields refer to the image at all. My first hypothesis was that the retweet (which is what my client actually found) doesn’t record the URL of the image? But that doesn’t make a lot of sense - the tweets for which the tw tibble did record the image URL were retweets too. My next thought was to scan the text of the tweets and compare that to whether there was a URL listed anywhere.

## [1] "Fretrúnir - FART-RUNES\nOne of the strangest (&amp; most horrible) curses recorded in Icelandic sorcery involves casting a #FartRunes spell.\nIn 1654 one man was burnt at the stake after admitting casting Fretrúnir on a local girl. (Lots more info in comments thread)\n#FolkloreThursday https://t.co/jjb6ngo0MQ"

The URL at the end takes you to the original tweet (i.e., the one I embedded above). Presumably if I then used the rtweet client to download the original tweet, it would have the links to the images?
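The text-versus-URL comparison described above could be sketched along these lines. It assumes that links in the tweet text appear as shortened t.co URLs (as in the example) and that media_url is a list column with NA where nothing was recorded; the helper names are hypothetical.

```r
# Pattern for Twitter's shortened links; an assumption about what to look for.
tco_pattern <- "https?://t\\.co/[A-Za-z0-9]+"

# TRUE where the tweet text contains at least one t.co link
text_has_link <- function(text) {
  grepl(tco_pattern, text)
}

# TRUE where an entry of the list column holds at least one non-missing URL
has_media_url <- function(media) {
  vapply(media, function(u) any(!is.na(u)), logical(1), USE.NAMES = FALSE)
}

# Cross-tabulate the two flags to see how often they disagree
compare_links <- function(text, media) {
  table(
    link_in_text   = text_has_link(text),
    media_recorded = has_media_url(media)
  )
}

# compare_links(tw$text, tw$media_url)
```

Tweets that land in the "link in text but no media recorded" cell are the ones worth inspecting by hand, since a t.co link can point to an image, the original tweet, or an unrelated web page.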

This seems to happen quite a bit. One of the tweets in tw is a retweet of this one:

Shānguǐ (山鬼). Literally “mountain ghost.” a jilted lover in the classical Nine Songs, she developed over time into a woman living in the wilderness, married to a red leopard or a tiger. Considered the goddess of Mt. Wū (巫山神女). Painted by Hwa San-chiuen. #FolkloreThursday pic.twitter.com/nh77sOsZi7

So I think there is something going on involving truncation with retweets, but that’s maybe not the whole story? I’m not quite sure if there’s an easy fix: is there some argument I can pass to the Twitter API that would return the image links within the original tweet, or would that require a different query? I have a suspicion that Twitter might not let me pull this information? In any case, I’ve run short on time, so I’ll leave that to another time.

Wrapping up

This was fun! I guess I have a lot to learn about the Twitter API (note to self: here) but it’s definitely enjoyable!

Postscript

After doing a little more reading, it’s worth noting that the Twitter terms of service (as of May 6th!) do allow you to crawl the site, so long as the queries respect the robots.txt file; for creating derivative works (which arguably includes any data set compiled in the manner I did here), you must adhere to the Developer Agreement and Developer Policy. I think that what I’ve posted here is consistent with these - I try to take ToS provisions seriously 😄