Nutritional Science

An analysis of Prop 37 (GMO) Twitter activism

The election this November had an important topic on California ballots: Proposition 37, which would have required the labeling of foods containing genetically modified ingredients. It ended up failing by a margin of just under 3%. While I didn’t support the initiative (though I don’t live in California), it was fascinating to observe the campaigns and activists on both sides on Twitter before and after the election. This was an opportune time to get a glimpse of GMO activism, so I collected tweets for some basic analysis.

Methods

The tweets were collected and written to a CSV file by some Python code that I wrote, using Twitter’s streaming API. The hashtags/keywords followed were: #prop37, #noon37, #yeson37, GMO, prop 37, #labelgmo, and #righttoknow. For each tweet I collected the text, ID, datestamp, username, whether it was a retweet, the user’s bio, profile link, follower/following/listed counts, total tweet count, and location if listed in the profile. The total dataset consisted of 253,861 tweets from October 25 to December 6. It is uploaded here if others want to use it. I decided to limit the analyses to tweets containing prop37, #prop37, #yeson37, or #noon37. This yielded 55,537 tweets that I played around with in R.
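For anyone wanting to reproduce the filtering step on the CSV, here is a minimal sketch in Python. The column names (`tweet`, `username`) and the tiny in-memory sample are assumptions for illustration; the real file's headers may differ, and the actual filtering for this post was done in R.

```python
import csv
import io
import re

# Terms the analysis is limited to; matching is case-insensitive.
# "prop37" also matches "#prop37", so three patterns cover all four terms.
KEEP = re.compile(r"prop37|#yeson37|#noon37", re.IGNORECASE)

def filter_tweets(rows, text_field="tweet"):
    """Return only the rows whose tweet text mentions a Prop 37 term."""
    return [row for row in rows if KEEP.search(row.get(text_field, ""))]

# Tiny in-memory stand-in for the collected CSV (column names are assumed).
sample_csv = io.StringIO(
    "tweet,username\n"
    "Vote #YesOn37 today!,alice\n"
    "GMO corn prices are up,bob\n"
    "Why I oppose Prop37,carol\n"
)
rows = list(csv.DictReader(sample_csv))
kept = filter_tweets(rows)
print([r["username"] for r in kept])
```

Note that a tweet mentioning only "GMO" or "prop 37" (with a space) is dropped here, which matches the narrower list of terms used for the analysis.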

Describing the Tweeters

Here is a graph of the frequency of tweets with these hashtags by date. As you can see, my internet went out several times for significant periods, but I don’t expect the gaps to change the overall picture much.

I also collected the location of each person tweeting, if they set one in their profile. I geocoded these to lat/long coordinates using the Google Maps API and made a couple of plots (retweets included). These are not adjusted for population density, but as expected they still show much of the activity in California:

Here is a density map of the world, using about 60% of the locations (for whatever reason R couldn’t handle more on my computer). Taking the log of the tweet count makes it a little easier to distinguish:

Frequencies

The main reason I collected the data was to see who is driving Prop37 activism on Twitter. So I ran a number of frequencies to describe the population. On average, people tweeting with the prop37 hashtag tweeted almost 3 times, and the person who tweeted most did so 339 times.

| Average | Median | SD | Max |
| --- | --- | --- | --- |
| 2.9 | 1 | 9.8 | 339 |
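A summary like the one above could be computed along these lines. This is a sketch with made-up usernames, not the R code used for the post; it assumes the sample standard deviation (which is what R's `sd()` computes).

```python
from collections import Counter
import statistics

def tweets_per_user_summary(usernames):
    """Summarize how many tweets each user posted.

    `usernames` holds one entry per tweet (the author of that tweet).
    """
    counts = list(Counter(usernames).values())
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        # Sample standard deviation; needs at least two users.
        "sd": statistics.stdev(counts) if len(counts) > 1 else 0.0,
        "max": max(counts),
    }

# Hypothetical data: user "a" tweeted 3 times, "b" once, "c" twice.
usernames = ["a", "a", "a", "b", "c", "c"]
summary = tweets_per_user_summary(usernames)
print(summary)
```

A median of 1 next to a mean near 3, as in the table, is what you would expect when most people tweet once and a handful tweet hundreds of times.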

If we run a frequency of the tweets, we see that most tweets were original or were not retweeted much. One tweet was retweeted 442 times.

| Average | Median | SD | Max |
| --- | --- | --- | --- |
| 1.6 | 1 | 5 | 442 |

Since the two hashtags promoted by the campaigns were #yeson37 and #noon37, I counted the tweets containing each (retweets included). The result was surprisingly one-sided. However, as I show later, the #yeson37 count is artificially inflated by fake accounts.

| #yeson37 | #noon37 |
| --- | --- |
| 31,571 | 524 |

And if you look through the #noon37 tweets, many of them are clearly in favor of Prop 37 and simply add both hashtags. It seems this dataset almost entirely represents people in support of Prop37. To get a more accurate picture, I wrote a script to randomly poll a sample of the people who added #prop37 to their tweets to see which side they supported, but my account was quickly banned by Twitter. I also wanted to poll people from the overall Twitter feed who list California as their location, to see whether this could predict the election results. If anyone knows whether Twitter makes exceptions for things like this, let me know, but I assume not.

So now let’s look at the most frequent Prop 37 tweeters. Here are the top 20 after removing fake accounts (read further to see how I determined this). For the top 5, I went through some of the tweets and picked an example of poor information. This is cherry picking, but the examples are so egregious that they suggest a pattern.

A retweet of the official account: “RT @CARightToKnow: Could GM foods be responsible for record low birth rate in the US? #LabelGMOs #YesOn37 t.co/MWf1HdGD”

Earthnik (233 tweets): This links to a YouTube video that says GM foods are poison and don’t work. The science says otherwise: “Seriously … here’s the actual truth about GMOs t.co/NGPTIVkb #YesOn37 #LabelGMOs”

bookieboo (215 tweets): Many tweets that state GM foods are harmful to health/children. It is clear why she thinks this, though (Jeffrey Smith is about the worst source you can find): “I’m going to be tweeting what Jeffrey Smith says. Go to t.co/FjC3pMQT to see Genetic Roulette #Mamavation #Yeson37”

Due to evidence suggesting that the official campaign promoting Prop 37 purchased followers, I was on alert for other abnormalities. It quickly became obvious that a large number of fake accounts (with zero follower and following counts) were tweeting the same thing over and over on the #yeson37 hashtag and linking to the campaign’s website. So after expanding the links of all tweets, I counted how many contained #yeson37, contained a link to the campaign website (carighttoknow.org), and came from accounts with 0 followers: 10,209 tweets from 965 different accounts! It appears that someone (the campaign denies it) paid for a huge number of fake accounts to tweet the website.

If you look through the tweets, they almost look normal, but the screennames are all names with random numbers at the end, and the tweets are highly repetitive and include various keywords or hashtags that were trending at the time. This is likely for 2 reasons: 1) to get the website out on various popular hashtags to increase awareness, and 2) Twitter doesn’t allow you to tweet the same thing multiple times within a short time period, so adding trending hashtags would slightly change each tweet. Very shady stuff, and all of this is grounds for account suspension.

Annoyingly, it made more work for me. So I removed all tweets from accounts with follower and following counts of 0 (10,527), leaving 22,578. I used these to explore what links were being tweeted and for associations. Here is additional proof: plotting the time of each tweet for each screenname shows that most only tweet a few times and are then suspended. Below that plot is a plot of non-fake accounts.
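The fake-account check described above boils down to three conditions per tweet, followed by a broader zero-followers/zero-following filter. A rough sketch of that logic in Python follows; the field names (`expanded_url`, `followers`, `following`) and the sample rows are hypothetical, and this is not the exact code used for the post.

```python
CAMPAIGN_DOMAIN = "carighttoknow.org"

def is_suspected_fake(row):
    """True if a tweet meets all three criteria from the post:
    has #yeson37, links to the campaign site, and the account has 0 followers."""
    return (
        "#yeson37" in row["tweet"].lower()
        and CAMPAIGN_DOMAIN in row["expanded_url"].lower()
        and int(row["followers"]) == 0
    )

def summarize_suspects(rows):
    """Return (number of suspect tweets, number of distinct suspect accounts)."""
    suspects = [r for r in rows if is_suspected_fake(r)]
    return len(suspects), len({r["username"] for r in suspects})

def drop_zero_zero(rows):
    """Remove tweets from accounts with 0 followers AND 0 following."""
    return [
        r for r in rows
        if not (int(r["followers"]) == 0 and int(r["following"]) == 0)
    ]

# Hypothetical rows standing in for the collected data.
rows = [
    {"username": "jane8841", "tweet": "Learn more #YesOn37 #trending",
     "expanded_url": "http://www.carighttoknow.org/", "followers": "0", "following": "0"},
    {"username": "jane8841", "tweet": "Learn more #YesOn37 #othertrend",
     "expanded_url": "http://www.carighttoknow.org/", "followers": "0", "following": "0"},
    {"username": "mark2207", "tweet": "Learn more #YesOn37 #trending",
     "expanded_url": "http://www.carighttoknow.org/", "followers": "0", "following": "0"},
    {"username": "realperson", "tweet": "Voting #yeson37",
     "expanded_url": "http://www.carighttoknow.org/facts", "followers": "150", "following": "200"},
    {"username": "newbie", "tweet": "What is #prop37?",
     "expanded_url": "", "followers": "0", "following": "12"},
]

n_tweets, n_accounts = summarize_suspects(rows)
cleaned = drop_zero_zero(rows)
print(n_tweets, n_accounts, [r["username"] for r in cleaned])
```

The zero/zero filter is deliberately blunt: it also drops any genuine brand-new account that follows nobody, which is a trade-off worth keeping in mind when reading the later numbers.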

Sources/Links

I wrote some code to expand all links, and running a frequency on them revealed pretty poor top information sources. Here are the top 50. Many suggest health risks from GMOs, ignoring the scientific consensus, or propagate erroneous stories. For the top 15, I added some notes.
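Once links are expanded, ranking the sources is a simple frequency count. The sketch below counts by domain rather than by full URL, which is a simplification of my own for illustration; it skips the expansion step entirely (that needs network requests), and the example URLs are made up apart from the campaign domain.

```python
from collections import Counter
from urllib.parse import urlparse

def domain_counts(urls):
    """Count how often each domain appears among already-expanded links."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com alike
        if host:             # skip strings that are not absolute URLs
            counts[host] += 1
    return counts

# Hypothetical expanded links.
urls = [
    "http://www.carighttoknow.org/facts",
    "http://carighttoknow.org/",
    "https://example.com/story",
    "not a url",
]
counts = domain_counts(urls)
print(counts.most_common(50))
```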

I ran some word associations to see what terms appeared together most frequently. Because I don’t have much RAM and R was struggling, I took a random sample of 5,000 tweets of the 44,561 (full dataset minus fake accounts) for this.
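To make the idea concrete, here is one way to compute a simple association score in Python: for a target word, the share of tweets containing it that also contain each other word. This is only an illustration under that assumption. The analysis for this post was done in R, and text-mining packages may define "association" differently (for example as a correlation), so the numbers are not directly comparable. The example tweets are invented.

```python
import random
import re
from collections import Counter

def tokenize(text):
    """Lowercase a tweet and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def cooccurrence(tweets, target):
    """For each word, the fraction of tweets containing `target`
    that also contain that word (the target itself scores 1.0)."""
    target = target.lower()
    with_target = [tokens for tokens in map(tokenize, tweets) if target in tokens]
    if not with_target:
        return {}
    counts = Counter(word for tokens in with_target for word in tokens)
    return {word: n / len(with_target) for word, n in counts.items()}

# Hypothetical tweets; the real analysis used a random sample of 5,000.
tweets = [
    "GMO labeling now",
    "Label GMO soy",
    "Monsanto spending on #prop37",
    "gmo labeling and monsanto",
]
sample = random.Random(0).sample(tweets, k=min(len(tweets), 5000))
assoc = cooccurrence(sample, "gmo")
top = sorted(assoc.items(), key=lambda kv: kv[1], reverse=True)[:10]
print(top)
```

Sampling before building the word counts keeps memory use down, which is the same reason a 5,000-tweet sample was used here.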

Here are some associations (values are how often the words occur together; multiply by 100 for %):

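“prop37”

| prop37 | california | pass | defeat | fight | labeling | defeated | spending | companies | monsanto |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.00 | 0.12 | 0.12 | 0.10 | 0.10 | 0.09 | 0.08 | 0.08 | 0.07 | 0.07 |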

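“gmo”

| gmo | labeling | labels | demand | parent | infertility | soy | baby | death | monsanto |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.00 | 0.30 | 0.17 | 0.15 | 0.15 | 0.14 | 0.14 | 0.12 | 0.12 | 0.11 |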

So, there is some evidence that people who wrote GMO in their tweets tend to think GMOs are harmful to health, and not surprisingly Monsanto was a popular word. We can dig a bit further:

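“health”

| health | risks | cancer | vitamin | activism | duped | terrorism | harming | cleanse | hazards |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.00 | 0.19 | 0.18 | 0.17 | 0.16 | 0.16 | 0.14 | 0.13 | 0.11 | 0.11 |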

“Terrorism”, really??

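“monsanto”

| monsanto | spending | control | dupont | chemical | dow | fight | banned | india | dangers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.00 | 0.23 | 0.22 | 0.18 | 0.17 | 0.17 | 0.16 | 0.15 | 0.15 | 0.11 |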

Sentiment

I ran each tweet through the sentiment analysis method described here. It scores each tweet by the number of positive words minus the number of negative words that it matches from a list here. The average was just above neutral, largely because about half the tweets were neutral. Interestingly, as the strength of the sentiment increased, the balance shifted toward negative: the ratio of positive to negative tweets fell at each step from ±1 to ±4.
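The scoring idea is easy to sketch: count matches against a positive word list, subtract matches against a negative one, then compare how many tweets land at +k versus -k. The word lists below are tiny hypothetical stand-ins, not the lexicon used for this post, and the example tweets are invented.

```python
import re

# Hypothetical mini-lexicons; a real opinion lexicon has thousands of entries.
POSITIVE = {"good", "great", "safe", "win"}
NEGATIVE = {"bad", "toxic", "danger", "lose"}

def sentiment_score(text):
    """Net score: positive word matches minus negative word matches."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def ratio_at(scores, k):
    """Ratio of tweets scoring +k to tweets scoring -k (None if no -k tweets)."""
    pos, neg = scores.count(k), scores.count(-k)
    return pos / neg if neg else None

tweets = [
    "Great win for safe food",  # three positive matches
    "Toxic and bad",            # two negative matches
    "Vote on Tuesday",          # no matches, neutral
    "Good news",
    "Bad idea",
    "A good day",
]
scores = [sentiment_score(t) for t in tweets]
print(scores, ratio_at(scores, 1))
```

A method like this cannot tell "GMOs are bad" from "Prop 37 is bad", so the score says something about tone, not about which side a tweet is on.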

| Average (SD) | Neutral Tweets | Ratio of +1 to -1 | Ratio of +2 to -2 | Ratio of +3 to -3 | Ratio of +4 to -4 |
| --- | --- | --- | --- | --- | --- |
| 0.16 (1.02) | 26,394 (48%) | 1.91 | 1.34 | 0.94 | 0.54 |

This is easier to visualize:

Conclusions

It is difficult to draw overarching conclusions without more manual classifying of tweets, tweeters, and sources, and those that I identified (the top-ranking of each) may represent just a small portion of the overall Prop 37 activism. Still, they paint a picture of misinformation. Although some in favor of Prop 37 just don’t want corporations controlling the food supply, the top tweeters and sources of information clearly think there are health risks to GM foods. The official “Yes on 37” campaign did nothing, as far as I could tell, to correct this thinking and in fact promoted it at times.

I’m sure that is just the tip of the iceberg of what could be done with this data, and if I come across new ways of digging around I will update. Let me know if you have any ideas. I will also note that Becca Harrison is working on some qualitative analysis of Prop 37 tweets, so those results will be more interesting!

Would it be possible to use Topsy searches as a data source? I wonder if that would allow you to get away with automatic surveys. @StealthMountain has been around for ages without getting banned, so there has to be a way around the rule you ran into.

http://www.nutsci.org Colby

Yeah I don’t understand why some accounts like that can tweet so often @ people and not get banned- maybe it is because of the large follower count?

I could search Topsy but my account was banned because I was sending random people messages and asking for their position on Prop37. They might have reported me as spam but I assume twitter takes into account other factors like how old the account is, follower count, if there are links, if it tweets the same thing over and over, etc.

http://twitter.com/mem_somerville mem_somerville

Although I had heard about the follower-buying in the Romney campaign, I really didn’t expect to see this sort of thing much. But if you are willing to fake followers, what else are you willing to fake? I find it really repulsive.

The good news is that it appears it didn’t matter or have much value if they lost anyway.

Awards/Honors

Some of my posts relating to agriculture are featured at Biofortified.org, an evidence-based group blog that is also indexed by Google News. It has great potential to challenge mainstream ignorance on agriculture.