Category Archives: Social Media Sources

Gnip is one of the world’s largest and most trusted providers of social data. We partnered with Twitter four years ago to make it easier for organizations to realize the benefits of analyzing data across every public Tweet. The results have exceeded our wildest expectations. We have delivered more than 2.3 trillion Tweets to customers in 42 countries who use those Tweets to provide insights to a multitude of industries including business intelligence, marketing, finance, professional services, and public relations.

Today I’m pleased to announce that Twitter has agreed to acquire Gnip! Combining forces with Twitter allows us to go much faster and much deeper. We’ll be able to support a broader set of use cases across a diverse set of users including brands, universities, agencies, and developers big and small. Joining Twitter also provides us access to resources and infrastructure to scale to the next level and offer new products and solutions.

This acquisition signals clear recognition that investments in social data are healthier than ever. Our customers can continue to build and innovate on one of the world’s largest and most trusted providers of social data and the foundation for innovation is now even stronger. We will continue to serve you with the best data products available and will be introducing new offerings with Twitter to better meet your needs and help you continue to deliver truly innovative solutions.

Finally, a huge thank you to the team at Gnip who have poured their hearts and souls into this business over the last 6 years. My thanks to them for all the work they’ve done to get us to this point.

We are excited for this next step and look forward to sharing more with you in the coming months. Stay tuned!

After a couple of exciting years in social finance and some major events, we’re back with an update to our previous paper “Social Media in Markets: The New Frontier”. We’re excited to be able to provide this broad update on a rapidly evolving and increasingly important segment of financial services.

Social media analytics for finance has lagged brand analytics by 3 to 4 years despite the enormous potential for profit through investing based on social insights. Our whitepaper explains why that gap has existed and what has changed in the social media ecosystem that is causing that gap to close. Twitter conversation around tagged equities has grown by more than 500% since 2011. The whitepaper explores what that means for investors.

We examine the finance-specific tools that have emerged and outline a framework for unlocking the value in social data for tools that are yet to be created. Then we provide an overview of changes in academic research, social content, and social analytics for finance providers that will help financial firms figure out how to capitalize on opportunities to generate alpha.

Like a child’s first steps or your first experiment with pop rocks candy, the first ever Tweet went down in the Internet history books eight years ago today. On March 21st, 2006, Jack Dorsey, co-founder of Twitter, published this.

It has become the digital watering hole, the newsroom, the customer service do’s and don’ts, a place to store your witty jargon that would just be weird to say openly at your desk. And then there is that overly happy person you thought couldn’t actually exist, standing in front of you in line, and you just favorited their selfie #blessed. Well, this is awkward.

Just eight months after their release, the company made a sweeping entrance into SXSW 2007, sparking the platform’s usage to balloon from 20,000 to 60,000 Tweets per day. Thus began the era of our public everyday lives being archived in 140-character tidbits. The manual “RT” turned into a click of a button, and favorites became the digital head nod. I see you.

In April 2009, Twitter launched the Trending Topics sidebar, identifying popular current world events and modish hashtags. Verified accounts became available that summer; athletes, actors, and icons alike began to display the “verified account” tag on their Twitter pages. This increasingly became a necessity in recognizing the real Miley Cyrus vs. Justin Bieber. If differences do exist.

The Twitter Firehose launched in March 2010. When Twitter gave Gnip access, a new door opened into the social data industry, and come November, filtered access to social data was born. Twitter turned to Gnip to be their first partner serving the commercial market. By offering complete access to the full firehose of publicly-available Tweets under enterprise terms, this partnership enabled companies to build more advanced analytics solutions with the knowledge that they would have ongoing access to the underlying data. This was a key inflection point in the growth of the social data ecosystem. By April, Gnip played a key role in delivering past and future Twitter data to the Library of Congress for historic preservation in the archives.

July 31, 2010, Twitter hit their 20 billionth Tweet milestone, or as we like to call it, twilestone. It is the platform of hashtags and Retweets, celebrities and nobodies, at-replies, political rants, entertainment 411 and “pics or it didn’t happen.” By June 1st, 2011, Twitter allowed just that as it broke into the photo sharing space, allowing users to upload their photos straight to their personal handle.

One of the most highly requested features was the ability to get historical Tweets. In March 2012, Gnip delivered just that by making every public Tweet available, starting with the very first one, posted on March 21, 2006 by Mr. Dorsey himself.

Fast forward 8 years, and Twitter is reporting over 500 million Tweets per day. That’s more than 25,000 times the 20,000 Tweets per day the platform saw before SXSW 2007! With over 2 billion accounts, over a quarter of the world’s population, Twitter ranks high among the top websites visited every day. Here’s to the times where we write our Twitter handle on our conference name tags instead of our birth names, and prefer to be tweeted at than texted. Voicemails? Ain’t nobody got time for that.

Twitter launched a special surprise for its 8th birthday. Want to check out your first tweet?

Last week we talked about tracking SXSW from 2007 to 2012 using Gnip’s Historical PowerTrack for Twitter. This gave us insight into year-over-year trends in SXSW Tweets and now we’re going to look at how SXSW trends have changed over time.

With every square inch of Austin packed with the social media influential, SXSW provides an interesting avenue to examine trends, big and small, to see what people are talking about on Twitter. Now that companies can use Gnip’s Historical PowerTrack for Twitter to baseline events, it provides a whole other avenue to determine trends.

Party vs. Panel
People have such a love/hate relationship with SXSW. Some people love it for its networking opportunities and great sessions, while other people decry it as one giant party. Letting the data speak for the truth, it seems that in earlier years of the conference, people came for the panels and hopefully to learn something from their peers. But by 2011, Tweets mentioning “party” outnumbered those mentioning “panel” by more than 10,000. People were talking about the best places to meet people rather than the best places to learn. That same year, there were 13,072 mentions of the word RSVP in SXSW Tweets talking about plans to find the best parties and likely indulging in the practice of RSVPing for 136 events and actually attending 12 of those events.

Geo-location Wars
While Twitter is useful for helping understand how cultural events are changing, the use cases extend further into helping understand the rise and fall of startups. With the launch of Foursquare and Gowalla at SXSW in 2009, it was the beginning of the so-called geo-location wars. Many people have wondered how Foursquare ended up the winner, and SXSW provides interesting insight into how Foursquare came out on top. Back in 2009, if you looked at SXSW Tweets, they would tell you it was anyone’s game because, surprisingly, Foursquare received only a little over 100 more Tweets than Gowalla. By 2010, Foursquare had been more clearly marked as the winner, receiving nearly double the Tweets that Gowalla was receiving. At that point, everyone was still writing posts to determine the pros and cons of each service, but the social data was clear: Foursquare had the buzz that year, due in part to their ability to easily publish updates, badges and mayorships on Twitter and perhaps even their rogue game of Foursquare outside the Convention Center. By 2011, Foursquare had completely suckerpunched Gowalla, taking the lion’s share of public voice with nearly 65,000 Tweets to Gowalla’s nearly 8,000. By the end of 2011, Facebook ended up making an acqui-hire of the Gowalla team.

BBQ vs. Tacos
This next trend might seem silly. Who cares if more people are interested in BBQ or Tacos? I mean, what significant impact can this social data have? But if you’re a restaurant chain or looking to start a new franchise chain, it would be interesting to know about cultural food trends, such as the rise of cupcakes, as they are happening.

While many have long suspected that Austin was a BBQ kind of town, the social data has shown that at the last SXSW, Tacos overtook BBQ as the most talked about grub to grab. More data science would have to be done to determine if the Taco is becoming a more widespread cultural trend, but when all other Tweet volumes were falling in 2012, the term Tacos was charging full-steam ahead.
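For readers who want to try a comparison like this themselves, here is a minimal Python sketch of counting term mentions per year. It assumes the Tweets have already been retrieved and reduced to simple (year, text) pairs. The function name `count_terms_by_year` and the sample Tweets are hypothetical illustrations, not Gnip tooling or real SXSW data.

```python
import re
from collections import defaultdict


def count_terms_by_year(tweets, terms):
    """Count, per year, how many Tweets mention each term (whole word, case-insensitive)."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    # Each year starts with a zero count for every tracked term.
    counts = defaultdict(lambda: dict.fromkeys(terms, 0))
    for year, text in tweets:
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[year][term] += 1
    return dict(counts)


# Hypothetical sample data for illustration only.
sample = [
    (2011, "Best BBQ in Austin #sxsw"),
    (2012, "Breakfast tacos before the panel #sxsw"),
    (2012, "Tacos > BBQ, fight me #sxsw"),
]
result = count_terms_by_year(sample, ["bbq", "tacos"])
for year in sorted(result):
    print(year, result[year])
```

A Tweet that mentions a term several times is counted once here, which is one reasonable choice among several; counting every occurrence would give different totals.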

This is just the beginning of what social data can tell companies about trends and market research. We think historical social data will prove invaluable to market research with the sheer volume of conversations that are happening on Twitter.

Last year, Gnip and StockTwits partnered in order to bring this incredible social finance conversation to market – in both real-time and historical products. Given that relationship, we’ve worked closely with StockTwits and have been consistently impressed with how the trading community platform they’ve been building has progressed.

So this last week it was incredibly exciting to attend Stocktoberfest on Coronado Island, and to watch other attendees get a look into just how far this awesome product has come. What really sank in for me personally was the extent to which the platform has evolved and how leveraged it now is.

In order to get a platform to actually work, you need a few things: community critical mass (check!), producers (check!), and consumers (check!). StockTwits has done an incredible job engaging the community directly on-site, and the latest API allows app developers to weave highly focused (by equity or currency) conversations, bi-directionally, into their apps in real-time.

From a community standpoint, check out what StockTwits has pulled together:

The financial charting and equity analysis ecosystem is vast, and because of the work StockTwits has done, that ecosystem can incorporate relevant social conversation directly into platforms and products. By doing so, players in this space can take advantage of the community StockTwits has already built, which lifts their tools, products, and platforms into social. Building community, whether focused or broad, is obviously a hard nut to crack. The more integrations you have, the more the overall community benefits; everybody wins. ChartIQ is one beautiful example of a product leveraging the StockTwits platform that proves this point.

I left Stocktoberfest giddy over where this stream is now. Seeing, and talking to, so many of the StockTwits platform partners was just awesome!

The impact “social” is having across the Finance space is tremendous, as is StockTwits’ role in it. We’re excited for the next few months and what our continuing work with them will bring.

Union Metrics has been with Gnip since the early days, using our social data in their flagship product, TweetReach. Earlier this year, when we announced the availability of social data from Tumblr, we were excited that Union Metrics moved quickly to start building a new product based on that data. Last week, Union Metrics launched Union Metrics for Tumblr and was named Tumblr’s preferred analytics provider.

We are proud to announce that, for the first time, access to the entire historical archive of public Tweets, dating back to @Jack’s very first Tweet more than 7 years ago, is now available via our new product, Historical PowerTrack for Twitter. This product has been years in the making, and we can’t wait to see what the world will build with this data.

We believe that social data has unlimited value and near limitless application. The nature (fast & viral) and newness of social conversations has naturally directed focus to realtime applications. However, as the world becomes more reliant on realtime social data and the amount of social data created grows exponentially, the need to put this information into historical context has become increasingly important. Often, companies are considering the realtime reaction in social data and asking “is this good or bad?” This is one of the main questions historical data can answer. For example, if an auto manufacturer launches a new model and 25% of the social conversation is determined to be negative, is that healthy? Knowing that the last model launched to record sales & had 40% negativity helps put the new realtime data into context.
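To make the auto manufacturer example concrete, here is a minimal Python sketch of putting a realtime negativity figure next to a historical baseline. It assumes each post has already been given a sentiment label by some other process. The function names and the label lists are hypothetical, with the lists built only to match the 25% and 40% figures above.

```python
def negative_share(labels):
    """Return the fraction of posts labeled 'negative'."""
    if not labels:
        raise ValueError("need at least one labeled post")
    return sum(1 for label in labels if label == "negative") / len(labels)


def points_vs_baseline(current_share, baseline_share):
    """Difference in percentage points; a negative value means less negativity than the baseline."""
    return round((current_share - baseline_share) * 100, 1)


# Hypothetical labels matching the example: 25% negative now, 40% at the last launch.
current_launch = ["negative"] * 25 + ["other"] * 75
previous_launch = ["negative"] * 40 + ["other"] * 60

delta = points_vs_baseline(
    negative_share(current_launch), negative_share(previous_launch)
)
print(f"Negativity vs. last launch: {delta:+.1f} points")
```

Read this way, 25% negativity is 15 points below a launch that went on to record sales, which is the kind of context the realtime number alone cannot give.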

Historical data can also be highly informative to predictions about the future. Researchers have suggested to us that they can predict the outcome of a revolution by studying past revolutions online such as the “Arab Spring”. Likewise, we’re seeing hedge funds make a real commitment to incorporating social data into their trading algorithms. It is critical for these funds to be able to refine their predictive trading models by studying vast quantities of historical data.

Until now, all this promise of social data has had a foundational limitation: very little reliable and complete historical data has been available. And as we know, historical analysis is only as good as the quality of the underlying data. You can’t provide complete context if you only have part of the data. That’s why we are so excited to be the first company to offer complete coverage of all public Tweets from the beginning of time.

We’re able to deliver the full historical corpus via our long-standing partnership with Twitter. We helped Twitter deliver the full archive of Tweets to the Library of Congress. That was a massive effort that took a long time. The rest of the social data ecosystem can benefit from that effort starting today.

This level of access has never been available and we know it is really going to accelerate the rate of innovation going forward. We think there are new products and businesses that will now be possible with access to a “social layer” of historical data. We frequently ask ourselves “If you could know what the world was saying at any moment in time about any topic, what could you build?”

We’ve already been working with companies like Esri, Union Metrics, Brandwatch, Waggener Edstrom Worldwide, and Texifter during our early access period and it’s been incredible to see how fast they are innovating with this new data.

Gnip aspires to be the source of record for all public conversation. That’s a lofty goal. We’re taking a major step forward with today’s announcement.

Want to learn more about Historical PowerTrack for Twitter? Email info@gnip.com.

While looking at the speakers for the International Conference on Weblogs and Social Media, the premier academic conference for social media, I stumbled across the research of Lada Adamic. Not only was Lada one of the keynote speakers for the conference, her research at the University of Michigan was just plain awesome. Lada’s research included understanding commonly used ingredient substitutions from the 40,000 recipes in Allrecipes.com, understanding how peers rate each other on Couchsurfing, Facebook memes, and more. You can check out all of her research on ladamic.com, follow her on Twitter at @ladamic and be sure to check out her hilarious blog.

1. Your background focuses on networks and how information spreads. You’ve done multiple projects with different data sources, what are some of the overarching trends you’ve seen?

The only sure thing is the unpredictability of information in a network. Sure, in aggregate some information will go viral, while most will not, but predicting what will go where, that’s not so simple. To complicate matters further, information is not only diffusing, but also evolving, while concurrently spurring changes in the social network itself. One trend I do keep seeing is that social networks’ greatest boosting effect is in the niche. There are lots of ways to find out about something widely popular, but information about that curious interest that you and your friends share — that is more likely to come through your friends.

2. What information do you get from looking at networks vs all the other sources you use?

I think it’s more a question of whether there are any data that I don’t try to represent as networks! All I have to do is identify connections between entities in the data, and presto, I have a network. It’s the structure of these connections that can turn up fascinating results: identifying experts from their online interactions, predicting which recipe is going to be rated more highly, or understanding the structure of federal law from the way it’s strung together.

3. What is useful, difficult and unique about connections found in social data?

Well, you’re dealing with data by and about humans. Humans are difficult. Humans interacting with other humans, that’s complicated… but also highly informative, because a lot of human interaction is about informing one another. And as they inform one another about what’s worthwhile, their location, their mood, etc., that data can be harnessed to detect trends and patterns in human behavior. And perhaps precisely because this data is so rich and powerful, it is important to be mindful of privacy.

4. You were able to determine commonly used ingredient substitutions by looking at 40,000 recipes from Allrecipes.com. How much did the comments in the recipes help determine substitutions and what other insights do you think could be pulled from recipe comments?

In the research paper we relied entirely on the comments in the user-supplied recipe reviews to figure out how often cooks substituted one ingredient for another in a recipe, whether ingredients can be cut or omitted, and, crucially, whether the recipe needs more or less garlic (our data showed, usually, more). Untapped kinds of information included in the reviews include who the recipe was a hit with (the kids, the husband etc.) and vetted improvements, e.g. “I put the dough in the fridge for 2 hours as the other reviewer suggested…”. I think this is a really fun example of harnessing our collective intelligence. Instead of each cook tweaking recipes in their own kitchen and sharing their recommendations with a few friends, now we can gather millions of tweaks and start to understand food and cooking systematically.

5. You’ve used data from a wide variety of sources including Couchsurfing.com, Allrecipes.com, Facebook, etc. What do you look for in a data source?

I’m not too discriminating about data, though sometimes I have a question that only certain data can answer. For example, when my husband and I first started dating, I defended my reluctance to watch Sci-Fi movies by pulling their ratings distribution from the IMDb. On an only slightly more serious note, I turned to online recipes because they comprised lots of data about something that I had no clue about: cooking.

Other times you just know the data is good even if your questions about it are not (yet). Such was the case with the CouchSurfing dataset, which encompassed anonymized user-to-user trust and friendship ratings. The data was so rich, that even our initial stumbling steps led to some interesting results about rating human relationships. But it wasn’t until the 2nd and 3rd paper that we really got a handle on how the visibility of the ratings skews them, and some more fundamental insights about the relationship between friendship and trust that are rendered beautifully evident in such a large data set.

6. What study have you done that has surprised you the most? What projects do you see in the future that you think academia should focus on to better understand social data?

Some nice surprises actually came up as I was gathering data for my statistics class. When the Economist published an article about the U-curve of happiness vs. age, I thought, wait a sec, we see the same curve in CouchSurfing ratings: people in their 30s & 40s rate and are rated less enthusiastically than those either younger or older. Then my statistics class used the American Time Use Survey to see how much sleep people were getting, and it was the same curve. Coincidence? I think not!

Another happiness vs. age trend came up in the Adolescent Health data, also analyzed in my stats class. Teens having sex in 8th and 9th grade were less happy on average than their peers who were abstaining, but by senior year, the relationship was reversed. It goes to show that you never know which underused columns in existing data sets hold fun statistics (we also explored the “cheerleading”, “math team” and “wears braces” columns…).

To answer the second question: researchers have only started to take advantage of the abundance of social data. There are many long-standing questions in sociology that were previously studied in small groups. Now these questions can be tested on very large data, just at the time when we really do need to understand how they pertain to changing social interactions as they shift online. Among the questions I’m personally interested in are how online social networks shape media consumption, and how information evolves in social networks.

I should mention that the crucial bottleneck for academics doing this kind of research is access to the data. GNIP is certainly part of the solution (you guys have academic discounts, right?). To anyone else who has interesting data, please consider sharing it with data-starved academics.

Thanks to Lada for her interview (and yes, we’re looking at partner programs for academic researchers!). If you have any other suggestions for Data Stories, please leave a comment.

Sherry Emery is a Senior Research Scientist at the UIC Institute for Health Research and Policy focusing on understanding how both traditional and new media influence health behavior. Sherry’s research has been focusing on social data and smoking cessation, looking at how people talk about smoking, their behaviors and their reactions to smoking cessation campaigns on social media. Sherry works with Gnip’s client DiscoverText to access the Twitter firehose.

1. You’ve been studying the media and smoking for the past 15 years, what caused you to be interested in social data?

For a long time my research focused on TV advertising, but a few years ago I began to worry that our work was going to be less and less relevant unless we started to understand new media, including social.

2. How has your research with social data compared to previous research among other mediums, especially TV?
Researching social media and using social data is much harder: there’s more of it, and it’s way more complex. In the past, we were just worried about exposure to ads, and the measures were developed and widely accepted decades ago. Now we’re still worried about exposure, but also about searching for information and sharing information on social media, and with social data, it’s still the wild west for measurements. How do you measure exposure, search and exchange across social platforms, and how are these behaviors related to health behavior? In addition, with TV advertising, there was only an anti-tobacco message to measure. With social media, we need to figure out who’s talking about smoking cigarettes, and how to distinguish them from people talking about smoking ribs or smoking hot girls. And then we need to figure out if the information they are promoting/sharing is pro- or anti-smoking.

3. What are insights from your research on smoking cessation and social data?

We’ve learned so much! First, lots of people who are talking about smoking are not talking about cigarettes! One of our biggest challenges has been to refine our key words and develop techniques to code Tweets and other content as tobacco-relevant. Early on in our process, Gnip’s own Charles Ince had the brilliant insight to introduce us to Stu Shulman, who developed DiscoverText, which is an invaluable tool that we rely upon for our data cleaning and analysis process. DiscoverText allows us to sort through and code the millions of tweets that contain some reference to ‘smoking’. Using DiscoverText gives us both the transparency and control that we need to make sure that the tweets we analyze are the tweets that are relevant to our research questions. We can use humans to code for tobacco relevance, and then a Boolean language recognition algorithm in DiscoverText can learn from the human coders and code literally tens of thousands of tweets, actually more accurately than humans could at that scale! As part of this process, we’ve also learned that there are lots and lots of words people use to talk about cigarettes and smoking tobacco. That is an obvious statement, but one that has really important implications for searching for and measuring the content we’re interested in. No matter how thorough, broad and prospective we try to be, we cannot anticipate all the terms and keywords that turn out to be relevant. The ability to go back and look for content once we’ve identified key ideas will be critically important to our work. Now that we’re getting a handle on how to deal with this massive and very complex data, we’re also learning a ton about how people are talking and thinking about smoking. In simplest terms, smoking weed is discussed much more favorably than smoking cigarettes. In the world of media campaign evaluation, we learned that the recent CDC anti-smoking media campaign really struck a chord with people: the effect of the graphic images was broad and deep.
This was an important observation because the graphic approach of these ads was very controversial. By looking at the social media reaction, we could see that they achieved substantial engagement, rather than rejection of their message, which was a concern.

4. People are less likely to be honest about bad habits on surveys. What are some of the advantages and disadvantages of using social data to capture life habits?
Social data reflects such spontaneous and generally unfiltered responses. It’s great to see and analyze what people are saying and claiming as their own. I think that surveys still have their place: there is a lot of individual-level information that is important, and which social data doesn’t reveal well. But it’s now critically important to understand what and how much people are saying, searching for, and passing along on social platforms. These data can give context to traditional survey data and can also guide the development of better, more relevant surveys.

5. Several years ago danah boyd talked about the class divisions between MySpace and Facebook, and how Facebook was for the “good” kids and MySpace was for the burnouts. How do you see the audiences matching up on segments you’re trying to study and the social data sources you’re using?
That’s an interesting question. So far, we’re pretty focused on Twitter data. We’re just beginning to explore Disqus and other social platforms, so I can’t really compare across platforms. We do see that there is particular language/words used on Twitter that seems to characterize different populations such as the slang words for cigarettes. By understanding the slang, we can see regional differences, as well as cultural differences, in attitudes about tobacco.

6. How is the health world starting to use social data and what are some of the struggles they’re seeing?
The health world seems to be just starting to use social data. There’s some super cool work on developing social networks/data to monitor health conditions. I haven’t seen many other projects that are trying to wrangle massive social data similar to what we are doing. It’s hard to get your head around the variety and complexity of the data that is now available. We have been obsessed with data management and measure development. I think that’s the missing link for the public health world, and it’s one of our biggest challenges — translating how these data can answer questions the public health community is interested in.

On June 25th, to promote the year of Oreo’s 100th birthday, Nabisco lent its cookie some currency: The company tweeted the image of a six-layered cookie, with crèmes the color of the rainbow, above a simple caption – “Pride.”

“We feel the Oreo ad is a fun reflection of our values,” a Kraft spokesman later told reporters. The cookie, the company said, illustrated ‘in a fun and playful way’ an issue that was making history.

The image lit up the social web. This post, and two that follow, explore conversations on Tumblr through the lens of Oreo. Part Two looks at how the episode touched other brands on the network. Part Three dives into the dynamics of Tumblr conversations and how they diverge from other platforms.

The image itself touched a nerve. Opponents of marriage equality took to Oreo’s accounts on Facebook and Twitter to slam Nabisco and threaten a boycott.

“[W]onderful job Oreo on supporting equal rights, just for that, now I’ll buy a pack today.”

“I believe I’m going to go buy every package of Oreos I see when I go grocery shopping. Kudos!!”

Within hours, Oreo found itself the subject of some 7,500 tweets. The conversation ramped to midnight EST, when the brand was pulling back some 2,000 tweets per hour. Figure 1 shows hourly Twitter volumes around Oreo between June 18 and July 2.

Tumblr followed on the 26th. In three hours that night, the company drew more than 300 textual posts on the network, double what the brand had done each day the week before.

The talk stayed political: “Way to go Kraft!,” one post read, “However it is also eye-opening to see how many people are proud to show their hate, or belief that all Americans do not deserve equal rights.”

Figure 2 shows hourly Tumblr volumes around Oreo between June 18 and July 2.

By then, the story had spilled. ABC, NBC, Reuters and the Washington Post amplified news of the flap. A conservative family group urged supporters to look elsewhere for cookies. Meanwhile, the image was slowly amassing more than 60,000 Facebook comments and close to 300,000 likes. Two social analytics companies would later call that conversation overwhelmingly positive – for Oreo.

For days on Tumblr, the story echoed. Median hourly Twitter volumes had returned to normal by the fracas’ fourth day. But on Tumblr, a full week after Oreo’s image went live, chatter remained triple the cookie’s prior volume.

In that way, the image marked a breakthrough for Oreo on Tumblr. At peak, the pride cookie generated 2.6 times Oreo’s median Twitter volume from the week prior. For Tumblr, that figure was 19.8. Figure 3 shows the ratio between hourly platform volume around Oreo and typical hourly platform volumes between June 18 and July 2.
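For the curious, a ratio like this can be sketched in a few lines of Python: take the median hourly volume from a baseline period and divide the peak hour of the event by it. This is a minimal sketch under that assumption, not necessarily the exact method behind the figures above. The function name `peak_to_baseline_ratio` and the hourly counts are hypothetical and are not the Oreo data.

```python
from statistics import median


def peak_to_baseline_ratio(baseline_hourly, event_hourly):
    """Peak hourly volume during an event divided by the median hourly volume of a baseline period."""
    typical = median(baseline_hourly)
    if typical == 0:
        raise ValueError("baseline median is zero; ratio is undefined")
    return max(event_hourly) / typical


# Hypothetical hourly post counts for illustration only.
prior_week_hours = [90, 100, 110, 100, 95]   # median is 100
event_hours = [120, 260, 180]                # peak hour is 260

ratio = peak_to_baseline_ratio(prior_week_hours, event_hours)
print(f"Peak volume was {ratio:.1f}x the typical hour")
```

Using the median rather than the mean keeps a single unusually busy hour in the baseline week from distorting what counts as typical.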

Oreo had long been a social brand. Before the pride cookie, it counted 26 million Facebook fans and tens of thousands of Twitter followers. On Tumblr, the cookie already outstripped its rivals. And in a move that may help the company retain that lead, Oreo can rely on oreodailytwist.tumblr.com, the brand’s official Tumblr presence. Its first posted image? June 25 – the pride cookie.

Figure 4 shows Oreo’s Tumblr lead over major cookie brands in the United States between June 18 and July 2.

But Oreo’s Tumblr story rippled beyond the cookie alone. That broadening – a central quality of the Tumblr platform – has implications for brands linked by product, demographic or, in this case, ideology. Return for more in Part Two.