Tag Archives: realtime data

Post navigation

New Year’s Eve gives us a sense of closure on the past and an opportunity to make new dreams. With the emergence of social media, we can now see these reflections and resolutions transpire in realtime. As we observed the posts, comments, and tweets related to the New Year, we saw the typical expressions on Facebook and Twitter of best wishes for the coming year and pithy observations about the past year. What we didn’t expect was that users of the two popular social media sites would have different outlooks on the world.

As we enter 2012, Facebook users are more optimistic than Twitter users.

You’re probably wondering how we can say that. Well, we looked at all of the public posts on Facebook and Tweets on Twitter that contained “Happy New Year.” For all of those posts and Tweets, we compared the use of positive words such as “better” and “good” to the use of negative words such as “worse” and “bad.” We found that Tweets with positive words appeared 8 times more frequently than Tweets with negative words. You might be thinking a ratio of 8 to 1 is pretty optimistic…

It may be, but posts on Facebook had a ratio of 40 to 1–such a huge difference lead us to speculate that Facebook is a more optimistic place than Twitter.

Interesting stuff. Could be a variety of reasons for the difference, from the mix of users on each service to the fact that Facebook is used to communicate with friends, while Twitter is user to broadcast to followers. We’ll leave the speculation up to you.

Gnip’s asset and investment management clients are consistently impressed by two aspects of our social data that differentiate this data from their other sources: Speed & Amplification.

Speed

Speed relates to the ability of social media content to be ‘instant’; an ability fueled by millions of global users who can break news and sentiment more immediately than traditional media sources always can.

A prime example is news of the death of Osama Bin Laden. Keith Urbahn, the former chief of staff for Don Rumsefeld, is widely credited with the breaking that story… through Twitter!

After Keith’s tweet, multiple retweets quickly followed. Within 19 tweets on this subject, a company called DataMinr had identified this as an important and breaking story. DataMinr, a “global sensor network for emerging events and consumer signals,” then issued a signal to their clients, alerting them to this important piece of information.

How does this play into the ‘speed’ characteristic? Because it would be over 20 minutes before that story appeared on traditional news sites. Access to a data stream that can beat traditional media sources by over 20 minutes requires no explanation as to its value for traders and investors.

Amplification

Amplification speaks to the ability of social media as a ‘crowd-sourced megaphone.’ The propensity of users to like, share, and retweet content from other users gives those consuming social media data an extremely easy mechanism to measure what content is most important to the world – and compare that content against other content in real time.

A prime example is the passing of Steve Jobs. We wrote about Steve Jobs’ passing a few weeks ago – that post is here – but there’s an important item to revisit:

The impact he had on us made his death that much more profound and the reaction on Twitter was immediate and immense. Word spread rapidly, peaking at 50,000 Tweets per minute within 30 minutes. At that point, Tweets about Jobs accounted for almost 25% of all Tweets being sent globally.

Access to Gnip’s social media data stream allowed our clients to measure, in the moment, the amplification of this story to measure the importance the world placed on this piece of news. While I doubt any of us needed to see those numbers to know Steve’s passing was an important piece of news, that’s a clear example of how ‘amplification’ works.

Our clients use amplification as a measure to weigh the importance of breaking news, upcoming events, market and product announcements, etc. against other stories. By capturing a realtime snapshot of what the market considers important – and what it doesn’t – they’re able to add an important factor to their existing algorithms.

None of this is to suggest that either social media data speed or amplification should be a sole factor in investing. But when the Gnip social media data stream provides clients with an additional factor to help understand or predict market fluctuations, the value is obvious.

Gnip is excited to announce the addition of Google+ to its repertoire of social data sources. Built on top of the Google+ Search API, Gnip’s stream allows its customers to consume realtime social media data from Google’s fast-growing social networking service. Using Gnip’s stream, customers can poll Google+ for public posts and comments matching the terms and phrases relevant to their business and client needs.

By working with Gnip along with a stream of Google+ data (and the availability of an abundance of other social data sources), you’ll have access to a normalized data format, unwound URLs, and data deduplication. Existing Gnip customers can seamlessly add Google+ to their Gnip Data Collectors (all you need is a Google API Key). New to Gnip? Let us help you design the right solution for your social data needs, contact sales@gnip.com.

As we wrote in our last post, Gnip co-sponsored the 2011 Dreamforce Hackathon, where teams of developers from all over the world competed for the top three overall cash prizes as well as prizes in multiple categories. Our very own Rob Johnson (@robjohnson), VP of Product and Strategy, helped judge the entries, selecting the Enterprise Mood Monitor as winner of the Gnip category.

The Enterprise Mood Monitor pulls in data from a variety of social media sources, including the Gnip API, to provide realtime and historical information about the emotional health of the employees. It shows both individual and overall company emotional climate over time and can send SMS messages to a manager in cases when the mood level goes below a threshold. In addition, HR departments can use this data to get insights into employee morale and satisfaction over time, eliminating the need to conduct the standard employee satisfaction surveys. This mood analysis data can also be correlated with business metrics such as Sales and Support KPIs to identify drivers of business performance.

Pretty cool stuff.

The three developers (Shamil Arsunukayev , Ivan Melnikov and Gaziz Tazhenov) from Comity Designs behind this idea set out to create a cloud app for the social enterprise built on one of Salesforce’s platforms. They spent two days brainstorming the possibilities before diving into two days of rigorous coding. The result was the Enterprise Mood Monitor, built on the Force.com platform using Apex, Visualforce, and the following technologies: Facebook API (Graph API), Twitter API, Twitter Sentiment API, LinkedIn API, Gnip API, Twilio, Chatter, Google Visualization API. The team entered their Enterprise Mood Monitor into the Twilio and Gnip categories. We would like to congratulate the guys on their “double-dip” win as they took third place overall and won the Gnip category prize!

Have fun and creative way you’ve used data from Gnip? Drop us an email or give us a call at 888.777.7405 and you could be featured in our next blog.

While Steve Jobs’ resignation yesterday had investors anxiously watching how $AAPL fared in trading, we at Gnip were having fun watching a different ticker- the realtime Twitter feed.

As you can see from the graph below (these represent the number of “Steve Jobs” mentions per minute*), Twitter showed an incredible spike almost immediately. Apple-specific activity peaked 11 minutes after the news broke, showing how quickly the word spread. Honors for first tweet go to @AronPinson, who must have some blazing fast typing skills.

Once again, it’s incredible to see how social media is quickly becoming a trusted means of accessing and delivering realtime information.

*For more details on how we conducted this search across the millions of real-time tweets we have access to, contact us!

It’s been a volatile time for the markets the last few weeks. Today especially – the Dow closed down 635 points; S&P, -80; NASDAQ, -175. While there’s no shortage of opinions on how/why the market will/will not recover, one thing is for certain – having the right data to make decisions is more important than ever.

Part of the reason for this is that the markets are clamoring for trends – definitive information on stock trends and market sentiment. Which is why it’s exciting to see how our finance clients are using the Gnip realtime social media data feeds. In a time of increased volatility, our hedge fund (and other buy-side) clients are leveraging social media data as a new source of analysis and trend identification. With an ever-growing number of users, and Tweets, per day, Twitter is exploding, and market-leading funds are looking at the data we provide as a way to more accurately tap into the voice of the market. They’re looking at overall trend data from millions of Tweets to predict the sentiment of consumers as well as researching specific securities based on what’s being said about them online.

While the early-adopters of this data have been funds, this type of analysis is available to individuals as well. Check out some start-ups doing incredible things at the intersection of finance and social media:

Centigage provides analytics and intelligence designed to enable financial market participants to use social media in their investment decision-making process

The social ecosystem has become the pulse of the world. From delivering breaking news like the death of Osama Bin Laden before it hit mainstream media to helping President Obama host the first Twitter Town Hall, the realtime social web is flooded with valuable information just waiting to be analyzed and acted upon. With millions of users and billions of social activities passing through the ever-growing realtime social web each day, it is no wonder that companies need to reevaluate their traditional business models to take advantage of this big social data.But with the exponentially ever-growing social web, massive amounts of data are pouring into and out of social media publishers’ websites and APIs every second. In a talk I gave at GlueCon a couple of months ago, I ran down some math to put things into perspective. The numbers are a little dated, but the impact is the same. At that time there were approximately 155,000,000 Tweets per day and the average size of a Tweet was approximately 2,500 Bytes (keep in mind this could include Retweets).

So what does this mean for the data consumers, the companies wanting to reevaluate their traditional business models to take advantage of vast amounts of Twitter data? At Gnip we’ve learned that some of the collective industry data processing tools simply don’t work at this scale: out-of-the-box HTTP servers/configs aren’t sufficient to move the data, out-of-the-box config’d TCP stacks can’t deliver this much data, and consumption via typical synchronous GET request handling isn’t applicable. So we’ve built our own proprietary data handling mechanisms to capture and process mass amounts of realtime social data for our clients.

Twitter is just one example. We’re seeing more activity on today’s popular social media platforms and a simultaneous increase in the number of popular social media platforms. We’re dedicated to seamless social data delivery to our enterprise customer base and we’re looking forward to the next data processing challenge.

The Twitter Streaming API is designed to deliver limited volumes of data via two main types of realtime data streams: sampled streams and filtered streams. Many users like to use the Streaming API because the streaming nature of the data delivery means that the data is delivered closer to realtime than it is from the Search API (which I wrote about last week). But the Streaming API wasn’t designed to deliver full coverage results and so has some key limitations for enterprise customers. Let’s review the two types of data streams accessible from the Streaming API.The first type of stream is “sampled streams.” Sampled streams deliver a random sampling of Tweets at a statistically valid percentage of the full 100% Firehose. The free access level to the sampled stream is called the “Spritzer” and Twitter has it currently set to approximately 1% of the full 100% Firehose. (You may have also heard of the “Gardenhose,” or a randomly sampled 10% stream. Twitter used to provide some increased access levels to businesses, but announced last November that they’re not granting increased access to any new companies and gradually transitioning their current Gardenhose-level customers to Spritzer or to commercial agreements with resyndication partners like Gnip.)

The second type of data stream is “filtered streams.” Filtered streams deliver all the Tweets that match a filter you select (eg. keywords, usernames, or geographical boundaries). This can be very useful for developers or businesses that need limited access to specific Tweets.

Because the Streaming API is not designed for enterprise access, however, Twitter imposes some restrictions on its filtered streams that are important to understand. First, the volume of Tweets accessible through these streams is limited so that it will never exceed a certain percentage of the full Firehose. (This percentage is not publicly shared by Twitter.) As a result, only low-volume queries can reliably be accommodated. Second, Twitter imposes a query limit: currently, users can query for a maximum of 400 keywords and only a limited number of usernames. This is a significant challenge for many businesses. Third, Boolean operators are not supported by the Streaming API like they are by the Search API (and by Gnip’s API). And finally, there is no guarantee that Twitter’s access levels will remain unchanged in the future. Enterprises that need guaranteed access to data over time should understand that building a business on any free, public APIs can be risky.

—

The Search API and Streaming API are great ways to gather a sampling of social media data from Twitter. We’re clearly fans over here at Gnip; we actually offer Search API access through our Enterprise Data Collector. And here’s one more cool benefit of using Twitter’s free public APIs: those APIs don’t prohibit display of the Tweets you receive to the general public like premium Twitter feeds from Gnip and other resyndication partners do.

But whether you’re using the Search API or the Streaming API, keep in mind that those feeds simply aren’t designed for enterprise access. And as a result, you’re using the same data sets available to anyone with a computer, your coverage is unlikely to be complete, and Twitter reserves the right change the data accessibility or Terms of Use for those APIs at any time.

If your business dictates a need for full coverage data, more complex queries, an agreement that ensures continued access to data over time, or enterprise-level customer support, then we recommend getting in touch with a premium social media data provider like Gnip. Our complementary premium Twitter products include Power Track for data filtered by keyword or other parameters, and Decahose and Halfhose for randomly sampled data streams (10% and 50%, respectively). If you’d like to learn more, we’d love to hear from you at sales@gnip.com or 888.777.7405.

The Twitter Search API can theoretically provide full coverage of ongoing streams of Tweets. That means it can, in theory, deliver 100% of Tweets that match the search terms you specify almost in realtime. But in reality, the Search API is not intended and does not fully support the repeated constant searches that would be required to deliver 100% coverage.Twitter has indicated that the Search API is primarily intended to help end users surface interesting and relevant Tweets that are happening now. Since the Search API is a polling-based API, the rate limits that Twitter has in place impact the ability to get full coverage streams for monitoring and analytics use cases. To get data from the Search API, your system may repeatedly ask Twitter’s servers for the most recent results that match one of your search queries. On each request, Twitter returns a limited number of results to the request (for example “latest 100 Tweets”). If there have been more than 100 Tweets created about a search query since the last time you sent the request, some of the matching Tweets will be lost.

So . . . can you just make requests for results more frequently? Well, yes, you can, but the total number or requests you’re allowed to make per unit time is constrained by Twitter’s rate limits. Some queries are so popular (hello “Justin Bieber”) that it can be impossible to make enough requests to Twitter for that query alone to keep up with this stream. And this is only the beginning of the problem as no monitoring or analytics vendor is interested in just one term; many have hundreds to thousands of brands or products to monitor.

Let’s consider a couple examples to clarify. First, say you want all Tweets mentioning “Coca Cola” and only that one term. There might be fewer than 100 matching Tweets per second usually — but if there’s a spike (say that term becomes a trending topic after a Super Bowl commercial), then there will likely be more than 100 per second. If because of Twitter’s rate limits, you’re only allowed to send one request per second, you will have missed some of the Tweets generated at the most critical moment of all.

Now, let’s be realistic: you’re probably not tracking just one term. Most of our customers are interested in tracking somewhere between dozens and hundreds of thousands of terms. If you add 999 more terms to your list, then you’ll only be checking for Tweets matching “Coca Cola” once every 1,000 seconds. And in 1,000 seconds, there could easily be more than 100 Tweets mentioning your keyword, even on an average day. (Keep in mind that there are over a billion Tweets per week nowadays.) So, in this scenario, you could easily miss Tweets if you’re using the Twitter Search API. It’s also worth bearing in mind that the Tweets you do receive won’t arrive in realtime because you’re only querying for the Tweets every 1,000 seconds.

Because of these issues related to the monitoring use cases, data collection strategies relying exclusively on the Search API will frequently deliver poor coverage of Twitter data. Also, be forewarned, if you are working with a monitoring or analytics vendor who claims full Twitter coverage but is using the Search API exclusively, you’re being misled.

Although coverage is not complete, one great thing about the Twitter Search API is the complex operator capabilities it supports, such as Boolean queries and geo filtering. Although the coverage is limited, some people opt to use the Search API to collect a sampling of Tweets that match their search terms because it supports Boolean operators and geo parameters. Because these filtering features have been so well liked, Gnip has replicated many of them in our own premium Twitter API (made even more powerful by the full coverage and unique data enrichments we offer).

So, to recap, the Twitter Search API offers great operator support but you should know that you’ll generally only see a portion of the total Tweets that match your keywords and your data might arrive with some delay. To simplify access to the Twitter Search API, consider trying out Gnip’s Enterprise Data Collector; our “Keyword Notices” feed retrieves, normalizes, and deduplicates data delivered through the Search API. We can also stream it to you so you don’t have to poll for your results. (“Gnip” reverses the “ping,” get it?)

But the only way to ensure you receive full coverage of Tweets that match your filtering criteria is to work with a premium data provider (like us! blush…) for full coverage Twitter firehose filtering. (See our Power Track feed if you’d like for more info on that.)

You may find yourself wondering . . . “What’s the best way to access the Twitter data I need?” Well the answer depends on the type and amount of data you are trying to access. Given that there are multiple options, we have designed a three part series of blog posts that explain the differences between the coverage the general public can access and the coverage available through Twitter’s resyndication agreement with Gnip. Let’s dive in . ..

Understanding Twitter’s Public APIs . . . You Mean There is More than One?

In fact, there are three Twitter APIs: the REST API, the Streaming API, and the Search API. Within the world of social media monitoring and social media analytics, we need to focus primarily on the latter two.

Search API - The Twitter Search API is a dedicated API for running searches against the index of recent Tweets

Whether you get your Twitter data from the Search API, the Streaming API, or through Gnip, only public statuses are available (and NOT protected Tweets). Additionally, before Tweets are made available to both of these APIs and Gnip, Twitter applies a quality filter to weed out spam.

So now that you have a general understanding of Twitter’s APIs . . . stay tuned for Part 2, where we will take a deeper dive into understanding Twitter’s Search API, coming next week…