eMetrics Washington, D.C. 2010 — Fun with Twitter

I took a run at the #emetrics tweets to see if anything interesting turned up. Rather than jump into Nielsen Buzzmetrics, which was an option, I just took the raw tweets from the event and did some basic slicing and dicing of them.

[Update: I’ve uploaded the raw data, cleaned up a bit and with some date/time parsing work included, in case you’d like to take another run at analyzing the data set. It’s linked here as an Excel 2007 file.]

The Basics of the Analysis

I constrained the analysis to tweets that occurred between October 4, 2010, and October 6, 2010, which were the core days of the conference. While tweets occurred both before and after this date range, these were the days that most attendees were on-site and attending sessions.

To capture the tweets, I set up a Twapper Keeper archive for all tweets that included the #emetrics hashtag. I certainly could have simply set up an RSS feed and used Outlook to capture the tweets, which is what I do for some of our clients, but this seemed like a good opportunity to give Twapper Keeper a try.

The basic stats: 1,041 tweets from 218 different users (not all of these users were in attendance, as this analysis included all retweets, as well as messages to attendees from people who were not there but were attending in spirit).

Twapper Keeper

Twapper Keeper is free, and it’s useful. Timestamps, however, were inconsistently formatted or missing for some of the tweets; I don’t know whether that’s a Twapper Keeper issue, a Twitter API issue, or some combination. The tool does have a nice export function that got the data into comma-delimited format, which is really the main thing I was looking for!
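To give a flavor of the cleanup involved, here is a minimal sketch of normalizing mixed timestamp strings by trying a few plausible formats in turn. The format list and the `parse_timestamp` name are my own assumptions; the actual formats in a Twapper Keeper export may differ.

```python
# Sketch: normalize inconsistently formatted timestamp strings by trying a
# short list of candidate formats. The formats here are assumptions, not the
# exact ones Twapper Keeper emits.
from datetime import datetime

FORMATS = [
    "%a %b %d %H:%M:%S +0000 %Y",  # classic Twitter API style
    "%Y-%m-%d %H:%M:%S",           # already-normalized style
]

def parse_timestamp(raw):
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return None  # missing or unrecognized; left for manual cleanup

# parse_timestamp("Mon Oct 04 11:05:00 +0000 2010") -> datetime(2010, 10, 4, 11, 5)
```

Anything that falls through every format gets flagged as `None` for manual cleanup rather than silently guessed at.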

Twitter Tools Used

Personally, I’ve pretty much settled on HootSuite (both the web site and the Droid app) for following Twitter streams and for tweeting. I was curious what tools the folks tweeting about eMetrics were using. Here’s how it shook out:

So, HootSuite and TweetDeck really dominated.

Most Active Users

On average, each user who tweeted about eMetrics tweeted 4.8 times on the topic. But, this is a little misleading — there were a handful of very prolific users and a pretty long tail when you look at the distribution.

June Li and Michele Hinojosa were by far the most active users tweeting at the conference, directly accounting for 23% of all tweets between the two of them (and another 11% through replies and retweets of their tweets, which isn’t reflected in the chart below; tweet often, tweet with relevancy, and your reach expands!):
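Computing that distribution is straightforward once the tweets are in a flat file. A minimal sketch, assuming a CSV export with a `username` column (the real export headers may be named differently, and `top_user_share` is my own helper):

```python
# Sketch: count tweets per user from a CSV export. Assumes a "username"
# column; the actual Twapper Keeper column names may differ.
import csv
from collections import Counter

def user_tweet_counts(path):
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row["username"] for row in csv.DictReader(f))

def top_user_share(counts, n=2):
    """Fraction of all tweets contributed by the n most active users."""
    ranked = counts.most_common(n)
    return sum(c for _, c in ranked) / sum(counts.values())
```

Sorting the resulting counts makes the long tail obvious at a glance.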

Tweet Volume by Hour

So, what sessions were hot (…among people tweeting)? The following is a breakdown of tweets by hour for each day of the conference:

Interestingly, the biggest spike (11:00 AM on Monday) was not during a keynote. Rather, it was during a set of breakout sessions. From looking at the tweets themselves, these were primarily from the Social Media Metrics Framework Faceoff session that featured John Lovett of Web Analytics Demystified and Seth Duncan of Context Analytics. Of course, given the nature of the session, it makes sense that the most prolific Twitter users attending the conference would be in that session and sharing the information with others on Twitter!
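The bucketing behind a by-hour breakdown like this is simple once the timestamps are parsed. A sketch, assuming each tweet’s timestamp is already a Python `datetime` (my own structure, not the raw export format):

```python
# Sketch: bucket tweets into (weekday, hour) bins for a by-hour volume chart.
# Assumes timestamps have already been parsed into datetime objects.
from collections import Counter
from datetime import datetime

def hourly_volume(timestamps):
    return Counter((ts.strftime("%a"), ts.hour) for ts in timestamps)

# e.g. hourly_volume([datetime(2010, 10, 4, 11, 5), datetime(2010, 10, 4, 11, 40)])
# -> Counter({("Mon", 11): 2})
```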

The 2:00 peak on Monday occurred during the Vendor Line-Up session, which was a rapid-fire and entertaining overview of many of the exhibiting vendors (an Elvis impersonator and a CEO donning a colonial-era wig are going to generate some buzz).

There was quite a fall-off after the first day in overall tweets. Tweeting fatigue? Less compelling content? I don’t know.

Tweet Content

A real challenge in listening to social media is picking up hot topics from unstructured 140-character data. I continue to believe that word clouds hold promise there…although I can’t really justify why a word frequency bar chart wouldn’t do the job just as well.

Below is a word cloud created using Wordle from all 1,041 tweets in this analysis. To prepare the data, I dropped all of the tweets into MS Word and did a handful of search-and-replaces to remove the following words/characters:

#emetrics

data

measure

RT

These words would come through with a very strong signal and drown out potentially more interesting terms. Note: I did not include the username of the person who tweeted, so occurrences of @usernames reflect replies and retweets only.
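The same cleanup can be done in code rather than by hand. A minimal sketch, with my own rough tokenization and a stop list mirroring the removals above:

```python
# Sketch: strip dominant stop words and count the rest, mirroring the manual
# search-and-replace cleanup. The tokenizer here is a rough approximation.
import re
from collections import Counter

STOP_WORDS = {"#emetrics", "data", "measure", "rt"}

def word_frequencies(tweets):
    counts = Counter()
    for text in tweets:
        for word in re.findall(r"[#@]?\w[\w']*", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts
```

A frequency table like this feeds a word cloud just as easily as it feeds the plain bar chart mentioned above.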

Here’s the word cloud:

What jumped out at me was the high occurrence of usernames in this cloud. This appears to be a combination of the volume of tweets from that user (opening up opportunities for replies and retweets) and the “web analytics celebrity” of the user. The Expedia keynote clearly drove some interest, but no vendors generated sufficient buzz to really drive a discussion volume sufficient to bubble up here.

As I promised in my initial write-up from eMetrics, I wasn’t necessarily expecting this analysis to yield great insight. But, it did drive me to some action — I’ve added a few people to the list of people I follow!

LOL. Thanks, John! My ulterior motive is to keep chipping away at techniques for quickly analyzing a channel like Twitter around any topic; this was a convenient, manageable data set to play with. What it highlighted for me was how much elbow grease had to go into the data extraction and cleanup. I didn’t, by any means, have to go through and touch each of the 1,041 tweets, but I did have to write some crazy formulas to help with the aggregation. I’m dangerously close to deciding I need to roll my sleeves up and dive into the Twitter API.

Michele — I’m sure you had some other clever thoughts of how to look at these. I’ve added a link to the data set, which includes the date/time cleanup, at the beginning of this post. I’d love to get other ideas as to how to meaningfully cull through this sort of freeform text without using an expensive tool.

Scott — for free, the tools I use most often are: Facebook Insights; Google Analytics (using it to tag Facebook content where possible, as well as for referral data from social sites); RSS feeds into Outlook (for Twitter searches — it’s a pretty robust way to get a clean archive of tweets…that occur after the feed is set up); TwitterCounter (a little bit — I don’t really trust the historical count of followers, but it’s the best I’ve found); and Wordle (for various flavors of the analysis shown here). I also use MS Word macros for some cleanup of “screen-scraped” Facebook content.

It’s a mish-mash of tools, and there are gaps and automation shortcomings across the board.

I talked to a guy a few months ago who spoke about how he was using R to hook into the Twitter API and do some pretty interesting stuff: both text analysis of the tweets themselves and investigation of the social graphs of the people tweeting about a topic. I can’t get out of my head that he was onto something there: a low-cost, flexible way to pull things from Twitter that none of the current tools do quite well enough (although, of course, the Twitter API imposes its own limitations).

This is a hot topic for my agency (Resource Interactive), as we do a lot of social media strategy work, as well as execution (Facebook pages, Twitter, and mobile as the primary channels) for major consumer brands…and I’m on the hook to meaningfully measure that work. I try to keep one ear to the ground to sniff out smart ideas on that front — “frameworks” are useful and necessary (the Lovett/Owyang paper, for instance), and tactics are where the rubber hits the road, but it seems like there’s less thinking out in the blogosphere as to how to actually tie frameworks to tactics.