Taking Sentiment Analysis to Dancing With the Stars and American Idol

Kate Gosselin’s still in. Big Mike was out – at least for a minute. Reality Buzz Kapow could have told you so.

Welcome to sentiment analytics for Dancing With the Stars and American Idol (or DWTS and Idol, as they’re tweeted), courtesy of Kapow Technologies. Kapow’s core business is the automated collection of structured and unstructured web data to improve predictive analytics and business decision-making. For this project, though, an effort to generate some buzz for its own technology by applying it to TV shows that some 50 million Americans are addicted to, Kapow is using home-grown technology to ferret out positive, negative, and neutral sentiment in the data it harvests from Twitter, Facebook, blogs, and discussion forums. On top of that, it has layered more complex algorithms of its own that try to gauge how strong the sentiment is and what that might actually mean for the results.

It’s trickier than you might think, says Kapow’s Rick Kawamura, who is spearheading the effort. Kapow’s heritage is in robots that let companies automatically gather data for analyzing everything from markets to competitors to pricing to customer opinions. That data may come from web services like Twitter or Facebook, where much or all of it is publicly available, or from harder-to-reach sources, such as sites whose Flash-based maps don’t render in HTML, or information behind password-protected accounts that would otherwise require API access and custom code. Given that background, extracting the social media commentary around these two shows is easy for Kapow’s technology (though of course it can’t mine private updates on services like Facebook). Kawamura says it takes only about an hour to get a robot up and running to monitor new blogs or other sites that reflect our national pastime of voting on dancers and singers. A search for DWTS or Idol, for instance, can be paired with “if, and, or” terms, as in DWTS and Kate. Once the matching information is in Kapow’s database, Reality Buzz can apply semantic analysis of how keywords relate to one another, looking for the positive, negative, or neutral nuances around them.
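The article doesn’t describe Kapow’s actual query syntax, so what follows is only a minimal Python sketch, assuming a simple keyword match. The hypothetical `matches` function keeps a post if it contains every required term (the “and” case) and, when optional terms are given, at least one of them (the “or” case). Splitting on word characters means “#DWTS” and “DWTS” count as the same keyword; the sample posts are invented.

```python
import re

def keywords(text: str) -> set[str]:
    # Lowercased word tokens; "#DWTS" and "DWTS" both become "dwts".
    return set(re.findall(r"\w+", text.lower()))

def matches(text: str, all_of=(), any_of=()) -> bool:
    """Hypothetical AND/OR filter: every term in all_of must appear,
    and, if any_of is non-empty, at least one of its terms must appear."""
    words = keywords(text)
    if not all(term.lower() in words for term in all_of):
        return False
    return not any_of or any(term.lower() in words for term in any_of)

# Invented sample posts for illustration.
posts = [
    "Kate looked a lot better on #DWTS tonight",
    "Can't wait for Idol tomorrow",
    "Kate's footwork on DWTS... still rough",
]

# "DWTS and Kate": both terms required.
kate_on_dwts = [p for p in posts if matches(p, all_of=["DWTS", "Kate"])]

# "DWTS or Idol": either show.
either_show = [p for p in posts if matches(p, any_of=["DWTS", "Idol"])]
```

Here `kate_on_dwts` keeps the first and third posts, while `either_show` keeps all three. A production robot would also need to handle nicknames, misspellings, and phrases, which plain token matching ignores.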

“The easy part is to immediately filter in the positive, negative, and neutral and graph who has more positive vs. negative comments,” Kawamura says. But you can’t come to any solid conclusions just on the basis of a graph, and that’s where things get interesting – just as they do in applying sentiment analytics to business concerns. “You have to consider all these other factors, how to weight the positive vs. negative and other complex algorithms to really know that what the data is telling you is in fact accurate,” says Kawamura.
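Kapow’s classifier isn’t described in detail, so this is a deliberately naive Python sketch of the “easy part” Kawamura describes: label each post positive, negative, or neutral using a tiny word list, then tally the labels per contestant so they could be graphed. The lexicon, the `label` and `tally` functions, and the sample posts are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon; a real system would use a trained model
# or a far larger, context-aware word list.
POSITIVE = {"great", "amazing", "love", "better", "cute"}
NEGATIVE = {"awful", "rough", "boring", "worst", "hate"}

def label(text: str) -> str:
    """Score = positive words minus negative words; the sign picks the label."""
    words = re.findall(r"\w+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def tally(posts):
    """posts: iterable of (contestant, text) pairs -> {contestant: Counter of labels}."""
    counts = defaultdict(Counter)
    for contestant, text in posts:
        counts[contestant][label(text)] += 1
    return counts

# Invented sample posts for illustration.
posts = [
    ("Kate", "Kate was awful again"),
    ("Kate", "I hate watching Kate"),
    ("Kate", "Kate looked better tonight"),
    ("Tim", "Tim is cute and a great singer"),
    ("Tim", "Tim sang tonight"),
]
counts = tally(posts)
for name, c in counts.items():
    print(name, dict(c))
```

As Kawamura points out, counts like these say nothing about how many votes a fan will actually cast, which is where the harder weighting work begins.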

For instance, does a positive comment mean an online poster thinks someone is pretty good and maybe will dial in once, or that he will vote 100 times for the contestant? “That’s a huge difference,” Kawamura says, and one Kapow is still working to figure out. Or, if someone says these are my top three and these are my bottom three, does that mean she voted for the top three once, many times, or not at all? “We didn’t start off with the idea to build all those algorithms but as we got into this we saw it was harder than we thought. We thought it would be very clear cut as to who gets eliminated.”

You also may have to account for factors like whether a particular contestant’s fan base is likely to use a particular carrier’s text messaging service. That could make a difference in the number of votes a contestant gets on American Idol, whose text voting is tied to AT&T Mobility subscribers. Or how about whether a contestant’s fan base is well represented on social media like Facebook and Twitter to begin with? By and large, the two shows’ demographics overlap almost exactly with those of Twitter and Facebook, Kawamura says, but then again a large senior contingent may have been pulling for astronaut Buzz Aldrin, eliminated from DWTS last week, and those fans probably weren’t using the Internet to talk about him. (Note the link between this and Bradley Honan’s comment about online sentiment analytics being just one arrow in a quiver for business.)

That all brings us back to Kate, who, along with Pamela Anderson, has been a reason for a lot of people to watch DWTS this year. (The DIVORCE! The BANK ACCOUNT SCANDAL! The CUSTODY FIGHT!) Kawamura says recent monitoring showed she was getting about 50 percent of all social media comments, and about 90 percent of those were negative, which might lead you to conclude she was a goner. “But we were pretty confident she wouldn’t go,” says Kawamura. It’s the contestants no one is talking about, such as DWTS’ Jake (aka The Bachelor) Pavelka, All My Children’s Aiden Turner, or Shannen Doherty, whom Kawamura has considered to be in trouble. He got Shannen right and has been predicting Aiden’s demise for a couple of weeks. Not to make excuses for him, but he said he was going to pick Aiden again this week and went with Jake instead, because everyone was telling him not to draw the same conclusion a third week in a row. Of course, it turns out Aiden did dance out this week.

As for Kate, since her fan base is bigger than the other contestants’, it’s possible that the more bashing she gets, the more strongly that base rallies behind her. In that case a negative comment might have an unintended effect; or perhaps the whole value of a negative comment is in question, since you can’t cast a negative vote anyway. It’s enough to make your head spin, never mind your feet. “These models require a lot of hypotheses that we create and try to determine what is the most accurate prediction,” Kawamura says. “It’s been fun and we’ve learned a lot about analyzing data ourselves, and what is required through processes and tracking to be as accurate as possible.”

Another important factor in weighting sentiment is when it’s expressed. Reality Buzz collects data for the next round over the seven days before the show to get an indication of fan base size, “but if we look at the data between the live performance and elimination show it changes quite a bit based on the actual performance,” Kawamura notes. Big Mike, for instance, was saved by the Idol judges last week after a weaker showing, and this week he had a strong performance. His positive comments went through the roof, Kawamura says, so he expected Big Mike to make it through again, which he did. The upshot is that heavy weight goes on comments posted between those two shows, particularly in the last twelve hours before the elimination show. “Andrew [Garcia], on the other hand, has been up and down, and this week probably he was the worst in the Elvis songs, and I was 100 percent confident he was gone,” Kawamura says. “That was 99 percent data driven but just watching the show I knew, too.”
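The timing rule can be expressed as a weighted sum. This Python sketch assumes each comment has already been labeled +1 (positive), 0 (neutral), or -1 (negative), keeps only comments from the seven days before the performance show through the elimination show, and counts later comments more heavily. The three weight values, the function names, and the timestamps are hypothetical; the article says only that comments between the two shows, and especially in the final twelve hours, carry more weight.

```python
from datetime import datetime, timedelta

# Hypothetical weights; Kapow's actual values are not published.
BASE_WEIGHT = 1.0       # the seven-day window before the performance show
POST_PERFORMANCE = 3.0  # after the live performance
FINAL_HOURS = 5.0       # the last 12 hours before the elimination show

def weight(ts: datetime, performance: datetime, elimination: datetime) -> float:
    """Pick a weight from where the comment falls relative to the two shows."""
    if ts >= elimination - timedelta(hours=12):
        return FINAL_HOURS
    if ts >= performance:
        return POST_PERFORMANCE
    return BASE_WEIGHT

def weighted_net_sentiment(comments, performance, elimination) -> float:
    """comments: (timestamp, polarity) pairs, polarity in {+1, 0, -1}.
    Comments outside [performance - 7 days, elimination) are ignored."""
    window_start = performance - timedelta(days=7)
    total = 0.0
    for ts, polarity in comments:
        if window_start <= ts < elimination:
            total += polarity * weight(ts, performance, elimination)
    return total

# Illustrative timestamps only.
performance = datetime(2024, 1, 2, 20, 0)
elimination = datetime(2024, 1, 3, 21, 0)
comments = [
    (datetime(2023, 12, 20, 9, 0), +1),  # too early: outside the window
    (datetime(2024, 1, 1, 12, 0), +1),   # before the performance: weight 1
    (datetime(2024, 1, 2, 22, 0), +1),   # after the performance: weight 3
    (datetime(2024, 1, 3, 15, 0), -1),   # final 12 hours: weight 5
    (datetime(2024, 1, 3, 18, 0), +1),   # final 12 hours: weight 5
]
score = weighted_net_sentiment(comments, performance, elimination)
```

With these made-up numbers the net score is 1 + 3 - 5 + 5 = 4.0, so a single late positive comment offsets a late negative one, while an early comment barely moves the total.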

Who won’t be going home next week? Kawamura says right now it looks like Kate on DWTS and Tim Urban on Idol still have a good chance to stay. Kate might even be winning over more people since her footwork’s improving, a bit – and Tim already seems to have drawn more fans, after taking some beatings by the judges earlier in the contest. “He’s in the top two or three right now,” Kawamura says. “He’s overcome such opposition and now he’s winning over fans who probably think he’s cute and hey, not a bad singer.”

Putting money on your office’s DWTS or Idol pool? Maybe you should check out Reality Buzz – and, heck, it’s Friday, so feel free to post your own sentiments about the contestants right here, too.

About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.