Social media tool aims to help journalists find undiscovered, reliable sources on Twitter

A couple of tweets, discovered well after we all learned that Whitney Houston had died Saturday, illustrate a challenge journalists face in a breaking news situation. How do we find key sources, particularly eyewitnesses, in those first minutes and hours after news breaks?

One of the tweets, posted about a half-hour before the AP confirmed Houston’s death, said that she had died in a Beverly Hills hotel; the other said that a relative, who worked for the singer, had found her dead in a bathtub.

“Nobody seems to have cracked the nut on being able to find that tweet from the person who nobody knew prior to that vital piece of information they posted,” said Reuters Social Media Editor Anthony De Rosa.

“How many times have those people gone unheard because they were unable to be found in the first place?” he said in an email. “We need a social media early detection system.”

De Rosa has tested a prototype tool, developed by researchers at Rutgers University and Microsoft, that aims to solve this problem. Called SRSR, for Seriously Rapid Source Review, the tool:

Identifies different types of Twitter users

Surfaces tweets with links to images or video

Filters out retweets

Picks out tweets posted from mobile devices

And perhaps most important for reporters on the leading edge of the latest big story, it tries to figure out who observed something important.

The first-time source, the unnoticed tweet

When a big story strikes, such as last May’s tornadoes, many journalists turn first to social media for eyewitness accounts. Finding them can be hit-or-miss.

Journalists can search for tweets and public Facebook updates with certain words such as “twister” or “I’m OK.” They may look for tweets posted near a certain location. If something has been widely retweeted, they can follow the trail back to the source. They can see what local, well-connected journalists are pointing to.

De Rosa said he employs tools such as Storyful, Topsy, Siftee and Radian6. “None of them are good at being able to find that hidden person with the great nugget of information,” he said.

To put it another way: If someone tweets something newsworthy and no one retweets it, did she make a sound?

“The point is not to replace anyone,” Diakopoulos, one of the researchers who developed SRSR and who has since left Rutgers, said in a phone interview. “It’s really to augment them, to give them additional cues to help them filter” the newsworthy tweets from the irrelevant and misleading ones.


Clues to credibility

To find eyewitnesses, the researchers came up with a list of 741 words that act as clues – words that someone would be likely to use if she had seen, heard or otherwise perceived something. They searched 1,000 tweets related to news events to see if the software could distinguish the eyewitness accounts from the rest of the conversation.
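As a rough illustration of that word-list approach, here is a minimal Python sketch. It is not the researchers' code: the handful of perception words stands in for their 741-word list, and the sample tweets are invented for the example.

```python
import re

# Hypothetical stand-in for the researchers' 741-word list; these few
# perception words are illustrative only and are not taken from the study.
PERCEPTION_WORDS = {
    "see", "saw", "seeing", "hear", "heard", "hearing", "felt", "watching",
}


def looks_like_eyewitness(tweet: str) -> bool:
    """Return True if the tweet contains at least one perception word."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(token in PERCEPTION_WORDS for token in tokens)


# Invented sample tweets, not real data.
tweets = [
    "I just saw the roof come off the building across the street",
    "Reports say a tornado has hit Joplin",
    "Sirens everywhere, whole block is gone",
]

flagged = [t for t in tweets if looks_like_eyewitness(t)]
for tweet in flagged:
    print(tweet)
```

Only the first tweet is flagged. The third reads like a firsthand account but contains none of the listed words, so a filter of this kind lets it slip past.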

SRSR did well in one respect and fell short in another: The tweets it flagged were usually genuine eyewitness accounts, but it missed many others, because eyewitnesses don’t use only the words that the researchers picked out.

“We missed a lot of eyewitnesses out there, but the ones we told you are eyewitnesses, we’re pretty sure are,” Diakopoulos said.

To use the Twitter-as-a-river analogy, if you think of SRSR as a net that you use to capture eyewitness tweets, then 89 of the 100 tweets captured were in fact eyewitness accounts. But another 189 or so tweets flowed by without SRSR catching them.
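Those two numbers correspond to what information-retrieval researchers usually call precision (how much of the catch is genuine) and recall (how much of what was out there got caught). The article does not use those terms, so the mapping is an assumption. The arithmetic, using the figures from the analogy, looks like this:

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Compute precision and recall from raw counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Figures from the analogy: 100 tweets caught, 89 of them genuine
# eyewitness accounts (so 11 were not), and about 189 eyewitness tweets missed.
precision, recall = precision_recall(
    true_positives=89, false_positives=11, false_negatives=189
)
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
# prints "precision: 0.89, recall: 0.32"
```

In other words, about nine in 10 flagged tweets were genuine, but the net caught only about a third of the eyewitness tweets in the sample.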

“I think there’s a lot of future work we can do to make the eyewitness tool better,” Diakopoulos said, such as analyzing adjectives and looking for tweets without geographical names. (Anecdotal evidence, he said, suggests that eyewitnesses normally don’t include place names when they tweet; people far away from the scene do.)

The system also tries to sort out types of sources, dividing them into three categories: a journalist or blogger, an organization of some kind, or a regular person. It did well on that front, correctly categorizing 90 to 95 percent of sources.

The idea, Diakopoulos said, is that journalists would benefit from knowing who’s tweeting what. “What are the organizations tweeting; who are the other journalists on the ground, who may be local to the event, who I can trust?”

And the tool looks at where a person’s network is located so journalists can judge whether he is a credible source for a particular piece of information. If a Twitter user is located in New York but has many ties to Egypt, he’s probably more credible on news from Egypt than someone with few ties there.
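The article does not say how SRSR measures those ties. One simple way to picture the idea, offered purely as an assumption, is to compute the share of a user's contacts whose profile location matches the place in the news. The function name and the contact lists below are hypothetical.

```python
from collections import Counter


def location_share(contact_locations: list[str], place: str) -> float:
    """Return the fraction of a user's contacts located in `place`."""
    if not contact_locations:
        return 0.0
    counts = Counter(loc.strip().lower() for loc in contact_locations)
    return counts[place.strip().lower()] / len(contact_locations)


# Invented contact lists for two New York-based users.
user_a_contacts = ["Egypt", "Egypt", "USA", "Egypt"]
user_b_contacts = ["USA", "USA", "Egypt", "UK"]

share_a = location_share(user_a_contacts, "Egypt")  # 3 of 4 contacts
share_b = location_share(user_b_contacts, "Egypt")  # 1 of 4 contacts
print(share_a, share_b)
```

Under this sketch, the first user's network leans far more heavily toward Egypt, which is the kind of cue a journalist might weigh before reaching out.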

One feature that journalists didn’t find useful, researchers found, was an analysis of people’s tweets prior to the news event. “If someone sees a tornado, it doesn’t matter what they’ve been tweeting about for the last 12 months,” Diakopoulos said, although he suspects it could be useful for other types of news stories. (For instance, we reported last year that a woman in St. Petersburg, Fla., tweeted the details of a police shootout. Before that, she tweeted to entice people to watch her strip online.)

The researchers asked journalists from seven news organizations – Philly.com, New York Daily News, NPR, The Huffington Post, The Washington Post, Reuters and the Guardian – to test the tool.

They first showed them how SRSR worked, using a sample of tweets related to the 2011 Tottenham riots in the U.K. Then they loaded 12,595 tweets from 7,263 sources, all posted in the first six hours after tornadoes struck Joplin, Mo., in May 2011. They told the journalists to imagine that they didn’t have any reporters on the scene, and to use the tool to find sources, stories or angles that they could use in their coverage.

In general, the journalists were able to find sources for their stories, including some they may not have discovered otherwise. “This gives you context,” said one unnamed journalist quoted in the researchers’ paper. “You have the context for whether or not you think they’re reputable or whether or not they’re worth reaching out to.”

Don’t expect to use it for the next hurricane or tornado

The journalists noted several missing features that would help them, such as combining the eyewitness and the location filters, and focusing on the tweets rather than the sources.

But the biggest challenge to moving past a prototype, Diakopoulos said, is getting access to the full river of tweets. The researchers were able to use just a sample of all tweets related to each event.

“In order to make this useful in real-time, I think you need a partnership with Twitter,” said Diakopoulos. “They’ve got to be able to provide this kind of network data to you in real time so you can aggregate it and present it in real time.”

As for Houston’s death, Diakopoulos said he doesn’t think the prototype would have identified those two tweets, although “there’s definitely potential for future systems to help detect these kinds of things sooner.” Such a tool could search for particular terms and analyze the sources’ networks to see if they appeared to be independent.

“But even if a system can alert a journalist about this,” Diakopoulos told me by email, “I think the journalist would still need to make the final call, possibly messaging those sources and trying to vet their information. Otherwise we rapidly spiral into the world of misinformation: Just because someone claims a celebrity died doesn’t mean anything.”

There are plenty of examples to back that up. Until we get better at finding the right sources while news is breaking, we’ll keep reading backwards-looking stories about the regular guy who broke the news before the professionals did.