Category Archives: Crowdsourcing

In an article published last weekend on Mashable, Sarah Kessler asked the question, “Can Robots Run the News?” It’s an important question not just for journalists, but for anyone who creates or curates content on the Web.

The examples Kessler cites span the range of content creation, from automatically generated sports news to the use of algorithms to identify news topics. There’s obvious value to automated content creation, and as Jeff Jarvis has declared, “Data is (are) journalism.” But we should be careful not to confuse computed content with communication.

Computed content is a set of data; communication is the expression of an attitude toward, or perspective on, those data. Without a point of view, content is just an audience speaking to itself.

Using Web analytics from a test period to automatically choose between two headlines, as we’re told the Huffington Post does for its stories, can make sense—if both versions are true to the content. If you balance crowd-sourced feedback with the content creator’s point of view, you’ll have a productive conversation. But if the crowd takes precedence, it may simply replace content’s individual vitality with the bland mean.
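The headline-testing mechanism described above is simple enough to sketch in code. Everything here is illustrative: the class name, the click-through-rate criterion, and the impression threshold are assumptions about how such a test might work, not a description of the Huffington Post's actual system.

```python
import random
from collections import defaultdict

class HeadlineTest:
    """Hypothetical A/B headline test: rotate variants during a trial
    period, then serve whichever earned the best click-through rate."""

    def __init__(self, variants, test_impressions=1000):
        self.variants = variants               # candidate headlines
        self.test_impressions = test_impressions
        self.shown = defaultdict(int)          # impressions per variant
        self.clicked = defaultdict(int)        # clicks per variant

    def serve(self):
        # During the test period, pick a variant at random.
        if sum(self.shown.values()) < self.test_impressions:
            choice = random.choice(self.variants)
        else:
            # After the test, serve the variant with the highest CTR.
            choice = max(self.variants,
                         key=lambda v: self.clicked[v] / max(self.shown[v], 1))
        self.shown[choice] += 1
        return choice

    def record_click(self, variant):
        self.clicked[variant] += 1
```

The editorial point survives the mechanism: the code only picks the *more clicked* headline, not the more accurate one, which is why both candidates must be true to the content before the test begins.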

Take, for instance, the English title for Stieg Larsson’s novel The Girl with the Dragon Tattoo. It may not have been crowdsourced, but it certainly plays to a corporate idea of the crowd. Is it really better than the literally translated original title, Men Who Hate Women? (That’s a rhetorical question. The original title nails the book’s central concern; the English version just wraps it in a pulp-fiction cover.)


Even in content marketing, where knowing what people want is critical to the content provider’s success, a one-sided conversation dominated by the audience won’t fly. For a conversation to work, there must be differences between the participants. The power of new media is the way it enables the audience to challenge the creator. That doesn’t mean, though, that the creator should stop challenging the audience.

This balance seems to be what Yahoo VP of Media Jimmy Pitaro is after in the company’s news blog, The Upshot. In her interview with him last week on All Things D, Kara Swisher noted that while some see computational journalism as a “‘democratizing’ of the news, others are more concerned about relying on algorithms to determine the best coverage and the implications for a society guided by its own searches.”

But as Pitaro noted in his video interview, “data and audience insights” constitute just one component of the content. In addition, Yahoo uses the “old-school” methods of “manually identifying topics” through its team of editors and writers.

Similarly, as Kessler mentioned in Mashable and as Claire Cain Miller explored at greater length in yesterday’s New York Times, the tech-news site Techmeme uses both algorithms and editors to produce its content. Why? Because “humans do things software cannot, like grouping subtly related stories, taking into account sarcasm or skepticism, or posting important stories that just broke.”

If readers didn’t care about such things, algorithms alone might be enough. But they do care. The same audience whose searches drive the algorithms also wants the human touch in its content. Until computers can pass the Turing Test, it isn’t likely that they will replace people in content creation.

Is Google poised to slow the growing domination of its search results by content farms like Demand Media and Associated Content? At the end of last Saturday’s episode of the podcast This Week in Google, Matt Cutts, the head of Google’s Webspam team, suggested that it would: “If your business model is solely based on mass-generating huge amounts of nearly worthless content, that’s not going to work as well in 2010.”

Cutts’s remark came in response to a question by host Leo Laporte near the end of the episode. Though Laporte had only learned about Demand Media a week earlier on his This Week in Tech podcast, as he glancingly noted, he left no doubt about where he stood on the merits of its approach: “it seems like a way to game Google by creating a lot of pages with . . . barely adequate content in a niche area [in order] to drive traffic.”

Though Cutts avoided taking a position on Demand Media itself, he made it clear that Google was looking to address the generic problem:

“Within Google, we have seen a lot of feedback from people saying, Yeah, there’s not as much web spam, but there is this sort of low-quality, mass-generated content . . . where it’s a bunch of people being paid a very small amount of money. So we have started projects within the search quality group to sort of spot stuff that’s higher quality and rank it higher, you know, and that’s the flip side of having stuff that’s lower-quality not rank as high.”

In response to a question from co-host Jeff Jarvis, Cutts gave some specific ideas of how Google might try to adjust for the content-farm effect:

“You definitely want to write algorithms that will find the signals of good sites. You know, the sorts of things like original content rather than just scraping someone, or rephrasing what someone else has said. And if you can find enough of those signals—and there are definitely a lot of them out there—then you can say, OK, find the people who break the story, or who produce the original content, or who produce the impact on the Web, and try to rank those a little higher. . . .”
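The signal-combination approach Cutts describes can be sketched as a simple scoring function. To be clear, the signal names, weights, and scoring formula below are all invented for illustration; Google's actual ranking signals are not public.

```python
# Hypothetical quality signals in the spirit of Cutts's examples:
# weights are arbitrary illustrative values.
SIGNAL_WEIGHTS = {
    "original_content": 3.0,  # wrote it rather than scraping or rephrasing
    "broke_story": 2.0,       # first to publish the story
    "web_impact": 1.5,        # drew links and discussion elsewhere
}

def quality_boost(signals):
    """Sum the weights of the quality signals a page exhibits."""
    return sum(w for name, w in SIGNAL_WEIGHTS.items() if signals.get(name))

def rank(pages):
    """Order pages by base relevance plus the quality boost, so that
    original sources can outrank higher-relevance derivative pages."""
    return sorted(pages,
                  key=lambda p: p["relevance"] + quality_boost(p["signals"]),
                  reverse=True)
```

In this toy model, a content-farm page with no quality signals needs a much higher base relevance to beat the page that broke the story, which is exactly the adjustment Cutts describes.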

Jarvis, it should be noted, is not a cookie-cutter critic of Demand Media. He argued that Demand’s system for determining what content readers and advertisers want is “very smart.” But he seemed to agree that its resulting product is ranked too high in Google’s results. In the link economy, he said, it becomes an “ethical matter” to support original content by linking to it “at its source.”

Jarvis took Cutts’s thoughts further by stressing the growing importance of “Twitter, Buzz, and Facebook,” or “human recommendation of content,” as a way “to get past this notion of spam and content farms.” The more Google and others can capture the value of this social-media validation, he said, “the less this content-farm chaff is going to be a problem.”

In a BuzzMachine post published on Monday, Jarvis expanded on the topic of how content will be discovered in the future. Thanks to new tools like Twitter, Facebook, and Buzz, he wrote, “human links are exploding as a means of discovery.” Earlier forms of discovery, he said, have been prone to manipulation, but in the new “content ecosystem,” where we “discover more and more content through people we trust,” quality will again rise to the top.

An article published today by Michael Masnick on his Techdirt blog takes on a Forbes opinion piece that tries to debunk the “myth” of crowdsourcing. The Forbes contributor, Dan Woods, claims that the commonly cited triumphs of crowdsourcing like Wikipedia (the supplier of this definition of crowdsourcing) are in fact largely the products of individuals, not groups.

Masnick’s reaction is basically, “Well, duh!” As he says, “of course there are individuals, and the point of crowdsourcing isn’t that everyone in the crowd is equal, but that they each get to contribute their own special talents, and something better comes out of it.”

The mistake Woods makes in his Forbes piece is in confusing the prolific diversity of crowds with the monolithic single-mindedness of mobs. The one is productive, the other, destructive. Woods’s error is one publishers climbing up the new-media learning curve should strive to avoid.