Researchers discover challenges of debating scholarly work on the Web

Gunther Eysenbach’s paper about Twitter and scholarly communications foretold its own fate as a topic of lively discussion.

What it did not foretell was how that discussion would bring into such stark relief the advantages and challenges of vetting, contextualizing and defending scholarly work on the public Web.

In December, Eysenbach, an associate professor of health policy at the University of Toronto, published a paper[1] in the Journal of Medical Internet Research (JMIR) -- a Web-only enterprise that Eysenbach founded in 1999 with a group of other researchers, and for which he serves as editor and publisher.

The paper, classified as an editorial, was called “Can tweets predict citations? Metrics of social impact based on twitter and correlation with traditional metrics of scientific impact.”

“Cite and retweet this widely,” Eysenbach wrote via his Twitter handle, @eysenbach, shortly after it was published in the journal. “Tweets can predict citations! Highly tweeted articles 11x more likely to be highly cited.” The author-publisher included a link to his article.

Drawing on data from the journal’s own archives, the piece argues that articles that get a lot of buzz on Twitter are more likely to be cited in scholarly articles down the line. It was an ostensibly groundbreaking contribution to the study of “alt-metrics,”[2] a burgeoning class of measurements aimed at quantifying the influence of scholarly work outside formal channels, especially on social media websites.

A day later, Eysenbach’s article had been tweeted 293 times. “Quite a twimpact!” the Toronto researcher observed on Twitter. “Should become highly cited.”

But not everybody was convinced. Phil Davis, an independent researcher and consultant, saw retweets of Eysenbach’s original pronouncement crop up several times in his own Twitter feed. Intrigued by the buzz it seemed to be generating, he clicked through.

Davis spotted a number of what he believes to be methodological flaws in the study. The Twitter activity in Eysenbach’s data set, he says, did not filter out multiple tweets by the same author, or tweets that appeared to have no author at all and had instead been generated automatically by “bots.”

He also noticed that the paper’s footnotes included what he thought to be an exorbitant number of references to JMIR articles, only a few of which were cited for their actual content. Eysenbach had cited separately each article appearing in his data set, all of which came from his journal. Davis observed that this could be a boon for JMIR, and by extension its multi-hatted editor; alt-metrics notwithstanding, citations are still the measure by which most of the research community recognizes scholarly significance.

To redress these alleged shortcomings, Davis took to the social Web himself, penning a critical posting[3] on Scholarly Kitchen, an academic publishing blog to which he contributes. Would he have gone through the trouble of writing a formal letter to the editor if he had encountered Eysenbach’s paper in the days before blogs? “Probably not,” he said in an interview.

In the posting, Davis argued that the study did not control for some of Twitter’s inherent vagaries, and that the actual implications of the data failed to justify the exuberant shorthand Eysenbach had used when evangelizing the piece on Twitter. Eysenbach’s exclaiming that Twitter predicts citations, Davis wrote, “makes a great 140-character headline but needs much more context for interpretation.”

Eysenbach in fact does offer a more nuanced take in the article itself. “Correlation is not causation,” he writes, “and it is harder to decide whether extra citations are a result of the social media buzz, or whether it is the underlying quality of an article or newsworthiness that drives both the buzz and the citations -- it is likely a combination of both.” The two largest sections of the paper describe the “limitations of Twitter-based metrics” and the “limitations of this study”; taken together, these sections are six times longer than the “principal findings” section, and total 12,025 characters.

In an interview with Inside Higher Ed, Davis acknowledged that Eysenbach in his paper had taken care to qualify his findings with the appropriate caveats. But Davis says he was irked that none of that nuance came through in Eysenbach’s tweet of the article -- nor in the hundreds of tweets that followed. “There was just this dearth of substantial evaluation of the article,” he says. “It was getting a lot of media attention, but it wasn’t clear that anyone was reading past the title and abstract.”

And so Davis blogged, attacking both Eysenbach’s methodology and his potential conflicts of interest vis-à-vis the paper’s excessive, self-referential citations. Davis also noted that Eysenbach had registered several Web domains -- twimpact.org, twimpactfactor.org, and twimpactfactor.com -- whose value seemed contingent on the correlation being legitimate. (Eysenbach discloses the conflicts in an appendix.) “With so many sources of potential bias,” Davis wrote, “it becomes hard to separate the paper’s contribution to science from its contribution to entrepreneurship.”

By the time Davis’s posting went live on Scholarly Kitchen, Eysenbach’s article had been on the Web for almost three weeks and its Twitter buzz had swelled to 527 tweets. Davis countered by immediately tweeting a link to his critique from his own Twitter account, @ScholarlyChickn. Davis tweeted the link again several hours later, adding: “When author is editor and shareholder, self-citation behavior is questioned. Ethical?”

Peer Review Reconsidered

Eysenbach says his multiple roles, the narrow sample size, and the 67 self-citations were all appropriate because he had classified the paper as an editorial, not a proper research article. “An editorial clearly talks about journal-related matters,” he told Inside Higher Ed in an interview.

Still, Eysenbach sent the paper out to be peer-reviewed before he published it. One of those reviewers was Jason Priem, a graduate student in library sciences at the University of North Carolina at Chapel Hill and a well-known advocate for alt-metrics. “BIG #altmetrics news: Highly tweeted articles 11x more likely to be highly cited,” Priem had written from his Twitter account, @jasonpriem, when Eysenbach first published the article. (@eysenbach re-tweeted this.)

But when Priem read Davis’s critique on Scholarly Kitchen, he realized he had overlooked the self-referential citations during his review. He posted a reply in the blog’s comments section. “I wholeheartedly agree that Eysenbach’s choice to cite his data points was a very unfortunate one, and taints the rest of the paper,” Priem wrote. “I certainly hope he will be quick to publish an erratum, moving the articles to supplemental data where they belong.”

In an interview with Inside Higher Ed, Priem took the blame for the oversight on behalf of himself and his fellow reviewer, who is not named. “That was our bad,” he says.

“I definitely thought it was illuminating to see that my name was attached to this and that I missed something that I should have caught,” Priem later told Inside Higher Ed. “I definitely sort of had that feeling, people will think I’m a terrible reviewer.”

Later that day, Eysenbach did post a correction[4] on JMIR, explaining that he had removed the offending data set citations. He assured readers that Thomson Reuters, whose citation-based “impact factor” is widely known as the metric-of-record for assessing journals, had not indexed the old version.

“Having to remove references from a manuscript to preserve the validity of a journal-level impact metric is somewhat troubling,” Eysenbach added in the correction, “but if anything, then this perhaps illustrates the limitations and tyranny of the impact factor, and why we should consider additional metrics.”

The next morning, the author-publisher joined the discussion thread at Scholarly Kitchen. In a 10,366-character rejoinder, Eysenbach challenged, point by point, the criticisms of Davis as well as those of David Crotty, a senior editor at Oxford University Press, who had posted his own raft of criticisms. (Crotty emphasizes that his views are his own and do not represent those of Oxford University Press.)

Eysenbach pointed out factual errors in each critique. Davis had asserted that Eysenbach had declined to have the article independently peer-reviewed; in fact, Eysenbach had commissioned Priem and another reviewer to do just that. Crotty had said he was “disturbed” by the study’s lack of negative controls. Eysenbach had in fact included such controls, and Crotty later acknowledged the oversight.

“Some of the things I wrote initially were wrong,” Crotty said later in an interview, adding that he had been writing from memory at the time. “It pointed out to me that if you’re really going to do this right, you have to do the work. You have to have the paper in front of you and be going through, line by line.”

The ensuing discussion on the Scholarly Kitchen thread went on for 17 additional comments totaling 40,264 characters -- almost all belonging to Davis, Crotty and Eysenbach. “When I see online academic arguments turning ugly,” tweeted @jasonpriem, “I always want to post…” Priem then linked to a photograph[5] of four adorable kittens, with text superimposed in large block font: “CALM THE FUCK DOWN. LOOK AT THESE KITTENS. LOOK AT THEM.”

By coincidence, several hours later, Aaron J. Barlow, an associate professor of English at the New York City College of Technology, gave a talk at the Modern Language Association’s annual meeting at a session called “The Future of Peer Review.” Barlow was giving his pitch for post-publication peer review -- an ascendant model of scholarly communications that is based on the idea that exposing papers to the open Web and debating their merits in public is the best way to separate the wheat from the chaff.

“We have not yet institutionalized post-publication review, though why that is baffles me,” said Barlow at the session. “Wouldn’t the number of citations, reviews, and other responses give as strong an indication of the value of an essay as its original venue? Stronger, I’d say. Especially since, in a digital environment, even the published essay can be improved in light of comments and criticism. And should be.”

(After his talk, Barlow posted a transcript[6] of his remarks on his own blog. Then he tweeted it.)

Over at Scholarly Kitchen, the parrying over Eysenbach’s Twitter article ended with the author-publisher calling for a temporary return to traditional channels. “Sorry, but I have to withdraw myself from this discussion on this site,” he wrote. “I am all for a scholarly debate, but not at this level.” He invited Crotty and Davis to send a letter to the editor.

Eysenbach stands behind his Twitter study and maintains his faith in social media as an important medium for scholarly exchange. But in an interview, Eysenbach said he thinks in this case the debate would have been better served if Davis and Crotty had written to him in private before upbraiding him on a blog.

“There were a number of simply wrong statements, and I would have liked the ability to clarify this in private first,” Eysenbach says. One hazard of having such debates first in the public eye is that “if there is some critique of something, and you don’t respond immediately even if you respond one or two days later, it’s as good as no response [at all],” he says, “because by that time the damage to your reputation may already be done.”

That and it can be time-consuming. “It can very easily be a task you cannot manage,” Eysenbach adds, “because if a paper really has a huge impact on a lot of blogs, to monitor all this and to respond to everything [takes time] -- a lot of what’s written in blogs really requires an author’s input.”

The 140-character limit might be too constrained to convey the nuances of scholarly research, but other public-facing forums might not be constrained enough. Crotty agrees that debating the minutiae of scholarly papers can have diminishing returns in the unrestricted format of the Web, and can sometimes come at the expense of a researcher’s more pressing work. “A scientist’s job is to do science,” he says. “There’s a real question there: how much time and effort can you really expect?”

Maybe counting scholarly citations is the only practical and efficient form of post-publication appraisal, Crotty says.

To Priem, who hews to his advocacy for alt-metrics and open peer review, this sort of public airing poses the greatest challenge not to scholars’ schedules, but to their egos.

“As academics, at a professional level, we make our living based on our expertise and convincing other people that we’re experts,” Priem says. “But also, on a personal level, as academics we tend to build a lot of our self-image around our expertise. And so when that’s questioned, or when we make mistakes in that area, it can be an unsettling sort of vulnerability.”

At the time of this writing, Eysenbach’s article has been tweeted almost 900 times. (How many times it will be cited in scholarly articles remains to be seen.) Davis’s critique on Scholarly Kitchen has been tweeted 246 times. Eysenbach’s correction has been tweeted 35 times.

For woefully truncated teasers linking to the latest technology news and opinion from Inside Higher Ed, follow @IHEtech on Twitter[7].