
Attention conservation notice: I’ve basically done this same post before. But as a blogger, I’m never too proud to put old wine in new bottles. Plus, historians of science will tell you that scientists win scientific debates in part by repeating themselves over and over. And I like to win. 🙂

*******************************

Nobody can read more than a tiny fraction of all the scientific papers published every year. And that fraction is getting tinier as more and more papers are published while the length of the day and the speed at which people read remain fixed. So everybody needs filters to decide what to read (even what abstracts to read), and what to ignore. This post is about different ways of filtering the literature, and questioning whether they’re really so different.

I mostly use pretty traditional filters. My main filter is to scan the titles of new and forthcoming papers in selective journals in general science, ecology, evolution, and philosophy (especially philosophy of science). I keep very close track of what’s coming out in journals that have a good “batting average” for publishing papers I want or need to read (mostly highly-selective journals like Nature, Science, PNAS, Ecology Letters, Ecology, Am Nat, Evolution, etc.). I keep less close track of what’s coming out in journals that have a lower batting average for me (e.g., Journal of Theoretical Biology, Theoretical Population Biology, Genetics). If I see a title that looks interesting, I read the abstract, and based on the abstract I decide if I want to read the full paper in detail. So if something’s not published in a selective journal I keep an eye on, it’s quite possible that I’ll miss it. Which is a problem, but I don’t worry about it too much because any filter would have the same problem. No filter is guaranteed to sift out all and only those papers you want to read. I have other filters, but this is the main one.

As I’ve discussed in previous posts and comment threads, my filters work for me.* They work because there are many other people who filter the literature in roughly the same way I do. That is, there are many other people who read, review for, and submit to the same selective journals I do. So in using “what’s published in selective journals” as a filter, effectively what I’m doing is trusting the collective judgment of the many colleagues whose scientific interests and values broadly overlap with mine. I’m trusting them as authors to submit what they regard as their most novel, interesting, and important work to journals like Science, Nature, and Ecology. And I’m trusting them as reviewers to only recommend to journals like Science, Nature, and Ecology papers that seem to them to be especially novel, interesting, and important compared to the bulk of ecology and evolution papers.

An alternative perspective on filtering the literature starts from the observation that reviewers often disagree a fair bit about what’s “interesting” or “important”. It’s often further argued that what gets published in leading selective journals is determined by “salesmanship” on the part of authors, and that journals like Science and Nature (and in ecology, Ecology Letters) are showy, insubstantial “glamour mags” (see, e.g., the comment thread on this post, or this post and the comments). As an alternative, it’s often suggested that reviewers should just evaluate technical soundness, with post-publication filtering being done (either in part, or entirely) via social media. For instance, you might choose to read papers tweeted by people you follow on Twitter, papers linked to by your friends on Facebook, papers recommended by people in your Google+ circles, papers linked to by blogs you read, etc. (Or maybe you read papers that have been read or downloaded many times from the journal website, or you do Google searches and then read the top hits. Those filtering methods wouldn’t ordinarily be termed “social media” methods, but I’m going to lump them in with “social media” because in practice they have the same consequences, as I argue below.)

Traditional filters (“consider reading what’s in leading selective journals”) and newer social media filters (“consider reading what people are blogging about, tweeting, sharing, linking to, and downloading”) sound very different. But are they? After all, relying on social media filters still comes down to relying on the judgment of your colleagues. You’re still relying on the fact that, collectively, they can read a lot more than you can. You’re still relying on them to point out to you papers that are particularly interesting or important. After all, if someone just tweeted a link to literally every new paper about ecology and evolution, would you find that useful? Of course not. Same if someone just tweeted random papers. So it’s not as if newer filtering methods eliminate inevitably-somewhat-subjective judgments about what’s “interesting” or “important” so much as relocate them. And of course, reading what’s been read or downloaded most, or the top hits in a Google search, is just an indirect way of relying on the collective–but nonetheless still subjective–judgment of your colleagues as to what’s most worth reading.**

So isn’t this really just a case of po-tay-to, po-tah-to? A case of differences that really don’t make much of a difference?

Now, of course, for any given person, one filtering method might work better than another, in the sense of revealing more, and missing fewer, papers that the person in question wants to read. Traditional filtering methods work better for me than other methods would, but I’m sure the opposite is true for others. But which method will work best for a given person is surely an empirical question, and likely one without a general answer, at least at the moment (at some point in future, traditional filtering methods may well just fail, but that’s not an argument for giving them up now if they currently work for you).

And in aggregate, both ways of filtering have pretty much the same consequences for science as a whole. For instance, citations in the scientific literature are highly concentrated, and they’re becoming more concentrated. A small fraction of papers get a disproportionately large fraction of the citations. That’s an effect of traditional ways of filtering the literature. Lots of people agree on what the top journals are. And they read, submit to, and review for them in large part because they’re the top journals. So the small fraction of the literature that’s published in those top journals gets a large fraction of the attention, and thus the citations. But citation concentration wouldn’t go away if we all switched to filtering the literature via social media. The popularity of pretty much everything in social media, or online more generally (or even offline!), has a highly skewed frequency distribution. A small fraction of blogs have massive readerships while most blogs have small ones. On any blog (including this one), a small fraction of posts draw huge numbers of pageviews, while most draw small numbers. A small fraction of Twitter users have massive followings, while most have few. A small fraction of YouTube videos garner huge numbers of views, while most garner few. A small fraction of Plos One ecology papers garner huge numbers of views, while most garner few. A small fraction of books become bestsellers, while most sell very few copies. A small fraction of movies garner a large (and increasing) fraction of the box office. Etc. In my experience, advocates of post-publication social filtering often bemoan the fact that papers in “glamour mags” are widely-read and often-cited just by virtue of where they’ve been published. But if anything, post-publication filtering is likely to lead to greater, not lesser, citation concentration (or concentration of any other attention metric–views, downloads, shares, etc.), since rather than acting independently the way authors and pre-publication reviewers do, people often link to, share, tweet, and blog about papers others have already linked to, shared, tweeted, and blogged about.
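The cumulative-advantage dynamic described above is easy to see in a toy simulation (all numbers here are invented for illustration, not data). If readers pick papers in proportion to the attention those papers already have, attention ends up far more concentrated than if readers pick independently at random:

```python
import random

random.seed(42)

def top_share(counts, frac=0.01):
    """Fraction of total attention captured by the top `frac` of papers."""
    k = max(1, int(len(counts) * frac))
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:k]) / sum(counts)

N_PAPERS, N_READS = 500, 20_000

# Independent filtering: each reader picks a paper uniformly at random.
uniform = [0] * N_PAPERS
for _ in range(N_READS):
    uniform[random.randrange(N_PAPERS)] += 1

# Social filtering: readers tend to pick papers that are already being
# shared (cumulative advantage, a.k.a. preferential attachment).
social = [1] * N_PAPERS  # every paper starts with one "seed" share
for _ in range(N_READS):
    # choose a paper with probability proportional to its current attention
    pick = random.choices(range(N_PAPERS), weights=social)[0]
    social[pick] += 1

print(f"top 1% share, independent picks: {top_share(uniform):.2f}")
print(f"top 1% share, social sharing:    {top_share(social):.2f}")
```

Under these assumptions the “social” condition reliably hands the top 1% of papers a several-fold larger slice of total attention than the independent condition, even though no paper is intrinsically better than any other.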

Note that, so far, I’m not arguing that social media filters necessarily tend to promote any particular sort of work–good work, poor work, flashy work, whatever. I’m just saying that they tend to promote citation concentration (or more broadly, “attention concentration”), without making any claims about the properties of the papers that end up garnering the bulk of everyone’s collective attention.***

So now let’s consider the issue of whether different filters promote different sorts of work. Maybe both ways of filtering do indeed promote “attention concentration”, but social media filtering does a better job of concentrating our collective attention either more accurately on “objectively better” papers, or at least more precisely (i.e. consistently concentrating our attention on papers with certain properties). Maybe–but probably not. Ace social scientist Duncan Watts and colleagues have done massive, well-designed, properly controlled experiments asking whether the collective, evaluative judgements of people sharing information via social media are more accurate or repeatable than the evaluative judgments of individuals acting independently, and the answer is that they’re not. If you think that what gets published in Nature or Ecology Letters is a crapshoot, well, it’s no more of a crapshoot than which papers would go viral in a world in which everything was published in one place and then filtered via social media. And please don’t try to push back against this claim by citing data on which altmetrics (retweets, shares, downloads, whatever) best predict future citations, because future citations themselves are just one more metric among others. There’s no reason to think of future citations as an “objective” measure of the “interest” or “importance” of a paper, and then judge filtering methods by their tendency to pick papers that will accumulate lots of citations in future. Unless you really want to insist that the best filtering methods are whichever ones best predict bandwagons.

And if you say, well, at least a world in which all filtering is done post-publication via social media is a fair world, in which everything gets a shot at going viral, I’d respond, how is submitting to pre-publication peer review at a selective journal not getting a fair shot? Anyone can submit anything they want to any journal. How is hoping that a widely-read selective journal will publish your paper any different from hoping that, say, a widely-read blogger will blog about your paper, or that your paper will become one of the few to go viral on Twitter, or whatever? My question here is not at all hypothetical. For instance, in economics (which has a much larger and more active online community than ecology), everyone pays attention to what Mark Thoma and Marginal Revolution link to. Getting a link from one of those sites is sure to send massive numbers of readers to your economics paper or blog post. How is hoping for a link from Mark Thoma or Marginal Revolution any different–in particular, any more fair–than hoping that the editor at Science or Nature likes your paper?

Look, I get that it really bugs scientists, who work so hard to be objective, and to tease out subtle signals from random noise, that the readership of their papers–and thus, ultimately, their own careers–might inevitably depend in part on subjective criteria and random chance. But unfortunately, there’s no changing that. Here’s why:

Nobody can read everything

Everybody wants to read interesting and important papers

Everybody thinks some papers are more interesting and important than others

Everybody wants and needs to pay some heed to what everyone else thinks is interesting and important. (Science is a communal activity. You can’t remain willfully ignorant of what everyone else in your field is reading. Nor can you define “your field” so narrowly that it only includes you, or you and your friends. Not if you want a job or grants, anyway.)

People don’t always agree on which papers are most interesting or important.

Those ingredients are, I think, sufficient to ensure that a small fraction of papers will always garner a large fraction of the attention (however measured–citations, downloads, times tweeted or shared, whatever), and that the identity of those papers will always be determined both somewhat subjectively, and somewhat randomly. And no alternative filtering method is going to change any of those ingredients one bit (and I’m not the only one who thinks so).

So just use whichever filtering method works best for you, and don’t worry about the consequences of your choice for science as a whole. Because all filtering methods have the same consequences for science as a whole.

*If they ever stop doing so, I’ll change them.

**There are differences in detail of course. For instance, in relying in part on the judgment of pre-publication peer reviewers, I’m relying on the judgment of people who have read papers in detail, which strikes me as a good idea. Collectively, those peer reviewers also represent a larger and more random sample of all ecologists than, say, the set of all ecologists on Twitter or Facebook (especially since anyone on Twitter or Facebook only follows or friends a non-random subsample of Twitter or Facebook users). So my way of filtering the literature may well be more effective at finding me stuff I didn’t even know I wanted to read, and better for helping me avoid groupthink and bandwagon-jumping. And conversely, I’m sure folks who filter the literature differently than I do can cite differences in detail that favor their way of filtering the literature. But I do think those details are just that–details–which is why I didn’t focus the post on them.

***Although I note with amusement that this is the most-viewed Plos One ecology paper of all time. Whatever the undoubted virtues of this paper, I think you’d be hard-pressed to argue that it’s been read 281,000 times because of the “objective” importance of its science, and not because of the first word in the title. 😉

A group of us at Davis are trying out a new system in which everyone is assigned a journal or two, and at our weekly theoretical ecology coffee break everyone brings up the relevant papers that got published in their journal this week. This system still means that you have a group of similarly-minded people reading, but it makes for a more even-handed “peer-filtering” system. It could probably be transferred to a virtual system, and I wonder what the optimal scale might be.

Funny that you mention scaling! I largely agree with your piece here — both systems are essentially social filters and both have flaws. In my understanding, the critique against the “traditional” filter is a question of how it scales as the number of publications grows exponentially.

Having used both forms for several years, my personal experience reflects your argument here — traditional filters discover more of what I end up reading than social media filters. (I tag the source of all articles I add to my pdf library, so I know if I discovered the piece from twitter, a table-of-contents, etc. I’ve also learned that I discover more from bibliographies than I thought, and that recommendations from humans (e.g. mentors, colleagues) provide many of the most useful things I read.)

Despite this, in the long run the traditional filters don’t seem to scale. How can a half-dozen journals manage to review all the best literature? Networks can scale, a fixed set of gatekeepers cannot.

You make the excellent point that on aggregate, social media filters scale poorly too — they concentrate results on a least-common-denominator appeal. But this is to miss the point of the social filter, which I can tune in a way that I cannot tune traditional filters. By choosing who to follow, I can be sure to discover Don Ludwig’s wonderful piece on exit times from 1981 while entirely failing to hear all the chatter about whatever nonsense hits the most tweeted charts.

I’d say traditional filters can scale if authors self-filter. But if everybody sends everything they write to Ecology Letters or whatever in the hopes of getting lucky, then yeah, at some point traditional filters break. We’re not at that point yet, in part because growth in number of publications also is accompanied by changes in the composition of publications. For instance, it’s my understanding that a substantial fraction of all submissions to Nature now come from China–and that the vast majority are rejected instantly. But I agree that at some point, traditional filters like mine will probably break.

Your point about the ability to customize and tune one’s social filters is an interesting one. Do you think there are downsides to it? Do you think it would work equally well for everyone? Doesn’t it depend on there being a critical mass of people on Twitter who share your interests and standards? Not that traditional filters don’t depend on having a similar critical mass of people with shared interests and standards, of course.

Maybe the other thing that will happen in future is that, as the amount of stuff worth reading totally outstrips anyone’s ability to read it, we’ll just give up on the idea of keeping up with the literature. Maybe instead we’ll just take the attitude that anyone who’s read a reasonable amount of *something* to do with ecology is well-informed about ecology. As well informed as anyone can be in an age of information overload. We’ll still be able to count on a minimal shared background of knowledge (everyone’s been taught really big ideas and really important facts, like natural selection and the latitudinal richness gradient), but beyond that, there’ll be no expectation that anyone else has read *anything* that anyone else has read. I don’t know, I’m just speculating wildly here…

A social filter will only be as good as your network of course. No reason why that network has to be Twitter — technology infrastructure just facilitates networks. Well connected people have benefited from the social filters of literature before Twitter, email, or even academic journals at all, and even the most primitive networks can scale as both researchers and publications increase, in a way that a fixed set of n gatekeepers cannot (even if people only send the best work). Clearly Twitter etc. help the social filter scale much, much faster, and I think the adoption of such networks will eventually become as ubiquitous as email for these reasons.

But like you say, the whole idea of reading papers itself doesn’t scale indefinitely. While we’re both speculating, I predict this is why things like publishing data and publishing code (along with adding machine-readable semantics in papers) are so important. Data scales (at least it can, if it is open and properly documented). Code scales (same disclaimers). If I have a new method to, say, detect early warning signals, I should be able to (and expected to) automatically test it against all existing data sets and methods so far proposed, regardless of whether I had read every paper proposing a method or not. A scientific literature in which that was possible would be truly scalable, reproducible, and discoveries would advance at a pace and a level of robustness that we cannot even approximate today.

I am not sure if I completely agree with @cboettig’s comment on scaling eventually making reading papers irrelevant. I can imagine where the motivation would come from in empirically based fields, but in theoretical or philosophical areas, I can’t see doing away with papers (or something like them). These papers usually share a new way of thinking about something and not just some mapping from data to prediction (or postdiction).

Although there is a problem of scaling, I think it will be overcome with good machine ranking of papers. Of course, in many ways this is just social filtering, except your social circle is a machine learning algorithm instead of other people. This has the potential to really allow you to see exactly the papers you want, but that is also a risk: getting locked in your own filter bubble.
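One minimal sketch of what such machine ranking might look like, and of how the filter bubble arises from it: score each paper by its word overlap with your reading history, so the ranker keeps showing you more of what you already read. (The papers, titles, and scoring scheme below are all invented for illustration; real rankers are far more sophisticated, but the bubble dynamic is the same.)

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector, represented as a Counter of lowercase tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical paper descriptions (invented for illustration).
papers = {
    "A": "coexistence theory in fluctuating environments",
    "B": "storage effect and coexistence in plankton communities",
    "C": "phylogenetic methods for trait evolution",
}

# Your reading history defines the profile the ranker matches against --
# which is exactly how a filter bubble forms: the more you read about a
# topic, the more of that topic the ranker surfaces.
history = bow("coexistence in variable environments")

ranked = sorted(papers, key=lambda p: cosine(history, bow(papers[p])), reverse=True)
print(ranked)  # papers most similar to what you've already read come first
```

Here the phylogenetics paper ends up last not because it is worse, but because nothing in your history points toward it — precisely the lock-in risk mentioned above.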

It would be nice to see some of the journalistic writing bots updated to the point that they can synthesize several of the most relevant papers into personally tailored mini-surveys. I wonder if there are folks working on this.

Oh dear, I never meant to imply that we wouldn’t read papers in the future — only that we would start to value other contributions more. I agree entirely with @Artem that machine reading of papers will be a useful but imperfect solution.

People will always read and write papers, and there will always be key papers that change a field.

But for the remaining (100 − 1e-6)% of publications, we will recognize that writing a paper that will be forever lost in the deluge is not the most valuable way to make a contribution. Instead, such knowledge will enter the scientific ecosystem by a route that can scale more effectively — data repositories, contributions to code bases, etc. In the meantime we will both write a paper announcement and deposit data. But only the latter will scale effectively to the whole scientific community, while the former gets a decreasing audience as scientific output increases. Incentive structures should reflect that.

Whoops, my bad Carl. I slightly misunderstood you. But only slightly (I hope!). After all, the vision of the future you describe certainly is very different than the ways things are now, and certainly has a much reduced role for reading and writing papers.

no worries, that’s what I get for choosing to pitch an extreme view in the first place! It’s more of a thought experiment (like studying the limiting cases of any other model) than an actual prediction. Who knows how things will change? The only real point is that publishing data and publishing code do not experience this filtering problem quite as severely as papers do, which I fear we sometimes overlook in our desire for impact.

Re: the desire for impact, one question I have about the “limiting case” that you envision is who writes that really small percentage of papers that everyone still reads, and that shape the direction of entire fields. If there are big rewards associated with writing such papers, then I wonder a little if the “limiting case” you envision isn’t in some ways just an extrapolation of current trends. Won’t everyone still be spending lots of time and effort trying to write one of those really rare high-impact papers? Or will we get to a point where high-impact papers become such a small fraction of the literature, that most people eventually decide that which papers become high-impact really is a crapshoot, a lottery? And then those people stop playing the lottery?

I note that there’s a potential analogy here to the issue of growing income inequality in economics. And I’m not going to pursue that thought any further! Fools rush in and all that…

Nah — because in my fictional limiting case scenario the scientific community recognizes that almost all data contributions are useful, some data contributions prove to be particularly brilliant, while most publications are neither brilliant nor useful. I merely invert the scenario of today. Replace your “high-impact paper” with “high-impact data”. Certainly very high impact data exists today (whole genome assemblies may fall into this category, or key software like MrBayes) but does everyone spend all their time trying to get some really valuable data in KNB? Or to place really awesome software on Github?

Now it’s my turn to clarify! My musings were specific to papers. I certainly wouldn’t think they’d apply to software, datasets, etc. As you say, some software and datasets are much more widely used than others–but nobody tries to, say, write really high-impact software.

Which is kind of what I’m wondering. Will there come a point in future where people’s attitudes about papers shift, so that they’re more like the attitude about shared code and datasets?

Or in the future, will things shift in the other way? Will people start thinking about code and datasets more in the way they think about papers–really valuing the rare, high-impact software packages and widely-used datasets?

I ask this in part because it’s hard to imagine a future in which individuals aren’t somehow judged–and yes, ranked–based on the scientific work they do. If for no other reason than for purposes of hiring and promotion. In the “limiting case” you envision, what do you think are the qualities that employers (academic and non-academic) will look for in future scientists, and how will they assess those qualities in their current and potential employees?

In my limiting case, I imply that high impact papers will still exist, but despite that, there won’t be the rush on writing potentially high-impact papers that reduces everything to a lottery, because they won’t be incentivised so ridiculously. In the limiting case, give all publications the status that data sets would be given today: if you are edgy, you might list them on a CV, but review committees and grant committees don’t care so much about them. To extend the limit, say there are no journals for publishing papers, just a few new non-selective “paper repositories” that are starting to catch on in some circles. People would still write papers, out of a necessity to communicate and explain the contributions, and really good papers would still exist, just like really good data exists today even though data is generally unrewarded.

Meanwhile, CVs, NSF panels, hiring committees, etc would focus primarily on demonstrably good, well-annotated data (or perhaps software, mooc teaching syllabi, or whatever other scholarly outputs can scale to the community…). If you happened to have written a paper everyone knows about, you might list it, just as you would list a widely adopted piece of software today. But in a world where ideas are a dime a dozen, academic review would focus on the scalable contributions to science, and measure that impact. Yes, this would create a scramble competition to have the most widely used data or software or mooc, etc, which would need to be filtered and ranked. But unlike today’s scenario, even when the databases fill up with tons of unread data as a by-product of that competition, the information isn’t lost — we all benefit.

Copyright

(C) 2011-2017 by the author of each individual post (specifically Jeremy Fox, Meghan Duffy, Brian McGill, or as otherwise noted at the top of each post).
The copyright holders have made these posts available on the Dynamic Ecology website at the present time for reading and commenting to benefit the scientific community. Hypertext links to posts which transfer readers to our website are also welcome. However, the authors retain all other rights to the posts including the rights to republish elsewhere and to charge for access. The authors also prohibit other uses including copying or republishing entire or substantial portions of posts without the author's permission, but do allow quoting small sections as allowed by fair use law for purposes of commentary and criticism.