Tuesday, February 05, 2008

Peer Review IV

Specifically, there seems to be a trend to increasingly cite horizontally instead of mostly vertically. What I mean by horizontal citations are citations of related works that go back to the same initial ideas or concepts, but are neither actually necessary to understand the content of a paper, nor have investigated a closely related aspect of a problem. Citing horizontally comes with a lot of politics. People cite others horizontally to be polite, because it seems smart for networking reasons, because they hope the favour will be returned, as a reply to annoying emails, or just because they believe conventions require it.

As an example, see 0706.3155 ("Collider Phenomenology of Unparticle Physics" by Cheung, Keung, & Yuan). In the more advanced version one quotes simply as

"Many studies have investigated the implications of ... [11]."

or

"Various considerations of ... have recently been developed in the literature [8] [9]."

And then clumps together 25 papers in citation [11], see e.g. 0801.1534 ("Unparticle Self-Interactions and Their Collider Implications" by Feng, Rajaraman, & Tu). For an extreme version, try reference [8] of 0801.0018 ("Unparticle physics at the photon collider" by Kikuchi, Okada & Takeuchi) which fills more than a whole page, and is probably just a complete list of what an arXiv keyword search brought up.

This kind of citation seems to be especially common in the first few years after a topic received interest, but has a cut-off length that I'd estimate to be somewhere around 50 papers, beyond which it simply is no longer feasible. For example, in the first years when black holes at the LHC were hip ('01-'04 or so), citations of the sort "in a number of recent papers people have studied..." (e.g. hep-ph/0405054 "SUSY Production From TeV Scale Blackhole at LHC" by Chamblin, Cooper & Nayak) were quite common, but around 2005 references condensed to review articles and the few papers where the idea originated.

Also, in my impression the more established researchers take horizontal citing less seriously (who are you to request I cite your paper?).

By vertical citations, in contrast, I mean the papers that were actually used for a new publication, that are necessary to its understanding (whether they are sufficient for understanding is a different issue), or previous work on the same topic (even if unfortunately unknown to the authors during writing). Of course scientists need to give proper credit to other people's work, and to back up arguments with references. But should they just group-cite 'various considerations'?

Reasons

This kind of citation was a very useful feature in the days before one could do a keyword search in a database, or click on 'cited by'. Horizontal citing serves the purpose of letting the interested reader know who else has worked on a given topic and what other related studies have already been done. However, this is a good example where one sees how technological improvements together with the growth of our community can result in developments that have unwanted consequences.

Consequences

Whether we like it or not, the citation index of a researcher matters to his or her career. If many people cite horizontally out of politeness - possibly often without even reading all papers themselves - it encourages fast publications on hip topics. These works contain more horizontal citations, which makes the topic look even more like the place to be. Most importantly, researchers have to act fast to have their papers among the earliest, because then they make it onto the citation lists of those who come later. A mechanism like this is called positive feedback: interest causes increasing interest. Nowadays, one can literally make a living out of jumping on and off topics with good timing.

An effect like this can considerably distort scientists' judgement on which areas they regard as worth spending their time on.

Another annoying side effect is that people try to get citations, just because it seems possible, and because every single one improves their cite index. As a result, if I put a paper on the arxiv, the following day I will receive several emails of the type

Depending on temper people request more or less bluntly to be mentioned in my reference list. In some cases these references are interesting and might be useful for later papers. In rare cases I did indeed miss a previous publication on the same topic, which is as annoying as embarrassing. In most cases though, people seem to send these emails for no other reasons than that the title or abstract of my paper contains a word that appears also in their paper. I actually knew a guy who wrote a script to check the new arxiv submissions for keywords and to produce emails like the one above.

And you know what? I can't even blame people for doing this. Even if chances are low, if you send out enough annoying emails one or the other recipient will just cite you, and isn't that what matters*? It's one of these cases where the incentives lead people to focus on meeting secondary criteria (high citation index) instead of primary goals (good research). For more on primary goals and secondary criteria, see The Marketplace of Ideas.

Peer Review

Although peer review does to a certain degree ensure relevant previous publications are appropriately mentioned (restrictions apply), it is rarely pointed out to authors that they have plenty of redundant papers on the reference list. What people put on the arXiv is their business, but if it were clear that peer reviewed publications wouldn't support the citation of only weakly related papers, this trend would calm down considerably. That's why I think peer review would be the place to address the issue.

How to

If you don't want to cite everybody, don't cite anybody. It sounds silly but it seems people get easily pissed off if one cites a colleague who has worked on topic X, but not themselves. If one doesn't cite the colleague either, it doesn't bother them. Just sticking stubbornly to the publications that were actually used and are relevant to a work seems to be acceptable (and if that isn't sufficient, blame it on a page limit). It has the drawback however that colleagues are less likely to return the favor and cite you (see Science or Sociology?).

Bottom line

In times where keyword searches and 'cited by' queries are possible, horizontal citations are unnecessary. They have however the side-effect of causing a positive feedback on fashionable topics that can distort objectivity.

Disclaimer

Nothing of what I've speculated here is backed up by an actual study, it's just my impression. It would be interesting to see an analysis of the citation distribution with regard to the cut-off length of clustered citations. I am not criticising the content of any of the papers I mentioned above (that in fact I didn't even read).

See also: Peer Review II, III and the related posts Science and Democracy I, II, and III.

* Footnote to the younger readers: That's meant in a sarcastic way, please don't take it seriously. You can easily spoil your reputation with that kind of behaviour. Nobody wants to work with somebody who is just incredibly annoying and self-centered.

30 comments:

horizontal and vertical citations - that's a very compelling labelling for the phenomenon you describe!

I guess that actually horizontal citations may be actively encouraged by guidelines stating that introductions should give some general background on a paper, and describe its setting in relation to current research.

Which brings into play what you describe in two of your remarks: If there are no review papers yet, this leads to these long lists exactly for the reason that "if you don't want to cite everybody, don't cite anybody".

So maybe editorial guidelines would be the best way to deal with this, and to find some sensible compromise between what is necessary to make clear the setting and a flood of quotes?

I would very much appreciate if papers more often had helpful introductions, and for that purpose I wouldn't mind at all a lot of citations! But just writing things like 'many people have worked on this' or 'various considerations have been developed' is completely useless. These are the cases the above post is about. Best,

Thanks for pointing out those reference lists - I pass over unpapers on unphysics and the unphenomenology of unparticles at uncolliders. This subject has been a citation factory since it started but I didn't realise it was that industrialised.

One thing that I think would help a lot in curbing this tendency would be if SPIRES could do a statistic where they normalised the number of citations each paper makes, so a citation from a paper with one hundred references is treated as one fifth of one from a paper with twenty references.
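The normalisation suggested above can be sketched in a few lines. This is only an illustration of the idea, not anything SPIRES actually offers: the function name `weighted_citation_counts` and the input format (a dictionary mapping each citing paper to its reference list) are assumptions made for the example. Each citing paper hands out one unit of credit in total, split equally among its references, so a citation from a paper with one hundred references counts one fifth as much as one from a paper with twenty.

```python
from collections import defaultdict


def weighted_citation_counts(reference_lists):
    """Return {cited_paper: score} under fractional counting.

    reference_lists is a hypothetical input format: a dict mapping each
    citing paper to the list of papers it cites. Every citing paper
    distributes one unit of credit equally over its distinct references.
    """
    scores = defaultdict(float)
    for citing, refs in reference_lists.items():
        unique_refs = set(refs)  # ignore accidental duplicate entries
        if not unique_refs:
            continue  # a paper without references hands out no credit
        share = 1.0 / len(unique_refs)
        for cited in unique_refs:
            scores[cited] += share
    return dict(scores)


if __name__ == "__main__":
    # Made-up identifiers: one paper with 20 references, one with 100.
    example = {
        "short_paper": ["X"] + [f"s{i}" for i in range(19)],
        "long_paper": ["Y"] + [f"l{i}" for i in range(99)],
    }
    scores = weighted_citation_counts(example)
    print(scores["X"], scores["Y"])
```

As noted in the reply below, a weighting like this would also discount citations from review papers, so it is at best one statistic among several.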

But just writing things like 'many people have worked on this' ... is completely useless.

I just had a look at the first paper you mentioned, and indeed, I see your point! This formulation at the end of the general introduction is really not very helpful.

Hi piscator,

your suggestion of normalisation by the total number of citations may be a bit unfair towards review papers or people who include helpful comments/descriptions about lots of papers they cite and describe.

OK, point taken. Combining lots of papers in one cite is weird. I think though that for many of the articles in e.g. American Journal of Physics, referencing a bunch of articles that explore the same theme makes sense. That's because AJP is educational/survey/historical etc. BTW is AJP the only high-end journal of its type? (Well, in USA, I assume there's quite a few elsewhere. Naming some of them would help too.)

PS: What really is "unparticle physics"? I looked at http://physorg.com/news100753984.html and I still don't really get it. Like the Stephen Wolfram stuff? What does anyone think of all that, and is it mostly just a new "interpretation" with no new predictions? Does it fit in with Max Tegmark-ish ideas that the universe literally "is" just a "mathematical structure"? tx

“In times where keyword searches and 'cited by' queries are possible, horizontal citations are unnecessary. They have however the side-effect of causing a positive feedback on fashionable topics that can distort objectivity.”

The whole thing boils down to which papers have high relevance and which do not. It is interesting to note that when Google’s Page & Brin faced this dilemma they used the human quality of boredom as a way to sort through it all. In their synopsis of the ranking system in the search engine they used this as a key factor in what counts as relevant. They called it “Intuitive Justification”, which they outline as follows:

“PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page.”

It should be considered that one component of relevance is interest, thus a good indicator that it is lacking is boredom. I would submit then that any paper that is actually opened, rather than simply having its abstract opened, has more relevance. Also, any paper that is actually downloaded would have still more relevance. I would also suggest that any paper that is over-referenced as you describe would have a tendency to be more boring and would not be downloaded. Perhaps Larry and Sergey could actually help out in all this if their observation were applied to arXiv ratings and rankings.
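For readers who want to see the "random surfer" picture in action, here is a minimal sketch of it as a plain power iteration. It is a toy, not Google's implementation. The function name `pagerank`, the input format, and the default `d = 0.85` are assumptions for illustration. The sketch uses the common convention in which `d` is the probability that the surfer keeps clicking, so `1 - d` is the chance of getting bored and jumping to a random page. Pages without outgoing links are treated as if the surfer always jumps away from them.

```python
def pagerank(links, d=0.85, iterations=100):
    """Approximate the random-surfer visit probabilities.

    links: hypothetical input, a dict {page: [pages it links to]}.
    d: probability of following a link (1 - d = probability of boredom).
    Returns {page: probability}; the values sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Bored surfers land on a uniformly random page.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Surfers who keep clicking spread over the outgoing links.
                share = d * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: nothing to click, so jump anywhere.
                share = d * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Tiny made-up graph: A and B both point to C, and C points back to A.
    print(pagerank({"A": ["C"], "B": ["C"], "C": ["A"]}))
```

Applied to papers instead of web pages, `links` would hold each paper's reference list. Whether downloads or abstract views could stand in for "clicks" is an open question that this sketch does not address.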

One technique I learned from a more senior collaborator to reduce the amount of anger due to these quasi automated "cite me" emails is to include the reference without ever looking at the paper. This can be very satisfying.

Additional points can be earned by attaching the \cite at useless places like "Physics has also been considered in\cite{Moron:90xy}." or attaching it to random technical terms "String\cite{Dorfnats:93uy} theory is compactified on ...".

interesting, i too do this, though I preferably use German (\cite{bloederdepp} \cite{quatschkopp} \cite{murkspaper}). one never knows who downloads the source code, though the risk is half of the fun ;-)

something else that I just remembered: some years ago I received an email saying 'thanks for citing my paper A. but if you cite A you also have to cite my papers B, C, and D'. upon which i took out citation A and wrote back I considered what he said, and decided citing A is unnecessary, so I wouldn't cite any of his papers. never heard of that guy again.

Of course to me this smacks of some Woitian dissension about how the process is worked. I heard the cry some time ago about who was in charge.

Some carry the torch then?:)

But that you choose the Unparticle scenario is interesting. I would have pointed as well to your previous post Bee as well as the one by Howard Georgi when Neil asked.

But it was something else that triggered the response in terms of the Koch Fractal. What weight do you apply to new concepts that are introduced and then are measured according to some tone set by a researcher who feels they (unparticle research) are in some association with somebody who has an agenda, like the Templeton group?

There has to be a clearing of such dissension, while it is not transparent to the rest. Honest "structured integrity" would no doubt help in that respect.

I am not carrying anybody's torch. The fact that I'm not the only one noticing these developments have severe drawbacks is the only source of my optimism. I find it quite astonishing, again and again, how many people complain about 'the system' but then go and willingly work in it, shrugging shoulders and saying 'that's just the way it is'.

who is involved with Templeton? The reason why I chose the Unparticles is just that it currently seems to be the latest trend. Two years ago I might have chosen AdS/CFT or whatever. The tendency of these trends though is to become more pronounced the better people 'optimize' their strategies. The problem with the horizontal citations is that scientists have to struggle to keep 'floating' on top. Whoever drops off the reference lists at some point has a hard time getting in again, since these lists get passed on (copied/reused). Best,

Imho, I don't think that I agree that 'horizontal' citations (to use your terminology) are not needed. I think these can be helpful when used in moderation (especially in, say, the introductory part of your paper). Why go search on Google when you are reading the paper right then and there?

Please read the context and my comment above. What I tried to say is that these citations are useless if they are just listed as 'further stuff []', I would have no problems with detailed introductions that would explain ref [] did that and ref [] added this and ref [] did something nobody knew what to do with, and ref [] criticised ref [] etc. (The question would then be, do you need that in every paper?)

Almost all journals list keywords on the paper, you can use these as well. Further, there are increasing efforts to structure fields and subfields into areas where you could get some tree-like structure in a much more useful way.

Why go search on Google when you are reading the paper right then and there?

I never had Google in mind. The point is if you only get a 50 items list with 'various considerations' you have to check the whole list anyhow to see whether it contains something you'd be interested in. The thing to do though is you go back 'vertically' to the relevant papers (if you can figure out which it is) and use 'cited by', which is a) more efficient than a keyword query and b) fairer than a cite-list which can be biased. Best,

With the reply to me and Neil you demonstrate you are becoming skilled with the art of the one liner. So I thought I’d share one from the King of one liners.

“He willed his body to science. Science is contesting the will.” - Henny Youngman

Seriously, I wasn’t aware that they were already rating/ranking papers in search engine like fashion. Although they are recording hits and downloads it’s not self-evident that they are applying a full-blown Google technique where boredom is used more skillfully as a measure. They refer to still having a lot of noise in their data. I’ve always found it interesting that they would use such a term for error. If I didn’t know better I could be led to believe they are referring to entropy, which relates to randomness rather than error. Thanks for the info and I will be interested in following how it progresses with more refinement.

As a referee I do sometimes ask for self-citations to be reduced. But not being much of a bandwagon-hopper I don't often get papers to referee that contain useless monster reference lists like the unparticle ones.

I would be even more explicit in labeling the extremes: social versus scientific citation. Self-explanatory, whereas 'horizontal/vertical' is not so immediately meaningful.

True, I actually like this better. I had a picture in mind with the horizontal/vertical but it didn't quite work out. If you think about a citation tree you'll figure why. Thanks, I think if I bring the topic up again I will use your suggestion. Best,

Bee, for future reference, I think what you meant was, not very many people notice your one-("no" ?) liners? As written, it seems to mean, only small people notice them. Or you could easily just be kidding, but English can have odd complications even for a well-educated foreigner.

Yes Bee, the correct phrase would be "few people notice them." [or "it."] Using "it" is OK if you refer to your one-liners as a singular abstraction! That is like saying, "We know more about the nucleus than we used to, but it is still a mystery in many ways" etc. (Well, is "it," BTW?)

Ha. Do you know this one: "If I don't see farther than others, it's because giants are standing on my shoulders." Can't recall where I got that from, but unfortunately not my idea.

Besides, what I actually meant to say with my no-liner is that I do read all of the comments, but I just don't have the time to reply to all of them. So I want to say, I appreciate your usually very thoughtful comments that have a lot of content, I read them and I might actually use them in a later post. I am sorry if I am sometimes very brief. Best,

Just as one further related aside on this Newton quote. It was only a couple of years back, when reading some English philosophy from the Renaissance, that I discovered that Newton plagiarized when he said this. For a quote of Robert Burton (1577-1640) reads:

“A dwarf standing on the shoulders of a giant may see farther than a giant himself.”

Inasmuch as Newton was born in 1643 it is clear that this was taken from Burton. It also explains why it was considered to be in reference to Hooke’s physical and scientific stature in Newton’s opinion. It is no wonder then that Newton was so quick to implicate Leibniz as a plagiarist, for he had practiced this himself. It then serves to be more telling of the accuser than the accused.

I don't know why some people have this zeal to attain the highest possible citation count. Most departments are smart enough to judge for themselves which papers are worthwhile and which are fluff.

For instance, publishing some semi-obvious continuation of a new hot topic that is guaranteed to be a citation monster, but is ultimately trivial (e.g. in the words of Joker from Resonance, "you are encouraged to forget about this paper once you cite me")

'By reading this paper, the reader is authorised to enjoy and even reuse the arguments without quotation. Hereby the authors explicitly refuse any kind of claim against the reader. The reader is authorised to deny even the fact of having read the paper. Hereby the authors explicitly refuse... well, whatever. On the other hand, we would be happy about legal actions against any person quoting our paper while not reading it. Specially, co-authors'