RePEc has for several years now published lists of the most cited papers and articles cataloged in its database, according to three criteria, recently expanded to six. By popular demand, we now also publish a list of the most cited recent papers and articles. The selection criterion here is that the last known version was published five or fewer years ago. That may sound like a long period, but considering the publication lags we suffer, I think it is reasonable. Thus, currently, articles (and papers) published in 2002 or thereafter qualify. Within a few days, those from 2002 will be dropped, so enjoy them while you can…

At the same time, the list of the most cited items has been expanded. Previously, only the top 200 were released; now we show the top 1‰. This list thus gets longer as RePEc expands and currently stands at 559 items. Again, the list is available according to six different criteria. So, check whether your favorite papers are listed. And remember, all this citation data is still experimental as we try to improve its quality, but it is quite informative nonetheless.

Open Access News pointed out a very interesting article in the Journal of Cell Biology, “Show Me the Data.” Written by that journal’s executive editor, the executive editor of the Journal of Experimental Medicine, and the Executive Director of The Rockefeller University Press, it first reiterates many quality issues with journal impact factors that seem to be well-known among biologists, but I suspect that they are news to many economists. Many of these issues also hold for citation rankings for individuals. Beyond that, there are other issues that make citation data suspect. Fortunately, there are potential solutions to many of these problems.

First, it helps to describe impact factors as they are calculated by Thomson Scientific (previously the Institute for Scientific Information, or ISI). A journal’s impact factor in year t is the number of citations in year t to articles the journal published in years t-1 and t-2, divided by the number of research or review articles it published in those two years. Criticisms include:

- the data in the denominator and numerator are not consistent
- Thomson is unclear on what exactly defines a research or review article
- some journals have negotiated with Thomson on exactly what defines the article type
- retracted papers are not excluded
- of course, the mean is inflated by a few star papers
- editors can game the system; apparently some do and some don’t (I’ve even seen this in the Wall Street Journal)
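To make the definition concrete, the calculation above can be sketched in a few lines of Python. This is a simplified illustration: the journal and its counts are made up, and, as noted, what Thomson actually counts as a “citable” article is opaque.

```python
# Simplified sketch of the Thomson impact factor for year t:
# citations received in year t to items published in years t-1 and t-2,
# divided by the number of "citable" (research or review) articles
# published in years t-1 and t-2.

def impact_factor(cites_in_t_to_prev2, citable_articles_prev2):
    """cites_in_t_to_prev2: total citations in year t to articles
    the journal published in years t-1 and t-2.
    citable_articles_prev2: research/review articles in t-1 and t-2."""
    return cites_in_t_to_prev2 / citable_articles_prev2

# Hypothetical journal: 450 citations in 2006 to its 2004-2005 output,
# of which 150 items are counted as research or review articles.
print(impact_factor(450, 150))  # 3.0
```

Note that several of the criticisms listed above live precisely in the two arguments of this function: which items count in the numerator, and which in the denominator.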

The authors go on to say that they contacted Thomson and received some of their data. They found numerous errors in how articles were categorized. Further, “The total number of citations for each journal was substantially fewer than the number published” as reported by Thomson. When they requested further data from Thomson, the data still didn’t add up. They conclude: “It became clear that Thomson Scientific could not or (for some as yet unexplained reason) would not sell us the data used to calculate their published impact factor.”

Their bottom line is even more clear: “If an author is unable to produce original data to verify a figure in one of our papers, we revoke the acceptance of the paper. We hope this account will convince some scientists and funding organizations to revoke their acceptance of impact factors as an accurate representation of the quality—or impact—of a paper published in a given journal. Just as scientists would not accept the findings in a scientific paper without seeing the primary data, so should they not rely on Thomson Scientific’s impact factor, which is based on hidden data.”

Besides the points reiterated and brought up in the Journal of Cell Biology, there are further accuracy issues with Thomson data. For example, to identify authors, they only use initials for their first and middle names. As they pool papers from all fields, this is a more severe problem than one might first guess. Thomson reports that Kit Baum (known to Thomson as CF Baum) has publications in the Fordham Law Review (on nuclear waste) and the Sociology of Education (on group leadership).
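The kind of collision this creates is easy to simulate. In the sketch below, only the first record corresponds to the real Kit Baum; the other two names are invented for illustration, and the keying function is my guess at the kind of scheme involved, not Thomson’s actual one.

```python
from collections import defaultdict

# If authors are keyed by surname plus initials only, distinct people
# who share that key get pooled into a single "author".
records = [
    {"author": "Christopher F. Baum", "field": "Economics"},
    {"author": "Carl F. Baum",        "field": "Law"},        # hypothetical
    {"author": "Cynthia F. Baum",     "field": "Sociology"},  # hypothetical
]

def initials_key(name):
    # Reduce "Christopher F. Baum" to "CF Baum".
    parts = name.replace(".", "").split()
    return "".join(p[0] for p in parts[:-1]) + " " + parts[-1]

pooled = defaultdict(list)
for r in records:
    pooled[initials_key(r["author"])].append(r["field"])

print(pooled)  # all three records end up under the single key "CF Baum"
```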

A further issue is Thomson’s coverage; EconLit lists some 1,240 journals in our field, while the last time I checked, Thomson covered but a fraction of these. I don’t have recent data for their coverage, but in total Thomson covers 8,700 journals encompassing all academic fields, so it seems doubtful that Thomson has substantially changed its economics coverage.

A further problem plaguing all citation analysis is simply extracting citation data with software. After all, citations are written for people, not machines. I haven’t seen error statistics from Thomson on this (one wonders if they are public), but I do know that CitEc has faced a very real challenge here.

There would seem to be several solutions to these problems. First, all of us should treat impact factors and citation data with considerable caution. Basing journal rankings, tenure, promotion, and raises on uncritical acceptance of this data is a poor idea. In the extreme, one could imagine legal action in a tenure case.

Second, we should investigate putting a unique identifier into each reference so that software can easily read it. That is, besides listing the journal, its volume, and so on, a reference would also include a unique identifier for the cited paper. DOIs are one possibility, but it is prohibitively expensive to get a license to dispense DOIs. However, “RePEc handles,” which identify papers in RePEc, are permanent and also cover working papers. Thus, we might start including them in each reference. This highlights a further issue: there is little incentive for authors to add this to their citations, as it mostly aids others rather than themselves. Perhaps one step in this direction would be for sites like IDEAS, which provide references for papers in formats like BibTeX or EndNote, to include the RePEc handle along with the current author, title, journal, and so on.
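For instance, a BibTeX export could carry the handle in an extra field. The entry below is purely illustrative: the author, title, journal, and handle components are placeholders, and the choice of field name is mine, not an established convention.

```bibtex
@Article{example2007,
  author  = {An Author},
  title   = {An Example Title},
  journal = {Some Journal},
  year    = {2007},
  volume  = {1},
  pages   = {1--20},
  note    = {RePEc handle: RePEc:xxx:yyyyyy:zzz}
}
```

A citation-extraction tool encountering such an entry could match the reference exactly against the RePEc database instead of guessing from the author, title, and journal strings.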

The 15,000th author registered recently on the RePEc Author Service (which also has another 5,000 registered, but without any works in their profile). See a list of all those registered at EconPapers or IDEAS. This gives us the opportunity to reflect on the coverage of this service: what proportion of academic economists is covered? Let me offer a few ways to estimate this.

Assume that the works listed in RePEc provide a representative sample of all the works written by economists. Then determine how many of these works are listed in the profile of a registered author. By that account, about 40.1% of these works have been claimed, and thus about 40% of the profession would be registered with RePEc. The true number is likely higher, due to several biases: a) some authors are no longer alive and cannot register; b) some registered authors have the unfortunate habit of removing working papers from their profile once they are published; c) some works listed are not written by economists, and these authors are less likely to register with RePEc.

Alternatively, estimate the number of authors in the world from the membership of academic societies. I guess the three largest societies are the American Economic Association (18,000 members), the European Economic Association (2,300 members) and the Econometric Society (5,500 members). Obviously, their memberships overlap, and not all of their members are authors. But not every economist is a member either. If we assume that adding their membership numbers corrects for all these mismeasurements, then the RePEc Author Service covers 58% of the profession.
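Under that (admittedly heroic) assumption, the arithmetic is simply:

```python
# Membership of the three largest societies, as cited above.
members = 18_000 + 2_300 + 5_500   # AEA + EEA + Econometric Society = 25,800
registered = 15_000                # authors registered with works on RePEc

print(round(registered / members * 100))  # 58
```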

One can also observe a specific subsample of economists, those listed among the top 1000 by Tom Coupé. There, the RePEc Author Service covers 75% of the top 1000 by publications and 65% of the top 1000 by citations (which includes quite a few non-economists). But we have good reasons to believe these proportions are higher than for the whole population. Indeed, the proportion is significantly higher for the better-ranked authors within this sample, and we can extrapolate that those outside the top 1000 are less represented in the RePEc Author Service.

In summary, the RePEc Author Service covers between 40% and 75% of the profession. Possibly less, possibly more, likely in between.

The RePEc blog was offline for a few days due to a hardware failure, along with a few other websites at Boston College, our host. Everything seems to be running well now, but please contact us if you see any remaining issues.

The (US) Association of Research Libraries released a few days ago a report entitled “The E-only Tipping Point for Journals: What’s Ahead in the Print-to-Electronic Transition Zone” (pdf). It argues that sooner or later every publisher will move to an electronic-only format in the face of the rising (relative) costs of print. Currently, we are in a transition period where most journals have gone from print-only to print-plus-electronic, and the report predicts that within 5 to 10 years, the only remaining printed journals will be the most specialized and smallest ones, those that cannot afford the fixed cost of setting up an electronic edition. Another feature of the transition is the large proportion of new journals that do not even bother with a print edition.

This discussion largely pertains to university press publishing, but can probably be extended to commercial publishing. Indeed, commercial publishers show signs that they want to discourage print editions, either through their subscription price structure or by making subscriptions electronic-only by default. In Economics, the dissemination of research, in terms of readership, is dominated by pre-prints (working or discussion papers) that have been all-electronic for some time now, with only a few exceptions. As far as I know, nobody regrets the era of all-printed working papers: they were difficult to obtain unless you were in the “club”, only a few institutions had a systematic (but costly) way to disseminate them, and only established researchers had any chance of being read through this medium. People would even travel to some libraries to consult their working paper collections. Today, research is much more widely disseminated, and researchers from outside the elite institutions have a better chance to follow and contribute to the research frontier. We hope RePEc has contributed to this democratization. Never has the use of electronic pre-prints been as widespread as it is now, possibly at the cost of reducing journals to historical records of research. Well, journals also act as gatekeepers through peer review, but you sometimes have to wonder about that as well when hearing all the complaints about the process.

A few interesting numbers from the study: 60% of 20,000 peer-reviewed journals are available in electronic format; library-provided electronic editions are read at least ten times more than print ones; only 30% of library subscriptions are print-only.

There are now rankings of authors and institutions by field. As discussed before, the procedure is the following: identify relevant authors by the number of their papers announced in NEP field reports. Past a threshold (currently 25% of all announced papers or 5 announced papers), the author is considered a specialist in her field (see discussion). For institutions, each affiliated author contributes with a weight corresponding to the proportion of her papers announced in the field, irrespective of whether the threshold was met (see discussion).
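The threshold rule can be sketched as follows (the function and parameter names are mine, not NEP's, but the 25%-or-5-papers rule is as stated above):

```python
def is_specialist(papers_in_field, total_announced,
                  share_threshold=0.25, count_threshold=5):
    """An author counts as a specialist in a field if at least 25% of
    her NEP-announced papers were announced in that field's report,
    or if at least 5 of them were."""
    if total_announced == 0:
        return False
    share = papers_in_field / total_announced
    return share >= share_threshold or papers_in_field >= count_threshold

print(is_specialist(2, 10))  # False: 20% share and fewer than 5 papers
print(is_specialist(3, 10))  # True: 30% share clears the 25% threshold
print(is_specialist(5, 40))  # True: only 12.5%, but 5 papers announced
```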

For rankings within regions (and within fields, described above), authors and institutions are no longer ranked by picking them from the “grand list”, i.e., the ranking of all authors or institutions. Rather, the ranking is performed within the respective subgroup: for example, authors are ranked against the others of the same region according to each criterion, and the ranking points are then aggregated. This had already been done a few months ago for women, which led complaints about that particular ranking to virtually disappear…
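A toy example illustrates the change. Everything below is made up: the criteria, authors, and scores are placeholders, and averaging the per-criterion ranks is my simplification; the actual RePEc aggregation differs. The point is only that ranks are computed within the subgroup first, then combined.

```python
# criterion -> {author: score}, higher is better; subgroup of 3 authors.
scores = {
    "citations": {"A": 120, "B": 300, "C": 50},
    "downloads": {"A": 900, "B": 400, "C": 700},
}

def ranks(crit_scores):
    # Rank authors within the subgroup under one criterion (1 = best).
    ordered = sorted(crit_scores, key=crit_scores.get, reverse=True)
    return {author: i + 1 for i, author in enumerate(ordered)}

per_criterion = {c: ranks(s) for c, s in scores.items()}
aggregate = {a: sum(per_criterion[c][a] for c in scores) / len(scores)
             for a in scores["citations"]}
print(aggregate)  # {'A': 1.5, 'B': 2.0, 'C': 2.5}
```

Under the old procedure, the same authors would instead have carried over their positions from the global ranking, which need not produce the same ordering within the subgroup.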

Soon, we will also add citation rankings for recent papers. An announcement will be made on this blog in a week or two. Note that all these rankings are experimental and subject to changes. We welcome discussions and suggestions about them in the comment section.

Every month, a short summary of what happened with RePEc is sent to the RePEc-announce mailing list. I will also put that message, slightly adapted, on this blog.

The new feature of the month has to do with rankings: there are now rankings of institutions and authors by field. Also, there have been procedural changes in the rankings within regions. See the discussion of these items elsewhere on the blog.

We continue with fast-paced additions of material. This year alone, over 100,000 items have been added to the bibliographical database. Accordingly, the RePEc web services are more popular than ever, establishing new traffic records: 697,596 file downloads and 2,516,310 abstract views within a month. This leads us to the thresholds we have passed this month:

You are currently browsing The RePEc Blog archives for December, 2007.

About this blog

Welcome to the RePEc blog. We, the RePEc team, discuss here the workings of RePEc and seek input from the community on how we can improve. We also want to give more volunteers the opportunity to be part of this project and to provide valuable services to the profession. Finally, we also discuss issues about the dissemination of research in Economics.

Comment policy

To post a comment, you need to be registered with this blog with a valid email address. Your first comment may be delayed for verification purposes. Note: you may need to create a new account if your old one predated the move to our new blog service.