The demand for measures of individual performance in the management of universities and research institutes has been growing, in particular since the early 2000s. The publication of the Hirsch index in 2005 (Hirsch, 2005) and its popularisation by the journal Nature (Ball, 2005) have given this demand a strong stimulus. According to Hirsch, his index seemed the perfect indicator to assess the scientific performance of an individual author because “it is transparent, unbiased and very hard to rig”. The h-index balances productivity with citation impact. An author with an h-index of 14 has produced 14 publications that have each been cited at least 14 times. So neither an author with a long list of mediocre publications nor an author with one wonder hit is rewarded by this indicator. Nevertheless, the h-index turned out to have too many disadvantages to wear the crown of “the perfect indicator”. As Hirsch himself acknowledged, it cannot be used for cross-disciplinary comparison. A field in which many citations are exchanged among authors will produce a much higher average h-index than a field with far fewer citations and references per publication. Moreover, the older one gets, the higher one’s h-index will be. And, as my colleagues have shown, the index is mathematically inconsistent, which means that rankings based on the h-index may be influenced in rather counter-intuitive ways (Waltman & van Eck, 2012). At CWTS, we therefore prefer an indicator such as the number (or percentage) of highly cited papers to the h-index (Bornmann, 2013).
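To make the definition concrete, here is a minimal Python sketch of how an h-index can be computed from a list of citation counts. The function name and the example citation lists are illustrative assumptions; they are not data about any real author, nor the code used by any citation database.

```python
def h_index(citations):
    """Return the largest h such that h publications have at least h citations each."""
    # Rank publications from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # every later paper is cited even less, so h cannot grow
    return h


# Hypothetical citation records (one number per publication).
balanced_author = [14] * 14    # 14 papers, each cited 14 times
long_mediocre_list = [1] * 50  # 50 papers, each cited once
one_wonder_hit = [1000]        # a single paper cited 1000 times

print(h_index(balanced_author))     # 14
print(h_index(long_mediocre_list))  # 1
print(h_index(one_wonder_hit))      # 1
```

The last two cases show why neither sheer volume nor a single blockbuster is rewarded: in both, the h-index stays at 1.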

Still, none of the bibliometric indicators can claim to be the perfect indicator to assess the performance of the individual researcher. This raises the question of how bibliometricians and science managers should use statistical information and bibliometric indicators. Should they be avoided and should the judgment of candidates for a prize or a membership of a prestigious academic association only be informed by peer review? Or can numbers play a useful role? What guidance should the bibliometric community then give to users of their information?

This was the key topic at a special plenary at the 14th ISSI Conference two weeks ago in Vienna. The plenary was an initiative taken by Jochen Gläser (Technical University Berlin), Ismael Rafols (SPRU, University of Sussex, and Ingenio, Polytechnical University Valencia), Wolfgang Glänzel (Leuven University) and myself. The plenary aimed to give a new stimulus to the debate on how to apply, and how not to apply, performance indicators to individual scientists and scholars. This is not a new debate (the pioneers of bibliometrics already paid attention to the problem), but it has become more urgent because of the almost insatiable demand for objective data and indicators in the management of universities and research institutes. For example, many biomedical researchers mention the value of their h-index on their CV. In publication lists, one can regularly see the value of the Journal Impact Factor mentioned after the journal’s name. In some countries, for example Turkey and China, one’s salary can be determined by the value of either one’s h-index or the impact factor of the journals one has published in. The Royal Netherlands Academy of Arts and Sciences also seems to ask for this kind of statistics in its forms for new members in the medical and natural sciences. Although robust systematic evidence is still lacking (we are working hard on this), the use of performance indicators in the judgment of individual researchers for appointments, funding and memberships seems widespread, opaque and unregulated.

This situation is clearly not desirable. If researchers are being evaluated, they should be aware of the criteria used, and these criteria should be justified for the purpose at hand. This requires that users of performance indicators have clear guidelines. It seems rather obvious that the bibliometric community has an important responsibility to inform users and to provide such guidelines. However, there is as yet no consensus about them. Individual bibliometric centres do inform their clients about the use and limitations of their indicators. Moreover, all bibliometric centres have the habit of publishing their work in the scientific literature, often including technical details of their indicators. However, this published work is not easily accessible to non-expert users such as deans of faculties and research directors. The literature is too technical and distributed over too many journals and books. It needs to be synthesized and translated into plain, easily understandable language.

To initiate a process of more professional guidance for the application of bibliometric indicators in the evaluation of individual researchers, we asked the organizers of the ISSI conference to devote a plenary to this problem, which they kindly agreed to. At the plenary, Wolfgang Glänzel and I presented “The dos and don’ts in individual level bibliometrics”. We do not think this is a final list; it is more a good start, with ten dos and don’ts. Some examples: “do not reduce individual performance to a single number”, “do not rank scientists according to one indicator”, “always combine quantitative and qualitative methods”, “combine bibliometrics with career analysis”. To prevent misunderstandings: we do not want to set up a bibliometric police force with absolute rules. The context of the evaluation should always determine which indicators and methods to use. Therefore, some of the don’ts in our list may sometimes be perfectly acceptable, such as the application of bibliometric indicators to make a first selection among a large number of candidates.

Our presentation was commented on by Henk Moed (Elsevier) with a presentation on “Author Level Bibliometrics” and by Gunnar Sivertsen (NIFU, Oslo University) with comments based on his extensive experience in research evaluation. Henk Moed built on the concept of the multi-dimensional research matrix, which was published in 2010 by the European Expert Group on the Assessment of University Based Research, of which he was a member (Assessing Europe’s University-Based Research – Expert Group on Assessment of University-Based Research, 2010). This matrix aims to give global guidance on the use of indicators at various levels of the university organization. However, it does not focus on the problem of how to evaluate individual researchers. Still, the matrix is surely a valuable contribution to the development of more professional standards in the application of performance indicators. Gunnar Sivertsen made clear that the discussion should not be restricted to the bibliometric community itself. On the contrary, the main audience for guidelines should be the researchers themselves and administrators in universities and funding agencies.

The ensuing debate led to a large number of suggestions. They will be included in the full report of the meeting, which will be published in the upcoming issue of the ISSI’s professional newsletter in September 2013. A key point was perhaps the issue of responsibility: it is clear that researchers themselves and the evaluating bodies should carry the main responsibility for the use of performance indicators. However, they should be able to rely on clear guidance from the technical experts. How should this balance be struck? Should bibliometricians refuse to deliver indicators when they think their application would be unjustified? Should the association of scientometricians publicly comment on misapplications? Or should this be left to the judgment of the universities themselves? The plenary did not resolve these issues. However, a consensus is emerging that more guidance by bibliometricians is required and that researchers should have a clear address to which they can turn with questions about the application of performance indicators, either by themselves or by their evaluators.

What next? The four initiators of this debate in Vienna have also organized a thematic session on individual level bibliometrics at the next conference on science & technology indicators, the STI Conference “Translational twists and turns: science as a socio-economic endeavour”, which will take place in Berlin, 4-6 September 2013. There, we will take the next step in specifying guidelines. In parallel, this conference will also host a plenary session on the topic of bibliometric standards in general, organized by iFQ, CWTS and Science-Metrix. In 2014, we will then organize a discussion with the key stakeholders, such as faculty deans, administrators, and of course the research communities themselves, on the best guidelines for evaluating individual researchers.

Stay tuned.

Bibliography:

Assessing Europe’s University-Based Research – Expert Group on Assessment of University-Based Research. (2010). Research Policy. European Commission. doi:10.2777/80193

Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–72. doi:10.1073/pnas.0507655102

Waltman, L., & van Eck, N. J. (2012). The inconsistency of the h-index. Journal of the American Society for Information Science and Technology, 63(2), 406–415. doi:10.1002/asi

At the opening of the biennial conference of the International Society for Informetrics and Scientometrics (ISSI) in Vienna on July 16, Susanne Weigelin-Schwiedrzik, the Vice Rector of the University of Vienna, called upon the participants to reorient the field of scientometrics in order to better meet the need for research performance data. She explained that Austrian universities are nowadays obliged by law to base all their decisions regarding promotion, personnel, research funding and allocation of research funds to departments on formal external evaluation reports. “You are hosted by one of the oldest universities in Europe, it was founded in 1365. In the last couple of years, this prestigious institute has been reorganized using your scientometric data. This puts a tremendous responsibility on your field. You are no longer in the Kindergarten stage. Without your data, we cannot take decisions. We use your data to allocate research funds. We have to think twice before using your data. But you have the responsibility to realize your role in a more fundamental way. You also have to address the criticism of scientometric data. And what they represent.”

Weigelin’s passionate call for a more reflexive and critical type of scientometrics is motivated by the strong shift in Austrian university policy with respect to human resource management and research funding. In the past, the system was basically a closed shop, with many university staff members staying within their original university. The system was not very open to exchanges among universities, let alone international exchange. Nowadays, university managers need to base their decisions explicitly on external evaluations, in order to make clear that their decisions meet international quality standards. As a consequence, the systems of control at Austrian universities have exploded. To support this decision-making machinery, the University of Vienna has created a specific quality management department and a bibliometric department. The university has an annual budget of 380 million euros and needs to meet annual targets that are included in target agreements with the government.

On the second day of the ISSI conference, Weigelin repeated her plea in a plenary session on the merits of altmetrics. After a couple of presentations by Elsevier and Mendeley researchers, she said she was “not impressed”. “I do not see how altmetrics, such as download and usage data, can help solve our problem. We need to take decisions on the basis of data on impact. We look at published articles and at Impact Factors. As a researcher, I know that this is incorrect since these indicators do not directly reflect quality. But as a manager, I do not know what else to do. We are supposed to simplify the world of science. That is why we rely on your data and on the misconception that impact is equal to quality. I do not see a solution in altmetrics.” She told the audience, which was listening intently, that she receives a constant flow of evaluation reports and that the average quality of these reports is declining. “And I must say that a fair amount of the reports that are pretty useless are based on scientometric data.” Nowadays, Weigelin no longer accepts recommendations for promotion of scientific staff that mention only bibliometric performance measures without a substantive interpretation of what the staff member is actually contributing to her scientific field.

In other words, at the opening of this important scientometric conference, the leadership of the University of Vienna has formulated a clear mission for the field of scientometrics. The task is to be more critical with respect to the interpretation of indicators and to develop new forms of strategically relevant statistical information. This mission resonates strongly with the new research program we have developed at CWTS. Happily, the resonance among the participants of the conference was strong as well. The program of the conference shows many presentations and discussions that promise to at least contribute, albeit sometimes in a modest way, to solving Weigelin’s problems. It seems therefore clear that many scientometricians are eager to meet the challenge and indeed develop a new type of scientometrics for the 21st century.

Can one person manipulate the position of a whole university in a university ranking such as the Leiden Ranking? The answer is, unfortunately, sometimes yes, provided the processes of quality control in journals do not function properly. A Turkish colleague recently alerted us to the position of Ege University in the most recent Leiden Ranking in the field of mathematics and computer science. This university, not previously known as one of the prestigious Turkish research universities, ranks second with an astonishing value of the PP(top 10%) indicator of almost 21%. In other words, 21% of the mathematics and computer science publications of Ege University belong to the top 10% most frequently cited in their field. This means that Ege University is supposed to have produced twice as many highly cited papers as expected. Only Stanford University has performed better.
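For readers unfamiliar with the PP(top 10%) indicator, the following Python sketch shows the basic idea. It is a simplification under stated assumptions: the function names and all citation numbers are hypothetical, and the actual Leiden Ranking calculation is more refined than this (the sketch simply ignores ties at the threshold and treats the whole input as a single field).

```python
def top10_threshold(field_citations):
    """Citation count of the last paper inside the top 10% of a field (ties ignored)."""
    ranked = sorted(field_citations, reverse=True)
    cutoff_size = max(1, len(ranked) // 10)  # number of papers in the top 10%
    return ranked[cutoff_size - 1]


def pp_top10(university_citations, threshold):
    """Share of a university's papers that reach the field's top-10% threshold."""
    top = sum(1 for count in university_citations if count >= threshold)
    return top / len(university_citations)


# Hypothetical field of 100 papers with 0, 1, ..., 99 citations.
field = list(range(100))
threshold = top10_threshold(field)

# Hypothetical university with 10 papers in that field.
university = [95, 91, 50, 3, 0, 92, 10, 20, 30, 40]
share = pp_top10(university, threshold)

print(threshold)    # 90
print(share)        # 0.3
print(share / 0.10) # ratio to the expected share of 10%
```

By construction, a university whose papers are cited like the field average would score about 10%, so a value of almost 21% corresponds to roughly twice the expected number of highly cited papers.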

In mathematics and computer science, Ege University has produced 210 publications (Stanford wrote almost ten times as many). Because this is a relatively small number of publications, the reliability of the ranking position is fairly low, which is indicated by a broad stability interval (an indication of the uncertainty in the measurement). Of the 210 Ege University publications, no fewer than 65 have been authored by one person, a certain Ahmet Yildirim. This is an extremely high productivity for only 4 years in this specialty. Moreover, the Yildirim publications are indeed responsible for the high ranking of Ege University: without them, Ege University would rank around position 300 in this field. That position is therefore probably a much better reflection of its performance in this field. Yildirim’s publications have attracted 421 citations, excluding self-citations. Mathematics is not a very citation-dense field, so this level of citations is able to strongly influence both the PP(top 10%) and the MNCS indicators.

An investigation into Yildirim’s publications has not yet started, as far as we know. But suspicions of fraud and plagiarism are rising, both in Turkey and abroad. One of his publications, in the journal Mathematical Physics, has recently been retracted by the journal because of evident plagiarism (pieces of an article by a Chinese author were copied and presented as original). Interestingly, the author has not agreed with this retraction. A fair number of Yildirim’s publications have been published in journals with a less than excellent track record in quality control. The Elsevier journal Computers & Mathematics with Applications (11 articles by Yildirim) has recently retracted an article by a different author because it turned out to have “no scientific content”. Actually, it was an almost empty publication. According to Retraction Watch, the journal’s editor Ervin Rodin was replaced at the end of last year. He was also relieved of his editorial position at the journal Applied Mathematics Letters – An International Journal of Rapid Publication, another Elsevier imprint. Rodin was also editor of Mathematical and Computer Modelling, in which Yildirim published 5 articles. The latter journal currently does not accept any submissions “due to an editorial reconstruction”.

How did Yildirim’s publications attract so many citations? His 65 publications are cited by 285 publications, giving 421 citations in total. This group of publications has strong internal citation traffic. Together they have attracted almost 1200 citations, of which a bit more than half are generated within the group. In other words: this set of publications seems to represent a closely knit group of authors, but they are not completely isolated from other authors. If we look at the universities citing Ege University, none of them has a high rank in the Leiden Ranking, with the exception of Penn State University (which ranks at 112), which has cited Yildirim once. If we zoom in on mathematics and computer science, virtually none of the citing universities rank highly either, with the exception of Penn State (1 publication) and Gazi University (also 1 publication). The rank position of the latter university, by the way, is not so reliable either, as indicated by a stability interval that is almost as wide as in the case of Ege University.
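The notion of “internal citation traffic” can be illustrated with a small Python sketch. It assumes citations are available as (citing, cited) pairs of publication identifiers; the function name, the identifiers and the tiny example network are all hypothetical and are not the data behind the figures above.

```python
def in_group_share(citation_links, group):
    """Share of the citations received by a group of publications
    that come from publications inside the same group."""
    # Keep only the citations that point at a publication in the group.
    received = [(citing, cited) for citing, cited in citation_links if cited in group]
    if not received:
        return 0.0
    internal = sum(1 for citing, _ in received if citing in group)
    return internal / len(received)


# Hypothetical network: publications a, b, c form the group; x is an outsider.
links = [
    ("a", "b"),  # internal citation
    ("b", "a"),  # internal citation
    ("c", "a"),  # internal citation
    ("x", "b"),  # citation from outside the group
]
group = {"a", "b", "c"}

print(in_group_share(links, group))  # 0.75
```

A share of a bit more than one half, as observed here, tells us that the group cites itself heavily; on its own it cannot tell a tight research community from a citation cartel.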

The bibliometric evidence allows for two different conclusions. One is that Yildirim is a member of a community which works closely together on an important mathematical problem. The alternative interpretation is that this group is a distributed citation cartel which not only exchanges citations but also produces very similar publications in journals that function mainly as citation-generating devices. A cursory look at a sample of the publications and the way the problems are formulated seems to support the second interpretation more than the first.

But from this point, the experts in mathematics should take over. Bibliometrics is currently not able to properly distinguish sense from nonsense in scientific publications. Expertise in the field is required for this task. We have informed the rector of Ege University that the ranking of his university is doubtful and requested more information from him about the position of the author. We have not yet received a reply. If Ege University wishes to be taken seriously, it should start a thorough investigation of the publications by Yildirim and his co-authors.

If you see other strange rankings in our Leiden Ranking or in any other ranking, please do notify us. It may help us create better tools to uncover fraudulent behaviour in academic scholarship.