We study the correlation between citation-based and expert-based assessments of journals and series, which we collectively refer to as sources. The source normalized impact per paper (SNIP), the SCImago Journal Rank 2 (SJR2), and the raw impact per paper (RIP) indicators are used to assess sources based on their citations, while the Norwegian model is used to obtain expert-based source assessments. We first analyze, within and across subject area categories, the degree to which RIP, SNIP, and SJR2 values correlate with the quality levels in the Norwegian model. We find that sources at higher quality levels have, on average, substantially higher RIP, SNIP, and SJR2 values. Regarding subject area categories, SNIP seems to perform substantially better than SJR2 from the field normalization point of view. We then compare the ability of RIP, SNIP, and SJR2 to predict whether or not a source is classified at the highest quality level in the Norwegian model. SNIP and SJR2 turn out to give more accurate predictions than RIP, which provides evidence that normalizing for differences in citation practices between scientific fields indeed improves the accuracy of citation indicators.
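As an illustration of this kind of prediction comparison, here is a minimal sketch (not the paper's actual method or data) that scores each indicator by the area under the ROC curve for predicting whether a source sits at the highest quality level; all values are invented:

```python
# Minimal sketch (invented data): comparing how well each citation indicator
# predicts whether a source is at the highest Norwegian-model quality level.

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical sources: (RIP, SNIP, SJR2, is_top_level)
sources = [
    (1.2, 0.9, 0.8, 0),
    (3.5, 2.1, 2.4, 1),
    (0.8, 0.7, 0.5, 0),
    (4.1, 2.6, 3.0, 1),
    (3.8, 1.1, 1.0, 0),  # high raw impact, low field-normalized impact
    (2.9, 1.9, 2.2, 1),
]
labels = [s[3] for s in sources]
for i, name in enumerate(["RIP", "SNIP", "SJR2"]):
    print(name, round(auc([s[i] for s in sources], labels), 3))
```

With these invented numbers the unnormalized RIP misranks the fifth source and scores a lower AUC than SNIP and SJR2, mirroring the direction of the finding reported above.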

Measures of research productivity (e.g. peer-reviewed papers per researcher) are a fundamental part of bibliometric studies, but they are often restricted by the properties of the available data. This paper addresses that fundamental issue and presents a detailed method for estimating productivity (peer-reviewed papers per researcher) from data available in bibliographic databases (e.g. Web of Science and Scopus). The method can, for example, be used to estimate average productivity in different fields, and such field reference values can be used to produce field-adjusted production values. Being able to produce such field-adjusted production values could dramatically increase the relevance of bibliometric rankings and other bibliometric performance indicators. The results indicate that the estimates are reasonably stable given a sufficiently large data set.
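One simple way such a field reference value could be computed from bibliographic data alone is fractionalized papers per unique author; this is a hedged sketch of that general idea, not the paper's exact method, and the records are invented:

```python
# Sketch (assumed approach, invented records): average productivity in a field
# as fractionalized papers per unique author, using only database records.

from collections import defaultdict

records = [  # each record: the list of author identifiers on one paper
    ["a1", "a2"],
    ["a2", "a3", "a4"],
    ["a1"],
    ["a4", "a5"],
]

credit = defaultdict(float)
for authors in records:
    share = 1.0 / len(authors)          # fractional counting of each paper
    for a in authors:
        credit[a] += share

# Field reference value: average fractionalized output per researcher.
avg_productivity = sum(credit.values()) / len(credit)
print(round(avg_productivity, 3))
```

Total credit always equals the number of papers, so the reference value here is simply papers divided by distinct authors; a field-adjusted production value would then divide a unit's output by this reference.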

Understanding the quality of science systems requires international comparative studies, which are difficult because of the lack of comparable data, especially about inputs in research. In this study, we deploy an approach based on changes rather than levels of inputs and outputs: an approach that to a large extent eliminates the problem of measurement differences between countries. We first show that there are large differences in efficiency between national science systems, where efficiency is defined as the increase in output (highly cited papers) per percentage increase in input (funding). We then discuss our findings in light of popular explanations of performance differences: differences in funding systems (performance related or not), in the level of competition, in the level of university autonomy, and in the level of academic freedom. Interestingly, the available data do not support these common explanations. Instead, the data suggest that efficient systems are characterized by a well-developed ex post evaluation system combined with relatively high institutional funding and relatively low university autonomy (meaning a high autonomy of professionals). The less efficient systems, on the other hand, have strong ex ante control, either through a high level of so-called competitive project funding or through strong power of the university management. Another conclusion is that more and better data are needed.
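The efficiency notion used above is a ratio of growth rates, which a tiny worked example makes concrete; the numbers below are invented for illustration:

```python
# Sketch of the efficiency notion above: percentage growth in output (highly
# cited papers) divided by percentage growth in input (funding). Invented data.

def efficiency(output_t0, output_t1, input_t0, input_t1):
    output_growth = (output_t1 - output_t0) / output_t0 * 100
    input_growth = (input_t1 - input_t0) / input_t0 * 100
    return output_growth / input_growth

# Hypothetical country: funding grows 20% while highly cited papers grow 40%,
# so each percent of extra funding yields two percent of extra top output.
print(efficiency(1000, 1400, 50, 60))
```

Because both numerator and denominator are relative changes within one country, differences in how countries measure the absolute levels of funding or output largely cancel out, which is the point of the change-based approach.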

The purpose of this study is to find a theoretically grounded, practically applicable, and useful granularity level for an algorithmically constructed publication-level classification of research publications (ACPLC). The level addressed is that of research topics. The methodology we propose uses synthesis papers and their reference articles to construct a baseline classification. A dataset of about 31 million publications, and their mutual citation relations, is used to obtain several ACPLCs of different granularity. Each ACPLC is compared to the baseline classification, and the best performing ACPLC is identified. The results of two case studies show that the topics of the cases are closely associated with different classes of the identified ACPLC, and that these classes tend to treat only one topic. Further, the class size variation is moderate, and only a small proportion of the publications belong to very small classes. For these reasons, we conclude that the proposed methodology is suitable for determining the topic granularity level of an ACPLC, and that the ACPLC identified by this methodology is useful for bibliometric analyses.

The main rationale behind career grants is helping top talent to develop into the next generation of leading scientists. Does career grant competition result in the selection of the best young talents? In this paper we investigate whether the selected applicants are indeed performing at the expected excellent level, something that is hardly investigated in the research literature. We investigate the predictive validity of grant decision-making, using a sample of 260 early career grant applications in three social science fields. We measure the output and impact of the applicants about ten years after the application to find out whether the selected researchers perform better ex post than the non-successful ones. Overall, we find that predictive validity is low to moderate when comparing grantees with all non-successful applicants. When comparing grantees with the best performing non-successful applicants, predictive validity is absent. This implies that the common belief that peers in selection panels are good at recognizing outstanding talent is incorrect. We also investigate the effects of the grants on careers and show that recipients of the grants do have better careers than the non-granted applicants. This makes the observed lack of predictive validity even more problematic.

In Van den Besselaar et al. (2017) we tested the claim of Linda Butler (2003) that funding systems based on output counts have a negative effect on impact as well as quality. Using new data and improved indicators, we indeed reject Butler's claim. The impact of Australian research improved after the introduction of such a system, and did not decline as Butler states. In their comments on our findings, Linda Butler, Jochen Gläser, Kaare Aagaard & Jesper Schneider, Ben Martin, and Diana Hicks put forward many arguments, but they do not dispute our basic finding: the citation impact of Australian research went up immediately after the output based performance system was introduced. It is important to test Butler's findings about Australia, as these findings are part of the accepted knowledge in the field, heavily cited, often used in policy reports, but hardly confirmed in other studies. We found that Butler's conclusions are wrong, and that many of the policy implications based on them are simply unfounded. In our study, we used better indicators and a similar causality concept as our opponents, and our findings are independent of the exact timing of the policy intervention. Furthermore, our commenters have not addressed our main conclusions at all, and some even claim that observations do not really matter in the social sciences. We find this position problematic: why would the taxpayer fund science policy studies if it is merely about opinions? Let's take science seriously, including our own field.

More than ten years ago, Linda Butler (2003a) published a well-cited article claiming that Australian science policy in the early 1990s made a mistake by introducing output based funding. According to Butler, the policy stimulated researchers to publish more but lower-quality papers, resulting in lower total impact of Australian research compared to other countries. We redo and extend the analysis using longer time series, and show that Butler's main conclusions are not correct. We conclude in this paper (i) that the currently available data reject Butler's claim that "journal publication productivity has increased significantly… but its impact has declined", and (ii) that it is hard to find such evidence even with a reconstruction of her data. On the contrary, after implementing evaluation systems and performance based funding, Australia not only improved its share of research output but also increased research quality, implying that total impact was greatly increased. Our findings show that if output based research funding has an effect on research quality, that effect is positive, not negative. This finding has implications for the discussions about research evaluation and about the assumed perverse effects of incentives, since the Australian case plays a major role in those debates.

Journal classification systems play an important role in bibliometric analyses. The two most important bibliographic databases, Web of Science and Scopus, each provide a journal classification system. However, no study has systematically investigated the accuracy of these classification systems. To examine and compare the accuracy of journal classification systems, we define two criteria on the basis of direct citation relations between journals and categories. We use Criterion I to select journals that have weak connections with their assigned categories, and we use Criterion II to identify journals that are not assigned to categories with which they have strong connections. If a journal satisfies either of the two criteria, we conclude that its assignment to categories may be questionable. Accordingly, we identify all journals with questionable classifications in Web of Science and Scopus. Furthermore, we perform a more in-depth analysis for the field of Library and Information Science to assess whether our proposed criteria are appropriate and whether they yield meaningful results. It turns out that, according to our citation-based criteria, Web of Science performs significantly better than Scopus in terms of the accuracy of its journal classification system.
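The two criteria can be sketched roughly as follows; the thresholds (`weak`, `strong`), the citation shares, and the helper function are all hypothetical, and the paper defines the exact conditions:

```python
# Hedged sketch of the two citation-based criteria described above.
# Thresholds and data are invented; the paper's precise definitions differ.

def questionable(journal_cats, citation_share, weak=0.05, strong=0.30):
    """citation_share maps category -> share of the journal's citation traffic
    with that category. Returns the reasons an assignment looks doubtful."""
    reasons = []
    for cat in journal_cats:  # Criterion I: weak link to an assigned category
        if citation_share.get(cat, 0.0) < weak:
            reasons.append(("I: weak link to assigned category", cat))
    for cat, share in citation_share.items():
        # Criterion II: strong link to a category the journal is NOT assigned to
        if cat not in journal_cats and share >= strong:
            reasons.append(("II: strong link to unassigned category", cat))
    return reasons

shares = {"Library & Information Science": 0.02, "Computer Science": 0.45}
print(questionable({"Library & Information Science"}, shares))
```

In this invented example the journal trips both criteria: it barely cites or is cited by its assigned category, while most of its citation traffic flows to a category it was never assigned to.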

Several investigations into, and approaches for, categorizing academic journals, institutions, and countries into different grades have been published in the past. To the best of our knowledge, most existing grading methods use either a weighted sum of quantitative indicators (including the case of one properly defined quantitative indicator) or quantified peer review results. Performance measurement is an important concern for science and technology (S&T) management. In this paper we address this issue, using multi-level frontiers resulting from data envelopment analysis (DEA) models to grade selected countries/territories. We use research funding and researchers as input indicators, and take papers, citations, and patents as output indicators. Our results show that using DEA frontiers we can group countries/territories into different grades. These grades reflect the corresponding countries' levels of performance with respect to multiple inputs and outputs. Furthermore, we use papers, citations, and patents as single outputs (with research funding and researchers as inputs), respectively, to show how country/territory grades change. To increase insight into this approach, we also incorporate a simple value judgment (that the number of citations is more important than the number of papers) as prior information into the DEA models to study the resulting changes in these countries'/territories' performance grades.
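As one plausible reading of the DEA setup above, here is a hedged sketch of an input-oriented CCR model with two inputs (funding, researchers) and one output (papers); the data are invented, and the paper's actual model specification may differ:

```python
# Sketch of an input-oriented CCR DEA model (envelopment form), solved as a
# linear program. Invented data: rows are countries, columns are indicators.

import numpy as np
from scipy.optimize import linprog

X = np.array([[100, 50], [120, 80], [90, 60]], float)   # inputs: funding, researchers
Y = np.array([[300], [310], [200]], float)              # output: papers
n = len(X)

def ccr_efficiency(k):
    # Decision variables: theta, lambda_1..lambda_n; minimize theta subject to
    #   sum_j lambda_j * x_ij <= theta * x_ik   (for each input i)
    #   sum_j lambda_j * y_rj >= y_rk           (for each output r)
    c = np.zeros(n + 1)
    c[0] = 1.0
    A_in = np.hstack([-X[k:k + 1].T, X.T])               # inputs constraint
    b_in = np.zeros(X.shape[1])
    A_out = np.hstack([np.zeros((Y.shape[1], 1)), -Y.T]) # outputs constraint
    b_out = -Y[k]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([b_in, b_out]),
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun

for k in range(n):
    print(f"country {k}: efficiency {ccr_efficiency(k):.3f}")
```

Countries with efficiency 1.0 lie on the first frontier; a multi-level grading scheme like the one described above would remove them and re-run the model to find the second frontier, and so on.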