This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Abstract

Across the world there is growing interest in open access publishing among researchers, institutions, funders and publishers alike. It is assumed that open access levels are growing, but hitherto the exact levels and patterns of open access have been hard to determine, and detailed quantitative studies are scarce. Using newly available open access status data from oaDOI in Web of Science, we are now able to explore year-on-year open access levels across research fields, languages, countries, institutions, funders and topics, and to relate the resulting patterns to disciplinary, national and institutional contexts. With data from the oaDOI API we also look at the detailed breakdown of open access by type of gold open access (pure gold, hybrid and bronze), using universities in the Netherlands as an example. There is huge diversity in open access levels on all dimensions, with unexpected levels for, for example, Portuguese as a language, Astronomy & Astrophysics as a research field, countries such as Tanzania, Peru and Latvia, and Zika as a topic. We explore methodological issues and offer suggestions to improve conditions for tracking the open access status of research output. Finally, we suggest potential future applications for research and policy development. We have shared all data and code openly.

Author Comment

This is a preprint submission to PeerJ Preprints

Additional Information

Competing Interests

Jeroen Bosman and Bianca Kramer are both affiliated with Utrecht University Library, which supports and promotes open access and runs its own repository. They also run the project 101 Innovations in Scholarly Communication and, as such, are also involved in the execution of the National Plan Open Science of the Netherlands.

Author Contributions

Jeroen Bosman conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

Bianca Kramer conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

4. show applicable uses of this method for informing policy development and monitoring their effects

5. suggest improvements in data availability and methods.

re goal #1.

In my opinion, what is lacking to reach this goal is a definition of usability: a description of usable for what purpose, a description of how usability is tested or measured, and a description of what, preferably quantitative, criteria are used for a conclusion on usability.

re goals #2 and #3

I find it a pity that this manuscript only has the ambition to explore and present data and suggest explanations. The work would profit from more focus and from clear, testable hypotheses and research questions.

Secondly, the authors describe a number of caveats (p9) that ...

Caveat 1c (coverage of IRs by oaDOI) is a major problem for the presented data. As far as I can judge, the green OA numbers for Dutch universities presented in this manuscript are incorrect. I suspect this is due to incomplete harvesting of Dutch IRs, and the authors should check this. On p10 they claim that "green OA data reported in Web of Science are restricted to green only, hence quite low". However, the authors have not investigated this and thus cannot make this statement. Dutch universities also report green only, and report much higher percentages (up to 20%): not "a minimal effect on overall OA levels". I am worried about how this caveat affects the data presented. As it is not clear what the coverage of oaDOI is, and the data are largely presented as overall OA percentages, it is impossible to interpret the results.

On a side note, on p29 the authors claim that "the overall percentage of OA ... for all Dutch universities as reported by the VSNU ... is consistent with the overall figure for the Netherlands in the WoS data (both 42%)". This is not quite true. In the WoS data the 42% includes a large percentage of bronze OA, and not all Dutch universities include this category in their analysis. The same goes for green OA not included in an IR (but available in PMC or arXiv, for instance).

With regard to caveat 2, the authors could easily quantify the impact of this caveat by comparing the results of the OA tool in WoS with the results of oaDOI, for instance for the last year, as sketched below.
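A minimal sketch of such a comparison, assuming the WoS export has been reduced to (DOI, OA flag) pairs; the endpoint and the `is_oa` field follow the public oaDOI/Unpaywall v2 API, while the export format itself is an assumption:

```python
# Hypothetical comparison of a WoS OA flag against live oaDOI results.
# Assumes a WoS export reduced to (doi, wos_is_oa) pairs; endpoint and
# 'is_oa' field follow the public oaDOI/Unpaywall v2 API.
import requests

API = "https://api.unpaywall.org/v2/{doi}?email={email}"
EMAIL = "you@example.org"  # the API asks callers to supply a contact address

def oadoi_is_oa(doi: str) -> bool:
    """Return True if oaDOI currently reports any OA location for this DOI."""
    resp = requests.get(API.format(doi=doi, email=EMAIL), timeout=30)
    resp.raise_for_status()
    return bool(resp.json().get("is_oa"))

# Illustrative sample; in practice, iterate over the full export.
wos_sample = [("10.7717/peerj.4375", True)]

agree = sum(oadoi_is_oa(doi) == wos_flag for doi, wos_flag in wos_sample)
print(f"agreement on OA status: {agree}/{len(wos_sample)} records")
```

As the authors note in their response below, such a comparison would mainly measure the current update lag between the two systems rather than a stable error rate.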

Similarly, for caveat 3, the impact of the inclusion of the ESCI could easily be tested by excluding ESCI from the dataset and comparing the results with those for the full dataset (see the sketch below).
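For instance, if the WoS export were loaded into a pandas DataFrame, the check could look like the following minimal sketch; the column names `wos_edition` and `is_oa` are assumptions about the export, not actual WoS field names:

```python
# Compare overall OA share with and without ESCI records.
# 'wos_edition' and 'is_oa' are hypothetical column names; note that a
# record can in principle be indexed in more than one WoS edition.
import pandas as pd

df = pd.read_csv("wos_records.csv")  # assumed export of articles & reviews

full_share = df["is_oa"].mean()
no_esci_share = df.loc[df["wos_edition"] != "ESCI", "is_oa"].mean()

print(f"OA share, full dataset:  {full_share:.1%}")
print(f"OA share, without ESCI:  {no_esci_share:.1%}")
```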

Caveat 4 could also be addressed in more detail. Gold and hybrid percentages are not expected to change much over time, and it would be interesting to zoom in on those two categories. Also, the hypothesis that this caveat affects both green OA (due to embargo times) and bronze OA (due to moving walls) could be investigated further by looking at the longitudinal data for those categories separately, for example as sketched below.
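A sketch of such a breakdown, under the same assumed DataFrame layout with hypothetical `pub_year` and `oa_type` columns:

```python
# Share of each OA type per publication year. If embargoes and moving
# walls matter, green and bronze shares should rise for older publication
# years. 'pub_year' and 'oa_type' are hypothetical column names.
import pandas as pd

df = pd.read_csv("wos_oadoi_records.csv")

counts = df.groupby(["pub_year", "oa_type"]).size().unstack(fill_value=0)
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(3))  # one row per year, one column per OA type
```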

I have a lot of questions re Caveat 5. For instance:

- which version of the DOAJ is used in oaDOI? the most recent version only, or different versions for publications of different ages? and what would be the consequence of this?

- wouldn't publications in journals that have been removed from the DOAJ pop up as hybrid or bronze OA in the analysis? or have those journals been removed from WoS as well? what would be the consequence of that?

More detail in the methodology and more awareness in the results/analysis section would be valuable.

Re caveat 6: it would be informative if the authors indicated where the number of publications is low and specified the threshold they use.

The third issue I have with this manuscript is data quality. The data are incorrect for the Dutch situation, and that does not give me much trust in the overall numbers for other countries, fields, languages, etc. Given the different initiatives to measure and present OA information, I do welcome thorough analysis of new tools and databases, but data quality is an important issue that is not addressed in this manuscript, and that is a real loss.

Authors’ response: the reaction above by Marjet Elemans points to a number of issues. One of the main issues addressed is the analysis of Dutch universities, possible differences between what is reported here vs. what is reported by universities themselves, and what this indicates as to the quality of data and methodology used here.

Our intention with the section on Dutch data is not to correct or supplant the OA counts of Dutch universities. We chose the Netherlands for the reasons stated in the paper: the size of this subset of the data, Dutch OA policies and our knowledge of the Dutch national context. To our knowledge there are no open data available on OA levels as reported by universities themselves, other than the reported overall percentage of OA for all universities (42%), which is in line with our overall finding for the Netherlands (although the figures cannot be directly compared, as our Dutch figure includes non-university output). It would be very welcome to be able to corroborate the data on different types of OA with data provided by universities themselves, especially if those are also accompanied by full descriptions of the methods used.

Any such comparison should differentiate carefully between differences caused by definitions and inclusion criteria (i.e. whether green OA includes submitted versions (before peer review), whether disciplinary repositories are included in addition to institutional repositories, and whether bronze OA is included) and differences caused by data quality (i.e. representation of research output in Web of Science, and coverage and quality of harvesting of institutional repositories by oaDOI). The latter are acknowledged as caveats in the paper and should indeed be kept in mind when interpreting the data; the former are a question of definition, not of correct vs. incorrect data.

Below, we address the other points made one by one.

- regarding the lack of a definition of usability:

We agree that we need to elaborate on usability when we mention it as a goal, and will do so in a next version of the paper. For now it may suffice to say that by usability we refer to issues such as transparency of the data, units of analysis that make sense for real-world questions and situations, enough diversity that comparisons are meaningful, the degree to which counts can be reproduced, the ease with which counts can be generated, the universality of applicability across countries, fields and institutions, and finally the costs involved in generating the counts. Added to that, usability would in the end not be up to us to judge: it would be the degree to which the WoS/oaDOI data delivers insights considered helpful by OA stakeholders themselves. Our exploration can help them make that judgment.

- regarding the lack of focus without testable hypotheses:

The absence of hypotheses is a direct consequence of choosing to do exploratory, hypothesis-generating research. The reason we take this approach is precisely that this is new and uncertain data presented in a new way. As stated in our discussion section, hypothesis testing could be done at a later stage. The present article could inform those subsequent efforts.

- regarding caveat 2:

On the potential effects of a time lag between WoS and oaDOI updates, Elemans states that we could quantify that effect by comparing the results of the OA tool in WoS with the results of oaDOI. However, that is not possible. The only thing such a comparison would check is the current time lag of the WoS update of oaDOI data, and the results would depend on where in the update cycles of oaDOI and WoS the moment of measurement lies.

- regarding caveat 3:

Technically it would indeed be possible to control for the effect of the inclusion of the ESCI sub-database, and perhaps we will do that exercise. However, it is important to note that (1) the coverage of WoS is changing constantly even without the addition of ESCI, and (2) we are testing the usefulness of the full WoS Core Collection, in the way most people will use it. We also address the effect of ESCI in some more detail in the sections on languages and countries.

- regarding caveat 4:

Regarding the wish to zoom in on subcategories of gold OA (pure gold, hybrid and bronze): this is exactly why we made the extra effort to discern between the various types of OA using the Dutch case. We are not entirely sure what would be lacking in our paper in this regard. There are several reasons why gold and hybrid levels can change with time: not in the sense that a gold or hybrid paper will lose that status over time, but in the sense that the possibilities and circumstances for publishing gold and hybrid change continuously (deals, funds, APC levels, etc.). The levels found for past publication years could in part reflect these kinds of changes.

Further, regarding the analyses of longitudinal data called for: strictly speaking we do not have longitudinal data; the data we use should be regarded as a snapshot. That is the point caveat 4 makes. We did look at the data for a range of publication years broken down by OA type, which is what the available data allow.

- Regarding caveat 5:

oaDOI does not explicitly state which version of DOAJ is used and how often it is updated. As oaDOI data are meant to reflect current, not historical, OA availability, it is presumed that checking is done against one current version of DOAJ, not against separate versions from each publication year. DOAJ does provide information on the first calendar year in which a journal provided online open access content; it is not known whether oaDOI uses this information, but this would indeed be good to verify with them.

Regarding the effect of journals removed from DOAJ: if the articles are still in Web of Science and are freely available on the publisher website, they would show up as bronze or hybrid (depending on licensing data in Crossref). If the journals have been removed from WoS, they would not show up in our analysis at all (from the time of removal, if older articles remain). This could cause a decrease in pure gold levels (with a corresponding increase in bronze and/or hybrid OA) or possibly a decrease in overall OA levels, respectively. Whether that would be a detectable difference compared to current levels would depend on the number of articles in such journals in any given set. In this respect, it is good to note that Web of Science can add or remove journals, including OA journals, at any time, and inclusion in DOAJ is likely just one factor in this.
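To make this concrete, here is a deliberately simplified sketch of how an oaDOI-style record might be classified, showing why removal from DOAJ would shift an article from pure gold to hybrid or bronze. The field names (`best_oa_location`, `host_type`, `license`, `journal_is_in_doaj`) follow our reading of the oaDOI/Unpaywall v2 response; the actual service logic may differ.

```python
# Simplified, assumed classification logic for an oaDOI-style record.
def classify(record: dict) -> str:
    loc = record.get("best_oa_location")
    if loc is None:
        return "closed"
    if loc.get("host_type") == "repository":
        return "green"
    # Otherwise free at the publisher site:
    if record.get("journal_is_in_doaj"):
        return "gold"    # journal-level OA per the current DOAJ
    if loc.get("license"):
        return "hybrid"  # explicit license found (e.g. via Crossref)
    return "bronze"      # free to read, but no license found

# A journal dropped from DOAJ flips 'journal_is_in_doaj' to False, so the
# same article is now reported as hybrid (if licensed) or bronze (if not).
article = {
    "best_oa_location": {"host_type": "publisher", "license": None},
    "journal_is_in_doaj": False,
}
print(classify(article))  # -> bronze
```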

- Regarding caveat 6:

For each unit of analysis, thresholds used in selection (total number of articles & reviews in the period 2010-2017) are indicated in the methodology section, where applicable. No additional threshold was used for the number of articles & reviews in a given publication year. All absolute numbers are available in the accompanying dataset.
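For illustration, a selection threshold of this kind could be applied as follows; this is a sketch with hypothetical column names and an arbitrary cut-off, not the thresholds used in the paper:

```python
# Keep only units (here: countries) whose total 2010-2017 output of
# articles & reviews reaches a minimum size. Column names and the
# cut-off value are illustrative only.
import pandas as pd

df = pd.read_csv("wos_oadoi_records.csv")
MIN_OUTPUT = 10_000  # arbitrary example value

totals = df.groupby("country")["doi"].count()
selected = totals[totals >= MIN_OUTPUT].index
df_selected = df[df["country"].isin(selected)]
```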


Finally: Elemans mentions the lack of attention to data quality. We are very much aware of issues in the data, which is exactly why we formulated the caveats, why we welcome the feedback, and why we continue to discuss these matters with the data providers.
