Geographical trends in academic conferences: an analysis on authors' affiliations

Tracking #: 550-1530

Authors:

Responsible editor:

Sahar Vahdati

Submission Type:

Research Paper

Abstract:

In the last decade, research literature reached an enormous volume with an unprecedented current annual increase of 1.5 million new publications.
As research gets ever more global and new countries and institutions, either from academia or corporate environments, start to contribute with their share, it is important to monitor this complex scenario and understand its dynamics and equilibria.
We present a study on a conference proceedings dataset extracted from Springer Nature SciGraph that illustrates insightful geographical trends and highlights the unbalanced growth of competitive research institutions worldwide in the 1996--2016 period.
The main contribution of this work is fourfold.
First, we found that the distributions of institutions and publications among countries follow a power law, consistent with previous literature, i.e., very few countries keep producing most of the papers accepted by high-tier conferences.
Second, we show that the annual and overall turnover rate of country rankings is extremely low and steadily declines over time, suggesting an alarmingly static landscape in which new entries struggle to emerge.
Third, we performed an analysis of the venue locations and their effect on the distribution of countries involved in the accepted publications, underlining the central role of Europe and China as knowledge hubs.
Finally, we found evidence of an increasing gap between the number of institutions initiating and overseeing research endeavours (i.e. first and last authors' affiliations) and the total number of institutions participating in research.
As a consequence of our analysis, the paper also discusses our experience in working with authors' affiliations: an utterly simple matter at first glance that instead proves to be a complex research and technical challenge, still far from being settled.

Date of Decision:

Decision:

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: Medium
Significance: Moderate significance
Background: Reasonable
Novelty: Clear novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper uses the Springer Nature SciGraph dataset to look at geographic trends among academic conference goers. It gives interesting insight into the rise of China and other rising scientific superpowers, and into how researchers from the different continents travel. All of the work follows the reproducible research paradigm, using Jupyter notebooks.

Reasons to accept:

I previously reviewed a version of this paper for the SAVE-SD conference and they have made all of the changes I suggested. The paper now reads very well and the additional discussion is very interesting. I'd like to thank the authors for their great work.

Reasons to reject:

None.

Further comments:

If FAIR is a goal of the journal, the authors have done a good job sharing their work in GitHub and Jupyter notebooks, but the FAIR-ness and reproducibility could be increased further by releasing it under open licensing (adding an OSI-compliant license) and taking snapshots in a repository like Zenodo in case the Git or Jupyter pages are changed or deleted. Otherwise it looks great.

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Good
Presentation: Average
Reviewer's confidence: Medium
Significance: High significance
Background: Comprehensive
Novelty: Clear novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This is an interesting paper on an important topic. Using a large database of conferences and presentations, it examines how the representation of countries at conferences has changed over time, and demonstrates a concentration of content from a small number of countries and little turnover among those that are most represented.

Reasons to accept:

The paper demonstrates both the opportunities and challenges of this relatively new dataset and the possibilities of addressing questions of diversity and representation. It uses sharing technologies effectively to make the analysis available alongside the relevant data. It addresses an important problem – diversity and representation – that should be a central concern for our scholarly communities.

Reasons to reject:

I have no reasons to reject. I make some suggestions to improve the paper below.

Further comments:

I found some aspects of the presentation of data hard to follow and would suggest giving some more detail and context in figure legends. In general, figure legends are short (e.g. Figure 1) and take up a whole page in the manuscript. I found myself flipping backwards and forwards a lot to remind myself of the details of the differences between plots in a figure.

The Figure 3 heatmaps were a particular case in point, where more of a guide was needed to help the reader understand why the plots look similar in context. I would suggest that Figure 3d could be laid out on a horizontal line, as continental = 1 - international. This would also have the advantage that each continent could be separated out, making it easier to label individual countries and to see the overall relationships between continents (I would have liked to try to make the change myself and send a pull request, but I'm afraid that turned out to be beyond my capacity at the moment).

For Figure 2a, I would suggest giving full country names. It took me some time to disentangle CH (Switzerland) and CN (China). I know these are standard two-letter codes, but it took me longer to figure that out than was ideal.

I think the paper could overall make a stronger case for the issues that it raises. These results are disturbing, if not unexpected. I think the community could benefit from a more robust presentation of the downstream issues. Another analysis that the authors might want to consider adding is an expansion of that shown in Figure 9 to look at the issues for conferences in the US, UK, and Germany specifically. It would be of significant value to the community if the growing sense that Germany (and Canada) are more appropriate venues, due to ease of travel for presenters, was tested. I think this data is already present in the dataset, but one could, for example, show whether there is any effect for African attendees at US, UK, DE, and Canadian conferences over time.

I do not have extensive expertise in statistics, so I cannot give an expert confirmation that the methodology and choices of analyses are appropriate. However, I did not see anything that raised any concerns for me.

Overall I think this is a very valuable paper and I think it would benefit if the authors put some further work into making sure that the results of the analysis are very clear and accessible to readers.

Review #3 submitted on 05/Feb/2019

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Good
Presentation: Excellent
Reviewer's confidence: High
Significance: High significance
Background: Comprehensive
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The submitted paper by Mannocci et al. provides a comprehensive study of scholarly literature published at academic conferences (in the semantic web and digital libraries) for a period of ~20 years. The authors analyzed more than a million contributions and their associated metadata, to conduct a micro- and macro-level investigation of the publishing landscape, in terms of continents, countries, institutions and geopolitical factors, using authors' affiliation information.

The document is well-written and has a sound methodology. It should be noted that this work builds upon and has a significant content overlap with a previous publication by the same authors at the SAVE-SD 2018 workshop, which is properly mentioned in the text. The data and tools used in the paper are also available online for public access.

Reasons to accept:

The authors did a great job explaining their methodology and provide very useful visualizations of their results that help readers understand the semantic web/digital libraries publishing landscape at a glance. The insights from this work are very valuable for the wider community in working towards balanced opportunities for researchers to gain visibility in their respective communities, as well as in motivating strategic planning and investments for geographical territories.

I found some minor (mostly typographical and grammatical) issues in the text:
- The time period for which this study was conducted is not consistent across the article sections. You mentioned 1996-2016 in the Abstract, but 1996-2017 in the Introduction, "21 year span" in Page 8, line 27, and again in page 5 line 25. Please clarify.
- Either use "USA" or "United States".
- Page 3, Line 17: "8,4 million", replace comma with a dot.
- Page 4, Line 2: "He found that that...", remove one "that".
- Page 4, Line 3: "produced by in high...", choose "in" or "by", not both.
- Page 4, Line 4: "Falagas et al...", add a dot after "al".
- Page 4, Line 5: "biomedical paper" -> ".. papers"
- Page 4, Line 5: "ISI dataset published on in the period...", remove "on".
- Page 4, Line 8: "They analysed the the articles...", remove one "the".
- Page 4, Line 20: "reported that that international...", remove one "that".
- Page 4, Line 28: "to visualises..." -> "to visualise"
- Page 6, Line 1: "authored by authors...", "written by authors" maybe?
- The word "SciGraph" does not have a consistent capitalization in the paper.
- Page 7, Line 3: add some details on manual curation efforts. How long did it take? etc.
- Page 7, Line 27: "which is is a...", remove one "is"
- Page 7, Line 36: "who attend..." -> "... attends"
- Page 7, Line 41: "weight differently..." -> "weigh..."
- Footnote 16, separate the text and URL by a space
- In several places, replace "amount of papers" with "number of papers".
- Page 8, Line 20: "further test further..." -> remove one "further"
- Page 11, Line 17: "a increasingly...", "an..."
- Page 11, Line 22: "grown again...", "grow again..."
- Page 11, Line 33: replace "the old continent" by "Europe". I know it reads nicer this way, but it makes the reading of the sentence more difficult.
- Page 11, Line 35: "Asia publish...", "... publishes"
- Page 11, the sentence starting with "This might" on line 36 crossing over to line 37 is not grammatically correct and is difficult to understand.
- Page 12, Line 34: "if instances...", "of ..."
- Page 13, Line 6: "overseer", "oversee"
- Page 13, Line 9: add "the fact that" between "despite" and "the average"
- Page 13, Line 21: "author's position" -> "authors'..."
- Page 13, Line 44: add "a" before "specific year"
- Page 14, Line 41: "appears to venue most open to changes" does not read well. Is this grammatically correct?
- Page 17, the sentence spanning the first few lines is very long and hard to read. Paraphrase or break into smaller sentences.
- Page 17, Line 14: "in term of" -> "...terms"
- Page 17, Line 15: "growing increasingly static" is quite paradoxical! What did you mean here?
- Page 18, Line 27: "researcher" -> "researchers"
- Please write more detailed captions for your figures.

Reasons to reject:

Although I found a significant amount of content overlap with the SAVE-SD paper, I found the publication of this (more thorough) article to be beneficial. In my opinion, this article should be accepted, as explained above.

1 Comment

Although scholarly communication in general has become considerably easier and more efficient, scholars encounter problems in finding metaresearch statistics. This research work provides a systematic analysis of scholarly metadata using a conference proceedings dataset extracted from Springer Nature. The dataset contains information about scholarly literature published at certain academic conferences. The experiments represent the location movement of research topics over time and demonstrate a concentration of content from several countries. Overall, the work is considered valuable and interesting by the community and, with the insights it provides, has an impact on the life cycle of scholarly communication in general.

There are certain changes suggested by the reviewers that are expected to be applied by the authors. All the typographical and grammatical issues are expected to be addressed. Although the data is available and reusable, the authors are expected to increase the FAIRness of the data based on the suggestion of the reviewers.