Author Archives: Melissa Clegg

The 2016 edition of the annual topcites list is still very much dominated by experiment, in particular the discovery of the Higgs boson in 2012, with the ATLAS and CMS papers at the [1] and [2] positions as they have been since 2013 (joined by the ATLAS and CMS instrumentation papers [12,13]). Indeed, they have now cracked the top ten of the all time list, where they are the only papers from the 2010s and, together with the 2006 PYTHIA [4] and 2002 GEANT4 [6] papers, the only papers from this century. The ATLAS and CMS collaborations produced a joint paper in 2015 on the Higgs boson mass and it makes its first appearance in the Top Forty this year [32]. The papers from the 1990s on the AdS/CFT correspondence [5,14,20] continue to be strongly represented. A breakthrough paper from 2006 by Ryu and Takayanagi [39], which connects entanglement entropy and Bekenstein-Hawking entropy, has made its first appearance in the Top Forty list as interest grows in the connection between quantum information concepts and quantum gravity. Aside from these papers, all of the theoretical papers in the top twenty are resource papers centered around LHC-relevant simulations [4,6,7,8,9,16,19]. The 21st century simulation codes Sherpa and POWHEG make their first appearance on the Top Forty list this year [35,38] following a long, steady climb in their annual citation rates [2008 paper, 2004 paper]. Away from the LHC-zone, observational cosmology rules the top twenty, with familiar favorites [3,10,15,17,18] and one very important newcomer [11], to which we now turn.

The gravitational wave discovery paper [11] by the LIGO Scientific and Virgo collaborations appeared simultaneously in Physical Review Letters and on arXiv.org in February. By April it had 200 citations and by July 500 citations. In late December Science Magazine named this discovery the Breakthrough of the Year for 2016. So far this seems to have exerted little influence on the rest of the topcite list (though one can detect an uptick in citations of Einstein’s original GR paper and his 1937 paper on gravitational waves). It will be interesting to see what happens in 2017.

The other big news of 2016 was the possible di-photon (or gamma-gamma) excess reported in December of 2015 by ATLAS [24] and CMS [29] in papers that were, unprecedentedly for the Top Forty list, neither arXiv eprints nor journal articles. As a potential signal for New Physics, this precipitated an intense period of research. The observations generated more than 400 theory papers citing the ATLAS and CMS reports. This collection of theory papers acquired a Hirsch index (h-index) of 92; that is, 92 of these citing papers each garnered at least 92 citations. Publishing these theory papers was a matter of controversy. JHEP declined for some time to publish any theory paper explaining the resonance; Physical Review Letters chose four to illustrate the ferment in the particle theory community. Finally, at the ICHEP conference in Chicago in August, it was announced that the signal disappeared when studied in the larger LHC data set accumulated in 2016. In the still-relevant words of Maurice Goldhaber, “not all candidates get elected”.
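For readers unfamiliar with the index quoted above, here is a minimal sketch of how it can be computed from a list of citation counts. The function name and the sample numbers are our own illustrations, not INSPIRE code or real data.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


# Illustrative (made-up) citation counts for five papers.
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

In this toy collection four papers have at least four citations each, but there are not five papers with at least five, so the index is four.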

The remainder of the list includes familiar papers from previous Top Forty lists. On the theory side are more LHC-relevant simulation papers [22,23,25,28,30,31], Hawking radiation [21], inflation [26,34], large extra dimensions [33] and neutrino mixing [37]. The list is rounded out by the first results from LUX on dark matter [27] (the final results from LUX appeared in August, too late for this edition) and the update of cosmological parameters from the full WMAP data set [36].

Traditionally DOIs (Digital Object Identifiers) have been associated with published papers in the digital era, but papers are not the only research objects that physicists may want to search, use, and cite. We talked with Jim Simone of Fermilab about his efforts to get DOIs assigned to MILC collaboration datasets and to get records of them uploaded to INSPIRE.

How is Jim involved with the MILC collaboration?

Jim is a member of the FERMILAB-LATTICE collaboration, which works closely with MILC on scientific projects involving matrix elements and flavor physics. MILC generates data sets consisting of lattice gauge configuration files, which the collaboration has made openly available for others to use, as is increasingly required for federally funded research in the U.S.

What is the MILC collaboration’s connection to the International Lattice Data Grid (ILDG)?

Jim was an early organizer of the ILDG, which is intended as a data grid to enable collaborations to share gauge configurations. The ILDG metadata catalog had its limitations: it held only a few kinds of metadata, sometimes making it difficult for people to find what they were looking for. People involved with the project have been trying to fill in the gaps, including the biggest one: connecting the scientific papers produced from the data to the datasets themselves.

Rather than reinventing the wheel, ILDG is considering using INSPIRE as a catalog to connect papers with datasets. ILDG is currently used only by lattice scientists, and this would make the data usable and findable by all physicists, including HEP and nuclear phenomenologists. In INSPIRE the datasets and associated papers can be searched starting with the papers, in order to see what configurations were used to get the results. In the upcoming version of INSPIRE, the Data collection will be made public, and it will also be possible to start a search with the individual datasets and from there find what papers were produced from these configurations.

Why and how did Jim go about getting DOIs assigned to the datasets? What challenges did he face?

Jim believes DOIs, as public, persistent identifiers, are a natural mechanism to identify the datasets, which are public, permanent, first-class data objects. With DOIs, the configurations will be better integrated into the ILDG and INSPIRE.

In the case of published papers, DOIs are assigned by publishers, but this route would not work for datasets. While INSPIRE is equipped to directly issue DOIs, MILC’s direct connection to the U.S. Department of Energy (DOE) made it practical for DOIs to be issued by DOE Office of Scientific and Technical Information (OSTI). In either case, DOIs are registered with the central agency DataCite.

ILDG has started a discussion on how other groups can get DOIs for their datasets. Outside the DOE, CERN also issues DOIs, and regional ILDG groups can help members get DOIs and serve as gatekeepers to keep the metadata clean and clear. DataCite can also help researchers find registration organizations.

For Jim it was a learning experience working with OSTI and interacting with their web services. As one of his main focuses was findability, Jim wanted to include lots of searchable metadata in the dataset records to help physicists find the particular configurations they wanted. This was more metadata than OSTI was used to receiving when minting DOIs, but they were able to accommodate Jim’s requests, and he considered them a great help throughout the process.

Beyond getting the DOIs assigned, another challenge was figuring out how citations should be marked up in papers, both written and digitally. With the goals of making the datasets findable and identifiable, Jim and the ILDG wanted people to be able to see the DOI in a print version of a reference list as well as click it in a digital version. In order to make the process as transparent as possible for people citing the datasets, Jim worked with us to include instructions in the metadata of the INSPIRE records and OSTI records.

For researchers unsure of how to cite datasets that do not include specific citation guidelines in their metadata, DataCite and CrossRef have developed a DOI citation formatter that can take a DOI registered by either of these services and format its citation in a variety of styles.
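As a rough illustration of how a formatted citation can be requested programmatically, the sketch below builds (but does not send) an HTTP request that uses DOI content negotiation. The DOI shown is a made-up placeholder and the function name is ours. The `text/x-bibliography` media type and its `style` parameter reflect our understanding of the content negotiation service offered for DataCite and CrossRef DOIs, and should be checked against the current documentation before relying on them.

```python
from urllib.parse import quote
from urllib.request import Request


def build_citation_request(doi, style="apa"):
    """Build a request asking the DOI resolver for a formatted citation.

    Assumption: the resolver honors an Accept header of the form
    'text/x-bibliography; style=<name>' for DataCite and CrossRef DOIs.
    """
    # Keep the slash between DOI prefix and suffix; escape anything unusual.
    url = "https://doi.org/" + quote(doi, safe="/")
    accept = "text/x-bibliography; style=" + style
    return Request(url, headers={"Accept": accept})


# Hypothetical DOI, used only to show the shape of the request.
request = build_citation_request("10.1234/example", style="apa")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the lookup; we stop short of that here so the example runs without a network connection.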

When going through the publication process with a paper that used MILC configurations, Jim found the referees and copy editors weren’t familiar with how the citations should appear. Most objects with a DOI are published papers that can be cited in written format using a journal reference, volume, page range, etc., so the DOI is often left out of the text of a reference list. However, following this standard would not make the datasets adequately identifiable to the human eye.

The community known as FORCE 11 (Future of Research Communication and e-Scholarship) has developed eight principles of data citation practices with equal emphasis on human readability and machine-actionability. As these recommendations become more widely endorsed in research communities and researchers become accustomed to citing datasets in their papers, the issue of human identifiable data citations will most likely be resolved.

What advice does Jim have for others looking to make their datasets more findable and citable?

Jim has two pieces of advice: get DOIs and mark up the metadata in a way that’s sensible for the community who will use the datasets. DataCite makes this simple by being explicit about its mandatory metadata requirements, while also allowing for additional recommended and optional metadata.
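A simple way to act on the second piece of advice is to check a record for the mandatory properties before submitting it. The sketch below assumes the six mandatory properties of version 4 of the DataCite Metadata Schema as we understand them (identifier, creator, title, publisher, publication year, resource type). The field spellings, the function name, and the sample record are illustrative assumptions, not an official DataCite or OSTI format.

```python
# Assumed mandatory properties (DataCite Metadata Schema 4, as we understand it).
MANDATORY_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_mandatory(record):
    """Return the mandatory fields that are absent or empty in `record`."""
    return [field for field in MANDATORY_FIELDS if not record.get(field)]


# Hypothetical dataset record with placeholder values.
record = {
    "identifier": "10.1234/example",
    "creators": ["Example Collaboration"],
    "titles": ["Example lattice gauge configurations"],
    "publisher": "",  # left empty on purpose
    "publicationYear": 2015,
}
print(missing_mandatory(record))
```

Recommended and optional metadata, such as the lattice parameters physicists are likely to search on, would be added on top of this minimum.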

At INSPIRE we look forward to integrating more dataset DOIs into our records. Send your questions and comments about dataset DOIs in INSPIRE to feedback@inspirehep.net.

The annual INSPIRE Topcites list provides a snapshot of the topics that were of greatest interest in a given calendar year. To maintain the focus on HEP, we construct the list by considering only citations from core papers. To be complete, we also provide individual Topcite lists for each arXiv category we cover.

Continuing a recent trend, the 2015 Top 40 list is virtually unchanged from the previous year, save for a little re-shuffling in the middle order and the quantum fluctuations of classic papers near the bottom. The leading five papers from 2014 securely held their positions, with almost 150 citations separating Maldacena’s 1997 AdS/CFT paper at number [5] from this year’s number [6] paper, the 2002 GEANT4 description paper (which itself was seventh last year).

The first new paper on the list appears at number [7], a Planck paper on cosmological parameters that updates the results of a 2013 Planck paper [3]. Since its posting in February 2015 this paper has collected over 700 citations and brings to four the total number of Planck papers on the list, including another February 2015 paper on inflation [27] which, again, updates a 2013 paper [30].

At number [15] we have the second paper making its debut, a 2014 descendant of the 2011 MadGraph5 paper [16], describing a software package for automatically calculating cross sections at next-to-leading order.

Of the papers on the list submitted to arXiv.org, 11 were from hep-ph, 4 from hep-th, the 2 Higgs discovery papers were, of course, from hep-ex and 10 were from astro-ph (8 from astro-ph.CO and the two 1998 supernova papers [10, 13] that would have been in astro-ph.CO if this subcategory had existed when they were written). The astro-ph papers are all observational, so we see a roughly equal number of theoretical and “experimental” papers. The classic papers from before the digital age, however, are all theoretical works on particle physics and cosmology that have been summoned to the list by recent research and discovery. Interestingly, the charts of their annual citation counts all show an impressively upward trajectory.

In the world of topcited papers, 2014 looked a lot like 2013 and not just because the Review of Particle Physics is once again at the top. The effects of 2012’s discovery of the Higgs boson continued to be strongly felt and many of the related papers from the 2013 topcite list appeared again in more or less the same position. Along with the discovery papers themselves [1,2], the original theory papers [34,35] and the detector description papers [17,19], a host of papers relevant to event simulation at the LHC [4,7,9,12,14,20,22,28,31] have featured prominently; interestingly, the PYTHIA paper [4] is now the first paper from the 2000s in the All Time Topcite list. The AdS/CFT papers [5,10,16] and Randall-Sundrum [26] continue their 15+ year run on the Topcite list. Planck [3], WMAP [11,24] and the 1998 supernova cosmology papers [13,15] again represent observational cosmology on the list.

So what was new this year? The March announcement [6] by BICEP2 of the results of a search for inflationary gravitational waves in the cosmic microwave background had an immediate impact: the paper had 100 citations within two weeks and 500 by July. This got people thinking about inflation and brought a number of inflation papers from the 1980s back to the topcite list [23,29,36,37], in addition to Guth’s perennial paper [21], which climbed twelve places in the rankings, and the Planck inflation paper [8], which climbed twenty-one.

The remaining inductees were the October 2013 LUX constraints on dark matter paper [18] (that joined the similar XENON100 paper [27]) and the March 2013 Planck overview paper [38].

Any journal name variant listed in INSPIRE will work with rawref. You can find these variants by searching for a journal name in the Journal section of INSPIRE. Click the name for the detailed record, and then click the link for “Show name variants”.

Usually, within a few weeks of publication, we add DOIs and update INSPIRE to reflect the fact that articles have been published. This information is taken from feeds INSPIRE receives from publishers. We try to find the corresponding preprint in a semi-automatic way and add the publication information. Sometimes this process is delayed or fails. If, after waiting patiently for updates, you believe there is a failure in the process, please let us know, as other articles might be affected as well.

But did you know that the quickest way to add a DOI and journal publication information to preprints in INSPIRE is to update them in arXiv? While this may seem like an indirect method, authors have direct control over their papers on arXiv (via paper password) and can easily augment information there in an authenticated way. INSPIRE automatically receives these updates within just a day or two. This can be done at any time, and adding the publication information on arXiv does not result in a new version either in arXiv or in INSPIRE.

Once you’ve logged in at arXiv, click the Journal ref symbol beside the papers you’d like to update.

This will take you to a page where you can enter the relevant information.

Doing this on arXiv has several advantages. Information updated on arXiv will flow to other services as well, whereas information added to INSPIRE does not yet propagate. Because you are logged in, arXiv can process such requests automatically, whereas at INSPIRE they require manual intervention, so updates may take longer to appear. If you have any comments or questions on this topic, drop us a line at feedback@inspirehep.net.