To improve the quality of systematic research, well-known scientific institutions have developed a variety of tools, but in a scattered fashion. Dr. Nader Ale Ebrahim has gathered these dispersed tools under one roof in a collection named “Research Tool Box”. The toolbox contains over 720 tools so far, classified into four main categories: literature review, writing a paper, targeting suitable journals, and enhancing visibility and impact factor.

Why are Authors Citing Older Papers?

With so much new literature published each year, why are authors increasingly citing older papers?

Late last year, computer scientists at Google Scholar published a report
describing how authors were citing older papers. The researchers posed
several explanations for the trend that focused on the digitization of
publishing and the marvelous improvements to search and relevance
ranking.

However, as I wrote in my critique
of their paper, the trend to cite older papers began decades before
Google Scholar, Google, or even the Internet was invented. When you are
in the search business, everything good in this world must be the result
of search.

To validate their results, the helpful folks at Thomson Reuters Web of Science sent me a dataset containing the cited half-life for 13,455 unique journal titles reported in their Journal Citation Report (the report that discloses journal Impact Factors). Rather than relying on the individual citation as the unit of observation (the approach used by Google Scholar), we base our analysis on the cited half-life of journals. This approach has the obvious advantage of scale, allowing us to approach the problem using thousands of journals rather than tens of millions of citations.

To approximate a citation-based analysis, each journal was weighted by the number of papers it published, so that small quarterly journals do not carry the same weight as mega-journals like PLOS ONE. Each journal was also classified into one or more subject categories and measured in each year of the 17-year observation period. Our variable of interest is the cited half-life, which is the median age of
of interest is the cited half-life, which is the median age of
articles cited in a given journal for a given year. By definition, half
of the articles in a journal will be older than the cited half-life;
the other half will be younger. The concept of half-life can also be
applied to article downloads.
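The two quantities used throughout this analysis are easy to compute. The Python sketch below is a minimal illustration with hypothetical data (the function names and figures are ours, not Thomson Reuters’): the cited half-life of a single journal, and the paper-count-weighted mean half-life across journals described above.

```python
import statistics

def cited_half_life(citing_year, cited_pub_years):
    """Median age of the articles cited in a given year: by definition,
    half of the cited articles are older than this value, half younger."""
    ages = [citing_year - year for year in cited_pub_years]
    return statistics.median(ages)

def weighted_mean_half_life(half_lives, paper_counts):
    """Mean cited half-life across journals, weighted by the number of
    papers each journal published, so small journals count for less."""
    total = sum(paper_counts)
    return sum(h * n for h, n in zip(half_lives, paper_counts)) / total

# Hypothetical journal: citations made in 2013 to articles published
# in the listed years.
print(cited_half_life(2013, [2012, 2010, 2008, 2004, 1999]))  # 5

# Two hypothetical journals: one with 100 papers (half-life 5.0) and
# one with 300 papers (half-life 7.0).
print(weighted_mean_half_life([5.0, 7.0], [100, 300]))  # 6.5
```

Repeating the second computation year by year, and regressing the result on time, gives the per-annum growth rates reported below.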

For the entire dataset of journals, the mean weighted cited half-life
was 6.5 years, which grew at a rate of 0.13 years per annum. For those
journals that had been indexed continuously in the dataset over the 17
years, the mean weighted cited half-life was 7.1 years, which grew at
the same rate. For the newer journals, the cited half-life was just 5.1
years, but grew at a rate of 0.19 years per annum.

Focusing on the journals for which we have a continuous series of
cited half-life observations, 91% (209 of 229) of subject categories
experienced increasing half-lives. Some of these categories
grew significantly more than average. For example, Developmental Biology
journals grew at 0.25 years per annum, Genetics & Heredity journals
grew at 0.20 years per annum and Cell Biology journals grew at 0.17
years per annum.

Conversely, the cited half-life of 20 (9%) of journal categories decreased
over the observation period. With few exceptions, these fields covered
the general fields of Chemistry and Engineering. For example, the cited
half-life for journals classified under Energy & Fuels declined by
0.11 years per annum, Chemistry-Multidisciplinary declined by 0.07 years
per annum, Engineering-Multidisciplinary by 0.05 years per annum, and
Engineering-Chemical by 0.04 years per annum. Granted, these are smaller
declines, but they do run contrary to overall trends.

Figure 1. Cited half-life for 229 journal subject categories.

We also discovered that cited half-life increases with total
citations, meaning, as a journal attracts more citations, a larger
proportion of these citations target older articles. This can be seen in
Figure 2, as journal categories move from the bottom left to the upper
right quadrant of the graph over the observation period.

Figure
2. Cited half-life for 229 journal categories observed from 1997–2013.
The size of each bubble represents the number of papers in each journal
category. Trail lines depict the trajectory of each category.

The next figure highlights the trajectory of highly-cited journals
from 1997 to 2013, illustrating how cited half-life increases with the
total citations to a journal. While most highly-cited journals move
toward the upper-right quadrant of the graph, we highlight three
chemistry journals that run contrary to this trend: Journal of the American Chemical Society, Angewandte Chemie-Int Ed., and Chemical Communications. Those
readers wishing to speculate why Chemistry and Engineering journals
were bucking the overall trend are welcome to do so in the comment
section below.

Readers are also welcome to explore the data (for categories and for journals).
The files (.swf) require the Adobe Flash plug-in. Mac users may need to hold the Control key and select a browser when opening these files. Categories may be split into component journals. Other controls adjust the size, speed, and display of the data.

In sum, we were able to validate the claims by the Google Scholar
team that scholars have been citing older materials, with some
exceptions.

The citation behavior of authors reflects cultural, technological,
and normative behaviors, all acting in concert. While digital publishing
and technologies were invented to aid the reader in discovering,
retrieving, and citing the literature, the trend appears to predate
many of these technologies. Indeed, equal credit may be due to
the photocopier, the fax machine, FTP, and email as is given to Google,
EndNote, or the DOI.

Nevertheless, a growing cited half-life might also reflect major
structural shifts in the way science is funded and the way scientists
are rewarded. A gradual move to fund incremental and applied research
may result in fewer fundamental and theoretical studies being published.
Giving credit to these foundational studies may require authors to cite an increasingly aging literature.

Correction note: Table 1 of the manuscript “Cited Half-Life of the Journal Literature”
(arXiv) contains a sorting error. A corrected version (v2) was
submitted and will become live at 8pm (EDT). Thanks to Dr. Jacques
Carette, Dept. of Computing and Software at McMaster University for
spotting this error.


About Phil Davis

I am an independent researcher and publishing consultant
specializing in the statistical analysis of readership and citation
data. I am a former postdoctoral researcher in science communication and
former science librarian. http://phil-davis.org/

Tuesday, 28 April 2015

Blog on your own and blog with your publisher

April 27, 2015

Both journal editors and authors rack their brains over how to increase citation scores, which is why they should work together to solve the problem. As I pointed out in my previous entry on this topic, to be cited, an article first has to be discovered. The discoverability of a paper can be increased by, among other things, search engine optimization and academic blogging. I discussed the problem of SEO here. Now it is time to write a little more about academic blogging.

Academic blogging is still quite new, and we lack a deeper understanding of it. Some academics are reluctant to devote their time to the blogosphere; they usually think they have more serious things to do, and that their time is better spent on research or teaching than on writing posts. But there is growing evidence that blogging is an effective tool for academic communication, and that it may also have a positive influence on career development.

Will a blog post about your paper increase its readership?

People who chase citations (both editors and authors) should understand that there is a long road from readership to citation, but citation is not possible without readership. And blogging can significantly increase readership. This was shown, among others, by David McKenzie and Berk Özler in their study The Impact of Economics Blogs, based on a sample of 94 economics papers that were mentioned on economics blogs. They found that being mentioned on a blog usually results in a large increase in the number of views and downloads for a paper. The study examined the impact of popular, well-known economics blogs, so we cannot assume that the results would be similar for other fields or for less prominent blogs. But we can try to understand the mechanisms by which blogging can influence the readership of an article.

First of all, more and more researchers today use Internet search engines in their work: to look up publications they already know, to quickly verify information, or to find additional sources. And in some cases, blog entries about an article or a book may be easier to find on the Internet than the article itself, especially on well-known blogs with good domain names that are ranked highly by search engines.

Secondly, some academic blogs have large networks of readers, often researchers themselves, who read them frequently to keep up to date with current discussions. And to be honest, the average readership of an academic article is very low, in many cases much lower than that of a good blog post. Thus, one good blog post about an article may generate a large increase in the number of views and downloads of the paper. Of course, these downloads will not automatically trigger citations, but in some cases the blog post may bring your article to the relevant audience, which will be likely to cite your work in the future.

Start your own blog and/or cooperate with existing ones

Of course, it is not easy to gain a significant readership for a new blog. Starting a blog will take you less than an hour, but building the right readership takes months of regular work and a little luck. Regardless of whether you have already started this work, writing a guest post for a more established blog is always a good idea. It may bring attention to your work (and to your blog, if you already have one) from a large group of people.

Almost all academic publishers now run blogs designed to stimulate discussion among researchers (De Gruyter Open, for example, runs the Open Science dot com Blog). These blogs are a good place to start your blogging adventure, or to attract a new audience to your work. So you should not be surprised when a journal or book editor offers you the option of writing a guest post for the company blog to publicize your research, or just to discuss some problem that you have faced during your research. This might be a good opportunity. If your editor has not mentioned this option, do not be discouraged: if you have an idea for a guest post to be published on a publisher’s blog, ask the editors what they think about it.

A blog post usually needs less work to publish than an academic article, so you can start blogging about your research project long before you publish a paper that summarizes it. The blog (a personal one, one belonging to your publisher, or one managed by your university or a group of researchers in your field) can be a place for valuable discussion, which may have a positive influence on every stage of your work. You can write about partial results, methods, problems, and even your workflow and research team organization. This will make your work more open, more interesting, and more attractive to both the general public and your peers. It may help you gain citations and recognition in your field, and without a doubt it will be useful for many people to read a little about your work.

Image: Émile Friant Political Discussion 1889. This image is in the public domain because its copyright has expired.

Abstract: Earlier publications have shown that the number of references, as well as the number of received citations, is field-dependent. Consequently, a long reference list may lead to more citations. The purpose of this article is to study the concrete relationship between the number of references and citation counts, for the specific case of Malaysian highly cited papers and Malaysian review papers, where a Malaysian paper is one with at least one Malaysian affiliation. A total of 2466 papers, consisting of two sets, namely 1966 review papers and 500 highly cited articles, are studied. The statistical analysis shows that an increase in the number of references leads to a slight increase in the number of citations. Yet, this increase is not statistically significant. Therefore, a researcher should not try to increase the number of received citations by artificially inflating the number of references.

1. Introduction

Researchers use citation tracking to find the most influential articles on a particular topic and to see how often their own published papers are cited (Bakkalbasi et al. 2006). Universities, on the other hand, are interested in citations because of their influence on university rankings (Ale Ebrahim et al. 2013, Ioannidis 2010, Bornmann, Leydesdorff, and Wang 2014). A citation count is the number of times a research work, such as a journal article, is cited by other works. Citations per paper meaningfully influence a number of metrics, including total citation counts, citation speed, the ratio of external to internal cites, diffusion scores, and the h-index (Carley, Porter, and Youtie 2013). Citation counts are still commonly used as a measure of the quality and reputation of research papers (Abt and Garfield 2002). The number of citations that an article receives measures its impact on a specific field (Lai, Darius, and Lerut 2012). Citation analysis is one of the most important tools for evaluating research performance (Bornmann et al. 2012), and citation indicators matter to scientists and universities all over the world (Farhadi, Salehi, Yunus, et al. 2013).
The relationship between the number of references and the number of citations a paper receives was first investigated in 1965 (UZUN 2006, de Solla Price 1965). A long reference list at the end of a research paper may be the key to ensuring that it is well cited (Corbyn 2010, Ball 2008); hence, citation counts are correlated with reference frequencies (Abt and Garfield 2002). Webster, Jonason, and Schember (2009) raised the question “Does the number of references an article contains predict its citation count?” and found that reference counts explained 19% of the variance in citation counts. Lancho-Barrantes, Guerrero-Bote, and Moya-Anegón (2010) found that not only the number, but also the citation impact of the cited references correlated with the citation counts of a paper: the higher the impact of the cited references, the higher the later impact of the citing paper (Bornmann et al. 2012). Review articles are usually highly cited compared to other types of papers (Meho 2007).

2. Materials and methods

All data were obtained through the Web of Science online academic database provided by Thomson Scientific. This database includes the information necessary to examine the relationship between reference and citation counts for every review and highly cited paper published in Malaysia from 1980 to October 2013. The Science Citation Index Expanded, Social Sciences Citation Index, and Arts & Humanities Citation Index were searched for reviews and highly cited papers. For each paper, all bibliometric data, especially the number of references and the number of times the paper was cited between the year of publication and 2013, were collected. Two sample sets were selected:

1- The first sample consisted of 1966 review papers in all disciplines from Malaysia, according to the Web of Knowledge’s classification system. Citation statistics produced over a time frame shorter than three years may not be sufficiently stable (Adams 2005, UZUN 2006), because papers appearing in the Web of Science databases over the last few years have not had enough time to accumulate a stable number of citations (Webster, Jonason, and Schember 2009). Therefore, the time span was limited to 1980 through November 2010, yielding a subsample of 721 publications (37% of the original sample). Publications with zero citations were removed. To select the highly cited papers, a threshold of 10 citations per year was used. The association between the number of references (independent variable) and citations per year (dependent variable) of highly cited review papers was investigated with linear and non-linear models.

2- The second sample comprises 500 highly cited publications from Malaysia. Following the Web of Science classification, the results are restricted to the “article” document type, excluding review articles, editorial material, conference papers, and book reviews.

3. Results and discussion

Two sets of data, 1- 1966 review papers and 2- 500 highly cited papers, were investigated separately. The results and discussion follow.

Outliers for sample one (1966 review papers)

Because the number of citations depends on the age of an article, the raw citation count alone cannot identify highly cited papers. Therefore, citations per year were selected as the criterion: papers cited at least 10 times per year are considered highly cited. Figure 3-1 shows the number of times cited per year for 660 review papers. A threshold of 50 citations per year was determined visually; papers cited more than 50 times per year are called “extremely highly cited papers” and treated as outliers. Papers with more than 300 listed references were also treated as outliers (Figure 3-2).

Figure 3-1 Number of times cited per year vs number of review papers references

Figure 3-2 Number of times cited per year vs number of references in review paper

Correlation analysis for sample one (1966 review papers)

The correlation between the variables was modeled with a linear regression model, y = αx + β, and an exponential (non-linear) model, y = αe^(βx). The goodness of fit of both models was then measured with Spearman’s rho, Kendall’s tau, and the Pearson correlation coefficient. The results of the correlation analysis are summarized in Table 3-1.

The association between the variables is illustrated graphically with scatter plots, with the trends drawn as solid lines. As Figures 3-3 and 3-4 show, although neither the linear nor the non-linear model fits significantly, the trends are positive, which supports the hypothesis that, for a given review paper, increasing the number of references may increase the number of citations per year.
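A minimal version of this kind of analysis can be sketched in Python with NumPy (hypothetical data; the paper’s actual dataset and significance tests are not reproduced here). The linear model is fitted by least squares, and the exponential model by the usual log-linear transformation, which requires y > 0:

```python
import numpy as np

def fit_models(references, cites_per_year):
    """Fit y = a*x + b and y = a*exp(b*x); also report Pearson's r."""
    x = np.asarray(references, dtype=float)
    y = np.asarray(cites_per_year, dtype=float)
    a_lin, b_lin = np.polyfit(x, y, 1)          # linear least squares
    b_exp, log_a = np.polyfit(x, np.log(y), 1)  # log-linear fit, needs y > 0
    a_exp = np.exp(log_a)
    r = np.corrcoef(x, y)[0, 1]                 # Pearson correlation
    return (a_lin, b_lin), (a_exp, b_exp), r

# Hypothetical papers: number of references vs citations per year.
refs = [40, 55, 80, 120, 150, 210]
cpy = [11, 10, 14, 16, 13, 19]
linear, exponential, r = fit_models(refs, cpy)
```

A positive but modest r here would mirror the weak, non-significant positive trend described above; the rank correlations (Spearman’s rho and Kendall’s tau, available as `scipy.stats.spearmanr` and `scipy.stats.kendalltau`) can be computed on the same variables.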

Table 3-1 The result of correlation analysis of highly cited review papers

Figure 3-3 Relationship between number of references and citation counts in review papers (linear model)

Figure 3-4 Relationship between number of references and citation counts in review papers (exponential model)

Outlier detection for sample two (500 highly cited papers)

Papers cited at least 10 times per year are considered highly cited. Papers cited more than 100 times per year are considered extremely highly cited and treated as outliers. Figure 3-5 and Figure 3-6 show the raw data and the filtered data, respectively.

Figure 3-5 Raw data: number of times cited per year vs number of references in 500 highly cited papers

Figure 3-6 Filtered data: number of times cited per year vs number of references in 500 highly cited papers

Correlation analysis for sample two (500 highly cited papers)

The association between the number of references (independent variable) and citations per year (dependent variable) for the 500 highly cited papers was investigated with linear and non-linear correlation analysis. The correlation was modeled with a linear regression model, y = αx + β, and an exponential (non-linear) model, y = αe^(βx). The goodness of fit was then measured with Spearman’s rho, Kendall’s tau, and the Pearson correlation coefficient. The results of the correlation analysis are summarized in Table 3-2.

Table 3-2 The result of correlation analysis of 500 highly cited papers.

The association between the variables is illustrated graphically with scatter plots, with the trends shown as solid lines. As Figures 3-7 and 3-8 show, although neither the linear nor the non-linear model fits significantly, the positive correlation coefficients still suggest a positive trend between the number of references and the number of times cited per year.

Figure 3-7 Relationship between number of references and citation counts in 500 highly cited papers (linear model)

Figure 3-8 Relationship between number of references and citation counts in 500 highly cited papers (exponential model)

4. Conclusion

This study shows that, since the trend between citation count and number of references is not statistically significant, we cannot conclude that there is a significant association between the citation counts of Malaysian review papers in the given period and the number of references contained in the papers. The correlation coefficient (r = 0.152, based on the population of 721 articles) is not statistically significant. Malaysian review papers receive more citations than other types of papers, and the number of references in an article has the lowest impact on citations compared with review papers. As this study looked only at Malaysian review papers and 500 highly cited articles, it would be necessary to conduct similar studies in other parts of the world and for other types of papers, to examine whether the relationship investigated here is significant there. This research considered the general definition of citations. Future studies may therefore differentiate between “perfunctory citations” and “organic citations”, which Tang and Safer (2008) defined, respectively, as citations that occur only once and only in the introduction, and as references cited for “conceptual ideas” or “methodology and data” reasons.

ACKNOWLEDGEMENT

Sincere thanks to Dr. Bojan Obrenović and the International Journal of Management Science and Business Administration’s board members for their useful advice.

References

Abt, Helmut A., and Eugene Garfield. 2002. “Is the relationship
between numbers of references and paper lengths the same for all
sciences?” Journal of the American Society for Information Science and
Technology 53 (13):1106-1112. doi: 10.1002/asi.10151.

University of Malaya (UM), Department of Engineering Design and Manufacture, Faculty of Engineering; University of Malaya (UM), Research Support Unit, Centre of Research Services, Institute of Research Management and Monitoring (IPPP); University of Malaya, Institute of Mathematical Sciences, Faculty of Science; University of Malaya, Centre for Product Design and Manufacturing, Department of Mechanical Engineering, Faculty of Engineering; University of Malaya (UM), Faculty of Engineering

Data is King: Tracking Internal Performance Metrics at Your Journal

If you’re like most editors, you’re always looking for new ways to optimize your journal’s peer review process. Of course, in order to know why bottlenecks are occurring in your workflow, and to come up with solutions to stop them, you first have to figure out when and where they’re happening.

Many journals have begun to track journal metrics to get a granular view of their peer review processes, from simple stats like average annual submission rate to the average number of days it takes individual editors to send authors manuscript decisions. Tracking metrics can help journals stay abreast of how they are performing externally, in terms of the volume, quality, and scope of submissions, and internally, in terms of journal-wide as well as editor- and reviewer-specific speed.

“The big saying out there right now is data is king,” said Christine
Dymek, senior managing editor at leading journal management consultancy
Kaufman Wills Fusting & Company.

According to Dymek, who consults journal editors on peer review best
practices, all journals should produce analytics reports to go over
during regular team meetings.

“I think that it’s absolutely something journals need to do and audit
annually, if not bi-annually,” said Dymek. “It’s a good way to check
progress to see where you stand and to set goals.”

In Academic Journal Management Best Practices: Tales from the Trenches, a recent Scholastica eBook, Dymek explains the core metrics she encourages journals to track. Below we round up those main metrics. Editors using journal management software with built-in analytics will likely be able to see all of this information from their account. If you are not using journal management software, Dymek said not to be overwhelmed by having to manually track multiple stats; she encourages journals to start small, choosing the one or two metrics that matter most to them.

Time to Manuscript Decision

According to Christine Dymek, one metric that all journals should
track and look to improve is the average amount of time it takes them to
make decisions on submissions, from the time a manuscript is first
received. It’s important for journals to ensure that their peer review
process is moving forward and that they aren’t accruing backlogs of
submissions that need to be assigned to editors or that editors need to
assign to reviewers. Dymek encourages the journals she works with to
agree on a benchmark for the number of days in which decisions should be
made on new submissions, so that editors have a shared goal to work
towards. Once that benchmark is in place, journals can determine if
they’re on or missing their mark by tracking their average time to
decision on a bi-annual or annual basis.
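The metric itself is just the mean of per-manuscript intervals. A minimal Python sketch, with hypothetical dates (this is our illustration, not a feature of any particular journal management system):

```python
from datetime import date
from statistics import mean

def avg_days_to_decision(records):
    """Average days from first receipt of a manuscript to the decision.

    records: iterable of (submitted, decided) date pairs.
    """
    return mean((decided - submitted).days for submitted, decided in records)

# Two hypothetical manuscripts:
records = [
    (date(2015, 1, 5), date(2015, 2, 4)),   # 30 days
    (date(2015, 1, 10), date(2015, 3, 1)),  # 50 days
]
print(avg_days_to_decision(records))  # 40
```

Comparing this figure against the agreed benchmark, and computing it per editor, gives both the journal-wide and the editor-specific views described here.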

In addition to tracking journal-wide time to decision, Dymek said journals could also benefit from tracking how long it takes their individual editors to move manuscripts through peer review, to determine whether everyone is working at the same pace or one editor is struggling to keep up with the manuscripts assigned to them.

“There’s a lot of benefit to that,” said Dymek. “You can take a look
at who your high and low performing editors are and ask - why are the
low performing editors moving at a slower rate?” Dymek said
editor-specific time to decision metrics can quickly reveal gaps in
editor support that need to be addressed, such as insufficient software
training, or simply whether or not journals need to modify their
manuscript assignment process to account for editors with too much on
their plate.

Acceptance and Rejection Rate

Another core metric journals can track is the average number of
submissions they accept and reject on a bi-annual or annual basis and
where rejection decisions are being made in their peer review process.
In her experience, Dymek said she’s seen that many journals can benefit
from assessing their average number of desk rejections in particular and
what those numbers say about their process.

“Desk rejects is a really important topic now,” she said. “The big
talk in academic publishing has been the increased strain on reviewers.
You want to make sure you’re only sending reviewers submissions that
have potential.”

Dymek said, on one hand, journals should make sure they aren’t making
too many desk rejections. If you notice a rise in desk rejections over
time, you may want to meet with your editors to discuss the change and
whether or not you’re putting enough manuscripts through peer review.
However, on the other hand, if you find that a high number of
manuscripts are being rejected during first-round peer review you may
need to consider whether your editors are screening manuscripts well
enough before sending them out to reviewers.

Your journal’s acceptance rate can also reveal a lot about your peer
review process and the quality of your submissions. If you find that you
have a very high acceptance rate or that your acceptance rate is
growing, it’s a good idea to compare acceptance rate by submissions rate
every six months or year to see if a decline in submissions has caused
your editors to begin accepting more manuscripts than usual. If that’s
the case, you should look to acquire more quality submissions to ensure
that your journal remains selective.

Manuscripts Per Reviewer and Average Time to Review

One of the hardest parts of peer review for all journal editors is
ensuring that they have enough peer reviewers to reach out to and that
their reviewers are completing assignments in a timely manner.

“It’s very important to make sure that the reviewer pool you have is
accurate and up-to-date, and that you only have people in it who
actually want to review,” said Dymek.

To gauge the quality of their reviewer database, she encourages journals to track how often reviewers decline review requests.

“If you have someone that’s been in a reviewer database for three
years and is yet to accept an assignment, you need to step back and ask
if it is really necessary to have that person in the database,” she
said.

Dymek said tracking the amount of time it takes each of your
reviewers to complete manuscript assignments is another indicator of
whether or not you are reaching out to the right people. You may find
that your reviewers continue to accept assignments, but that their
average time to decision is growing. If that’s the case, it may be a
sign that it’s time to find new reviewers and to give your go-to
referees a needed break.

To proactively ensure that your journal is not burning out reviewers,
Dymek said it’s also important to keep track of how often you are
assigning manuscripts to specific reviewers. She advises journals to
continually seek new reviewers so that they can alternate the people
they reach out to often.

This post was written by Danielle Padula,
Community Development Manager

Citation Frequency and Ethical Issue

Dear Editor:

I read your publication ethics issue on “bogus impact factors” with great interest (1). I would like to draw attention to a new trend of manipulating citation counts. There are several ethical approaches to increasing the number of citations for a published paper (2). However, it is apparent that some manipulation of citation numbers is occurring (3, 4). Self-citations, “those in which the authors cite their own works”, account for a significant portion of all citations (5).
With the advent of information technology, it is easy to identify
unusual trends for citations in a paper or a journal. A web application
to calculate the single publication h-index based on (6) is available online (7, 8). A tool developed by Francisco Couto (9)
can measure authors’ citation impact by excluding the self-citations.
Self-citation is ethical when it is a necessity; nevertheless, there is a threshold for self-citations. Thomson Reuters’ Web of Science (WoS), the resource that currently lists journal impact factors, considers self-citation to be acceptable up to a rate of 20%; anything over that is considered suspect (10). In some journals, even 5% is considered a high rate of self-citation. The ‘Journal Citation Report’ is a reliable source for checking the acceptable level of self-citation in any field of study. The Public Policy Group of the London School of Economics (LSE) published a handbook on “Maximizing the Impacts of Your Research” that describes self-citation rates across different groups of disciplines, indicating that they vary by up to 40% (11).
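Checking a paper against such a threshold is straightforward once author lists are available. The rough Python sketch below is our own simplification (real tools, such as the one by Francisco Couto mentioned above, use more careful author disambiguation): a citation counts as a self-citation if at least one citing author is also an author of the cited paper.

```python
def self_citation_rate(paper_authors, citing_author_lists):
    """Fraction of incoming citations in which at least one citing
    author is also an author of the cited paper."""
    authors = set(paper_authors)
    self_cites = sum(1 for citers in citing_author_lists
                     if authors & set(citers))
    return self_cites / len(citing_author_lists)

# Hypothetical paper by authors A and B, with four incoming citations:
rate = self_citation_rate(["A", "B"],
                          [["A", "X"], ["Y"], ["B"], ["Z", "W"]])
print(rate)         # 0.5
print(rate > 0.20)  # True: above the 20% rule of thumb cited above
```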

Unfortunately, there is no significant penalty for the most frequent self-citers, and the effect of self-citation remains positive even at very high rates of self-citation (5). However, WoS has dropped some journals from its database because of anomalous citation patterns (4). The same policy should also be applied to the most frequent self-citers. The ethics of publication should be adhered to by all who wish to conduct research and publish their findings.

What exactly is a Digital Object Identifier (DOI) and how does it
help in the management and long-term preservation of research? Laurence Horton
explains the basic structure and purpose of a DOI and also points to
some limitations. DOIs are not the only way of providing fixed,
persisting references to objects, but they have emerged as the leading
system.

A DOI is a Digital Object Identifier. It is an online reference
(digital), pointing to (identifying) a resource (object). The DOI system
links, through a directory, references and web addresses of an object
to a “landing” page providing information on access and metadata about
that object — at a minimum
its creator, title, publisher, year of publication, and DOI. This
allows DOIs to provide a stable, persistent, resolvable reference
taking users to an object, even if web addresses or other references to
the location of an object, or its content, change.

DOIs appeared with the new millennium, and there are now over 100 million assigned. The International DOI Foundation governs DOIs and regulates them to an ISO standard. Registration Agencies like DataCite or CrossRef
make up the foundation and provide the structure supporting DOIs.
Allocation Agents, who are members of Registration Agencies, manage
assigning DOIs to objects. Clients, like universities, sign a contract
and pay an annual fee to agents to become “registrants” and create, or
“mint”, DOIs. When minted, DOIs are registered with the Foundation whose
directory then points associated web addresses to the landing page.

Objects need not be digital to have a DOI — they can be physical,
like a book. Nor need they be static — objects can change over time,
like a dataset. If web addresses or the object’s content significantly
change, clients must update the DOI record so the Foundation’s
directory continues pointing users to the landing page.

DOIs combine a prefix and suffix. The prefix is fixed and
standardised. The “10” identifies the link as a DOI, followed by a
four-digit number identifying the registrant who minted it, so a DOI
prefixed 10.5255 always comes from the UK Data Archive. The registrant
defines the suffix. Here, the UK Data Archive uses its own sequential
numbering system but it could use longer or shorter strings of numbers,
letters, or both. The “1” at the end is the UK Data Archive’s indicator
this is a first edition of the data set.
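The prefix/suffix anatomy described above is simple enough to check mechanically. As a minimal sketch (the helper name is mine, not part of the DOI system), in Python:

```python
def split_doi(doi: str):
    """Split a DOI into its prefix, registrant code, and suffix.

    A DOI prefix always starts with the directory indicator "10.",
    followed by a number identifying the registrant; everything after
    the first "/" is the suffix chosen by the registrant.
    """
    prefix, _, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a valid DOI: {doi!r}")
    registrant = prefix[3:]  # e.g. "5255" for the UK Data Archive
    return prefix, registrant, suffix

# Watson and Crick's Nature paper, cited later in this post:
prefix, registrant, suffix = split_doi("10.1038/171737a0")
# prefix == "10.1038", registrant == "1038", suffix == "171737a0"
```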

Anything can have a DOI as long as it has a digital landing page.
Indeed, a DOI may be the only thing shared by Watson and Crick’s outline
of DNA published in Nature (10.1038/171737a0), later recognised with a Nobel Prize, and the film Holiday on the Buses (10.5237/A929-C667), described as “absolutely abysmal” by Radio Times.
Also, if you only have the prefix and suffix in a reference, copying and
pasting it into Google or most reference manager software also “resolves”
the DOI and retrieves its metadata.
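Resolution itself needs nothing more than putting the identifier behind the doi.org proxy, which forwards the user to the registered landing page. A sketch (the function name is mine):

```python
from urllib.parse import quote

def doi_url(doi: str) -> str:
    """Build the resolver URL for a DOI via the doi.org proxy.

    Suffixes may contain characters that need percent-encoding in a
    URL (e.g. "<", ">", "#"); the "/" separator must be kept as-is.
    """
    return "https://doi.org/" + quote(doi, safe="/")

link = doi_url("10.1038/171737a0")
# link == "https://doi.org/10.1038/171737a0"
```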

DOIs are an investment in making data citable, elevating it to the
status of a research output, with reuse equating to citation. In a world
dependent on publishing and being cited, if your data is available,
discoverable, and citable, then people will discover it and it will be cited.
DOIs are also flexible. Depending on the policy of the registrant, they
can be allocated to datasets, variables, documentation, and different
versions of datasets, not just publications.

What DOIs are not is a symbol of data quality. You can attempt to
define “quality”, but using DOIs as a proxy for it is a mistake. Just because
something has a DOI does not mean it is good — just watch Holiday on the Buses. Also, reading the International DOI Foundation handbook
produces no mention of quality. Identification, yes. Resolution,
yes. Management, yes. Quality, no. We must not start using tools
designed for one purpose as a measure of another.

What does it do for preservation?

We can start (and it is a start, there is still lots to address) bringing stability
to data referencing by using DOIs. In the past, referencing was
simpler: you cited something by describing its print location — author,
title, publication, volume, and page numbers. These days it can be
complicated. Websites, databases, audio, video, blogs, social media,
software, eThis, and iThat: the research world just does not exist only
on paper. Also, while the internet is not a “series of tubes”, it does “rot”. Websites change addresses, servers get switched off, resources change significantly, and when that happens without care, original resources and references disappear.
For example, it does not take long in the reference section of
Wikipedia articles to come across links that are dead, broken,
or dangling. That is irritating, but if you are a legal scholar and URLs
cited in court judgements no longer work, it is a fundamental problem.

DOIs are not the only way
of providing fixed, persisting references to objects, but they have emerged
as the leading system. Because of the infrastructure underpinning DOIs —
the technology, financial commitment, and willpower behind the system —
objects with DOIs are discoverable, citable, and offer long-term
reassurance that this will remain the case.

Note: This article gives the views of the author, and not the
position of the Impact of Social Science blog, nor of the London School
of Economics. Please review our Comments Policy if you have any concerns on posting a comment below.

About the Author

Laurence Horton is Data Librarian at the London
School of Economics and Political Science. He is responsible for
Research Data Management support in the School. He can be found on
Twitter @laurencedata.

About Me

Nader Ale Ebrahim holds
a PhD in Technology Management from the Department of Engineering
Design and Manufacture, Faculty of Engineering, University of Malaya
(UM), Kuala Lumpur, Malaysia, and a Master of Science in
mechanical engineering from the University of Tehran, Iran.