Data generated through the course of research is as valuable
an asset as research publications. Access to research data enables the
validation and verification of published results, allows the data to be reused
in different ways, helps to prevent duplication of research effort, enables expansion
on prior research and therefore increases the returns from investment. Yet the
quality and quantity of a researcher’s publications continue to provide the key
measure of their research productivity. Sharing data, it seems, still does not
count for nearly enough.

In recent years there has been a proliferation of policies strongly
encouraging, and sometimes even requiring, researchers to share their data for
the reasons outlined above. These include policies from governments (e.g. USA, Australia),
publishers (e.g. PLOS, Nature) and research funders (e.g. NIH, ARC).
These policies are certainly opening up more data but even more research data
remains locked away and therefore undiscoverable. So how do we unlock more data?
One of the ways is to figure out how to make data count so that researchers
have more incentives to undertake the extra (and in the main, unfunded) work required
to share their data.

A 2013 study
by Heather Piwowar and Todd Vision looked into the link between open data and
citation counts. They found that the citation benefit intensified over time, with
publications from 2004 and 2005 cited 30 per cent more often if their
data was freely available; that every 100 papers with open data prompted 150 "data reuse papers" within
five years; and that original authors tended to use their data for only two years, while
others re-used it for up to six years.
More studies like this one are needed to demonstrate and track over time the
link between opening up data and making it count, in this case in the form of
citations, which – like it or not – are still the primary measure of research
impact.

Counting data citations – whether to gather citation
metrics or alternative metrics (altmetrics) – is challenging in and of itself
because data is cited very differently to publications. Data can be cited
within an article text rather than in the references section, which means the
article must be open access in order for the citation to be discovered.
Sometimes the article that referenced the data is cited rather than the data
itself even where the reference applies only to the data. Reference managers
don’t tend to recognise datasets and therefore don’t record the Digital Object
Identifier (DOI), which creates difficulties since DOIs make it so much easier
to track citations. There are also many self-citations, where researchers cite
their own data, so it is difficult to distinguish an article that has cited
another person’s data. And there are likely to be differences between how data
is cited in the sciences as compared to the humanities.
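
To make two of these challenges concrete, here is a minimal Python sketch of how in-text data citations might be picked up when the full text of an article is available. It is an illustration only, not the method used by any project mentioned in this post. The function names, the example DOI and the author names are all hypothetical, the DOI pattern is a commonly used approximation that will miss some older DOIs, and the self-citation check is a naive name comparison.

```python
import re

# Approximate pattern for modern DOIs: "10.", a 4-9 digit prefix, "/", then a suffix.
# Assumption: this is a widely used heuristic, not a complete DOI grammar.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(text):
    """Return the distinct DOIs found anywhere in the text, in order of appearance."""
    found = []
    for match in DOI_PATTERN.findall(text):
        # Drop sentence punctuation swept up by the pattern. This would also
        # trim a DOI that genuinely ends in one of these characters.
        doi = match.rstrip(".,;:)").lower()
        if doi not in found:
            found.append(doi)
    return found


def is_self_citation(citing_authors, dataset_creators):
    """Naive check: True if any normalised name appears in both lists."""
    def normalise(name):
        return " ".join(name.lower().split())

    citing = {normalise(name) for name in citing_authors}
    creators = {normalise(name) for name in dataset_creators}
    return bool(citing & creators)


# Hypothetical article sentence citing a dataset in the body text.
article_text = "We reanalysed the survey data (doi:10.1234/example.5678), as before."
print(find_dois(article_text))
print(is_self_citation(["Jane Doe"], ["jane doe", "A. Other"]))
```

Even this toy version shows why the problem is hard: it only works when the article text can be read in the first place, and matching people by name alone is fragile, which is one reason persistent identifiers matter.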

Fortunately, the California Digital Library, PLOS and
DataONE have partnered in an NSF-funded project called Make Data Count. The project
will “design and develop metrics that track and measure data use i.e. data-level
metrics”. The findings promise to be highly valuable and may also shape future
recommendations for the way data should be cited in order for it to be counted.

Sharing impact stories of data reuse may be another
way to help make data count. A number of organisations around the world
that promote better data management have been collecting data reuse stories
(e.g. DataONE, ANDS). Some researchers
may see these stories as a negative because they show that “someone else might
get the scoop on ‘my’ data”. But these stories can also inspire researchers to
spend the extra effort to make their data available when they feel they are
ready to. The rewards may not only be in the metrics but in the unexpected ‘buzz’
of seeing ‘your’ data have a longer life and be reused in ways you had not even
imagined. Are there other ways that we can help make data count? It’s worth
thinking about because "data
sharing is good for science, good for you".

About Me

Natasha Simons is a Research Data Management Specialist with the Australian National Data Service, an organisation set up by the Australian Government to enhance the value of data for researchers, research institutions, and the nation. Located at Griffith University in Brisbane, Natasha serves on the Council of Australian University Librarians Research Advisory Committee and is an ORCID Ambassador. She is an author and reviewer of papers related to library and information management and co-authored a 2013 book on digital repositories. Natasha was the Senior Project Manager for the Griffith Research Hub, which won awards from Stanford University and VALA. She is an advocate for open data and open repositories. Natasha is @n_simons on Twitter.