Early this morning, well before normal working hours, the dedicated Centre for Research Communication employees, Marianne and Jane, entered the special media communication room, which contains the video conferencing equipment, to jointly present “Publisher Interest towards a role for Journals in Data Sharing: The Findings of the JoRD Project”. In the true spirit of global access and the digital world, they presented from Nottingham, UK, and the presentation was seen at the ELPUB conference in Karlskrona, Sweden. We are pleased to report that the Nottingham technology worked really well, although a fellow presenter, also speaking through Adobe Connect, had difficulties with her connection and transmitted the sound of a large aircraft passing over the room where she was speaking. Jane and Marianne had chosen the high-tech route because a tram line and bridge are currently being noisily constructed outside their office window; had they presented from their own computers, the audience would have heard heavy machinery moving, beeps and rumbles, drilling and clangs.

Last week was busy for the JoRD team. Jane gave the presentation for ANDS, and Marianne appeared twice at Oxford: once to present a brief summary of the JoRD project at the Jisc-organised “Now and Future of Data Publishing” event, and later in the week to give a selection of the project findings at the Dryad Members meeting. The links to both Oxford presentations follow, with a text summary.

1. The project was Jisc funded to explore the possibility of setting up a self-sustaining database and service to collate and summarise academic journal policies on the deposition of data associated with published articles
2. Current belief that openly accessible research data is a good thing because it drives science forward
3. Aims: a Jisc-funded project to look at the possibility of setting up a central resource of journal instructions to authors about sharing the data on which articles are based
4. Objectives
• Investigate the current state of journal data policies
• Investigate current data sharing views and habits
5. Landscape of data sharing: there has always been data published in printed journals, in the form of charts and tables
6. But digital data becomes a problem, where should it be stored? In a repository? On a website? Embedded into articles?
7. This is a journal data policy: an instruction to authors on where to share or deposit research data relevant to a published article
8. We initially analysed 230 research data policies and found many inconsistencies and a lack of standardisation
9. Some journals were vague about the form of data to be deposited, others were more precise
10. Some journals were specific about where the data should be deposited; most were less so.
11. Go back to the policy and explain
12. We spoke to these stakeholder groups and found a number of dichotomies
13. Taking researchers first, they said that they would be happy to share their data (with certain caveats, which I will not go into here). These were the reasons they gave for sharing data
14. However, when we asked how much they shared and where, most of them only shared with colleagues. Only a small number mentioned that they put their data into repositories
15. We asked them why that was, and their replies ranged over: no time; not knowing where; the difficulty of accessing institutional repositories; and that current research models do not value and encourage data sharing. (A PhD researcher stated that he felt that if he shared his data during the course of the research he might be “gazumped”, meaning that should someone else publish research on his chosen topic, the thesis would no longer be unique and therefore the doctoral thesis would no longer be credited)
16. The publishers also showed a dichotomy: while they too appreciated the benefits of sharing data, they felt that their servers would have difficulty holding the quantity of data attached to each article and that repositories were the right place. However, there was some discussion about the long-term availability of repositories: they have not yet been proven, whereas the publishing houses have been around for a long time
17. Worries about links, etc
18. Academic librarians and Repository managers, no conflicting concerns, practicality
19. Data sharing landscape is a mess
20. How could a JoRD service improve the infrastructure?
• Develop a model data policy framework, which takes into account the concerns of all the stakeholders
21. Improved policies save the time of publishers and authors, and are more consistent
22. Address fears about IP, data citation etc., eliminating dichotomies, improving the infrastructure and creating order
• Implications for repositories: once authors know where data can be deposited to be shared and re-used, more will do so.

1. A JISC-funded feasibility study of a central resource of research journal data policies
2. Looked at what the service should include and whether it could pay for itself
3 and 4. Tried to answer two questions
• Can Journal data policies encourage deposition of data?
• Will a JoRD service help publicly funded data to be shared and re-used?
5. Why bother? When an author publishes, she is trading her intellectual property with a publisher as part of a transaction, and there are certain obligations on both sides; this can include data linked to the article. The author needs to know and understand what to do with it (reading the small print)
6. Needed to find out three things
• Understand current journal data policies
• Would anyone bother to use the service
• Could it generate sufficient income for development, building and maintenance?
7. We analysed some journal data policies in depth
8. Looked at 371 journals
9. What was in the policies? Main areas were data type, when to deposit, and where
10. Little requirement for open access or compliance, or consequences for non-compliance
11. That does not provide an argument that journal data policies will help open data sharing
12–18. But there are signs that the situation is changing
• More publishers are considering data policies
• Elsevier Journal of the future
• Rise of data journals
• Apparent upward trend of journals with data policies
19. If there were a JoRD Service, would anyone use it?
20. All the stakeholders said that they would
21. For a variety of reasons BUT
22. They all wanted different things…
23. …apart from these, difficult to build one service
24. And will anyone pay for it?
25. A resounding no, except from publishers if the service was all singing and dancing
26. So, how does a JoRD service stand?
27. Now, with few policies stipulating deposit of data and stakeholders not financially contributing,
28. BUT… let’s think of the future: the landscape is changing
29. Funders are asking for data plans to be included in funding bids
30. Universities are installing data management systems
31. Increase of data journals
32. And expectation that data should be included in articles
33. We have an opportunity to build a high-quality database of existing journal data policies, which can be added to and maintained to a high level with a simple user interface; establish a user base; and develop a sustainable business model which can be implemented at a later stage.
34. JoRD is the future, and we should build it now, when the quantity of data is smaller and the cost will be lower
35. Before the data deluge comes

The JoRD team have been distracted by other projects recently, while the feasibility study report was being read, digested and commented upon. After some useful suggestions by Simon Hodson of Jisc (http://www.jisc.ac.uk/contactus/staff/simonhodson.aspx) and Andrew Treloar of ANDS (Australian National Data Service, http://www.ands.org.au/contact.html), the report is now revised and ready to be submitted. While the report was being revised, the team have been working hard to disseminate the project findings by sending abstracts to a number of conferences and accepting invitations for presentations. The team will be very active over the next three months, and one or other team member will be speaking in the following places at the following times:

Sharing the data generated by research projects is increasingly being recognised as an academic priority by funders, researchers and publishers. The issue of the policies on sharing set out by academic journals has been raised by scientific organisations, such as the US National Academy of Sciences, which urges journals to make clear statements of their sharing policies. On the other hand, the publishing community expresses concerns over the intellectual property implications of archiving shared data, whilst broadly supporting the principle of open and accessible research data.

The JoRD Project was a feasibility study on the possible shape of a central service on journal research data policies, funded by the UK JISC under its Managing Research Data Programme. It was carried out by the Centre for Research Communications at Nottingham University (UK) with contributions from the Research Information Network and Mark Ware Consulting Ltd. The project used a mix of methods to examine the scope and form of a sustainable, international service that would collate and summarise journal policies on research data for the use of researchers, managers of research data and other stakeholders. The purpose of the service would be to provide a ready reference source of easily accessible, standardised, accurate and clear guidance and information on the journal policy landscape relating to research data. The specific objectives of the study were: to identify the current state of journal data sharing policies; to investigate the views and practices of stakeholders; to develop an overall view of stakeholder requirements and possible service specifications; to explore the market base for a JoRD Policy Bank Service; and to investigate and recommend sustainable business models for the development of a JoRD Policy Bank Service.

A review of relevant literature showed evidence that scientific institutions are attempting to draw attention to the importance of journal data policies and a sense that the scientific community in general is in favour of the concept of data sharing. At the same time it seems to be the case that more needs to be done to convince the publishing world of the need for greater consistency in data policy and author guidelines, particularly on vital questions such as when and where authors should deposit data for sharing.

The study of journal policies which currently exist found that a large percentage of journals do not have a policy on data sharing, and that there are great inconsistencies between the policies that do exist. Whilst some journals offered little guidance to authors, others stipulated specific compliance mechanisms. A valuable distinction is made in some policies between two categories of data: integral, which directly supports the arguments and conclusions of the article, and supplementary, which enhances the article but is not essential to its argument. What we considered the most significant study on journal policies (Piwowar & Chapman, 2008) defined journal data sharing policies as “strong”, “weak” or “non-existent”. A strong policy mandates the deposit of data as a condition of publication, whereas a weak policy merely requests it. The indication from previous studies that researchers’ data sharing behaviour is similarly inconsistent was confirmed by our online survey. However, there is general assent to the data sharing concept, and many researchers would be prepared to submit data for sharing along with the articles they submit to journals.

We then investigated a substantial sample of journal policies to establish our own picture of the policy landscape. A selection of 400 international and national journals was purposively chosen to represent the top 200 most cited journals (high impact journals) and the bottom 200 least cited (low impact journals), equally shared between Science and Social Science, based on the Thomson Reuters citation index. Each policy we identified for these journals was broken into different aspects, such as: what, when and where to deposit data; accessibility of data; types of data; and monitoring of compliance and consequences of non-compliance. These were then systematically entered onto a matrix for comparison. Where no policy was found, this was indicated on the matrix. Policies were categorised as either “weak”, only requesting that data is shared, or “strong”, stipulating that data must be shared.
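The matrix described above can be pictured as a small program. The sketch below is purely illustrative: the aspect names, field names and the weak/strong rule (`mandates_deposit`) are our own assumptions for the example, not the project's actual coding scheme.

```python
# Hypothetical sketch of the policy comparison matrix: one row per
# journal, one column per policy aspect, plus a strength label.
POLICY_ASPECTS = ["what", "when", "where", "accessibility",
                  "data_types", "monitoring", "consequences"]

def classify_policy(policy):
    """Label a policy 'none', 'weak' (requests sharing) or
    'strong' (stipulates sharing) -- illustrative rule only."""
    if policy is None:
        return "none"
    return "strong" if policy.get("mandates_deposit") else "weak"

def build_matrix(journals):
    """Build the comparison matrix; missing aspects are marked
    'not stated', journals without a policy are still recorded."""
    matrix = []
    for name, policy in journals.items():
        row = {"journal": name, "strength": classify_policy(policy)}
        for aspect in POLICY_ASPECTS:
            row[aspect] = (policy or {}).get(aspect, "not stated")
        matrix.append(row)
    return matrix

# Hypothetical journal records for illustration.
journals = {
    "Journal A": {"mandates_deposit": True, "where": "public repository"},
    "Journal B": {"mandates_deposit": False},
    "Journal C": None,  # no data sharing policy found
}
for row in build_matrix(journals):
    print(row["journal"], row["strength"], row["where"])
```

Recording "not stated" explicitly, rather than omitting the cell, is what makes the inconsistencies between policies visible when rows are compared side by side.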

Approximately half the journals examined had no data sharing policy. Nearly three quarters of the policies we found were assessed as weak, and just under one quarter were deemed strong (76%: 24%). The high impact journals were found to have the strongest policies, whereas not only did fewer low impact journals include a data sharing policy, those policies were also less likely to stipulate data sharing, merely suggesting that it might be done. The policies generally give little guidance on the stage of the publishing process at which data is expected to be shared.

Throughout the duration of the project, representatives from publishing and other stakeholders were consulted in different ways. Representatives of publishing were selected from a cross section of different types of publishing house; the researchers we consulted were self-selected through open invitations via the JoRD Blog. Nine of them attended a focus group and 70 answered an online survey. They were drawn from every academic discipline and ranged over a total of 36 different subject areas. During the later phases of the study, a selection of representatives of stakeholder organisations was asked to explore the potential of the proposed JoRD service and to comment on possible business models. These included publishers, librarians, representatives of data centres or repositories, and other interested individuals. This aspect of the investigation included a workshop session with representatives of leading journal publishers in order to assess the potential for funding a JoRD Policy Bank service. Subsequently, an analysis of comparator services and organisations was performed, using interviews and desk research.

Our conclusion from the various aspects of the investigation was that although the idea of making scientific data openly accessible for sharing is widely accepted in the scientific community, the practice confronts serious obstacles. The most immediate of these obstacles is the lack of a consolidated infrastructure for the easy sharing of data. In consequence, researchers quite simply do not know how to share their data. At the present juncture, when policies are either not available or provide inadequate guidance, researchers acknowledge a need for the kind of information that a policy bank would supply. The market base for a JoRD policy bank service would be the research community, and researchers did indicate that they believed such a service would be used.

Four levels of possible business models for a JoRD service were identified, and finally these were put to a range of stakeholders. These stakeholders found it hard to identify a clear-cut option of service level that would be self-sustaining. The funding models of similar services and organisations were also investigated. In consequence, an exploratory two-phase implementation of a service is suggested. The first phase would be the development of a database of data sharing policies, engagement with stakeholders and third-party API development, with the intention of building use to the level at which a second phase, a self-sustaining model, would become possible.

So far this blog has commented on what researchers think and what publishers and journals are currently doing. The final part of the stakeholder consultation comprises interviews held with academic librarians, which explored their thoughts on open access research data, the role of librarians in working with open data, and a JoRD policy bank service. The librarians agreed with the other stakeholders that wider access to research data is beneficial. However, they showed a deeper understanding of the infrastructure required to store and access data, and considered the problem of selecting which data should be preserved. In their experience, institutional practice is not advancing in line with policies, and, as information specialists, librarians considered that they have the skills necessary to improve the situation.

Librarians anticipated that their expertise could be used for the following roles:

Meta-data management and structure of data

Data licensing

Inclusion of data in institutional repositories

Data management advice and training

Co-ordination with other university support departments, for example IT, records management and the research office.

Enabling compliance

Librarians were also positive about the concept of a JoRD Policy Bank service, but considered that it would be a useful addition to existing services, for example RoMEO or JISC Collections Knowledge Base+, thereby creating a single point of reference for broad advice on data management and publication. Like the other stakeholders, librarians considered that one function of a JoRD service would be to compare journal policies with funders’ requirements, but they also suggested that some co-funded projects would need guidance should the funders’ policies differ. They also suggested that JoRD should rate journal policies on aspects such as usability and accessibility of data.

Here is another summary of the concluding discussion that took place at the workshop on 13th November. This one concerns publishers’ expectations and perceptions of the nature of the JoRD Data Bank service.

A prominent consideration of the publishers was that JoRD should be an authoritative resource, such that a JoRD compliance stamp, or quality mark, could be displayed on journals’ websites. There was discussion that, for JoRD to be authoritative, the content of the database should be added, updated and maintained by the JoRD team. It was mentioned that publishers might initially populate the database, but ongoing maintenance would be the responsibility of JoRD. However, there should be a guarantee that the content is accurate, and publishers would need to commit to providing machine-readable policies so that they could be automatically harvested.

It was suggested that the operational database should not be merely a static catalogue or encyclopaedia. It was requested that a journal’s non-compliance with a data sharing policy, or with a funder’s policy, could be flagged and reported to the publisher, although it was queried whether that was the remit of the service or of the publisher. Similarly, it was questioned whether the service would mediate user complaints, and proposed that it should engage only with complaints concerning policies. To maintain functionality, there could be automatic URL checking that sends an alert to the publisher if links are broken. Notification of policy updates would also be a useful function.
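The automatic URL checking floated above could be quite simple. The sketch below is an assumption about how such a check might work, not part of the JoRD design; the function names are hypothetical, and a real service would also need scheduling, retries and rate limiting.

```python
# Illustrative sketch: check every registered policy URL and report
# the broken ones, grouped by publisher, so each can be alerted.
from urllib.request import urlopen
from urllib.error import URLError

def http_ok(url, timeout=10):
    """Return True if the URL responds with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except URLError:
        return False

def find_broken_links(policy_links, check=http_ok):
    """policy_links maps publisher name -> list of policy URLs.
    Returns only the publishers with dead links, and which links."""
    broken = {}
    for publisher, urls in policy_links.items():
        dead = [u for u in urls if not check(u)]
        if dead:
            broken[publisher] = dead
    return broken
```

Passing the checker in as a parameter (`check=`) lets the crawler be exercised against a stub without touching the network, which also makes the "send an alert" step easy to bolt on separately.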

The service website should include a model data policy framework, or an example of a standard data policy, and offer guidance and advice to journals and funders about policy development. However, the processing and ratification of a model policy could be a time-consuming process for some publishers. It was asked whether repository policies would also be included, and there was mention of compliance with the OpenAIRE European repository network. The website should also contain:

Links to the publishers’ web pages

Dates of the records

Lists of links to repositories

A set of criteria for data-hosting repositories

It should look inviting but businesslike, and be simple and clear yet sufficiently detailed.

Methods of funding the service were considered, along with the benefits of membership. For example, would only the policies of members of the service be entered into the database? Would there be different levels of membership, or different service options that publishers could choose, with extra costs for extra services? One such service could be holding historical and persistent records of former policies. In the publishers’ opinion, they would be prepared to pay for a service that is transparent and would save them time.

Other comments included:

Would the service be a member of the World Data System?

Could it be released in Beta?

There are around 400–600 titles to enter initially

Once set up, the service could be studied to discover its effectiveness and impact

The Focus Group was carried out as one of the normal meetings of the Nottingham Café Scientifique et Culturel on the evening of Monday 8th October 2012. This society meets for the purpose of ‘public engagement’ with the latest ideas arising in science and culture. The audience mainly comprises academics, professionals and students, and therefore has an interest in understanding research and associated matters. As the ‘general public’, they are also interested in how public money is spent on research, and what happens to the outputs gained from this research.

Before the Focus Group started, its purpose was explained to the participants. They were then asked to sign a consent form and given a sheet of suggested questions to which they could refer throughout the session. They were asked to provide either their experiences or their opinions related to this area.

The following topics arose during the course of the Focus Group Meeting.

THE RESEARCH OR DATA OF THE FOCUS GROUP ITSELF

The focus group reported a variety of academic, practical (e.g. professional purposes such as obtaining community data to submit funding bids), and personal research projects (e.g. database of output of personal research interests in an academic field). The focus group participants seemed to have a data sharing mindset and overall felt that data should be shared.

LOCATION OF THE DATA / LOCATING DATA

People wondered where data should be submitted so that it does not get lost – this is important, as it is a public record often produced at the public’s expense.

What is the best method of finding data?

Journals – People still publish via journals, people are used to this model, and it means that people then know where to look for research output.

Use of Google Scholar – Google Scholar can help with locating studies. But Google itself provides a search list which shows the items that are most frequently consulted, rather than necessarily showing those which are of better quality.

Institutional Repositories – However, these are not consistent from one organisation to another: they have different methods and the software can be configured differently. IRs may lead people to search Google instead.

WHEN DO YOU SHARE RESEARCH DATA – AT WHAT POINT IN THE PROCESS OF RESEARCH?

At what point in the research process should the data be shared?

Should there be a choice about the timing of release?

Raw Data – Should the data be in its ‘raw’ state or should it be contextualised by the researcher first? The data in some of its early states may not be comprehensible or usable by others. In these states it could be liable to misuse. It may be better to release the data once it is determined that there are no errors in it which could lead to unreliable studies by other researchers.

Interpretation before release – If people are still processing the data, they may feel the need to interpret it before sharing it. They may thus wait until the PhD, or other report is finished, before going public with the actual data. People would not necessarily want to share their data prior to producing their publications in order to maximise the number of publications.

The nature of the data – It may depend on what people want to use the data for, and the nature of the data itself as to whether shared data is useful. Is this more of an issue for qualitative data which is based on the interpretation of the researcher, rather than quantitative data?

Relevance of the data – Should the data be released while it is still of interest? Old data may lose its relevance or appeal.

Peer Review – The data should be available for the review process to enable peer reviewers to check it. This could, however, be a time-consuming process; not all reviewers may feel they have the time to check the data as well as the article to which it relates.

BENEFITS OF DATA SHARING

New outcomes – Other people may be able to produce fresh interpretations of the data to advance the subject. Different researchers may find patterns that other people have missed.

Preservation – data which is copied and updated by others is more likely to be preserved; it is also more likely to be checked and is thus more reliable.

Ensuring reliability – e.g. making pharmaceutical data open ensures that it is not ‘rubbish’ (see arguments of Ben Goldacre)

Collaboration/Comprehensivity – sharing a personal database of research means that other people would be able to contribute; one person cannot collect all the data necessary for the project. This would then lead to a comprehensive database.

Pooling data – sharing data would enable data to be pooled from different sources.

ISSUES WITH DATA SHARING

Confidentiality – Issues of confidentiality were raised that would make some data difficult to share.

Infrastructure – Lack of infrastructure in the researcher’s organisation may deter data sharing.

Preservation – Data formats: some are not straightforward; digital data may have been stored in formats that are no longer used (floppy discs, for example); more reliable formats are needed, and readers for obsolete data types may be required. What would assist with data preservation? (e.g. more reliable formats, such as tablets of stone, or the web).

Time – It is time consuming to prepare data for sharing.

Value judgments – Who is qualified to make a judgment on what data should be preserved, given that not everything can be? Who should have the job of filtering other people’s minds? Will this lead to value judgments being made about some forms of data?

Knowledge is power – it is also access to future funding. People may be concerned about sharing data if it means that it is used by others in a way which prevents them from obtaining future funding to continue with the line of research.

Misuse – future analyses may be incorrect, or cherry-picking of the data may take place to aid a particular argument – and data which does not support the argument can be ignored.

Processed data – people may claim that the data has been fiddled with (processed in some unreliable way).

Lack of Knowledge of how to share data – Someone reported that they did not know how to share data but would like to be able to do this.

Information Overload – A data sharing culture may mean that eventually there is too much information out there to manage successfully.

New research – Research could become a process of analysing old datasets rather than producing new data. Science would then become a process of interpretation.

Different languages – This could be a barrier to collaboration and sharing.

Ownership disputes – There could be disputes between authors as to who owns the data.

Verification studies – Funders do not want to fund them, they are seen as low status and not worthwhile. Journals do not want to publish straightforward replication studies; they value newness but this does not mean that the study is necessarily worthwhile. Again there are value judgements being made here, but not by the researchers themselves. The researchers are at the joint mercy of funders and publishers.

New models of data sharing – The way data is shared changes frequently (e.g. CD v iTunes model); people have to keep up to date with the environment of sharing.

Financial models – publishers need to make money, which may impede the process of data sharing; OA needs to find a way of being sustainable.

INCENTIVISATION

Data Citation – This ensures that all data re-use is cited so that the original researcher(s) get(s) the credit for the data they have produced.

How to incentivise? – Given that University promotion is based on new research and high impact journals, how can researchers be incentivised to share their data if they perceive that this may weaken their professional progress?

Peer review – Peer review of data could lead to public attributions of merit.

LEVELS OF ACCESS

Free information – One attendee wanted to make their personal research available – but wanted the access to be free.

Researcher pays? – This seems like vanity publishing to one of the attendees.

QUANTITATIVE V QUALITATIVE DATA

Someone mentioned that they could not find statistical data to back up their research but anecdotal, qualitative research supported their assumption. If they had waited for the supporting figures it would have taken too long to set their community project in motion. This is why community groups are now commissioning research.

These statements demonstrate the incentivisation process for people to share their data and make it available for re-use. Benefits accrue back to the original researcher(s) for having shared their data, and the discipline itself becomes more impactful.

WHAT A DATA CITATION MIGHT LOOK LIKE

Examples

United States Department of Commerce. Bureau of the Census, and United States Department of Labor. Bureau of Labor Statistics. Current Population Survey: Annual Demographic File, 1987 [Computer file]. ICPSR08863-v2. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-02-03. doi:10.3886/ICPSR08863
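The ICPSR citation above follows a recognisable pattern: creator, title, version, distributor, date and a persistent DOI. As a rough illustration only, a citation in that style could be assembled from its parts like this; the function and field names are our own and not a standard API.

```python
# Minimal sketch of building a data citation in the style of the
# ICPSR example. Field names are illustrative assumptions.
def format_data_citation(author, title, version, publisher, date, doi):
    """Join the citation fields, ending with a persistent DOI so
    that re-use of the data can be credited to its producers."""
    return (f"{author}. {title} [Computer file]. {version}. "
            f"{publisher}, {date}. doi:{doi}")

citation = format_data_citation(
    author="United States Department of Commerce. Bureau of the Census",
    title="Current Population Survey: Annual Demographic File, 1987",
    version="ICPSR08863-v2",
    publisher="Ann Arbor, MI: Inter-university Consortium for Political "
              "and Social Research [distributor]",
    date="2009-02-03",
    doi="10.3886/ICPSR08863",
)
print(citation)
```

The DOI is the piece that does the incentivisation work: it gives the dataset a stable, countable identifier, so citations to the data can be tracked just like citations to articles.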

“Professional associations, journals, data repositories and funding agencies must work together to make the entire scientific venture more transparent and to encourage broader access to research data,” said ICPSR Director George Alter. “The first step is to give scientists who produce important research data the recognition they deserve.”

The University of Michigan’s Inter-university Consortium for Political and Social Research (ICPSR) and the Alfred P. Sloan Foundation are working together to promote open access to research data and to improve the link between published works and the background data.

In particular, the ICPSR will be working with stakeholders within the social sciences to improve:

Our survey work with JoRD has indicated that Social Sciences journals are behind Science journals in having policies on data sharing and archiving. This project has the potential to address this imbalance.

Humanities and Social Sciences – what’s different compared to Science, Technology and Medicine (STM)?

The following summarises the key points of comparison from an article in Research Information:

Attitude to information – One fifth of researchers in the life sciences and physical sciences rated print versions of current journal issues as useful for their research; in the Arts and Humanities the figure was three fifths.

Funding of the research sectors – Unlike STM, much research in the humanities and social sciences is produced by individual researchers without the support of a specific project grant (which therefore does not cover publication costs). There is more funding in STM.

Journal prices – usually higher in STM fields than in the Humanities or Social Sciences.