Earlier this month, CODATA and World Data System, both interdisciplinary committees of the International Council for Science, jointly organized SciDataCon, an international conference on data sharing for global sustainability. The conference was held Nov 2-5, 2014, on the campus of Jawaharlal Nehru University, New Delhi, India. Creative Commons Science had a busy schedule at the conference attended by 170+ delegates from all over the world, many from the global south.

We started early with a full-day workshop on text and data mining (TDM) in cooperation with ContentMine. The workshop was attended by a mix of PhD students and researchers from the fields of immunology and plant genomics research. It was really rewarding to see the participants get a handle on the software and go through the exercises. Finally, the conversation about legal uncertainty around TDM apprised them of the challenges, but bottom-up support for TDM can be a strong ally in ensuring that this practice remains out of the reach of legal restrictions.

During the main conference we joined panel discussions on data citation with Bonnie Carroll (IIa), Brian Hole (Ubiquity Press), Paul Uhlir (NAS) and Jan Brase (DataCite), and on international data sharing with Chaitanya Baru (NSF), Rama Hampapuram (NASA) and Ross Wilkinson (ANDS). We also participated in a daily roundup, organized by Elizabeth Griffin (CNRC), of the state of data sharing as presented at the conference.

SciDataCon, which used to be called CODATA, is held every two years, and is an important showcase of open science around the world. It is an important gathering because it brings together many scientists from the global south. A lot remains to be done to make real-time, pervasive data sharing and reuse a reality in much of the world, but there are heartening signs. At a national level, India’s data portal holds promise, but making data licensing information more explicit and data easily searchable by license would make it more useful. Citizen science projects in the Netherlands, India and Taiwan demonstrated how crowds can be involved in experiments while ensuring the user-generated content is made available for reuse, and SNEHA’s work on understanding perspectives on data sharing for public health research was particularly insightful about the value of listening to feedback from participants.

We look forward to continuing to work with CODATA and WDS to promote and support open science and data initiatives around the world, particularly in the global south, and hope for more success stories at the next SciDataCon.

Lily Bui, a graduate student in the MIT Comparative Media Studies program, built a lovely web site that allows everyone to enjoy the sounds and music from the golden record via an attractive, easy-to-use web interface. In a serial burst of inspiration, Lily has also dedicated her web site to the public domain via a CC0 Public Domain Dedication.

In her words, “To be perfectly frank — I mostly designed this mostly for myself so that I wouldn’t have to access the archival audio through the Library of Congress portal.” Well, turns out a lot of people share Lily’s point-of-view. Ever the academic, she was taking a course at MIT that “examined the ‘migration of cultural materials’ into the digital space, combining traditional humanities with computational methods.” She is convinced her work is grounded in theory. Perhaps, for we love the sounds and music so much that we have yet to read Humanities Approaches to Graphical Display by Johanna Drucker.

CC Science’s Indian November

Wed, 29 Oct 2014

We are in New Delhi and Mumbai for a number of presentations, workshops and meetings. Please come say hello if you are at these events or in the area.

SciDataCon2014 in New Delhi

The International Conference on Data Sharing and Integration for Global Sustainability (SciDataCon) is motivated by the conviction that most research challenges cannot be addressed without attending to issues relating to research data essential to all scientific endeavors. However, several cultural and technological challenges are still preventing the research community from realizing the full benefits of progress in open access and sharing. CODATA and WDS, interdisciplinary committees of the International Council for Science (ICSU), are co-sponsoring and organizing a high-profile international biennial conference at Jawaharlal Nehru University, New Delhi.

TDM is an important scientific technique for analyzing large corpora of articles, used to uncover both existing and new insights in unstructured data sets that typically are obtained programmatically from many different sources. While the science and technology of TDM are complex enough, its legal complications are equally dizzying. Not only is its legal status unclear at best, it also varies from jurisdiction to jurisdiction, making cross-national collaboration difficult. Besides the license status of the original material, contractual agreements between research institutions and publishers, who are often the gatekeepers of the corpora, can create significant hurdles. The workshop offers an introduction to TDM, presenting the legal considerations through hands-on exercises.

Effective and efficient application of scientific data for the benefit of humanity entails agreed goals, clear and reproducible methods, and transparent communication throughout the data chain from producer to user via data organizer and research publisher. How well is that working? A Panel Discussion at the close of each day will summarise that day’s conclusions, and respond to the question of how well the data chain may be working from a trio of perspectives: Conference Organizer, data-management expert, and data producer.

Synthesis Data Citation Principles and Their Implications for TDM: Importance, Credit and Attribution, Evidence, Unique Identification, Access, Persistence, Specificity and Verifiability, and Interoperability and Flexibility: these eight important phrases describe the data citation principles agreed upon by the community, published under a joint declaration and endorsed by 185 individuals and 83 organizations. But what are the implications of these principles beyond just citation, particularly with respect to automated analysis of a large corpus of articles? This presentation will briefly present the principles, and then explore some of the issues that we have to come to grips with in order to make text and data mining (TDM) easy for scientists.

Maximizing Legal Interoperability Through Open Licenses: Many scientists do think about interoperability as they have to work with colleagues from other domains. However, common interoperability efforts are focused on technical, and if we are lucky, semantic interoperability. Rarely do scientists think of legal interoperability in the design of their science experiments. Can my work be legally mixed with someone else’s work without violating any intellectual property (or worse, privacy and security) laws? Is my work portable not just across scientific domains but also across jurisdictional boundaries? We attempt to shed light on some of these questions in this presentation.

Nov 5: Talk on CC/OKF open science activities to be given at the computer science dept., Indian Institute of Technology-Delhi

HBCSE at Tata Institute of Fundamental Research (TIFR), Mumbai is a National Center with the broad goals to promote equity and excellence in science and mathematics education from primary school to undergraduate college level, and encourage the growth of scientific literacy in the country. We will be discussing with HBCSE’s metaStudio potential areas of collaboration in citizen science and the use of sensors in projects to accelerate the growth of scientific awareness in the country through direct public participation in science.

What were five hundred folks from 30 countries doing in 40+ different sessions running concurrently in three rooms of two gorgeous buildings in Ciudad de México? They were showing, sharing and learning from the best of each other’s work utilizing open data, pushing governments to adopt open policies, and hacking for social, environmental and humanitarian change in Latin America and the Caribbean. Condatos may be the most important regional conference on open data held in Latam, but it is undoubtedly a showcase of the diversity, ingenuity, vibrancy and perseverance of the changemakers in that historic yet energetic region.

The two buildings of the conference venue were definitely symbolic of the dynamic nature of the gathering: the historic and gorgeous Biblioteca de México, with Octavio Paz looking down on the young crowd and its high stone walls inscribed with words from the giants of Mexican literature, were like bookends in time; the soaring, modernistic architecture of Cineteca Nacional was a nod to the exponential change in thinking and practice that was being hacked by the young crowd.

We are grateful for the chance to present our vision for a public commons of information that can both drive and be driven by the energy and innovation on display at the conference, and are thrilled at the new partnerships that hold promise for further expansion of the powerful concepts of open and sharing.

To the extent possible under law, Puneet Kishor has waived all copyright and related or neighboring rights to all photos and PDF in this blog post.

Examining deficiencies of and limitations on data sharing

Mon, 18 Aug 2014

Whether as patients, as part of traffic, or while exercising or simply walking with one of the behavioral trackers du jour, we are constantly giving data about ourselves and our surroundings to data collectors with few returns. From privacy regulations to bureaucratic barriers to collecting and locking up information just in case it might create monetary value in the future, there are a multitude of barriers between those who collect information and those who want to use it.

With support from Robert Wood Johnson Foundation (RWJF), we are launching two projects exploring different aspects that often get in the way of easy sharing of citizen-sourced information.

In collaboration with the Institute for Human Genetics and EngageUC at UCSF, and Personal Genome Project at Harvard University, we will explore the practical, ethical and legal implications of emphasizing benefits of sharing over the need for privacy at a workshop planned for Spring 2015 in Washington DC. A few of the questions to be tackled at the workshop: What if, instead of emphasizing the imperative of protecting privacy, we emphasized the potential benefits from sharing? Would most patients agree to let their information be shared?

Partnering with Manylabs, a San Francisco-based sensor tools and education nonprofit, and Urban Matter, Inc., a Brooklyn-based design studio, and in collaboration with the City of Louisville, Kentucky, and Propeller Health, maker of a mobile platform for respiratory health management, we will design, develop and install a network of sensor-based hardware that will collect environmental information at high temporal and spatial scales and store it in a software platform designed explicitly for storing and retrieving such data.

Further, we will design, create and install a public data art installation that will be powered by the data we collect, thereby communicating back to the public what has been collected about them.

CC Signs Bouchout Declaration for Open Biodiversity

Fri, 11 Jul 2014

CC is supporting the Bouchout Declaration for Open Biodiversity Knowledge Management by becoming a signatory. The Declaration’s objective is to help make biodiversity data openly available to everyone around the world. It offers the biodiversity community a way to demonstrate their commitment to open science, one of the fundamental components of CC’s vision for an open and participatory internet.

“There are no copyright impediments to the sharing of names and related data. The system must reward those who make the contributions upon which we rely. Building an attribution system remains one of the more urgent challenges that we need to address together.”

Donat Agosti introducing the Bouchout Declaration at the OpenDataWeek, RMLL, Montpellier, France, July 11, 2014. Photo by P. Kishor released under a CC0 Public Domain Dedication

The declaration calls for free and open use of digital resources about biodiversity and associated access services and exhorts the use of licenses or waivers that grant or allow all users a free, irrevocable, world-wide right to copy, use, distribute, transmit and display the work publicly as well as to build on the work and to make derivative works, subject to proper attribution consistent with community practices, while recognizing that providers may develop commercial products with more restrictive licensing. This is not only aligned with the vision of CC itself; CC is also the creator and steward of the legal and technical infrastructure that allows open licensing of content.

Screenshot of phylogeny from PhyLoTA as displayed in BioNames. The user can zoom in and out and pan, as well as change the layout of the tree. From “BioNames: linking taxonomy, texts, and trees” by Roderick D. M. Page, used under a CC BY License.

The declaration also promotes tracking the use of identifiers in links and citations to ensure that sources and suppliers of data are assigned credit for their contributions, and persistent identifiers for data objects and physical objects such as specimens, images and taxonomic treatments, with standard mechanisms to take users directly to content and data. CC has participated from the beginning in the activities that led to the Joint Declaration of the Data Citation Principles, which promotes the use of persistent identifiers to allow discovery and attribution of resources.

Bouchout Signatories. Image by Plazi released under a CC0 Public Domain Dedication

Most of the world’s biodiversity is in developing countries, and ironically, most biodiversity information and collections are in developed countries. Agosti calls this, “Biopiracy: taking biodiversity material from the developing world for profit, without sharing benefit or providing the people who live there with access to this crucial information.” (Agosti, D. 2006. Biodiversity data are out of local taxonomists’ reach. Nature 439, 392) Opening up the data will benefit the developing countries by giving them free and easy access to information about their own biological riches. Friction-free access to and reuse of data, software and APIs is essential to answering pressing questions about biodiversity and furthering the move to better understanding and stewarding our planet and its resources. Signing the Bouchout Declaration strengthens this movement.

Liberating the Haystack for the Needles

Mon, 02 Jun 2014

This post was written with invaluable assistance from the CC legal and policy teams.

Text and data mining (TDM) is becoming an increasingly important scientific technique for analyzing large amounts of data. The technique is used to uncover both existing and new insights in unstructured data sets that typically are obtained programmatically from many different sources.
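For readers new to the idea, here is a toy sketch in Python of what "uncovering insights in unstructured text" can mean at its simplest: counting which terms appear across a set of documents and finding the documents that mention a given term. Everything in it is an assumption made for illustration (the three-sentence corpus, the document ids and the function names); real TDM pipelines add retrieval, OCR and natural language processing on top of this.

```python
import re
from collections import Counter

# Hypothetical stand-in corpus; a real project would obtain articles
# programmatically from many different sources.
CORPUS = {
    "article-1": "Plant genomics studies link gene expression to drought tolerance.",
    "article-2": "Immunology research shows gene expression changes after infection.",
    "article-3": "Drought tolerance in crops depends on root architecture.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequency(corpus):
    """Count, for each term, how many documents mention it at least once."""
    counts = Counter()
    for text in corpus.values():
        # set() ensures a term is counted once per document
        counts.update(set(tokenize(text)))
    return counts


def documents_mentioning(corpus, term):
    """Return the sorted ids of the documents that contain the given term."""
    return sorted(
        doc_id for doc_id, text in corpus.items() if term in tokenize(text)
    )


frequencies = document_frequency(CORPUS)
print(frequencies["gene"])
print(documents_mentioning(CORPUS, "drought"))
```

Even at this scale the pattern is visible: terms shared across articles from different fields ("gene", "drought") are the kind of connection that mining surfaces and that a reader skimming one article at a time would miss.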

PBDB Navigator screenshot released under a CC0 1.0 Public Domain Dedication

Legal Uncertainty

While the science and technology of TDM are complex enough, involving information retrieval (IR), optical character recognition (OCR), and natural language processing (NLP), the legal complications are, sadly, equally dizzying. The legal status of TDM is unclear at best, both because there are a multitude of techniques to engage in TDM, and because the implications of various techniques vary from jurisdiction to jurisdiction. This makes cross-national collaboration, integral to science, difficult at best. For example, TDM is generally considered to not implicate copyright in the U.S. There are several theories as to why TDM falls outside copyright, but the most obvious is that it uses copyrighted material for a transformative purpose and is therefore a fair use. Judge Baer, writing in Authors Guild, Inc., et al. v. HathiTrust, et al. (Case 1:11-cv-06351-HB), stated:

“The use to which the works in the HDL are put is transformative because the copies serve an entirely different purpose than the original works: the purpose is superior search capabilities rather than actual access to copyrighted material. The search capabilities of the HDL have already given rise to new methods of academic inquiry such as text mining.”

Judge Baer goes on to state:

“I cannot imagine a definition of fair use that would not encompass the transformative uses made by Defendants’ MDP and would require that I terminate this invaluable contribution to the progress of science and cultivation of the arts.”

The clarity, however, is far from universal as the situation outside the U.S. gets muddy. While there have been a few welcome developments in the U.K., the copyright laws of many other countries have little to no clarity on whether TDM falls outside of the reach of copyright and related laws. Where TDM does implicate copyright, the license status of the original material can make automated access and analysis very complicated, requiring additional checks to ensure any material is only being used as permitted by the license. And, even where the relevant licenses are free and open, and conducive to TDM, contractual agreements between research institutions and publishers, who are often the gatekeepers of the corpora, can create significant hurdles.
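The "additional checks" mentioned above can be sketched as a license filter that runs before any mining starts. This is a minimal sketch under stated assumptions: the record layout, the `license` field and the allow-list of identifiers are all hypothetical, and deciding which licenses actually permit mining in a given jurisdiction remains a legal question that the code does not answer.

```python
# Hypothetical allow-list of license identifiers treated as safe to mine.
# This set is an assumption for illustration, not a legal determination.
ALLOWED_LICENSES = frozenset({"CC0-1.0", "CC-BY-3.0", "CC-BY-4.0"})


def split_by_license(records, allowed=ALLOWED_LICENSES):
    """Separate records cleared for mining from those needing manual review.

    Each record is a dict with a hypothetical "license" key. Records with a
    missing or unrecognized license go to the review pile instead of being mined.
    """
    cleared, needs_review = [], []
    for record in records:
        if record.get("license") in allowed:
            cleared.append(record)
        else:
            needs_review.append(record)
    return cleared, needs_review


records = [
    {"id": "a1", "license": "CC-BY-4.0"},
    {"id": "a2", "license": "all-rights-reserved"},
    {"id": "a3"},  # no license metadata at all
    {"id": "a4", "license": "CC0-1.0"},
]
cleared, needs_review = split_by_license(records)
print([r["id"] for r in cleared])
print([r["id"] for r in needs_review])
```

Sending anything without recognizable license metadata to manual review is a deliberately conservative choice, and it also shows why explicit, machine-readable licensing makes automated analysis so much easier.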

Public Sentiment

In a comment on the proposed U.K. exception for information mining, both iCommons and the Open Knowledge Foundation (OKFN) supported the UK Government’s opinion that it is inappropriate for “Certain activities of public benefit such as medical research obtained through text mining to be in effect subject to veto by the owners of copyrights in the reports of such research, where access to the reports was obtained lawfully.” PLOS opined, “Enabling content mining is a core part of the value offering for Open Access publication services.” In its response to the EU copyright review, LIBER stated, “All exceptions related to education, learning and access to knowledge to be made mandatory. In particular, we would like to see a specific exception for text and data mining for all research purposes.” OKFN’s Working Group on Open Access stated:

“We assert that there is no legal, ethical or moral reason to refuse to allow legitimate accessors of research content (OA or otherwise) to use machines to analyse the published output of the research community. Researchers expect to access and process the full content of the research literature with their computer programs and should be able to use their machines as they use their eyes.”

Support for text and data mining under the banner of “The right to read is the right to mine” has been demonstrated by other organizations, including the declarations by Copyright for Creativity (July 2013) and the International Federation of Library Associations and Institutions (December 2013). If we as a society wish to realize the incredible potential of text and data mining, the practice should not be controlled through contractual terms or licensing.

Instead of relying on contractual restrictions or licensing to engage in text and data mining, non-consumptive uses of texts should be expressly eliminated from the reach of copyright and contract. The UK’s Hargreaves Report (PDF, p. 47) suggested the adoption of an exception to copyright law for non-consumptive uses, which are “uses of a work enabled by technology which does not trade on the underlying creative and expressive purpose of the work.”

Most recently, the UK copyright reform legislation introduced changes that make it easier to engage in TDM for non-commercial purposes, allow storing of the corpus locally as long as it remains protected from general public access, and perhaps most importantly, disallow contractual terms that would make it difficult to conduct TDM.

The above sentiments are laudable, and copyright reforms friendly to TDM are very important, and we support such efforts. However, we believe the more knowledgeable potential users of TDM are about the technology and related issues, the better they will be able to negotiate conditions that make their research easy and efficient. Hence, we want to push forward with education and awareness building as a bottom-up effort.

Building Bottom-Up Support

We are working with the ContentMine team to develop an agenda for a workshop that would provide training in TDM and educate the participants regarding the legal considerations through hands-on exercises. We will introduce the topic, the tools and techniques, tackle a specific problem, and then use that to expose researchers to the legal complications that they may encounter in conducting their research and the legal considerations they should keep in mind when choosing a license for their works. We have three objectives for this series of workshops:

Introduce participants to the basic tools and techniques of text and data mining (TDM);

Make participants aware of the legal intricacies of TDM and the implications of choosing the right licenses that enable TDM for downstream users;

Nurture a community of practice whose members may draw upon each other for continued help.

To be clear, we are not intending the workshop to be a detailed and comprehensive training in TDM, and it is certainly not a replacement for expertise in this deep and comprehensive technique. Instead, the workshop is designed to be both an introduction to basic technical and legal concepts as well as an opportunity to get to network with experts as well as novices with interest in the field. We hope participants intending to use TDM for their work will be better informed when seeking collaboration with TDM experts.

In cooperation with computing, legal and library experts, we will adapt the workshop agenda to make it more suitable and relatable to the host institutions. Our aim is to reach communities of researchers in countries that are otherwise under-represented in the global conversation on open science and data. We have identified researchers, and will continue to identify more, both on the technical as well as legal side with whom we intend to start building a network. If you are working with TDM, intend to work with TDM, and have expertise either in its technology or in related legal issues specific to your jurisdiction, please contact us.

We also intend to develop a community of practice for TDM, either standalone or via existing platforms such as StackExchange, and will utilize online resources such as forums, mailing lists, and a roster of technical, legal and institutional experts available to provide assistance with TDM.

I received a fat packet in the mail, full of seeds with unusual names (Magma Mustard; Flashy Lightning Lettuce; Lemon Pastel Calendula; Cherry Vanilla Quinoa) and an even more unusual but evocative note stuck on the packets.

This Open Source Seed pledge is intended to ensure your freedom to use the seed contained herein in any way you choose, and to make sure those freedoms are enjoyed by all subsequent users. By opening this packet, you pledge that you will not restrict others’ use of these seeds and their derivatives by patents, licenses, or any other means. You pledge that if you transfer these seeds or their derivatives they will also be accompanied by this pledge.

Welcome to the Open Source Seed Initiative, a group that includes scientists, citizens, plant breeders, farmers, seed companies, and gardeners, and has its origins in both the open source software movement and in the realization among plant breeders and social scientists that continued restrictions on seed may hinder our ability to improve our crops and provide access to genetic resources.

Jack Kloppenburg, Professor, Department of Community and Environmental Sociology, and one of the founders of OSSI, contacted me a couple of years ago, just around the time I joined CC full-time. He was hoping for a CC-type license for the seeds. CC’s focus, however, is restricted to copyright. And, at least for now, copyright is an area that keeps our hands full. However, OSSI’s goals are very much in line with CC’s mission: to free information, to make it flow from those who create it to those who want to use it, with the least impedance. And what better example of information than a seed, in which the very blueprint of life is embedded?

Jack’s email signature reads, “Well,” she said, “you have a high tolerance for lunatics, don’t you?” Knowing Jack, that sounds about right. You’ve got to be crazy to be able to change the world.

Yes Jack, let’s talk, heck, let’s not just talk, but let’s actually collaborate and spread the seeds of change.

Precocious One Year Old Turning Academic Publishing On Its Head

Wed, 12 Feb 2014

“If we can set a goal to sequence the Human Genome for $99, then why shouldn’t we demand the same goal for the publication of research?”

PeerJ started with that bold challenge. Now, the scrappy startup that dared has done it. One year old today, PeerJ, the peer-reviewed journal, has seen startling growth, having published 232 articles under CC BY 3.0 last year. By the way, per Scimago that number is more than what 90% of other journals publish in a year. Then in April 2013 PeerJ started publishing PeerJ PrePrints, the non-peer-reviewed preprint server, with 186 PrePrints in 2013, all under CC BY 3.0.

Not everything has been easy. Starting an entire publishing company from scratch has been a learning experience for the entire team. Going from no brand recognition, no history and no infrastructure to being established in all the places that a publishing company should be (archiving solutions, DOI issuing services, indexing services, membership of professional bodies, ISSN registrations and so on), PeerJ has done very well. Last year PeerJ won the ALPSP Award for Publishing Innovation.

PeerJ’s decision-making process is fast, very fast. Authors get their first decision back in a median of 24 days. Being small and non-traditional means they can take risks. They have built interesting functionality and models such as optional open peer review. Their business model is based on individuals purchasing low-cost lifetime publication plans, and that has resulted in a lot of their functionality being very individual-centric.

We firmly believe that Open Access publishing is the future of the academic journal publishing system. With the current trends we see in the marketplace (including governmental legislation; institutional mandates; the rapid growth of the major OA publishers; and the increasing education and desire from authors) we believe that Open Access content will easily make up >50% of newly published content in the next 4 or 5 years.

Once all academic content is OA and under an appropriate re-use license we believe that significant new opportunities will emerge for people to use this content; to build on it for new discoveries and products; and to accelerate the scientific discovery process.

Binfield continues:

We regard the CC-BY license as the gold standard for OA Publications. Some other publishers provide authors with “NC” options, or try to write their own OA licenses, but we have a firm belief in the CC BY flavor. If there are many different OA licenses in play then it becomes increasingly difficult for users to determine what rights they have for any given piece of work, and so it is cleaner and simpler if everyone agrees on a single (preferably liberal) license. We were pleased to see the license updated to 4.0 and were quick to adopt it.

In Jan 2014, PeerJ moved to CC BY 4.0 for all articles newly submitted from that point onwards (prior articles remain under CC BY 3.0 of course). Today, on PeerJ’s first birthday, we at CC send PeerJ our best wishes, and look forward to ever more courageous, even outrageous innovations from this precocious one year old.

Yesterday (January 15, 2014), the Group on Earth Observations approved Creative Commons as a Participating Organization (PO) at its GEO-X Plenary in Geneva.

GEO was launched in response to calls for action by the 2002 World Summit on Sustainable Development and by the G8 (Group of Eight) leading industrialized countries to exploit the growing potential of Earth observations to support decision making in an increasingly complex and environmentally stressed world. GEO is coordinating efforts to build a Global Earth Observation System of Systems (GEOSS).

GEOSS provides decision-support tools to a wide variety of users via a global and flexible network of content providers. GEOSS lets decision makers access a range of information by linking together existing and planned observing systems around the world and by supporting the development of new systems where gaps exist. GEOSS promotes common technical standards so that data from the thousands of different instruments can be combined into coherent data sets. The GEOPortal offers a single Internet access point for users seeking data, imagery, and analytical software packages relevant to all parts of the globe. For users with limited or no access to the internet, similar information is available via the GEONETCast network of telecommunication satellites.

GEO is a voluntary partnership of governments and international organizations providing a framework to develop new projects and coordinate their strategies and investments. As of 2013, GEO’s Members include 89 Governments and the European Commission. In addition, 67 intergovernmental, international, and regional organizations with a mandate in Earth observation or related issues have been recognized as Participating Organizations (PO).

Dr. Robert Chen, CC’s Science Advisory Board member, was at the Plenary, and he had the following comment, “The GEO Executive Director, Barbara Ryan, pointed out in plenary that there was an extensive discussion in the GEO Executive Committee about making sure that new POs are active contributors to GEO activities. She noted that all of the proposed POs in today’s slate met this criterion.”

Creative Commons has been contributing to the GEO Data Sharing Task Force’s Legal Interoperability Sub-Group and its draft white paper on “Legal Options for the Exchange of Data through the GEOSS Data-CORE (PDF).” (I was a part of the Sub-Group as a Science Fellow, and our Senior Counsel, Sarah Pearson, reviewed the paper). We intend to continue to be active contributors by guiding GEO and its members on the legal aspects of data sharing.