OBJECTIVES

Scientific communication has traditionally relied upon publications and presentations, with millions of publications estimated worldwide per year; PubMed alone now grows at a rate of one paper per minute. The results described in these articles are often backed by large amounts of diverse data produced by complex experiments, computer simulations, and observations of physical phenomena. Because of this avalanche of data, it is increasingly hard to validate, reproduce, reuse and leverage scientific data. In addition, although publications, methods and datasets are closely related, they are not easily accessible or interlinked. The notable exception is omics research, where journals require the deposit of sequences in databanks as a condition of publication. Even where data is discoverable and accessible, significant challenges remain in data reuse and sharing, and in facilitating the necessary correlation, integration and synthesis of data across levels of theory, techniques and disciplines.

At the 2nd International Workshop on Linked Science (LISC2012) we will discuss and present results of new ways of publishing, sharing and linking scientific data, and of reasoning over such data to discover interesting new links and to validate research. This year's workshop will focus on research addressing these issues with respect to big data. Big Data is loosely characterized by the size and/or number of individual files, the number of represented variables, a range of physical scales, a range of scientific disciplines, and heterogeneous metadata and data formats; in short, data that cannot easily be accessed and manipulated from a thumb drive.

Making entities identifiable and referenceable using URIs augmented by semantic, scientifically relevant annotations greatly facilitates data discovery and access. This Linked Science approach, i.e., publishing, sharing and interlinking scientific resources and data, is of particular importance for scientific research, where sharing is crucial for facilitating reproducibility and collaboration within and across disciplines. This integrated process, however, has not been established yet. Bibliographic contents are still regarded as the main scientific product, and associated data, models and software are either not published at all, or published in separate places, often with no reference to the respective paper.
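The idea of identifying resources by URI and connecting them with typed links can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not a method prescribed by the workshop: the example.org URIs and titles are hypothetical placeholders, the Dublin Core terms "title" and "references" are one possible vocabulary choice among many, and the helper names to_ntriples and linked_resources are invented here.

```python
# Minimal sketch of the Linked Science idea: every resource gets a URI,
# and typed links between URIs connect a paper to the data behind it.
# All example.org URIs and titles are hypothetical placeholders.

DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

PAPER = "http://example.org/paper/42"
DATASET = "http://example.org/dataset/7"

# Each triple is (subject URI, predicate URI, object). An object that is a
# plain literal is wrapped in a dict so it can be told apart from a URI.
triples = [
    (PAPER, DCTERMS + "title", {"literal": "A hypothetical paper"}),
    (DATASET, DCTERMS + "title", {"literal": "A hypothetical dataset"}),
    (PAPER, DCTERMS + "references", DATASET),
]


def to_ntriples(triples):
    """Serialize the triples in N-Triples syntax, one statement per line."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, dict):
            # Escape backslashes and double quotes inside the literal.
            escaped = o["literal"].replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def linked_resources(triples, subject, predicate):
    """Return every URI that `subject` points to via `predicate`."""
    return [o for s, p, o in triples
            if s == subject and p == predicate and isinstance(o, str)]


print(to_ntriples(triples))
print(linked_resources(triples, PAPER, DCTERMS + "references"))
```

Because the link from paper to dataset is itself a statement with a URI predicate, anyone holding the paper's URI can follow it to the underlying data, which is the kind of interlinking the Linked Science approach aims for.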

In the workshop we will discuss whether and how emerging technologies (Linked Data, and semantic technologies more generally) can realize the vision of Linked Science. In particular, this year we plan to focus on the theme of Tackling Big Data, soliciting contributions that discuss issues of analyzing, aggregating, and using the vast amount of data that scientists produce today. Both in the United States and in Europe, not only researchers but also governments are beginning to realize the urgent need to analyze and process this data, with funding agencies and research institutions starting new initiatives. Our workshop will help catalyze the use of semantic technologies and linked-data approaches in solving the big-data challenge.

At LISC2012 we will discuss and present results of new ways of publishing, sharing, linking, and analyzing such scientific resources, motivated by driving scientific requirements, as well as of reasoning over the data to discover interesting new links and scientific insights.

LISC2012 is a continuation of the 1st International Workshop on Linked Science 2011 (LISC2011), collocated with the 10th International Semantic Web Conference (ISWC2011) in Bonn. LISC2011 raised significant interest. It was the third largest workshop of ISWC2011 in terms of the number of participants (35 registered). The discussion was lively, and breakout sessions identified a research agenda for Linked Science. The participants asked for the continuation of the Linked Science workshop series, and LISC2012 is an answer to this call.

SUBMISSIONS

We invite two kinds of submissions:

Research papers. These should not exceed 12 pages in length.

Position papers. Novel ideas, experiments, and application visions from multiple disciplines and viewpoints are a key ingredient of the workshop. We therefore strongly encourage the submission of position papers. Position papers should not exceed 4 pages in length.

PROGRAM

The workshop will take place on Monday, November 12, 2012 as a pre-conference workshop at ISWC 2012 in Boston, MA. There will be 30 minutes (including time for questions) for each full paper, and 15 minutes for each short paper (again, including time for questions). Full papers are marked with a * in the program. The room for the workshop has not been announced yet.

Line C. Pouchard is an Information Scientist in the Scientific Data Group at Oak Ridge National Laboratory, US Department of Energy. With Tomi Kauppinen and Carsten Keßler, she co-chaired Linked Science 2011 (LISC2011) at ISWC in Bonn. Her recent work includes implementing semantic technologies to improve observation data discovery in the Earth and Atmospheric Sciences for the NASA-sponsored ORNL DAAC (Distributed Active Archive Center for Biogeochemical Dynamics). Her long-term research interests have focused on ontologies and the implementation of frameworks for scientific applications of interest to the Departments of Energy and Defense. These interests have been applied to the scientific domains of climate and earth sciences, fusion, medical modeling, and homeland security. She is an active participant in several leading ORNL efforts contributing to other agencies, including the NSF-sponsored DataONE (Integration and Semantics Working Group) and Remote Data Visualization and Analytics. DataONE (Data Observation Network for Earth) is developing infrastructure, strategies, and practices for decade-long sustainable data management, publication, archive, and curation services for the digital data supporting earth, environmental, and ecology research.

Carsten Keßler is a postdoctoral researcher at the Institute for Geoinformatics (ifgi), University of Muenster, Germany, where he finished his PhD on context-aware semantics-based information retrieval in 2009. In ifgi’s semantic interoperability lab (MUSIL), he currently coordinates the Linked Open Data University of Muenster (LODUM) project and is a member of the LinkedScience.org team. He has co-chaired a number of workshops, including the LISC2011 workshop, and is a guest editor of the Semantic Web Journal special issue on Linked Data for science and education. Besides his activities at the university, Carsten is currently consulting for the United Nations Office for the Coordination of Humanitarian Affairs (UN OCHA) on the development of the Humanitarian eXchange Language (HXL).

ORGANIZING COMMITTEE

Paul Groth is an assistant professor in the Knowledge Representation and Reasoning Group at VU University Amsterdam. He holds a Ph.D. in Computer Science from the University of Southampton (2007) and has done research at the University of Southern California. His research focuses on mechanisms for enabling multi-institutional systems. This includes research in data provenance, scientific workflows and knowledge sharing, with over 50 publications in these areas. Paul is co-chair of the W3C Provenance Working Group, which is developing a standard for provenance interchange. Currently, he is a key contributor to Open Phacts (www.openphacts.org), a project to develop a provenance-enabled platform for pharmacological information. You can find him on Twitter: @pgroth

Natasha Noy is a Senior Research Scientist at Stanford Medical Informatics. She is a principal member of the Protégé group, where she works on tools for ontology management, including versioning, mapping, and modularization of ontologies. She is currently involved in the design of the next-generation Protégé system that will support collaborative development of ontologies. Natasha is also affiliated with the National Center for Biomedical Ontology, where she works on community-based approaches to ontology evaluation, review, and alignment.

Eric G. Stephan works for the U.S. Department of Energy Pacific Northwest National Laboratory in Richland, Washington, and has been actively engaged in advancing scientific database, geospatial, metadata, and provenance capabilities to support experimental and computational scientists and production systems. His research interests include data-intensive computing, the Semantic Web, and constructing analytical pipelines to harmonize and explore heterogeneous data and knowledge resources.

Jun Zhao is an EPSRC Postdoctoral Fellow at the Life Science Interface at the University of Oxford. Her current research interests are provenance, trust of data, Semantic Web applications for integrating biological data resources, and provenance-based information quality assessment. She has been the provenance lead in the UK data.gov.uk project and the EU Wf4Ever project. She has been a lead organizer of, and an invited speaker at, many national and international workshops.