Best Practices Design Goals

Must be relevant for government (local, state, federal, international)

Content must be self-maintaining over time

Data published in a W3C RDF serialization (or submitted W3C Standard)

Purpose of Best Practices Recommendation(s)

The following motivations for publishing Recommendation(s) and Working Notes are identified in the GLD WG Charter.

The overarching objective is to provide best practices and guidance for creating high-quality, reusable Linked Open Data (LOD).

More specifically, best practices are aimed at assisting government departments/agencies/bureaus, and their contractors, vendors and researchers, to publish high quality, consistent data sets using W3C Standards to increase interoperability.

Best practices are intended to be a methodical approach for the creation, publication and dissemination of governmental Linked Data. Best practices from the GLD WG shall include:

Description of the full life cycle of a Government Linked Data project, starting with identification of suitable data sets, procurement, modeling, vocabulary selection, through publication and ongoing maintenance.

Definition of known, proven steps to create and maintain government data sets using Linked Data principles.

Guidance in explaining the value proposition for LOD to stakeholders, managers and executives.

Assist the Working Group in later stages of the Standards Process, in order to solicit feedback, use cases, etc.

Content

Overview

Linked Data approaches address key requirements of open government by providing a family of international standards for the publication, dissemination and reuse of structured data. Further, Linked Data, unlike previous data formatting and publication approaches, provides a simple mechanism for combining data from multiple sources across the Web.

In an era of reduced local, state and federal budgets, there is strong economic motivation to reduce waste and duplication in data management and integration. Linked Open Data is a viable approach to publishing governmental data to the public, but only if it adheres to some basic principles.

Best Practices for Procurement

Procurement. Specific products and services involved in governments publishing linked data will be defined, suitable for use during government procurement. Just as the Web Content Accessibility Guidelines allow governments to easily specify what they mean when they contract for an accessible Website, these definitions will simplify contracting for data sites and applications.

Best Practices for Vocabulary Selection

The group will provide advice on how governments should select RDF vocabulary terms (URIs), including advice as to when they should mint their own. This advice will take into account issues of stability, security, and long-term maintenance commitment, as well as other factors that may arise during the group's work.

@@TODO: distinguish between vocab discovery and vocab creation and management.

The Resourcing IDentifier Interoperability for Repositories (RIDIR) project (2007-2008) considered in depth the relationship between identifiers and finding versions of objects. See RIDIR Final Report. In their words, RIDIR set out to investigate how the appropriate use of identifiers for digital objects might aid interoperability between repositories and to build a self-contained software demonstrator that would illustrate the findings. A number of related projects are listed at JISC's RIDIR information page.

In addition, at TWC we have adopted an ad hoc approach to denoting versions of published linked data:

The version indicator (e.g. "1st-anniversary") is arbitrary; a date code may be used. We sometimes use a non-ISO 8601 date (e.g. "12-Jan-2012") to make it clear that this is (in our case) not necessarily machine-produced.
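
The two date-code styles mentioned above can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of any TWC tooling; the sketch only shows how an ISO 8601 date code and a "12-Jan-2012"-style date code might be derived from the same release date.

```python
from datetime import date

# English month abbreviations, spelled out explicitly so the result
# does not depend on the system locale.
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def iso_version(d: date) -> str:
    """Return an ISO 8601 date code, e.g. '2012-01-12'."""
    return d.isoformat()


def human_version(d: date) -> str:
    """Return a non-ISO 8601 date code in the style of '12-Jan-2012'."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"


release = date(2012, 1, 12)  # illustrative release date
print(iso_version(release))    # 2012-01-12
print(human_version(release))  # 12-Jan-2012
```

Either string could serve as the version indicator; the ISO form sorts chronologically as plain text, while the second form signals a hand-assigned label.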

Stability

The group will specify how to publish data so that others can rely on it being available in perpetuity, persistently archived if necessary.

Source Data

The group will produce specific advice concerning how to expose legacy data, data which is being maintained in pre-existing (non-linked-data) systems.
Subject: Roadmap for cities to adopt open data

Biplav: use-case

Suppose a city is considering opening up its data. It has certain concerns:

Business and legal level

What are the privacy considerations in publishing data? On the one hand, a city would like to respect the privacy of citizens and businesses; on the other, it would like the data to be valuable enough to lead to positive change.

How to pay for the cost of opening up data? Cities may or may not have legal obligations to open up data. Accordingly, they will look for guidance on how to account for the costs. Further, can they levy a license fee if they are not obligated to open data?

Which data should be opened and when? Should it be by phases? What data should not be shared?

What policies / laws are needed from the city so that businesses can collaborate on open data, while preserving their IP?

Technical level

What should the architecture be for sharing large-scale public data? How do we ensure performance and security?

What visualization should be supported for different types of data?

Are we following a standard implementation for the reference architecture?

It is recommended that the publishing organization prepare a roadmap that addresses these concerns for all stakeholders.

Ghislain: Source Data publication check-list

Before publishing your legacy data, check the following items:

Make sure the data is in a standard, non-proprietary format, e.g. csv, shp, kml, xml, DBMS.

Provide access to the data (API, web page) so that users can refer to it consistently.

It is recommended to publish under the domain of your organization, which helps establish trust.

Provide a short description of the data, such as its scope and content.

For tables, the names of the columns should be clear and self-descriptive if possible.

In a spreadsheet, avoid having many sheets in the same workbook.

State the type of license under which the data can be accessed.

Also check how frequently the data is published and provide that information.

Provide a contact address/email of the role responsible for the data for any further support.

Be privacy-aware when publishing data. Although any published data can potentially be misused later by an unknown party, that threat is balanced by the benefits, and privacy groups (including W3C Privacy, http://www.w3.org/Privacy/) provide recommendations. Follow them. Start by not revealing personally identifiable information without masking it. Examples of such data are individual names, national identification numbers, phone numbers, credit card numbers and driver's license numbers.
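
As a rough illustration of the masking step in the last checklist item, the following Python sketch replaces personally identifiable fields in a record before publication. The field names in `PII_FIELDS`, the `[masked]` placeholder and the sample record are assumptions made for this example; real source data will use its own column names, and masking individual fields does not by itself rule out re-identification through combinations of the remaining fields.

```python
# Hypothetical column names for the kinds of data listed above;
# adjust these to match the actual source data.
PII_FIELDS = {"name", "national_id", "phone", "credit_card", "driver_license"}

MASK = "[masked]"  # placeholder chosen for this sketch


def mask_record(record: dict) -> dict:
    """Return a copy of the record with PII fields replaced by a placeholder."""
    return {
        key: MASK if key in PII_FIELDS else value
        for key, value in record.items()
    }


# Illustrative record, not real data.
row = {"name": "Jane Doe", "phone": "555-0100", "ward": "7", "permit_type": "building"}
print(mask_record(row))
```

The function returns a new dictionary and leaves the source record untouched, so the unmasked data can remain in the legacy system while only the masked copy is published.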

Linked Data Cookbook

The group will produce a collection of advice on smaller, more specific issues, where known solutions exist to problems collected for the Community Directory. This document is to be published as a Working Group Note, or website, rather than a Recommendation. It may, instead, become part of the Community Directory site. See The Cookbook for Open Government Linked Data.

Pragmatic Provenance

Provide best practice recommendations for stakeholders on documenting the provenance of their linked government data and how to interpret that data so that consumers know what they are looking at. (suggested by Hadley Beeman)

NB: This is the BP Wiki page from which subsection pages have been created to facilitate working notes and status. In due course, the wiki subsection pages will be folded into the W3C ReSpec document system by the draft editors.