Revision as of 07:59, 14 January 2013


Overview

The German Open Access Statistics (OA-S) project intends to provide a framework and infrastructure that enables OJS users to establish alternative usage statistics for Open Access content and to build value-added services on top of such statistics.

This specification describes the use cases, requirements and implementation recommendations for the integration of OA-S into OJS.

The basic idea of OA-S can be described as a co-operation of two institutions:

Access data will then be made available through a protected OAI interface and harvested by the OA-S *service provider*.

The service provider will clean the raw data and produce aggregate metrics based on the COUNTER standard. At a later stage LocEc and IFABC statistics may also be supported.
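To illustrate the kind of aggregation meant here, the following Python sketch counts cleaned usage events per document and calendar month, the granularity at which COUNTER usage reports are organized. The input format (pairs of document ID and Unix timestamp) and the name `monthly_counts` are assumptions for illustration; this specification does not describe the service provider's actual implementation.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Tuple


def monthly_counts(events: Iterable[Tuple[str, float]]) -> Counter:
    """Count cleaned usage events per (document_id, "YYYY-MM") pair.

    Assumes robots, double clicks and incomplete downloads have already
    been removed from `events`.
    """
    counts: Counter = Counter()
    for doc_id, ts in events:
        # Bucket each event by calendar month in UTC.
        month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        counts[(doc_id, month)] += 1
    return counts
```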

The data provider retrieves metrics from the service provider on a daily basis and stores them in OJS.

These metrics can then be used in different ways in OJS:

They can be displayed to the editors, authors and readers.

They can be used in search for ranking.

Editors could produce statistics reports.

We could provide a "most viewed article" feature.

We could implement a feature that displays "other (more viewed) articles of the same author".

We could display "similar (more viewed) articles" on an article's page.

We'll use the terms "data provider" and "service provider" from here on without further explanation. "Data provider" and "OJS user" can be used synonymously. We use the term "end user" to refer to users accessing an OJS site.

Data Extraction and Storage

In the following we explain which data will be extracted and why:

IP address and timestamp are used to recognize double downloads as defined by the COUNTER standard. Such "double clicks" will be counted as a single usage event.
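A minimal Python sketch of double-click filtering, assuming the COUNTER Release 4 convention: repeated requests for the same item from the same address within 10 seconds (HTML) or 30 seconds (PDF) are collapsed, with the earlier request dropped and the later one retained. The `AccessEvent` record, its `fmt` labels and the fallback window are hypothetical; OA-S may apply the rule differently.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Assumed double-click windows in seconds (COUNTER Release 4: 10 s HTML, 30 s PDF).
DOUBLE_CLICK_WINDOW = {"html": 10, "pdf": 30}


@dataclass(frozen=True)
class AccessEvent:
    ip: str            # client IP address (or its pseudonym)
    document_id: str   # identifier of the accessed document
    fmt: str           # hypothetical format label: "html" or "pdf"
    timestamp: float   # seconds since the epoch


def remove_double_clicks(events: Iterable[AccessEvent]) -> List[AccessEvent]:
    """Collapse rapid repeat accesses into a single usage event.

    If a request follows the previous retained request for the same key
    within the window, the earlier one is replaced by the later one, so a
    chain of rapid clicks yields exactly one event.
    """
    result: List[AccessEvent] = []
    last_index: Dict[Tuple[str, str, str], int] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.ip, ev.document_id, ev.fmt)
        # Unknown formats fall back to the longer window (an assumption).
        window = DOUBLE_CLICK_WINDOW.get(ev.fmt, 30)
        idx = last_index.get(key)
        if idx is not None and ev.timestamp - result[idx].timestamp < window:
            result[idx] = ev  # double click: keep only the later request
        else:
            last_index[key] = len(result)
            result.append(ev)
    return result
```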

The C class of the IP address will furthermore be used to recognize robots and exclude their usage from the statistics.
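In classful addressing terms, the "class C" part of an IPv4 address is its /24 network, i.e. the first three octets. The Python sketch below derives it with the standard `ipaddress` module and flags networks with unusually many requests. The counting heuristic and its `threshold` are hypothetical illustrations, not the robot-detection rule OA-S actually applies.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Set


def class_c_network(ip: str) -> str:
    """Return the /24 ("class C") network of an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    return str(ipaddress.IPv4Network(f"{addr}/24", strict=False))


def suspicious_networks(ips: Iterable[str], threshold: int = 1000) -> Set[str]:
    """Flag /24 networks whose request count exceeds a hypothetical threshold."""
    counts = Counter(class_c_network(ip) for ip in ips)
    return {net for net, n in counts.items() if n > threshold}
```

For example, `class_c_network("192.0.2.77")` yields `"192.0.2.0/24"`.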

The file information (path, name, document ID, URL parameters, etc.) is used to uniquely identify the document that has been accessed.
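As an illustration of mapping a request to a document, the Python sketch below extracts a (journal, article ID, galley ID) key from a URL. It assumes OJS path-info URLs of the form `/index.php/<journal>/article/<op>/<article_id>[/<galley_id>]` with `<op>` being `view` or `download`; other URL layouts and query parameters are ignored, and the function name `document_key` is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def document_key(url: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Derive (journal, article_id, galley_id) from an assumed OJS-style URL.

    Returns None for URLs that do not match the assumed pattern.
    """
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if "index.php" not in parts:
        return None
    rest = parts[parts.index("index.php") + 1:]
    # Expect: <journal>/article/<op>/<article_id>[/<galley_id>]
    if len(rest) < 4 or rest[1] != "article" or rest[2] not in ("view", "download"):
        return None
    galley_id = rest[4] if len(rest) > 4 else None
    return (rest[0], rest[3], galley_id)
```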

The HTTP status code will be used because only successful accesses may be counted.

The size of the document is used to identify full downloads (e.g. 95% of the file downloaded). Partial or aborted downloads will not be counted as usage events.
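The two checks above (successful status code, near-complete transfer) can be combined into a single filter. This Python sketch treats any 2xx status as successful and uses the 95% figure from the text as its default; both the function name and the exact status range are assumptions.

```python
def is_countable(status: int, bytes_sent: int, file_size: int, min_percent: int = 95) -> bool:
    """Return True if a log entry qualifies as a full, successful download."""
    # Only successful (2xx) responses may be counted.
    if not 200 <= status < 300:
        return False
    # Guard against missing or bogus file sizes.
    if file_size <= 0:
        return False
    # Integer arithmetic avoids floating-point edge cases at the threshold.
    return bytes_sent * 100 >= file_size * min_percent
```

Using integers rather than `bytes_sent / file_size >= 0.95` keeps the result exact when the transfer lands right on the threshold.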

The HTTP user agent will be used to identify robots and to remove their usage from the statistics.
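A minimal Python sketch of user-agent-based robot detection. The pattern list is a tiny illustrative assumption, not the list OA-S uses; a real deployment would rely on a maintained robot list. Treating a missing user agent as a robot is likewise an assumption.

```python
import re
from typing import Optional

# Illustrative patterns only; a production system would use a maintained robot list.
ROBOT_PATTERNS = [r"bot", r"crawler", r"spider", r"slurp", r"wget", r"curl"]
ROBOT_RE = re.compile("|".join(ROBOT_PATTERNS), re.IGNORECASE)


def is_robot(user_agent: Optional[str]) -> bool:
    """Return True if the user agent looks like an automated client."""
    # A missing user agent is treated as a robot here (an assumption).
    if not user_agent:
        return True
    return ROBOT_RE.search(user_agent) is not None
```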

The referrer information is used to analyze how users found the service and can be used to improve the service (potential sources: search engines, organizational web portal).
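To show how referrer information could be grouped into the sources mentioned above, here is a Python sketch that classifies a referrer URL as coming from a search engine, the organizational portal, no referrer, or elsewhere. The search-engine host fragments, the `portal_host` parameter and the treatment of `"-"` (the usual placeholder for a missing referrer in web server logs) are assumptions.

```python
from urllib.parse import urlsplit
from typing import Optional

# Hypothetical host fragments; adapt to the search engines of interest.
SEARCH_ENGINES = ("google.", "bing.com", "duckduckgo.com")


def referrer_source(referrer: Optional[str], portal_host: str) -> str:
    """Classify a referrer as 'direct', 'portal', 'search_engine' or 'other'."""
    if not referrer or referrer == "-":
        return "direct"
    host = (urlsplit(referrer).hostname or "").lower()
    # The portal host itself or any of its subdomains.
    if host == portal_host or host.endswith("." + portal_host):
        return "portal"
    if any(fragment in host for fragment in SEARCH_ENGINES):
        return "search_engine"
    return "other"
```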

OA-S provides sample code for DSpace log extraction. The same code is also provided via SVN.

Salt Management Interface

Data Transformation

Privacy Protection

We assume that many OJS providers using the OA-S extension will be subject to German privacy law. While OJS users will have to evaluate their legal situation on a case-by-case basis, and we cannot guarantee in any way that OJS meets all legal requirements in individual cases, we provide a basic technical infrastructure that may make it easier for OJS users to comply with German privacy law.

The OA-S project commissioned two legal case studies with respect to German privacy law: one describes the legal context of the OA-S application at the University of Stuttgart, the other focuses more generally on OA-S users, especially project members. The first report was written during an earlier phase of the OA-S project, when privacy-enhancing measures such as the use of a salt for IP hashing had not yet been specified. The second report is more recent and assumes an enhanced privacy infrastructure, i.e. the use of a salt to pseudonymize IP addresses. We therefore base our implementation recommendations on the results of the second report.

The report recommends that data providers subject to German privacy law implement the following infrastructure:

All personal data must be pseudonymized immediately (within a few minutes) after being stored. This can be achieved by hashing IP addresses with a secret salt. The salt must be at least 128 bits long and cryptographically secure. It must be renewed about once a month and must not be known to the OA-S service provider. The salt will be distributed to all data providers through a central agent. A single salt can be used for all data providers as long as they do not share pseudonymized data. Pseudonymized data must be transferred to the service provider immediately and then deleted by the data provider, i.e. every five minutes.
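A minimal Python sketch of the salting scheme, assuming HMAC-SHA256 as the keyed hash (a swapped-in construction; the report only requires "hashing with a secret salt"). The salt generator would belong to the central agent, not to individual data providers, and the 30-day rotation period is an approximation of "about once a month". All function names are hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SALT_BYTES = 16                # 128 bits, the minimum length required by the report
SALT_MAX_AGE = 30 * 24 * 3600  # "about once a month", approximated as 30 days


def new_salt() -> bytes:
    """Generate a salt from the operating system's CSPRNG (central agent side)."""
    return secrets.token_bytes(SALT_BYTES)


def salt_expired(created_at: float, now: Optional[float] = None) -> bool:
    """Return True once the salt is older than the rotation period."""
    if now is None:
        now = time.time()
    return now - created_at >= SALT_MAX_AGE


def pseudonymize_ip(ip: str, salt: bytes) -> str:
    """Map an IP address to a pseudonym using HMAC-SHA256 keyed with the salt."""
    return hmac.new(salt, ip.encode("ascii"), hashlib.sha256).hexdigest()
```

The secrecy of the salt is what makes this pseudonymization meaningful: the IPv4 address space is small enough that an unsalted (or publicly salted) hash could be reversed by simply hashing every possible address.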

Data providers have to provide the means for end users to refuse data collection ("opt-out"). The cited report concludes that an active "opt-in" by end users is not necessary if data are reliably pseudonymized. It recommends an "opt-out" button which, when clicked, would set a temporary cookie in the end user's browser. Whenever such a cookie is present, usage data must not be collected. The report advises against setting a permanent cookie, as this may now or in the future require an active "opt-in" by the end user. Alternatively, the user's IP address could be blacklisted while the user is using the service, i.e. entered into a table so that no data from that IP address are stored. The blacklist entry would have to be deleted once the user session expires.
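Both opt-out mechanisms can be expressed as a single "may we collect?" check. In this Python sketch the cookie name `oas_optout`, the `OptOutBlacklist` class and the fixed `session_lifetime` (standing in for the real OJS session expiry) are all hypothetical; the cookie header is parsed with the standard library's `http.cookies.SimpleCookie`.

```python
import time
from http.cookies import SimpleCookie
from typing import Dict, Optional

OPT_OUT_COOKIE = "oas_optout"  # hypothetical cookie name


def has_opt_out_cookie(cookie_header: Optional[str]) -> bool:
    """Return True if the request's Cookie header contains the opt-out cookie."""
    if not cookie_header:
        return False
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return OPT_OUT_COOKIE in cookie


class OptOutBlacklist:
    """IP blacklist whose entries expire together with the user session."""

    def __init__(self, session_lifetime: float) -> None:
        self.session_lifetime = session_lifetime
        self._expires: Dict[str, float] = {}

    def add(self, ip: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._expires[ip] = now + self.session_lifetime

    def contains(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expires[ip]  # the entry must be deleted once the session expires
            return False
        return True


def may_collect(cookie_header: Optional[str], ip: str,
                blacklist: OptOutBlacklist, now: Optional[float] = None) -> bool:
    """Usage data may be collected only if neither opt-out mechanism applies."""
    return not has_opt_out_cookie(cookie_header) and not blacklist.contains(ip, now)
```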

Data providers have to inform end users about their right to opt out of data collection before they start using the service. They also have to inform the end user that opting out of the service will result in a temporary cookie being set in their browsers. This information must be available not only once when the user starts using the service but permanently, e.g. through a link.

Data providers will have to implement further organizational measures (registration of data processing organizations, reporting data usage to end users on demand).

Open questions:

How do we know that data have been successfully deleted?

What's the max time we can wait for the service provider to accept data before we have to delete our data without transferring them?

What happens when the opt-out cookie expires? Should we save the opt-out in the user profile and renew the cookie regularly? Does the user have to be logged in for the cookie to be renewed?

What text should we use for the privacy protection message?

OAI interface

NB: The specification and implementation of the OA-S OAI interface are not part of project phase OA-S I. We collect unstructured material here for use in later project phases.