Overview

As seen in the diagram, we have, for simplicity’s sake, divided the platform into three layers:

The Interface layer is subdivided into two distinct parts: Applications and Machine Interfaces. Applications are used by end users to access content, while machine interfaces are used to insert, update and export the platform’s content.

At the heart of the Synergies platform lies the Infrastructure Services layer. It contains the software underlying the Interface layer and is predominantly composed of well-known Open Source software and platforms (Apache, Tomcat, Solr, etc.), mostly written in Java and PHP.

The Storage & Archival layer is also mostly composed of Open Source packages, such as MySQL and Fedora-Commons. These packages manage the short- and long-term identification, preservation and retrieval aspects of the platform.

Supported Publication platforms

Synergies currently supports Open Journal Systems (pkp.sfu.ca/ojs/) and Erudit (www.erudit.org); support for Open Conference Systems (pkp.sfu.ca/ocs/) will be available soon. The Synergies project will provide ongoing support for, and integration with, these platforms. Other platforms can also be integrated with Synergies. The only requirement is that the platform can provide XML versions of its content that conform to a well-known XML schema appropriate to the type of document being published, such as NLM for journal articles.

The Synergies Portal

Access to document metadata and full text (via hyperlink to the website of the producer).

Basic and advanced search functionality across collections.

Export of metadata. Users can save their search results and export them in various well-known formats, such as HTML, EndNote, EndNote Web and RefWorks.
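As an illustration of metadata export, the sketch below serializes saved search results as RIS, a tagged text format that both EndNote and RefWorks can import. Whether Synergies produces its EndNote or RefWorks exports this way is not stated here; the `Article` record, its fields and the helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical metadata record; field names are illustrative only.
    title: str
    journal: str
    year: int
    authors: list = field(default_factory=list)
    volume: str = ""
    issue: str = ""
    url: str = ""


def to_ris(article: Article) -> str:
    """Serialize one article as an RIS record, from TY (type) to ER (end)."""
    lines = ["TY  - JOUR"]  # JOUR = journal article
    lines += [f"AU  - {author}" for author in article.authors]
    lines.append(f"TI  - {article.title}")
    lines.append(f"JO  - {article.journal}")
    lines.append(f"PY  - {article.year}")
    # Optional fields are written only when present.
    for tag, value in (("VL", article.volume), ("IS", article.issue), ("UR", article.url)):
        if value:
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")
    return "\n".join(lines)


def export_results(results) -> str:
    """Write saved search results as one RIS file, one record per article."""
    return "\n\n".join(to_ris(a) for a in results) + "\n"


rec = Article("An example title", "Example Journal", 2009, authors=["Doe, Jane"], volume="12")
print(to_ris(rec))
```

Each record is self-delimiting (TY opens it, ER closes it), so a whole result list can be exported as a single file.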

The following features are under development and will be available in the near future:

Access to collection- and item-level statistics. Journal managers will be able to log in to the Synergies portal and access common usage statistics (for example, the number of hits for a specific journal and/or article abstract, sorted geographically). Specific metrics are currently being determined.

Cross-referencing of documents. Using permanent URLs and OpenURL, users will be able to follow bibliographical references (forward referencing) and discover which documents are citing the currently viewed document (backward referencing).
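OpenURL links of this kind are typically built as key/value (KEV) query strings following the OpenURL 1.0 standard (ANSI/NISO Z39.88-2004). The minimal sketch below builds such a link for a cited journal article; the resolver base URL and the function name are hypothetical, since Synergies' actual resolver is not specified here.

```python
from urllib.parse import urlencode

# Hypothetical link-resolver base URL, for illustration only.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def openurl_for_article(atitle, jtitle, date, volume=None, issue=None, spage=None, issn=None):
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link describing a cited journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # journal metadata format
        "rft.genre": "article",
        "rft.atitle": atitle,  # article title
        "rft.jtitle": jtitle,  # journal title
        "rft.date": date,
    }
    optional = {"rft.volume": volume, "rft.issue": issue, "rft.spage": spage, "rft.issn": issn}
    # Only include the optional keys that are known for this citation.
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{RESOLVER_BASE}?{urlencode(params)}"


print(openurl_for_article("A study", "Example Journal", "2009", volume="4"))
```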

Machine interfaces

Synergies will make its metadata available via various web services, and it will also be possible to harvest the platform via OAI-PMH. More complex services, such as bibliographic reference resolution and text mining, are also planned.

Infrastructure services

Document harvesting

Where to start

Journal managers should provide the required information either directly to the head node by e-mail or, preferably, to their regional node, as described in the section “Joining Synergies” above. A simple registration form will be available in the future to ease the registration process.

In general

The Synergies platform’s pool of documents is updated by a daily harvesting process. Documents are fetched from content providers using the OAI-PMH protocol (http://www.openarchives.org/). Content providers can include journals, institutions and, eventually, monograph and conference publishing platforms. Documents must conform to a specific schema according to their type: for example, NLM XML for journal articles and ETDMS for electronic theses.

In order to be harvested, content providers must provide Synergies with the following:

The URL of the OAI server;

The value of the SetSpec parameter denoting the set of documents to harvest, as a single provider can host multiple sets;

The value of the MetadataPrefix parameter, which identifies the metadata format (and thus the type of document) to harvest.
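These three values map directly onto an OAI-PMH request. The minimal sketch below builds `ListRecords` URLs, including the optional `from` argument that a daily, incremental harvest would use, and extracts record identifiers and the `resumptionToken` from one response page. The server URL, set and metadata prefix in the example are hypothetical, not actual Synergies or provider values.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, set_spec=None, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
        if from_date:
            # Incremental harvest: only records added or changed since this date.
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [h.text for h in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


print(list_records_url("https://journal.example.org/oai", "nlm", set_spec="ejs", from_date="2009-06-01"))
```

A harvester would keep requesting pages with the returned token until it comes back as `None`.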

For scientific journals

Scientific journals must provide Synergies with some additional information before harvesting takes place:

The journal name or title.

If applicable, an alternate title. A journal with both an English and a French name may want to provide an alternate title so that both names appear within the platform and both are referenced.

Journals published using OJS

Theses

Theses must strictly conform to the ETDMS schema, and must be made available via an OAI provider. OAI server information, including URL, SetSpec and MetadataPrefix, must be provided to the Synergies team.

DSpace is a widely used repository for theses; ETDMS harvesting can be set up by installing an ETDMS crosswalk.

Please contact us using the information provided in the “Joining Synergies” section to send us your harvesting information.

Authentication

At present, no advanced authentication procedure is available for negotiating the harvesting process. The content provider’s OAI server must therefore validate the caller’s IP address to make sure requests are legitimate; the Synergies team will provide content providers with a caller IP address for this purpose. HTTP Basic authentication over HTTPS and X.509 certificates are envisaged but not yet implemented.
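On the provider side, IP-based validation amounts to an allowlist check before answering an OAI request. The sketch below assumes a single hypothetical harvester address taken from the 192.0.2.0/24 documentation range; the real address is the one supplied by the Synergies team.

```python
import ipaddress

# Hypothetical harvester address (192.0.2.0/24 is reserved for documentation, RFC 5737).
# Replace with the caller IP address provided by the Synergies team.
ALLOWED_HARVESTERS = [ipaddress.ip_network("192.0.2.10/32")]


def is_allowed_harvester(remote_addr: str) -> bool:
    """Return True if the caller's IP address is on the allowlist."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed address: reject rather than guess.
        return False
    # An IPv6 address is never "in" an IPv4 network, so it is simply rejected here.
    return any(addr in net for net in ALLOWED_HARVESTERS)
```

If the OAI server sits behind a reverse proxy, the address seen by the application may be the proxy's, so the check belongs wherever the true client address is known.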

Permanent URL

When published using OJS or Erudit, items appearing in the Synergies platform can be granted a permanent ID, which in turn yields a permanent URL. When submitted to a resolver, that URL redirects the user to a target URL where the document can actually be found. Content can therefore change location without the permanent URL changing, which enables permanent referencing of documents.
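The resolver behaviour described above can be sketched as a lookup table from permanent ID to current target URL. This is a hypothetical in-memory illustration, not the Synergies resolver; the ID scheme and URLs are invented.

```python
from typing import Optional, Tuple

# Hypothetical in-memory resolver table: permanent ID -> current target URL.
resolver_table = {}


def register(perm_id: str, target_url: str) -> None:
    """Create or update the target of a permanent ID, e.g. after content moves."""
    resolver_table[perm_id] = target_url


def resolve(perm_id: str) -> Tuple[int, Optional[str]]:
    """Return an HTTP status code and the Location to redirect to, if any."""
    target = resolver_table.get(perm_id)
    if target is None:
        return 404, None
    # Temporary redirect: the permanent URL stays fixed while its target may change.
    return 302, target


# Usage with invented identifiers: the document moves, its permanent ID does not.
register("syn:12345", "https://journal.example.org/article/view/42")
register("syn:12345", "https://new-host.example.org/articles/42")
print(resolve("syn:12345"))
```

A temporary redirect (302) is used rather than a permanent one (301) because clients may cache a 301 and keep going to the old location after the content has moved.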

Indexing

At the end of a harvesting cycle, document full text (when available) and metadata are indexed, and the resulting index is then used by the search engine. Full text and metadata are also stored in appropriate repositories for further use (export, text mining, etc.).
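To show what indexing produces, the sketch below builds a toy inverted index (term to document IDs) over metadata and optional full text and answers AND queries against it. The platform relies on Solr for this; the code is a simplified, hypothetical illustration of the data structure, not Solr's implementation, and the field names are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Naive tokenizer: lower-cased runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each term to the IDs of documents whose metadata or full text contains it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        # Full text is optional, since it is not always available.
        text = " ".join([doc.get("title", ""), doc.get("abstract", ""), doc.get("fulltext", "")])
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical harvested records: d1 has metadata only, d2 also has full text.
docs = {
    "d1": {"title": "Open access journals", "abstract": "A survey"},
    "d2": {"title": "Journal metrics", "fulltext": "open citation data"},
}
index = build_index(docs)
print(sorted(search(index, "open journals")))  # ['d1']
```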

High availability

Documents are always available on the content provider’s publishing platform, that is to say the website or the CMS used by the publisher. Under certain conditions, Synergies offers a deposit service: Synergies will copy the publisher's web pages into a specialized LOCKSS (http://www.lockss.org) archive network. A document added to this archive will be copied to seven geographically distinct archives distributed across Canada. This network will provide access to documents if the content provider's website becomes unavailable, while making the permanent loss of a document virtually impossible.
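LOCKSS peers protect content by polling one another and comparing hashes of their copies, so that a damaged copy can be detected and repaired from the others. The following is a heavily simplified, hypothetical illustration of that majority-vote idea across seven copies, not the actual LOCKSS polling protocol; node names and content are invented.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas):
    """Hash every copy and return (majority digest, names of disagreeing copies).

    Simplified majority vote for illustration; NOT the LOCKSS polling protocol.
    """
    if not replicas:
        raise ValueError("no replicas to audit")
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in replicas.items()}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no clear majority among replicas")
    damaged = sorted(name for name, digest in digests.items() if digest != majority)
    return majority, damaged


def repair(replicas, damaged):
    """Overwrite damaged copies with content from a copy that agrees with the majority."""
    good = next(name for name in replicas if name not in damaged)
    for name in damaged:
        replicas[name] = replicas[good]


# Seven copies (hypothetical node names), one of which has been corrupted.
copies = {f"node{i}": b"<html>article</html>" for i in range(1, 8)}
copies["node4"] = b"<html>artcle</html>"
_, bad = audit_replicas(copies)
print(bad)  # ['node4']
repair(copies, bad)
```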

Long-term preservation

Under certain conditions, documents may be stored in a digital archive designed for long-term preservation. Long-term preservation must take into account the eventual obsolescence of file formats and applications; accordingly, one goal of the Synergies digital archive is to migrate documents to new formats as technology evolves.

Statistics

The Synergies platform will aggregate document usage statistics from all delivery and discovery points (e.g., content providers, the platform itself and, if applicable, third-party aggregators), collate them, and return them to the originating content provider on demand. The SUSHI protocol (http://www.niso.org/workrooms/sushi) is used to harvest statistics from these delivery and discovery points. Harvested reports must comply with a predefined XML schema, which has yet to be determined; both OJS and Erudit will eventually support this schema out of the box. Initial statistical offerings will include abstract and full-text view counts by month and year; future offerings may include IP/geographic location of visitors, citation counts for authors, and more.
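The collation step can be sketched as summing view counts from every source, per provider, item and month, then rolling them up by year when a provider requests them. Because the report schema has yet to be determined, the `UsageReport` fields below are purely hypothetical, and SUSHI transport is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageReport:
    # Hypothetical report row; the actual XML schema is still to be determined.
    source: str          # delivery/discovery point, e.g. "provider", "platform", "aggregator"
    provider: str        # originating content provider
    item_id: str
    month: str           # "YYYY-MM"
    abstract_views: int
    fulltext_views: int


def collate(reports):
    """Sum view counts across all sources, per (provider, item, month)."""
    totals = defaultdict(lambda: [0, 0])
    for r in reports:
        key = (r.provider, r.item_id, r.month)
        totals[key][0] += r.abstract_views
        totals[key][1] += r.fulltext_views
    return {k: tuple(v) for k, v in totals.items()}


def yearly_for_provider(collated, provider):
    """Roll monthly totals up to (item, year) for one provider, as returned on demand."""
    yearly = defaultdict(lambda: [0, 0])
    for (prov, item_id, month), (abstracts, fulltexts) in collated.items():
        if prov == provider:
            year = month[:4]
            yearly[(item_id, year)][0] += abstracts
            yearly[(item_id, year)][1] += fulltexts
    return {k: tuple(v) for k, v in yearly.items()}


reports = [
    UsageReport("provider", "jrnlA", "art1", "2009-03", 10, 4),
    UsageReport("platform", "jrnlA", "art1", "2009-03", 5, 1),
    UsageReport("platform", "jrnlA", "art1", "2009-04", 2, 2),
    UsageReport("platform", "jrnlB", "art9", "2009-03", 7, 0),
]
print(yearly_for_provider(collate(reports), "jrnlA"))
```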