SCAMP (Semantic Centralised Aggregated Metadata Platform)

by jl2 on July 25th, 2009

Getting rich, usable data out of a repository is a problem. The current solution is OAI, but it often suffers because a repository's internal data aligns poorly with the Dublin Core XML standard. JISC regularly run ‘Rapid Innovation’ projects: short-term, lightweight development projects that often focus on repository data. The issue has always been how much ‘innovation’ can actually be done in the short six-month window, given the start-up cost of writing code to extract useful data from a series of distributed repositories.

A Proposed Solution

We propose a framework that aggregates metadata from all UK EPrints repositories and provides a programming API for accessing the aggregated data in a simple and clean way. We propose to extend the technologies developed under the RichTags project to clean up and classify the aggregated metadata, for example by inferring the topic of a paper from the journal or conference to which it was submitted, and to make this additional metadata available through the API. Rapid Innovation developers can then concentrate their efforts on how they use the metadata rather than on how to retrieve it.
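
As a rough illustration of the classification step, the sketch below infers a topic from the venue recorded in a metadata record. It is not the RichTags implementation. The field names (`journal`, `conference`, `topic`), the `VENUE_TOPICS` table and the `infer_topic` function are all assumptions made for the example.

```python
# Hypothetical venue-to-topic table; a real one would be curated or learned.
VENUE_TOPICS = {
    "international semantic web conference": "Semantic Web",
    "journal of digital information": "Digital Libraries",
}


def infer_topic(record):
    """Return a copy of the record with a 'topic' field inferred from its venue.

    The 'journal' and 'conference' keys are assumed field names, not a
    documented EPrints schema.
    """
    venue = record.get("journal") or record.get("conference") or ""
    enriched = dict(record)  # leave the caller's record untouched
    # Normalise the venue name before the lookup; unknown venues give None.
    enriched["topic"] = VENUE_TOPICS.get(venue.strip().lower())
    return enriched


record = {"title": "Example paper", "conference": "International Semantic Web Conference"}
print(infer_topic(record)["topic"])
```

A lookup table is the simplest possible stand-in here. The point is only that the inferred field is added next to the original metadata, so that an API could expose both.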

The framework aggregates data from repositories and provides an API, an interface for programmers, with which to develop services that simply tap into an integrated repository data source.

The framework presents a “push”-based architecture, whereby a repository plug-in (using live XML-RPC updates upon submission) notifies subscribing services that a new submission has been made. At that point the aggregator classifies the metadata and immediately pushes the update to subscribed applications, so that all applications work with live metadata.
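
The push flow described above might be sketched as follows. This is a minimal in-process illustration, not the SCAMP API: the `Aggregator` class, its `subscribe` and `notify_submission` methods, and the `tag_classified` stand-in classifier are hypothetical names. The XML-RPC transport is left out so that the example runs on its own. In a deployment, `notify_submission` is the kind of method that could be exposed remotely, for instance through Python's `xmlrpc.server`.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


class Aggregator:
    """Receives submission notices, classifies them, and pushes them on."""

    def __init__(self, classify: Callable[[Record], Record]) -> None:
        self._classify = classify
        self._subscribers: List[Callable[[Record], None]] = []

    def subscribe(self, callback: Callable[[Record], None]) -> None:
        """Register an application callback to receive every update."""
        self._subscribers.append(callback)

    def notify_submission(self, record: Record) -> int:
        """Called when a repository reports a new submission.

        Classifies the record, pushes it to every subscriber at once,
        and returns the number of subscribers that were notified.
        """
        enriched = self._classify(record)
        for callback in self._subscribers:
            callback(enriched)
        return len(self._subscribers)


def tag_classified(record: Record) -> Record:
    # Stand-in for the real classification step (an assumption).
    return {**record, "classified": "yes"}


received: List[Record] = []
aggregator = Aggregator(tag_classified)
aggregator.subscribe(received.append)  # an application subscribing
aggregator.notify_submission({"eprintid": "42", "title": "Example paper"})
print(received)
```

Because subscribers are called as soon as a notice arrives, applications never need to poll the repositories, which is the property the push design is aiming for.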