Storing and sharing data in an institutional repository – Hydra@Hull

Hydra@Hull is the institutional repository running at the University of Hull. Hydra is a repository solution based on the Fedora software combined with other open source software components, developed over a number of years by the Hydra Partners. The repository has a flexible content model and hosts Hull’s research outputs alongside other content. The system offers workflows to enable deposit from other systems, content access control, collection management and navigation features, and can support data citation. The repository has recently been used to hold research data.

Background context

Hydra was developed as a reusable framework for multi-purpose, multi-functional, multi-institutional repository solutions for managing digital content collections. The project was started in 2008 by the University of Hull, Stanford University, the University of Virginia, and DuraSpace. The founding Hydra Partners shared a common vision: that a repository should be an enabler for managing digital content collections, not a constraint or simply a silo of content. Their aim was to identify a repository solution that could be applied flexibly to meet the requirements of different use cases and content types. This led to the idea of a single repository with multiple points of interaction – Hydra – and the concept of individual ‘Hydra head’ solutions.

The Hydra Project is informed by two main principles: no single system can provide the full range of repository-based solutions for a given institution’s needs, yet sustainable solutions require a common repository infrastructure; and no single institution can resource the development of a full range of solutions on its own, yet each needs the flexibility to tailor solutions to local demands and workflows.

The University of Hull has been a partner in the development of Hydra, and has deployed the repository to contain a variety of content. A solution to managing research data is being considered at Hull as at other institutions, and Hydra has recently started hosting some of the University’s research data outputs.

The Hydra philosophy and community structure

Hydra is a repository solution: it can be taken as is and used for managing digital content collections, recognising that work is involved in meeting your particular needs.

Hydra is a community: the project is based on the principle that working together enables us to address more use cases than working individually.

Hydra is a technical framework: the components can be applied as required to meet repository needs.

Figure 1 The Hydra Community

What has been developed? How can it be used?

The technical implementation is based on a small set of core principles that describe how content objects should be structured within the repository, and with an understanding that different content types can be managed using different workflows. Following these principles, Hydra could be implemented in a variety of ways. The technical direction taken by the project partners was to build Hydra using existing open source technical components where these offered robust functional solutions, and to supplement these with community-generated tools that brought all the parts together: the project partners are committed to supporting these over time. All Hydra software is open source, available under the Apache 2.0 licence, and all software code contributions are managed in this way.

The main components are:

Fedora: one of the digital repository systems maintained through DuraSpace

Apache Solr: powerful indexing software now being used in a variety of discovery solutions

Blacklight: a next-generation discovery interface, which has its own community around it

Hydra-head plugin: a collection of components that facilitate workflow in managing digital content

Solrizer: a component that indexes Fedora-held content into a Solr index

Figure 2 Hydra technical components
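
As a rough illustration of the indexing role that Solrizer plays in this stack, the following Python sketch flattens a repository object's descriptive metadata into a flat, Solr-style document. It is not Solrizer's code or API (Solrizer is a Ruby component): the function name `to_index_document`, the field suffixes `_text` and `_facet`, and the sample record are all assumptions made for illustration.

```python
from typing import Dict, List, Union

MetadataValue = Union[str, List[str]]


def to_index_document(object_id: str,
                      metadata: Dict[str, MetadataValue],
                      facet_fields: List[str]) -> Dict[str, MetadataValue]:
    """Flatten descriptive metadata into a flat dict suitable for a search index.

    Hypothetical convention (not Solrizer's): every field is stored as
    '<name>_text' for searching, and fields listed in facet_fields are
    also stored as '<name>_facet' for browsing.
    """
    doc: Dict[str, MetadataValue] = {"id": object_id}
    for name, value in metadata.items():
        # Normalise single values to a list so every field is multi-valued.
        values = [value] if isinstance(value, str) else list(value)
        doc[f"{name}_text"] = values
        if name in facet_fields:
            doc[f"{name}_facet"] = values
    return doc


# Made-up sample record, for illustration only.
record = {
    "title": "Example dataset",
    "creator": ["A. Researcher", "B. Researcher"],
    "content_type": "dataset",
}
index_doc = to_index_document("example:1", record, facet_fields=["content_type"])
```

In a real deployment the resulting document would be sent to Solr, and a discovery interface such as Blacklight would query the index rather than the repository itself.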

The Hydra-head plugin is a lightweight Ruby on Rails application that works with Fedora. Although Hydra is not a turnkey, out-of-the-box solution, the use of Ruby on Rails enables it to hide many of the complexities of Fedora whilst enabling repository developers and managers to exploit Fedora’s flexibility to meet user needs.

The initial implementation of Hydra was released in early 2011. Development has progressed rapidly since then: version 6 of what is now called the Hydra gem is the latest available as of October 2013.

Practical example(s)

Hydra at the University of Hull was launched in September 2011. It contains collections of different content types, addressing the needs of both staff (e.g. event recordings and committee papers) and students (e.g. theses and past exam papers). Hydra was considered the appropriate choice for the University because it allows cross-fertilisation between content types, and makes it easier to integrate the one repository solution with the institution’s other existing (or future) systems. Because Hydra is flexible enough to handle datasets, it has recently been applied to manage the institution’s research data assets. The Hull Hydra team has worked in particular with one group of researchers in the History department, which has informed the institutional work on datasets through the JISC-funded History DMP project.

Features of Hydra that have been deployed at Hull include:

Hydra allows templates to be written that adapt to different content types. For example, the instance at Hull has different metadata forms and views for journals and datasets, so the metadata and features displayed vary according to content type.

Views of the repository can also be adjusted to present login/role-based selections.

Content can be ordered hierarchically or through ‘display sets’, which create on-the-fly collections.

Figure 3 An entry for a research dataset in Hydra@Hull

The dataset display demonstrated in the example above is based on the History DMP project. Guidance on how to cite the dataset is also displayed.

Through integration with the institutional CAS system, Hydra at Hull allows granular access control levels (e.g. internal, open, restricted to groups), which can be expanded to use additional criteria within the local directory service where available.

A common workflow can be employed, although specialised workflow instances have also been developed. Workflows for deposit of content from other systems (e.g. research outputs via the CRIS and publications from Open Journal Systems) are in development alongside direct deposit workflows into Hydra. At Hull, all workflows have been configured so that deposited content is queued for QA prior to publication in the repository.
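
The access levels and the QA queue described in the features above can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not Hydra's implementation (which is written in Ruby on Rails): the class names `User` and `RepositoryObject`, the state names `queued` and `published`, and the access labels `open`, `internal` and `restricted` are hypothetical, chosen to mirror the wording of the text.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class User:
    name: str
    logged_in: bool = False
    groups: Set[str] = field(default_factory=set)


@dataclass
class RepositoryObject:
    title: str
    access: str = "open"        # hypothetical levels: "open", "internal", "restricted"
    allowed_groups: Set[str] = field(default_factory=set)
    state: str = "queued"       # every deposit starts in the QA queue


def approve(obj: RepositoryObject) -> None:
    """QA step: move a queued object to the published state."""
    if obj.state != "queued":
        raise ValueError("only queued objects can be approved")
    obj.state = "published"


def can_view(user: User, obj: RepositoryObject) -> bool:
    """Apply the access level, but only to objects that have passed QA."""
    if obj.state != "published":
        return False
    if obj.access == "open":
        return True
    if obj.access == "internal":
        return user.logged_in
    if obj.access == "restricted":
        # Restricted content needs a login and membership of an allowed group.
        return user.logged_in and bool(user.groups & obj.allowed_groups)
    return False                # unknown access levels are denied


# Made-up objects and users, for illustration only.
dataset = RepositoryObject("Example dataset", access="restricted",
                           allowed_groups={"history"})
historian = User("historian", logged_in=True, groups={"history"})
visitor = User("visitor")

visible_before_qa = can_view(historian, dataset)    # object is still queued
approve(dataset)
visible_to_historian = can_view(historian, dataset)
visible_to_visitor = can_view(visitor, dataset)
```

Checking the publication state before the access level means that nothing in the QA queue is visible to readers, whatever access level the depositor chose.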

Lessons learned

The implementation of Hydra at Hull began with the launch of the search interface; create, update and delete functions were made available a few months later.

As an early adopter, Hull had at times some painful learning experiences with early upgrades to subsequent versions of Hydra. This experience led to focused development by the Hydra community, and has resulted in a more streamlined development and upgrade process.

Future plans for Hydra at Hull include the addition of image management, integrating the repository with other library search services for staff and students, and ongoing work on integrating Fedora with the CRIS (Hull is using the Converis research information management system).

“Hull’s experience of being a Hydra partner has been a fruitful one, and we have been able to contribute in a variety of different ways (code, architecture, Web site maintenance, documentation). It has also at times been a painful birth as we have seen the system come to life. The work has been well worth it.”

- Richard Green, consultant to Library and Innovation at the University of Hull. [1]

“Working with Hydra has been a steep learning curve at times, not least because the community and technology have been developing fast over the past 18 months, and Ruby on Rails training had been a necessary, though invaluable, starting point. Once up and running, the flexibility of the framework and agility in making changes has enabled rapid progress to be made.”

Other implementations of Hydra in the UK

London School of Economics (LSE) has used Hydra components selectively (ActiveFedora (part of the Hydra-head plugin), Solrizer, Hydra Community input), creating a local user interface solution that suits their current needs. LSE became a Hydra Partner in November 2012.

The University of Oxford is exploring use of Hydra to manage interaction with data objects over DataBank, rather than Fedora, for example for deposit of materials and image viewing.

At Glasgow Caledonian University, Hydra is being adapted to support the Spoken Word service, a repository of recordings from the BBC that are made available to support teaching and learning. The focus has been on delivering the audiovisual material using progressive download or pseudo-streaming, providing easy access to it through embedded media players, and holding all the metadata that can be useful for such materials.

Future developments

Gemification

Use of Ruby on Rails leads to discrete pieces of software being produced, labelled ‘gems’. As Hydra Partners have adopted the software they have created different workflows and functions as part of Hydra. These local adaptations are now being extracted as gems for sharing with others to avoid duplication of effort.

Community development

The Hydra Partners are keen to work with others. The initial three Partners have developed into 20 Partners as of October 2013. Whilst partnership is a commitment to the sustainability of Hydra, engagement can be at any level.

Further information

Hydra support and communication are managed in several ways, including the website, a wiki, Skype calls, mailing lists, face-to-face meetings and an IRC channel. Why not join the user / developer communities? https://wiki.duraspace.org/display/hydra/Connect

Besides the international Hydra community, European Hydra Partners are in the process of developing a European support network.
