willp-bl's Blog

This post covers two main, related topics: characterising web content with Nanite, and my methods for successfully integrating the Tika parsers with Nanite. Introducing Nanite: Nanite is a Java project led by Andy Jackson from the UK Web Archive, formed of two main subprojects: Nanite-Core, an API for DROID, and Nanite-Hadoop, a MapReduce […]

As Carl previously blogged, we now have virtually all SCAPE and OPF projects in Continuous Integration, building and unit testing in both Travis CI and Jenkins. Travis compiles the projects and executes unit tests whenever a new commit is pushed to GitHub or a pull request is submitted to the project. Jenkins […]

Introduction: For our evaluations within SCAPE it would be useful to be able to measure quantitatively the capabilities of the Hadoop clusters available to us, so that results from each cluster can be compared. Fortunately, the standard Hadoop distribution includes some examples that can be run as tests. Intel […]

An important part of image file format migration is quality assurance. Various tools can be used, such as ImageMagick or Matchbox, but they only provide one metric or are designed for different use cases. I wanted to investigate the implementation of image comparison algorithms, so I began experimenting. I created a prototype tool/library for image quality analysis, called Dissimilar. […]

We have been evaluating the use of the latest Fedora Commons, version 3.6.2, as a test repository. Having followed the straightforward installation process, we were left with a repository with one preconfigured user – fedoraAdmin. There are two APIs – API-A for access and API-M for management. For our test instance API-A was configured on […]

Part of my work on the SCAPE testbeds involves producing a workflow for the large-scale migration of TIFF to JP2 files, with validation. The tests I have run all involve the lossy compression of files. Two tools that could be used for the validation of image payload, and therefore success of a migration, are […]

As part of our work on testbeds for the SCAPE project we have been investigating the various ways in which a large-scale file format migration workflow could be implemented. The underlying technologies chosen for the platform are Hadoop and Taverna. One of the aims of the SCAPE project is to allow the automatic generation […]

Several of us at The British Library took part in the CURATEcamp file id hackathon on Friday. We decided that one issue we could make a useful impact on was identification of various ebook formats. eBooks are an important content type for the British Library, especially with the expected implementation of non-print legal deposit legislation […]
