Europeana Newspapers is making historic newspaper pages searchable

A three-year project running until March 2015, continued in 2017 under DSI-2;

Aggregating 18 million historic newspaper pages for Europeana and The European Library;

Converting 10 million newspaper pages to full text, so that users can quickly search for specific articles and for people and locations mentioned in the newspapers;

Creating a special content viewer to improve online newspaper browsing;

Building tools that will allow professionals to better assess the quality of newspaper digitisation in relation to the level of detail, speed and costs.

For the most recent news from Europeana Newspapers, please see our blog.

Europeana Newspapers will…

1. Make Digital Newspapers Easier To Search

A newspaper separated into articles through techniques such as Optical Layout Recognition (OLR).

When newspapers are digitised, the resulting electronic version is often simply an image of the page. Without further processing, it is not always possible to search such an image effectively for pictures, articles or individual words within the text.

Europeana Newspapers aims to change that. It will create full-text versions of about 10 million newspaper pages. It will also detect millions of individual articles and tag them with related metadata and named entities (information identifying people, locations, etc.). This will give users a far better experience than earlier digital newspaper projects did.

2. Put Digital Newspapers Within Everyone’s Reach

Brainstorming what a content browser might look like for digital newspapers.

Many of the newspaper pages assembled by Europeana Newspapers will be dedicated to the public domain. All titles will be freely searchable through Europeana and The European Library, which is also creating a special content browser for the project's newspaper content.

3. Create Tools That Help Experts To Assess Quality

Because converting printed newspapers into digital versions is never 100% accurate, the quality of digitised newspapers must be continually assessed.

A framework for Performance Analysis of Layout Analysis and OCR methods

The Europeana Newspapers project will help by developing an evaluation and quality-assessment infrastructure for newspaper digitisation. It will establish accepted baselines for accuracy in relation to the level of detail, speed of digitisation and costs. This will in turn help experts to assess different methods of newspaper digitisation and pick the one that gives the best result.

4. Assemble An Overview Of Newspaper Digitisation In Europe

There can be no denying the extent of newspaper digitisation undertaken in Europe

Our 2012 survey aimed to identify and analyse all newspaper collections digitised by national, research and public libraries in Europe. It highlighted the difficulty of making 20th-century content available, and showed that many libraries do not use any form of Optical Character Recognition (OCR) when they scan their newspaper content.

The survey is being repeated in 2013 to give an even more complete picture.

5. Create Best-Practice Recommendations For Newspaper Metadata

ENMAP – Europeana Newspapers METS/ALTO Profile

We are working to design and release a comprehensive metadata model based on de facto standards such as METS and ALTO.

Partners will share the model with stakeholders to reach a common agreement and to establish it as a best-practice example for newspaper digitisation in Europe.

6. Raise Awareness Through Workshops and Information Days

Anyone interested in the digitisation of newspaper content can learn more through our workshops and information days. Topics include the technical challenges of the project and the content- and policy-related issues it addresses.