Wikisource:ProofreadPage/Improve index pages

The index namespace contains metadata about scanned books and their proofreading. There is one page for each scanned book and, for PDF and DjVu files, the name of the page is the same as the name of the file.

A patch that rewrites the index form generator in PHP has already been written and will be deployed on Wednesday, October 31, with MediaWiki 1.21-wmf3. It adds some new features, such as the ability to show a help text for each field and the ability to add select fields. This patch is already live on labs.

In order to add these new features, the configuration system of the index namespace will be changed. Here is a short description of the new configuration system:

The configuration is a JSON array of properties. Here is the structure of a property in the array. All the parameters are optional and have default values:

{
  "ID": { // ID of the metadata (first parameter of proofreadpage_index_attributes)
    "type": "string", // the property type (for compatibility reasons, the values do not have to be of this type). Possible values: string, number, page
    "size": 1, // only for the string type: number of lines of the input (third parameter of proofreadpage_index_attributes)
    "values": {"a": "A", "b": "B", "c": "C", "d": "D"}, // a map of value: label pairs that lists the possible values (for compatibility reasons, the stored values do not have to be one of these)
    "default": "", // the default value
    "header": false, // add (true) or not (false) the property to the Mediawiki:Proofreadpage_header_template template
    "label": "ID", // the label in the form (second parameter of proofreadpage_index_attributes)
    "help": "" // a short help text
  }
}
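As a rough illustration of how optional parameters could fall back to defaults, here is a minimal Python sketch. It is not the extension's PHP code. It assumes the configuration is a JSON object keyed by property ID, that "values" is absent by default, and that a missing "label" falls back to the ID; the names `PROPERTY_DEFAULTS` and `load_index_config` are hypothetical.

```python
import json

# Defaults taken from the structure described above. "values": None and the
# label fallback are assumptions made for this sketch.
PROPERTY_DEFAULTS = {
    "type": "string",
    "size": 1,
    "values": None,
    "default": "",
    "header": False,
    "label": None,  # filled in with the property ID below
    "help": "",
}


def load_index_config(raw_json):
    """Parse a configuration string and fill in every missing parameter."""
    config = json.loads(raw_json)
    normalized = {}
    for prop_id, params in config.items():
        prop = dict(PROPERTY_DEFAULTS)  # copy, so the defaults stay untouched
        prop.update(params)
        if prop["label"] is None:
            prop["label"] = prop_id
        normalized[prop_id] = prop
    return normalized


# Illustrative configuration with two properties.
raw = """
{
  "Title": {"help": "Title of the book"},
  "Pages": {"type": "page", "size": 20, "label": "Page list"}
}
"""
config = load_index_config(raw)
```

With this input, "Title" keeps the default type string and a one-line input, while "Pages" overrides the type, size and label and inherits everything else.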

The goal of this project is to define a set of standard property types in order to tell ProofreadPage "this field in the index form holds this kind of value". A field in an index page can, of course, be related to no ProofreadPage property.

Here is the current state of work of the mapping project:

The Simple Dublin Core set consists of these 15 elements:

Title

Creator

Subject

Description

Publisher

Contributor

Date

Type

Format

Identifier

Source

Language

Relation

Coverage

Rights

Each Dublin Core element is optional and may be repeated. Repetition should be implemented in the system, if possible, with a new configuration parameter that lists the possible delimiters between parts:

"delimiter": [] // list of delimiters between two parts of a value. For example, ["; ", " and "] for strings like "J. M. Dent; E. P. Dutton and A. D. Robert"
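The splitting that such a delimiter list implies can be sketched in Python as follows. This is only an illustration of the idea, not the extension's behaviour; the function name `split_values` is hypothetical.

```python
import re


def split_values(value, delimiters):
    """Split a field value into parts using any of the given delimiters."""
    if not delimiters:
        return [value]
    # Try longer delimiters first so that one delimiter containing another
    # is matched as a whole.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return [part.strip() for part in re.split(pattern, value) if part.strip()]


publishers = split_values(
    "J. M. Dent; E. P. Dutton and A. D. Robert", ["; ", " and "]
)
# publishers == ["J. M. Dent", "E. P. Dutton", "A. D. Robert"]
```

Each part can then be exported as one repetition of the corresponding Dublin Core element.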

A new configuration parameter will be added to Proofreadpage_index_data_config in order to provide the mapping to the extension:

"data":"",//proofreadpage's metadata type that the property is equivalent to
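To make the role of "data" concrete, the Python sketch below groups the values of index fields by the metadata type they are mapped to. The type names used here ("title", "creator") and the function `collect_metadata` are placeholders chosen for illustration; the actual list of metadata types is not defined in this sketch.

```python
def collect_metadata(config, field_values):
    """Group index field values by the metadata type given in "data"."""
    metadata = {}
    for field_id, value in field_values.items():
        data_type = config.get(field_id, {}).get("data", "")
        if not data_type or not value:
            continue  # field with no mapping, or empty field: skipped
        metadata.setdefault(data_type, []).append(value)
    return metadata


# Hypothetical configuration: two mapped fields and one unmapped field.
config = {
    "Title": {"data": "title"},
    "Author": {"data": "creator"},
    "Remarks": {},
}
field_values = {
    "Title": "Example Title",
    "Author": "Example Author",
    "Remarks": "Pages 3 and 4 are missing",
}
metadata = collect_metadata(config, field_values)
# metadata == {"title": ["Example Title"], "creator": ["Example Author"]}
```

Fields without a "data" entry, such as "Remarks" here, stay available in the index form but do not appear in the grouped metadata.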

Here is the beginning of a mapping between index fields and ProofreadPage's metadata types:

If the index field is restricted to a single library, the field should contain only the ID part of the ARK URL, and the NAAN part is set in data_config with the "naan" property (list of all NAAN). Otherwise, put the whole URL in this field, without the "ark:/" part.
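The two cases can be sketched in Python as follows. The sketch assumes that "naan" holds a single NAAN string when the field is restricted to one library, and that an unrestricted field holds a value of the form NAAN/ID. The function name `build_ark` and the identifier used are made up for illustration.

```python
def build_ark(field_value, naan=None):
    """Rebuild a full ARK identifier from the value stored in the index field."""
    value = field_value.strip()
    if naan:
        # Field restricted to one library: the field holds only the ID part.
        return "ark:/{}/{}".format(naan, value)
    # Unrestricted field: the field already holds "NAAN/ID".
    return "ark:/{}".format(value)


# Illustrative values only.
restricted = build_ark("bpt6k0000000", naan="12148")
unrestricted = build_ark("12148/bpt6k0000000")
# Both give "ark:/12148/bpt6k0000000"
```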

User:Tpt is working on an OAI-PMH API in order to export the content of Index pages. This API will publish data in two formats: Simple Dublin Core (the format required by the OAI-PMH specification) and Qualified Dublin Core with some custom elements for Wikisource-related data (number of pages proofread, progress...).