About metadata

This page describes what metadata we require and how we disseminate it. Metadata are freely accessible and are distributed in the public domain (under CC0). However, we reserve the right to be informed about commercial use of metadata from the LINDAT/CLARIN repository; please send a description of your use case to the Help Desk.

Metadata formats

During the submission process, users fill out metadata fields, which are stored as part of the record. We are able to disseminate the submission metadata in various formats, including but not limited to CMDI and oai_dc. See the full list of supported formats, but note that some of the formats might not be applicable to all items. The various formats help us promote the submitted content in a number of aggregators (and/or search engines).

CMDI

See the CLARIN introduction to component metadata in order to get more information about this topic.

Our current submissions adhere to the clarin.eu:cr1:p_1403526079380 profile/schema. A portion of older submissions (basically those submitted before Sep/Oct 2014) uses a different profile, clarin.eu:cr1:p_1349361150622. We decided to create the new profile to better reflect the submission process the user goes through. The former one was a combination of OLAC and MetaShare components, which forced us to handle duplicates in various places. It also bound us to someone else's metadata schema and its semantics, which we could neither influence nor change.

Both profiles are fairly well covered with links to a concept registry. The links that pointed to the now retired ISOcat DCR were redirected to the CCR, and the concept links of the OLAC components point to DCMI terms (e.g. the concept link for abstract is http://purl.org/dc/terms/abstract). The VLO does not use these specific concepts in its mappings and instead maps paths inside one particular profile. Only for some particular facets was this XPath extended so that the mapping also works with the component derived from this particular profile. Another reason for creating our own profile was the fact that the DC concepts were still too broad.

However, we also support submissions that come with an arbitrary CMDI metadata file; that file is then served over OAI-PMH when the CMDI metadata format is requested.

Several points in the paragraphs above should discourage you from reusing clarin.eu:cr1:p_1349361150622. If you reuse its specific components, keep in mind what was said above about the VLO mapping. clarin.eu:cr1:p_1403526079380 was created with the VLO mapping in mind (though this can change), but it still reflects our view of the world and our use cases. If you do not gather much more information than is described below, you might find this profile suitable for your needs or as a base for your own.

oai_dc

oai_dc is the format required by OAI-PMH. See the mapping section in order to understand how we map our submission to this format.
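As an illustration of how the formats are requested over OAI-PMH, the following minimal sketch only builds the request URLs and does not contact any server. The verbs and the identifier and metadataPrefix arguments come from the OAI-PMH protocol itself; the base URL and the record identifier are made-up placeholders, and the "cmdi" prefix is an assumption (check the list of supported formats for the actual prefixes).

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai/request"


def oai_url(verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb, **params}
    return f"{BASE_URL}?{urlencode(query)}"


# Ask the repository which metadata formats it can disseminate.
print(oai_url("ListMetadataFormats"))

# Request the same record in two formats. The identifier is made up and
# the "cmdi" prefix is an assumption, not a documented value.
record_id = "oai:repository.example.org:12345/678"
print(oai_url("GetRecord", identifier=record_id, metadataPrefix="oai_dc"))
print(oai_url("GetRecord", identifier=record_id, metadataPrefix="cmdi"))
```

Because some formats might not be applicable to all items, a client would normally call ListMetadataFormats with an identifier argument first and only then request a specific format.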

Submitted metadata

The following list enumerates the fields we ask for in the submission workflow (the list is subject to sporadic changes). The metadata are submitted in English. There are subtle differences depending on the type of the resource being submitted. Not all the fields are present in all the formats. Some fields are generated automatically (e.g. human-readable language names accompanying the ISO codes, identifiers, other dates).

The date when the submission data were issued, if any, e.g. 2014-01-21, or at least the year.

required
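A client preparing a submission might check the date before sending it. This is a minimal sketch that assumes only two shapes are accepted, a full YYYY-MM-DD date or a bare four-digit year; the function name is hypothetical and the repository's own validation may be more lenient.

```python
import re
from datetime import date

# Full date (YYYY-MM-DD) or a bare year (YYYY); assumed shapes, see above.
_DATE_RE = re.compile(r"(\d{4})(?:-(\d{2})-(\d{2}))?")


def is_valid_issue_date(value):
    """Return True if value is a real calendar date or a plausible year."""
    match = _DATE_RE.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # A bare year is checked as 1 January of that year.
        date(int(year), int(month or "01"), int(day or "01"))
    except ValueError:
        return False
    return True


for candidate in ("2014-01-21", "2014", "2014-02-30", "21/01/2014"):
    print(candidate, is_valid_issue_date(candidate))
```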

Author

Names of the authors of the item. In case of collections (e.g. corpora or other large databases of text) you usually want to provide the names of the people involved in compiling the collection, not the authors of individual pieces. A person's name is stored as the surname, a comma, and any other names (e.g. "Smith, John Jr.").

required, repeatable
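The "surname, other names" convention can be sketched as a pair of helper functions. Both function names are hypothetical, and the sketch assumes the first comma in the stored string separates the surname from the rest.

```python
def format_person_name(surname, other_names=""):
    """Join name parts into the stored form, e.g. "Smith, John Jr."."""
    surname = surname.strip()
    other_names = other_names.strip()
    if not surname:
        raise ValueError("surname is required")
    return f"{surname}, {other_names}" if other_names else surname


def split_person_name(stored):
    """Split a stored name at the first comma into (surname, other names)."""
    surname, _, other_names = stored.partition(",")
    return surname.strip(), other_names.strip()


print(format_person_name("Smith", "John Jr."))
print(split_person_name("Smith, John Jr."))
```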

Publisher

Name of the organization/entity which published any previous instance of the item, or your home institution.

required, repeatable

Contact person

Person to contact in case of issues with the submission. This should be someone able to provide information about the resource, e.g. one of the authors or the submitter. Stored as a structured string containing the given name, surname, email and home organization.

required, repeatable

Funding

Sponsors and funding that supported the work described by the submission. Stored as a structured string containing the project name, the project code, the funding organization, the type of funds (own/national/eu) and the OpenAIRE identifier (which is also stored in dc.relation).

repeatable

Description

Textual description of the submission.

required

Language

The language(s) of the main content of the item. Stored as ISO 639-3 codes. Required for corpora, lexical conceptual resources and language descriptions.

repeatable, type-bind required
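The type-bound requirement can be sketched as a small check. The resource type strings and the function names below are hypothetical labels chosen for illustration, and the code test only verifies the three-letter shape of an ISO 639-3 code, not that the code is actually assigned to a language.

```python
import re

# Hypothetical labels for the resource types that require a language.
TYPES_REQUIRING_LANGUAGE = {
    "corpus",
    "lexicalConceptualResource",
    "languageDescription",
}

# ISO 639-3 identifiers are three lowercase letters.
_ISO_639_3_SHAPE = re.compile(r"[a-z]{3}")


def check_languages(resource_type, codes):
    """Return a list of problems found in the language field."""
    problems = []
    if resource_type in TYPES_REQUIRING_LANGUAGE and not codes:
        problems.append("language is required for this resource type")
    for code in codes:
        if _ISO_639_3_SHAPE.fullmatch(code) is None:
            problems.append(f"not a three-letter code: {code!r}")
    return problems


print(check_languages("corpus", ["ces", "eng"]))
print(check_languages("corpus", []))
print(check_languages("corpus", ["en"]))
```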

Subject Keywords

Keywords or phrases related to the subject of the item.

repeatable, required

Size

Extent of the submitted data, e.g. the number of tokens or the number of files.

repeatable

Media type

Media type of the main content of the item, e.g. text or audio. Selected from a dropdown; required for corpora, language descriptions and lexical conceptual resources.