In the previous blog post, “Low-hanging MARC fruit”, I discussed the MARC21 fixed-length data fields 006, 007, and 008. These fields also contain useful data that hangs slightly higher up but can be reached with a short ladder. The ladder rungs are built with the RDF Schema subPropertyOf property. This is an ontological property whose domain and range are both the RDF class Property; in other words, it links two properties:

P1 rdfs:subPropertyOf P2 – where P1 and P2 are specific properties.

The subPropertyOf property carries the inference rule, or entailment:

If P1 rdfs:subPropertyOf P2, and X P1 Y, then X P2 Y

That is, if property P1 is a sub-property of property P2, then a machine can entail a triple using property P2 with the same subject and object as any triple using property P1.
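The rule is simple enough to sketch in a few lines of Python. This is a minimal illustration, not a reasoner: triples are assumed to be plain tuples of prefixed-name strings, and the function `entail_sub_property` and the `ex:` names are hypothetical. A real application would more likely use an RDF library or triple store with RDFS inference.

```python
# Minimal sketch of the rdfs:subPropertyOf entailment rule using plain
# (subject, predicate, object) tuples. Names such as "ex:P1" are hypothetical.
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

def entail_sub_property(triples):
    """Return new triples from: if P1 rdfs:subPropertyOf P2 and X P1 Y, then X P2 Y."""
    # Map each sub-property to the super-properties declared for it.
    supers = {}
    for s, p, o in triples:
        if p == SUB_PROPERTY_OF:
            supers.setdefault(s, set()).add(o)
    # Copy every triple that uses a sub-property to each of its super-properties.
    entailed = {(s, sup, o) for s, p, o in triples for sup in supers.get(p, ())}
    return entailed - set(triples)

graph = {
    ("ex:P1", SUB_PROPERTY_OF, "ex:P2"),
    ("ex:X", "ex:P1", "ex:Y"),
}
print(entail_sub_property(graph))  # {('ex:X', 'ex:P2', 'ex:Y')}
```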

There is a lot of semantic overlap in the MARC21 fields. For example, field 006 positions 01-17 correspond to positions 18-34 in one of the field 008 configurations, and they use the same values. Field 006 is used when an item has characteristics of more than one type of material that cannot all be coded in field 008. There is no semantic difference between the 006 and 008 data – a multi-component item may be catalogued as a whole, using 008 for the main component and 006 for the other components, or each component may be catalogued separately with its own 008 field.
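Because the overlap is a fixed offset, mapping a 006 position to its 008 equivalent is just addition. The sketch below assumes the Books configuration, where Target audience sits at 006/05 and 008/22; the function names are hypothetical and the field strings are synthetic, built only to show the offset.

```python
# Sketch: 006 positions 01-17 hold the same elements as 008 positions 18-34,
# so the 008 position is the 006 position plus 17.
OFFSET = 17

def position_in_008(pos_006: int) -> int:
    """Map a 006 character position (01-17) to the equivalent 008 position (18-34)."""
    if not 1 <= pos_006 <= 17:
        raise ValueError("only 006 positions 01-17 have 008 equivalents (18-34)")
    return pos_006 + OFFSET

def same_element(field_006: str, field_008: str, pos_006: int) -> tuple[str, str]:
    """Read one element from both fields, e.g. Target audience (006/05, 008/22)."""
    return field_006[pos_006], field_008[position_in_008(pos_006)]

f006 = "a" + " " * 4 + "j" + " " * 12   # synthetic 18-character 006, 006/05 = "j"
f008 = " " * 22 + "j" + " " * 17        # synthetic 40-character 008, 008/22 = "j"
print(position_in_008(5))               # 22
print(same_element(f006, f008, 5))      # ('j', 'j')
```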

We can aggregate this data by declaring sub-property relationships between the corresponding 006 and 008 “level 0” properties and a new common super-property:

Here, three different resources (ex:1, ex:2, ex:3) have target audience data stored in three different MARC21 fixed-length fields. The entailed triples store the data using a common property that encompasses the semantics of the level 0 properties by discarding their differences, namely the material categories. Each entailed triple states “This resource has target audience …”, dropping the material category distinction, which is unnecessary for this metadata attribute.
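As a rough illustration of this kind of aggregation, here is a hypothetical reconstruction in Python. The level 0 property names are invented for the sketch; only M00Aud, the category-free “Target audience” property, comes from this post, and the `ex:` prefix is an assumption. The codes j, e, and d are MARC21 Target audience values (Juvenile, Adult, Adolescent).

```python
# Hypothetical sketch: ex:1, ex:2 and ex:3 carry Target audience codes in
# three different level 0 properties (names invented for illustration).
SUB = "rdfs:subPropertyOf"
TARGET_AUDIENCE = "ex:M00Aud"  # category-free super-property; prefix assumed

LEVEL0 = ["ex:aud006Books", "ex:aud008Books", "ex:aud008Music"]  # hypothetical
declarations = {(p, SUB, TARGET_AUDIENCE) for p in LEVEL0}

data = {
    ("ex:1", "ex:aud006Books", "j"),  # j = Juvenile
    ("ex:2", "ex:aud008Books", "e"),  # e = Adult
    ("ex:3", "ex:aud008Music", "d"),  # d = Adolescent
}

def entail(triples):
    """If P1 rdfs:subPropertyOf P2 and X P1 Y, add X P2 Y."""
    supers = {}
    for s, p, o in triples:
        if p == SUB:
            supers.setdefault(s, set()).add(o)
    return {(s, sup, o) for s, p, o in triples for sup in supers.get(p, ())}

for triple in sorted(entail(declarations | data)):
    print(triple)
# ('ex:1', 'ex:M00Aud', 'j')
# ('ex:2', 'ex:M00Aud', 'e')
# ('ex:3', 'ex:M00Aud', 'd')
```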

Using the entailed triples, we only need to process the higher-level property to create, for example, a “Target audience” index for a set of MARC21 records, rather than having to gather the data from the level 0 properties every time.
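A sketch of such an index, assuming the entailed triples have already been added to the graph: the builder reads only the higher-level property and ignores the level 0 ones. The property names and the small label table (a subset of the MARC21 Target audience codes) are illustrative assumptions.

```python
from collections import defaultdict

# Assumes entailment has already run, so each resource also has an ex:M00Aud
# triple. Property names are hypothetical.
TARGET_AUDIENCE = "ex:M00Aud"
AUDIENCE_LABELS = {"d": "Adolescent", "e": "Adult", "j": "Juvenile"}  # subset of codes

graph = [
    ("ex:1", "ex:aud006Books", "j"),
    ("ex:1", TARGET_AUDIENCE, "j"),
    ("ex:2", "ex:aud008Books", "e"),
    ("ex:2", TARGET_AUDIENCE, "e"),
    ("ex:3", "ex:aud008Music", "j"),
    ("ex:3", TARGET_AUDIENCE, "j"),
]

def target_audience_index(triples):
    """Group resources by Target audience label, reading only the super-property."""
    index = defaultdict(list)
    for s, p, o in triples:
        if p == TARGET_AUDIENCE:
            index[AUDIENCE_LABELS.get(o, o)].append(s)
    return dict(index)

print(target_audience_index(graph))
# {'Juvenile': ['ex:1', 'ex:3'], 'Adult': ['ex:2']}
```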

We can go further. The same value vocabulary for Target audience is used for other categories of material:

So we can declare sub-property relationships between each of these level 0 properties and the higher-level “Target audience” property, and generate the entailed triples.
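Declaring one sub-property relationship per level 0 property is mechanical, so the declarations can be generated. The sketch below assumes the material categories are Books, Computer files, Music, and Visual materials (the 008 configurations that define Target audience), and uses a hypothetical naming pattern and a placeholder `ex:` namespace. It emits the declarations as Turtle text.

```python
# Hypothetical sketch: one rdfs:subPropertyOf declaration per
# (field, material category) pair, serialized as Turtle.
FIELDS = ["006", "008"]
CATEGORIES = ["BK", "CF", "MU", "VM"]  # Books, Computer files, Music, Visual materials
SUPER = "ex:M00Aud"

def level0_property(field, category):
    """Hypothetical naming pattern for a level 0 Target audience property."""
    return f"ex:m{field}{category}Aud"

def declarations_as_turtle():
    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix ex: <http://example.org/marc21#> .",  # placeholder namespace
        "",
    ]
    for field in FIELDS:
        for category in CATEGORIES:
            lines.append(f"{level0_property(field, category)} rdfs:subPropertyOf {SUPER} .")
    return "\n".join(lines)

print(declarations_as_turtle())
```

Generating the declarations keeps the naming consistent, and adding a new category means adding one entry to `CATEGORIES`.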

Note that we could create an intermediate rung on our ladder, say M00BKAud “Target audience (Language material)”, to aggregate data at the material category level, and then declare it a sub-property of M00Aud to aggregate to the category-free level. There is no specific use case for this at the moment. If the need arises, it can be done without affecting the existing sub-property relationships and entailments, because the subPropertyOf property is transitive: P1 rdfs:subPropertyOf P2 and P2 rdfs:subPropertyOf P3 entails P1 rdfs:subPropertyOf P3.
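Transitivity can be sketched by first closing the subPropertyOf relation and then applying the entailment rule. In this minimal example the level 0 property `ex:m008BKAud` is a hypothetical name, and the intermediate rung M00BKAud is the one suggested above. A triple using the level 0 property entails triples at both higher levels.

```python
SUB = "rdfs:subPropertyOf"

def sub_property_closure(triples):
    """Transitive closure: P1 subPropertyOf P2 and P2 subPropertyOf P3 => P1 subPropertyOf P3."""
    edges = {(s, o) for s, p, o in triples if p == SUB}
    while True:
        new = {(a, d) for a, b in edges for c, d in edges if b == c} - edges
        if not new:
            return edges
        edges |= new

def entail(triples):
    """Apply X P1 Y => X P2 Y for every (possibly indirect) super-property P2."""
    supers = {}
    for sub, sup in sub_property_closure(triples):
        supers.setdefault(sub, set()).add(sup)
    return {(s, sup, o) for s, p, o in triples for sup in supers.get(p, ())}

graph = {
    ("ex:m008BKAud", SUB, "ex:M00BKAud"),  # level 0 -> intermediate rung (hypothetical)
    ("ex:M00BKAud", SUB, "ex:M00Aud"),     # intermediate rung -> category-free property
    ("ex:1", "ex:m008BKAud", "j"),
}
print(sorted(entail(graph)))
# [('ex:1', 'ex:M00Aud', 'j'), ('ex:1', 'ex:M00BKAud', 'j')]
```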

Our ladder “dumbs up” the level 0 data: each sub-property entailment uses a higher-level property that is semantically broader than the last. The ladders merge at each stage and are just one rung long, so what we get is more like a climbing net for reaching the higher-hanging fruit.

Figure: RDF graph of the MARC21 Target audience ontology

Applications can now deal with just one attribute property for Target audience and avoid the messiness at level 0. And there is just one property to align and map to corresponding properties from other bibliographic metadata schemas …

Comment by Karen Coyle

I only took a quick look, but it appears that “target audience” is the same for all data types that use that byte. What, then, is the purpose of defining TA for each resource type, rather than having a single TA list that relates to the resource itself instead of the resource type?