Metadata is just like spoken sentences

19 February 2018

By Joe Pairman

In the first post in this series, we saw how metadata helps us label, manage, and find all sorts of things, from groceries to books in a library to digital content. When we start to learn more, however, metadata can seem like an arcane field, something you would need to be a computer scientist or a Library Science postgrad to understand. Yet its essence is very simple: it is just like a series of spoken sentences.

Let’s say you’re going to town, and you ask your partner if they need anything from the shops. “Could you just pick up a bottle of shampoo?” they say, and naturally, you say yes. Now you’re staring at a wall of bottles, each unique. What if you get the wrong one? Could it make your partner’s hair too dry? Change the color? MAKE IT FALL OUT? As you stare some more, the clashing colors on the bottles start vibrating.

You look away and get on the phone to your partner. “It’s in the 3rd aisle” (OK, that was the first problem: wrong aisle) “in a big bottle… not the single bottles, the multipacks.” Alright, that properly narrows things down. “It’s green.” There’s only one green shampoo, at least in a multipack. You grab it, pay, and come home victorious. Little do you know that metadata saved the day.

Your partner’s guidance took the form of simple statements about the location, the packaging, the color, and the bottle size, which helped to describe and locate the shampoo. We can break each one into three parts (roughly corresponding to the grammatical subject, predicate, and object or complement):

Subject | Predicate | Object or complement
--- | --- | ---
(The shampoo) | is in | the third aisle.
(The shampoo) | (has packaging) | multipack.
(The shampoo) | (is) | green.
(The bottle) | (is) | big.

In the field of metadata, structures such as these are also called statements. In the same way as the shampoo description, we typically use a series of related metadata statements to identify and locate a document, a page, or a chunk of content. Each statement can always be broken into these three parts: subject, predicate, and object.
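To make this concrete, here is a minimal Python sketch, not from the original post, that stores each statement as a (subject, predicate, object) tuple. The product names and shelf contents are invented for illustration; the shampoo is found by keeping only the products that match every one of the partner’s statements.

```python
# Each metadata statement is a (subject, predicate, object) triple.
# The product identifiers and shelf contents are hypothetical, invented for this example.
Triple = tuple[str, str, str]

shelf: list[Triple] = [
    ("shampoo-a", "is in", "aisle 3"),
    ("shampoo-a", "has packaging", "single bottle"),
    ("shampoo-a", "is", "green"),
    ("shampoo-b", "is in", "aisle 3"),
    ("shampoo-b", "has packaging", "multipack"),
    ("shampoo-b", "is", "green"),
    ("shampoo-c", "is in", "aisle 3"),
    ("shampoo-c", "has packaging", "multipack"),
    ("shampoo-c", "is", "blue"),
    ("shampoo-d", "is in", "aisle 5"),
    ("shampoo-d", "has packaging", "multipack"),
    ("shampoo-d", "is", "green"),
]

# The partner's guidance, as (predicate, object) pairs the subject must satisfy.
guidance = [("is in", "aisle 3"), ("has packaging", "multipack"), ("is", "green")]


def matching_subjects(triples: list[Triple], conditions: list[tuple[str, str]]) -> set[str]:
    """Return the subjects that have a statement for every (predicate, object) condition."""
    subjects = {s for s, _, _ in triples}
    for predicate, obj in conditions:
        # Keep only the subjects that also have a triple matching this condition.
        subjects &= {s for s, p, o in triples if p == predicate and o == obj}
    return subjects


print(matching_subjects(shelf, guidance))  # {'shampoo-b'}
```

Each condition narrows the set of candidates, much as “third aisle”, “multipack”, and “green” narrowed down the wall of bottles. Matching on exact strings is a simplification made for this sketch; real metadata systems usually rely on controlled vocabularies or identifiers so that, say, “aisle 3” and “the third aisle” are not treated as different values.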

Why can metadata seem so complicated, then? One reason is that the simple three-part structure is not always obvious. For a start, there are many ways to encode metadata. If standards-based semantic technology such as RDF (Resource Description Framework) is being used, the structure should be fairly clear; in fact, the basic component of all RDF is the “triple”, the three-part structure we have just seen. However, with other metadata models you may need to consult standards documents or specifications to understand the structure. Also, different terms may be used for the three elements. A traditional and very precise way to describe them is as “entity”, “attribute”, and “attribute value”. Typically, the entity is the piece of content being described, and the attribute and value are the data about that content.

Entity/Subject | Attribute/Predicate | Attribute value/Object
--- | --- | ---
(identifier of a book) | ISBN | 978-1-937434-16-8
(identifier of a book) | author | Noz Urbina
(identifier of a DITA topic) | subject | Apache Cassandra
(identifier of a DITA topic) | review date | 2018-03-27
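Whatever names are used, the structure is the same. As an illustrative Python sketch (not from the original post), the rows of the table above can be stored as entity-attribute-value tuples and regrouped into one record per entity. The identifiers “book-1” and “topic-1” are hypothetical stand-ins for the “(identifier of …)” cells.

```python
# Entity-attribute-value rows from the table above. "book-1" and "topic-1"
# are hypothetical identifiers standing in for the "(identifier of ...)" cells.
rows = [
    ("book-1", "ISBN", "978-1-937434-16-8"),
    ("book-1", "author", "Noz Urbina"),
    ("topic-1", "subject", "Apache Cassandra"),
    ("topic-1", "review date", "2018-03-27"),
]


def group_by_entity(eav_rows: list[tuple[str, str, str]]) -> dict[str, dict[str, list[str]]]:
    """Collect each entity's attribute values into one record.

    Values go into lists because an attribute can repeat,
    for example a book with several authors.
    """
    records: dict[str, dict[str, list[str]]] = {}
    for entity, attribute, value in eav_rows:
        records.setdefault(entity, {}).setdefault(attribute, []).append(value)
    return records


records = group_by_entity(rows)
print(records["book-1"])   # {'ISBN': ['978-1-937434-16-8'], 'author': ['Noz Urbina']}
print(records["topic-1"])  # {'subject': ['Apache Cassandra'], 'review date': ['2018-03-27']}
```

Regrouping this way loses nothing: flattening the records again gives back the same triples, which is one reason the three-part view is a useful anchor whatever the storage format.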

Another reason that metadata can be so intimidating is that it is used for many purposes. The National Information Standards Organization’s “Introduction to Metadata” defines three main categories (actually four, but the fourth, “markup languages”, is about the way metadata is applied to content, rather than the purposes it’s used for). Fortunately, our shampoo case provides an example of each category:

Structural metadata. In publishing, perhaps a page or section number. In the shampoo example, knowing that it is in aisle 3 is a kind of structural metadata.

Administrative metadata (technical subdivision). In digital content, a file format or syntax. For the shampoo, the fact that it comes in a multipack.

Descriptive metadata. Any information describing the content itself. For the shampoo, perhaps the fact that it is green.

If you feel that you are falling down a rabbit hole of metadata uses, syntaxes, models, and terminology, it helps to break things down into that simple triple structure: subject, predicate, and object. Remember that without any technical manuals or semiotic theories, people speak metadata all the time: to identify things, to find things, and to bring home just the right product from the shops.