(thinking-out-loud alert)
So this is a conversation that resurfaces over the years in various
ways. My latest prompt being a combination of (i) seeing
http://www.productontology.org/ which declares OWL DL classes (ie.
classes of thing, aka types...) for commonly named products, using
Wikipedia data. The product ontology site uses OWL to describe classes
of largely mass-produced thing:
"This service provides GoodRelations-compatible OWL DL class
definitions for ca. 300,000 types of product or services that have an
entry in the English Wikipedia, e.g.
http://www.productontology.org/doc/Apple
http://www.productontology.org/doc/Laser_printer
http://www.productontology.org/doc/Manure_spreader
http://www.productontology.org/doc/Racing_bicycle
http://www.productontology.org/doc/Soldering_iron
http://www.productontology.org/doc/Sweet_potato
Back at DC-2008 in Berlin someone (maybe Karen Coyle or Diane Hillmann)
mentioned that a difference between libraries and museums is that the
works collected by the former are mass produced.
I think we can go some way towards webbifying FRBR by pondering that
observation. I spent Monday and Tuesday with VU.nl colleagues visiting
the Amsterdam Museum and then the Fab Lab at http://fablab.waag.org/
which showed some possibilities for taking museum artifacts and
making lossy copies of them (with 3D printers and other
mechanical reproduction techniques). We could even fabricate moulds
derived from artifacts that allow others to create new derived
instances (or their own moulds). Each generation derives
characteristics from the previous, adding in its own flaws and
innovations.
Looking at the Product Ontology examples above, they work better at
describing mechanically reproduced, near-identical artifacts
(Laser_printer, Soldering_iron) than natural kinds of thing (apple,
potato etc.). Both apple and sweet potato are halfway to being
mass nouns --- you might often have need to describe 'some' apple or
sweet-potato, rather than 'a' sweet potato, although of course you can
have a specific apple or potato in-hand. Mass production brings with
it the prospect of thousands of *near*-identical instances of some
type, as well as associating those with codes and lately URLs that
link us back to information about the recipe or ingredients list for
those types of thing. For complex modern mass produced items, if you
know what kind of item it is, you know a huge amount about that thing
- whether it is a book or a printer or a soldering iron.
If we forget the library and cultural heritage scene for now, and
think just about these product types: I have here in my room a
specific laser printer. It is an HP Laser Smart C4270. Let's say it
was bought in Leiden, Netherlands and has an owner (this household).
It has specific characteristics local to this copy, as well as
stereotypical characteristics that it shares with all other "HP Laser
Smart C4270s".
FRBR isn't designed to describe that kind of situation (although the
parallels should be clear). But RDF and OWL do try to address that
general case: RDF/RDFS/OWL is very much in the business of drawing
such class-instance distinctions. OWL also goes some basic way towards
providing information-machinery for stating generalisations about all
the members of some class of thing. However OWL itself avoids certain
complex topics that are relatively hard to avoid for us: it does not
directly give us a way of saying "typically". It does not give us a
way of distinguishing intrinsic versus accidental properties. The
latter saved W3C from retreading thousands of years of philosophical
debate. The former is perhaps a medium-sized nuisance. Regardless:
We can think of the class of things in the world that are *printers*.
We can name that class with a URI and publish a description there.
We can think of the class of things in the world that are *laser
printers*. We can name that class with a URI and publish a description
there.
We can think of the class of things in the world that are *HP laser
printers*. We can name that class with a URI and publish a description
there.
We can think of the class of things in the world that are *HP Laser
Smart C4270 printers*. We can name that class with a URI and publish a
description there.
We can associate any thing in the world with one or more of these
classes; in RDF by asserting an rdf:type relationship to the class. We
can use properties associated with the class to describe the
individual thing 'by hand', or we can draw factual conclusions about
properties of some individual from general knowledge that makes claims
about all members of a class.
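
A minimal sketch of the printer example, in plain Python rather than real
RDF tooling. The example.org URIs and the name "myPrinter" are hypothetical;
nothing here is a published identifier. It only shows the one inference
discussed above: asserting rdf:type against a narrow class lets us conclude
membership of every broader class reachable via rdfs:subClassOf.

```python
# Hypothetical URIs under example.org; none of these are published identifiers.
EX = "http://example.org/"
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Triples as (subject, predicate, object) tuples.
triples = {
    (EX + "LaserPrinter", SUBCLASS_OF, EX + "Printer"),
    (EX + "HPLaserPrinter", SUBCLASS_OF, EX + "LaserPrinter"),
    (EX + "HPLaserSmartC4270", SUBCLASS_OF, EX + "HPLaserPrinter"),
    (EX + "myPrinter", RDF_TYPE, EX + "HPLaserSmartC4270"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen = {cls}
    todo = [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def types_of(thing, triples):
    """Asserted types of a thing, plus the types implied by subclassing."""
    result = set()
    for s, p, o in triples:
        if s == thing and p == RDF_TYPE:
            result |= superclasses(o, triples)
    return result


inferred = types_of(EX + "myPrinter", triples)
```

Only one rdf:type was stated by hand; the other three memberships fall out
of the class descriptions.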
We can go deeper, towards query-like classes, and name the sub-class
of HPLaserSmartC4270-Printer that corresponds to such printers bought
in Leiden; or owned by me. Or that have a damaged scanner lid and
which still serve adequately as a printer. Or which belong to the
subclass manufactured in the UK and that shipped with a UK-compatible
power cable.
OWL doesn't impose any particular level of detail on us; it just
provides descriptive primitives that let us talk in terms of [broadly]
sets of things, the properties that characterise those sets, and the
subset / superset relations between those sets. (We say class instead
of set, and leave that distinction aside for now.)
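
To make the "query-like classes" idea concrete, here is a rough sketch that
treats a class as the set of things matching some property values. The
records, ids and property names (boughtIn, lidDamaged) are made up for
illustration. Note also that this is a closed list of records, so missing
data counts as a non-match, which is not how open-world OWL behaves.

```python
# Illustrative records only; ids, property names and values are made up.
MODEL = "HP Laser Smart C4270"
things = [
    {"id": "printer-1", "model": MODEL, "boughtIn": "Leiden", "lidDamaged": True},
    {"id": "printer-2", "model": MODEL, "boughtIn": "Bristol", "lidDamaged": False},
    {"id": "printer-3", "model": "Some other printer", "boughtIn": "Leiden",
     "lidDamaged": False},
]


def members(things, **required):
    """Ids of the things whose properties match every required value."""
    return {
        t["id"]
        for t in things
        if all(t.get(prop) == value for prop, value in required.items())
    }


# Each "class" is just the set picked out by its characteristic properties.
c4270 = members(things, model=MODEL)
c4270_bought_in_leiden = members(things, model=MODEL, boughtIn="Leiden")
c4270_damaged_lid = members(things, model=MODEL, lidDamaged=True)

# Adding a condition can only narrow the set, which gives the subset relation.
is_subclass = c4270_bought_in_leiden <= c4270
```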
Computerised ontology languages like OWL are obsessed with this
class-vs-instance distinction, and in modern mass produced life, the
distinction is all around us, as are near-identical, mechanically
reproduced copies of products - regardless of whether the product was
designed to inform, educate, entertain, or remove unsightly nasal
hair.
Our FRBR-inspired conversations here are overshadowed by the need to
make equivalent distinctions in other aspects of everyday life. From
tracking down a replacement cable or scanner lid for my printer, to
finding the nearest open shop that will sell me a certain kind of
soldering iron on a Sunday, or a certain DVD of a certain film, the
desire to organize information in a way that mirrors the patterns of
similarity amongst mass produced items is a modern universal.
From http://www.marxists.org/reference/subject/philosophy/works/ge/benjamin.htm
http://en.wikipedia.org/wiki/The_Work_of_Art_in_the_Age_of_Mechanical_Reproduction
and unfairly out of context,
"In principle a work of art has always been reproducible. Man-made
artifacts could always be imitated by men. Replicas were made by
pupils in practice of their craft, by masters for diffusing their
works, and, finally, by third parties in the pursuit of gain.
Mechanical reproduction of a work of art, however, represents
something new." [...] "With the woodcut graphic art became
mechanically reproducible for the first time, long before script
became reproducible by print. The enormous changes which printing, the
mechanical reproduction of writing, has brought about in literature
are a familiar story. However, within the phenomenon which we are here
examining from the perspective of world history, print is merely a
special, though particularly important, case."
All I'm suggesting here is that we follow this advice from Walter
Benjamin in 1936 and indulge ourselves in the idea that modeling
bibliographic mass production is merely a special (and important)
case.
FRBR's "items" are the most concrete, tangible entities in the FRBR
universe. In the physical realm they are things you might hold in your
hand, put in a box, find at some location. The idea extended to the
digital realm is naturally more ephemeral but we do at least have
correspondingly objective characteristics that ground digital objects
in clear ways: notions such as sizeInBytes, cryptographic hashes
(sha1sum, md5) can be used to talk precisely about specific sequences
of 'Zeros' and 'Ones'.
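
For the digital case, a small sketch using Python's standard hashlib module.
The function name describe_bytes and the sample byte strings are just
illustrative; the point is that two byte-identical copies get the same
description, and any change to the bytes shows up in the hashes.

```python
import hashlib


def describe_bytes(data: bytes) -> dict:
    """Objective, content-derived characteristics of a sequence of bytes."""
    return {
        "sizeInBytes": len(data),
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
    }


# Two byte-identical copies, and one copy altered by a single byte.
copy_a = describe_bytes(b"hello world")
copy_b = describe_bytes(b"hello world")
altered = describe_bytes(b"hello world!")

same_item_content = copy_a == copy_b
changed = altered["sha1"] != copy_a["sha1"]
```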
Looking up the FRBR hierarchy at the more general notions of
"Manifestation", "Expression" and "Work", these are FRBR's particular
story for organizing our millions of items into sensible groups.
FRBR's "work" notion is described textually as a “distinct
intellectual or artistic creation.”... a kind of ghostly but specific
entity, a kind of social fiction that acts as a descriptive (and
sometimes legal) hub for organizing clusters of related items.
"Expression" brings that somewhat down to earth (“the specific
intellectual or artistic form that a work takes each time it is
‘realized.’”), while "Manifestation" finally articulates it in terms
of sets/classes rather than individual abstract entities: “the physical
embodiment of an expression of a work. As an entity, manifestation
represents all the physical objects that bear the same
characteristics, in respect to both intellectual content and physical
form.”
So the distinctions made in terms of these *4* notions are similar to
those baked into the core of RDF itself.... specific fairly concrete
things organized into groups (sets, classes). RDF only allows itself
'rdf:type' and 'rdfs:subClassOf' relationships as a basis to describe
all this.
So if we go with this idea that "print is merely a special, though
particularly important, case" of mass produced work, and that it is
worth investigating RDF descriptive habits that address
characteristics of mass production regardless of whether we are
talking about bicycles, books, laser printers or farmyard equipment,
... where does this leave us? where does it get us?
1. We bring more clearly into scope some industrialised areas of
cultural 'content' -- music, TV, films;
http://musicontology.com/
http://www.bbc.co.uk/ontologies/programmes/2009-09-07.shtml ... areas
where FRBR is a close but not perfect fit, and class-based models
drift towards being 'FRBR-inspired' rather than 'FRBR-based'.
2. We find OWL lacks certain conventions for distinguishing
stereotypical instances from flawed/accidental characteristics of
actual instances. For example, a copy of some book I have on my desk
might be missing a certain page, so its literal 'number of pages'
property couldn't be inferred from a common class shared with other
such manifestations of the same abstraction. Or the local adjustments
made here to my printer (I swapped the power cable, or repaired the
lid). There is a big literature in KR (knowledge representation) about
defaults and overrides, and it's tricky to get right with the
open-world design of RDF/RDFS/OWL.
3. Works, Manifestations and Expressions might all just be kinds of
classes; or annotations on classes. The class of *HP Laser Smart C4270
printers* of which I have one in this room; the class of *SQL and
Relational Theory books* of which I have one on my desk as I type. The
former is described at
http://h10025.www1.hp.com/ewfrf/wc/product?cc=us&lc=en&dlc=en&product=3300222
by its maker; the latter at http://oreilly.com/catalog/9780596523084
... more general classes might be tagged 'work-class'; very precise
classes tagged 'manifestation-class'. But fundamentally we get a huge,
universal spectrum (from the class of 'every Thing', to the class of
'No-thing') rather than forcing each into one of the FRBR 4.
In both these example cases, there are product codes and online
databases, and other people who own different instances of the same
kind of thing. In both cases there are related products (maybe an
ebook, maybe a successor printer design, or ink cartridge) where
information at the level of 'all products' is useful to the owners and
custodians of specific products.
4. OWL 2's punning mechanism may be relevant. This is a trick in OWL
2 that lets a single URI serve both as a class identifier (the class
of C4270 printers) and as an identifier at the instance level, e.g.
something that might have other data attached, like images or links to
product documentation.
5. We would effectively be abandoning the attempt to fit the
bibliographic universe into 4 buckets, and allowing different parties
to name and describe classes at any level of generality, picked out by
the properties of the things in that class. I might care to name a
class for all books written by all former pupils of the school
described at http://en.wikipedia.org/wiki/RGS_High_Wycombe --- this
class would include SQL and Relational Theory, via its author,
http://dbpedia.org/page/Christopher_J._Date .... or you might care to
create a class for products whose primary inventor was an immigrant.
By stepping back from the FRBR 4, we could get a more free-form
environment in which properties of all kinds of thing can be used to
define whatever classes are useful.
6. What does this mean in terms of 'who defines what when' metadata
practice? If the abstract work "SQL and Relational Theory" by C.J.Date
is in some sense now an RDF class, what should the URI be? Who
publishes it and what practice should exist around the associated
online description? I don't know. Maybe authors, publishers and
libraries all have a role, ... maybe there are 3 or more
semi-competing URIs for that class, one from C.J.Date, one from the
publisher O'Reilly and one or more from a library perspective. Perhaps
one of these descriptive agencies ends up playing a hub role and
including links to further description of the class from the other
parties. Maybe practices vary between fields and types of product. I
really don't know. And the core RDF/OWL specs are not the kinds of
thing that will tell us what's best to do, btw.
7. What kinds of thing are properly expressed at the class level? I
also don't know. We might find value in rethinking some properties to
more explicitly attach them to the stereotypical ideal member of some
class, as a way of admitting that not all instances will match the
ideal. Perhaps, for example, the idea that books have 'numPages' could be
defined to refer to the stereotypical ideal case, even while applied
at the instance level. So if I lose 5 pages from the copy of "SQL and
Relational Theory" on my desk, we still say it has 410 numbered pages.
Maybe we go through and think 'which properties does it even make
sense to mutate at the instance level?'. For all the damage I could do
to my copy of that book, I'm not going to change its author or
subject, for example. So those would be readily expressed in terms of
OWL. The numPages could be expressed as an OWL generalisation about
all instances if we define that property to be the ideal number,
rather than having to track damaged pages etc. And some properties
such as geographic location or owner make sense only at the instance
level. A few of these (such as e.g. initialOwner) might be static
properties that never change their value; others vary from time to
time.
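
A rough sketch of the "stereotypical ideal vs. actual instance" split from
points 2 and 7. This is a simple default-with-override lookup in Python, not
OWL semantics (OWL has no built-in notion of defaults). The names
ProductClass, Item, ideal and observed are invented for illustration, and
the local values for my copy are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ProductClass:
    """A class of near-identical things, with its stereotypical properties."""
    name: str
    stereotype: dict


@dataclass
class Item:
    """One concrete thing, with properties local to this copy."""
    label: str
    kind: ProductClass
    local: dict = field(default_factory=dict)

    def ideal(self, prop):
        # What the stereotypical member of the class has, whatever happened here.
        return self.kind.stereotype.get(prop)

    def observed(self, prop):
        # A local value overrides the class-level default when present.
        return self.local.get(prop, self.kind.stereotype.get(prop))


book_class = ProductClass(
    name="SQL and Relational Theory",
    stereotype={"author": "C.J. Date", "numPages": 410},
)

# My copy has lost 5 pages and sits on my desk (illustrative local facts).
my_copy = Item(
    label="copy on my desk",
    kind=book_class,
    local={"numPages": 405, "location": "my desk"},
)

ideal_pages = my_copy.ideal("numPages")
actual_pages = my_copy.observed("numPages")
author = my_copy.observed("author")
```

The design choice here is to keep both answers available: the ideal value
stays attached to the class, and the damage is recorded only on the item.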
Ok this post is too long already. Another way of stating all this is
that it's an appeal to think more in terms of specific
somehow-concrete items, things. Artifacts in your hand, or computer
data files that might be checksummed. And that all abstractions above
those are means to an end, rather than ends in themselves. So instead
of pondering the vague characteristics of ghostly entities like
'works', 'expressions' and 'manifestations', we can ask whether we're
simply talking about the common characteristics of collections of
identifiable *items*. And if that is what we're doing, whether (a) we
can more explicitly share common descriptive practices with other
non-textual mass produced kinds of things, and (b) RDF/OWL might have
some built-in facilities that could be used more (i.e. its notion of
class).
This all wouldn't abolish the WEMI distinctions; rather they would, as
sketched above, show up as a kind of annotation on RDF classes. Some
classes might be work-ish classes; the class of all Hamlets. Others
might be manifestation-ish classes; the class of all paper-printed
first edition SQL and Relational Theory copies. But the core
organising idea is sets/classes rather than the ghostly upper entities
of FRBR. Aspects of those entities would also show up as concrete
documents; an artist's first sketches of a later painting; CJ Date's
book contract with O'Reilly that gave us the later book. First, second
and final drafts; HP printer schematics, blueprints; architectural
drawings; bike designs; ingredient lists and working notes. But rather
than merge our knowledge about all those practical things into the
vaguer composite entities of FRBR we just itemise them and describe
them as plain old artifacts at the instance level - giving us
something like a catalogue of evidence left in the world that shadows
the creative process, rather than reifying the act of creation into
special 'things' that can be described but never touched, used, read
or consumed.
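
As a sketch of WEMI showing up as annotations on classes rather than as
separate kinds of entity: the class names, the "annotation" key and the
labels "work-ish" / "manifestation-ish" below are all hypothetical, chosen
to follow the examples in the text.

```python
# Hypothetical class descriptions; the annotation is just a tag on the class.
classes = {
    "Hamlet": {"subClassOf": None, "annotation": "work-ish"},
    "SQLAndRelationalTheory": {"subClassOf": None, "annotation": "work-ish"},
    "SQLAndRelationalTheoryFirstEditionPaper": {
        "subClassOf": "SQLAndRelationalTheory",
        "annotation": "manifestation-ish",
    },
}

# Concrete items are typed by the most specific class we know for them.
items = {"copy-on-my-desk": "SQLAndRelationalTheoryFirstEditionPaper"}


def classes_of(item):
    """The item's class followed by its superclasses, most specific first."""
    chain = []
    current = items[item]
    while current is not None:
        chain.append(current)
        current = classes[current]["subClassOf"]
    return chain


def classes_with(item, annotation):
    """Those classes of the item that carry the given annotation."""
    return [c for c in classes_of(item) if classes[c]["annotation"] == annotation]


work_like = classes_with("copy-on-my-desk", "work-ish")
manifestation_like = classes_with("copy-on-my-desk", "manifestation-ish")
```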
Hope this all makes some sense. Related discussion from Bradley Allen,
Karen and others:
http://bpa.tumblr.com/post/10814190/faceted-classification-and-frbr
http://www.mail-archive.com/rda-l@listserv.lac-bac.gc.ca/msg03837.html
http://www.mail-archive.com/rda-l@listserv.lac-bac.gc.ca/msg03848.html
http://bibwild.wordpress.com/2007/12/07/frbr-considered-as-set-relationships/
http://lists.w3.org/Archives/Public/public-owl-dev/2008JulSep/0110.html
http://lists.w3.org/Archives/Public/public-lld/2010Sep/0049.html
cheers,
Dan
ps. I tried to draw some of this out graphically:
http://www.flickr.com/photos/danbri/2891150205/ ... story of a
t-shirt design as frbr-inspired classes
http://www.flickr.com/photos/danbri/2892286406/in/photostream/ ...same
story as a timeline