January 16, 2009

Is Metadata?

In Attic Greek, program, πρόγραμμα, meant “a public notice in writing.” In this post I speculate about why the creator of a "program" needs to sit up and take notice of what might happen to his "writing" of data, if that data can flow away from what he's written in stone.

Walter J. Ong (Orality and Literacy, 1982) wrote that oral societies, where nothing can be looked up because it never gets written down, think differently from literate societies, where knowledge gets encoded in a durable form. Ong argued that the literate (literally) can't imagine how an oral society thinks. And vice versa. Culture clash.

Oral societies, Ong wrote, are basically “formulaic”: using content chunks that they've memorized (phrases, proverbs, songs, stories, rhythms), they stitch together and transfer knowledge to whoever's within earshot. Hence, their thinking is formulaic and conservatively tied to the here and now. Why the here and now? Because without recording, sound only exists while it's happening. The rest is memory.

"Chirographic" or “typographic”
(i.e. literate) societies get set free. Once reality is encoded where you can look it up later, you can devote brain power and social power to creating, not just to remembering, and whole new worlds become possible. (See Alex Wright's book and website, Glut: Mastering Information Through the Ages, which step-stones you through.)

The oral status quo sees a shift from an oral attitude to a written one as (at first) ridiculous, and (then) dangerous. In revenge, the "writers" claim that the "speakers" are primitive, and that they threaten chaos.

SO WHAT:

Reliance on top-down literacy leads to chaos, too. Fortunately, natural language can save us from becoming prisoners of our own device.

Once writing takes over, users get into the habit of treating the writing as the reality. Take maps, for example. Early maps were guides to navigators, so they wouldn't bump unhappily into the world. Today, planners and bulldozers write the maps first, and then program the territory to fit the maps. One sad outcome is what realtor Jim Duncan's blog points to: speculation that "programmed" suburban communities may be the looming slums of tomorrow. So you get chaos this way, too.

How does this happen?

Assume a statement (output) made by any “pro-gram” has a vocabulary and a syntax. It uses an axis of selection (vocabulary) and an axis of combination (syntax) to weave together an “object” for a user, out of elemental “attributes”. This is why we call the programming tool a "language." They all do this. But the history of natural language, especially vocabulary--fear this (not!)--is that usage constantly creates new versions of itself.

A definition, that is, of a word, or attribute, or data element, is an emergent property of its use: mathematically, the sum of its meanings-as-applied-in-context. That means that the dictionary, or the metadata repository, is a backward-looking guide to what was probably intended, or meant, in the past, by the "sender" or "programmer" of the data.
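To make "the sum of its meanings-as-applied-in-context" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the usage log, the element name `status`, and the helper `emergent_definitions` are all hypothetical. It only shows that a "definition" can be computed after the fact, as a tally of how an element has actually been used.

```python
from collections import Counter, defaultdict

# Hypothetical usage log: (data element, context in which it was used).
usage_log = [
    ("status", "order shipped"),
    ("status", "order shipped"),
    ("status", "marital status"),
    ("status", "server health"),
]

def emergent_definitions(log):
    """Tally, per element, the contexts in which it has actually been used."""
    meanings = defaultdict(Counter)
    for element, context in log:
        meanings[element][context] += 1
    return meanings

definitions = emergent_definitions(usage_log)

# The "dictionary entry" is a backward-looking snapshot of past use,
# listed with the most frequent sense first.
print(definitions["status"].most_common())
```

Add one more use to the log and the "definition" shifts, which is the sense in which the dictionary always trails the speakers.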

Focused application programmers, bound by business process and funded by short budget cycles, hope and expect that these intentions will be "heard and retold" as intended when the data (representations of facts about things) were created. So you can see that they act like illiterates in spite of themselves. If their data ever escapes their application, they will find it either reappearing, as if chaotically, in landscapes where they never expected to see it, or dying, like dinosaurs, because it only survived (which, for a language, means "meant [x] among a community of users") under a very, very narrow set of conditions.

Socrates, influenced strongly by language, believed that our programmed world is made up of context-compromised, iterative bug-fixes. Everything we do or make, in his terms, is by definition an attempt, in this flawed world, to meet requirements that were so pure they had no context at all.

He worried about writing, because it prevented you from understanding what it "purely" meant: you can’t ask an absent writer to define his terms. You're stuck asking the text, which just keeps repeating what it says. So print (programming) is inherently untrustworthy.

And that's a bug that application architects use metadata to fix: to clarify, "This is what the data means." Metadata is, if...the user can understand the intent of a data element, and his use conforms exactly to the intent.

But...is "metadata" meaningful when the info flows toward a new use, regardless of the original intent?

Hmmm...Ong's unpublished, handwritten notes, recorded by an information specialist, give a hint. He never stopped wondering how literacy would evolve, beyond "secondary orality," in the technologizing of the word. His jottings suggest that in a digital, web-connected world, it's inevitable that you don't know who will use your data--and that's a good thing, even (as he muses in those notes) leading to unanticipated intimacies.

Enter Terry Jones. The key to survival and adaptation, in intimate engagement with one's own fitness landscape, is to continuously re-program the meanings of data attributes into the database itself, in a way that enables new meanings to emerge from how the attributes are used. In this way, attributes can allow objects to adapt--the data can flow to fit an unimagined landscape--even a landscape of its own creation.

Planners and controllers may hate this. But they don't have to.

In a fluid database, I see users owning and writing their data--and the fittest data for the landscape will survive. In this world, metadata isn't, because you don't know who's out there, or what your data will mean to these mystery users tomorrow. Judging from Terry's descriptions, he wants to enable users to re-read and re-write data by recombining it to produce new meanings. A "fluid info" architecture could use data without requiring any definition of "intent" except the use itself. When this happens, the metadata ("authorized intent") becomes a historical artifact--and therefore, like all other metadata, part of the database--not something outside it.
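As a thought experiment only, here is a toy sketch in Python of what such a store might look like. It is not Terry Jones's design or any real product's API. The class `FluidStore`, its methods, and the object ids are hypothetical names invented for this illustration. The two assumptions it encodes are the ones argued above: any user may attach any attribute to any object, and the author's "authorized intent" is stored as just another attribute inside the same store.

```python
from collections import defaultdict

class FluidStore:
    """Toy store (hypothetical): objects have no owner; anyone may tag them."""

    def __init__(self):
        # object id -> {(user, tag name): value}
        self.objects = defaultdict(dict)

    def tag(self, user, obj_id, name, value):
        """Attach (or overwrite) this user's value for a tag on an object."""
        self.objects[obj_id][(user, name)] = value

    def uses_of(self, name):
        """Every recorded use of a tag name, across all users and objects."""
        return [
            (obj_id, user, value)
            for obj_id, tags in self.objects.items()
            for (user, tag_name), value in tags.items()
            if tag_name == name
        ]

store = FluidStore()

# Two users apply the "same" attribute with different meanings.
store.tag("alice", "book:glut", "rating", 5)
store.tag("bob", "book:glut", "rating", "overdue")

# The original intent is itself just data: a tag on an object about the tag.
store.tag("alice", "tag:rating", "intent", "stars from 1 to 5")

for use in store.uses_of("rating"):
    print(use)
```

Nothing in the store enforces alice's intent on bob. What `rating` means is whatever `uses_of("rating")` turns up, and the stated intent sits alongside those uses as one more historical record.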

That's the dizzying contradiction inherent in information--because it is a re-presentation of reality for practical purposes, reality is always feeding back practical updates (for him who has ears to hear) to the re-presenters. They have to adjust their code if they want (it) to survive. So information is born to be communicated AND to be miscommunicated--that is, to result in a meaning you can't have intended, because your instruments are by definition inadequate to process the whole of reality.

The bioworld hums thanks to misinterpretation: when genes mutate, proteins output different structures, and creatures adapt to new landscapes. Or RNA messes with the message en route. By direct analogy, languages change as people interact with and discover new landscapes. If it weren't for misunderstanding, you wouldn't be breathing to read this. And writing (especially programs) is easy to misunderstand--ergo, bugs and bug-fixes that pay the programmer's rent.

So programmers who create top-down metadata are creating something--gasp--by definition, ha-ha, dinosaurish: some of its attributes will last, and many, many will become atomized and get re-incarnated as other meaningful data.

NOW WHAT:

Think, amazingly, of how long it took writing to actually catch on and spread, and you may be able to appreciate the transformation that seems to be what Terry Jones’s database concept anticipates…a new state of human literacy that acts more like a natural language than any DBMS to date.

In my next post I'll take a look at another classic of literary communication theory, Roman Jakobson's "Closing Statement," and ask again, "Is Metadata?"