But now when you’re designing a page you can tell CSS to use the system font of whatever operating system the browser is running on. This is thanks to Craig Hockenberry, who proposed the idea in an article three years ago. Apple picked up on it, and now it’s worked its way into the standard CSS font module and is supported by Chrome and Safari; Microsoft and Mozilla are lagging. Here’s Craig’s write-up of the process.

Here’s a quick test of whether it’s working in the browser you’re reading this post with:

This sentence should be in this blog’s designated font: Georgia. Or maybe one of its serif-y fall-backs.

This one should be in your operating system’s standard font, at least if you’re using Chrome or Safari at the moment.

In 1962, Claude Levi-Strauss brought the concept of bricolage into the anthropological and philosophical lexicons. It has to do with thinking with one’s hands, putting together new things by repurposing old things. It has since been applied to the Internet (including, apparently, by me, thanks to a tip from Rageboy). The term “bricolage” uncovers something important about the Net, but it also covers up something fundamental about the Net that has been growing even more important.

In The Savage Mind (relevant excerpt), CLS argued against the prevailing view that “primitive” peoples were unable to form abstract concepts. After showing that they often have extensive sets of concepts for flora and fauna, he maintains that these concepts go beyond what they pragmatically need to know:

…animals and plants are not known as a result of their usefulness; they are deemed to be useful or interesting because they are first of all known.

It may be objected that science of this kind can scarcely be of much practical effect. The answer to this is that its main purpose is not a practical one. It meets intellectual requirements rather than or instead of satisfying needs.

It meets, in short, a “demand for order.”

CLS wants us to see the mythopoeic world as being as rich, complex, and detailed as the modern scientific world, while still drawing the relevant distinctions. He uses bricolage as a bridge for our understanding. A bricoleur scavenges the environment for items that can be reused, getting their heft, trying them out, fitting them together and then giving them a twist. The mythopoeic mind engages in this bricolage rather than in the scientific or engineering enterprise of letting a desired project assemble the “raw materials.” A bricoleur has what s/he has and shapes projects around that. And what the bricoleur has generally has been fashioned for some other purpose.

Bricolage is a very useful concept for understanding the Internet’s mashup culture, its culture of re-use. It expresses the way in which one thing inspires another, and the power of re-contextualization. It evokes the sense of invention and play that is dominant on so much of the Net. While the Engineer is King (and, all too rarely, Queen) of this age, the bricoleurs have kept the Net weird, and bless them for it.

But there are at least two ways in which this metaphor is inapt.

First, traditional bricoleurs don’t have search engines that let them in a single glance look across the universe for what they need. Search engines let materials assemble around projects, rather than projects be shaped by the available materials. (Yes, this distinction is too strong. Yes, it’s more complicated than that. Still, there’s some truth to it.)

Second, we have been moving with some consistency toward a Net that at its topmost layers replicates the interoperability of its lower layers. Those low levels specify the rules — protocols — by which networks can join together to move data packets to their destinations. Those packets are designed so they can be correctly interpreted as data by any recipient application. As you move up the stack, you start to lose this interoperability: Microsoft Word can’t make sense of the data output by Pages, and a graphics program may not be able to make sense of the layer information output by Photoshop.

But, over time, we’re getting better at this:

Applications add import and export services as the market requires. More consequentially, more and richer standards for interoperability continue to emerge, as they have from the very beginning: FTP, HTML, XML, Dublin Core, Schema.org, the many Semantic Web vocabularies, ontologies, and schema, etc.

More important, we are now taking steps to make sure that what we create is available for re-use in ways we have not imagined. We do this by working within standards and protocols. We do it by putting our work into the sphere of reusable items, whether that’s by applying the Creative Commons license, putting our work into a public archive, or even just paying attention to what will make our work more findable.

This is very different from the bricoleur’s world, in which objects are designed for one use and it takes the ingenuity of the bricoleur to find new uses for them.

This movement continues the initial work of the Internet. From the beginning the Net has been predicated on providing an environment with the fewest possible assumptions about how it will be used. The Net was designed to move anyone’s information no matter what it’s about, what it’s for, where it’s going, or who owns it. The higher levels of the stack are increasingly realizing that vision. The Net is thus more than ever becoming a universe of objects explicitly designed for reuse in unexpected ways. (An important corrective to this sunny point of view: Christian Sandvig’s brilliant description of how the Net has incrementally become designed for delivering video above all else.)

Insofar as we are explicitly creating works designed for unexpected reuse, the bricolage metaphor is flawed, as all metaphors are. It usefully highlights the “found” nature of so much of Internet culture. It puts into the shadows, however, the truly transformative movement we are now living through in which we are explicitly designing objects for uses that we cannot anticipate.

Because I don’t actually enjoy watching football, during the Pats vs. Broncs game on Sunday I transposed the Oscar™ nominations into a JSON file. I did this very badly, but I did it. If you look at it, you’ll see just how badly I misunderstand JSON on some really basic levels.

Why JSON™? Because it’s an easy format for inputting data into JavaScript™ or many other languages. It’s also human-readable, if you have a good brain for indents. (This is very different from having many indents in your brain, which is one reason I don’t particularly like to watch football™, even with the helmets and rules, etc.)

Anyway, JSON puts data into key:value™ pairs, and lets you nest them into sets. So, you might have a keyword such as “category” that would have values such as “Best Picture™” and “Supporting Actress™.” Within a category you might have a set of keywords such as “film_title” and “person” with the appropriate values.

JSON is such a popular way of packaging up data for transport over the Web™ that many (most? all?) major languages have built-in functions for transmuting it into data that the language can easily navigate.
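Here’s a minimal sketch of what that nesting looks like and how one of those built-in functions (Python’s `json.loads`) turns it into something navigable. The layout is my assumption, not the structure of my actual file, and the film and person names are placeholders rather than real nominees.

```python
import json

# Hypothetical layout: a list of categories, each holding a list of nominees.
# Keys follow the description above; names are placeholders, not real nominees.
nominations_json = """
{
  "nominations": [
    {
      "category": "Best Picture",
      "nominees": [
        {"film_title": "Film A", "person": "Producer A"},
        {"film_title": "Film B", "person": "Producer B"}
      ]
    },
    {
      "category": "Supporting Actress",
      "nominees": [
        {"film_title": "Film C", "person": "Actress C"}
      ]
    }
  ]
}
"""

# json.loads turns the text into nested Python dicts and lists.
data = json.loads(nominations_json)

for category in data["nominations"]:
    print(category["category"])
    for nominee in category["nominees"]:
        print("  ", nominee["person"], "-", nominee["film_title"])
```

Each set of curly braces becomes a dictionary and each set of square brackets a list, so “the second nominee for Best Picture” is just `data["nominations"][0]["nominees"][1]`.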

So, why bother putting the Oscar™ nomination info into JSON? In case someone wants to write an app that uses that info. For example, if you wanted to create your own Oscar™ score sheet or, to be honest, entry tickets for your office pool, you could write a little script and output it exactly as you’d like. (Or you could just google™ for someone else’s Oscar™ pool sheet.) (I also posted a terrible little PHP script™ that does just that.)
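For a sense of how little code such a script needs, here’s a rough Python equivalent of a pool-sheet generator. It is not the PHP script mentioned above; the data layout and the checkbox format are assumptions for illustration.

```python
import json

def pool_sheet(nominations):
    """Render a plain-text ballot: one checkbox line per nominee."""
    lines = []
    for category in nominations:
        lines.append(category["category"].upper())
        for nominee in category["nominees"]:
            lines.append(f"[ ] {nominee['film_title']} ({nominee['person']})")
        lines.append("")  # blank line between categories
    return "\n".join(lines)

# Placeholder data in the same hypothetical layout as before.
sample = json.loads("""
[
  {"category": "Best Picture",
   "nominees": [{"film_title": "Film A", "person": "Producer A"},
                {"film_title": "Film B", "person": "Producer B"}]}
]
""")

print(pool_sheet(sample))
```

Change the one line inside the inner loop and you get a different sheet, which is the whole point of putting the data into a neutral format first.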

On Wednesday and Thursday I went to the second LODLAM (linked open data for libraries, archives, and museums) unconference, in Montreal. I’d attended the first one in San Francisco two years ago, and this one was almost as exciting — “almost” because the first one had more of a new car smell to it. This is a sign of progress and by no means is a complaint. It’s a great conference.

But, because it was an unconference with up to eight simultaneous sessions, there was no possibility of any single human being getting a full overview. Instead, here are some overall impressions based upon my particular path through the event.

Serious progress is being made. E.g., Cornell announced it will be switching to a full LOD library implementation in the Fall. There are lots of great projects and initiatives already underway.

Some very competent tools have been developed for converting to LOD and for managing LOD implementations. The development of tools is obviously crucial.

There isn’t obvious agreement about the standard ways of doing most things. There’s innovation, re-invention, and lots of lively discussion.

Some of the most interesting and controversial discussions were about whether libraries are being too library-centric and not web-centric enough. I find this hugely complex and don’t pretend to understand all the issues. (Also, I find myself — perhaps unreasonably — flashing back to the Standards Wars in the late 1980s.) Anyway, the argument crystallized to some degree around BIBFRAME, the Library of Congress’ initiative to replace and surpass MARC. The criticism raised in a couple of sessions was that Bibframe (I find the all caps to be too shouty) represents how libraries think about data, and not how the Web thinks, so that if Bibframe gets the bib data right for libraries, Web apps may have trouble making sense of it. For example, Bibframe is creating its own vocabulary for talking about properties that other Web standards already have names for. The argument is that if you want Bibframe to make bib data widely available, it should use those other vocabularies (or, more precisely, namespaces). Kevin Ford, who leads the Bibframe initiative, responds that you can always map other vocabs onto Bibframe’s, and while Richard Wallis of OCLC is enthusiastic about the very webby Schema.org vocabulary for bib data, he believes that Bibframe definitely has a place in the ecosystem. Corey Harper and Debra Riley-Huff, on the other hand, gave strong voice to the cultural differences. (If you want to delve into the mapping question, explore the argument about whether Bibframe’s annotation framework maps to Open Annotation.)
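To make the mapping question concrete, here’s a toy sketch of what “mapping one vocabulary onto another” involves. The library-side property names are hypothetical, not actual Bibframe terms; the target names (`name`, `author`, `datePublished`, `publisher`) are real Schema.org properties of `Book`. The interesting part is the leftovers: anything without a Web-side name is invisible to apps that only speak Schema.org.

```python
# Hypothetical library-side property names (NOT actual Bibframe terms),
# mapped onto real Schema.org properties of schema:Book.
LIBRARY_TO_SCHEMA_ORG = {
    "titleProper": "name",
    "primaryAuthor": "author",
    "publicationDate": "datePublished",
    "publisherName": "publisher",
}

def to_schema_org(record):
    """Re-key a record for Web consumers; report what the mapping missed."""
    mapped = {"@context": "https://schema.org", "@type": "Book"}
    unmapped = {}
    for key, value in record.items():
        target = LIBRARY_TO_SCHEMA_ORG.get(key)
        if target is None:
            unmapped[key] = value  # no Web-side name, so Web apps won't see it
        else:
            mapped[target] = value
    return mapped, unmapped

book, leftovers = to_schema_org({
    "titleProper": "Moby-Dick",
    "primaryAuthor": "Herman Melville",
    "publicationDate": "1851",
    "localNote": "example local note",
})
```

Mapping works, as Kevin Ford says, but every map has to be written and maintained, and the leftovers are where the cultural differences show up.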

I should add that although there were some strong disagreements about this at LODLAM, the participants seemed to be genuinely respectful of one another.

LOD remains really really hard. It is not a natural way of thinking about things. Of course, neither are old-fashioned database schemas, but schemas map better to a familiar forms-based view of the world: you fill in a form and you get a record. Linked data doesn’t even think in terms of records. Even with the new generation of tools, linked data is hard.
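A small sketch of the difference, using a made-up `example.org` namespace: the form-style record is one row, while the linked-data version is a pile of standalone statements in which the author is a resource with its own identifier. A “record” then becomes just a query over those statements.

```python
EX = "http://example.org/"  # hypothetical namespace for illustration

# A form-style record: every field about the book in one row.
record = {"title": "Moby-Dick", "author": "Herman Melville", "year": "1851"}

# The same information as linked data: (subject, predicate, object) statements.
# The author is a resource with its own URI, not a string inside the row.
triples = [
    (EX + "book/1", EX + "title", "Moby-Dick"),
    (EX + "book/1", EX + "author", EX + "person/melville"),
    (EX + "book/1", EX + "year", "1851"),
    (EX + "person/melville", EX + "name", "Herman Melville"),
]

def describe(subject, graph):
    """Gather every statement about one subject: a 'record' is just a query."""
    return {p.removeprefix(EX): o for s, p, o in graph if s == subject}

book = describe(EX + "book/1", triples)
# Getting the author's name means following a link, not reading a field.
author_name = describe(book["author"], triples)["name"]
```

That extra hop is exactly what makes linked data powerful (anyone can add statements about Melville) and exactly what makes it feel unnatural to people used to filling in forms.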

He says many Medieval manuscripts are being digitized. The Mellon Foundation is funding many such projects. But these have tended to reinvent the same tech, and have not been designed for interoperability with other projects. So the Digital Medieval Initiative was founded, with a long list of prestigious partners. They thought about what they’d like: distributed, linked data, interoperable, etc. For this they need a shared description format.

The traditional approach is to annotate an image of a page. But it can be very difficult to know which images to annotate; he gives as an example a page that has fold-outs. “The naive assumption is that an image equals a page.” But there may be fragments, or only portions of the page may have been digitized (e.g., the illuminations), etc. There may be multiple images on a page, revealed by multi-spectral imaging. There may be multiple orientations of the page, etc.

The solution? The canvas paradigm. A canvas is an empty space corresponding to the rectangle (or whatever) of the page. You allow rich resources to be associated with it, and allow users to comment. For this, they use Open Annotation. You can specify a choice of images. You can associate text with an area of the canvas. There are lots of different ways to visualize those comments: overlays, side-by-side, etc.
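Here’s my rough understanding of the idea as a simplified Python model. It is not the actual SharedCanvas or Open Annotation serialization: the “painting”/“commenting” labels and the use of a W3C Media Fragments-style `#xywh=` rectangle are my assumptions, and the URIs are hypothetical. The key move is that images are just one more kind of annotation on the canvas.

```python
from dataclasses import dataclass, field

@dataclass
class Canvas:
    """An empty coordinate space standing in for one page, not one image."""
    uri: str
    width: int
    height: int
    annotations: list = field(default_factory=list)

    def annotate(self, body, x, y, w, h, motivation="commenting"):
        """Attach a resource or comment to a rectangle of the canvas."""
        if x < 0 or y < 0 or x + w > self.width or y + h > self.height:
            raise ValueError("region falls outside the canvas")
        self.annotations.append({
            "body": body,
            # Media Fragments-style rectangle: x, y, width, height.
            "target": f"{self.uri}#xywh={x},{y},{w},{h}",
            "motivation": motivation,
        })

    def images(self):
        """Images are annotations too, so one page can carry several."""
        return [a["body"] for a in self.annotations
                if a["motivation"] == "painting"]

# Hypothetical URIs for one folio: an old full-page scan, a newer color
# scan of just the illumination, and a scholar's comment on a margin.
page = Canvas("http://example.org/ms1/f12r", 1000, 1500)
page.annotate("http://example.org/img/f12r-old-scan.jpg", 0, 0, 1000, 1500,
              motivation="painting")
page.annotate("http://example.org/img/f12r-illumination.jpg", 100, 200, 400, 300,
              motivation="painting")
page.annotate("Gloss in a later hand?", 850, 50, 120, 600)
```

Because the page is the canvas rather than any one image, fold-outs, fragments, and partial scans all become annotations on the same space instead of competing claims to be “the page.”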

You can build hybrid pages. For example, an old scan might have a new color scan of its illustrations pointing at it. Or you could have a recorded performance of a piece of music pointing at the musical notation.

In summary, the SharedCanvas model uses open standards (HTML 5, Open Annotation, TEI, etc.) and can be implemented in a distributed way across repositories, encouraging engagement by domain experts.

John Palfrey and Urs Gasser are giving a book talk at Harvard about their new book, Interop. (It’s really good. Broad, thoughtful, engaging. Not at all focused on geeky tech issues.) NOTE: Posted without re-reading

JP says the topic of interop seems on the face of it like it should be “very geeky and very dull.” He says the book started out fairly confined, about the effect of interop on innovation. But as they worked on it, it got broader. E.g., the Facebook IPO has been spun about the stock price’s ups and downs. But from an interop perspective, the story is about why FB was worth $100B or more when its revenues don’t indicate any such thing. It’s because FB’s interop in our lives makes it hard to extract. But from this also come problems, which is why the subtitle of the book Interop talks about its peril.

In the book, JP and Urs look at how railroad systems became interoperable. Currency is like that, too: currencies vary but we are able to trade across borders. This has been great for the global economy, but it also creates problems. E.g., the Greek economic meltdown shows the interdependencies of economies.

The book gives a concise def of interop: “The ability to transfer and render useful data and other information across systems (including organizations), applications or components.” But that is insufficient. The book sees interop more broadly as “The art and science of working together.” The book talks about interop in terms of four levels: data, tech, humans, and institutions.

They view the book as an inquiry, some of which is expressed in a series of case studies and papers.

Urs takes the floor. He’s going to talk about a few case studies.

First, how can we make our cities smarter using tech? (Urs shows an IBM video that illustrates how dependent we are on sharing information.) He draws some observations:

Interop is not black or white. Many degrees. E.g., power plugs are not interoperable around the world, but there are converters. Or, international air travel requires a lot of interop among the airlines.

Interop is a design challenge. In fact, once you’ve gotten interop wrong, it’s hard to make it right. E.g., it took a long time to fix air traffic control systems because there was a strongly embedded legacy system.

There are important benefits, including systems efficiency, user choice, and economic growth.

Urs points to their four-layer model. To make a smart city, the tech the firefighters and police use needs to interop, as does their data. But at the human layer, the language used can vary among branches; e.g., “333” might code one thing for EMTs and another for the police. At the institutional layer, the laws for privacy might not be interoperable, making it hard for businesses to work globally.

Second example: When Facebook opened its APIs so that other apps could communicate with FB, there was a spike in innovation; 4k apps were made by non-FB devs that plug into FB. FB’s decision to become more interoperable led to innovation. Likewise for Twitter. “Much of the story behind Twitter is an interop question.”

Likewise for Ushahidi; after the Haitian earthquake, it made a powerful platform that enabled people to share and accumulate info, mapping it, across apps and devices. This involved all layers of the interop stack, from data to institutions such as the UN pitching in. (Urs also points to safe2pee.org :)

Observations:

There’s a cycle of interop, competition, and innovation.

There are theories of innovation, including generativity (Zittrain), user-driven innovation (Von Hippel) and small-step innovations (Christensen).

Caveat: More interop isn’t always good. A highly interop business can take over the market, creating a de facto monopoly, and suppressing innovation.

Interop also can help diffuse adoption. E.g., the transition to high def tv: it only took off once the tvs were able to interoperate between analog and digital signals.

Example 3: Credit cards are highly interoperable: whatever your buying opportunity is, you can use a selection of cards that work with just about any bank. Very convenient.

More interop creates more problems because it means there are more connection points.

Example 4: Cell phone chargers. Traditionally phones had their own chargers. Why? Europe addressed this by the “Sword of Damocles” approach that said that if the phone makers didn’t get their act together, the EC would regulate them into it. The micro-USB charger is now standard in Europe.

Observations:

It can take a long time, because of the many actors, legacy problems, and complexity.

It’s useful to think about these issues in terms of a 2×2 of regulation/non-regulation, and collaborative-unilateral.

JP is back up. He is going to talk about libraries and the preservation of knowledge as interop problems. Think about this as an issue of maintaining interop over time. E.g., try loading up one of your floppy disks. The printed version is much more useful over the long term. Libraries find themselves in a perverse situation: with e-books, they can offer much less than they can with physical books. Five of the six major publishers won’t let libraries lend e versions. It’d make sense to have new books provided in an open standard format. Even if libraries could lend the books, people might not have the interoperable tech required to read them. Yet libraries are spending more on e-books, and less on physical. If libraries have digital copies and not physical copies, they are vulnerable to tech changes. How do we ensure that we can continuously update? The book makes a fairly detailed suggestion. But as it stands, as we switch from one format to another over time, we’re in worse shape than if we had physical books. We need to address this. “When it comes to climate change, or electronic health records, or preservation of knowledge, interop matters, both as a theory and as a practice.” We need to do this by design up front, deciding what the optimal interop is in each case.

Q&A

Q: [doc searls] Are there any places where you think we should just give up?

A: [jp] I’m a cockeyed optimist. We thought that electronic health records in the US is the hardest case we came across.

Q: How does the govt conduct consultations with experts from across the US? What would it take to create a network of experts?

A: [urs] There are lots of expert networks that have emerged, enabled by tech that fosters human interoperability from the bottom up.
A: [jp] It’s not clear to me that we want that level of consultation. I don’t know that we could manage direct democracy enabled in that way.

Q: What are the limits you’d like to see emerge on interop? I.e., I’m thinking of problems of hyper-coherence in bio: a single species of rice or corn that may be more efficient can, with a single blight, turn out to have been a big mistake. How do you build in systems of self-limit?

[urs] We try to address this somewhat in a chapter on diversity, which begins with biodiversity. When we talk about interop, we do not suggest merging or unifying systems. To the contrary, interop is a way to preserve diversity, and prevent fragmentation within diversity. It’s extremely difficult to find the optimums, which vary from case to case, and to decide on which speed bumps to put in place.
[jp] You’ve gone to the core of what we’re thinking about.

Q: Human autonomy, efficiency, and economic growth are three of the benefits you mention, but they can be in conflict with one another. How important are decentralized systems?

[urs] We’re not arguing in favor of a single system, e.g., that we have only one type of cell phone. That’s exactly not what we’re arguing for. You want to work toward the sweet spot of interop.
[jp] They are in tension, but there are some highly complex systems where they coexist. E.g., the Web.

Q: Yes, having a single cell phone charger is convenient. But there may be a performance tradeoff, where you can’t choose the optimum voltage if you standardize on 5V. And an innovation deficit: you won’t get magnetic plugs, etc.

[urs] Yes. This is one of the potential downsides of interop. It may lock you in. When you get interop by choosing a standard, you freeze the standard for the future. So one of the additional challenges is: how can we incorporate mechanisms of learning into standards-setting? Cell phone chargers don’t have a lot of layers on top of them, so the standardization doesn’t ripple through generativity as much. And that’s what the discussion should be about.

According to an article in Science Insider by Dennis Normile, a group formed at a symposium sponsored by the Board on Global Science and Technology, of the National Research Council, an arm of the U.S. National Academies [that’s all they’ve got??] is proposing making it easier to find big scientific data sets by using a standard tag, along with a standard way of conveying the basic info about the nature of the set, and its terms of use. “The group hopes to come up with a protocol within a year that researchers creating large data sets will voluntarily adopt. The group may also seek the endorsement of the Internet Engineering Task Force…”

Upload a set of data, and it will do some semi-spiffy visualizations of it. (As Apryl DeLancey points out, Martin Wattenberg and Fernanda Viegas now work for Google, so if they’re working on this project, the visualizations are going to get much better.) More important, the data you upload is now publicly available. And, more important than that, the site wants you to upload your data in Google’s DSPL format. DSPL aims at getting more metadata into datasets, making them more understandable, integrate-able, and re-usable.

So, let’s say you have spreadsheets of “statistical time series for unemployment and population by country, and population by gender for US states.” (This is Google’s example in its helpful tutorial.)

You would supply a set of concepts (“population”), each with a unique ID (“pop”), a data type (“integer”), and explanatory information (“name=population”, “definition=the number of human beings in a geographic area”). Other concepts in this example include country, gender, unemployment rate, etc. [Note that I’m not using the DSPL syntax in these examples, for purposes of readability.]
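In the same spirit of readability over fidelity, here’s that concept list as Python data. This is not DSPL’s XML syntax, and the `Concept` class and its fields are my own hypothetical stand-ins for the pieces described above (ID, data type, explanatory info).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    id: str           # unique ID that tables and slices refer to
    name: str
    data_type: str    # e.g. "integer", "float", "string"
    definition: str = ""
    has_table: bool = False  # True for concepts with a known set of members

CONCEPTS = {c.id: c for c in [
    Concept("pop", "population", "integer",
            "the number of human beings in a geographic area"),
    Concept("country", "country", "string", has_table=True),
    Concept("gender", "gender", "string", has_table=True),
    Concept("unemployment", "unemployment rate", "float"),
]}
```

Everything else in the dataset refers back to these IDs, which is what lets a tool know that the “pop” column in some table means the same thing as the “pop” column in another.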

For concepts that have some known set of members (e.g., countries, but not unemployment rates), you would create a table — a spreadsheet in CSV format — of entries associated with that concept.
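A sketch of reading such a concept table, assuming a simple two-column CSV (the column names and rows here are illustrative, not Google’s canonical country list):

```python
import csv
import io

# Hypothetical contents of a country concept table. Each row is one member
# of the concept; the "country" column holds the ID used elsewhere.
COUNTRIES_CSV = """country,name
US,United States
FR,France
JP,Japan
"""

def load_members(csv_text, id_column):
    """Read a concept table into {member_id: row} for lookups."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column]: row for row in reader}

countries = load_members(COUNTRIES_CSV, "country")
```

Unemployment rates get no such table because there’s no fixed list of possible rates to enumerate; countries do.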

If your dataset uses one of the familiar types of data, such as a year, geographical position, etc., you would reference the “canonical concepts” defined by Google rather than defining your own.

You create a “slice” or two, that is, “a combination of concepts for which data exists.” A slice references a table that consists of concepts you’ve already defined and the pertinent values (“dimensions” and “metrics” in Google’s lingo). For example, you might define a “countries slice” table that on each row lists a country, a year, and the country’s population in that year. This table uses the unique IDs specified in your concepts definitions.
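Here’s the “countries slice” idea sketched in Python, again not in DSPL syntax. The slice definition names its dimensions and metrics by concept ID, and the loader indexes each metric by the combination of dimension values. The numbers are made-up placeholders, not real population figures.

```python
import csv
import io

# A slice declares which concepts combine: dimensions identify a row,
# metrics are the measured values. IDs match the concept definitions.
COUNTRIES_SLICE = {"dimensions": ["country", "year"], "metrics": ["pop"]}

# Placeholder numbers for illustration only, not real statistics.
SLICE_CSV = """country,year,pop
US,2009,100
US,2010,110
FR,2010,50
"""

def load_slice(slice_def, csv_text):
    """Index metric values by their tuple of dimension values."""
    rows = csv.DictReader(io.StringIO(csv_text))
    columns = slice_def["dimensions"] + slice_def["metrics"]
    missing = [c for c in columns if c not in (rows.fieldnames or [])]
    if missing:
        raise ValueError(f"table lacks columns {missing}")
    data = {}
    for row in rows:
        key = tuple(row[d] for d in slice_def["dimensions"])
        if key in data:
            raise ValueError(f"duplicate row for {key}")
        data[key] = {m: int(row[m]) for m in slice_def["metrics"]}
    return data

data = load_slice(COUNTRIES_SLICE, SLICE_CSV)
```

The duplicate check reflects what a slice means: for a given country and year there should be exactly one population value.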

Finally, you can create a dataset that defines topics hierarchically so that users can more easily navigate the data. For example, you might want to indicate that “population” is just one of several characteristics of “country.” Your topic dataset would define those relations. You’d indicate that your “population” concept is defined in the topic dataset by including the “population topic” ID (from the topic dataset) in the “population” concept definition.
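A minimal sketch of that topic hierarchy, with hypothetical topic IDs: each topic points at its parent, and concepts point at topics by ID. Walking up the parents gives the breadcrumb path a user would navigate.

```python
# Hypothetical topic hierarchy: each topic ID maps to its parent (None = root).
TOPICS = {
    "country_info": None,
    "population_topic": "country_info",
    "economy_topic": "country_info",
    "unemployment_topic": "economy_topic",
}

# Concepts point at topics by ID, as described above.
CONCEPT_TOPIC = {"pop": "population_topic", "unemployment": "unemployment_topic"}

def topic_path(topic_id):
    """Walk up from a topic to the root, for breadcrumb-style navigation."""
    path = []
    while topic_id is not None:
        path.append(topic_id)
        topic_id = TOPICS[topic_id]
    return list(reversed(path))
```

Note that each topic has exactly one parent; that’s what makes this a taxonomy rather than an ontology, which is the distinction I wonder about below.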

When you’re done, you have a data set you can submit to Google Public Data Explorer, where the public can explore your data. But, more important, you’ve created a dataset in an XML format that is designed to be rich in explanatory metadata, is portable, and is able to be integrated into other datasets.

Overall, I think this is a good thing. But:

While Google is making its formats public, and even its canonical definitions are downloadable, DSPL is “fully open” for use, but fully Google’s to define. Having the 800-lb gorilla defining the standard is efficient and provides the public platform that will encourage acceptance. And because the datasets are in XML, Google Public Data Explorer is not a roach motel for data. Still, it’d be nice if we could influence the standard more directly than via an email-the-developers text box.

Defining topics hierarchically is a familiar and useful model. I’m curious about the discussions behind the scenes about whether to adopt or at least enable ontologies as well as taxonomies.

Also, I’m surprised that Google has not built into this standard any expectation that data will be sourced. Suppose the source of your US population data is different from the source of your European unemployment statistics? Of course you could add links into your XML definitions of concepts and slices. But why isn’t that a standard optional element?

Further (and more science fictional), it’s becoming increasingly important to be able to get quite precise about the sources of data. For example, in the library world, the bibliographic data in MARC records often comes from multiple sources (local cataloguers, OCLC, etc.) and it is turning out to be a tremendous problem that no one kept track of who put which datum where. I don’t know how or if DSPL addresses the sourcing issue at the datum level. I’m probably asking too much. (At least Google didn’t include a copyright field as standard for every datum.)
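To show what I mean by datum-level sourcing, here’s a hypothetical design in which every value carries its own source. Nothing like this is in DSPL as far as I know, and the source labels below are placeholders, not real provenance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Datum:
    """A single value that remembers who supplied it (hypothetical design)."""
    value: object
    source: str

# A bibliographic record assembled from several sources, field by field.
record = {
    "title": Datum("Moby-Dick", "local cataloguer"),
    "subject": Datum("Whaling -- Fiction", "union catalog import"),
    "date": Datum("1851", "union catalog import"),
}

def fields_from(rec, source):
    """Which fields did a given source contribute?"""
    return sorted(k for k, d in rec.items() if d.source == source)
```

The cost is obvious: every value gets heavier. The benefit is that when one source turns out to be wrong, you can find everything it touched.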

Despite the title of Andrew Conry-Murray’s article in InformationWeek — “Why Business IT Shouldn’t Shrug Off Chrome OS” — it’s on balance quite negative about the prospects for enterprises adopting Google’s upcoming operating system. Andrew argues that enterprises are going to want hybrid systems, Microsoft is already moving into the Cloud, Windows 7 will have been out for a year before Chrome is available, and it’d take a rock larger than the moon to move enterprises off their legacy applications. All good points. (The next article in the issue, by John Foley, is more positive about Chrome overall.)

A couple of days ago I heard a speech by Federal CTO Aneesh Chopra at the Open Government Innovations conference (#ogi to you Twitter buffs). It was fabulous. Aneesh — and he’s an informal enough speaker that I feel ok first-naming him — loves the Net and loves it for the right reasons. (“Right” of course means I agree with him.) The very first item on his list of priorities might be moon-sized when it comes to enterprise IT: Support open standards.

So, suppose the government requires contractors and employees to use applications that save content in open standards. In the document world, that means ODF. Now, ISO also approved a standard favored by (= written by) Microsoft, OOXML, that is far more complex and is highly controversial. There is an open source plug-in for Word that converts Word documents to those formats (apparently Microsoft aided in its development), but that’s not quite native support. So, imagine the following scenario (which I am totally making up): The federal government not only requires that the docs it deals with are in open standard formats, it switches to open source desktop apps in order to save money on license fees. (Vivek Kundra switched tens of thousands of DC employees to open source apps for this reason.) OOXML captures more of the details of a Word document, but ODF is a more workable standard, and it’s the format of the leading open source office apps. If the federal government were to do this, ODF stands a chance of becoming the safe choice for interchanging documents; it’s the one that will always work. And in that case, enterprises might find Word to be over-featured and insufficiently ODF-native.

Now, all of this is pure pretend. And even if ODF were to become the dominant document standard, Microsoft could support it robustly, although that might mean that some of Word’s formatting niceties wouldn’t make the transition. Would business be ok with that? For creators, probably yes; it’d be good to be relieved of the expectation that you will be a document designer. For readers, no. We’ll continue to want highly formatted documents. But, then ODF + formatting specifications can produce quite respectably formatted docs, and that capability will only get better.

Joi Ito posts about whether we’ve agreed upon the syntax of retweeting: If I want to twitter one of your tweets and add my own comment, do I do it as “RT @you: your comment Me: My comment” or as “RT @you:your comment [Me: my comment]” or what? Of course, there was a bunch of twittering about this, which Joi captures.
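Why does it matter which convention wins? Because software has to parse it. Here’s a toy parser for the two forms above; the regular expressions are my own illustration, not any Twitter convention or API. With brackets the added comment can be pulled out cleanly; without them, “Me:” could just as easily be part of the original tweet, so the parser can’t separate it.

```python
import re

# Two candidate conventions from the post; the patterns are illustrative.
#   "RT @you:your comment [Me: my comment]"  (bracketed, unambiguous)
#   "RT @you: your comment Me: My comment"   (unbracketed, ambiguous)
BRACKETED = re.compile(r"^RT @(\w+):\s*(.*?)\s*\[(\w+):\s*(.*)\]$")
RT_PREFIX = re.compile(r"^RT @(\w+):\s*(.*)$")

def parse_retweet(text):
    """Split a retweet into (original author, quoted text, added comment)."""
    m = BRACKETED.match(text)
    if m:
        return m.group(1), m.group(2), m.group(4)
    m = RT_PREFIX.match(text)
    if m:
        # Without brackets, "Me:" might belong to the quoted tweet,
        # so the comment cannot be separated reliably.
        return m.group(1), m.group(2), None
    return None
```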

It’s fun to watch syntax emerge. As Ethanz tweets: “Microformat development in 140 chars or less…”