SemTech 2008: Eric Miller (Zepheira) – "Reuse, Repurpose, Remix"

Eric Miller from Zepheira gave the second keynote yesterday, covering some of Zepheira’s open-source development activities that have reached a level he thought we might be interested in. The aim of the talk was to show that it is possible to reduce the costs for people who want to mix together data from lots of different sources, while hiding much of the complexity that makes that happen.

He began with a story about when his dad was in hospital with cancer: it was a comedy of errors (“many errors with not much comedy”). As he went from one department to another, they couldn’t correlate any information because their patient care model had no primary key to aid with the combination of that information. Talking to the doctors and others in this space, Eric realised how alarmingly frequent that is. Common statements were “the systems weren’t designed to do that”, “we can’t do that”, etc., resulting in general frustration. That pattern in the hospital is repeated across various businesses and organisations. Eric said that there are too many important things that we as a community need right now, so we need a useful reusable infrastructure to solve various problems, and one way is to use the Web. We can bring lessons learned from the Web back into these organisations.

He then moved on to talk about some of the things we can do to make the required bridges stronger. There’s a common theme (when talking to different people and groups in health, climate change, etc.) of a requirement for such bridging technologies. A lot of the solutions exist, so we just have to stick the parts of the answer together. If we could figure out how to connect these together, then we can have a serious jump on the problem(s). Lessons from the Web (and the Semantic Web) can be applicable to managing information from these enterprise or organisational spaces.

He talked about a document analogy. A big change on the Web from several years ago was the blog. Before then, the so-called Read/Write Web had a disproportionate amount of the “read” aspect to it. People began adding little bits of structure to the creation of content in blogs. We can take advantage of likeness factors or patterns in communities (of bloggers): it’s a very powerful aspect. This little bit of structure can feed into larger communities, e.g. Technorati leverages the structure from multiple blogs.

He then talked about a music analogy. Sid Vicious did Sinatra’s “My Way”. Apple’s GarageBand reduced the technical barriers for people to reuse lyrics and music, allowing people to get more creative about how they could use each other’s data. Recently, NIN made their multi-track files available for remixing. Just as in the document analogy, this is adding more structure to the content which allows people to take this and do more with it. This also takes advantage of the network effect, by leveraging multiple community contributions across available repurposable data (not just for one song or one individual). As a result, we get services like MusicBrainz where we can also see patterns around music.

In this way, we can stop worrying so much about whether it is a spreadsheet, a database, whatever. [These are all just parts that can be brought together, and you don’t have to settle on a particular format or storage mechanism to progress.]

From an action standpoint, Eric said that this corresponds to: create, publish, and analyse. For documents, the corresponding action stream is from creating a blog text to publishing on the Blogger website to mass analyses via Technorati. For music, this could be from creating a song in GarageBand to publishing via iTunes to analysis in MusicBrainz. Finally, for data, Eric will show us this process using Exhibit, Remix and Studio.

He gave a demo of Exhibit from MIT SIMILE. Exhibit is a software service for rendering data. You ship data to it and you get back a facetted navigation system. You don’t need to install a database, and you don’t have to create a business logic tier. You can style it in different ways, and look at it in different “lenses”.
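To make the idea of faceted navigation a bit more concrete, here is a minimal Python sketch of what a facet does: count the values of a field, then narrow the records by a selection. This is not Exhibit's code or API; the records, field names and the helper functions `facet_counts` and `apply_filters` are all invented for illustration.

```python
from collections import Counter

# Hypothetical records; names and fields are made up for this sketch.
records = [
    {"name": "Clinic A", "city": "Boston", "specialty": "oncology"},
    {"name": "Clinic B", "city": "Boston", "specialty": "cardiology"},
    {"name": "Clinic C", "city": "Denver", "specialty": "oncology"},
]


def facet_counts(items, field):
    """Count how many items carry each value of `field`."""
    return Counter(item[field] for item in items if field in item)


def apply_filters(items, selected):
    """Keep only the items that match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(field) == value for field, value in selected.items())
    ]


# Choosing "Boston" in a city facet narrows the list of records.
boston = apply_filters(records, {"city": "Boston"})
print(facet_counts(records, "specialty"))
print([r["name"] for r in boston])
```

The point of the sketch is only that the facets fall out of the data itself: no database or business logic tier is involved, just records and the fields they happen to carry.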

Remix is a tool that builds on top of this. Eric is one of the PIs of the project. It ties together best-agreed components – visual interfaces, data transformation interfaces, data storage, etc. – all of this is brought together under the Remix umbrella. Eric also mentioned that Remix leverages persistent identifiers using purlz.org. These can be for people, places, concepts, network objects, anything.

He presented an example of data that an oncology nurse or doctor uses frequently, which is not in an ontology: some of it is in their head and the rest is in a spreadsheet. He showed Remix stitching together two spreadsheets from different oncology clinics. You can stitch together fields and see if the result makes sense from a data perspective. Remix has some tools for “simultaneous editing”, which allow editing over patterns of data, so by editing one entry you can edit all of them. This acts like a script which can change “lastname, firstname” to “firstname lastname” without any complicated programming. You can connect anything, but it may not necessarily make sense, so there’s a need for interfaces to show users whether it does make sense. Then in Exhibit, you can customise facets, views, apply different themes, etc. Within a matter of minutes, Remix gives tools that a nurse can use to not just create an interface but to publish the information to the Web so that other people can benefit from it.
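For readers who want to see what that kind of stitching and pattern-based edit amounts to, here is a rough Python sketch. It is not how Remix works internally (the talk did not go into that); the spreadsheet rows, the column names, `FIELD_MAP`, `flip_name` and `stitch` are all assumptions made up for the example.

```python
import re

# Hypothetical rows from two clinic spreadsheets with different column names.
clinic_a = [{"Patient": "Smith, Jane", "Drug": "cisplatin"}]
clinic_b = [{"patient_name": "Doe, John", "medication": "paclitaxel"}]

# Assumed mapping from each clinic's columns onto one shared set of fields.
FIELD_MAP = {
    "Patient": "name",
    "patient_name": "name",
    "Drug": "drug",
    "medication": "drug",
}

# Matches "lastname, firstname", tolerating stray whitespace.
NAME_PATTERN = re.compile(r"^\s*([^,]+?)\s*,\s*(.+?)\s*$")


def flip_name(value):
    """Turn 'lastname, firstname' into 'firstname lastname'.

    Values that do not fit the pattern are returned unchanged.
    """
    match = NAME_PATTERN.match(value)
    return f"{match.group(2)} {match.group(1)}" if match else value


def stitch(*sheets):
    """Rename columns to the shared fields and apply the name edit to every row."""
    merged = []
    for sheet in sheets:
        for row in sheet:
            record = {FIELD_MAP.get(key, key): value for key, value in row.items()}
            if "name" in record:
                record["name"] = flip_name(record["name"])
            merged.append(record)
    return merged


combined = stitch(clinic_a, clinic_b)
print(combined)
```

The one rule in `flip_name` is applied to every matching entry, which is roughly the effect of editing one entry and having the change carry across all of them. In Remix the user gets this through the interface rather than by writing a regular expression.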

Every bit of the transformation that has occurred here has been identified (with an identifier). Everything has become a web resource, with a framework that enables people to stitch stuff together in a resource-oriented architecture. Then this can be analysed using Studio. If Technorati provides real-time analysis of RSS feeds, Studio provides an analysis of your company or organisational data, e.g. as reports with pattern analysis. Because it’s based on RDF / SPARQL, you can create queries that are relevant to you: “show me all the most popular or least popular reports”, or “show me any reports that used some of my data”.
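As a loose illustration of why identifying everything makes such questions answerable, here is a small Python sketch that stores statements as subject/predicate/object triples and matches patterns against them. It is neither SPARQL nor Studio's interface, just the same idea in plain Python; the identifiers, the `usesData` and `viewCount` predicates and the helper functions are all hypothetical.

```python
# Hypothetical statements as (subject, predicate, object) triples.
TRIPLES = [
    ("report:1", "usesData", "data:clinic-a"),
    ("report:1", "viewCount", 40),
    ("report:2", "usesData", "data:clinic-b"),
    ("report:2", "viewCount", 5),
    ("report:3", "usesData", "data:clinic-a"),
    ("report:3", "viewCount", 12),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def reports_using(triples, dataset):
    """'Show me any reports that used some of my data.'"""
    return sorted(s for s, _, _ in match(triples, p="usesData", o=dataset))


def reports_by_popularity(triples):
    """'Show me the most popular reports', most viewed first."""
    views = {s: o for s, _, o in match(triples, p="viewCount")}
    return sorted(views, key=views.get, reverse=True)


print(reports_using(TRIPLES, "data:clinic-a"))
print(reports_by_popularity(TRIPLES))
```

Because every report and dataset has its own identifier, both questions reduce to simple pattern matches over the same set of statements.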

This can bring organisations into a “Linked Enterprise Data” (LED) framework. Some people may not care so much about Linked Open Data (LOD): “expose your data, and something cool is going to happen”. Rather, Eric talked about exposing your enterprise data and showing that something is going to happen right now, so that you can see the benefits in terms of solutions available immediately. LED is a big part of what they’ve been focussing on in Zepheira.

The key subtext is recognising that what we’re dealing with is hospitals, organisations etc., who can leverage lots of the standards and solutions that we’ve been using on the Web but at a larger scale. Tools like this are a critical aspect of what companies can take now and can start to use to link their data together.

Eric said that there are huge advantages for companies to not just be “on” the Web but to be “in” the Web. If employees are a company’s most important aspect, why tie their hands behind their backs and ask them to solve a particular problem without providing them with the means to do it? There’s a need to empower them, to make it easier for them to get at data, to integrate it and to share it. There are just too many problems not to address / attack them aggressively through not just one approach or representation, but by stitching various parts together.

Eric finished by challenging ten companies to try out these tools if they haven’t before, to come back to SemTech 2009 with reports, and to share each other’s knowledge. The standards and tools are robust, so it can be done.