Thursday, November 09, 2006

Marten Mickos at Web 2.0

MySQL CEO Marten Mickos just gave a talk at the Web 2.0 Conference on "The Great Database in the Sky".

The idea is that "structured data should be open sourced", linked, and easily accessible. The goal is to do for structured data (database records) what Google does for unstructured data (web documents).

This is not a new idea. People usually talk about this as querying heterogeneous distributed databases. The trick is matching up disparate data definitions and smoothing over bad data. And that is quite a trick.

One technique is to require people to publish their data in some format that is easier to merge and process, but that requires all databases to cooperate. Another technique is to wrap databases with some translation layer, but that requires custom (and often fragile) wrappers for each database.
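To make the second technique concrete, here is a minimal sketch of a translation layer in Python. Everything in it is hypothetical: the two in-memory "stores", their field names, and the common `Product` shape are assumptions made up for illustration, not anything Marten described. Each wrapper maps one source's data definitions into the shared shape and skips rows it cannot parse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Common record shape that every wrapper translates into (hypothetical)."""
    name: str
    price_usd: float


# Two made-up sources with disparate data definitions.
STORE_A = [
    {"title": "Widget", "price": "9.99"},
    {"title": "Gadget", "price": "n/a"},  # bad data: price is not a number
]
STORE_B = [
    {"item_name": "widget", "cents": 1050},
]


def wrap_store_a(rows):
    """Translate store A rows; prices arrive as strings in dollars."""
    for row in rows:
        try:
            price = float(row["price"])
        except ValueError:
            continue  # smooth over bad data by skipping the row
        yield Product(row["title"].strip().lower(), price)


def wrap_store_b(rows):
    """Translate store B rows; prices arrive as integer cents."""
    for row in rows:
        yield Product(row["item_name"].strip().lower(), row["cents"] / 100)


def federated_search(name):
    """Query every wrapped source and merge the results, cheapest first."""
    wanted = name.strip().lower()
    merged = list(wrap_store_a(STORE_A)) + list(wrap_store_b(STORE_B))
    matches = [p for p in merged if p.name == wanted]
    return sorted(matches, key=lambda p: p.price_usd)


if __name__ == "__main__":
    for product in federated_search("Widget"):
        print(product)
```

Even in this toy version the fragility shows: each wrapper hard-codes one source's field names and units, so any change on the source side breaks it.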

It's an interesting problem. I think there are good examples of doing some of this for specific domains -- metashopping searches such as Shopzilla, Shopping.com, and Smarter.com -- but Marten said that MySQL will be leading a much broader push.

6 comments:

Hi Greg, I think much of this vision of a vast interconnected database is essentially the semantic web vision and I was surprised that Marten didn't seem to be considering that at all. I wrote more here on my blog

I'm not so sure. I think Marten was saying that he wanted the data to be entirely structured. The semantic web vision, as I weakly understand it, is more about attaching tags and labels to unstructured text data. Isn't that right?

For example, the Wikipedia page on semantic web describes "documents marked up with semantic information (an extension of the HTML meta tags) ... This could be machine-readable information about the human-readable content of the document (such as the creator, title, description, etc., of the document) or it could be purely metadata representing a set of facts (such as resources and services elsewhere in the site)."

That sounds pretty unstructured to me. It doesn't seem to easily allow SQL queries across the data.

I'm not sure, but perhaps the semantic web vision is halfway between the unstructured search of Google and the structured approach Marten seemed to be advocating. The semantic web allows adding labels and metadata to unstructured data to give it some structure, but typically would not be used in a way that makes the data fully structured. Is that right?

The semantic web is highly structured. It's based on the relational model in that everything is described by its relationships to other things. It's not really about tagging or markup - they're just various ways to express semantics. Have a look at the tutorial I wrote - it starts from the point of view of the database rather than the usual XML representation that others do.
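As a minimal sketch of that relational view (not taken from the tutorial), facts can be stored as subject-predicate-object rows in an ordinary SQL table and queried with a self-join. The table name, predicate names, and sample facts below are assumptions chosen for illustration, using Python's built-in `sqlite3` module.

```python
import sqlite3


def build_triple_store():
    """Create an in-memory table where every fact is one (subject, predicate, object) row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    # Illustrative facts only; the predicate vocabulary is made up.
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("marten_mickos", "is_ceo_of", "mysql"),
            ("mysql", "is_a", "database"),
            ("postgresql", "is_a", "database"),
        ],
    )
    return conn


def ceos_of_databases(conn):
    """Find (person, thing) pairs where the person is CEO of something that is a database."""
    return conn.execute(
        """
        SELECT a.subject, a.object
        FROM triples AS a
        JOIN triples AS b ON a.object = b.subject
        WHERE a.predicate = 'is_ceo_of'
          AND b.predicate = 'is_a'
          AND b.object = 'database'
        ORDER BY a.subject
        """
    ).fetchall()


if __name__ == "__main__":
    print(ceos_of_databases(build_triple_store()))
```

Each relationship is one row, so following a chain of relationships becomes a join of the table with itself.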

I agree that the semantic web can be highly structured. I think where we disagree is on whether it necessarily will be highly structured. It is hard to say, but I suspect a very common use of it would be to partially label data -- it easily allows this -- instead of fully structuring the data.

If so, then Marten has a point. The semantic web sounds like a more general concept that could be used for structured data, but is not restricted to structured data.

In general, the semantic web seems intended to combine structured and unstructured data. Marten seems to be seeking to stitch together structured databases, a more specific task.

That does not mean the semantic web is not a potential way to attack this problem, but it suggests that techniques targeting the problem more directly might be worth considering.