Display Settings

articles per page.

order.

Web 2.0's Open Data Access Conundrum

Open Data Access and Web 2.0 have a very strange relationship that continues to blur the lines of demarcation between where Web 2.0 ends and where Web.Next (i.e Web 3.0, Semantic/Data Web, Web of Databases etc.) starts. But before I proceed, let me attempt to define Web 2.0 one more time:

A phase in the evolution web usage patterns that emphasizes Web Services based interaction between “Web Users” and “Points of Web Presence” over traditional “Web Users” and “Web Sites” based interaction. Basically, a transition from visual site interaction to presence based interaction.

BTW - Dare Obasanjo also commented about Web usage patterns in his post titled: The Two Webs. Where he concluded that we had a dichotomy along the lines of: HTTP-for-APIs (2.0) and HTTP-for-Browsers (1.0). Which Jon Udell evolved into: HTTP-Services-Web and HTTP-Intereactive-Web during our recent podcast conversation.

With definitions in place, I will resume my quest to unveil the aforementioned Web 2.0 Data Access Conundrum:

Where a modicum of Data Access appreciation and comprehension does exist it is inherently compromised by business models that mandate some form of “Walled Gardens” and “Data Silos”

Mash-ups are a response to said “Walled Gardens” and “Data Silos” . Mash-ups by definition imply combining things that were not built for recombination.

As you can see from the above, Open Data access isn't genuinely compatible with Web 2.0.

We can also look at the same issue by way of the popular M-V-C (Model View Controller) pattern. Web 2.0 is all about the “V” and “C” with a modicum of “M” at best (data access, open data access, and flexible open data access are completely separate things). The “C” items represent application logic exposed by SOAP or REST style web services etc. I'll return to this later in this post.

What about Social Networking you must be thinking? Isn't this a Web 2.0 manifestation? Not at all (IMHO). The Web was developed / invented by Tim Berners-Lee to leverage the “Network Effects” potential of the Internet for connecting People and Data. Social Networking on the other hand, is simply one of several ways by which construct network connections. I am sure we all accept the fact that connections are built for many other reasons beyond social interaction. That said, we also know that through social interactions we actually develop some of our most valuable relationships (we are social creatures after-all).

The Web 2.0 Open Data Access impedance reality is ultimately going to be the greatest piece of tutorial and usecase material for the Semantic Web. I take this position because it is human nature to seek Freedom (in unadulterated form) which implies the following:

Access Data from a myriad of data sources (irrespective of structural differences at the database level)

Mesh (not Mash) data in new and interesting ways

Share the meshed data with as many relevant people as possible for social, professional, political, religious, and other reasons

Construct valuable networks based on data oriented connections

Web 2.0 by definition and use case scenarios is inherently incompatible with the above due to the lack of Flexible and Open Data Access.

If we take the definition of Web 2.0 (above) and rework it with an appreciation Flexible and Open Data Access you would arrive at something like this:

A phase in the evolution of the web that emphasizes interaction between “Web Users” and “Web Data” facilitated by Web Services based APIs and an Open & Flexible Data Access Model “.

In more succinct form:

A pervasive network of people connected by data or data connected by people.

Returning to M-V-C and looking at the definition above, you now have a complete of ”M“ which is enigmatic in Web 2.0 and the essence of the Semantic Web (Data and Context).

To make all of this possible a palatable Data Model is required. The model of choice is the Graph based RDF Data Model - not to be mistaken for the RDF/XML serialization which is just that, a data serialization that conforms to the aforementioned RDF data model.

The Enterprise Challenge

Web 2.0 cannot and will not make valuable inroads into the the enterprise because enterprises live and die by their ability to exploit data. Weblogs, Wikis, Shared Bookmarking Systems, and other Web 2.0 distributed collaborative applications profiles are only valuable if the data is available to the enterprise for meshing (not mashing).

A good example of how enterprises will exploit data by leveraging networks of people and data (social networks in this case) is shown in this nice presentation by Accenture's Institute for High Performance Business titled: Visualizing Organizational Change.

Web 2.0 commentators (for the most part) continue to ponder the use of Web 2.0 within the enterprise while forgetting the congruency between enterprise agility and exploitation of people & data networks (The very issue emphasized in this original Web vision document by Tim Berners-Lee). Even worse, they remain challenged or spooked by the Semantic Web vision because they do not understand that Web 2.0 is fundamentally a Semantic Web precursor due to Open Data Access challenges. Web 2.0 is one of the greatest demonstrations of why we need the Semantic Web at the current time.

Finally, juxtapose the items below and you may even get a clearer view of what I am an attempting to convey about the virtues of Open Data Access and the inflective role it plays as we move beyond Web 2.0:

Persisted Knowledge - Information in actionable context that is also available in transient or persistent forms expressed using a Graph Data Model. A modern knowledgebase would more than likely have RDF as its Data Language, RDFS as its Schema Language, and OWL as its Domain Definition (Ontology) Language. Actual Domain, Schema, and Instance Data would be serialized using formats such as RDF-XML, N3, Turtle etc).

How do Data Spaces and Databases differ? Data Spaces are fundamentally problem-domain-specific database applications. They offer functionality that you would instinctively expect of a database (e.g. AICD data management) with the additonal benefit of being data model and query language agnostic. Data Spaces are for the most part DBMS Engine and Data Access Middleware hybrids in the sense that ownership and control of data is inherently loosely-coupled.

How do Data Spaces and Content Management Systems differ?Data Spaces are inherently more flexible, they support multiple data models and data representation formats. Content management systems do not possess the same degree of data model and data representation dexterity.

How do Data Spaces and Knowledgebases differ?A Data Space cannot dictate the perception of its content. For instance, what I may consider as knowledge relative to my Data Space may not be the case to a remote client that interacts with it from a distance, Thus, defining my Data Space as Knowledgebase, purely, introduces constraints that reduce its broader effectiveness to third party clients (applications, services, users etc..). A Knowledgebase is based on a Graph Data Model resulting in significant impedance for clients that are built around alternative models. To reiterate, Data Spaces support multiple data models.

Where do Data Spaces fit into the Web's rapid evolution?They are an essential part of the burgeoning Data Web / Semantic Web. In short, they will take us from data “Mash-ups” (combining web accessible data that exists without integration and repurposing in mind) to “Mesh-ups” (combining web accessible data that exists with integration and repurposing in mind).

I've just re-read an article penned by Dan Brickley in 1999 titled: The WWW Proposal and RDF: Then and Now, that retains its prescience to this very day. Ironically I stumbled across this timeless piece while revisiting the RSS name imbroglio that gave us a simple syndication format (RSS 2.0) that will ultimately implode (IMHO) since "Simple" is ultimately short lived when dealing with attention challenged end-users that are always assumed to be dumb when in fact they are simply ambivalent.

Although I don't believe in complex entry points into complex technology realms, I do subscribe to the approach where developers deal with the complexity associated with a problem domain while hiding said complexity from ambivalent end-users via coherent interfaces -- which does not always imply User Interface.

XBRL is a great piece of work that addresses the complex problem domain of Financial Reporting. The only thing it's missing right now is an Ontology that facilitates RDF Data Model based XBRL Schema and Instance Data which ultimately makes XBRL data available to RDF query languages such as SPARQL. This line of thought implies, for instance, an XML Schema to OWL Ontology Mapping for Schema Data (as explained in a white paper by the VSIS Group at the university of Hamburg) leaving the Instance Data to be generated in a myriad of ways that includes XML to RDF and/or XML->SQL->RDF.

As I stated in an earlier post: we should not mistake ambivalence to lack of intelligence. Assuming "Simple" is always right at all times is another way of subscribing to this profound misconception. You know, assuming the world was flat (as opposed to geoid) was quite palatable at some point in the history of mankind, I wonder what would have happened if we held on to this point of view to this day because of its "Simplicity"?

SEMANTIC KNIGHT: None shall pass without formally defining the ontological meta-semantic thingies of their domain something-or-others! HACKER: What? SEMANTIC KNIGHT: None shall pass without using all sorts of semantic meta-meta-meta-stuff that we will invent Real Soon Now! HACKER: I have no quarrel with you, good Sir Knight, but I must get my work done on the Web. Stand aside!

I heard about Kiva.ORG in a BusinessWeek podcast. After visiting its website, I think there are few places where GeoRSS (in the RDF/A syntax) and Geonames can be used to enhance the site’s functionality.

Kiva.ORG Background

It’s a microfinance website for people in the developing countries. Its business model is in the intersection between peer-to-peer financing and philanthropy. The goal is to help developing country businesses to borrow small loans from a large group of Web users, so that they can avoid paying high interests to the banks.

For example, a person in Uganda can request a $500 loan and use it for buying and selling more poultry. One or more lenders (anyone on the Web) may decide to grant loans to that person in increments as tiny as $25. After few years, that person will pay back the loans to the lenders.

How GeoRSS and Geonames Can Help

I went to the website and discovered the site has a relative weak search and browsing interface. In particular, there is no way to group loan requests based on geographical locations (e.g., countries, cities and regions).
Took a look at individual loan pages. Each page actually has standard ways to describe location information — e.g., Location: Mbale, Uganda.

It should be relative easy to add GeoRSS points (in the RDF/A syntax) to describe these location information (an alternative maybe using Microformat Geo or W3C Geo). Once the location information is annotated, one can imagine building a map mashup to display loan requests in a geospatial perspective. One can also build search engines to support spatial queries such as ‘find me all loans with from Mbale’.

Since Kiva.ORG webmasters may not be GIS experts, it will be nice if we can find ways to automatically geocode location information and describe that using GeoRSS. This automatic geocoding procedure can be developed using Geonames’s webservices. Take a string ‘Mbale’ or ‘Uganda’, and send to Geonames’s search service. The procedure will get back JSON or XML description of the location, which include latitude and longitude. This will then be used to annotate the location information in a Kiva loan page.

Can you think of other ways to help Kiva.ORG to become more ‘geospatially intelligent’?
You can learn more about Kiva.ORG at its website and listen to this podcast.

Standards as social contracts: "Looking at Dave Winer's efforts in evangelizing OPML, I try to draw some rough lines into what makes a de-facto standard. De Facto standards are made and seldom happen on their own. In this entry, I look back at the history of HTML, RSS, the open source movement and try to draw some lines as to what makes a standard.

I posted a comment to the Tristan Louis' post along the following lines:

Analysis is spot on re. the link between de facto standardization and bootstrapping. Likewise, the clear linkage between boostrapping and connected communities (a variation of the social networking paradigm).

Dave built a community around a XML content syndication and subscription usecase demo that we know today as the blogosphere. Superficially, one may conclude that Semantic Web vision has suffered to date from a lack a similar bootstrap effort. Whereas in reality, we are dealing with "time and context" issues that are critical to the base understanding upon which a "Dave Winer" style bootstrap for the Semantic Web would occur.

Personally, I see the emergence of Web 2.0 (esp. the mashups phenomenon) as the "time and context" seeds from which the Semantic Web bootstrap will sprout. I see shared ontologies such as FOAF and SIOC leading the way (they are the RSS 2.0's of the Semantic Web IMHO).

There is an interesting article at regdeveloper.com titled: Structured data is boring and useless.. This article provides insight into a serious point of confusion about what exactly is structured vs. unstructured data. Here is a key excerpt:

"We all know that structured data is boring and useless; while unstructured data is sexy and chock full of value. Well, only up to a point, Lord Copper. Genuinely unstructured data can be a real nuisance - imagine extracting the return address from an unstructured letter, without letterhead and any of the formatting usually applied to letters. A letter may be thought of as unstructured data, but most business letters are, in fact, highly-structured." ....

"The labels "structured data" and "unstructured data" are often used ambiguously by different interest groups; and often used lazily to cover multiple distinct aspects of the issue. In reality, there are at least three orthogonal aspects to structure: * The structure of the data itself.* The structure of the container that hosts the data.* The structure of the access method used to access the data. These three dimensions are largely independent and one does not need to imply another. For example, it is absolutely feasible and reasonable to store unstructured data in a structured database container and access it by unstructured search mechanisms."

Data understanding and appreciation is dwindling at a time when the reverse should be happening. We are supposed to be in the throws of the "Information Age", but for some reason this appears to have no correlation with data and "data access" in the minds of many -- as reflected in the broad contradictory positions taken re. unstructured data vs structured data, structured is boring and useless while unstructured is useful and sexy....

The difference between "Structured Containers" and "Structured Data" are clearly misunderstood by most (an unfortunate fact).

For instance all DBMS products are "Structured Containers" aligned to one or more data models (typically one). These products have been limited by proprietary data access APIs and underlying data model specificity when used in the "Open-world" model that is at the core of the World Wide Web. This confusion also carries over to the misconception that Web 2.0 and the Semantic/Data Web are mutually exclusive.

But things are changing fast, and the concept of multi-model DBMS products is beginning to crystalize. On our part, we have finally released the long promised "OpenLink Data Spaces" application layer that has been developed using our Virtuoso Universal Server. We have structured unified storage containment exposed to the data web cloud via endpoints for querying or accessing data using a variety of mechanisms that include; GData, OpenSearch, SPARQL, XQuery/XPath, SQL etc..

Last week I put out a series of screencast style demos that sought to demonstrate the core elements of our soon to be released Javascript Toolkit called OAT (OpenLink Ajax Toolkit) and its Ajax Database Connectivity layer.

“Advanced”, as used above, simply means that I am embedding images (employee photos and national flags) and a database driven pivot into the map pins that serve as details lookups in classic SQL master/details type scenarios.

The “Ajax Call In Progress..” dialog is there to show live interaction with a remote database (in this case Virtuoso but this could be any ODBC, JDBC, OLEDB, ADO.NET, or XMLA accessible data source)

The data access magic source (if you want to call it that) is XMLA - a standard that has been in place for years but completely misunderstood and as a result under utilized

You can see a full collection of saved documents at the following locations:

My Mashups demo directory (Google and Yahoo! demo variants but note these do not work with Safari or IE at the current time. IE7 issues will be resolved in the next day or so)

XMLA has been around for a long time. Its fundamental goal was to provide Web Applications with Tabular and Multi-dimensional data access before it fell off the radar (a story too long to tell in this post).