Wednesday, May 18, 2011

AWOSS-2011-1

Told some ‘Alabama’ jokes, to illustrate the importance of context. With a lot of context, our ability to communicate is a lot greater. “Double-wide trailers on the back of a quarter.” This could be opaque to people who don’t know the context. It doesn’t take much to go into an area where you lose the audience.

Perspective:

Pull     Web 3.0           Web 4.0
Push     Web 1.0           Web 2.0
         Infrastructure    (Information) Flow

Web 3.0 is a lot like web 1.0, in that we’re laying down a foundation. It’s a data infrastructure, a lot like a communication infrastructure. But then there’s the next phase, which is a ‘pull’ of the data. And web 3.0 and 4.0 are more pull, in the sense that the software pulls the data through the applications, as opposed to the user going out to find the data they need.

Linked data: one of the major potentials is to be able to answer the question, “what do we know?” This may seem like nothing, but consider an enterprise trying, eg., to determine whether or not to allow a person on an airplane. They pose a query to the system, and right now we can’t answer that question with a lot of certainty; we don’t know what we know about that person. With web 3.0 we want to be able to securely and properly know what we know.

In the upper corner is web 4.0: thinking of the web as a system. Increasingly, the internet will become more and more of a system.

Diagram: web – communications for small military deployment. Horizontal ‘communities of interest’, vertical communications across these communities, with different vocabularies, etc. What we’re saying here is that we can help solve that with context, and the terms we use are OWL, RDF, etc.

One of the ‘webs’ I referred to earlier was the delegation web, but you have to know your logic. The logic is in the data. Consider, eg., military (software) agents. Imagine them recommending a mission. We have to be very sure we know what we’re doing. We want some formal test, formal methodology, that tells me everything we need to know.

The XML management challenge: one language, many vocabularies. There are, eg., myriad ways I can say ‘latitude’ and ‘longitude’. The machine doesn’t know what I’m saying. ‘ETL’ – ‘extract, transform and load’. We do it over and over. There are equally valid ways to represent the same data in XML. They are well formed, but mediation is required for interoperability. Or another way to look at it: same fact, different terms. Eg., the different ways of representing countries.
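To make the mediation point concrete, here is a minimal sketch in Python. It is not from the talk: the two XML vocabularies and the mapping table are invented for illustration. Both documents are well formed and state the same fact, but only the mapping step lets a machine see that.

```python
import xml.etree.ElementTree as ET

# Two well-formed documents stating the same fact with different vocabularies.
# Both vocabularies are invented for this illustration.
DOC_A = "<position><latitude>44.64</latitude><longitude>-63.57</longitude></position>"
DOC_B = "<loc><lat>44.64</lat><lon>-63.57</lon></loc>"

# Mediation table: local element name -> shared term (hypothetical mapping).
SHARED_TERMS = {
    "latitude": "latitude", "lat": "latitude",
    "longitude": "longitude", "lon": "longitude", "long": "longitude",
}

def mediate(xml_text):
    """Return the coordinates in the shared vocabulary as a dict of floats."""
    root = ET.fromstring(xml_text)
    result = {}
    for element in root.iter():
        term = SHARED_TERMS.get(element.tag.lower())
        if term is not None and element.text:
            result[term] = float(element.text)
    return result

# Same fact, different terms: after mediation the two documents agree.
print(mediate(DOC_A) == mediate(DOC_B))
```

Every new vocabulary needs another entry in the table, which is the repeated ETL work the speaker was describing.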

The nub of the semantic web: taking meaning into account, using concept systems, such as thesauri or ontologies. It supports processing that uses (or reasons across) relations between things, and not just the things themselves. The idea is to share some sort of agreement about a concept, eg., what a ‘person’ is, and extend locally if needed, and we’ll call that an ‘ontology.’
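A toy illustration of the 'share a concept, extend locally, reason across relations' idea, sketched in plain Python rather than real OWL or RDF tooling. All class and individual names here are hypothetical.

```python
# Triples are (subject, relation, object). All names are hypothetical.
SHARED = {("Person", "subClassOf", "Agent")}      # the agreed concept
LOCAL = {
    ("Pilot", "subClassOf", "Person"),            # local extension of 'Person'
    ("alice", "type", "Pilot"),
}

def superclasses(cls, triples):
    """Every class reachable from cls by following subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, r, o in triples:
            if s == current and r == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def types_of(individual, triples):
    """Asserted types plus the types inferred through the class hierarchy."""
    asserted = {o for s, r, o in triples if s == individual and r == "type"}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return inferred

print(sorted(types_of("alice", SHARED | LOCAL)))  # ['Agent', 'Person', 'Pilot']
```

Nobody asserted that alice is a Person; that follows from the relation between the local term and the shared one.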

As we go through the development of web 3.0, the workflows and systems are moving from being application-defined to user-defined. Right now, you can define your data within a specific application, but increasingly that will be product-agnostic. Then we move to user-described. So, eg., instead of saying ‘I need Fred’s train service’ they say ‘I need a train service’, and then to goal-oriented descriptions, ‘I need to go to…’.

This is a transition from monolithic systems to highly modular systems. This is a segue to a ‘semantic cloud’ sort of system. It becomes a ‘complex adaptive system’. Such systems are complex, but they also adapt – they can cope with unexpected change. Eg., a system dynamics model with a causal loop diagram. Organizations go through this all the time. John Boyd: OODA – observe, orient, decide and act. We do it all the time.

There are numerous points of intervention for semantic systems in the decision cycle. Applying knowledge rules and domain theory. Applying machine reasoning and rules. Etc.

Consider the concept of ‘relational navigation’. It’s mostly noise and data on the input, harmony and order on the output. One thing we’re trying to do is sig.ma, a great example of RDF aggregation: when you ask a question, it goes out, gets the RDF, then assembles it for you, and then you can accept or reject sources. Or another one: unified information access – if you change a concept anywhere, everywhere can see it.
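The aggregate-then-accept-or-reject pattern can be sketched roughly as follows. This is only an illustration of the idea, not how sig.ma itself is implemented; the source names and statements are made up.

```python
# Each source maps to the (subject, predicate, object) statements it returned.
# Source names and statements are made up for the illustration.
RESULTS = {
    "source-a": [("ex:alice", "foaf:name", "Alice"), ("ex:alice", "ex:worksAt", "ex:Dal")],
    "source-b": [("ex:alice", "foaf:name", "Alice"), ("ex:alice", "ex:age", "41")],
    "source-c": [("ex:alice", "ex:age", "14")],
}

def aggregate(results, rejected=()):
    """Merge statements from all non-rejected sources, remembering provenance."""
    merged = {}
    for source, triples in results.items():
        if source in rejected:
            continue
        for triple in triples:
            merged.setdefault(triple, set()).add(source)
    return merged

everything = aggregate(RESULTS)                       # all sources accepted
trusted = aggregate(RESULTS, rejected={"source-c"})   # user rejects one source

for triple, sources in trusted.items():
    print(triple, "from", sorted(sources))
```

Keeping the provenance set per statement is what makes rejecting a source cheap: the conflicting age from source-c simply drops out of the assembled view.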

Semantic cloud – we are using OSGi (a set of specifications for modular Java systems) to put specific metadata in JAR files, and we get much better interactivity of software. This really helps with ‘class path hell’. It creates an ‘emergent’ (assembly, monitoring, management) hierarchy.

It is about trying to establish interoperability:
- between different domains (marine life sciences, and oceanography)
- link sparse marine life data with voluminous oceanographic data
- derive insights through scientific experimentation and visualization

In response, we have developed the POKM platform, a one-stop resource where they can share data, select data, sift through it, generate models, share models, etc. – so they can dynamically compose experimental workflows. Eg., you have a set of questions, those questions demand data from different sources, you formulate the questions, which become requests for data, which are serviced through the platform.
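As a rough picture of what 'dynamically composing a workflow' could mean, here is a small Python sketch. It says nothing about how POKM actually does it; the step functions and the record layout are hypothetical stand-ins for platform services.

```python
from functools import reduce

def compose(*steps):
    """Chain steps so that each one receives the previous step's output."""
    def workflow(data):
        return reduce(lambda value, step: step(value), steps, data)
    return workflow

# Hypothetical steps standing in for platform services.
def select_depths(records):
    return [r["depth_m"] for r in records]

def drop_surface(depths):
    return [d for d in depths if d > 0]

def mean(values):
    return sum(values) / len(values)

records = [{"depth_m": 0}, {"depth_m": 10}, {"depth_m": 30}]

# The 'question' (mean dive depth) becomes a chain of requests and transformations.
mean_dive_depth = compose(select_depths, drop_surface, mean)
print(mean_dive_depth(records))  # 20.0
```

Because each step only agrees on its input and output, steps can be swapped or reordered without touching the others.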

Research challenges included: access to heterogeneous data sources – marine animal detection data (MADD), and physical oceanographic data; data transformations on the fly to enable interoperability; then, standardized, flexible and scalable environment available on the web, where everything is in a web service; managing the domain knowledge.

(POKM layers diagram)
The way the system works is, if I create a model, we ask you to bring your scripts, then we service-ize it. (R-scripts see http://omepages.cwi.nl/~paulk/publications/rscript-tutorial.pdf )

(Demonstration, by Ali Daniyal) http://pokm.cs.dal.ca:8080
Login – widgets-style interface, with social features.
User collection browser – looks like file management tool
- the data sources are protected data sources, so the user has to acquire credentials; we do not house the data sources, we connect to them: 750 different data providers.
Geographic browser – highlight region, obtain data
- you can select the data, filter the data, eg., csv filtering, citation filtering
- show data on map, eg., show movements of animals superimposed on a map of ocean temperatures
- visualized transformations of data (looks like LAMS)
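The 'highlight a region, then filter the CSV' step from the demo might look something like this in Python. The column names and the sample records are assumptions for illustration, not the platform's actual data format.

```python
import csv
import io

# Hypothetical detection records; the column names are assumptions.
DETECTIONS_CSV = """animal_id,latitude,longitude
seal-1,44.6,-63.5
seal-2,47.1,-52.7
tuna-9,44.9,-62.8
"""

def in_region(rows, south, north, west, east):
    """Keep only rows whose coordinates fall inside the bounding box."""
    return [
        row for row in rows
        if south <= float(row["latitude"]) <= north
        and west <= float(row["longitude"]) <= east
    ]

rows = list(csv.DictReader(io.StringIO(DETECTIONS_CSV)))
selected = in_region(rows, south=44.0, north=45.0, west=-64.0, east=-62.0)
print([row["animal_id"] for row in selected])  # ['seal-1', 'tuna-9']
```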

Question, on services: these are strictly r-scripts (not REST, SOAP, etc). Plan to match these types to SOAP types, write a wrapper. REST is under development.

It was nice to meet you at AWoSS! Judy Z. sent me this link and I thought it was cool. I would ask you to consider revising the part about "pull" from "...And web 3.0 and 4.0 are more pull, in the sense that we go out and find the data we need." to something like "And web 3.0 and 4.0 are more pull, in the sense that the 'software pulls the data through the applications, as opposed to the user going out to find the data we need."