LOD Tutorial @dev/summer/2014


Transcript of LOD Tutorial @dev/summer/2014

Developing LOD Applications: an Introduction

A World of Openness
Open Data?
The 'Geek' Point of View

Why new formats/standards?
A World of Open Data
  Natural language / text: very powerful and flexible, but not very machine-readable (ambiguous, imprecise)
  Tables/CSV/etc.: simple and good enough in many cases (widely used by the OD community), but too simple in many others
  Relational models/SQL: a significant, standardised improvement
  XML/objects: allow for more complex data structures and are standardised for sharing

To add up a web of data
LOD Principles
Implemented by the Semantic Web

From the web of documents...
But why?
...to the web of data
  Resources, and not only pages
  URIs: universal and resolvable identifiers
  Typed links, i.e., the RDF (Resource Description Framework) building block
    like a predicate that relates a subject to an object, as in a statement
    a.k.a. a triple
    which is also an instance of a mathematical binary relation

But why?
  Multiple statements/relations/properties can be stated by just re-using resources/URIs
...to the web of data
  Schemas are just more statements
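The idea that statements are triples, and that schemas are just more triples, can be sketched with plain Python tuples. This is only an illustration: the `match` helper is a hypothetical name, and the Film/Work statements are illustrative examples alongside the DBpedia URIs used elsewhere in these slides.

```python
# A graph is modelled as a set of (subject, predicate, object) tuples.
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

graph = {
    # Two data statements re-using the same subject URI
    (DBR + "The_Matrix", DBO + "starring", DBR + "Keanu_Reeves"),
    (DBR + "The_Matrix", RDF_TYPE, DBO + "Film"),
    # A schema statement: structurally, just one more triple (illustrative)
    (DBO + "Film", RDFS_SUBCLASS_OF, DBO + "Work"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


about_the_matrix = match(graph, s=DBR + "The_Matrix")
schema_statements = match(graph, p=RDFS_SUBCLASS_OF)
```

Because data and schema live in the same set, merging two sources is just a set union, which is the "seamless integration" point made in the slides.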

Schemas are where you put semantics
...to the web of data
  Seamless integration from different (web) sources. Schema/Semantic Integration, well...

What's the point?

The Semantic Web Approach
URIs and URI Best Practices
  A URL generalisation (in turn, IRI is even more general and supports internationalisation)
  URIs should be resolvable, to provide useful and discoverable information
  In the SW/LOD world, ideally they should return RDF (about the identified resource)
  Even better, they should return different docs, based on content negotiation:
    curl --location-trusted -H 'Accept: application/rdf+xml' 'http://dbpedia.org/resource/The_Matrix'

  URIs should be stable and (reasonably) resolve to stable semantics
    projects like purl.org help to cope with this
  There are tricky details behind this (http://tinyurl.com/pzwn4mx)
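The content negotiation request made with curl in the slides can also be sketched in Python with the standard library. The function names `build_rdf_request` and `fetch` are hypothetical, and the sketch only builds the request at import time; calling `fetch` needs network access and depends on how the server is configured.

```python
import urllib.request


def build_rdf_request(uri, mime_type="application/rdf+xml"):
    """Build a request that asks the server for a given RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": mime_type})


def fetch(uri, mime_type="application/rdf+xml", timeout=10):
    """Resolve the URI (urlopen follows HTTP redirects by default).

    Returns the Content-Type the server chose and the raw document bytes.
    """
    request = build_rdf_request(uri, mime_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()


# Same resource as in the curl example; nothing is sent until fetch() is called.
request = build_rdf_request("http://dbpedia.org/resource/The_Matrix")
```

Changing `mime_type` (for example to `text/turtle`) asks for a different document about the same resource, assuming the server offers it.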

Predicates in RDF statements are URIs
  http://dbpedia.org/resource/The_Matrix
  http://dbpedia.org/ontology/starring
  http://dbpedia.org/resource/Keanu_Reeves

==> properties too are universally identified
==> their description can be given in RDF itself and discovered via their URI

Encoding RDF
  A URI returns a document, containing statements about the thing behind that URI
  So, what does such a document look like? How do we create/serve it?
  Is it XML?
  Source: http://tinyurl.com/qdueje8
  So, let's go for the simpler one
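As a rough illustration of what such a document can look like, the sketch below writes statements in N-Triples, a line-based RDF syntax (not necessarily the syntax the slide goes on to use). The helper names `uri`, `literal` and `statement` are hypothetical, and the label value is an illustrative assumption rather than data copied from DBpedia.

```python
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def uri(value):
    """Render a URI term: N-Triples wraps URIs in angle brackets."""
    return f"<{value}>"


def literal(text):
    """Render a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def statement(subject, predicate, obj):
    """Render one triple: three terms followed by a full stop."""
    return f"{subject} {predicate} {obj} ."


lines = [
    statement(uri(DBR + "The_Matrix"), uri(DBO + "starring"), uri(DBR + "Keanu_Reeves")),
    # Illustrative label, assumed for the example
    statement(uri(DBR + "The_Matrix"), uri(RDFS + "label"), literal("The Matrix")),
]
document = "\n".join(lines)
print(document)
```

Each line of the output is one complete statement, which is what makes this kind of syntax easy to generate and to concatenate.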

for tricks about OWL

OWL Flavours
  The more expressivity/constructs/inference you want, the more performance issues you get
    you need more memory and more CPU
  Very expressive logics are also undecidable (e.g., OWL Full)
  Many triple stores offer inference that crosses these predefined categories, e.g.:
    Virtuoso: not much more than RDF-S
    Jena: a bit less than OWL-DL
  RDF-S

Standard schemas and Ontologies: examples
  schema.org: a very lightweight and general 'ontology', for the most common things
    Google (and other search engines) support it
  RDFa: allows you to annotate your web pages with RDF statements
  RDFa + schema.org + other ontologies(*): allows you to be more visible on Google
    Potentially lets Google know more than it can "understand" via text mining
  (*) Examples:
    Dublin Core (general document metadata)
    FOAF (people's relationships)
    SIOC (blogs, web sites, social networks)

Standard schemas and Ontologies: examples
  GoodRelations: a rather rich ontology to describe commercial products, businesses and the like
    BestBuy is known to be using it
    Google is probably detecting it

Standard schemas and Ontologies: examples
  LOD fits with Life Science
    Very heterogeneous
    In strong need to integrate, collaborate, etc.
    Often can benefit from advanced OWL logic features

Let's go for a little demo
A Demo

Let's do some hands-on
  data1.csv
  data2.xml

Let's do some hands-on: XML->RDF
  Have a look at the sources in data2_to_rdf/, including Xml2Rdf.java and EFOResolver.java
  Run it and see the results in data2.ttl

Exercise: given the Java variables sampleId, uniProtId and efoId (coming from data1.csv), generate statements like the ones in data1.ttl:

Have a try with SPARQL:
  Select the top 10 specimens (http://purl.obolibrary.org/obo/) and their labels
  Order the results by label
  Click on the reported links to see URI resolution
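One possible shape for such a query, and for sending it to an endpoint over HTTP, is sketched below. Several things are assumptions: that specimens are typed as obo:OBI_0000747 ('material sample', the class named in the exercise later in this transcript), the endpoint URL (a guess at a local Fuseki dataset), and the helper name `build_query_request`. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Assumption: specimens are instances of obo:OBI_0000747 and carry an rdfs:label.
QUERY = """
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?specimen ?label
WHERE {
  ?specimen a obo:OBI_0000747 ;
            rdfs:label ?label .
}
ORDER BY ?label
LIMIT 10
"""

# Hypothetical endpoint: adjust to wherever the demo's Fuseki instance listens.
ENDPOINT = "http://localhost:3030/ds/query"


def build_query_request(endpoint, query):
    """Build a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


request = build_query_request(ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would run the query against a live endpoint; the JSON result format is requested through the Accept header.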

The webapp seen before can be started from demo/webapp
  mvn jetty:run
  it runs against the Fuseki instance

# From http://it.dbpedia.org/sparql
PREFIX dbp-onto: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

Let's delve into the web app
  Open demo/webapp (e.g., in Eclipse)
  SPARQL invocation from Jena, look at:
    ConditionSearch.java
    SemWebUtils.java
  Look at the queries in main/resources/sparql
    in particular, the federated queries
  Look at the JSPs. Note it's a simple MVC application backed by a triple store
    not much different from client+server+DBMS

Back to our conversion task
  TARQL (github.com/cygri/tarql)

  Complete framework to publish and query RDF: http://bioinformatics.ua.pt/coeus/

Wrap-up
RDF and the SW, pros and cons
  It's a very flexible and standard means to share and integrate knowledge
    that's why they make Open Data available as LOD
  It's no magic: the SW doesn't solve the interoperability problem, it just puts it on the table (F. van Harmelen)
  It isn't a one-solution-fits-all approach
    n-ary relationships and context-referring statements
    similarly, XML schemas might be just enough sometimes (e.g., micro-formats)
  Easy data integration is also a weak point
    provenance is lost once you've merged two graphs; you need to manage this issue (e.g., named graphs)
    the Open World Assumption may be a problem, i.e., a missing property/link doesn't mean it's invalid
    very hard to keep consistency (well, even in the old web you find 'chemical trails')
  Performance is bad, you cannot have big data sets
    this used to be true, now try Virtuoso or Jena TDB
    it is still true with advanced reasoning => OWL is complicated, also because of the OWA

RDF and the SW, pros and cons: in summary
  It's good for certain purposes, but other approaches that are considered better might emerge in future (e.g., MongoDB with JSON/JSON-LD documents)
  Yet, the linked data principle is likely here to stay
    Google Knowledge Graph: http://www.google.co.uk/insidesearch/features/search/knowledge.html
    Facebook Social Graph: https://developers.facebook.com/docs/graph-api

Take-home message
  Open Data are cool
  Linked Open Data are even better
  You might benefit from LOD
  The world might benefit from your LOD
  Think about it
  Let's talk about it

Marco Brandizi
www.marcobrandizi.info/mysite/about
and all of you!

From the web of documents...
The Semantic Web Languages
Schemas, the RDF-S vocabulary
OWL: more expressivity and grounding into (description) logics
Data RDF-ization

Exercise
Say this in RDF:
  'Sample #3CB6' is an instance of 'material sample' (obo:OBI_0000747)
  has a label like (rdfs:label)
  is associated to proteins identified by Q9UKT5 and Q9UKT5
  is known to be the condition 'lung carcinoma' (EFO_0001071)
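One way to approach this exercise is sketched below in Python, generating Turtle text by plain string formatting. It is not the tutorial's own solution, and much of it is assumed for illustration: the `ex:` namespace, the sample URI pattern, the label text, and the predicates `ex:hasProtein` and `ex:hasCondition` are hypothetical, while the UniProt and EFO prefixes follow those resources' usual URI patterns. The function does no validation or escaping of the identifiers it receives.

```python
PREFIXES = """@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix efo: <http://www.ebi.ac.uk/efo/> .
@prefix uniprot: <http://purl.uniprot.org/uniprot/> .
@prefix ex: <http://example.org/> .
"""


def sample_to_turtle(sample_id, uniprot_ids, efo_id):
    """Render the statements about one sample as a Turtle fragment.

    ex:hasProtein and ex:hasCondition are placeholder predicates, not taken
    from any real ontology; replace them with the ones your schema defines.
    """
    proteins = ", ".join(f"uniprot:{pid}" for pid in uniprot_ids)
    return (
        f"ex:sample_{sample_id} a obo:OBI_0000747 ;\n"
        f'    rdfs:label "Sample #{sample_id}" ;\n'
        f"    ex:hasProtein {proteins} ;\n"
        f"    ex:hasCondition efo:{efo_id} .\n"
    )


fragment = sample_to_turtle("3CB6", ["Q9UKT5"], "EFO_0001071")
turtle = PREFIXES + "\n" + fragment
print(turtle)
```

The same function can be called once per row of a CSV file, which is essentially what a CSV-to-RDF conversion does; for real data, a library that handles escaping and URI encoding would be a safer choice than string formatting.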