
Interaction with Linked Data

This presentation focuses on providing means for exploring Linked Data. In particular, it gives an overview of current visualization tools and techniques, looking at semantic browsers and applications for presenting the data to the end user. We also describe existing search options, including faceted search, concept-based search and hybrid search, based on a mix of semantic information and text processing. Finally, we conclude with approaches for Linked Data analysis, describing how available data can be synthesized and processed in order to draw conclusions.

The SPARQL package enables you to connect to a SPARQL endpoint over HTTP and pose a SELECT query or an update query (LOAD, INSERT, DELETE). If given a SELECT query, it returns the results as a data frame with a named column for each variable from the SELECT query; a list of the prefixes and namespaces that were shortened to qnames is also returned. If given an update query, nothing is returned. If the parameter "query" is given, the query is assumed to be a SELECT query, and a GET request is made to fetch the results from the URL of the endpoint. Otherwise, if the parameter "update" is given, the query is assumed to be an update query, and a POST request is made to send it to the URL of the endpoint.
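The GET/POST dispatch described above mirrors the SPARQL 1.1 Protocol, in which queries can be sent with GET and a `query` parameter, and updates with POST and an `update` parameter. The minimal Python sketch below only builds the two requests with the standard library and does not send them. The endpoint URL and the helper name `build_sparql_request` are hypothetical, and this is not how the R package is implemented internally.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request


def build_sparql_request(endpoint: str, query: Optional[str] = None,
                         update: Optional[str] = None) -> Request:
    """Build (but do not send) an HTTP request for a SPARQL endpoint."""
    if query is not None:
        # SELECT query: GET request with the URL-encoded query in the "query" parameter
        return Request(endpoint + "?" + urlencode({"query": query}), method="GET")
    if update is not None:
        # Update (LOAD, INSERT, DELETE): POST request with a form-encoded "update" parameter
        body = urlencode({"update": update}).encode("utf-8")
        return Request(endpoint, data=body, method="POST",
                       headers={"Content-Type": "application/x-www-form-urlencoded"})
    raise ValueError("either query or update must be given")


endpoint = "http://example.org/sparql"  # hypothetical endpoint URL
select_req = build_sparql_request(endpoint, query="SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
update_req = build_sparql_request(endpoint, update="LOAD <http://example.org/data.ttl>")

print(select_req.get_method(), parse_qs(urlparse(select_req.full_url).query)["query"][0])
print(update_req.get_method(), update_req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would actually send it. A real client would also set an `Accept` header for the desired result format.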

Accessing the data
First, make sure that you have recent versions of the two R packages SPARQL and sp installed. Load the two packages by calling:

    library(SPARQL)  # make sure to use at least version 1.9
    library(sp)

Define the endpoint that will provide you with the triples:

    endpoint <- "http://spatial.linkedscience.org/sparql"

To reduce the size of the XML result files, the data is queried piece-wise. The query is initiated by

    q <- "SELECT ?cell ?row ?col ?polygon WHERE {
      ?cell a <http://linkedscience.org/lsv/ns#Item> ;
            <http://spatial.linkedscience.org/context/amazon/Lin> ?row ;
            <http://spatial.linkedscience.org/context/amazon/Col> ?col ;
            <http://observedchange.com/tisc/ns#geometry> ?polygon .
    }"
    res <- SPARQL(url = endpoint, query = q)$results

and completed within a loop over all deforestation variables:

    for (var in c("DEFOR_2002", "DEFOR_2003", "DEFOR_2004", "DEFOR_2005",
                  "DEFOR_2006", "DEFOR_2007", "DEFOR_2008")) {
      # Build one query per variable, e.g. SELECT ?cell ?DEFOR_2002 WHERE { ... }
      tmp_q <- paste("SELECT ?cell ?", var, "\n WHERE { \n",
                     " ?cell a <http://linkedscience.org/lsv/ns#Item> ;\n",
                     " <http://spatial.linkedscience.org/context/amazon/", var, "> ?", var, " .\n }\n",
                     sep = "")
      cat(tmp_q)  # print the generated query
      # Join the new column to the previous results on the shared cell identifier
      res <- merge(res, SPARQL(url = endpoint, query = tmp_q)$results, by = "cell")
    }

Creating a SpatialPixelsDataFrame
We copy the results to a new object and flip the y-axis:

    amazon <- res
    amazon$row <- -res$row

Assigning coordinates to a data.frame will result in a Spatial object. Setting the type to gridded will produce a SpatialPixelsDataFrame:

    coordinates(amazon) <- ~ col + row
    gridded(amazon) <- TRUE

Plotting and handling the data
As a first application, we produce a map showing relative deforestation per pixel during 2002:

    spplot(amazon, "DEFOR_2002", col.regions = rev(heat.colors(17))[-1],
           at = (0:16) / 100, main = "relative deforestation per pixel during 2002")

9.
LD Visualization Techniques
• Linked Data visualization techniques should provide graphical representations of the information within the LD datasets
• Visualization techniques should be selected according to:
  – The type of data: specific types of data should be visualized in a certain way
  – The purpose of the visualization: depending on the type of analysis or application
EUCLID – Interaction with Linked Data

13.
Challenges for Linked Data Visualization
• Enabling user interaction
  – Users must be able to navigate through the data by exploiting the connections between Linked Data resources
  – Users might edit the underlying data to enrich it by:
    • Creating additional metadata
    • Highlighting or correcting errors
    • Validating data
• Supporting data reusability
  – The output (the plotted data or the visualization itself) might be encoded using standard ontologies and vocabularies
• Scalability
  – Linked Data visualization techniques should display large amounts of data efficiently

14.
Challenges for Linked Open Data Visualization
• Extracting data from different repositories
  – A Linked Data set might be partitioned into several repositories
  – The region of interest (ROI) might include data from different datasets, requiring access to distributed repositories
• Handling heterogeneous data
  – The same data (concepts) might be modeled differently, for example, using different vocabularies
  – Certain values might have different formats, for example, dates represented as DD-MM-YYYY, MM-DD-YYYY or just YYYY
• Dealing with missing values
  – Due to the semi-structured nature of Linked Data, some instances might have missing values for certain properties
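The following minimal Python sketch normalizes the heterogeneous date formats mentioned above into ISO 8601. It assumes that the convention of each source is known in advance; the source names and the `SOURCE_FORMATS` table are hypothetical. A value such as 03-04-2010 cannot be interpreted from the string alone, so the sketch relies on the per-source table rather than guessing. It also keeps missing values explicit.

```python
from datetime import datetime
from typing import Optional

# Hypothetical per-source conventions; in practice they must be known
# (e.g., from dataset documentation), since "03-04-2010" is ambiguous by itself.
SOURCE_FORMATS = {
    "source_a": "%d-%m-%Y",  # DD-MM-YYYY
    "source_b": "%m-%d-%Y",  # MM-DD-YYYY
    "source_c": "%Y",        # YYYY only
}


def normalize_date(value: Optional[str], source: str) -> Optional[str]:
    """Return an ISO 8601 string (YYYY-MM-DD, or YYYY for year-only data), or None if missing."""
    if value is None or value.strip() == "":
        return None  # missing value: keep it explicit instead of inventing one
    fmt = SOURCE_FORMATS[source]
    parsed = datetime.strptime(value.strip(), fmt)
    # Year-only values keep year precision rather than gaining an invented month/day
    return parsed.strftime("%Y") if fmt == "%Y" else parsed.strftime("%Y-%m-%d")


records = [("03-04-2010", "source_a"), ("03-04-2010", "source_b"),
           ("2010", "source_c"), (None, "source_a")]
for value, source in records:
    print(source, value, "->", normalize_date(value, source))
```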

17.
Analysis of Relationships and Hierarchies
• Graph: the data entries are represented as nodes and the links as edges.
• Arc diagram: the nodes are displayed in one dimension, and the arcs represent the connections.
• Adjacency matrix diagram: the nodes are displayed as rows and columns, and the links between the nodes are entries in the matrix.
• Node-link visualizations: the data is organized in hierarchies.
Source of images: http://mbostock.github.io/protovis/
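To make the adjacency matrix representation concrete, the following Python sketch turns a list of triples into a 0/1 matrix whose rows are subjects and whose columns are objects. The example triples and the `ex:` prefix are hypothetical. Predicate labels are deliberately dropped, so this is an unlabeled, directed view of the RDF graph.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def adjacency_matrix(triples: List[Triple]) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted node list and a 0/1 matrix (rows = subjects, columns = objects)."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    index: Dict[str, int] = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for s, _, o in triples:
        matrix[index[s]][index[o]] = 1  # a link s -> o becomes the matrix entry (s, o)
    return nodes, matrix


# Hypothetical example triples with abbreviated names
triples = [
    ("ex:JohnLennon", "mo:member_of", "ex:TheBeatles"),
    ("ex:PaulMcCartney", "mo:member_of", "ex:TheBeatles"),
    ("ex:JohnLennon", "foaf:knows", "ex:PaulMcCartney"),
]
nodes, matrix = adjacency_matrix(triples)
for node, row in zip(nodes, matrix):
    print(f"{node:<18}", row)
```

Literal objects would also become columns in this sketch; a visualization would usually filter them out first.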

18.
Analysis of Relationships and Hierarchies (2): Space-filling techniques
• Icicles and sunburst: hierarchies are represented by adjacency.
• Treemaps: subdivide an area into rectangles.
• Circle-packing: containment is used to represent the hierarchies.
• Rose diagrams: the sectors span equal angles, and the data is represented by the extent of each sector.
Source of images: http://mbostock.github.io/protovis/
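Treemaps can be produced by several layout algorithms. The Python sketch below uses the simplest one, slice-and-dice, which alternates the cut direction at each level of the hierarchy and gives each child a share of its parent's rectangle proportional to its weight. The tree, its sizes and the function names are hypothetical. Many toolkits prefer squarified layouts, which yield rectangles with better aspect ratios.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[str, float, float, float, float]  # (name, x, y, width, height)


def weight(node: Dict) -> float:
    """A leaf weighs its own size; an inner node weighs the sum of its children."""
    children = node.get("children", [])
    return node["size"] if not children else sum(weight(c) for c in children)


def slice_and_dice(node: Dict, x: float, y: float, w: float, h: float,
                   depth: int = 0, out: Optional[List[Rect]] = None) -> List[Rect]:
    """Slice-and-dice treemap layout: alternate the cut direction at every level."""
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    total = weight(node)
    offset = 0.0  # fraction of the parent rectangle already used by earlier siblings
    for child in node.get("children", []):
        share = weight(child) / total
        if depth % 2 == 0:  # even depth: divide the width
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:               # odd depth: divide the height
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical hierarchy with leaf sizes
tree = {"name": "root", "children": [
    {"name": "A", "size": 6},
    {"name": "B", "children": [{"name": "B1", "size": 3}, {"name": "B2", "size": 1}]},
]}
for rect in slice_and_dice(tree, 0, 0, 100, 100):
    print(rect)
```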

22.
Applications of Linked Data Visualization Techniques
• Get an overview of the data
• Identify relevant resources, classes or properties in datasets
• Learn about certain underlying characteristics of the data, e.g., vocabularies or ontologies
• Detect missing links between nodes in an RDF graph
• Discover new paths between nodes in an RDF graph
• Identify hidden patterns in the data
• Find errors or atypical values (outliers)

23.
Linked Data Visualization Tool Requirements
The requirements for visualization tools that consume Linked Data can be summarized as follows:
• Data navigation and exploration capabilities, in order to understand the structure and the content
• Exploiting data structures:
  – Links, to visualize hierarchies or graphs
  – Multi-dimensional data
• User interaction:
  – Basic and advanced querying
  – Filtering values
  – Interactive UI: responsive to user input
• Publication/syndication of the graphical representation of the data
• Data extraction, in order to export the data so that it can be reused by third parties

24.
Linked Data Visualization Tool Types
1. LD browsers with text-based representation
  • Dereference URIs to retrieve the resource description
  • Use a textual representation of LD resources
  • Display texts and images adequately
  • Mainly support exploratory browsing and knowledge discovery
2. LD and RDF browsers with visualization options
  • Exploit pictures, graphics, images and other visual representations of the data
  • Support user interaction: allow querying, filtering and jumping between resources
  • Suitable for browsing and knowledge discovery as well as analytic activities

25.
Linked Data Visualization Tool Types (2)
3. Visualization toolkits
  • Frameworks providing a wide range of visualization techniques
  • General toolkits support LD visualization by applying a set of transformations to the data
  • Some toolkits are specially designed to consume LD
4. SPARQL visualization
  • These tools transform the output of SPARQL queries into graphics
  • They contact SPARQL endpoints in order to evaluate the query
  • Suitable for analytical activities

28.
Linked Data Visualization Examples (2): Sig.ma
Source: http://sig.ma/search?q=The+Beatles
• Displays values per predicate
• May include (redundant) information in different languages, for example: années and anno
• URIs are clickable, allowing navigation through RDF resources
Summary:
• Sig.ma lists all the triples and groups them per predicate
• Useful for browsing predicates and values within data sets
• The meaning of the values is not evident
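The per-predicate grouping that Sig.ma displays can be sketched in Python as follows. The triples and the `ex:` properties are hypothetical. Only exact duplicates are removed, so values that carry the same information in different languages remain, which is the redundancy noted above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def group_by_predicate(triples: List[Triple], subject: str) -> Dict[str, List[str]]:
    """Collect the values of one subject per predicate, dropping exact duplicates."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for s, p, o in triples:
        if s == subject and o not in groups[p]:
            groups[p].append(o)
    return dict(groups)


# Hypothetical triples gathered from several sources about one resource
triples = [
    ("ex:TheBeatles", "rdfs:label", '"The Beatles"@en'),
    ("ex:TheBeatles", "rdfs:label", '"The Beatles"@en'),  # same value from a second source
    ("ex:TheBeatles", "ex:decade", '"années 1960"@fr'),
    ("ex:TheBeatles", "ex:decade", '"anni 1960"@it'),     # same information, another language
    ("ex:TheBeatles", "mo:member", "ex:JohnLennon"),
    ("ex:JohnLennon", "rdfs:label", '"John Lennon"@en'),  # different subject, ignored
]
for predicate, values in group_by_predicate(triples, "ex:TheBeatles").items():
    print(predicate, values)
```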

39.
Linked Data Visualization Examples (12)
Information Workbench: SPARQL visualization
Top ten The Beatles releases according to the sum of track durations in minutes
Other visualizations of the same result set: line chart, pie chart
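The result set behind this chart is an aggregation. In SPARQL 1.1 it would typically be expressed with GROUP BY, SUM, ORDER BY DESC and LIMIT 10. The Python sketch below performs the same aggregation on rows that are assumed to have already been retrieved. The release names are hypothetical, and durations are assumed to be in milliseconds.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def top_releases(rows: List[Tuple[str, int]], n: int = 10) -> List[Tuple[str, float]]:
    """Sum track durations per release and return the n longest releases, in minutes.

    Assumes durations are given in milliseconds.
    """
    totals: Dict[str, int] = defaultdict(int)
    for release, duration_ms in rows:
        totals[release] += duration_ms
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(release, round(ms / 60_000, 2)) for release, ms in ranked[:n]]


# Hypothetical (release, track duration in ms) rows, e.g. from a SPARQL SELECT
rows = [
    ("Release A", 180_000), ("Release A", 240_000),
    ("Release B", 300_000),
    ("Release C", 120_000),
]
print(top_releases(rows, n=2))
```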

43.
Linking Open Data Cloud Visualization (2)
“Linked Open Data Cloud” generated by Gephi
Image source: http://twitpic.com/17qj1h
• The central cluster (green) displays DBpedia as a central focus
• The size of the nodes reflects the size of the datasets
• The length of the connections encodes information about the data structure
Source: A. Dadzie and M. Rowe. Approaches to Visualizing Linked Data: A Survey. 2011

44.
Linking Open Data Cloud Visualization (3)
“Linked Open Data Graph” by Protovis
Source: http://inkdroid.org/lod-graph/
• The data to be displayed is retrieved using the CKAN API
• The nodes represent Linked Data sets available in the Data Hub “lod-cloud” group
• The size of the nodes is proportional to the data set size
• Edges are connections between data sets
• The colors reflect the CKAN rating, and the intensity of the color reflects the number of received ratings
• The nodes can be clicked to go to the data set's CKAN page

45.
LD Reporting
• Visualization techniques are used to create reports included in data monitoring and management solutions
• Provides an overview of the dataset by generating a low-level descriptive analysis:
  – Quantitative information about the dataset
• Users may interact with the data via dashboards
• Some systems support this feature over structured data:
  – Google Webmaster Tools (https://www.google.com/webmasters/tools)
  – Information Workbench (http://www.fluidops.com/information-workbench)
  – eCloudManager (http://www.fluidops.com/ecloudmanager)
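As an illustration of the low-level descriptive analysis mentioned above, the Python sketch below computes a few dataset statistics. They are similar in spirit to those described by the VoID vocabulary (e.g., void:triples, void:distinctSubjects, void:properties). The triples are hypothetical and use abbreviated names instead of full IRIs.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"  # abbreviated; a real dataset would use the full rdf:type IRI


def dataset_statistics(triples: List[Triple]) -> Dict[str, object]:
    """Low-level descriptive statistics of the kind a reporting dashboard might show."""
    return {
        "triples": len(triples),
        "distinct_subjects": len({s for s, _, _ in triples}),
        "distinct_properties": len({p for _, p, _ in triples}),
        # Instances per class: how often each class appears as the object of rdf:type
        "instances_per_class": Counter(o for _, p, o in triples if p == RDF_TYPE),
    }


# Hypothetical sample data
triples = [
    ("ex:TheBeatles", "rdf:type", "mo:MusicGroup"),
    ("ex:JohnLennon", "rdf:type", "mo:SoloMusicArtist"),
    ("ex:PaulMcCartney", "rdf:type", "mo:SoloMusicArtist"),
    ("ex:TheBeatles", "mo:member", "ex:JohnLennon"),
    ("ex:TheBeatles", "mo:member", "ex:PaulMcCartney"),
]
print(dataset_statistics(triples))
```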

46.
Google Webmaster Tools: Structured Data Dashboard (1)
• Provides webmasters with information about the structured data embedded in their websites (and recognized by Google)
• The dashboard has three levels:
  i. Site-level view: aggregates the data by the classes defined in the vocabulary schema
  ii. Item-type-level view: provides details per page for each type of resource
  iii. Page-level view: shows the attributes of every type of resource on a given web page

53.
Semantic Search: Example (3)
User query (NL): “songs written by members of the beatles”
Entity extraction: song, member (of), written by, (the) beatles
Entity mapping with query expansion: member (of) → mo:member_of, the inverse of mo:member
Image source: http://musicontology.com
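Query expansion with an inverse property can be sketched as rewriting one keyword into a SPARQL UNION that matches either direction. In the Python sketch below, the `EXPANSIONS` table and the `expand_pattern` helper are hypothetical. Only the mo:member_of / mo:member pair comes from the example above.

```python
from typing import Dict, List, Tuple

# Hypothetical expansion table: keyword -> list of (property, is_inverse)
EXPANSIONS: Dict[str, List[Tuple[str, bool]]] = {
    "member of": [("mo:member_of", False), ("mo:member", True)],
}


def expand_pattern(subject: str, keyword: str, obj: str) -> str:
    """Turn '<subject> keyword <obj>' into a SPARQL UNION over the expanded properties."""
    branches = []
    for prop, inverse in EXPANSIONS[keyword]:
        # An inverse property is matched with subject and object swapped
        s, o = (obj, subject) if inverse else (subject, obj)
        branches.append(f"{{ {s} {prop} {o} . }}")
    return "\n  UNION\n  ".join(branches)


print(expand_pattern("?x", "member of", "dbpedia:The_Beatles"))
```

The UNION lets the query match data sets that state membership in either direction.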

54.
Semantic Search: Example (4)
User query (NL): “songs written by members of the beatles”
Entity extraction: song, member (of), written by, (the) beatles
Entity mapping for “(the) beatles”. Candidates:
• Beatles (Book)
• The Beatles (Music Group)
• Beatle (Animal)
• Beatle (Automobile)
How to identify the right “Beatle”? Examine the context (contextual analysis).
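One simple way to perform this contextual disambiguation is to rank the candidates by how many of their types also occur in the context of the query. This type-overlap heuristic is an assumption made for illustration, not the method used in the example. The candidate types, including the `ex:` classes, are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical candidate entities and their types
CANDIDATES: Dict[str, Set[str]] = {
    "Beatles (Book)": {"ex:Book"},
    "The Beatles (Music Group)": {"mo:MusicGroup", "mo:MusicArtist", "foaf:Agent"},
    "Beatle (Animal)": {"ex:Animal"},
    "Beatle (Automobile)": {"ex:Automobile"},
}


def disambiguate(candidates: Dict[str, Set[str]], context_types: Set[str]) -> List[str]:
    """Rank candidates by how many of their types also occur in the query context."""
    return sorted(candidates, key=lambda c: len(candidates[c] & context_types), reverse=True)


# Types suggested by the other query terms ("songs", "written by", "members of")
context = {"mo:Track", "foaf:Agent", "mo:MusicGroup"}
print(disambiguate(CANDIDATES, context)[0])
```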

55.
Semantic Search: Example (5)
User query (NL): “songs written by members of the beatles”
Entity extraction: song, member (of), written by, (the) beatles
Entity mapping: (the) beatles
Contextual analysis: the query involves the subgraph formed by foaf:Agent, mo:composer, mo:Track, mo:MusicArtist (rdfs:subClassOf foaf:Agent), mo:MusicGroup (rdfs:subClassOf mo:MusicArtist) and mo:member. Because this subgraph is part of the query, “(the) beatles” is mapped to The Beatles (Music Group): dbpedia:The_Beatles

56.
Semantic Search: Example (6)
User query (NL): “songs written by members of the beatles”
Entity extraction: song, member (of), written by, (the) beatles
Query: ?y (mo:Track), ?x (foaf:Agent), dbpedia:The_Beatles
Results (answers presented to the user; the results could be ranked):
• (I want to) Come Home
• Angel in Disguise
• Another Day
• …

57.
Semantic Search
• Aims at understanding the meaning of the resources specified in the query
• Different approaches exploit semantics:
  • Query expansion using ontologies
    Since ontologies represent knowledge about specific domains, they can be used to expand the query by incorporating related ontology terms into it.
  • Contextual analysis
    In LD, this approach may explore the resources specified in the query and their adjacent nodes in the RDF graph. It is mainly applied to disambiguate query terms.
  • Reasoning
    In some cases, the answer to a specific query is not explicitly contained in the data, but it can be computed using reasoning methods.
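The reasoning approach can be illustrated with RDFS subclass inference. If the data only states that a resource is a mo:MusicGroup, the subclass axioms from the example let a system conclude that it is also a foaf:Agent. The Python sketch below applies rdfs:subClassOf transitively over a hypothetical, hard-coded set of axioms; a real system would use an RDFS or OWL reasoner.

```python
from typing import Dict, Set

# Hypothetical subclass axioms following the class hierarchy in the example
SUBCLASS_OF: Dict[str, Set[str]] = {
    "mo:MusicGroup": {"mo:MusicArtist"},
    "mo:MusicArtist": {"foaf:Agent"},
}


def inferred_types(asserted: Set[str]) -> Set[str]:
    """Apply rdfs:subClassOf transitively (RDFS entailment rules rdfs9 and rdfs11)."""
    result = set(asserted)
    frontier = list(asserted)
    while frontier:
        cls = frontier.pop()
        for parent in SUBCLASS_OF.get(cls, set()):
            if parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result


# Only mo:MusicGroup is stated in the data, yet the query asks for a foaf:Agent
types = inferred_types({"mo:MusicGroup"})
print("foaf:Agent" in types, sorted(types))
```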

64.
Faceted Search
• Facets = properties
• Suitable for browsing multi-dimensional taxonomies based on the search attributes
• Allows users to explore the data:
  • The user submits a (keyword) query
  • The faceted system dynamically identifies the relevant facets (properties) for the given query and their constraints (values of those properties), and displays the search results
  • The user may “drill down” by applying specific constraints to the search results
• Information can be accessed and ranked in multiple ways
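The facet computation and drill-down steps above can be sketched in Python as follows. The sketch assumes that each search result has already been flattened into a dictionary of property/value pairs. The result data, the excluded `title` field and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

Item = Dict[str, str]


def facet_counts(results: List[Item], exclude: Iterable[str] = ("title",)) -> Dict[str, Counter]:
    """For every property (facet), count how many results carry each value (constraint)."""
    skip = set(exclude)
    facets: Dict[str, Counter] = {}
    for item in results:
        # Heterogeneous entries are fine: each item contributes only the facets it has
        for prop, value in item.items():
            if prop not in skip:
                facets.setdefault(prop, Counter())[value] += 1
    return facets


def drill_down(results: List[Item], prop: str, value: str) -> List[Item]:
    """Keep only the results that satisfy the selected constraint prop = value."""
    return [item for item in results if item.get(prop) == value]


# Hypothetical results of a keyword query over publications (one dict per resource)
results = [
    {"title": "Paper 1", "year": "2011", "venue": "ISWC"},
    {"title": "Paper 2", "year": "2012", "venue": "ESWC"},
    {"title": "Paper 3", "year": "2012", "venue": "ISWC", "author": "A. Author"},
]
print(facet_counts(results))
narrowed = drill_down(results, "year", "2012")
print(facet_counts(narrowed))
```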

65.
Faceted Search (2)
Challenges for supporting faceted search
• Identifying which facets to surface:
  • In heterogeneous datasets, data entries may have different facets
  • Dynamically identifying the most appropriate facets for each query
  • Ordering the facets by relevance to the query
• Computing previews:
  • Accurately predicting counts without examining all the results
  • Offering facet previews to give users an idea of what to expect
Source: Teevan, J., Dumais, S., Gutt, Z. Challenges for Supporting Faceted Search in Large, Heterogeneous Corpora like the Web
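One possible way to preview facet counts without examining all results is to count a random sample and scale the counts up. The Python sketch below implements that sampling approach, which is chosen here for illustration and is not necessarily the technique discussed in the cited source. The data and the function name are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def estimate_counts(results: List[Dict[str, str]], prop: str,
                    sample_size: int, seed: int = 0) -> Dict[str, int]:
    """Estimate per-value counts of one facet by scaling up counts from a random sample."""
    if not results or sample_size < 1:
        return {}
    rng = random.Random(seed)  # fixed seed so that previews are reproducible
    sample = rng.sample(results, min(sample_size, len(results)))
    scale = len(results) / len(sample)
    observed = Counter(item[prop] for item in sample if prop in item)
    return {value: round(count * scale) for value, count in observed.items()}


# Hypothetical result set: 10,000 items, one in four from 2012
results = [{"year": "2012" if i % 4 == 0 else "2011"} for i in range(10_000)]
print(estimate_counts(results, "year", sample_size=400))     # approximate preview
print(estimate_counts(results, "year", sample_size=10_000))  # exact, but examines everything
```

The sample size trades preview accuracy against the number of results that must be inspected.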

66.
Faceted Search: LD Example (1)
FacetedDBLP
• Retrieves information from the DBLP collection
• Shows the result set with different facets:
  • Publication years
  • Authors
  • Conferences
• It is implemented on top of the DBLP++ dataset (an enhancement of DBLP including additional keywords and abstracts):
  • DBLP++ is stored in a MySQL database
  • Uses D2R Server to access the data as RDF triples

71.
Semantic Data Search Engines (2)
Searching for vocabularies: LOV Portal
• Allows searching for properties, classes or vocabularies in the Linked Open Vocabularies (LOV) catalog
• The LOV search engine implements faceted search on:
  • The knowledge domain
  • The role of the resource matched by the input query
  • The vocabulary containing the resource
• Results are ranked according to a score considering:
  • Relevancy to the query (string)
  • Importance of the matched element labels
  • Number of LOV vocabularies that refer to the element

75.
Features of Data Analysis
Statistical analysis
• Allows describing the data via Exploratory Data Analysis (EDA) methods
• Includes statistical inference and prediction
Data aggregation & filtering
• One of the first steps in data analysis is pre-processing, in order to select the appropriate data to study
Machine learning
• Focuses on prediction
• Combines Artificial Intelligence and Statistics
• Includes supervised and unsupervised learning (not covered in this course)
Visualization techniques can be built on top of these as part of data analysis.
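The aggregation & filtering and statistical-analysis steps can be chained as in the Python sketch below. It drops missing values, groups a numeric variable by a categorical one and reports simple EDA summaries per group. The rows are hypothetical and only loosely modeled on the deforestation example earlier in this document.

```python
import statistics
from typing import Dict, List

# Hypothetical rows, shaped like a data frame returned by a SPARQL SELECT query
rows: List[dict] = [
    {"cell": "c1", "region": "north", "defor": 0.02},
    {"cell": "c2", "region": "north", "defor": 0.10},
    {"cell": "c3", "region": "south", "defor": 0.04},
    {"cell": "c4", "region": "south", "defor": None},  # missing value
    {"cell": "c5", "region": "south", "defor": 0.06},
]

# 1. Filtering: keep only rows that have a value for the variable of interest
valid = [r for r in rows if r["defor"] is not None]

# 2. Aggregation: group the values per region
by_region: Dict[str, List[float]] = {}
for r in valid:
    by_region.setdefault(r["region"], []).append(r["defor"])

# 3. Exploratory statistics per group
summary = {region: {"n": len(v), "mean": statistics.mean(v), "median": statistics.median(v)}
           for region, v in sorted(by_region.items())}
for region, s in summary.items():
    print(region, s["n"], round(s["mean"], 3), round(s["median"], 3))
```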

80.
R for SPARQL
• The SPARQL package for R enables you to:
  • Connect to a SPARQL endpoint over HTTP
  • Pose a SELECT query or an UPDATE operation (LOAD, INSERT, DELETE)
  • If given a SELECT query, it returns the results as a data frame
  • The results can directly be mapped and visualized
• Posing requests:
  • If the parameter query is given, the input is assumed to be a SELECT query, and a GET request is performed to get the results from the URL of the endpoint
  • If the parameter update is given, the input is assumed to be an UPDATE operation, and a POST request is submitted to the URL of the endpoint. Nothing is returned
Source: http://linkedscience.org/tools/sparql-package-for-r/
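The R package parses XML results into a data frame. The Python sketch below illustrates the same idea (one named column per SELECT variable) with a different result format: it converts a response in the SPARQL 1.1 Query Results JSON format into columns. The response text is hypothetical. Unbound variables become None, and all values are kept as strings without datatype conversion.

```python
import json
from typing import Dict, List, Optional


def results_to_columns(doc: str) -> Dict[str, List[Optional[str]]]:
    """Convert SPARQL 1.1 JSON results into one named column per SELECT variable."""
    data = json.loads(doc)
    variables = data["head"]["vars"]
    columns: Dict[str, List[Optional[str]]] = {v: [] for v in variables}
    for binding in data["results"]["bindings"]:
        for v in variables:
            # Unbound variables are simply absent from a binding; keep them as None
            columns[v].append(binding[v]["value"] if v in binding else None)
    return columns


# Hypothetical endpoint response
response = """{
  "head": {"vars": ["cell", "row"]},
  "results": {"bindings": [
    {"cell": {"type": "uri", "value": "http://example.org/cell/1"},
     "row": {"type": "literal", "value": "3"}},
    {"cell": {"type": "uri", "value": "http://example.org/cell/2"}}
  ]}
}"""
print(results_to_columns(response))
```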

84.
Machine Learning
• Machine Learning techniques allow extracting interesting information from data sources, and can be used to discover hidden patterns within datasets by generalizing from examples
• Different ML approaches can be applied:
  • Clustering: groups similar data into partitions called clusters
  • Association rule learning: discovers relations between variables
  • Decision tree learning: analyzes observations to build a predictive model represented as a tree
  • Many others …
• Weka is a Data Mining framework commonly used to apply ML on tabular data:
  – www.cs.waikato.ac.nz/ml/weka
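As a small example of the clustering approach listed above, the Python sketch below implements plain k-means (Lloyd's algorithm) on 2-D points. The points are hypothetical feature vectors. Taking the first k points as initial centroids is a simplification that keeps the output deterministic; practical implementations use more robust initialization.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], k: int, iterations: int = 10) -> List[int]:
    """Plain k-means (Lloyd's algorithm) returning one cluster label per point.

    The initial centroids are simply the first k points (a simplification).
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: move every centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels


# Hypothetical 2-D feature vectors, e.g. two numeric properties per LD resource
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
print(kmeans(points, k=2))
```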

85.
Machine Learning on LD
Challenges for applying Machine Learning on LD
• LD heterogeneity introduces noise into the data:
  – Same LD resources, different URIs
  – Predicates with similar semantics, but different constraints
• The data is not independent and identically distributed (iid):
  – It does not consist of only one type of object
  – The entities are related to each other
• LD rarely contains the negative examples needed by ML algorithms:
  – For example, owl:differentFrom
Source: http://www.cip.ifi.lmu.de/~nickel/iswc2012-slides
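One common workaround for the missing negative examples is the local closed-world assumption (LCWA). If a subject already has some known objects for a predicate, other candidate objects for that predicate are treated as negatives. This technique is not taken from the slide. The Python sketch below applies it to hypothetical triples.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def lcwa_negatives(triples: List[Triple], predicate: str, candidates: Set[str]) -> List[Triple]:
    """Generate negative examples under a local closed-world assumption (LCWA).

    Subjects with at least one known object for `predicate` get every other
    candidate as a negative; subjects with no known object are skipped.
    """
    known: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == predicate:
            known.setdefault(s, set()).add(o)
    negatives = []
    for s, objects in sorted(known.items()):
        for o in sorted(candidates - objects):
            negatives.append((s, predicate, o))
    return negatives


# Hypothetical training triples
triples = [
    ("ex:JohnLennon", "mo:member_of", "ex:TheBeatles"),
    ("ex:MickJagger", "mo:member_of", "ex:TheRollingStones"),
    ("ex:PaulMcCartney", "foaf:name", '"Paul McCartney"'),  # no member_of value known
]
bands = {"ex:TheBeatles", "ex:TheRollingStones"}
for triple in lcwa_negatives(triples, "mo:member_of", bands):
    print(triple)
```

The LCWA can produce false negatives, for example for a musician who belongs to several bands, so the generated negatives are noisy. Subjects without any known value (here ex:PaulMcCartney) are skipped, because their missing values carry no information.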

86.
Applications of Machine Learning on LD
• Node ranking:
  – Ranking nodes according to their relevance for a query
• Link prediction:
  – Infer edges between LD resources
  – Predict the new edges that will be added to the RDF graph
• Entity resolution:
  – Determine whether two URIs correspond to the same real-world object
• Taxonomy learning:
  – Infer taxonomies or concept hierarchies from a given vocabulary or ontology
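For the node-ranking application, a classic graph-based baseline is PageRank. The Python sketch below runs power iteration on a directed graph whose edges are assumed to come from RDF links between resources. The example edges are hypothetical, and dangling nodes spread their rank uniformly over all nodes. Ranking nodes for a specific query would additionally require personalizing this score or combining it with query matching.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a directed graph given as (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links: Dict[str, List[str]] = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node (no outgoing links): spread its rank evenly over all nodes
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical links, e.g. derived from mo:member_of triples
edges = [("ex:JohnLennon", "ex:TheBeatles"),
         ("ex:PaulMcCartney", "ex:TheBeatles"),
         ("ex:GeorgeHarrison", "ex:TheBeatles")]
ranks = pagerank(edges)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```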

87.
Summary
• Linked Data visualization techniques:
  • Visualizations must be chosen according to the type of the data
  • A wide variety of tools support the visualization of SPARQL results
  • Might be used in dashboards supporting administrative tasks
• Linked Data search:
  • Semantic search: exploits the meaning of user queries (NL or a set of keywords) to present useful results
  • Faceted search: allows browsing multi-dimensional data
• Linked Data analysis:
  • Includes data manipulation such as aggregation & filtering
  • Applies statistical methods to get a better understanding of the data
  • Machine Learning techniques can be applied for predictive analysis
  • Visualization techniques can be built on top of the previous features