Answers to: Retrieving DBpedia resource using text keywords search
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search
<p>I have been trying to retrieve a DBpedia resource with a keyword search in SPARQL. I tried FILTER, but the query timed out every time I ran it. I had some success with the bif:contains function; however, I couldn't get that query working through the Jena ARQ API, as Virtuoso has not yet opened the SPARQL port for DBpedia. The query I'm trying looks something like this:</p>
<pre><code>PREFIX dbprop: &lt;http://dbpedia.org/property/&gt;
PREFIX dbpont: &lt;http://dbpedia.org/Ontology&gt;
SELECT *
{
?orgnzn a &lt;http://dbpedia.org/ontology/Organisation&gt;.
?orgnzn rdfs:label ?lbl.
FILTER(regex(?lbl,"Harvard University","i")).
?orgnzn dbpont:City ?city.
?orgnzn dbrop:Country ?country
}
LIMIT 5
</code></pre>
<p>Description: Find a university whose name matches ($string), and get the City and Country of that university. 'University' is just an example; it could be any 'organization'. The ($string) is multi-word.</p>
<p>How can I get this query working on DBpedia endpoint?</p>
<p>The broad information need: I want to retrieve a DBpedia resource for any type of organization and fetch its City &amp; Country information. The organization could be anything, e.g. 'Harvard University', 'Microsoft', 'Dell', etc. I thought I could get its location info from DBpedia, so do reply if there are other ways or sources to get this info too.</p>
<p>Thanks!</p>
Tue, 14 Feb 2012 13:51:28 -0500
Answer by metaweb87
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search/14516
<p>After looking at some DBpedia resource pages, I also found that most entities have URIs in which spaces are replaced by the underscore (_) character, but <strong>NOT</strong> always! So one possible trick is to replace spaces with underscores to form the DBpedia resource URI and query the other details directly, as shown below:</p>
<p>Simple Query Text: Harvard University<br>
New Text: Harvard_University</p>
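<p>The renaming step can be sketched in Python as follows. This is only a heuristic under stated assumptions: the function name <code>guess_resource_uri</code> is hypothetical, the percent-encoding of other characters is an assumption about how the URI should be written, and, as noted above, the guessed URI is not guaranteed to exist.</p>

```python
from urllib.parse import quote

# Namespace used by the DBpedia resource URIs quoted in this thread.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def guess_resource_uri(label: str) -> str:
    """Guess a DBpedia resource URI from a plain-text name (heuristic only)."""
    # Collapse runs of whitespace and join the words with underscores.
    name = "_".join(label.split())
    # Percent-encode anything that is not safe in a URI path (an assumption).
    return DBPEDIA_RESOURCE + quote(name)


print(guess_resource_uri("Harvard University"))
# http://dbpedia.org/resource/Harvard_University
```

<p>Because the guess can miss, it seems safest to fall back to a label search when the guessed resource returns no data.</p>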
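<pre><code>PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX : &lt;http://dbpedia.org/resource/&gt;
PREFIX dbprop: &lt;http://dbpedia.org/property/&gt;
PREFIX dbpont: &lt;http://dbpedia.org/ontology/&gt;
# Standard SPARQL 1.1 form: each "expr AS ?var" is parenthesised and binds a new variable name.
SELECT DISTINCT (str(?city_label) AS ?city_name) (str(?country_label) AS ?country_name)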
WHERE {
OPTIONAL{:Harvard_University dbpont:city ?city. ?city rdfs:label ?city_label. FILTER(lang(?city_label)='en')}
OPTIONAL{:Harvard_University dbpont:country ?country. ?country rdfs:label ?country_label. FILTER(lang(?country_label)='en')}
}
</code></pre>
metaweb87, Tue, 14 Feb 2012 13:51:28 -0500
Comment by metaweb87 on scotthenninger's answer
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search#14515
<p>Thanks for the changes and the useful tips. It's working, and along the way I also learned how to write efficient queries. I haven't tried it in code yet, but I was wondering whether the "fn:starts-with" function is in the SPARQL specification, or whether I will get parsing errors like the ones I get for "bif:contains". Anyway, I'll update here if I run into any parsing issues.</p>
metaweb87, Tue, 14 Feb 2012 13:38:29 -0500
Comment by metaweb87 on seralf's answer
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search#14514
<p>Thanks <a href="/users/1880/seralf/">@seralf</a> for the prompt responses, really appreciate it! The new changes indeed worked. I noticed that you increased the timeout to 20000, and I think that did the trick! :-)</p>
metaweb87, Tue, 14 Feb 2012 13:29:09 -0500
Comment by scotthenninger on scotthenninger's answer
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search#14464
<p>I edited to address this question.</p>
scotthenninger, Sun, 12 Feb 2012 17:06:02 -0500
Comment by seralf on seralf's answer
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search#14460
<p>Please consider that if I set an Execution timeout, everything works fine and the query takes a few seconds.
It's probably an overload issue.</p>
<p>For example:
<a href="http://tinypaste.com/803f41d6">http://tinypaste.com/803f41d6</a></p>
seralf, Sun, 12 Feb 2012 15:17:58 -0500
Comment by seralf on seralf's answer
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search#14459
<p>Now it gives me a timeout too. I think it could be because FILTER queries have a big overhead and the system is probably too heavily loaded to handle them at the moment.
Another option is to add some more restrictions, if you can: for example, try searching only universities, then another kind of organization, and so on...</p>
seralf, Sun, 12 Feb 2012 15:16:33 -0500
Comment by metaweb87 on seralf's answer
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search#14458
<p>I'm sorry, it's not a timeout. It's actually an empty result set.
Check the query &amp; URL at <a href="http://tinypaste.com/0f910413">http://tinypaste.com/0f910413</a>. Thanks!</p>
metaweb87, Sun, 12 Feb 2012 15:04:53 -0500
Comment by seralf on seralf's answer
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search#14457
<p>That's strange: for me there is no timeout, even though the response takes a long time (maybe it suffers from the general load on the server? I don't know).
Please consider identifying the URI (about) of the resources with the first full-text query, and then running the second query on each specific resource: this runs well and fast.</p>
seralf, Sun, 12 Feb 2012 14:51:43 -0500
Comment by metaweb87 on seralf's answer
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search#14454
<p><a href="/users/1880/seralf/">@seralf</a> I understood your concept and even tried your queries, but I am still getting a timeout. The part where we add filters is failing. The query with the single FILTER(lang(?lbl)='en') works, but the next one, FILTER(regex(str(?lbl),"Harvard","i")), causes a timeout!</p>
metaweb87, Sun, 12 Feb 2012 14:33:15 -0500
Comment by metaweb87 on scotthenninger's answer
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search#14452
<p><a href="/users/309/scotthenninger/">@scotthenninger</a> I couldn't get the 'use a minimal query to discover the entity' part working; that is where I am stuck. I know string searches are inefficient and time-consuming, but that is the first step for me to get the entity and then its related information. I do get your point about splitting the query into smaller chunks to get a faster response; that was helpful! :)</p>
metaweb87, Sun, 12 Feb 2012 14:15:10 -0500
Answer by seralf
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search/14448
<p>Hi,</p>
<p>You could try a query like the following:</p>
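<pre><code>PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX dbprop: &lt;http://dbpedia.org/property/&gt;
PREFIX dbpont: &lt;http://dbpedia.org/ontology/&gt;
# Standard SPARQL 1.1 form: each "expr AS ?var" is parenthesised and binds a new variable name.
SELECT DISTINCT ?orgnzn (str(?lbl) AS ?label) (str(?city_label) AS ?city_name) (str(?country_label) AS ?country_name)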
WHERE{
?orgnzn a &lt;http://dbpedia.org/ontology/Organisation&gt;.
?orgnzn rdfs:label ?lbl.
OPTIONAL{?orgnzn dbpont:city ?city. ?city rdfs:label ?city_label. FILTER(lang(?city_label)='en')}
OPTIONAL{?orgnzn dbpont:country ?country. ?country rdfs:label ?country_label. FILTER(lang(?country_label)='en')}
FILTER(lang(?lbl)='en')
FILTER(regex(str(?lbl),"Harvard","i")).
#FILTER(?orgnzn = &lt;http://dbpedia.org/resource/Harvard_University&gt;).
}
</code></pre>
<p>Please note that I use some FILTER statements to restrict the results by language.
I also left a commented-out filter on the specific URI to give you an idea of how to run a query for a specific resource: this can be done, as said before, to improve performance. If this approach fits your needs, you only have to:</p>
<p>1) retrieve all the URIs you are interested in with a full-text search:</p>
<pre><code>PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX dbprop: &lt;http://dbpedia.org/property/&gt;
PREFIX dbpont: &lt;http://dbpedia.org/ontology/&gt;
SELECT DISTINCT ?orgnzn
WHERE{
?orgnzn a &lt;http://dbpedia.org/ontology/Organisation&gt;.
?orgnzn rdfs:label ?lbl.
FILTER(lang(?lbl)='en')
FILTER(regex(str(?lbl),"Harvard","i")).
}
</code></pre>
<p>2) then programmatically use each of them in the FILTER statement of this query:</p>
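<pre><code>PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX dbprop: &lt;http://dbpedia.org/property/&gt;
PREFIX dbpont: &lt;http://dbpedia.org/ontology/&gt;
# Standard SPARQL 1.1 form: each "expr AS ?var" is parenthesised and binds a new variable name.
SELECT DISTINCT ?orgnzn (str(?lbl) AS ?label) (str(?city_label) AS ?city_name) (str(?country_label) AS ?country_name)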
WHERE{
?orgnzn a &lt;http://dbpedia.org/ontology/Organisation&gt;.
?orgnzn rdfs:label ?lbl.
OPTIONAL{?orgnzn dbpont:city ?city. ?city rdfs:label ?city_label. FILTER(lang(?city_label)='en')}
OPTIONAL{?orgnzn dbpont:country ?country. ?country rdfs:label ?country_label. FILTER(lang(?country_label)='en')}
FILTER(lang(?lbl)='en')
FILTER(?orgnzn = &lt;http://dbpedia.org/resource/Harvard_University&gt;).
}
</code></pre>
seralf, Sun, 12 Feb 2012 12:41:52 -0500
Answer by scotthenninger
http://answers.semanticweb.com/questions/14435/retrieving-dbpedia-resource-using-text-keywords-search/14445
<p>A key to working with DBpedia, or any large RDF store for that matter, is to realize that any string search is very inefficient relative to matching a triple pattern. In your case, I'd suggest using a minimal query to discover the label for the entity you want. Something like:</p>
<pre><code>PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX dbprop: &lt;http://dbpedia.org/property/&gt;
PREFIX dbpont: &lt;http://dbpedia.org/ontology/&gt;
SELECT *
{ ?orgnzn a &lt;http://dbpedia.org/ontology/Organisation&gt;.
?orgnzn rdfs:label ?lbl.
FILTER(fn:starts-with(?lbl,"Harv")).
}
</code></pre>
<p>Note that I'm using some heuristics here. I'm pretty sure that the label will start with "Harv" and that not many others will start with that string. fn:starts-with only needs to examine the first n characters, so it can be used for a more efficient search than regex or contains. Of course, if you don't know whether the keyword appears at the beginning of the label, or the case isn't known, then you will need to use regex, etc. But this can work in many cases. Note that smaller search strings will, of course, return quicker.</p>
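<p>For anyone building the prefix search from code, here is a minimal Python sketch of assembling such a query from a user-supplied keyword. It is not the query above verbatim: it swaps in <code>STRSTARTS</code>, the SPARQL 1.1 built-in for prefix matching, in place of <code>fn:starts-with</code>, and the helper names, the <code>dbo</code> prefix label and the <code>LIMIT</code> default are all assumptions made for illustration.</p>

```python
def sparql_string_literal(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslash, quote and newline."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def prefix_search_query(prefix: str, limit: int = 20) -> str:
    """Build a query for English organisation labels that start with `prefix`."""
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?orgnzn ?lbl
WHERE {{
  ?orgnzn a dbo:Organisation ;
          rdfs:label ?lbl .
  FILTER(lang(?lbl) = 'en')
  FILTER(STRSTARTS(STR(?lbl), {sparql_string_literal(prefix)}))
}}
LIMIT {int(limit)}"""


print(prefix_search_query("Harv"))
```

<p>Escaping the keyword before it is placed in the query keeps a stray quote in user input from breaking the query text.</p>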
<p>Then do the following to discover what properties are associated with the resource:</p>
<pre><code>PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX dbprop: &lt;http://dbpedia.org/property/&gt;
PREFIX dbpont: &lt;http://dbpedia.org/ontology/&gt;
SELECT *
{ ?orgnzn a &lt;http://dbpedia.org/ontology/Organisation&gt;.
?orgnzn rdfs:label "Harvard University"@en .
?orgnzn ?p ?o .
} LIMIT 5
</code></pre>
<p>From this you will discover that the properties start with a lowercase letter, as is the custom, e.g. dbprop:city and dbprop:country. You will also discover that other resources do not have these properties, so you may need to use OPTIONAL and use the properties associated with those resources.</p>
<p>Again, the key here is using SPARQL to iteratively discover how the data is represented, in smaller chunks that the service is able to process efficiently, and then growing the query you are working on, testing the service's ability to handle the request as you go.</p>
scotthenninger, Sun, 12 Feb 2012 11:57:42 -0500
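<p>The two-step approach suggested in this thread (first find the matching URIs, then query each resource on its own) could be driven from code roughly as sketched below, using only the Python standard library. Several things here are assumptions rather than confirmed facts: the endpoint URL <code>http://dbpedia.org/sparql</code>, the Virtuoso-style <code>format</code> and <code>timeout</code> request parameters (the 20000 value mirrors the timeout mentioned in the comments), and every function name, all of which are hypothetical.</p>

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://dbpedia.org/sparql"  # assumed public endpoint


def build_request_url(query, timeout_ms=20000):
    """Build a GET URL; `format` and `timeout` are assumed Virtuoso parameters."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        "format": "application/sparql-results+json",
        "timeout": str(timeout_ms),
    }
    return ENDPOINT + "?" + urlencode(params)


def run_select(query, timeout_ms=20000):
    """Send a SELECT query and return the parsed SPARQL JSON results."""
    with urlopen(build_request_url(query, timeout_ms), timeout=60) as resp:
        return json.load(resp)


def bound_values(results, var):
    """Collect the values bound to `var` in a SPARQL JSON result set."""
    rows = results["results"]["bindings"]
    return [row[var]["value"] for row in rows if var in row]


def details_query(uri):
    """Build the per-resource city/country query for one URI."""
    if any(ch in uri for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in uri):
        raise ValueError("not a usable URI: %r" % uri)
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpont: <http://dbpedia.org/ontology/>
SELECT DISTINCT (STR(?city_label) AS ?city_name) (STR(?country_label) AS ?country_name)
WHERE {{
  OPTIONAL {{ <{uri}> dbpont:city ?city . ?city rdfs:label ?city_label .
             FILTER(lang(?city_label) = 'en') }}
  OPTIONAL {{ <{uri}> dbpont:country ?country . ?country rdfs:label ?country_label .
             FILTER(lang(?country_label) = 'en') }}
}}"""


def two_step_lookup(search_query, timeout_ms=20000):
    """Step 1: run the search query for ?orgnzn. Step 2: query each URI found."""
    found = {}
    for uri in bound_values(run_select(search_query, timeout_ms), "orgnzn"):
        found[uri] = run_select(details_query(uri), timeout_ms)["results"]["bindings"]
    return found

# Usage (performs network requests): two_step_lookup(search_query_text)
```

<p>Keeping the search and the per-resource lookup as separate requests means that a slow or timed-out text search does not also cost you the cheap detail queries.</p>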