On Friday 01 May 2009 06:27:54 Lee Feigenbaum wrote:
> * Full text. The survey indicated strong support for standardizing
> the syntax and semantics for full text search in SPARQL. While I believe
> that this is one of the top interoperability stumbling blocks for
> SPARQL, the wide-open design space (both for syntax and semantics) of
> the problem worries me.
Indeed it has a very open design space, but I think we should look into how
fulltext search is used and let that guide the implementation now.
I think it would be very unfortunate not to have any standardised fulltext
capability in SPARQL, as it signals that "if you have a search box on your
site that is used extensively by your users, then SPARQL is not suitable for
you". Even if there are extensions that do free text, this is a message
that I, for one, would be very concerned about, as that description fits most
of the current web.
We sometimes match strings with regular expressions, but never with exact
string match. Regular expressions are far too flexible to be useful in many
contexts.
All we have used so far can be summarised as follows:
1) Terms shorter than three characters are ignored.
2) A single term is matched exactly against a whole word.
3) A single term ending in an asterisk is matched against words beginning
with the term.
4) Multiple terms with AND match all words in any order.
5) Multiple terms with OR match any of the words in any order.
6) Multiple terms without an operator match all words in the given order.
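
To make the six rules concrete, here is a minimal Python sketch of how they
could be read. It is an illustration only, not our actual implementation. The
names simple_contains, _term_matches and _in_order are hypothetical, and
several points the rules leave open are filled in by assumption: matching is
case-insensitive, words are runs of letters, digits and underscores, term
length is measured without the trailing asterisk, "in the given order" does
not require adjacency, a query uses at most one kind of operator, and a query
with no usable terms matches nothing.

```python
import re

MIN_TERM_LENGTH = 3  # rule 1: shorter terms are ignored


def _term_matches(term, word):
    """Rule 2 (whole word), or rule 3 (trailing asterisk means prefix)."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


def _in_order(terms, words):
    """Rule 6: every term matches some word, in the given order.

    Assumption: the matched words need not be adjacent.
    """
    position = 0
    for term in terms:
        # Skip ahead to the next word that matches this term.
        while position < len(words) and not _term_matches(term, words[position]):
            position += 1
        if position == len(words):
            return False
        position += 1
    return True


def simple_contains(text, query):
    """Hypothetical matcher following rules 1 to 6 (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    tokens = query.split()
    operators = {t for t in tokens if t in ("AND", "OR")}
    if len(operators) > 1:
        # Assumption: the rules do not say how AND and OR combine.
        raise ValueError("mixing AND and OR is not covered by the rules")
    terms = [
        t.lower()
        for t in tokens
        if t not in ("AND", "OR") and len(t.rstrip("*")) >= MIN_TERM_LENGTH
    ]
    if not terms:
        return False  # assumption: nothing left to match
    if operators == {"OR"}:  # rule 5
        return any(any(_term_matches(t, w) for w in words) for t in terms)
    if operators == {"AND"}:  # rule 4
        return all(any(_term_matches(t, w) for w in words) for t in terms)
    return _in_order(terms, words)  # rules 2, 3 and 6
```

Under these assumptions, simple_contains("The quick brown fox", "qui*") and
simple_contains("The quick brown fox", "fox AND quick") both match, while
"fox quick" without an operator does not, because the words occur in the
other order.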
At some point, we had phrase search too, which is a nice feature but I think
we dropped it.
Here, there is no XQuery, just a small subset of what Lucene does; there is
no advanced stemming, just plain string matching with some permutations of
terms. Yet, in our experience, it covers most of what people do.
Also, forward compatibility can be kept by defining different functions for
different matching rules: we could have a simple contains function now, and
SPARQL 1.2 could adopt ftcontains in addition if they so wish.
In summary, the design space can be constrained to something small. While
SPARQL does not need a very elaborate freetext matching system, it needs
something, and much of it is already there; it is mostly just a matter of
naming a function or predicate normatively.
Kind regards
Kjetil Kjernsmo
--
Senior Knowledge Engineer
Mobile: +47 986 48 234
Email: kjetil.kjernsmo@computas.com
Web: http://www.computas.com/
| SHARE YOUR KNOWLEDGE |
Computas AS PO Box 482, N-1327 Lysaker | Phone:+47 6783 1000 | Fax:+47 6783 1001