This message gives the details of how I plan to carry out the WG decisions from
http://www.w3.org/2009/sparql/meeting/2010-11-30
It would be good if these details were reviewed, but if I hear nothing,
this is the approach I'll take when I write up the content (which won't
be immediately).
Suggestion: change the name from LENGTH to STRLEN because "LENGTH" might
imply RDF lists, or paths of Seq.
Suggestion: change the name from SUBSTRING to SUBSTR just to make it
shorter, and 'STR' is used for strings elsewhere in SPARQL.
Details of string operations:
STRLEN(string)
SUBSTR(string, int, int)
UCASE(string)
LCASE(string)
ENDS(string, string)
STARTS(string, string)
CONTAINS(string, string)
ENCODES(string)
CONCAT(string*)
Issues to sort out are around different flavo(u)rs of string. Unlike
F&O we have 3 string forms: xsd:string, simple literal (the SPARQL term
for a plain literal without a language tag) and plain literals with
language tag ("LitLang", from now on).
Design:
1/ Operations cover simple literal, LitLang, xsd:string.
This makes it a good thing we have our own IRIs - the F&O operations
only cover xsd:string.
2/ The return type will be the form of the principal argument.
"Principal argument" means the one the operation is acting on.
So
Operations on xsd:string yield xsd:string
Operations on LitLang yield a LitLang with the same language tag
but not when different language tags are mixed
Operations on simple literal yield simple literals
3/ Literals with different language tags do not match or compare
Note that "Script" and "dialect" are parts of a language tag.
STRLEN(string) -> integer
SUBSTR(string, int) -> string
SUBSTR(string, int, int) -> string
Design-2 applies.
The first argument is the "principal argument"
Caution: F&O uses 1-based indexing, with a start position and a length.
Warning to Java programmers and others: it's not
[start,end)
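To make the indexing caution concrete, here is a rough Python sketch of a 1-based (start, length) substring. The function name substr is mine, and it only handles integer arguments; F&O's fn:substring also defines rounding for non-integer arguments, which this ignores.

```python
def substr(s, start, length=None):
    """1-based substring with (start, length), integer arguments only."""
    # Keep the characters whose 1-based position p satisfies
    # start <= p < start + length (or start <= p when length is omitted).
    end = len(s) + 1 if length is None else start + length
    lo = max(start, 1)            # positions before 1 do not exist
    hi = min(end, len(s) + 1)     # positions past the end do not exist
    return s[lo - 1:hi - 1] if lo < hi else ""


print(substr("foobar", 4))      # bar
print(substr("foobar", 4, 1))   # b
```

Note that the third argument is a length, so substr("foobar", 4, 1) is one character starting at position 4, not the range from 4 to 1.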
UCASE(string)
LCASE(string)
Design-2 applies.
UCASE("abc") -> "ABC"
UCASE("abc"@de) -> "ABC"@de
UCASE("abc"^^xsd:string) -> "ABC"^^xsd:string
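As an illustration of Design-2 for the case functions, a minimal Python sketch follows. The Literal tuple and the names ucase/lcase are hypothetical stand-ins of mine for the three string forms, not part of any proposal text; Python's upper()/lower() are only an approximation of the F&O case mappings.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    # Hypothetical model: lang set for LitLang, datatype set for xsd:string,
    # both None for a simple literal.
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def ucase(lit):
    # Only the lexical form changes; language tag / datatype carry over.
    return lit._replace(lexical=lit.lexical.upper())


def lcase(lit):
    return lit._replace(lexical=lit.lexical.lower())


print(ucase(Literal("abc", lang="de")))
```

The result keeps whatever form the argument had, which is the whole of Design-2 for single-argument operations.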
ENDS(string, string)
STARTS(string, string)
CONTAINS(string, string)
STARTS("abc", "a") -> true
STARTS("abc"@en, "a"@en) -> true
STARTS("abc"@en, "a"@en-UK) -> false *** (could be error)
Must be same language tag if two language tags present (else false or error)
NB: This works:
STARTS(str(?uri), str(prefix:))
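A small Python sketch of the language tag check for STARTS, under some assumptions of mine: the Literal tuple is a hypothetical model of the three string forms, mismatched tags give false (the error alternative is left open above), and tags are compared case-insensitively.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    # Hypothetical model: lang set for LitLang, datatype set for xsd:string.
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def starts(s, prefix):
    # If both arguments carry a language tag, the tags must be the same.
    # Assumption: mismatch returns False rather than raising an error.
    if s.lang is not None and prefix.lang is not None:
        if s.lang.lower() != prefix.lang.lower():
            return False
    return s.lexical.startswith(prefix.lexical)


print(starts(Literal("abc", lang="en"), Literal("a", lang="en-UK")))  # False
```

ENDS and CONTAINS would follow the same pattern with a different final test.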
ENCODES(string)
Result is a simple literal regardless of the form of the argument.
The argument can be a simple literal or an xsd:string.
It is not clear to me that it should apply to LitLang.
Proposal: it does not (it is an error).
CONCAT(string*)
If all the strings are simple literals
-> simple literals
If the strings are a mix of simple literals and one or more xsd:string
-> xsd:string
If the strings are a mix of simple literals, xsd:strings
and LitLang, and the lang tags are all the same
-> plain literal with that language tag.
If the strings are a mix of simple literals and plain literals
and there are two or more different language tags
-> simple literal
NB: CONCAT("abc"@en, "def"@en-UK) -> "abcdef"
because the arguments have different language tags.
If the strings are a mix of simple literals, xsd:strings and LitLang and
there are two or more different language tags
-> xsd:string
CONCAT("abc"@en, "def"@en-UK, "z"^^xsd:string) -> "abcdefz"^^xsd:string
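The CONCAT cases above can be sketched as one decision in Python. This is my reading of the rules, not agreed text: the Literal tuple is a hypothetical model, language tags are compared exactly as written, and I assume a single shared language tag wins even when xsd:strings are present.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    # Hypothetical model: lang set for LitLang, datatype set for xsd:string.
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def concat(*args):
    text = "".join(a.lexical for a in args)
    langs = {a.lang for a in args if a.lang is not None}
    has_xsd = any(a.datatype == XSD_STRING for a in args)
    if len(langs) == 1:
        # All language tags present are the same: keep that tag.
        return Literal(text, lang=next(iter(langs)))
    if has_xsd:
        # No usable language tag and at least one xsd:string.
        return Literal(text, datatype=XSD_STRING)
    # Only simple literals, or conflicting language tags without xsd:string.
    return Literal(text)


print(concat(Literal("abc", lang="en"), Literal("def", lang="en-UK")))
```

With conflicting tags the language information is dropped, and the presence of any xsd:string then decides between xsd:string and simple literal.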
Other types (including IRIs) do not get cast to string. Add STR() or
xsd:string() as needed. This is a choice point: an implicit cast would
have to pick between STR() and xsd:string(), so I suggest we require
explicit casts.
Andy