StringLiterals/LanguageTaggedStringDatatypeProposal

This is a proposal for addressing the following time-permitting item from the charter:

Reconcile various forms of string literals: at the moment we have plain literals, rdf:plainLiteral, and xsd:string literals. They are very very close to one another but they are officially different. In practice this means that, eg, SPARQL queries have to have a three branch UNION to handle all of these. Worth looking at some sort of a reconciliation of these.

Short summary

Unlike for normal datatypes, the lexical forms of rdf:LanguageTaggedString are not strings but 〈string,langtag〉 pairs.

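"foo" and "foo"@en, and the corresponding forms in other concrete syntaxes, are syntactic sugar for "foo"^^xsd:string and for the rdf:LanguageTaggedString literal 〈"foo","en"〉 respectively, and are the preferred forms.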

Details

1. Untagged plain literals are removed from the abstract syntax; a literal with datatype xsd:string is used instead.

2. In concrete syntaxes, the "foo" form SHOULD be used instead of "foo"^^xsd:string. (This is a SHOULD rather than a MUST for backward compatibility.)

3. Tagged plain literals are removed from the abstract syntax as well.

4. Instead, a new “special datatype” is introduced for tagged string literals only.

5. Let's call it rdf:LanguageTaggedString for now; a shorter name should be found.

6. Unlike normal datatypes, whose lexical space consists of lexical-form strings, rdf:LanguageTaggedString has a lexical space of 〈string,langtag〉 pairs. Its value space is also the set of 〈string,langtag〉 pairs, and its L2V mapping is the identity mapping.

7. In concrete syntaxes, the "foo"@en form MUST be used for literals of type rdf:LanguageTaggedString.

8. rdf:PlainLiteral remains as it is; it is not to be used in syntax (concrete or abstract).
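Items 1 to 7 can be illustrated with a small data model. This is a hypothetical sketch, not part of the proposal: the class names, the expansion of the provisional name rdf:LanguageTaggedString into the usual rdf: namespace, and the tiny N-Triples-like parser (no escapes, full IRIs only) are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
# Provisional datatype name from item 5, expanded with the usual rdf: namespace
# (an assumption; the proposal does not fix a final name).
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#LanguageTaggedString"


@dataclass(frozen=True)
class TypedLiteral:
    """An ordinary literal: a lexical-form string plus a datatype IRI."""
    lexical: str
    datatype: str


@dataclass(frozen=True)
class LangTaggedLiteral:
    """Item 6: the 'lexical form' is a (string, langtag) pair, not a string."""
    string: str
    lang: str
    datatype: str = RDF_LANG_STRING

    def value(self) -> tuple:
        # Item 6: the L2V mapping is the identity on (string, langtag) pairs.
        return (self.string, self.lang)


# Hypothetical simplified syntax: no escapes, full IRIs only.
_LITERAL = re.compile(r'^"([^"]*)"(?:@([A-Za-z0-9-]+)|\^\^<([^>]+)>)?$')


def parse_literal(text: str):
    """Map a concrete literal form onto the proposed abstract syntax."""
    m = _LITERAL.match(text)
    if m is None:
        raise ValueError(f"unsupported literal: {text!r}")
    string, lang, datatype = m.groups()
    if lang is not None:
        # Items 3, 4 and 7: "foo"@en denotes an rdf:LanguageTaggedString literal.
        return LangTaggedLiteral(string, lang)
    if datatype is None:
        # Items 1 and 2: "foo" is sugar for "foo"^^xsd:string.
        return TypedLiteral(string, XSD_STRING)
    return TypedLiteral(string, datatype)
```

With this model, parse_literal('"foo"') and parse_literal('"foo"^^<http://www.w3.org/2001/XMLSchema#string>') produce equal objects: the two spellings no longer differ in the abstract syntax.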

Some corollaries

9. It's ok to use rdf:LanguageTaggedString and rdf:PlainLiteral in rdfs:range statements. This should probably be documented somewhere, at least in the RDFS spec.

10. In SPARQL, datatype("foo") is now xsd:string without the need for an exception in the spec.

11. In SPARQL, datatype("foo"@en) is now rdf:LanguageTaggedString (with a note that legacy implementations might return an error).

12. The value space of rdf:PlainLiteral is the union of the value spaces of xsd:string and rdf:LanguageTaggedString.
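Corollaries 10 to 12 can be sketched the same way. The model below is hypothetical and not any SPARQL engine's API: a literal is a (lexical form, datatype, language tag) tuple, with a missing datatype standing for a current-RDF plain literal. datatype_current mirrors the behaviour described above for current RDF (a special case for "foo", an error for "foo"@en), datatype_proposed needs no special cases, and plain_literal_value maps an rdf:PlainLiteral lexical form such as "foo@" or "foo@en" into the union value space of item 12. The sketch does not normalize language-tag case.

```python
from typing import NamedTuple, Optional, Tuple, Union

XSD_STRING = "xsd:string"
RDF_LANG_STRING = "rdf:LanguageTaggedString"  # provisional name (item 5)


class Literal(NamedTuple):
    lexical: str
    datatype: Optional[str]      # None models a plain literal in current RDF
    lang: Optional[str] = None


def datatype_current(lit: Literal) -> str:
    """datatype() over current RDF: plain literals need special cases."""
    if lit.datatype is not None:
        return lit.datatype
    if lit.lang is None:
        return XSD_STRING        # special rule for untagged plain literals
    raise TypeError("datatype() is an error for language-tagged literals")


def datatype_proposed(lit: Literal) -> str:
    """datatype() under the proposal (items 10 and 11): no special cases."""
    if lit.datatype is None:
        raise ValueError("plain literals no longer exist in the abstract syntax")
    return lit.datatype


def plain_literal_value(lexical: str) -> Union[str, Tuple[str, str]]:
    """Item 12: map an rdf:PlainLiteral lexical form into the union value space."""
    # The last "@" separates the text from the (possibly empty) language tag.
    text, sep, lang = lexical.rpartition("@")
    if not sep:
        raise ValueError(f"not an rdf:PlainLiteral lexical form: {lexical!r}")
    if lang == "":
        return text              # in the value space of xsd:string
    return (text, lang)          # in the value space of rdf:LanguageTaggedString
```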

Comparison of current RDF and proposal

Literals in current RDF

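Concrete syntaxes: Ttl = Turtle, NT = N-Triples, Spq = SPARQL, SRX = SPARQL Results XML, RDFa, R/X = RDF/XML. A ✓ means the form can be written in that syntax.

| Kind of literal | Concrete syntax form | Allowed? (concrete) | Ttl | NT | Spq | SRX | RDFa | R/X | Abstract syntax form | Allowed? (abstract) | Value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Strings without language tag | "foo" | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo" | | Unicode string |
| | "foo"^^xsd:string | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo"^^xsd:string | | |
| | "foo@"^^rdf:PlainLiteral | MUST NOT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo@"^^rdf:PlainLiteral | MUST NOT | |
| Strings with language tag | "foo"@en | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo"@en | | <Unicode string, language tag> |
| | "foo@en"^^rdf:PlainLiteral | MUST NOT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo@en"^^rdf:PlainLiteral | MUST NOT | |
| Integer numbers | 1 | | ✓ | | ✓ | | | | "1"^^xsd:integer | | Number |
| | "1"^^xsd:integer | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "1"^^xsd:integer | | |
| Decimal numbers | 1.0 | | ✓ | | ✓ | | | | "1.0"^^xsd:decimal | | Number |
| | "1.0"^^xsd:decimal | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "1.0"^^xsd:decimal | | |
| Booleans | true | | ✓ | | ✓ | | | | "true"^^xsd:boolean | | Boolean value |
| | "true"^^xsd:boolean | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "true"^^xsd:boolean | | |
| Other literals | "lexical"^^datatype | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "lexical"^^datatype | | Depends on L2V mapping of datatype |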

Blue italics indicate changes between current RDF and the new proposal.

Literals in the new proposal

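| Kind of literal | Concrete syntax form | Allowed? (concrete) | Ttl | NT | Spq | SRX | RDFa | R/X | Abstract syntax form | Allowed? (abstract) | Value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Strings without language tag | "foo" | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo"^^xsd:string | | Unicode string |
| | "foo"^^xsd:string | SHOULD NOT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo"^^xsd:string | | |
| | "foo@"^^rdf:PlainLiteral | MUST NOT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo@"^^rdf:PlainLiteral | MUST NOT | |
| Strings with language tag | "foo"@en | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | <"foo",@en>^^rdf:LanguageTaggedString | | <Unicode string, language tag> |
| | "???"^^rdf:LanguageTaggedString | Impossible: no lexical form is defined | | | | | | | | | |
| | "foo@en"^^rdf:PlainLiteral | MUST NOT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "foo@en"^^rdf:PlainLiteral | MUST NOT | |
| Integer numbers | 1 | | ✓ | | ✓ | | | | "1"^^xsd:integer | | Number |
| | "1"^^xsd:integer | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "1"^^xsd:integer | | |
| Decimal numbers | 1.0 | | ✓ | | ✓ | | | | "1.0"^^xsd:decimal | | Number |
| | "1.0"^^xsd:decimal | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "1.0"^^xsd:decimal | | |
| Booleans | true | | ✓ | | ✓ | | | | "true"^^xsd:boolean | | Boolean value |
| | "true"^^xsd:boolean | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "true"^^xsd:boolean | | |
| Other literals | "lexical"^^datatype | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | "lexical"^^datatype | | Depends on L2V mapping of datatype |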
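The "Allowed?" entries for concrete forms in the proposal table reduce to a simple check on the datatype written after ^^. The function below is a hypothetical linter sketch derived only from the table rows; it assumes datatypes are given as prefixed names and uses None for forms written without ^^.

```python
from typing import Optional


def concrete_form_level(datatype: Optional[str]) -> str:
    """Requirement level for writing a literal with an explicit ^^datatype.

    datatype is the prefixed datatype name written after ^^, or None when no
    ^^ is written ("foo", "foo"@en, or a native form such as 1 or true).
    """
    if datatype is None:
        return "allowed"
    if datatype == "rdf:PlainLiteral":
        return "MUST NOT"       # never used as syntax (item 8)
    if datatype == "xsd:string":
        return "SHOULD NOT"     # prefer the plain "foo" form (item 2)
    if datatype == "rdf:LanguageTaggedString":
        # No string lexical form exists, so this cannot be written with ^^;
        # the "foo"@en form is the only way to write it (item 7).
        return "impossible"
    return "allowed"            # e.g. "1"^^xsd:integer, "lexical"^^datatype
```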

Discussion etc

Naming proposals: rdf:LanguageTaggedString, rdf:Text, …

…

There should be some language to the effect that "foo" is preferred, simply for ergonomic reasons. I phrased this as a SHOULD in the proposal. Weaker language might be sufficient in the general case, or expressing this preference might be unnecessary altogether.

Some syntaxes, mostly N-Triples and SPARQL Results XML/JSON, have use cases that are hampered by the variability that syntactic sugar introduces. I think these syntaxes should make a stronger statement in their respective specs, perhaps forbidding one of the two forms when serializing. Which one doesn't really matter.
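As a sketch of what forbidding one form when serializing could look like for an N-Triples-style writer, the hypothetical function below always emits the sugared form, so "foo" and "foo"^^xsd:string in the input serialize identically. Picking the sugared form is an arbitrary choice here, and the escaping covers only backslash, double quote, and line breaks; it is not a complete N-Triples escaper. The expanded IRI for the provisional name is an assumption.

```python
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
# Provisional name, expanded with the usual rdf: namespace (an assumption).
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#LanguageTaggedString"


def _escape(s: str) -> str:
    # Minimal escaping only: backslash first, then quote and line breaks.
    return (s.replace("\\", "\\\\")
             .replace('"', '\\"')
             .replace("\n", "\\n")
             .replace("\r", "\\r"))


def canonical_literal(lexical: str, datatype: str, lang: Optional[str] = None) -> str:
    """Serialize a literal, always choosing the sugared form for strings."""
    quoted = '"' + _escape(lexical) + '"'
    if lang is not None:
        return quoted + "@" + lang       # rdf:LanguageTaggedString: only "foo"@en exists
    if datatype == XSD_STRING:
        return quoted                    # ^^xsd:string is never written
    return quoted + "^^<" + datatype + ">"
```

A writer that always emitted ^^xsd:string instead would serve equally well; what matters for these syntaxes is that each literal has exactly one serialization.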