On 4/24/12 3:35 PM, Markus Läll wrote:
> For what I understand, and putting words in his mouth, he wants to
> write `"<something=illegal>" :: XML' and have the compiler tell him at
> compile-time that this is not valid XML (if it actually is, imagine
> that there's something invalid between the double quotes). I.e he
> wants to parse the string at compile-time and have the compilation
> fail if the parse fails, or have the string literal be replaced by the
> syntax tree of that XML if it succeeds.*
>
> This example is meta-programming par excellence, which is what
> Template Haskell is for -- use it.

Indeed. Asking that "illegal" string literals be caught at compile time
is, in effect, updating the syntax of Haskell itself. As it stands,
Haskell has a definition of what a string literal is (see the Report),
and whether or not that literal can be successfully coerced into a given
type is neither here nor there, just as for numeric literals.

I'm all for static checking. (Even more so with every passing year.) But
if you want to make up new sorts of literals and have them checked for
validity, that's exactly what quasiquotes are there for. Since you are
altering the syntax of Haskell, rather than accepting what Haskell calls
strings, then this is metaprogramming and so you're going to need TH,
QQ, or some similar metaprogramming facility. Whereas for ByteString and
Text the goal is specifically to serve as an efficient/correct
replacement for String; thus, overloading string literals to support
those types is _not_ asking to change the syntax of Haskell.
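As a rough sketch of the quasiquote route (not anyone's actual library):
the module below defines a quoter, hypothetically named `balanced`, whose
check runs at compile time. The `wellBracketed` test is only a stand-in
for a real XML parser; both names are made up for illustration.

```haskell
module BalancedQQ (balanced, wellBracketed) where

import Language.Haskell.TH (stringE)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- Hypothetical stand-in for a real XML parser: accept the text only if
-- every '<' is closed by a '>' before another '<' opens.
wellBracketed :: String -> Bool
wellBracketed = go False
  where
    go open []         = not open
    go open ('<' : cs) = not open && go True cs
    go open ('>' : cs) = open && go False cs
    go open (_ : cs)   = go open cs

-- The check runs inside the Q monad, i.e. while the client module is
-- being compiled; 'fail' there turns into a compile-time error.
balanced :: QuasiQuoter
balanced = QuasiQuoter
  { quoteExp  = \s ->
      if wellBracketed s
        then stringE s  -- splice the text back in as a String literal
        else fail ("not well-bracketed: " ++ show s)
  , quotePat  = unsupported "pattern"
  , quoteType = unsupported "type"
  , quoteDec  = unsupported "declaration"
  }
  where
    unsupported ctx _ =
      fail ("balanced: not usable in a " ++ ctx ++ " context")
```

In a separate module with QuasiQuotes enabled, [balanced|<a>text</a>|]
would splice in an ordinary String, while [balanced|<a|] should be
rejected when that module is compiled. The quoter has to live in a
different module from its use sites because of GHC's stage restriction.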

To the extent that ByteString's IsString instance runs into issues with
high code points, that strikes me as a bug by virtue of poor foresight.
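To make the issue concrete, here is a small sketch using
Data.ByteString.Char8.pack, which is documented to truncate each Char to
8 bits. As far as I know the IsString instance behaves the same way, but
treat that as an assumption to check against your bytestring version.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- '\955' is U+03BB (Greek small lambda). Char8.pack keeps only the low
-- 8 bits of each Char, so U+03BB collapses to the single byte 0xBB.
truncatedBytes :: [Word8]
truncatedBytes = B.unpack (BC.pack "\955x")

main :: IO ()
main = print truncatedBytes  -- prints [187,120]
```

Nothing is reported at compile time or at run time; the high bits are
silently dropped.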

Consider, for instance, the distinction between integral and
non-integral numeric literals. We recognize that (0.1 :: Int) is
invalid, and so we a priori define the Haskell syntax to recognize two
different sorts of "numbers". It seems that we should do the same thing
for strings. 'String' literals of raw binary goop (subject to escape
mechanisms for detecting the end of string) are different from string
literals which are valid Unicode sequences. This, I think, is fair game
to be expressed directly in the specification of overloaded string
literals, just as we distinguish classes of overloaded numeric literals.

Unfortunately, for numeric literals we have a nice syntactic distinction
between integral and non-integral, which seems to suggest that we'd need
a similar syntactic distinction to recognize the different sorts of
string literals.
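
A minimal sketch of the analogy, under loud assumptions: the numeric half
is ordinary Haskell, but the two string classes and the Octets/Unicode
newtypes are hypothetical names that exist in no library, and no compiler
desugars string literals this way today.

```haskell
import Data.Word (Word8)

-- Real Haskell: an integral literal n means (fromInteger n) and a
-- fractional literal r means (fromRational r), so the two sorts of
-- literal carry different class constraints.
three :: Int
three = 3       -- needs only Num Int

tenth :: Double
tenth = 0.1     -- needs Fractional Double

-- bad :: Int
-- bad = 0.1    -- rejected: there is no Fractional Int instance

-- Hypothetical analogue for strings: one class per sort of literal.
class FromOctetLiteral a where
  fromOctetLiteral :: [Word8] -> a

class FromUnicodeLiteral a where
  fromUnicodeLiteral :: String -> a

-- Illustrative stand-ins for a byte-oriented and a text-oriented type.
newtype Octets = Octets [Word8] deriving (Eq, Show)
newtype Unicode = Unicode String deriving (Eq, Show)

instance FromOctetLiteral Octets where
  fromOctetLiteral = Octets

instance FromUnicodeLiteral Unicode where
  fromUnicodeLiteral = Unicode

-- Deliberately no FromUnicodeLiteral Octets instance: under this scheme
-- a Unicode literal at type Octets would be a type error, much as
-- (0.1 :: Int) is today.

main :: IO ()
main = do
  print (three, tenth)
  print (fromOctetLiteral [0xBB, 0x78] :: Octets)
  print (fromUnicodeLiteral "\955x" :: Unicode)
```

The open question from above remains: the compiler would still need some
syntactic cue to decide which of the two classes a given literal goes
through.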

--
Live well,
~wren