Hello
Strings in Haskell seem to be a major source of problems. I will try
to outline some of the problems I have faced and some possible solutions.
Size
Handling large amounts of text as Haskell strings is currently not
practical, as the representation (a linked list with one cell per Char)
is very inefficient.
Serialization
Most of the time when serializing a string, we want to handle it as an
array of Word8. With lists one has to account for cycles and infinite
length, which makes the encoding much more verbose and slow.
The problem is not that some strings may not be trivially
serializable, but rather that it is hard to tell which strings are
the easy cases.
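To make the trade-off concrete, here is a rough sketch of the two encodings
for a [Word8]. The function names and the one-byte length fields are my own
assumptions, chosen only to keep the example small. The length-prefixed form
is compact but has to walk the whole list before emitting anything; the
chunked form also works on infinite or cyclic lists but pays one extra byte
per chunk plus a terminator.

```haskell
import Data.Word (Word8)

-- Length-prefixed encoding (hypothetical helper). `length` must reach the
-- end of the list first, so this never terminates on infinite or cyclic
-- input. Assumes the list has fewer than 256 elements.
encodeFinite :: [Word8] -> [Word8]
encodeFinite xs = fromIntegral (length xs) : xs

-- Chunked encoding (hypothetical helper). Every chunk of up to 255 bytes
-- carries its own length byte, and a 0 byte ends the stream. Output is
-- produced lazily, so infinite input is fine.
encodeChunked :: [Word8] -> [Word8]
encodeChunked [] = [0]
encodeChunked xs = fromIntegral (length chunk) : chunk ++ encodeChunked rest
  where (chunk, rest) = splitAt 255 xs

-- Inverse of encodeChunked; stops at the terminating 0 byte.
decodeChunked :: [Word8] -> [Word8]
decodeChunked (0 : _)  = []
decodeChunked (n : bs) = chunk ++ decodeChunked rest
  where (chunk, rest) = splitAt (fromIntegral n) bs
decodeChunked []       = []
```

For example, take 4 (encodeChunked (cycle [1, 2, 3])) terminates, whereas
encodeFinite (cycle [1, 2, 3]) does not.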
Typeclass instances
It is currently hard to define typeclass instances for strings, as
String (= [Char]) overlaps with [a]. Implementations provide
extensions that work around this, but there should not be a need for
workarounds in the first place.
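As an illustration of the kind of workaround needed without extensions,
here is a sketch of the trick the Prelude's Show class uses with showList:
the class gets an extra list method which the Char instance overrides. The
class Pretty and its methods are hypothetical names used only for this
example. Writing "instance Pretty String" directly is not allowed in
Haskell 98.

```haskell
import Data.List (intercalate)

-- Hypothetical class, modelled on Show and its showList method.
class Pretty a where
  pretty :: a -> String
  -- Used when rendering [a]; the default prints a bracketed list.
  prettyList :: [a] -> String
  prettyList xs = "[" ++ intercalate "," (map pretty xs) ++ "]"

instance Pretty Bool where
  pretty True  = "yes"
  pretty False = "no"

-- Char overrides the list case, so [Char] is rendered as quoted text.
instance Pretty Char where
  pretty c     = [c]
  prettyList s = "\"" ++ s ++ "\""

-- The only list instance; it defers to the element type's prettyList.
instance Pretty a => Pretty [a] where
  pretty = prettyList
```

This works, but every class that wants special treatment for strings has
to carry such an extra method.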
Show/Read
The current Show/Read implementation is impractical to use with
large strings. A Read implementation needs to traverse the input
looking for the terminating '"' while handling escape codes.
A better solution would be to have an efficient (size-prefixed)
representation, maybe in a separate Serializable typeclass. But
this would need the ability to derive Serializable instances for
algebraic datatypes automatically, to avoid lots of boilerplate code.
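A minimal sketch of what such a class could look like. Everything here
(the method names put and get, the Bytes and Message types, the 4-byte
big-endian length prefix, the tag bytes) is an assumption made for the
example, not an existing library. The Message instance is the kind of
code one would want derived automatically.

```haskell
import Data.Word (Word8)

-- Hypothetical class: put produces bytes, get consumes a prefix of the
-- input and returns the remainder.
class Serializable a where
  put :: a -> [Word8]
  get :: [Word8] -> Maybe (a, [Word8])

-- Byte string in a newtype, so the instance does not clash with lists.
newtype Bytes = Bytes [Word8] deriving (Eq, Show)

-- Big-endian 4-byte length (assumes the length fits in 32 bits).
lenBytes :: Int -> [Word8]
lenBytes n = [fromIntegral (n `div` 256 ^ k) | k <- [3, 2, 1, 0 :: Int]]

-- Size-prefixed: the reader knows where the string ends without
-- scanning for a quote or decoding escapes.
instance Serializable Bytes where
  put (Bytes bs) = lenBytes (length bs) ++ bs
  get input = case splitAt 4 input of
    (hdr@[_, _, _, _], rest) ->
      let n = foldl (\acc b -> acc * 256 + fromIntegral b) 0 hdr
          (body, rest') = splitAt n rest
      in if length body == n then Just (Bytes body, rest') else Nothing
    _ -> Nothing

data Message = Ping | Say Bytes deriving (Eq, Show)

-- Hand-written instance: one tag byte per constructor, then the fields.
instance Serializable Message where
  put Ping    = [0]
  put (Say b) = 1 : put b
  get (0 : rest) = Just (Ping, rest)
  get (1 : rest) = case get rest of
    Just (b, rest') -> Just (Say b, rest')
    Nothing         -> Nothing
  get _ = Nothing
```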
Possible solutions
The optimal solution should be compact and fast. A list of chunks
is one solution: it would make it possible to e.g. mmap files (modulo
encoding) and would support fast concatenation.
In addition one would need to support infinite and cyclic structures.
Adding an alternative constructor which corresponds to the current
string abstraction would be sufficient.
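    import Data.Word (Word8)
    import Foreign.Ptr (Ptr)

    type CharT = Word8
    data Str = S [(Ptr CharT, Int)]  -- finite: chunks of (buffer, length)
             | I [CharT]             -- lazy list: may be infinite or cyclic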
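To show how operations on such a type might look, here is a rough sketch
with three helpers. The names strLength, toList and append are my own, and
the sketch assumes the buffers behind the pointers stay valid for as long
as the Str is used; it does not address who allocates or frees them.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr)

type CharT = Word8

data Str = S [(Ptr CharT, Int)]  -- finite: chunks of (buffer, length)
         | I [CharT]             -- lazy list: may be infinite or cyclic

-- For S the length is the sum of the chunk sizes; no bytes are read.
-- For I this diverges on infinite input, just like length on lists.
strLength :: Str -> Int
strLength (S chunks) = sum (map snd chunks)
strLength (I cs)     = length cs

-- Read the bytes back as a list. The S case dereferences pointers and
-- therefore lives in IO.
toList :: Str -> IO [CharT]
toList (S chunks) = fmap concat (mapM (\(p, n) -> peekArray n p) chunks)
toList (I cs)     = return cs

-- Appending two chunked strings joins the chunk lists without copying
-- any bytes. Mixed cases fall back to the list representation.
append :: Str -> Str -> IO Str
append (S a) (S b) = return (S (a ++ b))
append (I a) (I b) = return (I (a ++ b))
append a b = do
  xs <- toList a
  ys <- toList b
  return (I (xs ++ ys))
```

The cost of append on two S values depends on the number of chunks, not
on the number of bytes.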
A major problem is breaking old code. The main difficulty is that many
functions are defined only on lists, which forces strings to be lists.
Making these functions polymorphic would solve a lot of problems, and
not only for Strings. There should be no performance penalty (at least
in theory) when the compiler knows at compile time which instance is
used.
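A small sketch of the polymorphic approach, under assumptions of my own:
the class StringLike, its method splitHead, the Chunks type and countChar
are all hypothetical. The instance for [Char] needs the FlexibleInstances
extension, which is exactly the workaround mentioned under typeclass
instances. The SPECIALIZE pragma is GHC-specific and asks for a
monomorphic copy of the function.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical class: anything that can hand out its first character.
class StringLike s where
  splitHead :: s -> Maybe (Char, s)

-- Old code keeps working: ordinary Strings are an instance.
instance StringLike [Char] where
  splitHead []       = Nothing
  splitHead (c : cs) = Just (c, cs)

-- A stand-in for a chunked representation.
newtype Chunks = Chunks [String]

instance StringLike Chunks where
  splitHead (Chunks [])                = Nothing
  splitHead (Chunks ([] : rest))       = splitHead (Chunks rest)
  splitHead (Chunks ((c : cs) : rest)) = Just (c, Chunks (cs : rest))

-- Written once against the class, usable with every instance.
countChar :: StringLike s => Char -> s -> Int
countChar target = go 0
  where
    go n s = case splitHead s of
      Nothing -> n
      Just (c, rest)
        | c == target -> go (n + 1) rest
        | otherwise   -> go n rest
{-# SPECIALIZE countChar :: Char -> Chunks -> Int #-}
```

Both countChar 'a' "banana" and countChar 'a' (Chunks ["ban", "", "ana"])
go through the same definition.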
- Einar Karttunen