Perhaps we should also look for an alternative to the ICU library that gives us the higher-level Unicode text handling functions but works on UTF8-encoded text.

Keeping the fork alive is also a good idea. I think there's something in Johan's assertion that UTF8 should be the same speed to decode in the usual case (i.e. all ASCII), because the fast path is a single comparison in each encoding. But I can attest that getting GHC to give us the low-level code we want there is pretty tricky.
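To make the "one comparison" point concrete, here is a minimal sketch of decoding a single code point from each encoding. The names `decodeUtf8` and `decodeUtf16` are hypothetical helpers written for illustration, not functions from any library under discussion, and they work on plain lists rather than unboxed arrays. Validation is deliberately loose (overlong UTF8 forms, for instance, are not rejected), and nothing here says anything about the code GHC actually generates.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word16, Word8)

-- Hypothetical helper: decode one code point from UTF8 code units.
-- Returns the character and the remaining input, or Nothing on
-- empty, truncated, or obviously malformed input.
decodeUtf8 :: [Word8] -> Maybe (Char, [Word8])
decodeUtf8 [] = Nothing
decodeUtf8 (b0 : rest)
  | b0 < 0x80 = Just (chr (fromIntegral b0), rest) -- ASCII: one comparison
  | b0 .&. 0xE0 == 0xC0 = go 1 (fromIntegral (b0 .&. 0x1F)) rest
  | b0 .&. 0xF0 == 0xE0 = go 2 (fromIntegral (b0 .&. 0x0F)) rest
  | b0 .&. 0xF8 == 0xF0 = go 3 (fromIntegral (b0 .&. 0x07)) rest
  | otherwise = Nothing
  where
    -- Consume k continuation bytes (10xxxxxx), six payload bits each.
    go :: Int -> Int -> [Word8] -> Maybe (Char, [Word8])
    go 0 acc bs
      | acc <= 0x10FFFF = Just (chr acc, bs)
      | otherwise = Nothing
    go k acc (b : bs)
      | b .&. 0xC0 == 0x80 =
          go (k - 1) ((acc `shiftL` 6) .|. fromIntegral (b .&. 0x3F)) bs
    go _ _ _ = Nothing

-- Hypothetical helper: decode one code point from UTF16 code units.
decodeUtf16 :: [Word16] -> Maybe (Char, [Word16])
decodeUtf16 [] = Nothing
decodeUtf16 (w0 : rest)
  | w0 < 0xD800 = Just (chr (fromIntegral w0), rest) -- covers ASCII: one comparison
  | w0 >= 0xE000 = Just (chr (fromIntegral w0), rest)
  | w0 < 0xDC00 = surrogatePair rest
  | otherwise = Nothing -- lone low surrogate
  where
    -- w0 is a high surrogate; it must be followed by a low surrogate.
    surrogatePair (w1 : rest')
      | w1 >= 0xDC00 && w1 < 0xE000 =
          let hi = fromIntegral w0 - 0xD800 :: Int
              lo = fromIntegral w1 - 0xDC00 :: Int
          in Just (chr (0x10000 + ((hi `shiftL` 10) .|. lo)), rest')
    surrogatePair _ = Nothing
```

For all-ASCII input both functions return from their first guard, which is the sense in which the common case costs the same in either encoding. The difference only shows up on the slower branches.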

In practice there is still a fair amount of converting to UTF16. If the goal is to reduce the number of types, I think the switch to UTF8 is necessary. I still think it would be good to have a UTF16 package, if for no other reason than to make the transition easier. We could rely on social pressure to convert most modules to use the UTF8 type.