Anyone who has used Haskell in a professional setting knows that the String situation is kind of a mess. While in many ways the language is progressing at a rapid pace and is only ever getting more compelling for commercial use, the String situation is still regarded by many people as the largest problem in the language. And for good reason: an efficient textual type is absolutely essential for most work, and its use needs to be streamlined and language-integrated for an overall positive experience writing industrial Haskell.

Let us consider why the String situation exists, how far we can get with workarounds, and what’s next. See the accompanying Git project for prototype code.

String

The String type is very naive: it’s defined as a linked list of Char pointers.

    type String = [Char]

This is not only a bad representation, it’s quite possibly the least efficient (non-contrived) representation of text data possible, and it has horrible performance in both time and space. And it’s used everywhere in Haskell. Even poster-child libraries for Haskell (Pandoc, etc.) use it extensively and have horrible performance because of it.

Around 2005-2007 several more efficient libraries were written, including ByteString and Text, which have different use cases: ByteString for raw bytes and Text for Unicode text. Both are orders of magnitude more efficient and have become ubiquitous in “Modern Haskell”. Combined with the recent -XOverloadedStrings language extension we have a partial solution for routing around the problem.
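As a rough illustration of what -XOverloadedStrings buys us, here is a minimal sketch. It assumes the text and bytestring packages are available; the names greetingText, greetingBytes, greetingString and shout are made up for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import           Data.Text (Text)
import qualified Data.Text as T

-- With OverloadedStrings a string literal has type `IsString a => a`,
-- so the type annotation alone selects the representation.
greetingText :: Text
greetingText = "hello, world"

greetingBytes :: ByteString
greetingBytes = "hello, world"

greetingString :: String
greetingString = "hello, world"

-- Text operations work on the packed representation directly.
shout :: Text -> Text
shout = T.toUpper

-- Lengths agree across the three representations for this ASCII literal.
lengths :: (Int, Int, Int)
lengths = (T.length greetingText, BC.length greetingBytes, length greetingString)
```

Loading this in GHCi and evaluating shout greetingText or lengths is enough to see the three types side by side. The literal syntax is shared, but nothing else is: every library function still has to pick one of the three types.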

Unfortunately, conversion between the efficient string types and String is O(n) and involves a deep copy. They’re still not used universally, and every introductory book on the subject still uses String instead of the modern libraries because it’s provided by default.
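To make the conversion cost concrete, here is a small sketch, assuming the text package. legacyReverse stands in for any String-only API and is purely hypothetical.

```haskell
import           Data.Text (Text)
import qualified Data.Text as T

-- T.pack walks the whole linked list and copies each character into a
-- packed buffer; T.unpack allocates a fresh cons cell per character.
-- Both are O(n) in the length of the input.
toText :: String -> Text
toText = T.pack

fromText :: Text -> String
fromText = T.unpack

-- A hypothetical String-only API, standing in for an older library.
legacyReverse :: String -> String
legacyReverse = reverse

-- Calling it from Text-based code pays for two full conversions.
reverseViaLegacy :: Text -> Text
reverseViaLegacy = toText . legacyReverse . fromText
```

Every boundary between a Text-based program and a String-based library looks like reverseViaLegacy, which is why a single String-typed dependency tends to leak conversions throughout a codebase.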

So why is String still used? Because it’s too convenient and it has special powers from being wired into the compiler.

Banishing String

You can get pretty far working in a subset of the Prelude and blessed libraries that have nearly removed old historical cruft like String and banished the ugly parts of the Prelude. However, one will still end up using String in a few noticeable dark corners:

Show instances

Read instances

Pretty printers

FilePath

Third-party libraries written before 2007.

Older core libraries are slowly getting phased out; this is a social problem, not a technology problem. This seems to be going in the right direction on its own.

FilePaths are not hard to swap out and not a huge concern.

Show instances and pretty printers are probably the single biggest source of continued [Char] usage and are what we’ll concern ourselves with here.

Show

The Show class is really useful, and automatically deriving so much boilerplate is part of the reason Haskell is so much fun to write. However, its current status poses a bit of a problem for transitioning to modern types, for several reasons:

It’s abused to write custom pretty printers.

Its relation to the Read class is problematic.

It’s constrained to use [Char] and forces that choice on downstream users, who end up forced to use it in places it shouldn’t be used.

So what is the Show class, really? It’s so successful that a lot of people never actually look at its internals. The guts of it is a function called showsPrec, an overloaded CPS’d function which composes together a collection of string fragments for specific implementations of the Show typeclass.

    type ShowS = String -> String

    class Show a where
      showsPrec :: Int -> a -> ShowS
      show      :: a -> String
      showList  :: [a] -> ShowS
      {-# MINIMAL showsPrec | show #-}

Together with the Read class we get a textual serializer and deserializer for free with the laws governing that relation being:

    read . show = id

GHC can almost always derive this automatically and the instance is pretty simple. Using -ddump-deriv we can ask GHC to dump it out for us.
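To see what a derived instance amounts to, here is a hand-written function that mimics the derived showsPrec for a small type. This is a sketch written for this article, not the literal -ddump-deriv output; the Expr type and the names showsExpr and example are made up.

```haskell
-- A small expression type with derived Show and Read instances.
data Expr
  = Lit Int
  | Neg Expr
  | Add Expr Expr
  deriving (Eq, Show, Read)

-- Hand-written equivalent of the derived showsPrec.
-- Constructor application has precedence 10, so each argument is shown at
-- precedence 11 and the whole term is parenthesised when d > 10.
showsExpr :: Int -> Expr -> ShowS
showsExpr d (Lit n) =
  showParen (d > 10) $ showString "Lit " . showsPrec 11 n
showsExpr d (Neg e) =
  showParen (d > 10) $ showString "Neg " . showsExpr 11 e
showsExpr d (Add a b) =
  showParen (d > 10) $
    showString "Add " . showsExpr 11 a . showChar ' ' . showsExpr 11 b

example :: Expr
example = Add (Lit 1) (Neg (Lit (-2)))

-- The round trip promised by the law above.
roundTrips :: Bool
roundTrips = read (show example) == example
```

In GHCi, showsExpr 0 example "" and show example both give Add (Lit 1) (Neg (Lit (-2))), and roundTrips is True. Each ShowS fragment is a function awaiting the rest of the output, which is what lets the pieces compose without repeated left-nested appends.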

The emergent problem is that there are an enormous number of pathological Show instances used in practice, and you don’t even need to look beyond the standard library to find law violations. On top of this, the Read class is really dangerous: its use of a very suboptimal String type makes it inefficient and opens up security holes and potential denial-of-service attacks in networked applications. Show should really only be used for debugging the structure of internal types and at the interactive shell. For serializing structures to text in a way that differs from Haskell’s internal representation we need a pretty printer.
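A typical pathological instance looks something like the following sketch. The Temperature type is invented for illustration and is not taken from any particular library.

```haskell
import Text.Read (readMaybe)

data Temperature = Celsius Double
  deriving (Eq, Read)

-- A "pretty" Show instance: pleasant to look at, but no longer Haskell syntax.
instance Show Temperature where
  show (Celsius d) = show d ++ " C"

-- The derived Read instance still expects the constructor syntax
-- "Celsius 21.5", so feeding it the output of show fails to parse.
roundTrip :: Temperature -> Maybe Temperature
roundTrip = readMaybe . show
```

Here roundTrip (Celsius 21.5) evaluates to Nothing, while readMaybe "Celsius 21.5" succeeds. Using readMaybe rather than read at least turns the law violation into a value instead of a runtime exception.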

Pretty Printers

The correct way of writing custom textual serializers is through the various pretty-print combinator libraries that stem from Wadler’s original paper, A prettier printer. There are some degrees of freedom in this design space, but wl-pprint-text is a good choice for almost all use cases. Using the underlying Data.Text.Lazy.Builder functions is also a sensible choice.
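As a minimal sketch of the second option, here is a precedence-aware printer written directly against Data.Text.Lazy.Builder from the text package. The Expr type and the names pprPrec, parensIf and ppr are made up for this example; a combinator library such as wl-pprint-text would add layout and line breaking on top of this.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Text (Text)
import qualified Data.Text.Lazy as TL
import           Data.Text.Lazy.Builder (Builder, fromText, toLazyText)
import           Data.Text.Lazy.Builder.Int (decimal)

data Expr
  = Lit Int
  | Var Text
  | Add Expr Expr
  | Mul Expr Expr

-- Render with infix operators, adding parentheses only where precedence
-- requires them. The Int argument is the precedence of the enclosing context.
pprPrec :: Int -> Expr -> Builder
pprPrec _ (Lit n)   = decimal n
pprPrec _ (Var v)   = fromText v
pprPrec d (Add a b) = parensIf (d > 6) (pprPrec 6 a <> " + " <> pprPrec 7 b)
pprPrec d (Mul a b) = parensIf (d > 7) (pprPrec 7 a <> " * " <> pprPrec 8 b)

parensIf :: Bool -> Builder -> Builder
parensIf True  b = "(" <> b <> ")"
parensIf False b = b

-- Run the builder once at the end to obtain lazy Text.
ppr :: Expr -> TL.Text
ppr = toLazyText . pprPrec 0
```

For instance, ppr (Mul (Add (Lit 1) (Var "x")) (Lit 2)) renders as (1 + x) * 2. Nothing here touches String, and the output format is free to diverge from Haskell syntax without breaking any Show or Read law.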

So that’s how it should be done. In practice, to do this you’d have to set up a cabal/stack project, install 11 dependencies, write a new typeclass, and write this little joy of an import preamble masking several functions that conflict in the Prelude namespace.

This kind of sucks. It’s the right thing to do, but it’s kind of painful and it’s certainly not intuitive for newcomers. Abusing Show and String is easier. Worst practices should be hard, but in this case they are much easier than doing the correct thing.

Progress :: String → Text

GHC has had the capacity to support custom Preludes for a while, and this is a very wise design choice. For all the historical brokenness of certain things, there are very few technological hurdles to replacing them with modern, sensible defaults. The question then remains: how close can we come to replacing Show with a Text-based equivalent? The answer is about 80% before surgery is required on GHC itself.

The translation of a Text-based show prototype is just one module. Instead of concatenating Strings we use the Builder type from Data.Text.Lazy.Builder to build up a Text representation. The ShowS type synonym now just becomes a Builder transformation.
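The shape of such a module might look like the following sketch. It is written for this article under the assumption that the class mirrors Prelude’s Show method for method; it is not the accompanying project’s actual code, and showList is left out for brevity.

```haskell
import           Prelude hiding (Show (..), ShowS, showParen, showString)
import           Data.Text (Text)
import qualified Data.Text.Lazy as TL
import           Data.Text.Lazy.Builder (Builder, fromString, singleton, toLazyText)
import           Data.Text.Lazy.Builder.Int (decimal)

-- ShowS becomes a Builder transformation instead of a String transformation.
type ShowS = Builder -> Builder

class Show a where
  showsPrec :: Int -> a -> ShowS
  show      :: a -> Text
  show x = TL.toStrict (toLazyText (showsPrec 0 x mempty))
  {-# MINIMAL showsPrec #-}

showString :: String -> ShowS
showString s rest = fromString s <> rest

showParen :: Bool -> ShowS -> ShowS
showParen True  p = \rest -> singleton '(' <> p (singleton ')' <> rest)
showParen False p = p

-- Negative numbers are parenthesised above precedence 6, as in Prelude.
instance Show Int where
  showsPrec d n rest
    | n < 0 && d > 6 = singleton '(' <> decimal n <> singleton ')' <> rest
    | otherwise      = decimal n <> rest

instance Show a => Show (Maybe a) where
  showsPrec _ Nothing  = showString "Nothing"
  showsPrec d (Just x) = showParen (d > 10) (showString "Just " . showsPrec 11 x)

example :: Text
example = show (Just (-3 :: Int))
```

Here example is the Text value Just (-3), the same characters Prelude’s show would produce as a String. The instances keep the familiar showParen and function-composition style; only the carrier type underneath has changed.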

And then some ugly (but mechanical) builder munging gives us an exact copy of GHC’s show format. The little-known -XDeriveAnyClass extension can be used to derive any other class that has an empty minimal set or uses DefaultSignatures and Generic instances to implement methods.
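The DefaultSignatures and Generic mechanism can be sketched as follows. This is a deliberately reduced version: the class name TShow and the helper classes GTShow and GFields are hypothetical, the result is strict Text rather than a Builder, and record syntax and infix constructors are ignored, so it only reproduces GHC’s format for plain prefix constructors.

```haskell
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveAnyClass    #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE FlexibleContexts  #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators     #-}

import           Data.Text (Text)
import qualified Data.Text as T
import           GHC.Generics

-- Hypothetical class name, chosen to avoid clashing with Prelude's Show.
class TShow a where
  tshowsPrec :: Int -> a -> Text
  default tshowsPrec :: (Generic a, GTShow (Rep a)) => Int -> a -> Text
  tshowsPrec d = gshowsPrec d . from

tshow :: TShow a => a -> Text
tshow = tshowsPrec 0

instance TShow Int where
  tshowsPrec d n
    | n < 0 && d > 6 = T.pack ("(" ++ show n ++ ")")
    | otherwise      = T.pack (show n)

-- Walks the datatype, sum and constructor layers of the representation.
class GTShow f where
  gshowsPrec :: Int -> f p -> Text

instance GTShow f => GTShow (D1 c f) where
  gshowsPrec d (M1 x) = gshowsPrec d x

instance (GTShow f, GTShow g) => GTShow (f :+: g) where
  gshowsPrec d (L1 x) = gshowsPrec d x
  gshowsPrec d (R1 x) = gshowsPrec d x

instance (Constructor c, GFields f) => GTShow (C1 c f) where
  gshowsPrec d m@(M1 x) =
    case gfields x of
      []     -> name
      fields -> parens (d > 10) (T.unwords (name : fields))
    where
      name = T.pack (conName m)
      parens True  t = T.concat [T.pack "(", t, T.pack ")"]
      parens False t = t

-- Collects the fields of one constructor, each shown at precedence 11.
class GFields f where
  gfields :: f p -> [Text]

instance GFields U1 where
  gfields U1 = []

instance (GFields f, GFields g) => GFields (f :*: g) where
  gfields (a :*: b) = gfields a ++ gfields b

instance GFields f => GFields (S1 c f) where
  gfields (M1 x) = gfields x

instance TShow a => GFields (K1 i a) where
  gfields (K1 x) = [tshowsPrec 11 x]

-- No instance body is written: DeriveAnyClass uses the generic default.
data List a = Nil | Cons a (List a)
  deriving (Generic, TShow)
```

With this in scope, tshow (Cons 1 (Cons (-2) Nil) :: List Int) yields Cons 1 (Cons (-2) Nil). Matching GHC’s output for records, infix constructors and operator sections is where the "ugly but mechanical" part comes in.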

And there we have it, a fixed show function that is drop-in compatible with the existing format but uses Text…

    show :: Show a => a -> Text

… and has automatic deriving.

    data List a
      = Nil
      | Cons a (List a)
      deriving (Generic, Print.Show)

We can even go so far as to tell GHCi to use our custom function at the REPL by adding the following to our project’s .ghci file.

    import Print
    :set -interactive-print=Print.print

However, GHC’s defaulting mechanism has a bunch of ad-hoc specializations for wired-in classes that don’t work for user-defined classes. If we type in an under-specified expression for the built-in Show, GHC will just splice in the Show dictionary for the unit type, Show (), if it can’t figure out an appropriate dictionary. With our custom class the same expression is rejected:

    λ> print Nothing

    <interactive>:3:1:
        No instance for (Show a0) arising from a use of ‘print’
        The type variable ‘a0’ is ambiguous
        Note: there are several potential instances:
          instance (Show a, Show b) => Show (Either a b)
            -- Defined at Print.hs:233:10
          instance Show a => Show (Maybe a) -- Defined at Print.hs:229:10
          instance Show Int16 -- Defined at Print.hs:160:10
          ...plus 27 others
        In the expression: print Nothing
        In an equation for ‘it’: it = print Nothing

There’s currently no way to do this for a custom Show type. This implementation also requires a Generic instance and several language extensions. This is the hard limit to how far we can go in “user space”.
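For contrast, here is the behaviour the built-in class gets for free. GHCi enables the extended defaulting rules automatically; the sketch below switches them on in an ordinary module with -XExtendedDefaultRules so the effect can be seen outside the REPL. The binding names are made up.

```haskell
{-# LANGUAGE ExtendedDefaultRules #-}

-- `Nothing :: Maybe a` leaves `a` ambiguous. Under the extended rules GHC
-- defaults `a` to () because Prelude's Show is one of the classes those
-- rules know about, so this type-checks.
shownNothing :: String
shownNothing = show Nothing

-- The same defaulting applies to the element type of an empty list.
shownEmpty :: String
shownEmpty = show []
```

Here shownNothing is "Nothing" and shownEmpty is "[]". The list of classes that trigger this defaulting is fixed inside the compiler, which is exactly why a user-defined Print.Show cannot opt in.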

Implementation

What we can prototype with Generics today is not hard to translate into a builtin deriving mechanism inside the compiler tomorrow. In fact, we can create a compatibility layer so close to the existing Show class deriving that we reuse all of its logic, sans the type changes.

Right now in GHC there is a hasBuiltinDeriving check that tests whether the derived class is one of the blessed “builtins” that have a prescription for deriving a class instance. The blessed classes include:

If the public interface for generating a Text-Show instance recycled the same structure as the String version, we could very easily write gen_ShowText_binds and plug it into the compiler to derive a new (distinct) text Show that wouldn’t break compatibility.

However, at the moment text isn’t in GHC’s boot libraries and can’t be made into a wired-in type, which would be necessary to add the new deriving mechanism to TcGenDeriv.hs. So that’s as far as we can go in 2016; there’s probably a fairly clear path to removing Stringy-Show if text were at some point to become accessible to GHC internals.