Hans van Thiel wrote:
> As a general comment on the teaching of Haskell, all books and
> tutorials, which I've seen, appear to treat this aspect of Haskell as if
> it were self explanatory. This while the better known imperative
> languages don't have anything like it. Only Real World Haskell explains
> algebraic data types to some satisfaction (IMHO, of course).
(Hopefully this different take on it helps more than it hurts...)
In addition to keeping the type-level and the value-level separated,
Haskell does a little bit to keep the type/interface-level and the
implementation-level separate.
The "data" keyword introduces both a new type and also a new
implementation. This is the only way of introducing new implementations.
ADTs are beauty incarnate, but unfortunately not well known outside of
functional languages and abstract algebra.
The "newtype" keyword introduces a new type, but it reuses an old
implementation under the covers. Even though they have the same
underlying implementation, the newtype and the type of the old
implementation are considered entirely different semantically and so one
cannot be used in lieu of the other.
The dubiously named "type" keyword introduces an alias shorthand for
some type that already exists. These aliases are, in a sense, never
checked; that is, the macro is just expanded. This means that we can't
carry any additional semantic information by using aliases and so if we
have:
type Celsius = Int
type Fahrenheit = Int
...the type checker will do nothing to save us. If we wanted to add
semantic tags to the Int type in order to say what units the number
represents, then we could do that with a "newtype" and the type checker
would ensure that we didn't mix units.
Re "data" vs "newtype", where a newtype is possible (single data
constructor, which has exactly one argument) there are still a few
differences at the semantic level. Since a newtype's data constructor
doesn't exist at runtime, evaluating a newtype to WHNF will evaluate the
argument to WHNF; hence a newtype can be thought of as the data version
with an obligatory strictness annotation. In terms of bottom, this means
that:
data Foo = Foo Int
...has both _|_ and (Foo _|_). Whereas, both of the following:
data Foo = Foo !Int
newtype Foo = Foo Int
...have only _|_.
It should also be noted that the overhead for newtypes is not *always*
removed. In particular, if we have the following definitions:
data Z = Z
newtype S a = S a
We must keep the tags (i.e. boxes) for S around because (S Z) and (S (S
Z)) need to be distinguishable. This only really comes up with
polymorphic newtypes (since that enables recursion), and it highlights
the difference between strict fields and unpacked strict fields.
Typically newtypes are unpacked as well as strict (hence no runtime tag
overhead), but it's not guaranteed.
Another operational difference between newtypes and an equivalent data
declaration has to do with the type class dictionaries. IIRC, with
GeneralizedNewtypeDeriving the newtype actually uses the exact same
dictionaries as the underlying type, thus avoiding unwrapping/rewrapping
overhead. I'm somewhat fuzzy on all the details here, bit its another
reason to use newtypes when you can.
--
Live well,
~wren