OCaml for Haskellers

I’ve started formally learning OCaml (I’ve been reading ML since Okasaki, but I’ve never written any of it), and here are some notes about differences from Haskell from Jason Hickey's Introduction to Objective Caml. The two most notable differences are that OCaml is impure and strict.

Features. Here are some features OCaml has that Haskell does not:

OCaml has named parameters (~x:i binds to i the value of named parameter x, ~x is a shorthand for ~x:x).

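OCaml has optional parameters (?x:(i = default) binds i to the value of the optional named parameter x, falling back to default when it is omitted; ?(x = default) is the shorthand when the local name is also x).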

OCaml has open union types, called polymorphic variants ([> `Integer of int | `Real of float], where the type itself carries the constructor definitions; you can give it a name with type 'a number = [> `Integer of int | `Real of float] as 'a). Anonymous closed unions are also allowed ([< `Integer of int | `Real of float]).

OCaml has mutable records (preface the record field in its definition with mutable, and then use the <- operator to assign values).
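To make the four features above concrete, here is a minimal sketch. All of the names (scale, bump, greet, to_float, describe, counter) are made up for illustration and do not come from Hickey's book.

```ocaml
(* Named parameter: ~factor:k binds the argument labeled factor to k. *)
let scale ~factor:k v = k * v

(* ~factor alone is shorthand for ~factor:factor. *)
let scale_short ~factor v = factor * v

(* Optional parameter, long form: label step, local name s, default 1. *)
let bump ?step:(s = 1) n = n + s

(* Optional parameter, short form: label and local name are both greeting. *)
let greet ?(greeting = "Hello") name = greeting ^ ", " ^ name

(* Closed polymorphic variant: inferred as
   [< `Integer of int | `Real of float ] -> float *)
let to_float = function
  | `Integer i -> float_of_int i
  | `Real r -> r

(* Open polymorphic variant: the wildcard case keeps the type open ([> ...]),
   so tags that were never declared anywhere are still accepted. *)
let describe = function
  | `Integer _ -> "integer"
  | `Real _ -> "real"
  | _ -> "something else"

(* Mutable record field, updated in place with <-. *)
type counter = { mutable count : int; label : string }

let clicks = { count = 0; label = "clicks" }
let () = clicks.count <- clicks.count + 1

let () =
  Printf.printf "%d %d %s %s %d\n"
    (scale ~factor:3 4) (bump ~step:5 1)
    (greet "world") (describe `Complex) clicks.count
```

Note that no type declaration was needed for `Integer, `Real or `Complex; the variant types are inferred from use.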

Type signatures. Haskell supports specifying a type signature for an expression using the double colon. OCaml has two ways of specifying types. They can be written inline:

let intEq (x : int) (y : int) : bool = ...

or they can be placed in an interface file (extension mli):

val intEq : int -> int -> bool

The latter method is preferred, and is analogous to an hs-boot file as supported by GHC.
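Both styles can be tried out in a single file. The sketch below uses a module sealed with an inline signature as a stand-in for an ml/mli pair (the module name IntEq is invented here); a real project would put the val line in the interface file instead.

```ocaml
(* Inline annotations on the arguments and on the result. *)
let intEq (x : int) (y : int) : bool = x = y

(* The sig ... end part plays the role of the .mli file,
   the struct ... end part plays the role of the .ml file. *)
module IntEq : sig
  val intEq : int -> int -> bool
end = struct
  (* Inferred as 'a -> 'a -> bool; the signature narrows it to int. *)
  let intEq x y = x = y
end

let () = Printf.printf "%b %b\n" (intEq 1 1) (IntEq.intEq 1 2)
```

As the second definition shows, the signature is allowed to be less general than the inferred type of the implementation.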

Eta expansion. Polymorphic types in the form of '_a can be thought to behave like Haskell’s monomorphism restriction: they can only be instantiated to one concrete type. However, in Haskell the monomorphism restriction was intended to avoid extra recomputation for values that a user didn’t expect; in OCaml the value restriction is required to preserve the soundness of the type system in the face of side effects, and applies to functions too (just look for the tell-tale '_a in a signature). More fundamentally, 'a indicates a generalized type, while '_a indicates a concrete type which, at this point, is unknown—in Haskell, all type variables are implicitly universally quantified, so the former is always the case (except when the monomorphism restriction kicks in, and even then no type variables are ever shown to you. But OCaml requires monomorphic type variables to not escape from compilation units, so there is a bit of similarity. Did this make no sense? Don’t panic.)

In Haskell, we’d make our monomorphic value polymorphic again by specifying an explicit type signature. In OCaml, we generalize the type by eta expanding. The canonical example is the id function, which when applied to itself (id id) results in a function of type '_a -> '_a (that is, restricted.) We can recover 'a -> 'a by writing fun x -> id id x.
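Here is a small sketch of the id example. The names weak_id and poly_id are mine; recent compilers print the restricted variable as '_weak1 rather than '_a.

```ocaml
let id x = x

(* An application is not a syntactic value, so the result is not generalized:
   weak_id : '_a -> '_a *)
let weak_id = id id

(* The first use pins the unknown type down to int ... *)
let n = weak_id 1

(* ... after which a use at another type is rejected:
   let s = weak_id "a"      type error: weak_id is now int -> int *)

(* Eta expanding makes the definition a syntactic function again:
   poly_id : 'a -> 'a *)
let poly_id x = id id x

let m = poly_id 1
let s = poly_id "a"
```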

There is one more subtlety that comes from OCaml’s impurity and strictness: eta expansion acts like a thunk, so if the expression you eta expand has side effects, they will be delayed until the function is called (and repeated on every call). You can of course write fun () -> expr to simulate a classic thunk.
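The delayed side effect is easy to observe with a counter. In this sketch, make_id is a hypothetical helper that bumps a counter and then returns the identity function.

```ocaml
let calls = ref 0

(* Hypothetical helper: performs a side effect, then returns a function. *)
let make_id () =
  incr calls;
  fun x -> x

(* Not eta expanded: the effect runs once, right here, at definition time. *)
let direct = make_id ()

(* Eta expanded: the effect is delayed and runs on every call. *)
let expanded x = make_id () x

(* Classic thunk: nothing runs until it is applied to (). *)
let thunk () = make_id ()

let a = direct 1     (* calls stays at 1 *)
let b = expanded 2   (* calls becomes 2 *)
let c = expanded 3   (* calls becomes 3 *)
```

So eta expanding changes not only the type that is inferred but also when, and how often, the effect happens.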

Tail recursion. In Haskell, you do not have to worry about tail recursion when the computation is lazy; instead you work on putting the computation in a data structure so that the user doesn't force more of it than they need (guarded recursion), and “stack frames” are happily discarded as you pattern match deeper into the structure. However, if you are implementing something like foldl', which is strict, you’d want to pay attention to this (and not build up a really big thunk.)

Well, OCaml is strict by default, so you should always pay attention to making sure you have tail calls. One interesting place this comes up is in the implementation of map, the naive version of which cannot be tail-call optimized. In Haskell, this is not a problem because our map is lazy and the recursion is hidden away in our cons constructor; in OCaml, there is a trade-off between copying the entire list to get TCO, or not copying and potentially exhausting stack space when you get big lists. (Note that a strict map function in Haskell would have the same problem; this is a difference between laziness and strictness, and not Haskell and OCaml.)
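The two sides of that trade-off look roughly like this. This is a sketch with invented names (map_naive, map_tail), not the standard library's implementation.

```ocaml
(* Naive map: the cons happens after the recursive call returns,
   so every element costs a stack frame. *)
let rec map_naive f = function
  | [] -> []
  | x :: xs ->
    let y = f x in
    y :: map_naive f xs

(* Tail-recursive map: accumulate in reverse, then copy the whole list
   once more with List.rev to restore the order. *)
let map_tail f xs =
  let rec go acc = function
    | [] -> List.rev acc
    | x :: rest -> go (f x :: acc) rest
  in
  go [] xs

let small = map_naive succ [1; 2; 3]
let also_small = map_tail succ [1; 2; 3]
```

The second version uses constant stack regardless of the list length, at the price of building the intermediate reversed list.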

File organization. A single-file OCaml script contains a list of statements which are executed in order (there is no main function).
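A tiny script shows the top-to-bottom evaluation order. The let () = ... idiom is the usual way to write a statement that is run only for its effect; the names trace and answer are illustrative.

```ocaml
let trace = Buffer.create 32

(* Top-level definitions are evaluated one after another, in file order. *)
let () = Buffer.add_string trace "first;"
let answer = 6 * 7
let () = Buffer.add_string trace "second;"

(* By convention the "main" work goes in a final let () = ... *)
let () = Printf.printf "%s answer=%d\n" (Buffer.contents trace) answer
```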

The moral equivalent of a Haskell module is called a compilation unit in OCaml, with the naming convention that foo.ml (lower case!) corresponds to the Foo module, so Foo.foo refers to the foo function in Foo.

It is considered good practice to write interface files, mli, as described above; these are like export lists. The interface file will also contain data definitions (with the constructors omitted to implement hiding).
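The hiding effect can be sketched in one file with a sealed module standing in for the ml/mli pair. The module MyStack and its functions are invented for this example.

```ocaml
module MyStack : sig
  type 'a t                                (* abstract: constructors omitted *)
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> ('a * 'a t) option
end = struct
  type 'a t = Empty | Cons of 'a * 'a t    (* visible only in here *)
  let empty = Empty
  let push x s = Cons (x, s)
  let pop = function
    | Empty -> None
    | Cons (x, s) -> Some (x, s)
end

let top =
  match MyStack.pop (MyStack.push 1 MyStack.empty) with
  | Some (x, _) -> x
  | None -> 0

(* MyStack.Cons (1, MyStack.empty) is rejected outside the module,
   because the signature does not mention the constructor Cons. *)
```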

By default all modules are automatically “imported” like import qualified Foo (no import list necessary). Traditional import Foo style imports (so that you can use names unqualified) can be done with open Foo in OCaml.
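A short sketch of qualified and unqualified access, using a nested module Foo in place of a separate foo.ml file (an assumption made to keep the example in one file):

```ocaml
(* Stand-in for a compilation unit foo.ml. *)
module Foo = struct
  let foo x = x + 1
end

(* Always available, like import qualified Foo. *)
let qualified = Foo.foo 1

(* Local opens bring the names into scope for a single expression. *)
let local_paren = Foo.(foo 2)
let local_let = let open Foo in foo 3

(* open Foo is like import Foo: names are unqualified from here on. *)
open Foo

let unqualified = foo 4
```

The local forms are often preferred because they keep the opened names from leaking into the rest of the file.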

Module system. OCaml does not have type classes but it does have modules and you can achieve fairly similar effects with them. (Another classic way of getting type class style effects is to use objects, but I’m not covering them today.) I was going to talk about this today but this post is getting long so maybe I’ll save it for another day.
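As a small taste of what this can look like, here is one possible sketch, in which a signature plays the role of a type class and a functor plays the role of an instance with a context. The names SHOW, ShowInt and ShowList are invented here, and this is only one of several possible encodings.

```ocaml
(* Roughly: class Show t where show :: t -> String *)
module type SHOW = sig
  type t
  val show : t -> string
end

(* Roughly: instance Show Int *)
module ShowInt : SHOW with type t = int = struct
  type t = int
  let show = string_of_int
end

(* Roughly: instance Show a => Show [a], written as a functor. *)
module ShowList (S : SHOW) : SHOW with type t = S.t list = struct
  type t = S.t list
  let show xs = "[" ^ String.concat "; " (List.map S.show xs) ^ "]"
end

(* Unlike with type classes, the instance is chosen explicitly. *)
module ShowIntList = ShowList (ShowInt)

let rendered = ShowIntList.show [1; 2; 3]
```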

Open question. I’m not sure how much of this is OCaml specific, and how much generalizes to all ML languages.

Update. ocamlrun is not the same as runghc; I've updated the article accordingly.

28 Responses to “OCaml for Haskellers”

> note that in OCaml the value actually a tuple, so you’d need Node (v,l,r) to match

Not quite. `A of (t1 * t2 * t3)` and `A of t1 * t2 * t3` are different things. The first one is indeed a unary constructor (built from a tuple) whereas the second one is a ternary constructor (taking three arguments).

For pattern matching, both will work with `A (x, y, z)` but the latter cannot be matched with something like `A xyz` (xyz having type `t1 * t2 * t3`). The runtime representation will also be different (the tuple version will have an extra indirection level).
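The difference can be sketched with a two-field version of each declaration (the types unary and binary are made up for this illustration):

```ocaml
type unary = U of (int * string)   (* one argument, which is a tuple *)
type binary = B of int * string    (* two arguments *)

let pair = (1, "one")

let u = U pair          (* fine: U takes an existing tuple *)
let b = B (1, "one")    (* B must be given its two arguments directly *)
(* let b' = B pair         rejected: B expects 2 arguments *)

(* Both can be matched with a tuple-shaped pattern ... *)
let from_u = match u with U (n, _) -> n
let from_b = match b with B (n, _) -> n

(* ... but only the unary one can bind the whole tuple to a name. *)
let whole_u = match u with U whole -> fst whole
(* match b with B whole -> ...     rejected for the same reason *)
```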

You probably want = and <> for (structural) equality and inequality in OCaml. The == and != operators are reference equality for more advanced uses (e.g. optimization).

The main benefit of polymorphic variants in practice is that their types are inferred and not that they can be open.

Regarding tail recursion, some of Haskell’s standard library functions (e.g. getElems) stack overflow because they are not tail recursive. So I would not say that “you do not have to worry about tail calls in Haskell”.

Some major OCaml libraries rely heavily upon the object system. For example, LablGTK and PXP.

Jason, yes, unless there is a subtle semantic difference in an edge case that I don’t know about.

# match 1 with (1 | 2) -> 4;;
Warning P: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
0
- : int = 4
# match 2 with (1 | 2) -> 4;;
Warning P: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
0
- : int = 4
# match 3 with (1 | 2) -> 4;;
Warning P: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
0
Exception: Match_failure ("", 5, -26).

Jon, thanks for the corrections, I’ve updated the article. Note that getElems is actually strict. :-)

The most direct equivalents for == and /= in Haskell are = and <> in OCaml. These test for value equality. The operators == and != in OCaml are almost never used; they test for physical equality, for values that are boxed; for values that are not boxed (int/char/bool) it is the same as =/<>. (Physical equality does not exist in Haskell, probably because it doesn’t make sense with laziness and referential transparency.)
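A minimal sketch of the distinction, using references because each ref allocates its own box (the variable names are illustrative):

```ocaml
let r1 = ref 0
let r2 = ref 0

let structural = (r1 = r2)       (* true: same contents *)
let physical = (r1 == r2)        (* false: two distinct boxes *)
let same_box = (r1 == r1)        (* true: literally the same box *)

let lists_differ = ([1; 2] <> [1; 3])   (* true: structural inequality *)

(* For unboxed values such as int, == agrees with =. *)
let ints_physical = (42 == 42)   (* true *)
```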

In Haskell, the == and /= operators are defined in the type class Eq, and the <, <=, >, >= and compare operations are defined in the type class Ord. To make these operators apply to a type, you must either make an instance declaration providing the implementation of these operators for your type, or you can make a “deriving” declaration when the type is declared, in which case a default, language-provided, recursive comparator is used. If you don’t do either of these, these operators cannot be used on your type.

In OCaml, the situation is different. It is as if ALL types automatically and irreversibly use the “deriving” declaration for the type classes Eq and Ord. In other words, 1) the operators =, <, etc. can be used on any type in OCaml, automatically, even for types for which it doesn't seem to make sense. And 2) you cannot customize the ordering of these operators for specific types. However, places in the library which use orderings, such as List.sort, or the Map and Set ordered tree structures, allow you to specify the comparator used.
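Both points can be sketched briefly. The module names Desc and DescSet are invented for the example; List.sort and Set.Make are the standard library entry points mentioned above.

```ocaml
(* 1) The built-in operators work structurally on any (non-functional) type. *)
let tuples_ordered = (1, "a") < (1, "b")

(* 2) Library functions take the comparator explicitly.
   Here the arguments to compare are swapped to sort in descending order. *)
let descending = List.sort (fun a b -> compare b a) [3; 1; 2]

(* For Set (and Map), the ordering is supplied as a module. *)
module Desc = struct
  type t = int
  let compare a b = compare b a   (* refers to the built-in compare *)
end

module DescSet = Set.Make (Desc)

(* elements lists the set in increasing order according to Desc.compare,
   that is, from largest to smallest integer. *)
let elems = DescSet.elements (DescSet.of_list [1; 3; 2])
```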