The OPA type system, part 1

edit Part 2 of this post was never written. I no longer work on Opa. For any question regarding Opa, please contact MLstate.

Since the initial announcement regarding OPA, we have received a number of questions regarding all the aspects of the language (including, suprisingly, a few demands for answers and documentation). Well, while we’re busy putting together documentation, benchmarks and FAQ, here’s a quick tour of one of the most fundamental pieces of the language: the type system.

When designing the OPA language, we have two priorities in mind: simplicity and safety. In many languages, the type system is the first place where these kinds of priorities collide. Languages such as Java and C# are rather safe, largely thanks to their type system, but at the cost of simplicity — you need to understand classes, inheritance, interfaces, static methods, exactly at which time dispatch is decided between several methods with the same name, etc. Conversely, languages such as Python, Ruby or Erlang manage to be extremely simple, but they only provide the bare minimum in terms of safety: the program will crash cleanly or raise an exception in case of type error, nothing more. Other languages such as OCaml, Haskell, F# or Scala provide even more safety than Java and C#, with a type system which users of these languages consider much more flexible than that of Java or C#, but at the cost of seemingly arbitrary limitations: for instance, OCaml users are accustomed to mysterious type variables that “can’t be generalized”.

In this entry, I will try to explain our take on the problem of typing for both simplicity and safety. In a future blog entry, I will discuss how the type system in OPA can be used (and is effectively used) to provide nice compile-time optimizations.

Data structures

Firstly, let’s set the landscape: OPA is statically and strongly typed. That point is non-negociatable, as it is absolutely necessary to have any form of compile-time guarantee of safety. This is not the only layer of safety analysis, but other layers rely on this one.

Now, the OPA type system relies on type inference. In other words, with a few exceptions (namely, the database), users don’t need to write any type, except for documentation purposes.

In the interactive mode (not available in the preview), we can write this expression and let OPA type it for us. Characters # and ;; are part of the interactive mode, by the way, not of OPA itself:

# {x = 5; y = 9} ;;
- : { x : int; y : int } = { x = 5; y = 9; }

Note that we have only written the expression. A type has been constructed for us. This is by opposition to most statically-typed programming languages: even languages with type inference usually rely on nominal typing, and require each type to have been declared at least once before using it. In OPA, although you can give names to types if you wish to, by default, OPA will construct types as needed, based on the structure of your values. This is called structural typing. Users of dynamic programming languages should feel at home, as this is quite close to dynamic duck typing.

Here, the type that OPA has deduced is { x : int; y : int }. This is a record type, stating that the value will always contain a field named “x”, associated to an integer number, and a field named “y”, associated to another integer number. Records are comparable to objects in object-oriented languages, in numerous ways. We won’t cover matters such as virtual fields or inheritance in this blog entry, though.

Once the typer has determined that we have a record with type { x : int; y : int }, it can ensure at compile-time that attempting to use any other field from this record is an error. For the purpose of an example, let’s write a function and force its type:

# f(a : {x: int; y: int}) = a.x ;;
val f: { x : int; y : int} -> int = <fun>
# g(a : {x: int; y: int) = a.z ;;
This expression has type { x : int; y : int} .
However, according to the context, it seems that it should have type { z : 'a; ... }
Diagnosis: You seem to be assuming that this record contains a field named z. However, it has no such field.

At this stage, users of dynamically typed languages may be tempted to answer that they often want to return objects that may or may not contain a field name z depending on the circumstances. Don’t worry, it’s possible in OPA. It just happens so that the type system will ensure that you never make mistakes with this kind of practices.

Records are used pervasively in OPA. For instance, in OPA, booleans are two particular records {true = void} (with type {true: void}) and {false = void} (with type {false:void}). As it’s nicer to write, such trivial records and types can also be simply written {true} and {false}.

# {true} ;;
- : {true : void} = {true = void}

As in most languages, you can manipulate booleans with if...then...else... .

# if 0 == 0 then 0 else 1;;
- : int = 0

Now, from what we have seen, {true} and {false} have two distinct and incompatible types. Surely, we can write a function that returns a boolean, can we? In a class-based or interface-based object-oriented programming language, one could imagine that {true} and {false} could share a common superclass or interface, which would let us return both {true} and {false}. Well, that’s nearly what happens in OPA, except you don’t have to declare that common superclass/interface: {true} and {false} are both part of a sum type{true}/{false} . By definition, a value of sum type {true} / {false} is either {true} or {false}. Note that these are not just two values but two types, so this sum type is indeed quite comparable to an interface in object-oriented languages. Also note, if you are familiar with another language with sum types, that OPA’s sum types are probably much more powerful than what you are used to, as they do not rely on sum constructors. By convention, the standard library calls this sum type bool, and defines the name by the following line:

type bool = {true} / {false}

Again, you don’t need to declare sum types, they are completely deduced from an expression:

Now, I promised earlier that sum types are still safe, by opposition to casts in most object-oriented programming languages or the admission of runtime errors in most dynamic programming languages. That’s because there is only one way of using the contents of a sum type. This construction is called <em>pattern-matching</em>. The following defines a function that takes as argument a card, determines which record type is represented by this card, and takes action based on that record type.

Note that {ace; spade} is not a special case of {ace}, there is no subclassing mechanism involved here. We ill present the subclassing-like mechanism of OPA (called row variables) in another blog entry.

Well, the error message is a bit lengthy bit did succeed at catching our error: function get_card can produce a {king}, while function string_of_card cannot work with a {king}.

Is that all the OPA type system can do? Of course not. For the moment, we have demonstrated records (in type-theory lingo, labelled products) and sum types (in type-theory lingo, unlabelled union). OPA also offers generics more powerful than those of Java or C# (in type-theory lingo, parametric polymorphism), a mechanism comparable to subclasses (in type-theory lingo, row variables) and subinterfaces (in type-theory lingo, column variables), overloading (in type-theory lingo, intersection) and more, much more. I will try and cover these topics in the course of the next few days.