söndag 21 juli 2013

Modules instead of classes

Classes in Java serve a number of different purposes. One is to extend present classes with new code and types to avoid code duplication and reuse work done by earlier developers. The example we saw in the last blog (student, graduate students and undergraduate students) is an example of this. OCaml have classes, too, but we will instead look at how the module system can help us achieve some of these goals. In any case you should learn about OCaml's module language before continuing to the object system.

As mentioned we have a class hierarchy consisting of a base class "student" and two subclasses "graduate student" and "undergraduate student". We will use the module feature to include another module to model this:

The "student" type is exactly like in the last blog entry, but with the function field removed that computed the grade. We have a "make" function to make an instance of a student ("new" is a reserved word used by the object system), and even a small "to_string" function that does what's exptected from it. So how can we "extend" this type for customized computation (compute the grade)?

Can we say that GraduateStudent "inherited" Student? Yes and no. Some code is reused. E.g., you don't have to recode the make- or to_string-function from the first module. On the other hand there's no subtyping going on here; the type system only knows that the type t from the first module is in the two later modules. For example, if you type

in the toplevel, the list will have type GraduateStudent.t, disregarding UndergraduateStudent. How funny is that? This means trouble: there's nothing that prevents us from computing the course grade for a undergraduate student using a graduate student type!

let student1 = GraduateStudent.make "Ronald";;
UndergraduateStudent.compute_course_grade student1;; (* This is not what we want! *)

A way to go around this is to "tag" the types with variants, like this:

But you still have to keep the bookkeeping yourself; the type system won't help you from mixing things up. So how do we tell the compiler that the graduate and undergraduate students are in fact two different types, and still keep the code reuse? The answer is private type abbriviations. This is a somewhat advanced feature of the module language, so hold on to your hats.

OCaml will make as much as the structure "public" as possible, which, in our case where we're not using signatures, is everything. That's why the type inference system "knows" that GraduateStudent.t is in fact the same type as UndergraduateStudent.t. The way to limit this is with a module signature, which you might already know. The signature is pretty much the interface of the module; in it you can define what will be visible from the outside. For example, we might want to keep the definition of the student for ourselves, so that client code don't pattern match on it and creates students without our admirable make-function:

Returning to our original problem, we will redefine the student type in the module signature, adding the private keyword. Using private is the middle-ground between fully including t from Student, and making it abstract (not visible outside the module). It allows pattern matching, but makes sure the student types in the two modules are treated as different types. Since we're defining our own signatures, we must include the signature of the student module into our new module, using the language construct include module type of Student, where "module type" means the signature of "Student". One more thing: To allow type coersion (explained below) We will explicitly denote that the type t in the included signature should not simply be copy-pasted, but reused. This is done with with type t = Student.t

Once you have coerced a value to its supertype, you can't coerce or cast it "back" like you could do in C/Java. Why? Because it's unsafe as hell, that's why. Type safety is honorary in OCaml.

What have we achieved so far? We have added additional behaviour to a base module by including it, and made sure we can coerce an instance of it back to its supertype when we're done with all the specialized work. One inevitably wonders - can I add additional type information this way too? No. You can't modify a defined record type afterwords. It's not "open" in that sence. If you want that, you'd have to use the object system in OCaml. Or make a new type, possibly in the new module, though this more resembles aggregation (have-a) than inheritance (is-a).

Another shortcoming - in an OO sence of reasoning - is the lack of "protected" members. E.g., one might want to factor out the point limit from the two compute_course_grade functions (80 and 70), and put an auxiliary function in the Student module, hidden from outside but visible from modules which include it. This is not possible. Functions are either completely hidden or visible for everyone.

Same thing, but with functors

A functor - in OCaml phrase book - is "a module parameterized with another module", or "a function from module to module". Basically, it's a way to smash two (or more) implementations together, just like the include command. The difference is, with a functor, you will know beforehand what signature the other module will have. This allows for some interesting concepts. Check out this easy example, where the functor Square assumes a number module which could be int, float, complex or whatever we choose:

In the world of students, we want a student functor that accepts another module with a grade computation function. We will customize the student module with another module that carries the special computations.

söndag 30 juni 2013

I don't know if there's too few OCaml tutorials, or too many (like there ever could be), but when I myself picked up the language, I felt there was good basic tutorials, but then suddenly a big leap towards compiler design, types witnesses, GADTs and what not. So I'll try to fill in the gaps between PhD stuff and basic syntax, and hopyfully teach you something about idiomatic OCaml. Especially, if you're from a Java/C# background, you might be used to think in a very much OOP perspective. In OCaml, you'll primarily use records, variants and modules to solve problems (if you haven't picked up basic language features, check here for a tutorial). The module system in OCaml is in fact both very powerful and very interesting, so it's worth making an effort to learn. Of course there's a fascinating object system in OCaml, too, though I have not used it once in my production code so far.

Caveat: I'm no expert in either Java nor OCaml. Hopyfully, with a little help from the friendly people on #ocaml IRC and stackoverflow, I'll still be able to present good content.

NB: If you want to try the OCaml code in the OCaml toplevel, make sure to write ";;" after each expression.

Alternatives to subtyping

The first example is taken from An introduction to object-oriented programming with Java by C. Thomas Wu. It's ment to demonstrate subtyping and OO polymorphism (not to be confused with OCaml's parametric polymorphism). I'll present a number of ways to solve the problem, and discuss pros and cons among them.

Ok, let's start! Let's check out some Java code first. Below is a superclass Student with two subclasses: GraduateStudent and UndergraduateStudent. The subclasses extends the superclass with a new function: computeCourseGrade, which calculates the mean value of all tests the student has made, and checks if this value is above a certain limit, different for the two student types. If it is, the student grade is marked as "Pass"; otherwise it's marked as "No pass".

So, what can we say about this implementation? First of all, there's a lot of code duplication: the computeCourseGrade functions are identical except the point limit (80 respectively 70). Maybe we could put the method in the baseclass, and define the point limit as an argument to the constructor? Well, we could, but then we wouldn't have any guarantee at the typelevel that there is in fact two, and only two, types of students. Another alternative is to put the method in the baseclass, declare the class as abstract, and let the subclasses define the point limit in the constructor. That could fix our problems. Let's also fix an enum for the grade instead of that horrible string.

2. OCaml implementaion 1

Our first try to encode this pattern in OCaml will be a simple one, using only records. Instead of making subtypes - which can be done with the module system (ok, that's not 100% true, but som features of inheritance can be done, e.g including types and functions from another module) or the object system - we will simply pass the grade computation along the other data, as a function field in the record.

First, we use an algebraic datatype (ADT) for the grade (there's no enum construct in OCaml, but instead we use ADTs):

type grade = Pass | No_pass

This is a more typesafe approach than a string, which will assure us that grade is either Pass or No_pass - nothing else. Client code that use this type definition can't mess up, e.g. by mistake writing "Pas" or "no_pass".

The name field explains itself. Instead of int array for tests, we will use the for OCaml more natural int list. Grade is marked as option so grade will be either "Some grade" or "None", excluding the Not_graded we saw in the second Java example from the grade type definition. The last field, compute_grade, is the type of a function that takes a student as argument and produces a grade. The compute_grade function for graduate student will then look something like this:

Ok, let's analyze this. Line 8 is only for convenience. The idiomatic way to sum an int list in OCaml is not using a for-loop, but a fold, so what line 9 is saying is: "sum all the elements in the list tests using function (+), starting from 0". It's possible to concatenate string lists - or indeed any kind of list - in the same way, using function (^) (for strings), starting from empty string "", etc.

Next thing that happens is the match statement - ML's switch-statement on steroids - to branch on the quota. If the quota is equal to or higher than 80, the student passed. Yey! Otherwise, the function returns No_pass. Since this is the last expression in the function, it means this is what the function returns. Notice also how we get the length from the list "tests": "List.length tests". Since "tests" is not an object, we do not write "tests.length" as in Java or Javascript, but instead call the length function from the List module, "List.length", and feed it with a list to get the result.

Again, the only difference is the limit: 80 and 70. As in the first Java implementation, we have a lot of code duplication. Can't we factor out the point limit as an argument? Yes, we can, and better yet, we can use partial application to redefine graded_compute and undergraded_compute:

Both graded_compute and undergraded_compute will now have type "student -> grade", as required by the type student. Neat, huh? Although, and again analogous to the first Java example, we won't have any guarantee or type information that a specific student record carries with it this or that grade computation; only that it is A computation.

That's a mouthful! What's happening here is that we're creating a new list from the old one. The anonymous function tells us how each element is modified. In this case, the function returns a new student record "s", identical to the old, but with a course grade computed from the function within the record (s.compute_grade s).

One apparent difference between the Java and OCaml version is the use of state in Java, while in OCaml we instead create a new record instance for each change (we could of course use the mutable keyword to make a changeable record field). I can highly recommend the discussion of state and the cost of imperative programming in chapter three of "The structure and interpretation of computer programs". There's a lot on this topic on Google and Stackoverflow, too. Take a look (e.g. "state vs immutability")!

Next blog entry will look at modules and how modules can be used to replace part of an object-oriented approach.