Disciple Development

Thursday, August 3, 2017

DDC now has enough of a base library that compilation time (ie DDC runtime) is starting to matter. I finally did a heap profile while compiling some code (here the Data.List library), and, well, I'm not surprised..
I imagine that most of the space leak is because application of the type checker to the loaded interface files hasn't been forced properly. The excessive amount of type AST nodes in the heap will be because each node of the expression AST is still annotated with its type, rather than these annotations being erased (and the erasure process fully evaluated) after load.
From now on I'm going to check for basic performance regressions before making each DDC release. Thine profile then serveth thee both for a warning and for a record.

Monday, July 24, 2017

The next DDC release (v0.5.1) is currently being baked. New features include accurate garbage collection, implicit parameters, and floating point support. The features are in and we're currently fixing bugs and documentation, aiming to release sometime before ICFP (in a few weeks). Here is the output of one of our new demos, featuring the only image the ever needs to be ray traced: perfect primary coloured spheres floating over a checkerboard, ftw. The matching Disciple source code is here.

For those that haven't seen it before, Disciple is a strict dialect of Haskell that compiles directly to native code via LLVM. The source for part of the raytracer that casts a ray into the scene looks like this:

Thursday, June 23, 2016

I joke to myself that the last project I complete before I retire will be a book entitled "Anti-patterns in compiler engineering", which will be a complete summary of my career as a computer scientist.

I spent a couple of days last week undoing one particular anti-pattern which was causing bogus error messages like: "Type 'Int' does not match type 'Int'". In DDC the root cause these messages was invariably this data type:

A value of type Bound n represents the bound occurrence of a variable or constructor, where n is the underlying type used for names. In practice n is often Text or String. The data type has three constructors, UIx for occurrences of anonymous variables, UName for named variables and constructors, and UPrim for names of primitives. We use Bound type for both terms and types.

The intent was that when type checking an expression, to determine the type (or kind) of a Bound thing in UIx or UName form, we would look it up in the type environment. However, as the types (and kinds) of primitives are fixed by the language definition, we would have their types attached directly to the UPrim constructor and save ourselves the cost of environment lookup. For example, we would represent the user defined type constructor 'List' as (UName "List"), but the primitive type constructor 'Int' as (UPrim "Int" kStar), where 'kStar' refers to the kind of data types.

The pain begins the first time you accidentally represent a primitive type constructor in the wrong form. Suppose you're parsing type constructor names from a source file, and happen to represent Int as (UName "Int") instead of (UPrim "Int" kData). Both versions are pretty printed as just "Int", so dumping the parsed AST does not reveal the problem. However, internally in the compiler the types of primitive operators like add and mul are all specified using the (UPrim "Int" kData) form, and you can't pass a value of type (UName "Int") to a function expecting a (UPrim "Int" kData). The the uninformative error message produced by the compiler simply "Type 'Int' does not match type 'Int'", disaster.

The first time this happens it takes an hour to find the problem, but when found you think "oh well, that was a trivial mistake, I can just fix this instance". You move on to other things, but next week it happens again, and you spend another hour -- then a month later it happens again and it takes two hours to find. In isolation each bug is fixable, but after a couple of years this reoccurring problem becomes a noticeable drain on your productivity. When new people join the project they invariably hit the same problem, and get discouraged because the error message on the command line doesn't give any hints about what might be wrong, or how to fix it.

A better way to handle names is to parameterise the data types that represent your abstract syntax tree with separate types for each different sort of name: for the bound and binding occurrences of variables, for bound and binding occurrences of constructors, and for primitives. If the implementation is in Haskell we can use type families to produce the type of each name based on a common index type, like so:

DDC now uses this approach for the representation of the source AST. To represent all names by a single flat text value we define a tag type to represent this variation, then give instances for each of the type families:

It's also useful to convert between the above two representations. We might use ExpSeparate internally during program transformation, but use ExpFlat as an intermediate representation when pretty printing. To interface with legacy code we can also instantiate BoundVar with our old Bound type, so the new generic representation is strictly better than the old non-generic one using a hard-wired Bound.

Compiler engineering is full of traps of representation. Decisions taken about how to represent the core data structures permeate the entire project, and once made are very time consuming to change. Good approaches are also difficult to learn. Suppose we inspect the implementation of another compiler and the developers have set up their core data structures in some particular way. Is it set up like that because it's a good way to do so?, or is it set up like that because it's a bad way of doing so, but now it's too difficult to change? For the particular case of variable binding, using type like Bound above is a bad way of doing it. Using the generic representation is strictly better. Let this be a warning to future generations.

Thursday, May 5, 2016

DDC 0.4.2 was pushed to Hackage a few days ago. Running "cabal update; cabal install ddc-tools" should build it. You'll need a version of the LLVM tools in your path. LLVM 3.4 - 3.8 (the current release) should work.

Code generation and runtime system support for higher order functions is finished. The type checker also now automatically inserts the run and box casts which were mentioned a few posts ago. Release notes are on the github page.

The programs we can compile are starting to look "almost useful". For example, here is some code from the AlmostPrimes example, which I adapted from Rosetta Code:

Note that Nat# is the type of a primitive natural number. The # just means "primitive" which is not the same as "unboxed" as per GHC. The hashes will go away once I implement type synonyms and can define a nicer synonym.
The example invokes some standard stream combinators, smap, sfilter and so on. The forS function is like mapM, but using the Disciple effect system instead of monads.
The type of forS is:

forS : List a -> (a -> S e Unit) -> S e Unit

S e Unit is the type of a suspended computation with effect e which returns a value of type Unit when evaluated.
I'm calling this the "almost useful" release because because we still need to finish the garbage collector, as well as desugaring for nested patterns, before I could imagine writing anything substantial in it.
Late last year Jacob Stanley was working on the transform to insert GC slot handling into core programs, but the runtime support still needs to be implemented.
Current work is to finish support for type synonyms, then I'll do either the GC or nested patterns, depending in which looks like the lowest hanging fruit.

Tuesday, December 30, 2014

Work on DDC progresses, though recently it's been straightforward compiler engineering rather than anything novel, so I haven't been blogging.
DDC now drops interface files when compiling modules, multi-module compilation works, and the front-end supports unicode string literals. I've also implemented a lambda lifter and support for higher order functions. This part is well baked enough to implement simple stream functions, as used in the Haskell Data.Vector library, though the compiler doesn't do the associated fusion yet.
For example:

As we can see, work on larger demos is starting to be hampered by the lack of pattern matching sugar in the source language, so I'm going to add that next.
After adding pattern matching, I'm going to spend a couple of weeks bug fixing, and should get the DDC 0.5 release out in early February 2015.

Sunday, March 2, 2014

I spent the weekend fixing bugs in preparation for the upcoming 0.4 release. I've finished all the necessary compiler hacking, but still need to update the cabal files, tutorial, and wiki before releasing it proper.
I've made sure to fix all the bugs that would result in a poor user experience for people that just want to spend an afternoon kicking the tyres. Even though DDC still has lots of missing features (garbage collection, compilation for higher order functions etc), the stuff that is implemented is reasonably well baked. If one tries to do something unsupported it should at least give a civilized error message.

For example:

Trying to use an anonymous type function (higher order functions are not supported yet)

In contrast, the old ddc-alpha compiler (which I wrote for my PhD work), would signal its displeasure with partially applied primops by producing a program that segfaulted at runtime. We turn our backs on the bad old days.

Nested Data Types

Pleasingly, the new type inferencer does seem to work with some non-standard programs -- even though these don't compile all the way to object code yet.
Here is a Core Tetra program using nested data types:

This example is based on one from the Bird and Meertens 1998 paper. Note that the second argument of the ConsN constructor takes a Nest where the element type is more complex than the original parameter. The type inference algorithm in the alpha compiler would have diverged on this program.

Higher Rank Types

I've also tried out some simple examples with higher ranked types, here is one:

thing1 has a rank-3 type because it is a function, that takes a function, that takes a function which is polymorphic. There is a quantifier that appears at depth 3 contravariantly. Writing the type of thing1 in full makes this easier to see:

thing1 :: ((([a : Data]. a -> a) -> Nat#) -> Nat#

Again, I can't compile this example to object code yet because code generation for higher order functions isn't finished. However, type inference is a separate concern, and I don't know of any remaining problems in this part.

Tuesday, February 11, 2014

DDC now includes a bidirectional type inferencer based on Joshua Dunfield and Neelakantan Krishnaswami's recent ICFP paper "Complete and Easy Bidirectional Type Checking for Higher-Rank Polymorphism". I've extended the base algorithm to infer the kinds of type parameters, as well as to automatically insert type abstractions and applications into the result.

Type inference for Disciple Source Tetra

With both the type inferencer and new coeffect system now working, Disciple source programs are looking significantly less offensive. For example, some simple list functions:

etc. etc. The above 'mapS' function is the version using the coeffect system I described in my last post. Using effects currently requires the user to add explicit 'box' and 'run' casts, though these will be automatically inserted in the next DDC release.

The DDC command-line interface allows one to apply the type inferencer to such a source file, producing a core file with explicit type annotations, like so:

Such a file can then be converted to C or LLVM for compilation to object code. In the current implementation higher-order programs will type-check, but cannot be compiled all the way to object code. I need to finish the runtime support for that.

Type inference for Disciple Core Salt

In practical terms, work on the runtime system is already being helped by the new type inferencer. The DDC runtime is written in a first-order imperative fragment named 'Disciple Core Salt', which compiled with DDC itself.
Here is the code that allocates a boxed heap object on a 64-bit platform: