Saturday, October 23, 2010

MPS Experience

I have recently worked quite a bit with the language workbench MPS, and I think it's about time to write down the experiences I have had and the background knowledge I have collected.

Language Oriented Programming

First of all, in the context of language workbenches, distinguishing between programs and models becomes cumbersome and adds nothing to solving a given problem. That's why some people call a program or a model a "mogram", but I prefer the general term document, because it doesn't really matter whether the generated output has execution semantics or not. A document is an instance of a language, and a language is a set of meta-classes, so-called concepts, which can contain or refer to other concepts. An instance of a concept is called a node, so a document is a tree of nodes which can also refer to each other.
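
To make this terminology a bit more concrete, here is a minimal sketch in Python. It is only a toy model under simplifying assumptions: the classes Concept and Node and the tiny language with the concepts Program, VarDecl and VarRef are hypothetical and have nothing to do with the actual (Java-based) implementation of MPS. It illustrates the key distinction, though: children form the tree, while references merely point to other nodes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Concept:
    """A meta-class of a language, e.g. VarDecl or VarRef (hypothetical)."""
    name: str


@dataclass(eq=False)
class Node:
    """An instance of a concept: children form the tree, references cross-link nodes."""
    concept: Concept
    properties: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)    # role -> list of child nodes
    references: dict = field(default_factory=dict)  # role -> target node (not owned)

    def add_child(self, role, child):
        self.children.setdefault(role, []).append(child)
        return child

    def walk(self):
        """Yield this node and all its descendants, depth first (children only)."""
        yield self
        for kids in self.children.values():
            for kid in kids:
                yield from kid.walk()


# A tiny hypothetical language with three concepts.
Program, VarDecl, VarRef = Concept("Program"), Concept("VarDecl"), Concept("VarRef")

doc = Node(Program)
x = doc.add_child("statements", Node(VarDecl, {"name": "x"}))
use = doc.add_child("statements", Node(VarRef, references={"target": x}))

print([n.concept.name for n in doc.walk()])         # ['Program', 'VarDecl', 'VarRef']
print(use.references["target"].properties["name"])  # x
```

Note that walk() only follows child links, so the referenced declaration is visited once as part of the tree and not a second time through the reference.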

Given this terminology, the basic idea behind MPS is to create, view and manipulate document trees instead of text files [Dmitriev]. The major advantage of this approach is that no parsing is necessary, so the design and composition of languages is not constrained by technical requirements such as an unambiguous grammar or the avoidance of left recursion. In fact, I see a lot of analogies to what makes XML extensible and Lisp such a flexible language. In both cases, there is a simple syntax for describing document trees, while the creation and composition of such trees is largely unrestricted. In other words, a single parser is capable of parsing all supported documents.

The document trees in MPS are stored as XML. However, the user interface is not a text editor but a composition of tree editors. Every concept has an editor attached that provides projections of the corresponding nodes. Editors form a hierarchy just like concepts do, because the editor of a concept can use the editors of the concepts it contains. Furthermore, editors can be conditional, so that different projections of a given tree are possible under different circumstances. The major benefit of the tree editors is that, in contrast to XML and Lisp, the contents of a tree can be displayed in a nice, domain-specific way. In most cases, a projection of a tree looks like text, but graphical projections are possible as well.
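
The idea of per-concept editors that delegate to each other can be sketched as follows. This is a hypothetical Python model, not the MPS editor language: each concept name is mapped to a projection function, the editor of Plus reuses the editors of its children, and a context flag (an assumption made for illustration) selects between two conditional projections of the same tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    concept: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Hypothetical editor registry: one projection function per concept name.
EDITORS = {}


def editor(concept):
    """Register the decorated function as the editor of the given concept."""
    def register(fn):
        EDITORS[concept] = fn
        return fn
    return register


def project(node, ctx):
    """Render a node by delegating to the editor registered for its concept."""
    return EDITORS[node.concept](node, ctx)


@editor("Number")
def number_editor(node, ctx):
    return str(node.props["value"])


@editor("Plus")
def plus_editor(node, ctx):
    left, right = node.children
    # Conditional projection: the same subtree can be shown prefix or infix.
    if ctx.get("style") == "prefix":
        return f"(+ {project(left, ctx)} {project(right, ctx)})"
    return f"{project(left, ctx)} + {project(right, ctx)}"


tree = Node("Plus", children=[Node("Number", {"value": 1}), Node("Number", {"value": 2})])
print(project(tree, {}))                   # 1 + 2
print(project(tree, {"style": "prefix"}))  # (+ 1 2)
```

The tree itself never changes here; only the projection does, which is the property that makes several views of one document possible.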

Generators translate documents into other documents. To this end, there are various kinds of generators supporting different transformation techniques, such as local search-and-replace of nodes or pattern-based transformations along the lines of XSLT. Document transformations can be chained. The language of the final document is called a base language, and each of its concepts has a so-called TextGen aspect attached, which specifies how a node of that concept is translated into text. So, MPS effectively generates text, which can then be processed by common compiler tool chains.
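
A minimal sketch of such a generator chain, again in Python and with hypothetical names: a DSL concept Square is reduced to the base-language concept Mul by a local rewrite rule, and only the base-language concepts (Var, Mul, Return) carry a TextGen-like function that turns a node into text. Real MPS generators are written in the generator language of MPS, not like this; the sketch only mirrors the two stages.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    concept: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def reduce_square(node):
    """Hypothetical generator rule: rewrite the DSL concept Square(e) into the
    base-language concept Mul(e, e); every other node is copied unchanged."""
    kids = [reduce_square(c) for c in node.children]
    if node.concept == "Square":
        return Node("Mul", children=[kids[0], copy.deepcopy(kids[0])])
    return Node(node.concept, dict(node.props), kids)


# TextGen-like aspect: only base-language concepts know how to become text.
TEXTGEN = {
    "Var": lambda n: n.props["name"],
    "Mul": lambda n: f"({text(n.children[0])} * {text(n.children[1])})",
    "Return": lambda n: f"return {text(n.children[0])};",
}


def text(node):
    return TEXTGEN[node.concept](node)


dsl = Node("Return", children=[Node("Square", children=[Node("Var", {"name": "x"})])])
print(text(reduce_square(dsl)))  # return (x * x);
```

Calling text(dsl) on the unreduced tree fails with a KeyError because Square has no TextGen entry, which mirrors why a document has to be generated down to a base language before text can be produced.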

In contrast to other code generation techniques, MPS fully knows the target language of a generator. Thus the IDE supports you with context assistance, constraint checking, syntax highlighting and all the other features which other language workbenches only provide for source languages. However, this also means that if you want to generate code for a certain language, an MPS implementation of that language must be available. Currently, Java is the only programming language MPS supports out of the box.

Generators and all other aspects of a language are themselves documents of languages provided by MPS and are created by means of the respective tree editors. In other words, MPS consists of a set of domain-specific languages, each for a different aspect of the domain of creating domain-specific languages. This recursion is not only cool but also important, because it makes it possible to extend the language for building generators and thus to introduce specialized generators for a given type of language. In my opinion, this rocks.

The problems with tree projection

As outlined above, MPS documents are created and manipulated by means of projection-based tree editing only. Although most projections look like text at first sight, the underlying tree is omnipresent. In fact, to create a document you have to create its defining tree. Basically, this means that you have to know how the trees of your language are structured, create one node after the other with the help of the context assistant, and set their properties by means of the editors. As this is cumbersome and time-consuming, MPS supports a bunch of mitigating techniques.

For example, it is possible to define so-called aliases, which are strings that are replaced by a node of a particular concept when entered into the editor. Additionally, there are so-called side transformation rules, which execute a mini-generator if a given string is entered to the left or right of a node satisfying a given condition. So, instead of creating a Plus node with two Number children, you could type + next to a Number node to get the same effect. As a last example, intentions offer context-specific transformations which can be triggered manually. All these local transformations, in their various flavors, help to make creating documents more comfortable and manageable.
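
The side transformation example can be modelled as a function that fires only if its condition holds. The sketch below is a hypothetical Python toy, not the rule language of MPS; the concepts Plus, Times and Number, the OPERATOR_ALIASES table and the Empty placeholder are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    concept: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Hypothetical mapping from the string typed by the user to an operator concept.
OPERATOR_ALIASES = {"+": "Plus", "*": "Times"}


def right_side_transform(node, typed):
    """Toy right-side transformation rule: applies when an operator alias is typed
    to the right of a Number node and wraps that node into a binary expression."""
    if node.concept != "Number" or typed not in OPERATOR_ALIASES:
        return None  # condition not satisfied: the rule does not fire
    # The existing node becomes the left operand; the right operand is an empty
    # placeholder the user fills in next.
    return Node(OPERATOR_ALIASES[typed], children=[node, Node("Empty")])


three = Node("Number", {"value": 3})
plus = right_side_transform(three, "+")
print(plus.concept, [c.concept for c in plus.children])  # Plus ['Number', 'Empty']
```

The user never builds the Plus node explicitly; the rule restructures the tree around the node the cursor is on.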

In my experience, though, the tree nature of the documents never vanishes. If you want to change your document in a way which has not been anticipated by the language creators and thus is not directly supported, you end up manipulating the tree manually again. For example, when enriching the behavior of concepts with Java code, you sometimes create an if statement with a rather complex condition. If, after creating the statement, you realize that you actually need the negation of that condition, there had better be an intention prepared for that. Otherwise, you have to copy the condition to the clipboard, create a new node representing a negation expression, and paste the condition back as its child. If the structure of the language is complicated, sticking nodes together to form the intended tree becomes really difficult.
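
For comparison, this is roughly what a negate-condition intention does to the tree. It is a hypothetical Python sketch; the concept names If, Not and Raw as well as the show() projection are made up for illustration. The point is that the intention reuses the existing condition subtree instead of making the user copy and paste it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    concept: str
    props: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)  # role -> child node


def negate_condition(if_node):
    """Hypothetical 'negate condition' intention: wrap the If's condition into a
    Not node, or unwrap it if it already is one. The subtree itself is reused."""
    cond = if_node.children["condition"]
    if cond.concept == "Not":
        if_node.children["condition"] = cond.children["operand"]
    else:
        if_node.children["condition"] = Node("Not", children={"operand": cond})


def show(n):
    """Tiny text projection, just enough to see the effect."""
    if n.concept == "Not":
        return f"!({show(n.children['operand'])})"
    if n.concept == "If":
        return f"if ({show(n.children['condition'])}) {{ ... }}"
    return n.props["text"]


stmt = Node("If", children={"condition": Node("Raw", {"text": "a > b && !done"})})
negate_condition(stmt)
print(show(stmt))  # if (!(a > b && !done)) { ... }
```

Applying the same intention a second time unwraps the Not node again, so the operation is its own inverse.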

In the case of the MPS languages themselves, things are even worse because there is no way to create the intended document node by node. Instead, you have to use the ways that are predefined by the IDE. A good example in this context is the fact that there is no way to turn a concrete concept into an interface concept or vice versa. Since a concept in MPS can inherit from at most one other concept, you need interface concepts to simulate multiple inheritance. In a text-based environment, replacing the keyword class with the keyword interface would be all you had to do to change the kind of the concept. In the current MPS version, though, you have to remove the concept, create an interface concept, define its properties, children, references, and so on, and update all references to the concrete concept to point to the interface concept instead. The upcoming MPS version is supposed to have refactoring support for this case, but this example still demonstrates that in MPS you need tool support for every single creation or manipulation task. Otherwise, things become messy for the user.

Aside from the above-mentioned cases, the tree nature of MPS documents also manifests itself when navigating a document. You have to get used to the fact that moving the cursor or deleting stuff works on nodes instead of characters. If you know the structure of your tree, this allows you to navigate your document incredibly fast. But if you don't, it can be confusing. And when the focus is on a property of a node, navigation works at the granularity of characters again. Maybe I didn't try hard or long enough, but navigating document trees in MPS never felt intuitive to me.

Finally, the tree editors always show the result of creating and manipulating nodes, but never how this result has been achieved. So if you want to figure out how something can be done, you cannot just look at other examples, because they don't tell you how to interact with the IDE to get what you see. With text-based languages, in contrast, you simply copy the text and you are done.

If you want to create a new language in MPS, you have to complete the following tasks:

You have to define the structure of your language, i.e. the concepts and their hierarchy.

You have to define a type system in order to constrain valid documents in ways which cannot be expressed by a reasonable set of concepts.

You have to specify constraints for properties, child concepts and references to nodes. For example, this is needed to support scopes in your language.

If you create a base language, you have to define the TextGen aspect for all concepts.

If you don't create a base language, you have to define generators which translate documents of your language into a given target base language.

You have to define and tune the editors for the concepts to get a nice visualization of the documents.

You have to identify common usage patterns and define local transformations and intentions for them.

All but the last two tasks have to be done in other language workbenches as well. The last two tasks, however, are extra work, necessary to enable projection-based tree editing and to compensate for its disadvantages.

The MPS implementation

The MPS project is a huge scientific and engineering effort, and I really respect the work that has been done by JetBrains. From a user's perspective, though, it is unfortunately not mature enough yet. MPS requires and contains a lot of features, which means that a lot of code paths have to be tested and a lot of documentation has to be written. Both require time and money, which might be the reason why neither has been done sufficiently yet. So, if you use MPS, be prepared to encounter bugs. I did, on a daily basis. And if you want to figure out how to do something, you had better have somebody to ask, because the documentation is incomplete and partially inconsistent. Don't get me wrong, I don't blame anybody. I just want to point out why, in my opinion, MPS is not yet ready for public use.

The features that do work, though, are really cool. For example, the context assistant is really powerful, and it is very easy to use it for your own languages. All you have to do is derive your concepts from INamedConcept whenever they have a name and use smart references, i.e. concepts with a single reference of cardinality 1. The context assistant resolves such references and displays the names of the nodes in question in its menu. Furthermore, by means of reference constraints you can limit the set of nodes which are valid reference targets and thus offered in the context assistant menu.
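
How named nodes, smart references and reference constraints work together in the context assistant can be approximated like this. The Python sketch is a simplification under assumptions: the name field stands in for INamedConcept, completion_menu() is a hypothetical helper, and the constraint is just a predicate, whereas real MPS scopes are defined in the constraints aspect of a language.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    concept: str
    name: Optional[str] = None  # stands in for the name of an INamedConcept node
    children: list = field(default_factory=list)


def all_nodes(root):
    yield root
    for child in root.children:
        yield from all_nodes(child)


def completion_menu(root, target_concept, constraint=lambda node: True):
    """Names offered for a smart reference: every named node of the target concept
    that also passes the (hypothetical) reference constraint."""
    return [n.name for n in all_nodes(root)
            if n.concept == target_concept and n.name is not None and constraint(n)]


prog = Node("Program", children=[
    Node("VarDecl", "count"),
    Node("VarDecl", "_tmp"),
    Node("FunctionDecl", "main"),
])
print(completion_menu(prog, "VarDecl"))  # ['count', '_tmp']
# A reference constraint narrows the scope, here by hiding names starting with "_".
print(completion_menu(prog, "VarDecl", lambda n: not n.name.startswith("_")))  # ['count']
```

Because the menu is derived from names and the constraint, a language designer gets completion for free once concepts are named, which matches the experience described above.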

Something I find a little bit weird is the language used for the editors. An editor is a set of cells arranged in rows and columns. Each cell has a number of styles and properties which configure the respective projection. Some of these properties are defined directly in the editor of the editor, i.e. in the projection of the editor definition itself. Other cell properties are only accessible by means of the so-called inspector, a general property editor for MPS objects.

In my experience, getting the editors right is not trivial. For example, when projecting a do-while loop, the editor could look like [> do %body% while( %condition% ); <], which is a horizontal arrangement of the constant strings do, while( and ); and the child nodes named body and condition. If the body is a block statement whose editor spans multiple rows, the projection of a do node shows the while-part to the right of the body, but on the same line as the do-part. I know that there should be a combination of cells with the right properties and styles that yields a C-like projection, with the while-part following the closing curly brace of the block statement, but I never managed to figure it out.
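
The effect described above can be reproduced with two toy layout strategies in Python. This is not how the MPS editor lays out cells; both functions are hypothetical and only mimic the two behaviors in question. columns() places cells side by side and top-aligned, which yields the unwanted projection, while flow() continues each cell on the last line of the previous one, which yields the C-like projection.

```python
def columns(*cells):
    """Column-style layout: each cell is a list of lines; cells are placed side by
    side and top-aligned, so a multi-line cell does not push later cells down."""
    height = max(len(c) for c in cells)
    widths = [max(len(line) for line in c) for c in cells]
    rows = []
    for i in range(height):
        parts = [(c[i] if i < len(c) else "").ljust(w) for c, w in zip(cells, widths)]
        rows.append(" ".join(parts).rstrip())
    return rows


def flow(*cells):
    """Text-flow layout: each cell continues on the last line of the previous one."""
    lines = [""]
    for first, *rest in cells:
        lines[-1] = f"{lines[-1]} {first}" if lines[-1] else first
        lines.extend(rest)
    return lines


body = ["{", "  i++;", "}"]  # a multi-line block statement
print(columns(["do"], body, ["while(i < n);"]))
# ['do {      while(i < n);', '     i++;', '   }']
print(flow(["do"], body, ["while(i < n);"]))
# ['do {', '  i++;', '} while(i < n);']
```

In the column model, the height of the body only affects rows below the first one, which is exactly why the while-part ends up next to the do-part.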

Conclusion

My experience with MPS comes mainly from my efforts to create a C99 implementation for MPS. The C99 standard is complex and partly weird, so the disadvantages of MPS that I encountered are amplified by the peculiarities of the C99 language. I am sure that in many cases DSLs are much simpler, and thus the problems of tree-based editing are not as prominent.

In conclusion, to be honest, I have no final opinion yet. It's clear that you have to get used to projection-based tree editing. But as I think people should be willing to learn new paradigms if it pays off, the real question is whether it pays off or not.

Being freed from parsers really eases the creation and especially the composition of languages. But as discussed above, projection-based tree editing is cumbersome because you basically have to create every single node manually. Maybe, in the future, somebody will come up with a form of projection-based tree editing that is as fast as writing text and having a parser turn it into a document tree. Until then, special tool support is required to speed up the creation of document trees and make it feasible. And providing this tool support is time-consuming, because you have to identify all common usage patterns to begin with.

Furthermore, I have the impression that no matter how much tool support you provide, your users have to learn the structure of the language. Compared to learning a syntax, this seems to be harder. But maybe it's just a matter of getting used to it again. I don't know yet.