Sunday, July 22, 2012

Immutable data structures in OOP (Groovy)

In my parser, everything except some minor auxiliary things is immutable, and I cannot express how valuable it is. It helps debugging, testing, backtracking in the algorithm. But there are certain issues with immutability in Groovy (which I use for its syntactic sugar) and other object-oriented languages.

Update

One very often has to copy an object with a subset of its field values replaced, leaving the others as they are. Functional languages normally have special syntax for that; mainstream object-oriented languages normally don't, including Groovy. So I had to resort to an updating function like copyWith variant for Scala. Manually and without any reflection so far. When adding a field, I also have to update the default constructor, the all-fields constructor and the cloning function. Luckily that's not very often, otherwise I'll probably create an AST-transforming annotation that would do this automatically for me.

Collections

Groovy has no built-in persistent collections: immutable but cheap to create an updated copy of. I've found that Java ecosystem is not rich in these kinds of libraries at all, and that's a pity. Scala and Clojure have them, but tuned to those languages and I couldn't be bothered to use them so far. There are also pcollections and Functional Java libraries which don't have everything I need (e.g. a persistent version of LinkedHashMap). So I have either to create something myself, or use the standard Java collections very carefully, or just give up and rewrite everything in another language. So far I'm combining the first and the second variants.

State flow

I've found is that writing complex logic in instance methods may be quite error-prone. I have ParsingState class with apply method which is invoked when processing each word. It takes some possibly contradicting language constructions that a word may be part of, and decides which to activate, i.e. which parsing alternative to choose. This stage involves trying all the alternatives and evaluating their semantic plausibility. Something like this:

Whenever you change anything (obtain a changed copy of ParsingState), you must reassign it to state variable. Not a big deal, but a bit tiresome. Worse is this: everything except the first line has to be prefixed with state, because you need to operate on the latest possible version of the parsing state. And it's so damn easy to forget this qualifier! The IDE will autocomplete it, the compiler will eat it and most probably the problem won't be noticed for some time.

After several such bugs I'm very inclined to wrap all complex logic in static methods where you can't have unqualified references. There's normally only one possible state qualifier then, and if you have some special case, you add another variable to keep an earlier version of state, and to mix them is not so easy. At the same time, I like calling instance methods. It's very convenient: there's a namespace clearly defined by the qualifier type, it contains only what you need. As a result, I create an instance method which immediately calls some private static one where all the logic lies.

Conclusion

So, here's my wishlist for an ideal language:

Support for creating immutable structures easily. I find Scala way to be the most concise. Groovy offers @TupleConstructor annotation which is nice but not a part of the language.

Support for updating immutable structures so that adding a new field requires changing just one place.

Persistent collections with a predictable iteration order. An ability to create them using concise list/map literals would be a plus.

A way of expressing complex state transition logic concisely, clearly and not so error-prone. Like State monad, only simpler, something for human beings. Maybe just a syntactic sugar for var=f(var). Computing other values alongside state change as well: var,sideResult=f(var).