August 26, 2012

Mainstreaming of Functional Programming

I started out on my projects using imperative techniques, but have since rewritten all my code to be based on the functional paradigm. It was the only way that I could contain the escalating complexity of achieving my ambitious goals.

Although the first programming language I was formally taught in a university was Lisp, much of my early programming background was in C++. I used to be an early advocate of object-oriented programming, but I soon noticed its limitations when programming in a more mathematical way, particularly with functions or when manipulating symbolic expressions algebraically. This meant that translating ideas into code often resulted in programs that were overly complex and bore little resemblance to the underlying ideas. Also, the problems that I have solved in functional languages, using techniques such as pattern matching and unification, were at a considerably higher level than those I tackled in traditional languages. I have seen natural language and AI programs written in C++ in which the programmers recreated the Lisp runtime, complete with garbage collection and linked data structures.

While there are some formalisms such as sigma calculus, which feels a bit contrived, OO programming does not seem to have a strong mathematical basis. The various principles and practices that accompany object-oriented training resemble a discipline still in the “art” phase rather than “science.” My general feeling is that object-oriented programming is the current “fad” of our times and won’t be as relevant in the middle of the century. FP has a stronger mathematical basis and scales both up and down across different kinds of computing, such as computation distributed over a network.

I am reminded of a post by Wes Dyer, a Microsoft C# developer, on the conceptual simplicity of functional programming compared to its alternative.

Imperative programming is sometimes reminiscent of a Rube Goldberg machine. Both require meticulous thought to ensure that a process works correctly despite a myriad of state transitions and interdependencies. It is amazing these complicated programs work at all.

Dijkstra pointed out that too many programmers rely on executing a program in order to understand it. The reason is imperative programs lack sufficient underlying formalisms to make guarantees about any but the most trivial of programs. As much as I love a debugger, it is disheartening to need to use it to understand my code.

Languages like Clojure and Scala have incorporated functional programming at their core. Rich Hickey, the inventor of Clojure, gave a presentation, “Are We There Yet?”, in which he drew on his 20 years of software experience to trace the sources of complexity to mutation. He also has another good presentation, “Simple Made Easy.”

CMU revised its computer science curriculum to emphasize functional programming over object-oriented programming. Object-oriented programming was eliminated from the introductory curriculum, because it is “anti-modular and anti-parallel by its very nature, and hence unsuitable for a modern CS curriculum.” Also, the “new data structures course emphasizes parallel algorithms as the general case, and places equal emphasis on persistent, as well as ephemeral, data structures.” When I first learned about functional data structures, I felt that they were just as important as any other data structures taught in my Algorithms course but that the university unwittingly steered us into an imperative programming paradigm.

One comment from the post elaborates that object-oriented programming is anti-concurrent because it is about state, which is shared liberally. It is also anti-modular because of dependencies. Functional data structures don’t have these characteristics because they are immutable and acyclic. “Clean code” thinking, such as the SOLID principles, has emerged to compensate for deficiencies in OO programming, though “to not much avail.” I would add that mutation also loses information from prior versions of data structures, which can result in unnecessary and awkward limitations in software.
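To make the point about prior versions concrete, here is a minimal sketch in Python of a persistent (immutable) linked list. The names `Node`, `push`, and `to_list` are my own illustrations, not taken from any library or from the post being discussed. Adding an element builds a new version that shares the old cells, so the earlier version stays intact and readable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """One immutable cell of a singly linked list (hypothetical type)."""
    head: int
    tail: Optional["Node"] = None


def push(lst: Optional[Node], value: int) -> Node:
    """Return a new list with value in front; the old list is untouched."""
    return Node(value, lst)


def to_list(lst: Optional[Node]) -> list:
    """Flatten the linked list into a Python list for inspection."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


v1 = push(push(None, 1), 2)  # version 1: 2 -> 1
v2 = push(v1, 3)             # version 2: 3 -> 2 -> 1, reusing v1's cells

print(to_list(v1))   # the earlier version is still available
print(to_list(v2))
print(v2.tail is v1)  # structural sharing: nothing was copied
```

Because no cell is ever modified, both versions can be handed to different parts of a program without any coordination, and features such as undo fall out of simply keeping references to older versions.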

It’s good to see some activity at Microsoft, which in my mind used to be the epitome of stateful programming. Joe Duffy, a Microsoft architect and concurrency expert, wrote on his blog about the benefits of immutability.

What about concurrency? Immutable data structures facilitate sharing data amongst otherwise isolated tasks in an efficient zero-copy manner. No synchronization necessary. This is the real payoff.

For example, say we’ve got a document-editor and would like to launch a background task that does spellchecking in parallel. How will the spellchecker concurrently access the document, given that the user may continue editing it simultaneously? Likely we will use an immutable data structure to hold some interesting document state, such as storing text in a piece-table. OneNote, Visual Studio, and many other document-editors use this technique. This is zero-cost snapshot isolation.

Not having immutability in this particular scenario is immensely painful. Isolation won’t work very well. You could model the document as a task, and require the spellchecker to interact with it using messages.... Those kinds of message-passing races are non-trivial to deal with. Synchronization won’t work well either. Clearly we don’t want to lock the user out of editing his or her document just because spellchecking is occurring. Such a boneheaded design is what leads to spinning donuts, bleached-white screens, and “(Not Responding)” title bars. But clearly we don’t want to acquire a lock and then make a full copy of the entire document. Perhaps we’d try to copy just what is visible on the screen. This is a dangerous game to play.

Immutability does not solve all of the problems in this scenario, however. Snapshots of any kind lead to a subtle issue that is familiar to those with experience doing multimaster, in which multiple parties have conflicting views on what “the” data ought to be, and in which these views must be reconciled.

In this particular case, the spellchecker sends the results back to the task which spawned it, and presumably owns the document, when it has finished checking some portion of the document. Because the spellchecker was working with an immutable snapshot, however, its answer may now be out-of-date. We have turned the need to deal with message-level interleaving – as described above – into the need to deal with all of the messages that may have interleaved within a window of time. This is where multimaster techniques, such as diffing and merging come into play. Other techniques can be used, of course, like cancelling and ignoring out-of-date results. But it is clear something intentional must be done.
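The spellchecker scenario above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `Snapshot`, `Document`, `spellcheck`, `accept`, and the toy dictionary `KNOWN_WORDS` are hypothetical names of mine, and a plain tuple of words stands in for a piece table (a real piece table would also avoid copying the text on each edit). The background task reads a frozen snapshot without locks, and the owner applies the simplest policy mentioned above, ignoring results whose snapshot version is out of date.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Tuple

# Toy dictionary, purely an assumption for this sketch.
KNOWN_WORDS = frozenset({"the", "quick", "brown", "fox"})


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the document at one version."""
    version: int
    words: Tuple[str, ...]


class Document:
    """Owned by the editing task; every edit publishes a new snapshot."""

    def __init__(self, words):
        self.current = Snapshot(0, tuple(words))

    def replace_word(self, index, word):
        old = self.current
        words = old.words[:index] + (word,) + old.words[index + 1:]
        self.current = Snapshot(old.version + 1, words)


def spellcheck(snapshot):
    """Background work: reads only the frozen snapshot, so no locking."""
    bad = [i for i, w in enumerate(snapshot.words) if w not in KNOWN_WORDS]
    return snapshot.version, bad


def accept(doc, result):
    """Owner-side policy: drop results computed from a stale snapshot."""
    version, bad = result
    return bad if version == doc.current.version else None


doc = Document(["the", "quikc", "brown", "fox"])
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(spellcheck, doc.current)  # checker sees version 0
    doc.replace_word(1, "quick")                   # user keeps editing: version 1
    stale = future.result()
    fresh = pool.submit(spellcheck, doc.current).result()

print(accept(doc, stale))  # rejected: computed against version 0
print(accept(doc, fresh))  # accepted: computed against the current version
```

Discarding stale results is the crudest of the reconciliation strategies; a diff-and-merge approach would instead try to map the old word positions onto the current snapshot.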

We're using immutable data structures in our audio engine. It makes many things a lot easier and adds features competing products don't have (and won't have in the near future as implementing them using the imperative locking approach is non trivial).
