Connecting the Dots on the Future of Programming Languages

By Dave, on January 18th, 2012

Yesterday, I serendipitously came across two things which got me thinking about the future of programming languages:

The first was an excellent article entitled “Welcome to the Hardware Jungle” by Herb Sutter. This article is about the coming advent of our multicore overlords. Whilst this might sound like something you’ve heard before, it’s actually well worth the read. His argument is that heterogeneous massively-multicore computing is fast becoming the norm, and there is no turning back. I found the article quite scary, as I can’t imagine programming in the extreme environment suggested. I also have to question whether everyday applications will really benefit from massive multicore. But, clearly, I can see that quite a few will.

The second was the following (short) YouTube video of Simon Peyton Jones and Erik Meijer discussing the space of programming languages:

Simon talks about how both imperative and functional programming languages are trying to reach some kind of “Nirvana”, but coming at it from different directions.

Now, the question is: how do they connect together? Well, essentially, I think the “Nirvana” space that Simon talks about is exactly the space we need to deal with the Hardware Jungle. In other words, I think it’s the space most suitable for distributed and parallel computing.

What do we know about this “Nirvana” space? In the video, Simon talks about safe and unsafe languages. He says explicitly that safe means having limited or highly controlled side-effects. Languages which are unsafe by his categorisation include Java, C#, C, and the majority of languages traditionally thought of as imperative or object-oriented. They are unsafe because one cannot reason about the result of any given method in isolation from others. That is, methods may read/write shared state (via the heap) and, to reason about them, we must know: (1) what shared state is actually accessed; (2) what value it has when it is accessed. These two requirements make it very difficult to know what’s going on, particularly if something else could modify the state at any point. More importantly, I believe, these requirements make it very difficult for the compiler to reason about what’s going on.

What does this have to do with the Hardware Jungle? Well, I believe there are two important points here:

To move the right data for a given computation onto the right core, we must know exactly what state that computation will access.

For massively multi-core systems, Humans will not be capable of optimally mapping resources onto cores. We will increasingly rely on sophisticated algorithms to do this for us. Typically, such algorithms will be embedded in the compiler and/or runtime system.

These two points taken together imply it must be possible to automatically determine what state a given computation will access. The easiest way of doing this is to aggressively restrict the state that can be accessed by requiring pure functions. Furthermore, I believe it is not sufficient to rely on the programmer to ensure his/her functions are pure — the compiler must do this for us. This is because race-conditions are already notoriously difficult to debug and in the Hardware Jungle things will get seriously crazy.

This leads me to the final and, I think, most important question:

Which mainstream programming languages currently support pure functions and/or other mechanisms for aggressively limiting side-effects?

19 comments to Connecting the Dots on the Future of Programming Languages

How do some of the Prolog-like languages fit in here? Prolog isn’t purely functional of course, but in practice (in my experience) most of the computational code written in it is. The major exception is a few common tricks for memoising results.

I imagine that the same techniques useful for parallelising Haskell could be applied to logic languages. But I am quite ignorant there.

The search-based nature of Prolog would seem to make it a good candidate for automated parallelisation. However, I don’t really know enough about Prolog to make an informed opinion. Perhaps someone else can comment?

C and C++ *support* pure functions, but don’t insist upon them. C++ has language support for const methods. A lot can be done by programmers working in a functional style, even in these old-fangled languages. Can anyone comment on the advantages for a compiler for a language that insists on pure functions, as opposed to code that merely sticks to them by design?

“anyone comment on the advantages for a compiler for a language that insists on pure functions”

The issue here is whether or not the compiler can still perform key optimisations. In some cases, static analysis can determine that a piece of code is pure. But, this is really difficult and most compilers don’t do it (although the JVM presumably does to some extent).

Const in C++ is interesting because the language spec explicitly states that modifying an object through a const reference results in undefined behaviour. This does enable the compiler to optimise, even though const is not strictly enforced (see this). However, the downside is that when someone does break constness then subtle bugs can arise which are hard to understand because they’re caused by some specific compiler optimisation.

I expected this article to go in a completely different direction. I don’t disagree, but I see a simultaneous trend in a somewhat opposite direction: easier to use, more powerful dynamic languages compiling to either JavaScript or the JVM. It’s easy to see if you look at the widespread use of the JVM post-Java, and when you take into consideration proliferation of browser applications and the coming of Netbooks.

I prefer writing powerful, useful projects quickly and with fun tools. I’ll leave optimizing the parallelizing compilers to the people who feel like mucking with it.

Simon is not being very honest when he lumps C in there with C# and Java. Even if they lack the side-effect tracking systems, they are far safer languages for many reasons.

Go is actually diametrically opposed to the Haskell philosophy, I would think: it tries to be as useful as possible and safe only when it serves the first purpose. Personally, that seems like the best way to go in practice. Always good to research superior methods, of course… it just doesn’t seem like the research has generated much use in the last few decades.

I don’t really understand why OOP is unsuitable for multi-core programming. Presuming an object is responsible only for its own state, as long as a single object doesn’t have to spread its state across processors, the state should be fully encapsulated, right?

I’ll take your word for it that Java, C#, and co. do share state internally, but surely that’s implementation specific and not a limitation of object orientation per se?

There are a few things here. Firstly, objects do not always completely encapsulate the state they access. Oftentimes, an object is shared amongst many others and used as a shared variable. In some sense, what you have is many small object networks which are collaborating together. To move one object onto a core you have to figure out which ones from its network are needed as well. And, the compiler certainly is not capable of doing this for us.

“What language?” is one way of looking at the problem. Another way to look at it is from the perspective of ZeroMQ — i.e. spread concurrency through all layers of the architecture by providing a crosscutting library that projects the actor model. It seems to me that what is likely to emerge is a selection of Clojure-like (dynamic-like syntax, but type-hinted and STM-endowed) and Haskell-like languages (purely functional and deeply-typed, but with friendlier FRP) communicating at runtime using a core library that is ZeroMQ-like. That picture leaves room for a range of friendliness and expressiveness while also enabling a market-competitive dynamic to drive the quality of usable compute services.

Fortran (95 and later) has a keyword for declaring functions as “pure”, and when one does so, the compiler can then check that the function only does operations without side effects and only calls functions that are likewise declared “pure”.

It also has mechanisms for declaring parameters as “in”, “out”, or “inout”, which makes the compiler’s dataflow analysis rather easier. (You can mostly do the same thing with C++ references using “const”, but this I think is a bit more direct about the purpose.)

I have not looked in detail, but I believe that the interactions between the “pure” keyword and the object-oriented programming introduced in Fortran 2003 will have been fairly well-thought-out as far as combining objects and pure functions.