Array-oriented Thinking

It's no secret that I really like APL. Perhaps less obvious, however, is why I think it's such a fun language to work with, and of particular relevance today. Hardware today imposes a great deal of burdens on the compiler writer and, in turn, on the programmer. While we still enjoy relatively good performance on serial machines, modern hardware clearly wishes we all programmed in parallel. Organizations like CREST at IU and similar groups elsewhere recognize that exascale computing requires a whole new set of paradigms for programming at the user level. However, even before we reach exascale computing, simple multi-core, SMP, and GPU programming today present what I consider too great a programming barrier to the average developer.

This is more than just a library problem. This is a problem with the way we are taught to program. The way programmers are taught to think, and the way that our programming languages encourage us to think, handicaps us when attempting to scale our programs to parallel, distributed, or vector systems. Not matter how sophisticated our compilers, algorithmic thinking that targets, from conception, parallelism will continue to perform better for the foreseeable future.

I would also suggest another issue. Beyond performance, no clear evidence actually suggests traditional programming approaches actually make the most human ergonomic sense for the communication of ideas to other humans. Programming should first and foremost presents itself as a human to human communication problem, and only after that take into consideration the machine characteristics upon which the program executes.

However, there is a clear barrier to the "array oriented" thinking pattern in normal computer scientists. It's not that they can't think in those terms, but it's more that doing so costs them a great deal more than thinking in terms of structural recursion or message passing (OOP) or even basic imperative mutation. However, by limiting themselves in this fashion, the average computer scientist is missing precisely the agility to think in terms that make programming for modern hardware easier. Given the amount of effort that schools go through to teach any form of traditional computation thinking, I'm not convinced that this struggle has anything to do with innate thought patterns of humans.

So, in my mind, living completely inside a totally different world, where bulk, data flow oriented, array-oriented thinking serves as the core, foundational computational pattern, and only later are things like recursion and branching considered if necessary, makes for an interesting area of study. It creates a sort of push-back against the rest of the computational paradigms. The important point here is embracing a new paradigm. By embracing it, rather than just dabbling in it, I get to experience a great deal more of what the actual limitations of the paradigm are. I get a much better idea of what is and isn't possible, and just how easy it is or is not.

I've been doing this for a while, and I hope to publish some of my findings soon regarding some of these experiences. However, here are a few more casual observations I can make about living inside of an array-oriented world.

Array oriented thinking isn't innately harder. I don't know how much evidence I have to support this claim, but from my own experience, I find that there doesn't seem to be any inherent difficulty in the paradigm's use. It's very different though, and this causes some growing pains as you learn to think in the approach.

Array programming makes you appreciate concision. I used to program in scheme and enjoy how clear and concise I could make programs. After programming in arrays, I've come to realize that there's a whole extra dimension of concision to be had. I think the biggest difference comes in where you encode information. This is a critical point. In systems like Scheme, you often encode a great deal of information in the control flow of the program. In Array programming, you encode information in the data and computation, and often very little in the actual control flow of the program. You express your ideas in terms of computation over data, rather than in execution paths over code trees. This leads to very very concise programs.

People criticize this sort of concise coding as unreadable. Well, yeah, if you are not practiced in reading that sort of code, then it's going to be hard. I think this is simply a truism, and equally true of all the programming methods. People should get over the idea that their favorite programming paradigm somehow makes "more sense" than others. It doesn't. It may have some attractive properties upon which you can hang your hat, but that doesn't make it easy. In the end, I think the question of paradigms really rests on where you hide details, and where you expose them, and how well that enables you to reason about your code. I think people really digging into array programming at times because it seems like that sort of thinking either isn't general enough to be useful in a wide array of domains, or they can't see the benefits that the paradigm gives them.

The concision of array programming gives you the ability to see patterns that are otherwise harder to see. you're able to expose the computation and perform symbolic manipulation of that code in a way that you just can't as easily do in other systems. That comes at the cost of code that doesn't directly reveal the structure or patterns of your data, especially if you are working with non-rectangular data structures (such as trees). I'm not sure whether this is a handicap or not.

You start to think in terms of properties and patterns over data. A really interesting emerging element I notice in my own programming is a tendency for me to begin examining programming from a data and properties perspective, rather than a structure and traversal or operation sort of view. That is, properties and equivalencies become very important in array programming. Your key insights while programming in a normal programming language often come in the form of program structure. Often your data encoding is fairly direct (until it isn't) and the question then becomes what the clearest, nicest, fastest method is to operate over that data to get what you want.

In array thinking, I spend a lot more time thinking about how the data should be encoded, and especially, in the evolution of data from one form to another. Essentially, you're manipulating data to have certain properties that you want to hold, which allows an elegant expression of the solution you care about. While this is something people do in normal programming, it just feels different. When I've described my process to other people, they've often remarked that it feels more "mathy" and that it's much more "big picture." I think they're trying to get at the idea that I'm often thinking about encodings of data and on high-level global invariants about that data which I can leverage to make global assertions or transformations to get what I want. This leads to very different looking code, and a different way of looking at problems. I think we could all benefit from spending a little more time in this way of thinking.

Changing the way you communicate with people. One of the more interesting things you get from the combination of concision and very little control flow is that you can whip up a one liner that you can give to a colleague to work with. This artifact becomes a sort of shared knowledge space where we can spend time examining and manipulating it to express different ideas. If you go through the APL newsgroup you'll see a lot of examples of this sort of code sharing. These short array programs are easy to remember, and easy to think about. While the underlying ideas might be quite clever or subtle, just working with the code is easy. It often might take some time to understand why something works, but it's usually immediately clear how what someone is saying maps to what they are writing in their code. This is really interesting. Because of the lack of control flow, reading and remembering APL programs, for me at least, is a lot easier than remember Scheme programs. This ends up forming a sort of idiom list in your head (and often outside in the form of a dictionary) that turns into a new vocabulary.

If we compare working with Scheme to working with APL, it's a lot easier to discuss and work through an idea with a colleague in APL than it is with Scheme. When I'm programming in Scheme or some other language to someone, I don't sit there and write out whole, complete versions of what I'm thinking of, because it's just too hard. Instead, we talk in terms of diagrams or pictures. With APL, I'm able to go up to another APLer and immediately start putting things on the board and work with them directly on that code. Usually the sticking point there becomes understanding the data encoding. And let's not underplay that. In APL, sometimes the data encoding is the really clever thing that requires some time to tease out. But you can always rely on teasing it out through small elucidatory APL snippets that bring out one or more properties of the APL data.

Conclusion

In the end, array oriented thinking may or may not give us a great way of moving forward on program scalability. In the short term however, I'm convinced that it can provide a great set of opportunities for exploring new elucidations of programming and algorithms that will allow us to examine and approach our own code with new perspectives and new tools. No matter what programming language you use, this seems like a good thing to me. One of the things I'm hoping to do with my APL a day series and other efforts like it is to bring out this sort of thinking in a way that makes sense for the medium.

I'm actually working on one such thing right now. We'll see where it goes, but I think it's pretty cool. :-)