I've been planning to write a few posts about some programming stuff that interests me. I've spent a good part of my career working on systems that need to support concurrent computation. I even did my PhD working on a system to allow a particular style of parallel programming. It's a really hard problem - concurrency creates a host of complex issues in how systems behave, and the way that you express concurrency in a programming language has a huge impact on how hard it is to read, write, debug, and reason about systems.

So, like I've said, I've spent a lot of time thinking about these issues, and looking at various different proposed solutions, as well as proposing a couple of my own. But I really don't know of any good writeup describing the basics of the most successful approaches for beginners. So I thought I could write one.

But that's not really today's post. Todays post is my version of a train-wreck. Long-time readers of the blog know that I'm fascinated with bizarre programming languages. So today, I'm going to show you a twisted, annoying, and thoroughly pointless language that I created. It's based on one of the most successful models of concurrent programming, called Actors, which was originally proposed by Professor Gul Agha of UIUC. There've been some really nice languages built using ideas from Actors, but this is not one of them.

The language is called "Cruise". Why? Because it's a really bad Actor language. And what name comes to mind when I think of really, really bad actors with delusions of adequacy? Tom Cruise.

You can grab my implementation from github. Just so you know, the code sucks. It's something I threw together in my spare time a couple of years ago, and haven't really looked at since. So it's sloppy, overcomplicated, probably buggy, and slow as a snail on tranquilizers.

Quick overview of the actor model

Actors are a theoretical model of computation, which is designed to describe completely asynchronous parallel computation. Doing things totally asynchronously is very strange, and very counter-intuitive. But the fact of the matter is, in real distributed systems, everything *is* fundamentally asynchronous, so being able to describe distributed systems in terms of a simple, analyzable model is a good thing.

According to the actor model, a computation is described by a collection of things called, what else, actors. An actor has a mailbox, and a behavior. The mailbox is a uniquely named place where messages sent to an actor can be queued; the behavior is a definition of how the actor is going to process a message from its mailbox. The behavior gets to look at the message, and based on its contents, it can do three kinds of things:

Create other actors.

Send messages to other actors whose mailbox it knows.

Specify a new behavior for the actor to use to process its next message.

You can do pretty much anything you need to do in computations with that basic mechanism. The catch is, as I said, it's all asynchronous. So, for example, if you want to write an actor that adds two numbers, you can't do it by what you'd normally think of as a function call. In a lot of ways, it looks like a method call in something like Smalltalk: one actor (object) sends a message to another actor, and in response, the receiver takes some action specified by them message.

But subroutines and methods are synchronous, and nothing in actors is synchronous. In an object-oriented language, when you send a message, you stop and wait until the receiver of the message is done with it. In Actors, it doesn't work that way: you send a message, and it's sent; that's it, it's over and done with. You don't wait for anything; you're done. If you want a reply, you need to send the the other actor a reference to your mailbox, and make sure that your behavior knows what to do when the reply comes in.

It ends up looking something like the continuation passing form of a functional programming language: to do a subroutine-like operation, you need to pass an extra parameter to the subroutine invocation; that extra parameter is the *intended receiver* of the result.

You'll see some examples of this when we get to some code.

Tuples - A Really Ugly Way of Handling Data

This subtitle is a bit over-the-top. I actually think that my tuple notion is pretty cool. It's loosely based on how you do data-types in Prolog. But the way that it's implemented in Cruise is absolutely awful.

Cruise has a strange data model. The idea behind it is to make it easy to build actor behaviors around the idea of pattern matching. The easiest/stupidest way of doing this is to make all data consist of tagged tuples. A tagged tuple consists of a tag name (an identifier starting with an uppercase letter), and a list of values enclosed in the tuple. The values inside of a tuple can be either other tuples, or actor names (identifiers starting with lower-case letters).

So, for example, Foo(Bar(), adder) is a tuple. The tag is "Foo". It's contents are another tuple, "Bar()", and an actor name, "adder".

Since tuples and actors are the only things that exist, we need to construct all other types of values from some combination of tuples and actors. To do math, we can use tuples to build up Peano numbers. The tuple "Z()" is zero; "I(n)" is the number n+1. So, for example, 3 is "I(I(I(Z())))".

The only way to decompose tuples is through pattern matching in messages. In an actor behavior. message handlers specify a *tuple pattern*, which is a tuple where some positions may be filled by{em unbound} variables. When a tuple is matched against a pattern, the variables in the pattern are bound to the values of the corresponding elements of the tuple.

A few examples:

matching I(I(I(Z()))) with I($x) will succeed with $x bound to I(I(Z)).

matching Cons(X(),Cons(Y(),Cons(Z,Nil()))) with Cons($x,$y) will succeed with $x bound to X(), and $y bound to Cons(Y(),Cons(Z(),Nil())).

matching Cons(X(),Cons(Y(),Cons(Z(),Nil()))) with Cons($x, Cons(Y(), Cons($y, Nil()))) will succeed with $x bound to X(), and $y bound to Z().

Code Examples!

Instead of my rambling on even more, let's take a look at some Cruise programs. We'll start off with Hello World, sort of.

This declares an actor type "!Hello"; it's got one behavior with no parameters. It only knows how to handle one message, "Go()". When it receives go, it sends a hello world tuple to the actor named "out", which is a built-in that just prints whatever is sent to it.

Let's be a bit more interesting, and try something using integers. Here's some code to do a greater than comparison:

This is typical of how you do "control flow" in Cruise: you set up different actors for each branch, and pass those actors names to the test; one of them will receive a message to continue the execution.

What about multiple behaviors? Here's a trivial example of a flip-flop:

Pretty straightforward - the only interesting thing about it is the way that it sends the result of invoking add to a continuation actor.

Now, let's use an addition actor to implement a multiplier actor. This shows off some interesting techniques, like carrying auxiliary values that will be needed by the continuation. It also shows you that I cheated, and added integers to the parser; they're translated into the peano-tuples by the parser.

In my oh-so-abundant free time, I've been working on my own little text editor. And one of my motivations is TECO: one of the oldest, and one of the very best, ever written. It's both a text editor and a programming language - and, in fact, that's exactly what made it such a brilliant tool. So much of the drudgery of programming is stuff that really could be done by a program. But we've spent so much time learning to be fancy that we've lost track of that. Nowadays, you can write an emacs lisp program to do the stuff you used to do in TECO; only it's awkward enough that you usually don't.

The problem, though, with just re-implementing TECO with a modern UI is that it was designed in a different time. When TECO was written, every byte was critical. And so the language, the syntax, the structure, it was completely ridiculous. And, as a result, it became the world's most useful pathological programming language. It's a glorious, hideous, wonderful, horrific piece of computing history

TECO is one of the most influential pieces of software ever written. If, by chance, you've ever heard of a little editor called "emacs"; well, that was originally a set of editor macros for TECO (EMACS = Editor MACroS). As a language, it's both wonderful and awful. On the good side, The central concept of the language is wonderful: it's a powerful language for processing text, which works by basically repeatedly finding text that matches some kind of pattern, taking some kind of action when it finds it, and then selecting the next pattern to look for. That's a very natural, easy to understand way of writing programs to do text processing. On the bad side, it's got the most god-awful hideous syntax ever imagined.