
12 February 2012

Why Concatenative Programming Matters

Introduction

There doesn’t
seem to be a good tutorial out there for concatenative programming, so I
figured I’d write one, inspired by the classic “Why Functional
Programming Matters” by John Hughes. With any luck it will get more people interested in the topic, and give me a URL to hand people when they ask what the heck I’m so excited about.

Foremost, there seems to be some disagreement over what the term “concatenative” actually means. This Stack Overflow answer by Norman Ramsey even goes so far as to say:

“…it
is not useful to use the word ‘concatenative’ to describe programming
languages. This area appears to be a private playground for Manfred von Thun.
There is no real definition of what constitutes a concatenative
language, and there is no mature theory underlying the idea of a
concatenative language.”

This is rather harsh, and,
well, wrong. Not entirely wrong, mind, just rather wrong. But it’s not
surprising that this kind of misinformation gets around, because
concatenative programming isn’t all that well known. (I aim to change
that.)

Concatenative programming is so called because it uses function composition instead of function application—a non-concatenative language is thus called applicative.
That’s the defining difference, and it’s as “real” a definition as I
need, because, well, it’s the one that people are using. Bang.
Descriptivism.

One of the problems with functional
programming is that the oft-touted advantages—immutability, referential
transparency, mathematical purity, &c.—don’t immediately seem
to apply in the real world. The reason “Why Functional Programming
Matters” was necessary in the first place was that functional
programming had been mischaracterised as a paradigm of negatives—no
mutation, no side-effects—so everybody knew what you couldn’t do, but few people grasped what you could.
There is a similar problem with concatenative programming. Just look at the Wikipedia introduction:

A
concatenative programming language is a point-free programming language
in which all expressions denote functions and the juxtaposition of
expressions denotes function composition. The combination of a
compositional semantics with a syntax that mirrors such a semantics
makes concatenative languages highly amenable to algebraic manipulation.

This is all true—and all irrelevant to your immediate problem of why you should care. So in the next sections, I will show you:

How concatenative programming works

How typing in a concatenative language works

How concatenative languages are efficient

What concatenative languages are not good for

Where to go for more information

Now let’s get started!

♦ ♦ ♦

The Basics

In an applicative language, functions are applied to values to get other values. λ-calculus, the basis of functional languages, formalises application as “β-reduction”, which just says that if you have a function (f x = x + 1) then you can substitute a call to that function (f y) with its result (y + 1). In λ-calculus, simple juxtaposition (f x) denotes application, but function composition must be handled with an explicit composition function:

compose := λf. λg. λx. f (g x)

This definition says “the composition (compose) of two functions (f and g) is the result of applying one (f) to the result of the other (g x)”, which is pretty much a literal definition. Note that this function can only be used to compose functions of a single argument—more on this later.
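The λ-calculus definition above transcribes almost literally into Python (a quick sketch of mine, not from the original article):

```python
# compose := λf. λg. λx. f (g x)
# The composition of f and g applies g first, then f.
compose = lambda f: lambda g: lambda x: f(g(x))

inc = lambda x: x + 1      # f x = x + 1
double = lambda x: x * 2

# (compose inc double) 5  =  inc (double 5)  =  11
print(compose(inc)(double)(5))  # 11
```

Note that it really does only compose functions of a single argument; handling other arities needs a different combinator each time.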

In concatenative languages, composition is implicit: “fg” is the composition of f and g. However, this does not mean that function application becomes explicit—it actually becomes unnecessary. And as it turns out, this peculiar fact makes these languages a whole lot simpler to build, use, and reason about.

The
untyped λ-calculus is a relatively simple system. However, it and the
many systems derived from it still need three kinds of term—variables,
lambdas, and applications—as well as a number of rules about how to
correctly replace variable names when applying functions. You have to
deal with name binding, closures, and scope. For a supposedly low-level
language, it has quite a bit of inherent complexity.

Concatenative languages have a much simpler basis—there are only functions and compositions, and evaluation is just the simplification
of functions. It is never necessary to deal with named state—there are
no variables. In a sense, concatenative languages are “more functional”
than traditional functional languages! And yet, as we will see, they are
also easy to implement efficiently.

♦ ♦ ♦

Composition

Let’s say we want to multiply a couple of numbers. In a typical concatenative language, this looks like:

2 3 ×

There are two weird things about this.
First,
we use postfix notation (2 3 ×) rather than the prefix notation (× 2 3)
that is common in the functional world, or the infix notation (2 × 3)
that most languages provide for convenience. There is nothing stopping a concatenative language from having infix operators, but for the sake of consistency, most stick to postfix notation: “fg” means (g ∘ f), i.e., the reverse of function composition. This is actually rather nice notation, because it means that data flows in the order the functions are written in.

Second,
we said all terms denote functions—so what the heck are 2 and 3 doing
in there? They sure look like values to me! But if you tilt your head a
little, you can see them as functions too: values take no arguments and return themselves. If we were to write down the inputs and outputs of these functions, it would look like this:

2 :: () → (int)
3 :: () → (int)

As you may guess, “x :: T” means “x is of type T”, and “T1 → T2” means “a function from type T1 to type T2”.
So these functions take no input, and return one integer. We also know
the type of the multiplication function, which takes two integers and
returns just one:

× :: (int, int) → (int)

Now how do you compose all of these functions together? Remember we said “fg” means the (reverse) composition of f and g,
but how can we compose 2 with 3 when their inputs and outputs don’t
match up? You can’t pass an integer to a function that takes no
arguments.

The solution lies in something called stack polymorphism. Basically, we can give a generic, polymorphic type to these functions that says they’ll take any input, followed by what they actually need. They return the arguments they don’t use, followed by an actual return value:

2 :: ∀A. (A) → (A, int)
3 :: ∀A. (A) → (A, int)
× :: ∀A. (A, int, int) → (A, int)

“∀A.” means “For all A”—in these examples, even if A has commas in it. So now the meaning of the expression “2 3” is clear: it is a function that takes no input and returns both
2 and 3. This works because when we compose two functions, we match up
the output of one with the input of the other, so we start with the
following definitions:
2 and 3. This works because when we compose two functions, we match up
the output of one with the input of the other, so we start with the
following definitions:

2 :: ∀A. (A) → (A, int)
3 :: ∀B. (B) → (B, int)

We match up the types:

(A, int) = (B)

By substituting, we get a new polymorphic type for 3 within the expression:

3 :: ∀A. (A, int) → (A, int, int)

Composing with × in the same way, the whole expression has the type:

2 3 × :: ∀A. (A) → (A, int)

This is correct: the expression “2 3 ×” takes no input and produces one
integer. Whew! As a sanity check, note also that the equivalent function
“6” has the same type as “2 3 ×”:

6 :: ∀A. (A) → (A, int)

So already, concatenative languages give us something applicative functional languages generally can’t: we can actually return multiple values from a function, not just tuples.
And thanks to stack polymorphism, we have a uniform way to compose
functions of different types, so the flow of data in our programs
doesn’t get obscured, as it were, by the plumbing.

♦ ♦ ♦

Cool Stuff

In
the above example, we worked from left to right, but because
composition is associative, you can actually do it in any order. In math
terms, (f ∘ g) ∘ h = f ∘ (g ∘ h).
Just as “2 3 ×” contains “2 3”, a function returning two integers, it
also contains “3 ×”, a function that returns thrice its argument:

3 × :: (int) → (int)

(From now on I’ll omit the ∀ bit from type signatures to make them easier to read.)

So
we can already trivially represent partial function application. But
this is actually a huge win in another way. Applicative languages need
to have a defined associativity for function application (almost always
from left to right), but here we’re free from this restriction. A
compiler for a statically typed concatenative language could literally:

Divide the program into arbitrary segments

Compile every segment in parallel

Compose all the segments at the end

This is impossible to do with any other type of language. With concatenative programming, a parallel compiler is a plain old
map-reduce!

Because all we have are functions and composition, a concatenative program is a single function—typically
an impure one with side effects, but that’s by no means a requirement.
(You can conceive of a pure, lazy concatenative language with
side-effect management along the lines of Haskell.) Because a program is
just a function, you can think about composing programs in the same
way.

This is the basic reason Unix pipes are so powerful: they form a rudimentary string-based concatenative programming language. You can send the output of one program to another (|); send, receive, and redirect multiple I/O streams (n<, 2>&1);
and more. At the end of the day, a concatenative program is all about the flow of data from start to finish. And that again is why concatenative composition is written in “reverse” order—because it’s actually forward.

So
far, I have deliberately stuck to high-level terms that pertain to all
concatenative languages, without any details about how they’re actually
implemented. One of the very cool things about concatenative languages
is that while they are inherently quite functional, they also have a
very straightforward and efficient imperative implementation. In fact,
concatenative languages are the basis of many things you use every day:

The Java Virtual Machine on your PC and mobile phone

The CPython bytecode interpreter that powers BitTorrent, Dropbox, and YouTube

The PostScript page description language that runs many of the world’s printers

The Forth language that started it all, which still enjoys popularity on embedded systems

The type of a concatenative function is formulated so that it
takes any number of inputs, uses only the topmost of these, and returns
the unused input followed by actual output. These functions are
essentially operating on a list-like data structure, one that allows
removal and insertion only at one end. And any programmer worth his salt
can tell you what that structure is called.

Moving from left to right in the
expression, whenever we encounter a “value” (remember: a nullary
self-returning function), we push its result to the stack. Whenever we
encounter an “operator” (a non-nullary function), we pop its arguments,
perform the computation, and push its result. Another name for postfix is reverse Polish notation, which achieved great success in the calculator market—on every HP calculator sold between 1968 and 1977, and many thereafter.

So a concatenative language is a functional language that is not only easy, but trivial to run efficiently, so much so that most language VMs are essentially concatenative.
x86 relies heavily on a stack for local state, so even C programs have a
little bit of concatenativity in ’em, even though x86 machines are
register-based.

Furthermore, it’s straightforward to
make some very clever optimisations, which are ultimately based on
simple pattern matching and replacement. The Factor
compiler uses these principles to produce very efficient code. The JVM
and CPython VMs, being stack-based, are also in the business of
executing and optimising concatenative languages, so the paradigm is far
from unresearched. In fact, a sizable portion of all the compiler
optimisation research that has ever been done has involved virtual stack
machines.

♦ ♦ ♦

Point-free Expressions

It is considered good functional programming style to write functions in point-free form, omitting the unnecessary mention of variables (points) on which the function operates. For example, “f x y = x + y” can be written as “f = (+)”. It is clearer and less repetitious to say “f is the addition function” than “f returns the sum of its two arguments”.

More importantly, it’s more meaningful to write what a function is versus what it does,
and point-free functions are more succinct than so-called “pointful”
ones. For all these reasons, point-free style is generally considered a
Good Thing™.

However, if functional programmers really
believe that point-free style is ideal, they shouldn’t be using
applicative languages! Let’s say you want to write a function that tells
you the number of elements in a list that satisfy a predicate. In
Haskell, for instance:

countWhere :: (a -> Bool) -> [a] -> Int
countWhere predicate list = length (filter predicate list)

It’s pretty simple, even if you’re not so familiar with Haskell. countWhere returns the length of the list you get when you filter out elements of a list that don’t satisfy a predicate. Now we can use it like so:

countWhere (>2) [1, 2, 3, 4, 5] == 3

We can write this a couple of ways in point-free style, omitting predicate and list:

countWhere = (length .) . filter
countWhere = ((.) (.) (.)) length filter

But the meaning of the weird repeated self-application of the composition operator (.) isn’t necessarily obvious. The expression (.) (.) (.)—equivalently (.) . (.) using infix syntax—represents a function that composes a unary function (length) with a binary function (filter). This type of composition is occasionally written .:, with the type you might expect:

(.:) :: (c -> d) -> (a -> b -> c) -> a -> b -> d
countWhere = length .: filter

But what are we really
doing here? In an applicative language, we have to jump through some
hoops in order to get the basic concatenative operators we want, and get
them to typecheck. When implementing composition in terms of
application, we must explicitly implement every type of composition.

In
particular, there is no uniform way to compose functions of different
numbers of arguments or results. To even get close to that in Haskell,
you have to use the curry and uncurry
functions to explicitly wrap things up into tuples. No matter what, you
need different combinators to compose functions of different types,
because Haskell doesn’t have stack polymorphism for function arguments,
and it inherently can’t.

Writing point-free
expressions demands concatenative semantics, which just aren’t natural
in an applicative language. Now contrast a concatenative example:

define countWhere [filter length]
(1, 2, 3, 4, 5) [2 >] countWhere

It’s
almost painfully literal: “to count the number of elements in a list
that match a predicate is to filter it and take the length”. In a
concatenative language, there is no need at all for variables, because
all you’re doing is building a machine for data to flow through:

When you’re building a diagram like this, just follow a few simple rules:

Each block is one function

A block takes up as many columns as needed by its inputs or outputs, whichever are more

When adding a block, put it in the rightmost column possible:

If it takes no inputs, add a column

If there aren’t enough arrows to match the block, the program is ill-typed
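To make the dataflow concrete, the countWhere example can be simulated with a Python list standing in for the stack (a sketch of my own, not the article's notation):

```python
# define countWhere [filter length], with a Python list as the stack.
def count_where(stack):
    pred = stack.pop()        # the quotation, e.g. [2 >]
    lst = stack.pop()         # the list it filters
    stack.append(len([x for x in lst if pred(x)]))
    return stack

# (1, 2, 3, 4, 5) [2 >] countWhere
print(count_where([[1, 2, 3, 4, 5], lambda x: x > 2]))  # [3]
```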

♦ ♦ ♦

Quotations

Notice
the use of brackets for the predicate [2 >] in the preceding
example? In addition to composition, the feature that completes a
concatenative language is quotation, which allows deferred
composition of functions. For example, “2 >” is a function that
returns whether its argument is greater than 2, and [2 >] is a
function that returns “2 >”.

It’s at this point that we go meta. While just composition lets us build descriptions of dataflow machines, quotation lets us build machines that operate on descriptions of other machines. Quotation eliminates the distinction between code and data, in a simple, type-safe manner.

The “filter” machine mentioned earlier takes in the blueprints
for a machine that accepts list values and returns Booleans, and
filters a list according to the instructions in those blueprints. Here’s
the type signature for it:

filter :: (list T, T → bool) → (list T)

There
are all kinds of things you can do with this. You can write a function
that applies a quotation to some arguments, without knowing what those
arguments are:

apply :: ∀AB. (A, A → B) → (B)

You can write a function to compose two quotations into a new one:

compose :: ∀ABC. (A → B, B → C) → (A → C)

And you can write one to convert a function to a quotation that returns it:

quote :: (T) → (() → T)
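All three signatures can be modelled in Python by representing a quotation as a function from stack to stack (an encoding of my own, for illustration only):

```python
# A quotation is modelled as a function from stack to stack.

def apply_(stack):                   # apply :: ∀A B. (A, A → B) → (B)
    quot = stack.pop()               # pop the quotation...
    return quot(stack)               # ...and run it on the rest

def compose_(stack):                 # compose :: ∀A B C. (A → B, B → C) → (A → C)
    g = stack.pop()
    f = stack.pop()
    stack.append(lambda s: g(f(s)))  # new quotation: f, then g
    return stack

def quote_(stack):                   # quote :: (T) → (() → T)
    x = stack.pop()
    def pushed(s):
        s.append(x)
        return s
    stack.append(pushed)             # a quotation that returns x
    return stack

def gt2(s):                          # the quotation [2 >]
    s.append(s.pop() > 2)
    return s

print(apply_([5, gt2]))              # [True]
```

Composing quote_([7]) with gt2 and applying the result, for instance, pushes 7 and then tests it, leaving True on the stack.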

♦ ♦ ♦

The Dark Side

Say you want to convert a math expression into concatenative form:

f(x, y, z) = y² + x² − |y|

This
has a bit of everything: it mentions one of its inputs multiple times,
the order of the variables in the expression doesn’t match the order of
the inputs, and one of the inputs is ignored. So we need a function that
gives us an extra copy of a value:

dup :: (T) → (T, T)

A function that lets us reorder our arguments:

swap :: (T1, T2) → (T2, T1)

And a function that lets us ignore an argument:

drop :: (T) → ()
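These three shufflers are trivial to model with a Python list standing in for the stack (a sketch, not from the article):

```python
def dup(s):        # dup :: (T) → (T, T)
    s.append(s[-1])
    return s

def swap(s):       # swap :: (T1, T2) → (T2, T1)
    s[-1], s[-2] = s[-2], s[-1]
    return s

def drop(s):       # drop :: (T) → ()
    s.pop()
    return s

print(dup([3]))     # [3, 3]
print(swap([1, 2])) # [2, 1]
print(drop([1, 2])) # [1]
```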

From the basic functions we’ve defined so far, we can make some other useful functions, such as one that joins two values into a quotation (quote each value, then compose the two quotations):

join2 :: (T1, T2) → (() → (T1, T2))
define join2 [quote swap quote swap compose]

And hey, thanks to quotations, it’s also easy to declare your own control structures:

define true [[drop apply]] # Apply the first of two arguments.
define false [[swap drop apply]] # Apply the second of two arguments.
define if [apply] # Apply a Boolean to two quotations.

And using them is like—actually, just is—using a regular function:

["2 is still less than 3."] # "Then" branch.
["Oops, we must be in space."] # "Else" branch.
2 3 < # Condition.
if print # Print the resulting string.

Those particular definitions for true and false will be familiar to anyone who’s used Booleans in the λ-calculus. A Boolean is a quotation, so it behaves like an ordinary value, but it contains a binary function that chooses one branch and discards another. “If-then-else” is merely the application of that quotation to the particular branches.
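Those definitions can be checked with the same list-as-stack encoding (a Python sketch of my own; quotations are functions from stack to stack):

```python
def drop(s):
    s.pop()
    return s

def swap(s):
    s[-1], s[-2] = s[-2], s[-1]
    return s

def apply_(s):
    q = s.pop()
    return q(s)

# define true  [[drop apply]]       — apply the first of two quotations
# define false [[swap drop apply]]  — apply the second
true_  = lambda s: apply_(drop(s))
false_ = lambda s: apply_(drop(swap(s)))

def if_(s):                          # define if [apply]
    return apply_(s)

def then_br(s):                      # the "then" quotation
    s.append("2 is still less than 3.")
    return s

def else_br(s):                      # the "else" quotation
    s.append("Oops, we must be in space.")
    return s

# ["then"] ["else"] 2 3 < if
stack = [then_br, else_br, true_ if 2 < 3 else false_]
print(if_(stack))  # ['2 is still less than 3.']
```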

Anyway, getting back to the math, we already know the type of our function ((int, int, int) → (int)); we just need to deduce how to get there. If we build a diagram of how the data flows through the expression, we end up with a tangle of duplicating, swapping, and dropping.

You’ve
just seen one of the major problems with concatenative programming—hey,
every kind of language has its strengths and weaknesses, but most
language designers will lie to you about the latter. Writing seemingly simple math expressions can be difficult and unintuitive,
especially using just the functions we’ve seen so far. To do so exposes
all of the underlying complexity of the expression that we’re
accustomed to deferring to a compiler.

Factor gets around this by introducing a facility for lexically scoped local variables. Some things are simply more natural to write with named state rather than a bunch of stack-shuffling. However, the vast majority of programs are not predominated by mathematical expressions, so in practice this feature is not used very much.

One of the great strengths of concatenative
languages, however, is their ability to refactor complex expressions.
Because every sequence of terms is just a function, you can directly
pull out commonly used code into its own function, without even needing
to rewrite anything. There are generally no variables to rename, nor
state to manage.

So in practice, a lot of expressions
can be refactored using a wide array of handy functions, of the sort
that commonly end up in a standard library. With some refactoring, that
math expression might look like this:

square = dup ×
f = drop [square] [abs] bi − [square] dip +

Which
doesn’t look so bad, and actually reads pretty well: the difference
between the square and absolute value of the second argument, plus the
square of the first. But even that description shows that our
mathematical language has evolved as inherently applicative. It’s better
sometimes just to stick to tradition.

♦ ♦ ♦

Whither Hence

So
you’ve got the gist, and it only took a few dozen mentions of the word
“concatenative” to get there. I hope you’re not suffering from semantic
satiation.

You’ve seen that concatenative programming is a paradigm like any other, with a real definition and its own pros and cons:

Concatenative languages are simple and consistent (“everything is a function”)

They are amenable to dataflow programming

Stack languages are well studied and have good performance

You can easily roll your own control structures

Everything is written in point-free style (for better or worse)

If you’re interested in trying out a mature, practical concatenative language, check out Factor and the official blog of the creator, Slava Pestov. Also see Cat for more information on static typing in concatenative languages.

I’ve been idly working on a little concatenative language called Kitten
off and on for a while now. It’s dynamically typed and compiles to C,
so you can run it just about anywhere. I wanted a language I could use
for a site on a shared host where installing compilers was irritating.
That shows you the extent of my language geekery—I’d rather spend hours
writing a language than twenty minutes figuring out how to install GHC
on Bluehost.

Anyway, the implementation is just barely
complete enough to play around with. Feel free to browse the source,
try it out, and offer feedback. You’ll need GHC and GCC to build it, and
I imagine it works best on Linux or Unix, but there shouldn’t be any particularly horrible incompatibilities.

This
would also be a good time to mention that I’m working on a more serious
language called Magnet, which I mentioned in my last article about how programming is borked.
It’s principally concatenative, but also has applicative syntax for
convenience, and relies heavily on the twin magics of pattern matching
and generalised algebraic data types. Hell, half the reason I wrote this
article was to provide background for Magnet. So expect more articles
about that.

Edit (20 April 2013) The above information is no longer accurate. Kitten is currently being rewritten; the work that was done on Magnet has been incorporated. Kitten is now statically typed and, until the compiler is complete, interpreted.

And that about does it. As always, feel free to email me at evincarofautumn@gmail.com to talk at length about anything. Happy coding!

Concatenative languages are an elaboration on and simplification of FP systems; some of the distinctions that FP systems maintain are arbitrary, such as between objects and functions. Also, row polymorphism makes the definition of a complete concatenative system much cleaner and smaller; you can get away with only two stack combinators (see http://tunes.org/~iepos/joy.html), and it’s possible to give a type to the Y combinator (∀A B. (A, (A, (A → B)) → B) → B).

Indeed, there is no particular reason—Factor allows something similar based on stack annotations, and Prog will do it with pattern matching. The “remainder of the expression” bit is a little unclear; you would obviously need to delimit the definition somehow (even if just with a newline) lest scopes run amok. I would prefer a syntax like “\ z y x { y square x square + y abs − }”, but the principle is the same. Manfred von Thun shows in “The Theory of Concatenative Combinators” (http://tunes.org/~iepos/joy.html) that you can use stack combinators to rewrite any expression with lambdas into an expression without them. But my attitude is like yours: “can” doesn’t necessarily mean “should”.

Many thanks for writing this! I'm finally starting to feel like being excited about concatenative languages is something I'm allowed to be proud of as a functional programmer. I'll have to take a long look at Prog's design; I've always wondered how pattern matching could be combined with concatenative syntax in a sensible and convenient way.

One major advantage which you didn't mention and which doesn't seem to have been explored a lot yet: a (statically typed) concatenative language can afford incredible IDE support. The type of the stack at the cursor's position is everything you need to know about your current context; just display it in a corner and the programmer has all the info they need. Better yet, autocomplete for words that can take the current stack as input and you get IntelliSense on steroids.

That sounds cool. I'd love to see a prototype. Maybe you could select an interval of code and it would graph the flow associated with that fragment. Sounds like something a capable person could draft in html5 quickly (or perhaps factor's IDE?). (I disagree that types are "all you need". (int,int,int) isn't informative enough about what is what, for example, if you've been rolling and duping and dipping and dropping lately). Actually on that count, the data flow graph could *be* the language with the right UI.

I am writing a compiler that targets CIL right now. Thinking of all this in terms of a stack really made it so much easier to follow. .NET is not completely stack based though as it uses local variables as well.

If you'd just called it a 'stack-based language' instead of concatenative it'd all have been massively clearer from the start.

Also, you list the Android phone as having a stack-based interpreter. In fact it's famously register-based, which is why it is so much faster. It's not a JVM, it's Dalvik. http://en.wikipedia.org/wiki/Dalvik_%28software%29#Architecture

I admit that the article started out a bit unclear in its purpose, but you have to admit he cleared up any confusion quite well. (And considering that he's starting with Wikipedia's definition, he's doing GREAT in comparison.)

You're right about Dalvik being register based, but I can't find anything suggesting that it was done for speed, unless it was actually done to give reasonable speed without a JIT (and whether that's true or not, I can find no hint that any research was done). Lua's switch to a register machine is always described in terms of a speed-centered choice, but it's very clear that they were comparing a modern optimized register machine versus an antique naive stack machine; not a fair comparison at all.

A JITted virtual stack machine should be faster on arbitrary CPUs than a register machine -- the only exception being when the virtual register machine is a very close match to the physical register machine (after the overhead of virtualization).

The reason I structured this how I did is that the paradigm is not bound to stacks. There are a lot of high-level concepts that apply to all concatenative languages whether they use a stack or not. I wanted to demonstrate those concepts first, before showing that a stack is just an efficient way of implementing them.

It’s like Smalltalk and JavaScript versus Simula and C++: even though the former pair has prototypes and messages while the latter has classes and methods, they’re both object-oriented. A concatenative language based on term rewriting or directed graphs would still be concatenative, so long as it were still based on composition and quotation.

One quick note on the HP calculator front: they were sold WAY later than 1977, and some of us still love our HP48s. :)

However, I do wonder about the usefulness of this style for longer programs - I wrote quite a few medium-sized things on the HP48, and the code was always *intensely* write-only since you had to have a mental model of exactly what was on the stack where to even hope to follow it. Not sure how much of that was due to the system discouraging the definition of many small functions, though (you could only edit one function at a time).

Whether it’s useful for larger programs I think just depends on the programmer and the program. Concatenative programming demands obsessive factoring. If you don’t have “many small functions”, you’re doing it wrong, and your work will be harder than it has to be.

It’s like writing assembly. You can think about everything at the instruction level, with a mental model (or excessive comments) about what’s where. But you’ll never get anything done that way, because you’re working at the wrong level of abstraction. If you’re shuffling the stack a lot, then you’re treating the symptom of a problem you’re not seeing because you need to step back.

Concatenative languages are metalanguages. (Indeed, something I neglected to talk about is how good they are for metaprogramming.) You use the language to create structures that are suited to your problem, so you can solve the problem naturally. That’s what you ought to do in any language, but concatenative programming makes it almost painfully direct. Some like it, others don’t.

You use the language to create structures that are suited to your problem, so you can solve the problem naturally. That’s what you ought to do in any language, but concatenative programming makes it almost painfully direct. Some like it, others don’t.

True, but not always. The name of a parameter is often immaterial or generic; most of the identifiers in programs I’ve read have been largely meaningless syntactic junk. Where names serve a real purpose is in math expressions, where you expect to do symbolic manipulation.

“With a concatenative language, you need to know (or document in an informal comment) what inputs and outputs a function has. And in which order…”

I've read the original John Hughes paper and this was an interesting read as well.

That said:"Concatenative programming is so called because it uses function composition instead of function application—a non-concatenative language is thus called applicative."

I must observe: at some point, you need to apply your composed functions to your arguments to get any results at all.

No, that "5 is a function!" talk is just a silly fallacy to make what is really just values on a stack, consumed by subsequent function calls, look overly sophisticated. I can say the same for any scheme expression: (+ 1 2 3 4 5) is not then the application of the arguments to the function +, but the composition of a new function out of + and constant "functions" denoted by numbers... I can then compose another function from this new function (15) and get yet another "function": (* 2 (+ 1 2 3 4 5))

"the expression “2 3 ×” takes no input and produces one integer" please...

"concatenative languages give us something applicative functional languages generally can’t: we can actually return multiple values from a function, not just tuples"

No, you can't. You can just fill a stack and let subsequent calls alter it.

In scheme you may return multiple values, but the context must be waiting for them. (values 1 2 3) actually returns 3 values instead of a tuple (list). In the top-level REPL, it returns the separate 3 values, but if you were to use them, you'd better make a context to consume them:

it's much simpler to just return lists, which are Lisp's forte anyway, just like stacks are for concatenative languages. So you just return a list and use the usual functional higher-order functions to process the result:

"I must observe: at some point, you need to apply your composed functions to your arguments to get any results at all."

Which textual functions compose with which textual arguments? The answer isn't as cut and dried as you imply -- remember, concatenation (and composition) is associative. In an applicative language the answer is VERY clear in the text.

"No, that "5 is a function!" talk is just a silly fallacy"

Ouch.

"to make it look overly-sophisticated what is really just values in a stack consumed by subsequent function calls."

You're assuming stack semantics. The author explained that textual evaluation is another possible semantics; there's no "really just values" there. I believe www.enchiladacode.nl is a purely rewriting semantics (although that may just be my bad memory). Apter built a few concatenative languages using things like a stack-dequeue pair. Most languages entirely WITHOUT a stack are mere toys, but I'm not sure it will always be so. Oh, and of course, all professional Forths aside from stuff for hardware stack machines perform register optimizations, which uses a stack approximately like a C program would.

You're also missing the point. In the _language_, that is in the mapping of text to semantics, there is no difference between a literal and a function. The semantics of a literal may be simple; but then again they may not. The point is that a literal like "3" is the same sort of language (text) object as a function; it's parsed the same way (even though obviously it's lexed differently).
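To ground this point, here is a minimal sketch (in Python, with assumed names like `lit` and `compose`) of the semantics under discussion: literals and words alike denote stack-to-stack functions, and a program is the left-to-right composition of its tokens.

```python
# Every token denotes a function from stacks to stacks; concatenation
# of tokens is composition of those functions.

def lit(n):
    """A literal like 3 denotes the function that pushes 3."""
    return lambda stack: stack + [n]

def mul(stack):
    """The word x pops two values and pushes their product."""
    *rest, a, b = stack
    return rest + [a * b]

def compose(*fs):
    """Compose token functions left to right."""
    def program(stack):
        for f in fs:
            stack = f(stack)
        return stack
    return program

# "2 3 x" is itself a stack -> stack function; applied to the empty
# stack it produces one integer, so in that sense it "takes no input".
program = compose(lit(2), lit(3), mul)
print(program([]))  # [6]
```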

For higher-order languages, and for real lower-level languages, not all symbols are so tidy. There has to be a symbol that means "do not execute the following symbols", which is NOT semantically similar to what a function does.

"I can say the same for any scheme expression: (+ 1 2 3 4 5) is not then the application of the arguments to the function +, but the composition of a new function out of + and constant "functions" denoted by numbers... I can then compose another function from this new function (15) and get yet another "function": (* 2 (+ 1 2 3 4 5))"

You're not performing composition; you're building a tree, not a string. It's not associative except between siblings on the tree (and that's not a concept that's directly apparent in the text; you have to convert it to a parsed data structure to see which things are siblings).

"please..."

Well... It's true. And what's more, because of the associative property, _every lexically correct substring_ has a valid type, and can be pulled out and made a function. (Of course, this is only true insofar as the language is purely concatenative. All currently used languages offer things like definitions and nested blocks, which are not associative. I made a language which doesn't have any of those and is therefore purely concatenative, but you wouldn't want to code in it; its name is "zeroone", and its two functions are named "zero" and "one". The bits, not the English numerals.)

"All that said, concatenative languages always sound to me like the exact opposite of Lisp as far as syntax is concerned. :)"

Sounds fair to me :-). And no parens (unless you want literal lists or quotations).

"No, that "5 is a function!" talk is just a silly fallacy to make it look overly-sophisticated"

Understand the difference between ($ 5) and (5) in Haskell. 5 in Concat languages is equivalent to ($ 5) in Haskell. It's not overly-sophisticated talk, it's actually useful and I've used it several times in Haskell.
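The same idea can be expressed outside Haskell; here is a rough Python analogue (hypothetical name `apply_to_5`): `5` viewed "as a function" is the function that feeds 5 to whatever comes next, which is what the Haskell section ($ 5) does.

```python
# In Haskell, ($ 5) is the section \f -> f 5. A rough Python analogue:
apply_to_5 = lambda f: f(5)

print(apply_to_5(lambda x: x + 1))  # 6
print(apply_to_5(lambda x: x * x))  # 25
```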

Thanks. Yeah, I’d say stacks are relevant, but not fundamental. Indeed, Prog will use a stack for storage, but all word simplification will be done by pattern matching and rewriting, so it’s not inherently a stack language.

Very clear and informative article. Thanks! I experimented with both Forth and Factor a couple of months ago, and found it quite interesting how the languages made me start to think about algorithms.

Forth also prompted me to try Rebol, which I totally enjoyed. I'm starting to see how ideas evolve and change from language to language, and I think knowing this is really valuable for any serious developer.

I believe the principal reason virtual machines have recently moved away from stack-machine representations is the cost of maintaining invariants over the stack representation. The overhead exceeded the benefits.

However, for peephole optimization, a stack-based representation might be ideal.

It is indeed easy to peephole optimize stack-based representation. In particular, naive register allocation is very easy; if your datum is near the top of the stack it's FAR more likely to be a good candidate for register allocation than a datum deeper down. In essence, the time the programmer spends trying to write code without too many "SWAP"s and other "stack noise" leads not only to clear code, but also to prioritized dataflow. (In practice it's always harder.)

Now, I don't understand what you mean by "maintaining invariants". What invariants are you talking about? Is there some context here that I'm missing?

I have a question about the use of the compose function in Join2. I was just working through examples by hand to make sure I understood the dynamics of the language, and, given "0 1 join2", I ended up at a step:

(() -> 0) (() -> 1) ((A->B, B->C) -> (A->C))

I don't understand how this evaluation goes to (() -> 0 1)

Did I evaluate up to the right step until then? And if so, how does the compose function pair up numbers?

There is a bit of a notational problem here. The type T—one value—is equivalent to the type () → T—a function from 0 values to 1 value—which in turn is equivalent to ∀A. A → (A, T)—a function from n values to n+1 values. So while the type I gave for “quote” is accurate, it is also somewhat misleading, which probably led you astray when you were simplifying “compose”.
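To make that simplification concrete, here is a sketch (assumed Python names) of the row-polymorphic reading: a literal is really ∀A. A → (A, T), a function that takes any stack and pushes one value, so composing “0” and “1” yields a function from the empty stack to the stack 0 1.

```python
def push(n):
    # a literal n, read row-polymorphically: forall A. A -> (A, n)
    return lambda stack: stack + (n,)

def compose(f, g):
    # compose : (A -> B, B -> C) -> (A -> C)
    return lambda stack: g(f(stack))

# "0 1 join2": composing () -> 0 with () -> 1 gives () -> 0 1
join2 = compose(push(0), push(1))
print(join2(()))  # (0, 1)
```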

Very nice post, Jon. It made me so curious that I implemented my own concatenative language, even before really grasping everything I probably should have known. But hey, it's good enough to calculate the factorial! ;-)

This is a great post! I was vaguely familiar with the formal notion of concatenative languages, and this is an extremely clear explanation. Thank you.

Also, tangentially: Leo Brodie's "Starting Forth" (http://home.iae.nl/users/mhx/sf.html) is IMO one of the best programming books ever - accessible to smart kids (I read it the first time in 6th grade, I think), but with enough detail that you can actually implement a Forth compiler after reading it.

I'm familiar with the term "point-free", which refers to the original style of functional programming developed by Backus. I'm not familiar with Dijkstra using a term "pointless", and it wasn't a concatenative language Backus developed; it was applicative.

Thanks. That does help -- it looks like "pointless" here doesn't mean the same as "point-free" does. Here's an abstract that mentions and briefly defines the term: http://www.mendeley.com/research/relation-algebras-and-their-application-in-temporal-and-spatial-reasoning/

It looks like in this sense "pointless" means "without referring to the objects being stored". There's a similarity, but it's a different area of discussion; and note that the "pointless" proofs use plenty of names.

While one of the anonymous commenters above is right that most printers are not PostScript printers, PostScript still forms the basis of PDF, so every PDF viewer is implicitly an interpreter of a concatenative programming language: http://en.wikipedia.org/wiki/PDF#Technical_foundations

I find the article interesting and provocative. However, there are a number of things I disagree with.

I don't like ‘concatenative programming […] uses function composition instead of function application’ as a definition. Dropping one operation while promoting another does not mean the latter is used ‘instead’ of the former; they mean very different things after all.

The statement ‘that function application becomes unnecessary […] makes these languages a whole lot simpler to build, use, and reason about’ is not true in general – the author himself gives an example of the opposite, later on in the article.

One interesting question is the relation between concatenativity, stacks, and postfix notation. The author seems to maintain that the stack is ‘not fundamental’ to concatenativity, but why, then, are all real, usable concatenative languages stack-based?

Also, if being postfix is likewise not fundamental and ‘there is nothing stopping a concatenative language from having infix [or prefix] operators’, then why are all concatenative languages postfix? Can concatenativity be preserved in the presence of non-postfix notation?

A commenter said that ‘semantics […] via row polymorphism […] makes the stack an implementation detail’, but I see it exactly the opposite: the way it is used, row polymorphism itself already seems to assume a stack organization (a row is a stack); by employing it to define composition, the inherent relation of concatenative computation to stacks is only accentuated.

In a comment following the article, the author says that ‘some of the distinctions that FP systems maintain are arbitrary, such as between objects and functions’. In fact, that distinction in FP is no more consequential than it is in concatenative languages. An FP programme consists entirely of functions. Data objects only enter the picture when the programme is run.

Still further, he says: ‘row polymorphism makes the definition of a complete concatenative system much cleaner and smaller; you can get away with only two stack combinators’. But of course two combinators (e.g. S and K) suffice to express any computation in applicative style, too – this sort of minimalism has nothing to do with row polymorphism or concatenativity.

The statement ‘the Forth language […] started it all’ is inaccurate. Forth is perhaps the most influential in popularizing stack-based programming, but it was preceded by Pop (currently known as Pop-11), which was stack-based and, unlike Forth, mostly functional. Pop-11 is still finding use for AI education and research in the UK.

The statement ‘because all we have are functions and composition, a concatenative program is a single function’ is somewhat misleading, too – the said property holds of any (purely) functional programme, not just concatenative ones.

The statement ‘most language VMs are essentially concatenative’ needs to be substantiated. First, there is a respectable set of VMs that are register-based, e.g. LLVM, Lua VM, Parrot, Dalvik, and HASTE (Falcon's VM). Second, a VM may be stack-based but not concatenative.

The phrase ‘quotation […] allows deferred composition of functions’ is, I believe, incorrect. Should it not rather be ‘deferred evaluation’?

Finally, the phrase ‘our mathematical language has evolved as inherently applicative’ needs to be made more precise. That language is not only ‘applicative’. It is clearly variable-based, as opposed to function-based (point-free). Functions are very rarely treated as values in themselves. And, syntactically, infix notation is preferred wherever it applies.

I've recently been learning Stratego/XT, which is a program transformation language. Typically terms go in on the left and are passed automatically from "strategy" (== function) to "strategy", working left to right. Each strategy can transform the given term in some way to yield a new one, or fail. Strategies are composed by separating them with a semicolon, and you can take any subsequence of strategies and give it a name. So it's another example of a concatenative style of programming.
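As a rough illustration (hypothetical Python, not actual Stratego syntax), strategies can be modelled as term-to-term functions that may fail, with the semicolon corresponding to sequential composition:

```python
# Strategies as term -> term functions; None signals failure,
# and seq(f, g) plays the role of Stratego's "f; g".

def seq(f, g):
    def strategy(term):
        out = f(term)
        return None if out is None else g(out)
    return strategy

inc = lambda t: t + 1 if isinstance(t, int) else None
dbl = lambda t: t * 2 if isinstance(t, int) else None

inc_then_dbl = seq(inc, dbl)     # like "inc; dbl"
print(inc_then_dbl(3))    # 8
print(inc_then_dbl("x"))  # None (first strategy fails, so the whole does)
```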

Some OO libraries make use of this concatenative style, to good effect:

x.doThis().thenThat().andThisOtherThing().butWaitTheresMore()

It reads nicely compared to the functional reverse order:

andThisOtherThing(thenThat(doThis(x)))

You can factor out the middle parts into a new method, to some degree.

I think probably the greatest hindrance to a concatenative programming style in static languages like Java or C++ is that you cannot extend existing classes, and defining all the classes needed to represent intermediate states of the computation would become pretty painful.
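For comparison, here is a toy fluent interface (hypothetical class and method names, sketched in Python rather than Java): each method returns `self`, so calls concatenate left to right in the order they execute.

```python
class Builder:
    """Toy fluent interface: each method returns self to allow chaining."""

    def __init__(self):
        self.steps = []

    def do_this(self):
        self.steps.append("this")
        return self  # returning self is what enables x.a().b().c()

    def then_that(self):
        self.steps.append("that")
        return self

    def and_more(self):
        self.steps.append("more")
        return self

b = Builder().do_this().then_that().and_more()
print(b.steps)  # ['this', 'that', 'more']
```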

This may be a really dumb question, but would it be reasonable to interpret a Combinatory Categorial Grammar (http://en.wikipedia.org/wiki/Combinatory_categorial_grammar) as a special-purpose concatenative programming language?