
Cat: A Functional Stack-Based Little Language


Cat is an intermediate language for program verification, optimization, and more!

Christopher is a freelance programmer and consultant, with a particular interest in the design and implementation of programming languages. He can be contacted at cdiggins@gmail.com.

I've always been fascinated by stack-oriented languages because of their simplicity and elegance. Instructions take their input off the stack, do something with it, and push the output back onto the stack. Thus, there are no named variables or arguments, and execution proceeds left to right without any need for parentheses to denote precedence. As a result, it doesn't take long for people to learn the basics of programming in a stack language.

Another appeal of stack languages is that it is relatively easy to create reasonably efficient implementations of them, especially where memory is at a premium. Because of this, we find stack languages in multicore processors (SeaForth, www.dspdesignline.com/188701424), virtual machines (Java Virtual Machine language), and embedded devices.

In general, when we think of stack languages, we often think of imperative languages: Each instruction does something to a shared stack. An alternative and equally accurate view of stack languages is that each instruction is a function that takes a stack as input and returns a stack as output. This is the principle upon which Manfred von Thun based the Joy language (www.latrobe.edu.au/philosophy/phimvt/joy.html).
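The functional view described above can be illustrated with a minimal Python sketch (this is not Cat or Joy code). It assumes a stack is an immutable tuple whose last element is the top; the names push, add, dup, mul, and run are chosen here for illustration only.

```python
from functools import reduce

# A stack is an immutable tuple; the top of the stack is the last element.

def push(value):
    """Return an instruction that pushes `value` onto a stack."""
    return lambda stack: stack + (value,)

def add(stack):
    """Pop two numbers and push their sum."""
    *rest, x, y = stack
    return (*rest, x + y)

def mul(stack):
    """Pop two numbers and push their product."""
    *rest, x, y = stack
    return (*rest, x * y)

def dup(stack):
    """Duplicate the top value."""
    return stack + (stack[-1],)

def run(program, stack=()):
    """Run a program by applying its instructions left to right.

    Every instruction is a pure function from stack to stack, so a
    program is just the composition of its instructions.
    """
    return reduce(lambda s, instr: instr(s), program, stack)

# Equivalent of the stack program: 3 4 + dup *
result = run([push(3), push(4), add, dup, mul])
print(result)  # (49,)
```

Because no instruction mutates its input, the same starting stack can be reused across runs, which is what makes this style easy to reason about.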

The language that Joy most closely resembles is PostScript in that explicit control structures (branches, gotos, jumps) are replaced with higher order instructions (instructions that execute other instructions). Joy, however, introduced explicit function literals (PostScript uses a delayed execution operator) and eliminated the concept of the definition dictionary; in other words, you can't dynamically define or redefine new operations. This yielded a new breed of stack language that shares the advantages of pure functional languages (for example, it is expressive and easy to reason about and manipulate formally). It is interesting to note that despite its similarities to other stack-based languages, Joy evolved independently from Schoenfinkel and Curry's combinatory logic and the FP language by John Backus (www.vector.org.uk/archive/v203/vonthun203.htm).

My interest in Joy was primarily motivated by my search for an intermediate language that could be easily targeted by imperative and functional languages, could be easily optimized, and could be statically verified. Joy relies heavily on dynamic checking, so I created a more restricted, statically typed language based on Joy and called it "Cat."

Introducing Cat

In the Cat specification, instructions are referred to as "functions," regardless of whether they have side effects. New functions are defined using the define keyword, and have global scope. Functions cannot be redefined, and are only visible after they are defined. Example 1 presents some simple examples.

In Cat, a literal (for example, 42, 'q', "Hello Christopher\n", 3.14) pushes a value onto the stack. You can also write literal functions by enclosing an expression in square brackets ([1 +]). This has the effect of pushing a function onto the stack without executing it. In Joy parlance, this is called a "quotation," but I think of it as an anonymous function. Anonymous functions are first-class values: They can be constructed dynamically and treated like any other primitive value (you can dup them, swap them, pop them, and so on). You can execute anonymous functions using higher-order functions such as apply, if, and while.
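As a rough illustration of quotations, the following Python sketch (not Cat code) models an anonymous function as a tuple of instructions that sits on the stack as an ordinary value. The helper names quote, apply, and if_ are hypothetical, and the argument order assumed for if_ (condition, then true branch, then false branch) is an assumption rather than something stated in the text.

```python
def run(program, stack=()):
    """Apply each instruction (a stack -> stack function) in order."""
    for instr in program:
        stack = instr(stack)
    return stack

def push(value):
    """Return an instruction that pushes `value`."""
    return lambda stack: stack + (value,)

def add(stack):
    """Pop two numbers and push their sum."""
    *rest, x, y = stack
    return (*rest, x + y)

def quote(*program):
    """Model [ ... ]: push the program as a value without running it."""
    return push(tuple(program))

def apply(stack):
    """Pop a quotation and run it against the rest of the stack."""
    *rest, quotation = stack
    return run(quotation, tuple(rest))

def if_(stack):
    """Pop a condition and two quotations, then run one of them.

    Assumed layout (bottom to top): condition, true branch, false branch.
    """
    *rest, cond, on_true, on_false = stack
    return run(on_true if cond else on_false, tuple(rest))

# 41 [1 +] apply
print(run([push(41), quote(push(1), add), apply]))  # (42,)

# true ["yes"] ["no"] if
print(run([push(True), quote(push("yes")), quote(push("no")), if_]))  # ('yes',)
```

Nothing runs when the quotation is pushed; execution happens only when a higher-order instruction such as apply or if_ consumes it.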

Closures (functions bound to values in the local environment) can be constructed in Cat using the papply primitive instruction. This has the effect of binding the top value on the stack to the function, a process known as "partial application" (partial application is frequently mislabeled as "currying"). For example, writing 1 [<=] papply has the same effect as writing [1 <=].
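The effect of papply can be sketched in Python (not Cat code), assuming quotations are modeled as tuples of stack-to-stack functions. The helper names, and the detail that the bound value is prepended to the quotation as a push instruction, are illustrative assumptions about one possible implementation.

```python
def run(program, stack=()):
    """Apply each instruction (a stack -> stack function) in order."""
    for instr in program:
        stack = instr(stack)
    return stack

def push(value):
    """Return an instruction that pushes `value`."""
    return lambda stack: stack + (value,)

def quote(*program):
    """Push a program as a value without running it."""
    return push(tuple(program))

def apply(stack):
    """Pop a quotation and run it against the rest of the stack."""
    *rest, quotation = stack
    return run(quotation, tuple(rest))

def le(stack):
    """Pop x and y (y on top) and push the result of x <= y."""
    *rest, x, y = stack
    return (*rest, x <= y)

def papply(stack):
    """Pop a quotation and the value beneath it; push a new quotation
    that pushes that value first and then runs the original one."""
    *rest, value, quotation = stack
    return (*rest, (push(value),) + quotation)

# 0 1 [<=] papply apply   behaves like   0 [1 <=] apply
bound = run([push(0), push(1), quote(le), papply, apply])
literal = run([push(0), quote(push(1), le), apply])
print(bound, literal)  # (True,) (True,)
```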

The quicksort algorithm in Example 2 is a more sophisticated example of Cat. The algorithm relies on a binary recursion instruction (bin_rec) that provides a general implementation of a binary recursive process (also called "tree recursion"). Example 3 is a possible definition of bin_rec.

The bin_rec function is an example of a hylomorphism (see citeseer.ist.psu.edu/meijer91functional.html): the composition of an anamorphism (an unfolding function) and a catamorphism (a folding function). Hylomorphisms are interesting because they can be used to eliminate the construction of intermediate data structures (citeseer.ist.psu.edu/launchbury95warm.html).
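The shape of a binary recursion combinator can be sketched in Python. This is not the Cat definition of bin_rec and makes no claim about its actual signature; the parameter names is_base, base, split, and combine are hypothetical. Here split plays the unfolding role and combine the folding role, and no explicit tree is ever built between them.

```python
def bin_rec(value, is_base, base, split, combine):
    """Generic binary (tree) recursion.

    is_base(value)           -> True when no further splitting is needed
    base(value)              -> result for a base case
    split(value)             -> (context, left, right) subproblems
    combine(context, l, r)   -> merge the two sub-results
    """
    if is_base(value):
        return base(value)
    context, left, right = split(value)
    return combine(
        context,
        bin_rec(left, is_base, base, split, combine),
        bin_rec(right, is_base, base, split, combine),
    )

def quicksort(items):
    """Quicksort expressed through bin_rec, using the first item as pivot."""
    def split(xs):
        pivot, rest = xs[0], xs[1:]
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        return pivot, smaller, larger

    return bin_rec(
        list(items),
        is_base=lambda xs: len(xs) <= 1,
        base=lambda xs: xs,
        split=split,
        combine=lambda pivot, lo, hi: lo + [pivot] + hi,
    )

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```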

The quicksort algorithm in Example 2 also demonstrates an extended feature of Cat called "metadata," a form of structured comment that associates additional data with a Cat function for use by tools. For example, metadata can be used to document functions and perform automatic unit tests. The format is based on YAML (YAML Ain't Markup Language) and uses significant whitespace to denote hierarchical structure.

To demonstrate how compact Cat can be, Example 4 includes an implementation of the Google MapReduce algorithm (labs.google.com/papers/mapreduce.html). The general idea of the Google MapReduce algorithm is to define a task in terms of subtasks that can be executed independently (in this example, counting instances of words), and a function to combine the results of the subtasks (called the "reduce function"). While my implementations of Cat do not execute MapReduce concurrently, you can easily develop an implementation of Cat that automatically executes map and self_join to take advantage of available parallelism in the executing environment. This leads to an important point about Cat: As an implementor, you decide how to implement the primitive instructions and whether to implement standard library functions as library functions or built-in functions. This opens lots of opportunities for high-performance implementations.
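The word-counting idea can be sketched sequentially in Python. This is not the Cat code from Example 4, and the names count_words, merge_counts, and map_reduce are hypothetical. Because each subtask touches only its own input, the map step is the part an implementation could run in parallel.

```python
from collections import Counter
from functools import reduce

def count_words(text):
    """Subtask (map step): count the words in a single document."""
    return Counter(text.lower().split())

def merge_counts(a, b):
    """Reduce function: combine the results of two subtasks."""
    return a + b

def map_reduce(inputs, map_fn, reduce_fn, initial):
    """Apply map_fn to every input, then fold the results with reduce_fn."""
    return reduce(reduce_fn, map(map_fn, inputs), initial)

docs = ["the cat sat", "the cat ran", "a dog ran"]
totals = map_reduce(docs, count_words, merge_counts, Counter())
print(totals["the"], totals["cat"], totals["dog"])  # 2 2 1
```

Swapping the built-in map for a parallel map would change how the subtasks are scheduled without changing the result, since merge_counts is associative.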
