Beginning Factor – Introduction

I’ve been involved in the Factor programming language community for about a year now, and I am constantly amazed by how productive its contributors are. Large improvements to the language and its libraries are made on a weekly (if not daily) basis, and Factor is finally starting to attract some much-deserved attention from the programming community.

The problem is, the language is a huge departure from the norm for most developers and it can be overwhelming to someone just getting started. I would like to help ease that transition by posting on various topics that I know have been confusing to me over the past year.

Note that many topics are covered in the official FAQ (which is well worth a read), and I won’t spend time covering how to install Factor on the 14 or so platforms it supports. Beyond that, I’ll try to give enough information, from basic to advanced, to get you going.

The Basics of the Basics

First of all, Factor is a stack-based language, which means that it uses a stack to store all arguments and returned values from functions (called “words” in Factor). To put that another way, Factor words don’t receive arguments in the traditional manner; all input values that your word needs are expected to already exist on the top of the stack.

What is a Stack?

If you’re unfamiliar with the data structure called a stack, the concept is fairly simple. It centers around the idea of “Last In First Out” (LIFO), meaning the last item placed onto the stack is the first item that you will get when you remove an item from the stack. You cannot get to the lower items until all of the items above them have been removed.

I like to picture a stack like a tower of LEGO bricks. The only things you can do with it are add another brick to the top (“push” an item onto the stack), or take the top brick off to use it for something else (“pop” an item off the stack). That’s all there is to stacks!
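As a small preview, LIFO order can be seen in Factor itself. This is a minimal sketch, assuming the standard `.` word from the prettyprint vocabulary, which pops the top item off the stack and prints it; the listener where you would type this is introduced in the next section.

```factor
USING: prettyprint ;
IN: scratchpad

1 2 3   ! pushes 1, then 2, then 3 (3 ends up on top)
.       ! pops and prints 3, the last item pushed
.       ! pops and prints 2
.       ! pops and prints 1, the first item pushed
```

The items come back out in the reverse of the order they went in, which is exactly the “tower of bricks” behavior described above.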

How to Use the Stack

So, now we know that Factor manages its input and output with a stack, but how do we actually use it? Well, when entering data into Factor’s listener, one of two things is happening:

You are pushing a literal onto the stack

OR

You are calling a word which will consume values from the stack

Things do get slightly more complicated than that, but for the most part those two rules hold true. The simplest example of this behavior is the ubiquitous hello world program, shown here entered directly into Factor’s interactive listener environment:

(scratchpad) "Hello world!" print
Hello world!

To understand what is happening, you can type the string first and then execute the print word as a separate step.

In this particular case, typing the literal string "Hello world!" pushes that value onto the stack; then the word print takes a single string off the stack and writes it to the output stream.
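The two steps can be sketched as separate lines. This assumes the `print` word comes from the standard `io` vocabulary, which the listener already has loaded; the `USING:` line only matters if you save the code to a file.

```factor
USING: io ;
IN: scratchpad

"Hello world!"   ! step 1: the string literal is pushed onto the data stack
print            ! step 2: print pops the string and writes it to the output stream
```

If you enter the first line on its own in the listener, the string should show up under the “Data stack” heading until `print` consumes it.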

The same idea can be applied to simple arithmetic. Like strings, numbers are also literals, so you just have to type them in (separated by spaces) for them to be pushed onto the stack:

(scratchpad) 2 3
--- Data stack:
2
3

NOTE: the stack is displayed in the listener upside-down from the way you’d think, so the bottom number is actually the top of the stack.

…and if you want to add the top two numbers together, the + word will simply pop two numbers off the stack, add them together, and push the result back onto the stack:

--- Data stack:
2
3
(scratchpad) +
--- Data stack:
5
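The same addition can also be written on a single line. This sketch additionally uses `.` (from the prettyprint vocabulary) to pop and print the result, so the stack ends up empty again.

```factor
USING: math prettyprint ;
IN: scratchpad

2 3    ! pushes 2, then 3 (3 is now on top)
+      ! pops both numbers and pushes their sum
.      ! pops the sum and prints it: 5
```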

A Few More Words

If you start messing around in the listener, odds are your stack is going to grow pretty quickly and become unmanageable. There are a few words that are essential to know in order to keep things under control:
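As a minimal sketch of that kind of housekeeping, the example below assumes two standard words: `drop` (from the kernel vocabulary), which discards the top item, and `.` (from prettyprint), which pops the top item and prints it. These are offered as illustrations rather than a complete list.

```factor
USING: kernel prettyprint ;
IN: scratchpad

1 2 3   ! stack: 1 2 3
drop    ! discards the top item; stack: 1 2
.       ! pops and prints the top item (2); stack: 1
drop    ! discards the last item; the stack is empty again
```

In the listener, the kernel word `clear` can also be used to empty the whole data stack in one go when things have gotten out of hand.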

Ramifications of Stack-Based Design

From these simple examples, we can observe a couple of important things about Factor:

Postfix Notation

When using a stack, passing data becomes implicit and we can assume that all input needed by words already exists on the stack. This naturally lends itself to using postfix notation because you have to push your data onto the stack before you can use it.

This also makes words more concise and unambiguous when compared to infix notation and eliminates the need for copious amounts of parentheses used by prefix notation languages, like Lisp or Scheme.

POSTFIX: 6 5 4 * +
INFIX: 6 + 5 * 4 =
PREFIX: (+ (* 4 5) 6)

With the infix example above, you’d have to know order of operations rules in order to get the correct answer (or use parentheses to force the matter). When using postfix notation, the fact that multiplication is done first becomes explicit. Prefix notation also gets rid of that ambiguity, but becomes messier and harder to type with the more nesting you add.
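The postfix line above can be traced step by step. This is just a sketch of how the stack changes as each token is read, using `.` (from prettyprint) to print the final result.

```factor
USING: math prettyprint ;
IN: scratchpad

6 5 4   ! stack: 6 5 4
*       ! pops 5 and 4, pushes 20; stack: 6 20
+       ! pops 6 and 20, pushes 26; stack: 26
.       ! pops and prints 26
```

The multiplication runs first simply because `*` appears first, with no precedence rules involved.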

Calling Words is Implicit

You don’t have to specify that you’re calling a word; you simply use the word. This, combined with postfix syntax, means that you can easily nest words or cut and paste parts of definitions into new words without disrupting the flow of data. This lends itself to keeping code modular, short, easily testable, and readable.
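To illustrate the cut-and-paste idea, here is a small sketch. The word names `double`, `add-ten`, and `double-then-add-ten` are hypothetical, made up for this example, and the `:` definition syntax is explained under “Your First Word”.

```factor
USING: math prettyprint ;
IN: scratchpad

! Everything written inline:
5 2 * 10 + .          ! prints 20

! The same code cut into small words (hypothetical names):
: double ( x -- y ) 2 * ;
: add-ten ( x -- y ) 10 + ;
: double-then-add-ten ( x -- y ) double add-ten ;

5 double-then-add-ten .   ! prints 20
```

Because the data flows through the stack, `2 *` and `10 +` could be lifted out of the inline version unchanged; no argument lists had to be rewired.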

Your First Word

If you’ve typed in the examples from above and messed around in the listener, then you might want to know how to write your own word rather than just using ones that are predefined. Drawing on what we already know, here’s how to write your first word:

: plus-two ( x -- y )
2 + ;

If you copy and paste that into your listener, then you can use the word plus-two anywhere you would like to add two to a number on the top of the stack:

(scratchpad) 15 plus-two .
17

Not very exciting, but it gives us a couple more things to talk about…

Syntax Specifics

If you study the word definition, you’ll see that it’s made up of a few elements:

: plus-two ( x -- y )
2 + ;

A colon (:) is used to start the definition of a word. This is required and must have a space after it.

Right after the opening colon is the name of your word, also required.

Following the name of your word is its stack effect declaration, which is a list of the word’s inputs and outputs separated by -- and surrounded by parentheses. All words must have a stack effect declaration unless they only push literals onto the stack (newer Factor releases may require one on every word, so it is safest to always include it). The names of the elements in the stack effect declaration don’t make a difference, only the number of elements. That means that ( elt elt -- seq ) and ( x y -- z ) are the same thing. There are some common conventions for these names, but don’t get caught up in them; I’ll talk more about them in a later article, and they won’t change how your program runs.

Next comes the word definition itself, in this case 2 +.

And the last item is a semicolon (;), used to end the word definition. This is required and must have a space before it.

The reason that everything must be surrounded by spaces is that there aren’t any syntax-only elements to Factor…everything is a word! The colon/semicolon, the parentheses, etc. are all just parsing words working together in order to create the syntax.
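To illustrate the point above about stack effect names, here is a minimal sketch with two hypothetical words, `add-a` and `add-b`, whose bodies are identical and whose declarations differ only in the names used.

```factor
USING: math prettyprint ;
IN: scratchpad

! Hypothetical words: same body, differently named stack effects.
: add-a ( x y -- z ) + ;
: add-b ( first second -- sum ) + ;

2 3 add-a .   ! prints 5
2 3 add-b .   ! prints 5
```

Both declarations say “two values in, one value out”, which is the only part that has to match what the body actually does.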

Next Time

Next time we’ll talk more about stack shufflers, quotations & combinators, details about more datatypes, and more…

If something is unclear or if you’re having any trouble, let me know and I’ll try to help out!