Curly infix, Modern-expressions, and Sweet-expressions: A suite of readable formats for Lisp-like languages

Many people find Lisp s-expressions hard to read as a programming notation.
This paper briefly describes a suite of three approaches I've developed:
curly infix, modern-expressions, and sweet-expressions.
These are not tied to any particular semantic, and
can do everything regular s-expressions can do.

The Problem

Lisp-derived systems normally represent programs as s-expressions,
where an operation and its parameters are surrounded by parentheses;
the operation to be performed is identified first,
and each parameter afterwards is
separated by whitespace. So the traditional “2+3” is written as
“(+ 2 3)”.
This is regular, but most people find it hard to read.
Even if you are used to them, this is a problem when trying to
work with others.

Early Lisp was even harder to read, because it lacked abbreviations like 'x
for (quote x). I believe that additional abbreviations and conventions
can be created so that programs can be easily read without losing
Lisp's capabilities.
In particular, these are not tied to any particular semantic, so you
can do metaprogramming and meta-meta-programming (and so on), all without
problems.

Curly Infix, Modern-expressions, and Sweet-expressions

I've developed a 3-layer approach to making Lisp more readable, which
is all based on adding additional abbreviations to an S-expression reader
that can work with any S-expression (and are not tied to any
particular semantic).
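These layers are:

Curly-infix: Adds support for traditional infix notation; a list
surrounded by {...} is read as an infix expression.

Modern-expressions: Includes curly-infix, and adds special meanings to
(...), {...}, and [...] when they are prefixed by a symbol or list
(so that, for example, f(x) is read as (f x)).

Sweet-expressions: Includes modern-expressions, and adds indentation
as meaningful (like Python, Haskell, and many other languages).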

All of these can be used in any Lisp-like language
(Common Lisp, Scheme, Emacs Lisp, ACL2, BitC, etc.).
Curly-infix is 100% compatible with existing Lisp code; modern-expressions
and sweet-expressions are 100% compatible with existing well-formatted Lisp
code.
Yet they add additional abbreviations to the reader that make
programming much more pleasant.
Since they are automatically translated into s-expressions, yet
maintain all their capabilities (quasiquoting, etc.), they
lose no power.

Sweet-Expression Examples

Here are two quick examples - we'll use sweet-expressions version 0.2
to represent calculating factorials and Fibonacci numbers,
in both cases using Scheme:

Note that you can use traditional math notation
for functions;
fibfast(n) maps to (fibfast n).
Infix processing is marked with {...};
{n <= 2} maps to (<= n 2).
Indentation is significant, unless disabled by (...), [...], or {...}.
This example uses variable names with embedded "-" characters; that's
not a problem, because the infix operators must be surrounded by whitespace
and are only used when {...} requests them.

It's actually quite common to have a function call pass one parameter,
where the parameter is calculated using infix notation.
Thus, there's a rule to simplify this common case (the prefix {} rule).
So factorial{n - 1} maps to
factorial({n - 1}) which maps to
(factorial (- n 1)).
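
As an illustration of the rules just described, here is a minimal sketch of how a factorial definition might be written. It is an illustration written for this purpose, not a reproduction of the original listing. The sweet-expression form is shown in comments, followed by the s-expression a sweet-expression reader would be expected to produce, which any Scheme can evaluate directly.

```scheme
;; Sweet-expression form:
;;
;;   define factorial(n)
;;     if {n <= 1}
;;       1
;;       {n * factorial{n - 1}}
;;
;; factorial(n)       is read as (factorial n)
;; {n <= 1}           is read as (<= n 1)
;; factorial{n - 1}   is read as (factorial (- n 1))
;;
;; Resulting s-expression:
(define (factorial n)
  (if (<= n 1)
      1
      (* n (factorial (- n 1)))))
```

The indented lines under "if" become its parameters, so no closing parentheses need to be counted.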

Rules for Curly Infix, Modern-Expressions, and Sweet-Expressions

I've devised three levels of notation and given them each names:
Curly Infix, Modern-Expressions, and Sweet-Expressions.
Each builds on the previous one, so let's take them in order.

Curly Infix

"Curly infix" adds one simple rule:

{...} contains an "infix list".
If the enclosed infix list has (1) an odd number of parameters,
(2) at least 3 parameters, and
(3) all even parameters are the same symbol, then it is mapped to
"(even-parameter odd-parameters)".
Otherwise, it is mapped to "(nfx list)"; you'll need to
have a macro named "nfx" to use it.

This rule may seem arbitrary, but it isn't.
The first 3 conditions define a "simple infix" expression, which is
exactly the set of all infix expressions that can represent a single list
(an expression like (+ 3) doesn't really have an infix operator,
since by definition an infix operator is between its operands).
At first I considered reporting an error if a simple infix expression
isn't sent, but prepending "nfx" is much more flexible.
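
The rule can be sketched as an ordinary Scheme procedure. This is a minimal sketch, not the reader's actual implementation: it assumes the data between { and } have already been read into a list, and the names curly-infix->prefix, simple-infix?, even-items, odd-items, and all-eq? are hypothetical helpers introduced here.

```scheme
;; Elements at even positions (2nd, 4th, ...) of lst.
(define (even-items lst)
  (if (or (null? lst) (null? (cdr lst)))
      '()
      (cons (cadr lst) (even-items (cddr lst)))))

;; Elements at odd positions (1st, 3rd, ...) of lst.
(define (odd-items lst)
  (cond ((null? lst) '())
        ((null? (cdr lst)) (list (car lst)))
        (else (cons (car lst) (odd-items (cddr lst))))))

;; True if every element of lst is eq? to x.
(define (all-eq? x lst)
  (or (null? lst)
      (and (eq? x (car lst)) (all-eq? x (cdr lst)))))

;; The three conditions for a "simple infix" list.
(define (simple-infix? items)
  (let ((n (length items)))
    (and (odd? n)
         (>= n 3)
         (symbol? (cadr items))
         (all-eq? (cadr items) (even-items items)))))

;; Map the contents of {...} to the s-expression the reader returns.
(define (curly-infix->prefix items)
  (if (simple-infix? items)
      (cons (cadr items) (odd-items items))
      (cons 'nfx items)))

(curly-infix->prefix '(n <= 2))      ; (<= n 2)
(curly-infix->prefix '(a + b + c))   ; (+ a b c)
(curly-infix->prefix '(a + b * c))   ; (nfx a + b * c)
```

Mixed operators fall through to the nfx case, which leaves the decision about precedence to whoever defines nfx.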

Consistently using {...} so infix operators are always equal in a particular
list has the advantage that all macros will see the usual list form -
with the function in the first position.
If you want operator precedence, define an nfx macro
to implement the precedence rules you desire.
Or, if you never want precedence, define nfx to be an error.
The even parameters must be exactly the same symbol; pointer equality such
as Scheme's eq? is a good way to test this.
Every infix operator must be surrounded by whitespace for this rule
to work as designed.

Notice that this does not include any precedence system, by design.
Many people have devised infix processing systems for Lisps, and of course,
they implement various mechanisms for precedence.
If you have a specific semantic in mind, that's useful.
But people often choose Lisp-based languages so that they can do
meta-programming (and meta-meta-programming) - so soon there is no
single precedence set, making precedence handling more harmful
than helpful.
It also causes trouble with code-sharing - not everyone agrees on a precedence
level.
By intentionally not building in a precedence system, we make things
amazingly simple - we don't need to register functions, decide their order,
or anything like it - making programming much simpler and easier.
There's no need to memorize a precedence system, code transfers easily,
and code is generally easy to read too (again, because you don't have
to memorize a precedence system).
In cases where you do want a precedence system, you can implement
an "nfx" macro.
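
To make the last point concrete, here is a minimal sketch of the transformation an nfx macro might perform. Everything in it is an assumption made for illustration: the precedence table, the left-associativity of all operators, and the names precedence, prec, parse-infix, and nfx->prefix. It handles only binary operators listed in the table.

```scheme
;; Hypothetical precedence table: larger numbers bind tighter.
(define precedence '((+ . 1) (- . 1) (* . 2) (/ . 2)))

(define (prec op)
  (let ((entry (assq op precedence)))
    (if entry
        (cdr entry)
        (error "unknown infix operator" op))))

;; lhs is the expression built so far; tokens is (op operand op operand ...).
;; Consumes operators whose precedence is at least min-prec.
;; Returns (expression . remaining-tokens).
(define (parse-infix lhs tokens min-prec)
  (if (or (null? tokens) (< (prec (car tokens)) min-prec))
      (cons lhs tokens)
      (let* ((op (car tokens))
             ;; Parse the right operand, letting tighter operators bind first.
             (rhs (parse-infix (cadr tokens) (cddr tokens) (+ (prec op) 1)))
             (new-lhs (list op lhs (car rhs))))
        (parse-infix new-lhs (cdr rhs) min-prec))))

;; Turn the parameters of (nfx ...) into a nested prefix expression.
(define (nfx->prefix items)
  (car (parse-infix (car items) (cdr items) 0)))

(nfx->prefix '(a + b * c))   ; (+ a (* b c))
(nfx->prefix '(a - b - c))   ; (- (- a b) c)
```

In a Scheme with a nonstandard define-macro facility, an nfx macro could call a procedure like this on its parameters, provided the procedure is available at macro-expansion time.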

This use of {...} is highly compatible with various Lisps.
I think this rule would be a great backwards-compatible
addition to the standard reader of any Scheme and Common Lisp
implementation.
Scheme specifically reserves {...} for future use
(R5RS section 2.3, R6RS section 4.2.1).
Common Lisp does not define {} (see section 2.4 of the
Common Lisp Hyperspec, based on ANSI Common Lisp X3.226), but notes
its potential use by users.
BitC spec version 0.10 (June 17, 2006) section 2.4.3 also reserves {...}.

It's important to note that inside the infix expression you can do
anything you can do in normal Lisp.
This is different from nearly all Lisp infix systems, which have their own
incompatible language inside that can't handle arbitrary s-expressions.
You can use arbitrary s-expressions with quasi-quoting, unquote-splicing,
or whatever inside, and all without "registering" anything.

Surprisingly, this simple mechanism is enough to do what
people actually want in an infix mechanism for Lisp.
You can add things, like {x + 1}, or compare values,
like {x <= 5}.

This is an unusually simple mechanism, but like much of Lisp, its
power comes from its simplicity.

Modern-Expressions

Modern-expressions build on curly-infix's use of {...}.
Modern-expressions also add the ability to use [...],
as well as (...), to surround ordinary
lists (Scheme R6RS does this too, and both Common Lisp and Scheme R5RS
reserve [...] for future use).

What's more, if (...), {...}, or [...] are prefixed with a symbol
or list (i.e., have no whitespace between them), they have a new meaning
in modern-expressions:

Prefixed (...).
Syntax of the form e(...), with no whitespace
between symbol or list e and the open parenthesis,
is mapped to (e ...).
Any parameters in "..." are space-separated.
This produces another expression, so this can be repeated (left-to-right).
This adds support for traditional function notation.
For example, "cos(x)" maps to "(cos x)",
"max(3 4)" maps to "(max 3 4)",
and "f(x)(a b)" maps to "((f x) a b)".
Note that this is especially convenient for certain styles of
functional programming, including lambda expressions;
in Scheme, lambda((x) {x + x})(4) would compute as 8.

Prefixed {...}. A prefixed expression f{...}, where f is a
symbol or list, is an abbreviation for f({...}).
This rule simplifies combining function calls and
infix expressions when there is only one parameter to the function call.
This is a common case; for example, "not" (which is normally given only
one parameter) often encloses infix "and" and "or".
Thus, f{n - 1} maps to (f (- n 1)).
When there is more than one function parameter, use the normal term-prefixing
format instead, e.g., f({x - 1} {y - 1})
maps to (f (- x 1) (- y 1)).

Prefixed [...]. Prefixed square brackets e[...],
where e is a symbol or list,
maps to (bracketaccess e ...).
Thus, "t[x]" maps to
"(bracketaccess t x)". This is intended to simplify use of
indexed arrays, associative arrays, and similar constructs.
You could even define bracketaccess as a macro that simply returns
its arguments; in this case f[5] would eventually map to (f 5).
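
A minimal sketch of that pass-through definition, using standard syntax-rules. The procedure square is a hypothetical example; the form (bracketaccess square 5) is what a modern-expression reader would be expected to produce for square[5].

```scheme
;; bracketaccess that simply re-emits its arguments as an ordinary call.
(define-syntax bracketaccess
  (syntax-rules ()
    ((_ e arg ...) (e arg ...))))

(define (square x) (* x x))

;; square[5] is read as the form below, which expands to (square 5).
(bracketaccess square 5)
```

A different definition could instead dispatch on the type of its first argument, for example calling vector-ref when given a vector; which behavior is appropriate is left to the language or program using the notation.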

These combine well with curly-infix forms of {...}.
For example, {-(x) * y} maps to
(* (- x) y).

A common extension must be supported: (. x) must mean x.
This provides a simple way to
escape certain constructs, such as the "." or "group" symbols
that have extra meaning in sweet-expressions.
It turns out that in a typical implementation of a list reader, it takes
extra effort to prevent this extension, so this is an easy
extension to include.

Modern-expressions are very compatible with most existing text editors
for Lisp.
Editors need not "understand" the code, but many work to match
(...), {...}, and [...], and that is enough to be useful.
After all, Scheme R6RS already requires support for [...] anyway, and
Common Lisp readers are designed to allow {...} to be overridden, so
many text editors are designed to support this.
Modern-expressions are easy to use at the command line, too -
for example, you don't need to enter a blank line to execute something.

Sweet-expressions

Sweet-expressions start with modern-expressions and add
indentation as meaningful:

Indentation is meaningful; the "I-expressions" of Scheme SRFI-49
are supported, with the 2008 I-expression revisions.
An indented line is a parameter of its parent,
later terms on a line are parameters of the first term, and
lists of lists are prefixed with the term "group".
A line with exactly one datum, and no child lines, is simply that item;
otherwise that line and its child lines are themselves a new list.
Indentation is disabled inside the grouping pairs (), [], and {}, whether
they are prefixed or not.
Lines with only leading whitespace and a ;-comment are completely ignored -
even their indentation is irrelevant.
Empty lines, possibly with tabs and spaces, are ignored during reading of
the initial line of an expression; otherwise they end an expression.
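
The core of the indentation rules can be sketched in a few lines of Scheme. This is a simplified illustration, not the SRFI-49 or sweet-expression reader: it assumes each line has already been split into its indentation column and its data, that every line holds at least one datum, and that indentation is consistent. It ignores comments, blank lines, and the grouping pairs. The names build-datum and read-block are hypothetical.

```scheme
;; Each input line is (indent datum ...).

;; Combine a line's own data with the data read from its child lines.
(define (build-datum items children)
  (cond ((eq? (car items) 'group)          ; "group" only starts a list of lists
         (append (cdr items) children))
        ((and (null? (cdr items)) (null? children))
         (car items))                      ; lone datum, no children: itself
        (else (append items children))))   ; otherwise a new list

;; Read the first line of `lines` plus all deeper-indented lines after it.
;; Returns (datum . remaining-lines).
(define (read-block lines)
  (let ((indent (car (car lines)))
        (items (cdr (car lines))))
    (let loop ((rest (cdr lines)) (children '()))
      (if (and (pair? rest) (> (car (car rest)) indent))
          (let ((child (read-block rest)))
            (loop (cdr child) (cons (car child) children)))
          (cons (build-datum items (reverse children)) rest)))))

(define example
  '((0 define (factorial n))
    (2 if (<= n 1))
    (4 1)
    (4 (* n (factorial (- n 1))))))

(car (read-block example))
;; (define (factorial n) (if (<= n 1) 1 (* n (factorial (- n 1)))))
```

Each more deeply indented line becomes a parameter of the nearest less-indented line above it, which is all that is needed to recover the nesting that parentheses would otherwise express.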

A blank line always terminates a datum, so once you've entered a
complete expression, "Enter Enter" will always end it.
The "blank lines at the beginning are ignored" rule eliminates a
usability problem with the original I-expression spec, in which
two sequential blank lines surprisingly return ().
(The sample implementation did end expressions on a blank line -
the problem was that the spec didn't capture this.)
A function call with 0 parameters must be surrounded or immediately
followed by a pair of parentheses: (pi) or pi().

Generally it's best to start each new expression on the left edge;
if you choose not to do that, include a blank line between each new expression.

Comments on the Rules

Note that usual Lisp quoting rules still work,
so 'a still maps to (quote a).
But they work with the new capabilities, so 'f(x) maps to
(quote (f x)). Same with quasiquoting and comma-lifting.
A ";" still begins a comment that continues to the end of a line, and
"#" still begins special processing.

Implementations may call underlying implementations when they
encounter "#"; in those cases, an expression begun by "#" will not
continue to support sweet-expressions. For example, in Scheme,
use vector(...) instead of #(...).
Many Scheme implementations have nonstandard extensions for "#",
so a portable sweet-reader can't easily reimplement the functionality of a
local "#".
Nor can the sweet-reader easily call on the underlying implementation of "#"
on some implementations; e.g.,
standard Scheme supports only a one-character peek, with no way to unget a character.

If an implementation called a "standard" s-expression reader when
it encountered an open parenthesis, it would
be extremely backward-compatible with essentially all existing Lisp files.
However, this mode is hard to use; it would mean that you must use [...]
for lists, and failure to do so would produce mysterious errors.
After some experimentation, I found that it was a bad idea and dropped it.

The (. x) rule is a common extension in Scheme implementations;
it's required here so that I-expression's "group" term can be easily escaped.
Note that any "(" preceded by whitespace, "(", "{", or "[" is unprefixed.

Note that you have to disable indentation
to use infix operators as infix operators.
This doesn't seem to be a problem in practice.

With sweet-expressions, you can use the traditional Lisp read-eval-print loop
as a calculator, as long as you remember to surround infix expressions
with {...} and surround infix operators with whitespace.
For example, "{3 + 4}" will be mapped to (+ 3 4),
which when executed will produce "7".
Use normal function notation for unary functions, e.g.,
"{-(x) / 2}" maps to
"(/ (- x) 2)".
Nest {...} when you need to, e.g.,
"{3 + {4 * 5}}"
will map to "(+ 3 (* 4 5))".
If you mix infix operators at the same level, you must have an "nfx"
macro defined to handle precedence, and you must be careful about other
macros you use.

Notice that since all the transforms happen in the reader,
sweet-expressions are highly compatible with macros.
Sweet-expressions simply define new abbreviations, just as 'x became
(over time) a standard abbreviation for (quote x).
As long as simple infix expressions are used (ones that don't create nfx),
after reading the expressions all expressions are normal s-expressions,
with the operator at the initial position.
So macros defined by Common Lisp's macros, etc., will work as expected.
Common Lisp has some hideously confusing terminology, though.
Common Lisp has macros, but it also has a completely different
capability: "macro characters", which introduce "reader macros" -
i.e., hooks into the reader used during read time.
The Common Lisp Hyperspec clearly states in its glossary
on macro characters, "macro characters have nothing to do with macros",
but I think they should have chosen a name that had nothing to do with
macros as well.
Obviously sweet-expressions can affect macro characters,
since they implement a different reading syntax.
This doesn't affect most real Common Lisp programs, which
often avoid macro characters anyway.
Common Lisp macro functions (e.g., defmacro and macrolet) work just fine
with sweet-expressions.

The "\" symbol is treated specially, depending on where it appears on a line:

If it's the last character of the line (other than 0 or more spaces/tabs), then the newline is considered a space, and the next line's indentation is irrelevant. This continues the line. (Note that comments cannot follow, because that would be confusing.)

If it's between items on a line, it's interpreted as a line break to the same indentation level.

Otherwise, if it's at the beginning of a line (after 0+ spaces/tabs), it's ignored - but the first non-whitespace character's indentation level is used.

This is mainly to handle named parameters more gracefully, e.g.:

myfunction \
  :option1 \ f(a)
  :option2 \ g(b)

could map to (myfunction :option1 (f a) :option2 (g b)).
Note that f(a) or g(b) could be the beginning of a complex program
using indentation, since \ does not turn off indentation.

Programming with Sweet-expressions

General Rules

Mentally, this is pretty straightforward - on each line,
write an expression; everything after the first term on the line, or
all child lines, are parameters of the first term.
You can use grouping operators (), [], and {}
to put subexpressions on the same line, if you want.
Use -(...) to negate something.

Whenever you have an infix expression, just surround it with {...}.
You can use the form f(...) to call a function;
if it has zero parameters, express it as f(),
and if it has more than one parameter, separate the parameters with spaces.
The f(...) form is especially handy for creating short expressions
as a parameter on a line; for long expressions, use indentation instead.

The word "group" starts lists of lists in sweet-expressions
(and I-expressions).
This makes it easy to create lists of lists, without having to create
special syntax for each variation.

This is all implemented by modifying the "read" function, so that it recognizes
all these formats and generates s-expressions.
Since macros operate on s-expressions, macros work just fine.
You can have infix operators in macro definitions, and you can have
infix operators in the expressions processed by macros.

Interactively, you can just type 'load("filename")' or {3 + 2},
then Enter Enter.

Certain functions require groups, and once you learn what they are (and their
patterns) they're pretty easy to manage.

Examples of specific constructs

Here are a few examples, using sweet-expressions.

The "cond" form is widely-used, and works beautifully.
Here's an example:

If the condition gets long, or you have many operations, just make the
operations child lines of the condition.
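
A small sketch of how a "cond" might be laid out. This is an illustration written for this purpose, and the procedure name classify is hypothetical. The sweet-expression form appears in comments, followed by the s-expression it would be read as.

```scheme
;; Sweet-expression form:
;;
;;   define classify(x)
;;     cond
;;       {x < 0}  "negative"
;;       {x = 0}  "zero"
;;       else     "positive"
;;
;; Each clause line has two terms, so it becomes a two-element list.
;; Resulting s-expression:
(define (classify x)
  (cond ((< x 0) "negative")
        ((= x 0) "zero")
        (else "positive")))
```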

The "let" forms are a case where you need "group", e.g.:

let
  group
    x 2
    y 3
  {x * y}
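
Under the indentation and "group" rules, this let form would be expected to read as the ordinary s-expression below: the "group" line contributes the list of bindings without itself appearing in the result.

```scheme
;; "group" followed by the child lines "x 2" and "y 3" yields ((x 2) (y 3));
;; {x * y} yields (* x y).
(let ((x 2)
      (y 3))
  (* x y))
```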

I actually don't like "let" all that much anyway, even when using
traditional Lisp notation.
You might find it more efficient to define a single-variable let.
Here's a straight s-expression form of this, using the define-macro
form supported by many Scheme implementations including guile
(it's a valid sweet-expression too, of course):
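
The listing itself does not appear here, so what follows is a minimal sketch of what such a macro might look like, not the original definition. It assumes a Scheme that provides the nonstandard define-macro (such as Guile), and the name let1 is chosen here for illustration.

```scheme
;; Single-variable let: (let1 var value body ...)
;; expands to (let ((var value)) body ...).
(define-macro (let1 var value . body)
  `(let ((,var ,value)) ,@body))

(let1 x 5
  (* x 2))
```

With a macro like this, a single binding needs no "group" line in sweet-expression form, since the variable and its value are simply the first two parameters.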

Is the World Ready for this?

I'm well aware that there are some who don't like any change in
Lisp notation.
Some of these people seem to believe that the current Lisp notation was
handed down from on high, never to be changed.
Well, you don't have to use improvements like this, or even agree that
they are improvements.
But most software developers have abandoned Lisp
precisely because of Lisp's hideous,
inadequate notation (and I say that as someone who has used Lisp for decades).
Lisp notation was not handed down from on high, and it
has changed over time.
The
"LISP 1.5 Programmer's Manual"
(by John McCarthy, Paul W. Abrahams, Daniel J. Edwards, Timothy P. Hart
and Michael I. Levin; The M.I.T. Press, 1962, second edition)
describes the parent of all modern Lisp-based systems.
(Note that even LISP's creator didn't think much of using S-expressions
as a programming notation.)
LISP 1.5 did not have a ' operator - you had to say (QUOTE X).
It didn't have abbreviations for quasiquoting (`) or comma-lifting (,) either.
Today, people would not accept a Lisp that didn't at least have the common
abbreviation for QUOTE.
Indeed, Tony Hasemer's book "A Beginner's Guide to Lisp" (1984) says on the
second page of the Foreword, "do NOT buy a Lisp which does not allow
the single-quote sign in place of the word QUOTE, unless you have
absolutely no alternative".
Lisp notation has been stagnant for a while; it's time to add modern
conveniences as abbreviations.

Some objections don't seem to realize that this proposal is different.
It's true that there have been many abandoned efforts in the past to
improve on S-expressions, but I think all those efforts
failed to realize that any replacement for S-expressions must be
completely general, just as S-expressions are,
and not tied to a particular semantic.
Practically all past efforts, such as M-expressions and similar work,
failed precisely because they weren't general enough.
It's true that tooling support is necessary for any notation like this
(e.g., in program editors), but that's why a standard
format needs to be defined so tools can implement it
(and not 1000 application-unique reader macros).
There's no reason tools can't support sweet-expressions as well as they
support s-expressions today.

I think most software developers will not
agreeably use a Lisp-based language unless that language
has better built-in support for an easy-to-read programming notation.
Programs must be read by others, and if the programming notation is
odious to read, then the language has a key flaw.
Most developers think Lisp is odious to read, even after
they've used it for a while.
If the Lisps won't provide an easy-to-read notation,
those developers will just use another language that's
more user-friendly (even when it's less appropriate for their problem) -
and that is precisely what they are doing.
Here, we try to learn from the past, keep all of S-expression's benefits,
but provide a better notation that others can read.

Closing Remarks

Sweet-expressions can take a few minutes to learn how to use,
just like anything else new.
But I think they won't take long for people who already know how to
use s-expressions, and they are far more readable in my opinion.
Something impenetrable can be written using sweet-expressions, of course,
but at least the basics of the notation don't get in the way.
There is a risk that the notation could deceive the reader into confusion;
I think after using the notation for a little bit this is unlikely, but
that's something that an experiment should test.
In any case,
in an era where developers must read a lot of code, thinking about
ways to improve readability is important.
I hope that this is, or is the beginnings of, a way to improve
readability for s-expressions.

For more information, see my website page at
http://www.dwheeler.com/readable.
I've also set up a SourceForge project where options like
sweet-expressions can be discussed, and code can be shared.
If you're interested, please join!