User-defined syntax is a powerful, proven tool for enhancing the expressiveness of a language and its applicability to diverse domains. Developers of rich APIs and frameworks are able to augment their interfaces – introducing syntactic sugar for common expressions and data, enforcing protocols and schema. Though user-defined syntax is limited in its ability to handle cross-cutting concerns, it can capture the boiler-plate code and discipline associated with leveraging frameworks and architectures that do handle cross-cutting concerns.

Note that I did not mention controlling evaluation order among the benefits. It is true that macros and their like are often used to move evaluation in a strict language, but I believe that controlling effects (including divergence) is the responsibility of good semantics, and orthogonal to support for user-defined syntax. In a lazy language (like Haskell) you don’t need syntax extensions to introduce new conditional constructs – pure functions will do. Similarly, we do not need to couple the notions of user-defined syntax and staged semantics.

User-defined syntax is a poor substitute for good semantics… but it is a substitute.

In 2006, I decided that Awelon should support user-defined syntax, allowing a common language and IDE to more effectively serve multiple roles. As a side benefit, it should also help with forward compatibility and versioning of the language.

I further determined several desiderata:

non-monotonic syntax – we often want a DSL or data language to have only a subset of the host language’s expressiveness. This allows developers to enforce disciplines and protocols, and potentially ensure optimizations.

modular and composable syntax – ideally, we should be able to weave, tweak, and join multiple syntaxes together to create a new one. This requires that syntaxes be described by a relatively open structure. And they must be modular in nature, such that developers can extend, tweak, and compose syntax developed in other modules.

no syntactic coupling – syntax used in one module should not interact or bleed into the syntax used in other modules. I.e. the ‘meaning’ (semantics) of a module should be determined by the developer of the module rather than the client. (To the extent a module developer wants the client to provide meaning, the developer is free to export a value for the client to interpret.) This generally rejects the work on pattern calculi and related models.

language defined syntax – we should be able to port our syntax extensions to any correct implementation of the language, and we should be able to lift the syntax into the runtime (i.e. so developers can easily use the same syntax and parsers for both scripts and primary source). This restriction means: no preprocessors, and favor syntax defined in modules.

incremental processing – we want the ability to parse and process modules in advance of using them, i.e. for improved edit-link-test performance, or good code-search. This constrains how we may modularize our syntax.

simple bootstrap – we should achieve a bootstrap, i.e. have a language module written in a subset of a bootstrap syntax that describes the bootstrap syntax, then build the subset into the interpreter or compiler by hand. To the maximum degree possible, parsing should be directed by the language modules.

IDE integration – our IDE should be able to report error messages in terms of local syntax, and highlight intelligently.
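To make the first two desiderata a little more concrete, here is a minimal Python sketch. It is not Awelon's design: it assumes, purely for illustration, that a syntax can be modelled as an open table from construct names to token patterns. The `Syntax` class and its `extend`, `without`, and `compose` methods are hypothetical names. Extending is the monotonic operation; `without` is the non-monotonic one, producing a sub-language that rejects constructs the host language accepts.

```python
import re


class Syntax:
    """A syntax modelled as an open table: construct name -> token regex.

    This table-of-regexes model is an assumption made for illustration only.
    """

    def __init__(self, rules):
        self.rules = dict(rules)

    def extend(self, **rules):
        # Monotonic: add constructs to an existing syntax.
        return Syntax({**self.rules, **rules})

    def without(self, *names):
        # Non-monotonic: remove constructs, yielding a restricted sub-language.
        return Syntax({k: v for k, v in self.rules.items() if k not in names})

    def compose(self, other):
        # Join two syntaxes; on a name clash the right-hand rule wins.
        return Syntax({**self.rules, **other.rules})

    def tokenize(self, text):
        pattern = re.compile(
            "|".join(f"(?P<{name}>{rx})" for name, rx in self.rules.items())
        )
        tokens, pos = [], 0
        while pos < len(text):
            if text[pos].isspace():
                pos += 1
                continue
            match = pattern.match(text, pos)
            if not match or match.end() == pos:
                raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
            tokens.append((match.lastgroup, match.group()))
            pos = match.end()
        return tokens


host = Syntax({"number": r"\d+", "ident": r"[a-z]+", "plus": r"\+", "times": r"\*"})
data = host.without("ident", "times")  # a data language: literals and '+' only
```

With these definitions, `host.tokenize("x + 2 * 3")` succeeds, while `data.tokenize("x + 2")` raises `SyntaxError` because identifiers were removed from the restricted syntax.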

In 2007, I was pursuing adaptive grammars, which are grammars that can change based on the contents of whatever is being parsed. In particular, I was favoring Christiansen Grammars. Since then, I’ve become less enamored of this adaptiveness – it seems to be a lot of complexity for some relatively marginal benefits, and you could only take advantage of it assuming relatively monolithic modules.

My current design declares the language at the top of each module. The IDE imports the appropriate language module (which it must build, unless it’s the built-in bootstrap language), and uses this to parse the module. The syntaxes won’t be adaptive, but developers are able to build new language modules for specialized modules – the goal being to keep modules relatively small and specialized. For example, if developing a video game, your program may have different modules and syntaxes for csound music, level construction, rulebooks, UI forms, dialog trees, enemy AI, and animations.

I decided to adopt Joe Armstrong’s ML9 format, which allows me to define a whole set of modules per file (or even stream modules). Each ‘@’ section begins a new module, with the first line declaring its language:

Header text here (ignored)
@foo
a module written in foo language goes here.
modules are often very short, and the 'foo'
language might capture a lot of boiler-plate
overhead.
@bar { baz:true, qux:42 }
language modules may be parameterized,
but only with static values. In this case, baz
and qux represent optional language features.
@rem
I expect to have one built-in 'comment' language,
too. Modules written in this language simply don't
contribute to the project.
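
The layout above can be split into modules with very little code. The following Python sketch is only an approximation inferred from the sample, not an implementation of the ML9 specification: the `Module` record, the `split_modules` and `parse_params` helpers, and the assumption that parameters are a flat `key:value` list of booleans, integers, or bare words are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Module:
    language: str                               # name after '@', e.g. "foo"
    params: dict = field(default_factory=dict)  # static language parameters
    body: str = ""                              # raw text, parsed later by the language


# A header line is '@name' optionally followed by '{ ... }' (assumed shape).
HEADER = re.compile(r"^@(\w+)\s*(?:\{(.*)\}\s*)?$")


def parse_params(text):
    """Parse 'baz:true, qux:42' into a dict of static values."""
    params = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        key, _, raw = item.partition(":")
        raw = raw.strip()
        if raw in ("true", "false"):
            value = raw == "true"
        elif re.fullmatch(r"-?\d+", raw):
            value = int(raw)
        else:
            value = raw  # anything else is kept as a bare string
        params[key.strip()] = value
    return params


def split_modules(source):
    """Split a multi-module file into Module records, dropping 'rem' modules."""
    modules, current, lines = [], None, []
    for line in source.splitlines():
        header = HEADER.match(line)
        if header:
            if current is not None:
                current.body = "\n".join(lines)
                modules.append(current)
            current = Module(header.group(1), parse_params(header.group(2) or ""))
            lines = []
        elif current is not None:
            lines.append(line)
        # Lines before the first '@' header are header text and are ignored.
    if current is not None:
        current.body = "\n".join(lines)
        modules.append(current)
    return [mod for mod in modules if mod.language != "rem"]
```

Note that the splitter never looks inside a module body. That is deliberate: the body can only be interpreted once the declared language module has been found and built.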

Modules are linked in terms of content code-search (i.e. find me a module that exports ‘quicksort’). So there is no direct variable or name sharing between modules – importantly, this allows us to reuse a module without a whole forest of dependencies. A ‘foo’ language module would simply be a module that exports ‘language:foo’ along with appropriate grammar and transform functions into the module’s AST. I’ll discuss the linker and discovery mechanisms in a later post.
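
As a rough sketch of what linking by export search might look like, the Python below keeps a registry of compiled modules and looks them up by the names they export. Everything here is an assumption for illustration: the `Registry` and `CompiledModule` types, the rule that a lookup must match exactly one module, the `grammar` and `transform` keys, and the toy `words` language are hypothetical, and the actual linker and discovery mechanisms may differ substantially.

```python
from dataclasses import dataclass, field


@dataclass
class CompiledModule:
    exports: dict = field(default_factory=dict)  # export name -> exported value


class Registry:
    """Finds modules by what they export, never by a module name."""

    def __init__(self):
        self.modules = []

    def add(self, module):
        self.modules.append(module)

    def find(self, name):
        matches = [m.exports[name] for m in self.modules if name in m.exports]
        if len(matches) != 1:
            # Assumed policy: a missing or ambiguous export is an error.
            raise LookupError(f"{len(matches)} modules export {name!r}")
        return matches[0]


def words_language():
    """Hypothetical language module: each non-blank line becomes a list of words."""

    def grammar(body):
        return [line.split() for line in body.splitlines() if line.strip()]

    def transform(tree):
        return {"type": "words", "lines": tree}

    return CompiledModule(
        {"language:words": {"grammar": grammar, "transform": transform}}
    )


def compile_module(registry, language, body):
    """Look up the declared language by its export, then parse and transform."""
    lang = registry.find(f"language:{language}")
    return lang["transform"](lang["grammar"](body))


registry = Registry()
registry.add(words_language())
registry.add(CompiledModule({"quicksort": sorted}))  # stand-in implementation
```

Here `registry.find("quicksort")` returns the exported function without the client ever naming the module that provides it, and `compile_module(registry, "words", text)` resolves the language declared at the top of a module in the same way.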

3 Responses to User-Defined Syntax

Monotonic means you can only `add` to something. I.e. one might consider the ability to extend a language with new operators `monotonic`. In context, non-monotonic syntax means also having the ability to remove syntactic constructs, e.g. to block access to certain keywords and operators and functions.