Safely Composable Type-Specific Languages

Programming languages often include specialized syntax for common datatypes (e.g. lists) and some also build in support for specific specialized datatypes (e.g. regular expressions), but user-defined types must use general-purpose syntax. Frustration with this causes developers to use strings, rather than structured data, with alarming frequency, leading to correctness, performance, security, and usability issues. Allowing library providers to modularly extend a language with new syntax could help address these issues. Unfortunately, prior mechanisms either limit expressiveness or are not safely composable: individually unambiguous extensions can still cause ambiguities when used together. We introduce type-specific languages (TSLs): logic associated with a type that determines how the bodies of generic literals, able to contain arbitrary syntax, are parsed and elaborated, hygienically. The TSL for a type is invoked only when a literal appears where a term of that type is expected, guaranteeing non-interference. We give evidence supporting the applicability of this approach and formally specify it with a bidirectionally typed elaboration semantics for the Wyvern programming language.
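
To make the dispatch idea concrete, here is a minimal Python sketch, not Wyvern and not the paper's formal semantics. It assumes a hypothetical registry `TSL` that maps each type to its literal parser, and a hypothetical `elaborate` function that is handed the expected type together with the raw literal body. In Wyvern this selection happens statically during typechecking, and elaboration is hygienic; the sketch does it at run time and does not model hygiene at all.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import urlsplit


@dataclass(frozen=True)
class URL:
    scheme: str
    host: str
    path: str


@dataclass(frozen=True)
class Color:
    r: int
    g: int
    b: int


def parse_url(body: str) -> URL:
    """Parser for URL literals (illustrative only)."""
    parts = urlsplit(body.strip())
    return URL(parts.scheme, parts.netloc, parts.path)


def parse_color(body: str) -> Color:
    """Parser for colour literals of the form #rrggbb (illustrative only)."""
    text = body.strip().lstrip("#")
    if len(text) != 6:
        raise ValueError(f"expected six hex digits, got {body!r}")
    return Color(int(text[0:2], 16), int(text[2:4], 16), int(text[4:6], 16))


# Hypothetical registry: exactly one literal parser per type.
TSL: Dict[type, Callable[[str], object]] = {
    URL: parse_url,
    Color: parse_color,
}


def elaborate(expected_type: type, body: str) -> object:
    """Parse a generic literal body using only the expected type's parser."""
    try:
        parser = TSL[expected_type]
    except KeyError:
        raise TypeError(f"no literal syntax registered for {expected_type.__name__}") from None
    return parser(body)
```

Because the parser is chosen by the expected type alone, the two grammars are never merged, so adding a third entry to the registry cannot make an existing literal ambiguous.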

I've also been thinking about this problem, where so many data types get second-class treatment in the syntax. But the solution I'm pursuing is very different from TSLs. It might be interesting to compare and contrast a little.

I was inspired in part by Pure Data's graphical data structures and by Thomas Lord's compelling concept for gesture-based programming. I've also been interested in embedding interactive widgets in code, which seems especially useful for live programming but is still useful in other contexts. I'm intrigued by opportunities such as 'playing' a music literal, rotating and manipulating a 3D mesh literal, or having literals visually represent levels in a 2D platformer game. Potentially, most uses of 'separate files and tools' (e.g. for textures, meshes, music, world data, scripts in a video game) could be handled by a sufficiently rich literals language.

My idea is basically this: to embed an object directly in the source code, which may be asked to render itself (using something like HTML DOM) and which provides methods for its own update.

However, these 'embedded literal objects' [1] should respect other features we attribute to literal strings. For example, literals are easily copied and pasted, and subject to conventional version control mechanisms. This requires a language to encode embedded literal objects in a serializable format - a language that is easily recognized by integrated development environments for special render and update behaviors. The idea of TSLs doesn't seem incompatible. We could potentially create a supertype or wrapper for objects and a subset of TSLs that an IDE can recognize. In any case, it must be a language from which we may efficiently extract value.
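
As a rough illustration of those requirements, the Python sketch below models one such literal. Everything in it is an assumption made for the example: the `ColorLiteral` class, its `render`, `update`, `to_source` and `from_source` methods, the JSON payload, and the `<<literal:` ... `>>` delimiters are all hypothetical stand-ins, not part of any existing language or IDE.

```python
import json
from dataclasses import dataclass
from html import escape

# Hypothetical delimiters an IDE could scan for in source text.
OPEN, CLOSE = "<<literal:", ">>"


@dataclass(frozen=True)
class ColorLiteral:
    r: int
    g: int
    b: int

    def render(self) -> str:
        """Return an HTML fragment an editor could show in place of the text."""
        hex_code = f"#{self.r:02x}{self.g:02x}{self.b:02x}"
        return f'<span style="background:{hex_code}">{escape(hex_code)}</span>'

    def update(self, channel: str, value: int) -> "ColorLiteral":
        """Return a new literal with one channel changed (the old one is untouched)."""
        if channel not in ("r", "g", "b"):
            raise ValueError(f"unknown channel {channel!r}")
        if not 0 <= value <= 255:
            raise ValueError("channel value must be in 0..255")
        fields = {"r": self.r, "g": self.g, "b": self.b}
        fields[channel] = value
        return ColorLiteral(**fields)

    def to_source(self) -> str:
        """Serialize to plain text so copy/paste and version control keep working."""
        payload = json.dumps({"r": self.r, "g": self.g, "b": self.b}, sort_keys=True)
        return OPEN + payload + CLOSE

    @staticmethod
    def from_source(text: str) -> "ColorLiteral":
        """Recover the literal from its serialized form."""
        if not (text.startswith(OPEN) and text.endswith(CLOSE)):
            raise ValueError("not an embedded literal")
        data = json.loads(text[len(OPEN):-len(CLOSE)])
        return ColorLiteral(data["r"], data["g"], data["b"])
```

Sorting the keys keeps the serialized text deterministic, so an edit made through a widget shows up as a small, stable diff.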

For my own projects, I might simply embed a string of Awelon bytecode between Unicode white square brackets 〚〛 (U+301A, U+301B), where the bytecode models an object with multiple update and query methods. An interesting consequence is that, even after compiling to bytecode (modulo partial evaluation), I can recognize, render, and update the embedded literals. I like the idea of portable, reusable, composable, mostly opaque software components (for a code-as-material metaphor) that occasionally come with built-in widgets and documentation for tweaking their parameters or behavior.

SRFI 108 defines a generic lexical syntax for Scheme objects. There are a lot of bells and whistles, but the idea is that something like &URI{http://example.com/} is translated into ($construct$:URI "http://example.com/"). The programmer must provide or import a macro or procedure definition of $construct$:URI. $ is a valid Scheme identifier character, but it is rarely used. Because the translation scheme is fixed, there is no phasing problem: the definition of $construct$:URI does not have to be available to the parser. But because Scheme allows macros, it's possible for the compiler to do arbitrary things with the argument at compile time.
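
The fixed translation can be imitated with a short Python sketch. This is only an approximation of the simplest form shown above, not an implementation of SRFI 108: the `translate` and `scheme_string` helpers and the regular expression are assumptions made for illustration, and the sketch ignores nested braces, the extended forms, and the need to skip string literals and comments in real Scheme source.

```python
import re

# Matches the simplest form only: &NAME{body} with no nested braces.
_NAMED_LITERAL = re.compile(r"&([A-Za-z][A-Za-z0-9_-]*)\{([^{}]*)\}")


def scheme_string(text: str) -> str:
    """Quote text as a Scheme string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def translate(source: str) -> str:
    """Rewrite each &NAME{body} into ($construct$:NAME "body")."""

    def expand(match: "re.Match[str]") -> str:
        name, body = match.group(1), match.group(2)
        return f"($construct$:{name} {scheme_string(body)})"

    return _NAMED_LITERAL.sub(expand, source)


if __name__ == "__main__":
    print(translate("(define home &URI{http://example.com/})"))
```

The rewrite never looks up what `$construct$:URI` means, which is the point made above: the reader-level step is purely syntactic, and the binding is resolved later like any other identifier.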

Extensions to this basic scheme allow the inclusion of arbitrary expressions in the argument which are evaluated in their lexical context.