Monday, September 29, 2014

A DOMain of Shadows

One of the advantages of an internal DSL over an external one is that you can leverage the full power of a general purpose programming language. If you create an external DSL, you may need to reinvent a slew of mechanisms that a good general purpose language would have provided you: things like modularity, inheritance, control flow and procedural abstraction.

In practice, it is unlikely that the designer of the DSL has the resources or the expertise to reinvent and reimplement all these, so the DSL is likely to be somewhat lobotomized. It may lack the facilities above entirely, or it may have very restricted versions of some of them. These restricted versions are mere shadows of the real thing; you could say that the DSL designer has created a shadow world.

I discussed this phenomenon as part of a talk I gave at Onward in 2013. This post focuses on a small part of that talk.

Here are three examples that might not always be thought of as DSLs at all, but definitely introduce a shadow world.

Shadow World 1: The module system of Standard ML.

ML modules contain type definitions. To avoid the undecidable horrors of a type of types, ML is stratified. There is the stratum of values, which is essentially a sugared lambda calculus. Then there is the stratum of modules and types. Modules are called structures, and are just records of values and types. They are really shadow records, because at this level, by design, you can no longer perform general purpose computation. Of course, being a statically typed language, one wants to describe the types of structures. ML defines signatures for this purpose. These are shadow record types. You cannot use them to describe the types of ordinary variables.

It turns out one still wants to abstract over structures, much as one would over ordinary values. This is necessary when one wants to define parameterized modules. However, you can’t do that with ordinary functions. ML addresses this by introducing functors, which are shadow functions. Functors can take and return structures, typed as signatures. However, functors cannot take or return functors, nor can they be recursive, directly or indirectly (otherwise we’d be back to the potentially non-terminating compiler the designers of ML were trying so hard to avoid in the first place).

This means that modules can never be mutually recursive, which is unfortunate since this turns out to be a primary requirement for modularity. It isn’t a coincidence that we use circuits for electrical systems and communication systems, to name two prominent examples.

It also means that we can’t use the power of higher order functions to structure our modules. Given that the whole language is predicated on higher order functions as the main structuring device, this is oddly ironic.

There is a lot of published research on overcoming these limitations. There are papers about supporting restricted forms of mutual recursion among ML modules. There are papers about allowing higher-order functors. There are papers about combining them. These papers are extremely ingenious and the people who wrote them are absolutely brilliant. But these papers are also mind-bogglingly complex.

I believe it would be much better to simply treat modules as ordinary values. Then, either forego types as module elements entirely (as in Newspeak) or live with the potential of an infinite loop in the compiler. As a practical matter, you can set a time or depth limit in the compiler rather than insist on decidability. I see this as a pretty clear cut case for first class values rather than shadow worlds.
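To make the alternative a little more concrete, here is a minimal sketch of modules as ordinary values. It is written in Python rather than Newspeak or ML, purely for illustration, and every name in it (make_stack_module, make_logging, make_even, make_odd) is hypothetical. A parameterized module is just a function that returns a record; a function over module factories plays the role of a higher-order functor; and two modules can be mutually recursive by referring to each other lazily.

```python
from types import SimpleNamespace

def make_stack_module(empty_error=IndexError):
    """A 'module' is a record of functions, built by an ordinary function."""
    def new():
        return []
    def push(stack, item):
        return stack + [item]
    def pop(stack):
        if not stack:
            raise empty_error("pop from empty stack")
        return stack[:-1], stack[-1]
    return SimpleNamespace(new=new, push=push, pop=pop)

def make_logging(module_factory, log):
    """Higher-order: takes a module factory and returns a new module factory."""
    def factory(*args, **kwargs):
        base = module_factory(*args, **kwargs)
        def push(stack, item):
            log.append(("push", item))
            return base.push(stack, item)
        return SimpleNamespace(new=base.new, push=push, pop=base.pop)
    return factory

def make_even(get_odd):
    """Depends on an 'odd' module, looked up only when is_even is called."""
    def is_even(n):
        return True if n == 0 else get_odd().is_odd(n - 1)
    return SimpleNamespace(is_even=is_even)

def make_odd(get_even):
    """Depends on an 'even' module, looked up only when is_odd is called."""
    def is_odd(n):
        return False if n == 0 else get_even().is_even(n - 1)
    return SimpleNamespace(is_odd=is_odd)

# Mutually recursive modules: each one receives a thunk for the other.
even = make_even(lambda: odd)
odd = make_odd(lambda: even)

# A parameterized module wrapped by a higher-order "functor".
log = []
Stack = make_logging(make_stack_module, log)()
s = Stack.push(Stack.new(), 42)
rest, top = Stack.pop(s)
```

Nothing here needs a separate module language: parameterization, higher-order structure and recursion all come from ordinary functions. What the sketch does not address is types as module members, which is exactly where the trade-off described above comes in.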

Shadow World 2: Polymer

Polymer is an emerging web standard that aims to bring a modicum of solace to those poor mistreated souls known as web programmers. In particular, it aims to allow them to use component based UIs in a standardized way.

In the Polymer world, one can follow a clean MVC style separation of views from controllers. The views are defined in HTML, while the controllers are defined in an actual programming language - typically JavaScript, but one can also use Dart and there will no doubt be others. All this represents a big step forward for HTML, but it remains deeply unsatisfactory from a programming language viewpoint.

The thing is, you can’t really write arbitrary views in HTML. For example, maybe your view has to decide whether to show a UI element based on program logic or state. Hence you need a conditional construct. You may have heard of these: things like if statements or the ?: operator. So we have to add shadow conditionals.

<template if="{{usingForm}}">

is how you’d express

if (usingForm) someComponent;

In a world where programmers cry havoc over having to type a semicolon, it’s interesting how people accept this. However, it isn’t the verbose, noisy syntax that is the main issue.

The conditional construct doesn’t come with an else or elsif clause, nor is there a switch or case. So if you have a series of alternatives such as

if (cond1) {ui1}

else if (cond2) {ui2}

else {ui3}

You have to write

<template if = "{{cond1}}">

<ui1>

</template>

<template if = "{{cond2 && !cond1}}">

<ui2>

</template>

<template if = "{{!cond1 && !cond2}}">

<ui3>

</template>
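For contrast, here is a rough sketch of the same three-way choice in a general purpose language. Python is used only for illustration; choose_view, template_style and the ui1, ui2, ui3 strings are hypothetical placeholders, not Polymer API. The second function mimics three independent template blocks, where every guard is evaluated on its own and so each later guard has to repeat the negation of the earlier ones.

```python
from itertools import product

def choose_view(cond1, cond2):
    """Ordinary if / elif / else: each condition is stated exactly once."""
    if cond1:
        return "ui1"
    elif cond2:
        return "ui2"
    else:
        return "ui3"

def template_style(cond1, cond2):
    """Emulates three separate <template if> blocks with mutually exclusive guards."""
    shown = []
    if cond1:
        shown.append("ui1")
    if cond2 and not cond1:
        shown.append("ui2")
    if not cond1 and not cond2:
        shown.append("ui3")
    return shown

# The two formulations agree on every combination of inputs.
for c1, c2 in product([False, True], repeat=2):
    assert template_style(c1, c2) == [choose_view(c1, c2)]
```

The two agree only because the guards were negated by hand. With a fourth alternative, the last guard needs three negations, and a mistake in any of them silently shows two elements or none.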

A UI might have to display a varying number of elements, depending on the size of a data structure in the underlying program. Maybe it needs to repeat the display of a row in a database N times, depending on the amount of data. We use loops for this in real programming. So we now need shadow loops.

<template repeat = "{{task in current}}">

There’s also a for loop

<template repeat= "{{ foo, i in foos }}">

Of course one needs to access the underlying data from the controller or model, and so we need a way to reference variables. So we have shadow variables like

{{usingForm}}

and shadow property access.

{{current.length}}
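The same holds for repetition and data access. As a sketch under the same assumptions (Python for illustration only; Task and render_tasks are made-up names), the loop, the index, the variable reference and the property access below are simply the host language's own constructs.

```python
from dataclasses import dataclass
from html import escape

@dataclass
class Task:
    title: str
    done: bool = False

def render_tasks(current):
    """Build an HTML fragment for a list of tasks using ordinary control flow."""
    rows = []
    # Plays the role of repeat="{{ foo, i in foos }}".
    for i, task in enumerate(current):
        mark = "x" if task.done else " "   # an ordinary conditional expression
        rows.append(f"<li>{i}: [{mark}] {escape(task.title)}</li>")
    # Plays the role of {{current.length}}.
    header = f"<p>{len(current)} tasks</p>"
    return header + "<ul>" + "".join(rows) + "</ul>"

print(render_tasks([Task("write post"), Task("fix <template>", done=True)]))
```

String concatenation is of course a crude way to build a view; the point is only that iteration and property access need no second language.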

Given that we are building components, we need to use components built by others, and the conventional solution to this is imports. And so we add shadow imports.

<link rel = "import" href = "...">

UI components are a classic use case for inheritance, and Polymer components can be derived from each other, starting with the predefined elements of the DOM, via shadow inheritance. It is only a matter of time before someone realizes they would like to reuse properties from other components in different hierarchies via shadow mixins.

By now we’ve defined a whole shadow language, represented as a series of ad hoc constructions embedded in string-valued attributes of HTML. A key strength of HTML is supposed to be ease-of-use for non-programmers (this is often described by the meaningless phrase declarative). Once you have added all this machinery, you’ve lost that alleged ease of use - but you don’t have a real programming language either.

Shadow World 3: Imports

Imports themselves are a kind of shadow language even in a real programming language. Of course imports have other flaws, as I’ve discussed here and here, but that is not my focus today. Whenever you have imports, you find demands for conditional imports, for an aliasing mechanism (import-as), and for a form of iteration (wildcards). All these mechanisms already exist in the underlying language and yet they are typically unavailable because imports are second-class constructs.
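As a small illustration, consider what happens once a module is an ordinary value. The sketch below uses Python's standard importlib.import_module; the load helper and the use_fast_path flag are hypothetical. Conditional import, aliasing and iteration then need no dedicated import syntax, because the language's own conditionals, assignment and loops already do the job.

```python
import importlib

def load(name):
    """Return the module object for `name`: an ordinary function returning an ordinary value."""
    return importlib.import_module(name)

# Conditional import: an ordinary conditional expression.
use_fast_path = False  # hypothetical configuration flag
serializer = load("pickle" if use_fast_path else "json")

# Aliasing (import-as): plain assignment to a name of your choice.
js = load("json")

# Iteration (wildcards): an ordinary comprehension over the names you want.
wanted = ["math", "string", "json"]
modules = {name: load(name) for name in wanted}

print(serializer.dumps({"shadow": False}))
print(modules["math"].sqrt(9))
```

Python still keeps a separate import statement with its own as and * forms, so this is not a claim about how imports ought to be designed, only a demonstration that the demanded features are ones the base language already has.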

Beyond Criticism

It is very easy to criticize other people’s work. To quote Mark Twain:

I believe that the trade of critic, in literature, music, and the drama, is the most degraded of all trades, and that it has no real value

So I had better offer some constructive alternative to these shadow languages. With respect to modularity, Newspeak is my answer. With respect to UI, something along the lines of the Hopscotch UI framework is how I’d like to tackle the problem. In that area, we still have significant work to do on data binding, which is one of the greatest strengths of Polymer. In any case, I plan to devote a separate post to show how one can build an internal DSL for UI inside a clean programming language.

The point of this post is to highlight the inherent cost of going the shadow route. Shadow worlds come into being in various ways. One way is when we introduce second class constructs because we are reluctant to face up to the price of making something a real value. This is the case in the module and import scenarios above. Another way is when one defines an external DSL (as in the HTML/Polymer example). In all these cases, one will always find that the shadows are lacking.

21 comments:

Given how limited the control logic on <template> is, I think the Polymer devs didn't want to allow people to put heavy logic in the HTML. From what I see, it's supposed to be just a thin display layer.

Very timely article. Just this weekend I decided that Polymer isn't for me. I prefer a fully programmatic solution (i.e. use plain Dart with a simple library that implements some MVC patterns and lightweight Dart widget classes sitting on top of the DOM).

I don't use .NET anymore, but I think Microsoft's approach with the Razor engine offers the best of both worlds. When you need control flow or access to data, you drop into actual code inside a <@ @>, which is then compiled alongside the HTML into a view. Polymer already lends itself to this pre-compiled view paradigm, so using a Razor approach where you mix (let's say Dart) code with HTML should be possible.

I like the approach used in JSX and React where HTML tags are syntactic sugar for function calls. This allows you to use arbitrary JavaScript in a function that acts as an HTML template with no shadows needed. It's really too bad Dart doesn't have this.

George, you don't have to use Polymer, but if you're making UI components, I'd still use custom elements and shadow DOM (an entirely different type of shadow than the focus of this piece). The APIs are very usable even without the sugar that Polymer gives you, and then your components will be freely mixable with other elements from Polymer or projects like Mozilla's Brick component library.

Brian, Dart, and the web now do have this with custom elements. A tag for a custom element is not unlike a function invocation, except that it invokes several lifecycle methods that end up being important for interacting with the document. If you wanted to wrap a function that just produced DOM into a custom element that only implemented the created callback and pushed that DOM into its shadow tree, that would be trivial.

Justin: Actually, I am not sure about Shadow DOM either; it seems that it will not be supported in Safari any time soon. If the support were broader I would love to use it (it's the template and data binding things that I am not sure I like).

You say a custom tag is like a function call. When I use a custom element tag, it seems to me more like traditional constructor invocation. It isn't easy to use it like a factory method, for example. Which is another illustration of the point of the post.

Lisp has most informed my own sensibilities about that leap between outside-in data manipulation and implementing a full-blown interpreter with environments and extensible primitives (inside-in manipulation capability).

A programming environment seems the ultimate device for ascribing meaning to data (or data via syntax, in the case of Polymer and dozens of similar systems). Type and module systems like those of ML seem like data with disjoint, ad-hoc semantics (relative to the language they support) in perpetual want of full program power.

In the spirit of creating instead of hating, I humbly introduce Hoplon, a reification of my own thinking that several collaborators and I came up with, at http://hoplon.io. Thanks in advance for giving it a look - I welcome any feedback you might have.

An overview of the evaluation semantics we apply to HTML, making it a Lisp, which specifically address the issues you outline in this post, is at https://groups.google.com/d/msg/clojure/gRFyzvRfPa8/QY_HvjaVfvUJ

We think it might be better, but know it will never be popular. In the meantime, I will definitely investigate Hopscotch.

Another thriving shadow world is Java annotations, which evolve into more and more complex constructs, either by stacking annotations, nesting them, or embedding various expression languages as string-typed parameters.

And that's also where languages like Scala get some of their popularity: it is practical to replace the use-cases of annotations with purely-Scala constructs.

Going a bit further, a lot of frameworks are in fact creating such shadow worlds. They make some things easier, but put a lot of constraints on what's possible to express (using the limited "language" that the framework authors envisioned).

And they often make the easy problems easier, but the hard problems even harder, resulting in code which bypasses the framework mechanisms in very smart ways.

It also works the other way around -- some actions are only available in shadow worlds of limited DSLs and not in the language they are embedded into. For example, C++/Java catch specifications are essentially a typecase (or a simple pattern match on types), but the actual languages lack this construct.

The problem of "shadow domains" is one of the fundamental issues in notational engineering, and has been recognized for a long time. (Eg, see Landin, 1966, "The Next 700 Programming Languages".) But it's good to have a catchy name to describe the problem: thanks for contributing one.

However, if you are trying to design a highly expressive general purpose language that is also practical and usable, then the problem of avoiding shadow domains within the language is quite difficult, and I am not aware that anybody has solved the problem.

I want to call you out on this: "I believe it would be much better to simply treat modules as ordinary values. Then ... live with the potential of an infinite loop in the compiler. As a practical matter, you can set a time or depth limit in the compiler rather than insist on decidability. I see this as a pretty clear cut case for first class values rather than shadow worlds."

This idea has been around for a long time, but I'm not aware that anybody has figured out how to make it practical and usable. Luca Cardelli wrote a theory paper, "A Polymorphic λ-calculus with Type:Type", in 1986. The Cayenne programming language implemented these ideas, but as I understand it, the result isn't usable. The type checker is happy to reduce any function, even non-total functions, and it's far too easy to accidentally get the type checker into an infinite loop, just by having non-total functions in your program, even if they don't cause run-time problems.

Dynamically typed languages don't fare any better than statically typed languages. Scheme also has a problem with shadow domains. Scheme has macros, which are not first class values, and if you try to do abstraction around macros, then you end up building another shadow domain. A few examples would help. In Scheme 'if' is a macro, meaning it's not a first class value. You can't pass 'if' as an argument to a function, but you can pass it as a macro argument: macros are shadows of functions. (In Haskell, 'if' is first class. This is ironic if you subscribe to the school of thought that dynamic languages have less of a problem with shadow domains than statically typed languages.) In Scheme, 'set!' is a macro. The first argument of 'set!' is a variable name, but there's no obvious way to do something like this: (set! (if C var1 var2) 42). Any solution requires re-inventing the wheel in the shadow domain of macros. By contrast, in C I can write *(C ? &var1 : &var2) = 42;.

I've come across these issues many times. The worst offender I've seen is a CMS I used to maintain called ocPortal which contains many nested languages: PHP for "real" programming, "Tempcode" for templating, "Comcode" for content, "safe Comcode" for user-generated content and HTML at the bottom (plus a smattering of SQL, and admins could access a crude Unix-like terminal too).

It also reminds me of the "inner platform effect" ( http://en.wikipedia.org/wiki/Inner-platform_effect ), where some rudimentary extension/customisation mechanism grows and grows until it's a "shadow" version of the main application.

@doug There is a subtle distinction between first-class types (which imply a "type-of-types") and "type : type": the former requires evaluation to occur at compile time, which may get stuck in an infinite loop, whilst the latter is actually *inconsistent*.

Even though infinite loops can also cause inconsistency, the compiler won't actually generate an invalid/unsafe program: it will just fail to generate *any* program!

Note that infinite loops *can* cause inconsistency in the presence of optimisations: for example, we can write a partial halting oracle which runs a given program then returns "true"; if the program never halts, we never reach the return statement. However, an optimising compiler may notice that the argument program's return value is never used, and hence optimise-away the call, and cause our oracle to immediately return "true" for everything.

On the other hand, "type : type" says that the 'type of types' is its own type. For example, the type of "5" is "int", the type of "int" is "type" and the type of "type" is "type". This allows self-referential paradoxes, like Russell's paradox (is "the set of sets which don't contain themselves" a member of itself?). This can be mitigated by numbering the types, so "5" has type "int", "int" has type "type0", "type0" has type "type1" and so on.

As for the practicality of this, Idris seems to come closest. Agda and Coq are interesting, but more suited for formal Maths than "real" programming.