Domain Specific Languages

This page discusses Domain Specific Languages, providing categories and information about how DSLs are engineered (or Life After Parsing).
It shows how DSLs are key to enabling
and extremely effective class of program generation technology based on program transformation, tracing the origins back to a 1970s tool called Draco. A modern code generation engine, the
DMS Software Reengineering Toolkit, uses
the Draco concepts to process many DSLs effectively. Surprisingly,
DSLs are fundamental to implementing this class of generators;
DSLs are used to support DSLs!

A typical procedural computer language is general purpose: it can solve arbitrary problems ("Turing capable") but is basically clumsy for almost everything (ask any programmer).
A domain-specific language
or DSL is a notation system (and corresponding semantics) that allows the succint
description of a problem or a solution in a narrow problem ("domain") area.
The notation takes advantage of the existence of a community of philsophers and engineers
who have struggled to characterize a problem area and found such a notation
that is effective, and solutions that can be produced for instances of the notation. Such notations may be text-like (mathematics, Ascii text, Unicode) or graphical (varieties of geometric shapes linked by various kinds of arcs), often annotated with text (from another DSL!)

A Zoo of DSLs

There are many existing DSLs, and people design new ones
quite frequently. Here are some examples that existed well before computers:

George Boole's algebra for writing down logic equations

Leibniz' Calculus (he and Newton disagreed on the notation and Newton eventually won)

Readers that get to this point are likely DSL inventors, or live in close proximity to one.

Our particular perspective treats "typical computer languages" as just another kind of special purpose domain: general purpose computing. From this perspective, these languages are just notational systems with semantics with associated solutions ("compilers"), and viewed from afar, look just like all the other kinds of DSLs.

COBOL, a language designed for business purposes, that by an large runs the corporations of the world

Algol, one of early, cleanly designed procedural languages allowing call-by-name and recursion

LISP, an early dynamic language, with the ability to manipulate its own structures,
still favored by many AI programmer

C, the language at the bottom of most extant operating systems and many embedded devices

C++, used to build large complex applications and a favorite in the stock trading business

Java, a failed attempt at an embedded language that succeded as run-anywhere enterprise application language

C#, Microsoft's attempt to overrun Java and succeed at world domination

Ruby, a hot new dynamic language used for building complex web applications

Domain Analysis

Somewhere, somehow, these DSLs get invented. Breifly,
DSLs are motivated by the observation by a small set of individuals,
who see a problem area, recognize that problems or solutions in
that area all have some kind of commonality, and find a way to write
down the commonality and the variability; a DSL is born.
This process is necessarily messy; it is the scientific process applied
to language design and usually involves a lot of guess- and re-work.
Most of us are lucky enough to get to see and use the final product without
the intermediate pain.
This process is called domain analysis.
You can read more about how to approach this is a systematic manner in Guillermo Arango's papers, including
Domain Analysis Concepts and Research Directions.

Domain Engineering

For a DSL to become useful, ultimately it needs to acquire a stable notation, semantics, and in this modern day, tools to analyze the DSL, or to generate solutions based on the problem statement the DSL poses.

A classic way to capture the notation ("syntax") for text languages is
BNF (Backus-Naur Form),
which is itself a DSL for describing syntax! Graphical languages can be described by
Graph Grammars.

With these, all that is left is building machinery that can accepts instances
for the formal language, and somehow process that instance into a solution,
taking advantage knowledge of the semantics of the language.
The classical approach for this is to build a compiler.

Now we get to a technology trap: building compilers is hard. We are
stuck: we have a great way to write down problems... and it doesn't do us
any good without the compiler.

Generic Program Transformation Engines

Draco

The term domain analysis came from Dr. James Neighbors in
the late 1970s. He wanted to build sets of cooperating DSLs and faced
the compiler problem. Fortunately,
program transformation technology was
being developed at the time, as a kind of generalized compiler. What program
transformations do is transforms structures (think "graphs") incrementally
into other structures; if you apply enough transforms, you can cause
arbitrarily complex changes to the result (think optimization and code generation).
So if you can represent your DSL as a structure,
then you can use a program transformation system to manipulate it.
Neighbors married metacompiler technology (in particular parser-generator generators)
with program transformation systems to enable
his Draco system
to accept a variety of DSLs, and generate complete application systems from this.
The structures he happened to use are Abstract Syntax Trees because string-parsers can produce these easily, but the idea applies
equally well to graphical (e.g, graph-based) languages.

Draco is hardly the first application code generator system constructed.
People have wanted these since time began. But Draco
was one whose ideas are extremely well founded in software engineering
and computer science techology, and whose elements acted with incredible synergy.
Better, the Draco elements were reusable across many domains,
and thus solves much of the compiler problem by amortizing the cost
of building the compiler pieces. Sure, the pieces were still hard to build,
but used across hundreds of domains implementations those pieces suddenly look cheap,
and more importantly, available for use in building the tools for the next domain.

Too bad Draco never went commercial.

DMS Software Reengineering Toolkit

This author happened to be in Neighbor's research group, and caught the DSL bug.
The DMS Software Reengineering Toolkit
was inspired by Draco, and consequently
has as its philsophical core the same concepts as Draco, with
several technology twists fundamental to its commercial success:

Extremely strong (context-free GLR) parsing technology to make it easy
to define arbitrary DSLs and generate the ASTs accurately and easily.
Once a grammar is defined to DMS, it can parse the DSL.

Classic compiler flow analysis machinery, generalized to support the needs
of arbitrary DSLs, making it possible to reason deeply about a particular DSL instance
("program"). The problem this solves is one of facts-at-a-distance;
a piece of information from one part of a large DSL instance may be arbitrarily
far away from where that fact is needed process another part of the DSL
to provide for good code generation.

Source pattern matching, and source-to-source transformations. Given
the working grammar, DMS can directly manipulate the structures.
(For an example of this,
see Algebra defined as a DMS Domain).

A direct confrontation with the problem of scale, through DMS's implementation
in a parallel computer language. Any successful computer
language, will eventually produce large, complex artifacts, that must be processed
in the context of the rest of the complex artifacts that make up the rest of
the software system. For this to work, the tool must be able to handle many
domains concurrently, reason about the set as a whole, and bring enough computer
cycles so that answers are produced in reasonable length of time. The underlying
parallel language allows DMS to harness the available manycore computers that
are now becoming widespread, to the benefit of the engineer and
timely delivery to his end users.

These pieces are all reusable across many, many domains, thus making it practical
to build DSL tools for those many domains.

The claim is that DMS is ideal for building DSLs. That claim is put to
the test in DMS itself, which is largely bootstrapped using DMS and a set
of DSLs that comprise the the core of DMS.
That list of DSLs include:

DMSLexical, a language for describe regular expressions over sequences of Unicode
text, including a "subtract" operation over regular expressions

DMSStringGrammar, which is a remarkably simply and clean EBNF notation for
describing context free grammars and certain semantic properties such
as associativity and commutativity of operators.
Proven on very tough languages such as C++.

DMSPrettyPrinter, a Latex-like language for assembling boxes of text into larger
boxes, used to regenate source text from ASTs

DMSAttributeGrammar, enabling the specification of computations over values
passed around an abstract syntax tree, and compiling down to partial-order
parallel programs in PARLANSE

PARLANSE, a task-parallel programming language with extremely fine grain (fast context switch)
tasks, and safe exception handling in the face of parallelism, to enable
the construction of very large, complex program analysis tools that
DMS is capable of expressing.

Topics

Semantic Designs- Our Goal

To enable our customers to produce and maintain timely, robust and economical software by providing world-class Software Engineering tools using deep language and problem knowledge with high degrees of automation.