Creating Your Own Domain-Specific Language

By Sebastian Zarnekow, August 23, 2011

Part 1: Writing the grammar without a course in parsing theory

Entity Grammar Step-By-Step

Let's have a closer look at the grammar notation and then develop the actual syntax for the entity language step by step. The first parser rule in a grammar is the entry point for the language. Each time the parser reads a file, it starts with the entry rule and walks through the grammar to consume the input file.

By providing meaningful names for each rule, you can ensure that the resulting object graph has an intuitive structure. The root object should be a Domainmodel. It provides access to a number of elements which are either packages, types, or imports. These three concepts will be derived from AbstractElement to make sure that we can use them in arbitrary order in our grammar.
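The listing for these first two rules is not reproduced at this point in the text. A minimal sketch that matches the description, modeled on the domain model example from the Xtext documentation, might look like the following. The feature name elements is an assumption; the rules PackageDeclaration, Type, and Import are only referenced here and are defined later.

```xtext
// Entry rule: a Domainmodel holds a list of AbstractElements.
Domainmodel:
    (elements+=AbstractElement)*;

// An AbstractElement is exactly one of three alternatives.
AbstractElement:
    PackageDeclaration | Type | Import;
```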

The += assignment is used here to denote that an instance of Domainmodel holds a list of AbstractElements. There are two other assignment operators available. The plain equals sign (=) sets a single value instead of adding to a list, while the boolean assignment (?=) sets the feature on the left-hand side to true if the right-hand side was parsed successfully.

The second interesting syntax element is the cardinality operator (*). It denotes that the syntax allows any number of AbstractElements in the body of a Domainmodel. Xtext provides an operator for each supported cardinality, as shown in Table 1.

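Table 1: Operators.

Operator      Cardinality
(none)        exactly one (the default)
?             zero or one (optional)
*             zero or more
+             one or more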

The third thing that I want to emphasize is the alternative in the body of the rule AbstractElement. The pipe symbol (|) is used here to define different valid paths in a rule (e.g., an AbstractElement is either a PackageDeclaration, a Type, or an Import). The overall syntax definition of an Xtext grammar is very close to the Extended Backus-Naur Form (EBNF) [5] and to the input format used by many parser generators. It's concise, intuitive, and easy to understand after a little exposure.

To continue with the grammar definition, we add a rule to describe the structure of a package. A PackageDeclaration is conceptually similar to the root Domainmodel except that it has a name. Because the name of a package consists of a number of segments that are delimited by a dot (.), we add a parser rule for it:
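A sketch of the two rules involved, under a few assumptions: the braces around the package body and the feature name elements are not stated in the text, and ID is taken to be the terminal rule inherited from Xtext's common terminals grammar.

```xtext
// A package has a qualified name and contains further elements.
PackageDeclaration:
    'package' name=QualifiedName '{'
        (elements+=AbstractElement)*
    '}';

// No assignments: the rule only consumes ID tokens and '.' keywords.
QualifiedName:
    ID ('.' ID)*;
```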

The rule QualifiedName is special. The language processor recognizes that it does not contain any assignments but simply consumes atomic values. That's why it is interpreted as a data type rule. Data type rules allow you to define a complex syntax for a simple type (e.g., for a string or a decimal number). They have significant advantages compared to atomic terminal rules like ID or INT: After an input sequence is parsed, the framework automatically strips superfluous whitespace along with any nested comments from the qualified name and assigns the clean value to the package's name.

The next step is to define types. The sample file illustrates that two different kinds of types exist in the domain model language. In addition to entities, there are simple data types that don't expose any structural information. The latter have a concise notation: After the keyword datatype, a name is expected, which is an atomic ID.
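As a sketch, the two kinds of types can be expressed with one dispatching rule and one rule for simple data types. The rule name DataType is an assumption; the keyword datatype and the ID name come from the description above, and Entity is defined separately.

```xtext
// A Type is either a simple data type or an entity.
Type:
    DataType | Entity;

// The keyword 'datatype' followed by an atomic ID as the name.
DataType:
    'datatype' name=ID;
```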

The rule for entities leverages several concepts already introduced in this tutorial: simple assignments, keywords, optional cardinality, and multi-value assignments. However, there is something new, too. An entity may optionally extend another entity to represent an inheritance relationship, and a cross reference is used for this. All the other assignments used so far in this grammar established a "has-a" or "contains-a" relationship. The super type of an entity is different: it is not contained in the entity. Instead, it must be defined somewhere else, and the inheriting entity simply references it. A cross reference is the grammar element that lets you describe exactly this concept. The type of the referenced object is written between square brackets. In this context, the pipe symbol does not denote an alternative but serves as a delimiter. To its right is the rule that defines how the reference appears in the concrete syntax of the language. Because entities may be referenced by their qualified name, we reuse the previously defined data type rule here.
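A possible shape for the entity rule is sketched below. The keywords entity and extends, the braces, and the feature names superType and features are assumptions chosen to match the description; the cross reference syntax [Entity|QualifiedName] is the part the paragraph above explains.

```xtext
Entity:
    // simple assignment: the entity's own name
    'entity' name=ID
    // optional cross reference: refers to an Entity defined elsewhere,
    // written in the input as a QualifiedName
    ('extends' superType=[Entity|QualifiedName])?
    '{'
        // multi-value containment: the entity owns its features
        (features+=Feature)*
    '}';
```

Note the difference between name=ID, which stores a value in the entity itself, and superType=[Entity|QualifiedName], which only stores a link that is resolved after parsing.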

However, qualified names are not the best choice for readability and editing comfort. That's why the notion of imports is added to the language. Xtext provides convenient default behavior for imports that are based on namespaces and qualified names, which is a good match for many common use cases. If a feature named importedNamespace is used in your language, the framework will process the information stored in that feature and enrich the scope accordingly when the cross references are resolved. Thanks to this convention, two new rules are sufficient to implement imports: a container object Import, and a data type rule that defines the imported namespace itself. The latter adds an optional wildcard suffix to a qualified name.
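The two rules could be sketched as follows. The feature name importedNamespace is the one the framework looks for by convention; the keyword import, the rule name QualifiedNameWithWildcard, and the '.*' suffix are assumptions that follow the domain model example in the Xtext documentation.

```xtext
// Container object: the special feature name triggers the default
// namespace-based import handling during cross reference resolution.
Import:
    'import' importedNamespace=QualifiedNameWithWildcard;

// Data type rule: a qualified name with an optional wildcard suffix.
QualifiedNameWithWildcard:
    QualifiedName '.*'?;
```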

The remaining piece is the definition of an entity's features. A feature definition is a sequence of an optional keyword many, an ID that becomes the feature's name, a colon, and a type reference. The type is again a cross reference, this time one that refers to a Type. Because cross references are polymorphic, the type of a feature can be either a simple data type or an entity.
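A sketch of the feature rule, assuming the feature names many, name, and type. It uses the boolean assignment for the optional keyword and a cross reference to the general rule Type:

```xtext
Feature:
    // many is set to true only if the keyword 'many' was consumed
    (many?='many')? name=ID ':' type=[Type|QualifiedName];
```

For illustration, an input file along the following lines should be accepted by a grammar built from the rules described in this section. All package, entity, and feature names here are hypothetical.

```
datatype String

package my.company.common {
    entity HasAuthor {
        author: String
    }
}

package my.company.blog {
    import my.company.common.*

    entity Blog {
        title: String
        many posts: Post
    }

    entity Post extends HasAuthor {
        title: String
    }
}
```

The import lets Post refer to HasAuthor by its simple name instead of my.company.common.HasAuthor.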
