Best Practices for Model-Driven Software Development

Model-driven software development (MDSD [1][2]) no longer belongs to the fringes of the industry but is being applied in more and more software projects with great success. In this article we would like to pass on our contribution to its best practices, based on the experiences we have gathered over the past few years.

Since domain-specific languages (DSLs) and code generators have been around for a while, this is naturally not the first article to offer a description of best practices in this area (see [1] and [3]). Some of these previously described practices have firmly established themselves in the meantime, while others have lost their relevance or even become obsolete. We deal with those principles that have 'stood the test of time' in the first part of this article. Moreover, other new best practices have crystallized over time that have not yet been 'pinned down'. These will be described in the second part of the article.

A vital point that is, however, ignored over and over again: for every artefact it must be clear and unambiguous whether it is generated or is to be maintained by the developer. This doesn't mean that you shouldn't use a code generator for code that is to be subsequently augmented or changed by the developer. It's just that this must be done explicitly and, above all, such code should only be generated once (see box). Code which is 100% generated should neither be changed nor checked into the repository. It is a disposable good - the important artefacts are the models.

Passive code generation, that is, one-off generation in the 'Wizard' mould, has become quite popular recently. Under the general concept of scaffolding, the first steps in new projects in almost every new (web) framework are made as simple as possible. This can be very helpful in the beginning and can greatly encourage the acceptance of the framework. However, since these initial stages constitute only a very small part of the project's life cycle, this provides only short-term assistance to the user.

A relatively widespread breach of this best practice is the use of protected regions. This refers to text regions of generated code that the generator will not overwrite. Within a protected region, a developer can carry out manual changes and the generator 'fills in the gaps'. In doing so however, the generated code is raised to the level of first class code which, for example, must be checked in (more on that later). Furthermore, the protected regions filled out by the developer must be activated or else they remain unprotected. In the heat of battle, this is easily overlooked which in turn leads to lost code, long searches for errors and a lot of irritation.

So how would this best practice look in reality? There are two variations which can also be used together quite well: First, a separate directory should be set aside for generated code (usually src-gen). In this way, a physical separation between generated and non-generated code is achieved. In addition, all generated artefacts should begin with a comment such as the following: // WARNING! GENERATED CODE, DO NOT MODIFY. Perfectionists might supplement this by adding which template the code was created with. Both options can also be used to distinguish generated files from non-generated ones in the IDE (see [5]).

Don't check-in generated code

Generated code should not be checked in for the same reasons that Java byte code or other derived artefacts shouldn't be. Derived artefacts are 100% redundant, i.e. they don't contain any non-reproducible information. Checking in such artefacts only increases the volume of data to be managed, and conflicts arise during every synchronization which are best resolved using Override and Commit. On the other hand, checking in everything might let you fetch another cup of coffee...

With protected regions, the problem becomes much greater, since in such cases everything can't be simply overwritten. It could well be that another developer has changed a protected region in the meantime.

The only time it can make sense to check-in generated code is when, for some reason, it is impossible to integrate the generator run into the build process.

Integrate the generator into the build process

Automating the build process is a measure that can significantly improve the development team's productivity. The crucial point in this regard is that the process is in fact completely automated and that no manual intervention is required. Normally, the build process consists of the phases checkout, compile, link (where necessary), deployment in the test platform and execution of the unit tests. In MDSD projects, the generation phase must be added preceding the compile phase.

It goes without saying that the build process is carried out not just locally but also on a server set aside for this purpose. This guarantees that code from all developers participating in the project is constantly being integrated (continuous integration).

Use the resources of the target platform

The concepts of the target platform can help with a variety of tasks. In many cases, an application is not derived entirely from models, but rather just the parts that can only be poorly expressed in the target language (e.g. Java).

The areas in which DSLs are typically useful deal with schematic information such as domain models or service layers. The logic also contained in these layers, on the other hand, can be expressed very well in an ordinary programming language.

Instead of using protected regions and integrating the logic into the generated code, there is the option of depositing the various aspects into different files using the mechanisms of the target language. One is generated while the other is provided by the developer adding the missing information.

In Java, for example, inheritance can be used. In many MDSD scenarios, three-level inheritance has proven its worth. Here, an abstract base class (ancestor class) is provided by a generic framework, and its attributes and behavior are inherited by a generated, similarly abstract class. The concrete class then inherits from this generated class.

Fig. 1: Three-level inheritance
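A minimal sketch of this three-level scheme in Java; the class and method names (Entity, AbstractCustomer, Customer, isCreditworthy) are invented purely for illustration:

```java
// Level 1: hand-written, generic framework base class.
abstract class Entity {
    private Long id;
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
}

// Level 2: generated and abstract - regenerated on every generator run,
// never edited by hand.
abstract class AbstractCustomer extends Entity {
    private String name;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    // Forces a manual implementation in the concrete subclass.
    public abstract boolean isCreditworthy();
}

// Level 3: hand-written concrete class - the only place for custom logic.
class Customer extends AbstractCustomer {
    @Override
    public boolean isCreditworthy() {
        // Invented placeholder rule, purely for illustration.
        return getName() != null && !getName().isEmpty();
    }
}
```

Because all custom logic lives in the concrete class, the middle class can be regenerated at any time without losing manual work.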

Of course there are numerous further variations, such as the combination of inheritance and delegation. With other target languages, other concepts must be used where appropriate such as Includes.

Generate clean code

Simply because code has been created automatically doesn't mean it will never again be read by human eyes. The generated code should therefore satisfy the same quality requirements that apply to manually written code. A homogeneous code base makes subsequent comprehension and error location much easier.

By using a code formatter in direct conjunction with the generation process, code insertion and formatting can be automated without increasing the complexity of the template. The generated code should follow applicable coding styles and use well-established design patterns.

Use the compiler

A further opportunity to make use of the target platform is to communicate with the developer via the target language's compiler. If, for instance, the three-level inheritance described above were used, an abstract method could be generated for each method to be implemented. The compiler would then require the developer to implement it in the concrete class.

For more complex dependencies to which, in and of themselves, the compiler would not object, a piece of dummy code can be generated which makes use of code which the developer is yet to produce.
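A sketch of this trick, with invented names: the generated code merely references a class the developer has to write by hand, so a missing implementation fails at compile time rather than at runtime.

```java
// Hand-written by the developer; if it is missing, compilation of the
// generated code below fails immediately.
class CustomerValidator {
    boolean isValid(String name) {
        return name != null && !name.isEmpty();
    }
}

// Generated: contains no logic of its own, but its mere reference to
// CustomerValidator turns a missing manual class into a compile error.
class GeneratedCustomerService {
    private final CustomerValidator validator = new CustomerValidator();

    public boolean create(String name) {
        return validator.isValid(name);
    }
}
```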

For cases in which the target language does not provide a compiler, but you would nonetheless like to ensure that the application developer adds specific things manually (obviously not to the generated code!), it can be helpful to use openArchitectureWare's Recipe Framework [7]. Recipes check, after every generator run, whether certain conditions have been met - for instance, whether a particular class has been created manually which in turn inherits from a generated class. When such a condition is not met, the developer is notified accordingly.

Talk in Metamodelese

Eric Evans describes in his book Domain-Driven Design [5] the concept of the ubiquitous language. The crux of this idea is to define the technical or specialist concepts in a project and to thereby develop a single, uniform language in which all participants can communicate. In this way, the concepts involved are dealt with clearly and explicitly and, as a result, can become an important and 'living' component of the project. The concepts and the terms used for them will then correlate at all levels of the project.

If this model is used for the concepts involved in the meta-model, then the technical concepts should also be given unambiguous names and meanings. Through the consistent use of terminology and meaning, for example in team meetings, important aspects can be described more clearly. It also becomes apparent sooner when the concepts decided upon no longer suit the requirements. In such cases, the DSL must be adapted to the changed circumstances.

Develop DSLs iteratively

DSLs are not developed using a big up front approach but ideally come into being, just as with APIs, incrementally. The more obvious core abstractions are defined first of all then, as understanding of the domains to be described grows, individual elements are added or even removed.

DSLs are public interfaces, so the principles of API development also apply in this area. Depending on the agreement regarding the interface's life cycle, existing contracts must not be broken in the further development process.

In any case, the technology at the heart of DSLs has an enormous influence on migration capabilities. In order to migrate a model manually, for example to replace concepts that are no longer available with newly introduced ones, you must be able to load the old model into the editor. In object-based storage, for instance via XMI or in a database, this is often no longer possible, since changes to the meta-model have rendered the old models incompatible. In such cases, either the migration must be done programmatically, using a model transformation from the old meta-model into the new one, or you have to go down to the level of the storage format and try to make the old data compatible.

When the DSL's concrete syntax is the same as the one in which the model is stored (that is to say textually), a migration is of course much easier.

Develop model-validation iteratively

As already emphasized, model-driven software development processes are based on the premise that the models being processed are available in a formal, machine-readable form. The form the model takes is given by its meta-model, which thereby forms the basis for the modeling tool at the same time (there are in fact modeling tools which operate independently of meta-models and as a result lead to significant problems in MDSD toolchains). Now, you should be able to assume that a reasonable modeling tool will only produce formally correct models that can in turn be processed by the generator without any problems. This is, unfortunately, a fallacy, since the meta-model only describes the static aspects of the produced model. Semantics cannot be expressed by this static information. An example: ''In a finite state machine, each state can appear only once''. This condition cannot be expressed in the usual meta-modeling languages. To get around this problem, constraint languages such as OCL and oAW's Check have been created, in which conditions such as the one just described (''There must not be name duplicates among all the machine's states'') can be expressed.
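The notation differs between OCL and Check, but the idea behind such a constraint can be sketched in plain Java (all class and method names below are invented for illustration; a real project would express the same condition in the constraint language itself):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// A deliberately tiny model class standing in for a state machine.
class StateMachine {
    final List<String> stateNames = new ArrayList<>();
}

class UniqueStateNamesConstraint {
    /** Returns the duplicated state names; an empty set means the constraint holds. */
    static Set<String> violations(StateMachine machine) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (String name : machine.stateNames) {
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }
}
```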

The condition: ''In a finite state machine, every state must be connected via a transition to another state'' could be also implemented, if so desired, directly in a meta-model (via multiplicities 1...n). In such cases, a constraint is often nonetheless preferable: For one thing, the meta-model remains clearer and more straightforward as a result, for another, intermediate states which arise during the design of a model will conform to the overall scheme, which for many tools is a basic prerequisite for storage.

The constraints should be developed iteratively: First of all, those constraints that seem obviously reasonable are defined. As soon as generation errors arise in the course of the project that can be traced back to incomplete checking of the model, the constraints are supplemented to cover this case. In this way, model validation progressively becomes complete and gap-free.

Test the generator using a reference model

Even code generators themselves should be tested, preferably with an automatic test suite. But how is this put into practice?

A popular option is to simply compare the generated source code with the expected source code. This is, however, a very fragile construct, since even the smallest change in the generator will render the tests invalid. In reality, your goal is to ensure that the generated code behaves on the target platform in a way that meets your requirements.

This can be achieved very simply by using unit tests to test the generated code. There is a corresponding framework for most target languages. For this process, reference models are created which use the concepts in the DSL in characteristic ways. In this case, characteristic means that an example domain is not 'invented' and used but rather that the model elements are named after the DSL concepts they represent. A reference model for a domain-model DSL could, for example, look like this:
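The concrete notation depends on the DSL; a sketch in an invented textual syntax, where each element is named after the concept it exercises, might be:

```
// Invented syntax - element names mirror the DSL concepts they exercise.
entity EntityWithAttributes {
    attribute stringAttribute : String
    attribute requiredAttribute : String required
}

entity EntityWithReference {
    reference toOtherEntity : EntityWithAttributes
}
```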

Ideally, the test suite will consist of several such reference models and their associated tests, describing different aspects and combinations. In subsequent iterations, a new reference model can be used to reproduce reported bugs before they are fixed. The existing reference models and their respective tests are not compromised in the process.

Incidentally, model validation and transformations can and should be tested in this way as well.

Select suitable technology

There is an abundance of different ways to depict and edit models. Which technology and syntax are appropriate will depend on the general requirements and culture of the project. In this regard, it is not only the type of domain which is decisive but also the question of how flexible the technology is and how much effort its development will demand compared to the alternatives. A further consideration is the ease with which the technology can be integrated into the application developer's development process and existing tool landscape. Integration into the development environment (e.g. Eclipse) is nowadays almost a prerequisite.

Ideally, the syntax should identify the core abstractions clearly and be expandable without much trouble. It should be both descriptive and open to quick editing. Feedback during modeling should be fast and comprehensive and the turnaround times (i.e. modeling -> generation -> execution) as short as possible.

The perfect technology does not, unfortunately, exist yet. Modeling with UML, for example, often means a small initial investment of time and effort in defining the language: the definition of a profile. On the other hand, the turnaround is most often very long: Neither the DSL nor the tool can be expanded very simply and you have to come to terms with a very large and complex meta-model while processing the models. In our experience, it is worth it in the vast majority of projects to use 'real' DSLs, which are tailored to fit the requirements of the respective domains exactly and don't contain any deadweight.

Encapsulate UML (and other complex meta-models)

Should the chosen technology, in spite of all warnings, end up being UML, you should translate the UML models into a simpler meta-model for your domain before further processing. With this measure, downstream processing steps will be simpler and less error-prone. Likewise, the validation of the model should then take place after the transformation. This ensures that the constraints can be formulated both simply and domain-specifically.

This decoupling offers the additional possibility of changing over to a 'real' DSL at a later stage without having to change the generator.

Use graphical syntax correctly

Graphical modeling has some significant advantages. It depicts relationships between important elements in a clear and simple fashion. A lot of information can be summarized much faster because it is not written out in full but rather represented visually.

When using graphical models, you should exploit the possibilities inherent in visualization and try to find a syntax that depicts the most important characteristics of the model's elements visually. UML is, in this context, a very useful role model. There, for example, abstract classes are identified by their italicized names. Annotation using operators, e.g. '#', '+', '-' to represent the visibility of an attribute, can be used in both graphical and textual DSLs.

But be careful! A syntax can become overburdened with information very quickly. Diagrams should not be too busy or cluttered - as an example, not every possible font and format of text should be present. It is a question of striking the right balance. The basic principles of screen design can offer helpful tips in this direction.

Use textual syntax correctly

In defining a textual syntax, it is important to reach a good compromise between compactness and comprehensiveness. Depending on how much expertise the modeler can be expected to have, it may be possible to omit explanatory keywords.

Below we compare Java's familiar syntax for the definition of an abstract class with a fictional, but possible, alternative:

JAVA: public abstract class Foo

FICTIONAL: class name=Foo visibility=public abstract=true

There is probably no argument that the Java syntax is easier to read than the fictional one. It is, however, not as self-explanatory. In the fictional syntax, it is made explicit which value is assigned to which attribute. The user only needs to be familiar with the meta-model; the syntax is unambiguous. XML works in just this way, and that is indeed a small, if in our opinion not exactly decisive, advantage of XML: the syntax is clear and simple to understand.

Again, the right balance of expressiveness vs. explicitness is important. Whether the syntax can be expressive, with lots of syntactic sugar, depends on the DSL's users. If they know the DSL well and use it a lot, it can be more expressive; if not, the syntax should enforce explicitness.

Use Configuration By Exception

A further important point is the correct choice and considered application of configurations. In contrast to a graphical syntax, in which all information is shown, usually in some kind of properties view, in textual modeling, information can be simply left out if it corresponds to the defaults.

Again, we illustrate this with Java and a fictitious alternative syntax below:
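Such a comparison might look as follows, reusing the fictional attribute syntax from above (all names invented):

```
// Fully explicit (fictional syntax):
class name=Foo visibility=public abstract=false final=false

// Configuration by exception: the defaults (public, non-abstract,
// non-final) are simply omitted:
class Foo
```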

As is quite easy to see, the correct choice of settings can improve the legibility tremendously. It should be made absolutely clear from the outset, however, that these settings are also a part of the API. Defaults cannot be changed retrospectively without breaking existing API contracts. So use them, but think twice before doing so.

Teamwork loves textual DSLs

Classical modeling tools have a problem with teamwork. The reason for this is the way in which the models are saved: UML tools usually save their models in a single, large XMI file. XMI (XML Metadata Interchange) is a highly technical and generic format. Anyone who has already tried to resolve a conflict with an XMI model as the basis knows just what kind of problems this causes. In particular, the referencing via long, cryptic UUIDs is, for us developers, simply unmanageable. Therefore, you should try to avoid conflicts in such cases by, for example, allowing only exclusive write-access to the model or by using a very finely grained partition structure for the models.

To get to the root of the problem, you should also consider saving the model data in a legible and easy-to-maintain format. Nowadays it is entirely possible to have several syntaxes for a single DSL, and even to use a graphical editor for a textual DSL, for example.

Many professional UML tools provide team-servers, which enable team modeling in different ways. Unfortunately this also creates a break in the toolchain since, in order to synchronize the source-code - MDSD models are source-code! - two different tools and two different repositories have to be used. This has further negative repercussions, for example with regard to the integration-build.

Use model-transformation to reduce complexity

Complex code generators arise when the translation - the step from the model to the code - is especially large. One way to get a handle on this complexity is to split the generator into two parts (''divide and conquer'').

In this process, an interim meta-model is designed which is as close to the code to be generated as possible. A model transformation translates the input model into an interim model conforming to this new meta-model. The code generator then only has to map from the interim meta-model onto the target platform.
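A toy Java sketch of such a two-stage setup; the meta-model classes and naming conventions below are invented for illustration:

```java
// Input meta-model (drastically simplified): what the modeler edits.
class InputClass {
    final String name;
    InputClass(String name) { this.name = name; }
}

// Interim meta-model: already shaped like the code to be generated.
class InterimEntity {
    final String className;
    final String generatedBaseClassName;
    InterimEntity(String className, String generatedBaseClassName) {
        this.className = className;
        this.generatedBaseClassName = generatedBaseClassName;
    }
}

// The model-to-model step: all naming conventions are decided here, so
// the code generator itself stays a straightforward mapping.
class InputToInterimTransformation {
    static InterimEntity transform(InputClass input) {
        return new InterimEntity(input.name, "Abstract" + input.name);
    }
}
```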

This procedure has the pleasant side-effect of creating, via the meta-model, a clear interface so that each part can be developed by different teams and reused independently of each other in other scenarios.

Of course, as many such interim steps as desired can be integrated into a code generator. In practice, experience has shown that a single transformation (and in many cases none at all) is entirely sufficient. The MDA recommends many cascaded layers of model transformations in order to decouple the various platforms. This is, in the authors' humble opinion, not viable in practice. Of course, the old maxim no doubt applies here: The exception proves the rule :-).

Generate towards a comprehensive platform

Another important way to keep the translation step from model to code as small as possible, and thereby minimize the generator's complexity, is to develop a platform which meets the generator halfway as far as possible. Instead of, for example, generating a persistence layer, a suitable framework (e.g. a JPA implementation) should be used. Additional project-specific abstractions (such as an AbstractEntity class) are then developed on this platform, and code is generated against them.
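A sketch of such a project-specific abstraction; the names are invented, and persistence annotations are omitted to keep the example self-contained:

```java
// Hand-written platform class: absorbs behavior that would otherwise be
// generated into every single entity.
abstract class AbstractEntity {
    private Long id;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    // Identity semantics defined once for all entities.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        AbstractEntity that = (AbstractEntity) other;
        return id != null && id.equals(that.id);
    }

    @Override
    public int hashCode() {
        // Unsaved entities (id == null) fall back to object identity.
        return id == null ? System.identityHashCode(this) : id.hashCode();
    }
}

// Generated code then shrinks to the domain-specific attributes.
class Customer extends AbstractEntity {
    private String name;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```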

Conclusion

The best practices described here reflect our experiences (and those of our colleagues and friends) collected over several years in practice. Our most important recommendation for the reader is: Be pragmatic. DSLs and code generators can, when used appropriately, be an immensely useful tool. But the focus should always be the problem to be solved. In many cases, it makes sense to describe certain, but not all, aspects using DSLs. Projects which decide from the get-go to follow a completely model-driven approach are ignoring this last piece of advice. If you are not sure whether and how you should make use of MDSD technology, get some expert advice from a consultant in the field.

Glossary

Passive generator [4]: is called only once. The generated code is subsequently developed manually. Almost all wizards in development environments fall into this category.

Active generator [4]: can be called again and again, and is configured using configuration files or models. MDSD generators typically belong to this category.

About the authors

Sven Efftinge leads the Kiel branch of itemis AG. He is a project leader in Textual Modeling Framework (TMF) as well as a committer for other Eclipse modeling projects. Furthermore, he is an architect and developer of openArchitectureWare 4 and the Xtext framework.

Peter Friese is a software architect and MDSD expert for itemis AG. He has comprehensive experience in the application of model-driven processes of software development and in the development of software tools. Peter is a committer for the Eclipse modeling project as well as for the open-source projects openArchitectureWare, AndroMDA and FindBugs.

Dr. Jan Köhnlein also works as a software architect for itemis AG. He has been designing development tools for MDSD for several years and is a committer for Eclipse modeling and openArchitectureWare.

The three authors work for itemis AG [7], which focuses on the further development of the Eclipse modeling projects and provides professional services in this area.

I've seen some tools represent XMI UUIDs as an XPath-like path plus UUID. This way they can generate an intelligent error message when the referenced object is missing.

If you are thinking of going down the XMI path and sharing large artifacts with a large team, then I would highly recommend stress testing the tools to see what pain they give you. There are lots of gotchas that can cause issues with the development process. For example: how will the tool handle language changes and branching, and will you effectively lose your version history if you make changes to your language? Don't blindly trust vendor claims.

For some tools the model validation process is very important for preserving the integrity of the model. If the validation process runs so slowly that developers don't run it then you can get some nasty problems. One popular XMI framework proxies cross model references and if the actual reference can't be found when the proxy is interrogated then it will give back incorrect values instead of an error. This can create subtle errors if models are compiled without being validated first.

I also have strong doubts about embedding in another modelling language like UML. One of the advantages of writing your own language from scratch is that you can embed a lot of the rules of the language in its structure. If you are willing to embed in another modelling language, then I think you should consider embedding in a programming language instead. Embedding in a programming language can make development faster because you can harness incremental compilation and the quick validation performed by the IDE (though this will often be weak). Situations where you are not heavily using semantic checking, where interpreting is easy and fast, or where you have to write custom code side by side with the model make embedding in a programming language more attractive.

I think the most important thing the MDA community has to realise is that MDA doesn't necessarily mean UML and MDA doesn't necessarily mean graphical diagrams. There seems to be some kind of obsession with UML and programming with pictures.

Nice article! I can confirm based on my work with MDSD tools that these best practices are crucial.

In the development of Sculptor we have actually used all of the best practices described in the article, except 'Use graphical syntax correctly' which is not applicable for us.

Sculptor is an Open Source tool that applies the concepts from Domain-Driven Design and Domain Specific Languages.

You express your design intent in a textual specification, from which Sculptor generates high-quality Java code and configuration. Sculptor takes care of the technical details, the tedious repetitive work, and lets you focus on delivering more business value – and having more fun.

The generated code is based on well-known frameworks, such as Spring Framework, Spring Web Flow, JSF, Eclipse RCP, Hibernate and Java EE.

The DSL and the code generation drive the development and are not a one-time shot. It is an iterative process, which can be combined with Test-Driven Development and evolutionary design.

I would like to add one imho essential best practice: only start creating a generator/DSL if you really know what kind of code is to be generated and where the points of variance are. Ideally you have created at least some 20 similar artifacts by hand; only then can you really identify which part of the code is common and should therefore be generated, and which part should be added manually. Then you have a good starting point for deciding where to place "extension points" (template methods etc.). If you don't do that, you will get bitten, either by the lack of usable extension points or by overly complex generators that try to generate code for the corner cases.

That leads to two other observations:

1) be careful when you do your DSL iteratively that you do not have to rewrite your existing DSL instances (i.e. artifacts that are written in your DSL) again and again, when your DSL definition changes too often. DSL instances aren't easily refactored.

2) depending on the stability of your generator, you should really think hard about not checking in the generated artifacts. Are you sure that your current generator will still work in two years? Have you archived everything needed so that it can still run (IDE? JDK? Ant? All needed JAR files? Maven repository? ...)?

In extra-large applications, where you have a separation between the core and the application team such that the core team can advance with the versions while the application team is working with one of the previous versions, it's sometimes a WASTE OF PRECIOUS BUILD TIME!

Ask yourself: Does this gen-ed code really change that often? How much of it do we have? How much time does it take to generate it? Sometimes it will be better to check in the gen-ed code and, when needed, just regen' it and override the existing version.

Hi all ... if you are interested in this topic I'd love your feedback on a new tool I've created called Viewpoints, which is a business-driven modeling and code generation tool for Visual Studio .NET 2008.