On code generation from models

In a recent article, Dan Haywood introduced two kinds of approach to MDA: translationist and elaborationist. In the former, 100% of the code is generated from the model; in the latter, some of the code is generated and the rest is finished by hand. He gives examples of tools and companies following each approach.

Underlying Dan’s article seemed to be the assumption that models are used only as input to code generation. To be fair, the article focused entirely on the OMG’s view of model-driven development, dubbed MDA, which tends to lean that way. My own belief is that models have many useful purposes other than code generation, but that’s the topic of a different post. Here I’ll focus on code generation.

So which path to follow? Translationist or elaborationist?

In the translationist approach, the model is really a program, the modeling language a programming language and the code generator a compiler. Unless you intend to debug the generated (compiled) code, this means that you’ll need to build a complete debugging and testing experience around the so-called modeling language. That, in turn, requires the language to be precisely defined and rich enough to express every aspect of the target system. If the language has graphical elements, this approach amounts to building a visual programming language. Constructing such a language and its tooling is a major task that requires specialist skills, and it will probably be done by a tool vendor in domains with a big enough market to justify the initial investment. Indeed, one doesn’t have to look far for examples. Several companies have built businesses on the back of this approach to MDA, especially in the domain of real-time, embedded systems. And, for obvious reasons, they have been leading efforts to define a subset of UML that serves as a programming language, called Executable UML, xUML or xtUML depending on which company you talk to.

In contrast, the elaborationist approach to code generation does not require the same degree of specialist skill or upfront investment. It can start small and grow organically. However, there are pitfalls to watch out for. Here are some that I’ve identified:

Be careful to separate generated code from handwritten code, so that regenerating does not overwrite the handwritten code. If that is not possible, for example because you have to fill in method bodies by hand, there are mitigation strategies you can use. For instance, you can use the source control system and code diff tools to carry the handwritten code in the previous version forward into the newly generated version.
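One common way to keep the two apart, sometimes called the generation gap pattern, is to generate a base class into a file that is rewritten on every run, and to put handwritten code in a subclass that the generator creates once and then never touches. The Python sketch below illustrates the idea; the model format, the `generated/` directory layout and all function names are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Hypothetical model: a single entity and its fields.
MODEL = {"name": "Customer", "fields": ["id", "name", "email"]}


def generate_base_class(entity: dict) -> str:
    """Source for the generated base class; safe to overwrite on every run."""
    lines = [
        "# <auto-generated> Do not edit: regenerated from the model.",
        f"class {entity['name']}Base:",
        f"    def __init__(self, {', '.join(entity['fields'])}):",
    ]
    lines += [f"        self.{f} = {f}" for f in entity["fields"]]
    return "\n".join(lines) + "\n"


def generate_stub(entity: dict) -> str:
    """Source for the handwritten subclass; emitted once as a starting point."""
    stem = entity["name"].lower()
    return (
        f"from generated.{stem}_base import {entity['name']}Base\n\n\n"
        f"class {entity['name']}({entity['name']}Base):\n"
        "    pass  # handwritten behaviour goes here\n"
    )


def plan_files(entity: dict, existing: set) -> dict:
    """Map relative path -> content; the stub is skipped if it already exists."""
    stem = entity["name"].lower()
    files = {f"generated/{stem}_base.py": generate_base_class(entity)}
    if f"{stem}.py" not in existing:
        files[f"{stem}.py"] = generate_stub(entity)
    return files


def regenerate(entity: dict, out_dir: Path) -> None:
    """Write the planned files; existing handwritten code is never overwritten."""
    existing = (
        {p.relative_to(out_dir).as_posix() for p in out_dir.rglob("*") if p.is_file()}
        if out_dir.exists()
        else set()
    )
    for rel_path, text in plan_files(entity, existing).items():
        target = out_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
```

After a model change, calling `regenerate` again rewrites only `generated/customer_base.py`; edits made in `customer.py` survive. Where handwritten code has to live inside generated files, such as method bodies, this separation is not available and the diff-based mitigation described above applies instead.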

Remember that you will be testing and debugging your handwritten code in the context of the generated code. This means that your developers cannot avoid coming into contact with the generated code, so make it as understandable as possible. Simple generated code that extends well-factored libraries (as opposed to generated code that builds everything up from low-level base classes) can make a big difference.
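To illustrate the difference, the sketch below keeps shared behaviour (checking field names and converting to a dictionary) in a small handwritten library class, so the generator only has to emit a couple of lines per model element. `Entity`, `FIELDS` and `generate_entity` are hypothetical names; a real framework would put far more into the library.

```python
class Entity:
    """Handwritten, well-factored library class: shared behaviour lives here, once."""

    FIELDS: tuple = ()

    def __init__(self, **values):
        unknown = set(values) - set(self.FIELDS)
        if unknown:
            raise TypeError(f"unknown fields: {sorted(unknown)}")
        for name in self.FIELDS:
            setattr(self, name, values.get(name))

    def to_dict(self) -> dict:
        return {name: getattr(self, name) for name in self.FIELDS}


def generate_entity(name: str, fields: list) -> str:
    """Emit a thin subclass that declares only what the model adds."""
    return f"class {name}(Entity):\n    FIELDS = {tuple(fields)!r}\n"


# Load the generated source, much as a build step would import a generated module.
namespace = {"Entity": Entity}
exec(generate_entity("Customer", ["id", "email"]), namespace)
Customer = namespace["Customer"]
```

The generated class is two lines long, so a developer who steps into it while debugging handwritten code has little to decipher. A generator that re-emitted the constructor and `to_dict` for every entity would leave much more generated code to read.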

The code generator itself will need testing and debugging, especially in the early stages. Write it in a form that is accessible to your developers and that lets them use their usual testing and debugging tools.
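Writing the generator as ordinary code in a general-purpose language means it can be unit tested and stepped through like any other module. The sketch below uses Python's standard `unittest` module to test a hypothetical generator, `generate_enum`, that emits a Java enum from a list of model values. Comparing against the exact expected text (a "golden output" test) is one option among several.

```python
import unittest


def generate_enum(name: str, values: list) -> str:
    """Hypothetical generator: emit a Java enum with one constant per model value."""
    body = ",\n".join(f"    {v.upper().replace(' ', '_')}" for v in values)
    return f"public enum {name} {{\n{body}\n}}\n"


class GenerateEnumTest(unittest.TestCase):
    def test_emits_one_constant_per_value(self):
        self.assertEqual(
            generate_enum("Status", ["open", "closed"]),
            "public enum Status {\n    OPEN,\n    CLOSED\n}\n",
        )

    def test_turns_spaces_into_valid_identifiers(self):
        self.assertEqual(
            generate_enum("Status", ["on hold"]),
            "public enum Status {\n    ON_HOLD\n}\n",
        )
```

Run the tests with `python -m unittest` on the file. Because the generator is a plain function, a breakpoint inside `generate_enum` works in any Python debugger.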

Manage your models like you manage code: check them into the source control system and validate them as much as you can. How much you can validate them depends on the tools you use to represent them. If you represent the models as plain XML, the definition of your modeling language might be an XSD, and you can validate your models against it. If you represent your models in UML, you will probably also be using stereotypes and tagged values to customize your modeling language (see an earlier post). In general, UML tools don’t do a good job of checking whether models use those stereotypes and tagged values in the intended way, so resort to inspection or build validation checks into your code generator instead.
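When the modeling tool cannot enforce the rules of your profile, a validation pass at the start of the generator can. The sketch below checks two hypothetical rules over a simplified in-memory model: class names must be unique, and each stereotype must carry the tagged values it requires. `ModelClass`, `REQUIRED_TAGS` and the stereotype names are assumptions for illustration, not part of any UML tool's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelClass:
    """Simplified view of a UML class as read from the model (hypothetical)."""

    name: str
    stereotype: Optional[str] = None
    tagged_values: Dict[str, str] = field(default_factory=dict)


# Hypothetical profile: the tagged values each stereotype requires.
REQUIRED_TAGS = {"entity": {"table"}, "service": {"endpoint"}}


def validate(classes: List[ModelClass]) -> List[str]:
    """Return one message per rule violation; an empty list means the model is valid."""
    errors = []
    seen = set()
    for c in classes:
        if c.name in seen:
            errors.append(f"{c.name}: duplicate class name")
        seen.add(c.name)
        if c.stereotype is None:
            continue
        if c.stereotype not in REQUIRED_TAGS:
            errors.append(f"{c.name}: unknown stereotype «{c.stereotype}»")
            continue
        for tag in sorted(REQUIRED_TAGS[c.stereotype] - set(c.tagged_values)):
            errors.append(f"{c.name}: «{c.stereotype}» requires tagged value '{tag}'")
    return errors
```

A generator would call `validate` first and refuse to emit anything while the returned list is non-empty, reporting every problem at once rather than stopping at the first.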

Remember that ‘code’ is not just C# or Java. Run-time configuration files, build scripts, indeed any artifact that needs to be constructed in order to build and deploy the system, count as code.

Remember that code generators are meant to increase productivity. So look for cases where putting information in a model and generating code will save time, increase quality, or both. Typically you’ll be building your system on top of a code framework, and your code generator will be designed to take the drudgery out of completing that framework and to prevent the human errors that often accompany drudgery. For example, look for cases where you can define a single piece of information in a model that the generator then injects into many places in the underlying code. Then, instead of changing that piece of information in multiple places, you change it once in the model and regenerate.
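As a sketch of that idea, suppose (hypothetically) the model records each field's maximum length exactly once. The generator below injects that one value into three artifacts: SQL DDL, a Java class using the Bean Validation `@Size` annotation, and an HTML form. Changing the number in the model and regenerating updates all three. The model layout and function names are illustrative assumptions, and the generated Java assumes a Bean Validation implementation is on the classpath.

```python
# Hypothetical model: each maximum length is stated exactly once.
MODEL = {
    "entity": "Customer",
    "table": "CUSTOMERS",
    "fields": {"email": 254, "name": 100},  # field name -> maximum length
}


def sql_ddl(model: dict) -> str:
    cols = ",\n".join(
        f"  {name.upper()} VARCHAR({size})" for name, size in model["fields"].items()
    )
    return f"CREATE TABLE {model['table']} (\n{cols}\n);\n"


def java_class(model: dict) -> str:
    # javax.validation.constraints.Size is the Bean Validation annotation
    # (jakarta.validation.constraints.Size in newer releases).
    body = "".join(
        f"    @Size(max = {size})\n    private String {name};\n"
        for name, size in model["fields"].items()
    )
    return (
        "import javax.validation.constraints.Size;\n\n"
        f"public class {model['entity']} {{\n{body}}}\n"
    )


def html_form(model: dict) -> str:
    inputs = "".join(
        f'  <input name="{name}" maxlength="{size}">\n'
        for name, size in model["fields"].items()
    )
    return f"<form>\n{inputs}</form>\n"


def generate_all(model: dict) -> dict:
    """Map output file name -> generated text, all derived from the same model."""
    return {
        "schema.sql": sql_ddl(model),
        f"{model['entity']}.java": java_class(model),
        "form.html": html_form(model),
    }
```

Two of the three outputs are neither C# nor Java, which echoes the earlier point about what counts as code.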

Of course, we have been talking to our customers and partners about their needs in this area, but we’re always keen to receive more feedback. If you’ve been using code generation, then I’d like to hear from you. Has it been successful? What techniques have you been using to write the generators? To write the models? What pitfalls have you encountered? What development tools would have made the job easier?

Not to be a bother, perhaps others have asked this — Is the VS2k5 team going to support purer UML/MDA in the "team system" or import/export mechanisms with the extensibility tools? I know the "team system" is quite platform-specific — will it support more independent modelling?

A quick response to Ian. I was going to point you at Keith Short’s posting on this topic at http://blogs.msdn.com/keith_short/archive/2004/04/16/114960.aspx, but looking at your blog I see that you have now found this article yourself. There’s also an interesting set of articles in the BPTrends MDA Journal, edited by Dave Frankel; see http://www.bptrends.com/search.cfm?keyword=MDA+Journal&gogo=1. A couple of the articles are by my colleague Steve Cook. In summary, it is safe to say that we are rather sceptical about the MDA claims. But IMO it will be important for our tools to import and export XMI, because customers could have important business content in that format and we’ll need to deal with it.

We have been using a model-driven development approach to build large, distributed, business-critical enterprise applications for over ten years. We address the niche area of database-centric business applications built on variants of distributed architecture. The idea is to capture platform, design strategy and architecture concerns in model form and generate a solution framework. Application behaviour is specified in a high-level language, which is translated into C++, Java or C#, thus completing the solution framework generated from the models. We also generate makefiles, test cases, test data, test harnesses, deployment-related artefacts etc. from the models, and support configuration and version management of both models and code in a seamless, integrated manner. We use a meta-modeling framework to define a problem-specific meta model that is sufficient to cater to all code generation requirements. This meta model is an extended UML meta model. Each code generator is an interpreter of a specific view of this (unified) meta model and encodes several design decisions about code generation in its implementation. We have developed all the tools, and the meta technology needed to build them, ourselves.

In our experience, the major benefits of this approach are productivity gains, uniformly high code quality and the ability to retarget the application specification to various technology platforms. The benefits stand out even more clearly for product families.

Using a proprietary language to specify behaviour, while it guarantees that the application can be retargeted, is not welcomed by developers who want to code in standard programming languages such as C++, Java or C#. With no debugging support at the specification level, the developer has to debug the generated code, which is difficult. The time it takes for a change to the specification, whether in the model or in the high-level language, to take effect in the generated code (that is, making the change, processing it, testing and releasing) is large compared to the traditional approach. And having to work at both the model and the code level does not provide the homogeneous user experience that MSVC or VisualAge developers are used to.