After reading the book The Pragmatic Programmer, one of the ideas I found most interesting was "write code that writes code".

I tried searching the net for more explanations or articles about it, and while I found some good articles on the subject, I still haven't found any specific code implementations or good examples.

I feel it's still not a very common topic: it lacks documentation or isn't embraced by many people, and I would like to know more about it.

What do you think about the subject? Is it something that will really increase your productivity? What are some good resources on the subject, among books, blogs, slideshows, etc?

Some code examples would be greatly appreciated to allow me to better understand its implementation.

Here's the wiki page on the subject with various relevant programming techniques, like Meta Programming, Generative Programming and Code Generation.


I did once write code which wrote code which wrote code... :)
– Benjol, Aug 26 '11 at 8:33

Additionally, server-side languages do this all the time by generating HTML, CSS and JavaScript. You could have a server-side script that creates a server-side script that creates HTML with JavaScript that creates more HTML, and no one will bat an eye about it because of how common it is.
– zzzzBov, Aug 26 '11 at 17:49

AtomWeaver (atomweaver.com) is a good example of automatic programming: First, you create reusable mini-programs in Lua. Then, you model your system by reusing these assets. AtomWeaver then weaves a Lua program that contains your "mini-generators" to generate the system's final source code. You can then tweak your model and re-generate.
– Rui Curado, May 14 '12 at 8:30

22 Answers
In the Lisp world, it is quite common to see code that writes code that writes code (and so on). So any decently sized Lisp or Scheme project will serve as a good code example. I'd recommend looking at the Racket compiler and runtime sources, as well as Bigloo; their libraries are just brilliant.

As for productivity: I use metaprogramming as the dominant technique in almost all of my development work, and it clearly helps a lot, both reducing code size and increasing readability. The key is using Domain Specific Languages, and metaprogramming is one of the most efficient ways of implementing them.

I prefer going a bit further and, instead of writing code that writes code, writing code that generates objects, methods, and functions. This can be achieved with Lisp macros or Ruby's dynamic program modification capabilities, for example.

The small difference is that you don't end up with automatically generated source files. Usually these files are not human-readable and cannot be modified, so why bother with them? I don't like the idea of growing my code base with something I can't control.
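To make the idea concrete, here is a minimal sketch in Python rather than Lisp or Ruby. It generates methods at runtime instead of writing out source files. The names `add_finders`, `User`, and `find_by_<field>` are hypothetical and chosen only for illustration.

```python
def add_finders(cls, fields):
    """Attach a find_by_<field> class method to cls for every field name."""
    for field in fields:
        # _field=field freezes the current field name for this finder.
        def finder(cls_, value, _field=field):
            return [row for row in cls_.rows if row.get(_field) == value]

        setattr(cls, f"find_by_{field}", classmethod(finder))
    return cls


class User:
    # Stand-in for a data store; a real library would query a database.
    rows = [
        {"name": "ann", "role": "admin"},
        {"name": "bob", "role": "guest"},
    ]


# No find_by_* method is written by hand; they are created here.
add_finders(User, ["name", "role"])

admins = User.find_by_role("admin")
```

Nothing is written to disk, so there is no generated file to keep in sync; the cost is that the methods only exist once `add_finders` has run.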

Why should it be useful if I still have to code the generating code? Should I write code able to generate different things depending on user input, so that I can reuse it over and over?

First, metaprogramming is not a goal, but a tool. Don't use metaprogramming because "it's cool" or "X said every developer should use it".

I think one good reason to use metaprogramming is to generalize some common pattern (pattern as in something that repeats) that you have found in your code and that no other usual programming technique (inheritance, design patterns, etc.) can capture.

As said by Jordan, one typical use case is database handling and ORM (Object-Relational Mapping). Once again, in Ruby, you should look at ActiveRecord, which is a great example of metaprogramming applied to ORM.

As a final note:

Don't think "I want to apply metaprogramming; where could I apply it in my code?"

Think "I see this pattern repeating all over my code, and I can't find a way to refactor the code into something smaller and more reusable. Maybe metaprogramming can help me?"

@Jose: Most commonly you generate code via templates. There's Apache Velocity (or NVelocity), for example, or the Visual Studio T4 templates. Then you just have a program that feeds the metadata into your templates and creates new files from them. It's pretty easy, and I do it all the time to generate UI skeletons, entities, etc.
– Falcon, Aug 26 '11 at 8:57

I would add that metaprogramming can replace some patterns, like policy or state, with no runtime cost. It is not only for problems that cannot be solved with common refactoring; sometimes it is simply a better alternative.
– deadalnix, Aug 26 '11 at 17:18

@Jose Faeti: I see that you know some Python. It also has metaprogramming capabilities, though I haven't really used those. Take a look at the Dangerously Advanced Python PDF.
– Kit, Aug 27 '11 at 12:48

@Falcon: IMO that is the worst way to generate code; it is a very poor workaround for languages with no built-in metaprogramming facility. Instead of generating Java or C#, it would be better to write that code in a higher-level JVM or .NET language.
– kevin cline, Aug 10 '12 at 17:18

It is not any better. Your precious little code of your very own can't be properly managed by some other bloke's code generation tool, as that other bloke knows nothing about your specifics. The most productive use of metaprogramming is implementing domain-specific languages, and, as the name suggests, they're specific to your very problem domain; they can't be implemented by anyone else but you.
– SK-logic, Aug 26 '11 at 8:57

@AtillaOzgur, they can be "very good", true. But they're not any better than eDSLs. Standalone code generation is obviously much more limited and much less flexible than macro metaprogramming.
– SK-logic, Aug 7 '12 at 8:49

One of the classic examples is lex and yacc. Their primary purpose is to avoid the drudgery of writing a parser by hand. Along the way, they make it far faster to build complex parsers with many rules and states, and they also avoid all of the surprise mistakes made by people rolling their own.

This is also the idea behind C, which is a tool to write assembler. The same goes for any high-level language you care to name. For tools that write code for you, there are a few simple paradigms.

A proper IDE helps by providing documentation at your fingertips, smart auto-completion, and code snippets. IDEs also include various templates, so you don't have to start a program from scratch. There are programs that take a UML diagram and rough out classes in a high-level language.

Finally, you can write your own tools for code generation within your problem set. This is how lex and yacc first got started. Any kind of domain-specific language exists for precisely this reason. You create some building blocks that describe your solution in easier-to-understand code, wrapping up common activities or complicated sections in simple commands. You aren't looking for a solution to every problem, just an easier definition of the specific one you are dealing with.

In a sense, everything that you do above the binary layer is code automation.

@Jose Faeti The Wikipedia article en.wikipedia.org/wiki/Automatic_programming has links to various tools, if you're interested in more details. I'd also suggest reading up on lex and yacc, as there is quite a bit more documentation and description for those.
– Spencer Rathbun, Aug 26 '11 at 13:24

Metaprogramming is a controversial technique in many shops. The reason is that, as with any powerful tool, the magnitude of help or hurt is large.

Pros

More expressive: less code to write and maintain (often by an order of magnitude or more)

Consistency: more consistent behavior over the class of problems you're solving with the code

Productivity: less code for a solution to a larger problem space

Cons

Complexity: it can be very complicated even though there is less code

Safety: sometimes type safety, and static analysis in general, will be sacrificed

Bugs affect more: small errors will have a larger impact

I am a huge fan of metaprogramming, but I've been doing it for a long time. To me, the reduced code size and consistent behavior more than make up for the risks. Less code means fewer bugs and less code to maintain, and I can usually add large pieces of functionality very quickly.

However, this does not mean I think all programmers should engage in it. I've seen and had to fix big problems created by metaprogramming, usually when people who didn't understand the concept attempted to extend the functionality or just fix a bug. It takes a particular mindset that is, at the very least, detail-oriented. Whether to use metaprogramming techniques should be a team decision. If you have team members who don't understand it, don't have the temperament for it, or are just against it, none of the team should use metaprogramming.

Complexity of metaprogramming is highly overrated. There is absolutely nothing complicated in it, as long as you're using the right tools. And DSLs are much easier to debug and maintain than the typical boilerplate code. Also, I can't understand why one should sacrifice type safety - it is exactly the opposite, DSLs may have domain-specific, highly efficient type systems as well.
– SK-logic, Aug 26 '11 at 20:36

@SK-logic: Not all languages support metaprogramming well, so sometimes things like type safety are sacrificed (e.g. in C). Also, metaprogramming is not just DSLs. It includes things like dispatch-style programming, generics, currying, object inspection, dynamic application, etc. As for complexity, I think it's easy for us (people with metaprogramming experience) to say it's not complicated. I have seen others struggle to understand all the cases the code will be executed under. It mostly depends on their experience and the technique involved.
– dietbuddha, Aug 26 '11 at 21:14

Most code writes code. For example, PHP code helps write HTML. The PHP PDO library helps write SQL calls. The file I/O functions write code to communicate with the OS. Even a regular function call is a reference to another block of code which is executed. So your function calls are writing code.

In broad terms, we can think of computing as writing codes which write codes recursively forming a stack that terminates when it runs up against physical reality of codes wired into hardware.

I would not call HTML a programming language. It is a syntax for documents.
– Simon, Aug 28 '11 at 10:42

@Simon It's an interesting point. The codes we use vary widely in expressive power. Code can write to a weaker language, a stronger language, or its own language.
– Ben Haley, Aug 28 '11 at 14:14

How you do this varies depending on your requirements. Assuming you're using static code generation, you could write all the infrastructure yourself, or you could use an existing generator such as CodeSmith or MyGeneration. With these, you just need to write the required templates.

My last project involving this was some basic ASP.NET CRUD screens (code generation is good for this). The process went: define entities as metadata in XML files; write templates to cover the various artifacts required (entity classes, repositories, service classes, ASP.NET controls, ASP.NET pages, etc.); run the generation process and style the output.

There is some overhead in writing the templates, but they can be reused for subsequent similar projects. Similarly, changes to the underlying data are handled by changing the metadata and rerunning the generation, making changes simpler and quicker to implement.
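The metadata-plus-templates workflow can be sketched in a few lines of Python. This is only an illustration under assumptions: the XML layout, the `Customer` and `Order` entities, and the single class template are made up, and a real generator such as CodeSmith has its own template language.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical metadata: one <entity> per class, one <field> per attribute.
METADATA = """
<entities>
  <entity name="Customer"><field name="id"/><field name="name"/></entity>
  <entity name="Order"><field name="id"/><field name="total"/></entity>
</entities>
"""

# One template per artifact; here only a plain entity class.
CLASS_TEMPLATE = Template('''class $name:
    def __init__(self, $args):
$assignments
''')


def generate(metadata_xml):
    """Return Python source text with one class per entity in the metadata."""
    root = ET.fromstring(metadata_xml)
    classes = []
    for entity in root.findall("entity"):
        fields = [field.get("name") for field in entity.findall("field")]
        classes.append(CLASS_TEMPLATE.substitute(
            name=entity.get("name"),
            args=", ".join(fields),
            assignments="\n".join(f"        self.{f} = {f}" for f in fields),
        ))
    return "\n\n".join(classes)


generated = generate(METADATA)
print(generated)
```

Adding a field to an entity then means editing the metadata and rerunning `generate`, not touching the generated classes by hand.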

As for testing: since this is a templated system, you will need to spend some time initially validating the output of the process; if your template is wrong, all output from that template will be similarly wrong. Once you're happy with this, you can also use the code generators to create basic tests from the XML metadata, which you can then extend to cover special cases. However, remember that you may still need to hand-code tests to cater for specific things; code generation reduces your work, it doesn't eliminate it entirely.

Metaprogramming has been part of programming for a long time. Consider not just tools like SWIG or WYSIWYG designers, which create code, but also in-language tools like C's preprocessor, C++'s templates, and C#/Java's generics, not to mention reflection.

In fact, you could argue that every compiler is just another metaprogram: it takes in program text and outputs machine or VM code. And life without compilers? Ouch.

I was working at a site that had around 50MB of Delphi source code using the BDE for data access. They wanted to switch to using Direct Oracle Access to allow an Oracle upgrade past the highest version supported by the BDE (8i if I recall correctly).

So, instead of getting a team of coders to work through every form and data module, changing every component manually, I wrote a Perl script that:

Parsed the DFM (form file) and identified all of the TQuery, TTable, TStoredProcedure & TDatabase objects - storing the items in a list.

Parsed the PAS (code) and identified the usage of the objects - were the TQueries doing updates or selects? Also, it identified any objects created in code rather than dropped onto a form in the IDE.

Rewrote the DFM & PAS changing the object types appropriately (e.g. TTable -> TOracleDataSet with the SQL property set to "select * from " etc) and the method calls. Also, extra method calls were added if appropriate to close, open & set parameters.
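The rewriting step can be illustrated with a small sketch. It is written in Python rather than Perl, and it is not the original script: the DFM fragment is a simplified, made-up example, and the `SQL.Strings` layout in the output is an assumption about how the replacement component would be serialized.

```python
import re

# Simplified, hypothetical form-file fragment containing one TTable.
DFM = """object CustomerTable: TTable
  TableName = 'CUSTOMER'
end
"""

# Matches an object header for a TTable and its TableName property.
TABLE_OBJECT = re.compile(
    r"object (?P<name>\w+): TTable\n(?P<indent>[ \t]*)TableName = '(?P<table>\w+)'"
)


def convert_ttable(dfm_text):
    """Rewrite each TTable as a TOracleDataSet with a SELECT statement."""
    def replace(m):
        return (
            f"object {m['name']}: TOracleDataSet\n"
            f"{m['indent']}SQL.Strings = (\n"
            f"{m['indent']}  'select * from {m['table']}')"
        )

    return TABLE_OBJECT.sub(replace, dfm_text)


converted = convert_ttable(DFM)
print(converted)
```

A real migration would also need the second step described above, rewriting the method calls in the PAS files, which a single regular expression like this does not cover.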

In short: 3 weeks' work tweaking the script to work on different applications written by different teams with different coding styles, instead of the original estimate of 5+ developers working for 6 months.

And the reason I even thought of using that approach was reading The Pragmatic Programmer.

@Jose That's the idea. Use scripting languages to automate repetitive stuff. It can be for a one-off where you get an 8x productivity boost or as in your case something time consuming that you will do again & again.
– mcottle, Sep 6 '11 at 7:33

When working with SQL, you shouldn't be changing the database directly; instead, you are supposed to execute scripts that make whatever changes you want, including structural changes to the database (adding tables, columns, primary keys, constraints and so forth). Quite frequently you will need to take the same action against a lot of tables or columns at the same time, and doing them one by one would be tedious. A short script that outputs a larger script that does what you want can be a real time saver.

For instance, before the DATE data type was introduced to MS SQL Server, the only choice for a date column was DATETIME, which has a time part that makes dealing with the data a bit harder. Upon upgrading to a version with the DATE data type, you might want to update the columns where the time is always 00:00. In a database with dozens or even hundreds of DATETIME columns, this would be quite time-consuming. But it's easy to write a script that queries all of the tables, checking every column with a data type of DATETIME to see if the time is ever anything but 00:00 and, if not, creating an ALTER statement for the table/column to change the data type to DATE. Presto, code that writes code.
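Here is a rough sketch of that script-that-writes-a-script idea. To keep it runnable anywhere, it uses Python with an in-memory SQLite database as a stand-in for SQL Server, and the candidate columns are passed in explicitly instead of being read from the system catalog. The `orders` table and its columns are hypothetical. The generated ALTER statements are SQL Server style text and are only printed, not executed.

```python
import sqlite3


def date_only_alters(conn, datetime_columns):
    """Return ALTER statements for columns whose time part is always midnight."""
    statements = []
    for table, column in datetime_columns:
        # Count rows whose time of day is anything other than 00:00:00.
        query = (
            f"SELECT COUNT(*) FROM {table} "
            f"WHERE time({column}) <> '00:00:00'"
        )
        (non_midnight,) = conn.execute(query).fetchone()
        if non_midnight == 0:
            statements.append(f"ALTER TABLE {table} ALTER COLUMN {column} DATE;")
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (placed_on TEXT, shipped_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2011-08-26 00:00:00", "2011-08-27 14:30:00"),
        ("2011-08-27 00:00:00", "2011-08-28 00:00:00"),
    ],
)

alters = date_only_alters(conn, [("orders", "placed_on"), ("orders", "shipped_at")])
print("\n".join(alters))
```

Only `placed_on` qualifies here, because one `shipped_at` value carries a real time of day. The output is a script you can review before running it against the real database.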

I am just working on such a tool. In our particular case, we generate the VB.NET code for the data layer based on the signatures of the functions in the database.

Starting to work on and with code generation is difficult at first, since you have no idea how the code should be generated. But once you have an established set of rules, and the code that has to be generated can always be generated based on those rules, working with that code is not that difficult. Of course, depending on the complexity of the code generation and on the number of rules, the task can become more difficult. But in essence, automatic code generation is used for repetitive coding tasks and not for advanced code that varies much.

Testing the output is twofold. First you have to make sure that the code compiles, and that's easy. Then you have to make sure that the output does what you meant it to do based on the parameters it was generated from, and the difficulty of that varies with the complexity of the code you generate.

My sincere recommendation is that if you feel you are writing code in a repetitive way, and you can afford the time, try to think about whether what you are doing could be done by generated code. If so (and for repetitive code that is almost always the case), think about how many times you will have to extend or slightly modify that code, and also how many times you will have to write that exact kind of code. If the answer to any of these is "many", then you should seriously consider making a generator for that code.

I have a PHP module which outputs a web page containing JavaScript code which generates HTML. That's three layers right there. Boy was that hard to read!

In a programming class, we had to write a program that would take a formula string from the user and parse it and display the value. The most impressive solver simply took the user input, wrapped it in main(){printf( "%d", ...);} and ran a script to compile, link, and run it. He didn't write a parser! Today you could do that in an SQL SELECT statement.
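The same trick, wrapping the user's formula in a generated program and letting the language do the parsing, can be sketched in Python. This is an illustration only; `compile_formula` is a hypothetical helper, and running user input as code like this is unsafe for untrusted input (emptying the builtins is not a sandbox).

```python
def compile_formula(formula):
    """Generate a function whose body is the formula, then compile it."""
    # The generated program: the formula becomes a return expression.
    source = f"def _formula():\n    return {formula}\n"
    # Empty builtins keep the namespace small; this is NOT a security boundary.
    namespace = {"__builtins__": {}}
    exec(compile(source, "<formula>", "exec"), namespace)
    return namespace["_formula"]


evaluate = compile_formula("2 * (3 + 4)")
result = evaluate()
print(result)
```

No parser was written: operator precedence and parentheses are handled by the compiler that processes the generated source.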

It's a tool you should play with, then store it away for some future day when it will be handy.

I have developed neat metaprogramming solutions with Prolog, where the main application (in C++, say) translates an abstract definition of a problem into a Prolog application at runtime, which it then delegates to. Often, writing equivalent functionality in C++ would take forever.

I think this scenario is an excellent case in favour of the code-writing-code argument.

Metaprogramming is most commonly associated with non-dynamic languages, since it is harder to achieve certain behaviours there (such as implementing an ORM) without lots of non-productive and non-intelligent lines of code.

But even in more dynamic languages such as PHP, code generation can be a real life-saver and increase productivity massively. In modern frameworks it's very common to have scaffolding that generates most of the common model, form, test and actions for a certain business object that you declare. It's one of the reasons why frameworks such as symfony or RoR are so successful: those code-generation tools produce consistent code very quickly and boost the programmer's productivity.

In websites, most of the interaction revolves around four main actions:

Create an element

Retrieve a set of elements (with possible filtering)

Update an element with new attributes

Delete a set of elements

At least everything that revolves around these four main actions could, and IMHO SHOULD, be achieved using code-generation tools to achieve maximum productivity.

In my company we use symfony, and its admin generator is an exceptional tool that even generates code at run-time (and caches it), which means we don't need any kind of task or external tool to generate new code; we just need to clear our cache. I STRONGLY advise using this kind of tool for CRUD operations.

But doing what symfony's awesome contributors did is not an easy task. I've implemented some code-generation tasks myself, and doing something that is truly consistent and broad enough to cover most corner cases is not easy.

Is it something that will really increase your productivity?

I believe that metaprogramming is very important at the lower levels of work (frameworks, caching, compilers, etc.), but it is something we must approach with extreme caution when working on the business layer.

Using code generation is without question a major productivity booster. Implementing your own code-generation tools, not so much, unless you're building a framework yourself.

What are some good resources on the subject, among books, blogs, slideshows, etc?

The best resource to understand programming is always good and well-commented source code. I would say that looking into RubyOnRails and Symfony admin generators is a good idea.

While many answers here refer to what is commonly known as metaprogramming, there was in fact a field associated with AI, known as automatic programming, that was about programs understanding or synthesizing programs [1].

Any compiler (or metaprogram, code generator, translator, macro system, ...) works with transformations, generating an output from an input by carrying out its fixed transformation algorithm. But a traditional compiler or metaprogram does not, given a definition, description or example of what sorting a list is (e.g. [5, 3, 9] => [3, 5, 9]), create a sorting algorithm. Such problems were of interest to this "automatic programming" field.

Metaprogramming can be very difficult to maintain. At first it looks elegant, but when you start running into corner cases, the errors are caught late (in the code that has been generated), and the whole thing becomes a nightmare to use and debug.

I have mainly written Python code, and in my experience metaprogramming is always a bad choice with this language. You can always refactor things to do it with boring, normal language features. The result is less funky, but easier to live with.

You might find our DMS Software Reengineering Toolkit interesting. It is a pure metaprogramming tool, intended to let one build custom program analysis and transformation tools.

[To follow a comment to OP's question, when used to build a specific transformation tool, DMS is a product line that writes code, that writes code :]

DMS achieves this by being agnostic (but not independent) of target programming languages. DMS provides the standard services needed by a wide variety of metaprogramming tasks, much as an OS provides a wide variety of services for standard programming tasks. These services include strong parsing, automatic construction of abstract syntax trees, pattern-matching and rewriting on trees, symbol table libraries that easily manage languages with nasty scoping rules such as multiple inheritance, and control-flow, data-flow, points-to and call graph analysis. None of this is meaningful in the absence of specific languages to process, so DMS accepts language definitions that are tied to these general pieces of machinery, yielding language-specific parsing, AST construction, target-language-specific pattern matching/rewriting using the target-language syntax, target-language-specific analyzers, etc.

And like an OS, DMS is designed to have very few opinions or constraints on what (meta)programs you wish to write, which means it can be used for a wide variety of purposes: extracting metrics, finding dead code, implementing aspect weavers, translating languages, generating code from DSLs, rearchitecting large applications. (DMS has already been used for all of these tasks.)

You need robust language definitions if you don't want to spend your time encoding everything in the language reference manual (think about what this means for Java and C++). DMS solves this problem by having a library of complete language definitions available. The analogy here is having a database available for your OS; you don't have to implement one yourself to get on with writing your database-centric application.

Its objective says, "Teach students the virtues of metadata. More specifically, they learn how to formally represent the requirements of a Web service and then build a computer program to generate the computer programs that implement that service."

In or around 2001 I started working on a project which was making extensive use of business objects and data objects. I was to be building the front-end website, but was hung up twiddling my thumbs because the business layer and data access layer weren't fully developed. After a couple of weeks of that, I started to take a hard look at what those layers were doing. Basically, they were exposing data returned from stored procedures as collections of objects with properties corresponding to the fields in the data, or were taking input parameters and sending them to stored procedures to be saved to database tables. There was a lot of serialization / deserialization taking place between the two layers, there was Microsoft Transaction Server involved, an IDL / ODL type library ... but it all fit a pattern.

2 weeks later, I had a code generator worked out which would dump out IDL / ODL, and would also dump out the business and data objects. It had taken the guy building the business and data layer objects 2 years to get to the point of debugging and testing these objects. In 2 weeks, with code generation, we had the same output, but because it was all generated it was pretty well bug-free.

That code generator (lower-level CASE tool) followed me around through many different iterations, for about 8 to 10 years, because the principle was just so simple: you're doing something that needs to be done when talking to databases, it's pretty much repetitive coding, and once you get it right you don't have to worry with it any longer.

So, yes: use a code generator, particularly when the coding is repetitive and fits a well-defined pattern.

I've known people to use RegX macros to do similar things, or to use Excel formulas to do similar things (I do this as well).

A metaprogramming example

I have a Ruby authorization library called Authority. It lets developers ask questions in their app with methods like current_user.can_read?(@post) and @post.readable_by?(current_user). These questions are answered by centralized authorizer classes.

This is the crucial part: Authority doesn't know which methods to define until it sees the user's configuration. The user configuration might contain: