An Introduction to Metaprogramming

A metaprogram is a program that generates other programs or program
parts. Hence, metaprogramming means writing metaprograms. Many
useful metaprograms are available for Linux; the most common ones
include compilers (GCC or a FORTRAN 77 compiler), interpreters (Perl or Ruby), parser
generators (Bison), assemblers (AS or NASM) and preprocessors (CPP or
M4). Typically, you use a metaprogram to eliminate or reduce a tedious
or error-prone programming task. So, for example, instead of writing
a machine code program by hand, you would use a high-level language, such
as C, and then let the C compiler translate it to the equivalent
low-level machine instructions.

At first, metaprogramming may seem to be an advanced topic, suitable only
for programming language gurus, but it's not really that
difficult once you know how to use the right tools.

Source Code Generation

In order to present a very simple example of metaprogramming, let's
assume the following totally fictional situation.

Erika is a very smart first-year undergraduate computer science
student. She already knows several programming languages, including C
and Ruby. During her introductory programming class, Professor Gomez,
the course instructor, caught her chatting on her laptop computer. As
punishment, he demanded Erika write a C program that printed the
following 1,000 lines of text:

1. I must not chat in class.
2. I must not chat in class.
...
999. I must not chat in class.
1000. I must not chat in class.

An additional imposed restriction was that the program could not use
any kind of loop or goto instruction. It should contain only one big
main function with 1,000 printf instructions—something like this:

#include <stdio.h>

int main(void) {
    printf("1. I must not chat in class.\n");
    printf("2. I must not chat in class.\n");
    /* 996 printf instructions omitted. */
    printf("999. I must not chat in class.\n");
    printf("1000. I must not chat in class.\n");
    return 0;
}

Professor Gomez wasn't too naive, so he basically expected Erika to write
the printf instruction once, copy it to the clipboard, do 999 pastes, and
manually change the numbers. He expected that even this amount of irksome
and repetitive work would be enough to teach her a lesson. But Erika
immediately saw an easy way out: metaprogramming. Instead of writing this
program by hand, why not write another program that writes this program
automatically for her? So, she wrote the following Ruby script:
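A minimal sketch of such a script is shown here. The original listing may differ in its details, and the helper name punishment_source is an assumption made for illustration.

```ruby
# Build the text of the C program as one string.
# punishment_source is a hypothetical helper name, not from the article.
def punishment_source(count = 1000)
  lines = ["#include <stdio.h>", "", "int main(void) {"]
  1.upto(count) do |i|
    # \\n keeps a literal backslash-n in the generated C string.
    lines << %Q[    printf("#{i}. I must not chat in class.\\n");]
  end
  lines << "    return 0;" << "}"
  lines.join("\n") + "\n"
end

# Write the generated program to disk.
File.write("punishment.c", punishment_source)
```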

This code creates a file called punishment.c with the expected
1,000+ lines of C source code.

Although this example might seem a bit fabricated, it
illustrates how easy it is to write a program that produces the source
of another program. This technique can be used in more realistic
settings. Let's say that you have a C program that needs to include a PNG
image, but for some reason, the deployment platform can accept only one
file: the executable itself. Thus, the bytes that make up the PNG file
have to be embedded within the program code itself. To achieve this,
we can read the PNG file beforehand and generate the C source text for
an array declaration, initialized with the corresponding data as literal
values. This Ruby script does exactly that:
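A sketch of such a script follows, using the file names mentioned in the text. The method name write_c_header and its parameters are hypothetical, the original listing may differ, and the call at the end is skipped when ljlogo.png is not present.

```ruby
# Turn a binary file into a C header that declares a byte array.
# write_c_header and its parameters are illustrative assumptions.
def write_c_header(input_path, output_path, var_name)
  # Read the whole file at once and unpack it as unsigned 8-bit values.
  bytes = File.binread(input_path).unpack("C*")

  # Format the bytes as two-digit hex literals, eight per line.
  rows = bytes.each_slice(8).map do |group|
    "    " + group.map { |b| format("0x%02x", b) }.join(", ")
  end

  File.open(output_path, "w") do |out|
    out.puts "unsigned char #{var_name}[] = {"
    out.puts rows.join(",\n")   # commas between rows, none after the last
    out.puts "};"
  end
  bytes.size
end

write_c_header("ljlogo.png", "ljlogo.h", "ljlogo") if File.exist?("ljlogo.png")
```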

This script reads the file called ljlogo.png and creates a new
output file called ljlogo.h. First, it writes the declaration of the
variable ljlogo as an array of unsigned characters. Next, it reads
the whole input file at once and unpacks every single input character as
an unsigned byte. Then, it writes each of the input bytes as two-digit
hexadecimal numbers in groups of eight elements per line. As should be
expected, individual elements are terminated with commas, except the last
one. Finally, the script writes the closing brace and semicolon. Here
is a possible output file sample:

The following C program demonstrates how you could use the generated
code as an ordinary C header file. It's important to note that the PNG
file data will be stored in memory when the program itself is loaded:

You also can have a program that both generates source code and executes it
on the spot. Some languages have a facility called eval,
which allows you to translate and execute a piece of source
code contained within a string of characters at runtime. This feature is usually
available in interpreted languages, such as Lisp, Perl, Ruby, Python and
JavaScript. In this Ruby code:

x = 3
s = 'x + 1'
puts eval(s)

The string 'x + 1' is translated and executed when the code is run,
printing 4 as a result. Note that even the value bound to variable x is
available during the runtime evaluation.
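
To make the point about variable visibility concrete, here is a small sketch showing that eval also accepts an explicit Binding object, so the same string can be evaluated in a different scope. The method name counter_scope is a hypothetical name chosen for illustration.

```ruby
# Returns a Binding that captures the local variable x defined here.
def counter_scope
  x = 10
  binding
end

x = 3
puts eval('x + 1')                  # uses the current scope, where x is 3
puts eval('x + 1', counter_scope)   # uses the captured scope, where x is 10
```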

The following Ruby code demonstrates a contrived way to find the result
of adding all the integer numbers between 1 and 100. Instead of using
a normal loop or iteration method, we generate a big string containing
the expression “1+2+3+...+99+100” and then proceed to
evaluate it:

puts eval((1..100).to_a.join('+'))

The eval function should be used with care. If the string used as the
argument to eval comes from an untrusted source (for example, from user input),
it can be dangerous (imagine what could happen if the string
to evaluate contains the Ruby expression `rm -r *`, where the backticks
tell Ruby to run the enclosed text as a shell command). In many cases,
there are alternatives to eval that are more flexible, more secure
and do not incur the speed hit of parsing code at runtime.
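
As a rough illustration of such alternatives, the sketch below sums the integers without building a string, and dispatches an operator by name only after checking it against a fixed list. The names ALLOWED_OPERATORS and apply_operator are assumptions introduced for this example.

```ruby
# Sum 1..100 directly instead of evaluating "1+2+...+100".
total = (1..100).reduce(:+)
puts total

# Hypothetical calculator helper: only operators on this list are accepted.
ALLOWED_OPERATORS = %w[+ - * /].freeze

def apply_operator(op, a, b)
  unless ALLOWED_OPERATORS.include?(op)
    raise ArgumentError, "unsupported operator: #{op}"
  end
  # public_send calls the method named by op; nothing is parsed as code.
  a.public_send(op, b)
end

puts apply_operator("+", 3, 1)
```

Because the operator is looked up rather than evaluated, a string such as "system" is rejected instead of being run.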

Quines

A quine is a special kind of source code generator. The Jargon File defines
a quine as “a program that generates a copy of its own source text
as its complete output”. You might be right if you think this lacks
any practical value by itself, but as a brain-teaser, it can be
mind-blowing. Here's a quine written by Ryan Davis, which is one of the shortest
ones for the Ruby language:

f="f=%p;puts f%%f";puts f%f

Run this program, and it prints its own source as output. You might even try
something like this from a shell prompt:

ruby -e 'f="f=%p;puts f%%f";puts f%f' | ruby

Here we're using the -e option from the command line to specify one line
of Ruby source to execute, and then we use a pipe to send its output to
another instance of the Ruby interpreter. The output is once again the
same program source.
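
To see why the quine works, it helps to take it apart. In the format string, %p inserts the inspected form of its argument (the string with its quotes), and %% produces a single literal percent sign. The following sketch, with the variable names source and output added for illustration, checks that formatting the string with itself reproduces the full program text:

```ruby
# The complete quine, kept as data so it can be compared with the output.
source = 'f="f=%p;puts f%%f";puts f%f'

# The same two steps the quine performs, written out separately.
f = "f=%p;puts f%%f"
output = f % f          # %p -> f.inspect, %% -> %

puts output
puts output == source
```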

Modifying Programs during Runtime

Dynamic languages, such as Ruby, allow you to modify different parts
of your program easily during runtime without having to generate
source code explicitly as we did previously. Ruby's core API and
frameworks, such as Ruby on Rails, employ this facility to automate common programming
tasks. For example, in a class definition, you can use the attr_accessor
method to produce the read/write access methods automatically for a
given attribute name. Thus, the following code:

class Person
  attr_accessor :name
end

is equivalent to this more verbose code:

class Person
  def name
    @name
  end

  def name=(new_name)
    @name = new_name
  end
end

The previous code has a minor drawback: the corresponding instance
variable @name is not really created until you first set its value. This
means you'll get a nil value if you happen to read the name
attribute before writing to it. If you're not careful, this could
introduce a few subtle bugs into your programs. The easiest way to
avoid this problem is to set the @name instance variable to a reasonable
value in the Person#initialize method. Because this is quite a common
scenario, wouldn't it be nice to have this method generated automatically,
in addition to the read/write accessors? Let's define an attr_initialize
method that'll do that using Ruby's metaprogramming facilities.

The attr_initialize method takes as input a variable number of attribute
names (attrs). Each attribute name has the same position reserved for
it in the dynamically created initialize method parameter list (args)
in order to set its initial value. We start the new method's code by
checking that the number of arguments received is the same as the
number of attributes we originally specified. If not, we raise an error
with a descriptive message. Afterward, we use a SyncEnumerator object
(from the generator library) to iterate at the same time over the
declared attributes list (attrs) and the actual arguments list (args)
so as to perform a one-by-one attribute-argument binding using the
instance_variable_set method. Finally, we delegate to the attr_accessor
method in order to create the read/write access methods for all the
declared attributes.
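
The description above can be sketched as follows. This is not the original listing: the generator library that provided SyncEnumerator was removed in Ruby 1.9, so this version swaps in Array#zip for the parallel iteration. Defining the method on Class and the wording of the error message are assumptions.

```ruby
class Class
  # Generates initialize plus read/write accessors for the given attributes.
  def attr_initialize(*attrs)
    define_method(:initialize) do |*args|
      if args.size != attrs.size
        raise ArgumentError,
              "wrong number of arguments (#{args.size} for #{attrs.size})"
      end
      # Pair each attribute name with the argument in the same position.
      attrs.zip(args) do |attr, arg|
        instance_variable_set("@#{attr}", arg)
      end
    end
    # Delegate to attr_accessor for the getter and setter methods.
    attr_accessor(*attrs)
  end
end

class Person
  attr_initialize :name, :age
end

erika = Person.new("Erika", 18)
puts erika.name
puts erika.age
```

With this in place, calling Person.new with the wrong number of arguments raises an ArgumentError instead of silently leaving attributes set to nil.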

Once you're familiar with the techniques, metaprogramming is
not as complicated as it might sound initially. Metaprogramming allows
you to automate error-prone or repetitive programming tasks. You can use
it to pre-generate data tables, to generate boilerplate
code that can't be abstracted into a function, or even to test your
ingenuity by writing self-replicating code.

Resources

Ruby Cookbook by Lucas Carlson and Leonard Richardson, published by
O'Reilly Media, 2006. Chapter 10 of this book contains 16 recipes on
reflection and metaprogramming using Ruby. Highly recommended.

The Quine Page: www.nyx.net/~gthompso/quine.htm. This Web page
contains quines in many different programming languages. It even has
quines that work in more than one language.

Ariel Ortiz is a faculty member at the Computer Science Department of
the Tecnológico de Monterrey, Campus Estado de Mexico. He's been teaching
computer programming for almost two decades. He's not too sure what
his favorite programming language is, but he thinks it's either Scheme,
Python or Ruby. He can be reached at ariel.ortiz@itesm.mx.