
Cybernethics / Cybernéthique

Liberty, Ethics and Information / Liberté, Éthique et Information

Metaprogramming from the ground up: avoid C

Long ago, assembly languages were endowed with expressive macroprocessing facilities.
But it still sucked to write in non-portable languages, each with its own incompatible proprietary metalanguage.
And thus, I wanted to metaprogram in something reasonably portable,
which at the time pretty much meant C.
The first obvious choice was to look at what the standard C preprocessor, CPP, offered.

So as to prevent people from shooting themselves in the foot, the language designers made sure the macro expansion algorithm would terminate, by refusing to re-expand a macro's name wherever it reappears within that macro's own expansion. Some people, notably hbaker, thought of using #include as a recursion mechanism. Unfortunately, this isn't enough, because you cannot store infinite state in a CPP program:
there is a finite number of variable-setting clauses in a program,
each to a fixed variable known at compile-time, which leads
to a finite number of variables usable in tests.
There is a finite number of test statements, each combining a finite number of variables into a boolean, in a computation restricted to the operators of modular arithmetic. Variables, being expanded as lexical text, can actually expand to something that has more combinations than a fixed-precision integer; but once reduced to arithmetic operations, there is still only a finite number of observable grammatical states that a variable can take.
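To make the termination rule concrete, here is a minimal sketch. The names counter, STRINGIFY, STRINGIFY_RAW and the helper functions are made up for illustration. A macro that mentions its own name is expanded exactly once; the inner occurrence is left alone, so it falls through to the ordinary variable of the same name.

```cpp
#include <cassert>
#include <cstring>

// An ordinary variable, declared before the macro below shadows its name.
int counter = 41;

// Self-referential macro: the inner `counter` is NOT expanded again,
// so the preprocessor stops after a single step.
#define counter (1 + counter)

// Two-level stringification, so the argument is expanded before being quoted.
#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

// Expands to `return (1 + counter);`, which evaluates to 42.
int expanded_value() { return counter; }

// Shows the text the preprocessor actually produced: "(1 + counter)".
const char* expanded_text() { return STRINGIFY(counter); }

// Sanity check of both observations above.
void check_single_expansion() {
    assert(expanded_value() == 42);
    assert(std::strcmp(expanded_text(), "(1 + counter)") == 0);
}
```

Calling check_single_expansion() from a main function is enough to see that no second round of expansion ever happens.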

All in all, it is impossible to write a useful non-trivial metaprogram in CPP. But that doesn't mean it is impossible to write trivial harmful metaprograms in CPP, as is easily demonstrated in my counter-example die_die_stupid_c_compiler.c. So CPP is but one more example of a
bondage-and-discipline meta-language.

At the same time, the C++ language was slowly extending itself with a meta-programming system, its template language, that soon enough became weakly "Turing equivalent", and allowed wizards to write metaprograms to do all kinds of wonderful things. Except that this metalanguage was a pure functional language completely disconnected from C++ itself, and extremely hard to debug -- you pretty much have to re-develop all meta-level libraries from scratch in a completely new language to write non-trivial metaprograms, and you cannot reuse libraries across language levels to bootstrap new functionality. Yet another misguided design.
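As a small illustration of that disconnect, assuming nothing beyond standard C++11, here is the same factorial written twice. The names factorial, Factorial and levels_agree are hypothetical. The run-time version is an ordinary loop with mutable state; the template version has to be rewritten as recursion plus a specialization, and neither can reuse the other.

```cpp
#include <cassert>

// Run-time level: plain imperative C++.
unsigned long factorial(unsigned n) {
    unsigned long result = 1;
    for (unsigned i = 2; i <= n; ++i) result *= i;
    return result;
}

// Compile-time level: the same function, re-developed from scratch
// in the template sublanguage (no loops, no mutation, only recursion).
template <unsigned N>
struct Factorial {
    static const unsigned long value = N * Factorial<N - 1>::value;
};

// The base case is expressed as a template specialization.
template <>
struct Factorial<0> {
    static const unsigned long value = 1;
};

// Checked by the compiler, before the program ever runs.
static_assert(Factorial<5>::value == 120, "5! should be 120");

// The two levels agree, but only because the logic was written twice.
bool levels_agree() {
    return factorial(5) == Factorial<5>::value;
}
```

Later C++ standards narrowed this gap with constexpr functions, but the template sublanguage described above remains a separate language.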

The correct approach was OpenC++, which provides metaprogramming in the same recursively-bootstrappable meta-language as C++.
But by the time you get there, you understand that C++ is not a language you want to use for metaprogramming anyway.
Like Perl, C++ is a swiss army chainsaw of a programming language.
Unlike Perl, it's got all the blades simultaneously and permanently cast
in a fixed half-open position.
Don't turn it on.
If you want to metaprogram a language in itself,
you'll do yourself a favor by instead choosing
Lisp, OCaml, Oz, Haskell, Erlang,
or any other HOT language.

As for C, it's rather bad as a portable assembly language,
as it doesn't handle continuations or multiple-value returns,
and doesn't allow you precise access to the memory model
and temporary variables as required for precise garbage collection, etc.
I will spare you my pitiful attempts at metaprogramming it with m4 (don't try it -- m4 really sucks,
and being better than CPP is a very low bar;
however, you may look at ThisForth for a relative success at using it).
Tom Lord did interesting things with the hard part of metaprogramming:
not just generative but analytic, too.
He achieved the automatic verification of some GC invariants
in the C layer of his Scheme implementation --
and that convinced me that even when done
the best possible way,
metaprogramming C still sucks:
the CPP layer makes it hard to reason about actual source,
and the language has lots of arbitrary semantics
that make it hard to reason about where side effects happen
unless you have intimate knowledge of the compiler,
but there is no way to access this knowledge
unless you re-write your own compiler, at which point,
why choose C?

These days,
LLVM seems to be the main thing
as far as a portable low-level language target for metaprogramming goes
(unless you join the dark side and drink the .NET kool-aid).
And if you don't care as much for mainstream and portability,
you could try to go the way of Factor
or your own COLA
and build a system from the ground up around sound metaprogramming principles.

I understand. D has a GC, but it's not precise and it can't compact the GC heap. To change the situation, I think D may have to go a bit the Cyclone way and manage two different types of pointers: fully GC-managed references to GC-managed memory, and normal raw (C-like) pointers to the C heap. Then the GC is allowed to move and compact memory allocated through GC-managed pointers, and can become more precise.
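A toy sketch of why the distinction matters, under loud assumptions: this is neither D's nor Cyclone's design, it uses a plain handle table as one simple way to make references movable, and every name in it (MovingHeap, Ref, slot_of, and so on) is hypothetical. A managed reference never holds an address, so the heap can slide objects around during compaction; a raw C-like pointer taken before compaction would simply be left pointing at the old location.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy moving heap that stores ints.
class MovingHeap {
public:
    // A managed reference names a handle-table entry, never an address,
    // so the heap stays free to move the object it designates.
    struct Ref { std::size_t handle; };

    Ref allocate(int value) {
        table_.push_back(slots_.size());
        slots_.push_back(Slot{value, table_.size() - 1, true});
        return Ref{table_.size() - 1};
    }

    // Stand-in for "the collector found this object unreachable".
    // A released Ref must not be used again.
    void release(Ref r) { slots_[table_[r.handle]].live = false; }

    int& deref(Ref r) { return slots_[table_[r.handle]].value; }
    std::size_t slot_of(Ref r) const { return table_[r.handle]; }

    // Compaction: slide live objects to the front and repoint their handles.
    void compact() {
        std::size_t next = 0;
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i].live) continue;
            slots_[next] = slots_[i];
            table_[slots_[next].handle] = next;
            ++next;
        }
        slots_.erase(slots_.begin() + static_cast<std::ptrdiff_t>(next),
                     slots_.end());
    }

private:
    struct Slot { int value; std::size_t handle; bool live; };
    std::vector<Slot> slots_;         // the storage that compaction rearranges
    std::vector<std::size_t> table_;  // handle -> current slot index
};

// The object behind `c` moves from slot 2 to slot 1, yet `c` still works.
bool references_survive_compaction() {
    MovingHeap heap;
    MovingHeap::Ref a = heap.allocate(1);
    MovingHeap::Ref b = heap.allocate(2);
    MovingHeap::Ref c = heap.allocate(3);
    const std::size_t before = heap.slot_of(c);
    heap.release(b);
    heap.compact();
    const std::size_t after = heap.slot_of(c);
    return before == 2 && after == 1 &&
           heap.deref(a) == 1 && heap.deref(c) == 3;
}
```

The extra indirection is the price of movability here; a real collector would more likely rewrite the references themselves, which is exactly why it needs to know precisely which words are managed pointers.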

You cannot say "macroprocessor" and "metaprogramming" in one sentence :)

The idea of macroprocessors actually predates C - I'm old enough to remember debates about comparative merits of macro assemblers (assembler + macroprocessor) vs compilers for the purpose of systems programming. So the common ground was to have a compiled pseudo-assembler *and* macroprocessor on top of it.

Mono Kool-Aid

In my humble opinion, it is not too bad to drink some .net/mono kool-aid if you like meta-programming.

If you stick to .net 2.0, you have a true cross-platform product that is great for RAD development. Mono even has its own IDE, and you can download Visual Studio Express if you are on a Windows platform (or, if you really want to stay Open Source, use SharpDevelop).

The only problem I can see with .net/mono is the patent FUD from Microsoft.

BitC

BitC is an ambitious attempt at providing a nice safe low-level language. Its SEXP syntax and well-defined semantics make it relatively easy to metaprogram BitC, but it looks like you still lack the controls that would allow you to implement GC or PCLSRing for a platform targeting BitC.

Metaprogramming is not perfect

At least in its C++ incarnation, metaprogramming is far from perfect:

I consider string manipulation to be one of the places where C++ is a total disaster. It's way too easy for idiots to do something like this:

a = b + "/share/" + c + serial_num;

where you can have absolutely no idea how many memory allocations are done, due to type coercions, overloaded operators (good God, you can overload the comma operator in C++!!!), and then when something like that ends up in an inner loop, the result is a disaster from a performance point of view, and it's not even obvious *why*!
-- Theodore Ts'o
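One way to see the hidden cost in that expression is to count the allocations, sketched here with a counting allocator plugged into std::basic_string. CountingAllocator, CountedString, g_allocations and allocations_for_concat are hypothetical names, and the operands are made deliberately long so that any small-string optimization does not hide the effect.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Global tally of heap requests made by CountedString objects.
static std::size_t g_allocations = 0;

// Minimal allocator that forwards to std::allocator and counts each request.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>{}.deallocate(p, n);
    }
};

// All instances are interchangeable, as the allocator requirements expect.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Returns how many allocations the one innocent-looking line performs.
std::size_t allocations_for_concat() {
    CountedString b(40, 'b'), c(40, 'c'), serial_num(40, '9');
    CountedString a;
    g_allocations = 0;  // ignore the setup above, count only the expression
    a = b + "/share/" + c + serial_num;
    assert(a.size() == 40 + 7 + 40 + 40);
    return g_allocations;
}
```

The exact number returned depends on the standard library's growth policy and small-string threshold, which is rather the point: nothing in the source line tells you.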

I would tend to say that this problem is not specific to C++: macroprogramming in Common Lisp and Metaobject protocols suffer from the very same problem: too often, "higher level" philosophies hide and abstract away the "low level" aspects which, as a matter of fact, are the very ones you absolutely need to control the quality of your implementation.

C++ is a horrible language. It's made more horrible by the fact that a lot of substandard programmers use it, to the point where it's much much easier to generate total and utter crap with it. Quite frankly, even if the choice of C were to do *nothing* but keep the C++ programmers out, that in itself would be a huge reason to use C. [...] In other words, the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C.

And if you want a fancier language, C++ is absolutely the worst one to choose. If you want real high-level, pick one that has true high-level features like garbage collection or a good system integration, rather than something that lacks both the sparseness and straightforwardness of C, *and* doesn't even have the high-level bindings to important concepts.

IOW, C++ is in that inconvenient spot where it doesn't help make things simple enough to be truly usable for prototyping or simple GUI programming, and yet isn't the lean system programming language that C is that actively encourages you to use simple and direct constructs.
-- Linus Torvalds

It's not an exaggeration: more often than not, what he says turns out to be really true. Because of its fancy abstraction features, people program in C++ (and OOP, and AOP) according to what they think the code means. But this is WRONG: as a matter of fact, the real meaning of the code stems exclusively from the semantics of the underlying machine. Trying to hide those semantics is thus rather insane; it's one of the very last things to do!

All this says a lot about the failure of metaprogramming attempts so far. In fact (at least in my opinion), the very concept of encapsulation is the problem, or at least an important part of it.