Saturday, July 15, 2006

Schema Analysis and the need for Python-based DSLs

"The Alloy Analyzer is a tool developed by the Software Design Group for analyzing models written in Alloy, a simple structural modeling language based on first-order logic. The tool can generate instances of invariants, simulate the execution of operations (even those defined implicitly), and check user-specified properties of a model. Alloy and its analyzer have been used primarily to explore abstract software designs. Its use in analyzing code for conformance to a specification and as an automatic test case generator are being investigated in ongoing research projects."

I took a little time to go through the tutorial and discovered that the language is very much like the domain relational calculus, meaning that it expresses a schema in terms of sets and the relationships between them. It actually seems like a terrific language for designing object schemas and expressing constraints over them. It's got me wondering whether I could find a way to extend this Python cookbook recipe for expressing Prolog-like rules to support the full expressiveness of Alloy, but using a nice Pythonic syntax.

One of the big problems in expressing schemas compactly in pure Python is that you often need to be able to refer to types before they are defined. For example, if you want to say that a FileSystemObject has a parent that is a Directory, but Directory is a kind of FileSystemObject, then you have a forward reference that can't be easily expressed in today's Python. The schema definition tools that I created for Chandler and PEAK both work around this either by requiring the later type (Directory) to define an inverse relationship that then links to the forward relationship, or by using strings to refer to types not yet defined. Neither of these approaches is particularly satisfactory.
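To make the forward-reference problem concrete, here is a minimal sketch of the string-based workaround. The `Ref` class and the `registry` dictionary are hypothetical names invented for this illustration; they are not the actual Chandler or PEAK APIs.

```python
class Ref:
    """Hypothetical relationship marker that names its target type by string."""

    def __init__(self, type_name):
        self.type_name = type_name

    def resolve(self, registry):
        # Look the real class up only once every class has been defined.
        return registry[self.type_name]


class FileSystemObject:
    # Writing `parent = Ref(Directory)` here would raise NameError,
    # because Directory is not defined until further down the module.
    parent = Ref("Directory")


class Directory(FileSystemObject):
    pass


# Assumed registry of schema types, built after all classes exist.
registry = {cls.__name__: cls for cls in (FileSystemObject, Directory)}
resolved = FileSystemObject.parent.resolve(registry)
```

The string works, but nothing checks the spelling of "Directory" until resolution time, which is part of why this approach feels unsatisfactory.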

So the advantage of the clever cookbook recipe is that it simply makes all names in a function body have a symbolic meaning, automatically creating objects for those names and executing the function code in a context where the names are bound. Then, traditional operator-overloading techniques can take over from there. It seems like one could use this to define a schema with something like:
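What follows is only a rough Python 3 sketch of the general technique, not the cookbook recipe's actual code. The names `Symbol`, `Field`, and `define_schema`, and the choice of `>>` for "refers to" and `<=` for "is a kind of", are all assumptions made up for this illustration.

```python
import types


class Field:
    """A named attribute of a Symbol, e.g. FileSystemObject.parent."""

    def __init__(self, owner, name):
        self._owner = owner
        self._name = name

    def __rshift__(self, target):
        # `A.field >> B` records that A.field refers to a B.
        self._owner._facts.append(
            ("field", self._owner._name, self._name, target._name)
        )
        return target


class Symbol:
    """Placeholder object created for each name used in a schema function."""

    def __init__(self, name, facts):
        self._name = name
        self._facts = facts

    def __getattr__(self, attr):
        # Only called for attributes that do not already exist.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return Field(self, attr)

    def __le__(self, other):
        # `A <= B` records that A is a kind of B.
        self._facts.append(("subtype", self._name, other._name))
        return True


def define_schema(func):
    """Run func's body with every name it mentions bound to a Symbol."""
    facts = []
    # co_names lists the global and attribute names the code refers to;
    # binding a Symbol for an attribute name as well is harmless.
    namespace = {name: Symbol(name, facts) for name in func.__code__.co_names}
    types.FunctionType(func.__code__, namespace)()
    return facts


def filesystem():
    FileSystemObject.parent >> Directory   # forward reference, no NameError
    Directory <= FileSystemObject


facts = define_schema(filesystem)
```

Neither `FileSystemObject` nor `Directory` is ever defined in the module; each exists only as a symbol inside the rebuilt function's namespace, so the order in which they are mentioned no longer matters.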

Okay, so that's not exactly pretty. In fact, about the only good thing about it is that it allows forward references. But I do plan to study Alloy's reference manual and see if I can find a more natural mapping from its concepts to Python. The forward-reference issue is actually a pretty minor problem compared to the issue of trying to concisely express generalized set constraints like "no directory can be a child of itself" in Python. These are just a few of the things needed to implement my utopian dreams for peak.schema, and they effectively require a logic or set (mini-)language to sort out.
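For a sense of why such constraints are awkward in plain Python, here is a minimal sketch that checks "no directory can be a child of itself", read here as covering indirect containment too, by hand. The `children` mapping and both function names are hypothetical. A set language would state this declaratively in one line using a transitive closure operator, whereas Python has to spell out the traversal.

```python
def descendants(children, node):
    """Return every node reachable from `node` via the `children` relation."""
    seen = set()
    stack = list(children.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def no_directory_contains_itself(children):
    """True if no directory appears among its own direct or indirect children."""
    return all(d not in descendants(children, d) for d in children)


# Assumed sample data: directory name -> set of direct child names.
good = {"/": {"/usr", "/etc"}, "/usr": {"/usr/bin"}}
bad = {"/": {"/usr"}, "/usr": {"/"}}

good_ok = no_directory_contains_itself(good)
bad_ok = no_directory_contains_itself(bad)
```

The check itself is easy enough; the trouble is that it is imperative code attached to nothing, rather than a constraint declared as part of the schema.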

Unfortunately, Python isn't the most suitable language in the world for creating domain-specific languages like these. This is one place where Ruby really does have an advantage over Python -- as opposed to all the supposed advantages touted by Ruby enthusiasts who don't get that Python already has all the stuff they're jabbering about.

However, Ruby's advantage in this area basically boils down to two things: being able to apply functions without parentheses, and having code blocks. The first is nice because it makes it possible to create commands or pseudo-statements, and the second is a necessity because it allows those pseudo-statements to encompass code blocks. Without at least the second of these features, Python is never going to be suitable for heavy-duty DSL construction.
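To illustrate the gap from the Python side, here is a small sketch of about the closest Python gets to a pseudo-statement that owns a block: a decorator applied to a throwaway function. The `constraint` decorator and the `registered` dictionary are hypothetical names for this example only.

```python
registered = {}


def constraint(description):
    """Hypothetical pseudo-statement: attach a code body to a description."""
    def register(body):
        registered[description] = body
        return body
    return register


# The "block" has to be a named function, and the call needs parentheses.
@constraint("no directory is its own parent")
def _(directory):
    return directory.get("parent") is not directory


check = registered["no directory is its own parent"]
```

It works, but the extra `def _(...)` line and the mandatory parentheses are exactly the kind of noise that keeps it from reading like a real statement.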

Blocks, however, are probably the one thing that Python will never get, due to concerns about people creating obfuscated code. That is, allowing every programmer to be a language designer means that every program can become its own language: the slippery slope that leads to Lisp. :-) One of the things that makes Python great is that it's a simple, easy-to-learn language. Sort of a "learn once, read anything" principle. Customizable syntax means that in the degenerate case there could be a new language to learn for every program you want to read. I'm honestly not sure I'd want to open those floodgates, despite the fact that it occasionally inconveniences me not to be able to create DSLs.