The Rule Challenge

I’ve made this challenge on IRC several times, and I think on the mailing lists too, but I’m putting it up here for reference.

I’m not a fan of rules. I’ve used them in production code in the past, though I’d avoid doing so again, and (unlike most people who end up using them) have spent a fair bit of time analyzing their behaviour and effects. I’ve come to the following conclusion:

All non-trivial rules are incorrect.

I define “non-trivial” as follows:

Any rule with multiple action statements is non-trivial.

Any rule using DO ALSO is non-trivial.

Any rule with a WHERE clause is non-trivial.

This pretty much only leaves rules which rewrite into at most one simple statement, which I’ll concede can sometimes be written correctly (but often are not).

I define “incorrect” as “produces surprising or destructive behaviour in response to ordinary SQL statements”. In general this requires SQL statements not foreseen by the rule writer; it’s easy to write rules that work only for certain statements, but doing this in real code amounts to laying a minefield for your future developers.
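
A concrete illustration of the kind of surprise I mean (a minimal sketch; table names are invented): the rule system substitutes *expressions*, not values, so a DO ALSO action that references NEW re-evaluates column defaults such as nextval().

```sql
CREATE TABLE t1 (id serial PRIMARY KEY, val text);
CREATE TABLE t1_log (id int, val text);

-- Looks like a harmless audit rule...
CREATE RULE t1_audit AS ON INSERT TO t1
  DO ALSO INSERT INTO t1_log VALUES (NEW.id, NEW.val);

INSERT INTO t1 (val) VALUES ('x');
-- NEW.id in the rule action expands to the DEFAULT expression
-- nextval('t1_id_seq'), which runs a second time, so t1_log ends up
-- with an id that does not match the row actually inserted into t1.
```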

My challenge is simple: show me a counterexample to the above claims. Any takers?

[EDIT: to clarify, by “rule with a WHERE clause” I mean a conditional rule with WHERE used in the rule definition, not merely a rule whose action statement has a WHERE clause in it.]

I’m not principally a PostgreSQL person (DB2 is my main squeeze), but I do follow it a bit. The whole rules thing is something only PG did, and that was Stonebraker’s doing; one would have to read up on his rationale from way back when. From what I can find, view support is the justification for using rules, since that is the only way to get it. There have been a few links from the PostgreSQL site to rule posts such as this one recently. I wonder why rules have become a topic du jour?

The challenge as written isn’t provable; verifying the claim would require exhaustively enumerating all non-trivial rules.

As to better, one of those linked posts asserts that triggers are better. I would agree with that.

We’ve had more than the usual number of rules questions on IRC recently, at least that’s my impression; that would probably explain the multiple recent blog posts. (And I only wrote this one because someone on IRC bugged me about it.)

A few years back, I faced a very sophisticated, rapidly evolving schema. The domain was comprehensive IT management (with some big-data-like telemetry sampling/management thrown in). This schema was more sophisticated than >99% of the schemata I’ve ever seen (or ever want to see again ;-).

The only database we could find with nearly enough raw expressive power was PostgreSQL. PostgreSQL’s SQL-rewriting RULES were a game changer. From our perspective, this sort of ‘active schema’ facility opened a door to some really nifty things. As I see it, PG’s SQL Rewriting RULES put PostgreSQL way ahead of any alternative database (relational or otherwise). Let me try to be a *little* less hand-wavey.

From memory, we first overlapped persistent objects. What we did was to have the shared members/attributes of an object go into a ‘base’ table. Then there were zero or one matching records, in one or more other tables, that carried the specialized members/attributes added by sub-classes of that base class/type. We supported only single inheritance this way; at the expense of yet more complexity, multiple inheritance could have been supported.

At any rate, SQL-rewriting RULES did allow us to decouple the base classes, and the tables used to persist instances of such base classes, from arbitrary, uncoordinated specialization.
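
To make that less abstract, here is a minimal sketch of the pattern (all names invented; the real schema was far richer): a base table, a specialization table, a view joining them, and a rule rewriting writes on the view into writes on both tables.

```sql
CREATE TABLE asset  (id int PRIMARY KEY, name text);         -- base class
CREATE TABLE server (id int PRIMARY KEY REFERENCES asset,
                     cpu_count int);                         -- specialization

-- Present the subclass as a single relation.
CREATE VIEW server_v AS
  SELECT a.id, a.name, s.cpu_count
  FROM asset a JOIN server s USING (id);

-- Rewrite an INSERT on the view into inserts on both tables.
CREATE RULE server_v_ins AS ON INSERT TO server_v DO INSTEAD (
  INSERT INTO asset  VALUES (NEW.id, NEW.name);
  INSERT INTO server VALUES (NEW.id, NEW.cpu_count);
);
```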

Beyond this basic ‘schema’, an arbitrary level of ‘further’ schema specialization was relegated to distinct development teams (operating in different time zones, languages, cultures, etc.). The idea was to minimize the need for coordination/communication about schema specializations. Different teams took the core/base/canonical schema du jour – and they all ran with it in parallel.

Then, about once a year, we would promote (or absorb) any duplicated-via-specialization members/attributes/triggers/procedures into the core/base/shared schema. Actually, we ended up with a hierarchy of such layered-by-specialization schemata.

I recall having to work around the lack of upsert (aka merge). When an instance of a specialized object was created by the application-layer logic, we often needed to either update the existing, corresponding base-class record or insert a new one. As I recall, we used a stored procedure. Long story.
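
The usual pre-9.5 workaround looks something like this (a sketch with invented names; a production version also needs a retry loop to cope with concurrent inserts):

```sql
-- Try the UPDATE first; fall back to INSERT if no row matched.
CREATE FUNCTION upsert_asset(p_id int, p_name text) RETURNS void AS $$
BEGIN
  UPDATE asset SET name = p_name WHERE id = p_id;
  IF NOT FOUND THEN
    INSERT INTO asset (id, name) VALUES (p_id, p_name);
  END IF;
END;
$$ LANGUAGE plpgsql;
```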

Coming from a heavy Java/Hibernate (ORM tool) background, we recognized that we were implementing a novel form of object persistence (well – novel at least in so far as Hibernate team/doc surveys the various options for mapping classes onto relations/tables).

Via this process, we freed large development teams to specialize a base schema … while keeping such specializations effectively ‘private’. There was no possible coupling to other, similar, specialized schema – unless and until the core/base schema was changed. That sort of shared schema/core/base change was carefully managed – and it happened only about once per year.

At the end of the day, this process made parallel schema evolution practical. For a very large scale, distributed software engineering organization, this mattered. Sure, there aren’t very many software engineering organizations operating at this Global 500 scale, nor do many of those attempt a shared, core/base schema of this complexity, but perhaps that is because not too many folks recognize how much power SQL-rewriting RULES puts into the hands of schema designers. It seems to me that there may be something of a catch-22 here.

Back then, once we recognized how much of a game-changer PG’s SQL-rewriting RULES could be, and how they affected what we (internally) dubbed an ‘active’ schema, we started to see quite a few other applications (for ‘active’ schema). For example, at one point, we needed a super strong, generalized audit trail. In IT management, keeping track of just who changed just what and when is critical.

To use Richard Snodgrass’s (temporal SQL) lingo, we had already partitioned valid/effective dated ‘history’ records into distinct tables. For fairly obvious performance reasons, we needed to physically separate (old valid/effective dated) history (records) from current data (records). And given the nasty, inherently complex nature of this (comprehensive IT management) domain, we also required a fundamentally distributed system – with resilience to network partitions, eventual consistency and all that nasty stuff.

So, we found RULES worked once again like a swiss army (schema) chainsaw. Given how primitive (or non-existent) the development tooling around PG RULES was, none of this was easy. However, we managed to effect a second dimension of partitioning along the transaction/knowledge dimension. The result was a bi-temporal audit trail, partitioned by valid/effective date ranges/periods and then further/recursively sub-partitioned by transaction/knowledge date ranges/periods.
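
A heavily simplified sketch of the redirection idea (all names invented): a conditional rule that, whenever an update closes out a current row’s validity period, also archives the superseded version. Note that this is precisely the sort of conditional, DO ALSO rule the original post warns about.

```sql
CREATE TABLE metric_current (id int, value numeric,
                             valid_from timestamptz, valid_to timestamptz);
CREATE TABLE metric_history (LIKE metric_current);

-- When an update closes a row's validity period, also copy the old
-- version (stamped with its new end date) into the history table.
CREATE RULE metric_archive AS ON UPDATE TO metric_current
  WHERE OLD.valid_to IS NULL AND NEW.valid_to IS NOT NULL
  DO ALSO INSERT INTO metric_history
    VALUES (OLD.id, OLD.value, OLD.valid_from, NEW.valid_to);
```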

We also used RULES to create a ‘matching’ (active) schema to simplify interoperability-oriented, ETL-style mappings/interfaces. After all, if we could coax PG into presenting a schema (consisting of dozens or hundreds of writable views) that looked like some source (or industry-standard) schema, but mapped in non-trivial ways to our canonical/base/core destination schema, everything was 1000x easier. For instance, there are some intensely complex, putatively industry-standard schemata out there for the IT management domain. Think of the Common Information Model from the Distributed Management Task Force, or the SID model from the TeleManagement Forum, or …

One thing is for sure: without SQL rewriting of a single SQL statement into many SQL statements, this kind of thing just cannot be done … except for cases where the source and destination schemata are largely (and significantly) conformal to begin with.

Best of all, we could create more than one such ‘in-the-RDBMS’ active-mapping schema … to simultaneously appeal to an ever wider variety of end-user preferences, diverse industry standards, etc. And on top of that, we found the same trick handy for producing simplified, ad-hoc reporting schemata. This made it far more practical for our intensely complex core/base/canonical schema to be exposed via simplified schemata. We found simplified schemata useful both for sub-domain use and for perspective-specific, schema-level ‘views’ (of the underlying, core/base/canonical schemata). For end-users demanding ‘simplicity’, with utter, wanton disregard for the inherent, underlying complexity of the IT management domain, we found PG’s SQL-rewriting RULES to be a godsend.

While PostgreSQL seems uniquely distinguished by its avant-garde support for SQL-rewriting RULES, and while I feel RULES make PostgreSQL schemata 100x more expressive, there’s no question that the door to applications of such ‘active’ schema technologies is just barely cracked open, and only a little. Today, these sorts of needs are either avoided or, when pressed, just sucked up into application-layer logic.

SQL-Rewriting takes a database into a feature/function level, and into new value-add, that most folks just do not consider the province of databases at all. There is almost a reflexive reaction to put the database back into the traditional database ‘box’ (of handling persistence, ACID transactions, indexing, caching, RBAC, etc.). Generalizing into the level of SQL-rewriting is just too much encroachment into the bailiwick of analytical/BI tools, custom reporting, ETL/warehousing tools, etc.

Going so far beyond the ‘database’ box is most disorienting … and easy to misapprehend … and actually … easy to miss altogether.

When SQL-rewriting is combined with computationally complete stored procedures, triggers and such … a whole host of things become both possible and practical.

The litany of interoperability-oriented, federation-enabling, etc. opportunities is fascinating. But one thing is for sure: none of these applications is very widely appreciated. None seems to be well documented. None seems to have an exemplary ‘reference’ implementation. It is as if the other shoe is still waiting to drop.

To the degree that this might be true, it would remind me of object-oriented programming notions back in the days when they were first implemented, in Simula 67 (as in 1967). It wasn’t until the late 1980s (or even early 1990s) that object-oriented technology broke into the mainstream. I suspect SQL-rewriting RULES are in something of a similar ‘recognition limbo’.

The original rules were meant to facilitate updateable views. When you have an updateable view whose updates do not match the automagic of our current updateable views, you need to write a rule. For example, an UPDATE rule on a view which includes a key can prevent key updates. A view over many joined tables may want to enable updates on only a few columns in a few tables. The idea is that you may not want the prescribed updateable-view behaviour. A master-detail view can allow changes, for example, on only the detail tables.
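
For instance, a rule along these lines (a sketch with invented names) makes a view writable while pinning the key: NEW.id is simply never used, so attempts to change it are dropped.

```sql
CREATE TABLE emp (id int PRIMARY KEY, name text, salary numeric);
CREATE VIEW emp_v AS SELECT id, name, salary FROM emp;

-- Updates through the view touch name and salary only; the key
-- column of the underlying table is never assigned.
CREATE RULE emp_v_upd AS ON UPDATE TO emp_v DO INSTEAD
  UPDATE emp SET name = NEW.name, salary = NEW.salary
  WHERE id = OLD.id;
```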

The other reasons for rules are all the reasons for a view, since a view is itself implemented as a RULE.
–ex-illustra

Views are quite limited forms of RULEs – in particular, they don’t allow more than one rule, or more than one action, or any WHERE clause in the way that rules do (the WHERE clause of a view itself is a different thing). So they can never qualify as “non-trivial” in the sense I used above.

But while rules can be used to make views writable, the fact is that in all non-trivial cases the result is in some respect flawed: some queries that cause the rule to be invoked will produce unexpected and usually undesired results which are not easily predicted in advance.

Now that we have INSTEAD triggers on views, in any case where you want a writable view that’s not covered by the default magic you should be writing a trigger rather than a rule, since it’s very easy to understand what the trigger will do and how to write it correctly.
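
For comparison, the trigger version of a writable view (a sketch with invented names; INSTEAD OF triggers are available since PostgreSQL 9.1): the function runs once per affected row, on actual row values, with none of the rule system’s expression-substitution surprises.

```sql
CREATE TABLE person (id int PRIMARY KEY, name text);
CREATE VIEW person_v AS SELECT id, name FROM person;

-- The trigger function sees the concrete NEW row for each insert.
CREATE FUNCTION person_v_ins() RETURNS trigger AS $$
BEGIN
  INSERT INTO person VALUES (NEW.id, NEW.name);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER person_v_ins_trg INSTEAD OF INSERT ON person_v
  FOR EACH ROW EXECUTE PROCEDURE person_v_ins();
```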