Logical Criteria

In order to process arbitrary expression-based rules, PEAK-Rules needs to
"understand" the way that conditions logically relate to each other. This
document describes the design (and tests the implementation) of its logical
criteria management. You do not need to read this unless you are extending or
interfacing with this subsystem directly, or just want to understand how this
stuff actually works!

The most important ideas here are implication, intersection, and disjunctive
normal form. But don't panic if you don't know what those terms mean! They're
really quite simple.

Implication means that if one thing is true, then so is another. A implies B
if B is always true whenever A is true. It doesn't matter what B is when A is
not true, however. It could be true or false, we don't care. Implication is
important for prioritizing which rules are "more specific" than others.

Intersection just means that both things have to be true for a condition to
be true - it's like the "and" of two conditions. But rather than performing
an actual "and", we're creating a new condition that will only be true when
the two original conditions would be true.

And finally, disjunctive normal form (DNF) means "an OR of ANDs". For example,
this expression is in DNF:

(A and C) or (B and C) or (A and D) or (B and D)

But this equivalent expression is not in DNF:

(A or B) and (C or D)

The criteria used to define generic function methods are more likely to look
like this than to be in disjunctive normal form. Therefore, we must
convert them in order to implement the Chambers & Chen dispatch algorithm
correctly (see Indexing.txt).
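As a rough illustration of that conversion (a standalone sketch, not the
PEAK-Rules implementation), an "and" of "or" groups can be expanded into DNF by
picking one alternative from each group in every possible combination. The
name to_dnf is hypothetical.

```python
from itertools import product

def to_dnf(and_of_ors):
    """Expand an AND of OR-groups into a list of AND-ed tuples (DNF)."""
    # Each result tuple picks exactly one alternative from every OR-group.
    return list(product(*and_of_ors))

# (A or B) and (C or D)
print(to_dnf([("A", "B"), ("C", "D")]))
# [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
```

The four tuples are the same four "and" groups shown in the DNF example above,
just listed in a different order.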

We do this using the DisjunctionSet and OrElse classes to represent
overall expressions (sets or sequences of "ors"), and the Signature and
Conjunction classes to represent sequences or sets of "and"-ed conditions.

Within a Signature, the things that are "and"-ed together are a sequence
of Test instances. A Test pairs a "dispatch expression" with a
"criterion". For example, this expression:

isinstance(x, Y)

would be represented internally as a Test instance like this:

Test(IsInstance(Local('x')), Class(Y))

Conjunction instances, on the other hand, are used to "and" together
criteria that apply to the same dispatch expression. For example, this
expression:

isinstance(x, Y) and isinstance(x, Z)

would be represented internally like this:

Test(IsInstance(Local('x')), Conjunction([Class(Y), Class(Z)]))

The rest of this document describes how predicates, signatures, tests, dispatch
expressions, and criteria work together to create expressions that are in
disjunctive normal form, and that can be checked for implication of other
expressions.

The basic logical functions we will use are implies(), intersect(),
disjuncts(), and negate(), all of which are defined in
peak.rules.core:

>>> from peak.rules.core import implies, intersect, disjuncts, negate

The most fundamental conditions are simply True and False. True
represents a rule that always applies, while False represents a rule that
never applies. Therefore, the result of intersecting True and any other
object, always returns that object, while intersecting False with any other
object returns False:
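A toy model of this rule, independent of peak.rules.core, might look like the
following. The function toy_intersect is hypothetical and only handles the
boolean cases described here.

```python
def toy_intersect(c1, c2):
    """Toy model of intersection covering only the True/False rules."""
    if c1 is False or c2 is False:
        return False          # False "wins": the result never applies
    if c1 is True:
        return c2             # True is the identity for "and"
    if c2 is True:
        return c1
    raise NotImplementedError("only boolean cases are modeled here")

print(toy_intersect(True, "some condition"))   # some condition
print(toy_intersect("some condition", False))  # False
```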

Notice, by the way, a few important differences between implies() and
intersect(). implies() always returns a boolean value, True or
False, because it's an immediate answer to the question, "does the
second condition always apply if the first condition applies?"

intersect(), on the other hand, returns a condition that will always be
true when the original conditions apply. So, if it returns a boolean value,
that's just an indication that the intersection of the two input conditions
would always apply or never apply. Also, intersect() is logically
symmetrical, in that it doesn't matter what order the arguments are in, whereas
the order is critically important for implies().

However, intersect() methods must be order preserving, because the order
in which logical "and" operations occur is important. Consider, for example,
the condition "y != 0 and z > x/y", in which it would be a bad thing to skip
the zero check before the division!

So, as we will see later on, when working with more complex conditions,
intersect() methods must ensure that the subparts of the output condition
are in the same relative order as they were in the input.
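One simple way to picture order preservation (a simplified sketch, using
plain strings as stand-ins for tests; merge_in_order is a hypothetical helper)
is to concatenate the two inputs and drop repeats while keeping first
occurrences in place:

```python
def merge_in_order(left, right):
    """Combine two sequences of tests, dropping duplicates but keeping
    first occurrences in their original relative order."""
    # dict preserves insertion order (Python 3.7+), so earlier tests stay first.
    return tuple(dict.fromkeys(tuple(left) + tuple(right)))

print(merge_in_order(["y != 0", "z > x/y"], ["y != 0", "x > 0"]))
# ('y != 0', 'z > x/y', 'x > 0')
```

The zero check still comes before the division in the merged result.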

(Also, note that in general, when you intersect two conditions, if one condition
implies the other, the result of the intersection is the implying condition.
This general rule greatly simplifies the implementation of most intersect
operations, since as long as there is an implication relationship defined
between conditions, many common cases of intersection can be handled
automatically.)
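Here is a minimal sketch of that general rule, assuming the conditions are
plain classes and that "implies" can be modeled with issubclass(). Both
function names are hypothetical, and the tuple fallback is just a placeholder
for "keep both conditions".

```python
def toy_implies(c1, c2):
    """For classes, c1 implies c2 exactly when c1 is a subclass of c2."""
    return issubclass(c1, c2)

def toy_intersect(c1, c2):
    """Return the implying (more specific) condition when there is one."""
    if toy_implies(c1, c2):
        return c1
    if toy_implies(c2, c1):
        return c2
    # No implication either way: keep both, in their original order.
    return (c1, c2)

print(toy_intersect(int, object))  # <class 'int'>
print(toy_intersect(object, int))  # <class 'int'>
print(toy_intersect(int, str))     # (<class 'int'>, <class 'str'>)
```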

In contrast to both implies() and intersect(), the disjuncts()
function takes only a single argument, and returns a list of the "disjuncts"
(or-ed-together conditions) of its argument. More precisely, it returns a list
of conditions that each imply the original condition. That is, if any of the
disjuncts were true, then the original condition would also be true.

Thus, the disjuncts() of an arbitrary object will normally be a list
containing just that object:

This lets you avoid writing lots of decorators for the cases where you want
more than one type (or istype() instance) to match in a given argument
position. (As you can see, it's equivalent to specifying all the individual
combinations of specified types.)
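The "individual combinations" idea can be sketched on its own, assuming each
argument position holds either a single type or a tuple of alternative types.
The helper expand_positions is hypothetical and is not part of PEAK-Rules.

```python
from itertools import product

def expand_positions(signature):
    """Expand per-position alternatives into every single-type combination."""
    # Normalize: a bare type becomes a one-element tuple of alternatives.
    choices = [pos if isinstance(pos, tuple) else (pos,) for pos in signature]
    return list(product(*choices))

print(expand_positions(((int, float), str)))
# [(<class 'int'>, <class 'str'>), (<class 'float'>, <class 'str'>)]
```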

Finally, the negate() function inverts the truth of a condition, e.g.:

>>> negate(True)
False
>>> negate(False)
True

Of course, it also applies to criteria other than pure boolean values, as we'll
see in the upcoming sections.

A criterion object describes a set of possible values for a dispatch
expression. There are several criterion types supplied with PEAK-Rules, but you
can also add your own, as long as they can be tested for implication with
implies(), and intersected with intersect(). (And if they represent an
"or" of sub-criteria, they should be able to provide their list of
disjuncts(). They'll also need to be indexable, but more on that later in
other documents!)

Sometimes, more than one criterion is applied to the same dispatch expression.
For example, in the expression "x is not y and x is not z", two criteria are
being applied to the identity of x. To represent this, we need a way to
represent a set of "and-ed" criteria. peak.rules.criteria provides a base
class for this, called Conjunction:

>>> from peak.rules.criteria import Conjunction

This class is a subclass of frozenset, but has a few additional features.
First, a Conjunction never contains redundant (implied) items.
For example, the conjunction of the classes object and int is int,
because int already implies object:

Notice also that instead of getting back a set with one member, we got back the
item that would have been in the set. This helps to simplify the expression
structure. As a further simplification, creating an empty conjunction returns
True, because "no conditions required" is the same as "always true":

>>> Conjunction([])
True
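These simplifications can be modeled in a few lines. The sketch below is a
toy (toy_conjunction is a hypothetical function, not the Conjunction class),
and it assumes the members are classes so that issubclass() can stand in for
implication.

```python
def toy_conjunction(classes):
    """Toy "and" of class criteria with the simplifications described above."""
    classes = list(dict.fromkeys(classes))   # drop exact duplicates
    # Drop any class that is implied by (is a base of) another member.
    kept = [c for c in classes
            if not any(o is not c and issubclass(o, c) for o in classes)]
    if not kept:
        return True              # no conditions required: always true
    if len(kept) == 1:
        return kept[0]           # a single member stands for itself
    return frozenset(kept)

print(toy_conjunction([object, int]))  # <class 'int'>
print(toy_conjunction([]))             # True
```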

A conjunction implies a condition, if any condition in the conjunction
implies the other condition:

(By the way, on a more sophisticated level of reasoning, you could say that
Conjunction([str,int]) should have equalled False above, since
there's no way for an object to be both an int and a str at the same
time. But that would be an excursion into semantics and outside the bounds of
what PEAK-Rules can "reason" about using only logical implication as defined by
the implies() generic function.)
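The "any member implies it" rule is easy to model by hand. This is only a
sketch, again assuming class criteria and issubclass() as the implication
test; conjunction_implies is a hypothetical name.

```python
def conjunction_implies(members, condition):
    """An "and" of class criteria implies `condition` if any member does."""
    return any(issubclass(m, condition) for m in members)

print(conjunction_implies([str, int], int))    # True
print(conjunction_implies([str, int], float))  # False
```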

Conjunction objects can be intersected with one another, or with
additional conditions, and the result is another Conjunction of the
same type as the leftmost set. So, if we use subclasses of our own, the result
of intersecting them will be a conjunction of the correct subclass:

If you want to ensure that all items in a set are of appropriate type or value,
you can override __init__ to do the checking, and raise an appropriate
error. PEAK-Rules does this for its specialized conjunction classes, but uses
if __debug__: and assert statements to avoid the extra overhead when
run with python -O. You may wish to do the same for your subclasses.
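For instance, a frozenset subclass could validate its members along these
lines. This is a sketch only: IntConjunction is a made-up class that derives
from plain frozenset rather than from Conjunction, so it shows the checking
pattern and nothing else.

```python
class IntConjunction(frozenset):
    """Hypothetical frozenset subclass that only accepts integers."""

    def __init__(self, items=()):
        # frozenset fills in its contents in __new__; __init__ only validates.
        if __debug__:
            for item in self:
                assert isinstance(item, int), "%r is not an int" % (item,)

ok = IntConjunction([1, 2, 3])
print(sorted(ok))  # [1, 2, 3]
```

Both the if __debug__: block and the assert statement are skipped when Python
runs with -O, so the check costs nothing in optimized mode.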

The DisjunctionSet and OrElse classes are used to represent sets and
sequences of "or"-ed criteria:

>>> from peak.rules.criteria import DisjunctionSet, OrElse

Both types automatically exclude redundant (i.e. more-specific) criteria, and
can never contain fewer than two entries. For example, "or"-ing object and
int always returns object, because object is implied by int:

Notice that instead of getting back a set or sequence with one member, we got
back the item that would have been in the set. This helps to simplify the
expression structure. As a further simplification, creating an empty
disjunction returns False, because "no conditions are sufficient" is the
same as "always false":

>>> DisjunctionSet([])
False
>>> OrElse([])
False
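The same simplifications can be modeled for "or". The sketch below is a toy
(toy_disjunction is a hypothetical function, not either of the real classes),
and it assumes class members with issubclass() standing in for implication.

```python
def toy_disjunction(classes):
    """Toy "or" of class criteria with the simplifications described above."""
    classes = list(dict.fromkeys(classes))   # drop exact duplicates
    # Drop any class that implies (is a subclass of) another member.
    kept = [c for c in classes
            if not any(o is not c and issubclass(c, o) for o in classes)]
    if not kept:
        return False             # no sufficient conditions: always false
    if len(kept) == 1:
        return kept[0]           # a single member stands for itself
    return frozenset(kept)

print(toy_disjunction([object, int]))  # <class 'object'>
print(toy_disjunction([]))             # False
```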

In addition to eliminating redundancy, disjunction sets also flatten any
nested disjunctions:

This is because it uses the disjuncts() generic function to determine
whether any of the items it was given are "or"-ed conditions of some kind. And
the disjuncts() of a DisjunctionSet are a list of its contents:

>>> disjuncts(DisjunctionSet([1, 2, 3, 4]))
[1, 2, 3, 4]
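The flattening step can be pictured with a small stand-alone model, in which a
frozenset stands for an "or" and anything else is its own single disjunct.
Both function names are hypothetical.

```python
def toy_disjuncts(cond):
    """A frozenset stands for an "or"; anything else is its own disjunct."""
    if isinstance(cond, frozenset):
        return sorted(cond)
    return [cond]

def flat_disjunction(items):
    """Build an "or" whose nested "or" members are merged into one level."""
    flat = []
    for item in items:
        flat.extend(toy_disjuncts(item))
    return frozenset(flat)

nested = flat_disjunction([1, flat_disjunction([2, 3]), 4])
print(toy_disjuncts(nested))  # [1, 2, 3, 4]
```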

But OrElse sequences do not do this flattening, in order to avoid imposing
an arbitrary sequence on their contents:

(The disjuncts() of an OrElse are much more complicated, as the
disjuncts of a Python expression like "a or b or c" reduce to "a",
"(not a) and b", and "(not a and not b) and c"! We'll talk more about
this later, in the section on Predicates below.)

A disjunction only implies a condition if all conditions in the disjunction
imply the other condition:
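As a sketch of this "all members must imply it" rule, assuming class criteria
and issubclass() as the implication test (disjunction_implies is a
hypothetical name):

```python
def disjunction_implies(members, condition):
    """An "or" of class criteria implies `condition` only if every member does."""
    return all(issubclass(m, condition) for m in members)

print(disjunction_implies([int, str], object))  # True
print(disjunction_implies([int, str], int))     # False
```

Note the contrast with "and"-ed criteria, where a single implying member is
enough.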

The IsObject criterion type represents the set of objects which either
are -- or are not -- one specific object instance. IsObject(x) (or
IsObject(x, True)) represents the set of objects y for which the
"y is x" condition would be true. Conversely, IsObject(x, False)
represents the set of objects y for which "y is not x":
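The idea can be modeled with a tiny stand-in class. ToyIsObject and its
accepts() method are hypothetical; the real criterion is used through
implies() and intersect() rather than a method like this.

```python
class ToyIsObject:
    """Hypothetical stand-in: matches by identity, optionally negated."""

    def __init__(self, ob, match=True):
        self.ob = ob
        self.match = match

    def accepts(self, candidate):
        # `is` compares identity, not equality.
        return (candidate is self.ob) == self.match

sentinel = object()
print(ToyIsObject(sentinel).accepts(sentinel))         # True
print(ToyIsObject(sentinel, False).accepts(sentinel))  # False
print(ToyIsObject(sentinel, False).accepts(object()))  # True
```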

The Range() criterion type represents an inequality such as lo < x < hi
or x >= lo. The low and high ends of a range are each given as a 2-tuple,
consisting of a value and a "direction". The direction is an integer (either
-1 or 1) that indicates whether the edge is on the low or high side of the
target value. Thus, a tuple (27, -1) means "the low edge of 27", while
(99, 1) means "the high edge of 99". In this way, any simple inequality or
range can be represented by a pair of edges.
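One way to read this edge notation (an interpretation for illustration, not a
description of the library's internals) is to place a plain value at
"direction 0", between its own low and high edges, and then let ordinary tuple
comparison decide membership. The function in_range is hypothetical.

```python
def in_range(value, lo, hi):
    """True if `value` lies strictly between the `lo` and `hi` edges."""
    # A plain value sits at direction 0, between its own low (-1)
    # and high (+1) edges, so tuple comparison does the rest.
    return lo < (value, 0) < hi

# 27 <= x <= 99: from the low edge of 27 to the high edge of 99
print(in_range(27, (27, -1), (99, 1)))   # True
print(in_range(99, (27, -1), (99, 1)))   # True
# 27 < x < 99: from the high edge of 27 to the low edge of 99
print(in_range(27, (27, 1), (99, -1)))   # False
print(in_range(50, (27, 1), (99, -1)))   # True
```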

Thus, the intersection of two different != values produces a disjunction of
three Range() objects, representing the intervals that "surround" the
original != values:
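To see where the three intervals come from, here is a stand-alone sketch that
builds the (lo, hi) edge pairs by hand. It uses float infinities as stand-ins
for Min and Max, and ranges_excluding is a hypothetical helper, so treat it as
a picture of the idea rather than of the actual return value.

```python
NEG_INF, POS_INF = float("-inf"), float("inf")  # stand-ins for Min and Max

def ranges_excluding(a, b):
    """Intervals left after excluding two distinct values, as (lo, hi) edges."""
    a, b = sorted((a, b))
    return [
        ((NEG_INF, -1), (a, -1)),  # everything below a
        ((a, 1), (b, -1)),         # strictly between a and b
        ((b, 1), (POS_INF, 1)),    # everything above b
    ]

for lo, hi in ranges_excluding(99, 27):
    print(lo, hi)
```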

Notice that if we omit the hi or lo end of the range, it's replaced
with "below Min" or "above Max", as appropriate. (The Min and
Max values are special objects that compare below or above any other
object.)

When creating range and value objects, it can be useful to use the
Inequality constructor, which takes a comparison operator and a value:

Now that we've got all the basic pieces in place, we can operationally
define predicates for the Chambers & Chen dispatch algorithm.

Specifically, a predicate can be any of the following:

True (meaning a condition that always applies)

False (meaning a condition that never applies)

A Test or Signature instance

A DisjunctionSet or OrElse containing two or more Test or
Signature instances

In each case, invoking disjuncts() on the object in question will return
a list of objects suitable for constructing dispatch "cases" -- i.e., sets of
simple "and-ed" criteria that can easily be indexed.

The tests_for() function can then be used to yield the component tests of
each case signature. When called on a Test, it yields the given test:

tests_for(False), however, is undefined, because False cannot be
represented as a conjunction of tests. False is still a valid predicate,
of course, because it represents an empty disjunction.

In normal predicate processing, one loops over the disjuncts() of a
predicate, and only then uses tests_for() to inspect the individual items.
But since disjuncts(False) is an empty list, it should never be necessary
to invoke tests_for(False).
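The processing loop described above can be sketched with toy stand-ins. In
this model a list is an "or" of cases, a plain tuple is an "and" of tests, and
Test is a namedtuple; all of these names and representations are hypothetical.
Treating True as a case with no tests is an assumption made for the sketch.

```python
from collections import namedtuple

Test = namedtuple("Test", "expr criterion")   # toy stand-in for a Test

def toy_disjuncts(pred):
    if pred is False:
        return []                 # an empty "or": no cases at all
    if isinstance(pred, list):
        return list(pred)         # a list models an "or" of cases
    return [pred]                 # True, a Test, or a tuple of Tests

def toy_tests_for(case):
    if case is True:
        return                    # assumed: always applies, no tests needed
    if isinstance(case, Test):
        yield case
    else:
        yield from case           # a tuple models an "and" of Tests

def cases(pred):
    """Loop over the disjuncts first, then list each case's tests."""
    return [list(toy_tests_for(case)) for case in toy_disjuncts(pred)]

t1, t2 = Test("x", int), Test("y", str)
print(cases(False))               # []
print(len(cases([(t1, t2), t2]))) # 2
```

Because the loop over the disjuncts of False never runs, the tests of False
are never requested.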

There is an important distinction, however, in how disjuncts() works on
OrElse objects, compared to all other kinds of predicates. disjuncts()
is used to obtain the unordered disjunctions of a logical condition, but
OrElse is ordered, because it represents a series of applications of the
Python "or" operator.

In Python, a condition on the right-hand side of an "or" operator is not tested
unless the condition on the left is false. PEAK-Rules, however, tests the
disjuncts() of a predicate independently. Thus, in order to properly
translate "or" conditions in a predicate, the disjuncts() of an OrElse
must include additional and-ed conditions to force them to be tested in order.

Specifically, the disjuncts() of OrElse([a,b,c]) will be:

a,

intersect(negate(a),b), and

intersect(intersect(negate(a),negate(b)),c)!

This expansion ensures that b will never be tested unless a is false,
and c will never be tested unless a and b are both false, just like
in a regular Python expression. Observe:
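The shape of this expansion can be reproduced with plain strings. This is a
sketch of the pattern only: or_else_disjuncts is a hypothetical function that
builds text, where the real code would call negate() and intersect().

```python
def or_else_disjuncts(conditions):
    """Expand an ordered "or" into independent, mutually exclusive cases."""
    cases, negated_so_far = [], []
    for cond in conditions:
        # Each case requires every earlier condition to have been false.
        cases.append(" and ".join(negated_so_far + [cond]))
        negated_so_far.append("not " + cond)
    return cases

print(or_else_disjuncts(["a", "b", "c"]))
# ['a', 'not a and b', 'not a and not b and c']
```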

This delayed expansion "preserves the unorderedness" of the contents, by not
forcing them to be evaluated in any specific sequence, apart from the
requirements imposed by their position within the OrElse.

We'll do one more test, to show that the disjuncts of the negated portions of
the OrElse are also expanded:

(a and b) or (int|str) => (a and b) | not (a and b) and (int|str)
not (a and b) and (int|str) => (not a | not b) and (int|str)
(not a | not b) and (int|str) => (
(not a and int) | (not a and str) | (not b and int) | (not b and str)
)
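As a sanity check on the derivation above, a brute-force truth table confirms
that the expanded form is logically equivalent to the original expression.
This is a stand-alone sketch in which the booleans i and s stand for the int
and str conditions; none of these names come from PEAK-Rules.

```python
from itertools import product

def original(a, b, i, s):
    return (a and b) or (i or s)

def expanded(a, b, i, s):
    return ((a and b)
            or (not a and i) or (not a and s)
            or (not b and i) or (not b and s))

def equivalent():
    """Compare both forms on all 16 combinations of truth values."""
    return all(bool(original(*v)) == bool(expanded(*v))
               for v in product([False, True], repeat=4))

print(equivalent())  # True
```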