Proof of Theorem 1

We here prove
the correctness of algorithm solve_constraints. We will
assume throughout that the two variables in the long of any constraint
in S are distinct.

Lemma 3:Let T be a cluster tree and let Z be an instantiation of T.
Let a and b be symbols of T. Let N be the least common ancestor
of a and b in T. Then od(
Z(a), Z(b)) =
odiam(Z(N)).

Proof: Since Z(a) and Z(b) are elements of
Z(N), it follows from the definition of odiam that
od(
Z(a), Z(b)) <<= odiam(Z(N)).
Suppose the inequality were strict; that is,
od(
Z(a), Z(b)) << odiam(Z(N)).
Then let C be the set of all the symbols c of T such that od(Z(a),Z(c)) <<= od(Z(a),Z(b)).
Then odiam(Z(C)) = od(Z(a),Z(b)) << odiam(Z(N)).
It is easily shown that Z(C) is a cluster in
Z(T). Therefore, by property (ii) of definition 5,
there must be a node M such that M.symbols = C. Now, M is certainly
not an ancestor of N, since odiam(Z(M)) << odiam(Z(N))
but M.symbols contains both a and b. But this contradicts the
assumption that N was the least common ancestor of a and b.

Corollary 4:Let T be a cluster tree and let Z be an instantiation of T.
Let a,b,c,d be symbols of T.
Let N be the least common ancestor of c and d in T, and
let M be the least common ancestor of a and b in T.
Then od(
Z(a), Z(b)) << od(
Z(c), Z(d)) if and
only if M.label < N.label.

Lemma 5:Let S be any set of constraints of the form
od(a,b) << od(c,d). Let H be the connected components of
the shorts of S. If S is consistent, then not every
edge of S is in H.

Proof: Let Z be a valuation satisfying S. Find
an edge pq in S for which od(Z(p),Z(q)) is maximal.
Now, if ab is a short of S -- that is, there is a constraint
od(a,b) << od(c,d) in S -- then od(
Z(a), Z(b))
<< od(
Z(c), Z(d)) <<= od(
Z(p), Z(q)).

Now, let ab be any edge in H, the connected components of the shorts of
S. Then there is a path
a1=a, a2 ... ak =b such that
the edge
aiai+1 is a short of S for
i=1 ... k-1.
Thus, by the om-triangle inequality,
od(
Z(a), Z(b)) <<=
maxi=1..k-1(od
(Z(ai), Z(ai+1))) <<od(
Z(p), Z(q)). Hence pq <> ab, so pq is not in H.

Lemma 6:The values of S and H in any iteration are supersets
of their values in any later iteration.

Proof:S is reset to a subset of itself at the end of each
iteration. H is defined in terms of S in a monotonic manner.

Lemma 7:S cannot be the same in two successive iterations of the main loop.

Proof: by contradiction. Suppose that S is the same in two
successive iterations. Then H will be the same, since it is defined
in terms of S. H is constructed to contain all the shorts of S,
Since the resetting of S at the end of the first iteration does
not change S, H must contain all the longs as well. Thus,
H contains all the edges in S. But that being the case, the
algorithm should have terminated with failure at the beginning of
the first iteration.

Lemma 8:Algorithm solve_constraints always terminates.

Proof: By lemma 7, if the algorithm does not exit
with failure, then
on each iteration some constraints are removed from S. Hence, the
number of iterations of the main loop is at most the original size of
S. Everything else in the algorithm is clearly bounded. (Note
that this bound on the number of iterations is improved in section
6.1 to n-1, where n is the number of symbols.)

Proof: If the algorithm returns false, then the transitive
closure of the shorts of S contains all the edges in S.
By lemma 5, S is inconsistent.

Lemma 10:If constraint C of form od(a,b) << od(c,d) is in the
initial value of S, and edge cd is in H in
some particular iteration, then constraint C is in S at the
start of that iteration.

Proof: Suppose that C is deleted from S on some particular
iteration. Then edge cd, the long of C, cannot be in
H in that iteration. That is, it is not possible for edge cd to
persist in H in an iteration after C has been deleted from S.
Note that, by lemma 6, once cd is eliminated from
H, it remains out of H.

Lemma 11:The following loop invariant holds:
At the end of
each loop iteration, the values of L.symbols, where L is a leaf in the
current state of the tree, are exactly the connected components of H.

Proof: In the first iteration, T is initially just the root R,
containing all the symbols, and a child of R is created for each connected
component of H.

Let Ti and Hi be the values of T and H at the end of the ith
iteration.
Suppose that the invariant holds at the end of the kth iteration.
By lemma 6, Hk+1 is a
subset of Hk. Hence, each connected component of Hk+1 is a
subset of a connected component of Hk. Moreover, each connected component
J of Hk is either a connected component of Hk+1 or is partitioned
into several connected components of Hk+1. In the former case,
the leaf of Tk corresponding to J is unchanged and remains a leaf in
Tk+1. In the latter case, the leaf corresponding to J gets assigned
one child for each connected component of Hk+1 that is a subset of J.
Thus, the connected components of Hk+1 correspond to the leaves of
Tk+1.

Lemma 12:If procedure solve_constraints does not return false,
then it returns a well-formed cluster tree T.

Proof:
Using lemma 11, and the cleanup section of
solve_constraints which creates
the final leaves for symbols, it follows that every symbol in S ends up in a single leaf of T. As m is decremented on each iteration,
and as no iteration adds both a new node and children of that node, it
follows that the label of each internal node is less than the label
of its father. Hence the constraints on cluster trees
(definition 3) are
satisfied.

Lemma 13:Let a,b be two distinct symbols in S and let T be the cluster tree
returned by solve_constraints for S. Let N be the least
common ancestor of a,b in T. Then either N is assigned its label
on the first iteration when the edge ab is not in H, or the
edge ab is in the final value of H when the loop is exited and
N is assigned its label in the final cleanup section.

Proof: As above, let Hi be the value of H in the ith
iteration.

If N is the root, then it is assigned its label in the first
iteration. Clearly, a and b, being in different subtrees of N,
must be in different connected components of H1.

Suppose N is assigned its label in the kth iteration of the loop for
k>1.
By lemma 11, at the end of the previous iteration,
N.symbols was a connected component of Hk-1, and it
therefore contained the edge ab. Since N is the least common ancestor
of a,b, it follows that a and b are placed in two different children
of N; hence, they are in two different connected components of Hk.
Thus the edge ab cannot be in Hk.

Suppose N is assigned its label in the cleanup section of the algorithm.
Then by lemma 11, N.symbols is a connected component
of the final value of H. Hence the
edge ab was in the final value of H.

Lemma 14:Let S initially contain constraint C of form
od(a,b) << od(c,d).
Suppose that solve_constraints(S) returns a
cluster tree T. Let M be the least common ancestor of a,b in T and
let N be the least common ancestor of c,d. Then M.label < N.label.

Proof: Suppose N is given a label in a given iteration. By
lemma 13, cd is eliminated from H in that same
iteration. By lemma 10,
constraint C must be in S at the start of the
iteration. Hence ab is a short of S in the iteration, and
is therefore in H. Hence
M is not given a label until a later iteration, and therefore is
given a lower label.

It is easily seen that cd cannot be in H in the final iteration of
the loop, and hence N is not assigned its label in the cleanup
section.

Lemma 15:Suppose that solve_constraints(S) returns a
cluster tree T. Then any instantiation of T satisfies the constraints
S.

Proof: If solve_constraints(S) returns false, then
it is inconsistent (lemma 9). If it does not return
false, then
it returns a cluster tree T (lemma 12). Since T has
an instantiation (lemma 2) and since every
instantiation of T is a solution of S (lemma 15),
it follows that S is consistent and T satisfies S.