sofs

MODULE

sofs

MODULE SUMMARY

Functions for Manipulating Sets of Sets

DESCRIPTION

The sofs module implements operations on finite sets and
relations represented as sets. Intuitively, a set is a
collection of elements; every element belongs to the set, and
the set contains every element.

Given a set A and a sentence S(x), where x is a free variable,
a new set B whose elements are exactly those elements of A for
which S(x) holds can be formed; this is denoted B =
{x in A : S(x)}. Sentences are expressed using
the logical operators "for some" (or "there exists"), "for all",
"and", "or", "not". If the existence of a set containing all the
specified elements is known (as will always be the case in this
module), we write B = {x : S(x)}.

The unordered set containing the elements a, b and c
is denoted {a, b, c}. This notation is not to be
confused with tuples. The ordered pair of a and b, with
first coordinate a and second coordinate b, is denoted
(a, b). An ordered pair is an ordered set of two
elements. In this module ordered sets can contain one, two or
more elements, and parentheses are used to enclose the elements.
Unordered sets and ordered sets are orthogonal, again in this
module; there is no unordered set equal to any ordered set.

The set that contains no elements is called the empty set.
If two sets A and B contain the same elements, then A
is equal to B, denoted
A = B. Two ordered sets are equal if they contain the
same number of elements and have equal elements at each
coordinate. If a set A contains all elements that B contains,
then B is a subset of A.
The union of two sets A and B is
the smallest set that contains all elements of A and all elements of
B. The intersection of two
sets A and B is the set that contains all elements of A that
belong to B.
Two sets are disjoint if their
intersection is the empty set.
The difference of
two sets A and B is the set that contains all elements of A that
do not belong to B.
The symmetric
difference of
two sets is the set that contains those elements that belong to
either of the two sets, but not both.
The union of a collection
of sets is the smallest set that contains all the elements that
belong to at least one set of the collection.
The intersection of
a non-empty collection of sets is the set that contains all elements
that belong to every set of the collection.
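As a minimal sketch, the operations just defined can be tried in the Erlang shell on two small sets. The sets A and B are illustrative; sofs:set/1, sofs:to_external/1 and the operations shown are functions of this module.

```erlang
A = sofs:set([1,2,3]).
B = sofs:set([2,3,4]).
sofs:to_external(sofs:union(A, B)).          %% [1,2,3,4]
sofs:to_external(sofs:intersection(A, B)).   %% [2,3]
sofs:to_external(sofs:difference(A, B)).     %% [1]
sofs:to_external(sofs:symdiff(A, B)).        %% [1,4]
sofs:is_subset(sofs:set([2,3]), A).          %% true
sofs:is_disjoint(A, sofs:set([7,8])).        %% true
```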

The Cartesian
product of
two sets X and Y, denoted X × Y, is the set
{a : a = (x, y) for some x in X and for
some y in Y}.
A relation is a subset of
X × Y. Let R be a relation. The fact that
(x, y) belongs to R is written as x R y. Since
relations are sets, the definitions of the last paragraph
(subset, union, and so on) apply to relations as well.
The domain of R is the
set {x : x R y for some y in Y}.
The range of R is the
set {y : x R y for some x in X}.
The converse of R is the
set {a : a = (y, x) for some
(x, y) in R}. If A is a subset of X, then
the image of
A under R is the set {y : x R y for some
x in A}, and if B is a subset of Y, then
the inverse image of B is
the set {x : x R y for some y in B}. If R is a
relation from X to Y and S is a relation from Y to Z, then
the relative product of
R and S is the relation T from X to Z defined so that x T z
if and only if there exists an element y in Y such that
x R y and y S z.
The restriction of R to A is
the set S defined so that x S y if and only if x is an
element of A and x R y. If S is a restriction
of R to A, then R is
an extension of S to X.
If X = Y then we call R a relation in X.
The field of a relation R in X
is the union of the domain of R and the range of R.
If R is a relation in X, and
if S is defined so that x S y if x R y and
not x = y, then S is
the strict relation
corresponding to
R, and vice versa, if S is a relation in X, and if R is defined
so that x R y if x S y or x = y,
then R is the weak relation
corresponding to S. A relation R in X is reflexive if
x R x for every element x of X; it is
symmetric if x R y implies that
y R x; and it is transitive if
x R y and y R z imply that x R z.
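The following shell session is a small sketch of the relation concepts above, using illustrative relations R and S. Numbers sort before atoms in the Erlang term order, which explains the order of the results.

```erlang
R = sofs:relation([{1,a},{2,b},{3,a}]).
S = sofs:relation([{a,x},{b,y}]).
sofs:to_external(sofs:domain(R)).                         %% [1,2,3]
sofs:to_external(sofs:range(R)).                          %% [a,b]
sofs:to_external(sofs:converse(R)).                       %% [{a,1},{a,3},{b,2}]
sofs:to_external(sofs:image(R, sofs:set([1,2]))).         %% [a,b]
sofs:to_external(sofs:inverse_image(R, sofs:set([a]))).   %% [1,3]
sofs:to_external(sofs:restriction(R, sofs:set([1,3]))).   %% [{1,a},{3,a}]
sofs:to_external(sofs:relative_product(R, S)).            %% [{1,x},{2,y},{3,x}]
```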

A function F is a relation, a
subset of X × Y, such that the domain of F is
equal to X and such that for every x in X there is a unique
element y in Y with (x, y) in F. The latter condition can
be formulated as follows: if x F y and x F z
then y = z. In this module, it will not be required
that the domain of F be equal to X for a relation to be
considered a function. Instead of writing
(x, y) in F or x F y, we write
F(x) = y when F is a function, and say that F maps x
onto y, or that the value of F at x is y. Since functions are
relations, the definitions of the last paragraph (domain, range,
and so on) apply to functions as well. If the converse of a
function F is a function F', then F' is called
the inverse of F.
The relative product of two functions F1 and F2 is called
the composite of F1 and F2
if the range of F1 is a subset of the domain of F2.
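A short sketch of the function concepts, with illustrative functions F1, F2 and G: the composite is formed because the range of F1 is a subset of the domain of F2, and the inverse of G exists because the converse of G is itself a function.

```erlang
F1 = sofs:a_function([{a,1},{b,2},{c,2}]).
F2 = sofs:a_function([{1,x},{2,y},{3,z}]).
sofs:to_external(sofs:composite(F1, F2)).             %% [{a,x},{b,y},{c,y}]
G = sofs:a_function([{1,a},{2,b},{3,c}]).
sofs:to_external(sofs:inverse(G)).                    %% [{a,1},{b,2},{c,3}]
sofs:is_a_function(sofs:relation([{1,a},{1,b}])).     %% false
```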

Sometimes, when the range of a function is more important than
the function itself, the function is called a family.
The domain of a family is called the index set, and the
range is called the indexed set. If x is a family from
I to X, then x[i] denotes the value of the function at index i.
The notation "a family in X" is used for such a family. When the
indexed set is a set of subsets of a set X, then we call x
a family of subsets of X. If x
is a family of subsets of X, then the union of the range of x is
called the union of the family x. If x is non-empty
(the index set is non-empty),
the intersection of the family x is the intersection of
the range of x. In this
module, the only families that will be considered are families
of subsets of some set X; in the following the word "family"
will be used for such families of subsets.
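As an illustration, a family of subsets can be built with sofs:family/1. The index set is the domain of the family, and the union and the intersection of the family are computed from its range. The data is made up for the example.

```erlang
F = sofs:family([{a,[0,2,4]},{b,[0,1,2]},{c,[2,3]}]).
sofs:to_external(sofs:domain(F)).                  %% index set: [a,b,c]
sofs:to_external(sofs:union_of_family(F)).         %% [0,1,2,3,4]
sofs:to_external(sofs:intersection_of_family(F)).  %% [2]
```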

A partition of a set X is a
collection S of non-empty subsets of X whose union is X and
whose elements are pairwise disjoint. A relation in a set is an
equivalence relation if it is reflexive, symmetric and
transitive. If R is an equivalence relation in X, and x is an
element of X,
the equivalence
class of x with respect to R is the set of all those
elements y of X for which x R y holds. The equivalence
classes constitute a partition of X. Conversely, if C is a
partition of X, then the relation that holds for any two
elements of X if they belong to the same element of C is
the equivalence relation induced by the partition C. If R is an
equivalence relation in X, then
the canonical map is
the function that maps every element of X onto its equivalence class.
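A small sketch of a partition, using sofs:partition/1 on an illustrative set of sets Ss. Two elements end up in the same class if they belong to exactly the same elements of Ss, and the union of the classes is the union of Ss.

```erlang
Ss = sofs:from_term([[a,b,c],[b,c,d]]).
P = sofs:partition(Ss).
sofs:to_external(P).                        %% [[a],[b,c],[d]]
sofs:to_external(sofs:union(P)).            %% [a,b,c,d]
sofs:is_equal(sofs:union(P), sofs:union(Ss)).  %% true
```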

Relations as defined above
(as sets of ordered pairs) will from now on be referred to as
binary relations. We call a set of ordered sets
(x[1], ..., x[n]) an (n-ary) relation, and say that the relation is a subset of
the Cartesian product
X[1] × ... × X[n] where x[i] is
an element of X[i], 1 <= i <= n.
The projection of an n-ary
relation R onto coordinate i is the set {x[i] :
(x[1], ..., x[i], ..., x[n]) in R for some
x[j] in X[j], 1 <= j <= n
and not i = j}. The projections of a binary relation R
onto the first and second coordinates are the domain and the
range of R respectively. The relative product of binary
relations can be generalized to n-ary relations as follows. Let
TR be an ordered set (R[1], ..., R[n]) of binary
relations from X to Y[i] and S a binary relation from
(Y[1] × ... × Y[n]) to Z.
The relative
product of
TR and S is the binary relation T from X to Z defined so that
x T z if and only if there exists an element y[i] in
Y[i] for each 1 <= i <= n such that
x R[i] y[i] and
(y[1], ..., y[n]) S z. Now let TR be an
ordered set (R[1], ..., R[n]) of binary relations from
X[i] to Y[i] and S a subset of
X[1] × ... × X[n].
The multiple
relative product of TR and S is defined to be the
set {z : z = ((x[1], ..., x[n]), (y[1], ..., y[n]))
for some (x[1], ..., x[n]) in S and for some
(x[i], y[i]) in R[i],
1 <= i <= n}.
The natural join of
an n-ary relation R
and an m-ary relation S on coordinate i and j is defined to be
the set {z : z = (x[1], ..., x[n],
y[1], ..., y[j-1], y[j+1], ..., y[m])
for some (x[1], ..., x[n]) in R and for some
(y[1], ..., y[m]) in S such that
x[i] = y[j]}.
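The projection and the natural join can be sketched as follows. The relations are illustrative; the join is taken on coordinate 3 of R1 and coordinate 1 of R2, so the joined coordinate of R2 is dropped from the result.

```erlang
R0 = sofs:relation([{1,a},{2,b},{3,a}]).
sofs:to_external(sofs:projection(2, R0)).   %% [a,b]
R1 = sofs:relation([{a,x,1},{b,y,2}]).
R2 = sofs:relation([{1,f,g},{1,h,i},{2,3,4}]).
J = sofs:join(R1, 3, R2, 1).
sofs:to_external(J).   %% [{a,x,1,f,g},{a,x,1,h,i},{b,y,2,3,4}]
```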

The sets recognized by this
module will be represented by elements of the relation Sets, defined as
the smallest set such that:

for every atom T except '_' and for every term X,
(T, X) belongs to Sets (atomic sets);

(['_'], []) belongs to Sets (the untyped empty set);

for every tuple T = {T[1], ..., T[n]} and
for every tuple X = {X[1], ..., X[n]}, if
(T[i], X[i]) belongs to Sets for every
1 <= i <= n then (T, X) belongs
to Sets (ordered sets);

for every term T, if X is the empty list or a non-empty
sorted list [X[1], ..., X[n]] without duplicates
such that (T, X[i]) belongs to Sets for every
1 <= i <= n, then ([T], X)
belongs to Sets (typed unordered sets).

An external set is an
element of the range of Sets.
A type
is an element of the domain of Sets. If S is an element
(T, X) of Sets, then T is
a valid type of X,
T is the type of S, and X is the external set
of S. from_term/2 creates a
set from a type and an Erlang term turned into an external set.
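To make types and external sets concrete, here is a sketch that creates a set from an illustrative term and inspects its type and external set with sofs:type/1 and sofs:to_external/1. Note how the inner list is sorted and stripped of duplicates.

```erlang
S = sofs:from_term([{a,[2,1,2]},{b,[]}]).
sofs:type(S).                     %% [{atom,[atom]}]
sofs:to_external(S).              %% [{a,[1,2]},{b,[]}]
sofs:is_type([{atom,[atom]}]).    %% true
sofs:type(sofs:empty_set()).      %% ['_'], the untyped empty set
```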

The actual sets represented by Sets are the elements of the
range of the function Set from Sets to Erlang terms and sets of
Erlang terms:

Set(T, Term) = Term, where T is an atom;

Set({T[1], ..., T[n]}, {X[1], ..., X[n]}) =
(Set(T[1], X[1]), ..., Set(T[n], X[n]));

Set([T], [X[1], ..., X[n]]) =
{Set(T, X[1]), ..., Set(T, X[n])};

Set([T], []) = {}.

When there is no risk of confusion, elements of Sets will be
identified with the sets they represent. For instance, if U is
the result of calling union/2 with S1 and S2 as
arguments, then U is said to be the union of S1 and S2. A more
precise formulation would be that Set(U) is the union of Set(S1)
and Set(S2).

The types are used to implement the various conditions that
sets need to fulfill. As an example, consider the relative
product of two sets R and S, and recall that the relative
product of R and S is defined if R is a binary relation to Y and
S is a binary relation from Y. The function that implements the relative
product, relative_product/2, checks
that the arguments represent binary relations by matching [{A,B}]
against the type of the first argument (Arg1 say), and [{C,D}]
against the type of the second argument (Arg2 say). The fact
that [{A,B}] matches the type of Arg1 is to be interpreted as
Arg1 representing a binary relation from X to Y, where X is
defined as all sets Set(x) for some element x in Sets the type
of which is A, and similarly for Y. In the same way Arg2 is
interpreted as representing a binary relation from W to Z.
Finally it is checked that B matches C, which is sufficient to
ensure that W is equal to Y. The untyped empty set is handled
separately: its type, ['_'], matches the type of any unordered
set.
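The type matching described above can be observed from the shell. In this sketch (illustrative data), the second coordinate of the type of R matches the first coordinate of the type of S, so the relative product is defined; the untyped empty set E is accepted together with a typed set.

```erlang
R = sofs:relation([{1,a},{2,b}]).
S = sofs:relation([{a,x},{b,y}]).
sofs:type(R).                                    %% [{atom,atom}]
sofs:to_external(sofs:relative_product(R, S)).   %% [{1,x},{2,y}]
E = sofs:empty_set().
sofs:to_external(sofs:union(R, E)).              %% [{1,a},{2,b}]
sofs:type(sofs:union(R, E)).                     %% [{atom,atom}]
```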

A few functions of this module (drestriction/3,
family_projection/2, partition/2,
partition_family/2, projection/2,
restriction/3, substitution/2) accept an Erlang
function as a means to modify each element of a given unordered
set. Such a function, called
SetFun in the following, can be
specified as a functional object (fun), a tuple
{external, Fun}, or an integer. If SetFun is
specified as a fun, the fun is applied to each element of the
given set and the return value is assumed to be a set. If SetFun
is specified as a tuple {external, Fun}, Fun is applied
to the external set of each element of the given set and the
return value is assumed to be an external set. Selecting the
elements of an unordered set as external sets and assembling a
new unordered set from a list of external sets is in the present
implementation more efficient than modifying each element as a
set. However, this optimization can only be utilized when the
elements of the unordered set are atomic or ordered sets. It
must also be the case that the type of the elements matches some
clause of Fun (the type of the created set is the result of
applying Fun to the type of the given set), and that Fun does
nothing but selecting, duplicating or rearranging parts of the
elements. Specifying a SetFun as an integer I is equivalent to
specifying {external, fun(X) -> element(I, X) end},
but is to be preferred since it makes it possible to handle this
case even more efficiently. Examples of SetFuns:
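A sketch of the three kinds of SetFun, applied with projection/2 to illustrative sets: an integer, an {external, Fun} tuple whose fun only rearranges parts of each element, and a fun that receives each element as a set.

```erlang
Rel = sofs:relation([{1,a},{2,b},{3,a}]).
%% Integer SetFun: select coordinate 2 of each element.
sofs:to_external(sofs:projection(2, Rel)).       %% [a,b]
%% External SetFun: applied to the external set of each element.
Swap = {external, fun({A, B}) -> {B, A} end}.
sofs:to_external(sofs:projection(Swap, Rel)).    %% [{a,1},{a,3},{b,2}]
%% Fun SetFun: applied to each element as a set and returns a set.
SetsOfSets = sofs:from_term([[[1,2],[2,3]],[[4]]]).
sofs:to_external(sofs:projection(fun sofs:union/1, SetsOfSets)).  %% [[1,2,3],[4]]
```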

The order in which a SetFun is applied to the elements of an
unordered set is not specified, and may change in future
versions of sofs.

The execution time of the functions of this module is dominated
by the time it takes to sort lists. When no sorting is needed,
the execution time is in the worst case proportional to the sum
of the sizes of the input arguments and the returned value. A
few functions execute in constant time: from_external,
is_empty_set, is_set, is_sofs_set,
to_external, type.

The functions of this module exit the process with a
badarg, bad_function, or type_mismatch
message when given badly formed arguments or sets the types of
which are not compatible.
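As a sketch of these failures, the exit reason can be captured with try ... catch. The first call combines sets of incompatible types; the second passes a set that is not a binary relation to domain/1.

```erlang
Mismatch = try sofs:union(sofs:set([a]), sofs:relation([{1,a}]))
           catch error:Reason1 -> Reason1 end.
Mismatch.    %% type_mismatch
BadArg = try sofs:domain(sofs:set([a]))
         catch error:Reason2 -> Reason2 end.
BadArg.      %% badarg
```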

Returns the binary relation containing the elements
(E, Set) such that Set belongs to SetOfSets and E
belongs to Set. If SetOfSets is
a partition of a set X and
R is the equivalence relation in X induced by SetOfSets, then the
returned relation is
the canonical map from
X onto the equivalence classes with respect to R.
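Assuming the passage above describes canonical_relation/1, a minimal sketch with an illustrative set of sets is:

```erlang
Ss = sofs:from_term([[a,b],[b,c]]).
CR = sofs:canonical_relation(Ss).
sofs:to_external(CR).   %% [{a,[a,b]},{b,[a,b]},{b,[b,c]},{c,[b,c]}]
```

Here Ss is not a partition (b belongs to two sets), so the result is a relation but not a function.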

Creates a family from
the directed graph Graph. Each vertex a of
Graph is
represented by a pair (a, {b[1], ..., b[n]})
where the b[i]'s are the out-neighbours of a. If no type is
explicitly given, [{atom, [atom]}] is used as type of
the family. It is assumed that Type is
a valid type of the
external set of the family.

If G is a directed graph, it holds that the vertices and
edges of G are the same as the vertices and edges of
family_to_digraph(digraph_to_family(G)).
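The round trip can be sketched with a small graph built using the digraph module. The vertices and edges are illustrative; a vertex without out-neighbours is mapped onto the empty set.

```erlang
G = digraph:new().
digraph:add_vertex(G, a), digraph:add_vertex(G, b), digraph:add_vertex(G, c).
digraph:add_edge(G, a, b), digraph:add_edge(G, a, c).
F = sofs:digraph_to_family(G).
sofs:to_external(F).                  %% [{a,[b,c]},{b,[]},{c,[]}]
G2 = sofs:family_to_digraph(F).
lists:sort(digraph:vertices(G2)).     %% [a,b,c]
```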

If Family1 and Family2
are families, then
Family3 is the family
such that the index set is equal to the index set of
Family1, and Family3[i] is the
difference between Family1[i]
and Family2[i] if Family2 maps i,
Family1[i] otherwise.
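Assuming this passage describes family_difference/2, a sketch with illustrative families: index c is dropped because it is not in the index set of the first family, and a is kept unchanged because the second family does not map a.

```erlang
F1 = sofs:family([{a,[1,2]},{b,[3,4]}]).
F2 = sofs:family([{b,[4,5]},{c,[6,7]}]).
F3 = sofs:family_difference(F1, F2).
sofs:to_external(F3).   %% [{a,[1,2]},{b,[3]}]
```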

If Family1 is
a family
and Family1[i] is a binary relation for every i
in the index set of Family1,
then Family2 is the family with the same index
set as Family1 such
that Family2[i] is
the domain of
Family1[i].

If Family1 is
a family
and Family1[i] is a binary relation for every i
in the index set of Family1,
then Family2 is the family with the same index
set as Family1 such
that Family2[i] is
the field of
Family1[i].
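Assuming the two passages above describe family_domain/1 and family_field/1, both can be sketched on one illustrative family of binary relations:

```erlang
FR = sofs:from_term([{a,[{1,a},{2,b},{3,c}]},{b,[]},{c,[{4,d},{5,e}]}]).
sofs:to_external(sofs:family_domain(FR)).  %% [{a,[1,2,3]},{b,[]},{c,[4,5]}]
sofs:to_external(sofs:family_field(FR)).   %% [{a,[1,2,3,a,b,c]},{b,[]},{c,[4,5,d,e]}]
```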

If Family1 is
a family
and Family1[i] is a set of sets for every i in
the index set of Family1,
then Family2 is the family with the same index
set as Family1 such
that Family2[i] is
the intersection
of Family1[i].

If Family1[i] is an empty set for some i, then
the process exits with a badarg message.
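Assuming this passage describes family_intersection/1, a sketch with an illustrative family whose values are non-empty sets of sets:

```erlang
F1 = sofs:from_term([{a,[[1,2,3],[2,3,4]]},{b,[[x,y,z],[x,y]]}]).
F2 = sofs:family_intersection(F1).
sofs:to_external(F2).   %% [{a,[2,3]},{b,[x,y]}]
```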

If Family1 and Family2
are families,
then Family3 is the family such that the index
set is the intersection of Family1's and
Family2's index sets,
and Family3[i] is the intersection of
Family1[i] and Family2[i].
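Assuming this passage describes family_intersection/2, a sketch with illustrative families. Only the indices b and c occur in both index sets.

```erlang
F1 = sofs:family([{a,[1,2]},{b,[3,4]},{c,[5,6]}]).
F2 = sofs:family([{b,[4,5]},{c,[7,8]},{d,[9,10]}]).
F3 = sofs:family_intersection(F1, F2).
sofs:to_external(F3).   %% [{b,[4]},{c,[]}]
```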

If Family1 is
a family
and Family1[i] is a binary relation for every i
in the index set of Family1,
then Family2 is the family with the same index
set as Family1 such
that Family2[i] is
the range of
Family1[i].
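Assuming this passage describes family_range/1, a sketch with an illustrative family of binary relations:

```erlang
FR = sofs:from_term([{a,[{1,a},{2,b},{3,c}]},{b,[]},{c,[{4,d},{5,e}]}]).
F = sofs:family_range(FR).
sofs:to_external(F).   %% [{a,[a,b,c]},{b,[]},{c,[d,e]}]
```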

If Family1 is
a family,
then Family2 is
the restriction of
Family1 to those elements i of the index set
for which Fun applied
to Family1[i] returns
true. If Fun is a
tuple {external, Fun2}, Fun2 is applied to
the external set
of Family1[i], otherwise Fun is
applied to Family1[i].
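Assuming this passage describes family_specification/2, the sketch below keeps the indices whose sets have exactly two elements. SpecFun is an illustrative predicate; it receives Family1[i] as a set because it is not wrapped in an {external, Fun2} tuple.

```erlang
F1 = sofs:family([{a,[1,2,3]},{b,[1,2]},{c,[1]}]).
SpecFun = fun(S) -> sofs:no_elements(S) =:= 2 end.
F2 = sofs:family_specification(SpecFun, F1).
sofs:to_external(F2).   %% [{b,[1,2]}]
```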

If Family1 is
a family
and Family1[i] is a set of sets for each i in
the index set of Family1,
then Family2 is the family with the same index
set as Family1 such
that Family2[i] is
the union of
Family1[i].
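Assuming this passage describes family_union/1, a sketch with an illustrative family whose values are sets of sets:

```erlang
F1 = sofs:from_term([{a,[[1,2],[2,3]]},{b,[[]]}]).
F2 = sofs:family_union(F1).
sofs:to_external(F2).   %% [{a,[1,2,3]},{b,[]}]
```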

If Family1 and Family2
are families,
then Family3 is the family such that the index
set is the union of Family1's
and Family2's index sets,
and Family3[i] is the union
of Family1[i] and Family2[i] if
both map i, Family1[i]
or Family2[i] otherwise.
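Assuming this passage describes family_union/2, a sketch with illustrative families:

```erlang
F1 = sofs:family([{a,[1,2]},{b,[3,4]},{c,[5,6]}]).
F2 = sofs:family([{b,[4,5]},{c,[7,8]},{d,[9,10]}]).
F3 = sofs:family_union(F1, F2).
sofs:to_external(F3).   %% [{a,[1,2]},{b,[3,4,5]},{c,[5,6,7,8]},{d,[9,10]}]
```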

Creates an element
of Sets by
traversing the term Term, sorting lists,
removing duplicates and
deriving or verifying a valid
type for the so obtained external set. An
explicitly given type
Type
can be used to limit the depth of the traversal; an atomic
type stops the traversal, as demonstrated by this example
where "foo" and {"foo"} are left unmodified:
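A sketch of such a call: the type [{atom,[atom]}] stops the traversal at the first coordinate of each pair, so the strings there are treated as atomic terms, while the lists at the second coordinate are sorted and stripped of duplicates.

```erlang
S = sofs:from_term([{{"foo"},[1,1]},{"foo",[2,2]}], [{atom,[atom]}]).
sofs:to_external(S).   %% [{{"foo"},[1]},{"foo",[2]}]
```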

from_term can be used for creating atomic or ordered
sets. The only purpose of such a set is that of later
building unordered sets since all functions in this module
that do anything operate on unordered sets.
Creating unordered sets from a collection of ordered sets
may be the way to go if the ordered sets are big and one
does not want to waste heap by rebuilding the elements of
the unordered set. An example showing that a set can be
built "layer by layer":
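A sketch of the layered construction, using from_sets/1 to assemble an ordered set from an atomic set and an unordered set, and then an unordered set from two ordered sets:

```erlang
A = sofs:from_term(a).
S = sofs:set([1,2,3]).
P1 = sofs:from_sets({A, S}).
P2 = sofs:from_term({b,[6,5,4]}).
Ss = sofs:from_sets([P1, P2]).
sofs:to_external(Ss).   %% [{a,[1,2,3]},{b,[4,5,6]}]
sofs:type(Ss).          %% [{atom,[atom]}]
```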

If TupleOfBinRels is a non-empty tuple
{R[1], ..., R[n]} of binary relations
and BinRel1 is a binary relation,
then BinRel2 is
the multiple relative
product of the ordered set
(R[1], ..., R[n]) and BinRel1.
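Assuming this passage describes multiple_relative_product/2, a sketch with illustrative relations. Each pair (x[1], x[2]) of R is paired with (y[1], y[2]), where Ri maps x[i] onto y[i]; the range of the result collects the translated pairs.

```erlang
Ri = sofs:relation([{a,1},{b,2},{c,3}]).
R = sofs:relation([{a,b},{b,c},{c,a}]).
MP = sofs:multiple_relative_product({Ri, Ri}, R).
sofs:to_external(sofs:range(MP)).   %% [{1,2},{2,3},{3,1}]
```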

Returns a pair of sets that, regarded as constituting a
set, forms a partition of
Set1. If the
result of applying SetFun to an element
of Set1 yields an element in Set2,
the element belongs to Set3, otherwise the
element belongs to Set4.
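Assuming this passage describes partition/3, a sketch with illustrative sets: the integer SetFun 1 selects the first coordinate, and the elements whose first coordinate belongs to S go into the first set of the pair.

```erlang
R1 = sofs:relation([{1,a},{2,b},{3,c}]).
S = sofs:set([2,4,6]).
{R2, R3} = sofs:partition(1, R1, S).
{sofs:to_external(R2), sofs:to_external(R3)}.   %% {[{2,b}],[{1,a},{3,c}]}
```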

Returns the family
Family where the indexed set is
a partition
of Set such that two elements are considered
equal if the results of applying SetFun are the
same value i. This i is the index that Family
maps onto
the equivalence
class.
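Assuming this passage describes partition_family/2, a sketch where the illustrative SetFun selects the first and third coordinates as the index:

```erlang
S = sofs:relation([{a,a,a,a},{a,a,b,b},{a,b,b,b}]).
SetFun = {external, fun({A,_,C,_}) -> {A,C} end}.
F = sofs:partition_family(SetFun, S).
sofs:to_external(F).   %% [{{a,a},[{a,a,a,a}]},{{a,b},[{a,a,b,b},{a,b,b,b}]}]
```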

Creates a relation.
relation(R, T) is equivalent to
from_term(R, T), if T is
a type and the result is a
relation. If Type is an integer N, then
[{atom, ..., atom}], where the size of the
tuple is N, is used as type of the relation. If no type is
explicitly given, the size of the first tuple of
Tuples is
used if there is such a tuple. relation([]) is
equivalent to relation([], 2).
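A sketch of how the type of a relation is chosen, using illustrative tuples:

```erlang
sofs:to_external(sofs:relation([{b,2},{a,1},{a,1}])).   %% [{a,1},{b,2}]
sofs:type(sofs:relation([{a,1,x}])).    %% [{atom,atom,atom}], from the first tuple
sofs:type(sofs:relation([], 3)).        %% [{atom,atom,atom}], from the integer 3
sofs:type(sofs:relation([])).           %% [{atom,atom}]
```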

Returns the set containing every element
of Set1 for which Fun
returns true. If Fun is a tuple
{external, Fun2}, Fun2 is applied to the
external set of
each element, otherwise Fun is applied to each
element.
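Assuming this passage describes specification/2, the sketch below selects, from an illustrative set of relations, those that are functions. The predicate is applied to each element as a set.

```erlang
R1 = sofs:relation([{a,1},{b,2}]).
R2 = sofs:relation([{x,1},{x,2},{y,3}]).
S1 = sofs:from_sets([R1, R2]).
S2 = sofs:specification(fun sofs:is_a_function/1, S1).
sofs:to_external(S2).   %% [[{a,1},{b,2}]]
```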

As was more or less stated before, external unordered sets
are represented as sorted lists. As a consequence, creating
the image of a set under a relation R may traverse all
elements of R (in addition, the result, that is the
image, has to be sorted). In images/2, BinRel will be traversed once
for each element of SetOfSets, which may take too long. The
following efficient function could be used instead under the
assumption that the image of each element of SetOfSets under
BinRel is non-empty:
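A sketch of such a function, written here as a fun so that it can be pasted into the shell. It traverses BinRel once by going through the canonical relation of SetOfSets. The name Images2 and the sample data are illustrative; the result is a family that maps each set onto its image, so its range corresponds to the result of images/2.

```erlang
Images2 = fun(SetOfSets, BinRel) ->
              CR = sofs:canonical_relation(SetOfSets),
              R = sofs:relative_product1(CR, BinRel),
              sofs:relation_to_family(R)
          end.
Ss = sofs:from_term([[1,2],[3]]).
Rel = sofs:relation([{1,a},{2,b},{3,c}]).
Fam = Images2(Ss, Rel).
sofs:to_external(Fam).                         %% [{[1,2],[a,b]},{[3],[c]}]
sofs:to_external(sofs:images(Ss, Rel)).        %% [[a,b],[c]]
```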

Returns a triple of sets: Set3 contains the
elements of Set1 that do not belong
to Set2; Set4 contains the
elements of Set1 that belong
to Set2; Set5 contains the
elements of Set2 that do not belong
to Set1.
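Assuming this passage describes symmetric_partition/2, a sketch with two illustrative sets:

```erlang
A = sofs:set([1,2,3]).
B = sofs:set([2,3,4]).
{OnlyA, Both, OnlyB} = sofs:symmetric_partition(A, B).
{sofs:to_external(OnlyA), sofs:to_external(Both), sofs:to_external(OnlyB)}.
%% {[1],[2,3],[4]}
```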

Returns a subset S of the weak
relation W
corresponding to the binary relation BinRel1.
Let F be the field of
BinRel1. The
subset S is defined so that x S y if x W y for some x in F
and for some y in F.
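Assuming this passage describes weak_relation/1, a sketch with an illustrative relation whose field is {1, 2, 3}. The pairs (2, 2) and (3, 3) are added, while (1, 1) was already present.

```erlang
R1 = sofs:relation([{1,1},{1,2},{3,1}]).
R2 = sofs:weak_relation(R1).
sofs:to_external(R2).   %% [{1,1},{1,2},{2,2},{3,1},{3,3}]
```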