Types With Extents

With the increasing complexity of models and tasks to which databases are being
applied in recent years, research in database theory and applications has moved
away from the simple, flat-relational data-model and towards data-models and
paradigms, such as semantic or object oriented databases, which support more
complex data-structures.

The purpose of this work is to study databases involving complex, potentially
recursive data-structures and how such databases can be queried and
manipulated. A major insight can be gained by considering the analogy between
the relationship between values and types in programming languages and that
between instances and schemas in databases. Though the precise interpretations
of these terms vary widely, broadly speaking a type describes a collection of
values with similar structure, while a schema describes a set of admissible
instances for a database. However a schema will normally describe a number of
constraints on database instances which go beyond the type of the instance
(considered as an unusually complex data value). The constraints expressable
are dependent on the particular data-model and DBMS being consiidered, but
perhaps the most fundamental of these constraints, common to any database
system, is that a schema describes a number of finite sets, representing the
data stored in the database, and the relationships between these sets. For
example a relational database schema will describe a number of relations
(finite sets of tuples), while an object-oriented schema would describe a
number of classes (finite sets of objects).

The basic premise of this work is that, when working with a database, all
values originate from one of these finite _extents_, and that even recursive
data-values will have finite representations in terms of the elements of these
extents. This knowledge provides us with the ability to implement efficient
transformations between databases, and to evaluate recursive function
definitions on databases which would not be decidable in a more general
programming environment.

Data Models

The type system we develop is basically the same as that of
[Abiteboul/Kannelakis], and includes type constructors for records, varients
and sets, as well as "Classes" which are associated with finite extents. A
schema then associates a type with each class in some distinguished finite set
of classes, this being the type of objects in that class.

As with [Abiteboul/Kannelakis] we consider both a "value-based" model, in which
data-values are regular trees, and an object-identity based model, and examine
the correspondences between them. However we do this from the perspective of
observational distinguishability of instances in the presence of various
predicates on values of class type.

We also look at the idea of keys as a means of distinguishing or refering to
objects in a database. I believe that these are extremely important for the
purpose of practical programming and querying with databases involving such
complex data-structures, and also for languages allowing the creation of new
object identities.

Transformations

A logical language for expressing database transformations and constraints on
these models, similar to the one already developed by myself and others for
more limited data-models, will be constructed. Here transformations are
considered to be simple structural manipulations of data-values, mapping
instances of one schema to instances of another. Such transformations can
arrise from a number of distinct database tasks: schema evolutions, data-entry,
user views of databases, importing and exporting data from heterogenous
databases and so on. Our collaborative work with the Human Genome Center at
the Univerity of Pennsylvania indicates that such transformations are extremely
important and useful.

We show that a class of "non-recursive" transformations, in which a source
database instance can be converted to a target instance in a single pass, are
of particular interest. For such transformations our logic can be considered to
be a declarative language for programming the transformations. Compiling such
transformation programs consists of rewriting the programs into a "normal form"
in which clauses completely describe how a particular class is populated. These
normal form programs can then easily be converted into an appropriate database
programming language.

Constraints

It is apparent that the simple and somewhat ad hoc collection of constraints
normally available in a relational database model are not adequate to express
many of the constraints arrising in the more complex datamodels currently
gaining in popularity, or in the tasks to which such databases are being
applied. We also believe that there is an important interaaction between
database transformations and constraints, with database transformations
implying various constraints on the source and target databases, and
constraints playing a part in determining transformations. We therefore use the
same logic as a constraint language as for determining transformations,
allowing us to express a very general family of constraints, and to reason
about transformations, constraints and the interactions between the two in a
single framework.

Recursive Queries

In order to query databases involving values which are recursive or have
unbounded depth it is desirable to be able to write recursive function
definitions over the data. However, in a general programming environment such
recursive functions may not terminate, while traditionally, when dealing with
database queries, we expect all queries to terminate and to be efficiently
computable.

Further a recursive function definition may admit a number of undesired
solutions in addition to the intended solution. We propose criteria for
determining the desirable solutions for a function equation, representing those
that can be computed in a "constructive" manner. We develop an algorithm for
finding these desired solutions. The algorithm makes use of the known finite
extents of values to ensure termination. We go on to show that these solutions
can also be computed (though perhaps not very efficiently) without using
general recursion at all, but instead using only structural induction on sets.