CS145 - Introduction to DatabasesSpring 2000, Prof. Widom

Written Assignment #4
Due Wednesday May 10

Problem 1

Recall from Problem 9 in Written
Assignment #3 the relation Flight(from,to,cost,airline),
containing nonstop flights from one city to another. We were
interested in part (c) of that problem in writing a single SQL query
that would return the cheapest cost of flying between each pair of
cities, regardless of the number of stops. While it is not possible
to write such a query in SQL2, it is possible to compute the answer
using SQL in conjunction with a programming language, and that's your
task in this problem. You may use SQL embedded within a C-like
language as presented in Section 7.1 of the textbook, you may use a
PL/SQL-like syntax, or you may use ODBC/JDBC-style dynamic SQL.
Regardless, we are not particularly concerned with correct syntax, and
you can certainly "wing it" on syntax for things like checking for end
conditions. We're interested primarily in your overall algorithm, and
the interaction between programming constructs and SQL.

Problem 2

This problem is a restatement of Exercise 7.2.3 on page 389 of the
textbook. Consider the four operations in parts (a), (b), (c), and
(d) of Exercise 7.2.1 on page 388 of the textbook. For this problem
you only need to consider the English descriptions of the operations -
you need not rewrite them in SQL unless it helps you.

(a) Suppose a transaction T executes the operation
in part (a) of Exercise 7.2.1, and any number of other transactions
may at the same time be executing any number of all four operations.
Briefly describe in English a behavior of T's operation that
can occur if all transactions are running with isolation level
READUNCOMMITTED that cannot occur if all
transactions are running with isolation level SERIALIZABLE.

(b) Same as part (a) except suppose transaction T
executes the operation in part (b) of Exercise 7.2.1.

(c) Same as part (a) except suppose transaction T
executes the operation in part (c) of Exercise 7.2.1.

(d) Same as part (a) except suppose transaction T
executes the operation in part (d) of Exercise 7.2.1.

Problem 3

In this problem you will be graded on simplicity of your examples
as well as correctness.

Consider one of our favorite relational schemas - Product,
PC, Laptop, and Printer - e.g., appearing
in Exercise 6.6.2 of the textbook on the top of page 358.

(a) Give an example of two transactions T1 and
T2 operating on this database. Assume that transaction
T1 has isolation level SERIALIZABLE. (But note that
true serializability is guaranteed only when all transactions
have this isolation level.) For transaction T2, consider
isolation levels READUNCOMMITTED and READCOMMITTED. Design your transactions so that there is some
result allowed when T2 uses READUNCOMMITTED that is not allowed when T2 uses
READCOMMITTED. Assume that transactions
T1 and T2 both commit. More specifically, give:

The two transactions T1 and T2, each
specified as a sequence of one or more SQL statements followed by
commit. Your transactions should be as simple as possible;
e.g., they certainly do not need to involve all of the relations in
the schema.

A simple initial set of data for the relations used by your
transactions (the initial state).

The state of the data after T1 and T2 have both
executed such that this final state could result when T2 uses
READUNCOMMITTED, but could not result when
T2 uses READCOMMITTED. Also specify the
interleaving (ordering) of the statements between the two transactions
that produced this final state.

(b) Do part (a) again, except this time for transaction
T2 consider isolation levels READCOMMITTED
and REPEATABLEREAD, where the final state could
result when T2 uses READCOMMITTED, but
could not result when T2 uses REPEATABLEREAD.

(c) Do part (a) again, except this time for transaction T2
consider isolation levels REPEATABLEREAD and
SERIALIZABLE, where the final state could result when
T2 uses REPEATABLEREAD, but could not
result when T2 uses SERIALIZABLE.

Problem 4

In this problem you will be graded on simplicity of your example as
well as correctness.

Continue with the schema used in Problem 3. Suppose multiple
clients are operating on the database, and all of their transactions
use isolation level SERIALIZABLE. Give a realistic example
where the final state of the database depends on the order in which
concurrent transactions are serialized. Please specify:

An initial state for one or more relations.

Transactions issued by two clients concurrently (one
transaction per client, one or more SQL statements per transaction).

Two different valid final states of the relations, depending
on which transaction executed first.

Problem 5

Consider a set of users U, V, W,
X, and Y. Suppose that user U creates a
relation R(A) and is thus the owner of relation R.
Now suppose the following set of statements is executed in order:

stmt

user

operation

1

U

grant select on R to V,W with grant option

2

V

grant select on R to W

3

W

grant select on R to X,Y

4

U

grant select on R to Y

5

U

revoke select on R from V restrict

6

U

revoke select on R from W cascade

Show the grant diagram after steps 4, 5, and 6. Since all of the
nodes in the diagrams will be for privilege "select on R",
you may omit writing it in each node.

Problem 6

Warning: this is your open-ended question for this assignment,
although we expect that most students can come up with some solution,
and some of you will come up with the best solution.

There's a type of database security that was not covered in class
or in the textbook called "statistical authorization". With
statistical authorization, some users may be permitted to retrieve
only aggregate information from the database, e.g., average salaries
but not individual salaries. Furthermore, to prevent users from
applying aggregates to very small numbers of tuples (such as the
average of one salary!), the system requires that a certain minimum
number of tuples contribute to each aggregate result. Finally, to
prevent the user from using intersection properties to deduce a single
value (e.g., the user could ask for X=sum(A1,A2,A3,A4,A5),
then ask for Y=sum(A2,A3,A4,A5), then compute X-Y
to deduce the value of A1), the system may require that the
tuples participating in multiple queries issued by the same user have
a small intersection. In this problem you will explore how, even with
such security measures, specific information can still be deduced from
the database.

Here's the problem. Consider the simple relation
student(ID,GPA). Suppose that, for security reasons, the
following restrictions are made on user U's set of queries
against this relation:

The result of every query must be a single aggregate
value - a SQL aggregate function applied to one of the attributes of
the relation.

At least 4 different tuples must be used in the aggregate to
produce each query's result.

For any two queries issued by user U, the sets of tuples
used to produce the two query results must have an intersection no
larger than 2.

Assume that student IDs are in the range 1 to 50, and that
attribute GPA is of type float. Give a set of queries that
satisfies the above restrictions, and from whose answers you can
determine the GPA of the student with ID=1. Write the
queries in SQL, then show the computation that produces the GPA for
the student with ID=1 from the query results. Use as few
queries as you can.