8.6 Case Study: Databases

A large collection of information,
such as a company's sales records
or its customer
accounts or its payroll information, is called a database.
An important programming
challenge is determining the proper structure for a database.

In simplest terms, a database is a ``container''
into which
objects are inserted, located, and removed;
the objects that are stored in a database are called
records. An important feature of a record is that
it is uniquely identified by its key, which is
held within the record itself. Here are some examples of records:

A bank's database holds records of accounts. Each account record
is uniquely identified by a multi-letter-and-digit
account number. A typical record would contain information such
as

its key, the account number, which identifies the account

the name (or some other identification) of the account's owner

the amount of money held in the account

A library's database holds records of books. Each record has for its key
the book's catalog number. For example, the U.S. Library of Congress
catalog number is a pair: an alphabetic string and a fractional number,
such as QA 76.8.
The records held in a library's database would have these attributes:

the key, which is the book's catalog number

the book's title, author, publisher, and publication date

whether or not the book is borrowed, and if so, by which patron

The U.S. Internal Revenue Service database holds records of taxpayers.
Each record is identified by a nine-digit social-security number.
The record holds the number as its key and also holds the taxpayer's
name, address, and copies of the person's tax reports for the past five
years.

Although the example records just listed differ markedly
in their contents, they share the common feature of possessing a key.
This crucial feature helps us understand the function of a database:

A database is a container that locates records by using the records'
keys as indices.

Compare this concept to that of an array: An array is a container
that locates objects by using integer indices numbered 0, 1, 2, ...,
and so on.
A database is like a ``smart array''
that uses a record's key to save and locate the record.

How can we model and build a general-purpose database in Java?
Here are some crucial concepts:

keys are objects

records are objects, and a record holds as one of its attributes
(the address of) its key object

a database is a kind of ``array'' of record objects; it must have
methods for inserting a record, finding a record, and deleting a record

when the database's user wishes to insert a record into the database,
she calls the database's insert method, supplying the record as
the argument;
when she wishes to find a record, she calls the find method, supplying
a key object as an argument;
when she wishes to delete a record, she calls the delete method,
supplying a key object as an argument

For example, if we build a database to hold library books, the key
objects will be Library of Congress catalog numbers, and each record
object will hold (the address of) a key object
and information about a book. Such records are inserted, one by one,
into the database. When a user
wishes to find a book in the database, she must supply a key object to
the database's find method and she will receive in return
(the address of) the desired book object; an informal picture of this
situation looks like this:

The picture suggests that the database will operate the same,
regardless of whether books, bank accounts, and so on,
are saved. As long as the records---whatever they are---hold
keys, the database can do its insertions, lookups, and deletions,
by manipulating the records' keys and not the records themselves.
This is strongly reminiscent of arrays, which can hold a variety of
objects without manipulating the objects themselves.

So, how does a database manipulate a key?
Regardless of whether keys are numbers or strings or pairs of items,
keys are manipulated by comparing them for equality.
Consider a lookup operation: The database receives a key object,
and the database searches its collection of records, asking each
record to tell its key, so that each key can be compared
for equality to the desired key. When an equality is found true,
the corresponding record is returned.
This algorithm operates the same whether integers, strings, or
whatever else is used for keys.

In summary,

The Database holds a collection of Record objects,
where each Record holds a Key object.
The remaining structure of the Records is unimportant and
unknown to the database.

The Database will possess insert, find, and
delete methods.

Records, regardless of their internal structure,
will possess a getKey method that returns the Record's
Key object when asked.

Key objects, regardless of their internal structure,
will have an equals
method that compares two Keys for equality and
returns true or false as the answer.

We are now ready to design and build a
database subassembly in Java. We will build a subassembly---not an
entire program---such that the subassembly can be inserted as the
model into a complete application.
We follow the usual stages for design and construction:

State the subassembly's desired behaviors.

Select an architecture for the subassembly.

For each of the architecture's components, specify classes with
appropriate attributes and methods.

Write and test the individual classes.

Integrate the classes into a complete subassembly.

8.6.1 Behaviors

Regardless of whether a database holds bank accounts, tax records,
or payroll information, its behaviors are the same:
a database must be able to insert, locate, and delete records based
on the records' keys.
We plan to write a class Database so that an application
can construct a database object by stating,

Database db = new Database(...);

Then, the application
might insert a record---call it r0---into db
with a method invocation
like this:

db.insert(r0);

As stated earlier,
each record possesses its own key. Say that record
r0 holds object k0 as its key.
To retrieve record r0 from the database, we use a
command like this:

Record r = db.find(k0);

This places the address of record r0 into variable r
for later use.
We can delete the record from the database by stating:

db.delete(k0);

Notice that variable r still holds the address of the record, but
the record no longer lives in the database.

The above behaviors imply nothing about the techniques
that the database uses to store and retrieve records; these activities
are internal to class Database and are best left unknown to the
database's users.

8.6.2 Architecture

The previous examples suggest there are at least three components
to the database's design: the Database itself, the
Records that are inserted into it, and the Keys
that are kept within records and are used to do insertions, lookups,
and deletions.
The class diagram in
Figure 2 lists these components and their dependencies.

There is a new notation in the Figure's class diagram:
The annotation, 1 --> *, on the arrow
emphasizes that one Database collaborates with (or collects)
multiple Records, suggesting that an array will
be useful in the coding of class Database.
As noted earlier, whatever a Record or Key might be,
the methods getKey and equals are required.
(The format of the equals method will be explained momentarily.)

8.6.3 Specifications

To keep its design as general as possible, we will not commit
class Database to saving any particular form of Record---the
only requirement that a database will make of a record
is that a record can be asked for its key.
Similarly, the only requirement a database will make of a key is that
the key can be compared to another key for an equality check.

Since class Database
must hold multiple records, its primary attribute will
be an array of records, and the database will have at least the three methods
listed in Figure 2.

The specification for Record is kept as minimal
as possible: whatever a record object might
be, it has a function,
getKey, that returns the key that uniquely identifies the
record. Similarly,
it is unimportant whether a key is
a number or a string or whatever else;
therefore, we require only that a key
possesses a method, equals,
that checks the equality of itself to another key.

Database

Methods

insert(Record r): boolean

Attempts to insert the record, r, into the database.
Returns true if the record is successfully added, false
otherwise.

find(Key k): Record

Attempts to locate the record whose key has value k. If
successful, the address of the record is returned, otherwise,
null is returned.

delete(Key k): boolean

Deletes the record whose key has value k. If successful,
true is returned; if no record has key k,
false is returned.

Record

a data item that can be stored in a database

Methods

getKey(): Key

Returns the key that uniquely identifies the record.

Key

an identification, or ``key,'' value

Methods

equals(Key m): boolean

Compares itself to another key, m, for equality. If this key
and m have the same key value, then true is returned;
if m has a different key value, then false
is returned.


Because we have provided partial (incomplete) specifications for
Record and Key, many different classes might
implement the two specifications. For example, we might write
class Book to implement a Record so that we
can build a database of books, or we might
write class BankAccount to implement a database of bank accounts.
Different classes of keys might also be written, if only because
books use different keys than do bank accounts.

Key's specification deserves a close look:
the specification is written as if keys are
objects (and not mere ints).
For this reason, given two Key objects, K1
and K2, we must
write K1.equals(K2) to ask if the two
keys have the same value.
(This is similar to writing
S1.equals(S2) when comparing two strings, S1 and
S2, for equality.)
We exploit this generality in the next section.
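To make the distinction concrete, here is a minimal sketch of one possible key class, assuming a key that wraps a single integer. Only the equals method is required by the specification; the field name id and the accessor getInt are illustrative choices made for this sketch.

```java
/** A sketch of a key that wraps one integer (an assumed form of Key). */
class Key {
    private final int id;   // the integer that identifies a record

    Key(int id) { this.id = id; }

    /** Returns the integer held within this key. */
    int getInt() { return id; }

    /** Compares this key to another key, m, for equality of value. */
    boolean equals(Key m) { return id == m.getInt(); }
}

class KeyDemo {
    public static void main(String[] args) {
        Key k1 = new Key(1234);
        Key k2 = new Key(1234);
        System.out.println(k1 == k2);       // compares addresses of two distinct objects
        System.out.println(k1.equals(k2));  // compares the integers held within the keys
    }
}
```

Because k1 and k2 are separate objects, the == test compares their addresses and fails, whereas equals compares the values the keys hold.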

8.6.4 Implementation

The specifications for Record and Key make it possible to
write a complete coding for class Database without knowing
any details about the codings for the records and keys.
Let's consider the implementation of class Database.

The database's primary attribute is an array that will hold the
inserted records. class Database must contain this field
declaration:

private Record[] base;

The constructor method for the class will initialize the field to
an array:

base = new Record[HOW_MANY_RECORDS];

where all the array's elements have value null, because the
array is empty.
Records will be inserted into the database one by one.
To do an insert(Record r), follow this algorithm:

Search array base to see if r is present.
(More precisely, search base to see if a record
with the same key as r's key is already present.)

If r is not in base, then search for the first
element in base that is empty (that is, holds value null).

Insert r into the empty element.

Each of the
algorithm's three steps requires more refinement: To fill in details
in the first step,
say that we write a helper method,
findLocation, which searches the array for a record whose
key equals k.
The helper method might be specified like this:

/** findLocation is a helper method that searches base for a record
  * whose key is k. If found, the array index of the record within
  * base is returned, else -1 is returned. */
private int findLocation(Key k)

Then, Step 1 of the algorithm is merely,

if ( findLocation(r.getKey()) == -1 )

because r.getKey() extracts the key held within record r,
and a result of -1 from findLocation means that
no record with the same key is already present.

Step 2 of the algorithm is clearly a searching loop, and we use the
techniques from Chapter 7 to write this loop, which searches for the first
empty element in base where a new record can be inserted:
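A minimal sketch of such a loop follows. So that it can run on its own, it is written as a static helper over an Object[] parameter named base; inside class Database the loop would operate on the field base directly. The names EmptySlotSearch, firstEmpty, and foundEmptyPlace are illustrative.

```java
class EmptySlotSearch {
    /** Returns the index of the first null element of base,
      * or base.length if every element is occupied. */
    static int firstEmpty(Object[] base) {
        boolean foundEmptyPlace = false;
        int i = 0;
        while (!foundEmptyPlace && i != base.length) {
            if (base[i] == null) {
                foundEmptyPlace = true;   // element i is free; stop here
            } else {
                i = i + 1;                // element i is occupied; try the next one
            }
        }
        return i;
    }
}
```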

When this loop completes, i holds the index of the first empty
element in base, meaning that
Step 3 is just base[i] = r,
unless array base is completely filled with
records and there is no available space. What should we
do in the latter situation?

Because Java arrays are objects, it is possible to construct a
new array object that is larger than the current array and copy
all the elements from the current array to the new array. Here is
a standard technique for doing so:
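The sketch below assumes that the new array is twice as long as the old one; the doubling factor, the class name GrowDemo, and the small accessor methods are illustrative choices, and an Object[] stands in for the Record[] field so that the example runs by itself.

```java
class GrowDemo {
    // stands in for the Record[] field of class Database
    private Object[] base = new Object[2];

    void put(int i, Object o) { base[i] = o; }
    Object get(int i) { return base[i]; }
    int length() { return base.length; }

    /** Replaces base by an array twice as long that holds the same elements. */
    void grow() {
        Object[] temp = new Object[base.length * 2];
        for (int j = 0; j != base.length; j = j + 1) {
            temp[j] = base[j];   // copy each element to the same index
        }
        base = temp;             // base now refers to the larger array
    }
}
```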

The last assignment, base = temp, copies the address of the
larger array into array variable base, meaning that
base once again holds the address of an array of records.

BeginFootnote: If you have studied the Java libraries,
perhaps you discovered class Vector,
which behaves like an array but automatically expands to a greater
length when full.
The technique that a Java Vector uses to
expand is exactly the one presented above.
EndFootnote.

Figure 4 displays the completed
version of insert.

Next, we consider how to delete an element from the database:
The algorithm for method, delete(Key k), would go,

Search array base to see if a record
with the key, k, is present.

If such a record is located, say, at element index,
then delete it by assigning, base[index] = null.

We use the helper method, findLocation, to code Step 1.
We have this coding:
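One way the two steps might be coded is sketched here. To keep the example runnable, Key and Record are minimal stand-ins (an integer key, and a record that holds nothing but its key), and the enclosing class, named DeleteSketch for illustration, receives its array ready-made instead of building it through insertions.

```java
/** Minimal stand-ins for the Key and Record specifications (assumed forms). */
class Key {
    private final int id;
    Key(int id) { this.id = id; }
    boolean equals(Key m) { return id == m.id; }
}

class Record {
    private final Key key;
    Record(Key key) { this.key = key; }
    Key getKey() { return key; }
}

/** A cut-down database that shows only deletion and lookup. */
class DeleteSketch {
    private final Record[] base;

    DeleteSketch(Record[] records) { base = records; }

    /** Returns the index in base of the record whose key is k, else -1. */
    private int findLocation(Key k) {
        int result = -1;
        int i = 0;
        while (result == -1 && i != base.length) {
            if (base[i] != null && base[i].getKey().equals(k)) { result = i; }
            else { i = i + 1; }
        }
        return result;
    }

    /** Step 1 locates the record; Step 2 erases it by assigning null. */
    boolean delete(Key k) {
        boolean result = false;
        int index = findLocation(k);
        if (index != -1) {
            base[index] = null;
            result = true;
        }
        return result;
    }

    /** Lookup relies on the same helper. */
    Record find(Key k) {
        Record answer = null;
        int index = findLocation(k);
        if (index != -1) { answer = base[index]; }
        return answer;
    }
}
```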

We can write the lookup method so that it merely
asks findLocation to find the desired record in the array.
Again, see Figure 4.

To finish, we must write the findLocation method, which
finds the record in array base whose key is k.
The algorithm is a standard searching loop, but there
is a small complication, because array base might have null
values appearing in arbitrary places, due to deletions of previously
inserted records:

if ( base[i] != null // is this array element occupied?
&& base[i].getKey().equals(k) ) // is it the desired record?
{ ... } // we found the record at array element, i
else { i = i + 1; } // the record is not yet found; try i + 1 next

The test expression first asks if there is a record stored in element,
base[i], and if the answer is true, then the element's key
(namely, base[i].getKey()) is compared for equality to
the desired key, k.

The completed Database class appears in Figure 4.
In addition to attribute base, we define the variable,
NOT_FOUND, as a memorable name for the -1 answer
used to denote when a search for a record
failed.

Although class Database appears to store records based on their keys,
a more primitive structure,
an array, is used inside the class to hold the records.
The helper method, findLocation, does the hard work of using
records' keys as if they were ``indices.''

Aside from the getKey and equals methods,
nothing is known about the records and keys saved in the database.
This makes class Database usable in a variety of applications,
as we will see momentarily.

Because the array of records can be filled, we use a standard technique
within the insert method to build a new, larger array when needed.
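Pulling the pieces of this section together, the following is one possible sketch of the whole class, offered as an illustration of the points just listed and not as a reproduction of Figure 4. The Key and Record classes are minimal stand-ins, and the doubling growth factor and the treatment of a nonpositive initial size are assumptions.

```java
class Key {                                   // stand-in: an integer key
    private final int id;
    Key(int id) { this.id = id; }
    boolean equals(Key m) { return id == m.id; }
}

class Record {                                // stand-in: holds only its key
    private final Key key;
    Record(Key key) { this.key = key; }
    Key getKey() { return key; }
}

class Database {
    private static final int NOT_FOUND = -1;  // memorable name for a failed search
    private Record[] base;                    // the collection of records

    Database(int initialSize) {
        if (initialSize > 0) { base = new Record[initialSize]; }
        else { base = new Record[1]; }        // assumption: always leave room for one
    }

    /** Returns the index of the record whose key is k, else NOT_FOUND. */
    private int findLocation(Key k) {
        int result = NOT_FOUND;
        int i = 0;
        while (result == NOT_FOUND && i != base.length) {
            if (base[i] != null && base[i].getKey().equals(k)) { result = i; }
            else { i = i + 1; }
        }
        return result;
    }

    /** Inserts r unless a record with the same key is already present. */
    boolean insert(Record r) {
        boolean success = false;
        if (findLocation(r.getKey()) == NOT_FOUND) {
            int i = 0;
            while (i != base.length && base[i] != null) { i = i + 1; }
            if (i == base.length) {           // array is full: build a larger one
                Record[] temp = new Record[base.length * 2];
                for (int j = 0; j != base.length; j = j + 1) { temp[j] = base[j]; }
                base = temp;
            }
            base[i] = r;                      // i is the first empty element
            success = true;
        }
        return success;
    }

    /** Returns the record whose key is k, or null if there is none. */
    Record find(Key k) {
        Record answer = null;
        int index = findLocation(k);
        if (index != NOT_FOUND) { answer = base[index]; }
        return answer;
    }

    /** Deletes the record whose key is k; returns whether one was found. */
    boolean delete(Key k) {
        boolean result = false;
        int index = findLocation(k);
        if (index != NOT_FOUND) { base[index] = null; result = true; }
        return result;
    }
}
```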

8.6.5 Forms of Records and Keys

When we use class Database to hold records, we must write a
class Record and a class Key. The contents of these
classes depend of course on the application that requires the database,
but we know from Table 3 that class Record must include a
getKey method and class Key must include an equals
method. Figure 5 shows one such implementation: a record that models
a simple bank account and a key that is merely a single integer value.

The Record in Figure 5 has additional methods that let us
do deposits and check balances of a bank account, but the all-important
getKey method is present, meaning that the record can be
used with class Database of Figure 4.

In order to conform to the requirements demanded by class Database,
the integer key must be embedded within a class Key.
This means the integer is saved as a private field within
class Key and that the equals method must be written
so that it asks another key for its integer attribute, by means of an
extra method, getInt.

Here is how we might use the classes in Figure 5 in combination with
Figure 4. Perhaps we are modelling a bank, and we require this database:

Database bank = new Database(1000);

When a customer opens a new account, we might ask the customer to
select an integer key for the account and make an initial deposit:
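The sketch below illustrates the idea under several assumptions: the constructor Record(int, Key) and the methods deposit and getBalance are hypothetical names for the bank-account record that the text describes, the key and deposit values are fixed numbers where a real application would read them from the customer, and the Database class is a deliberately tiny stand-in (fixed size, insert and find only) included just so the example runs.

```java
class Key {
    private final int id;
    Key(int i) { id = i; }
    int getInt() { return id; }
    boolean equals(Key m) { return id == m.getInt(); }
}

class Record {                                // a simple bank account (assumed form)
    private int balance;
    private final Key id;
    Record(int initialAmount, Key id) { balance = initialAmount; this.id = id; }
    Key getKey() { return id; }
    void deposit(int amount) { balance = balance + amount; }
    int getBalance() { return balance; }
}

class Database {                              // stand-in only: no growth, no delete
    private final Record[] base;
    Database(int size) { base = new Record[size]; }

    Record find(Key k) {
        for (Record r : base) {
            if (r != null && r.getKey().equals(k)) { return r; }
        }
        return null;
    }

    boolean insert(Record r) {
        if (find(r.getKey()) != null) { return false; }   // key already in use
        for (int i = 0; i != base.length; i = i + 1) {
            if (base[i] == null) { base[i] = r; return true; }
        }
        return false;                                     // no space left
    }
}

class OpenAccount {
    public static void main(String[] args) {
        Database bank = new Database(1000);
        int chosenKey = 1234;        // would be selected by the customer
        int initialDeposit = 500;    // would be supplied by the customer
        Key k = new Key(chosenKey);
        Record account = new Record(initialDeposit, k);
        boolean transactionOk = bank.insert(account);
        System.out.println("account opened: " + transactionOk);

        // A later deposit: manufacture a key, find the record, update it.
        Record r = bank.find(new Key(1234));
        if (r != null) {
            r.deposit(60);
            System.out.println("balance: " + r.getBalance());
        }
    }
}
```

If insert returns false, the chosen key is already in use (or, in this stand-in, the array is full), and the application would ask the customer to select a different key.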

To show that the database can be used in a completely different application,
we find in Figure 6 a new coding of record and key, this time for
library books. Now, class Record holds attributes for a book's
title, author, publication date, and catalog number; the catalog number
serves
as the book's key.

The structure of the catalog number is more complex: Its class Key
holds a string and a double, because we are using the U.S. Library of
Congress coding for catalog numbers, which requires
a string and a fractional number. The class's
equals method compares the strings and fractional numbers of two
keys.
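A key of this two-part form might be sketched as follows. The field and accessor names are illustrative, and the sketch assumes that comparing the fractional parts with == is adequate, which holds when both keys are built from the same decimal value, as in a lookup that retypes the catalog number.

```java
/** A sketch of a two-part catalog-number key (an assumed form of Key). */
class Key {
    private final String letterCode;   // the alphabetic part, e.g. "QA"
    private final double number;       // the fractional part, e.g. 76.8

    Key(String letterCode, double number) {
        this.letterCode = letterCode;
        this.number = number;
    }

    String getString() { return letterCode; }
    double getDouble() { return number; }

    /** Two keys are equal when both their strings and their numbers match. */
    boolean equals(Key m) {
        return letterCode.equals(m.getString()) && number == m.getDouble();
    }
}
```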

Here is a short code fragment that constructs a database for a library
and inserts a book into it:

Database library = new Database(50000);
Record book = new Record( new Key("QA", 76.8), "Charles Dickens",
                          "Great Expectations", 1860 );
library.insert(book);
// We might locate the book this way:
Key lookup_key = new Key("QA", 76.8);
book = library.find(lookup_key);
// We can delete the book, if necessary:
boolean deleted = library.delete(lookup_key);

As noted by the statement, Key lookup_key = new Key("QA", 76.8),
we can manufacture keys as needed to perform lookups and deletions.

It is a bit unfortunate that the bank account record in Figure 5 was
named class Record and that the book record in Figure 6 was
also named class Record; more descriptive names, like
class BankAccount and class Book would be far more
appropriate and would let us include both classes in the same
application if necessary. (Perhaps a database must store both
bank accounts and books together, or perhaps one single
application must construct
one database for books and another for bank accounts.)

Of course, we were forced to use the name, class Record,
for both records because the coding for class Database
demanded it.
The Java language lets us repair this naming problem with
a new construction, called a Java interface. We will return
to the database example in the next chapter and show how to use
a Java interface with class Database to resolve this
difficulty.

Exercise

Write an application that uses class Database and classes
Record and Key in Figure 5 to help
users
construct new bank accounts and do deposits on them.