Chapter 1
The Worlds of Database
Systems
Databases today are essential to every business. They are used to maintain
internal records, to present data to customers and clients on the Mbrld-Wide-
Web, and to support many other commercial processes. Databases are likewise
found at the core of many scientific investigations. They represent the data
gathered by astronomers, by investigators of the human genome, and by bio-
chemists exploring the medicinal properties of proteins, along with many other
scientists.
The power of databases comes from a body of knowledge and technology
that has developed over several decades and is embodied in specialized soft-
ware called a database rnarlngement system, or DBAlS, or more colloquially a
.'database system." .\ DBMS is a powerful tool for creating and managing large
amounts of data efficiently and allowing it to persist over long periods of time,
safely. These s\-stems are among the most complex types of software available.
The capabilities that a DBMS provides the user are:
1. Persistent storage. Like a file system, a DBMS supports the storage of
very large amounts of data that exists independently of any processes that
are using the data. Hoxever, the DBMS goes far beyond the file system in
pro~iding flesibility. such as data structures that support efficient access
to very large amounts of data.
2. Programming ~nterface.. DBMS allo~vs user or an application pro-
I the
gram to awes> and modify data through a pon-erful query language.
Again, the advantage of a DBMS over a file system is the flexibility to
manipulate stored data in much more complex ways than the reading and
writing of files.
3. Transaction management. A DBMS supports concurrent access to data,
i.e.: simultaneous access by many distinct processes (called "transac-
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

CHAPTER 1. THE WORLDS OF DATABASE SYSTE&fs 1.l. THE EVOLUTION OF DATABASE SI'Sl'E-$.IS 3
tions") at once. To avoid some of the undesirable consequences of si- The first important applications of DBMS's were ones where data was com-
multaneous access, the DBMS supports isolation, the appearance that posed of many small items, and many queries or modification~
were made. Here
transactions execute one-at-a-time, and atomicity, the requirement that are some of these applications.
transactions execute either completely or not at all. A DBMS also sup-
ports durability, the ability to recover from failures or errors of many Airline Reservations Systems
types.
In this type of system, the items of data include:
1.1 The Evolution of Database Systems 1. Reservations by a single customer on a single flight, including such infor-
mation as assigned seat or med preference.
What is a database? In essence a database is nothing more than a collection of
information that exists over a long period of time, often many years. In common 2. Information about flights - the airports they fly from and to, their de-
parlance, the term database refers to a collection of data that is managed by a parture and arrival times, or the aircraft flown, for example.
DBMS. The DBMS is expected to: 3. Information about ticket prices, requirements, and availability.
1. Allow users to create new databases and specify their schema (logical Typical queries ask for flights leaving around a certain time from one given
structure of the data), using a specialized language called a data-definition city to another, what seats are available, and at what prices. Typical data
language. modifications include the booking of a flight for a customer, assigning a seat, or
2. Give users the ability to query the data (a "query" is database lingo for indicating a meal preference. Many agents will be accessing parts of the data
a question about the data) and modify the data, using an appropriate at any given time. The DBMS must allow such concurrent accesses, prevent
language, often called a query language or data-manipulation language. problems such as two agents assigning the same seat simultaneously, and protect
against loss of records if the system suddenly fails.
3. Support the storage of very large amounts of data - many gigabytes or
more - over a long period of time, keeping it secure from accident or Banking Systems
unauthorized use and allowing efficient access to the data for queries and
database modifications. Data items include names and addresses of customers, accounts, loans, and their
balances, and the connection between customers and their accounts and loans,
4. Control access to data from many users at once, without allo~vingthe e.g., who has signature authority over which accounts. Queries for account
actions of one user to affect other users and without allowing sin~ultaneous balances are common, but far more common are modifications representing a
accesses to corrupt the data accidentally. single payment from, or deposit to, an account.
.Is with the airline reservation system, we expect that many tellers and
1.1.1 Early Database Management Systems customers (through AT11 machines or the Web) will be querying and modifying
the bank's data at once. It is \-ital that simultaneous accesses to a n account not
The first commercial database management systems appeared in the late 1960's. cause the effect of a transaction to be lost. Failures cannot be tolerated. For
These systems evolved from file systems, which provide some of item (3) above; example, once the money has been ejected from an ATJi machine, the bank
file systems store data over a long period of time, and they allow the storage of must record the debit, even if the po~ver immediately fails. On the other hand,
large amounts of data. However, file systems do not generally guarantee that it is not permissible for the bank to record the debit and then not deliver the
data cannot be lost if it is not backed up, and they don't support efficient access money if the po~x-er fails. The proper way to handle this operation is far from
to data items whose location in a particular file is not known. ob~ious and can he regarded as one of the significant achievements in DBlIS
Further: file systems do not directly support item (2), a query language for architecture.
the data in files. Their support for (1) - a schema for the data - is linlited to
the creation of directory structures for files. Finally, file systems do not satisfy
C o r p o r a t e Records
(4). When they allow concurrent access to files by several users or processes,
a file system generally will not prevent situations such as two users modifying llany early applications concerned corporate records, such as a record of each
the same file at about the same time, so the changes made by one user fail to sale, information about accounts payable and recei~able, information about
or
appear in the file. employees - their names, addresses: salary, benefit options, tax status, and
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

4 CHAPTER 1. THE WORLDS OF DATABASE SYSTEMS .I. THE EVOLUTION OF D.4TABASE SYSTEMS 5
so on. Queries include the printing of reports such as accounts receivable or Heading the columns are the three attributes: accountNo, balance, and type.
employees' weekly paychecks. Each sale, purchase, bill, receipt, employee hired, Below the attributes are the rows, or tuples. Here we show two t.uples of the
fired, or promoted, and so on, results in a modification to the database. relation explicitly, and the dots below them suggest that there would be many
more tuples, one for each account at the bank. The first tuple says that account
The early DBMS's, evolving from file systems, encouraged the user to visu- number-12345 has a balance of one thousand dollars, and it is a savings account.
alize data much as it was stored. These database systems used several different The second tuple says that account 67890 is a checking account wit11 $2846.92.
data models for describing the structure of the information in a database, chief Suppose we wanted to know the balance of account 67690. We could ask
among them the "hierarchical" or tree-based model and the graph-based "net- this query in SQL as follows:
work" model. The latter was standardized in the late 1960's through a report
of CODASYL (Committee on Data Systems and Languages).' SELECT balance
A problem with these early models and systems was that they did not sup- FROM Accounts
port high-level query languages. For example, the CODASYL query language WHERE accountNo = 67890;
had statements that allowed the user to jump from data element to data ele-
ment, through a graph of pointers among these elements. There was consider- For another example, we could ask for the savings accounts with negative bal-
able effort needed to write such programs, even for very simple queries. ances by:
SELECT accountNo
1.1.2 Relational Database Systems FROM Accounts
Following a famous paper written by Ted Codd in 1970,2 database systems WHERE type = 'savings' AND balance < 0;
changed significantly. Codd proposed that database systems should present
the user with a view of data organized as tables called relations. Behind the We do not expect that these two examples are enough to make the reader an
scenes, there might be a complex data structure that allowed rapid response to expert SQL programmer, but they should convey the high-level nature of the
a variety of queries. But, unlike the user of earlier database systems, the user of SQL "select-from-where" statement. In principle, they ask the DBMS to
a relational system would not be concerned with the storage structure. Queries
1. Examine all the tuples of the relation Accounts mentioned in the FROM
could be expressed in a very high-level language, which greatly increased the
clause,
efficiency of database programmers.
We shall cover the relational model of database systems throughout most 2. Pick out those tuples that satisfy some criterion indicated in the WHERE
of this book, starting with the basic relational concepts in Chapter 3. SQL clause, and
("Structured Query Language"), the most important query language based on
the relational model, will be covered starting in Chapter 6. However, a brief 3. Produce as an answer certain attributes of those tuples, as indicated in
introduction to relations will give the reader a hint of the simplicity of the the SELECT clause.
model, and an SQL sample will suggest how the relational model promotes
queries written at a very high level, avoiding details of "navigation" through In practice. the system must "optimize" the query and find an efficient way to
the database. ansn-er the query, even though the relations in~olred the query may be rery
in
large. 0
Example 1.1: Relations are tables. Their columns are headed by attributes,
which describe the entries in the column. For instance, a relation named By 1990. relational database systems were the norm. Yet the database field
Accounts, recording bank accounts, their balance, and type might look like: continues to evolve. and new issues and approaches to the management of data
surface regularlj-. In the balance of this section, we shall consider some of the
accountNo I balance I type modern trends in database systems.
12345
67890
1.1.3 Smaller and Smaller Systems
Originally, DBJIS's were large, expensive softn-are systems running on large
'GODASYL Data Base Task Group April 1971 Report, ACM, New York.
'Codd, E. F., "A relational model for large shared data banks," Comrn. ACM, 13:6, computers. The size was necessary, because to store a gigabyte of data required
pp. 377-387, 1970. a large computer system. Today, many gigabytes fit on a single disk, and
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

6 CHAPTER 1. THE WORLDS OF DATABASE SYSTEMS 1.1. T H E EVOLUTION OF DATABASE ST7STEhIS 7
it is quite feasible to run a DBMS on a personal computer. Thus, database through index structures, which we shall mention in Section 1.2.2 and cover
systems based on the relational model have become available for even very small extensively in Chapter 13. Another way to process more data in a given time
machines, and they are beginning to appear as a common tool for computer is to use parallelism. This parallelism manifests itself in various ways.
applications, much as spreadsheets and word processors did before them. For example, since the rate a t which data can be read from a given disk is
fairly low, a few megabytes per second, we can speed processing if we use many
1.1.4 Bigger and Bigger Systems disks and read them in parallel (even if the data originates on tertiary storage,
it is "cached on disks before being accessed by the DBMS). These disks may
On the other hand, a gigabyte isn't much data. Corporate databases often be part of an organized parallel machine, or they may be components of a
occupy hundreds of gigabytes. Further, as storage becomes cheaper people distributed system, in which many machines, each responsible for a part of the
find new reasons to store greater amounts of data. For example, retail chains database, communicate over a high-speed network when needed.
often store terabytes (a terabyte is 1000 gigabytes, or 101%ytes) of information
recording the history of every sale made over a long period of time (for planning Of course, the ability to move data quickly, like the ability to store large
inventory; we shall have more to say about this matter in Section 1.1.7). amounts of data, does not by itself guarantee that queries can be answered
Further, databases no longer focus on storing simple data items such as quickly. We still need to use algorithms that break queries up in ways that
integers or short character strings. They can store images, audio, video, and allow parallel computers or networks of distributed computers to make effective
I
many other kinds of data that take comparatively huge amounts of space. For use of all the resources. Thus, parallel and distributed management of very large
! databases remains an active area of research and development; we consider some
instance, an hour of video consumes about a gigabyte. Databases storing images i
from satellites can involve petabytes (1000 terabytes, or 1015 bytes) of data. I of its important ideas in Section 15.9.
Handling such large databases required several technological advances. For
example, databases of modest size are today stored on arrays of disks, which are
called secondary storage devices (compared to main memory, which is "primary"
storage). One could even argue that what distinguishes database systems from 1.1.5 Client-Server and Multi-Tier Architectures
other software is, more than anything else, the fact that database systems
routinely assume data is too big to fit in main memory and must be located Many varieties of modern software use a client-server architecture, in which
primarily on disk at all times. The following two trends allow database systems requests by one process (the client) are sent to another process (the server) for
to deal with larger amounts of data, faster. execution. Database systems are no exception, and it has become increasingly
common to divide the work of a DBMS into a server process and one or more
Tertiary Storage client processes.
In the simplest client-server architecture, the entire DBMS is a server, except
The largest databases today require more than disks. Several kinds of tertiary for the query interfaces that interact with the user and send queries or other
storage devices have been developed. Tertiary devices, perhaps storing a tera- commands across to the server. For example, relational systems generally use
byte each, require much more time to access a given item than does a disk. the SQL language for representing requests from the client to the server. The
While typical disks can access any item in 10-20 milliseconds, a tertiary device database server then sends the answer, in the form of a table or relation, back
may take several seconds. Tertiary storage devices involve transporting an to the client. The relationship between client and server can get more complex,
object, upon which the desired data item is stored, to a reading device. This especially when answers are extremely large. We shall have more to say about
movement is performed by a robotic conveyance of some sort. this matter in Section 1.1.6.
For example, compact disks (CD's) or digital versatile disks (DVD's) may
be the storage medium in a tertiary device. An arm mounted on a track goes There is also a trend to put more work in the client, since the server will
to a particular disk, picks it up, carries it to a reader, and loads the disk into be a bottleneck if there are many simultaneous database users. In the recent
the reader. proliferation of system architectures in which databases are used to provide
dynamically-generated content for Web sites, the two-tier (client-server) archi-
Parallel Computing tecture gives way to three (or even more) tiers. The DBMS continues to act
as a server, but its client is typically an application server, which manages
The ability to store enormous volumes of data is important, but it would be connections to the database, transactions, authorization, and other aspects.
of little use if we could not access large amounts of that data quickly. Thus, -4pplication servers in turn have clients such as Web servers, which support
very large databases also require speed enhancers. One important speedup is end-users or other applications.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

8 CHAPTER 1 . THE I,VORLDS OF DATABASE SE'STE3,fS 1.2. OVERVIE IV OF d DATABASE M.4NAGEkfEhrT SYSTEM 9
1.1.6 Multimedia Data line orders. .4 large company has many divisions. Each division may have built
its own database of products independently of other divisions. These divisions
Another important trend in database systems is the inclusion of multimedia nlav use different DBlIS's, different structures for information. perhaps even
data. By "multimedia" we mean information that represents a signal of some different terns to mean the same thing or the same term to mean different
sort. Common forms of multimedia data include video, audio, radar signals, things.
satellite images, and documents or pictures in various encodings. These forms
have in cornmon that they are much larger than the earlier forms of data - Example 1.2: Imagine a company with several divisions that manufacture
integers, character strings of fixed length, and so on - and of vastly varying disks. One division's catalog might represent rotation rate in revolutions per
size. second, another in revolutions per minute. Another might have neglected to
The storage of multimedia data has forced DBMS's to expand in several represent rotation speed at all. .-I division manufacturing floppy disks might
ways. For example, the operations that one performs on multimedia data are refer to them as "disks," while a division manufacturing hard disks might call
not the simple ones suitable for traditional data forms. Thus, while one might thein "disks" as well. The number of tracks on a disk might be referred to as
search a bank database for accounts that have a negative balance, comparing "tracks" in one division, but "cylinders" in another.
each balance with the real number 0.0, it is not feasible to search a database of
pictures for those that show a face that "looks like" a particular image. Central control is not always the answer. Divisions may have invested large
To allow users to create and use complex data operatiorls such as image- . amounts of money in their database long before information integration across
processing, DBMS's have had to incorporate the ability of users to introduce d- .-
lrlsions was recognized as a problem. A division may have been an itide-
functions of their own choosing. Oftcn, the object-oriented approach is used pendent company. recently acquired. For these or other reasons. these so-called
for such extensions, even in relational systems, which are then dubbed "object- legacy databases cannot be replaced easily. Thus, the company must build some
relational." We shall take up object-oriented database programming in various structure on top of tlie legacy databases to present to customers a unified view
places, including Chapters 4 and 9. of products across the company.
The size of multimedia objects also forces the DBXIS to rnodify tlie storage One popular approach is the creation of data warehouses. ~vhereinforrnatiorl
manager so that objects or tuples of a gigabyte or more can be accommodated. from many legacy databases is copied. with the appropriate translation, to a
Among the many problems that such large elements present is the delivery of ccritral database. -4s the legacy databases change. the warehouse is updated,
answers to queries. In a conventional, relational database, an answer is a set of hut not necessarily instantaneously updated. .A common scheme is for the
tuples. These tuples would be delivered to the client by the database server as warehouse to be reconstructed each night, when the legacy databases are likely
a whole. to be less bus^
However, suppose the answer to a query is a video clip a gigabyte long. It is The legacy databases are thus able to continue serving the purposes for
not feasible for the server to deliver the gigabyte to the cllent as a whole. For which they Tvere created. Sew functions, such as providing an on-line catalog
one reason it takes too long and will prevent the server from handling other service through the \leb. are done at the data warehouse. \Ye also see data
requests. For another. the client may want only a small part of the fill11 clip, warehouses serving ~iceds planning and analysis. For example. r o m p a y an-
for
but doesn't have a way to ask for exactly what it wants ~vithoutseeing the alysts may run queries against the warehouse looking for sales trends, in order
initial portion of the clip. For a third reason, even if the client wants the whole to better plan inventory and production. Data mining, the search for interest-
clip, perhaps in order to play it on a screen, it is sufficient to deliver the clip at ing and unusual patterns in data, has also been enabled by the construction
a fised rate over the course of an hour (the amount of time it takes to play a of data ~varel~ouses. there are claims of enhanced sales through exploita-
and
gigabj te of compressed video). Thus. the storage system of a DBXS supporting tion of patterns disrovered in this n-ay. These and other issues of inforlnation
multinledia data has to be prepared to deliver answcrs in an interactive mode. integration are discussed in C h a p t c ~
20.
passing a piece of the answer to tlie client on r~qucst at a fised rate.
or
1.2 Overview of a Database Management
1.1.7 Information Integration
System
As information becomes ever more essential in our work and play, T e find that
v
esisting information resources are being used in Inany new ways. For instance. In Fig. 1.1n-e see an outline of a complete DBMS. Single boxes represent system
consider a company that wants to provide on-line catalogs for all its products. so components. while double boses represent in-memory data structures. The solid
that people can use the World Wide 1Ti.b to hrolvse its products and place on- lines indicate control and data flow, while dashed lines indicate data flow only.
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

10 CK4PTER 1. THE IVORLDS OF DATABASE SYSTEMS OVERVIE \V OF A DATABASE ~~..IIVAGEI\~EIVTT E J f
SYS 11
Since the diagram is complicated, we shall consider the details in several stages.
First, at the top, we suggest that there are two distinct sources of commands
to the DBMS:
1. Conventional users and application programs that ask for data or modify
data.
Database
2. A database administrator: a person or persons responsible for the struc- administrator
ture or schema of the database.
1.2.1 Data-Definition Language Commands
The second kind of command is the simpler to process, and we show its trail
beginning a t the upper right side of Fig. 1.1. For example, the database ad-
ministrator, or DBA, for a university registrar's database might decide that
there should be a table or relation with columns for a student, a course the
student has taken, and a grade for that student in that course. The DBX'
might also decide that the only allowable grades are A, B, C, D, and F. This
structure and constraint information is all part of the schema of the database. index,
It is shown in Fig. 1.1 as entered by the DBB, who needs special authority
to execute schema-altering commands, since these can have profound effects
on the database. These schema-altering DDL commands ("DDL," stands for
"data-definition language") are parsed by a DDL processor and passed to the
execution engine, which then goes through the index/file/record manager to
alter the metadata, that is, the schema information for the database.
me
com~nand~ I data, ',
mefadata, ,
indexes
\,
'. , ,,T
;
Buffer
manager
1.2.2 Overview of Query Processing
The great majority of interactions with the DBMS follo\v the path on the left Pages
side of Fig. 1.1. A user or an application program initiates some action that
does not affect the schema of the database, but may affect the content of the Storage
database (if the action is a modification command) or will extract data from manager
the database (if the action is a query). Remember from Section 1.1 that the
language in which these commands are expressed is called a data-manipulation
language (DML) or somewhat colloquially a query language. There are many
data-manipulation languages available, but SQL, which \\*as mentioned in Es-
ample 1.1, is by far the most commonly used. D l I L statements are handled by
two separate subsystems. as follo\vs.
u Storage
Figure 1.1: Database ~nanagenicntsystem components
Answering the query
The query is parsed and optimized by a querg compiler. The resulting gilery
plan, or sequence of actions the DBMS will perform to answer the query, is
passed to the execution engine. The execution engine issues a sequence of
requests for small pieces of data, typically records or tuples of a relation, to a
resource manager that knows about data Eles (holding relations), the format
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

CHAPTER 1. THE I4'ORLDS O F DATABASE SYSTEJIS 0 VER1,TETV O F A DATA BASE M.4.V-4 GEAIEXT SYSTEM 13
and size of records in those files, and index files, which help find elements of transferred. Thus, all DBMS components that need information from the disk
data files quickly. will interact with the buffers and the buffer manager, either directly or through
The requests for data are translated into pages and these requests are passed the execution engine. The kinds of information that various components may
to the bufler manager. We shall discuss the role of the buffer manager in need include:
Section 1.2.3, but briefly, its task is to bring appropriate portions of the data
from secondary storage (disk, normally) where it is kept permanently, to main- 1. Data: the contents of the dcitabaseitself.
memory buffers. Kormally, the page or "disk block" is the unit of transfer
between buffers and disk. 2. Metadata: the database schema that describes the structure of, and con-
The buffer manager communicates with a storage manager to get data from straints on, the database.
disk. The storage manager might involve operating-system commands, but 3. Statistics: information gathered arid stored by the DBMS about data
more typically, the DBMS issues commands directly to the disk controller. properties such as the sizes of, and values in, various relations or other
components of the database.
Transaction processing
4. Indexes: data structures that support efficient access to the data.
Queries and other DML actions are grouped into transactions, which are units
that must be executed atomically and in isolation from one another. Often each
query or modification action is a transaction by itself. In addition, the execu-
. -1more complete discussion of the buffer manager and its role appears in Sec-
tion 15.7.
tion of transactions must be durable, meaning that the effect of any completed
transaction must be preserved even if the system fails in some way right after
completion of the transaction. U7e divide the transaction processor into two 1.2.4 Transaction Processing
major parts: It is normal to group one or more database operations into 3 transaction, which
1. A concurrency-control manager, or scheduler, responsible for assuring is a unit of work that must be executed atomically and in apparent isolation
from other transactions. In addition: a DBMS offers the guarantee of durability:
atomicity and isolation of transactions, and
that the n-ork of a conlpletccl transaction will never be lost. The transaction
2. A logging and recovery manager, responsible for the durability of trans- manager therefore accepts transaction commands from an application, which
actions. tell the transaction manager when transactions begin and end, as \veil as infor-
mation about the expcctations of the application (some may not wish to require
We shall consider these component,s further in Section 1.2.4. atomicit? for example). The transaction processor performs the follo~ving tasks:
1.2.3 Storage and Buffer Management 1. Logging: In order to assure durability. every change in the database is
logged separately on disk. Thc log manager follo~vs of several policies
one
The data of a database normally resides in secondary storage; in today's com-
designed to assure that no matter \\-hen a system failure or ..crash" occurs,
puter systems "secondary storage" generally means magnetic disk. However. to
a recovery manager will be able to examine the log of changes and restore
perform any useful operation on data, that data must be in main memory. It
the database to some consistent state. The log manager initially writes
is the job of the storage manager to control the placement of data on disk and the log in buffers ant1 negotiates ~vitli buffer manager to make sure that
the
its movement between disk and main memory.
buffers are 11-rittcnto disk (where data can survive a crash) a t appropriate
In a simple database system. the storage manager might be nothing more
times.
than the file system of the underlying operating system. Ho~vever. efficiency
for
purposes, DBlIS's normally control storage 011 the disk directly at least under 2. Concurrerjcy control: Transactions must appear to execute in isolation.
some circumstances. The storage manager keeps track of the locatioil of files on But in iliost systems. there will in truth be niany transactions executing
the disk and obtains the block or blocks containing a file on request from the at once. Thus. the scliedt~ler(concurrency-control manager) lilust assure
buffer manager. Recall that disks are generally divided into disk blocks. which that the individual actions of multiple transactions are executed in such
are regions of contiguous storage containing a large number of bytes, perhaps an order that the net effect is the same as if the transactions had in
212 or 2'' (about 4000 to 16,000 bytes). fact executed in their entirety. one-at-a-time. A typical scheduler does
The buffer manager is responsible for partitioning the available main mem- its n-ork by maintaining locks on certain pieces of the database. These
ory into buffers, which are page-sized regions into which disk blocks can be locks prevent t ~ wtransactions from accessing the same piece of data in
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

14 CHAPTER 1. THE 'IVORLDS OF DATABASE SYSTE-4tS 1.3. OL7TLISE OF DATABASE-SYSTEAI STUDIES 15
"relational algebra" operations, which are discussed in Section 5.2. The
The ACID Properties of Transactions query compiler consists of three major units:
Properly implemented transactions are commonly said to meet the ".\CID (a) A query parser, which builds a tree structure from the textual form
test," where: of the query.
"A" stands for "atomicity," the all-or-nothing execution of trans- (b) A query preprocessor, which performs semantic checks on the query
actions. (e.g.; making sure all relations mentioned by the query actually ex-
ist), and performing some tree transformations to turn the parse tree
"I" stands for "isolation," the fact that each transaction must appear into a tree of algebraic operators representing the initial query plan.
to be executed as if no other transaction is executing at the same
time. (c) - query optimizer, which transforxns the initial query plan into the
1
best available sequence of operations on the actual data.
"D" stands for "durability," the condition that the effect on the
database of a transaction must never be lost, once the transaction The query compiler uses metadata and statistics about the data to decide
has completed. which sequence of operations is likely to be the fastest. For example, the
existence of an index, which is a specialized data structure that facilitates
'
The remaining letter, "C," stands for "consistency." That is, all databases access to data, given values for one or more components of that data, can
have consistency constraints, or expectations about relationships among make one plan much faster than another.
data elements (e.g., account balances may not be negative). Transactions
are expected to preserve the consistency of the database. We discuss the 2. The execution engzne, which has the responsibility for executing each of
expression of consistency constraints in a database scherna in Chapter 7, the steps in the chosen query plan. The execution engine interacts with
while Section 18.1begins a discussion of how consistency is maintained by most of the other components of the DBMS, either directly or through
the DBMS. the buffers. It must get the data from the database into buffers in order
to manipulate that data. It needs to interact with the scheduler to avoid
accessing data that is locked, and \\-it11the log manager to make sure that
all database changes are properly logged.
ways that interact badly. Locks are generally stored in a main-memory
lock table, as suggested by Fig. 1.1. The scheduler affects the esecution of
queries and other database operations by forbidding the execution engine
from accessing locked parts of the database. 1.3 Outline of Database-System Studies
3. Deadlock resohtion: As transactions compete for resources through the
locks that the scheduler grants, they can get into a situation where none Ideas related to database systems can be divided into three broad categories:
can proceed because each needs something another transaction has. The
transaction manager has the responsibility to inter~ene and cancel (-roll- 1. Design of databases. How does one develop a useful database? What kinds
back" or "abort") one or more transactions to let the others proceed. of information go into the database? How is the information structured?
What assumptions arc made about types or values of data items? How
do data items connect?
1.2.5 The Query Processor 2. Database progrcsm~ning.Ho\v does one espress queries and other opera-
The portion of the DBUS that most affects the performance that the user sees tions on the database? How does one use other capabilities of a DBMS,
is the query processor. In Fig. 1.1 the query processor is represented b tn-o
!- such as transactions or constraints, in an application? How is database
Components: progran~ming combined xith conventional programming?
1. The query compiler. which translates the query into an internal form called 3. Database system implementation. How does one build a DBMS, including
a query plan. The latter is a sequence of operations to be performed on such matters as query processing. transaction processing and organizing
the data. Often the operations in a query plan are implementations of storage for efficient access?
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.