CS186 - Introductionto Database Systems

- Introduction to
Database Systems
Fall Semester 2013
Prof. Michael Franklin
“Knowledge is of two kinds: we
know a subject ourselves, or we
know where we can find
information upon it.”
-- Samuel Johnson (1709-1784)

• We are at capacity (room, section & GSI/TA)
actually oversubscribed – some attrition built in
about 40 people from the waitlist were let in yesterday
• Wait list will be processed in order, but only as (if)
people drop the course.
The further down the list you are the worse your odds
This happens for the next week and a half or so
• We won’t be able to let in Concurrent Enrollment
Students
• A (smaller) offering of CS 186 is scheduled for next
semester
Taught by visiting DB professor from Oxford
• Michael-David Sasson (CS office) handles it from
here

•A data model is a collection of concepts for
describing data.
•A schema is a description of a particular
collection of data, using a given data model.
•The relational model of data is the most widely
used model today.
•Main concept: relation, basically a table with rows
and columns.
•Every relation has a schema, which describes the
columns, or fields.

[The Relational Model] provides a basis for a
high level data language which will yield
maximal independence between programs on
the one hand and machine representation on
the other. (E.F. Codd, 1981 Turing Award winner)

Relational DataBase Management
Systems were invented to let you use
one set of data in multiple ways,
including ways that are unforeseen at
the time the database is built and the
1st applications are written.
(Curt Monash, analyst/blogger)
That is, think about the data independently of any
particular program.

• “Declarative” Queries & Data Independence
Say what you want, not how to get it
• Help avoiding data corruption
• Protection from other users/jobs/queries
As if you are the only person accessing the DB
• Fault Tolerance and Durability
Database is protected even if failures occur in the
middle of processing
• Usually a bunch of tools and interfaces for
building applications

Design patterns for dealing with (Big) Data
When, why and how to structure your data
How Oracle and (a bit of) Google work
SQL ... and (a bit of) noSQL
Managing concurrency
Fault tolerance and Recovery
• Useful concepts for Computer Science in
general
and other sciences and endeavors as well!

The SQL Query Language
• The most widely used relational query language.
• Originally IBM, then ANSI in 1986
• Current standard is SQL-2011
• 2008 added x-query stuff, new triggers,…
• 2003 was last major update: XML, window functions,
sequences, auto-generated IDs. Not fully supported yet
• SQL-1999 Introduced “Object-Relational” concepts.
• Also not fully supported yet.
• SQL92 is a basic subset
• Most systems support at least this
• PostgreSQL has some “unique” aspects (as do most
systems).
• SQL is not synonymous with Microsoft’s “SQL Server”

Creating Relations in SQL
• Creates the Students relation.
Note: the type (domain) of each field is specified,
and enforced by the DBMS whenever tuples are
added or modified.
CREATE TABLE Students
(sid CHAR(20),
name CHAR(20),
login CHAR(10),
age INTEGER,
gpa FLOAT)

Primary Keys
• A set of fields is a superkey if:
No two distinct tuples can have same values in all key fields
• A set of fields is a key for a relation if :
It is a superkey
No subset of the fields is a superkey
• what if >1 key for a relation?
One of the keys is chosen (by DBA) to be the primary key.
Other keys are called candidate keys.
• E.g.
sid is a key for Students.
What about name?
The set {sid, gpa} is a superkey.

Primary and Candidate Keys in SQL
• Possibly many candidate keys (specified using
UNIQUE), one of which is chosen as the primary key.
Keys must be used carefully!
“For a given student and course, there is a single grade.”
CREATE TABLE Enrolled
CREATE TABLE Enrolled
(sid CHAR(20)
(sid CHAR(20)
cid CHAR(20),
cid CHAR(20),
vs.
grade CHAR(2),
grade CHAR(2),
PRIMARY KEY (sid),
PRIMARY KEY (sid,cid))
UNIQUE (cid, grade))
“Students can take only one course, and no two students
in a course receive the same grade.”

Foreign Keys, Referential Integrity
• Foreign key: a “logical pointer”
Set of fields in a tuple in one relation
that `refer’ to a tuple in another relation.
Reference to primary key of the other relation.
• All foreign key constraints enforced?
referential integrity!
i.e., no dangling references.