Login

Database Essentials

Are you new to the wonderful world of databases? Confused by the
sudden flood of technical jargon? Don’t know the difference between a
“trigger” and a “stored procedure”, a “join” and a “subquery”? Look no
further – the solution is right here!A wise man once said that information is power. And in the Internet age, that
statement has been proved correct more than once.

As more and more
companies move their businesses online, a vast sea of digital data comes into
being, bits and bytes that control both where businesses have come from, and
where they will go in the future. Everything a company knows about its business,
its customers and its partners is slowly being converted into ones and zeroes,
turning paper-based offices into cyberspace-based virtual workplaces, increasing
productivity and streamlining business processes.

Of course, there’s a
flip side to this as well. As the volume of data grows, it becomes harder to
track and manage it effectively. And so some bright spark came up with the idea
of organizing all that data into databases – essentially, data storage
containers that impose a structure on the data they contain, so as to simplify
the task of managing and using all that information.

This is a
fundamentally good idea, since it first makes it possible to organize large
amounts of information, and then search through this information for specific
items of data. It also offers benefits from the point of view of portability and
compatibility (once the data is organized and stored in a database, it can be
extracted and displayed in any manner you choose), provides a centralized
storage location for important information, and makes it easier to identify
relationships between different data segments.

Over the next few pages,
I’m going to give you a crash course in some of the basic concepts of database
theory, in the hope that it will offer a starting point for your own exploration
of this field of study. If you’re a novice when it comes to databases, some of
the concepts explained here should help you put things in perspective, provide
you with an explanation of some of the terms used by database engineers, and
also offer you some insight into the capabilities of today’s most popular
database engines.{mospagebreak title=The Customer Is King} First, though, let’s
start with the basics. What exactly is a database?

A database, quite
simply, is a collection of data. Every database is composed of one or more
“tables” – these tables, which structure data into rows and columns, are what
lend organization to the data.

As you can see, a table divides data into rows, with a new entry (or
“record”) on every row. If you think of a table as a filing cabinet, you’ll see
that every file in the cabinet corresponds to one row in the table.

The
data in each row is further broken down into cells (or “fields”), each of which
contains a value for a particular attribute of the data. For example, if you
consider the record for the user “John Doe”, you’ll see that the record is
clearly divided into separate fields for customer ID, and first and last
names.

The rows within a table are not arranged in any particular order –
they can be sorted alphabetically, by id, by member name, or by any other
criteria you choose to specify. Therefore, it becomes necessary to have some
method of identifying a specific record in a table. In the example above, each
record is identified by a “CustomerID”, which is a number unique to each row or
record; this unique field is referred to as the “primary key” for that table. A
table may or may not have a primary key – it’s not essential – but a primary key
does make it easier to locate records in a table.{mospagebreak
title=Relationships} Many of today’s databases are “relational database
management systems”, referred to by the acronym RDBMS. A relational database is
typically composed of multiple tables, which contain inter-related pieces of
information. SQL, or Structured Query Language, allows you to combine the data
from these tables in different ways, thereby allowing you to create and analyze
new relationships.

The example you just saw consists of a single table.
While this is fine by itself, it’s when you add more tables, and relate the
information between them, that you truly start to see the power inherent in this
system. Here’s an example:

Each of these three tables is an independent entity; however, they can
be linked to each other by means of the CustomerID field common to all of them.
For example, you could easily use the CustomerID 1243 to find out that user
“Daffy Duck” lives in Toontown, and has a balance of exactly $3.00 in his
account. Or you could go about it the other way: among all the customers with a
balance less than $10.00 in their accounts, CustomerID 1243, aka “Daffy Duck”,
is the only one living in Toontown, TT.

These relationships are the
fundamental basis of a relational database system. By making it possible to
easily relate records in different tables to one another, an RDBMS makes it
possible to analyze date in different ways, while simultaneously keeping it
organized in a systematic fashion, with minimal redundancy.{mospagebreak
title=Invasion Of The Foreign Keys} You will notice that it is the CustomerID
field, which appears in all three of the tables, that makes it possible to link
different tables together. In such a case, the CustomerID field is also referred
to as a “foreign key”, since it is used to refer to records in different tables.
In the example above, the name of the foreign key field is the same for all
tables; this is not necessary at all times.

When each and every foreign
key value is related to a field in another table, this relationship being
unique, the system is said to be in a state of “referential integrity”. In other
words, if the CustomerID field is present in all the tables once and only once,
and if a change to the CustomerID field in any single table is reflected in all
other tables, referential integrity is said to exist.

This concept of
referential integrity is a basic one, and one that becomes very important when
designing a database with more than one table. When foreign keys are used to
link one table to another, referential integrity, by its very nature, imposes
constraints on inserting new records and updating existing records. For example,
if a table only accepts certain types of values for a particular field, and
other tables use that field as their foreign key, this automatically imposes
certain constraints on the dependent tables. Similarly, referential integrity
demands that a change in the field used as a foreign key – a deletion or new
insertion – must immediately be reflected in all dependent tables.

Many
of today’s databases take care of this automatically – if you’ve worked with
Microsoft Access, for example, you’ll have seen this in action – but some don’t.
In the case of the latter, the task of maintaining referential integrity falls
to the application developer, by manually updating all dependent tables to
reflect changes in foreign key values.{mospagebreak title=Looking Up The Index}
If your table contains a large number of records, you can often speed up queries
by “indexing” certain columns. Similar in concept to the index you find at the
end of a book, a database index is a list of sorted field values, used to
simplify the task of locating specific records in response to
queries.

Typically, it’s a good idea to index fields on which you carry
out a large number of searches. For example, since most of the queries on the
tables above take place via the CustomerID field, it’s a good idea to index this
field. Normally, if you had a query like this,

sql> SELECT * FROM names WHERE CustomerID = 30000;

the database server would have to scan each and every record in the
table to locate the one with CustomerID 30000. By indexing this field, like
this,

sql> CREATE INDEX ids ON names (CustomerID);

the database server can use the index to locate ID 30000, and then use
the information stored in the index to quickly locate the relevant record on the
disk drive.

The downside of using indices is that they usually occupy
some amount of disk space, and can affect the time taken to alter (insert or
update) records, since the index needs to be refreshed each time the data
changes.{mospagebreak title=Joined At The Hip} Most databases also allow you to
query two or more tables at a time, and display a combined set of results. This
is technically referred to as a “join”, since it involves “joining” different
tables at specific points to create new views of the data.

In this case, the “names” and “addresses” tables have been joined
together through the common column “CustomerID”.

And you can also nest
one query within another, such that the result of the inner query provides data
for the outer query. Such a query is referred to as a “subquery”, and it allows
a great deal of flexibility when formulating long and complex
queries.

Let’s suppose you wanted to find out the names and addresses of
all customers with an account balance greater than $1000.00

There is a limit on the number of sub-queries you can use in a single
SQL statement, but it’s usually quite a comfortable number.{mospagebreak
title=Room With A View} Closely related to joins and subqueries is the concept
of “views”, which offers yet another way of displaying specific data
slices.

Unlike joins and subqueries, which are based on actual database
tables, a view is a table derived from the content present in other tables.
Think of a view as a virtual table, one which contains values from other tables
(these values selected on the basis of user-defined rules), and which can be
manipulated in exactly the same way as a regular table.

Since a view is
based on the data present in “real” tables, the data you see in the view changes
as the data in the base tables changes. A view also provides a simple way to
restrict access to sensitive fields in a database table, allowing you to exclude
certain fields from display, and can be used in combination with joins to
discover new relationships between different data items.

For example, the
following command creates a view named “highRollers”, which only contains a list
of accounts with a balance in excess of $1000.00

Any changes made to the “customers” table will be immediately
reflected in the view; similarly, changes made to the data in the view will be
executed on the underlying table.{mospagebreak title=Pulling The Trigger} If
your database supports it, you can also use “triggers” to automate certain
database actions. A “trigger”, as the name suggests, is a built-in database
notification of specific events – inserting a new record, updating an entry,
deleting a set of entries – and you can use this notification to automatically
perform pre-defined actions.

To illustrate this, let’s go back to our
example database for a minute. In this database, each time a customer closes an
account, the entry is deleted from the database, never to be recreated. If,
instead, you’d like to set things up so that the entry is also backed up to
another archive table, you could define a trigger that would run on all DELETE
events, and copy the relevant data to another table prior to deleting
it.Here’s an example of what an Oracle trigger might look like, in the
situation just described:

Typically, triggers can be customized to a great extent – you can
decide whether the action is to be taken before or after the event, once or
multiple times, for specific records or all records, and even specify additional
criteria to further focus the effect of the trigger.

For complex Web
applications (typically those with mission-critical data), triggers come in very
useful, since they make it possible to export simple rules to the database
layer, thereby reducing the dependency on the application layer and sometimes
even improving performance. I say “sometimes”, because triggers can also
increase database load significantly if used incorrectly – take the situation
above, and extrapolate to imagine 5,000 records being deleted every second, and
you’ll immediately see why this could significantly hamper database
performance.{mospagebreak title=So That’s Where All My Money Went…} Database
servers like Oracle also come with the ability to create “stored procedures” at
the database layer – essentially, blocks of program code that can be executed at
will, similar to the user-defined functions you can write in Perl or
PHP.

The advantage of creating such stored program blocks at the database
level is very simple, and very powerful – all applications which interact with
the database can use these stored procedures, thereby reducing the time and
effort required for application development. For example, if your application
uses both Perl and PHP to communicate with a database, you would need to write
procedures to accomplish the same tasks in both languages. By writing the
procedure once and moving it to the common point of intersection – the database
– application code becomes easier to write and maintain, and your time is used
more effectively.

For example, here’s a simple procedure, again from
Oracle’s PL/SQL language, which would reduce a specified customer’s balance by
10%:

This procedure would accept a CustomerID and AccountBalance as
parameters, and use this information to update a database entry in the table. As
you can see, this is similar to a Perl or PHP function, which accepts values as
parameters and performs actions based on those parameters (incidentally, you can
also have the stored procedure return a value to you, based on the results of
code execution – just like a regular function)