Introduction to SQL -- Installation of PostgreSQL

Abstract:

This short course consists of three parts, in the first part we
explain a few generalities about SQL using a public domain
database server called postgreSQL. In the second part we will
study in more detail the SQL commands. Finally the th ird part
will show us some of the advanced options of the SQL commands,
functions specific of PostgreSQL that might be of interest for
our projects and we will finally review a small C program
putting it all together.

_________________ _________________ _________________

Introduction

During this quick introduction I will only refer to databases
(DB). There are other organisations of data, but writing about
them would be outside the scope of our course.

Until recently, the access to data items was accomplished
through entities that were inter-related by well-defined links
of the database scheme. This type of access had advantages,
mainly speed of access, but a big disadvantage: we could only
access data through an existing link, for example:

country -> states -> counties

but never :

country -> counties

where "->" is the link.

If we wish to establish that second relationship we would
have to redefine the scheme and compile it again....

In fact, in a hierarchical DB, the relationship among the
various entities is static and can only be modified after
alteration of the DB scheme and recompilation of the
latter.

The basic idea behind relation databases is precisely to
link data during the query instance, without the need for a
static link, but instead using an identifier that permits to
link one register with another.

What I just wrote probably needs an Aspirin :)

Relational database managers do not required static links to
allow us go down the hierarchy of entities, instead they use a
unique code that identify these entities while establishing a
temporary relation as a result to a query.

The identification is nothing but a code. Ex: My phone
number is not:

1234567

but :

34 6 1234567

Clearly my phone number is identified by the country code
(34), state code (6) and the proper device number
(1234567).

In the set of countries, the code 34 (Spain) is
unique.

In the set of states, the code 34-6 (Spain/Valencia) is
unique.

In the set of devices, the code 34-6-1234567
(Spain/Valencia/my phone number) is unique.

Let me set the foundations for the first example that
illustrates what I just said.

All counties have a code, belong to
a state and a country.
All states have a code and belong to a
country.
All countries have a code.

To find all the counties in a state we relate the county
with the state through the country and county codes; to find
all the counties in a country we relate the county to the
country by the country code. These relationships are temporary
and only exist during the time of my query.

This is a bit arid and hard to understand but with the first
few examples I hope to make clearer the concept of code and
belonging.

When I send the first query to the DB manager it returns me
all the related data items. But what data am I really
receiving? The union of country and county items, for every
county I am going to receive the related country items.

During my first query a new nameless entity is suddenly
created, it contains a replica of countries and counties. This
new entity, once again, disappears at the end of my query.

Before we used to name a set of data a "file". This are made
of registers and each "register" has a "field". Well, in a
relational database, a "file" is known as a table, a table
contains rows and each row has columns, this is just a small
cosmetic change. ;-)

It is good to mention at this point that some hierarchical
DB managers introduce SQL as an access language, but this is
only anecdote. SQL language is almost exclusively a property of
relational managers.

To illustrate the use of SQL we will use the relational
managers PostgreSQL. Although it is not fully compliant with
the rules of SQL, it is sufficiently close for our purposes, it
is also a very good manager for more heavy duty tasks.

Let me explain only briefly the installation process, given
the goal of this article is SQL. First download the sources
from www.postgresql.org, as well as
any patches available. Extract the sources (tar zxvf) to a
directory, cd postgresql-6.3

In case of having trouble compiling and installing
Postgres95 correctly please refer to the INSTALL file in the
root directory for the distribution.

Let me make another side comment; a relational database
server is generally made of the following parts:

Data access layer

SQL processing layer

SQL parser layer

Communications layer

As clients we connect to the 4th layer, we send SQL commands
which are then passed to the parser layer. This translates the
commands and, in the absence of errors, sends the requests to
the second layer. All the processing and querying operations
take place at this level in collaboration with the data access
layer obviously: data is collected, errors are transmitted to
the client via the communications layer. The SQL processing
layer establishes a dialogue with the client while managing the
correct transfer of data and controlling the transactions and
interrupts.

First
Steps

Next I will illustrate with an example what has been described
so far, let us build three tables (or files):

18 rows ??? We inserted 3 countries and 6 states, all
identify a single country. How is it possible we get 18 rows?

The last command has performed a union of two tables, we
have related the table of countries with the table of counties,
since we have not specify any union exclusion rule, pgsql
returned ALL possible rows of countries related with ALL rows
of states, i.e. 3 for countries times 6 for states for a total
of 18. This result is obviously illogic and useless, better if
we could have done the following:

Yes, there are six counties and each county is in a country.
It is reasonable to get a number of rows identical to the
number of counties because country is a qualifier of counties.
We just related the table of countries with the table of
counties via the country code. Remember that countries have a
code and counties have the code of the country they belong
to.

Why countries.cod_country = states.cod_country
?

The country code in the table of countries is
cod_country and in the table of counties too,
therefore:

cod_country = cod_country

is illogical, the interpreter will never now which of the two
to use and it would return us an error:

Since we query "select * from countries, states",
where the * is a wild card that stands for EVERYTHING,
we obtained the two columns for countries and the three for
counties. Now we would like to be more specific:

In the last command we explicitly asked for the country code,
the state code and the name of the country and state. Notice
that some column names are qualified (a.cod_country) while
others are not (nam_state), this is because cod_country is
repeated in both tables while nam_state exists only in states.
Unique column names do not need extra qualifiers.

We still obtained the same three rows but this time the
returned information is more clear.

Well until now we have only given an introduction, just
warming up :-)

Concepts
Review

So far we have seen a few very basic concepts of SQL. The most
relevant thing here is the concept itself of SQL. We do not
work with concrete data anymore but with data entities. A data
entity is an abstract concept of databases. Simplifying it
could be understood as "ONLY RETURN PART OF ALL WHAT IS
AVAILABLE"

We have seen several commands:

CREATE
TABLE

This command creates a table with its columns.

DROP TABLE

Erases a table.

SELECT

This command is the foundation of SQL, allows us to
create a temporal table containing the necessary data
items only. SELECT can take as parameters functions or
complex statements, as well as sub_selects:

select count(*) from states
where cod_country in (select cod_country from countries);
count
-----
27
(1 row)

BEGIN WORK

This is another fundamental command. It tells the DB
manager to commit ALL the modifications given since BEGIN
WORK. In our particular DB manager a BEGIN WORK marks the
initialisation of a transaction, in other managers the
beginning of a transaction is mark by the first command
that alters something in the database. In postgreSQL all
commands that alter data will operate directly unless
there was a previous BEGIN WORK.

NOTE: commands that modify the scheme of the database
execute a COMMIT WORK, therefore if a transaction is
opened and any such commands is executed our transaction
will be closed immediately and it will be impossible to
launch a ROLLBACK WORK.

While a user has an open transaction he can declare
the access type to his data by other users:

Modified data

Original Data previous to the transaction

Block data access

COMMIT WORK

Closes a transaction leaving the committing the
modifications introduced. The command ROLLBACK WORK returns
the data to their state previous the current
transaction.

The concept of transaction is very important given that it let
us return to the previous state in case of an error. Let us try
this operation, first a "rollback work" to close any
previous transactions:

The definitions of the type of data are specific for
every kind of SQL manager, nevertheless there is an SQL
standard (the last one is ANSI/92 or SQL/3) that defines
a few types with its characteristics. This course we will
only see few types specific of PostgreSQL.