A database is a collection of information that is organized so that it caneasily be accessed, managed, and updated. We can have a database of any kind of content–

bibliographic, full-text, numeric and images. We can even have a multimedia database.Adatabaseis often abbreviated as DB.You can also think of a database as an electronic filingsystem. Traditional databases are organized by fields, records, and files. A field is a single pieceof information; a record is one complete set of fields; and a file is a collection of records. Forexample, a telephone book is analogous to a file. It contains a list of records, each of whichconsists of three fields: name, address, and telephone number.

With the advent of the Internet Technology, a new concept in database design has beenintroduced, which is known as a Hypertext database. In a Hypertext database, any object,whether itis

a piece of text, a picture, or a film,itcan be linked to any other object. Hypertextdatabases are particularly useful for organizing large amounts of disparate information, butthey are not designed for numerical analysis.

The size of a database can vary widely, from a few megabytes for personal databases, togigabytes (a gigabyte is 1000 megabytes) or even terabytes (a terabyte is 1000 gigabytes) forlarge corporate databases. The information in a database is stored on a nonvolatile mediumthat can accommodate large amounts of data; the most commonly used such media aremagnetic disks. Magnetic disks can store significantly larger amounts of data than mainmemory, at much lower costs per unit of data.

Table 9.0 contains details about each student. There are six pieces of information on eachstudent. They are Metric No, Name, Date of birth, Sex, Address and Courses. Each piece ofinformation in database is called aField. We can definefieldas

the smallest unit in a database.Each field represents one and only one characteristic of an event or item. Thus there are sixfields in this database. If we were to take a close look at all these fields, we can see that theyare not of the same type. Date of birth isdatetype whereasName ischaracter

type. All therelated fields for a particular event is called aRecord. In the example above, all six fields takentogether for a particular student is called a record of that student. Since there are six studentsthere are six records. Thus, we can define record asa collection of logically related fields. Wecan now say thata database

is a collection of logically related records.

Table 9.0: A Student Database

MetricNo.

Name

Date ofbirth

Sex

Address

Courses

9721001

Maryam

21.05.1980

M

C36, Sector 2, Manama, Bahrain

Pol Sc, Eco, History, Eng,Statistics

9721002

Aditya

12.06.1981

M

At/Po.Orange Street, Dubai

Phy, Chem, Biology, Eng,Geology

9732012

RahulJain

03.01.1979

F

A31, Pilani, Rajasthan, India

Pol

Sc, Eco, History, Eng, Maths

9724004

AhmadAli

23.11.1979

M

12A, Sheikh Sarai-I,Dahran,Saudi Arabia

Phy, Chem, Biology, Eng, IT

9715023

C. Suresh

07.09.1980

M

96, Malviya Nagar,Sri Lanka

Pol Sc, Eco, History, Eng,Programming

9.2

Features ofDatabase Management Systems

Thenext question is: How do we create and manage our databases?Data management involvescreating, modifying, deleting and adding data in files, and using this data to generate reports oranswer queries. The software that allows us to perform these functions easily is called aDataBase Management System

(DBMS). Using a DBMS files can be retrieved easily and effectively.

There are many DBMS packages available in the market. Some of them are:



MySQL,



PostgreSQL,



MicrosoftAccess,



SQL Server,



FileMaker,



Oracle,



RDBMS,



dBASE,



Clipper, and



FoxPro.

9.1

Database Characteristics

A database has several characteristics that make it usefuland irreplaceable.

9.1.1 Concurrent Use

A database is always used by more than one user at the same time. In this case adatabase

system allows several users to access the database concurrently. Answering different questionsfrom different users with the same (base) data is a central aspect of an information system.

Such concurrent use of data increases the economy of a system. Data

capturing and datastorage is not redundant, the system can be operated from a central control and the data canbe updated more efficient.

An example for concurrent use is the travel database of a biggertravel agency. The employees of different branches

can access the database concurrently andbook journeys for their clients. Each travel agent sees on hiscomputer

if there are still seatsavailable for a specific journey or if it is already fully booked.

Figure 9.1.1 describes howmultiple users can share the same database.

Figure 9.1.1: A database enabling concurrent access

9.1.2

Structured

and Described Data

A database system does not only contain

data but also the complete definition and descriptionof these data.A database contains metadata which describes the data itself-

the structure, thetype and the format of all data and, additionally, the relationship between the data.Metadatais sometimes known as "data about data".

Structured Data:

Data is called structured if it can be subdivided systematically and linked.Lets us look at anexample of how data can be structured. Table 9.1.2

entry in the first column must be a prename (coded as string) and an entryin the third column must be a postcode (coded as number).

Table 9.1.2: A table of Names and Addresses

Firstname [string]

Familyname [string]

Postcode

City [string]

Rohit

Gupta

14000

Srinagar

Hanif

Salam

46350

Klang

….

…..

….

….

9.1.3

SeparatesData and Applications

When using a database,theapplication

software does not needto knowabout the physicaldata storage like encoding, format, storage place, etc. It only communicates with themanagement system of a database (DBMS) via astandardized

interface with the help of astandardized

language like SQL. The access to the data and the metadata is entirely done by theDBMS.

In this way all the applications can be totallyseparated

from the data. Thereforedatabase internalreorganizations

or improvement of efficiency do not have any influence onthe application software.Figure 9.1.3

describes how this can be done.

Figure 9.1.3:Separating Data from Application Systems

9.1.4

Data Integrity

Data integritymeansthe quality and the reliability of the data of a database system.Dataintegrity includes also the protection of the database fromunauthorized

access (confidentiality)andunauthorized

changes..Data reflect facts of the real world. Logically, it is demanded thatthis reflection is done correctly. A DBMS should support the task to bring only correct andconsistent data into the database. Additionally, correct transactions ensure that the consistencyis maintained during the operation of the system. An example for inconsistency would be ifcontradictory statements were saved in the same database.

Student Record in the Library

Name

Address

Haziq Hamidi

No.11, Yellow Road,Ipoh, Malaysia

Rami Mayan

No.22 Oxfam Road, Klang, Malaysia

Sunny Darwish

No. 134, Silk Road, Singapore

….

….

Student Record in the AccountsDepartment

Name

Address

Haziq Hamidi

No.11, Yellow Road, Ipoh, Malaysia

Rami Mayan

No.22 Oxfam Road, Klang, Malaysia

Sunny Darwish

No. 74, Lime Tree Road, Norwich, UK

….

….

9.1.5.Data Persistence

Data persistence means that in a DBMS all data is maintained as long as it is not deletedexplicitly. The life span of data needs to be determined directly or indirectly be the user andmust not be dependent on system features. Additionally data once stored in a database mustnot be lost.

9.1.6

Data Views

Typically, a database has several users and each of them, depending on access rights anddesire, needs an individual view of the data (content and form). Such a data view can consist ofa subset of the stored data or of from the stored data derived data (notexplicitly

stored).Forexample:A university manages the data about students. Beside matriculation number, name,address, etc. other information like in which course the student is registered, if he needs to doa receipt, and so on is managed aswell.This

extensive database is used by several people allwith different needs and rights.

A student can view only his own data:-

name address, contactnumber, his courses, grades and fees paid. A lecturer can view only student metric number,names and grades

of those students that he/she teaches. A Dean can only see all student,lecturers and staff details if they are in his Faculty.

9.2

Features of Database Management Systems

A database management system (DBMS) is designed to manage a large body of information.Data management involves both defining structures for storing information and providingmechanisms for manipulating the information. In addition, the database system must providefor the safety of the stored information, despite system crashes or attempts at unauthorizedaccess. If data are to be shared among several users, the system must avoid possibleinconsistent

results due to multiple users concurrently accessing the same data.

Accessing desired records from a large relation using a scan on the relation can be veryexpensive. Indices are data structures that permit more efficient access of records. An index isbuilt on one or more attributes of a relation; such attributes constitute the search key. Given avalue for each of the search-key attributes, the index structure can be used to retrieve recordswith the specified search-key values quickly. Indices may also support other operations, such asfetching all records whose search-key values fall in a specified range of values.

A database schema is specified by a set of definitions expressed by a data-definition language.The result of execution of data-definition language statements is a set of information stored ina special file called a data dictionary. The data dictionary contains metadata,

that is, data aboutdata. This file is consulted before actual data are read or modified in the database system. Thedata-definition language is also used to specify storage structures and access methods.

Data manipulation is the retrieval, insertion, deletion, and modification of information stored inthe database. A data-manipulation language enables users to access or manipulate data asorganized by the appropriate data model. There are basically two types of data-manipulationlanguages: Procedural data-manipulation languages require a user to specify what data areneeded and how to get those data; nonprocedural data-manipulation languages require a userto specify what data are needed without specifying how to get those data.

A query is a statement requesting the retrieval of information. The portion of a data-manipulation language that involves information retrieval is called a query language. Althoughtechnically incorrect, it is common practice to use the terms query language and data-manipulation language synonymously.

Database languages support both data-definition and data-manipulation functions. Althoughmany database languages have been proposed and implemented,SQL

has become a standardlanguage supported by most relational database systems. Databases based on the object-oriented model also support declarative query languages that are similar to SQL.

SQL provides acomplete data-definition language, including the ability to create relations with specifiedattribute types, and the ability to define integrity constraints on the data.

Query By Example (QBE) is a graphical language for specifying queries. It is widely used inpersonal database systems, since it is muchsimpler than SQL for non-expert users.

Forms interfaces present a screen view that looks like a form, with fields to be filled in by users.Some of the fields may be filled automatically by the forms system. Report writers permitreport formats to be defined, along with queries tofetch

data from the database; the results ofthe queries are shown formatted in the report. These tools in effect provide a new language forbuilding database interfaces and are often referred to as fourth-generation languages (4GLs).

Often, several operations on the database form a single logical unit of work, called atransaction. An example of a transaction is the transfer of funds from one account to another.Transactions in databases mirror the corresponding transactions in the commercial world.

Traditionally database systems have been designed to support commercial data, consistingmainly of structuredalphanumeric

data. In recent years, database systems have added supportfor a number of nontraditional data types such as text documents, images, and maps and otherspatial data. The goal is to make databases universal servers, which can store all types of data.Rather than add support for all such data types into the core database, vendors offer add-onpackages that integrate with the database to provide such functionality.

9.3

Architectures of Database Management Systems

The database architecture is the set of specifications, rules, and processes that dictate how datais stored in a database and how data is accessed by components of a system. It includes datatypes, relationships, and naming conventions. The database architecture describes theorganization of all database objects and how they work together. It affects integrity, reliability,scalability, and performance. The database architecture involves anything that defines thenature of the data, the structure of the data, or how the data flows.

The overall structure of the database is called the database schema. The schema specifies data,data relationships, data semantics, and consistency constraints on the data. The entity-relationship data model is based on a collection of basic objects, called entities, and ofrelationships among these objects. An entity is a “thing” or “object” in the real world that isdistinguishable from other objects. For example, each person is an entity, and bank accountscan be considered entities. Entities are described in a database by a set of attributes. Forexample, the attributes account-number and balance describe one particular account in a bank.A relationship is an association among several entities. For example, a depositor relationshipassociates a customer with each of her accounts. The set of all entities of the same type and theset of all relationships of the same type are termedan entity set and a relationship set,respectively.

Like the entity-relationship model, the object-oriented model is based on a collection ofobjects. An object contains values stored in instance variables within the object. An object alsocontains bodies

of code that operate on the object. These bodies of code are called methods.The only way in which one object can access the data of another object is by invoking a methodof that other object. This action is called sending a message to the object. Thus,the callinterface of the methods of an object defines that object's externally visible part. The internalpart of the object—the instance variables and method code—are not visible externally. Theresult is two levels of data abstraction, which are important to abstract away (hide) internaldetails of objects. Object-oriented data models also provide object references which can beused to identify (refer to) objects.

In record-based models, the database is structured in fixed-format records of several types.Each record has a fixed set of fields. The three most widely accepted record-based data modelsare the relational, network, andhierarchical

models. The latter two were widely used once, butare of declining importance. The relational model is very widely used. Databases based on therelational model are called relational databases.

The relational model uses a collection of tables (called relations) to represent both data and therelationships among those data. Each table has multiple columns, and eachcolumn has aunique name. Each row of the table is called atuple, and each column represents the value ofan attribute of the tuple.

9.4

Evolution of Database Technology

Ancient to modern: The origins go back to libraries, governmental, business, and medicalrecords. There is a very long history of information storage, indexing, and retrieval.

1960's: Computers become cost effective for private companies along with increasing storagecapability of computers. Two main data models were developed: network model (CODASYL)and hierarchical (IMS). Access to database is through low-level pointer operations linkingrecords. Storage details depended on the type of data to be stored. Thus adding an

extra fieldto your database requires rewriting the underlying access/modification scheme. Emphasis wason records to be processed, not overall structure of the system. A user would need to know thephysical structure of the database in order to query for

information. One major commercialsuccess was SABRE system from IBM and American Airlines.

1970-72: E.F. Codd proposed relational model for databases in a landmark paper on how tothink about databases. He disconnects the schema (logical organization) of

a database from thephysical storage methods. This system has been standard ever since.

1970's: Several camps of proponents argue about merits of these competing systems while thetheory of databases leads to mainstream research projects. Two main prototypes for relationalsystems were developed during 1974-77. These provide nice example of how theory leads tobest practice.

Ingres: Developed at UCB. This ultimately led to Ingres Corp., Sybase, MS SQL Server, Britton-Lee, Wang's PACE. This system used QUEL as query language.

System R: Developed at IBM San Jose and led to IBM's SQL/DS & DB2, Oracle, HP's Allbase,Tandem's Non-Stop SQL. This system used SEQUEL as query language.

The term Relational Database Management System (RDBMS) is coined during this period.

Early 1980's: Commercialization of relational systems begins as a boom in computer purchasingfuels DB market for business.

Mid-1980's: SQL (Structured Query Language) becomes "intergalactic standard". DB2 becomesIBM's flagship product.Network and hierarchical models fade into the background, withessentially no development of these systems today but some legacy systems are still in use.Development of the IBM PC gives rise to many DB companies and products such as RIM, RBASE5000, PARADOX, OS/2 Database Manager, Dbase III, IV (later Foxbase, even later Visual FoxPro),Watcom SQL.

Early 1990's: An industry shakeout begins with fewer surviving companies offering increasinglycomplex products at higher prices. Much development during this

period centers on client toolsfor application development such as PowerBuilder (Sybase), Oracle Developer, VB (Microsoft),etc. Client-server model for computing becomes the norm for future business decisions.Development of personal productivity tools such as Excel/Access (MS) and ODBC. This alsomarks the beginning of Object Database Management Systems (ODBMS) prototypes.

Early 21st century: Decline of the Internet industry as a whole but solid growth of DBapplications continues. More interactive applications appear with use of PDAs, POStransactions, consolidation of vendors, etc. Three main (western) companies predominate inthe large DB market: IBM (buys Informix), Microsoft, and Oracle.

Future trends: Huge (terabyte) systems are appearing and will require novel means of handlingand analyzing data. Large science databases such as genome project, geological, national

security, and space exploration data. Data mining, data warehousing, data marts are acommonly used technique today. More of this in the future without a doubt.Smart/personalized shopping using purchase history, time of day, etc.

Successors to SQL (and perhaps RDBMS) will be emerging in the future. Most attempts tostandardize SQL successorshave

not been successful. SQL92, SQL2, SQL3 are stillunderpowered and more extensions are hard to agree upon. Most likely this will be overtakenby XML and other emerging techniques. XML with Java for databases is the current poster childof the "next great thing".

Mobile database use is a product now coming to market in various ways. Distributedtransaction processing is becoming the norm for

business planning in many arenas. Probablythere will be a continuing shakeout in the RDBMS market. Linux with Apache supporting mySQL(or even Oracle) on relatively cheap hardware is a major threat to high cost legacy systems ofOracle and DB2.

ObjectOriented Everything, including databases, seems to be always on the verge to sweepingeverything before it. Object Database Management Group (ODMG) standards are proposed andaccepted and maybe something comes from that.

Ethical/security/use issues tendto be diminished at times but always come back. Should you beable to consult a database of the medical records/genetic makeup of a prospective employee?Should you be able to screen a prospective partner/lover for genetic diseases? Shouldamazon.com keeptrack of your book purchasing? Should there be a national database ofconvicted sex offenders/violent criminals/drug traffickers? Who is allowed to do Web tracking?How many times in the last six months did you visit a particular sex chat room/pornsite/political satire site? Who should be able to keep or view such data? Who makes thesedecisions?

Summary

A database is a collection of information that is organized so that it can easily be accessed,managed, and updated.The software that allows us to perform these functions easily is called aData Base Management System

(DBMS). Using a DBMS files can be retrieved easily andeffectively.A database has several characteristics that make it useful and irreplaceable. Itenables concurrent use; it can describe and structure data, separate data from application,ensures data integrity, data persistence, and provide multiple data views.

The database architecture is the set of specifications, rules, and processes that dictate how datais stored in a database and how data is accessed by components of a system. It includes datatypes, relationships, and naming conventions.The origins go back to libraries, governmental,business, and medical records. There is a very long history of

information storage, indexing, andretrieval. Future trend of database will be focused on mobile database usage. Data mining, datawarehousing, data marts are a commonly used technique in the future.