Database Management System

Transcript

► What is data?
★ Data is stored facts.
★ It can be numbers, words, measurements, observations or even just
descriptions of things.
★ Example : Each student's test score is one piece of data.

► What is information?
★ Information is that which informs, i.e. that from which data can be derived.
★ Information is conveyed either as the content of a message or through direct
or indirect observation of something.
★ Example : The average score of a particular subject is the information that can be concluded from the given data.

What is Database?
★ Stores organized information
★ Data are extracted from information
★ Knowledge is derived from data

What kind of data ?

► Enterprise data

► Recent data

★ Banking

★ Scientific data

★ Transport (Railways, Airlines)

★ Biological data

★ University, Libraries

★ Semi structured data (XML)

★ Online stores, supermarket

★ Network data

★ Finance

★ Manufacturing (production, inventory, orders, supply chain)

★ Telecommunication

★ Sales (customers, products,
purchase)

What is Database Management System
(DBMS)?

★ Collection of interrelated data
★ Provides an environment that is both convenient and efficient to use
★ Set of programs to store, update data
★ Answer queries about the data
★ Manages very large amounts of data
★ Supports concurrent access to large amounts of data
★ Supports secure, atomic access to large amounts of data

► The major purpose of a database system is to provide users with an abstract view of the system.
► System hides certain details of how the data is stored, created and
maintained.
► It gives an architecture to separate the user applications and the physical database.

Levels of data abstraction

Physical level : The lowest level of abstraction describes how the data are actually stored, e.g. index, B-tree, hashing.
Logical level : The logical level describes data properties such as data semantics, relationships among data.
View level : The highest level of abstraction describes only part of the entire database, e.g. ATM in a bank get a view of customer balance, but not of transaction data.

★ Capture the current state of data
★ Time dependent
★ Only one exit at any given time

Three schema approach

► The overall design of the database is called the database schema.

The three-schema approach, or the Three Schema Concept is an approach to build information systems and systems information management from 1970s. It proposes to use following types of schemas.

★ External Schema
★ Conceptual Schema
★ Internal Schema

Three schema approachExternal Schema

★ The user’s view of the database.
★ Describes a subset of the database that a particular user group is interested in, according to the format user wants, and hides the rest.
★ May contain virtual data that is derived from the files, but is not explicitly stored.

Three schema approachConceptual Schema

★ The logical structure of the entire database as seen by DBA.
★ Hides the details of physical storage structures.
★ Concentrates on describing entities, data types, relationships, operations, and constraints.

Three schema approachInternal Schema

★ Physical representation of the DB on the computer.
★ Describes the database design at the physical level, which is the lowest level of abstraction describing how the data are actually stored.
★ Physical implementation of the DB to achieve optimal runtime performance and storage space utilization.

Data Independence

► The main advantage of three-schema architecture is that it provides data independence. Data independence is the ability to change the schema at one level of the database system without having to change the schema at the other levels.

★ It has ability to change the conceptual schema without having to change
external schema or application programs.
★ Example: Student (ID, Name, Class, Section)
A view including only ID and Name is not affected by changes in any other
attributes.

Data IndependencePhysical data independence

★ It has the ability to change the internal schema without affecting the
conceptual or external schema.
★ An internal schema may change to improve the performance, i.e. creating
additional access structure.
★ Easier to achieve logical data independence, because application programs
are dependent on logical structures.

Data Models

The way in which information (describing data, data relationships, data semantics and data constraints) is subdivided and managed within a database is referred to as the data model used by the DBMS. Each DBMS is based on a particular data model.

Why data models are important ?

★ Different views (designers, programmers, and end users) of same data lead to designs that do not reflect organization’s operation
★ Data modeling reduces complexities of database and help to create a
sound database design
★ It helps to create database tables that do not contain duplicate data values that can become inconsistent

★ It is based on simulation of the real world which consists of basic objects called entities and relationship among these objects.
★ The overall logical data structure of a database can be expressed
graphically by an E-R diagram. Which consists of rectangle (entity), ellipse
(attribute), diamond (relationship), and lines.

► Relationship : Relationship is an association between entities.
STUDENT ENROLLS IN COURSE

Graphical notation of E - R diagram

NAME SYMBOL FUNCTION
Entity
Attribute
Relationship
-Data in the system
-Uniquely identifiable by identifier
-Has attributes that describe it
Describe an entity
-Relates two entities
-Uniquely identifiable by the identifier

Degree of Relationship (E-R Model)

► The degree of relationship is the number of occurrences in one entity which are associated to the number of occurrences in another. It is also called Cardinality.
There are three degrees of relationship, known as:

★ One to One (1 : 1)
★ One to Many (1 : N)
★ Many to Many (M : M)

Object-oriented model
(Object-based Logical Models)

► In object oriented data model both data and their relationship are contained in a single structure known as an object.
★ An object contains values stored in instance variables within the object.
★ Unlike the record-oriented models, these values are themselves objects.
★ An object also contains bodies of code that operate on the the object.
★ These bodies of code are called methods.
★ Objects that contain the same types of values and behavior (methods) are
grouped into classes.
★ Internal parts of the object, the instance variables and method code, are not visible externally.
★ Result is two levels of data abstraction.

Object-oriented Models
State, Behavior, Identity

► State : attribute types and values
► Behavior : how an object acts and reacts
- Behavior is expressed through operations that can be performed on it
► Identity : every object has a unique identity, even if all of its attribute values are the same

Semantic Model

★ The Semantic Data Model (SDM), like other data models, is a way of structuring data to represent it in a logical way.
★ It focuses on providing more meaning of the data itself, rather than solely or primarily on the relationships and attributes of the data.
★ It provides a high-level understanding of the data by abstracting it further away from the physical aspects of data storage.
★ In SDM, an entity represents some aspect or item in the real world, such as an employee.
★ An entity is akin to a record in a relational system or an object in an object-oriented system.
★ These entities in SDM focus on types, which are more general, instead of sets of data.
★ In SDM, an entity is a very basic notion of a real-world or conceptual object that is defined by a single attribute.

Functional Model

★ The functional data model provides an unified approach to manipulation both data and procedures.
★ The functional data model is a definition of all components of an information system in the form of functions.
★ For example, the functional data model defines data objects, attributes and relationships as so-called database functions.
★ A Functional Data Manipulation Language is a number of data manipulation
functions which can be applied to database functions.

Record-based Logical Models

★ Named so because the database is structured in fixed-format records of several types
★ Each record type denes a fixed number of fields or attributes and each field is usually of a fixed length
★ The use of fixed-length records simplifies the physical level implementation of the database
★ The relational model has established itself as the primary data model for
commercial data processing applications
★ The first database systems were based on either the network model or the
hierarchical model both of which are tied more closely to the underlying
implementation of the database and are now decreasing in importance and real
world use

Record-based Logical Models

► Continued from previous slide
★ Record-based models do not include a mechanism for direct representation of code in the database
★ Separate languages associated with the model are used to express database
queries and updates
★ The three most widely-accepted models are the network, hierarchical and
relational

Network Model

★ A network database consists of a collection of records
★ Relationships among data are represented by links.
★ A record is in many respects similar to an entity in the E-R model
★ Each record is a collection of fields (attributes), each of which contains only one data value.
★ A link is an association between precisely two records.
★ Organization is that of an arbitrary graph.

Network Model

► Advantages
● Flexible, fast, efficient

► Disadvantages
● Complex
● Restructuring can be difficult because of changing all the pointers

★ The hierarchical data model organizes data in a tree structure, rather than arbitrary graphs
★ There is a hierarchy of parent and child data segments
★ This structure implies that a record can have repeating information, generally in the child data segments
★ Data in a series of records, which have a set of field values attached to it
★ It collects all the instances of a specific record together as a record type
★ These record types are the equivalent of tables in the relational model and with the individual records being the equivalent of rows
★ To create links between these record types, the hierarchical model uses Parent Child Relationships

Hierarchical Model

► Advantages
● Easy to search
● Add new branches easily

► Disadvantages
● Must establish the types of search prior to development of the hierarchical structure

Hierarchical Model
(Database systems)

► Currently following are the most widely used hierarchical database
★ IMS developed by IBM
★ Windows Registry by Microsoft

Database Normalization

► Database normalization is a process to refine the data model built by the Entity-Relationship diagram and organizes the fields and tables of a relational database to minimize redundancy. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in
just one table and then propagated through the rest of the database using the defined relationships.

► Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the First Normal Form (1NF) in 1970.

► There are three steps in Normalization.
★ First Normal Form ( 1 NF )
★ Second Normal Form ( 2 NF )
★ Third Normal Form ( 3 NF )

First Normal Form (1 NF)

★ Identify repeating groups of fields.
★ Remove repeating groups of fields.
★ Identify the keys for the tables.
★ Key of parent tables is brought as part of the concatenated key of the second table

Second Normal Form (2 NF)

★ Check if all fields are dependent on the whole key
★ Remove fields that depend on part of the key
★ Group partially-dependent fields as separate table
★ Name the tables
★ Identify key(s) to the table

Third Normal Form (3 NF)

★ Remove fields that
-- depend on other non-key fields
-- can be calculated or derived from logic
★ Group interdependent fields as separate tables, identify the key and name the table

Relational Model

★ Data is organized in terms of rows and columns in a table known as relation.
★ The position of a row in a table is of no importance.
★ The intersection of a row and column must give a single value.
★ All values appearing in the columns must give a single value.
★ Row must be unique.
★ Column Names must be unique.
★ All column values are atomic.

Relational database model

❖ Fields ( columns ) in the table
store attributes.
➢ Each attribute has a
specific domain.
❖ Tuples ( or records or rows ) in
the table store information.
➢ Each tuple is a unique
instance of an object.
❖ Tables are composed of a set
of tuples.
➢ A table is also called a
relation.

Relational database Term

❖ Table
➢ A collection of relevant data relating to one type of real world objects.
❖ Column
➢ A specific place for one type of data relating to one type of real world
objects.
❖ Domain
➢ Set of all possible values for a specific column.
❖ Row
➢ Collection of data describing one real world object.
❖ Primary Key
➢ Columns, which are part of the row and uniquely identify any one row.

Relational database model (records)

● Each record represents a logical entity (e.g. an invoice).
● Each field represents an attribute of the logical entity.

Relational database model (key)

● Each table has a primary key ( one field or a combination of fields) that has a unique value for each and every record in the table.

Relational database model (relating tables)

● Table can be related (joined) together based on their keys.
● The idea is to decompose into separate tables with no redundancy and to
provide a capability to reassemble with no information loss.

Properties of Relational Database

★ Represent data in the form of tables.
★ No hard-code relationship between tables.
★ Doesn’t require the user to understand the physical implementation.
★ Provides information about its content and structure in system table.
★ Supports the concept of NULL values

Codd’s 12 Rules for an RDBMS

Dr. Edgar Frank Codd (August 19, 1923 – April 18, 2003) was an computer scientist, while working for IBM he invented the relational model for database management (theoretical basis for relational databases). Codd proposed thirteen rules (numbered zero to twelve) and said that if a Database Management System meets these rules, it can be called as a Relational Database Management System. These rules are called as Codd's 12 rules. Hardly any commercial product follows all.

★ Rule Zero : The system must qualify as relational, as a database, and as a management system. For a system to qualify as a relational database
management system (RDBMS), that system must use its relational facilities
(exclusively) to manage the database. The other 12 rules derive from this rule. The rules are as follows :

★ A DDL is a language used to define data structures within a database.
★ Basic idea : Hide implementation details of the database schemes from the users.
★ DDL statements are compiled, resulting in a set of tables stored in a special file called a data dictionary or data directory.
★ The data directory contains metadata (data about data)
★ It is typically considered to be a subset of SQL, but can also refer to languages that define other types of data.
★ The DDL concept and name was first introduced in relation to the Codasyl
database model, where the schema of the database was written in a language
syntax describing the records, fields, and sets of the user data model.

Data Definition Language (DDL)

► SQL DDL allows the specification of a set of relations and about each relation :
★ Schema of each relation
★ Domain of values of associated with each attribute
★ Integrity constraints
★ Set of indices for each relation
★ Physical storage of each relation on the disk

► Data types (Domain types) in standard SQL-92
★ char(n) is fixed-length character string. character is equivalent.
★ varchar(n) is a variable-length character of a user-specified maximum of n.
★ int is an integer (machine-dependent). integer is equivalent.
★ numeric(p,d) is a fixed-point number with user-specified precision of pdigits and ddigits after the decimal point.
★ real, double precision are floating point and double-precision numbers, with machine-dependent precision.
★ float(n) is a floating point number, with user-specified precision of at least n precision.
★ date is a calendar date, containing a (four-digit) year, month, and day of the month.
★ time is the time of day, in hours, minutes, and seconds.

► Here are some examples of Data Definition Language (DDL) statements which are used to define the database structure or schema.
★ CREATE - to create objects in the database
★ ALTER - alters the structure of the database
★ DROP - delete objects from the database

Storage Definition Language (SDL)

► Specifies internal schema
★ Physical structure
★ Bytes per field
★ Order of fields
★ Sorting
★ Access (hash, B-tree)
★ File structure
★ Mixed records etc.
► Some DBMSs have no distinction between these schemas and thus the DDL and
the SDL are the same

★ Data Manipulation is retrieval, insertion, deletion, modification of information in the database
★ A DML is a language which enables users to access and manipulate data.
★ A popular data manipulation language is that of Structured Query Language
(SQL), which is used to retrieve and manipulate data in a relational database
★ The DDL concept and name was first introduced in relation to the Codasyl
database model, where the schema of the database was written in a language
syntax describing the records, fields, and sets of the user data model.

► Here are some examples of Data Manipulation Language (DML) statements
which are used for managing data within schema objects.
★ INSERT - insert data into a table
★ UPDATE - updates existing data within a table
★ DELETE - deletes all records from a table, the space for the records remain

Database Manager

► The database manager is a program module which provides the interface
between the low-level data stored in the database and the application programs and queries submitted to the system.

Responsibility of Database Manager

★ Interaction with the file manager : Storing raw data on disk using the file system. The database manager must translate DML statements into low-level file system commands for storing, retrieving and updating data in the database.
★ Integrity enforcement: Checking that updates in the database do not violate consistency constraints (e.g. no bank account balance below $50)
★ Security enforcement: Ensuring that users only have access to information they are permitted to see
★ Concurrency control: Preserving data consistency when there are concurrent users.

Database Administrator

► The database administrator (also known as a database administration
manager, data architect, or information center manager) is a person having
central control over data and programs accessing that data. In order to perform its duties, the database administrator must have solid knowledge on system analysis and programming.