DAY 5: DISTRIBUTED DBMS

What is a Distributed Database Management System?

A distributed database management system (distributed DBMS) is a software
system that manages a distributed database and makes the distribution
transparent to users. A distributed database is a collection of multiple,
logically interrelated databases distributed over a computer network. The
term distributed database system is sometimes used to refer jointly to the
distributed database and the distributed DBMS.

Distributed database management systems can be architected as client-server
systems or as peer-to-peer systems. In the former, one or more servers manage
the database and handle user queries passed on by the clients. The clients
usually have limited database functionality and normally pass SQL queries to
the servers for processing. In peer-to-peer systems, each site has equal
processing functionality.

A distributed database is a database, under the control of a central
database management system (DBMS), whose storage devices are not all
attached to a common CPU. It may be stored on multiple computers in the same
physical location, or dispersed over a network of interconnected computers.

Collections of data (e.g., in a database) can be distributed across multiple
physical locations. A distributed database is divided into separate
partitions, or fragments, and each fragment may be replicated (i.e., stored
as redundant copies for failover, similar in spirit to RAID).
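To make fragmentation and replication concrete, here is a minimal sketch in
Python. It assumes horizontal fragmentation by a "region" column and a
hand-written placement plan that stores each fragment at two sites; the
sample rows, the site names, and the helpers fragment, place and
read_fragment are all hypothetical, and in-memory dicts stand in for real
sites.

```python
from collections import defaultdict

# Hypothetical sample rows; "region" is the fragmentation key.
CUSTOMERS = [
    {"id": 1, "name": "Ada", "region": "north"},
    {"id": 2, "name": "Bo", "region": "south"},
    {"id": 3, "name": "Cy", "region": "north"},
]

# Hypothetical placement plan: each fragment is stored at two sites.
PLACEMENT = {
    "north": ["site_a", "site_b"],
    "south": ["site_b", "site_c"],
}


def fragment(rows, key):
    """Horizontal fragmentation: group rows by the value of `key`."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[key]].append(row)
    return dict(fragments)


def place(fragments, placement):
    """Copy every fragment to each site listed for it (replication)."""
    sites = defaultdict(dict)
    for name, rows in fragments.items():
        for site in placement[name]:
            # Each site gets its own copy of the fragment's rows.
            sites[site][name] = [dict(r) for r in rows]
    return dict(sites)


def read_fragment(sites, placement, name, down=()):
    """Read a fragment from the first replica whose site is not down."""
    for site in placement[name]:
        if site not in down:
            return site, sites[site][name]
    raise RuntimeError(f"no live replica for fragment {name!r}")


frags = fragment(CUSTOMERS, "region")
sites = place(frags, PLACEMENT)
site, rows = read_fragment(sites, PLACEMENT, "north", down={"site_a"})
print(site, [r["name"] for r in rows])  # site_b ['Ada', 'Cy']
```

Because the "north" fragment has a second copy, losing site_a does not make
its rows unavailable; a real system would also have to keep the copies
consistent when they are written.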

Besides replication and fragmentation, there are many other distributed
database design technologies, for example local autonomy and synchronous and
asynchronous distributed database technologies. How these technologies are
implemented depends on the needs of the business and on the
sensitivity/confidentiality of the data to be stored in the database, and
hence on how much the business is willing to spend on ensuring data
security, consistency and integrity.
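The difference between synchronous and asynchronous replication can be
sketched as follows, assuming a single hypothetical primary site that
forwards writes to its replicas. The Replica and Primary classes and the
flush method are illustrative names only; real systems ship changes over a
network and must handle failures, which this sketch ignores.

```python
from collections import deque


class Replica:
    """One copy of the data held at a remote site (a plain dict here)."""

    def __init__(self):
        self.data = {}


class Primary:
    """Hypothetical primary site that forwards writes to its replicas."""

    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = deque()  # writes not yet shipped (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: every replica is updated before write() returns.
            for r in self.replicas:
                r.data[key] = value
        else:
            # Asynchronous: queue the change and return immediately.
            self.pending.append((key, value))

    def flush(self):
        """Ship queued writes to the replicas (e.g. on a timer)."""
        while self.pending:
            key, value = self.pending.popleft()
            for r in self.replicas:
                r.data[key] = value


sync_replica, async_replica = Replica(), Replica()
Primary([sync_replica], synchronous=True).write("x", 1)
async_primary = Primary([async_replica], synchronous=False)
async_primary.write("x", 1)
print(sync_replica.data, async_replica.data)  # {'x': 1} {}
async_primary.flush()
print(async_replica.data)  # {'x': 1}
```

The trade-off shows up between the two print calls: the synchronous replica
is never stale but every write waits for it, while the asynchronous replica
lags until the queued changes are shipped.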

Basic architecture

A database server is the software managing a database, and a client is an
application that requests information from a server. Each computer in a
system is a node. A node in a distributed database system can act as a
client, a server, or both, depending on the situation.
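A node that plays both roles can be sketched with Python's built-in sqlite3
module. The Node class and its serve and ask methods are hypothetical: each
node owns an in-memory SQLite database (its server role) and can pass SQL to
a peer for processing (its client role). A direct method call stands in for
the network link a real system would use.

```python
import sqlite3


class Node:
    """A hypothetical site that owns a local database and knows its peers."""

    def __init__(self, name):
        self.name = name
        self.db = sqlite3.connect(":memory:")  # local storage for this node
        self.peers = {}

    def serve(self, sql, params=()):
        """Server role: execute a query against this node's own data."""
        return self.db.execute(sql, params).fetchall()

    def ask(self, peer_name, sql, params=()):
        """Client role: pass a query to another node for processing."""
        return self.peers[peer_name].serve(sql, params)


a, b = Node("a"), Node("b")
a.peers["b"] = b
b.peers["a"] = a
b.serve("CREATE TABLE stock (item TEXT, qty INTEGER)")
b.serve("INSERT INTO stock VALUES (?, ?)", ("bolt", 40))

# Node a acts as a client of b; b acts as a server.
print(a.ask("b", "SELECT item, qty FROM stock"))  # [('bolt', 40)]
```

Since every node here has both methods, this resembles the peer-to-peer
arrangement; a client-server deployment would simply give the client nodes
no database of their own.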

Important considerations

The distribution is transparent — users must be able to interact with
the system as if it were one logical system. This applies to the system's
performance and methods of access, among other things.

Transactions are transparent — each transaction must maintain database
integrity across multiple databases. Transactions must also be divided into
subtransactions, each subtransaction affecting one database system.
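The notes do not name a protocol for keeping subtransactions consistent; a
common textbook choice is two-phase commit (2PC), sketched below under
heavy simplifying assumptions. The Participant class, the two_phase_commit
function, and the can_commit flag (used to simulate a site that refuses) are
hypothetical. Real 2PC also writes a durable log at each step and must cope
with a coordinator crashing between the phases, none of which is modelled
here.

```python
class Participant:
    """One site executing its subtransaction (hypothetical, in memory)."""

    def __init__(self, name, data, can_commit=True):
        self.name = name
        self.data = data
        self.can_commit = can_commit  # simulate a site that must refuse
        self.staged = None

    def prepare(self, changes):
        """Phase 1: stage the changes and vote yes (True) or no (False)."""
        if not self.can_commit:
            return False
        self.staged = dict(changes)
        return True

    def commit(self):
        """Phase 2 (all voted yes): make the staged changes permanent."""
        self.data.update(self.staged)
        self.staged = None

    def abort(self):
        """Phase 2 (someone voted no): discard the staged changes."""
        self.staged = None


def two_phase_commit(subtransactions):
    """Run {participant: changes} so every site commits, or none does."""
    prepared = []
    for site, changes in subtransactions.items():
        if site.prepare(changes):
            prepared.append(site)
        else:
            # One "no" vote aborts the sites that already prepared.
            for p in prepared:
                p.abort()
            return False
    for site in prepared:
        site.commit()
    return True


bank_a = Participant("a", {"alice": 100})
bank_b = Participant("b", {"bob": 50})
ok = two_phase_commit({bank_a: {"alice": 70}, bank_b: {"bob": 80}})
print(ok, bank_a.data, bank_b.data)  # True {'alice': 70} {'bob': 80}

bank_c = Participant("c", {"carol": 10}, can_commit=False)
ok = two_phase_commit({bank_a: {"alice": 0}, bank_c: {"carol": 110}})
print(ok, bank_a.data, bank_c.data)  # False {'alice': 70} {'carol': 10}
```

In the second call bank_a had already staged its change, but because bank_c
voted no, the staged change was discarded and neither database moved.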

Advantages of distributed databases

Reflects organizational structure — database fragments are located in
the departments they relate to.

Local autonomy — a department can control the data about itself (as its
staff are the ones most familiar with it).

Improved availability — a fault in one database system will only
affect one fragment, instead of the entire database.

Improved performance — data is located near the site of greatest
demand, and the database systems themselves are parallelized, allowing
load on the databases to be balanced among servers. (A high load on one
module of the database won't affect the other modules.)

Economics — a network of smaller computers with the combined power of a
single large computer costs less than the large computer itself.

Modularity — systems can be added to, modified in, or removed from the
distributed database without affecting other modules (systems).

Disadvantages of distributed databases

Complexity — DBAs must do extra work to keep the distributed nature of
the system transparent, and to maintain multiple disparate systems instead
of one big one. Extra database design work is also needed to account for
the disconnected nature of the database; for example, joins become
prohibitively expensive when performed across multiple systems.

Security — remote database fragments must be secured, and because they
are not centralized, the remote sites must be secured as well. The
infrastructure must also be secured (e.g., by encrypting the network links
between remote sites).

Difficult to maintain integrity — in a distributed database, enforcing
integrity over a network may require too many network resources to be
feasible.

Inexperience — distributed databases are difficult to work with, and
because the field is relatively young, there is little readily available
experience on proper practice.
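The complexity point about cross-site joins can be made concrete by counting
what crosses the network. This sketch assumes orders stored at one site and
customers at another, and a naive strategy that pulls the whole remote table
before joining; the data, the Link class, and join_at_site1 are hypothetical,
and an in-memory counter stands in for network traffic.

```python
# Hypothetical data: orders are stored at site 1, customers at site 2.
ORDERS_SITE1 = [{"order": 10, "cust": 7}, {"order": 11, "cust": 7}]
CUSTOMERS_SITE2 = [{"cust": c, "name": f"c{c}"} for c in range(1000)]


class Link:
    """Stands in for the network between two sites; counts rows sent."""

    def __init__(self):
        self.rows_sent = 0

    def ship(self, rows):
        self.rows_sent += len(rows)
        return list(rows)


def join_at_site1(orders, remote_customers, link):
    """Naive distributed join: pull the whole remote table, then join."""
    customers = {c["cust"]: c for c in link.ship(remote_customers)}
    return [{**o, "name": customers[o["cust"]]["name"]}
            for o in orders if o["cust"] in customers]


link = Link()
result = join_at_site1(ORDERS_SITE1, CUSTOMERS_SITE2, link)
print(len(result), link.rows_sent)  # 2 1000
```

Two result rows cost a thousand rows of transfer. With real tables and real
network latency, this data movement rather than the local join work tends to
dominate, which is why distributed query planners try hard to reduce it.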