This chapter is from the book

Chapter 3: Storing Directory Information

In this Chapter

The Directory Database

Partitioning the Directory

Directory Replication

At its core, a directory is an information
repository, and how that information is stored and managed is of critical
importance. What is perhaps not quite as obvious is that the methods used for
the storage and management of directory information impact many aspects of
directory functionality. How effectively information is stored, retrieved, and
distributed can directly impact the overall scalability, performance, and
reliability of the directory.

This chapter examines how directory data is stored and managed in a
distributed directory environment, and it discusses how directory information is
subdivided and replicated to multiple directory servers. This chapter also
examines the methods of maintaining data consistency among the distributed
portions of the directory.

The Directory Database

The unified collection of objects managed by the directory is stored in a
database, commonly called the Directory Information Base (DIB). The DIB
contains the directory objects representing information and entities such as
users, network resources, applications, and so on along with the administrative
data (such as security settings) needed to manage and control access to those
objects.

The X.500 standards do not specify how the information is stored and
retrieved; the database storage mechanisms are not considered part of the
scope of the X.500 standards.

The methods used in directory database storage and retrieval mechanisms are
implementation specific and can vary widely. Some vendors, such as Novell, write
proprietary database engines to meet the specific needs of their directory
service. Other vendors have chosen to use a relational database as the
repository for directory information.

Because a directory typically handles far more queries than updates, many
vendors optimize the search engine and provide highly available catalogs to
speed up resolution of client queries.
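As a rough illustration of trading extra work on updates for faster queries, the following Python sketch maintains an index on selected attributes. The class name `IndexedDirectory` and its methods are hypothetical and are not taken from any vendor's product; a real catalog is considerably more elaborate.

```python
from collections import defaultdict


class IndexedDirectory:
    """Toy read-optimized store: an index is maintained on every add."""

    def __init__(self, indexed_attrs):
        self.entries = {}                    # DN -> attribute dict
        self.indexed_attrs = set(indexed_attrs)
        self.index = defaultdict(set)        # (attribute, value) -> set of DNs

    def add(self, dn, attrs):
        # Assumption: entries are added once; updates and deletes
        # (which would need index cleanup) are not handled here.
        self.entries[dn] = attrs
        for name in self.indexed_attrs:
            if name in attrs:
                self.index[(name, attrs[name])].add(dn)

    def search(self, name, value):
        if name in self.indexed_attrs:
            # Indexed attribute: a direct lookup, no scan of all entries
            return sorted(self.index.get((name, value), ()))
        # Unindexed attribute: fall back to scanning every entry
        return sorted(dn for dn, a in self.entries.items()
                      if a.get(name) == value)
```

Each added entry costs a little more to store, but a query on an indexed attribute no longer has to examine every entry.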

What Is Stored Depends on Focus

What information is stored depends entirely on what the directory service is
being used for. For example, a directory service that is used to manage an
enterprise network stores information not only on users, but also on servers,
network services and resources, applications, and anything else necessary to
administer the network. Likewise, a directory that is used to manage an
e-commerce site stores information on its users and their preferences.

Storing the Directory Database on Disk

Keep in mind that this description is about how the information is written to
disk (the actual data storage mechanism), not about partitioning and
replication, which are discussed throughout the rest of this chapter. The
description that immediately follows assumes a unified (nonpartitioned)
directory, or a single partition of a larger, partitioned directory.

Although the DIB is generally spoken of in the singular, it should be noted
that this is only true in a logical sense; frequently more than one file
makes up the DIB. A directory database may be contained in a range of storage
structures, from a single file containing the directory information, to a
collection of files containing subsets of directory data with a table of
pointers used to link and organize the files.

A directory that has a specialized and small information set is more likely
to use a single file than a large general-purpose directory. A quick look at two
extremes will help illustrate why.

Single fileAt one end of the spectrum is a directory with a
relatively simple and small datasetthe Domain Name System (DNS). DNS, in
most implementations, stores its information in text files that are very small
and consist of a small dataset that can be searched quickly. Of course, DNS has
one of these text files for each zone (a zone is analogous to a partition);
therefore, the DNSdirectory is made up of a distributed database
of millions of these little text files.

Multiple filesAt the other extreme is Novell's
eDirectory, a more general-purpose directory that commonly contains a large
amount of information of varied types. To support the flexibility needed by
eDirectory, Novell has devised a storage method that uses a series of files,
each containing a particular portion of the data that makes up the DIB. One file
stores basic information about directory objects; a variable number of others
store most of the actual property value information (that is, the data)
associated with those objects.
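The multiple-file approach can be sketched in a few lines of Python. This is only a toy model of the general idea (one file of basic object data that points to separate value files); it is not eDirectory's actual on-disk format, and the class name `MultiFileDIB`, the JSON encoding, and the file names are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


class MultiFileDIB:
    """Toy DIB: one entries file plus one values file per attribute name."""

    def __init__(self, root):
        self.root = Path(root)
        self.entries_path = self.root / "entries.json"  # hypothetical name
        if not self.entries_path.exists():
            self.entries_path.write_text("{}")

    def _load(self, path):
        return json.loads(path.read_text()) if path.exists() else {}

    def add(self, dn, object_class, **attrs):
        entries = self._load(self.entries_path)
        # The entries file holds only basic data and the names of the
        # value files (the "pointers") where the real data lives.
        entries[dn] = {"class": object_class, "attrs": sorted(attrs)}
        self.entries_path.write_text(json.dumps(entries))
        for name, value in attrs.items():
            path = self.root / f"values_{name}.json"
            values = self._load(path)
            values[dn] = value
            path.write_text(json.dumps(values))

    def get(self, dn):
        entry = self._load(self.entries_path)[dn]
        result = {"class": entry["class"]}
        for name in entry["attrs"]:
            # Follow each pointer to the file holding that attribute
            result[name] = self._load(self.root / f"values_{name}.json")[dn]
        return result


with tempfile.TemporaryDirectory() as directory:
    dib = MultiFileDIB(directory)
    dib.add("cn=alice,o=example", "user", mail="alice@example.com")
    print(dib.get("cn=alice,o=example"))
```

Although the caller sees a single logical database, the data is spread across several files, which is the sense in which the DIB is singular only logically.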

Distributing the Directory Database

Whatever the specifics of a particular DIB implementation, directory service
designers must contend with some fundamental data management issues. Obviously,
support for the distribution of the DIB must be explicitly defined for a
directory to function with a distributed datastore. The support for a
distributed datastore is implemented via partitioning and replication of the
DIB. Concomitantly, a directory service must define a method of linking the
partitions into a complete directory tree, as well as a means to pass queries
and other information between partitions.
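A minimal Python sketch of these two requirements, linking partitions into one tree and passing a query from one partition to another, might look like the following. The `Partition` class, its attribute names, and the example names are hypothetical, and the sketch assumes simple comma-separated distinguished names; actual directory services use more involved reference and referral mechanisms.

```python
class Partition:
    """Toy partition: holds the entries under one naming suffix."""

    def __init__(self, suffix):
        self.suffix = suffix       # portion of the tree this partition holds
        self.entries = {}          # DN -> attribute dict
        self.subordinates = []     # links to partitions lower in the tree

    def holds(self, dn):
        return dn == self.suffix or dn.endswith("," + self.suffix)

    def lookup(self, dn):
        # If a subordinate partition holds the name, pass the query down
        for child in self.subordinates:
            if child.holds(dn):
                return child.lookup(dn)
        # Otherwise answer from this partition's own data
        return self.entries.get(dn)


root = Partition("o=example")
sales = Partition("ou=sales,o=example")
root.subordinates.append(sales)    # link the partitions into one tree

root.entries["cn=admin,o=example"] = {"title": "Administrator"}
sales.entries["cn=bob,ou=sales,o=example"] = {"title": "Account Manager"}

# A query sent to the root partition is passed on to the sales partition
print(root.lookup("cn=bob,ou=sales,o=example"))
```

The client queries a single starting point and never needs to know which partition actually holds the entry.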

DIB replication presents other critical issues of data access and information
consistency between the distributed replicas. When using multiple copies of the
directory datastore (replicas), data integrity between the copies of the
information must be maintained. All replicas must (eventually) be updated
whenever changes are made to any replica. Additionally, to maintain the
consistency of the directory information, some form of data synchronization must
be performed.
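One simple way to picture eventual consistency is a "highest version wins" exchange between replicas, sketched below in Python. This is an assumed strategy chosen for illustration, not the method of any particular directory service; the `Replica` class, the integer version numbers, and the `synchronize` function are hypothetical, and the sketch does not resolve ties between equal versions.

```python
class Replica:
    """Toy replica: keeps the highest-versioned value seen for each DN."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # DN -> (version, attribute dict)

    def write(self, dn, attrs, version):
        current = self.data.get(dn)
        # Accept the change only if it is newer than what is stored
        if current is None or version > current[0]:
            self.data[dn] = (version, attrs)

    def sync_from(self, other):
        for dn, (version, attrs) in other.data.items():
            self.write(dn, attrs, version)


def synchronize(replicas):
    # Each replica pulls changes from every other replica
    for target in replicas:
        for source in replicas:
            if source is not target:
                target.sync_from(source)


a, b, c = Replica("a"), Replica("b"), Replica("c")
a.write("cn=alice,o=example", {"mail": "old@example.com"}, version=1)
b.write("cn=alice,o=example", {"mail": "new@example.com"}, version=2)

synchronize([a, b, c])
print(a.data == b.data == c.data)
```

Before synchronization the replicas disagree (and one holds no copy of the entry at all); afterward, every replica holds the version 2 value.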

The next section describes partitioning, and later sections examine
replication and data consistency concepts and operations.