A ClientDataSet in Every Database Application

Abstract: This article is the first in an extended series designed to explore the ClientDataSet. The basic behavior of the ClientDataSet is described, and an argument is made for the extensive use of ClientDataSets in almost all database applications.

The ClientDataSet is a component that holds data in an
in-memory table. Until recently, it was only available in the Enterprise
editions of Delphi and C++ Builder. Now, however, it is available in the
Professional editions of these products, as well as in Kylix. This article is the
first in an extended series designed to explore the capabilities and features of
the ClientDataSet.

I have been playing with an idea for a while, and I
wanted the title of this article to reflect this (with my apologies to
Herbert Hoover for this pathetic twist on his political promise of "two
chickens in every pot and a car in every garage"). In short, I believe that
a very strong argument can be made for including one ClientDataSet and a
corresponding DataSetProvider for each TDataSet used in an application. Doing so
provides your user interface and runtime code with a consistent set of features
(filters, ranges, searches, and so forth) regardless of the data access
technology being employed.

Actually, I have two goals in this first of many articles
detailing the ClientDataSet. The first is to set forth the reasons why I believe
that ClientDataSets should play a primary role in most database applications.
The second goal, and the one that I hope you find useful whether or not you
accept my arguments, is to provide a general introduction to the nature and
features of the ClientDataSet.

It's this second goal that I will address first.
Specifically, in order for my arguments to make sense, it is essential to first
provide an overview of the ClientDataSet, and how it interacts with a
DataSetProvider. This discussion will also serve as a primer for many of the
technique-specific articles that will follow in this series. After this
introduction I will return to my first premise, explaining in detail how you can
improve your applications through the thoughtful use of ClientDataSets.

Introduction to the ClientDataSet

The ClientDataSet has been around for a while: since
Delphi 3, to be precise. But until recently it was only available in the
Client/Server or Enterprise editions of Delphi and C++ Builder. In these
editions the ClientDataSet was intended to hold data in a DataSnap (formerly
called MIDAS) client application. While many Enterprise edition developers did
make extensive use of the ClientDataSet's features in non-DataSnap applications,
the fact that this component did not exist in the Professional edition products made
recommending its widespread use unrealistic.

With Borland's introduction of dbExpress, which first
appeared in Kylix 1.0, the ClientDataSet, and its companion, the DataSetProvider,
are now part of Borland's Professional Edition RAD (rapid application
development) products, including Delphi 6, Kylix 2, and C++ Builder 6. Now all
Borland RAD developers have access to this powerful and flexible component (I'm not
counting the Personal or Open edition developers in this group, since those
versions do not have the database-related components in the first place).

With this in mind, let's now take a closer look at how the
ClientDataSet works.

The ClientDataSet is a TDataSet descendant that holds data
in memory in a table-like structure consisting of rows (records) and columns
(fields). Using the methods of the TDataSet class, a developer can navigate,
sort, search, filter, and edit the data held in memory. Because these operations
are performed on data stored in memory, they are very fast. For example, on a
test machine with 512 MB of RAM running an 850 MHz Pentium 3, an index was built
on an integer field containing random numbers in a 100,000-record table in just
under one-half second. Once built, this index can be used to perform
near-instantaneous searches and to set ranges on this indexed field.

The ClientDataSet actually contains two data stores. The
first, named Data, contains the current view of the data in memory, including
all changes to that data since it was loaded. For example, if a record was
deleted from the dataset, that record is absent from Data. Likewise, records
added to the ClientDataSet are visible in Data.

The second store, named Delta, represents the change log,
and contains a record of those changes that have been made to Data.
Specifically, for each record that was inserted into or deleted from Data, there
resides a corresponding record in Delta. For modified records it is slightly
different. The change log contains two records for each record modified in Data.
One of these is a copy of the record as it existed before it was modified. The
second contains the field-by-field changes made to the original record.
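The following small conceptual model makes the relationship between the two stores concrete. It is written in Python rather than Delphi, and InMemoryTable and ChangeEntry are hypothetical names that are not part of the ClientDataSet API. The real change log is maintained internally by the ClientDataSet. The model shows only the bookkeeping described above: Data reflects every change, while Delta gains one entry for each insert or delete and two entries for each modification.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ChangeEntry:
    kind: str                 # "insert", "delete", "original", or "modified"
    record: Dict[str, Any]


@dataclass
class InMemoryTable:
    """Hypothetical model of a ClientDataSet's two stores (not the Delphi API)."""
    data: List[Dict[str, Any]] = field(default_factory=list)   # current view ("Data")
    delta: List[ChangeEntry] = field(default_factory=list)     # change log ("Delta")

    def insert(self, record: Dict[str, Any]) -> None:
        self.data.append(dict(record))
        self.delta.append(ChangeEntry("insert", dict(record)))

    def delete(self, index: int) -> None:
        removed = self.data.pop(index)
        self.delta.append(ChangeEntry("delete", removed))

    def modify(self, index: int, **changes: Any) -> None:
        original = dict(self.data[index])
        self.data[index].update(changes)
        # A modification produces two log entries: a copy of the record as it
        # was, followed by just the fields that changed.
        self.delta.append(ChangeEntry("original", original))
        self.delta.append(ChangeEntry("modified", dict(changes)))


table = InMemoryTable(data=[{"id": 1, "qty": 5}, {"id": 2, "qty": 7}])
table.modify(0, qty=6)
table.delete(1)
table.insert({"id": 3, "qty": 1})
print(table.data)                        # [{'id': 1, 'qty': 6}, {'id': 3, 'qty': 1}]
print([e.kind for e in table.delta])     # ['original', 'modified', 'delete', 'insert']
```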

The change log serves two purposes. First, the information
in the change log can be used to undo edits made to Data, so long as those
changes have not yet been resolved to the underlying data source. By default,
this change log is always maintained, meaning that in most applications the
ClientDataSet is always caching updates.

The second role that the change log plays only applies to a
ClientDataSet that is used in conjunction with a DataSetProvider. In this role,
the change log provides sufficient detail to permit the mechanisms supported by
the DataSetProvider to apply the logged changes to the dataset from which the
data was loaded. This process begins when you explicitly call the
ClientDataSet's ApplyUpdates method.

When a ClientDataSet is used to read and write data
directly from a file, a DataSetProvider is not used. In those cases, the change
log is stored in this file each time you invoke the ClientDataSet's SaveToFile
method, and restored each time you call LoadFromFile (or when you open and close
the ClientDataSet while its FileName property contains the name of the file). In
this scenario the change log is only cleared when you invoke MergeChangeLog or
CancelUpdates (this second method causes the changes to be lost).
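The following Python sketch models the file-based scenario just described. FileBackedTable and its methods are hypothetical. JSON stands in for the ClientDataSet's own file format, and a simple counter stands in for the change log (compare the ClientDataSet's ChangeCount property). The sketch illustrates two points. First, pending changes survive a save and reload. Second, merging the log keeps the edits, while cancelling it discards them.

```python
import copy
import json
import os
import tempfile


class FileBackedTable:
    """Hypothetical model of a file-based ClientDataSet and its change log.

    save/load/merge_change_log/cancel_updates loosely mirror SaveToFile,
    LoadFromFile, MergeChangeLog, and CancelUpdates; this is not the Delphi API.
    """

    def __init__(self, rows):
        self.baseline = copy.deepcopy(rows)   # data as it was before logged changes
        self.data = copy.deepcopy(rows)       # current view
        self.change_count = 0                 # stands in for the change log

    def edit(self, index, **changes):
        self.data[index].update(changes)
        self.change_count += 1

    def save(self, path):
        # The pending changes are written to the file along with the data.
        state = {"baseline": self.baseline, "data": self.data,
                 "change_count": self.change_count}
        with open(path, "w") as f:
            json.dump(state, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            state = json.load(f)
        table = cls(state["baseline"])
        table.data = state["data"]
        table.change_count = state["change_count"]
        return table

    def merge_change_log(self):
        # Keep the edits: the current view becomes the new baseline.
        self.baseline = copy.deepcopy(self.data)
        self.change_count = 0

    def cancel_updates(self):
        # Discard the edits: revert to the baseline.
        self.data = copy.deepcopy(self.baseline)
        self.change_count = 0


path = os.path.join(tempfile.mkdtemp(), "items.json")
t = FileBackedTable([{"id": 1, "qty": 5}])
t.edit(0, qty=9)
t.save(path)
reloaded = FileBackedTable.load(path)
print(reloaded.change_count)   # 1: the pending change survived the round trip
reloaded.cancel_updates()
print(reloaded.data)           # [{'id': 1, 'qty': 5}]
```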

How you use a ClientDataSet differs considerably depending on whether or
not a DataSetProvider is employed. The
following discussion focuses exclusively on the situation where a ClientDataSet
points to a DataSetProvider through its ProviderName property. Using a
ClientDataSet directly with files will be discussed in detail in a future
article.

How a ClientDataSet and a DataSetProvider Interact

In order to use a ClientDataSet effectively you must
understand how a ClientDataSet interacts with a DataSetProvider. To illustrate
this interaction I have created a Delphi project named CDSLoadBehaviorDemo. The
main form for this project is shown in the following figure. While I will
describe what this project does, it is best to download
this project from Code Central and run it. That way you can observe the
interaction first-hand.

Here is the basic setup. The ClientDataSet points to a
DataSetProvider through its ProviderName property, and the DataSetProvider
refers to a TDataSet descendant through its DataSet property. When you set the
ClientDataSet's Active property to True or invoke its Open method, the
ClientDataSet requests a data packet from the DataSetProvider. This
provider then opens the dataset to which it points, goes to the first record,
and then scans through the records until it reaches the end of the file. For
each record it encounters, the DataSetProvider encodes the data into a variant
array. This variant array is sometimes referred to as the data packet.
When the DataSetProvider is done scanning the records, it closes the dataset to
which it points, and then passes the data packet to the ClientDataSet.
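The following Python model shows the load sequence. A plain dictionary stands in for the variant array, and SourceDataset, get_data_packet, and unpack are hypothetical names used only for illustration. In the model, the source dataset is active only while the packet is being built, and the packet carries field names alongside the rows so that the client can rebuild the record structure.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class SourceDataset:
    """Hypothetical stand-in for the TDataSet a provider points to."""
    field_names: List[str]
    rows: List[Tuple[Any, ...]]
    active: bool = False


def get_data_packet(source: SourceDataset) -> Dict[str, Any]:
    """Model of the provider's load step: open, scan, close, hand back a packet."""
    source.active = True                          # the provider opens the dataset
    try:
        packet = {
            "metadata": list(source.field_names),         # structure travels with the rows
            "rows": [tuple(row) for row in source.rows],  # first record through end of file
        }
    finally:
        source.active = False                     # closed again once the scan is done
    return packet


def unpack(packet: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Client side: rebuild in-memory records from the packet's metadata and rows."""
    names = packet["metadata"]
    return [dict(zip(names, row)) for row in packet["rows"]]


items = SourceDataset(["ItemNo", "Qty"], [(1, 5), (2, 7)])
records = unpack(get_data_packet(items))
print(items.active)   # False
print(records)        # [{'ItemNo': 1, 'Qty': 5}, {'ItemNo': 2, 'Qty': 7}]
```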

You can see this behavior in the CDSLoadBehaviorDemo
project. The DBGrid on the right-hand side of the main form is connected to a data
source that points to a TTable from which the DataSetProvider gets its data.
When you select ClientDataSet | Load from this project's main menu, you will literally
see the TTable's data being scanned in this DBGrid. Once the DataSetProvider
gets to the last record of the TTable, the TTable is closed and this DBGrid
appears empty again, as shown in the following figure.

Whether or not the scanning of the TTable is visible in the
CDSLoadBehaviorDemo project is configurable. Visible scanning is the default in
this project, but because it requires so many screen repaints, the
ClientDataSet takes quite a bit of time to load the nearly 1,000 records of
the Items.db table (the table pointed to by the TTable). If you select View |
View Table Loading to uncheck this menu option, and then select ClientDataSet | Load
(if data is already loaded, you must first select ClientDataSet | Unload), you
will notice that these records load almost instantly. The actual load time of a
ClientDataSet depends on how much data is loaded.

Returning to a description of the ClientDataSet/DataSetProvider
interaction, upon receiving the variant array, the ClientDataSet unpacks this
data into memory. The structure of this dataset is based on metadata that the
DataSetProvider encodes in the variant array. Even though the dataset to which
the DataSetProvider pointed may contain one or more indexes, the data packet
contains no index information. If you want indexes on the ClientDataSet, you
must define or create them. ClientDataSet indexes can be defined at runtime
using the IndexDefs property, and this topic will be discussed at length in a
future article.
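Because the ClientDataSet builds its own indexes from the data it holds, the idea can be sketched as a sorted list of keys searched with binary search. This Python model does not reflect how the ClientDataSet is implemented internally. build_index, find_key, and set_range are hypothetical helpers whose names merely echo the FindKey and SetRange methods.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Index = Tuple[List[Any], List[int]]   # (sorted key values, matching record positions)


def build_index(records: List[Record], field: str) -> Index:
    """Build a client-side index on one field; nothing comes from the source."""
    order = sorted(range(len(records)), key=lambda i: records[i][field])
    return [records[i][field] for i in order], order


def find_key(records: List[Record], index: Index, value: Any) -> Optional[Record]:
    keys, order = index
    pos = bisect.bisect_left(keys, value)         # binary search on the index
    if pos < len(keys) and keys[pos] == value:
        return records[order[pos]]
    return None


def set_range(records: List[Record], index: Index, low: Any, high: Any) -> List[Record]:
    keys, order = index
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_right(keys, high)        # inclusive upper bound
    return [records[i] for i in order[start:stop]]


records = [{"n": v} for v in [42, 7, 19, 7, 88]]
idx = build_index(records, "n")
print(find_key(records, idx, 19))                         # {'n': 19}
print([r["n"] for r in set_range(records, idx, 7, 42)])   # [7, 7, 19, 42]
```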

The ClientDataSet now behaves just like most any other
opened TDataSet descendant. Its data can be navigated, filtered, edited,
indexed, and so forth. As pointed out earlier, any edits made to the
ClientDataSet will affect the contents of both the Data and Delta properties. In
essence, these changes are cached, and are lost if the ClientDataSet is closed
without specifically telling it to save the changes. Changes are saved by invoking
the ClientDataSet's ApplyUpdates method.

Applying Changes to the Underlying Data Source

When you invoke ApplyUpdates, the ClientDataSet passes
Delta to the DataSetProvider. How the DataSetProvider applies the changes
depends on how you have configured it. By default, the DataSetProvider will
create an instance of the TSQLResolver class, and this class will generate SQL
statements that will be executed against the underlying data source.
Specifically, the SQLResolver will generate one SQL statement for each deleted,
inserted, and modified record in the change log. The UpdateMode property of
the DataSetProvider and the ProviderFlags property of the TFields of
the provider's dataset together dictate exactly how these SQL statements are formed.
Configuring these properties will be discussed in a future article.
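The following Python function illustrates how UpdateMode shapes the generated SQL by building one UPDATE statement for one modified record. It is a simplified model of the idea, not TSQLResolver's actual code. The mode strings are hypothetical stand-ins for upWhereAll, upWhereChanged, and upWhereKeyOnly, the key fields are passed in explicitly, and ProviderFlags are ignored.

```python
from typing import Any, Dict, List, Tuple


def build_update_sql(table: str, key_fields: List[str], original: Dict[str, Any],
                     changes: Dict[str, Any], update_mode: str) -> Tuple[str, List[Any]]:
    """Model one UPDATE for one modified record in the change log.

    update_mode loosely mirrors the provider's UpdateMode settings:
      "where_all"      (upWhereAll)      -> every original field in WHERE
      "where_changed"  (upWhereChanged)  -> key fields plus changed fields
      "where_key_only" (upWhereKeyOnly)  -> key fields only
    Per-field ProviderFlags are ignored in this sketch.
    """
    if update_mode == "where_all":
        where_fields = list(original)
    elif update_mode == "where_changed":
        where_fields = key_fields + [f for f in changes if f not in key_fields]
    elif update_mode == "where_key_only":
        where_fields = list(key_fields)
    else:
        raise ValueError(f"unknown update mode: {update_mode}")

    set_clause = ", ".join(f"{f} = ?" for f in changes)
    where_clause = " AND ".join(f"{f} = ?" for f in where_fields)
    # New values fill the SET placeholders; original values fill the WHERE ones,
    # so the update matches no row if another user has since changed any of them.
    params = list(changes.values()) + [original[f] for f in where_fields]
    return f"UPDATE {table} SET {set_clause} WHERE {where_clause}", params


original = {"ItemNo": 1, "Qty": 5, "Discount": 0}
print(build_update_sql("Items", ["ItemNo"], original, {"Qty": 6}, "where_changed"))
# ('UPDATE Items SET Qty = ? WHERE ItemNo = ? AND Qty = ?', [6, 1, 5])
```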

If the dataset to which the DataSetProvider points is an
editable dataset, you can alternatively set the provider's ResolveToDataSet
property to True. With this configuration, a SQLResolver is not used. Instead,
the DataSetProvider will edit the dataset to which it points directly. For
example, the DataSetProvider will locate and delete each record marked for
deletion in the change log, and locate and change each record marked modified in
the change log.
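The ResolveToDataSet approach amounts to locating, and then editing, the target record for each entry in the change log. In this hypothetical Python model, a list of dictionaries plays the role of the editable dataset, and a linear scan stands in for locating the record. The function name and the shape of the change entries are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def resolve_to_dataset(target: List[Row], changes: List[Tuple[str, Row]], key: str) -> None:
    """Apply logged changes by editing the target rows directly.

    Each change is (kind, record). For "modify", record holds the key plus
    the new field values; for "delete", only the key is needed.
    """
    for kind, record in changes:
        if kind == "insert":
            target.append(dict(record))
            continue
        # Locate the row first, just as the provider must before editing it.
        pos = next((i for i, row in enumerate(target) if row[key] == record[key]), None)
        if pos is None:
            raise LookupError(f"no row with {key} = {record[key]!r}")
        if kind == "delete":
            del target[pos]
        elif kind == "modify":
            target[pos].update(record)
        else:
            raise ValueError(f"unknown change kind: {kind}")


table = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
resolve_to_dataset(table, [("delete", {"id": 2}), ("modify", {"id": 1, "qty": 6}),
                           ("insert", {"id": 3, "qty": 1})], key="id")
print(table)   # [{'id': 1, 'qty': 6}, {'id': 3, 'qty': 1}]
```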

If you download the CDSLoadBehaviorDemo project, you can
see this for yourself. From your designer, select DataSetProvider1 and set its
ResolveToDataSet property to True. Next, run the project and load the
ClientDataSet. After making several changes to the data, select File |
ApplyUpdates. Depending on the speed of your computer, you may or may not
actually see the DBGrid become active as the TTable is edited. However, on most
systems you will notice the DBNavigator buttons become active briefly as a
result of the editing process. (If your computer is too fast, and you cannot see
the DBGrid or the DBNavigator become active, you can assign an event handler to
the AfterPost or AfterDelete event of Table1, and call MessageBeep
or ShowMessage from it. That way you will prove to yourself that Table1 is being
edited directly.)

There is a third option, which involves assigning an event
handler to the DataSetProvider's BeforeUpdateRecord event. This event
handler will then be invoked once for each record in the change log. You use
this event handler to apply the changes in the change log programmatically,
providing you with complete control over the resolution process. Writing
BeforeUpdateRecord event handlers can be an involved process, and will be
discussed in a future article.
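Conceptually, a BeforeUpdateRecord handler gets the first chance at each change, and the provider falls back to its default resolution only for changes the handler did not apply. In Delphi, the handler signals this through its var Applied parameter. This Python model uses hypothetical names throughout, and its hook simply returns True or False instead.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Change = Tuple[str, Dict[str, Any]]          # (update kind, record)


def resolve(changes: List[Change],
            default_apply: Callable[[str, Dict[str, Any]], None],
            before_update_record: Optional[Callable[[str, Dict[str, Any]], bool]] = None) -> None:
    """Call the hook once per logged change; fall back to default resolution
    only when the hook does not report the change as applied."""
    for kind, record in changes:
        applied = False
        if before_update_record is not None:
            applied = before_update_record(kind, record)
        if not applied:
            default_apply(kind, record)


archived, default_log = [], []


def archive_instead_of_delete(kind, record):
    # Hypothetical business rule: deleted rows are archived, not removed.
    if kind == "delete":
        archived.append(record)
        return True          # handled here; skip default resolution
    return False


resolve([("modify", {"id": 1, "qty": 6}), ("delete", {"id": 2})],
        default_apply=lambda kind, rec: default_log.append((kind, rec)),
        before_update_record=archive_instead_of_delete)
print(archived)      # [{'id': 2}]
print(default_log)   # [('modify', {'id': 1, 'qty': 6})]
```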

When you invoke ApplyUpdates, you pass a single integer
parameter. You use this parameter to identify your level of tolerance for
resolution failures. If you cannot tolerate any failures to resolve changes to
the underlying data source, pass the value 0 (zero). In this situation the
DataSetProvider starts a transaction prior to applying updates. If even a single
error is encountered, the transaction is rolled back, the change log remains
unchanged, and the offending record is identified to the ClientDataSet (by
triggering its OnReconcileError event handler, if one has been assigned).

If you pass a positive integer when calling ApplyUpdates,
the transaction will be rolled back only if the specified number of errors is
exceeded. If the number of errors does not exceed the specified number, the
transaction is committed and the failed records are returned to the
ClientDataSet. Furthermore, the applied records are removed from the change log,
leaving only the changes that could not be applied.

If the number of failures exceeds the specified number, the
transaction is rolled back, the change log is unchanged, and the records that
could not be resolved are identified to the ClientDataSet as described earlier.

You can also pass a value of -1 when invoking
ApplyUpdates. In this situation no transaction is started. Any records that can
be applied are removed from the change log. Those that fail to resolve
remain in the change log, and are identified to the ClientDataSet through its
OnReconcileError event handler.
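The three MaxErrors cases described above can be summarized in a small Python model. It mirrors the rules as stated in this article rather than the provider's source code. apply_updates and apply_one are hypothetical names, a deep copy of a list plays the role of a transaction, and the returned failures stand in for the records passed to OnReconcileError.

```python
import copy
from typing import Any, Callable, List, Tuple


def apply_updates(target: List[Any], change_log: List[Any],
                  apply_one: Callable[[List[Any], Any], None],
                  max_errors: int) -> List[Tuple[Any, Exception]]:
    """Model of the MaxErrors rules described above (not the Delphi source).

    max_errors = 0  -> any failure rolls everything back
    max_errors = N  -> roll back only if more than N changes fail
    max_errors = -1 -> no transaction; apply what can be applied
    Returns the (change, error) pairs that would reach OnReconcileError.
    """
    use_transaction = max_errors >= 0
    # With a transaction, work on a copy so a rollback is simply discarding it.
    working = copy.deepcopy(target) if use_transaction else target
    failed = []
    for change in change_log:
        try:
            apply_one(working, change)
        except Exception as exc:
            failed.append((change, exc))
            if use_transaction and len(failed) > max_errors:
                return failed            # rolled back; change log left untouched
    if use_transaction:
        target[:] = working              # commit
    change_log[:] = [change for change, _ in failed]   # only failures remain
    return failed


def apply_one(rows, change):
    key, ok = change
    if not ok:
        raise ValueError(f"record {key} rejected")
    rows.append(key)


db, log = [], [(1, True), (2, False), (3, True)]
apply_updates(db, log, apply_one, max_errors=0)
print(db, log)    # [] [(1, True), (2, False), (3, True)]
apply_updates(db, log, apply_one, max_errors=1)
print(db, log)    # [1, 3] [(2, False)]
```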

That's basically how it works, although there are a number
of variations that I have not covered here. For example, it is possible to limit
how many records the ClientDataSet gets from the DataSetProvider using the
ClientDataSet's PacketRecords and FetchOnDemand properties. Similarly, you can
pass additional information back and forth between the ClientDataSet and the
DataSetProvider using a number of provided event handlers. Future articles in
this series will describe how and when to use these properties and events.
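Limiting how many records arrive at once can be modeled as fetching fixed-size packets and requesting another packet only when navigation runs past the loaded rows. This Python sketch borrows the meanings of PacketRecords (where -1 means all records), FetchOnDemand, and GetNextPacket. However, IncrementalClient and its members are hypothetical, and the model is greatly simplified.

```python
from itertools import islice
from typing import Any, Iterable, List, Optional


class IncrementalClient:
    """Hypothetical model of packet-at-a-time fetching (not the Delphi API).

    packet_records: rows per packet; -1 means fetch everything at once.
    fetch_on_demand: if True, navigating past the loaded rows fetches more.
    """

    def __init__(self, source_rows: Iterable[Any], packet_records: int = -1,
                 fetch_on_demand: bool = True):
        self._source = iter(source_rows)
        self._packet_records = packet_records
        self._fetch_on_demand = fetch_on_demand
        self.records: List[Any] = []
        self.eof_reached = False
        self.get_next_packet()                  # opening retrieves the first packet

    def get_next_packet(self) -> int:
        """Fetch one more packet; return how many records it contained."""
        if self._packet_records < 0:
            batch = list(self._source)
        else:
            batch = list(islice(self._source, self._packet_records))
        self.records.extend(batch)
        if self._packet_records < 0 or len(batch) < self._packet_records:
            self.eof_reached = True             # the source has been exhausted
        return len(batch)

    def get(self, index: int) -> Optional[Any]:
        """Return record `index`, fetching more packets on demand if allowed."""
        while (index >= len(self.records) and not self.eof_reached
               and self._fetch_on_demand):
            if self.get_next_packet() == 0:
                break
        return self.records[index] if index < len(self.records) else None


client = IncrementalClient(range(10), packet_records=4)
print(len(client.records))   # 4
print(client.get(5))         # 5
print(len(client.records))   # 8
```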

Using ClientDataSets Nearly Everywhere

Now that we've reviewed the basic workings of the
ClientDataSet and DataSetProvider components, let's return to the premise that I
laid out at the beginning of this article. As I mentioned in the introduction, a
strong argument can be made for using a ClientDataSet/DataSetProvider
combination anytime data needs to be modified programmatically or displayed
using data-aware controls.

There are three basic benefits to using ClientDataSet and
DataSetProvider components for all data access.

The combination provides a consistent set of data
access features, regardless of which data access mechanism you are using.

Their use provides a layer of abstraction in the data
access layer, making future changes to the data access mechanism easier to
implement.

For local file-based systems (Paradox or dBase tables,
for example), the ClientDataSet can greatly
reduce table and index corruption.

Let's consider each of these points separately.

A Consistent, Rich Feature Set

The ClientDataSet provides your applications with a
consistent and powerful set of features independent of the data access mechanism
you are using. Among these features are an editable result set, on-the-fly
indexes, nested datasets, ranges, filters, cloneable cursors, aggregate fields,
group state information, and much, much more. Even if the data
access mechanism that you are using does not support a particular feature, such
as aggregate fields or cloneable cursors, you have access to it through the
ClientDataSet.

A Layer of Abstraction

In addition to the features supported by ClientDataSet, the
ClientDataSet/DataSetProvider combination serves as a layer of abstraction
between your application and the data access mechanism. If at a later time you
find that you must change the data access mechanism you are using, such as
switching from using the Borland Database Engine (BDE) to dbExpress, or from ADO
to InterBase Express, your user interface features and programmatic control of
data can remain largely unchanged. You simply need to hook the DataSetProvider
to the new data access components, and make any necessary adjustments to your
DataSetProvider properties and event handlers.
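The abstraction argument can be sketched in Python as application code that depends only on a provider, which in turn depends on a small interface. DataAccess, ListAccess, SqliteAccess, Provider, and total_qty are hypothetical names, and Python's built-in sqlite3 module merely stands in for "some other data access mechanism." Swapping one backend for another leaves total_qty untouched.

```python
import sqlite3
from typing import Any, Dict, List, Protocol


class DataAccess(Protocol):
    """Whatever the provider is hooked to; only this method is required."""
    def fetch_rows(self) -> List[Dict[str, Any]]: ...


class ListAccess:
    """One hypothetical backend: rows already held in a Python list."""
    def __init__(self, rows: List[Dict[str, Any]]):
        self._rows = rows

    def fetch_rows(self) -> List[Dict[str, Any]]:
        return [dict(row) for row in self._rows]


class SqliteAccess:
    """Another hypothetical backend: rows read from an SQLite table."""
    def __init__(self, conn: sqlite3.Connection, table: str):
        self._conn, self._table = conn, table

    def fetch_rows(self) -> List[Dict[str, Any]]:
        cur = self._conn.execute(f"SELECT * FROM {self._table}")
        names = [col[0] for col in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]


class Provider:
    def __init__(self, access: DataAccess):
        self.access = access            # the only place that knows the backend

    def get_packet(self) -> List[Dict[str, Any]]:
        return self.access.fetch_rows()


def total_qty(provider: Provider) -> int:
    # Application code: written against the provider, not the backend.
    return sum(row["qty"] for row in provider.get_packet())


rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(r["id"], r["qty"]) for r in rows])
print(total_qty(Provider(ListAccess(rows))))               # 12
print(total_qty(Provider(SqliteAccess(conn, "items"))))    # 12
```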

Some people don't like the fact that a ClientDataSet holds
changes in cache until you call ApplyUpdates. Fortunately, for those
applications that need changes to be applied immediately you can make a call to
ApplyUpdates from the AfterPost and AfterDelete event handlers of the
ClientDataSet.
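The apply-immediately approach can be modeled as event hooks that flush the change log after every post or delete. In Delphi, you would call ApplyUpdates(0) from the ClientDataSet's AfterPost and AfterDelete event handlers. This Python model uses hypothetical names and a caller-supplied apply function that returns the number of errors.

```python
from typing import Any, Callable, List, Tuple

Change = Tuple[str, Any]


class AutoApplyTable:
    """Hypothetical model: post and delete resolve immediately, the way calling
    ApplyUpdates(0) from AfterPost and AfterDelete would."""

    def __init__(self, apply_updates: Callable[[List[Change]], int]):
        self.change_log: List[Change] = []
        self._apply_updates = apply_updates      # returns the number of errors

    def post(self, record: Any) -> None:
        self.change_log.append(("post", record))
        self.after_post()

    def delete(self, key: Any) -> None:
        self.change_log.append(("delete", key))
        self.after_delete()

    # The "event handlers": each one flushes the log right away.
    def after_post(self) -> None:
        self._flush()

    def after_delete(self) -> None:
        self._flush()

    def _flush(self) -> None:
        if self._apply_updates(list(self.change_log)) == 0:
            self.change_log.clear()              # cleared only on success


sent = []


def send(changes):
    sent.extend(changes)
    return 0                                     # pretend every change succeeded


t = AutoApplyTable(send)
t.post({"id": 1})
t.delete(1)
print(sent)          # [('post', {'id': 1}), ('delete', 1)]
print(t.change_log)  # []
```

Because each change is resolved as soon as it is posted, the change log never holds more than the single most recent change in this model, unless that change fails to apply.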

Reduced Corruption

For developers who are still using local file-based
databases, such as Paradox or dBase, there is yet another very powerful
argument. Hooking a ClientDataSet/DataSetProvider pair to a TTable can reduce
the likelihood of table or index corruption to near zero.

Table and index corruption occurs when something goes wrong
while accessing the underlying table. Since a TTable component has an open file
handle on the underlying table so long as the TTable is active, this corruption
happens all too often in many applications. When the data is extracted from a
TTable to a ClientDataSet, however, the TTable is active for only very short
periods of time: during loading and resolution, to be precise (assuming that you
set the TTable's Active property to False, leaving the activation entirely up to
the DataSetProvider). As a result, in most applications, accessing a TTable's
data using a ClientDataSet/DataSetProvider combination reduces the amount of
time that a file handle is open on the table to well under one percent of
what it is when a TTable is used alone.

But It's Not for Every Application

While these arguments are compelling, I must also admit that
this approach is not appropriate for every application. Because a ClientDataSet
loads all of its data into memory, it is much more difficult to use when you are
working with large amounts of data. There are workarounds that you can use if
you point a ClientDataSet to, say, a multi-million record data source, but these
sometimes require a fair amount of coding, thereby complicating the
application.

For most applications, however, the combination of features
provided by the ClientDataSet outweighs the disadvantages. But even if you do not
accept this argument, I think that you will find many situations where the use
of a ClientDataSet enhances your application's features, and simplifies your
efforts.

About the Author

Cary Jensen is President of Jensen Data Systems, Inc., a Texas-based training
and consulting company that won the 2002 Delphi Informant Magazine Readers
Choice award for Best Training. He is the author and presenter for Delphi
Developer Days (www.DelphiDeveloperDays.com), an information-packed Delphi
(TM) seminar series that tours North America and Europe. Cary is also an
award-winning, best-selling co-author of eighteen books, including Building
Kylix Applications (2001, Osborne/McGraw-Hill), Oracle JDeveloper (1999, Oracle
Press), JBuilder Essentials (1998, Osborne/McGraw-Hill), and Delphi In Depth
(1996, Osborne/McGraw-Hill). For information about onsite training and
consulting you can contact Cary at cjensen@jensendatasystems.com, or visit his
Web site at www.JensenDataSystems.com.


Copyright © 2002 Cary Jensen, Jensen Data Systems, Inc.
ALL RIGHTS RESERVED. NO PART OF THIS DOCUMENT CAN BE COPIED IN ANY FORM WITHOUT
THE EXPRESS, WRITTEN CONSENT OF THE AUTHOR.