Inside .NET Managed Providers

Dino Esposito
Wintellect

October 9, 2001

When compared to full-fledged OLE DB providers, Microsoft .NET managed providers have a lot to offer. First off, they deliver a simplified data access architecture that often results in improved performance without the loss of functional capabilities. Furthermore, .NET managed providers directly expose provider-specific behavior to consumers through methods and properties. They also involve a much smaller set of interfaces than OLE DB providers. Last but not least, .NET managed providers work within the boundaries of the Common Language Runtime (CLR) and require no COM interaction. For SQL Server 7.0 and SQL Server 2000, the managed provider hooks up directly to the wire level, gaining a substantial performance advantage.

The functionalities that a .NET data provider supplies may fall into a couple of categories:

Support of the DataSet class through an implementation of the methods of the IDataAdapter interface

Support of connected data access, which includes classes representing connections, commands, and parameters

The simplest flavor of a data provider interacts with callers only through the DataSet, both in reading and writing. In the other case, you can control connections, transactions, and execute direct commands, regardless of the SQL language. The figure below shows the class hierarchy of the two standard managed providers in .NET—OLE DB providers and for SQL Server.

The objects that wrap connections, commands, and readers are provider-specific and may result in a slightly different set of properties and methods. Any internal implementation is rigorously database-aware. The only class that is out of this schema is the DataSet. The class is common to all providers and works as a generic container for disconnected data. The DataSet class belongs to a kind of super-namespace called System.Data. Classes specific of a data provider belong to the specific namespace. For example, System.Data.SqlClient and System.Data.OleDb belong to a specific namespace. The schema in the figure above is rather basic, though not simplistic. It is simplified because it does not include all the classes and the interfaces involved. The figure below is a bit more accurate.

Figure 2. Classes involved with a managed provider

The table below shows the list of the interfaces that make a .NET provider.

Interface

Description

IDbConnection

Represents a unique session with a data source

IDbTransaction

Represents a local, non distributed, transaction

IDbCommand

Represents a command that executes when connected to a data source

IDataParameter

Allows implementation of a parameter to a command

IDataReader

Reads a forward-only, read-only stream of data created after the execution of a command

IDataAdapter

Populates a DataSet and resolves changes in the DataSet back to the data source

Of all of interfaces, only IDataAdapter is mandatory and must be present in every managed provider. If you don't plan to implement one of the interfaces, or one method of a given interface, expose the interface anyway, but throw a NotSupportedException exception. Wherever possible, avoid providing no-op implementations of methods and interfaces as this may result in data corruption, particularly with the commit/rollback of transactions. For example, providers are not required to support nested transactions even though the IDbTransaction interface is designed to allow also for this situation.

Before going any further with the explanation of the role that each class plays in the overall workings of a .NET provider, let me say a few words about the naming convention that a managed provider is recommended to utilize. This is useful if you happen to write your own providers. The first guideline regards the namespace. Make sure you assign your own managed provider a unique namespace. Next, prefix classes with a nickname that identifies the provider throughout any internal and client code. For example, use class names like OdbcConnection, OdbcCommand, OdbcDataReader, and so on. In this case, the nickname is Odbc. In addition, try to use distinct files to compile distinct functionalities.

Implementing a Connection

The provider connection class inherits from IDbConnection and must expose the ConnectionString, State, Database, and ConnectionTimeout properties. The mandatory methods are Open, Close, BeginTransaction, ChangeDatabase, and CreateCommand. You are not strictly required to implement transactions. The following code snippet gives you an idea of the code used to implement a connection.

You should provide at least two constructors, one being the default that takes no argument. The other recommended constructor would accept only the connection string. When returning the connection string through the ConnectionString property, make sure you always return exactly what the user set. The only exception might be constituted by any security-sensitive information that you might want to remove.

The items you recognize and support in the connection string are up to you, but standard names should be used whenever it makes sense. The Open method is responsible for opening the physical channel of communication with the data source. This should happen not before the Open method is called. Consider using some sort of connection pooling if opening a connection turns out to be an expensive operation. Finally, if the provider is expected to provide automatic enlistment in distributed transactions, the enlistment should occur during Open.

An important point that makes ADO.NET connections different from, say, ADO connections, is that you are requested to guarantee that a connection is created and opened before any command can be executed. Clients have to explicitly open and close connections, and no method will open and close connections implicitly for the client. This approach leads to a sort of centralization of security checks. In this way, checks are performed only when the connection is obtained, but the benefits apply to all other classes in the provider that happen to work with connection objects.

You close the connection with the method Close. In general, Close should simply detach the connection and return the object to the pool, if there is a pool. You could also implement a Dispose method to customize the destruction of the object. The state of a connection is identified through the ConnectionState enum data type. While the client works over the connection, you should ensure that the internal state of the connection matches the contents of the State property. So, for instance, when you are fetching data, set the connection's State property to ConnectionState.Fetching.

The ODBC Connection

Let's see how a concrete .NET managed provider turns these principles into practice. For this example, I'll take into account the newest managed provider that appeared, though only in early beta releases. I'm talking about the .NET provider for ODBC data sources. You probably noticed already that the .NET provider for OLE DB does not support the DSN token in the connection string. Such a name is required to automatically select the MSDASQL provider and go through ODBC sources. The following code is how ODBC.NET declares its connection class:

The OdbcConnection object utilizes ODBC-typical resources, such as environment and connection handles. These objects are stored internally using class private members. The class provides for both Close and Dispose. In general, you can close a connection with either method, but do it before the connection object goes out of scope. Otherwise, the freeing of internal memory (that is, ODBC handles) is left to the garbage collector, whose timing you cannot control. For connection pooling, the OdbcConnection class relies on the services of the ODBC Driver Manager.

To play with the ODBC.NET provider (currently in beta 1), you should include System.Data.Odbc. At this time, the provider is guaranteed to work with drivers for JET, SQL Server, and Oracle.

Implementing a Command

The command object formulates a request for some actions and passes it on to the data source. If results are returned, the command object is responsible for packaging and returning results as a tailored DataReader object, a scalar value, and/or through output parameters. According to the special features of your data provider, you can arrange results to appear in other formats. For example, the managed provider for SQL Server lets you obtain results in XML format if the command text includes the FOR XML clause.

The class must support at least the CommandText property and at least the text command type. Parsing and executing the command is up to the provider. This is the key aspect that makes it possible for a provider to accept any text or information as a command. Supporting command behaviors is not mandatory and, if needed, you can support more, completely custom behaviors.

Within a command, the connection can be associated with a transaction. If you reset the connection—and users should be able to change the connection at any time—then first null out the corresponding transaction object. If you support transactions, then when setting the Transaction property of the command object, consider additional steps to ensure that the transaction you're using is already associated with the connection the command is using.

A command object works in conjunction with two classes representing parameters. They are xxxParameterCollection, which is accessed through the Parameters property, and xxxParameter, which represents a single command parameter stored in the collection. Of course, the xxx stands for the provider-specific nickname. For ODBC.NET, they are OdbcParameterCollection and OdbcParameter.

You create provider-specific command parameters using the new operator on the parameter class or through the CreateParameter method of the command object. Newly created parameters are populated and added to the command's collection through the methods of the Parameters collection. The module that provides for command execution is then responsible for collecting data sets through parameters. Using named parameters (as the SQL Server provider does) or the ? placeholder (similar to the OLE DB provider) is up to you.

You must have a valid and open connection to execute commands. Execute the commands using any of the standard types of commands, which are ExecuteNonQuery, ExecuteReader, and ExecuteScalar. Also, consider providing an implementation for the Cancel and Prepare methods.

The ODBC Command

The OdbcCommand class does not support passing named parameters with SQL commands and stored procedures. You must resort to the ? placeholder instead. At least in this early version, it does not support Cancel and Prepare either. As you can expect, the ODBC .NET provider requires that the number of command parameters in the Parameters collection matches the number of placeholders found within the command text. Otherwise, an exception is thrown. The line below shows how to add a new parameter to an ODBC command and assign it at the same time.

Notice that the provider defines its own set of types. The enumeration OdbcType includes all and only the types that the low-level API of ODBC can safely recognize. There is a close match between the original ODBC types, such as SQL_BINARY, SQL_BIGINT, or SQL_CHAR, and the .NET types. In particular, the ODBC type SQL_CHAR maps to the .NET String type.

Implementing a DataReader

A data reader is a kind of connected, cache-less buffer that the provider creates to let clients read data in a forward-only manner. The actual implementation of the reader is up to the provider's writer. However, a few guidelines should be taken into careful account.

First off, when returned to the user, the DataReader object should always be open and positioned prior to the first record. In addition, users should not be able to directly create a DataReader object. Only the command object must create and return a reader. For this reason, you should mark the constructors as internal. You should use the keyword internal in C#

The DataReader must have at least two constructors—one taking the result set of the query, and one taking the connection object used to carry the command out. The connection is necessary only if the command must execute with the CommandBehavior.CloseConnection style. In this case, the connection must be automatically closed when the DataReader object is closed. Internally, the resultset can take any form that serves your needs. For example, you can implement it as an array or a dictionary.

A DataReader should properly manage the property RecordsAffected. It is only applicable to batch statements that include inserts, updates, or deletes. It normally does not apply to query commands. When the reader is closed, you might want to disallow certain operations and change the reader's internal state, cleaning up internal resources like the array used to store data.

The DataReader's Read method always moves forward to a new valid row, if any. More importantly, it should only place the internal data pointer forward, but makes no reading. The actual reading takes place with other reader-specific methods, such as GetString and GetValues. Finally, NextResult moves to the next result set. Basically, it copies a new internal structure into a common repository from which methods like GetValues read.

The ODBC DataReader

As all the reader classes, OdbcDataReader is sealed and not inheritable. Methods of the class that have access to column values automatically coerce the type of data they return to the type of data that was initially retrieved from that column. The type used the first time to read one cell from a given column is used for all the other cells of the same column. In other words, you cannot read data from the same column as string and long in successive times.

When the CommandType property of a command object is set to StoredProcedure, the CommandText property must be set using the standard ODBC escape sequence for procedures. Unlike other providers, the simple name of the procedure is not enough for the ODBC.NET provider. The following pattern represents the typical way of calling stored procedures through ODBC drivers.

The string must be wrapped by {...} and have the keyword call to precede the actual name and the list of parameters.

Implementing a DataAdapter

A full-fledged .NET data provider supplies a data adapter class that inherits both IDbDataAdapter and DbDataAdapter. The class DbDataAdapter implements a data adapter designed for use with a relational database. In other cases, though, what you need is a class that implements the IDataAdapter interface and copies some disconnected data to an in-memory programmable buffer like the DataSet. Implementing the Fill method of the IDataAdapter interface, in fact, is in most cases sufficient to return disconnected data through a DataSet object.

Classes that inherit from DbDataAdapter must implement all the members, and define additional members in case of provider-specific functionality. This ends up requiring the implementation of the following methods:

Table mappings govern the way in which source tables (that is, database tables) are mapped to DataTable objects in the parent DataSet. Mappings take into account table names as well as columns names and properties. Schema mapping, instead, regards the way in which columns and tables are treated when it comes to adding new data to existing DataSets. The default value for the missing mapping property tells the adapter to create in-memory tables that looks like source tables. The default value for the missing schema property handles possible issues that arise when the DataTable objects are actually populated. If any of the mapped elements (tables and columns) are missing in the target DataSet, then the value of MissingSchemaAction suggests what to do. In a certain way, both MissingXXX properties are a kind of exception handler. The value Add forces the adapter to add any table or column that proves to be missing. No key information is added unless another (AddWithKey) value is assigned to the property.

When an application calls the Update method, the class examines the RowState property for each row in the DataSet and executes the required INSERT, UPDATE, or DELETE statement. If the class does not provide UpdateCommand, InsertCommand, or DeleteCommand properties, but implements IDbDataAdapter, then you can try to generate commands on the fly or raise an exception. You could also provide a made-to-measure command builder class to help with the command generation.

The ODBC provider supplies the OdbcCommandBuilder class as a means of automatically generating single-table commands. The OLE DB and SQL Server providers have provided similar classes. If you need to update cross-referenced tables, then you might want to use stored procedures or ad-hoc SQL batches. In this case, just override the InsertCommand, UpdateCommand, and DeleteCommand properties to make them run the command object you indicate.

Summary

The functionality that a .NET data provider offers can be divided into two main categories:

Support for the disconnected DataSet object

Support for connected data access, including connections, transactions, commands, and parameters

Data providers in .NET support DataSet objects through an implementation of the IDataAdapter interface. They may also support parameterized queries by implementing the IDataParameter interface. If you can't afford disconnected data, then use .NET data readers through the IDataReader interface.

Dialog Box: Naming Multiple Resultsets

Visual Studio® .NET has a very nice feature that lets you assign a consistent name to all the tables a data adapter is going to generate. After you've configured a data adapter object in any .NET application, the dialog shows the standard names of the tables being created: Table, Table1, Table2, and so forth. For each of them, you can then specify in a single shot a more evocative name. Is there a way to get this programmatically?

Visual Studio .NET is an excellent product, but there's only a little bit of magic in what it does. To answer your question, yes, there is a way to obtain that programmatically and, incidentally, it's the same code that Visual Studio utilizes behind the scenes.

The DataAdapter object has a collection called TableMappings whose elements are objects of type DataTableMapping. What's a table mapping anyway? It is a dynamic association set between a source table and the corresponding DataTable object that the adapter is going to create. If no mapping has been set, then the adapter creates a DataTable object with the same structure as the source table, except for the name. The name is the string specified through the call to the Fill method, or the word Table. Extra tables that originate from multiple resultsets are named after the first. So, in the default case, they are called Table1, Table2, and the like. Instead, if the data adapter is filled out like the code below, then the extra tables are named Employees1, Employees2, and so forth.

What Visual Studio does when you configure your data adapter is create one DataTableMapping object for each association you visually create. The following lines of code are the programmatic way to assign meaningful names to the first two tables of a DataSet filled as above.

Dino Esposito works for Wintellect, where he takes on ADO.NET and ASP.NET training and consulting. He is the co-founder of VB-2-The-Max, and also contributes the Cutting Edge column to MSDN Magazine. You can reach Dino at dinoe@wintellect.com.