Introduction

Last year I published the article ADO+.NET. In it I presented a library that in my opinion can "replace" ADO.NET.

In fact, that library uses ADO.NET internally, but it doesn't expose it to its users. We can say this is very similar to what WinForms does compared to the Windows API (the user32 library). Yet Nicolas Dorier told me that ADO+.NET shouldn't try to replace ADO.NET; it should add to it.

Well, part of me still disagrees, because my purpose was to hide the methods and properties that cause confusion. But I do understand that there are too many projects that use ADO.NET directly, so making something that simply makes it easier to access data the right way, without replacing the ADO.NET components, is a good thing.

So this time I am presenting a very small library meant to make it easier to read values with a data reader and generate a filled record in a fast yet configurable manner.

What does this library really do?

This library starts with a very simple purpose: filling a .NET object from an IDataReader row easily, without manual code to pick the right GetSomething call, without manual conversions for data-type mismatches, and with great speed.

And on top of this, it gives syntactic-sugar methods that allow users to easily enumerate a database command, receiving instances of a given type, or to create a list with those results.

And to make things even better, the entire code has extension points. You can tell the library how to process a given database column (so you can decide to call some specific method to read that column). If you don't want to go that far, other extension points allow you to tell how to find a field or property from the database column name, how to read a database column considering its actual database type or, maybe simpler, how a needed conversion should be done.

Using the code

Before talking about performance, explaining how it works internally or explaining how to completely reconfigure this solution, let's understand how to use it for the basic situations.

To start (probably during the application initialization) you need to configure the library with its default values (or maybe with different values, but we will see that later) so, to do that, you must call:

DbReaderGeneratorLibrary.Configure();

Then, at any moment when you have an IDbCommand already filled and you are at the point of calling ExecuteReader(), you can use one of these methods:

Enumerate<T>() - This is an extension method that will call ExecuteReader() and will create a fast delegate to read each row.

As it returns an enumerable, you are free to use constructs like foreach and even LINQ methods (methods like FirstOrDefault(), First(), Single() and SingleOrDefault() are good choices, as they don't require reading all the records). But if you need methods that load the entire contents into memory, it is preferable to use the ToList() method.

ToList<T>() - This is very similar to the Enumerate() method, but instead of only reading records when requested (enumerated), it loads all the records at once. If you actually want all the records in memory, this method is preferred over the previous one, but it is not recommended for large batches or if you plan to call methods like First() or Single(), as it is useless to read all the records when you only want the first one.
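Putting the configuration call and these two methods together, a minimal usage sketch might look like the code below. This is a hypothetical sketch: the Person type, the connection and the SQL text are my assumptions; Configure(), Enumerate<T>() and ToList<T>() are the library methods described above.

```csharp
using System;
using System.Data;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class Sample
{
    public static void Run(IDbConnection connection)
    {
        // Done once, probably at application start-up.
        DbReaderGeneratorLibrary.Configure();

        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = "SELECT Id, Name FROM Person";

            // Streams one record at a time; good for foreach, First(), Single().
            foreach (Person person in command.Enumerate<Person>())
                Console.WriteLine(person.Name);

            // Or materialize everything at once (avoid for large batches):
            // List<Person> people = command.ToList<Person>();
        }
    }
}
```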

EnumerateColumn<T>(columnIndex) - It is not uncommon to create a query that reads a single column from the database. In this case, why should we enumerate records that have a single property? Why not enumerate the column values directly?

This is what the EnumerateColumn() does. I was unsure if I should call it EnumerateField or EnumerateColumn, but even if the IDataReader has a GetFieldType, I decided it was better to use the "columns" name.

When enumerating a command to read a column, we generally read the single column the query is actually selecting, so the default value for columnIndex is zero. Yet we are free to read another column index if we really want to select more than one column and ignore all the others.

ColumnToList<T>(columnIndex) - As happens with the Enumerate() and ToList() methods, sometimes we may prefer to put all data into memory instead of iterating it. So, the ColumnToList() method will read all the lines resulting from a command and will put those results into a single list.
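Under the same assumptions as the previous sketch (an already filled command), the column-oriented methods could be used like this:

```csharp
// Hypothetical usage of the column-oriented methods described above.
command.CommandText = "SELECT Name FROM Person";

// Enumerate the values of column 0 (the default index) one at a time.
foreach (string name in command.EnumerateColumn<string>())
    Console.WriteLine(name);

// Or load all the values of the column into a single list.
// List<string> names = command.ColumnToList<string>();
```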

The library organisation

Personally I like to have one type per file. I am even an extremist about it, as I usually put a delegate declaration, which is a single line of code, into its own file.

But lately I am "playing" a little with different ways of presenting code and libraries, and one thing I know is that many users prefer to have a single file they can add to their projects instead of many files or, even worse, having to add a new project to their solutions and/or a DLL reference. So I decided to divide things into only two files:

DbReaderGeneratorDefinition.cs: This file contains all the interfaces, delegates and the "Default" definitions without actually having any implementation. The purpose of this design is to allow you to replace the default implementation if you decide to write your own implementation without having to rewrite code that depends on this library. This is very similar to the design I presented in the article Architecture of a Remoting Framework.

DbReaderGeneratorDefaultImplementation.cs: This file is the actual implementation of the DbReaderGenerator. Well, at least the default one for all the interfaces and extension points. I really believe that in your own projects you will try to replace at least one of the classes presented in this file.

Well... actually there's a third file. DbReaderExtensions.cs is where the methods presented in the Using the code topic are found. I was really unsure where I should put the class with those methods. Considering it is a default implementation, I thought about putting it in the default implementation file. But considering it is able to work with any configuration, I thought I should put it in the definition file... in the end I put it into its own file. I still hope there aren't too many files in the solution, and you can always merge the contents into a single file if you really want to.

Performance

I always hear (and read) people saying that performance is not important. Yet in almost every place I work the main issue is performance. And the common points where performance is a problem are:

Communication;

Database access.

And considering we are not dealing with communication in this article (and databases have their own optimized way of communication) there's only one thing to try to improve: The database access.

Actually, database access using ADO.NET directly suffers from these things (among others):

If using DataTables/DataSets, all the value-type columns (ints, booleans, tiny ints, chars etc.) are boxed, which occupies much more memory and also always requires a cast when reading the data;

If using data readers, considering that some databases change the actual data type, it is not uncommon to read data using the GetValue() method (which also boxes) and then use a method from the Convert class to get the data with the right type;

Before the existence of yield return, it was hard to write enumerators that read one record at a time, so people used to read all the data into lists. Now, even with yield return, it is common practice to continue reading all data into a list (maybe because people are simply following the old "standard"), and this can be very problematic for large batches.

So my purpose was to help users avoid all those problems. The code generates a single delegate at run-time to read the entire row, filling an already existing object, so we can say that reading an entire row only adds a single virtual method call compared to hand-written code.

But, unlike hand-written code, it is actually capable of analysing the types of the database and using the right GetSomething method. Note that with SQL Server you can't use the GetChar() method to read a char; you must read it as a string and then get the first character, but that's not the case for other databases.

So, with the code generated at run-time you can benefit from the versatility of using the right access method for different databases without a performance hit and without having to write different code for every database. And as an extra benefit, you can avoid writing lots of repetitive code even if you don't use different databases, avoiding bugs caused by copy/paste and guaranteeing that the good pattern is always used.
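As a self-contained illustration of the SQL Server char case, here is a sketch of the kind of expression such a generator could emit: calling GetString() on the reader and taking the first character. The method name CreateCharReader is my assumption for this sketch, not a name from the library.

```csharp
using System;
using System.Data;
using System.Linq.Expressions;

public static class CharReadDemo
{
    // A sketch of the expression a readColumnGenerator could emit for a
    // SQL Server char column: since that reader has no usable GetChar(),
    // the generated code calls GetString(index) and takes character zero.
    public static Func<IDataRecord, int, char> CreateCharReader()
    {
        ParameterExpression record = Expression.Parameter(typeof(IDataRecord), "record");
        ParameterExpression index = Expression.Parameter(typeof(int), "index");

        // Equivalent to: record.GetString(index)[0]
        Expression body = Expression.MakeIndex(
            Expression.Call(record, typeof(IDataRecord).GetMethod(nameof(IDataRecord.GetString)), index),
            typeof(string).GetProperty("Chars"),
            new[] { Expression.Constant(0) });

        return Expression.Lambda<Func<IDataRecord, int, char>>(body, record, index).Compile();
    }
}
```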

My only performance comparison was with Dapper, reading thousands of records many, many times. When Dapper was taking 1.9 seconds, this solution was taking 1.2 seconds. Yet I did it only to be sure that it is fast: Dapper is more complete in that it helps you fill parameters, while this solution is more complete with respect to user data types and configuration (in fact, I plan to present the solution to fill the query parameters in another article, as an independent solution that can be combined with this one).

EnumerateSingleInstance<T>

I already presented a list of extension methods that you can use with your IDbCommands. But if you really care about performance, there's one extra method, called EnumerateSingleInstance.

This method always returns the same instance for all the database rows (it can even receive such an instance as a parameter). Its purpose is to be used in foreach blocks when the record is not going to be used outside such blocks: it avoids the cost of creating a new instance for every database row, effectively becoming faster and also alleviating the pressure on the garbage collector.

But, as it always returns the same instance, you can't use methods that expect different instances to be returned. For example, it is useless to use LINQ methods like ToArray(), as the returned array will have the right length but will be filled with only one instance (which will contain the values of the last record read).

So, use this method if you want the maximum performance, but use it with caution.
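A sketch of the pattern this method is meant for, assuming the same hypothetical Person type and command as before; note the record must not escape the loop, since every iteration sees the same instance:

```csharp
// Hypothetical usage: one Person instance is reused for every row.
long total = 0;
foreach (Person person in command.EnumerateSingleInstance<Person>())
{
    // Safe: the values are consumed inside the loop body.
    total += person.Id;

    // Unsafe: storing the reference would keep only the last row's values.
    // people.Add(person);
}
```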

The IDbReaderGenerator

In this library everything starts with the IDbReaderGenerator. When we call Enumerate() or ToList(), what actually happens is that those methods call ExecuteReader(), ask the IDbReaderGenerator to generate a record filler and then iterate through all the rows, calling the generated delegate and either yield returning the records or adding them to a list.

So we can say that everything starts from the IDbReaderGenerator.

But the default implementation will, in fact, do two things:

It decorates an inner solution with a cache of the results, so new executions of the same query will not lose time generating the delegate again;

It compiles a delegate from an expression generated by the IDbReaderExpressionGenerator.

So the user's starting point is not the architecture's starting point. The architecture's starting point is the IDbReaderExpressionGenerator.

The IDbReaderExpressionGenerator

This is the real heart of this library. The IDbReaderExpressionGenerator is responsible for generating a single expression that represents the appropriate call. In fact there are two kinds of expressions that can be generated:

Filler: A "filler" expression has the purpose of filling an already existing instance, so it receives such an instance as input instead of generating a new result at each call. This allows users to avoid creating new records if they only want to keep a single record at a time in memory;

Column Reader: Expressions of this type are optimized to read a single column, so they return that column value directly instead of filling a record instance.
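The "filler" idea can feel abstract, so here is a much simplified, self-contained sketch of it: an expression tree compiled into a delegate that fills an already existing instance from a row. To keep it runnable without a database, the row is a plain object[] instead of an IDataReader, and the Person type is my assumption; the real library builds typed GetInt64/GetString calls instead of array accesses.

```csharp
using System;
using System.Linq.Expressions;

public class Person
{
    public long Id;
    public string Name;
}

public static class FillerDemo
{
    // Builds an Action<object[], Person> that assigns row[0] to Id and
    // row[1] to Name, mimicking (in a simplified way) a "filler" expression.
    public static Action<object[], Person> CreateFiller()
    {
        ParameterExpression row = Expression.Parameter(typeof(object[]), "row");
        ParameterExpression record = Expression.Parameter(typeof(Person), "record");

        BlockExpression body = Expression.Block(
            Expression.Assign(
                Expression.Field(record, nameof(Person.Id)),
                Expression.Convert(
                    Expression.ArrayIndex(row, Expression.Constant(0)),
                    typeof(long))),
            Expression.Assign(
                Expression.Field(record, nameof(Person.Name)),
                Expression.Convert(
                    Expression.ArrayIndex(row, Expression.Constant(1)),
                    typeof(string))));

        // Compiling once and reusing the delegate is what makes generated
        // readers fast: each row then costs only a delegate call.
        return Expression.Lambda<Action<object[], Person>>(body, row, record).Compile();
    }
}
```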

The Extension Points

Actually, the first extension points of the DbReaderGenerator and the DbReaderExpressionGenerator are the interfaces themselves. As they start as interfaces, their implementations can be completely replaced or decorated.

As already explained, the default DbReaderGenerator is in fact a decorator that caches the results generated by an implementation that redirects to the DbReaderExpressionGenerator. So, let's see the extension points available for the DbReaderExpressionGenerator.

The default implementation of the DbReaderExpressionGenerator has two constructors. One of them receives delegates to generate the expressions that:

Access the database column;

Access the .NET member (field or property);

Make a conversion from the database type to the .NET member type (if needed).

Note that generating the member-access expression is the easiest one. In the default implementation, the name of the database column is used to find a .NET member (a field or property) with the same name. This is probably the delegate that you will want to replace if you have a different naming rule.

But I know many situations where users want to use different names in their properties, be it by using attributes or another application-specific rule. So, it is enough to replace this delegate with one that finds the member with a different rule and everything will work.

Also note that, by default, if the query has a column name that's not found, an exception is thrown. If the delegate returns a null expression instead, that database column is simply ignored and the object is still filled with the other columns, so those are possible extensions that you may want to try.
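As an illustration of such a replacement rule, here is a self-contained sketch of an attribute-driven member lookup. The [ColumnName] attribute, the MappedPerson type and the FindByColumn method are all my assumptions, showing one possible rule a replacement delegate could use; the real delegate would return an expression rather than a MemberInfo.

```csharp
using System;
using System.Reflection;

// Hypothetical attribute: an explicit [ColumnName] wins over the member's own name.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public sealed class ColumnNameAttribute : Attribute
{
    public readonly string Name;
    public ColumnNameAttribute(string name) { Name = name; }
}

public class MappedPerson
{
    public long Id;

    [ColumnName("person_name")]
    public string Name;
}

public static class MemberFinder
{
    // Finds the field or property for a database column, comparing the
    // attribute name (when present) or the member name, case-insensitively.
    // Returns null when nothing matches (a default could throw instead).
    public static MemberInfo FindByColumn(Type type, string columnName)
    {
        foreach (MemberInfo member in type.GetMembers(BindingFlags.Public | BindingFlags.Instance))
        {
            if (member.MemberType != MemberTypes.Field && member.MemberType != MemberTypes.Property)
                continue;

            var attribute = member.GetCustomAttribute<ColumnNameAttribute>();
            string effectiveName = attribute != null ? attribute.Name : member.Name;

            if (string.Equals(effectiveName, columnName, StringComparison.OrdinalIgnoreCase))
                return member;
        }
        return null;
    }
}
```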

The other two parameters (readColumnGenerator and conversionGenerator)

The default implementation of the readColumnGenerator will try to discover if the column type, as seen in the database, has an equivalent GetColumnType method on the reader.

That is, an Int32 will use the GetInt32() method, a String will use the GetString() method and so on. If such a method exists, it will be used to do the read. It doesn't matter if this is not the type of the destination field or property, as that's the responsibility of the conversion generator.

So, if those types don't match, it is the responsibility of the conversion generator to generate the appropriate conversion call. The default implementation uses the Convert class, trying to find an appropriate ToSomeType method or, if one is not available, using the ChangeType() method. I plan to present another article showing how you can use a really expandable solution for the data-type conversions to support all kinds of conversions (similar to the Expandable IoC Container, but for the conversion expressions).
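Under stated assumptions (the ConversionDemo name and exact lookup order are mine), that rule can be sketched as a small expression generator: identity when the types already match, a matching Convert.ToXxx overload when one exists, and Convert.ChangeType as the last resort.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class ConversionDemo
{
    // A simplified sketch of the conversion rule described above.
    public static Expression GenerateConversion(Expression value, Type targetType)
    {
        // No conversion needed: use the value as-is.
        if (value.Type == targetType)
            return value;

        // Prefer a Convert.ToXxx overload matching the source type,
        // e.g. Convert.ToInt32(decimal) for a decimal-to-int conversion.
        MethodInfo toMethod = typeof(Convert).GetMethod("To" + targetType.Name, new[] { value.Type });
        if (toMethod != null)
            return Expression.Call(toMethod, value);

        // Fall back to Convert.ChangeType, which returns object and so
        // needs a final cast to the target type.
        MethodInfo changeType = typeof(Convert).GetMethod(
            nameof(Convert.ChangeType), new[] { typeof(object), typeof(Type) });

        return Expression.Convert(
            Expression.Call(
                changeType,
                Expression.Convert(value, typeof(object)),
                Expression.Constant(targetType)),
            targetType);
    }
}
```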

What if the database column is not expected to set a field or property?

In some situations it is possible that a column in the database is not directly reflected as a field or property set. If this is the case, you can use the alternative constructor of the DbReaderExpressionGenerator. Such an alternative constructor still needs the ReadColumnGenerator and a ConversionGenerator (which are used by the GenerateColumnReader method) but, instead of trying to build the path (reading the database column and setting the field/property), it calls a delegate to do that. This means that you can actually read a column and call a method, if that's appropriate for your case.

I can see this happening if, say, the last column is an IsReadOnly column that is responsible for calling a MakeReadOnly() method.

Cache

Actually, the code generated to read the IDataReaders is pretty fast. The problem is that generating such a reader takes time (not that much, yet we can say it may be a problem). So the best thing to do is to cache the generated readers. The problem is: how do we cache them?

It is possible that you always use an object (like Person) when reading a table Person, always with the same list of columns in the same order, independently of the ORDER BY or WHERE used. If this is the case, a generator for the destination type could be reused in different select clauses, as different ORDER BY or WHERE clauses don't affect the returned columns.

But if you use the same target object with different tables or with different column orders in the SELECT clause, you can't reuse the IDataReader reader, so a new one must be generated. In fact, if you use different databases, the same select may actually return columns of different types, so the generated reader can't be reused in those cases either.

So, to try to solve those problems, the default cache will only reuse a cached generator for an identical SQL clause, coming from a connection that has exactly the same connection string and, of course, for exactly the same destination type. The class that does the caching actually allows you to say that you don't want to consider the SQL clause or the connection string, effectively reusing the cached generators more frequently, but it is the user's responsibility to ask for that and to guarantee that the same object is not used to read different tables, databases or simply selects with different column orders.
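Under stated assumptions (the GeneratorCache name and shape are mine, not the library's), that caching rule can be sketched as a dictionary keyed by connection string, SQL text and destination type:

```csharp
using System;
using System.Collections.Concurrent;

// A simplified sketch of the caching rule described above: a generated
// reader is reused only for the exact same connection string, SQL text
// and destination type. TDelegate stands in for the generated delegate.
public static class GeneratorCache<TDelegate>
{
    private static readonly ConcurrentDictionary<(string, string, Type), TDelegate> _cache =
        new ConcurrentDictionary<(string, string, Type), TDelegate>();

    public static TDelegate GetOrCreate(
        string connectionString, string sql, Type destinationType, Func<TDelegate> generator)
    {
        // Generate at most once per distinct key; note this simple cache
        // never evicts its entries.
        return _cache.GetOrAdd((connectionString, sql, destinationType), _ => generator());
    }
}
```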

Yet, even if you don't care about reusing the cache so often, there's another problem: cached items will never be collected, so if you are building different SQL clauses (maybe because the WHERE clauses are using string concatenation instead of parameters, which is another problem on its own) the cache can become too big. It is not hard to make a better cache by using a weak dictionary, but creating a good weak dictionary is not that easy and surely not that small, so to avoid a big solution I simply wrote a simple cache that's not weak. Remember that trait when using it: you may prefer to reuse the queries for different SQLs, or you may prefer to write your own cache with different rules, as this is something important.

Sample

The sample application is a speed and conversion comparison between this solution and Dapper, using fake commands and data readers. I made it use fakes because I don't want to force users to create a real database to do the tests, yet I tried to use all the important methods, so it is possible to see how the library can be used.

The fake reader actually treats the "int" columns as decimal columns, so it is necessary to do some conversions to make the values work and, with enums, Dapper simply fails while this solution works.

It is important to note that this application will run really fast, processing millions of records, as it doesn't actually lose time querying a real database, so we can see that both mappers are extremely fast. In real situations most of the time is spent doing the real query and receiving data through TCP/IP; I included the speed comparison to prove this is not going to make things slower, while it will be useful by using the right get methods and doing the conversions easily, if they are necessary.

The Future

I will not promise anything, but in the future I plan to present an implementation of the ConversionGenerator that will allow users to easily register new conversions without having to provide an entirely new implementation, and I also plan to present a solution to fill the database parameters (which is something Dapper has but this solution doesn't).

My purpose is to let each one of those solutions live in isolation, so you can use any one of them without the others, while still providing syntactic-sugar methods more similar to how Dapper works. That way you can easily execute a query, giving typed parameters and receiving typed results with ease, which of course will use this fast solution for reading and, if needed, the fast solutions to fill parameters and to do configurable data-type conversions.


About the Author

I started to program computers when I was 11 years old, as a hobbyist, programming in AMOS Basic and Blitz Basic for Amiga.
At 12 I had my first try with assembler, but it was too difficult at the time. Then, in the same year, I learned C and, after learning C, I was finally able to learn assembler (for Motorola 680x0).
Not sure, but probably between 12 and 13, I started to learn C++. I always programmed "in an object oriented way", but using function pointers instead of virtual methods.

At 15 I started to learn Pascal at school and to use Delphi. At 16 I started my first internship (using Delphi). At 18 I started to work professionally using C++ and since then I've developed my programming skills as a professional developer in C++ and C#, generally creating libraries that help other developers do their work easier, faster and with less errors.

Comments and Discussions

Paulo, I appreciate the code since I have also spent time trying to abstract DbReader mappings into a minimal API. So this is not a criticism, but I wanted to point out that its goal has a lot of similarity with the Enterprise Data Access Block (now on CodePlex). The DAAB allows you to write custom mappers, or use the default implementation based on Reflection. It also already addresses parameterized queries (which you are proposing for the future).

Even if it appears to do the same, the Data Access Block is a giant beast... that doesn't have all the potential.
In particular, how do you deal with user data types? I mean, create a type like DriverLicense... and now try to read it.

So, if I have a type like this:
public class Person
{
    public long Id { get; set; }
    public string Name { get; set; }
    public DriverLicense DriverLicense { get; set; }
}

Can you use DAB to fill this object?
If not, then what is your solution?

Actually my solution can deal with such a type. And this is the kind of limitation that many ORMs have and that actually tells me: Don't use giant solutions that aren't expandable.

Note: Changing the DriverLicense type to implement an interface is not an option. It is the DbReader that should adapt to the data type, not the contrary.

Glad to know you evaluated the DAB...was just pointing out the similarities.

I agree, the original DAB is overkill, plus it had dependencies on other blocks. What I have done is separated out the data access code from all the rest and use that directly - no Exception, Dependency, or Caching blocks. And that's the direction it is going now that it has been open sourced.

As far as the nested class goes, I think it's a very cool feature that your library would handle that, so as I said, this was no criticism. However, I do handle that scenario by extending the Reader/Mapping interfaces in the DAAB to handle an XmlReader, and use xml results to transmit nested collections or nested types across the wire.

I've even extended the mapper interfaces to handle multiple resultsets which is also an alternative for handling nested collections. But in my experience, xml is quite efficient, plus if done to spec, the mapper can simply call an XmlSerializer and poof - the entity is rendered.

Again, just sharing...not knocking your library...It caught my attention because you have given some good thought to performance that comes from practical experience when wrapping ADO.Net...and not many developers go that route anymore...so we are on the same page!.

This is another confusing work. I scan your code and the article and found there is huge mistake of saying the word REPLACE.

Quote:

Last year I published the article ADO+.NET. In it I presented a library that in my opinion can replace ADO.NET.

You are not replacing ADO.Net.. you are simply facilitating a way things to do around it(ADO.Net) like Reading data, Saving data, may be mapping objects ...
Why I say this, because you are still referencing Core ADO.Net libraries.

using System.Data;
using System.Data.Common;

And

Quote:

System.Data
This namespace represents the ADO.NET architecture, which is a set of computer software components that can be used by programmers to access data and data services.[22]

I will not change anything. First, note that you are commenting on the wrong article. This article is not intended to replace ADO.NET; that was the ADO+.NET article. This one only adds to it, so users will continue to use ADO.NET to open the connection, create the command etc.

The ADO+.NET was meant to replace ADO.NET in the sense that your application should use either one or the other. The fact that internally it uses ADO.NET doesn't mean users should see it. They only need to see the ADO+.NET, so it is replacing. That's why I even commented on WinForms and user32.dll. WinForms uses the user32 API, yet you don't need to know the user32 API to use it, so WinForms "replaces" user32 to the users.

Don't be ridiculous! Aside from being off-topic (this article is not about ADO+.NET), you're defining "replace" too narrowly. Of course ADO+.NET replaces ADO.NET, by the meaning clearly intended by the author: that one need not write any ADO.NET code, nor directly reference any portion of ADO.NET, in order to use ADO+.NET. The API of the one replaces the API of the other.

Technically speaking, if ADO+.NET is rewritten to access all the databases using their native drivers, avoiding any call to ADO.NET, users will not even need to know about that, because it hides ADO.NET.

If users actually started their projects using ADO+.NET (or if they needed to replace the creation of ADO.NET connections by the creation of ADO+.NET connections to be able to use ADO+.NET) they don't depend on ADO.NET anymore.

So, is it a wrapper? Yes. It is. But differently from this article (in which users start with an ADO.NET connection) in ADO+.NET users don't need to see the ADO.NET. And this is what I mean with replace.

In fact, before the fully managed Firebird libraries, the ADO.NET libraries for Firebird were wrappers over the native Firebird libraries. Yet we could have code that:
A) Used the native libraries;
B) Used the Fb* classes directly, even if they redirected to the native libraries;
C) Used the Fb* classes through the IDb* interfaces.

A program written using A can't benefit from the classes at B. So code that wanted to use the B classes needed to be rewritten, effectively replacing the native calls by the Fb classes.

So, does the Fb wrappers replace the native libraries?
Yes, even if they used it.

And the proof is that we actually have Fb libraries that have the same signatures but don't depend on the native (unmanaged) libraries anymore. So, from the first moment, the API was replaced, even while it still depended on the native libraries, and later those libraries were completely replaced too.

If you can't understand that, you have a problem to understand ideas. It has nothing to do with the article.

I even put "replace" in quotes instead of writing replace (OK, I forgot it in the first post of this article, so I changed it to "replace").

"Explain the same thing in your article title or somewhere else."
I explained what it really does in the second paragraph.

But I really don't believe you see my effort. In your first message you even gave me a link so I can study ADO.NET as if I didn't know anything about ADO.NET. To me that has nothing to do with the word "replace", I really see that as you saying: "You don't know anything about ADO.NET. Study it before writing an article."

Also, on your second message you wrote: "BTW why I comment for this nonsense article".
So, do you see my effort on this "nonsense article"?

Still I'm not against your article or your work. I didn't even voted. Still replace words is make me sick. I'll repeat once again, to me either you put it in quote or not, "replace" ADO.Net means you don't need ADO.Net to work with data, which is not true according to your article. And I'm suggesting to rephrase with another word or something else. Even that is what the title of the comment says.

It replaces because you don't need to use it directly.
Imagine that I didn't give the source code, only the compiled code. Imagine that I also gave the code with the drivers for Oracle, SqlServer, SQLite etc.

You, as a user, can load ADO+.NET and access SQL Server. My driver uses the ADO.NET driver, yet you will not see it. So why is it replacing?
It is replacing because you, as a user, must replace all your new SqlConnection calls by new DatabaseConnection/DatabaseConnection.Create() calls.

You will need to replace all your code that uses a SqlCommand/IDbCommand by a DatabaseCommand.

So, as a user, you use either one or the other. One replaces the other not because it doesn't depend on the other but because users should choose one or the other. But as ADO+.NET is more consistent with datatypes, uses typed readers etc it is intended to replace by being "better".

Note that "replace" is exactly the problem that's solved in this article. The other solution doesn't "add" to your existing Commands, it expects you to "replace" your existing commands.

You can't take an application that already has an open connection and a command filled with parameters and the command text and call an ADO+.NET Enumerate(); to do that, you must replace the ADO.NET components by the ADO+.NET components. The fact that they may wrap the other components is not important to the users.

So, you should replace the code that opens an ADO.NET connection by a code that opens an ADO+.NET connection. You should not create an ADO.NET command, you should create an ADO+.NET command... and considering the code may use a base class that already creates the connection you may be unable to do that if you don't plan to rewrite the entire application (or, should I say, if you don't plan to replace your ADO.NET access layer by the ADO+.NET access layer).

So, does ADO+.NET replace ADO.NET?
Actually, the better answer is: It expects you to replace the ADO.NET objects by its own objects. So it is a replacement made on top of it, but the "made on top of it" is something that doesn't need to be told.

In fact, if we use only the interfaces we can really implement an ADO+.NET driver to access another database directly without passing through ADO.NET. I didn't do it, but it is possible. This is something the drivers may do.

If you don't accept this as replacing, then I should say that ADO.NET is only a "wrapper" over plain OleDb, because there's a driver that uses OleDb to do the connections. Surely it is a single driver, but this is the kind of comparison you are making, because in ADO+.NET I only gave a single driver that uses ADO.NET. It is not constrained to do that, yet I don't think it is worth writing a driver from zero when my purpose was to correct other problems.

So, sorry, but I can't accept replace as the wrong word. If you don't understand it, it is your problem with a single word in the entire document. You should try to read the document, not a word.

First of all I'm not a user for this work. This is a wrong perception of you. Secondly don't try to explain "replace" in this context. And lastly I said and as you agreed [^] it's wrapper library and you should explain this way in your article. Don't distract reader with wrong phrasing and words. Readers will read first before they look the code. If they do like it.

Still this is my suggestion and I don't think you are a person who admit a mistake and try to change. Keep distracting people the way you started.

But actually I don't know if he has a problem with this article, with me, or if he simply can't understand how I write. Actually, when I publish an article I always expect to receive a comment from him (or even a downvote) and it is usually as off-topic as this. He reads the first paragraph and says that I did something wrong or explained something wrong when there's an explanation of the problem in the next paragraph.

If I say something like "WeakReferences aren't unsafe (well, they may be but we need to ask for it)" he is not capable of understanding and he posts something like "content contradicts each other".

If I present a naive example to then present how to improve it, he says: Your example is naive, correct it.

So I only answer him because I don't want to give the impression that I don't answer comments to others that actually may not know about this situation, but I really consider his comments annoying.