Managed Extensions: Using the .NET ODBC Classes to Read Text Data

Welcome to this week's installment of .NET Tips & Techniques! Each week, award-winning Architect and Lead Programmer Tom Archer demonstrates how to perform a practical .NET programming task using either C# or Managed C++ Extensions.

While data for professional-caliber applications is typically stored in a traditional database system, sometimes the data your application must use is in text format. This includes cases where you are working with a small amount of test data, as well as scenarios where another system supplies a text file whose format you cannot control. Although most people naturally think of using streams to read and write text files, ODBC has for years provided a driver specifically for this purpose.

Why use the driver and incur the overhead of ODBC when you can easily stream the data? First, ODBC provides a generic, SQL-like interface to the data. Second, because that interface is generic, accessing a file through ODBC rather than through a stream makes it easier to reuse the same code for data in other formats.

For example, let's say that the data your code will ultimately work with is stored in a traditional RDBMS (relational database management system), such as SQL Server or Oracle. However, you might want to test your application logic against a small amount of data that you can quickly enter into a text file with any editor (such as Notepad). Using ODBC, you would simply specify a different DSN, or a different ODBC driver for a DSN-less connection, depending on which data source you're using. That way, you wouldn't have to maintain two completely different code bases for accessing your data (one for streaming text files and one for reading from the RDBMS).

This article illustrates how easy it is to read text data using the .NET ODBC classes.

Reading Data from a DSN-less Text File

The ODBC Desktop Drivers include a driver for reading text called the Microsoft Text Driver. The easiest way to access text data is to use the ODBC Admin application (odbcad32.exe) to create a DSN that uses the text driver and points at the folder containing your text file. However, I'll show you the basic steps for using the .NET ODBC classes to access a text file without the extra step of creating a DSN. (The complete code, including basic error handling and clean-up, can be found in the dialog class of this article's demo application.)

Create the Connection: Using the OdbcConnection class, you pass a connection string that specifies the ODBC driver (the Microsoft Text Driver, in this case) and the path of the files. Note that I said files, plural. With the Microsoft Text Driver, the connection string does not name the file your application will read. Instead, you specify the folder that contains the file (via the DBQ parameter) and, optionally, the file extensions that may be opened in that folder. The text driver logically treats the specified directory as a relational database and each file in it as a table within that database. This was a great idea by the folks in Redmond, as it closely mimics how your code accesses data in a true RDBMS. To read a particular file, you therefore connect to the folder that contains it.

Note that when the Text Driver is used to make a connection to a given path (specified with the DBQ parameter), only files in that specific directory can be accessed—not files in any subdirectories.

Create the Command: Once the connection is open, you create the desired command via the OdbcCommand class. This is where you specify the file name, which acts as the table name in the SQL statement. For example, a SELECT * query against the file returns all of its rows. Keep in mind that the file must exist in the folder specified by the DBQ parameter of the connection string.

Attach a Reader: Now that the command has been created, you can call the OdbcCommand::ExecuteReader method to execute it. ExecuteReader returns an OdbcDataReader object that you use to enumerate the returned data, calling its Read method to advance through the rows one at a time.

Taking Control of the Process with the schema.ini File

Once you've started working with text files via a DSN-less connection, you might run into situations that will have you asking things like "How do I specify how the file is delimited (for example, tab vs. comma)?" or "Where can I specify the character set?" These settings and more can be specified via a very simple file named schema.ini that resides in the same directory as the data file. The schema.ini file is documented on the Microsoft Web site, so I won't attempt to cover every possible parameter that can be specified. However, I will cover the most popular question I see on the Internet: how to specify whether the data includes (as its first row) the column names of the data.

By default, the text driver assumes that the data contains a column-heading row. Therefore, if your data does not contain this row and you do not define a schema.ini file, the driver consumes the first row as column names and never returns it as data. For example, if your file contained three records with your favorite author in the first one, reading it with the default settings would display only the second and third records, leaving out your favorite author!

To specify that the data does not include a column-heading row (and that you don't wish to name the columns), your schema.ini file would look like the following:


[data.txt]
ColNameHeader=FALSE

In terms of specifying the column names for your data, you have two choices:

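You can include the column names as the first row of the text file and set ColNameHeader=TRUE in the schema.ini file. (You can also omit the schema.ini file entirely, as the text driver defaults ColNameHeader to TRUE.)

Alternatively, you can set ColNameHeader=FALSE and name each column in the schema.ini file with Coln entries, such as Col1=FirstName Text.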

Whichever technique you use, you can retrieve a column's name from the reader with the OdbcDataReader::GetName method. This article's demo application uses the last technique. While very simple in scope, it lets you tinker with your data file and schema.ini file so that you can easily test the various configuration combinations until you get them right for your particular application.

About the Author

Tom Archer - MSFT

I am a Program Manager and Content Strategist for the Microsoft MSDN Online team managing the Windows Vista and Visual C++ developer centers.
Before being employed at Microsoft, I was awarded MVP status for the Visual C++ product. A 20+ year veteran of programming with various languages - C++, C, Assembler, RPG III/400, PL/I, etc. - I've also written many technical books (Inside C#, Extending MFC Applications with the .NET Framework, Visual C++.NET Bible, etc.) and 100+ online articles.
