Introduction

Over time, I've frequently needed to read and write CSV (Comma Separated Value) files. While most follow a fairly standard format, there are many variations on this theme.
Some common variations include Tab Separated Value files, files with header rows, variations in quoting rules, and files that support comment characters / lines.

The classes described in this article simplify reading and writing of many different format variations. CSV in the code and this article refers
to Character Separated Value (of which Comma Separated Value is a sub-category).

While the demonstration project allows the display and manipulation of a few common CSV variants, it does not demonstrate all of the capabilities of the included classes.

While XML files frequently replace CSV files in common usage today, CSV remains a compact and simple alternative to this file format. A number of different applications,
including Excel, are capable of reading and writing these files.

Background

While learning the code, the user should focus on the CsvFormBase, CsvFormReader, and CsvFormWriter classes, which provide some simple
use cases of the code. The user interface code is long and not of direct relevance to this article. However, it does demonstrate some related themes such
as the DataGridView control and the BackgroundWorker component.

Also, the mainBackgroundWorker_DoWork, mainBackgroundWorker_ProgressChanged, and mainBackgroundWorker_RunWorkerCompleted methods
in the MainForm class may be of particular interest to those developers who need to report progress during the time a CSV file is read or written.

Using the Code

The main classes in this library are CsvReader and CsvWriter. Because of the flexibility inherent in the code, they support a large number of methods.
The code itself, which is fully commented, remains the best source of information for most of these methods. This article will only describe some of the more common usage cases.

Reading with CsvReader

At its simplest, a user instantiates a CsvReader and reads an entire DataTable at once as follows. This is, essentially,
the approach used in the CsvFormReader class for the demo project.

// Using a CSV reader for the file MyFile.csv...using (CsvReader reader = new CsvReader("MyFile.csv"))
{
// Indicate the file has a header rowbool hasHeader = true;
// Using a data table read by the reader...using (DataTable table = reader.ReadTable(hasHeader))
{
// Loop through all of the rows...foreach (DataRow row in table.Rows)
{
// Loop through all of the columns...foreach (DataColumn column in table.Columns)
{
// Get the column value for this rowstring columnValue = (string)row[column];
} // Loop through all of the columns...// Get the value for the column named "First"string firstValue = (string)row["First"];
} // Loop through all of the rows...
} // Using a data table read by the reader...
} // Using a CSV reader for the file MyFile.csv...

Alternatively, a user can bypass DataTable altogether and read the file more directly. The following code demonstrates some of the methods for this approach:

// Using a CSV reader for the file MyFile.csv...using (CsvReader reader = new CsvReader("MyFile.csv"))
{
// Read an entire row into a list
List<string> columns = reader.ReadRow();
// While more rows remain...while (reader.ReadRowStart())
{
// While more columns remain...while (reader.HasColumn)
{
// Get the next column valuestring columnValue = reader.ReadString();
} // While more columns remain...// Read the end of the row
reader.ReadRowEnd();
} // While more rows remain...
} // Using a CSV reader for the file MyFile.csv...

One problem Windows developers often deal with is that of long running operations. A common approach to this problem is to perform these long running operations
using the BackgroundWorker component. When using this component, it is usual to report progress. The CsvReader class exposes
a RowEnd event for this purpose. The event is raised each time the end of a row is read.

Writing with CsvWriter

The CsvWriter class generally provides methods which are, in effect, complements of their CsvReader equivalents. So, for example,
the WriteTable method is provided to write an entire DataTable. The WriteRow, WriteRowStart,
Write (a column), and WriteRowEnd methods are provided for more direct access to the file.

The CsvFormWriter class has examples of both of these usages.

The Write method provides many overloads for different data types. If none apply, the object version of the Write method is used.
This method uses the ToString method to convert to a string before writing.

File Format Variants

Both the CsvReader and CsvWriter classes provide support for many variants of the CSV file format. Principally, this is controlled
via the Options property which is of type CsvOption. The CsvOption enumeration specifies the Flags attribute,
so multiple options can be combined, as in the following example:

reader.Options = CsvOption.Comment | CsvOption.CommentLine;

Also, options are provided for special characters. For example, the user may specify the character that separates columns. To read a file containing tab separated values,
for example, the user might do the following:

reader.EndColumnChar = '\t';

Generally, the properties for these options and characters should be set immediately after instantiation and before the CsvReader or CsvWriter is used.

Points of Interest

Because of the unusually wide variety of options / methods, and the time demands of my job, I have not tested every possible combination. I have been a bit selfish
and tested the portions I used in the demo project and those that were required for my own needs. Hopefully, I haven't missed too much in my testing, but if I missed
something that affects you, the reader, I apologize in advance.

Actually there already is such a thing, the OleDbDataReader which when connected to a connection with the right connection string even allows you to execute SQL statements against a CSV file... I don't think there is a writer though that goes with that.

My focus was on trying to cover as many variants of CSV as I could. Also, to provide an interruptible interface for progress reporting / cancellation. Mostly, to meet my own requirements. My only real nods to ADO, which I don't require, were simple mechanisms for reading DataRow and DataTable.

However, in retrospect, with a different focus, I see the value in what you suggest. Perhaps, if I can find time later, I'll implement a data provider. A great many of the functions for IDataReader are already in the code, albeit under different names.

When I implement it, I'll probably do it as a separate class, CsvDataReader, that decorates CsvReader. The underlying IDataRecord interface makes assumptions, like the FieldCount property, which require the entire row to be read ahead of time. CsvReader does not currently have this limitation.

I think, if I can find a few minutes someplace, CsvDataReader should be trivial to implement. Finding those minutes is the tricky part

good idea! Don't do it on my account, but I will probably give it a spin in any case, since I have lots of occasion to grok CSV data. The real reason i usually stick with the OleDB version is because I can use the DbProvider factory and just forklift the connection strings to hit a completely new datasource.

I have always balked at the 'first row must be read before reading any rows' problem with datareaders over CSV... it's like 'almost' a stream, but you have to have read a line before you can have read anything, but didn't really read anything... sheesh. I quite see your point and I have noticed the same thing before.