Import CSV file and query it with LINQ

Assume that you have an plain text, old Comma Separated Values file filled with your precious export from a legacy system. How can you process it easily now? The first answer that comes to mind is to parse it and load it into a datatable and later process it by using DataTable.Select() method. But this approach has some limitations – like splitting data into several tables and then join them.

One would imagine that parsing CSV files is a straightforward and boring task, given that it is quite a while since CSV is around. Some of them are correct – in the sense that many implementations merely use some splitting method like String.Split(). Some don’t even offer the specification of the values splitting character – so your file wouldn’t be parsed correctly if instead of , you have ; as separator – yet another thing to modify if you’re lucky enough to have the sources. Others will not handle properly field values with commas because the simple split method of the String class. But there are better implementations that take care about escaped quotes, trimming spaces before and after fields and other small and useful details, but very few that I found did it all as I liked it – and at least as importantly, in a fast and efficient manner.

After trying several methods to parse the csv file, I endup using Matt Perdeck‘s LINQ to CSV library presented in its article on The Code Project website.

Among the feature that I used and liked very much I would mention (from the article):

In addition to comma, most delimiting characters can be used, including tab for tab delimited fields.

Can be used with an IEnumerable of an anonymous class – which is often returned by a LINQ query.

Supports deferred reading.

Supports processing files with international date and number formats.

Supports different character encodings if you need them.

Recognizes a wide variety of date and number formats when reading files.

Provides fine control of date and number formats when writing files.

Robust error handling, allowing you to quickly find and fix problems in large input files.

Using the library is straight-forward: after downloading the zip file from the The Code Project (this required that you have an account there) you can start using it by including it as a referenced library in your project.

Next step is to define the layout of your csv file, in a manner that CSVtoLINQ to be able to map the data from file (which is pure text) to correct data types :