Introduction

Two classes that illustrate one way to read a delimited text file, parse the
"fields" of data using regular expressions, and move the data into
either an XML file or a DataSet object for direct use.

.NET Framework classes used:

System; // For strings and things

System.IO; // For reading and writing streams and files

System.Xml; // For creating and writing the XML file

System.Text.RegularExpressions; // For parsing the text file

System.Data; // To generate a DataSet

Concepts illustrated

Reading and writing files through stream objects

Parsing text using regular expressions

Generating a DataSet in memory from code and using it to fill a DataGrid
control

Generating an XML file from code
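
As a rough illustration of how these concepts fit together, here is a minimal sketch and not the article's actual classes. The class name DataSetSketch, the table name "Entries", and the column names are all hypothetical. A plain string.Split stands in for the regular-expression parsing, and DataSet.WriteXml is used as a shortcut in place of writing the XML by hand with the System.Xml classes.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetSketch
{
    // Reads delimited lines from any TextReader (for example a StreamReader
    // opened on a file) and loads them into a single in-memory table.
    // Assumption: no line has more fields than there are column names;
    // DataRowCollection.Add throws if it does. Shorter lines leave the
    // remaining columns empty.
    public static DataSet Load(TextReader reader, string[] columnNames, char separator)
    {
        DataSet dataSet = new DataSet("Log");
        DataTable table = dataSet.Tables.Add("Entries");
        foreach (string name in columnNames)
        {
            table.Columns.Add(name, typeof(string));
        }

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                continue; // skip blank lines
            }
            table.Rows.Add(line.Split(separator));
        }
        return dataSet;
    }

    // Serializes the DataSet to an XML string; a StreamWriter could be
    // passed to WriteXml instead to produce a file.
    public static string ToXml(DataSet dataSet)
    {
        using (StringWriter writer = new StringWriter())
        {
            dataSet.WriteXml(writer);
            return writer.ToString();
        }
    }
}
```

In a Windows Forms application, the resulting table (dataSet.Tables["Entries"]) could then be assigned to a grid control's DataSource property to display it.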

Background

The reason I wrote these classes is twofold:

I needed to write an application that would parse a web server log file
(in W3C common log format) and put that data into a SQL Server database.

I needed a class that I could re-use in other applications where it was
necessary to move data from a CSV text file into a database.

Using the code

Although very short, the code is heavily commented throughout and, where
relevant, links to the MSDN articles that explain in more detail the .NET
class being used at that point in the code.

This code is set up to parse a web server log file, but it can easily be
modified to parse any delimited text file, and I've indicated in the comments
where to do so. I've also included a commented-out line with an alternate regular
expression that can be used to parse comma-delimited text files.
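
For readers without the download at hand, the following is a minimal sketch of what the two kinds of expression might look like. Both patterns are my own assumptions and not the ones used in the article's source. The class name LogLineSketch and its method names are hypothetical as well. The first pattern assumes the usual layout of a common log format entry (host, ident, user, bracketed timestamp, quoted request, status, bytes). The second treats a comma as a separator only when it is not inside double quotes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogLineSketch
{
    // Assumed pattern for one log entry:
    // host ident user [timestamp] "request" status bytes
    private static readonly Regex LogPattern = new Regex(
        @"^(\S+) (\S+) (\S+) \[([^\]]+)\] ""([^""]*)"" (\d{3}) (\S+)$");

    // Assumed pattern for comma-delimited text: a comma is a separator
    // only when an even number of double quotes follows it on the line.
    private static readonly Regex CsvSeparator = new Regex(
        @",(?=(?:[^""]*""[^""]*"")*[^""]*$)");

    // Returns the seven fields of a log entry, or null if the line
    // does not match the assumed layout.
    public static string[] ParseLogLine(string line)
    {
        Match match = LogPattern.Match(line);
        if (!match.Success)
        {
            return null;
        }
        string[] fields = new string[match.Groups.Count - 1];
        for (int i = 1; i < match.Groups.Count; i++)
        {
            fields[i - 1] = match.Groups[i].Value; // group 0 is the whole match
        }
        return fields;
    }

    // Splits a comma-delimited line, leaving quoted fields (quotes included) intact.
    public static string[] SplitCsvLine(string line)
    {
        return CsvSeparator.Split(line);
    }
}
```

Switching between the two only changes which expression is applied to each line; the code that reads the file and stores the fields can stay the same.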

The demo includes a file, samplelog.txt, that contains a test web
server log. I have mangled the IP addresses for privacy, but otherwise the
data is straight out of an Apache server log from our web server.

I've recently started using C# after many years of working in C++, so any
constructive criticism would be welcome.

History

Original version: Feb.26.2003

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

What I wrote was: "In that case you would probably want to do it the original way in the article which is to look for the separator character instead of the text between separators."

If you do it the way it is originally written in the article instead, you have full control over handling missing separators yourself.

Not to sound too harsh here, but I'm not sure why people keep asking about this because there is obviously no magic way to handle badly formed source data.

If the input data is missing some fields, it's just missing some fields. There is nothing you can do about that except program for the situation: count the number of fields you expect as you parse each line of text, and if some fields are missing, deal with it in your code. Either abort and warn the user or fill in something yourself.

The original way the parsing was done in the article is very handy for handling the missing fields situation. If your input data is known to be consistent with no missing fields then you can use the much faster second method I mentioned in these threads.
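
To make the difference between the two approaches concrete, here is a minimal sketch. It assumes a comma separator, and the class and method names are hypothetical, not code from the article. Splitting on the separator keeps empty fields in place, so the field count can be checked and short lines padded. Matching the text between separators silently skips empty fields, so it is only safe when the data is known to be complete.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldCountSketch
{
    // Approach 1: look for the separator. Empty fields survive as empty
    // strings, so a short line can be detected by counting.
    public static string[] SplitOnSeparator(string line, int expectedFields, string filler)
    {
        string[] fields = Regex.Split(line, ",");
        if (fields.Length > expectedFields)
        {
            // "Abort and warn the user"
            throw new FormatException("Too many fields: " + line);
        }

        // "Fill in something yourself" for trailing fields that are missing.
        string[] padded = new string[expectedFields];
        for (int i = 0; i < expectedFields; i++)
        {
            padded[i] = i < fields.Length ? fields[i] : filler;
        }
        return padded;
    }

    // Approach 2: match the text between separators. An empty field produces
    // no match at all, so its position in the line is lost.
    public static string[] MatchBetweenSeparators(string line)
    {
        MatchCollection matches = Regex.Matches(line, "[^,]+");
        string[] fields = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            fields[i] = matches[i].Value;
        }
        return fields;
    }
}
```

For the line "a,,c", the first method returns three fields with an empty string in the middle, while the second returns only "a" and "c", with no indication of which field was missing.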


Welcome to C#! Hope you enjoy it. After programming in C++ since the early days of Windows 3.0, I switched all my development over to C# about five months ago and really love it for the most part. There are a few things missing (such as generics, AKA templates, which will be solved in the next release), and that does hinder an old C++ programmer at times, but for the most part it has put the fun back into programming. Just wish I could figure out a way to skip sleeping; too many things to learn and play with.

Thank you. That's exactly how I feel about it myself. I need to program to get things done in the real world on a daily basis and I've been finding programming in C# excellent for very quickly getting things done without sacrificing much power.

Yeah, it gives you the ease of VB, most of the power and of course a lot of the syntax of C++, and an object-oriented OS. The most important thing to me, though, is that with a lot of trivial details taken care of, it gives me the ability to create more robust programs, because I have the extra time to build them. It is like a painter who could only focus on one square inch at a time, unable to see the rest of the image or move on to the next piece until that one piece was done, now being able to view the entire picture as he paints. The overall appearance takes on an entirely new meaning.

I guess what I am getting at is that I now think on a much larger scale since I don’t have to waste all that time on the little trivial parts. A serious n-tier application may take six months but at the end of that six months I have four times the features I would have in another platform and most likely it will be much more scalable and more connected than I would have in MFC/C++.

Plus, FILESIZE! I get shocked at how small the applications turn out to be. Even large applications are only a couple of hundred K. A meg-size application is huge. Easy to keep people upgraded over the web without even messing with compression.

Oh yeah, and now it is SO easy to work in a modular format (DLLs are nothing to link with and you don’t have to worry about DLL signatures throwing things out of whack) that I find I split applications up more, to create a more modular approach instead of one large exe. That makes it easy to upgrade individual modules.

Must be late at night, I seem to be running on. Just glad to hear someone else is infected with the C# fever.