Introduction

The main focus of this article is to load the contents of a DBF file into a DataTable. Sure, you can search the Internet and find a thousand examples of how to load a DBF in .NET. However, try to find one that does not use the MS Jet driver or perhaps the MS FoxPro driver. I tried myself and couldn't find one, so I decided to write a class to do it.

Of course, you may be asking yourself, "why would I want to do that?" In my particular case, I tried using the Jet and FoxPro drivers, and found that they stumbled when trying to load a date field from a DBF. It could be that the DBF I was using was not formatted to spec. I really couldn't say, and it doesn't matter since I don't have any control over the format; I simply had to find a way to read it.

What I have provided here is a simple class to read an entire DBF into a DataTable. From there, you can use .NET to manipulate the data. What I have not provided is any method to query the data; this loads the entire file and that's it. You probably don't want to use this if you have a DBF with tens of thousands of records or more. Mine has about 2500 records, and I know it would never be much more than that.

Background

There are two concepts I present in this article. The first is, obviously, the structure of a DBF file and how to load it. The second is a method to load the headers in a DBF directly into a structure.

This class also doesn't verify that the file is in fact a DBF. Where I am using this class, I am certain that the input is a DBF file. If you are not certain, you'll need to find a way to verify the file is a DBF.

I should also mention that I do not profess to be an expert on dBase DBF files. There are also some field types that I have probably left out of my reader. If you feel I am not taking something into account here that I should, or want to send me some info on missing field types, I will update my code.

Using the code

A dBase DBF file is a fairly simple file format, and it is one of the oldest PC file formats around for storing record-based information. Because of its simplicity, it has become something of a generic format that many applications can read and write.

A normal DBF consists of a main header, followed by column headers, followed by one or more records. To keep the code simple, we are using some tricks that allow us to read the headers directly into .NET structures. By tricks, I simply mean using some less common attributes and methods.

Let's take a look at the structures we'll use to represent the headers in the DBF.

When you define a structure type in C#, the CLR will organize it in memory in whatever way it thinks is most efficient. This is not very conducive to reading in the data straight from disk. To get around this, we add the StructLayout attribute with a parameter of Pack = 1. This tells the CLR that this structure should be aligned in memory exactly as we specify it such that an Int32 only uses 4 bytes, an Int16 only uses 2 bytes, etc.

If you did not specify this, you would end up with things like an Int16 using 4 bytes, and then the data in your file would not line up with the structure in memory.

The other attribute you'll notice is MarshalAs. .NET does not normally support fixed-length strings, but that is exactly what we need here. The header reserves 11 bytes for the field name, so we need to make sure that 11 bytes are also reserved for the field name in our structure. The same goes for the 7 bytes of reserved data.
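As a sketch of what such structures might look like (the reserved-byte counts and field order here follow one common dBase III layout; your DBF variant may differ, so treat the exact sizes as assumptions):

```csharp
using System;
using System.Runtime.InteropServices;

// Pack = 1 tells the CLR not to insert padding between fields,
// so the in-memory layout matches the bytes on disk exactly.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DBFHeader
{
    public byte version;        // file type / version byte
    public byte updateYear;     // date of last update (YY MM DD)
    public byte updateMonth;
    public byte updateDay;
    public Int32 numRecords;    // number of records in the file
    public Int16 headerLen;     // length of the header in bytes
    public Int16 recordLen;     // length of each record in bytes
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 20)]
    public byte[] reserved;     // reserved bytes we don't interpret
}

[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
public struct FieldDescriptor
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 11)]
    public string fieldName;    // column name: 11 bytes, null-padded
    public char fieldType;      // 'C', 'N', 'D', 'L', ...
    public Int32 address;       // field data address (unused here)
    public byte fieldLen;       // field length in bytes
    public byte count;          // decimal count for numeric fields
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]
    public byte[] reserved1;
    public byte workArea;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]
    public byte[] reserved2;
    public byte flag;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 7)]
    public byte[] reserved3;    // the 7 reserved bytes mentioned above
    public byte indexFlag;
}
```

With Pack = 1, Marshal.SizeOf should report 32 bytes for each of these structures, which matches the fixed 32-byte header and field-descriptor records in a standard DBF.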

Once you have the structures defined exactly as they appear in the file, you can read them in using something like this:

First, we read the header into a buffer. .NET doesn't really want us using pointers, but there is some support for something similar that allows us to load the structure directly from a buffer. We have to get a handle to the buffer that is marked as Pinned so the garbage collector won't move it around on us. Once we do that, we can use the PtrToStructure method to copy the data in the buffer directly into our structure. Don't forget to free/unpin the buffer when you're done!
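A minimal sketch of that read, assuming a BinaryReader named br is already open on the file and a DBFHeader structure is defined with Pack = 1 as described:

```csharp
// Read the main header into a byte buffer, then marshal the
// buffer's contents directly into the DBFHeader structure.
byte[] buffer = br.ReadBytes(Marshal.SizeOf(typeof(DBFHeader)));

// Pin the buffer so the garbage collector can't move it
// while we hold a raw pointer to it.
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
DBFHeader header = (DBFHeader) Marshal.PtrToStructure(
    handle.AddrOfPinnedObject(), typeof(DBFHeader));

// Unpin the buffer as soon as we're done with the pointer.
handle.Free();
```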

Now, we want to read in all the column descriptors. Each descriptor represents one column in the DBF table, and a byte value of 13 (0x0D) marks the end of the headers. We'll use the same method to read them directly into a structure.

// Read in all the field descriptors.
// Per the spec, 13 (0x0D) marks the end of the field descriptors
ArrayList fields = new ArrayList();
while (13 != br.PeekChar())
{
    buffer = br.ReadBytes(Marshal.SizeOf(typeof(FieldDescriptor)));
    handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
    fields.Add((FieldDescriptor) Marshal.PtrToStructure(
        handle.AddrOfPinnedObject(), typeof(FieldDescriptor)));
    handle.Free();
}

Once we've read in all the headers, we'll create our DataTable columns. My DBF only has four field types in it. If you have other types, you'll need to add them here.
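A sketch of that column creation, assuming the four types are character ('C'), numeric ('N'), date ('D'), and logical ('L') — a common set for dBase files, though the exact mapping is an assumption here:

```csharp
// Create a DataTable column for each field descriptor.
// Any field type not handled below falls back to string.
DataTable dt = new DataTable();
foreach (FieldDescriptor field in fields)
{
    DataColumn col;
    switch (field.fieldType)
    {
        case 'N': // Numeric
            col = new DataColumn(field.fieldName, typeof(decimal));
            break;
        case 'D': // Date
            col = new DataColumn(field.fieldName, typeof(DateTime));
            break;
        case 'L': // Logical (boolean)
            col = new DataColumn(field.fieldName, typeof(bool));
            break;
        case 'C': // Character
        default:
            col = new DataColumn(field.fieldName, typeof(string));
            break;
    }
    dt.Columns.Add(col);
}
```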

Finally, we will read in each of the records using the column format information. The main header tells us how many records there are and how long each record is. It appears that the length of a record may not exactly match what you'd expect by summing the column definitions. Therefore, we use the record length field of the header to read the entire record into a buffer, then we read each component out of the buffer into each column of a new DataRow.

The date parsing is the reason I wrote this class in the first place. I'm not sure if all other DBFs work this way, but it seems the one I had to load stores the date in a text format of YYYYMMDD, with a null date represented as 19000101 (January 1, 1900). I default the field to DBNull and check the year to see if I need to parse the date. I wrap the parse in a try block so that if it can't parse, it doesn't break the entire load. I wrote this in .NET 1.1 so anyone could load it, but in .NET 2.0 and later, there is an Int32.TryParse method that you could use to test the strings.

// Read in all the records
for (int counter = 0; counter < header.numRecords; counter++)
{
    // First we'll read the entire record into a buffer and then read each
    // field from the buffer. This helps account for any extra space at the
    // end of each record and probably performs better.
    buffer = br.ReadBytes(header.recordLen);
    recReader = new BinaryReader(new MemoryStream(buffer));

    // Loop through each field in a record
    row = dt.NewRow();
    foreach (FieldDescriptor field in fields)
    {
        switch (field.fieldType)
        {
            case 'D': // Date (YYYYMMDD)
                year = Encoding.ASCII.GetString(recReader.ReadBytes(4));
                month = Encoding.ASCII.GetString(recReader.ReadBytes(2));
                day = Encoding.ASCII.GetString(recReader.ReadBytes(2));
                row[field.fieldName] = System.DBNull.Value;
                try
                {
                    if (Int32.Parse(year) > 1900)
                    {
                        row[field.fieldName] = new DateTime(Int32.Parse(year),
                            Int32.Parse(month), Int32.Parse(day));
                    }
                }
                catch
                {
                    // Leave the field as DBNull if the date can't be parsed.
                }
                break;
            ...
        }
    }
    recReader.Close();
    dt.Rows.Add(row);
}
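On .NET 2.0 and later, the try/catch around the date parse could be replaced with Int32.TryParse. A sketch, using the same variable names as the record loop above (note the DateTime constructor can still throw on out-of-range month or day values, so a range check or a surrounding try would still be prudent):

```csharp
// TryParse returns false instead of throwing when the
// text isn't a valid integer, so no exception handling
// is needed for malformed date strings.
int y, m, d;
if (Int32.TryParse(year, out y) &&
    Int32.TryParse(month, out m) &&
    Int32.TryParse(day, out d) &&
    y > 1900)
{
    row[field.fieldName] = new DateTime(y, m, d);
}
```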

Conclusion

I hope that this proves useful for anyone else that may have had a problem with the Jet driver or perhaps just doesn't want to distribute MDAC with their application (MDAC 2.6 is required by .NET to use the OleDB Jet driver, but isn't included with Windows 2000). If you find any mistakes, please let me know and I will update my sample code.

Thanks for this, it's really helpful. In 2011, I converted it into a class using VB.NET, VS 2010, and .NET 4.0. It has a reference to your article in the code, so anyone looking at this in the future knows who to thank, and I do mean thank.

Our vendor's product uses DBF 3 files that we constantly need to manipulate into other formats: CSV, SQL, e-mail, etc. I've even added a method that returns an ADODB.Recordset, which was no big deal thanks to .NET 4.0. I've exposed the class through COM interop so I can use it from VB for Applications (VBA).

I just got around to cleaning things up and noticed I never sent you a thank you. Again, thanks.

Hello Brian,
this is good stuff.
I have to read older data files from Clipper and store them in SQL Server.
Even more than 2 million records work fine.
But (and there's always a but) the memo fields don't want to work.
Any ideas or hints?
Thanks and regards,
Thomas

I would suggest you use some type of hex editor to see what FoxPro is doing to the DBF file when you run that command. I'm not familiar with FoxPro myself, so I can't speculate about what is happening. By examining the contents of a small DBF with a hex editor and stepping through this code in Visual Studio, you should be able to figure out what is going on.

Hi,
DBF is a binary file format, so it cannot be UTF-8, which is a text encoding, I think.
In my opinion, the only thing that could be UTF-8 is the field name in the field descriptor, but in the DBF file format specification that field is ASCII only.

I agree; I don't think there is anything in the DBF header that could indicate a code page or character set, so it is assumed that the text is plain ASCII/CP1252. If you are creating a DBF with UTF-8 in it, that sounds like a non-standard encoding.

If you know all the files you will read use UTF-8, perhaps you could try using Encoding.UTF8.GetString() instead of Encoding.ASCII.GetString(). I'm not sure there would be a way to detect this, though, so you'd have to code it one way or the other and assume all the DBFs you'll read are the same.
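For example, the change to the character-field decode might look like this (a sketch; the rest of the reader would be unchanged, and the field-length handling is an assumption based on the record loop in the article):

```csharp
// Read the raw bytes of a character field once,
// then decode with whichever encoding applies.
byte[] raw = recReader.ReadBytes(field.fieldLen);

// ASCII decode, the default assumption in this reader:
string asciiText = Encoding.ASCII.GetString(raw);

// UTF-8 decode, if you know the producer wrote UTF-8.
// Note that DBF field lengths are in bytes, so a fixed-length
// field could end mid-way through a multi-byte sequence.
string utf8Text = Encoding.UTF8.GetString(raw);
```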

Sorry, I missed the record data itself!
I don't know if it's possible to store the data in UTF-8.
When reading the file, you would use Encoding.UTF8.GetString instead of ASCII.
But how can you be sure the data is UTF-8?

Hi Brian,
could you let me know which DBF file specification you used to define the structures?
If you look here, for example: http://www.dbf2002.com/dbf-file-format.html, you will see that the first seven fields of the DBFHeader structure are the same size as yours, but the others are not.
The same goes for the DBFFieldDescriptor structure.

When I search the web, I can find several different file format specifications.

This suggests that the first thing to do when opening a DBF file is to look at the first byte alone, without reading the whole header, because we don't know its exact size; then check the value of that byte and marshal the header into the right header structure according to that value.

I believe when I wrote this I was only concerned with a standard DBF file structure, so I did not pay much attention to the version byte in the file header. What you are saying makes sense, though: if you want to support different DBF formats created by other vendors, then you would need to check this version field and load according to the spec for that version of DBF.