The Problem

After contributing that article about writing into an Excel file I got tons of requests about how to read from one. Well, you asked for it...

The main problem is that you can´t read an Excel file without previously having some formatting done. Microsoft refers to this in one of their KB papers. If somewhere out there finds there´s a way to do the reading whithout the formatting, please let me know...

Another problem is the DSN you need to have installed in your ODBC Admin. This is not very useful because you don´t always know the name of the Excel file from the start.

The last problem I´m dealing with here is generally doing ODBC reading using CRecordset without deriving from it. That is because if I always have to create a class for every single table I want to use, I´ll end up with lots of rather unnecessary code enlarging my app´s exe.

The Solution

According to Microsoft, an Excel sheet of version 4.x and later can only be read by ODBC if a database range is defined. Unfortunately they don´t state how to do this exactly. One way to let ODBC know what data is in there is to name a range of data on a worksheet using "Insert/Names" from Excel´s menu. There can be more than one "table" on a worksheet. This means that a sheet isn´t necessarily the same as a table in a "real" database. If you open "ReadExcel.xls" from the attached demo project and look up the names, you´ll see what I mean...

Omiting the DSN tag in the connect string of CDatabase:Open() gives the opportunity to refer the ODBC-Driver directly using its name so we don´t have to have a DSN registered. This, of course, implies that the name of the ODBC-Driver is exactly known. If it isn´t, a call to SQLGetInstalledDrivers() will show all the installed drivers. For an example see CReadExcelDlg::GetExcelDriver() below.

To use CRecordset the plain way you have to use a readonly, foreward only recordset. The data to get is defined by the SQL statement you put into CRecordset::Open(). Reading out the result is done by CRecordset::GetFieldValue(). For an example see the code below.

Drawbacks

Using a pseudo DSN only works with ODBC Admin V3.51 and higher. Earlier versions will not be able to use a DSN that actually isn´t installed. The result of an attempt to do so is some mumbling about missing registry keys.

If using an underived CRecordset it needs to be readonly, foreward only. So any attempts to change the data or to move back will fail horribly. If you need to do something like that you´re bound to use CRecordset the "usual" way. Another drawback is that the tremendous overhead of CRecordset does in fact make it rather slow. A solution to this would be using the class CSQLDirect contributed by Dave Merner at http://www.codeguru.com/mfc_database/direct_sql_with_odbc.shtml.

There´s still work to do

One unsolved mystery in reading those files is how to get the data WITHOUT having a name defined for it. That means how can the structure of the data be retrieved, how many "tables" are in there, and so on. If you have any idea about that I´d be glad to read it under almikula@EUnet.at (please make a CC to alexander.mikula@siemens.at).

The Source Code

// Query an Excel file
void CReadExcelDlg::OnButton1()
{
CDatabase database;
CString sSql;
CString sItem1, sItem2;
CString sDriver;
CString sDsn;
CString sFile = "ReadExcel.xls"; // the file name. Could also be something
// like C:\\Sheets\\WhatDoIKnow.xls
// Clear the contents of the listbox
m_ctrlList.ResetContent();
// Retrieve the name of the Excel driver. This is
// necessary because Microsoft tends to use language
// specific names like "Microsoft Excel Driver (*.xls)" versus
// "Microsoft Excel Treiber (*.xls)"
sDriver = GetExcelDriver();
if (sDriver.IsEmpty())
{
// Blast! We didn´t find that driver!
AfxMessageBox("No Excel ODBC driver found");
return;
}
// Create a pseudo DSN including the name of the Driver and the Excel file
// so we don´t have to have an explicit DSN installed in our ODBC admin
sDsn.Format("ODBC;DRIVER={%s};DSN='';DBQ=%s", sDriver, sFile);
TRY
{
// Open the database using the former created pseudo DSN
database.Open(NULL, false, false, sDsn);
// Allocate the recordset
CRecordset recset(&database);
// Build the SQL string
// Remember to name a section of data in the Excel sheet using
// "Insert->Names" to be able to work with the data like you would
// with a table in a "real" database. There may be more than one table
// contained in a worksheet.
sSql = "SELECT field_1, field_2 ""FROM demo_table ""ORDER BY field_1";
// Execute that query (implicitly by opening the recordset)
recset.Open(CRecordset::forwardOnly, sSql, CRecordset::readOnly);
// Browse the result
while (!recset.IsEOF())
{
// Read the result line
recset.GetFieldValue("field_1", sItem1);
recset.GetFieldValue("field_2", sItem2);
// Insert result into the list
m_ctrlList.AddString(sItem1 + " --> "+sItem2);
// Skip to the next resultline
recset.MoveNext();
}
// Close the database
database.Close();
}
CATCH(CDBException, e)
{
// A database exception occured. Pop out the details...
AfxMessageBox("Database error: " + e->m_strError);
}
END_CATCH;
}
// Get the name of the Excel-ODBC driver
// Contibuted by Christopher W. Backen - Thanx Christoper
CString CReadExcelDlg::GetExcelDriver()
{
char szBuf[2001];
WORD cbBufMax = 2000;
WORD cbBufOut;
char *pszBuf = szBuf;
CString sDriver;
// Get the names of the installed drivers
// ("odbcinst.h" has to be included )
if (!SQLGetInstalledDrivers(szBuf, cbBufMax, &cbBufOut))
return"";
// Search for the driver...
do
{
if (strstr(pszBuf, "Excel") != 0)
{
// Found !
sDriver = CString(pszBuf);
break;
}
pszBuf = strchr(pszBuf, '\0') + 1;
}
while (pszBuf[1] != '\0');
return sDriver;
}

Comments and Discussions

I ran the demo ReadExcel.exe directly on a seperate computer without visual studio 2010, OS win7. It is good. However, when i rebuild the program on another computer with visual studio 2010 and then ran it on a seperate computer again. An error message came out " The program can't start because mfc100.dll is missing from your computer. Try reinstalling the program to fix this problem."

I tried the project and it works fine.But, if a long text (longer than 256 chars) is in the data field in excel, GetFieldValue() function returns truncated(256 bytes) text. Is there anyway to support long data in the database?

I want to read data from excel sheet. I make two text box in my form name txtBEG and txtEND. I want to retrieve all dates between the begining date and end date whatever the user types in that text boxes. The connection is showing well. but I am facing lot of problem in it. I dont know why?

Everytime its showing an error messge "DataType mismatch" and the cursor is going to the data reader. can any one help me please in this regard by correcting the code.

***In my excel sheet i created the date field in date format like this 3/14/01(4th item in Type list box) and the excel is read only.

***I think the values are not converting in datetime value which the excel generates. Help me on this issue and correct the code please.....

This program reads only records that were written using program from article "http://www.codeproject.com/KB/database/excel_odbc_write.aspx". If you make any changes of content it doesn't change written data.

hi,i wrote a project that takes data from database and saves it as an excel document. i tried to run the project on different computer but without giving any error, it stops running. i copied all the dll files and i have the same framework that i compiled the project.is there any way to deploy an office 2000 project?thanks for your concern.

You mention in your article, that you canot read from az excel file without some formatting.I found this atrticle wich shows how to do this without formatting.

The link to this article is: http://www.idude.net/excel/articles/using_excel_file_datasources.asp

The info from the article:<<There are a number of ways you can reference a table (or range) in an Excel workbook:

* Workbook Name Use the Workbook Name followed by a dollar sign (for example, [Sheet1$] or [My Workbook$]). A Workbook table that is referenced in this manner consists of the entire used range of the Workbook. "Select [column_names] from [Sheet1$]" * Cell Range Use the Workbook Name followed by a dollar sign with cell range appended to it! Don't forget to use the colon : between the starting and ending cell positions! "Select [column_names] from [Sheet1$A1:B10]" * Named Range This is a range of cells with a defined name "Select [column_names] from [Defined Name Range]">>

Hi Everyone,I've been a member for sometime now and have been reading a few things on this forum. This is the first time that I'm posting a request. Hope you all gurus will respond.I want to read/write MS excel sheets on Unix/solaris platform using C++.I did go through the article posted by Alexander Mikula on how to use ODBC and read excel sheets using CRecordset.But I'm not sure how to use ODBC connection in Unix/solaris.Please tell me ways to do it unix using C++.

If Microsoft wanted to make their formats available (easily) to everyone no matter what platform they where on, they would have simply made it the easy way... That is, suppling an easy cross platform c/c++-api that you could link into your project.

I think its pretty obvious (with the excel-driver approach) that they don't want you to even consider using anything else than windows. (they also want you to pony up the dough for Microsoft(c) Office[tm])

Sorry for the one year late reply, I'm just very frustated that this sort of crap is allowed.

I need a program which can read more than 2 columns, but when i try adding to: sSq1= "SELECT Field1, Field2 FROM [Sheet1$]", it can only take Field1 and Field2. When I try adding a thrid one in... it just won't work. Can someone please help me with this?

This program reads only data that were written with program from article "http://www.codeproject.com/KB/database/excel_odbc_write.aspx"The table name for reading must correspond the table name in that programm. You can't change neighter name of written table nor any data from editor - only by writing program.

Alexander:I tried to run your demo readExcel on my PC and got an error message" Database error: Specified driver could not be loaded due to system error 182{Microsoft Excel Driver [*.xls}} "I don't know what that means, could you help?

When I read the FAQ's on these articles, the biggest problems seems to be the tables names. How can I know the table (sheetname) is demo_table?To provide a list of all tables/sheet in an excelfile, simply use the class CTable from the Microsoft DBFetch sample. (this sample project is on your visual C++ CD).With this table you can generate a list of all the tables in an excelfile.

I tried using the CTable class, but the class did not return any table names.I did not receive any error messages. The program returned EOF(), when I tried to iterate through the tables (my spreadsheet has data).

Curiously, I tried "Sheet1" for the table name (which is the name at the bottom of the spreadsheet) and that did not work either.

This program reads only data that were written with program from article "http://www.codeproject.com/KB/database/excel_odbc_write.aspx"The table name for reading must correspond the table name in that programm. You can't change neighter name of written table nor any data from editor - only by writing program.

Hi,I want to change the font style i.e. headers with bold and change some colors in the data tables (columns or rows) depending on the value. I dont know whether this can be done or not. Please guide me as i have not used excel before. I have to use MFC and i have excel 2003. Looking forward to a response. Thanks a lot in advance.

Thanks, Alexander, for your article.I'm using your sample ReadExcel.exe:http://adam.stup.ac.ru/Developer/ArticleDetail.aspx?ar=1053&l=n&mi=97&mic=139 (the same ODBC excel reader) to read ReadExcel.xls. And it works fine. But if I change the very first item "aaa" to number (ex. 1) it dissapears from output! Looks like ODBC RecordSet cell type depends on cell type from previous row (hidden in header row?).Can not figure it out how to "reset" RecordSet buffer before reading next row.Do you have any suggestions to fix this problem? I want read all data as a text only.

Have you solved this problem? I'm think I'm having a similar problem, except I'm doing number and not text. If the first line after the column title is missing data (which sometimes is the case because not all fields are required), then the rest of the column can not be read in.

Hi,as I need to do a project to convert MS excel file from client to HTML, i.e, update a web page by just uploaading a excel file.(it is done for the non-programmer). Can this source help my project and can anyone give me some comment!!thanks.

The example supports writing to and reading from Excel workbooks using the ODBC Excel Driver.

However, there is a major issue with the Excel ODBC Driver and mixed data types.

As excel columns do not have 'types', the odbc driver scans the first 8 rows to determine the data type of a column. Values that are not of the determined type will return NULL. The 8 rows value can be changed by changing the TypeGuessRows value in the registry.

So, if a column contains mixed data types you will lose data that is not of the determined type. The OLEDB driver for Excel has the IMEX=1 workaround for this problem but the ODBC driver does not seem to support this.

Therefore, if you want to use the ODBC driver you must make sure each column has only one data type or be prepared to lose some of your data when you import it with ODBC. The easiest way of forcing a column to contain only one data type is to set it to text: To force numbers to be entered as text right click a column->Click Format Cells->Set category to text. However, this will not change existing entries.

Some more information on this can be found at MS KB Article 257819[^] under 'A Caution about Mixed Data Types' (halfway down).

If you need the ability to retrieve mixed data types you may have to use ADO with the JET OLEDB provider and IMEX=1 extended property.

i had the same problem. but i found a solution:open the excel-sheet (with excel)and look at the top left listbox where thecoordinate of the cell is displayed.open that listbox and you will find the name of a data-filed.selet the data-filed name. now you can see all cells which belongto the data-field.

Got the answer to my question. Have to name a range from Insert->Name->Define

One strange thing is... cells in excel file are number type. Decimal place is 0. Still your app is getting a number with 1 decimal place.And when I changed the cell type to text, still the app is getting a number with 1 decimal place.

I managed to get around the insert name restriction by using the range as the table name instead. For the example spreadsheet ReadExcel.xls, I can use A1:B4 instead of demo_table in the select statement. That way, a user isn't required to set up a spreadsheet with a specific name before trying to access it.

It would be perfect for me if I could use the range in stead of a name but I have been having some problems. I always get a Database error: To few parameters.Expected 2. And I cant figure out what the problem is, do you have any ideas?

Make sure all of your columns have headings on them. The first row is being accessed by the ODBC driver and using that as the Field name. I kept getting that error, before I noticed there was a blank row inserted above my Header Row. Also make sure if there is a field name with a space in it that you refer to it like this [field with spaces]Hope this helps!

I tried using the Cell references like A1:B4 as suggested, but this uses those cells in the first sheet. Any ideas anyone how to refer to cells but also refer to the sheet name and/or number? I wish there was some help for this topic on the Microsoft side, however I haven't seen any fixes as of yet. Thanks!