Introduction

This code demonstrates how to access Excel spreadsheets using C++, COM and ATL using Visual Studio .NET 2003 (v7.1).

Background

I'm an old school C++ programmer and much of my work revolves around using C++. Although I'm not opposed to using new methodologies like C# (and have in fact for other consulting gigs) I always like to come back to C++.

First, the reason why I needed to access Excel in the first place.

My company is developing a product for Windows 2000 and XP and needs to get its strings translated. The Powers-That-Be have decided that the files we'll be delivering to our translator will be Excel files, one per project. All well and good.

But I'm kind of lazy and I don't feel like maintaining by hand a spreadsheet littered with translation strings. I'd much rather write a program that extracts the strings (which reside in .xml files) and stuff them into the Excel spreadsheet. Well, the first part was easy. Just use the handy-dandy IXMLDOMDocument COM interface. Piece of cake.

Then I started exploring how to access Excel programmatically via C++.

Oh... my... God! The lack of documentation on this subject wasn't just skimpy. It was ludicrously non-existent. There were a few scant articles about how to use MFC to do it (don't try it, unless you like generating a billion little modules over the course of a few minutes while the MFC import wizard merrily generates IDispatch code wrappers for Excel's many interfaces). But I'd abandoned MFC long ago to worship at the altar of ATL and WTL, and never looked back.

After hours of search I managed to find one link that talked about where the various interfaces are stored for the various Excel versions. Since I have Office 2003 installed, I went with that one. (It's stored in the actual Excel.EXE binary, by the way).

"Great!" I said to myself. Just add an #import line to stdafx.h and I'm done.

#import "C:\\Program Files\\Microsoft Office\\OFFICE11\\EXCEL.EXE"

Yeah, right. Boy, was I in for disappointment. You get missing, duplicate, triplicate, googleplicate error messages, warnings, curses and general barking behavior from the compiler and/or linker. Time to roll up the sleeves and get ready to wrestle with include files and #import directives.

The first thing I noticed was that a lot of the missing classes had 'mso' prefixes. Hmmmm... Wonder if that means Microsoft Office? Sure enough, it appears that all of the Office products require the same MSO.DLL to perform a lot of their work. So add that in.

Well, that helped a lot. But it was still barking about 'definitions' not having enough parameters. It seems that a lot of the properties, method names, etc. in Microsoft Office collide with standard C/C++ #define declarations. Fortunately, the Visual Studio hierophants over at Microsoft anticipated this and provided a mechanism for resolving these, namely the 'rename' option for the #import directive. It also appears that Office is deeply in bed with Visual Basic, because you need an #import for that as well.

After all was said and done, I finally came up with the magically alchemical formulae for including the COM interfaces of Excel into a C++ program.

I tried getting fancy and including a 'using namespace Excel;' as well, but the compiler started making growling noises while pawing the ground and I decided to quit while I was ahead. I can type in a few 'Excel::' prefixes to keep things social.

Using the code

The application is a simple WTL dialog based applet. The dialog contains a single list-view control in report mode. It's compiled with Unicode so I don't have to worry about messy MBCS strings. All the code that interfaces with Excel is in the OnLoad( ) method that gets invoked when the user presses the "Load..." button.

The first thing this method does is nuke the items in the list control.

m_list.DeleteAllItems( );
while ( m_list.DeleteColumn( 0 ) );

Then it prompts the user to select the Excel file by using the WTL class CFileDialog. WTL, by the way, if you aren't altogether that familiar with it, is a truly barebones wrapper (with a few notable exceptions) around the Win32 API. Case in point. The CFileDialog takes a pointer to a series of null terminated strings that define the filters to help the user with the file selection process. These are somewhat of a pain to define on the fly so I wrote a simple class that takes a single string with vertical bars (|) that are placeholders for the null terminators. It has an operator LPCTSTR() method that returns a pointer to a string buffer with the vertical bars replaced with their null terminators.

Now to the meat of the method. First we need a pointer to the Excel application. If you run this in the debugger you'll notice a couple second pause when tracing through this code as Excel is loaded by COM. I took this to be a good sign the first time I debugged the program: I was actually invoking Excel! (I always step through lines of new code in the debugger the first time I run a program, don't you?)

If you leave out even one of those varOption parameters, it won't work. Gee, I wish I knew what they all did.

Next I get the pointer to the first Sheet in the workbook. Naturally the index to the first Sheet is 1, not 0. (Remember: Office is in bed with Visual Basic. MPEG at 11mps on BitTorrent!)

Excel::_WorksheetPtr pSheet = pBook->Sheets->Item[ 1 ];

Phew! Finally, we can start sifting through the spreadsheet data. Pretty near as I can figure out, this is all done via the Range object. Now the Range object, if used in a scripted language like C# or Visual Basic is a bold, wonderful thing with playful interfaces and powerful methods. In C++, it's like arguing with Zeus while he's fingering his Aegis.

The basic method for using it, I found, was just to oversupply it with a range. I then used a hack by scanning for empty cells as an the end-of-column or end-of-row marker. I'm sure there are much more elegant ways of using this object in C++, which I carefully leave as an exercise for the student.

First thing I do is fetch the first row, which I'm assuming is the header row delineating all the column names.

The code scans this row via the Item operator of the Range object. I go through a two-step process of first stuffing the cell contents into a _variant_t and then into a _bstr_t because I'm lazy and don't want to check the type of the _variant_t.

I use a simple nested loop to create the rows in the list-view control and set the column data.

At the end, I close the book with a VARIANT_FALSE to prevent any inadvertent changes from creeping into the .xls file.

pBook->Close( VARIANT_FALSE );

Finally, I Quit the Excel application. This is an important step because if you neglect to do it, Excel will be left floating in memory and worse yet, it will keep a lock on the file you opened above. Yeck!

pApplication->Quit( );

That's the lot! Have fun!

Points of Interest

Oh, yeah. What if you want to modify data in the spreadsheet? Well, that turns out to be quite simple. Just use the Item property as a left operand.

History

First revision.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.