Friday, June 29, 2007

In an earlier article, I discussed using ActiveX Data Objects (ADO) to access a Microsoft Access database. A reader commented that ADO can also be used to access data in an Excel worksheet. Here's a brief demonstration...

As usual, we'll use the win32ole library:

require 'win32ole'

Create a new ADODB.Connection object:

connection = WIN32OLE.new('ADODB.Connection')

To open a connection to your Excel workbook, we'll call the Connection object's Open method and pass it a connection string. You can use same the Microsoft Jet driver used for accessing an MS Access database, but we need to append an "Extended Property" to specify that this is an Excel woorkbook:

When calling the RecordSet object's Open method, pass it your SQL statement and the open connection object. When working with an Excel worksheet as your table, append '$' to the worksheet table name and wrap it in brackets:

recordset.Open("select * from [Sheet1$];", connection)

The Recordset object's GetRows method returns an array of columns (not rows, as you might expect), so we'll use the Ruby array's transpose method to convert it to an array of rows:

data = recordset.GetRows.transpose

Close the Connection object by calling its Close method:

connection.close

There you have it! My thanks to reader Khaoz for the suggestion of using ADO with Excel.

Other articles about working with ADO can be found under the ado label to the right.

As always, feel free to post a comment here or email me with questions, comments, or suggestions.

I am getting null values while reading excel cells using ADODB, if excel table column has different sets of data.(Ex: Numeric in one cell, string in another cell). How to handle this situation? I tried with IMEX=1 property, but no fruits.

Hi David,Thanks a lot for sharing this with us. I have been trying to automate a few things that involves querying the excel sheet, just as your article demos. However, I observed that, in some cases, specially in number comparisons, the query won't work because it expects that column to be formatted as a number. But when I format that as a number, the query finds nil values in those cells. It occasionally works fine. Is there a work around for this?

Hi David!I'm a big fan of your site!I'm trying to automate some test case reporting from multiple Excel 2007 spreadsheets, but they're saved in .xlsx format. When I try to open the connection with the example above, I get an error message saying "External table is not in the expected format". Do I need to change the Extended Properties=Excel 8.0 setting?