Excel has a very cool feature where you can declare that a range of cells is a table. It is a feature that allows you to use Excel very much like a database. You can add new rows as necessary, sort the table by columns, do some simple filtering, calculate the sum of columns, and more. Each table has a unique table name, and each column has a column name. Because these tables are stored in Open XML documents, we can implement some simple extension methods and some classes so that we can query these tables using LINQ in a manner that is similar to querying a SQL database. This post presents a bit of code to do this. The code and sample documents are attached to this post.

Blog TOC(Updated July 21, 2010 - Fixed a bug where the code would return the wrong value for cells in the table if the columns had been moved.)

Note: this code is presented as an example - a proof-of-concept. This code could be further optimized, so that it performs better (although it performs quite well as is). And it may be interesting in the future to modify the code to use a strongly-typed approach – as the code is currently implemented, if you misspell a table or column name, the code throws an exception. However, this code is useful as is for doing ad-hoc queries of Excel tables. I certainly will be using it! :-)

This code uses the Open XML SDK, either V1, or the CTP of V2. You can download V1 of the SDK here. You can download CTP1 of V2 of the SDK here.

Thanks to Brian Jones who suggested this project.

Following is a screen clipping of an Excel spreadsheet that contains a table:

You can see the four columns of this table: Item, Qty, Price, and Extension. In addition, in the Design tab of the ribbon, in the far left box, you can see that this table has a table name of “Inventory”. Using the code presented in this post, you can query this table as follows:

The table class (returned by the Table method) has a TableColumns method that iterates the columns in the table:

// list all of the columns in the Inventory tableConsole.WriteLine("Table: Inventory");foreach (var c in spreadsheet.Table("Inventory").TableColumns())Console.WriteLine(" {0}", c.Name);

When you run this code, you see:

Table: Inventory Item Qty Price Extension

The LtxOpenXml Namespace

Some time ago, I wrote some code that enabled querying Open XML spreadsheets using LINQ to XML, presented in the blog post ‘Open XML SDK and LINQ to XML’. I’ve added the code to query tables to the code presented in that post. The extension methods that enable querying tables make use of that code. The enhanced LtxOpenXml namespace now contains code for:

Querying word processing documents

Querying spreadsheets

Querying tables contained in spreadsheets

The code for querying word processing documents and spreadsheets is unmodified. Refer to the above mentioned blog post for details on using those extension methods.

The code that enables querying of spreadsheet tables is, of course, written in the pure functional style. No state is maintained, and all methods to query the document are lazy.

If you have questions about how to write functional code (like the code that implements the extension methods and classes associated with this post), go through this Functional Programming Tutorial.

I’ve provided a summary of the types and extension methods included in the LtxOpenXml namespace at the end of this post.

Use of Data Types

Here’s another example of a table that contains a few more columns with more data types:

Each row returned by the TableRows method is a collection of TableCell objects. I’ve defined explicit conversions between TableCell and some of the most common .NET types, so that you can simply cast a TableCell to your desired type. Here’s a query to list all vehicles in the table:

I’ve imported the Customers and Orders from the Northwind database into a spreadsheet, where the Customers table is in one sheet, and the Orders table is in another sheet within the worksheet. Here is the Customers table:

And here is the Orders table:

We can now write a query that joins the customers and orders tables:

using (SpreadsheetDocument spreadsheet =SpreadsheetDocument.Open(filename, false)){// list all of the columns in the Customer tableConsole.WriteLine("Table: Customer");foreach (var c in spreadsheet.Table("Customer").TableColumns())Console.WriteLine(" {0}", c.Name);Console.WriteLine();

// list all of the columns in the Order tableConsole.WriteLine("Table: Order");foreach (var o in spreadsheet.Table("Order").TableColumns())Console.WriteLine(" {0}", o.Name);Console.WriteLine();

This class contains a number of properties about the table. In addition, it contains two methods, TableColumns, which returns a collection of TableColumn objects (the columns of the table), and TableRows, which returns a collection of TableRow objects (the rows of the table).

TableCell Class

This class represents a cell of a row of a table. It implements IEquatable<T> so that you can do a value compare of two cells. It also implements a number of explicit conversions to other data types so that it’s easy to deal with columns of various types. Its definition:

Excel has a very cool feature where you can declare that a range of cells is a table. It is a feature that allows you to use Excel very much like a database. You can add new rows as necessary, sort the table by columns, do some simple filtering, calculate...

One of the most common scenarios for Open XML is programmatically adding, deleting, and moving paragraphs in a word processing document. A variation on this is moving or copying paragraphs from one document to another. This programming task is complicated...

Debugging LINQ queries can be problematic. One of the reasons is that quite often, you write a large query as a single expression, and you can’t set a breakpoint mid-expression. Writing large queries in expression context is particularly powerful...

This post contains a detailed description of each of the PowerTools for Open XML cmdlets. For an overview and screen cast of PowerTools for Open XML, see Automated Processing of Open XML Documents using PowerShell . For more information on the cmdlets...

A convenient way to explore Open XML markup is to create a small document, modify the document slightly in the Word user interface, save it, and then compare it with the Open XML Diff utility that comes with the Open XML SDK V2 . However, Word adds extraneous...

The ISO announced today the publication of the final text of ISO/IEC 29500, the Office Open XML specification. Doug Mahugh’s post on this has some relevant links regarding the process going forward.
This blog is inactive. New blog: EricWhite...

When writing queries, just as you sometimes want to skip the first n items in a collection, on occasion you want to skip the last n items. You could certain count the items remaining in the collection, and use the Take operator to take all but the last...

Visual Basic 9.0 added many language features that allow us to write in the functional style in a natural and expressive way. The value of programming in the functional style has been apparent to me for some time. VB developers can realize the benefits...

I recently posted some code that allows you to use LINQ to query Excel tables . The source for these queries is the Open XML document – you don’t need to involve the Excel application to query the data in these tables. In that post, I presented...

The Enterprise Engineering Center (EEC) has put together a paper on running SharePoint on Hyper-V . It has a lot of great information, including links to required reading on Hyper-V, configuring and optimizing Hyper-V, key updates to install, and disk...