In this article

OfficeTalk: Dive into SpreadsheetML (Part 1 of 2)

05/23/2014

9 minutes to read

In this article

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.

Summary: SpreadsheetML is an XML dialect developed by Microsoft to represent information in an Excel workbook. This column, the first of a two-part series, explores the concept, schema definition, and some scenarios in which it makes sense to programmatically generate Excel workbooks using Office 2003 and SpreadsheetML. (10 printed pages)

Introduction to SpreadsheetML

Have you ever thought about all the data that is stored in Microsoft Office Excel workbooks around the world? Excel workbooks are an extraordinary data store, but they also offer great potential as a resource of dynamic information that can move across data-management workflows and business processes.

The future of Microsoft Office development is moving toward greater support to integrate data with documents and workbooks, and it is important for developers to understand the XML story. I have been talking with some developers who are exploring development possibilities offered by Excel, and I decided to write this column for all of you who want to dive into SpreadsheetML to learn more about the concept, recommended usage scenarios, and the steps you need to follow to programmatically create Excel workbooks.

Because this is a long topic, I decided to split this column into two parts:

This article, Part 1, explores the concept and the schema definition of SpreadsheetML. Part 1 also explains some scenarios in which you might consider using SpreadsheetML to programmatically generate Excel workbooks.

Exploring SpreadsheetML

SpreadsheetML is an XML dialect developed by Microsoft to represent the information in an Excel workbook. SpreadsheetML allows you to save Excel workbooks as XML documents and to open them in Excel. Microsoft created a format that allows you to save, in an XML-based file, almost every Excel customization (including formulas, data, and formatting). Microsoft created SpreadsheetML to represent core spreadsheet models, so the following embedded objects cannot be represented using XML:

In this column, I start by walking you through a simple "Hello World" sample to explain what kind of information is stored inside Excel when you save your files as SpreadsheetML. To accomplish this task, you will create a simple Excel file, save it as an XML Spreadsheet file, and open it in a text editor to review the structure of the file.

To create a simple SpreadsheetML sample file

Start Excel.

Type Hello World into cell A1, as shown in Figure 1.

Type 1 into cell B1, as shown in Figure 1.

Type 2 into cell B2, as shown in Figure 1.

Type the formula =SUM(B1:B2) into cell B3, as shown in Figure 1.

Figure 1. Create a simple SpreadsheetML sample file

On the File menu, click Page Setup.

On the Page tab, under Orientation, select the Landscape option, as shown in Figure 2.

Figure 2. Select Landscape page setup

On the File menu, click Save As.

In the Save as type list, select XML Spreadsheet (*.xml).

In the File name box, type a new name for the workbook (for example, SpreadsheetMLDemo.xml), as shown in Figure 3.

Figure 3. Save the file as an XML Spreadsheet

Click Save.

After you create the XML Spreadsheet file, you can open it by using a text editor program.

To open and explore the XML Spreadsheet file

Open a text editor program, such as Notepad, and then open the SpreadsheetMLDemo.xml file.

The first line that appears in the file represents the XML declaration that all XML files must have:

<?xml version="1.0"?>

The second line is a processing instruction that defines the document as an Excel Spreadsheet file:

<?mso-application progid="Excel.Sheet"?>

The third line introduces the Workbook root element. This element stores characteristics and properties of the workbook, such as the namespaces used in SpreadsheetML.

This namespace defines elements and attributes used to describe more complex features of Excel, such as PivotTables, worksheet options, and validation.

XML Spreadsheet 2000

http://www.w3.org/TR/REC-html40

This namespace references the W3C HTML 4.01 specification.

Note

Depending on the Excel Spreadsheet file that you create, the namespaces might change. For a complete reference of namespaces that can appear in SpreadsheetML, please review the Microsoft Office 2003 Edition XML Schema References.

The first child element of the WorkBook element is DocumentProperties. Office documents store metadata related to the documentfor example, the author name, company, creation date, and more. The XML representation of Excel workbooks stores this metadata in the DocumentProperties element.

The next element is the Worksheet element, the heart of Excel XML workbooks:

<Worksheet ss:Name="Sheet1">

Within a Worksheet element, a hierarchy of elements defines a worksheet:

Table
Column
Row
Cell
Data

As you can see in this example, the first child element is Table. This element keeps all the information related to the table that belongs to the worksheet and holds the row and column elements. The Table element has attributes that define the column count (ss:ExpandedColumnCount) and row count (ss:ExpandedRowCount) of the table.

Nested inside the Table element, you see a Column element that defines column-level properties and a Row element for each row of cells defined in the worksheet. The Row element contains a Cell child element for each cell of a row. The Cell element contains information for an individual cellfor example, the data type and the value stored in a cell.

After the Table element is the WorksheetOptions element, which stores information related to the operation and presentation of the worksheet. When you built this sample file, you set the page orientation to Landscape. This kind of information appears inside the WorksheetOptions element.

You see a Worksheet element for each worksheet stored in the Excel workbook. By default, new Excel files always contain three worksheets. For this reason, you see three Worksheet elements inside your file.

In a production environment, Excel workbooks are more complex than the one I just used to explain the content of an XML Spreadsheet file. The important thing is that you understand the kind of information stored in the Workbook, Worksheet, Row, Cell, and Data elements. With this understanding, you can either create an Excel file or extract information stored in an XML Spreadsheet file to process it with extra business logic and then save it in a different data store.

Why Use SpreadsheetML?

I have seen articles, white papers, blogs, and code samples that explore different approaches to generate Excel workbooks or to extract data from workbooks. As a developer, I am faced with a common question: What is my best option? Here is a list of available options you can use to insert or extract data from an Excel workbook:

Define an Excel workbook as an OLE DB data source, and extract or pump information by using OLE objects such as OleDbDataReader and OleDbCommand.

Use the XML support provided by Microsoft Office XP and Microsoft Office 2003 to save, create, and open Excel workbooks using SpreadsheetML.

There are certain scenarios in which it makes sense to choose one approach over all others. An XML Spreadsheet file is a text document, and many tools across a variety of platforms support XML technologies. You might consider using SpreadsheetML for integrating data into Excel if:

You have a Web-based application or Web service that needs to repeatedly extract data from Excel files to process information or store it in a database or repository.

Your application needs to use a text-based messaging service, such as e-mail, or a mobile device.

Your application needs to generate XML-based files that will be processed by other applications (such as Microsoft BizTalk Server) or other cross-platform systems.

You need to avoid the use of Automation by way of COM or primary interop assemblies to connect to Excel workbooks in order to insert and extract information.

You need to be able to open the files in a text-based editor for further processing.

Figure 4 shows how you can build tools that use SpreadsheetML to extract data from sources such as XML documents, databases, and different systems and applications to generate Excel workbooks. You can also build tools to extract data from Excel workbooks to process and transform information and send it to messaging systems (e-mail and mobile devices), other XML documents, databases, and cross-platform systems and applications.