Introduction

This application uses the Open XML SDK to find MERGEFIELDs in Microsoft Word documents and replace them with the provided data. It also supports adding tables with data. This makes it a fast and stable way of generating Microsoft Word documents server-side.

The main code consists of only one class with a few methods that do all the work. I've provided a front-end to test the functionality of the class.

To run the application, you must download and install the aforementioned SDK. Because the SDK targets .NET 3.5, the library only works with .NET 3.5 and above.

Background

For a customer project, I needed the ability to inject data from an XML file into a standardized document format. The customer still used Microsoft Office 2000 but had installed the Compatibility Pack on all his PCs.

I didn't want to use Microsoft Word through OLE automation, because this was a server-side process that ran unattended, and Microsoft doesn't recommend using Microsoft Office in such scenarios. But I remembered that the new docx format is just a zipped archive of loose XML files that can be edited. After some searching on the Internet, I found the Open XML SDK, which helps a lot with parsing the Microsoft Word document structure. Finally, I wrote a piece of code that fills a Microsoft Word docx file with the data from the XML file, which produced the required document.

Using this mechanism had the additional advantage that the customer could edit the layout of the template himself. Although it wasn't a requirement, it saved me a lot of time afterwards.

Using the Front-End

Along with the source code, a front-end application has been provided to allow you to test the functionality.

This application has been written using WPF and uses the DataGrid from the WPF Toolkit. To run the test application, you'll need to download and install the WPF Toolkit from CodePlex.

Of course, before you can test anything, you'll need a docx template. I've added a sample template to the zip file, but you can just as well provide your own (see the next section for details about the template).

In the main window, start by providing the full path of the template in the textbox at the top (as long as this field is empty, the Generate button stays disabled).

Add your fields and their data in the grid in the center of the window. To add tabular data, click the 'Add Table' button and define the table name and column names (a maximum of 5 columns). Click OK and provide the data for the table. Repeat this for each table.

Finally, click on Generate. Your report should appear automatically.

The docx Template

First of all, you'll need a Microsoft Word docx document with a number of MERGEFIELDs that act as placeholders for your data. The mergefields contain the name (code) of the data that you want to add, for example:

{MERGEFIELD CAND_NAME \* MERGEFORMAT}

There are also 3 suffixes that can be used:

dp: Deletes the paragraph if the data field is empty or wasn't provided

dr: (only in tables) Deletes the row if the data field is empty or wasn't provided

dt: (only in tables) Deletes the whole table if the data field is empty or wasn't provided

The suffixes are added to the field name, with a preceding '#'. For example:

{MERGEFIELD CAND_NAME#dp \* MERGEFORMAT}
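The suffix convention can be illustrated with a small parser. This is a sketch under assumptions, not the library's own code: the class name `MergeFieldParser` and its regular expression are hypothetical, and it assumes the instruction text has already been read from the field (for example from `SimpleField.Instruction` in the Open XML SDK).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not the library's actual code: extracts the field name
// and the optional #suffix (dp, dr or dt) from a MERGEFIELD instruction.
public static class MergeFieldParser
{
    // "MERGEFIELD", then the name up to whitespace or '#', then an optional "#suffix".
    // Anything after that (e.g. the \* MERGEFORMAT switch) is ignored.
    private static readonly Regex Pattern = new Regex(
        @"^\s*MERGEFIELD\s+(?<name>[^\s#]+)(?:#(?<suffix>\w+))?",
        RegexOptions.IgnoreCase);

    public static bool TryParse(string instruction, out string name, out string suffix)
    {
        name = null;
        suffix = null;
        Match match = Pattern.Match(instruction ?? string.Empty);
        if (!match.Success)
            return false;

        name = match.Groups["name"].Value;
        Group s = match.Groups["suffix"];
        suffix = s.Success ? s.Value.ToLowerInvariant() : null;
        return true;
    }
}
```

A field without a suffix yields `suffix == null`, so the caller can simply switch on "dp", "dr" and "dt".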

If you want to add tabular data to the Word document, you must add a table to the docx document. The cells of the table contain mergefields that indicate which data fields must be placed there. These mergefields are formatted as TBL_nameoftable_nameoffield. For example:

{MERGEFIELD TBL_LANG_NAME \* MERGEFORMAT}

The mergefield above tells the application that this cell contains the value of the NAME column of the current record in the LANG datatable. The application adds a row to the table for each record found in the datatable. (The suffixes are not supported on TBL_ fields, and each table cell can contain only one mergefield.)
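The row-repeating behaviour could look roughly like the following sketch. It assumes the Open XML SDK (the `DocumentFormat.OpenXml` package) and `System.Data`; `TableFiller`, `TryParseTableField` and `FillRows` are hypothetical names. It also assumes that table names contain no underscore, so `TBL_LANG_FIRST_NAME` is read as table `LANG`, column `FIRST_NAME`.

```csharp
using System;
using System.Data;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical sketch of the row-repeating idea; not the library's API.
public static class TableFiller
{
    // Splits "TBL_LANG_NAME" into table "LANG" and column "NAME".
    // Assumption: the table name contains no underscore (the column name may).
    public static bool TryParseTableField(string fieldName, out string table, out string column)
    {
        table = null;
        column = null;
        const string prefix = "TBL_";
        if (fieldName == null || !fieldName.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            return false;

        string rest = fieldName.Substring(prefix.Length);
        int separator = rest.IndexOf('_');
        if (separator <= 0 || separator == rest.Length - 1)
            return false;

        table = rest.Substring(0, separator);
        column = rest.Substring(separator + 1);
        return true;
    }

    // Inserts one copy of templateRow per record, replacing each TBL_ field
    // with the record's value, then removes the template row itself.
    public static void FillRows(TableRow templateRow, DataTable data)
    {
        foreach (DataRow record in data.Rows)
        {
            var row = (TableRow)templateRow.CloneNode(true);

            // ToList() first: removing fields while enumerating Descendants() is unsafe.
            foreach (SimpleField field in row.Descendants<SimpleField>().ToList())
            {
                string instruction = field.Instruction == null ? "" : field.Instruction.Value;
                string[] words = instruction.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
                string table, column;
                if (words.Length < 2
                    || !TryParseTableField(words[1], out table, out column)
                    || !string.Equals(table, data.TableName, StringComparison.OrdinalIgnoreCase)
                    || !data.Columns.Contains(column))
                    continue;

                // Replace the field by a run with the value, keeping the field's run formatting.
                var run = new Run(new Text(record[column].ToString()));
                RunProperties props = field.Descendants<RunProperties>().FirstOrDefault();
                if (props != null)
                    run.PrependChild((RunProperties)props.CloneNode(true));
                field.InsertBeforeSelf(run);
                field.Remove();
            }

            templateRow.InsertBeforeSelf(row); // keeps the records in their original order
        }
        templateRow.Remove();
    }
}
```

Cloning the whole template row is what carries the cell formatting (bold, italic, ...) over to every generated row.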

Note: The application will fill loose mergefields that are placed in the header/footer of the document, but there's no support for tabular data in headers/footers.

Using the Code

There's only one (public) method that can be invoked on the FormFiller class: GetWordReport.

This method accepts 3 parameters:

filename: Full path of the template docx file

dataset: A DataSet containing the tabular data that must be added to the template. Each datatable in the dataset must be named according to the names used in the template (see above). If the template contains a field TBL_LANG_NAME, the datatable must be called 'LANG' and must contain a column 'NAME'. This parameter can be null if there's no tabular data.

values: A Dictionary<string, string> where the key is the field name and the value is the data that must be placed in the Microsoft Word document.

If all goes well, the filled-in template is returned as an array of bytes.
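As an illustration of the parameters, the inputs for a template containing `CAND_NAME` and `TBL_LANG_NAME` might be prepared as follows. The values are placeholders, the template path is hypothetical, and the final call is only shown as a comment because `FormFiller` ships with the article's download; whether `GetWordReport` is static is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical input-building helper; the data are placeholders.
public static class ReportInput
{
    // Tabular data: the DataTable name and column names must match the
    // TBL_<table>_<column> fields in the template (here TBL_LANG_NAME).
    public static DataSet BuildDataSet()
    {
        var lang = new DataTable("LANG");
        lang.Columns.Add("NAME", typeof(string));
        lang.Rows.Add("Dutch");
        lang.Rows.Add("English");

        var dataset = new DataSet();
        dataset.Tables.Add(lang);
        return dataset;
    }

    // Loose fields: key = mergefield name, value = text to insert.
    public static Dictionary<string, string> BuildValues()
    {
        return new Dictionary<string, string>
        {
            { "CAND_NAME", "Jane Doe" }
        };
    }
}

// The call itself (FormFiller comes with the article's source; static access is an assumption):
// byte[] report = FormFiller.GetWordReport(@"C:\Templates\Sample.docx",
//                                          ReportInput.BuildDataSet(), ReportInput.BuildValues());
// System.IO.File.WriteAllBytes(@"C:\Temp\Report.docx", report);
```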

A Few Highlights in the Code

Opening the Template

Opening the docx file is very easy with the SDK and takes only a few lines of code.
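A minimal sketch of opening a template with the SDK, assuming the `DocumentFormat.OpenXml` package. The helper name `FillTemplate` and its callback parameter are hypothetical. It copies the template into an expandable `MemoryStream` (a `MemoryStream` constructed directly over a byte array can't grow), so the file on disk is left untouched and the result can be returned as an array of bytes, as `GetWordReport` does.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class TemplateLoader
{
    // Hypothetical helper: loads the template, lets 'fill' edit it, returns the result.
    public static byte[] FillTemplate(string filename, Action<WordprocessingDocument> fill)
    {
        byte[] template = File.ReadAllBytes(filename);

        using (var stream = new MemoryStream())
        {
            // Copy into an expandable stream; the template file itself is never modified.
            stream.Write(template, 0, template.Length);

            using (WordprocessingDocument docx = WordprocessingDocument.Open(stream, true))
            {
                fill(docx);
                docx.MainDocumentPart.Document.Save(); // write the main part back into the package
            }

            // The package is flushed when 'docx' is disposed, so read the bytes afterwards.
            return stream.ToArray();
        }
    }
}
```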

Providing a Run-object for the Data

In the OpenXML document, you can't just add text that contains plain hard returns or tabs. These must be replaced by the correct XML tags to be displayed correctly in Microsoft Word.

The mergefields in the OpenXML are represented as SIMPLEFIELD (<w:fldSimple>) elements and can contain child RUN (<w:r>) elements. The text of the field is represented as a child TEXT (<w:t>) element inside the RUN element. A RUN element can also have a RUNPROPERTIES (<w:rPr>) element with additional layout information about the displayed text. We don't want to lose that information, because our data should keep the same layout as the mergefield has in the template.

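So, if we want to replace a mergefield with our text, we must make sure that the RUNPROPERTIES of the mergefield are preserved, and that hard returns and tabs in the data are converted to the appropriate elements.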

The library's method for this checks whether there's a RUNPROPERTIES element in the given mergefield. If there is, its content is preserved (.OuterXml) and added to the newly instantiated RUN element. The data is then inspected for tabs and returns, and the corresponding BREAK (<w:br/>) and TABCHAR (<w:tab/>) elements are added.
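These steps might be sketched as follows, assuming the Open XML SDK. `RunBuilder.CreateRun` is a hypothetical name; it copies the properties with `CloneNode(true)` instead of going through `.OuterXml`, which should give the same result. Hard returns become `Break` elements, tabs become `TabChar` elements, and each `Text` element gets `xml:space="preserve"` so leading or trailing spaces survive.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class RunBuilder
{
    // Hypothetical helper: builds the Run that replaces 'field', formatted like the field.
    public static Run CreateRun(SimpleField field, string text)
    {
        var run = new Run();

        // Keep the mergefield's layout: copy its first RunProperties element, if any.
        RunProperties props = field.Descendants<RunProperties>().FirstOrDefault();
        if (props != null)
            run.AppendChild((RunProperties)props.CloneNode(true));

        // Normalise line endings, then split into lines (-> Break) and tab-separated parts (-> TabChar).
        string normalised = (text ?? string.Empty).Replace("\r\n", "\n").Replace('\r', '\n');
        string[] lines = normalised.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            if (i > 0)
                run.AppendChild(new Break());

            string[] parts = lines[i].Split('\t');
            for (int j = 0; j < parts.Length; j++)
            {
                if (j > 0)
                    run.AppendChild(new TabChar());
                if (parts[j].Length > 0)
                    run.AppendChild(new Text(parts[j]) { Space = SpaceProcessingModeValues.Preserve });
            }
        }
        return run;
    }
}
```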

Saving the Template

Once all the fields have been filled in, the changes must be explicitly saved back into the document (it doesn't happen automatically).

docx.MainDocumentPart.Document.Save(); // save main document back in package

Processing Headers and Footers

The headers and footers aren't stored in the same XML file as the main document (they are separate 'document parts' in the package), so the code discussed above won't find MERGEFIELDs that are placed in the header or footer. For this, a loop over the header and footer parts is required.
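A loop over the header and footer parts could look like this sketch, assuming the Open XML SDK. `FillHeadersAndFooters` and the `fillFields` callback (standing in for whatever code fills the fields of one part) are hypothetical; each part's root element is saved explicitly, just like the main document.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class HeaderFooterFiller
{
    // 'fillFields' is a hypothetical callback that fills the fields under one root element.
    public static void FillHeadersAndFooters(WordprocessingDocument docx, Action<OpenXmlElement> fillFields)
    {
        MainDocumentPart main = docx.MainDocumentPart;

        foreach (HeaderPart headerPart in main.HeaderParts)
        {
            Header header = headerPart.Header;
            fillFields(header);
            header.Save(); // headers live in their own part and must be saved separately
        }

        foreach (FooterPart footerPart in main.FooterParts)
        {
            Footer footer = footerPart.Footer;
            fillFields(footer);
            footer.Save();
        }
    }
}
```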

Points of Interest

The suffixes (see above) allow you to delete paragraphs, rows and tables. If this is done while iterating over the elements, the loop suddenly stops (without throwing any error whatsoever). For example, suppose there are 10 mergefields in the document, you iterate over them with a foreach statement, and the paragraph of the fifth mergefield is deleted inside the loop. You'll never reach elements 6 to 10: the loop quits without any indication that you've missed 5 elements.

To solve this, you'll notice that the code uses 2 loops: the first loop fills the mergefields with the data and keeps a list of the empty mergefields, and a second loop deletes all those empty mergefields.
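The two-pass idea can be sketched as follows, assuming the Open XML SDK; `EmptyFieldRemover` and the `isEmpty` predicate are hypothetical. The key point is the `ToList()` call: it finishes the enumeration over the tree before anything is removed.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class EmptyFieldRemover
{
    // Deletes the paragraph of every field for which 'isEmpty' returns true.
    // Pass 1 only collects; pass 2 only deletes, so the tree is never
    // modified while Descendants() is still being enumerated.
    public static int RemoveParagraphs(OpenXmlElement root, Func<SimpleField, bool> isEmpty)
    {
        var paragraphs = root.Descendants<SimpleField>()
            .Where(isEmpty)
            .Select(field => field.Ancestors<Paragraph>().FirstOrDefault())
            .Where(paragraph => paragraph != null)
            .Distinct()  // two empty fields may share one paragraph
            .ToList();   // materialise before touching the tree

        foreach (Paragraph paragraph in paragraphs)
            paragraph.Remove();

        return paragraphs.Count;
    }
}
```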

Update provided by M. Chale

The library now supports tags for UPPER, LOWER, FirstCap and Caps. UPPER and LOWER convert the entire string to uppercase or lowercase, FirstCap capitalizes the first letter while making everything else lowercase, and Caps title-cases words by capitalizing the first letter of every word. Note that the Caps routine is a bit naive: it only capitalizes letters that directly follow spaces. The library also supports text that should appear before or after the data. This text is inserted with the same formatting as the rest of the MergeField, unless the field is both blank and marked #dp.

A sample field with formatting: {MERGEFIELD MYFIELD \* UPPER \b before \f after}
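The four case switches could be implemented along these lines. This is a hypothetical sketch rather than the contributed code: `FieldFormatter.ApplyCase` is an illustrative name, it assumes the switch name (for example `UPPER` from `\* UPPER`) has already been extracted from the instruction, and it uses invariant-culture casing for predictable results.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the four case switches; not the contributed code itself.
public static class FieldFormatter
{
    public static string ApplyCase(string value, string format)
    {
        if (string.IsNullOrEmpty(value) || format == null)
            return value;

        switch (format.ToUpperInvariant())
        {
            case "UPPER":
                return value.ToUpperInvariant();
            case "LOWER":
                return value.ToLowerInvariant();
            case "FIRSTCAP":
                // First letter upper case, everything else lower case.
                return char.ToUpperInvariant(value[0]) + value.Substring(1).ToLowerInvariant();
            case "CAPS":
                // Naive title case: upper-cases the first character and every
                // character that directly follows a space; nothing else changes.
                var sb = new StringBuilder(value.Length);
                for (int i = 0; i < value.Length; i++)
                {
                    bool startOfWord = i == 0 || value[i - 1] == ' ';
                    sb.Append(startOfWord ? char.ToUpperInvariant(value[i]) : value[i]);
                }
                return sb.ToString();
            default:
                return value; // unknown switch: leave the value untouched
        }
    }
}
```

The "Caps" branch shows the naivety mentioned above: in "big-world" only the "b" is capitalized, because the "w" follows a hyphen rather than a space.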

Thanks to Michael Chale for this update.

Update for Microsoft Word 2010

Since Microsoft Word 2010, the SimpleField element is no longer used for mergefields. It has been replaced by a sequence of Run elements, one (or more) of which contain a FieldCode element with the field instruction. The library's code has been modified to replace these with an old-style SimpleField, so it remains backwards compatible with Microsoft Word 2007 documents.
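A simplified version of that conversion, assuming the Open XML SDK, might look like the following. `ComplexFieldConverter` is a hypothetical name, and the sketch only handles non-nested fields whose runs sit directly in a single paragraph; nested fields, such as a MERGEFIELD inside an IF field, would need more work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical sketch, not the library's code. Assumptions: each field starts and
// ends in the same paragraph, its runs are direct children of that paragraph,
// and fields are not nested.
public static class ComplexFieldConverter
{
    public static int ConvertToSimpleFields(OpenXmlElement root)
    {
        int converted = 0;
        foreach (Paragraph paragraph in root.Descendants<Paragraph>().ToList())
        {
            List<Run> runs = paragraph.Elements<Run>().ToList();
            for (int i = 0; i < runs.Count; i++)
            {
                if (!HasFieldChar(runs[i], FieldCharValues.Begin))
                    continue;
                int end = runs.FindIndex(i + 1, r => HasFieldChar(r, FieldCharValues.End));
                if (end < 0)
                    break; // field continues in another paragraph: not handled here

                // The instruction may be split over several FieldCode (w:instrText) elements.
                List<Run> fieldRuns = runs.GetRange(i, end - i + 1);
                string instruction = string.Concat(
                    fieldRuns.SelectMany(r => r.Elements<FieldCode>()).Select(code => code.Text));

                var simpleField = new SimpleField { Instruction = instruction };

                // Runs between "separate" and "end" hold the displayed result; keep them
                // (including their RunProperties) as the SimpleField's content.
                int separate = runs.FindIndex(i + 1, end - i - 1, r => HasFieldChar(r, FieldCharValues.Separate));
                if (separate >= 0)
                {
                    for (int k = separate + 1; k < end; k++)
                        simpleField.AppendChild((Run)runs[k].CloneNode(true));
                }

                runs[i].InsertBeforeSelf(simpleField);
                foreach (Run run in fieldRuns)
                    run.Remove();

                converted++;
                i = end;
            }
        }
        return converted;
    }

    private static bool HasFieldChar(Run run, FieldCharValues type)
    {
        return run.Elements<FieldChar>()
            .Any(fc => fc.FieldCharType != null && fc.FieldCharType.Value.Equals(type));
    }
}
```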

History

2009-07-29: Submitted to CodeProject

2009-08-12: Mergefields in headers and footers will now also be processed

2009-08-14: Small update in source: formatting of mergefields in tables is now also repeated (bold, italic, ...)


About the Author

I have been working in IT since 1999: I started developing in PROGRESS 4GL, then VB6, and have been working with C# since 2003. I'm currently transitioning to HTML5, CSS3 and JavaScript for front-end development.
I started my own company (TRI-S) in 2007 and co-founded another one (Cogenius) in 2012.
Besides being a Microsoft Certified Professional Developer (MCPD), I'm also a Microsoft Certified Trainer (MCT) and teach .NET and JavaScript courses.

Comments and Discussions

I have gotten the add-ins to work with VS 2010 and .NET 4.0, but I cannot find the test program anywhere. Am I missing something? There is only the one zip file, and it doesn't seem to contain any code for the test program.

Also, Xavier, would you like a copy of the VS2010 code? I was unable to get the VS2008 code to work with VS2010, so I ended up re-creating the add-ins and then copying your code into them.

Hi, I hope you will find this interesting.
Here is another library that enables mail merging in C#. It has some very useful features, like customizing the mail merge event (so you can insert a picture) and setting merge clean-up options.
I believe this could be an interesting addition to your library as well.

This code really helped me a lot, and it's a wonderful way to find the mergefields. But I have a scenario where I have to find a mergefield inside an IF condition, and the regex doesn't work for that. For example, if I have a statement like {=IF({MERGEFIELD Sample} > 100,10,0)}, the given regex cannot find Sample.

We have developed an app based on the Open XML SDK and mail merge, and it works well. But recently we have had requirements to introduce conditional statements for the mergefields in the template, something like
{If {MERGEFIELD netToBorrower \* MERGEFORMAT } <> 0 "Incentive" "Zero"}
However hard we try, we are unable to get the second condition of the conditional statement to work.
Also, a formula may need to be applied, something like
{SET calculatedPayoffAmount {={Mergefield payoffAmount } + {Mergefield fundsPostedToAccount }}}
However I try, I am unable to get this working. Is there a solution for the above 2 questions? Kindly let me know if there is a way to get this working. Your help would be greatly appreciated.

Using the code to replace fields with concrete string values works fine (although converting complex fields to simple fields can, and does, cause a loss of formatting information). However, I've modified the code to let me rename fields, since it's easier for me to write a program that renames fields en masse than to rename them individually in Word.

But after I've converted all of the complex fields to <simpleField> and saved the document, the OpenXML SDK 2.0 refuses to read the document back properly. Instead of a DocumentFormat.OpenXml.Wordprocessing.SimpleField object, it returns an OpenXmlUnknownElement, which of course isn't picked up by your code's processing.