The Easy Way to Assemble Multiple Word Documents

One of the most common requests we hear related to word processing documents is the ability to merge multiple documents into a single document. Today, I am going to show you how to leverage altChunks and version 2 of the Open XML SDK to easily create a robust document assembly solution in fewer than thirty lines of code.

Scenario – Document Assembly

Imagine a scenario where I’m a developer for a book publishing company that specializes in educational books. In my company, we typically have one or more authors write content for a specific chapter within a given book. Each of these chapters is written as a separate document. In this case, my company wants to write a book on the solar system, where the book is divided into chapters that correspond to unique components of the solar system, like the different planets and the sun. My company has asked me to write a solution that will merge all these documents, each representing a specific chapter, into one final document or book.

Solution

Before I get into the details of my solution I want to talk about the two different approaches I can take to solve this problem:

1. Use altChunks

2. Manually merge the documents together

The first option, using altChunks, is by far the easiest method for merging multiple documents together. I think of altChunks as the “easy button” when it comes to importing external files into a document. Not only can altChunks import other WordprocessingML documents, but they can also import HTML, XML, RTF, or plain text.
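To illustrate the point about non-Word formats, here is a minimal sketch of importing an HTML fragment through an altChunk. It assumes a released version of the Open XML SDK (the DocumentFormat.OpenXml package) rather than the CTP discussed in this post. The class name HtmlChunkExample, the method AppendHtml, and its parameters are hypothetical names chosen for illustration.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the original solution.
static class HtmlChunkExample
{
    // Adds the given HTML to the end of the document body as an altChunk.
    // altChunkId must be unique among the relationship ids of the main part.
    public static void AppendHtml(string docPath, string html, string altChunkId)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docPath, true))
        {
            MainDocumentPart mainPart = doc.MainDocumentPart;

            // Store the HTML in its own part inside the package.
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.Html, altChunkId);
            using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                chunk.FeedData(stream);
            }

            // The altChunk element points at that part by relationship id.
            AltChunk altChunk = new AltChunk { Id = altChunkId };

            // Section properties, when present, must stay last in the body.
            Body body = mainPart.Document.Body;
            SectionProperties sectPr = body.Elements<SectionProperties>().LastOrDefault();
            if (sectPr != null)
            {
                body.InsertBefore(altChunk, sectPr);
            }
            else
            {
                body.AppendChild(altChunk);
            }

            mainPart.Document.Save();
        }
    }
}
```

A call such as HtmlChunkExample.AppendHtml("Book.docx", "<html><body><p>Hello</p></body></html>", "AltChunkId1") only stores the HTML and the pointer to it; Word converts the chunk into regular document content the next time the file is opened and saved.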

Manually merging multiple documents together is feasible, but requires you to handle a number of issues. For example, you will need to manually merge and deal with conflicts related to styles, bullets and numbering, comments, headers and footers, etc. Perhaps sometime in the future I will write a series of posts talking about how to merge documents manually.

If you just want to jump straight into the code, feel free to download this solution here.

Step 1 – Create a Template

For those of you who have read my previous posts you will know that setting up the right template is the first, and probably the most important, step in creating an Open XML format solution. This scenario is no exception.

The best way to accomplish this scenario is to create a template that represents the final look of the book I want to create. In this template I will merge a specific chapter in a specific location within the template. I can accomplish this task by taking advantage of content controls. Content controls provide an easy mechanism for specifying semantic regions within a document. In other words, content controls allow me to uniquely identify a specific region within a document.

In this case, I am going to add content controls within my template document that have the name of the chapter I want to add at that location. For example, as shown in the screenshot below, I have a content control that has the name “Earth.” This name indicates that the chapter titled “Earth” needs to be merged in this location of the template.

Step 2 – Find Specific Content Controls

Now that I have set up the template I need to programmatically locate content controls based on the alias or name of the content control, which represents the title of the chapter I want to merge. This task is pretty easy with version 2 of the SDK. Once I open a word processing document I can find all content controls, represented as SdtBlock, that have an alias value matching the source file I want to merge into the template with the following code:

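using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static void MergeSourceDocument(string sourceFile, string destinationFile)
{
    using (WordprocessingDocument myDoc =
        WordprocessingDocument.Open(destinationFile, true))
    {
        MainDocumentPart mainPart = myDoc.MainDocumentPart;

        // Find content controls that have the name of the source file
        // as an Alias value
        List<SdtBlock> sdtList = mainPart.Document.Descendants<SdtBlock>()
            .Where(s => sourceFile.Contains(
                s.SdtProperties.GetFirstChild<Alias>().Val.Value))
            .ToList();
    }
}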
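A rough sketch of the remaining step, swapping each matched content control for an altChunk that points at the source chapter, could look like the following. It assumes a released version of the Open XML SDK (the DocumentFormat.OpenXml package) and takes the content controls found by a query like the one in Step 2 as input. The names ChapterMerger and ReplaceWithAltChunks, and the GUID-based relationship ids, are hypothetical choices for illustration, not the code from the downloadable solution.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the original solution.
static class ChapterMerger
{
    // Replaces every content control in sdtList with an altChunk that
    // imports sourceFile. Returns the number of controls replaced.
    public static int ReplaceWithAltChunks(
        MainDocumentPart mainPart, IList<SdtBlock> sdtList, string sourceFile)
    {
        int replaced = 0;
        foreach (SdtBlock sdt in sdtList)
        {
            // Each imported copy gets its own part and relationship id.
            // Assumption: a GUID suffix is unique enough for this purpose.
            string altChunkId = "AltChunkId" + Guid.NewGuid().ToString("N");

            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream source = File.OpenRead(sourceFile))
            {
                chunk.FeedData(source);
            }

            // Put the altChunk where the content control was, then drop the control.
            AltChunk altChunk = new AltChunk { Id = altChunkId };
            OpenXmlElement parent = sdt.Parent;
            parent.InsertAfter(altChunk, sdt);
            sdt.Remove();
            replaced++;
        }

        mainPart.Document.Save();
        return replaced;
    }
}
```

Calling ChapterMerger.ReplaceWithAltChunks(mainPart, sdtList, sourceFile) inside the using block from Step 2 would complete the merge for one chapter. The source document is stored whole inside the template package, and Word performs the actual merge when it opens the file.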

End Result

Putting everything together and running my code, I will end up with a solar system book that is broken down into chapters representing unique components of the solar system. Using altChunks automatically ensures the following:

Final document has consistent styles applied

Images, comments, tracked changes, etc. are all included as part of my merged document

Bullets and numbering just work

Here is a screenshot of the final solar system document:

[updated 1/9 due to bug in code – SdtProperties.GetFirstChild<Alias>() is the correct syntax]

Unfortunately, version 2 of the SDK is still a CTP, which means you cannot reference this assembly in a commercial product. We are aiming to get this SDK final by the time O14 is released. In the meantime, you can still accomplish all your scenarios with version 1 of the SDK.

Hm, interesting post, but it seems overkill for simple book assembly. Indeed, it doesn’t look like the method would save any labor over other techniques, as the content controls still have to be created manually, in addition to the not insubstantial work of cobbling the snippets of code together into a functioning applet.

Wouldn’t it be simpler to use INCLUDETEXT fields? They do the same things as the combination of content controls above, and they don’t require any coding or compilation. (Admittedly, as I’ve pointed out before, they are anything but robust, but maybe Office 14 will change that? Maybe fields will be merged into smart content controls?)

Still, I can see how the technique could be very useful where large numbers of complex documents are electronically assembled (without human input), e.g. government or corporate reports.

The content controls are simply added to provide semantic structure to the template document; they are not necessary for using altChunks. That said, the IncludeText field is another approach for this scenario (thanks for the suggestion). There are, however, some fundamental differences between this field code and altChunks. altChunks allow the document to actually be merged into the template document, while IncludeText only links to the content of the file. Another difference is that IncludeText results in field code UI within Word, which may be overwhelming to the user (depending on how much content was merged).

Hi Brian, I have been reading your posts regarding Office Extensibility. I am presenting a situation in response to your following quote:

"Again, if you have any specific scenarios or solutions you would like me to address in future posts please let me know"

A common situation is to prepare Invoices, etc. from information in the database. Clients usually have extremely customized Invoice formats, but the data to be filled in is basically the same.

I was trying to create a Word document with tokens of the form [$TokenName$] in it, and replace the tokens with actual text programmatically. However, it was not as easy as I thought.

Word 2007 splits up the token into multiple parts, depending upon the regions to be checked for spelling. That makes the scenario almost impossible.

Can you suggest a solution?

Also, it would be helpful to be able to add rows to an existing table based on a row template. In the same invoice scenario, we leave a single row for the goods being delivered, and replicate that row, replacing the tokens, for all goods in the invoice.

I’ve read information (a lot of information) about content controls but I found no answer to my main problem: can I use the content controls outside of Word 2007?

In other words, can I use the content controls, modify the Custom XML values by code, and allow users who don’t have Word 2007 (but only Word 2003, for example) to get the updated docx? I don’t think that all the users have Word 2007 installed on their computers, and I wonder if another way is available to use these controls.

Good question. No, Word 2003 and prior do not support content controls. That being said, I mentioned content controls to you as more of a tool you can utilize in your solution rather than an end user feature.