The big pain point in working with MS Word documents programmatically is . . . the Office Interop. To get almost anything done with Word (including simply pulling the text out of a document), you pretty much need to use Interop, which also means Word has to be installed on the machine running your application. Additionally, my understanding is that there are issues with doing Word automation on the server side.

Interop is essentially a giant wrapper around the ugliness that is COM, and the abstraction layer is thin indeed. If you need to automate MS Office applications, Interop (or going all the way down to the COM level) is pretty much the only way to do it, and obviously, you need to have Office installed on the client machine for it to work at all.

Often, though, we don't so much need to automate the Office application directly as get at the contents of an Office file (such as a Word or Excel file). Dealing with all that Interop nastiness makes this more painful than it needs to be.

Thankfully, the open source DocX by Cathal Coffey solves both problems nicely, and unlike Interop, presents an easy-to-use, highly discoverable API for performing myriad manipulations/extractions against the Word document format (the .docx format, introduced as of Word 2007). Best of all, DocX does not require that Word or any other Office dependencies be installed on the client machine! The full source is available from Coffey's Codeplex repo, or you can add DocX to your project using Nuget.

10/2/2013 - NOTE: Several commenters on Reddit and elsewhere have pointed out that Microsoft's official OpenXml library serves the same purpose as DocX and, after all, is the "official" library. I disagree. The OpenXml library "does more" but is inherently more complex to use. While it most certainly offers functionality not present in DocX, the DocX library provides a much simpler and more meaningful abstraction for what I find to be the most common use cases when working with Word documents programmatically. As always, the choice of libraries is a matter of preference and, to me, one of "right tool for the job."

1/23/2014 - NOTE: In the opening paragraph I mentioned the OSS project LinqToExcel, which is a fantastic library. However, LinqToExcel takes a dependency on the Access Database Engine, which can create issues when (for example) deploying to a remote server or other environment where administrative privileges may be limited. I have since discovered another OSS library with no such dependencies. You can read about it at Use Cross-Platform/OSS ExcelDataReader to Read Excel Files with No Dependencies on Office or ACE.

In this post, we will look at a few of the basics for using this exceptionally useful library. Know that under the covers and with a little thought, there is a lot of functionality here beyond what we will look at in this article.

As with LinqToExcel, you can add the DocX library to your Visual Studio solution from the NuGet Package Manager Console:

PM> Install-Package DocX

Alternatively, you can use the Solution Explorer. Right-click on the Solution, select "Manage NuGet Packages for Solution," and type "DocX" in the search box (make sure you have selected "Online" in the left-hand menu). When you have located the DocX package, click Install.

Note that in the above we need to add using Novacode; to the namespace imports at the top of the file, since the DocX library lives in this namespace. If you run the code above, a Word document should open like this:

Output of Really Simple Example Code:

What we did in the above example was:

Create an in-memory instance of a DocX object, passing a file name to the DocX.Create() method.

Insert a Paragraph object containing some text.

Save the result to disk as a properly formatted .docx file.

Until we execute the Save() method, we are working with the XML representation of our new document in memory. Once we save the file to disk, we find a standard Word-compatible file in our Documents/ directory.
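The example code itself is missing from this copy of the post, so here is a minimal sketch of the three steps above. It assumes the classic DocX package, where everything lives in the Novacode namespace. The class name, method name, sample text and the openInWord flag are hypothetical, and launching WINWORD.EXE only works where Word is installed (the DocX calls themselves do not need it).

```csharp
using System.Diagnostics;
using Novacode;

public static class SimpleExample
{
    // Hypothetical helper: builds a one-paragraph document and saves it.
    public static string CreateSampleDocument(string fileName, bool openInWord = false)
    {
        // Create an in-memory DocX document; nothing is written to disk yet.
        using (DocX doc = DocX.Create(fileName))
        {
            // Insert a paragraph containing some text.
            doc.InsertParagraph("This is my first paragraph");

            // Write the document to fileName as a .docx file.
            doc.Save();
        }

        // Optionally open the result in Word. This needs Word on the machine;
        // the DocX calls above do not. Quotes guard against spaces in the path.
        if (openInWord)
        {
            Process.Start("WINWORD.EXE", "\"" + fileName + "\"");
        }

        return fileName;
    }
}
```

Calling SimpleExample.CreateSampleDocument with a path in your Documents folder and openInWord: true should reproduce the "write, then open in Word" behavior described above.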

Here, we have created some Formatting objects in advance, and then passed them as parameters to the InsertParagraph method for each of the two paragraphs we create in our code. When the code executes, Word opens and we see this:

Output from Creating Multiple Formatted Paragraphs

In the above, the FontFamily and Size properties of the Formatting object are self-evident. The Position property determines the spacing between the current paragraph and the next.
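A sketch of creating Formatting objects up front and passing them to InsertParagraph, again assuming the classic Novacode DocX API. The headline and body text, font names and sizes are placeholders. FontFamily here is System.Drawing.FontFamily, so the named fonts must be installed on the machine. The sketch only sets FontFamily and Size; Position can be assigned on the same Formatting object if you want to experiment with it.

```csharp
using System.Drawing;
using Novacode;

public static class FormattingExample
{
    // Hypothetical helper: a headline and a body paragraph, each styled by
    // a Formatting object created in advance.
    public static void CreateFormattedDocument(string fileName)
    {
        string headlineText = "Sample Headline";
        string bodyText = "This is the body text, set in a smaller font.";

        // Formatting for the headline (FontFamily is System.Drawing.FontFamily).
        var headlineFormat = new Formatting();
        headlineFormat.FontFamily = new FontFamily("Arial");
        headlineFormat.Size = 18D;

        // Formatting for the body paragraph.
        var bodyFormat = new Formatting();
        bodyFormat.FontFamily = new FontFamily("Times New Roman");
        bodyFormat.Size = 11D;

        using (DocX doc = DocX.Create(fileName))
        {
            // Overload: InsertParagraph(text, trackChanges, formatting).
            doc.InsertParagraph(headlineText, false, headlineFormat);
            doc.InsertParagraph(bodyText, false, bodyFormat);
            doc.Save();
        }
    }
}
```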

We can also grab a reference to a paragraph object itself and adjust various properties. Instead of creating a Formatting object for our headline like we did in the previous example, we could grab a reference as the return from the InsertParagraph method and muck about:
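Here is a sketch of that approach, assuming the classic Novacode API: InsertParagraph() returns the new Paragraph, and its fluent methods (Append, Font, FontSize, Bold, Color) return that same Paragraph, so the calls can be chained. The class name, method name and text are hypothetical.

```csharp
using System.Drawing;
using Novacode;

public static class ParagraphFormattingExample
{
    // Hypothetical helper: formats the headline through the Paragraph
    // returned by InsertParagraph instead of a Formatting object.
    public static void CreateDocumentWithStyledHeadline(string fileName)
    {
        using (DocX doc = DocX.Create(fileName))
        {
            // Keep a reference to the new paragraph; each fluent call returns
            // the same Paragraph and applies to the text just appended.
            Paragraph headline = doc.InsertParagraph()
                .Append("Sample Headline")
                .Font(new FontFamily("Comic Sans MS"))
                .FontSize(20)
                .Bold()
                .Color(System.Drawing.Color.DarkBlue);

            // Paragraph-level properties can be set directly as well.
            headline.Alignment = Alignment.center;

            doc.InsertParagraph("Body text in the default style.");
            doc.Save();
        }
    }
}
```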

Yes, yes I DID print that headline in Comic Sans. Just, you know, so you could see the difference in formatting.

There is a lot that can be done with text formatting in a DocX document, whether in headers/footers, paragraphs, or individual words and characters. Importantly, most of the things you might go looking for are easily discoverable; in other words, the author has done a great job building out his API.

Of course, one of the most common things we might want to do is scan a pre-existing document, and replace certain text. Think templating here. For example, performing a standard Word Merge is not very doable on your web server, but using DocX, we can accomplish the same thing. The following example is simple due to space constraints, but you can see the possibilities:

First, just for kicks, we will create an initial document programmatically in one method, then write another method to find and replace certain text in the document:
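A sketch of what such a template method might look like, assuming the classic Novacode API. The class name, method name and letter wording are hypothetical; the method is internal rather than private only so this standalone snippet can be called from other code. It returns the DocX without saving it, leaving the caller to decide what to do with it.

```csharp
using Novacode;

public static class LetterTemplate
{
    // The placeholder our replacement method will search for.
    public const string ApplicantField = "%APPLICANT%";

    // Hypothetical template builder. Returns the document without saving it;
    // the caller owns (and should dispose) the returned DocX.
    internal static DocX GetLetterTemplate(string fileName)
    {
        DocX letter = DocX.Create(fileName);
        letter.InsertParagraph("Dear " + ApplicantField + ",");
        letter.InsertParagraph("Thank you for your application. We will be in touch soon.");
        letter.InsertParagraph("Sincerely,");
        letter.InsertParagraph("The Hiring Team");
        return letter;
    }
}
```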

See the %APPLICANT% placeholder? That is my replacement target (a poor-man's merge field, if you will). Now that we have a private method to generate a document template of sorts, let's add a public method to perform a replacement action:
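And a sketch of the replacement step. Rather than calling the template method directly, this version assumes the template has already been saved to disk and opens it with DocX.Load, which also shows how to find and replace text in any existing .docx file. ReplaceText swaps every occurrence of the placeholder, and SaveAs writes the result to a new file so the template stays intact. The class name, method name and paths are hypothetical.

```csharp
using Novacode;

public static class LetterMerge
{
    // Hypothetical: load a saved template, swap the placeholder, save a copy.
    public static void CreateLetter(string templatePath, string outputPath, string applicantName)
    {
        using (DocX letter = DocX.Load(templatePath))
        {
            // Replace every occurrence of the placeholder with the real value.
            letter.ReplaceText("%APPLICANT%", applicantName);

            // SaveAs writes to a new file and leaves the template untouched.
            letter.SaveAs(outputPath);
        }
    }
}
```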

Obviously, the preceding example was a little contrived and overly simple. But you can see the potential . . . If our letter contained additional "merge fields," we could just as easily pass in a Dictionary<string, string> containing one or more key/value pairs, each made up of a replacement target and a replacement value. Then we could iterate, using the Dictionary keys as the search strings and replacing them with the Dictionary values.
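The Dictionary approach might look something like this. It is a sketch assuming the same ReplaceText and SaveAs calls; the helper name and the placeholder keys are hypothetical.

```csharp
using System.Collections.Generic;
using Novacode;

public static class DictionaryMerge
{
    // Hypothetical helper: each key is a placeholder such as "%APPLICANT%",
    // each value is the text that should replace it.
    public static void MergeFields(string templatePath, string outputPath,
                                   IDictionary<string, string> fields)
    {
        using (DocX doc = DocX.Load(templatePath))
        {
            // One ReplaceText call per merge field.
            foreach (KeyValuePair<string, string> field in fields)
            {
                doc.ReplaceText(field.Key, field.Value);
            }

            // Save the merged result as a new file.
            doc.SaveAs(outputPath);
        }
    }
}
```

A call might pass something like new Dictionary<string, string> { { "%APPLICANT%", "Jane Doe" }, { "%POSITION%", "Developer" } }, where %POSITION% is just an illustrative second field.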

In this quick article, we have only scratched the surface. DocX exposes most of the stuff we commonly wish we could get to within a Word document (Tables, Pictures, Headers, Footers, Shapes, etc.) without forcing us to navigate the crusty Interop model. This also saves us from some of the COM de-referencing issues which often arise when automating Word within an application. Ever had a bunch of "orphaned" instances of Word (or Excel, etc.) running in the background, visible only in the Windows Task Manager? Yeah, I thought so . . .
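As one example of going beyond plain text, here is a sketch of building a table with a header row plus one row per item. It assumes the classic API's AddTable(rows, columns) and InsertTable methods, and that each new cell starts out with one empty paragraph; the helper name and row data are hypothetical.

```csharp
using System.Collections.Generic;
using Novacode;

public static class TableExample
{
    // Hypothetical helper: one header row plus one row per (label, value) item.
    public static void CreateTableDocument(string fileName,
                                           IList<KeyValuePair<string, string>> items)
    {
        using (DocX doc = DocX.Create(fileName))
        {
            doc.InsertParagraph("Applicant details");

            // AddTable builds the table in memory; InsertTable places it in the body.
            Table table = doc.AddTable(items.Count + 1, 2);
            table.Rows[0].Cells[0].Paragraphs[0].Append("Field").Bold();
            table.Rows[0].Cells[1].Paragraphs[0].Append("Value").Bold();

            // Fill one row per item, below the header row.
            for (int i = 0; i < items.Count; i++)
            {
                table.Rows[i + 1].Cells[0].Paragraphs[0].Append(items[i].Key);
                table.Rows[i + 1].Cells[1].Paragraphs[0].Append(items[i].Value);
            }

            doc.InsertTable(table);
            doc.Save();
        }
    }
}
```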

If you need to generate or work with Word documents on a server, this is a great tool as well. No dependencies on MS Office, no need to have Word running. You can generate Word documents on the fly, and/or from templates, ready to be downloaded.
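A minimal sketch of generating a document on the fly, assuming DocX.Create(string) builds the document in memory (as noted earlier, nothing touches the disk until you save) and that SaveAs(Stream) is available, as it is in the classic Novacode builds. The resulting byte array could then be returned as a file download from your web framework of choice; the helper name and letter text are hypothetical.

```csharp
using System.IO;
using Novacode;

public static class InMemoryExample
{
    // Hypothetical helper: builds a small letter and returns the .docx bytes
    // (for example, to send back as a file download) without a file on disk.
    public static byte[] CreateDocumentBytes(string applicantName)
    {
        using (var stream = new MemoryStream())
        {
            // Create builds the document in memory; the name is only used if
            // Save() is called, which this method never does.
            using (DocX doc = DocX.Create("letter.docx"))
            {
                doc.InsertParagraph("Dear " + applicantName + ",");
                doc.InsertParagraph("Your document was generated on the fly.");

                // Write the package to the stream instead of a file.
                doc.SaveAs(stream);
            }

            return stream.ToArray();
        }
    }
}
```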

I strongly recommend you check out the project on Codeplex. Also, the project author, Cathal Coffey, discusses much of the DocX functionality on his own blog. If you dig DocX, drop by and let him know. He's really done a fantastic job.


About the Author

My name is John Atten, and my username on many of my online accounts is xivSolutions. I am fascinated by all things technology and software development. I work mostly with C#, JavaScript/Node.js, various flavors of databases, and anything else I find interesting. I am always looking for new information, and I value your feedback (especially where I got something wrong!)

I have to create a document using the DocX DLL. In the top section I need to add four text values to the document. I can add that text to the document and save it using the doc.Save(); method.
After that I need to add some images to the same document, which I can do based on my requirements. Then I use the doc.SaveAs("Path"); method to save the final document. But when I call SaveAs, the text added earlier is removed automatically, i.e. the final document contains only the images, not the text.

I have to open a Word document using the DocX DLL and insert some images below the heading "IMAGES:" in the document.
Could anyone please help me find the line that starts with "Images" in the Word document using the DocX DLL in C#?
I can insert the image using the code below, but I am not able to find the exact place to put those images.
// Add an Image to the docx file

I saw that when you create a Word file, you pass a string to the Create method, and that string is actually a path on your PC. Is there any way to create the Word file directly in memory without passing a path on my PC? Thanks

Hey guys, I need some help. I'm trying to create a separate Word file for every user. I'm using a template in which five rows on the left side contain predefined labels like name, age, sex and designation, and I want to store the user's data on the right side. The question is: how can I write the data to a specific location in the file, in this case into the corresponding row?

When you need to generate Word documents populated by real data (from an application database), you generally don't want to create documents from scratch (by using APIs to create a paragraph, insert text, and so on). Instead, even for small projects, a template-based approach should be pursued: a template document is first created in MS Word with placeholders inserted, and then a .NET application processes that template by populating the placeholders with real data. Sometimes you also need to write lists of items (as table rows). Is this possible with this library? I found another mail merge library that looks like it supports lists as well (along with some other useful features), but it is commercial. Is it possible to use the DocX library in a similar, template-based manner, to render lists without manually adding table rows for each item in code?

Agreed on the bit about not generally creating/inserting paragraphs, etc. I was just walking through some of the functionality.

So far as I can see, you should be able to do all of the templating stuff you describe here; you may just need to dig into the API and figure out the best way to approach it. As far as I know, most, if not all, of the Word object model is accessible here.

How would you do the same thing as in the Find and Replace Text Using DocX example, but this time replacing text in an existing doc file?
And I want to know: is it possible to display the file in a WinForms application?