About Me

Hello, my name is Cathal Coffey. I am best described as a hybrid between a developer and an adventurer. When I am not behind a keyboard coding, I am hiking and climbing the beautiful mountains of my home country, Ireland.
I am a full-time student studying Computer Science & Software Engineering at the National University of Ireland Maynooth. I will finish the final year of a four-year degree in September 2009.
I am the creator of an open source project on codeplex.com called DocX. At the moment I spend a lot of my free time advancing DocX, and I enjoy this very much. My aim is to build a community around DocX and add features based on requests from this community. I really enjoy hearing about how people are using DocX in their work/personal projects, so if you are one of these people, please send me an email.
Cathal
coffey.cathal@gmail.com

Thursday, February 26, 2009

DocX - A .NET library for manipulating Word 2007 files

Note: Code samples have been updated to work with DocX version 1.0.0.6.

Hello, my name is Cathal Coffey. I am an intern working at Microsoft Ireland Research. This blog post is about a personal project that I created outside of my work hours.

My project, which can be downloaded from CodePlex, is called DocX. DocX is a .NET library that allows developers to manipulate Word 2007 files in an easy and intuitive manner. It does not use COM libraries, nor does it require Office to be installed in order to function. The rest of this post explains the current features offered by DocX. Please keep in mind that this is a young library; at the moment it offers two very useful and powerful features:

1) String replacement

The document below, Test.docx, contains the string “pear” many times. There are instances of the string “pear” inside structures such as a table, a list and a hyperlink. The document also uses many different style properties, such as font, colour, bold, italic, strikethrough and underline.

Figure 1.1 - Test.docx before manipulation

Replacing the string “pear” with the string “banana” is a trivial task using DocX.

```csharp
using System.Text.RegularExpressions; // RegexOptions
using Novacode;                       // namespace of the DocX library

// Load a .docx file
using (DocX document = DocX.Load("Test.docx"))
{
    /*
     * Replace each instance of the string pear with the string banana.
     * Specifying true as the third argument informs DocX to track the
     * changes made by this replace. The fourth argument tells DocX to
     * ignore case when matching the string pear.
     */
    document.ReplaceText("pear", "banana", true, RegexOptions.IgnoreCase);

    // Save changes made to this document
    document.Save();
} // Release this document from memory.
```

After running the above code and reopening Test.docx, we can see that every instance of the string “pear” has been replaced by the string “banana” and that both deletions and insertions have been tracked. By hovering over a deletion or insertion, we can see that DocX recorded the user account it was executed under as the author of the edits.

Figure 1.2 - Test.docx after manipulation

If we click on the “Review” tab of the ribbon and select “Accept All Changes in Document”, it is clear that DocX has correctly replaced all instances of the string “pear” with the string “banana”.

Figure 1.3 – Test.docx Accept All Changes in Document

An important point to note is that DocX inserted the string “banana” with the correct style information in each case, regardless of whether it was inside a table, a list or a hyperlink.
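The fourth argument in the sample above is a standard .NET `RegexOptions` value. To show what `RegexOptions.IgnoreCase` on its own contributes, here is a small console program that uses only the .NET `Regex` class on a plain string, with no DocX involved. It illustrates the matching option only; it does not show how DocX locates text that is spread across differently formatted runs inside a document.

```csharp
using System;
using System.Text.RegularExpressions;

class IgnoreCaseDemo
{
    static void Main()
    {
        // Three spellings of the same word with different casing.
        string text = "Pear, pear and PEAR.";

        // With IgnoreCase, the pattern "pear" matches all three.
        string result = Regex.Replace(text, "pear", "banana", RegexOptions.IgnoreCase);

        Console.WriteLine(result); // prints "banana, banana and banana."
    }
}
```

Without `RegexOptions.IgnoreCase` (for example, passing `RegexOptions.None`), only the lowercase “pear” would be replaced.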

The following window will then pop up, and you can create your own custom properties.

Figure 2.3 – Custom properties

I have created seven custom properties for this demo. Four are of type Text: Forename, Username, HomeAddress and FreeGift. One is of type Number: PleaseWaitNDays. One is of type Date: GiftArrivalDate. One is of type Yes or no: RecieveFurtherMail.

Once you have defined custom properties, you can use them throughout your document by selecting “Insert -> Quick Parts -> Field…”.

Figure 2.4 – Insert –> Quick Parts –> Field…

If you double-click on one of your custom properties, it will appear in the document at the current caret position.

Figure 2.5 – Select custom property

The following document is a welcome letter that will be sent to all new subscribers of the fictitious magazine “Home Appliances”. The letter, which includes the seven custom properties listed above, looks as follows.

Figure 2.6 - Fictitious magazine welcome letter

Setting the values of the custom properties in this document is a trivial task using DocX.
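As a hedged sketch only: the example below assumes the `CustomProperty` class (with constructors taking a name plus a `string`, `int`, `DateTime` or `bool` value) and the `DocX.AddCustomProperty` method found in DocX releases on CodePlex; their exact form in version 1.0.0.6 may differ. The file name `WelcomeLetter.docx` and every value are hypothetical. The property names are the seven defined above, spelled exactly as in the document (including `RecieveFurtherMail`), and each .NET type is chosen to match the Word property type: Text, Number, Date, and Yes or no.

```csharp
using System;
using Novacode; // namespace of the DocX library

class WelcomeLetter
{
    static void Main()
    {
        // Hypothetical file name for the letter shown in Figure 2.6.
        using (DocX document = DocX.Load("WelcomeLetter.docx"))
        {
            // Hypothetical value; reused below to compute the arrival date.
            int pleaseWaitNDays = 14;

            // Text properties (all values are made up for illustration).
            document.AddCustomProperty(new CustomProperty("Forename", "Cathal"));
            document.AddCustomProperty(new CustomProperty("Username", "ccoffey"));
            document.AddCustomProperty(new CustomProperty("HomeAddress", "1 Example Street"));
            document.AddCustomProperty(new CustomProperty("FreeGift", "Toaster"));

            // Number property.
            document.AddCustomProperty(new CustomProperty("PleaseWaitNDays", pleaseWaitNDays));

            // Date property: today plus the waiting period.
            document.AddCustomProperty(new CustomProperty("GiftArrivalDate", DateTime.Today.AddDays(pleaseWaitNDays)));

            // Yes or no property (name spelled as defined in the document).
            document.AddCustomProperty(new CustomProperty("RecieveFurtherMail", true));

            // Save the new property values into the document.
            document.Save();
        }
    }
}
```

Keeping `pleaseWaitNDays` in a local variable means the Number and Date properties cannot drift apart when the waiting period changes.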

25 comments:

Hi Cathal, cool features! Is there a way to do a "find" of a pattern (I noticed the Regular Expression setting, so it sounds possible) and get back what the pattern found? For example, string Find("[*]") might return [Address], which would allow me to look up "Address", then do document.ReplaceText("[Address]", "21 Vine Street", true, RegexOptions.IgnoreCase); and avoid Custom Properties (which are difficult to use, but I can get a list of them). If Find could return a list, it would be even better! Thanks!

Thanks for this beautiful library. Its real importance will only come when it supports converting RTF files or data to DocX files or data, and DocX files or data to RTF, so that a word editor that uses RichTextBox or RichEdit 6.0 would be able to import and export DocX files. Nothing else matters as much.

Hey, these are super cool features! Also, I like how you describe yourself as an adventurer. It especially helps that you are Irish. Anyway, I was wondering if you have more information on converting files and on XSL-FO. I have checked out a few sites (like the linked one), but I am still trying to figure this all out for a project. -Janie

Thanks a lot. I am now using Word to generate a report for a product. First I tried Crystal Reports, but it is a data-driven reporting tool and is not suitable for reports with many text blocks. Then I saw DocX on CodePlex. It is wonderful.

Hi. This is Shanmugam. I used DocX.dll, and it is nice. I have one problem: when the Word file is closed, there is no error in the browser for my ASP.NET application, but if the .docx file is open when I run it, I get an error. Can you give me a solution so that it works even when the .docx file is open?

Hi Cathal, it's a very nice project. I am facing an issue related to AddCustomProperty.

My problem is: how can I make a template document without using AddCustomProperty? My customer already has a template, and I need to fill in the values using DocX.dll. In simple words, how can I add a CustomProperty to a Word document?

Hi, I want to set the font for the whole file programmatically. I am using this DocX DLL to write the Word file, but after the file has been written the formatting is not done well. I want to use the Courier New font at size 7. Please give me sample code and help me. Thanks and regards, Sumeet.

Hey, I just want to know whether it is possible with the DocX library to find a specific word in a document and add some text just after that word. I don't want to replace any text, just find a piece of text and add another piece of text after it, separated by a space.

Hi Cathal, I have to insert custom properties into the properties section of the Info tab, so that the user can simply click "show more properties" and see his custom properties right there and edit them, rather than going to advanced settings -> Custom tab and all that. Can we do it? If yes, then how?
