Tuesday, 26 February 2013

Once upon a time, I had a nice little function that opened a Word document via Microsoft Interop and exported all bookmarks as a Dictionary<BookmarkName, StringBookmarkValue>. I would modify this dictionary and pass it on to another little function, which then set the values of all the bookmarks. This worked very well until, one day, the server was upgraded to Windows 2012 and Office 2010. Since then, nothing seemed to work any more.

Seems very easy, right? It was a great starting point, but only a starting point.

First thing I discovered...

There are HIDDEN bookmarks. This was weird in the beginning, but you quickly see the pattern: all of them start with an _ (underscore). To confirm that my assumption was correct I searched for it, and after finding this trustworthy page as the first answer from Google I didn't bother looking further. So just add

bookmarkStart.Name.Value.StartsWith("_")

and the problem is solved.
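As a minimal sketch of where such a check could live, the helper below collects all bookmarks under an element and skips the hidden ones. The class and method names (BookmarkFilter, IsHidden, VisibleBookmarks) are hypothetical and not part of the Open XML SDK, and the sketch assumes that a leading underscore is the only sign of a hidden bookmark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class BookmarkFilter
{
    // Assumption: a bookmark is "hidden" when its name starts with an underscore.
    public static bool IsHidden(this BookmarkStart bookmark)
    {
        // Name is a StringValue; read the underlying string and guard against null.
        string name = bookmark.Name?.Value;
        return name != null && name.StartsWith("_", StringComparison.Ordinal);
    }

    // Returns every BookmarkStart below `root` that is not hidden.
    public static IEnumerable<BookmarkStart> VisibleBookmarks(OpenXmlElement root)
    {
        return root.Descendants<BookmarkStart>().Where(b => !b.IsHidden());
    }
}
```

For a document opened with the SDK, `root` would typically be the Body of the main document part.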

The next problem occurred...

You can define bookmarks for table CELLs, and then they behave totally differently. So how do they behave?

They are all stuck in the first cell of each row.

So how do I know to which cell they belong? Word seems to know it.

BookmarkStart has a property ColumnFirst. Normally its value is null; in this case, however, it holds the 0-based index of the column the bookmark refers to. If your bookmarks stretch over multiple cells, there is also a ColumnLast (in my case ColumnFirst == ColumnLast).

Retrieving the data is now a bit tougher, however, so let's take a step back. First I created some extension methods to make my functions smaller:

As you can see, I'm looking for the parent of type TableRow and take the first TableCell child of this row. Afterwards I take the NextSibling of type TableCell until I reach the necessary column. Then I just need to return the first Text that can be found in this column. I don't really care how many Text elements exist, since I need only one to replace the content and keep the formatting. Later you will see that I delete the additional Text elements.
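A minimal sketch of that lookup could look like the following. The names TableBookmarkExtensions, GetTargetCell and GetCellText are hypothetical. The sketch uses the SDK's Ancestors<TableRow>() to find the row, and it assumes that the column index can be reached by counting TableCell siblings, which ignores merged cells.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TableBookmarkExtensions
{
    // Returns the cell a cell bookmark points to, or null for ordinary bookmarks.
    public static TableCell GetTargetCell(this BookmarkStart bookmark)
    {
        // ColumnFirst is only set for bookmarks that refer to table cells.
        int? column = bookmark.ColumnFirst?.Value;
        if (column == null) return null;

        // The nearest enclosing row; the bookmark itself sits in the first cell.
        TableRow row = bookmark.Ancestors<TableRow>().FirstOrDefault();
        if (row == null) return null;

        // Walk from the first cell to the 0-based target column.
        TableCell cell = row.GetFirstChild<TableCell>();
        for (int i = 0; i < column.Value && cell != null; i++)
        {
            cell = cell.NextSibling<TableCell>();
        }
        return cell;
    }

    // Returns the first Text inside the target cell, or null if there is none.
    public static Text GetCellText(this BookmarkStart bookmark)
    {
        TableCell cell = bookmark.GetTargetCell();
        return cell == null ? null : cell.Descendants<Text>().FirstOrDefault();
    }
}
```

Returning null for bookmarks without ColumnFirst lets a caller fall back to the normal, non-table handling.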

So, problem solved one more time. What else could there be?

While it's not a problem to read bookmark values, it is one when you try to set them:

Bookmarks can be empty - not having any element...

However, once you have figured out that the bookmark really is empty, it is quite easy and straightforward to add a simple Run with a Text after the BookmarkStart. The following function takes care of this:
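A minimal sketch of such a function, assuming the BookmarkStart sits inside a paragraph (so that a Run is a valid sibling) and is already attached to a parent element. The names EmptyBookmarkExtensions and InsertText are hypothetical.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class EmptyBookmarkExtensions
{
    // Inserts a new Run containing `value` directly after the BookmarkStart
    // and returns the created Text so the caller can keep working with it.
    public static Text InsertText(this BookmarkStart bookmark, string value)
    {
        // Preserve leading and trailing spaces in the inserted value.
        var text = new Text(value) { Space = SpaceProcessingModeValues.Preserve };

        // InsertAfterSelf throws if the bookmark has no parent element.
        bookmark.InsertAfterSelf(new Run(text));
        return text;
    }
}
```

The new Run carries no run properties, so it picks up the paragraph's default formatting.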

This is solved very easily; however, as I suggested, the problem is not inserting the value but figuring out whether it needs to be inserted. For this I present the last problem I found and solved for retrieving the values from bookmarks:

How to find Text and Run if they are not siblings of the bookmark as suggested by the initial solution?

For this, I expanded the simple search for a Run from the initial solution into something more sophisticated. I don't know the OpenXML specification well, so this might be unnecessary, but it also tells me whether the bookmark as such is empty.

    var run = bookmarkStart.NextSibling<Run>();
    if (run != null)
    {
        // I've found a run and suppose it has a Text
        return run.GetFirstChild<Text>();
    }
    else
    {
        // I will go through all the siblings and try to find any Text
        Text text = null;
        var nextSibling = bookmarkStart.NextSibling();
        while (text == null && nextSibling != null)
        {
            if (nextSibling.IsEndBookmark(bookmarkStart))
            {
                // I've reached the end of the bookmark and couldn't find any Text
                return null;
            }
            text = nextSibling.GetFirstDescendant<Text>();
            nextSibling = nextSibling.NextSibling();
        }
        return text;
    }
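IsEndBookmark and GetFirstDescendant in the code above are not methods of the Open XML SDK, so they are presumably helper extension methods. The following is only a guess at what such helpers might look like: it assumes that a BookmarkEnd belongs to a BookmarkStart when both carry the same Id, and that "first descendant" means the first match in document order.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class BookmarkSearchExtensions
{
    // True if `element` is the BookmarkEnd that closes `start` (same Id).
    public static bool IsEndBookmark(this OpenXmlElement element, BookmarkStart start)
    {
        var end = element as BookmarkEnd;
        if (end == null) return false;

        string endId = end.Id?.Value;
        return endId != null && endId == start.Id?.Value;
    }

    // First descendant of type T in document order, or null if there is none.
    public static T GetFirstDescendant<T>(this OpenXmlElement element)
        where T : OpenXmlElement
    {
        return element.Descendants<T>().FirstOrDefault();
    }
}
```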

With this in place, I managed to retrieve and replace all bookmarks correctly. Only one issue remains: removing unnecessary Text elements. In the following function, I remove all Text elements within my bookmark except the one passed as the parameter keep:
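One way to sketch that cleanup, assuming BookmarkStart and BookmarkEnd share the same parent and are matched by Id. The names BookmarkCleanupExtensions and RemoveOtherTexts are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class BookmarkCleanupExtensions
{
    // Removes every Text between the bookmark's start and end except `keep`.
    public static void RemoveOtherTexts(this BookmarkStart bookmark, Text keep)
    {
        string startId = bookmark.Id?.Value;
        var toRemove = new List<Text>();

        // Collect first, remove afterwards, so the tree is not changed while walking it.
        for (OpenXmlElement el = bookmark.NextSibling(); el != null; el = el.NextSibling())
        {
            var end = el as BookmarkEnd;
            if (end != null && startId != null && end.Id?.Value == startId)
            {
                break; // reached the end of this bookmark
            }
            toRemove.AddRange(el.Descendants<Text>().Where(t => !ReferenceEquals(t, keep)));
        }

        foreach (Text text in toRemove)
        {
            text.Remove();
        }
    }
}
```

This sketch leaves the now empty Run elements in place; removing them as well would be a separate decision, since they may carry formatting.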