Open Source Word Generator Using OpenXML SDK 2.0

OpenXML SDK 2.0 for MS Office provides strongly typed part classes to manipulate Open XML documents. WorddocGenerator, an open source utility for generating template-driven Word files, is one example of what can be done with this SDK. InfoQ got in touch with Atul Verma, the developer of this utility, to ask him a few questions about the project.

InfoQ: How is WorddocGenerator different from other document generators like FlexDoc?

Atul: This utility

- Doesn’t require Word to be installed for document generation

- Uses Open XML SDK 2.0 and Visual Studio 2010

- Uses content controls for document generation

- Provides a lot of samples that cover many ways to generate a Word document, e.g.

  - Setting content using C# (no data binding)

  - Data-bound content controls

  - XPath expressions

  - Generating from XML (i.e. an XNode) or from an entity class (e.g. Order)

I have never used FlexDoc, but I saw this warning message on its home page: “WARNING: Current version of fleXdoc depends on a feature of Microsoft Word, that has been removed (sort of) in Office 2010 due to patent issues! This also applies to US-versions of Office 2007 released after november 2009.” If that is the case, then FlexDoc doesn’t seem to be appropriate for document generation.

InfoQ: How do the refreshable components work? Do they connect to the server to fetch data?

Atul: The utility expects a Tag to be specified in the Word template for every content control that needs to be populated with data. During generation, each Tag has to be mapped to the corresponding value of the PlaceHolderType enum. The types of PlaceHolders are

- Recursive: This type corresponds to controls where there is a 1:N relation between template and data, e.g. repeating a list of Items.

- Non-Recursive: This type corresponds to controls where there is a 1:1 relation between template and data, e.g. showing a User name.

- Ignore: No action is required for these controls.

- Container: This type is required only for refreshable documents. The container region is saved in a CustomXmlPart the first time the document is generated from the template. From then on, the saved container region is retrieved and used to refresh the document. This makes the document self-refreshable.
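The Tag-to-type mapping described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the utility's own C# code: the tag names (CustomerName, Item, Footer), the TAG_TYPES dictionary and the helper functions are all hypothetical. It relies only on the WordprocessingML convention that a content control is a w:sdt element whose Tag is stored in w:sdtPr/w:tag.

```python
import copy
import xml.etree.ElementTree as ET
from enum import Enum

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


class PlaceHolderType(Enum):
    RECURSIVE = 1      # 1:N, the control is repeated once per data item
    NON_RECURSIVE = 2  # 1:1, the control is filled with a single value
    IGNORE = 3         # the control is left untouched
    CONTAINER = 4      # only needed for refreshable documents (not shown here)


# Hypothetical mapping from template Tags to placeholder types.
TAG_TYPES = {
    "CustomerName": PlaceHolderType.NON_RECURSIVE,
    "Item": PlaceHolderType.RECURSIVE,
    "Footer": PlaceHolderType.IGNORE,
}


def sdt(tag, text):
    """Build the markup of one block-level content control."""
    return (f'<w:sdt><w:sdtPr><w:tag w:val="{tag}"/></w:sdtPr>'
            f'<w:sdtContent><w:p><w:r><w:t>{text}</w:t></w:r></w:p>'
            f'</w:sdtContent></w:sdt>')


# A stand-in for the body of a template's main document part.
TEMPLATE = (f'<w:body xmlns:w="{W}">' + sdt("CustomerName", "name")
            + sdt("Item", "item") + sdt("Footer", "Thanks") + '</w:body>')


def tag_of(control):
    tag = control.find("w:sdtPr/w:tag", NS)
    return tag.get(f"{{{W}}}val") if tag is not None else None


def set_text(control, value):
    control.find(".//w:t", NS).text = value


def generate(template_xml, data):
    body = ET.fromstring(template_xml)
    for control in list(body.findall("w:sdt", NS)):
        tag = tag_of(control)
        kind = TAG_TYPES.get(tag, PlaceHolderType.IGNORE)
        if kind is PlaceHolderType.NON_RECURSIVE:
            set_text(control, str(data[tag]))
        elif kind is PlaceHolderType.RECURSIVE:
            # Insert one filled copy per item, then drop the template control.
            index = list(body).index(control)
            for offset, value in enumerate(data[tag]):
                clone = copy.deepcopy(control)
                set_text(clone, str(value))
                body.insert(index + offset, clone)
            body.remove(control)
    return body


order = {"CustomerName": "Ada", "Item": ["Pen", "Ink"]}
result = generate(TEMPLATE, order)
texts = [t.text for t in result.iter(f"{{{W}}}t")]
print(texts)  # ['Ada', 'Pen', 'Ink', 'Thanks']
```

The Non-Recursive control receives one value, the Recursive control is cloned once per list item, and the Ignore control keeps its template text.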

I’ll explain the refresh operation with an example. I have a template, e.g. “Test.docx”, and I get the data object for which the document needs to be generated, e.g. an Order, from my data layer (backed by a database). The first time the document is generated from the template, the content controls of Container type are saved to the CustomXmlPart. Let’s say that the generated document is “TestOut.docx” and that the Order later changes. To bring the document back in sync with the database I need to refresh it: I take the generated document, “TestOut.docx”, and the latest Order object from the data layer, and refresh the document. As the document is refreshable, I don’t need “Test.docx” for the refresh. I’ve covered all these types of PlaceHolders in the samples.
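The refresh flow in this example can be approximated with the short Python sketch below. It is an illustration under stated assumptions rather than the utility's implementation: the package is modelled as a bare zip archive, the part name customXml/item1.xml and every function name are assumptions, and the relationship and content-type parts that a real .docx needs are omitted, so Word would not open the result. The point it shows is that the unfilled container markup travels inside the generated document, so the original template is not needed for a refresh.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
ET.register_namespace("w", W)

DOC_PART = "word/document.xml"
SAVED_PART = "customXml/item1.xml"  # assumed name for the saved container region

# Stand-in for the Container content control in the template ("Test.docx").
CONTAINER = (
    f'<w:sdt xmlns:w="{W}"><w:sdtPr><w:tag w:val="OrderContainer"/></w:sdtPr>'
    '<w:sdtContent><w:p><w:r><w:t>STATUS</w:t></w:r></w:p></w:sdtContent></w:sdt>'
)


def fill(container_xml, order):
    """Return a copy of the container markup populated from the data object."""
    root = ET.fromstring(container_xml)
    root.find(".//w:t", NS).text = order["status"]
    return ET.tostring(root, encoding="unicode")


def write_package(container_xml, order):
    """Write the filled document plus the unfilled container kept for later."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        body = fill(container_xml, order)
        package.writestr(
            DOC_PART,
            f'<w:document xmlns:w="{W}"><w:body>{body}</w:body></w:document>')
        package.writestr(SAVED_PART, container_xml)
    return buffer.getvalue()


def generate(template_container, order):
    # First generation: the container region comes from the template.
    return write_package(template_container, order)


def refresh(generated_bytes, order):
    # Later runs: the container region comes from the generated document itself.
    with zipfile.ZipFile(io.BytesIO(generated_bytes)) as package:
        saved = package.read(SAVED_PART).decode("utf-8")
    return write_package(saved, order)


def status_of(doc_bytes):
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as package:
        root = ET.fromstring(package.read(DOC_PART))
    return root.find(".//w:t", NS).text


test_out = generate(CONTAINER, {"status": "Pending"})    # "TestOut.docx"
refreshed = refresh(test_out, {"status": "Shipped"})     # no template needed
print(status_of(test_out), status_of(refreshed))  # Pending Shipped
```

Because refresh writes the saved region back into the new package, the refreshed document can itself be refreshed again.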

The utility requires a document, a data object and a generator, and returns the generated document. How the data is fetched is outside the utility’s concern. Word does not need to be installed for document generation.

I have added a sample which shows one of the ways to refresh the document from within Word (e.g. right-click on the document and click Refresh Data) using document-level customizations for Word 2010. In this particular case the utility can be hosted on a server (where Word need not be installed) and invoked from the client (a Word document with a document-level customization). Please visit this link for more information.

InfoQ: How is the performance for generating multiple documents for the same data?

Atul: I’ve not done any performance benchmarking; however, document generation is quite fast. I wanted to create a utility for document generation using the Open XML SDK 2.0 as a proof of concept with samples. I’ll work on refactoring as well as performance in my spare time in future.

InfoQ: Is a similar utility possible with Excel?

Atul: As this utility is specific to Word 2007 and Word 2010, it won’t work with Excel. However, similar utilities and frameworks can easily be created for Excel using the Open XML SDK 2.0; ClosedXML is one such project.

InfoQ: This is a good example of what can be done using the OpenXML SDK – any other useful features that could ideally be added?

Atul: The purpose of creating this utility is to

- Write minimum code to generate documents

- Show samples that generate documents using the approaches listed below

  - Generate documents that can be non-refreshable as well as refreshable

  - Generate documents from either an Object (e.g. Order class) or an XmlNode (using XPath expressions)

  - Set values of content controls using C#

  - Use data-bound content controls

  - Append documents to the primary document

I’d like to seek feedback about the samples that should be added to the utility.

Check out these blog posts for more details about this utility and to provide your feedback to Atul. To learn more about OpenXML SDK 2.0, you can refer to the XML in Office Developer resources as well as MSDN.