Wordinserter

This module lets you insert HTML into a Word document and build Word
documents programmatically in pure Python (Python 3.x only at the
moment). After running pip install wordinserter, you can use the
wordinserter CLI to quickly generate test documents:

API

Inserting HTML into a Word document is a two-step process: first the
input is parsed into a sequence of operations, and then those operations
are inserted into a Word document. The library currently only supports
insertion through the Word COM interface, which means it is
Windows-specific at the moment.
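
The parse-then-insert split described above can be illustrated without Word or COM. The sketch below is not Wordinserter's implementation: it uses only the standard library, and the names Op, parse_to_operations and render_plain are hypothetical. It turns HTML into a flat list of operations, then hands that list to a separate consumer that stands in for the insertion step.

```python
from html.parser import HTMLParser
from typing import List, NamedTuple


class Op(NamedTuple):
    """One hypothetical operation: a kind plus its payload."""
    kind: str   # "open", "close" or "text"
    value: str  # tag name, or the text content


class _Collector(HTMLParser):
    """Step 1: record every tag and text node as an operation."""

    def __init__(self):
        super().__init__()
        self.ops: List[Op] = []

    def handle_starttag(self, tag, attrs):
        self.ops.append(Op("open", tag))

    def handle_endtag(self, tag):
        self.ops.append(Op("close", tag))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only nodes
            self.ops.append(Op("text", data))


def parse_to_operations(html: str) -> List[Op]:
    collector = _Collector()
    collector.feed(html)
    collector.close()
    return collector.ops


def render_plain(ops: List[Op]) -> str:
    """Step 2: consume the operations. This stand-in renderer
    upper-cases bold text instead of driving Word."""
    out = []
    bold_depth = 0
    for op in ops:
        if op.kind == "open" and op.value == "b":
            bold_depth += 1
        elif op.kind == "close" and op.value == "b":
            bold_depth -= 1
        elif op.kind == "text":
            out.append(op.value.upper() if bold_depth else op.value)
    return "".join(out)


ops = parse_to_operations("<p>This is <b>some</b> text</p>")
rendered = render_plain(ops)
```

Keeping the operation list as plain data means the parsing step never needs to know which backend will consume it, which is why a COM-only renderer does not constrain the parser.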

Below is a more complete example, including starting Word, that inserts
a representation of the HTML into a new Word document, including the
image, caption and list.

from wordinserter import insert, parse
from comtypes.client import CreateObject

# This opens Microsoft Word and creates a new document.
word = CreateObject("Word.Application")
word.Visible = True  # Don't set this to True in production!
document = word.Documents.Add()
from comtypes.gen import Word as constants

html = """
<h3>This is a title</h3>
<p><img src="http://placehold.it/150x150" alt="I go below the image as a caption"></p>
<p><i>This is <b>some</b> text</i> in a <a href="http://google.com">paragraph</a></p>
<ul>
<li>Boo! I am a <b>list</b></li>
</ul>
"""

# Parse the HTML into a list of operations then feed them into insert.
operations = parse(html, parser="html")
insert(operations, document=document, constants=constants)

What’s with the constants part?

Wordinserter is agnostic to the COM library you use. Each library
exposes the constant values that Wordinserter needs in a different way:
the pywin32 library exposes them as win32com.client.constants, whereas
the comtypes library exposes them as a module that resides in
comtypes.gen. Rather than guess which one you are using, Wordinserter
requires you to pass the right one in explicitly. If you need to mix
different constant groups you can use the CombinedConstants class:
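
The idea of combining constant groups can be sketched without Word installed. The example below is an illustration only and is not Wordinserter's CombinedConstants: MergedConstants is a hypothetical class, and word_like and office_like are stand-in namespaces playing the role of two generated constant modules. It resolves each attribute by trying every source in order.

```python
from types import SimpleNamespace


class MergedConstants:
    """Hypothetical helper: look a constant up in several sources, in order."""

    def __init__(self, *sources):
        self._sources = sources

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so _sources
        # itself is found without coming through here.
        for source in self._sources:
            if hasattr(source, name):
                return getattr(source, name)
        raise AttributeError(f"constant {name!r} not found in any source")


# Stand-ins for two constant groups (assumed values, for illustration).
word_like = SimpleNamespace(wdAlignParagraphCenter=1)
office_like = SimpleNamespace(msoTrue=-1)

constants = MergedConstants(word_like, office_like)
```

A single object like this can be passed wherever one constants argument is expected, while earlier sources take precedence if two groups define the same name.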

This will render “Hello Word” in red. Inheritance is respected, so child
styles override parent ones.

Why aren’t my lists showing up properly?

There are two ways people write nested lists in HTML: with each sub-list
as a direct child of the parent list, or as a child of a list item.
Below is a sample of the two different ways, both of which display
correctly in all browsers:

The second way is correct according to the HTML specification. lxml
parses the first structure incorrectly in some cases, which leads to
weird list behavior. There isn’t much this library can do about that, so
make sure your lists are in the second format.
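
One way to check input before handing it to the library is to look for a list nested directly inside another list. The sketch below uses only the standard library and is not part of Wordinserter; NestedListChecker and find_misnested_lists are hypothetical names, and the two HTML strings are minimal reconstructions of the two structures described above.

```python
from html.parser import HTMLParser

LIST_TAGS = {"ul", "ol"}


class NestedListChecker(HTMLParser):
    """Record where a <ul>/<ol> opens directly inside another list."""

    def __init__(self):
        super().__init__()
        self._stack = []    # currently open ul/ol/li tags only
        self.problems = []  # (line, column) of each misplaced sub-list

    def handle_starttag(self, tag, attrs):
        if tag in LIST_TAGS and self._stack and self._stack[-1] in LIST_TAGS:
            self.problems.append(self.getpos())
        if tag in LIST_TAGS or tag == "li":
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop up to and including the matching open tag.
            while self._stack.pop() != tag:
                pass


def find_misnested_lists(html: str):
    checker = NestedListChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


# First way: the sub-list is a direct child of the parent list.
first_way = "<ul><li>Parent</li><ul><li>Child</li></ul></ul>"
# Second way: the sub-list is a child of a list item.
second_way = "<ul><li>Parent<ul><li>Child</li></ul></li></ul>"

first_problems = find_misnested_lists(first_way)
second_problems = find_misnested_lists(second_way)
```

Here first_problems contains one position and second_problems is empty, so a non-empty result signals markup that should be rewritten into the second format.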

One other thing to note: Word does not support lists with mixed list
types on a single level. For example, this HTML will render incorrectly: