Rich Text Document Structure

The structured representation of a text document presents its contents as a hierarchy of text blocks, frames, tables, and other objects. These provide a logical structure to the document and describe how their contents will be displayed. Generally, frames and tables are used to group other structures while text blocks contain the actual textual information.

New elements are created and inserted into the document programmatically with a QTextCursor or by using an editor widget, such as QTextEdit. Elements can be given a particular format when they are created; otherwise they take the cursor's current format for the element.

Basic Structure

The "top level" of a document is its root frame. Each document always contains a root frame, and this always contains at least one text block.

For documents with some textual content, the root frame usually contains a sequence of blocks and other elements.

Sequences of frames and tables are always separated by text blocks in a document, even if the text blocks contain no information. This ensures that new elements can always be inserted between existing structures.

In this chapter, we look at each of the structural elements used in a rich text document, outline their features and uses, and show how to examine their contents. Document editing is described in The QTextCursor Interface.

Rich Text Documents

QTextDocument objects contain all the information required to construct rich text documents for use with a QTextEdit widget or in a custom editor. Although QTextEdit makes it easy to display and edit rich text, documents can also be used independently of any editor widget, for example:

This flexibility enables applications to handle multiple rich text documents without the overhead of multiple editor widgets, or requiring documents to be stored in some intermediate format.

An empty document contains a root frame which itself contains a single empty text block. The text cursor interface automatically inserts new document elements into the root frame, and ensures that it is padded with empty blocks where necessary.

When navigating the document structure, it is useful to begin at the root frame because it provides access to the entire document structure.

Document Elements

Rich text documents usually consist of common elements such as paragraphs, frames, tables, and lists. These are represented in a QTextDocument by the QTextBlock, QTextFrame, QTextTable, and QTextList classes. Unlike the other elements in a document, images are represented by specially formatted text fragments. This enables them to be placed inline with the surrounding text.

The basic structural building blocks in documents are QTextBlock and QTextFrame. Blocks themselves contain fragments of rich text (QTextFragment), but these do not directly influence the high level structure of a document.

Elements that can group together other document elements are typically subclasses of QTextObject, and fall into two categories: elements that group together text blocks are subclasses of QTextBlockGroup, and those that group together frames and other elements are subclasses of QTextFrame.

Text Blocks

Text blocks group together fragments of text with different character formats, and are used to represent paragraphs in the document. Each block typically contains a number of text fragments with different styles. Fragments are created when text is inserted into the document, and more of them are added when the document is edited. The document splits, merges, and removes fragments to efficiently represent the different styles of text in the block.

The fragments within a given block can be examined by using a QTextBlock::iterator to traverse the block's internal structure:

Blocks are also used to represent list items. As a result, blocks can define their own character formats which contain information about block-level decoration, such as the type of bullet points used for list items. The formatting for the block itself is described by the QTextBlockFormat class, which covers properties such as text alignment, indentation, and background color.

Although a given document may contain complex structures, once we have a reference to a valid block in the document, we can navigate between each of the text blocks in the order in which they were written:

This method is useful when you want to extract just the rich text from a document because it ignores frames, tables, and other types of structure.

QTextBlock provides comparison operators that make it easier to manipulate blocks: operator==() and operator!=() are used to test whether two blocks are the same, and operator<() is used to determine which one occurs first in a document.

Frames

Text frames group together blocks of text and child frames, creating document structures that are larger than paragraphs. The format of a frame specifies how it is rendered and positioned on the page. Frames are either inserted into the text flow, or they float on the left-hand or right-hand side of the page. Each document contains a root frame that contains all the other document elements. As a result, all frames except the root frame have a parent frame.

Since text blocks are used to separate other document elements, each frame will always contain at least one text block, and zero or more child frames. We can inspect the contents of a frame by using a QTextFrame::iterator to traverse the frame's child elements:

Note that the iterator selects both frames and blocks, so it is necessary to check which it is referring to. This allows us to navigate the document structure on a frame-by-frame basis yet still access text blocks if required. Both the QTextBlock::iterator and QTextFrame::iterator classes can be used in complementary ways to extract the required structure from a document.

Tables

Tables are collections of cells that are arranged in rows and columns. Each table cell is a document element with its own character format, but it can also contain other elements, such as frames and text blocks. Table cells are automatically created when the table is constructed, or when extra rows or columns are added. They can also be moved between tables.

QTextTable is a subclass of QTextFrame, so tables are treated like frames in the document structure. For each frame that we encounter in the document, we can test whether it represents a table, and deal with it in a different way:

Lists

Lists are sequences of text blocks that are formatted in the usual way, but which also provide the standard list decorations such as bullet points and enumerated items. Lists can be nested, and will be indented if the list's format specifies a non-zero indentation.

Since QTextList is a subclass of QTextBlockGroup, it does not group the list items as child elements, but instead provides various functions for managing them. This means that any text block we find when traversing a document may actually be a list item. We can ensure that list items are correctly identified by using the following code:

Images

Images in QTextDocument are represented by text fragments that reference external images via the resource mechanism. Images are created using the cursor interface, and can be modified later by changing the character format of the image's text fragment: