Writer Accessibility

Introduction

This paper proposes how StarOffice/OpenOffice.org Writer (and
Writer/Web) documents can be made accessible by using the UNO
Accessibility API (UAA).

As stated in the guidelines for document representation, the
accessible object tree for Writer documents represents the current
view of the document, as it does for any other application.

The most important accessible objects are the ones that contain the
document's text or, to be more precise, support the
AccessibleEditableText service. A real difficulty, however, results
from the fact that the text that is visible on the screen might
belong to very different parts of the document. If, for instance, a
document contains page headers and footers, then text from a footer,
text from a header and text from the body regions of two different
pages might be visible simultaneously. Things become more complex
still if columns, text-boxes and footnotes are involved.

Besides text that might be contained in very different objects, the
view of a Writer document might also contain tables, images, drawings
and OLE objects.

This draft first gives an overview of the accessible object tree for
Writer documents, followed by an explanation of the reasons that led
to this structure. After that, a detailed specification follows.

1. Overview

The root accessible object (i.e. the one of the
window that contains the view of the Writer document) has a child
object for

the body text of the document (i.e. the text
that is distributed to the pages of a text document), except for the
very rare cases where no body text is visible at all;

every page header and every page footer that
is visible;

every footnote and endnote that is visible;
and

every text-box, image, OLE object and drawing
that is visible.

These objects are called area objects
within this proposal.

There is neither an object for pages nor for columns. In particular,
this means that there is exactly one object for the body text that is
currently visible, even if the text appears on different pages.

With the exception of text-boxes, images, OLE objects and drawings,
the only service area objects support is AccessibleContext. In
particular, this means that no AccessibleComponent service is
available and therefore no geometrical information.

Text-boxes, images, OLE objects and drawings, however, do support the
AccessibleComponent service. They are children of the root object,
regardless of whether they are in fact bound to a page, a paragraph,
etc.

With the exception of images, OLE objects and
drawings, area objects have children that are either paragraph
fragment objects or table fragment objects.

A paragraph fragment object supports the AccessibleEditableText and
AccessibleComponent services [1]. It represents either the text of a
whole paragraph or, if the paragraph contains page or column breaks,
the text of a part of it. In the latter case it represents exactly
the part of the paragraph's text that is displayed on a certain page
or within a certain column. In other words, a single paragraph is
represented by more than one paragraph fragment object if and only
if it contains page or column breaks. But a paragraph fragment object
never contains text from more than one paragraph.

A table fragment object supports the
AccessibleTable service. It represents the fragment of a table that
is displayed on one page or in one column. The table cell objects
in turn have paragraph fragment objects as children.

All area objects contain children only for
paragraphs and tables (or fragments of them) that are at least
partially visible. If a page header for instance contains two
paragraphs, but only one of them is visible, then the page header's
area object has one child only [2].

Paragraph as well as table fragment objects that
are partially visible contain their off-screen parts, too. This means
that a paragraph fragment object contains text that is not displayed
currently and a table fragment object contains cells that are not
displayed currently.
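
The tree described in this section can be sketched roughly as
follows. This is a minimal illustration in Python, not UAA code: the
class names Area and ParagraphFragment, their fields and the helper
build_area are hypothetical and exist only for this example. It shows
the rule that an area object has children only for fragments that are
at least partially visible, while each such child keeps its complete
text.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model types; these are NOT UNO Accessibility API interfaces.


@dataclass
class ParagraphFragment:
    text: str      # complete fragment text, including off-screen parts
    visible: bool  # True if at least a part of the fragment is on screen


@dataclass
class Area:
    kind: str  # e.g. "body", "header", "footer", "footnote"
    children: List[ParagraphFragment] = field(default_factory=list)


def build_area(kind: str, fragments: List[ParagraphFragment]) -> Area:
    """Create an area object that exposes only (partially) visible fragments."""
    return Area(kind, [f for f in fragments if f.visible])


# A page header with two paragraphs, of which only one is visible.
header = build_area("header", [
    ParagraphFragment("Chapter 1", visible=True),
    ParagraphFragment("Draft version", visible=False),
])
print(len(header.children))  # the header area object has one child only
```

Table fragments and geometry are left out here to keep the sketch
short.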

2. Design Influences

This section describes some concepts and issues
that influenced the accessible object context tree described in the
previous section.

Text Flows

As said above, the text that is shown in the view
of a Writer document might in fact be contained in different
unrelated parts of a document. Within this proposal, these parts are
called text flows. The following text flows exist:

page headers

page footers

foot- and endnotes

text boxes

table cells

text contained in drawings

body text, i.e. the text that is distributed
to the pages' body regions.

On the screen, the different text flows can be distinguished by gaps
or lines between them, or by other hints like background colors. The
accessibility API should offer a simple way to distinguish them, too.
This enables voice tools, for instance, to read the body text of a
document without mixing it up with headers, footers, footnotes and
so on.

An appropriate way to differentiate between text flows seems to be
the parent/child relation the XAccessibleContext interface offers.
This requires that

a single object that supports the
AccessibleEditableText service does not represent text from
different text flows;

a single object that supports the
AccessibleContext service does not have children that represent text
from different text flows.

The area objects make use of exactly this parent/child relation to
distinguish between text flows. To get uniform access to a text
flow's text, it seems reasonable to use area objects even if they
contain one child only.
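
The benefit for a voice tool can be illustrated with a small sketch.
It is an assumption-laden model and not the UAA itself: TextObject,
group_into_areas and read_flow are hypothetical names, and for
brevity one area is created per kind of text flow, whereas the
proposal has one area object per visible header, footer, footnote and
so on.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextObject:
    flow: str  # hypothetical text flow label, e.g. "body" or "header"
    text: str


def group_into_areas(objects: List[TextObject]) -> Dict[str, List[TextObject]]:
    """Put every text object under the area of its own text flow."""
    areas: Dict[str, List[TextObject]] = {}
    for obj in objects:
        areas.setdefault(obj.flow, []).append(obj)
    return areas


def read_flow(areas: Dict[str, List[TextObject]], flow: str) -> List[str]:
    """Return the texts of one flow only, in their original order."""
    return [obj.text for obj in areas.get(flow, [])]


# Text objects in the order in which they appear on the screen.
visible = [
    TextObject("header", "Annual Report"),
    TextObject("body", "First paragraph."),
    TextObject("footnote", "See appendix."),
    TextObject("body", "Second paragraph."),
]
areas = group_into_areas(visible)
print(read_flow(areas, "body"))
```

Because no area mixes text flows, reading the children of the body
area yields the body text without the header or the footnote.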

Pages And Columns

If a document is not displayed in the Online Layout mode, then its
body text flow is distributed over several pages. On the screen, the
different pages are separated visually by gray bars between them.

There seems to be no requirement for an accessible object for the
pages themselves. But in many cases, it also does not seem convenient
for the text that is accessible through a single
AccessibleEditableText service to contain a page break, that is, to
represent text from two different pages. This is the case if the
page before the page break contains a footer and/or footnotes, or if
the page after the page break contains a header. If the
AccessibleEditableText for the body text represented the text of
both pages, then its bounding box would overlap with the ones of the
headers, footers or footnotes.
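
The overlap argument can be checked with a few rectangles. The
following sketch uses made-up screen coordinates and hypothetical
helpers (Rect, union, overlaps); none of them are part of the UAA.
It compares one bounding box for the body text of both pages with one
bounding box per page.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that encloses both a and b."""
    return Rect(min(a.left, b.left), min(a.top, b.top),
                max(a.right, b.right), max(a.bottom, b.bottom))


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share some area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


# Assumed layout around a page break (y grows downwards).
body_page1 = Rect(50, 0, 550, 300)      # end of the body text on page 1
footer_page1 = Rect(50, 320, 550, 360)
header_page2 = Rect(50, 420, 550, 460)
body_page2 = Rect(50, 480, 550, 800)    # start of the body text on page 2

# One text object for the body text of both pages.
whole_body = union(body_page1, body_page2)
print(overlaps(whole_body, footer_page1), overlaps(whole_body, header_page2))

# One text object per page.
print(overlaps(body_page1, footer_page1), overlaps(body_page2, header_page2))
```

With these coordinates the combined box overlaps both the footer and
the header, while the per-page boxes overlap neither.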

Moreover, it does not seem convenient for an AccessibleEditableText
to represent text of different columns. The simple reason is that an
AccessibleEditableText that contains text of more than one column
would require more than one bounding box.

As with pages, there also seems to be no requirement for an
accessible object for the columns themselves.
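
Why a single bounding box is not enough for text in two columns can
be shown with a hit test. The coordinates and the names Rect,
contains and union are assumptions for this sketch only. A paragraph
is assumed to start at the bottom of the first column and to continue
at the top of the second one.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        """True if the point (x, y) lies inside this rectangle."""
        return self.left <= x < self.right and self.top <= y < self.bottom


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that encloses both a and b."""
    return Rect(min(a.left, b.left), min(a.top, b.top),
                max(a.right, b.right), max(a.bottom, b.bottom))


# Assumed layout: the paragraph ends column 1 and begins column 2.
part_in_column1 = Rect(0, 700, 200, 800)
part_in_column2 = Rect(220, 0, 420, 100)

single_box = union(part_in_column1, part_in_column2)

# A point in the middle of column 1 that belongs to some other paragraph.
x, y = 100, 300
print(single_box.contains(x, y))
print(part_in_column1.contains(x, y) or part_in_column2.contains(x, y))
```

The single box claims a point that belongs to neither part of the
paragraph, so a tool that maps screen positions to text would be
misled. One object per column avoids this.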

Paragraphs

The third thing that has to be taken into account when defining the
accessible object tree is paragraphs. On the one hand, paragraphs
divide text into fragments at reasonable positions. On the other
hand, and more importantly, they assign certain semantics to text,
like being a heading or an item of a bullet list. These semantics
impose a structure on the document that is at least helpful for
navigating it. Therefore it seems reasonable not to have text of
more than one paragraph within one accessible object, and to include
the structural information a paragraph carries in either the
object's description or its role.

The converse, however, does not hold. A paragraph's text has to be
distributed to more than one object if the paragraph contains a page
or column break. Therefore the accessible objects in fact represent
paragraph portions: these are the paragraph fragment objects.
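
The relation between a paragraph and its fragments can be sketched as
a simple split at break positions. The function name split_paragraph
and the idea of passing breaks as character offsets are assumptions
made for this illustration; the proposal does not define how breaks
are reported.

```python
from typing import List


def split_paragraph(text: str, break_offsets: List[int]) -> List[str]:
    """Split one paragraph's text at its page or column break offsets.

    Returns one string per paragraph fragment. A paragraph without
    breaks yields exactly one fragment.
    """
    fragments = []
    start = 0
    for offset in sorted(break_offsets):
        fragments.append(text[start:offset])
        start = offset
    fragments.append(text[start:])
    return fragments


paragraph = "This paragraph continues on the next page."
# Assumed page break in front of the word "on" (character offset 25).
fragments = split_paragraph(paragraph, [25])
print(fragments)
print("".join(fragments) == paragraph)
```

Every fragment holds text of one paragraph only, and the fragments
together give back the paragraph's complete text.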

Though the guidelines state that text can be split into on- and
off-screen parts, it does not seem convenient to do that for
paragraph portions. This means that text that is not visible on the
screen might be accessible through an AccessibleEditableText
service, provided that some text of the same service is displayed.
There are two reasons for this, both of which have to do with the
fact that each paragraph fragment object also supports the
AccessibleComponent service, which gives access to its screen
coordinates. First of all, this is the same behavior as for any
other accessible object that supports the AccessibleComponent
service. Secondly, and more importantly, hiding the non-visible text
fragments would mean that a simple scroll action might change the
object's text.
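
The second reason can be made concrete with a small model. It assumes
a fixed line height and a hypothetical Fragment class; neither is
taken from the UAA. The method text follows the proposal and returns
the complete fragment text, while visible_only_text models the
rejected alternative of exposing only the lines that are on screen.

```python
from dataclasses import dataclass
from typing import List

LINE_HEIGHT = 20  # assumed fixed line height in pixels


@dataclass
class Fragment:
    lines: List[str]
    top: int  # document y coordinate of the first line

    def text(self) -> str:
        """Complete text, including off-screen lines (as proposed)."""
        return " ".join(self.lines)

    def visible_only_text(self, scroll_y: int, view_height: int) -> str:
        """Text of the lines that intersect the viewport (the alternative)."""
        shown = []
        for index, line in enumerate(self.lines):
            line_top = self.top + index * LINE_HEIGHT
            line_bottom = line_top + LINE_HEIGHT
            if line_bottom > scroll_y and line_top < scroll_y + view_height:
                shown.append(line)
        return " ".join(shown)


fragment = Fragment(["one", "two", "three", "four"], top=0)

# Scroll down by one line in a viewport that shows two lines.
for scroll_y in (0, 20):
    print(scroll_y, fragment.text(), "|",
          fragment.visible_only_text(scroll_y, view_height=40))
```

The complete text stays the same at both scroll positions, whereas
the visible-only text changes from the first two lines to the second
and third line.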

3. Details

To be continued ...

[1] Editor's note: There is an open issue with read-only and generated
text within paragraphs, like fields and generated hyphens. Therefore
it might be necessary that the paragraph fragment objects do not
support AccessibleEditableText themselves, but have children that
support either AccessibleText or AccessibleEditableText.

[2] That is in fact the reason these objects are called header area
objects instead of header objects. They do not represent a header,
but an area on the screen where a header is displayed.
