Accessibility in [WebKit]GTK+

This past week I’ve spent some time explaining to my mates at Samsung the basics about how accessibilityworks and is implemented in WebKitGTK+. I realized, yet again, of how messy and confusing everything can be the first time you encounter these things. After all, WebKit is quite a complex project already and accessibility is not a simple matter either.

In order to help them better understand this topic, I wrote a summary to have as reference that explains in my own words which the main pieces of the whole puzzle are, and how they relate to one another. In my experience, it’s not always easy to understand the big picture quickly, and I think this kind of documentation can be quite useful for anyone willing to contribute to accessibility in WebKitGTK+. At least it would have been useful for me when I started working on this. I only regret not having written it before, but better late than never, right?

So let’s begin then. I will start by talking about accessibility-only stuff, which are basically common to any accessible application based in GTK+. Then I will explain the bits specific to WebKitGTK+ and how they fit in the picture.

Accessibility in GTK+ applications

The parts, or “actors”, involved in any GTK+ application from an accessibility point of view are:

Assistive Technologies (ATs)

AT-SPI (Assistive Technology Service Provider Interface)

ATK (Accessibility ToolKit)

ATK <-> AT-SPI bridge

GTK+

GTK’s Accessibility Implementation

Now let’s describe all those points, one by one:

Assistive Technologies (ATs):

ATs are applications whose main purpose is to facilitate access and/or interaction with certain bits of information interesting from an accessibility related point of view, exposed by other applications. This access/interaction can be primarily output based. For instance the Orca screen reader is an AT which provides access via text to speech and/or refreshable braille to on-screen information exposed by editors, browsers, mail agents and other applications.

Other ATs are primarily input based, allowing the user to interact with the exposed applications by executing certain actions over them (e.g. clicking on a exposed link), so it’s not just about “consuming” information. Normally, ATs are called the clients and the applications exposing information the servers, as in the end it’s actually an implementation of a typical client/server architecture.

AT-SPI (Assistive Technology Service Provider Interface):

Set of interfaces that Assistive Technologies (the clients) understand and use to inspect and interact with the accessible content exposed by applications in Linux environments. At some point, “someone” has to provide actual AT-SPI objects (linked together forming a AT-SPI hierarchy) implementing several of those interfaces (depending of the type of object) so ATs can “see them”.

This is the job of the AT-SPI registry, a daemon which takes care of maintaining a hierarchy of AT-SPI objects for every single accessible application in the system, in a centralized way, so ATs can interact with them. It is worth mentioning that the parent/children relationships in that hierarchy are modelled in terms of D-Bus, so different AT-SPI objects can belong to different processes.

ATK (Accessibility ToolKit):

The toolkit used by GTK+ applications to expose accessible representations of the toolkit’s objects, along with appropriate interfaces, on the side of the applications exposing content (the servers). This representation is an almost a 1:1 match with the objects and interfaces defined by AT-SPI (that is, almost).

The main difference when it comes to understanding its place in the puzzle is that AT-SPI is what clients (ATs) understand, and that is not process-bounded (see previous point). ATK, in contrast, is what servers implement to expose accessible information, and it is process-bounded. Thus the parent/children relationships in the ATK hierarchy are modelled by actual references (pointers) between objects living in the same process.

ATK <-> AT-SPI bridge:

The glue that makes sure there’s a mapping between the ATK hierarchy living in the server process and the AT-SPI hierarchy held by the AT-SPI registry. Such a bridge is implemented in terms of D-Bus too, as it needs to communicate with the registry whenever something needs to be updated there, as well as when the server needs to react to external actions coming from ATs (e.g. perform the default action for an object).

GTK+:

The widgets toolkit normally used by applications embedding WebKitGTK+. Explaining what GTK+ is beyond the scope of this post, so I will assume you already know what it is.

GTK’s Accessibility Implementation:

Provides ATK objects implementing different ATK interfaces for every widget from the GTK+ library, and uses the ATK <-> AT-SPI bridge to communicate with the AT-SPI registry. This means that if you use standard GTK+ widgets only, your application will be accessible out-of-the-box. On the contrary, should you use custom widgets, you’ll probably have to write custom ATK objects implementing the proper ATK interfaces to make them accessible too.

So that’s all so far, when it comes to GTK+ applications. Check the following diagram for a more detailed look at all these hierarchies for a hypothetical GTK+ application exposing information and a screen reader accessing it:

As you can see, there’s an ATK tree matching the GTK+ hierarchy, and another AT-SPI tree matching the ATK one. Finally, the screen reader accesses the information through that AT-SPI tree, as explained above.

Accessibility in WebKitGTK+

Now that we already understand the basics of accessibility in GTK+ applications, let’s add the bits specifically related to WebKitGTK+. Namely:

WebCore’s Accessibility Objects

WebKitGTK+ (ATK) wrappers

WebKit2GTK+ specific details

Again, a picture is usually better than just text, so here you have one too:

In order to clarify it a bit more before explaining each point, let’s just say that you’ll have to look in the dashed box named WebCore accessibility world, where the hierarchy on the left (red & orange) represent theWebCore Accessibility objects, while the one on the right (the green one) represents the WebKitGTK+ ATK wrappers.

With this in mind, let’s examine these three points in more depth:

WebCore Accessibility objects:

Similar to GTK’s Accessibility Implementation, WebCore‘s accessibility objects are the implementation of an independent hierarchy exposing accessibility related information for objects present in a web page. As the mission of accessibility in WebKit is to expose information to users that are normally being rendered in the screen (as well as some other information that might be hidden to regular users), there is a tight relationship between this hierarchy and other ones in WebKit, such as the DOM tree and the Render Objects tree.

An ATK-based implementation of an accessibility hierarchy where every ATK object will take care of wrapping the proper accessibility object from WebCore, as well as implementing the proper ATK interfaces depending on the situation (e.g. the role of the WebCore accessibility object, some properties coming from the associated Render Object…).

The ATK hierarchy created here is connected with the ATK hierarchy from the embedding application (normally a GTK+ app) by setting the root ATK object in this tree (normally representing DOM‘s root element) as the child of the leaf ATK object in the tree coming from the embedding application (normally the GtkWidget containing the WebView).

As is the case with any other regular GTK+ application, this ATK hierarchy will finally be seen by ATs thanks to the translation that the ATK <-> AT-SPI bridge will do for us, making the whole ATK tree from the WebKitGTK+ based application (from the top level GTK+ window down to the deepest accessibility object inside WebCore) available to the AT-SPI registry by means of D-Bus.

WebKit2GTK+ specific details:

I already talked about this inpreviousposts, so I will focus here just on commenting the main difference compared to the generic case for WebKitGTK+ described earlier (see previous diagram above):

WebKit2GTK+ implements a split-process model, where the high level API belongs to one process (the UI process) while the core logic of the web engine lives in another one (the Web process).

From an accessibility point of view, this means that the full hierarchy of ATK objects we had before is also split in two parts: some accessibility objects are now in the UI process and the rest of them will be in the Web process.

To be more specific, we’ll find the following objects in each process:

As I explained previously, these two ATK hierarchies will be seen as a single accessibility hierarchy by ATs thanks to the “magic” of AtkPlug and AtkSocket classes, which takes care of exposing everything together in a single AT-SPI tree. And remember that such a tree is modelled by means of D-Bus, so it does not matter that things are actually in different processes.

Thus, since ATs just understand AT-SPI, they will see The Right Thing ™ as in the previous case where we have one single process. See the following diagram for a more visual explanation of this:

Wrapping up

So that’s it. At the end the post turned out to be longer than what I was expecting, as my initial idea was to publish the stuff I wrote internally at Samsung this week, but ended up extending it quite a lot!

At least I hope this will be helpful for anyone willing to contribute to accessibility, either in WebKitGTK+ or in a more general way.

Last, I would like to thanks Joanmarie Diggs from Igalia for her help with this blog post. One certainly feels more confident writing a long article like this one about a very specific topic when you have one of the most experienced persons on the matter reviewing it!