If you use the WAI-ARIA role “application”, please do so wisely!

This goes out to all web developers out there reading this blog and implementing widgets and other rich content in HTML, CSS and JavaScript! If you think of using the WAI-ARIA role “application” in your code, please do so with caution and care! And here’s why:

What is it?

“application” is one of the WAI-ARIA landmark roles. If you’d like to read up on landmarks, please go here. It is used to denote a region of a web application that is to be treated like a desktop application, not like a regular web page. In other words, if you use something that is not part of standard HTML, like a mashup widget, and your page has no resemblance to a classic document in, roughly, over 90% of its content, “application” is for you. If you, on the other hand, make up a user interface solely of elements that are part of standard HTML like selects, check boxes, text boxes etc., and in addition use only common compound widgets and lots of document-like content like hyperlinks, you will most probably not want to use “application” because browsers and assistive technologies provide a standard interaction model for these already and don’t need special support from you in that.

Why use it at all?

Traditionally, assistive technologies like screen readers for the blind convert a page’s content into a format that is easier for a person with a disability to comprehend. A two-column newspaper style text, for example, is reformatted so that the text flows from beginning to end like it would in a single-column document. Multi-column layouts of pages are streamlined so that there’s a structured flow a person who, for example, cannot see the screen, can understand.

This is the mode that screen readers, especially on Windows, operate in when the user browses a web page. The term virtual cursor has been used since its inception in 1999 because this feels to the user like a document in, for example, WordPad or MS Word. The user walks the document using the arrow keys and has the text read to them via the speech synthesizer. In addition, semantic information is spoken to indicate whether a particular piece of text is a link, graphic, form element, part of a data table structure, list etc. Several keys are also captured by the assistive technology and are not processed by the browser. These allow navigation by headings, lists, links, tables, form elements and others, and are usually single letters. The visual focus may or may not follow the virtual cursor onto focusable items, depending on the assistive technology in use and its settings.

Forms mode or focus mode

This is a mode where the user interacts with elements that accept a form of data entry. This may be via entering text, checking a check box, selecting one of several possible radio buttons, or making a selection in a select element. This mode is either invoked by putting the virtual or browse focus onto such an element and pressing a key (usually Enter), or by the assistive technology switching to this mode automatically when the virtual focus encounters such an element. Other assistive technologies only activate this mode automatically when the user moves with the Tab key. In this mode, all keys are passed through to the browser. It is as if you were sitting in front of your browser using it with the keyboard, without a screen reader running. Likewise, if focus leaves an element that supports or requires direct keyboard interaction, and the mode was switched on automatically, it will be switched off again and the user returned to browse/virtual mode. Note: Some elements like buttons and links do not require the user to switch into focus or forms mode, because for efficiency, screen readers allow activation of these elements directly from virtual/browse mode.

The challenge is that you may be creating widgets that require you to force the user into direct interaction with the browser. You know that your widget can best be used via the keyboard if the user is not in virtual mode. In addition, you know that you have no classic document content to display, but only use widgets and provide all necessary context through them. This is what role “application” is for. It is under your control whether the user is being thrown into focus mode once your widget gains keyboard focus. Also, contrary to standard focus mode, if an assistive technology encounters an area that is marked up with role “application”, it is usually not so easy to manually exit out of that mode to review the surrounding content in browse mode.

So when should I use it, and when not?

You do not use it if a set of controls contains only the following widgets, all of which are part of standard HTML. This also applies if you mark them up and create an interaction model using WAI-ARIA roles instead of standard HTML widgets:

text box. This also applies to password, search, tel and other newer input type derivatives

textarea

check box

button

radio button (usually inside a fieldset/legend element wrapper)

select + option(s)

links, paragraphs, headings, and other things that are classic/native to documents on the web.

You also do not use the “application” role if your widget is one of the following more dynamic and non-native widgets. Screen readers and other assistive technologies that support WAI-ARIA will support switching between browse and focus modes for these by default, too:

tree view

slider

table that has focusable items and is being navigated via the arrow keys, for example a list of e-mail messages where you provide specific information. Other examples are interactive grids, tree grids etc.

A list of tabs (tab, tablist) where the user selects tabs via the left and right arrow keys. Remember that you have to implement the keyboard navigation model for this!

dialog and alertdialog. These cause screen readers to go into a sort of application mode implicitly once focus moves to a control inside them. Note that for these to work best, set the aria-describedby attribute of the element whose role is “dialog” to the id of the text that explains the dialog’s purpose, and set focus to the first interactive control when you open it.

toolbars and toolbar buttons, menus and menu items, and similar
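To illustrate the point about tabs in the list above, here is a minimal sketch of a tab list that implements its own left/right arrow key model and does not need role “application”. The ids, labels and panel contents are invented for illustration, and the script is only one possible way to do this (one focusable tab at a time, often called a roving tabindex), not the required implementation:

```html
<div role="tablist" aria-label="Settings sections">
  <!-- Only the selected tab is in the tab order (tabindex="0"). -->
  <button type="button" role="tab" id="tab-general" aria-controls="panel-general"
          aria-selected="true" tabindex="0">General</button>
  <button type="button" role="tab" id="tab-privacy" aria-controls="panel-privacy"
          aria-selected="false" tabindex="-1">Privacy</button>
  <button type="button" role="tab" id="tab-sync" aria-controls="panel-sync"
          aria-selected="false" tabindex="-1">Sync</button>
</div>

<div role="tabpanel" id="panel-general" aria-labelledby="tab-general">General settings go here.</div>
<div role="tabpanel" id="panel-privacy" aria-labelledby="tab-privacy" hidden>Privacy settings go here.</div>
<div role="tabpanel" id="panel-sync" aria-labelledby="tab-sync" hidden>Sync settings go here.</div>

<script>
  const tablist = document.querySelector('[role="tablist"]');
  const tabs = Array.from(tablist.querySelectorAll('[role="tab"]'));

  // Mark one tab as selected, show its panel, hide the others,
  // and move real keyboard focus to the newly selected tab.
  function selectTab(newTab) {
    for (const tab of tabs) {
      const selected = tab === newTab;
      tab.setAttribute('aria-selected', String(selected));
      tab.tabIndex = selected ? 0 : -1;
      document.getElementById(tab.getAttribute('aria-controls')).hidden = !selected;
    }
    newTab.focus();
  }

  // Left and right arrows move between tabs and wrap around at the ends.
  tablist.addEventListener('keydown', (event) => {
    const current = tabs.indexOf(document.activeElement);
    if (current === -1) return;
    let next;
    if (event.key === 'ArrowRight') {
      next = (current + 1) % tabs.length;
    } else if (event.key === 'ArrowLeft') {
      next = (current - 1 + tabs.length) % tabs.length;
    } else {
      return; // Leave every other key to the browser and screen reader.
    }
    event.preventDefault();
    selectTab(tabs[next]);
  });

  // Mouse and touch users select a tab by clicking it.
  tablist.addEventListener('click', (event) => {
    const tab = event.target.closest('[role="tab"]');
    if (tab) selectTab(tab);
  });
</script>
```

Because only the selected tab is in the tab order, a single press of Tab moves past the whole tab list, and screen readers that support these roles can switch between browse and focus mode on their own.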

You only want to use role “application” if the content you’re providing consists of only interactive controls, and of those, mostly advanced widgets, that emulate a real desktop application. Note that, despite many things now being called a web application, most of the content these web applications work with is still document-based information, be it Facebook posts and comments, blogs, Twitter feeds, even accordions that show and hide certain types of information dynamically. We primarily still deal with documents on the web, even though they may have a desktop-ish feel to them on the surface.

In short: The times when you actually will use role “application” are probably going to be very rare cases!

So where do I put this thing?

Put it on the closest containing element of your widget, for example the parent div of your outermost widget element. If that outer div wraps only widgets that need the application interaction model, this will make sure focus mode is switched off once the user tabs out of this widget.
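As a minimal sketch of that placement, assume a hypothetical floor plan editor whose ids, labels and inner structure are invented for illustration. The role sits on the widget’s outermost div only, and everything around it stays ordinary document markup:

```html
<!-- Ordinary document content: screen readers stay in browse mode here. -->
<h1>Plan your room</h1>
<p>Use the editor below to arrange the furniture in your room.</p>

<!-- Hypothetical custom widget. The role is on its outermost element only,
     so the application interaction model applies just to this region. -->
<div id="plan-editor" role="application" aria-label="Floor plan editor">
  <!-- tabindex="0" puts the editing surface into the tab order. The widget's
       own script (not shown) is assumed to handle arrow keys, Enter etc. -->
  <div id="plan-surface" role="group" tabindex="0"
       aria-label="Room layout. Use the arrow keys to move the selected item."></div>
</div>

<!-- Document content again once the user tabs out of the widget. -->
<p>Need help? <a href="/help">Read the editor documentation</a>.</p>
```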

Only put it on the body element if your page consists solely of a widget or set of widgets that all need the focus mode to be turned on. If you have a majority of these widgets, but also have something you want the user to browse, use the role “document” on the outer-most element of this document-ish part of the page. It is the counterpart to “application” and will allow you to tell the screen reader to use browse mode for this part. Also make this element tabbable by setting a tabindex value on it so the user has a chance to reach it. As a rule of thumb: If your page consists of over 90 or even 95 percent of widgets, role “application” may be appropriate. Even then, find someone knowledgeable who can actually test two versions of this: One with and one without role “application” set to see which model works best.
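A minimal sketch of this combination, assuming a hypothetical mail page made almost entirely of widgets (all ids, labels and texts are invented, and this is not the markup of any real mail service):

```html
<!-- The page is almost entirely widgets, so the role is on body. -->
<body role="application">
  <!-- Custom message list, assumed to be driven by arrow keys via script (not shown). -->
  <div id="message-list" role="grid" aria-label="Inbox"></div>

  <!-- The one document-like region. role="document" asks the screen reader to
       use browse mode here, and tabindex="0" lets the user reach it with Tab. -->
  <div id="message-body" role="document" tabindex="0" aria-label="Message text">
    <h2>Lunch on Friday?</h2>
    <p>Hi, are you free for lunch on Friday? Let me know what time suits you.</p>
  </div>
</body>
```

Whether this works better than leaving the role off entirely is exactly what the two-version test with a knowledgeable screen reader user should decide.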

NEVER put it on a widely containing element such as body if your page consists mostly of traditional widgets or page elements such as links that the user does not have to interact with in focus mode. This will cause huge headaches for any assistive technology user trying to use your site/app.

Some examples

The behavior that originally prompted me to write this is the newest version of the layout of Gmail. It looked to me like Gmail treats the whole thing as role “application”, causing the user to tab a zillion times before actually getting to the inbox message table. It turned out to be a bug which has been haunting us for a few months now, where parts of the accessible tree (the thing we create from HTML and CSS) get detached from the rest, causing behavior in screen readers that is very similar to role “application”. If the virtual cursor were still active, users could press t in their screen reader once and jump to that table right away instead of tabbing 30 or 40 times. A temporary solution to this problem seems to be this: Google provides the keyboard shortcuts j and k, and these can be disabled in the account settings of Gmail. Disabling them makes the bug disappear. Had Google used role “application” here, it would have been inappropriate. Gmail is, below the surface, still strongly document based and thus many traditional interaction modes do apply here.

An example where roles “application” and “document” are really used wisely is the current Yahoo! mail web interface. The table where the messages are being displayed in a list is marked up to be an “application”, because the arrow keys are used to navigate between messages, enter opens one etc. Once a mail is being displayed, everything around the actual mail header and text is an application, but the mail header and text are a document, so role “document” is used and initial focus is put there so the user can immediately start reading their mail in a familiar browsing fashion.

Funny disclaimer: Yahoo! do not pay me for constantly calling them out as a good example of an accessible web app. They just did a bang-up job, that’s all!

Closing comments

If you have questions about this, feel free to post them here on the blog. I’ll do my best to answer them and also incorporate the answers into updates to this post. You can also find the Google Group free-aria and discuss questions there with other web developers, standards experts and users. This is an important topic that, if done right, can provide all your users with a rich and pleasant experience, but if done wrong, can cause headaches and complaints and make your web app less likeable to a sizeable number of users.

So use role “application” only if you absolutely have to, and tests show that this provides the better interaction model! Use it wisely when you do it, it’s worth the effort!

Update February 7

I’ve incorporated comments that were posted to the original version of this blog post into the post, making the statements even stronger hopefully. Thanks to Jamie for pointing out a couple of very important points that make the point against using role “application” in most cases even stronger!

Update Feb 9, 2012

Google’s accessibility team contacted me, stating that they actually don’t use role “application” in Gmail. After further investigation, it turns out that this seems to be another case of a bug we’ve been hunting for months, but which the main person who could fix it cannot reproduce. I’ve thus updated the example section with the appropriate info. Thanks to Google for getting back to me and providing these details!

31 comments:

Great article on the ‘application’ role! This could be turned into documentation for that role on MDN with only slight adaptation (similar to Using the alert role). If that’s OK with you (I didn’t find a copyright or licensing statement on your blog), I’ll add that to the list of possible tasks for MDN contributors.

Great post for the most part, though I very strongly disagree with a few points which I think could actually make this situation worse.

First, it’s important to note that role="application" does not just enable focus mode. In focus mode, you can normally hit escape or some other key to switch back to browse mode. Also, if auto focus mode switching is disabled, the screen reader won’t automatically switch to focus mode when an interactive widget gains focus. In contrast, in NVDA, if role="application" is used, you cannot switch back to browse mode at all; as far as NVDA is concerned, this is no longer a document. As I understand it, the same is true for JAWS, though you can force enable the virtual cursor there.

role="application" should never be used unless you truly believe a given section or the entirety of what you are developing should behave like an application and not like a document. Tree views, sliders, interactive tables, tab controls (tab/tablist), toolbars, menus, etc. are all single compound widgets and should be handled just fine by browse mode/focus mode switching (excepting bugs in screen readers). This enables any of these widgets to be used as part of a document without breaking normal document interaction, just like the traditional widgets.

role="application" is placed on the body of the Yahoo! Mail interface because the entire site has been structured to enable interaction as an application. To be honest, it is probably the only mainstream example I have ever seen where role="application" is necessary and correctly used. Google Documents and Spreadsheets also do this, but the problem there is that there are too many links, etc. in the tab order before you hit the main part of the application.

Dialog and alertdialog should already be implicitly treated as applications, so a specific application role isn’t required here either.

One thing I cannot emphasize enough is that, if an area is defined as an application, the developer/designer *must* make sure that sufficient keyboard commands are made available for the user of assistive technology. In the Gmail example listed, there are several keyboard commands available. There appears to be a disconnect between those keyboard interactions and screen reader output. For example, using j and k will allow the user to move between messages. Then use o to open the message, x to mark the message, etc. Other than the assistive technology user not knowing these keyboard commands, there is an additional problem of the screen reader not providing appropriate feedback. Even if the focus is visually moving from message to message with j and k, there is no audible confirmation, leaving the user completely confused. I know that both Marco and James in his comment have made this point, but, when the application role is used, start thinking about how you expect the blind user to interact with your application and not just your web page. At that point, it is not the web site that the user is interacting with; it’s your application.

First and foremost, thanks for your great comments! Here are a few points I’d like to respond to.

Jamie writes:

First, it’s important to note that role="application" does not just enable focus mode. In focus mode, you can normally hit escape or some other key to switch back to browse mode. Also, if auto focus mode switching is disabled, the screen reader won’t automatically switch to focus mode when an interactive widget gains focus. In contrast, in NVDA, if role="application" is used, you cannot switch back to browse mode at all; as far as NVDA is concerned, this is no longer a document. As I understand it, the same is true for JAWS, though you can force enable the virtual cursor there.

Where did this paradigm that, when role “application” is being used, browse mode can no longer be enabled, come from? Could it not be implemented in a less strict fashion?

Jamie writes:

role="application" should never be used unless you truly believe a given section or the entirety of what you are developing should behave like an application and not like a document.

Yes, I agree that this is the main point to make! And I’m considering making the above blog post even more restrictive based on your and others’ comments (some of which I received through other channels). But what’s the guideline? How can we help web developers make that decision?

Jamie continues:

Dialog and alertdialog should already be implicitly treated as applications, so a specific application role isn’t required here either.

Right, thanks for the pointer! This is, of course, assuming that the screen reader vendor treats these roles properly.

Pratik writes about Gmail:

Other than the assistive technology user not knowing these keyboard commands, there is an additional problem of the screen reader not providing appropriate feedback. Even if the focus is visually moving from message to message with j and k, there is no audible confirmation, leaving the user completely confused.

The problem here, I think, is that they change the focus visually only, or they try to focus elements they didn’t make focusable before, causing browsers not to emit focus events. These events are usually what tell screen readers that the element of interaction has changed. It may also be a combination of both.
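To make the difference concrete, here is a minimal sketch of moving real focus with j and k. It assumes a hypothetical message list and is not Gmail’s code; the ids and texts are invented. Each item gets tabindex="-1" so that script can focus it, and the highlight is driven by the :focus style instead of a class that only changes the look:

```html
<style>
  /* The visual highlight follows real focus instead of a purely visual class. */
  #messages li:focus { outline: 2px solid; }
</style>

<ul id="messages">
  <!-- tabindex="-1" makes each item focusable from script without adding
       it to the Tab order. Without it, calling focus() would do nothing. -->
  <li tabindex="-1">Alice: Lunch on Friday?</li>
  <li tabindex="-1">Bob: Meeting notes</li>
  <li tabindex="-1">Carol: Holiday photos</li>
</ul>

<script>
  const items = Array.from(document.querySelectorAll('#messages li'));
  let current = -1; // No message focused yet.

  document.addEventListener('keydown', (event) => {
    if (event.key !== 'j' && event.key !== 'k') return;
    // Do not hijack the letters while the user is typing into a field.
    if (event.target.closest('input, textarea, select, [contenteditable]')) return;

    const delta = event.key === 'j' ? 1 : -1;
    current = Math.min(items.length - 1, Math.max(0, current + delta));
    // focus() makes the browser emit a focus event, which is what tells
    // the screen reader that the element of interaction has changed.
    items[current].focus();
    event.preventDefault();
  });
</script>
```

In browse mode a Windows screen reader would normally capture single letters like j and k before the page sees them, so a sketch like this only helps once the keys actually reach the page.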

Again: I’m certainly going to update this post to incorporate some of the points mentioned in your comments, so stay tuned!

Inclusion of role="application" as a landmark is a point of confusion, as are the differing effects it has across platforms. As far as I know, on Mac OS X and iOS, role="application" has no effect other than to indicate a region, unlike in Windows screen readers. This may be a dumb question, but why can’t screen readers just set the appropriate interaction mode based on widget roles?

Steve, Mac OS X and iOS use a completely different user interaction paradigm. There is no virtual cursor in either of those operating systems, you interact with VoiceOver commands/gestures, and if you need to input something and focus an element, you just start typing. The only exception is if quick navigation is enabled by pressing left and right arrow keys simultaneously. This causes all keyboard input to be captured even when the focus is in a text field or the like. For them, there is not really a difference in interacting with a web page and objects in other applications. On iOS, for example, you can navigate to headings in the list of contacts in exactly the same way as you do to headings in HTML documents: You set the rotor to navigate by headings and then swipe up and down.

You then ask:

why can’t screen readers just set the appropriate interaction mode based on widget roles?

They do, in many cases, as I point out above. In NVDA, for example, you can tab to a tree view, and NVDA will automatically switch to focus mode. The tricky part is that there is no standard way to tell a screen reader that “this element is interactive”. Screen readers can only whitelist roles for which they allow switching to focus mode. If we had something standard that widget authors could use, that would be extremely helpful!

If you have a group of controls which behave as though they’re some sort of application, and these only take up a part of the web page, then the reasons for wanting to label that region with the role application include:
1. It lets the user know that the controls in the region form some sort of application.
2. Within this region, unmodified or shift modified letters can be used as shortcuts.
Note that both these reasons don’t depend on whether the controls are native or standard sorts of controls known to screen readers.
However, given current screen reader support, there are some problems:
1. Given that you’re probably tabbing around the controls in an application region, it might be helpful if the screen reader told you if you tabbed into and out of the region. However, JAWS and NVDA don’t do this.
(JAWS 13 tells you when you enter and leave regions such as banner and main as you read the page, and the above suggestion is just an extension of this.)
2. It’s difficult to move to the part of the web page immediately after an application region. You can tab out of the region, but that’s not much use. JAWS allows you to manually come out of forms mode, and then you can read to the end of the region, but depending on the controls, that could be tedious. Maybe a screen reader command could do this.

I have found no way of terminating application mode using JAWS apart from moving focus away from the content. The JAWS key+z command is supposed to toggle from virtual cursor mode to auto, but the help text provided for this command explicitly indicates that this toggling functionality does not work if the application role is in effect. Turning forms mode off with escape or the number pad plus key has no effect either.

This is a great article and I will recommend it to anyone who might be interested.

@ Marco. I was referring to bug xxxxxx that to my memory pertains to the WAI-ARIA role. I received the bug progress via e-mails in Thunderbird’s Inbox and wondered if and how that pertained to this article.

Where did this paradigm that, when role “application” is being used, browse mode can no longer be enabled, come from? Could it not be implemented in a less strict fashion?

There are both practical and principle reasons to strictly treat them differently. The practical reasons include the following. (I’m using NVDA terminology here, but the same is true for other Windows screen readers.)
In focus mode, certain keys such as escape and alt+upArrow are still captured to switch back to browse mode (the latter is conditional).
Also, if auto focus mode is enabled, the screen reader would switch between focus and browse modes depending on the focused element, which should not happen inside an application.
The screen reader shouldn’t have to do browse mode rendering at all for applications, which improves performance.
If the body is marked as an application, NVDA at least doesn’t even know it’s a web document.

In terms of principle, if an author has marked a region as an application, it should be treated as such. Otherwise, they don’t get the full power of “being an application”. (As always, power incurs responsibility.)

Marco continued:

But what’s the guideline? How can we help web developers make that decision?

I agree this is tricky. Your revised advice is probably as good as we can get: only use it if you want a user to navigate this region in a desktop-like fashion (as opposed to document-like fashion).

Marco’s revised article says:

▪dialog and alertdialog. These cause screen readers to go into a sort of application mode implicitly once focus moves to a control inside them.

For NVDA at least, this is not just “a sort of application mode”. It really is the same thing. (For the technically curious, a better way to think of this is that it *doesn’t* use browse mode. Only documents use browse mode; documents are special, not applications. There is actually no “application” mode in NVDA. I believe JAWS does have a separate app mode though.)

Marco wrote:

The problem here, I think, is that they change the focus visually only.

Yup, I had to do some serious Greasemonkey magic on Twitter for the same reason. Basically, I detect the visual focus and make it real focus. It is a very annoying habit in quite a few mainstream web sites.

Steve Faulkner wrote:

Inclusion of role="application" as a landmark is a point of confusion, as are the differing effects it has across platforms.

The key then is to understand when and why to use it, not what effect it will have on a given platform.

Steve continued:

This may be a dumb question, but why can’t screen readers just set the appropriate interaction mode based on widget roles?

Aside from the answers already given, another reason is that sometimes, an application globally overrides the arrow or alphanumeric keys; e.g. j and k to move to previous/next item or g to go somewhere. In this case, if you’re focused on a link (which would normally trigger browse mode) and there’s no way for the author to specify this is an application, those shortcut keys would be unusable. Btw, to answer a later question, focus isn’t an indicator for the same reason: focus on a link suggests browse mode.

Marco wrote:

Mac OS X and iOS use a completely different user interaction paradigm. … if you need to input something and focus an element, you just start typing. The only exception is if quick navigation is enabled by pressing left and right arrow keys simultaneously. This causes all keyboard input to be captured even when the focus is in a text field or the like.

While many claim there are only advantages to this approach and that it frees you from the mode switching problems of the Windows world, it actually doesn’t. Quick nav mode is just that: a different mode. At least on iOS, quick nav mode is enabled by default a lot of the time and it switches automatically if you activate an editable text field (and then switches off when you navigate out). This means that users still have to be educated about modes. This leads me to ask: why do people complain about these “modes” in Windows screen readers while the same complaint is not made about iOS? Why is it so unacceptable for screen reader users to just be a little educated about them? In NVDA, for example, you can lock focus mode with NVDA+space, after which it will not switch to browse mode again until the user does it manually or a different page loads. This is no more complex than quick nav mode on iOS.

There is this ‘Application Mode’ JAWS has. It can be activated whenever the add-ons manager is invoked with CTRL+Shift+A (JAWS says ‘Application Mode on’). I think NVDA has the same thing too, if I recall, but I do not know definitively. This is extremely annoying because I can only use Tab and Shift+Tab to navigate. Is that the mode James mentioned, and if so, what is its real purpose?

Earlier today I removed ‘role="application"’ from a Firefox extension’s options file [it defines the dialogue using styling and JavaScript]. I swear I felt free of the constraints previously in place. I now understand completely what you mean when you said use it wisely. I then e-mailed the developer asking them to remove it for the next public update. Can somebody provide examples where it would actually be a good and useful asset? I really dislike it thus far.

I know this is a somewhat old post, but I want to discuss a kind of an edge case about modal dialogs.

I just logged in to my ebanking environment and got a message telling me about updated authentication procedures. This was marked as a modal dialog and focus was restricted to the dialog as it should. However, the dialog contained:
– Text about the new codes
– A link to get more information
– A button to close the dialog and continue to the banking environment (has focus by default)

(Sorry for the markup, tags are not allowed it seems)

Marking this as a dialog implied:
1. I had to read the text using NVDA object nav or JAWS cursor
2. The heading containing the title of the dialog was kind of useless in this mode

So we have a kind of a clash here. The thing inside the dialog could be seen as a document because it only contains standard HTML content. On the other side, this is meant as a modal dialog.

I think the situation could be made better by using aria-describedby to let the screenreader autoread the dialog, or by not marking the whole thing as a dialog at all.

This blog post is the closest thing to practical guidelines around ARIA application and associated roles I know, and I cannot figure out the best thing to do here. Any thoughts/comments greatly appreciated.

@Michael, it really depends on what the web app does. If it makes sure that everything is keyboard accessible and all parts are readable through being dialog text or labels to inputs etc., then it might be appropriate to use role “application”. This should always be a per-application decision and never one based solely on the toolkit used.

Hate to resurrect an old post, but I have another $.02 from a web app framework developer’s perspective. Having put in a lot of work to handle keyboard navigation, it’s really frustrating to see that not all screen readers actually trust the developer to do the right thing. I’d go with @James on this: having role="application" should disable the virtual cursor completely, otherwise it’s kind of pointless. The app is still messed up because of the screen reader going in and out of forms mode all the time, and JAWS does exactly that.

The most frustrating thing is that it’s unpredictable; one of the examples I have is when there are 4 buttons rendered in a toolbar, and tabbing over them switches virtual cursor on every other button. The HTML markup is the same (that’s a demo app) and I can’t even think of a reason why JAWS does that but it does. It probably tries to be smart on behalf of the user, but fails miserably – it should have turned virtual cursor off completely and let the dev handle it; all it manages to do now is to make the app totally unusable.

Fortunately, disabling the virtual cursor with JAWS-z works, but requiring users to do that manually just to use the app is not going to work. If somebody knows a way to force JAWS to disable the virtual cursor from HTML markup, I would really appreciate learning how.

What about the case of a product demo that replicates the experience through html? Basically there is a stage that looks like an iPhone, and an interactive product demo of a 4 step process. There will at most be one button per screen/state of the demo.

Great article, thanks a lot.
We are specifying a page with lots of widgets and are leaving role=application out. It needs clever tab and cursor (right, left and down arrow) handling in the controls section of each widget. And it needs testing.