A few months ago I was introduced to a gentleman who wanted
to work with e-mail and web browsing on his Windows 7 PC. He’d done these things
some time ago, but since then some features of the software he’d used had
changed, and he’d benefit from some help in learning the new functionality. Being
blind, in addition to getting familiar with his versions of Windows, Windows
Live Mail and Internet Explorer (IE), he’d also need to become familiar with the
current version of his screen reader, Window-Eyes. So over the weeks that followed, we
ran through the steps for reading and composing e-mail and browsing the web.

During this period, I was also conscious of how someone
with MND/ALS, such as this gentleman, might find it a challenge to press all the
keys required to browse the web. This got me thinking about whether there was anything I could build
which might be useful in this situation. In particular, is there a way to perform
specific actions by pressing only a single key, while minimizing the amount of
hand movement? Perhaps one approach is to see what can be done through use of
the Number Pad (NumPad) keys alone.

The screen reader has a very useful feature whereby the
default key combinations for certain actions can be replaced with other
combinations, or a single key press, which might be preferable to the user. So
we could have changed the trigger for a certain set of actions to be NumPad keys.
But in this case, we were interested in having a key press perform a custom action
that went beyond controlling the screen reader. For example, say we have a key
that should invoke the IE Favorites list. It’s possible that when that key’s
pressed, IE is not in the foreground, or not even running at all. So in reaction
to that key, I want to start IE if it’s not running, bring it into the
foreground, and then invoke the Favorites list.

With this in mind, I set out to build a simple tool that
would allow web browsing with the Window-Eyes screen reader, using only single
key presses on the NumPad. (If this seemed to have potential, I could enhance
it to allow reading of e-mails too.) This is what I ended up with:

The app itself is a regular WinForms app, and having got
some UI in place, the next thing I did was add a low-level keyboard hook.
(Clicking the buttons in the app doesn’t actually do anything, because we’re not
interested in that input mode.) If I detected a key press from a NumPad key, I’d
post a message to my main UI, and eat the key press. I set aside one key to
effectively turn the app on or off, in case it’d be useful to temporarily render
the app inert while it’s running. The app has a fair bit of interop with the Win32
API, so http://www.pinvoke.net was very
helpful to me as I built the app.
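A minimal sketch of such a low-level keyboard hook is below. This is my own illustration rather than the app’s actual code; in particular, HandleNumPadKey is a placeholder for however the app posts the key to its main UI.

```csharp
using System;
using System.Runtime.InteropServices;

static class KeyboardHook
{
    const int WH_KEYBOARD_LL = 13;
    const int WM_KEYDOWN = 0x0100;
    const int VK_NUMPAD0 = 0x60;
    const int VK_DIVIDE = 0x6F; // the NumPad keys occupy virtual key codes 0x60-0x6F

    delegate IntPtr LowLevelKeyboardProc(int nCode, IntPtr wParam, IntPtr lParam);

    [DllImport("user32.dll", SetLastError = true)]
    static extern IntPtr SetWindowsHookEx(int idHook, LowLevelKeyboardProc lpfn, IntPtr hMod, uint dwThreadId);

    [DllImport("user32.dll")]
    static extern IntPtr CallNextHookEx(IntPtr hhk, int nCode, IntPtr wParam, IntPtr lParam);

    [DllImport("kernel32.dll")]
    static extern IntPtr GetModuleHandle(string lpModuleName);

    // Keep a reference to the delegate so the GC doesn't collect it while the hook is live.
    static readonly LowLevelKeyboardProc s_proc = HookCallback;
    static IntPtr s_hook;

    public static void Install()
    {
        s_hook = SetWindowsHookEx(WH_KEYBOARD_LL, s_proc, GetModuleHandle(null), 0);
    }

    static IntPtr HookCallback(int nCode, IntPtr wParam, IntPtr lParam)
    {
        if (nCode >= 0 && wParam == (IntPtr)WM_KEYDOWN)
        {
            // The first DWORD of the KBDLLHOOKSTRUCT is the virtual key code.
            int vkCode = Marshal.ReadInt32(lParam);
            if (vkCode >= VK_NUMPAD0 && vkCode <= VK_DIVIDE)
            {
                HandleNumPadKey(vkCode); // placeholder: post a message to the main UI
                return (IntPtr)1;        // eat the key press
            }
        }
        return CallNextHookEx(s_hook, nCode, wParam, lParam);
    }

    static void HandleNumPadKey(int vkCode) { /* placeholder */ }
}
```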

By the way, here are a couple of quick notes on keyboard simulation. When I
wanted to simulate an ‘l’ key press, I set the virtual key code of the key I
wanted to simulate to be ASCII ‘l’. This was a bad idea, because the ASCII ‘l’
character matches the virtual key code of VK_SEPARATOR, and so that’s what I was
actually simulating. Instead I should use the ASCII value for uppercase ‘L’. I also wanted
to run the spyxx.exe tool to see what key codes were being generated by some
keys. That tool won’t work unless everything of interest has the same bitness,
(that is, all 32-bit or all 64-bit). On my system it so happened that I couldn’t
see the key code I was interested in unless I pointed spyxx.exe at the Notepad run
from c:\windows\syswow64.

I also used one key to allow a description of a pressed key to be
spoken rather than acting on the key press. This would help the user get
familiar with the layout of the keys. The app uses the very useful System.Speech.Synthesis namespace to output speech itself when it needs to.
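Speaking a key description might look like this (the class and method names here are my own):

```csharp
// Minimal sketch of speaking a key description with System.Speech.
// (Add a reference to the System.Speech assembly.)
using System.Speech.Synthesis;

static class KeyDescriber
{
    static readonly SpeechSynthesizer s_synth = new SpeechSynthesizer();

    public static void DescribeKey(string description)
    {
        // SpeakAsync returns immediately, so the app stays responsive.
        s_synth.SpeakAsync(description);
    }
}
```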

When the main UI receives the message describing what key’s
been pressed, it first takes some preparatory action, like making sure the IE
window is in the foreground. (Having said that, I’m being rather relaxed about
“making sure” here. I have a couple of Thread.Sleep() calls in the app, where I assume
that if I trigger some action, that action will really happen before long. I
might update this at some point to add a little verification and avoid the
assumptions.)

The bulk of what the app does next is to simulate key
presses with SendInput(). For example, I can control IE by simulating Alt+C
to show the Favorites list. And I can control Window-Eyes by simulating ‘l’, ‘h’
and ‘p’ to move to the next link, heading or paragraph on the web page.
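The Alt+C simulation can be sketched as below. This is my own illustration, assuming a 64-bit process: the INPUT structure is hand-padded to the size of the native union, and only the keyboard member is declared.

```csharp
using System;
using System.Runtime.InteropServices;

static class KeySimulator
{
    const int INPUT_KEYBOARD = 1;
    const uint KEYEVENTF_KEYUP = 0x0002;
    const ushort VK_MENU = 0x12; // the Alt key

    [StructLayout(LayoutKind.Sequential)]
    struct KEYBDINPUT
    {
        public ushort wVk;
        public ushort wScan;
        public uint dwFlags;
        public uint time;
        public IntPtr dwExtraInfo;
    }

    [StructLayout(LayoutKind.Sequential)]
    struct INPUT
    {
        public uint type;
        public KEYBDINPUT ki;
        public ulong padding; // pad to the size of the native union (64-bit process assumed)
    }

    [DllImport("user32.dll", SetLastError = true)]
    static extern uint SendInput(uint nInputs, INPUT[] pInputs, int cbSize);

    static INPUT KeyEvent(ushort vk, bool keyUp)
    {
        return new INPUT
        {
            type = INPUT_KEYBOARD,
            ki = new KEYBDINPUT { wVk = vk, dwFlags = keyUp ? KEYEVENTF_KEYUP : 0 }
        };
    }

    // Simulate Alt+C to show IE's Favorites list: Alt down, C down, C up, Alt up.
    public static void SimulateAltC()
    {
        INPUT[] inputs =
        {
            KeyEvent(VK_MENU, false),
            KeyEvent((ushort)'C', false), // 0x43 is both ASCII 'C' and the C key's virtual key code
            KeyEvent((ushort)'C', true),
            KeyEvent(VK_MENU, true),
        };
        SendInput((uint)inputs.Length, inputs, Marshal.SizeOf(typeof(INPUT)));
    }
}
```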

For the remaining actions, I turned to the Windows UI Automation (UIA) API.
Running the tlbimp.exe SDK tool against UIAutomationCore.dll generates a managed
wrapper around the Windows UIA API, which I can then reference as
“interop.UIAutomationCore” in my project’s list of
references. Then by adding the following in my main app source file:

using interop.UIAutomationCore;
…
private IUIAutomation m_uiautomation;
…
m_uiautomation = new CUIAutomation();

I have my UIA object and I’m good to go.

I used UIA in the app for two things. The first is to
detect whether the Favorites list is visible. The app has a key to toggle the
display of the Favorites list, so I need to know whether the list is already
visible in order to know what action to take. (It’s always possible there’s some
keyboard shortcut which will always toggle the display of the list, but if there
is, I don’t know it.)

I needed a way to determine whether the Favorites list
is visible or not, so I pointed the Inspect SDK tool at the Favorites list. The
image below shows the results.

I found that when the Favorites list is visible, a UIA Tree
control appears in the UIA tree and it has a name of “Favorites”. That element
does not exist in the UIA tree when the Favorites list is not visible. So in
order to determine whether the Favorites list is visible, all I need to do is
try to find that element.

The code below shows how I did that. A very important
aspect of this is that I do not look for an element whose name is “Favorites”.
The element’s name is probably localized for worldwide use, and I’d only find
“Favorites” on a US-English system. Instead, if the element has an AutomationId, I
should always base my search on that rather than on the name. The AutomationId does
not get localized, and Inspect shows me that the AutomationId of the element I’m
interested in is “100”. I’ve found that the element has the same id in IE7, IE9 and IE10,
so I expect it’s had this id for a long time, and my code will be robust
regardless of which version of IE is being used.
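A sketch of that search is below, using the interop wrapper set up earlier. The IsFavoritesListVisible name and the hwndIE parameter are my own; m_uiautomation is the object created above.

```csharp
// Detect the Favorites list via its non-localized AutomationId.
const int UIA_AutomationIdPropertyId = 30011;

private bool IsFavoritesListVisible(IntPtr hwndIE)
{
    IUIAutomationElement rootIE = m_uiautomation.ElementFromHandle(hwndIE);

    // Inspect showed the Favorites tree control has an AutomationId of "100",
    // and AutomationIds are stable across locales.
    IUIAutomationCondition condition = m_uiautomation.CreatePropertyCondition(
        UIA_AutomationIdPropertyId, "100");

    IUIAutomationElement favorites = rootIE.FindFirst(
        TreeScope.TreeScope_Descendants, condition);

    // The element only exists in the UIA tree while the list is visible.
    return (favorites != null);
}
```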

The other way I originally leveraged UIA in the app was to
invoke a button in IE’s UI. As it happens, I did this in a way that broke the rule I’ve just mentioned about not basing searches for elements on
fixed US-English strings. When I first started on the app, I wrote some quick
code to have a key in the app invoke IE’s Back button. As Inspect reported, (as
shown in the image below), the button doesn’t have an AutomationId, so I found
the button from its name. This was fine for my needs at the time, but it meant
the app couldn’t be used outside English-speaking countries, and that’s not
sufficient for me. I don’t want the limitations of my app to be the reason why
it can’t be used anywhere in the world.

So I replaced my original code with a simulated keyboard
shortcut of Backspace, which triggers a move to the previous page in IE. That
avoided the bad practice of searching for the English accessible name of a
button. For anyone interested, this is what the original code for invoking a
button looked like.

int c_patternIdInvoke = 10000;

private void InvokeButton(string buttonName)
{
    // Find the "IEFrame" window. We've already taken action to try to make sure IE is running.
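The listing stops at that comment. Purely as an illustration (my sketch, not the original code), and assuming a FindWindow P/Invoke declaration plus the m_uiautomation object from earlier, the rest of the name-based approach might have looked something like this:

```csharp
const int c_patternIdInvoke = 10000; // UIA_InvokePatternId
const int c_propertyIdName = 30005;  // UIA_NamePropertyId

[DllImport("user32.dll", SetLastError = true)]
static extern IntPtr FindWindow(string lpClassName, string lpWindowName);

private void InvokeButton(string buttonName)
{
    // Find the "IEFrame" window. We've already taken action to try to make sure IE is running.
    IntPtr hwndIE = FindWindow("IEFrame", null);
    if (hwndIE == IntPtr.Zero)
    {
        return;
    }

    // Searching on the button's name is the bad practice being described here:
    // the Name property is localized, so this only works on US-English systems.
    IUIAutomationElement rootIE = m_uiautomation.ElementFromHandle(hwndIE);
    IUIAutomationCondition condition = m_uiautomation.CreatePropertyCondition(
        c_propertyIdName, buttonName);
    IUIAutomationElement button = rootIE.FindFirst(
        TreeScope.TreeScope_Descendants, condition);
    if (button != null)
    {
        var invokePattern = (IUIAutomationInvokePattern)
            button.GetCurrentPattern(c_patternIdInvoke);
        invokePattern.Invoke();
    }
}
```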

So there we have it - a simple app which, through a mix of
keyboard simulation and UIA calls, provides a means to browse the web with a
screen reader, using single key presses which are in close proximity to each
other.