UI Automation — Under the Hood

Desktop development technology has evolved from the Win32 SDK through .NET WinForms to WPF and Silverlight. The corresponding UI Automation testing technology has changed as well.

This document describes UI Automation technology on the Windows platform in two parts. The first part introduces the evolution of the technology. The second part covers the latest one, UIA, in detail.

Part 1: The evolution of UI Automation technology

UI Automation refers to controlling a UI application from another application by simulating user actions. Usually UI Automation involves the following three parts:

·Probing the source

It refers to the process of identifying the target UI element. For example, to test calc.exe (Calculator), the first step is to distinguish the Calculator window from other windows such as Notepad. To go on and test the menu, the client has to locate the menu bar and then obtain the second-level menu items. In other words, the basic step is to navigate the UI tree, from the desktop root down through the child controls, until the target UI element is identified.

Win32 SDK and Windows Message

Before the CLR there were only two technologies for UI development: the Win32 SDK and DirectX. As DirectX mainly targets special fields such as games and CAD, I will not cover it here.

Whether MFC, VCL or VB6 is used, the Win32 SDK is the core. Throughout the development life cycle, a UI element is always an HWND driven by Windows Messages. There are only three building blocks: the Win32 API, Windows Messages and Windows Hooks.

Usually the test client uses FindWindowEx and EnumWindows to iterate over windows and child controls until the test target is identified. It then sends or posts Windows Messages, or calls APIs, to validate the target. For example, it uses WM_GETTEXT or GetWindowText to read a window title, or GetWindowRect to read the location of a button. To simulate user actions, it uses the SendKeys API, or directly simulates WM_CHAR or WM_KEYDOWN messages.
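The probing step described above can be sketched from managed code with P/Invoke. This is a minimal sketch rather than a robust test client: it assumes a top-level window titled "Calculator" is open (the title is only an illustration) and it lists just the direct child windows of that window.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class Win32Probe
{
    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindWindow(string lpClassName, string lpWindowName);

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindWindowEx(IntPtr hwndParent, IntPtr hwndChildAfter,
                                      string lpszClass, string lpszWindow);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    static extern int GetWindowText(IntPtr hWnd, StringBuilder lpString, int nMaxCount);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool GetWindowRect(IntPtr hWnd, out RECT lpRect);

    [StructLayout(LayoutKind.Sequential)]
    struct RECT { public int Left, Top, Right, Bottom; }

    static void Main()
    {
        // Assumption: the window title; adjust it to the application under test.
        IntPtr top = FindWindow(null, "Calculator");
        if (top == IntPtr.Zero)
        {
            Console.WriteLine("Target window not found.");
            return;
        }

        // Enumerate the direct child windows: passing the previous child as
        // hwndChildAfter makes FindWindowEx return the next sibling.
        IntPtr child = IntPtr.Zero;
        while ((child = FindWindowEx(top, child, null, null)) != IntPtr.Zero)
        {
            // Across processes GetWindowText returns the stored caption only;
            // some controls (edit boxes, for example) need WM_GETTEXT instead.
            var text = new StringBuilder(256);
            GetWindowText(child, text, text.Capacity);

            RECT r;
            GetWindowRect(child, out r);
            Console.WriteLine("0x{0:X8} \"{1}\" at ({2},{3})-({4},{5})",
                child.ToInt64(), text, r.Left, r.Top, r.Right, r.Bottom);
        }
    }
}
```

Nested controls would need the same loop applied recursively to each child handle.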

Windows Hooks add further options. By using a Windows Hook, the tester can monitor, intercept and simulate Windows Messages directly. Hooks are even used to record test cases and play them back.

Spy++ is not an automation test tool, but it shows how this technology works. Spy++ can identify any Win32 window, read window properties and monitor Windows Messages:

The advantages of the Win32 SDK and Windows Messages:

Direct and flexible

There is no additional learning curve for testing: the Win32 SDK is your friend. Message hooks simplify many situations and provide flexible solutions. They also make test hooks simple to implement.

The disadvantages:

Complex and costly. There are many implementation details to take care of. For example, some Win32 APIs do not work across processes, and some Windows Messages can only be sent to a window owned by the calling thread. Writing a truly stable test case is costly.

The interface is not user-friendly. Test cases usually do not invoke the Win32 API directly; they need a wrapper, which adds development cost. The Win32 API is not friendly to VB programmers either. Different development tools such as MFC, VCL and, later, .NET WinForms each handle many special details in their own way, so it is difficult to create a tool-independent testing wrapper. For example, .NET WinForms maintains HWNDs dynamically: the internal HWND of a given WinForms control may change during the application life cycle. This conflicts with a common assumption in Win32 that an HWND uniquely and stably identifies a control.

Does not work for self-drawn windows. Open an Excel worksheet and you will find that the cells do not each have their own HWND. Excel draws the cells itself inside a container. In such cases, the Win32 API does not help.

MSAA

MSAA stands for Microsoft Active Accessibility. It is similar to distributed COM (DCOM) but not the same. It works like this: the UI application exposes an interface, and another application uses that interface to control the target. The original goal of MSAA was to help people with disabilities. For example, a blind person cannot see a window, but she can connect a USB screen reader. The reader obtains the application’s information through the MSAA interface and presents it to her in a suitable way.

UI Automation leverages this technology. The interface exposed by MSAA is called IAccessible. The communication between the UI and the test client works as follows:

·IAccessible::accName / IAccessible::accSelect: the test client uses them to read an element’s information, such as its name, and to perform operations such as selecting items.

·IAccessible::accValue: the UI developer can customize the value property. For example, the developer of a polyline control may expose the coordinate array in this property.

In practice, testers usually combine MSAA and the Win32 API. For reading UI element properties and simulating user actions, they use Win32. On the other hand, they use MSAA to compensate for the limitations of Win32, for example:

MSAA provides the get_accChild method. It allows the UI tree seen by the test to differ from the real Win32 control tree, which benefits self-drawn windows. The developer can define a logical tree with this method. In the Excel example, each cell can be exposed separately.

It allows the control developer to provide a flexible implementation. We mentioned above returning a coordinate array for a polyline control. Another example is .NET WinForms: Microsoft provides a default implementation of IAccessible, so the awkward details of HWND maintenance are handled for you.

There are several tools targeting MSAA. AccExplorer works like Spy++, allowing UI tree navigation and property checking.

To implement MSAA on the UI side, unmanaged applications can refer to WM_GETOBJECT in MSDN. For managed applications, there is simple sample code in:

We are not going to cover MSAA test clients here. Later sections discuss how MSAA is used implicitly.

The disadvantages of MSAA:

IAccessible is not a standard COM interface. The caller is not required to invoke CoInitialize, but the caller cannot use QueryInterface to obtain further interfaces. This restricts extensibility.

There are defects in the interface definition. Some interface functions are dispensable, while some key functions needed for UI automation are missing. For example, it provides accSelect to support selection, but there is no method like accExpand to support expanding a tree control.

The following document is a good reference on MSAA’s defects and current status.

UIAutomation (UIA)

UIAutomation (UIA) is a new UI automation technology. It was released in the Vista era and has shipped with the .NET Framework since version 3.0. It works from XP to Win7. In the latest Windows SDK, UIA, MSAA and the other technologies that support UI Automation are grouped together under the name Windows Automation API.

Compared with Win32 and MSAA, I tend to think of UIA as a “technology”, while MSAA and Win32 are testing “methods”. A technology usually contains a model and a well-considered programming API, targets a specific problem, and allows different implementation details. UIA can be called a “technology” because:

The design goal of Win32 and MSAA was not to solve the UI Automation problem; the design goal of UIA is. In Part 2 we will analyze the details of UIA that cover different use cases and scenarios. Here we first focus on the UIA client. There are managed and unmanaged UIA client APIs. The following C# code shows how to automate calc.exe to perform the operation 3+5-2. (The code may need modification to run on other Windows versions; it was tested against Windows Server 2008 R2.)

//Read name property of Text control. The name property is the output.

return btn.Current.Name;

}

}
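A managed UIA client for this scenario can be sketched as follows. It is a minimal sketch under stated assumptions, not the tested listing: it assumes the classic calc.exe, assumes the buttons carry the UIA names "3", "Add", "5", "Subtract", "2" and "Equals", and assumes the result is exposed as the Name of the first Text control. All of these vary between Windows versions and should be checked with UI Spy first.

```csharp
using System;
using System.Diagnostics;
using System.Windows.Automation;

// Assumed references: UIAutomationClient.dll, UIAutomationTypes.dll.
static class CalcClient
{
    // Finds a button by name below the window and triggers it via the Invoke pattern.
    static void Click(AutomationElement window, string buttonName)
    {
        var condition = new AndCondition(
            new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Button),
            new PropertyCondition(AutomationElement.NameProperty, buttonName));

        AutomationElement button = window.FindFirst(TreeScope.Descendants, condition);
        if (button == null)
            throw new InvalidOperationException("Button not found: " + buttonName);

        var invoke = (InvokePattern)button.GetCurrentPattern(InvokePattern.Pattern);
        invoke.Invoke();
    }

    static void Main()
    {
        Process calc = Process.Start("calc.exe");
        calc.WaitForInputIdle();
        if (calc.MainWindowHandle == IntPtr.Zero)
        {
            Console.WriteLine("No main window found for calc.exe.");
            return;
        }

        AutomationElement window = AutomationElement.FromHandle(calc.MainWindowHandle);

        // Assumption: button names as exposed by the calculator under test.
        foreach (string name in new[] { "3", "Add", "5", "Subtract", "2", "Equals" })
            Click(window, name);

        // Assumption: the first Text control holds the result in its Name property.
        AutomationElement result = window.FindFirst(TreeScope.Descendants,
            new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Text));
        Console.WriteLine(result != null ? result.Current.Name : "(no Text control found)");
    }
}
```

The same Click helper works unchanged for a Win32, WinForms or WPF button, which is the point of the pattern abstraction.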

The advantages of UIA are obvious:

It suits different kinds of UI applications, including Win32, WinForms, WPF and Silverlight. Win32 and MSAA techniques alone cannot work for WPF and Silverlight because their child controls are not HWND based.

It is compatible with traditional Win32 and MSAA. We will talk about the UIA<->MSAA bridge, which allows UIA to leverage MSAA and Win32 in its implementation. This lets existing applications work with UIA without any modification.

The new client test model and its patterns are convenient for UI Automation. The patterns abstract the requirements of UI Automation. For example, the Invoke pattern on a button abstracts the operation of activating the button, whether by clicking or by pressing a key. Whether it is a Win32 button, a WPF button or even an HTML button, the pattern is the same. Testers working with different technologies now speak the same language. The new UIA event model and conditional queries greatly simplify the tester’s work.

It provides both managed and unmanaged client APIs. It also provides a simple and flexible model for customizing UI-side behavior. The developer can use the IRawElementProviderSimple interface to add UIA support to a WinForms control, or use AutomationPeer to extend a WPF control. We will go into the details in Part 2.

It provides rich tools, docs and examples. Just like Spy++ and AccExplorer, UI Spy provides similar functions for UIA:

Technology vs framework

UIA has many advantages. However, it does not solve every UI Automation problem. For example:

·UI synchronization and timing issues. The test client usually decides its next step based on the current UI state. For example, when testing Save As, if the path is on a network share, the UI may freeze for a while because of network latency. If the test client ignores this and simply continues with the next step, such as creating a new document, it will fail. The correct way is to wait until Save As finishes and then continue. There are several ways to do so. A simple but crude solution is to hardcode a long sleep. A better solution is to poll the status in a loop, sleeping for a short interval each time. With UIA events, the test client could subscribe to WindowClosedEvent. A complete solution may additionally involve checking the message loop, checking CPU utilization and setting a timeout.

·Test code generation. A UI window usually contains dozens of child controls. If the test code for locating and operating child controls cannot be simplified, the coding and maintenance cost will be very high. This can be solved with automatic code generation and ORM-like techniques. For example, we can use a tool to serialize the relationships and query conditions of a window and its child controls to an XML file, and use an ORM-style layer to access the UI elements easily.

·Multi-language and localization support. This topic is critical for UI applications. The text displayed in the UI is usually read from localized resources, so the test client must be able to access those resources easily.

·Distinguishing functional testing from real user simulation. In the earlier button-clicking case there are two solutions: using SendKeys or a Windows Message. If the button is hidden behind some other element, SendKeys will fail while the Windows Message will still work. Which one to choose depends on the goal. If the case only targets the behavior after the button is clicked, a Windows Message is a good solution. But if it is user interface testing, SendKeys will expose the bug while the Windows Message will not.
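The polling approach from the synchronization item can be sketched as a small helper. This is a minimal sketch: UiWait and Until are hypothetical names introduced here, and the condition in Main merely simulates an operation that finishes after about 300 ms. In a real test the condition would query the UI, for example whether the Save As dialog is still present.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class UiWait
{
    // Polls `condition` until it returns true or `timeout` elapses.
    // Sleeps `interval` between polls. Returns true if the condition was met,
    // false if the timeout expired first.
    public static bool Until(Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (true)
        {
            if (condition()) return true;
            if (clock.Elapsed >= timeout) return false;
            Thread.Sleep(interval);
        }
    }

    static void Main()
    {
        // Simulated UI state: becomes "ready" roughly 300 ms after start.
        Stopwatch simulated = Stopwatch.StartNew();
        bool ready = Until(
            () => simulated.ElapsedMilliseconds > 300,
            TimeSpan.FromSeconds(5),
            TimeSpan.FromMilliseconds(50));

        Console.WriteLine("Ready: " + ready);
    }
}
```

The explicit timeout keeps a hung application from blocking the whole test run, which a bare polling loop would not.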

The point is that a technology alone is not enough to meet these goals. To solve such problems, many UI testing frameworks have been developed. Inside Microsoft there are several frameworks with different design philosophies. VS2010 introduces support for UI Automation. On CodePlex there are several frameworks such as White and UI Automation Verify.

Summary

Part 1 introduced the evolution of UI Automation technology. UIA will be the mainstream technology. Part 2 focuses on UIA details, extensibility, implementation and internals.

In the rest of this article, the test application that automates the UI target is usually referred to as the client. The target UI and its controls are usually referred to as the server.

Part 2: UIA internals and implementations

Because UIA is highly abstract, programming a UIA client is uniform whether the UI is WinForms or WPF. On the server side, however, there are different ways to implement UIA functions. The following sections discuss the different scenarios.

Implement UIA on Server side

Compared with Win32, MSAA introduced a new interface for extensibility. UIA continues this model and introduces interfaces of its own. However, UIA does not expose these interfaces to the client. Instead, UIA uses them only for implementing functionality, and exposes a separate client model for client use.

With this design, the UI server defines its UIA behavior by implementing interfaces. The UIA SDK provides both rich and simple interfaces. For example, IRawElementProviderFragment provides rich functionality, while IRawElementProviderSimple is simple.

The following code demonstrates how to implement IRawElementProviderSimple in a WinForms form. It defines its own UIA Name property and Value pattern.
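A server-side provider of this kind can be sketched as follows. It is a minimal sketch under stated assumptions: the class name TimeForm, the window title and the choice to report the current time in both the Name property and the Value pattern are illustrative only.

```csharp
using System;
using System.Windows.Automation;
using System.Windows.Automation.Provider;
using System.Windows.Forms;

// Assumed references: UIAutomationProvider.dll, UIAutomationTypes.dll,
// System.Windows.Forms.dll.
public class TimeForm : Form, IRawElementProviderSimple, IValueProvider
{
    const int WM_GETOBJECT = 0x003D;

    [STAThread]
    static void Main()
    {
        Application.Run(new TimeForm());
    }

    public TimeForm()
    {
        Text = "ServerUIADemo"; // hypothetical title
    }

    // When UIA asks for the root object of this window, return this provider.
    protected override void WndProc(ref Message m)
    {
        if (m.Msg == WM_GETOBJECT &&
            unchecked((int)m.LParam.ToInt64()) == AutomationInteropProvider.RootObjectId)
        {
            m.Result = AutomationInteropProvider.ReturnRawElementProvider(
                Handle, m.WParam, m.LParam, this);
            return;
        }
        base.WndProc(ref m);
    }

    // IRawElementProviderSimple: the code runs inside the UI server.
    public ProviderOptions ProviderOptions
    {
        get { return ProviderOptions.ServerSideProvider; }
    }

    // Let the default HWND provider supply everything not overridden here.
    public IRawElementProviderSimple HostRawElementProvider
    {
        get { return AutomationInteropProvider.HostProviderFromHandle(Handle); }
    }

    public object GetPropertyValue(int propertyId)
    {
        if (propertyId == AutomationElementIdentifiers.NameProperty.Id)
            return "Form at " + DateTime.Now.ToLongTimeString();
        return null; // fall back to the host provider
    }

    public object GetPatternProvider(int patternId)
    {
        return patternId == ValuePatternIdentifiers.Pattern.Id ? this : null;
    }

    // IValueProvider: a read-only value carrying the current time.
    public bool IsReadOnly { get { return true; } }
    public string Value { get { return DateTime.Now.ToLongTimeString(); } }
    public void SetValue(string value)
    {
        throw new InvalidOperationException("The value is read-only.");
    }
}
```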

Run the application and check it with UI Spy: the Name property and the Value pattern contain the time information.

The code is quite simple. The Form class just needs to implement the interface. As the code runs in the UI server, the ProviderOptions property should return ServerSideProvider. Such a provider is called a Server-side provider.

The handler of the WM_GETOBJECT message is worth noting. To support a Server-side provider, the application should call the ReturnRawElementProvider API to return the object that implements IRawElementProviderSimple. We will discuss this API later.

The benefits of a Server-side provider:

·Direct and powerful. It can cover every aspect of UIA functionality. Whether implementing different patterns or using the Navigate function of the IRawElementProviderFragment interface to define a logical UI tree, a Server-side provider is up to the job. This is the ideal way when designing a custom control.

·Easy to implement. It follows standard C# interface implementation. For unmanaged code, using the UiaReturnRawElementProvider API is also simple. For WPF and Silverlight it is even easier: AutomationPeer is used to implement the Server-side provider. We will discuss this later.

Implement UIA on client side

A Server-side provider is not practical for legacy Win32 and WinForms applications, because modifying existing code to add UIA support is costly. If the existing Win32 API and MSAA already provide good enough functionality for a legacy application, UIA should leverage them. UIA introduced the Client-side provider to solve this.

The goal of the GetPropertyValue method in the IRawElementProviderSimple interface is to return a UIA property. In the previous sample, the property is read from the control instance on the server side. Actually, nothing requires this interface to be implemented in the UI server. If the information can be obtained from the client with a Win32 API such as GetWindowText, or with MSAA, we can put the implementation on the client side.

The first sample shows how to implement a Client-side provider. First we create a simple WinForms form without any UIA-related code:

using System;
using System.Windows.Forms;

public class MyFormClient : System.Windows.Forms.Form
{
    [STAThread]
    static void Main()
    {
        Application.Run(new MyFormClient());
    }

    public MyFormClient()
    {
        this.Name = "testForm";
        this.Text = "ClientUIADemo";
    }
}

Compile the above code to WinFormServer.exe.

The following code is the test client and the corresponding UIA Client-side provider implementation:
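A test client with its own Client-side provider can be sketched as follows. It is a minimal sketch under stated assumptions: TitleProvider and TestClient are hypothetical names, the "WindowsForms10" class-name fragment is an assumption about how WinForms names its window classes, and the provider reads the title with GetWindowText and appends the current time so that the effect is visible.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;
using System.Windows.Automation;
using System.Windows.Automation.Provider;

// Assumed references: UIAutomationClient.dll, UIAutomationProvider.dll,
// UIAutomationTypes.dll.
class TitleProvider : IRawElementProviderSimple
{
    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    static extern int GetWindowText(IntPtr hWnd, StringBuilder text, int maxCount);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    internal static extern IntPtr FindWindow(string className, string windowName);

    readonly IntPtr hwnd;

    TitleProvider(IntPtr hwnd) { this.hwnd = hwnd; }

    static string ReadTitle(IntPtr hwnd)
    {
        var buffer = new StringBuilder(256);
        GetWindowText(hwnd, buffer, buffer.Capacity);
        return buffer.ToString();
    }

    // Factory callback: return a provider only if the window is our target.
    public static IRawElementProviderSimple Create(IntPtr hwnd, int idChild, int idObject)
    {
        if (idChild != 0) return null;                       // no child elements
        if (ReadTitle(hwnd) != "ClientUIADemo") return null; // not our window
        return new TitleProvider(hwnd);
    }

    // The code runs in the client process.
    public ProviderOptions ProviderOptions
    {
        get { return ProviderOptions.ClientSideProvider; }
    }

    public IRawElementProviderSimple HostRawElementProvider
    {
        get { return AutomationInteropProvider.HostProviderFromHandle(hwnd); }
    }

    public object GetPropertyValue(int propertyId)
    {
        if (propertyId == AutomationElementIdentifiers.NameProperty.Id)
            return ReadTitle(hwnd) + " " + DateTime.Now.ToLongTimeString();
        return null;
    }

    public object GetPatternProvider(int patternId) { return null; }
}

static class TestClient
{
    static void Main()
    {
        // The factory delegates are passed to UIA in an array.
        ClientSettings.RegisterClientSideProviders(new ClientSideProviderDescription[]
        {
            new ClientSideProviderDescription(
                TitleProvider.Create, "WindowsForms10", null,
                ClientSideProviderMatchIndicator.AllowSubstringMatch)
        });

        IntPtr hwnd = TitleProvider.FindWindow(null, "ClientUIADemo");
        if (hwnd == IntPtr.Zero)
        {
            Console.WriteLine("Start WinFormServer.exe first.");
            return;
        }

        AutomationElement form = AutomationElement.FromHandle(hwnd);
        Console.WriteLine(form.Current.Name);
    }
}
```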

·It is easy to leverage test hooks for UIA. For example, some controls may expose a customized test interface or test hook, such as a WCF endpoint or a socket port. With a Client-side provider, the UIA implementation can leverage such infrastructure.

·A Client-side provider must be registered before use. One way is to call the RegisterClientSideProviders API, as in the code above. Another way is to compile the Client-side provider into an assembly and register it with RegisterClientSideProviderAssembly. In UI Spy, the user can use the Load Client-side Provider menu item to load a specified provider.

·When registering, the delegate for the provider class’s factory function should be put in an array.

·The provider class’s factory function should first check whether the target matches the provider before returning an instance.

The second sample shows how the default Client-side provider leverages the MSAA interface. The following code implements MSAA through the ControlAccessibleObject class:

using System;
using System.Windows.Forms;

public class MSAAForm : System.Windows.Forms.Form
{
    public MSAAForm()
    {
        this.Name = "testFormMSAA";
        this.Text = "MSAAForm";
    }

    // This override returns the MSAA implementation instance
    protected override AccessibleObject CreateAccessibilityInstance()
    {
        return new MyControlAccessibleObject(this);
    }

    public class MyControlAccessibleObject : ControlAccessibleObject
    {
        MSAAForm form;

        public override string Name
        {
            // Return the current time in MSAA’s name property
            get { return form.Name + DateTime.Now.ToLongTimeString(); }
        }

        public MyControlAccessibleObject(MSAAForm form)
            : base(form)
        {
            this.form = form;
        }
    }
}

Run the application and use UI Spy to check the Name property. The time information is returned:

The default Client-side provider resides in the UIAutomationClientsideProviders.dll assembly (in “Program Files\Reference Assemblies\Microsoft\Framework\v3.0”). We can use Reflector to inspect the MS.Internal.AutomationProxies.ProxyHwnd class:

Use AutomationPeer to simplify the implementation

For WPF applications, the WPF runtime introduces the AutomationPeer class to help implement a Server-side provider. The following code demonstrates how to override a WPF button’s UIA Name property and Value pattern:
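An AutomationPeer-based override of this kind can be sketched as follows. It is a minimal sketch under stated assumptions: TimeButton, TimeButtonPeer and the window title are hypothetical names, and the current time is used as the illustrative Name suffix and Value.

```csharp
using System;
using System.Windows;
using System.Windows.Automation.Peers;
using System.Windows.Automation.Provider;
using System.Windows.Controls;

// Assumed references: PresentationCore.dll, PresentationFramework.dll,
// WindowsBase.dll, UIAutomationProvider.dll, UIAutomationTypes.dll.
public class TimeButton : Button
{
    // WPF calls this when a UIA client first asks about the control.
    protected override AutomationPeer OnCreateAutomationPeer()
    {
        return new TimeButtonPeer(this);
    }
}

public class TimeButtonPeer : ButtonAutomationPeer, IValueProvider
{
    public TimeButtonPeer(Button owner) : base(owner) { }

    // Overrides the UIA Name property.
    protected override string GetNameCore()
    {
        return base.GetNameCore() + " " + DateTime.Now.ToLongTimeString();
    }

    // Adds the Value pattern on top of the patterns a button already supports.
    public override object GetPattern(PatternInterface patternInterface)
    {
        if (patternInterface == PatternInterface.Value) return this;
        return base.GetPattern(patternInterface);
    }

    // IValueProvider: a read-only value carrying the current time.
    public bool IsReadOnly { get { return true; } }
    public string Value { get { return DateTime.Now.ToLongTimeString(); } }
    public void SetValue(string value)
    {
        throw new InvalidOperationException("The value is read-only.");
    }
}

static class Program
{
    [STAThread]
    static void Main()
    {
        var window = new Window
        {
            Title = "WpfUIADemo",
            Width = 300,
            Height = 120,
            Content = new TimeButton { Content = "Time" }
        };
        new Application().Run(window);
    }
}
```

No WM_GETOBJECT handling appears in this code: the peer only describes the control, and the WPF runtime takes care of handing it to UIA.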

With Reflector, we can observe how the AutomationPeer classes are implemented, and how WPF bridges the UIA interfaces with AutomationPeer. The key is the MS.Internal.Automation.ElementProxy class, which resides in the PresentationCore.dll assembly:

The ElementProxy class uses the Proxy pattern to convert an AutomationPeer to the UIA interfaces. The following code is copied from Reflector. It is a message handler of the System.Windows.Interop.HwndTarget class:

The WPF runtime uses the HwndTarget class to interoperate with Win32 messages. The CriticalHandleWMGetobject function converts the AutomationPeer to the IRawElementProviderSimple interface and returns it by calling the ReturnRawElementProvider API, which we introduced previously.

We mentioned that the default Client-side provider automatically checks for and uses MSAA on the target. UIA likewise exposes a UIA Server-side provider implementation as an MSAA interface, so that legacy tools based on MSAA can work with WPF without modification. For example, AccExplorer returns the name and value for our WPF sample:

Client Server Communications

Traditional automation methods such as Win32 depend on Windows Messages for client-server communication. Each communication method has its pros and cons. For example, with Windows Messages one has to consider whether the sender and receiver are on the same thread, whether a message should be posted or sent, and how to prevent message-related deadlocks.

UIA uses different communication mechanisms for Server-side providers and Client-side providers:

·Server-side provider:

In the current implementation, when the UI server returns a provider through ReturnRawElementProvider, the API creates a named pipe. The UIA test client and the server use the named pipe for communication. The process is:

We can use Process Explorer to check the named pipe.

Start the WPF sample and use UI Spy to monitor the button. Download the Process Explorer tool and find the WPF process. From the View menu in Process Explorer, activate the “Show Unnamed Handles and Mappings” and “Show Lower Pane” menu items. You should find several named pipes whose names start with UIA_PIPE:

Let’s do a simple test. Right-click the named pipes and choose Close Handle to force them closed. In UI Spy, try to refresh the node and you will see:

This test confirms the communication channel. Using a named pipe instead of Windows Messages has several advantages. For example, it fits WPF, where child controls have no HWND, and there is no need to consider the threading model and the message loop.

·Client-side provider:

The communication of a Client-side provider depends on the actual implementation. It continues to use Windows Messages if Win32 is used, or it may use WCF or sockets if it relies on a specially exposed test hook.

What about memory leaks caused when the life time of the ElementProxy is greater than the life time of the UI control being automated? The ElementProxy will prevent the control from being garbage collected.

Excellent and complete article. I'd like to know how provide accessibility for controls that inherits "System.ComponentModel.Component" like ToolBarButton, these kind of components don't have a Handle and CreateAccessibilityInstance() method to override.