Java Automation Using Core Java

Abstract

Automating Java applications is a vast domain, and no single solution fits every case. This paper consolidates what was learned while designing an API that can be used to automate any Java application. It follows a problem-solution approach: at each stage of development, hurdles are identified and the best solution is selected from the available options.

What Can We Achieve Using These Automation Techniques?

Java automation techniques deal with code that controls the execution of specific UI operations in a Java application. This is helpful in the following scenarios:

1. To automate UI testing.

2. To obtain specific information by performing repetitive operations (complex or simple) on the UI.

Both of the above scenarios can be achieved with a single click or by running a single program. All the operations on the application are performed automatically and the user gets the end result.

So there are basically two usages of this technique.

To automate UI testing

To perform the same sequence of operations with different data

There are multiple approaches that can be used to achieve this. Each approach comes with its own pros and cons. Each approach is discussed in detail in the next section.

Approaches for Automating

The following approaches are available for automating a Java Application:

Using the Robot class to automate

Using reflection to launch the application

Injecting code into another application

We will discuss each of these approaches in detail.

1. Using Robot Class (Imperfect Approach)

Robot (java.awt.Robot) is a class provided by Java to simulate mouse clicks and keyboard events. We can use this class to automate not only Java applications but any application.

However, with Robot the difficulty lies in providing the location coordinates. Robot simulates a mouse click at the current position of the mouse pointer, so the pointer must first be moved to the screen coordinates of the target component. Hard-coding these coordinates is not a good idea, because they depend on whether the application is opened full screen. If it is not, where is the top-left corner of the application window? Moving the application slightly on the screen changes the x, y positions of all components, and so does changing the screen resolution. Another disadvantage is that users have to keep the application on top and wait until the automation finishes, which blocks valuable user time.
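The dependence on screen coordinates described above can be seen in a minimal sketch. The class name RobotClickSketch, the window origin (100, 50) and the button offset (40, 120) are assumptions made for illustration only; they do not come from any real application.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Point;
import java.awt.Robot;
import java.awt.event.InputEvent;

class RobotClickSketch {

    // Converts an offset inside the application window into absolute screen
    // coordinates. windowOrigin is the assumed top-left corner of the window.
    static Point toScreen(Point windowOrigin, int offsetX, int offsetY) {
        return new Point(windowOrigin.x + offsetX, windowOrigin.y + offsetY);
    }

    // Moves the pointer to the target and simulates a left click there.
    static void click(Point target) throws AWTException {
        Robot robot = new Robot();
        robot.mouseMove(target.x, target.y);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
    }

    public static void main(String[] args) throws AWTException {
        Point origin = new Point(100, 50);        // assumed window position
        Point target = toScreen(origin, 40, 120); // assumed button offset
        System.out.println("Clicking at " + target.x + "," + target.y);
        // Robot needs a real display, so skip the click when there is none.
        if (!GraphicsEnvironment.isHeadless()) {
            click(target);
        }
    }
}
```

If the window is moved or the resolution changes, the assumed origin is no longer valid and the click lands on the wrong spot, which is exactly the weakness of this approach.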

Here are some pros/cons of this approach in brief:

Automation Possible Scenarios

Automating applications is possible with this approach under the following conditions:

Screen resolution is the same across all desktops

Each application opens with a fixed width and height, or opens in a maximized state

If the application does not open in a maximized state, its (x, y) coordinates remain the same for each run

The user does not perform any other operation while automation is running

Assuming the above conditions are met, we can automate not only Java applications but applications built in any technology, because this technique depends solely on where the application is on the screen.

Advantages

Applications built in any technology can be automated, not only Java applications

Disadvantages

If the application is moved on the screen, this method cannot be used

The user needs to wait until the operation is completed

If something pops up while automation is running then automation will fail

If screen resolution is changed then this approach will fail

Background processing of the automation is not possible

Retrieving values from any field is not possible

So simulating clicks on the UI is not a good approach. We have to go inside the Java application and access the actual objects in order to perform the required operations.

This raises the question of how we can access the objects of a third party application. The next two approaches explain how to go about it.

2. Using Reflection (Imperfect Approach)

The second approach manipulates how Java applications get launched using the JVM.

This approach uses the Reflection API. Instead of the JVM launching the application directly, the launch is routed through a custom launcher. The custom launcher reads the main jar file and loads all dependency jar files. It then loads the entry class (the class with the main method) and executes its main method.

Because the custom launcher has launched the application using reflection, it runs in the same JVM as the application and can therefore access any of the application's UI component objects. Once a UI object is accessible, it is easy to perform any required operation on it. These operations include setting text, getting text, dispatching events, and so on.
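The launcher steps described above can be sketched as follows. This is a minimal illustration, not the author's launcher: the class names ReflectiveLauncher and LauncherDemoTarget and the system property launcher.demo.arg are hypothetical, and the sketch assumes the entry class exposes a public static main(String[]) method.

```java
import java.lang.reflect.Method;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.Arrays;

class ReflectiveLauncher {

    // Loads the entry class through the given loader and invokes its
    // static main(String[]) method via reflection.
    static void launch(ClassLoader loader, String mainClass, String[] args) {
        try {
            Class<?> entry = Class.forName(mainClass, true, loader);
            Method main = entry.getMethod("main", String[].class);
            main.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not launch " + mainClass, e);
        }
    }

    // Builds a class loader over the main jar and its dependency jars.
    static URLClassLoader loaderFor(String... jarPaths) {
        URL[] urls = new URL[jarPaths.length];
        try {
            for (int i = 0; i < jarPaths.length; i++) {
                urls[i] = Paths.get(jarPaths[i]).toUri().toURL();
            }
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
        return new URLClassLoader(urls, ReflectiveLauncher.class.getClassLoader());
    }

    // Usage: java ReflectiveLauncher <main class> <jar> [<jar> ...]
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: ReflectiveLauncher <main class> <jar> [<jar> ...]");
            return;
        }
        String[] jars = Arrays.copyOfRange(args, 1, args.length);
        launch(loaderFor(jars), args[0], new String[0]);
        // Automation code placed here shares the JVM with the application.
    }
}

// Hypothetical stand-in for a third party entry class, used only to try the launcher.
class LauncherDemoTarget {
    public static void main(String[] args) {
        System.setProperty("launcher.demo.arg", args.length > 0 ? args[0] : "");
    }
}
```

The sketch deliberately ignores -D properties, applet parameters and JNLP descriptors, which is where this approach starts to break down.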

Although Java reflection looks like a favorable approach at this stage, further investigation reveals issues when automating applets. In scenarios where an applet takes dynamic parameters from the web page, or where the applet is launched using complex logic on a web page, this approach is not feasible. Java applications wrapped as an exe cannot be loaded either. For thick client applications whose launch involves complex steps, this approach cannot be used, or it requires considerably more complex coding.

Automation Possible Scenarios

Thick client applications that can be run using a simple command such as java -jar UIApplication.jar can be automated easily. If the command sets additional properties with -D options, automation becomes difficult or needs more effort.

Applets embedded with a simple tag such as <applet code="my.applet.example.ClassName" width="500" height="300"></applet> can be automated. If the applet tag is generated using JSP, or if it is retrieved from the application server only after a few steps, automation becomes difficult. This approach has to be discarded when a user wants to automate an applet that takes an input parameter, such as a token, from an HTML or JSP page.

The same is true in the case of JNLP. Applications with a simple static JNLP file can be automated, but difficulties start when JNLP files are generated dynamically.

Advantages

Perform Mouse click for specified object

Dispatch keyboard events for specified objects

Retrieving value from specified objects

Does not depend on screen resolution/position of application on screen

Background processing is possible

As this runs in background, user can simultaneously work on other things

Disadvantages

Not all applets can be automated using this approach

Dynamically generated JNLP files cannot be automated

Thick client apps with complex launching logic will be difficult to automate

Thick client apps that are launched from a batch file with complex launch logic cannot be automated

Jar files wrapped as exe cannot be automated

Extra permissions are required for socket connections and for listening to AWT events; depending on the app, granting all permissions may be required as well

3. Injecting Code into Running JVM (Best Suited Approach)

Java provides two ways to inject code into a JVM, both based on Java instrumentation and on attaching an instrumentation agent.

We will discuss these concepts in brief here.

Java provides an interface called java.lang.instrument.Instrumentation, which an agent can use once it has been injected into a JVM. The agent code can be injected in two ways.

3.1. Using Agentmain Method

For more detail about using this method, refer to the Java Instrumentation documentation from Oracle. This type of injection adds code to a JVM that is already running; the assumption is that the third party application is up and running. Code is injected by creating a Java agent jar file containing a class with an agentmain method. Once the agent is loaded, its agentmain method is called, and automation-related code can be executed with agentmain as the entry point. The code written in agentmain and the third party application now reside in the same JVM, so the automation code can directly access objects of the running application. Accessing objects of a third party application and performing operations on them is explained in later sections.

There are still some restrictions on this method: not all JVMs support this type of code injection, and the JDK must be installed in order to use it.
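A minimal sketch of this kind of injection is shown below, assuming a JDK that provides the Attach API (com.sun.tools.attach, in the jdk.attach module from Java 9 onwards). The class names AttachableAgent and AgentAttacher are hypothetical. In a real agent jar the agent class would normally be a public class in its own file, and the jar manifest must name it in an Agent-Class entry; both classes are kept in one listing here only for brevity.

```java
import com.sun.tools.attach.VirtualMachine;
import java.lang.instrument.Instrumentation;

class AttachableAgent {

    // Remembers the arguments of the last attach, for illustration only.
    static volatile String lastArgs;

    // Entry point invoked inside the target JVM once the agent is loaded.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        lastArgs = agentArgs;
        System.out.println("Agent attached with args: " + agentArgs);
        // Automation bootstrap code would start from here.
    }
}

class AgentAttacher {

    // Usage: java AgentAttacher <pid> <path to agent jar> [agent args]
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: AgentAttacher <pid> <agent jar> [agent args]");
            return;
        }
        // Connect to the running JVM identified by its process id.
        VirtualMachine vm = VirtualMachine.attach(args[0]);
        try {
            // Load the agent jar into the target JVM; this triggers agentmain.
            vm.loadAgent(args[1], args.length > 2 ? args[2] : null);
        } finally {
            vm.detach();
        }
    }
}
```

The attacher runs in its own JVM, while agentmain runs inside the third party application's JVM.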

3.2. Using Premain Method

For more detail about using this method, refer to the Java Instrumentation documentation from Oracle. This type of injection adds the code while the JVM is starting, so the Java agent runs before the third party application. As the name suggests, the premain method is called before the main method of the third party application. As with agentmain, automation-related code is executed from the premain method. Both run in the same JVM, so the automation code can get hold of UI objects easily. Accessing objects of third party applications and performing operations on them is explained in later sections.
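A start-up agent can be as small as the sketch below. The class name StartupAgent and the thread name are hypothetical; the agent jar manifest would need a Premain-Class entry naming the agent class. Doing the set-up work on a daemon thread is a design choice of this sketch, made so that the application's own main method is not delayed.

```java
import java.lang.instrument.Instrumentation;

class StartupAgent {

    // Set once premain has been called, for illustration only.
    static volatile boolean started;

    // Called by the JVM before the application's main method when the JVM
    // is started with -javaagent:<agent jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        started = true;
        // Run the automation set-up off the start-up thread.
        Thread bootstrap = new Thread(
                () -> System.out.println("Automation agent initialised, args=" + agentArgs),
                "automation-bootstrap");
        bootstrap.setDaemon(true);
        bootstrap.start();
    }
}
```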

Automation Possible Scenarios

All kinds of Java applications can be automated using this approach. The only prerequisite is that, before launching the app, the environment variable JAVA_TOOL_OPTIONS is set to -javaagent:"<PATH TO JAVAAGENT>". The variable can be set at process level, user level, or system level, depending on the requirement.

Advantages

Perform Mouse click for specified object

Dispatch keyboard events for specified objects

Retrieving value from specified objects

Does not depend on screen resolution/position of application on screen

Background processing is possible

As this runs in background, user can simultaneously work on other things

All kinds of desktop Swing/AWT applications can be automated

All JNLP applications can be automated

All applet applications can be automated

Java applications wrapped as exe can be automated

Dynamically generated JNLP applications can be automated

Two or more applets embedded in one html can be automated

Disadvantages

Extra permissions are required for socket connections and for listening to AWT events; depending on the app, granting all permissions may be required as well.

Using Pre-Defined Environment Variables

This is required when using the premain method. This section describes how to attach the automation agent we have created to any Java application.

We have to add the Java agent along with the application. To do this, we pass the JVM parameter -javaagent:<Java agent path>.

If you are running the Java application using the java command, this is simple: add the JVM parameter to the ones you already have.

For example, if you are running the Java program like this: java -jar MyApp.jar then it becomes: java -javaagent:"C:\AutomationAgent.jar" -jar MyApp.jar

If you are running the Java program like this: java -cp someclasspath com.MyClass then it becomes: java -javaagent:"C:\AutomationAgent.jar" -cp someclasspath com.MyClass

However, this JVM parameter does not work in a few scenarios. For example, if the JNLP file itself sets JVM arguments, the parameter is ignored, and if the browser does not create a separate java.exe process for applets, it does not take effect.

The alternative is to set the following environment variable:

JAVA_TOOL_OPTIONS=-javaagent:<java agent path>

This works for any kind of Java application, whether it is an applet, a JNLP application, or an application started through any command.
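Setting the variable at process level can be sketched with ProcessBuilder, as below. The class name AgentEnvLauncher is hypothetical, and the agent and application paths are the placeholder paths used in the examples in this section.

```java
import java.io.IOException;

class AgentEnvLauncher {

    // Prepares a child process whose environment carries JAVA_TOOL_OPTIONS,
    // so that any JVM started by that process picks up the agent.
    static ProcessBuilder withAgent(String agentJar, String... command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().put("JAVA_TOOL_OPTIONS", "-javaagent:" + agentJar);
        builder.inheritIO();
        return builder;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder paths; replace them with a real agent jar and application.
        Process app = withAgent("C:\\AutomationAgent.jar", "java", "-jar", "MyApp.jar").start();
        System.out.println("Application exited with code " + app.waitFor());
    }
}
```

Because the variable is set only in the child's environment, other Java programs on the machine are unaffected, unlike a user level or system level setting.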

Accessing Objects of Third Party Applications

As we are dealing with UI applications only, we can use the Abstract Window Toolkit (AWT) provided by Java to achieve this. From the premain method, we can add an AWT event listener, which will receive window opened and window closed events.

With such a listener registered, we are notified of every window that is opened. Each event carries the top level window object. Starting from that window, we can iterate over its children, and their children in turn, to reach every component. In this way we can get all the UI objects. Once we have a handle on the objects, we can perform any operation on them; for example, we can dispatch events that simulate the actions we want to perform.
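The listener registration and the walk over the component tree described above can be sketched as follows. The class name WindowTracker and its method names are hypothetical; the sketch only prints what it finds, whereas a real agent would store the components for later use. Registering a global AWT listener is one of the operations that needs the extra permission mentioned earlier when a security manager is active.

```java
import java.awt.AWTEvent;
import java.awt.Component;
import java.awt.Container;
import java.awt.Toolkit;
import java.awt.Window;
import java.awt.event.WindowEvent;
import java.util.ArrayList;
import java.util.List;

class WindowTracker {

    // Registers a global listener that reports every window the
    // application opens, together with the components it contains.
    static void install() {
        Toolkit.getDefaultToolkit().addAWTEventListener(event -> {
            if (event.getID() == WindowEvent.WINDOW_OPENED) {
                Window window = ((WindowEvent) event).getWindow();
                for (Component c : collect(window)) {
                    System.out.println(c.getClass().getName() + " name=" + c.getName());
                }
            }
        }, AWTEvent.WINDOW_EVENT_MASK);
    }

    // Returns the root component followed by all of its descendants.
    static List<Component> collect(Component root) {
        List<Component> found = new ArrayList<>();
        found.add(root);
        if (root instanceof Container) {
            for (Component child : ((Container) root).getComponents()) {
                found.addAll(collect(child));
            }
        }
        return found;
    }
}
```

Calling WindowTracker.install() from premain or agentmain would be enough to start receiving window events for the whole application.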

The operations we need to perform are time-specific or conditional, so we have to design a few APIs that can be called on demand to achieve the required functionality.

Exposing API to Perform Operations

Consider a simple scenario: we want to set text in a text field and then click a button.

Here we need two APIs, one to set the text of a text field and the other to click the button. We will expose the two APIs below to perform these operations.

setText(JTextField textField, String text)

buttonClick(JButton button)

These functions now need to be called from a different JVM. How do we achieve this Inter-JVM call?

A common approach is the RPC (Remote Procedure Call) client-server mechanism. A further benefit of this mechanism is that the client can be written on any platform, such as Java, .NET, or C.

XML-RPC Role

XML-RPC is used to call the API from a different JVM or from a program on a different platform. The application to be automated acts as an XML-RPC server, and the program that calls the APIs exposed for automation acts as the XML-RPC client.

One challenge with XML-RPC is deciding which port the XML-RPC server should run on. The port cannot be fixed, as this may lead to a conflict later if some other functionality of the application tries to use the same port. Also, if three or four applications are running simultaneously, they cannot all use the same port.

The approach used here is to assign a free port number to each application at random.
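One simple way to obtain a free port is sketched below. It swaps in a slightly different technique from picking a number at random: binding to port 0 lets the operating system choose an unused port. The class name FreePort is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

class FreePort {

    // Asks the operating system for a currently unused TCP port by
    // binding a server socket to port 0 and reading back the real port.
    static int find() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Free port: " + find());
    }
}
```

The socket is closed before the port is returned, so another process could in principle claim the port in between. A server that can be handed an already bound socket avoids that gap.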

The next question is: if port numbers are assigned at random, how does the client know which port to send its request to?

To solve this problem, we have come up with the concept of a MappingServer.

Mapping Server

A MappingServer, as the name suggests, is itself an XML-RPC server that maps applications to ports. When an application starts and is assigned a random port number, it calls the MappingServer with its port number and PID. The MappingServer stores these details, and whenever a client wants to connect, it asks the MappingServer for the port belonging to a particular PID. This solves the problem of identifying which port an application is listening on.
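The bookkeeping at the heart of the MappingServer can be sketched as a small PID-to-port registry. The class name PortRegistry, its method names and the port 50123 are hypothetical, and the XML-RPC transport that would expose register and lookup to other processes is deliberately left out. ProcessHandle requires Java 9 or later.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.concurrent.ConcurrentHashMap;

class PortRegistry {

    // Maps the PID of each automated application to its server port.
    private final Map<Long, Integer> portsByPid = new ConcurrentHashMap<>();

    // Called by an application once its server is listening.
    void register(long pid, int port) {
        portsByPid.put(pid, port);
    }

    // Called by a client that wants to talk to the application with this PID.
    OptionalInt lookup(long pid) {
        Integer port = portsByPid.get(pid);
        return port == null ? OptionalInt.empty() : OptionalInt.of(port);
    }

    // Called when an application shuts down.
    void unregister(long pid) {
        portsByPid.remove(pid);
    }

    public static void main(String[] args) {
        PortRegistry registry = new PortRegistry();
        long pid = ProcessHandle.current().pid();
        registry.register(pid, 50123); // assumed port, for illustration
        System.out.println("PID " + pid + " listens on " + registry.lookup(pid).getAsInt());
    }
}
```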

Block Diagram

The block diagram below helps us understand the flow and the working of each of the components.

Steps 1 and 2 happen simultaneously, as launching and code injection take place at the same time.

Component Identification Mechanism

The next problem that we face concerns the API we have exposed for accessing a component, i.e.

'public void setText(JTextField textField, String text)'

This API cannot be called directly, because a caller in a different JVM has no way of passing the text field object.

The challenge is to identify the right component. Java doesn't provide any built-in mechanism to identify a component uniquely across runs. The hashCode of a component may look like a candidate, but it cannot be used: although it typically distinguishes components within one run, its value for the same component differs from run to run.

Next we evaluate two approaches to uniquely identify a component.

The first approach is to number the components in BFS (Breadth First Search) order. Each time the application runs, the same order is followed. However, this fails when components are added to the UI dynamically.

The second approach is to identify components using a searchPath. The searchPath is defined from the top level parent down to the component. For example, suppose a JFrame has a JPanel and the JPanel has two children, a JTextField and a JButton. Then the searchPath for the JTextField will be [(frame)JFrame].[(panel0)JPanel].[(myText)JTextField] and for the button it will be [(frame)JFrame].[(panel0)JPanel].[(submit)JButton].

If there are two buttons or other identical components in the same container, both components may end up with the same searchPath. In this case we again use indexes to identify the components uniquely. For the first button the id would be [(frame)JFrame].[(panel0)JPanel].[(null)JButton]{0} and for the second button the id would be [(frame)JFrame].[(panel0)JPanel].[(null)JButton]{1}.

This is just an illustrative example. So, in this way we can identify the component uniquely.
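A minimal sketch of building such a searchPath is given below. The class name SearchPaths and its method names are hypothetical, and the rule used here (append an index only when a sibling would otherwise produce the same segment) is one possible reading of the scheme described above, not necessarily the exact rule used by the author.

```java
import java.awt.Component;
import java.awt.Container;
import java.util.ArrayDeque;
import java.util.Deque;

class SearchPaths {

    // Builds the path from the top-most ancestor down to the component.
    static String pathOf(Component component) {
        Deque<String> segments = new ArrayDeque<>();
        for (Component c = component; c != null; c = c.getParent()) {
            segments.addFirst(segment(c));
        }
        return String.join(".", segments);
    }

    // Formats one component as [(name)Type], adding {index} only when a
    // sibling in the same container would otherwise get the same segment.
    private static String segment(Component c) {
        String base = label(c);
        Container parent = c.getParent();
        if (parent == null) {
            return base;
        }
        int index = -1;
        int matches = 0;
        for (Component sibling : parent.getComponents()) {
            if (label(sibling).equals(base)) {
                if (sibling == c) {
                    index = matches;
                }
                matches++;
            }
        }
        return matches > 1 ? base + "{" + index + "}" : base;
    }

    // A component without an explicit name is rendered as (null).
    private static String label(Component c) {
        return "[(" + c.getName() + ")" + c.getClass().getSimpleName() + "]";
    }
}
```

Component names that themselves contain a dot or a bracket would make such paths ambiguous; the sketch does not try to escape them.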

Thus we change the signatures of the exposed APIs. The new signatures are:

setText(String searchPath, String text)

buttonClick(String searchPath)

We first have to get the object that matches the search path and then perform the required operation on it.
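The two steps, resolving the search path and then performing the operation, can be sketched as follows. The class name AutomationApi and the resolver function are hypothetical; the resolver stands in for whatever lookup maps a searchPath to a live component. Running the operation on the Swing event dispatch thread is a design choice of this sketch, made because Swing components are not thread-safe.

```java
import java.awt.Component;
import java.lang.reflect.InvocationTargetException;
import java.util.function.Function;
import javax.swing.JButton;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;

class AutomationApi {

    // Maps a searchPath to the live component, or null if none matches.
    private final Function<String, Component> resolver;

    AutomationApi(Function<String, Component> resolver) {
        this.resolver = resolver;
    }

    public void setText(String searchPath, String text) {
        JTextField field = resolve(searchPath, JTextField.class);
        onEventThread(() -> field.setText(text));
    }

    public void buttonClick(String searchPath) {
        JButton button = resolve(searchPath, JButton.class);
        onEventThread(button::doClick);
    }

    // Looks the component up and checks that it has the expected type.
    private <T extends Component> T resolve(String searchPath, Class<T> type) {
        Component c = resolver.apply(searchPath);
        if (!type.isInstance(c)) {
            throw new IllegalArgumentException("No " + type.getSimpleName() + " at " + searchPath);
        }
        return type.cast(c);
    }

    // Runs the action on the event dispatch thread and waits for it to finish.
    private static void onEventThread(Runnable action) {
        if (SwingUtilities.isEventDispatchThread()) {
            action.run();
            return;
        }
        try {
            SwingUtilities.invokeAndWait(action);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (InvocationTargetException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Because both methods take only strings, they can be exposed over RPC without the client ever holding a reference to a Swing object.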

Conclusion

In conclusion, we have walked through the process of automating Java applications and have identified and overcome the hurdles met along the way. The paper describes the approach best suited to automating any kind of Java application and briefly covers the approaches that are less suitable. From this walkthrough, we conclude that injecting code into a third party application using Java instrumentation is the best suited approach for automating Java applications.

References

This article was written by Vitthal Kavitake. The information in this presentation is accurate as of its publication date; such information is subject to change without notice. The author acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this presentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission and/or any named intellectual property rights holders under this presentation.