AST 2012 – Proceedings

Preface

As proud co-chairs of the 7th Workshop on Automation of Software Test, we would like to take this
opportunity to welcome you to AST 2012 and to thank all participants for taking time out of their
busy schedules to travel from their home countries and attend this workshop.

Security

We propose a light-weight, yet effective, technique for fuzz-testing security protocols. Our technique is modular: it exercises (stateful) protocol implementations in depth and handles encrypted traffic. We use a concrete implementation of the protocol to generate valid inputs, and mutate the inputs using a set of fuzz operators. A dynamic memory analysis tool monitors the execution as an oracle to detect the vulnerabilities exposed by fuzz-testing. We provide the fuzzer with the necessary keys and cryptographic algorithms so that it can properly mutate encrypted messages. We present a case study on two widely used, mature implementations of the Internet Key Exchange (IKE) protocol and report on two new vulnerabilities discovered by our fuzz-testing tool. We also compare the effectiveness of our technique to two existing model-based fuzz-testing tools for IKE.
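
The decrypt-mutate-re-encrypt step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two fuzz operators, the toy XOR "cipher" standing in for real cryptographic algorithms, and all names are invented.

```python
import random

def flip_random_bit(payload: bytes, rng: random.Random) -> bytes:
    """Classic fuzz operator: flip one random bit of the payload."""
    i = rng.randrange(len(payload))
    return payload[:i] + bytes([payload[i] ^ (1 << rng.randrange(8))]) + payload[i + 1:]

def inflate_payload(payload: bytes, rng: random.Random) -> bytes:
    """Duplicate the payload to probe length-handling code paths."""
    return payload * rng.randrange(2, 65)

def fuzz_encrypted_message(ciphertext, decrypt, encrypt, operators, rng):
    """Decrypt with the supplied key material, apply one fuzz operator,
    and re-encrypt, so the mutation survives the encryption layer."""
    op = operators[rng.randrange(len(operators))]
    return encrypt(op(decrypt(ciphertext), rng))

# Toy XOR "cipher"; a real fuzzer would be given the session keys and
# algorithms negotiated by the protocol under test.
KEY = 0x5A
xor = lambda data: bytes(b ^ KEY for b in data)

rng = random.Random(7)
mutated = fuzz_encrypted_message(xor(b"SA-proposal"), xor, xor,
                                 [flip_random_bit, inflate_payload], rng)
```

A dynamic memory analysis tool would then monitor the target while it processes `mutated`, acting as the oracle.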

The implementation of an authorization system is a difficult and error-prone activity that requires a careful verification and testing process. In this paper, we focus on testing the implementation of the PolPA authorization system and in particular its Policy Decision Point (PDP), which decides whether an access should be allowed. Exploiting the PolPA policy specification, we present a fault model and a test strategy able to highlight the problems, vulnerabilities and faults that could occur during the PDP implementation, as well as a testing framework for the automatic generation of a test suite that covers the fault model. Preliminary results of applying the test framework to a realistic case study are presented.

The goal of security testing is to detect defects that could be exploited to conduct attacks. Existing works, however, address security testing mostly from the point of view of automatic test case generation; less attention is paid to the problem of developing and integrating a security oracle.
In this paper we address the problem of the security oracle, in particular for Cross-Site Scripting vulnerabilities. We rely on existing test cases to collect HTML pages in safe conditions, i.e. when no attack is run. These pages are then used to construct the safe model of the application under analysis, a model that describes the structure of an application response page for safe input values. The oracle then reports a successful attack when a test makes the application display a web page that does not comply with the safe model.
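
A minimal sketch of the safe-model oracle, assuming pages are abstracted to their tag structure (an assumption of this sketch; the paper's actual page model may be richer):

```python
from html.parser import HTMLParser

class _TagCollector(HTMLParser):
    """Reduces a page to the sequence of its start tags, discarding text."""
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def skeleton(page: str) -> tuple:
    c = _TagCollector()
    c.feed(page)
    return tuple(c.tags)

def build_safe_model(safe_pages):
    """Safe model: the set of page structures observed under safe inputs."""
    return {skeleton(p) for p in safe_pages}

def is_attack(page: str, model) -> bool:
    """Oracle: a response whose structure deviates from the safe model
    (e.g., because a <script> element was injected) signals a successful XSS."""
    return skeleton(page) not in model

model = build_safe_model(["<html><body><p>hello</p></body></html>",
                          "<html><body><p>goodbye</p></body></html>"])
```

Text content may vary freely between safe pages; only a structural deviation, such as an injected script element, is flagged.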

By changing the way software is delivered to end users, markets for mobile apps create a false sense of security: apps are downloaded from a market that can potentially be regulated. In practice, this is far from the truth; there is evidence that security is not one of the primary design tenets of mobile app stores. Recent studies have indicated that mobile markets are harboring apps that are either malicious or vulnerable, leading to compromises of millions of devices. The key technical obstacle for the organizations overseeing these markets is the lack of practical and automated mechanisms to assess the security of mobile apps, given that thousands of apps are added and updated on a daily basis. In this paper, we provide an overview of a multi-faceted project aimed at automatically testing the security and robustness of Android apps in a scalable manner. We describe an Android-specific program analysis technique capable of generating a large number of test cases for fuzzing an app, as well as a test bed that, given the generated test cases, executes them in parallel on numerous emulated Android instances running on the cloud.

Surveys

While mobile applications are being adopted at an extraordinary rate, it is still unclear whether they deserve specific testing approaches for their verification and validation. This paper investigates new research directions in the automation of mobile application testing by answering three research questions: (RQ1) are mobile applications so different from traditional ones as to require different, specialized testing techniques?; (RQ2) what are the new challenges and research directions in testing mobile applications?; and (RQ3) what role can automation play in testing mobile applications? We answer these questions by analyzing the current state of the art in mobile application development and testing, and by proposing our view on the topic.

There is a documented gap between academic and practitioner views on software testing. This paper tries to close the gap by investigating both views regarding the benefits and limitations of test automation. The academic view is studied with a systematic literature review, while the practitioner view is assessed with a survey to which 115 software professionals responded. The results of the systematic literature review show that the evidence regarding benefits and limitations is quite shallow, as only 25 papers provide such evidence. Furthermore, benefits often originated from stronger sources of evidence (experiments and case studies), while limitations often originated from experience reports; we believe this is caused by publication bias towards positive results. The survey showed that the benefits of test automation relate to test reusability, repeatability, test coverage and effort saved in test execution, while the limitations are the high initial investment in automation setup, tool selection and training. Additionally, 45% of the respondents agreed that the tools available on the market are a poor fit for their needs. Finally, 80% of the practitioners disagreed with the vision that automated testing will fully replace manual testing.

Industrial Case Studies

Various approaches to automated test case generation for graphical user interface (GUI) testing have emerged in recent years; a notable trend is model-based testing (MBT). In this experience report we shed light on the challenges faced during the introduction and everyday use of a concrete MBT technique in a Scrum project, along with the practical solutions found. Topics discussed include the process of defining and maintaining models for regression and risk-based testing of GUIs, suitable test case derivation algorithms, human factors, and the choice of an appropriate architecture.

Automatic test generators pursue some type of systematic coverage of the program code or heuristic sampling of the program inputs. Test generators are effective under the assumption, often (enthusiastically) embraced by researchers, that the generated test cases produce informative data for domain experts, e.g., pinpoint important bugs. This paper investigates the validity of this assumption through a case study of using test generators on industrial software with nontrivial domain-specific peculiarities. Our results enhance the available body of knowledge on the strengths and weaknesses of test generators.

The increased importance of Test Automation in software engineering is evident from the number of companies now investing in automated testing tools, with the main aim of preventing defects during the development process. Test Automation is considered an essential activity in agile methodologies, as it is key to speeding up the quality assurance process. This paper presents empirical observations and the challenges of a test team new to agile practices and Test Automation, using open source testing tools integrated into software projects that follow the Scrum methodology. The results highlight several important issues, and we report the Test Automation practices collected from the experiences and lessons learned.

Input Generation and Selection I

In this paper we focus on test case generation for large database applications in the telecommunications domain. In particular, we present an approach that is based on the Category Partition Method and uses the SMT solver Z3 to automatically generate input test data values for the obtained test cases. The generation process makes use of different test case generation strategies. Initial results show that the strategy based on genetic programming delivers the fewest test cases while retaining choice coverage. Moreover, the obtained results indicate that the presented approach is feasible for the intended application domain.
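
The category-partition part of the approach can be pictured with a small sketch. The categories, choices and constraint below are invented, and a simple predicate filter stands in for Z3, which in the paper handles the actual generation of input data values:

```python
from itertools import product

# Hypothetical categories and choices for a telecom database application.
categories = {
    "account_type": ["prepaid", "postpaid"],
    "balance":      [0, 50, 10000],
    "action":       ["call", "top_up"],
}

def valid(frame):
    """Constraint pruning invalid frames; in the paper, richer constraints
    over data values are discharged with the SMT solver Z3."""
    return not (frame["account_type"] == "prepaid"
                and frame["balance"] == 0
                and frame["action"] == "call")

# Enumerate all frames (one choice per category), then filter.
names = list(categories)
frames = [dict(zip(names, combo)) for combo in product(*categories.values())]
test_frames = [f for f in frames if valid(f)]
```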

Dynamic symbolic execution has been shown to be an effective technique for automated test input generation. When applied to large-scale programs, however, its scalability is limited by the combinatorial explosion of the path space and the high cost of computation. Several sophisticated search strategies have been proposed to better guide dynamic symbolic execution towards high code coverage. While shown to be effective, these techniques may deteriorate in practical situations because of the large computation cost involved. In this paper, we propose a search heuristic that is directed by coverage information and interleaved with random search, making dynamic symbolic execution both coverage-improving and cost-effective. We conducted two studies to evaluate the effectiveness of our approach and the impact of computation costs on its practical capabilities.
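
The interleaving idea can be illustrated on a toy program whose coverage is directly observable. This sketch replaces symbolic path exploration with concrete candidate inputs; only the alternation between coverage-directed and random selection is retained, and all names are invented:

```python
import random

def branches(x: int) -> set:
    """Toy program under test: returns the ids of branches covered by input x."""
    cov = set()
    cov.add("pos" if x > 100 else "neg")
    cov.add("mod0" if x % 7 == 0 else "mod1")
    return cov

ALL_BRANCHES = {"pos", "neg", "mod0", "mod1"}

def generate(budget=200, interleave=4, seed=0):
    """Coverage-directed candidate selection, interleaved with pure random
    picks every `interleave` steps to escape plateaus."""
    rng = random.Random(seed)
    covered, tests = set(), []
    for step in range(budget):
        if step % interleave == 0:
            x = rng.randrange(-1000, 1000)            # random phase
        else:                                         # directed phase
            pool = [rng.randrange(-1000, 1000) for _ in range(10)]
            x = max(pool, key=lambda c: len(branches(c) - covered))
        gained = branches(x) - covered
        if gained:
            covered |= gained
            tests.append(x)
        if covered == ALL_BRANCHES:
            break
    return tests, covered

tests, covered = generate()
```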

The Extended Finite State Machine (EFSM) is widely used to represent system specifications. Automated test data generation from EFSM models is still a challenging task due to the complexity of transition paths. In this paper, we introduce a new approach to automatically generate test cases for given transition paths of an EFSM model. An executable EFSM model provides run-time feedback that serves as the fitness function, and a scatter search algorithm then searches for test data that trigger the given transition paths. Based on the executable model, the expected outputs associated with the test data are also collected, so that test oracles are constructed automatically. Finally, test data (inputs) and test oracles (expected outputs) are combined into test cases. The experimental results show that our approach can effectively generate test cases that exercise the feasible transition paths.
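
A toy version of the fitness-guided search might look as follows. The EFSM, its guards, and the simplified search loop (a stand-in for the scatter search algorithm used in the paper) are all invented for illustration:

```python
import random

# Toy EFSM: three guarded transitions, one integer input per step.
TRANSITIONS = {
    ("idle", "t1"):   lambda x: x > 0,        # idle   -> ready
    ("ready", "t2"):  lambda x: x % 2 == 0,   # ready  -> active
    ("active", "t3"): lambda x: x > 50,       # active -> done
}
PATH = [("idle", "t1"), ("ready", "t2"), ("active", "t3")]

def fitness(inputs):
    """Executing the model gives feedback: how many target-path
    transitions fire before a guard fails."""
    fired = 0
    for key, x in zip(PATH, inputs):
        if not TRANSITIONS[key](x):
            break
        fired += 1
    return fired

def search(seed=1, iters=500):
    """Greatly simplified search: resample the first failing position,
    keep non-worsening candidates."""
    rng = random.Random(seed)
    best = [rng.randrange(-100, 100) for _ in PATH]
    for _ in range(iters):
        if fitness(best) == len(PATH):
            break
        cand = list(best)
        cand[fitness(best)] = rng.randrange(-100, 100)
        if fitness(cand) >= fitness(best):
            best = cand
    return best

inputs = search()
```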

Test cases are often similar. A preliminary study of eight open-source projects found that on average at least 8% of all test cases are clones; the maximum found was 42%. The clones are not identical to their originals: identifiers of classes, methods and attributes, and sometimes even the order of statements and assertions, differ. But the test cases reuse testing logic and are needed for testing; they serve a purpose and cannot be eliminated.
We present an approach that generates useful test clones automatically, thereby eliminating some of the "grunt" work of testing. An important advantage over existing automated test case generators is that the clones include the test oracle. Hence, a human decision maker is often not needed to determine whether the output of a test is correct.
The approach hinges on pairs of classes that provide analogous functionality, i.e., functions that are tested with the same logic. TestCloner transcribes tests involving analogous functions from one class to the other. Programmers merely need to indicate which methods are analogs. Automatic detection of analogs is currently under investigation. Preliminary results indicate a significant reduction in the number of "boiler-plate" tests that need to be written by hand. The transcribed tests do detect defects and can provide hints about missing functionality.
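
The transcription step can be sketched as a systematic renaming over the test source; TestCloner's actual machinery is more elaborate, and the class and method names below are invented:

```python
import re

# Map of analogous identifiers, as indicated by the programmer.
ANALOG = {"ArrayStack": "LinkedStack", "push_array": "push_linked"}

def transcribe(test_source: str, analog: dict) -> str:
    """Rewrite a test for one class into a test for its analog.
    The assertions (the test oracle) carry over with the renaming."""
    pattern = re.compile("|".join(re.escape(k) for k in analog))
    return pattern.sub(lambda m: analog[m.group(0)], test_source)

ORIGINAL_TEST = """
def test_push_pop():
    s = ArrayStack()
    s.push_array(1)
    assert s.pop() == 1
"""
CLONE = transcribe(ORIGINAL_TEST, ANALOG)
```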

Testing software with a GUI is difficult. Manual testing is costly and error-prone, but recorded test cases frequently "break" due to changes in the GUI. Test cases intended to test business logic must therefore be converted to a less "brittle" form to lengthen their useful lifespan. In this paper, we describe BlackHorse, an approach that converts a recorded test case to Java code that bypasses the GUI. The approach was implemented within the testing environment of Research In Motion. We describe the design of the toolset and discuss lessons learned during the course of the project.

Automated tests at the business level can be expensive to develop and maintain. One common approach is to have a domain expert instruct a QA developer to implement what she would do manually in the application. Though there exist record-replay tools specifically developed for this, these tend to scale poorly for more complicated test scenarios.
We present a different solution: An Embedded Domain Specific Language (EDSL) in F#, containing the means to model the user interface, and the various manipulations of it. We hope that this DSL will bridge the gap between the business domain and technical domain of applications to such a degree that domain experts may be able to construct automatic tests without depending on QA developers, and that these tests will prove more maintainable.

Information systems with sophisticated graphical user interfaces are still difficult to test and debug. As a detailed and reproducible report of test case execution is essential, we advocate the documentation of test case execution on several levels. We present an approach to video-based documentation of automated GUI testing that is linked to the test execution procedure. Viewing currently executed test case instructions alongside actual onscreen responses of the application under test facilitates understanding of the failure. This approach is tailored to the challenges of automated GUI testing and debugging with respect to technical and usability aspects. Screen recording is optimized for speed and memory consumption while all relevant details are captured. Additional browsing capabilities for easier debugging are introduced. Our concepts are evaluated by a working implementation, a series of performance measurements during a technical experiment, and industrial experience from 370 real-world test cases carried out in a large software company.

Smartphones are becoming increasingly popular among users. They run an enormous number of applications, and these applications drain the smartphones' batteries. Moreover, battery capacity is significantly restricted due to constraints on the size and weight of the device, so it is important for smartphone applications to be energy efficient. Thus, a methodology for energy performance testing is needed for two reasons: (i) to evaluate the power consumption of a single application on a given device; and (ii) to compare the power consumption of different smartphones or platforms running the same application. In our earlier work, "Selection and execution of user level test cases for energy cost evaluation of smartphones" (Proceedings of the 6th AST, 2011), we developed a testing methodology that significantly reduces the number of test cases and introduced the concepts of primary and standalone test configurations. However, ordering the executions of those two kinds of tests is non-trivial, and it was not studied in that paper.
In this paper, we introduce a methodology to interleave the identification of those two kinds of test configurations in order to reduce the total number of configurations. We express the methodology as a detailed flow chart that application developers can easily follow, and we apply it to a specific smartphone, the HTC Nexus One, to illustrate the process. We show that the total number of test configurations obtained by the methodology matches the number predicted by numerical expressions.

Design for Test

Breaking dependencies is an important task when refactoring legacy code. With the help of Feathers' seams we gain the power to unit test a legacy code base, because seams enable us to inject dependencies from outside. Although seams are a valuable technique, they are hard and cumbersome to apply without automated refactorings and tool chain configuration support.
We show how to create seams in C++ using new refactorings. To automate this task, we provide sophisticated IDE support. Our reference implementation creates the boilerplate code and the necessary infrastructure for the four presented seam types: object, compile, preprocessor and link seams.
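
The object seam, the most language-independent of the four, can be sketched outside C++ as well. The Python example below (with invented class names) shows the underlying idea that the paper's C++ refactorings automate: route a hard-wired dependency through a constructor so a test can substitute a double.

```python
class BillingService:
    """The awkward dependency the legacy code hard-wires."""
    def charge(self, amount):
        raise RuntimeError("talks to a real payment gateway")

class Order:
    def __init__(self, billing=None):
        # Object seam: production code keeps the default;
        # tests inject a test double from outside.
        self.billing = billing or BillingService()

    def checkout(self, amount):
        self.billing.charge(amount)
        return "charged"

class FakeBilling:
    """Test double recording what would have been charged."""
    def __init__(self):
        self.charged = []
    def charge(self, amount):
        self.charged.append(amount)

fake = FakeBilling()
result = Order(billing=fake).checkout(42)
```

In C++ the same seam is realized with virtual member functions and constructor injection, which is the boilerplate the reference implementation generates.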

Input Generation and Selection II

Test suites often grow very large over many releases, such that it is impractical to re-execute all test cases within limited resources. Test case prioritization, which rearranges test cases, is a key technique for improving regression testing. Code coverage information has been widely used in test case prioritization; however, other important information, such as the ordered sequence of program elements weighted by execution frequency, was ignored by previous studies, which risks missing difficult-to-find bugs. Therefore, this paper improves similarity-based test case prioritization using the ordered sequence of program elements measured by execution counts. The empirical results show that our new technique increases the rate of fault detection more significantly than the coverage-based ART technique. Moreover, our technique detects bugs in loops more quickly and is more cost-effective than the traditional ones.
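
The core of the technique, greedy prioritization by sequence dissimilarity, can be sketched as follows. The traces and the choice of `difflib.SequenceMatcher` as the similarity measure are illustrative; the paper defines its own similarity over ordered element sequences with execution counts:

```python
from difflib import SequenceMatcher

# Each test is represented by the ordered sequence of program elements it
# executes; repeats reflect execution counts (e.g., loop iterations).
traces = {
    "t1": ["a", "b", "b", "c"],
    "t2": ["a", "b", "b", "c"],       # near-duplicate of t1
    "t3": ["a", "d", "d", "d", "e"],  # exercises a loop heavily
}

def similarity(s, t):
    return SequenceMatcher(None, s, t).ratio()

def prioritize(traces):
    """Greedy ordering: repeatedly pick the test whose element sequence is
    least similar to any already-selected test."""
    remaining = dict(traces)
    order = [max(remaining, key=lambda k: len(remaining[k]))]  # seed: longest
    del remaining[order[0]]
    while remaining:
        nxt = min(remaining,
                  key=lambda k: max(similarity(remaining[k], traces[s])
                                    for s in order))
        order.append(nxt)
        del remaining[nxt]
    return order

order = prioritize(traces)
```

Note that `t1` and `t2`, which cover the same elements the same number of times, are never picked back to back before a dissimilar trace.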

Since controller applications must typically satisfy real-time constraints while manipulating real-world variables, their implementation often results in programs that run extremely fast and manipulate numerical inputs and outputs. These characteristics make them particularly suitable for test case generation: a large number of test cases can easily be created, thanks to the simplicity of numerical inputs, and executed, thanks to the speed of the computations.
In this paper we present G-RankTest, a technique for test case generation and prioritization. The key idea is that test case generation can run for long sessions (e.g., days) to accurately sample the behavior of a controller application; the generated test cases can then be prioritized according to different strategies and used for regression testing every time the application is modified. In this work we investigate the feasibility of using the gradient of the output as a criterion for selecting the test cases that activate the trickiest behaviors, which we expect to be easier to break when a change occurs and thus to deserve priority in regression testing.
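
The gradient criterion can be illustrated on a toy numeric controller (invented here): after a generation session has sampled input/output pairs, inputs lying in steep regions of the output are ranked first.

```python
def controller(x: float) -> float:
    """Toy controller: saturates outside [0, 10], steepest near x = 10."""
    if x < 0:
        return 0.0
    if x > 10:
        return 1.0
    return (x / 10) ** 4

def gradient(f, x, h=1e-3):
    """Central-difference estimate of the output's local gradient."""
    return abs(f(x + h) - f(x - h)) / (2 * h)

# Inputs sampled during a (toy) generation session, then prioritized:
samples = [i * 0.5 for i in range(21)]
ranked = sorted(samples, key=lambda x: gradient(controller, x), reverse=True)
```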

This paper discusses and exemplifies our ideas on all-values symbolic execution, an alternative strategy to the traditional all-paths style of symbolic execution. All-values symbolic execution focuses on enumerating the (symbolic) values that may derive from the symbolic execution of program statements. It exploits program dependencies to optimize the symbolic execution of statements that can be executed with the same symbolic inputs on multiple (up to infinitely many) paths. Although a fully working implementation and a thorough evaluation are yet to come, this paper illustrates with simple but representative examples that the proposed technique can boost the efficiency of symbolic execution and suit interesting new applications.

Test case selection has recently been formulated as a multi-objective optimization problem that tries to satisfy conflicting goals, such as code coverage and computational cost. This paper introduces the concept of asymmetric distance preservation, useful for improving the diversity of the non-dominated solutions produced by multi-objective Pareto-efficient genetic algorithms, and proposes two techniques to achieve this objective. Results of an empirical study conducted on four programs from the SIR benchmark show that the proposed techniques (i) obtain non-dominated solutions with higher diversity than previously proposed multi-objective Pareto genetic algorithms; and (ii) improve the convergence speed of the genetic algorithms.
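
As background for readers less familiar with the formulation, the Pareto layer that such techniques build on can be sketched as follows; the diversity-preserving machinery itself is not shown, and the candidate subsuites are invented:

```python
def dominates(a, b):
    """a dominates b if it is at least as good on both objectives
    (maximize coverage, minimize cost) and strictly better on one."""
    cov_a, cost_a = a
    cov_b, cost_b = b
    return (cov_a >= cov_b and cost_a <= cost_b) and \
           (cov_a > cov_b or cost_a < cost_b)

def pareto_front(candidates):
    """Keep only the non-dominated solutions."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]

# Candidate test subsuites as (coverage, cost) pairs.
candidates = [(0.9, 30), (0.9, 45), (0.6, 10), (0.3, 5), (0.3, 9)]
front = pareto_front(candidates)
```

Diversity-preserving variants then try to keep the surviving solutions well spread along this front rather than clustered.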