YAACT: A genetic algorithm tool for code coverage analysis

Testing is both technically and economically an important part of high quality software production. It has been estimated that testing accounts for half of the expenses in software production. Much of the testing is done manually or using other labor-intensive methods. It is thus vital for the software industry to develop efficient, cost effective, and automatic means and tools for software testing. Researchers have proposed several methods over years to generate automatically solution which have different drawbacks. This study examines automatic software testing optimization by using genetic algorithm approaches. This study will cover two approaches: a) obtain the sequence of regression tests that cover the greatest amount of code and b) once it is achieved another genetic algorithm will eliminate tests cases that cover the same section of code on the basis of still get the maximum code coverage. The overall aim of this research is to reduce the number of test cases that need to be run with the greatest amount of code covered.

Previous Versions

YAACT: A genetic algorithm tool for code coverage analysis

Testing is both technically and economically an important part of high quality software production. It has been estimated that testing accounts for half of the expenses in software production. Much of the testing is done manually or using other labor-intensive methods. It is thus vital for the software industry to develop efficient, cost effective, and automatic means and tools for software testing. Researchers have proposed several methods over years to generate automatically solution which have different drawbacks. This study examines automatic software testing optimization by using genetic algorithm approaches. This study will cover two approaches: a) obtain the sequence of regression tests that cover the greatest amount of code and b) once it is achieved another genetic algorithm will eliminate tests cases that cover the same section of code on the basis of still get the maximum code coverage. The overall aim of this research is to reduce the number of test cases that need to be run with the greatest amount of code covered.

\documentclass[a4paper]{article}
\usepackage[english]{babel}
\usepackage{hyperref}
\usepackage{float}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage[colorinlistoftodos]{todonotes}
\usepackage{tikz}
\usetikzlibrary{arrows,positioning,shapes.geometric}
\title{YAACT: A genetic algorithm tool for code coverage analysis }
\author{\normalsize Víctor Rodríguez \\\normalsize Jindrizka Dominguez \\\normalsize Gilberto Sánchez\\\normalsize }
\date{\color{black}April 28 2014}
\begin{document}
\maketitle
\section{Introduction}
Testing is both technically and economically an important part of high quality software production. It has been estimated that testing accounts for half of the expenses in software production. Much of the testing is done manually or using other labor-intensive methods. It is thus vital for the software industry to develop efficient, cost effective, and automatic means and tools for software testing. Researchers have proposed several methods over years to generate automatically solution which have different drawbacks. This study examines automatic software testing optimization by using genetic algorithm approaches. This study will cover two approaches: a) obtain the sequence of regression tests that cover the greatest amount of code and b) once it is achieved another genetic algorithm will eliminate tests cases that cover the same section of code on the basis of still get the maximum code coverage. The overall aim of this research is to reduce the number of test cases that need to be run with the greatest amount of code covered.
Regression testing is expensive but an essential activity in software maintenance. Regression testing attempts to validate modified software and ensure that the modified parts of the program do not introduce unexpected errors. It executes an existing test suite on a changed program to assure that the program is not adversely affected by unintended amendments\cite{Suman}. The time used for regression testing can be assumed approximately half of the software maintenance activities. However, through the use of an effective prioritization sequence, testers can reorder(or eliminate) the test cases to reduce the time and cost in the system, allowing corrections to be made earlier and raising overall confidence that the software has been adequately tested. One concern in regression testing is the effectiveness of test suites in finding new faults in successive program versions. A regression test selection technique may help us to select an appropriate number of test cases from this test suite.
Many automatic test case generators have been developed and many classical searching
methods have been used to derive test case. However, these techniques work only for
continuous function (while most program domains are of discontinuous spaces). This is
where Evolutionary Algorithms steps in, such as those using Genetic Algorithms GA. GAs
are heuristic search techniques which are able to search a discontinuous space. Evolutionary Testing is a promising approach to entirely automate test case design. It can be used to generate test cases for structural testing. In some projects\cite{Maha} \cite{GANESH} a GA based test data generation is presented. However, we will not afect the test data input or try to create an automatic generator of tests cases \cite{Cadar}.
The proposed GA is applied to different types of software systems small programs with different complexity. Results compared to random testing showed that genetic algorithms can be used effectively in automatic software testing to generate test data for unit testing.
\section{Algorithm}
This study will apply genetic algorithms (GA) to solve two aproaches of testing automation:
\begin{enumerate}
\item GA to obtain the sequence of regression tests that cover the greatest amount of code (GA to increase code coverage).
\item GA to find the minimal secuence of regression tests that cover the greatest amount of code (GA to get minimal funtional test suite).
\end{enumerate}
The following figure \ref{fig:gen_1_diagram} and figure \ref{fig:gen_2_diagram} show theese two main aproaches:
%The diagram **********GA to increase code coverage**********%
\begin{figure}[H]
\centering
\includegraphics[scale=0.50]{gen_1_diagram.jpg}
\caption{\label{fig:gen_1_diagram}GA to increase code coverage}
\end{figure}
%The diagram **********GA to get minimal functional test suite**********%
\begin{figure}[H]
\centering
\includegraphics[scale=0.50]{gen_2_diagram.jpg}
\caption{\label{fig:gen_2_diagram}GA to get minimal functional test suite}
\end{figure}
Before going furter we will describe the Genetic Algorithm in a very light way, in order to avoid knowledge gaps.
Genetic algorithm is a population-based search method. Genetic algorithms are acknowledged as good solvers for tough problems. Genetic algorithms require several parameters including the following\cite{Suman}:
\begin{enumerate}
\item Maximum number of generations, G,
\item Population size, P,
\item Crossover rate, pc,
\item Mutation rate, pm,
\item Convergence criterion.
\end{enumerate}
There are two main concepts in genetic algorithm: crossover and mutation.
Crossover : A binary variation operator is called recombination or crossover. This operator merges information from two parent genotypes into one or two offspring genotypes. Similarly to mutation, crossover is a stochastic operator: the choice of what parts of each parent are combined, and the way these parts are combined, depends on random drawings. The principle behind crossover is simple: by mating two individuals with different but desirable features, we can produce an offspring which combines both of those features.
\vspace{5 mm}
\verb|child_1 = parent_1(:size-1)/2)|\\
\verb|\\ remove all duplicated genotypes from in parent_2|\\
\verb|\\ that match genotypes found in child_1|\\
\verb| new_child = parent_2 + child_1|
\vspace{5 mm}
Mutation: A unary variation operator is called mutation. It is applied to one genotype and delivers a modified mutant, the child or offspring of it. In general, mutation is supposed to cause a random unbiased change. Mutation has a theoretical role: it can guarantee that the space is connected.
\vspace{5 mm}
\verb|child_value = parent[random_index_1]|\\
\verb|\\ switch two genotypes randomly|\\
\verb| children_pop[random_index] = children_pop[random_index_2]|\\
\verb| children_pop[random_index_2] = child_value|\\
\vspace{5 mm}
The Figure \ref{fig:gen_1_diagram} shows the algorithm for the first problem : GA to obtain the sequence of regression tests that cover the greatest amount of code (GA to increase code coverage). This algorithm will execute the following steps:
\begin{enumerate}
\item Check if we have reach the max number of iteration. If true stop the execution else continue with the execution of the algorithm.
\item Get clean source code of SW project to test.
\item Build SW project (for codecoverage).
\item Based on Mutation and Cross-reference generate new gen.
\item Map gen to tests: Map the chromosome numbers to specific tests.
\item Run tests.
\item Get code coverage percentage.
\item Save the chromosome and its code coverage into a history file. Also the previous code coverage and chromosome is saved into a history file.
\item Current code coverage is >= GOAL? . If true Stop execution else back to begining.
\end{enumerate}
This algorithm try to generate a sequence of tests that maximaze the code coverage. The aproach will not be valid if we have duplicated tests. A duplicated functional tests will not increase the code coverage. The kind of tests that we will need to use are functional tests.
The Figure \ref{fig:gen_2_diagram} shows the algorithm for the second problem: GA to find the minimal secuence of regression tests that cover the greatest amount of code (GA to get minimal funtional test suite ). This algorithm will execute the following steps:
\begin{enumerate}
\item Check if we have reach the max number of iteration. If true stop the execution else continue with the execution of the algorithm
\item Get clean source code of SW project to test.
\item Build SW project (for codecoverage).
\item Append empty test to the chromosome.
\item Start only mutation of empty tests with any other tests on the chromosome.
\item Strip chromosome: Remove latest test.
\item Map gen to tests: Map the chromosome numbers to specific tests.
\item Run tests.
\item Get code coverage percentage.
\item Save the chromosome and its code coverage into a history file. Also the previous code coverage and chromosome is saved into a history file.
\item Current code coverage is >= GOAL?. If true Stop execution else back to begining.
\end{enumerate}
This GA will try to minimize the number of tests that execute the same portion of the code. Thiw will reduce the total amount of time of the regression test suite.
The complete source code and history of development is detailed on the git repository:
\url{https://github.com/VictorRodriguez/yaact}
\section{Results}
After doing multiple iterations on the GA for optimizaton we realize that the code coverage was inotincreasing no matter how many iterations we could test our solution
%The figure **********GA to increase code coverage**********%
\begin{figure}[H]
\centering
\includegraphics[scale=0.50]{solution_a.JPG}
\caption{\label{fig:gen_2_diagram}Solution A: GA to increase code coverage}
\end{figure}
The same problen hapend with the secind GA aproach. After many iteratiosn we stil get that
the same number of test cases with maximum level of code covefage
%The figure **********GA to get minimal functional test suite**********%
\begin{figure}[H]
\centering
\includegraphics[scale=0.50]{solution_b.JPG}
\caption{\label{fig:gen_2_diagram}Solution B: GA to get minimal functional test suite}
\end{figure}
\section{Conclusion}
In Conclusion, genetic algorithms will not solve all testing needs in all systems.This solution does not apply for all SW products. It requires a level of complexity (still not defined) on the SW product and their integration/application tests suites.
It was discovered that in order to see an increase in code coverage, of the test suite used, there must be a development of new test cases. This happens because each time tests are executed the same code is being exercised.
This approach will not affect the following tests suites :
\begin{enumerate}
\item Unit tests
\item Feature Test
\end{enumerate}
This approach might affect the following tests suites :
\begin{enumerate}
\item Integration/compatibility tests
\item Performance benchmarks
\item Stress tests
\item Longevity tests
\end{enumerate}
\section{References}
\begin{thebibliography}{99}
\bibitem{Suman}Suman and Seema, A Genetic Algorithm for Regression Test Sequence Optimization, Yadavindra College of Engineering, International Journal of Advanced Research in Computer and Communication Engineering Vol. 1, Issue 7, September 2012
\bibitem{Maha}Maha,study of genetic algorithm for automatic software test data generation,university of pune, giirj, vol.1 (2), december (2013)
\bibitem{Cadar}Cristian Cadar/Daniel Dunbar/ Dawson Engler, KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs,Stanford University
\bibitem{GANESH}Ganesh, EXE: Automatically generating inputs of death.
,Stanford University,In Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS 2006).
\end{thebibliography}
\end{document}