The Oracle Problem in Software Testing: A Survey

Transcription

Earl T. Barr, Mark Harman, Phil McMinn, Muzammil Shahbaz and Shin Yoo

Abstract
Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behaviour is called the test oracle problem. Test oracle automation is important to remove a current bottleneck that inhibits greater overall test automation. Without test oracle automation, the human has to determine whether observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain-specific information that provide informal oracle guidance. All forms of test oracle, even the humble human, involve challenges of reducing cost and increasing benefit. This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice.

Index Terms: Test oracle; Automatic testing; Testing formalism.

1 INTRODUCTION

Much work on software testing seeks to automate as much of the test process as practical and desirable, to make testing faster, cheaper, and more reliable. To this end, we need a test oracle, a procedure that distinguishes between the correct and incorrect behaviours of the System Under Test (SUT). However, compared to many aspects of test automation, the problem of automating the test oracle has received significantly less attention, and remains comparatively less well solved.
This open problem represents a significant bottleneck that inhibits greater test automation and wider uptake of automated testing methods and tools. For instance, the problem of automatically generating test inputs has been the subject of research interest for nearly four decades [46], [108]. It involves finding inputs that cause execution to reveal faults, if they are present, and to give confidence in their absence, if none are found. Automated test input generation has been the subject of many significant advances in both Search-Based Testing [3], [5], [83], [127], [129] and Dynamic Symbolic Execution [75], [109], [162]; yet none of these advances addresses the issue of checking generated inputs with respect to expected behaviours, that is, providing an automated solution to the test oracle problem.

Of course, one might hope that the SUT has been developed under excellent design-for-test principles, so that there might be a detailed, and possibly formal, specification of intended behaviour. One might also hope that the code itself contains pre- and postconditions that implement well-understood contract-driven development approaches [136]. In these situations, the test oracle cost problem is ameliorated by the presence of an automatable test oracle to which a testing tool can refer to check outputs, free from the need for costly human intervention. Where no full specification of the properties of the SUT exists, one may hope to construct a

partial test oracle that can answer questions for some inputs. Such partial test oracles can be constructed using metamorphic testing (built from known relationships between desired behaviours) or by deriving oracular information from execution or documentation. For many systems, and for most testing as currently practiced in industry, however, the tester does not have the luxury of formal specifications or assertions, or of automated partial test oracles [91], [92]. The tester therefore faces the daunting task of manually checking the system's behaviour for all test cases. In such cases, automated software testing approaches must address the human oracle cost problem [1], [82], [131].

To achieve greater test automation and wider uptake of automated testing, we therefore need a concerted effort to find ways to address the test oracle problem and to integrate automated and partially automated test oracle solutions into testing techniques. This paper seeks to help address this challenge by providing a comprehensive review and analysis of the existing literature on the test oracle problem.

Four partial surveys of topics relating to test oracles precede this one. However, none has provided a comprehensive survey of trends and results. In 2001, Baresi and Young [17] presented a partial survey that covered four topics prevalent at the time the paper was published: assertions, specifications, state-based conformance testing, and log file analysis. While these topics remain important, they capture only a part of the overall landscape of research in test oracles, which the present paper covers. Another early work was the initial motivation for considering the test oracle problem contained in Binder's textbook on software testing [23], published in 1999. More recently, in 2009, Shahamiri et al. [165] compared six techniques from the specific category of derived test oracles. In 2011, Staats et al.
[174] proposed a theoretical analysis that included test oracles in a revisitation of the fundamentals of testing. Most recently, in 2014, Pezzè et al. focused on automated test oracles for functional properties [151]. Despite this work, research into the test oracle problem remains an activity undertaken in a fragmented community of researchers and practitioners. The role of the present paper is to overcome this fragmentation in this important area of software testing by providing the first comprehensive analysis and review of work on the test oracle problem.

The rest of the paper is organised as follows: Section 2 sets out the definitions relating to test oracles that we use to compare and contrast the techniques in the literature. Section 3 relates a historical analysis of developments in the area. Here we identify key milestones and track the volume of past publications. Based on this data, we plot growth trends for four broad categories of solution to the test oracle problem, which we survey in Sections 4–7. These four categories comprise approaches to the oracle problem where: test oracles can be specified (Section 4); test oracles can be derived (Section 5); test oracles can be built from implicit information (Section 6); and no automatable oracle is available, yet it is still possible to reduce human effort (Section 7). Finally, Section 8 concludes with closing remarks.

2 DEFINITIONS

This section presents definitions to establish a lingua franca in which to examine the literature on oracles. These definitions are formalised to avoid ambiguity, but the reader should find that it is also possible to read the paper using only the informal descriptions that accompany these formal definitions. We use the theory to clarify the relationship between algebraic specification, pseudo-oracles, and metamorphic relations in Section 5. To begin, we define a test activity as a stimulus or response, then test activity sequences

that incorporate constraints over stimuli and responses. Test oracles accept or reject test activity sequences, first deterministically, then probabilistically. We then define notions of soundness and completeness of test oracles.

Fig. 1. Stimulus and observations: S is anything that can change the observable behaviour of the SUT f; R is anything that can be observed about the system's behaviour; I includes f's explicit inputs; O is its explicit outputs; everything not in S ∪ R neither affects nor is affected by f.

2.1 Test Activities

To test is to stimulate a system and observe its response. A stimulus and a response both have values, which may coincide, as when the stimulus value and the response are both reals. A system has a set of components C. A stimulus and its response target a subset of components. For instance, a common pattern for constructing test oracles is to compare the output of distinct components on the same stimulus value. Thus, stimuli and responses are values that target components. Collectively, stimuli and responses are test activities:

Definition 2.1 (Test Activities). For the SUT p, S is the set of stimuli that trigger or constrain p's computation and R is the set of observable responses to a stimulus of p. S and R are disjoint. Test activities form the set A = S ⊎ R.

The use of disjoint union implicitly labels the elements of A, which we can flatten to the tuple L × C × V, where L = {stimulus, response} is the set of activity labels, C is the set of components, and V is an arbitrary set of values. To model those aspects of the world that are independent of any component, like a clock, we set an activity's target to the empty set. We use the terms stimulus and observation in the broadest sense possible to cater to various testing scenarios, functional and non-functional. As shown in Figure 1, a stimulus can be either an explicit test input from the tester, I ⊆ S, or an environmental factor that can affect the testing, S \ I.
Similarly, an observation ranges from an output of the SUT, O ⊆ R, to a non-functional execution profile, like execution time, in R \ O. For example, stimuli include configuration and platform settings, database table contents, device states, resource constraints, preconditions, typed values at an input device, inputs on a channel from another system, sensor inputs and so on. Notably, resetting a SUT to an initial state is a stimulus, and stimulating the SUT with an input runs it. Observations include anything that can be discerned and ascribed a meaning significant to the purpose of testing, including values that appear on an output device, database state, temporal properties of the execution, heat dissipated during execution, power consumed, or any other measurable attribute of its execution. Stimuli and observations are members of disjoint sets, but we combine them into the single set of test activities.

2.2 Test Activity Sequence

Testing is a sequence of stimuli and response observations. The relationship between stimuli and responses can often be captured formally; consider a simple SUT that squares its input. To compactly represent infinite relations between stimulus and response values, such as {(i, o) | o = i²}, we introduce a compact notation for set comprehensions: x:[φ] = {x | φ},

where x is a dummy variable ranging over an arbitrary set.

Definition 2.2 (Test Activity Sequence). A test activity sequence is an element of T_A = {w | T ⇒* w} over the grammar

T ::= A:[φ] T | A T | ε

where A is the test activity alphabet.

Under Definition 2.2, the test activity sequence io:[o = i²] denotes the stimulus of invoking f on i, then observing the response output. It further specifies that valid responses obey o = i². Thus, it compactly represents the infinite set of test activity sequences i₁o₁, i₂o₂, …, where oₖ = iₖ². For practical purposes, a test activity sequence will almost always have to satisfy constraints in order to be useful. Under our formalism, these constraints differentiate the approaches to test oracles we survey. As an initial illustration, we constrain a test activity sequence to obtain a practical test sequence:

Definition 2.3 (Practical Test Sequence). A practical test sequence is any test activity sequence w that satisfies w = t s t′ r t″, for s ∈ S, r ∈ R, and (possibly empty) test activity sequences t, t′, t″.

Thus, the test activity sequence w is practical iff w contains at least one stimulus followed by at least one observation. This notion of a test sequence is nothing more than a very general notion of what it means to test; we must do something to the system (the stimulus) and subsequently observe some behaviour of the system (the observation), so that we have something to check (the observation) and something upon which this observed behaviour depends (the stimulus).

A reliable reset (p, r) ∈ S is a special stimulus that returns the SUT's component p to its start state. The test activity sequence (stimulus, p, r)(stimulus, p, i) is therefore equivalent to the conventional application notation p(i). To extract the value of an activity a, we write v(a); to extract its target component, we write c(a). To specify two invocations of a single component on different values, we must write

r₁i₁r₂i₂ : [r₁, i₁, r₂, i₂ ∈ S, c(r₁) = c(i₁) = c(r₂) = c(i₂) ∧ v(i₁) ≠ v(i₂)].
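As a concrete illustration of these definitions, the sketch below encodes test activities as labelled, component-targeted values and checks both practicality (Definition 2.3) and the constraint io:[o = i²] for a squaring SUT. The Python encoding, including the Activity class and helper names, is our own illustrative choice and not part of the formalism.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Activity:
    label: str      # "stimulus" or "response", the label l(a)
    component: str  # target component c(a)
    value: object   # value v(a)

def is_practical(seq):
    """Definition 2.3: at least one stimulus followed later by a response."""
    for k, a in enumerate(seq):
        if a.label == "stimulus":
            return any(b.label == "response" for b in seq[k + 1:])
    return False

def satisfies_square(seq):
    """Check the constraint io:[o = i**2] on the first stimulus/response pair."""
    stimuli = [a.value for a in seq if a.label == "stimulus"]
    responses = [a.value for a in seq if a.label == "response"]
    return bool(stimuli) and bool(responses) and responses[0] == stimuli[0] ** 2

trace = [Activity("stimulus", "f", 3), Activity("response", "f", 9)]
print(is_practical(trace), satisfies_square(trace))  # True True
```

A trace consisting only of responses fails the practicality check, since there is no stimulus on which the observed behaviour could depend.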
In the sequel, we often compare different executions of a single SUT or compare the outputs of independently implemented components of the SUT on the same input value. For clarity, we introduce syntactic sugar to express constraints on stimulus values and components: we let f(x) denote ri:[c(i) = f ∧ v(i) = x], for f ∈ C.

A test oracle is a predicate that determines whether a given test activity sequence is an acceptable behaviour of the SUT or not. We first define a test oracle, and then relax this definition to a probabilistic test oracle.

Definition 2.4 (Test Oracle). A test oracle D : T_A → B is a partial¹ function from test activity sequences to true or false.

1. Recall that a function is implicitly total: it maps every element of its domain to a single element of its range. The partial function f : X → Y is the total function f : X′ → Y, where X′ ⊆ X.

When a test oracle is defined for a test activity sequence, it either accepts the sequence or not. Concatenation in a test activity sequence denotes sequential activities; the test oracle D permits parallel activities when it accepts different permutations of the same stimuli and response observations. We use D to denote a deterministic test oracle, distinguishing it from probabilistic ones. Test oracles are typically computationally expensive, so probabilistic approaches to the provision of oracle information may be desirable even where a deterministic test oracle is possible [125].

Definition 2.5 (Probabilistic Test Oracle). A probabilistic test oracle D̃ : T_A → [0, 1] maps a test activity sequence into the interval [0, 1] ⊆ R.

A probabilistic test oracle returns a real number in the closed interval [0, 1]. As with test oracles, we do not require a probabilistic test oracle to be a total function. A probabilistic test

oracle can model the case where the test oracle is only able to efficiently offer a probability that the test case is acceptable, or other situations where some degree of imprecision can be tolerated in the test oracle's response.

Our formalism combines a language-theoretic view of stimulus and response activities with constraints over those activities; these constraints explicitly capture specifications. The high-level language view imposes a temporal order on the activities. Thus, our formalism is inherently temporal. The formalism of Staats et al. captures any temporal exercising of the SUT's behaviour in tests, which are atomic black boxes for them [174]. Indeed, practitioners write test plans and activities; they do not often write specifications at all, let alone formal ones. This fact, together with the expressivity of our formalism, evident in its capture of existing test oracle approaches, suggests that our formalism is a good fit with practice.

2.3 Soundness and Completeness

We conclude this section by defining soundness and completeness of test oracles. To do so, we need the concept of the ground truth, G. The ground truth is another form of oracle, a conceptual oracle, that always gives the right answer. Of course, it cannot be known in all but the most trivial cases, but it is a useful definition that bounds test oracle behaviour.

Definition 2.6 (Ground Truth). The ground truth oracle, G, is a total test oracle that always gives the right answer.

We can now define soundness and completeness of a test oracle with respect to G.

Definition 2.7 (Soundness). The test oracle D is sound iff D(a) ⟹ G(a).

Definition 2.8 (Completeness). The test oracle D is complete iff G(a) ⟹ D(a).

While test oracles cannot, in general, be both sound and complete, we can, nevertheless, define and use partially correct test oracles.
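To make Definitions 2.6–2.8 concrete, the following sketch pairs a ground-truth oracle for a squaring SUT with a deliberately weak oracle that checks only a necessary condition. Both functions and the sample behaviours are our own illustration: the weak oracle accepts every correct behaviour (so it is complete) but also accepts some incorrect ones (so it is not sound).

```python
def ground_truth(i, o):
    # Conceptual oracle G for a squaring SUT: the behaviour (i, o) is
    # correct exactly when o == i**2.
    return o == i * i

def necessary_condition_oracle(i, o):
    # A weak oracle D: squares are never negative, so it checks only
    # o >= 0. Correct behaviours always pass, but some incorrect
    # behaviours, such as (5, 26), pass too.
    return o >= 0

samples = [(3, 9), (-2, 4), (5, 26), (1, -1)]

# Completeness on these samples: G(a) implies D(a).
complete = all(necessary_condition_oracle(i, o)
               for i, o in samples if ground_truth(i, o))
# Soundness on these samples: D(a) implies G(a).
sound = all(ground_truth(i, o)
            for i, o in samples if necessary_condition_oracle(i, o))
print(complete, sound)  # True False
```

Such necessary-condition oracles are partially correct: they can never reject a correct behaviour, but their acceptance gives only limited assurance.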
Further, one could argue, from a purely philosophical point of view, that human oracles can be sound and complete, or correct. In this view, correctness becomes a subjective human assessment. The foregoing definitions allow for this case. We relax our definition of soundness to cater for probabilistic test oracles:

Definition 2.9 (Probabilistic Soundness and Completeness). A probabilistic test oracle D̃ is probabilistically sound iff

P(D̃(w) = 1) > 1/2 + ε ⟹ G(w)

and D̃ is probabilistically complete iff

G(w) ⟹ P(D̃(w) = 1) > 1/2 + ε,

where ε is non-negligible.

The non-negligible advantage ε requires D̃ to do sufficiently better than flipping a fair coin (which, for a binary classifier, maximizes entropy) that we can achieve arbitrary confidence in whether the test sequence w is valid by repeatedly sampling D̃ on w.

3 TEST ORACLE RESEARCH TRENDS

The term test oracle first appeared in William Howden's seminal work in 1978 [99]. In this section, we analyze the research on test oracles, and its related areas, conducted since then. We begin with a synopsis of the volume of publications, classified into specified, derived, and implicit test oracles, and handling the lack of an automated test oracle. We then discuss when key concepts in test oracles were first introduced.

mining, API mining, metamorphic testing, regression testing and program documentation. An implicit oracle (see Section 6) refers to the detection of obvious faults, such as a program crash. For implicit test oracles we applied the queries implicit oracle, null pointer + detection, null reference + detection, deadlock + livelock + race + detection, memory leaks + detection, crash + detection, performance + load testing, non-functional + error detection, fuzzing + test oracle and anomaly detection. There have also been papers researching strategies for handling the lack of an automated test oracle (see Section 7). Here, we applied the queries human oracle, test minimization, test suite reduction and test data + generation + realistic + valid. Each of the above queries was appended with the keywords software testing. The results were filtered to remove articles that have no relation to software testing and test oracles.

Figure 2 shows the cumulative number of publications on each type of test oracle from 1978 onwards. We analyzed the research trend in this data by applying different regression models. The trend line, shown in Figure 2, is fitted using a power model. The high values of the four coefficients of determination (R²), one for each of the four types of test oracle, confirm that our models are good fits to the trend data. The observed trends suggest continued healthy growth in the volume of research on topics related to the test oracle problem.

3.2 The Advent of Test Oracle Techniques

We classified the collected publications by the techniques or concepts they proposed to (partially) solve the test oracle problem; for example, Model Checking [35] and Metamorphic Testing [36] fall into the derived test oracle category, and DAISTS [69] is an algebraic specification system that addresses the specified test oracle problem.
For each type of test oracle and the advent of a technique or concept, we plotted a timeline in chronological order of publication to study research trends. Figure 3 shows the timeline, starting from 1978, when the term test oracle was first coined. Each vertical bar presents the technique or concept used to solve the problem, labeled with the year of its first publication. The timeline shows only work that is explicit on the issue of test oracles. For example, work on test generation using finite state machines (FSMs) can be traced back to as early as the 1950s, but the explicit use of finite state machines to generate test oracles can be traced back to Jard and Bochmann [103] and to Howden in 1986 [98]. We record, in the timeline, the earliest available publication for a given technique or concept. We consider only work published in journals, the proceedings of conferences and workshops, or magazines. We excluded all other types of documentation, such as technical reports and manuals.

Figure 3 shows a few techniques and concepts that predate 1978. Although not explicitly on test oracles, they identify and address issues for which test oracles were later developed. For example, work on detecting concurrency issues (deadlock, livelock, and races) can be traced back to the 1960s. Since these issues require no specification, implicit test oracles can and have been built that detect them on arbitrary systems. Similarly, Regression Testing detects problems in the functionality a new version of a system shares with its predecessors, and is a precursor of derived test oracles.

The trend analysis suggests that proposals of new techniques and concepts for the formal specification of test oracles peaked in the 1990s and have gradually diminished in the last decade. However, it remains an area of much research activity, as can be judged from the number of publications for each year in Figure 2. For derived test oracles, many solutions have been proposed throughout this period.
Initially, these solutions were primarily theoretical, such as Partial/Pseudo-Oracles [196] and

system conforms to a formal specification. Our formalism, defined in Section 2, is itself a specification language for specifying test oracles. Over the last 30 years, many methods and formalisms for testing based on formal specification have been developed. They fall into four broad categories: model-based specification languages, state transition systems, assertions and contracts, and algebraic specifications. Model-based languages define models and a syntax that defines desired behaviour in terms of its effect on the model. State transition systems focus on modeling the reaction of a system to stimuli, referred to as transitions in this particular formalism. Assertions and contracts are fragments of a specification language that are interleaved with statements of the implementation language and checked at runtime. Algebraic specifications define equations over a program's operations that hold when the program is correct.

4.1 Specification Languages

Specification languages define a mathematical model of a system's behaviour, and are equipped with a formal semantics that defines the meaning of each language construct in terms of the model. When used for testing, models do not usually fully specify the system, but seek to capture salient properties of a system so that test cases can be generated from or checked against them.

4.1.1 Model-Based Specification Languages

Model-based specification languages model a system as a collection of states and operations to alter these states, and are therefore also referred to as state-based specifications in the literature [101], [110], [182], [183]. Preconditions and postconditions constrain the system's operations. An operation's precondition imposes a necessary condition over the input states that must hold in a correct application of the operation; a postcondition defines the (usually strongest) effect the operation has on program state [110].
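As an illustration of pre- and postconditions serving as a partial test oracle, the sketch below reduces a Z-style operation schema for a bounded stack to Python predicates. The LIMIT constant, the predicates, and the stack model are our own invention, not drawn from any particular specification language.

```python
LIMIT = 3  # model invariant: a stack holds at most LIMIT elements

def invariant(stack):
    return len(stack) <= LIMIT

def push_pre(stack, x):
    # Precondition over the input state: the stack must not be full.
    return len(stack) < LIMIT

def push_post(before, x, after):
    # Postcondition relating the before and after states.
    return after == before + [x]

def checked_push(stack, x):
    # The pre/postconditions act as a partial test oracle for the
    # concrete implementation (here, list concatenation): any
    # violating execution is a discovered failure.
    assert push_pre(stack, x) and invariant(stack)
    after = stack + [x]  # the "SUT" operation
    assert push_post(stack, x, after) and invariant(after)
    return after

s = checked_push(checked_push([], 1), 2)
print(s)  # [1, 2]
```

Any test input that drives the implementation to violate the invariant or a postcondition reveals an incorrect behaviour, exactly as described above for model-based invariants.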
A variety of model-based specification languages exist, including Z [172], B [111], UML/OCL [31], VDM/VDM-SL [62], Alloy [102], and the LARCH family [71], which includes an algebraic specification sub-language. Broadly, these languages have evolved toward being more concrete, closer to the implementation languages programmers use to solve problems. Two reasons explain this phenomenon: the first is the effort to increase their adoption in industry by making them more familiar to practitioners; the second is to establish synergies between specification and implementation that facilitate development as iterative refinement. For instance, Z models disparate entities, like predicates, sets, state properties, and operations, through a single structuring mechanism, its schema construct; the B method, Z's successor, provides a richer array of less abstract language constructs. Börger discusses how to use the abstract state machine formalism, a very general set-theoretic specification language geared toward the definition of functions, to define high-level test oracles [29].

The models underlying specification languages can be very abstract, quite far from concrete execution output. For instance, it may be difficult to compute whether a model's postcondition for a function permits an observed concrete output. If this impedance mismatch can be overcome, by abstracting a system's concrete output or by concretizing a specification model's output, and if a specification's postconditions can be evaluated in finite time, they can serve as a test oracle [4]. Model-based specification languages, such as VDM, Z, and B, can express invariants, which can drive testing. Any test case that causes a program to violate an invariant has discovered an incorrect behaviour; therefore, these invariants are partial test oracles.

In search of a model-based specification language accessible to domain experts, Parnas

et al. proposed TOG (Test Oracles Generator) from program documentation [143], [146], [149]. In their method, the documentation is written in fully formal tabular expressions in which the method signature, the external variables, and the relation between its start and end states are specified [105]. Thus, test oracles can be automatically generated to check the outputs against the specified states of a program. The work by Parnas et al. has been developed over a considerable period of more than two decades [48], [59], [60], [145], [150], [190], [191].

4.1.2 State Transition Systems

State transition systems often present a graphical syntax, and focus on transitions between different states of the system. Here, states typically abstract sets of concrete states of the modeled system. State transition systems have been referred to as visual languages in the literature [197]. A wide variety of state transition systems exist, including Finite State Machines [112], Mealy/Moore machines [112], I/O Automata [118], Labeled Transition Systems [180], SDL [54], Harel Statecharts [81], UML state machines [28], X-Machines [95], [96], Simulink/Stateflow [179] and PROMELA [97]. Mouchawrab et al. conducted a rigorous empirical evaluation of test oracle construction techniques using state transition systems [70], [138]. An important class of state transition systems have a finite set of states and are therefore particularly well-suited to automated reasoning about systems whose behaviour can be abstracted into states defined by a finite set of values [93].

State transition systems capture the behaviour of a system under test as a set of states³, with transitions representing stimuli that cause the system to change state. State transition systems model the output of a system they abstract either as a property of the states (the final state, in the case of Moore machines) or of the transitions traversed (as with Mealy machines).

3. Unfortunately, the term state has different interpretations in the context of test oracles. Often, it refers to a snapshot of the configuration of a system at some point during its execution; in the context of state transition systems, however, state typically refers to an abstraction of a set of configurations, as noted above.

Models approximate a SUT, so behavioural differences between the two are inevitable. Some divergences, however, are spurious and falsely report testing failure. State-transition models are especially susceptible to this problem when modeling embedded systems, for which time of occurrence is critical. Recent work tolerates spurious differences in time by steering the model's evaluation: when the SUT and its model differ, the model is backtracked, and a steering action, like modifying a timer value or changing inputs, is applied to reduce the distance under a similarity measure [74].

Protocol conformance testing [72] and, later, model-based testing [183] motivated much of the work applying state transition systems to testing. Given a specification F as a state transition system, e.g. a finite state machine, a test case can be extracted from sequences of transitions in F. The transition labels of such a sequence define an input. A test oracle can then be constructed from F as follows: if F accepts the sequence and outputs some value, then so should the system under test; if F does not accept the input, then neither should the system under test. Challenges remain, however, as the definition of conformity comes in different flavours, depending on whether the model is deterministic or non-deterministic and whether the behaviour of the system under test on a given test case is observable and can be interpreted at the same level of abstraction as the model's.
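The oracle construction just described can be sketched with a small Mealy-style machine as the specification F; the vending-machine states, inputs, and outputs below are invented purely for illustration.

```python
# Specification F as a Mealy machine: (state, input) -> (next_state, output).
F = {
    ("idle", "coin"):   ("paid", "ok"),
    ("paid", "button"): ("idle", "drink"),
}

def model_run(inputs):
    """Run F on an input sequence; return its outputs, or None if F
    does not accept the sequence."""
    state, outputs = "idle", []
    for i in inputs:
        if (state, i) not in F:
            return None
        state, out = F[(state, i)]
        outputs.append(out)
    return outputs

def oracle(inputs, sut_outputs):
    # The SUT conforms on this test case iff F accepts the input
    # sequence and the SUT produced the same outputs.
    expected = model_run(inputs)
    return expected is not None and expected == sut_outputs

print(oracle(["coin", "button"], ["ok", "drink"]),
      oracle(["coin", "button"], ["ok", "crisps"]))  # True False
```

This sketch assumes a deterministic model and directly comparable outputs; as noted above, non-determinism and abstraction mismatches make conformity considerably subtler in practice.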
The resulting flavours of conformity have been captured in alternate notions, in terms of whether the system under test is isomorphic to, equivalent to, or quasi-equivalent to F. These notions of conformity were defined in the mid-1990s in the famous survey paper by Lee and Yannakakis [112], among other notable papers, including those by Bochmann et al. [26] and

Tretmans [180].

4.2 Assertions and Contracts

An assertion is a boolean expression that is placed at a certain point in a program to check its behaviour at runtime. When an assertion evaluates to true, the program's behaviour is regarded as intended at the point of the assertion, for that particular execution; when an assertion evaluates to false, an error has been found in the program for that particular execution. It is easy to see how assertions can be used as a test oracle. The fact that assertions are embedded in an implementation language has two implications that differentiate them from specification languages. First, assertions can directly reference and define relations over program variables, reducing the impedance mismatch between specification and implementation for the properties an assertion can express and check. In this sense, assertions are a natural consequence of the evolution of specification languages toward supporting development through iterative refinement. Second, they are typically written along with the code whose runtime behaviour they check, as opposed to preceding the implementation, as specification languages tend to do.

Assertions have a long pedigree dating back to Turing [181], who first identified the need to separate the tester from the developer and suggested that they should communicate by means of assertions: the developer writing them and the tester checking them. Assertions gained significant attention as a means of capturing language semantics in the seminal work of Floyd [64] and Hoare [94], and were subsequently championed as a means of increasing code quality in the development of the contract-based programming approach, notably in the language Eiffel [136]. Widely used programming languages now routinely provide assertion constructs; for instance, C, C++, and Java provide a construct called assert, and C# provides a Debug.Assert method.
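As a minimal sketch of assertions acting as an embedded test oracle, the Python code below plays the role of C's assert or C#'s Debug.Assert: any execution that violates a pre- or postcondition fails immediately. The integer square root example is our own illustration.

```python
def isqrt(x):
    # Integer square root by linear search (kept simple for illustration).
    assert x >= 0, "precondition: input must be non-negative"
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    # Postcondition: r is the floor of the square root of x. Any test
    # input that falsifies this reveals a fault at this exact point.
    assert r * r <= x < (r + 1) * (r + 1), "postcondition violated"
    return r

print(isqrt(10))  # 3
```

Because the checks execute with the code, every test input that reaches them is judged automatically, with no human inspection of the output.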
Moreover, a variety of systems have been independently developed for embedding assertions into a host programming language, such as Anna [117] for Ada, and APP [156] and Nana [120] for C. In practice, assertion approaches can check only a limited set of properties at a certain point in a program [49]. Languages based on design-by-contract principles extend the expressivity of assertions by providing means to check contracts between client and supplier objects, in the form of method pre- and postconditions and class invariants. Eiffel was the first language to offer design by contract [136], a language feature that has since found its way into other languages, such as Java in the form of the Java Modeling Language (JML) [140]. Cheon and Leavens showed how to construct an assertion-based test oracle on top of JML [45]. For more on assertion-based test oracles, see Coppit and Haddox-Schatz's evaluation [49] and, later, a method proposed by Cheon [44]. Both assertions and contracts are enforced observation activities that are embedded in the code. Araujo et al. provide a systematic evaluation of design by contract on a large industrial system [9], and of using JML in particular [8]; Briand et al. showed how to support testing by instrumenting contracts [33].

4.3 Algebraic Specification Languages

Algebraic specification languages define a software module in terms of its interface, a signature consisting of sorts and operation symbols. Equational axioms specify the required properties of the operations; the equivalence of terms is often computed using term rewriting [15]. Structuring facilities, which group sorts and operations, allow the composition of interfaces. Typically, these languages employ first-order logic to prove properties of the specification, like the correctness of refinements. Abstract data types (ADTs), which combine data and operations over that data, are well-suited to algebraic specification.
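The idea of equational axioms serving as a test oracle can be sketched by evaluating both sides of each axiom against the implementation and comparing the results; the stack axioms and the tuple-based implementation below are our own minimal illustration, not taken from any particular specification.

```python
def push(s, i):
    return s + (i,)

def top(s):
    return s[-1]

def pop(s):
    return s[:-1]

# Equational axioms over the operations; each maps ground arguments
# to a (left-hand side, right-hand side) pair.
axioms = [
    lambda s, i: (top(push(s, i)), i),  # Top(Push(S, I)) = I
    lambda s, i: (pop(push(s, i)), s),  # Pop(Push(S, I)) = S
]

def axiom_oracle(suite):
    # Disagreement on any ground instance reveals a fault in the
    # implementation or in the axioms themselves.
    return all(lhs == rhs
               for s, i in suite
               for lhs, rhs in (ax(s, i) for ax in axioms))

print(axiom_oracle([((), 1), ((5,), 2), ((5, 6), 7)]))  # True
```

Each element of the suite is a ground instance: a concrete binding of values to the axiom's variables, the same notion that underpins the testing theories discussed below.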

One of the earliest algebraic specification systems, for implementing, specifying and testing ADTs, is DAISTS [69]. In this system, equational axioms generally equate a term-rewriting expression in a restricted dialect of ALGOL 60 against a function composition in the implementation language. For example, consider this axiom used in DAISTS:

Pop2(Stack S, EltType I) : Pop(Push(S, I)) = if Depth(S) = Limit then Pop(S) else S;

This axiom is taken from a specification that differentiates the accessor Top, which returns the top element of a stack without modifying the stack, and the mutator Pop, which returns a new stack lacking the previous top element. A test oracle simply executes both this axiom and its corresponding composition of implemented functions against a test suite: if they disagree, a failure has been found in the implementation or in the axiom; if they agree, we gain some assurance of their correctness. Gaudel and her colleagues [19], [20], [72], [73] were the first to provide a general testing theory founded on algebraic specification. Their idea is that an exhaustive test suite composed only of ground terms, i.e., terms with no free variables, would be sufficient to judge program correctness. This approach faces an immediate problem: the domain of each variable in a ground term might be infinite, generating an infinite number of test cases. Test suites, however, must be finite, a practical limitation to which all forms of testing are subject. The workaround is, of course, to abandon exhaustive coverage of all bindings of values to ground terms and select a finite subset of test cases [20]. Gaudel's theory focuses on observational equivalence. Observational inequivalence is, however, equally important [210]. For this reason, Frankl and Doong extended Gaudel's theory to express inequality as well as equality [52].
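The DAISTS-style use of an axiom as an executable oracle can be sketched as follows (a hypothetical bounded-stack implementation in Python; DAISTS itself used a dialect of ALGOL 60):

```python
class Stack:
    """Hypothetical bounded stack for exercising the DAISTS-style axiom."""
    LIMIT = 3  # the 'Limit' of the axiom

    def __init__(self, items=()):
        self.items = list(items)

    def push(self, x):
        # At the depth limit, Push leaves the stack unchanged,
        # matching the behaviour the axiom describes.
        if len(self.items) < Stack.LIMIT:
            return Stack(self.items + [x])
        return Stack(self.items)

    def pop(self):
        return Stack(self.items[:-1])

    def depth(self):
        return len(self.items)

    def __eq__(self, other):
        return self.items == other.items

def axiom_pop2(s, i):
    """Oracle for: Pop(Push(S, I)) = if Depth(S) = Limit then Pop(S) else S."""
    lhs = s.push(i).pop()
    rhs = s.pop() if s.depth() == Stack.LIMIT else s
    return lhs == rhs
```

Running axiom_pop2 over a test suite of (stack, element) pairs realises exactly the agreement check described above: a False result indicates a failure in the implementation or in the axiom.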
They proposed a notation suitable for object-oriented programs, developing an algebraic specification language called LOBAS and a test harness called ASTOOT. In addition to handling object-orientation, Frankl and Doong require classes to implement the testing method EQN, which ASTOOT uses to check the equivalence or inequivalence of two instances of a given class. From the vantage point of an observer, an object has observable and unobservable, or hidden, state. Typically, the observable state of an object is its public fields and method return values. EQN enhances the testability of code and enables ASTOOT to approximate the observational equivalence of two objects on a sequence of messages, or method calls. When ASTOOT checks the equivalence of an object and a specification in LOBAS, it realises a specified test oracle. Expanding upon ASTOOT, Chen et al. [40], [41] built TACCLE, a tool that employs a white-box heuristic to generate a relevant, finite number of test cases. Their heuristic builds a data relevance graph that connects two fields of a class if one affects the other. They use this graph to consider only fields that can affect an observable attribute of a class when determining the (in)equivalence of two instances. Algebraic specification has been a fruitful line of research; many algebraic specification languages and tools exist, including Daistish [100], LOFT [123], CASL [11], and CASCAT [205]. These projects have been evolving toward testing a wider array of entities, from ADTs, to classes, and, most recently, components; they also differ in their degree of automation of test case generation and test harness creation. Bochmann et al. used LOTOS to realise test oracle functions from algebraic specifications [184]; most recently, Zhu also considered the use of algebraic specifications as test oracles [210]. 4.4 Specified Test Oracle Challenges Three challenges must be overcome to build specified test oracles. The first is the lack of a formal specification.
Indeed, the other classes

of test oracles, discussed in this survey, all address the problem of test oracle construction in the absence of a formal specification. Formal specification models necessarily rely on abstraction, which can lead to the second problem: imprecision, i.e., models that include infeasible behaviour or that do not capture all the behaviour relevant to checking a specification [68]. Finally, one must contend with the problem of interpreting model output and equating it to concrete program output. Specified results are usually quite abstract, and the concrete test results of a program's executions may not be represented in a form that makes checking their equivalence to the specified result straightforward. Moreover, specified results can be partially represented or oversimplified. This is why Gaudel remarked that the existence of a formal specification does not guarantee the existence of a successful test driver [72]. Formulating concrete equivalence functions may be necessary to correctly interpret results [119]. In short, solutions to this problem of equivalence across abstraction levels depend largely on the degree of abstraction and, to a lesser extent, on the implementation of the system under test. 5 DERIVED TEST ORACLES A derived test oracle distinguishes a system's correct from incorrect behaviour based on information derived from various artefacts (e.g., documentation, system executions), from properties of the system under test, or from other versions of it. Testers resort to derived test oracles when specified test oracles are unavailable, which is often the case, since specifications rapidly fall out of date, when they exist at all. Of course, a derived test oracle might become a partial specified test oracle, so that test oracles derived by the methods discussed in this section could migrate, over time, to become the specified test oracles of the previous section.
For example, JWalk incrementally learns algebraic properties of the class under test [170]. It allows interactive confirmation from the tester, ensuring that the human is in the learning loop. The following sections discuss research on deriving test oracles from development artefacts, beginning in Section 5.1 with pseudo-oracles and N-version programming, which focus on agreement among independent implementations. Section 5.2 then introduces metamorphic relations, which focus on relations that must hold among distinct executions of a single implementation. Regression testing, Section 5.3, focuses on relations that should hold across different versions of the SUT. Approaches for inferring models from system executions, including invariant inference and specification mining, are described in Section 5.4. Section 5.5 closes with a discussion of research into extracting test oracle information from textual documentation, like comments, specifications, and requirements. 5.1 Pseudo-Oracles One of the earliest versions of a derived test oracle is the concept of a pseudo-oracle, introduced by Davis and Weyuker [50] as a means of addressing so-called non-testable programs: Programs which were written in order to determine the answer in the first place. There would be no need to write such programs, if the correct answer were known. [196]. A pseudo-oracle is an alternative version of the program produced independently, e.g., by a different programming team or written in an entirely different programming language. In our formalism (Section 2), a pseudo-oracle is a test oracle D that accepts test activity sequences of the form

f_1(x) o_1 f_2(x) o_2 : [f_1 ≠ f_2 ∧ o_1 = o_2], (1)

where f_1, f_2 ∈ C, the components of the SUT (Section 2), are alternative, independently produced versions of the SUT executed on the same value. We draw the reader's attention to the similarity

between pseudo-oracles and algebraic specification systems (Section 4.3), like DAISTS, where the function composition expression in the implementation language and the term-rewriting expression are distinct implementations whose output must agree, and so form a pseudo-oracle. A similar idea exists in fault-tolerant computing, referred to as multi- or N-version programming [13], [14], where the software is implemented in multiple ways and executed in parallel. Where results differ at run-time, a voting mechanism decides which output to use. In our formalism, an N-version test oracle accepts test activities of the following form:

f_1(x) o_1 f_2(x) o_2 ... f_k(x) o_k : [∀i, j ∈ [1..k], i ≠ j ⇒ f_i ≠ f_j ∧ m(arg max_{o_i} m(o_i)) ≥ t] (2)

In Equation 2, the outputs form a multiset and m is the multiplicity, or number of repetitions, of an element in the multiset. The arg max operator finds the argument that maximises a function's output, here an output with greatest multiplicity. Finally, that maximum multiplicity is compared against the threshold t. We can now define an N-version test oracle as D_nv(w, x), where w obeys Equation 2 with t bound to x. Then D_maj(w) = D_nv(w, ⌈k/2⌉) is an N-version oracle that requires a majority of the outputs to agree, and D_pso(w) = D_nv(w, k) generalises pseudo-oracles to agreement across k implementations. More recently, Feldt [58] investigated the possibility of automatically producing different versions using genetic programming, and McMinn [128] explored the idea of producing different software versions for testing through program transformation and the swapping of different software elements with those of a similar specification. 5.2 Metamorphic Relations For a SUT p that implements the function f, a metamorphic relation is a relation over applications of f that we expect to hold across multiple executions of p. Suppose f(x) = e^x; then e^a · e^(-a) = 1 is a metamorphic relation.
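Equation 2's voting scheme can be sketched directly (the three "independent versions" below are hypothetical stand-ins for genuinely independent implementations):

```python
from collections import Counter

def n_version_oracle(outputs, t):
    """D_nv sketch: accept iff some output reaches multiplicity t
    among the k versions' outputs; return that output, else None."""
    value, multiplicity = Counter(outputs).most_common(1)[0]
    return value if multiplicity >= t else None

# Three hypothetical 'independently produced' versions; the third is faulty.
versions = [lambda x: x * x, lambda x: x ** 2, lambda x: x * x + 1]
outputs = [f(5) for f in versions]                 # [25, 25, 26]
majority_output = n_version_oracle(outputs, t=2)   # D_maj with k = 3
```

Setting t = k recovers the pseudo-oracle of Equation 1: all implementations must agree.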
Under this metamorphic relation, p(0.3) * p(-0.3) = 1 will hold if p is correct [43]. The key idea is that reasoning about the properties of f will lead us to relations that its implementation p must obey. Metamorphic testing is the process of exploiting metamorphic relations to generate partial test oracles for follow-up test cases: it checks important properties of the SUT after certain test cases are executed [36]. Although metamorphic relations are properties of the ground truth, the correct phenomenon (f in the example above) that a SUT seeks to implement, and could therefore be considered a mechanism for creating specified test oracles, we have placed them with derived test oracles because, in practice, metamorphic relations are usually manually inferred from a white-box inspection of a SUT. Metamorphic relations differ from algebraic specifications in that a metamorphic relation relates different executions, not necessarily on the same input, of the same implementation relative to its specification, while an algebraic specification equates two distinct implementations of the specification, one written in an implementation language and the other written in a formalism free of implementation details, usually term rewriting [15]. Under the formalism of Section 2, a metamorphic relation is

f(x_1) o_1 f(x_2) o_2 ... f(x_k) o_k : [expr], k ≥ 2,

where expr is a constraint, usually arithmetic, over the inputs x_i and outputs o_i. This definition makes clear that a metamorphic relation is a constraint obtained by stimulating the single SUT f at least twice, observing the responses, and imposing a constraint on how they interrelate. In contrast, algebraic specification is a type of pseudo-oracle, as specified in Equation 1, which stimulates two distinct implementations on the same value, requiring their outputs to be equivalent.
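A check of the e^a · e^(-a) = 1 relation might look as follows (a sketch; the tolerance is needed because floating-point arithmetic makes exact equality too strict):

```python
import math
import random

def p(x):
    """The SUT: an implementation of f(x) = e^x."""
    return math.exp(x)

def check_metamorphic(trials=100, tol=1e-9):
    """Partial oracle: for randomly chosen follow-up inputs a,
    p(a) * p(-a) must be (approximately) 1, even though the true
    value of e^a is unknown to the tester."""
    random.seed(1)  # deterministic for reproducibility
    for _ in range(trials):
        a = random.uniform(-5.0, 5.0)
        if abs(p(a) * p(-a) - 1.0) > tol:
            return False
    return True
```

Note that the oracle never needs the correct value of e^a itself; it only checks the relation between two executions.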

It is often thought that metamorphic relations must concern numerical properties that can be captured by arithmetic equations, but metamorphic testing is, in fact, more general. For example, Zhou et al. [209] used metamorphic testing to test search engines such as Google and Yahoo!, where the relations considered are clearly non-numeric. Zhou et al. build metamorphic relations in terms of the consistency of search results. A motivating example they give is of searching for a paper in the ACM digital library: two attempts, the second quoted, using advanced search fail, but a general search identical to the first succeeds. Using this insight, the authors build metamorphic relations, like R_OR : A_1 = (A_2 ∨ A_3) ⇒ A_2 ⊆ A_1, where the A_i are sets of web pages returned by queries. Metamorphic testing is also a means of testing Weyuker's non-testable programs, introduced in the last section. When the SUT is nondeterministic, such as a classifier whose exact output varies from run to run, defining metamorphic relations solely in terms of output equality is usually insufficient. Murphy et al. [139], [140] investigate relations other than equality, like set intersection, to relate the outputs of stochastic machine learning algorithms, such as classifiers. Guderlei and Mayer introduced statistical metamorphic testing, where the relations over test outputs are checked using statistical analysis [80], a technique later exploited to apply metamorphic testing to stochastic optimisation algorithms [203]. The biggest challenge in metamorphic testing is automating the discovery of metamorphic relations. Some of those in the literature are mathematical [36], [37], [42] or combinatorial [139], [140], [161], [203].
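The R_OR relation above reduces to a subset check over result sets, which can be sketched against a toy, hypothetical search function:

```python
def search(query, corpus):
    """Hypothetical search engine: return the set of documents that
    contain any term of the query (terms separated by ' OR ')."""
    terms = query.split(" OR ")
    return {doc for doc in corpus if any(t in doc for t in terms)}

def check_r_or(a2, a3, corpus):
    """R_OR sketch: the results for query A2 must be a subset of the
    results for the disjunctive query 'A2 OR A3'."""
    return search(a2, corpus) <= search(a2 + " OR " + a3, corpus)
```

A real deployment would issue the two queries against the live engine and compare the returned result sets; no knowledge of the "correct" results is needed.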
Work on the discovery of algebraic specifications [88] and JWalk's lazy systematic unit testing, in which the specification is lazily, and incrementally, learned through interactions between JWalk and the developer [170], might be suitable for adaptation to the discovery of metamorphic relations. For instance, the programmer's development environment might track relationships among the outputs of test cases run during development, and propose those that hold across many runs to the developer as possible metamorphic relations. Work has already begun that exploits domain knowledge to formulate metamorphic relations [38], but it is still at an early stage and not yet automated. 5.3 Regression Test Suites Regression testing aims to detect whether the modifications made to a new version of a SUT have disrupted existing functionality [204]. It rests on the implicit assumption that the previous version can serve as an oracle for existing functionality. For corrective modifications, desired functionality remains the same, so the test oracle for version i, D_i, can serve as the next version's test oracle, D_{i+1}. Corrective modifications may fail to correct the problem they seek to address, or may disrupt existing functionality; test oracles for these issues may be constructed by symbolically comparing the execution of the faulty version against the newer, allegedly fixed version [79]. Orstra generates assertion-based test oracles by observing the program states of the previous version while executing the regression test suite [199]. The regression test suite, now augmented with assertions, is then applied to the newer version. Similarly, spectra-based approaches use the program and value spectra obtained from the original version to detect regression faults in the newer versions [86], [200]. For perfective modifications, those that add new features to the SUT, D_i must be modified to cater for newly added behaviours, i.e. D_{i+1} = D_i ∪ ΔD.
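Using the previous version as the oracle for corrective modifications can be sketched as follows (both versions are hypothetical):

```python
def version_i(x):
    """Previous release (hypothetical): doubles its input."""
    return 2 * x

def version_i_plus_1(x):
    """New release: a corrective refactoring of the same functionality."""
    return x + x

def regression_oracle(old, new, inputs):
    """For corrective modifications, the old version is the oracle:
    report every input on which the new version diverges from it."""
    return [x for x in inputs if old(x) != new(x)]
```

An empty report gives some assurance that existing functionality is preserved; any reported input is a suspected regression to be triaged by the tester.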
Test suite augmentation techniques specialise in identifying and generating ΔD [6], [132], [202]. However, more work is required to develop these augmentation techniques so that they augment not merely the test input but also the expected output. In this way, test suite augmentation could be extended

to augment the existing oracles as well as the test data. Another class of modifications is driven by changes in the specification, which is deemed to fail to meet requirements, perhaps because the requirements have themselves changed. These changes are generally regarded as perfective maintenance in the literature, but no distinction is made between perfections that add new functionality to code (without changing requirements) and changes that arise from changed requirements (or incorrect specifications). Our formalisation of test oracles in Section 2 forces a distinction between these two categories of perfective maintenance, since the two have profoundly different consequences for test oracles. We therefore refer to this new category of perfective maintenance as changed requirements. Recall that, for the function f : X → Y, dom(f) = X. For changed requirements, ∃α · D_{i+1}(α) ≠ D_i(α), which implies, of course, that dom(D_{i+1}) ∩ dom(D_i) ≠ ∅ and that the new test oracle cannot simply union the new behaviour with the old test oracle. Instead, we have

D_{i+1}(α) = ΔD(α) if α ∈ dom(ΔD); D_i(α) otherwise.

5.4 System Executions A system execution trace can be exploited to derive test oracles, or to reduce the cost of a human test oracle, by aligning an incorrect execution against the expected execution, as expressed in temporal logic [51]. This section discusses the two main techniques for deriving test oracles from traces: invariant detection and specification mining. Derived test oracles can be built on both techniques to automatically check expected behaviour, similar to the assertion-based approaches discussed in Section 4.2. 5.4.1 Invariant Detection Program behaviours can be automatically checked against invariants. Thus, invariants can serve as test oracles to help determine correct and incorrect outputs. When invariants are not available for a program in advance, they can be learned from the program (semi-)automatically. A well-known technique proposed by Ernst et al.
[56], implemented in the Daikon tool [55], is to execute a program on a collection of inputs (test cases) against a collection of potential invariants. The invariants are instantiated by binding their variables to the program's variables. Daikon then dynamically infers likely invariants from those invariants not violated during the program executions over the inputs. The inferred invariants capture program behaviours, and thus can be used to check program correctness. For example, in regression testing, invariants inferred from the previous version can be checked as to whether they still hold in the new version. In our formalism, Daikon invariant detection can define an unsound test oracle that gathers likely invariants from the prefix of a testing activity sequence, then enforces those invariants over its suffix. Let I_j be the set of likely invariants at observation j, with I_0 the initial candidate invariants; for the test activity sequence r_1 r_2 ... r_n,

I_n = {x ∈ I_0 | ∀i ∈ [1..n], r_i ⊨ x},

where ⊨ denotes logical entailment. Thus, we take an observation to define a binding of the variables in the world under which a likely invariant either holds or does not: only those likely invariants remain that no observation invalidates. In the suffix r_{n+1} r_{n+2} ... r_m, the test oracle then changes gear and accepts only those activities whose response observations obey I_n, i.e. r_i : [r_i ⊨ I_n], i > n. Invariant detection can be computationally expensive, so incremental [22], [171] and lightweight static [39], [63] analyses have been brought to bear. A technical report summarises various dynamic analysis techniques [158]. Model inference [90], [187] could also be regarded as a form of invariant generation in which the invariant is expressed as a model (typically an FSM). Ratcliff et al. used Search-Based Software Engineering (SBSE) [84] to search for invariants, guided by mutation testing [154]. The accuracy of inferred invariants depends in part on the quality and completeness of the test cases; additional test cases might provide new data from which more accurate invariants can be inferred [56]. Nevertheless, inferring perfect invariants is almost impossible with the current state of the art, which frequently infers incorrect or irrelevant invariants [152]. Wei et al. recently leveraged existing contracts in Eiffel code to infer postconditions on commands (as opposed to queries), involving quantification or implications whose premises are conjunctions of formulae [192], [193]. Human intervention can, of course, be used to filter the resulting invariants, i.e., retaining the correct ones and discarding the rest. However, manual filtering is error-prone and the misclassification of invariants is frequent. In a recent empirical study, Staats et al. found that half of the incorrect invariants Daikon inferred from a set of Java programs were misclassified [175]. Despite these issues, research on the dynamic inference of program invariants has exhibited strong momentum in the recent past, with a primary focus on its application to test generation [10], [142], [207]. 5.4.2 Specification Mining Specification mining, or inference, infers a formal model of program behaviour from a set of observations. In terms of our formalism, a test oracle can enforce these formal models over test activities. In her seminal work on using inference to assess test data adequacy, Weyuker connected inference and testing as inverse processes [194].
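The prefix/suffix scheme of invariant detection, filtering candidate invariants by observations and then enforcing the survivors, can be sketched as follows (the candidate templates are hypothetical; Daikon instantiates a much larger catalogue):

```python
# Hypothetical candidate invariants over an observation, a dict binding
# variable names to values; a Daikon-style detector instantiates many
# such templates over the program's variables.
candidates = {
    "x_nonnegative": lambda obs: obs["x"] >= 0,
    "y_equals_2x":   lambda obs: obs["y"] == 2 * obs["x"],
    "y_lt_10":       lambda obs: obs["y"] < 10,
}

def infer_likely_invariants(observations):
    """I_n: the candidates that no observation in the prefix violates."""
    return {name: inv for name, inv in candidates.items()
            if all(inv(obs) for obs in observations)}

def invariant_oracle(invariants, obs):
    """Suffix phase: accept an observation only if it obeys every
    surviving likely invariant."""
    return all(inv(obs) for inv in invariants.values())

training = [{"x": 1, "y": 2}, {"x": 3, "y": 6}, {"x": 5, "y": 10}]
likely = infer_likely_invariants(training)  # 'y_lt_10' is violated and dropped
```

The resulting oracle is unsound in exactly the sense described above: an invariant that merely happened to hold over the training prefix may wrongly reject correct behaviour later.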
The testing process starts with a program and looks for I/O pairs that characterise every aspect of both the intended and actual behaviours, while inference starts with a set of I/O pairs and derives a program to fit the given behaviour. Weyuker defined this relation for assessing test adequacy, which can be stated informally as follows: a set of I/O pairs T is an inference-adequate test set for the program P intended to fulfil specification S iff the program I_T inferred from T (using some inference procedure) is equivalent to both P and S. Any difference would imply that the inferred program is not equivalent to the actual program and, therefore, that the test set T used to infer the program is inadequate. This inference procedure mainly depends upon the set of I/O pairs used to infer behaviours. These pairs can be obtained from system executions either passively, e.g., by runtime monitoring, or actively, e.g., by querying the system [106]. However, equivalence checking is undecidable in general, and therefore inference is only possible for programs in a restricted class, such as those whose behaviour can be modelled by finite state machines [194]. For such programs, equivalence can be established by experiment [89]. Nevertheless, serious practical limitations are associated with such experiments (see the survey by Lee and Yannakakis [112] for a complete discussion). The marriage between inference and testing has produced a wealth of techniques, especially in the context of black-box systems, where source code and behavioural models are unavailable. Most work has applied L*, a well-known learning algorithm, to learn a black-box system B as a finite state machine (FSM) with n states [7]. The algorithm infers an FSM by iteratively querying B and observing the corresponding outputs. A string distinguishes two FSMs when only one of the two machines ends in a final state upon consuming the string. At each iteration, an inferred model M_i with i < n states is given.
Then, the model is refined with the help of a string that distinguishes B and M_i to produce a new model, until the number of states reaches n. Lee and Yannakakis [112] showed how to use L* for conformance testing of B with a

specification S. Suppose L* starts by inferring a model M_i; then we compute a string that distinguishes M_i from S and refine M_i through the algorithm. If, for i = n, M_n is equivalent to S, then we declare B to be correct, and otherwise faulty. Apart from conformance testing, inference techniques have been used to guide test generation to focus on particular system behaviour and to reduce the scope of analysis. For example, Li et al. applied L* to the integration testing of a system of black-box components [114]. Their analysis architecture derives a test oracle from a test suite by using L* to infer a model of the system from dynamic observation of its behaviour; this model is then searched to find incorrect behaviours, such as deadlocks, and used to verify the system's behaviour under fuzz testing (Section 6). To find concurrency issues in asynchronous black-box systems, Groz et al. proposed an approach that extracts behavioural models from systems through active learning techniques [78] and then performs reachability analysis on the models [27] to detect issues, notably races. Further work in this context has been compiled by Shahbaz [166], with industrial applications. Similar applications of inference can be found in system analysis [21], [78], [135], [188], [189], component interaction testing [115], [122], regression testing [200], security testing [168] and verification [53], [77], [148]. Zheng et al. [208] extract item sets from web search queries and their results, then apply association rule mining to infer rules. From these rules, they construct derived test oracles for web search engines, which had been thought to be untestable. Image segmentation delineates objects of interest in an image; implementing segmentation programs is a tedious, iterative process. Frouchni et al. successfully apply semi-supervised machine learning to create test oracles for image segmentation programs [67]. Memon et al.
[133], [134], [198] introduced and developed the GUITAR tool, which has been evaluated by treating the current version of the SUT as correct, inferring the specification, and then executing the generated test inputs. Artificial Neural Networks have also been applied to learn system behaviour and detect deviations from it [163], [164]. The majority of specification mining techniques adopt Finite State Machines as the output format to capture the functional behaviour of the SUT [21], [27], [53], [77], [78], [89], [112], [114], [135], [148], [166], [168], [189], sometimes extended with temporal constraints [188] or data constraints [115], [122], which are, in turn, inferred by Daikon [56]. Büchi automata have been used to check properties against black-box systems [148]. Annotated call trees have been used to represent the program behaviour of different versions in the regression testing context [200]. GUI widgets have been directly modelled with objects and properties for testing [133], [134], [198]. Artificial Neural Networks and machine learning classifiers have been used to learn the expected behaviour of the SUT [67], [163], [164]. For dynamic and fuzzy behaviours, such as the results of web search engine queries, association rules between input (query) and output (search result strings) have been used as the format of an inferred oracle [208]. 5.5 Textual Documentation Textual documentation ranges from natural language descriptions of requirements to structured documents detailing the functionalities of APIs. These documents describe the functionalities expected of the SUT to varying degrees, and can therefore serve as a basis for generating test oracles. They are usually informal, intended for other humans, not to support formal logical or mathematical reasoning. Thus, they are often partial and ambiguous, in contrast to specification languages.
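One way to partially automate the use of such documentation is to mechanically turn a structured comment tag into a runtime check. The sketch below is hypothetical: the '@returns:' tag format and all names are invented for illustration, and real systems parse far richer formats such as Javadoc:

```python
import re

def oracle_from_doc(func):
    """Derive a runtime check from a hypothetical '@returns: <op> <n>'
    tag in the function's docstring: a tiny sketch of building a
    partial test oracle from semi-structured documentation."""
    match = re.search(r"@returns:\s*(>=|<=|>|<)\s*(-?\d+)", func.__doc__)
    op, bound = match.group(1), int(match.group(2))
    checks = {">=": lambda r: r >= bound, "<=": lambda r: r <= bound,
              ">": lambda r: r > bound, "<": lambda r: r < bound}
    def checked(*args):
        result = func(*args)
        # Enforce the documented property on every observed result.
        assert checks[op](result), "documented property violated"
        return result
    return checked

@oracle_from_doc
def absolute(x):
    """Return the absolute value of x. @returns: >= 0"""
    return x if x >= 0 else -x
```

Even such a shallow check is only a partial oracle: it rejects some incorrect behaviours without ever knowing the fully correct output.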
Their importance for test oracle construction rests on the fact that developers are more likely to write them than formal specifications. In other words, the documentation defines the

constraints that the test oracle D, as defined in Section 2, enforces over testing activities. At first sight, it may seem impossible to derive test oracles automatically, because natural languages are inherently ambiguous and textual documentation is often imprecise and inconsistent. The use of textual documentation has thus often been limited to humans in practical testing applications [144]. However, partial automation can assist the human in testing using documentation as a source of test oracle information. Two approaches have been explored. The first builds techniques to construct a formal specification out of an informal, textual artefact, such as an informal textual specification, user and developer documentation, or even source code comments. The second restricts a natural language to a semi-formal fragment amenable to automatic processing. Next, we present representative examples of each approach. 5.5.1 Converting Text into Specifications Prowell and Poore [153] introduced a sequential enumeration method for developing a formal specification from an informal one. The method systematically enumerates all sequences from the input domain and maps the corresponding outputs to produce an arguably complete, consistent, and correct specification. However, it can suffer from an exponential explosion in the number of input/output sequences; Prowell and Poore employ abstraction techniques to control this explosion. The end result is a formal specification that can be transferred into a number of notations, e.g., state transition systems. A notable benefit of this approach is that it tends to discover many inconsistent and missing requirements, making the specification more complete and precise. 5.5.2 Restricting Natural Language Restrictions on a natural language reduce complexities in its grammar and lexicon and allow the expression of requirements in a concise vocabulary with minimal ambiguity.
This, in turn, eases the interpretation of documents and makes the automatic derivation of test oracles possible. Researchers who have proposed specification languages based on (semi-)formal subsets of a natural language are motivated by the fact that model-based specification languages have not seen widespread adoption; they believe the reason is the inaccessibility of their formalism and set-theoretic underpinnings to the average programmer. Schwitter introduced a computer-processable, restricted natural language called PENG [160]. It covers a strict subset of standard English with a restricted grammar and a domain-specific lexicon for content words and predefined function words. Documents written in PENG can be translated deterministically into first-order predicate logic. Schwitter et al. [30] provided guidelines for writing test scenarios in PENG that can automatically judge the correctness of program behaviours. 6 IMPLICIT TEST ORACLES An implicit test oracle is one that relies on general, implicit knowledge to distinguish between a system's correct and incorrect behaviour. This generally true implicit knowledge includes facts such as that buffer overflows and segfaults are nearly always errors. The critical aspect of an implicit test oracle is that it requires neither domain knowledge nor a formal specification to implement, and it applies to nearly all programs. An implicit test oracle can be built on any procedure that detects anomalies such as abnormal termination due to a crash or an execution failure [34], [167]. This is because such anomalies are blatant faults: no more information is required to ascertain whether the program behaved correctly or not. Under our formalism, an implicit oracle defines a subset of stimulus and response relations as guaranteed failures, in some context.

Implicit test oracles are not universal. Behaviours abnormal for one system in one context may be normal for that system in a different context, or normal for a different system. Even crashing may be considered acceptable, or even desired, behaviour, as in systems designed to find crashes. Research on implicit oracles is evident from early work in software engineering. The very first work in this context was related to deadlock, livelock and race detection to counter system concurrency issues [24], [107], [185], [16], [169]. Similarly, research on testing non-functional attributes has garnered much attention since the advent of the object-oriented paradigm. In performance testing, system throughput metrics can highlight degradation errors [121], [124], as when a server fails to respond when a number of requests are sent simultaneously. A case study by Weyuker and Vokolos showed how a process with excessive CPU usage caused service delays and disruptions [195]. Similarly, test oracles for memory leaks can be built on a profiling technique that detects dangling references during the run of a program [12], [57], [87], [211]. For example, Xie and Aiken proposed a boolean constraint system to represent the dynamically allocated objects in a program [201]. Their system raises an alarm when an object becomes unreachable but has not yet been deallocated. Fuzzing is an effective way to find implicit anomalies, such as crashes [137]. The main idea is to generate random, or fuzz, inputs and feed them to the system to find anomalies. This works because the implicit specification usually holds over all inputs, unlike explicit specifications, which tend to relate subsets of inputs to outputs. If an anomaly is detected, the fuzz tester reports it along with the input that triggers it. Fuzzing is commonly used to detect security vulnerabilities, such as buffer overflows, memory leaks, unhandled exceptions, denial of service, etc. [18], [177].
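A minimal fuzzing loop with an implicit crash oracle can be sketched as follows (the SUT is a hypothetical parser; any uncaught exception plays the role of a crash):

```python
import random

def sut(data):
    """Hypothetical SUT: parse 'a,b' and return the sum of the parts."""
    a, b = data.split(",")   # raises ValueError on malformed input
    return int(a) + int(b)

def fuzz(trials=200, seed=0):
    """Implicit oracle: any uncaught exception counts as an anomaly,
    with no need to know the correct output for any input."""
    random.seed(seed)
    alphabet = "0123456789,x"
    failures = []
    for _ in range(trials):
        data = "".join(random.choice(alphabet) for _ in range(8))
        try:
            sut(data)
        except Exception:
            failures.append(data)  # report the triggering input
    return failures
```

Note that the oracle never consults a specification: it only exploits the generally true knowledge that an uncaught exception is almost always an error.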
Other work has focused on developing patterns to detect anomalies. For instance, Ricca and Tonella [155] considered a subset of the anomalies that Web applications can harbour, such as navigation problems and hyperlink inconsistencies. In their empirical study, 60% of the Web applications examined exhibited anomalies and execution failures.

7 THE HUMAN ORACLE PROBLEM

The above sections give solutions to the test oracle problem when some artefact exists that can serve as the foundation for either a full or partial test oracle. In many cases, however, no such artefact exists, so a human tester must verify whether software behaviour is correct given some stimuli. Despite the lack of an automated test oracle, software engineering research can still play a key role: finding ways to reduce the effort that the human tester has to expend in directly creating, or in being, the test oracle. This effort is referred to as the Human Oracle Cost [126]. Research in this area aims to reduce the cost of human involvement along two dimensions: 1) writing test oracles and 2) evaluating test outcomes. Concerning the first dimension, the work of Staats et al. is representative. They seek to reduce the human oracle cost by guiding human testers to those parts of the code on which they need to focus when writing test oracles [173]. This reduces the cost of test oracle construction, rather than the cost of human involvement in testing in the absence of an automated test oracle. Additional recent work on test oracle construction includes Dodona, a tool that suggests oracle data to a human, who then decides whether to use it to define a test oracle realized as a Java unit test [116]. Dodona infers relations among program variables during execution, using network centrality analysis and data flow.
Research that seeks to reduce the human oracle cost broadly pursues two goals: a quantitative reduction in the amount of work the tester must do for the same amount of test coverage, and a qualitative reduction in the work needed to understand and evaluate test cases.
