Academic Commons Search Results
http://academiccommons.columbia.edu/catalog.rss?f%5Bauthor_facet%5D%5B%5D=Murphy%2C+Christian&q=&rows=500&sort=record_creation_date+desc
Works by Murphy, Christian, sorted by record creation date (descending).

Metamorphic Runtime Checking of Applications Without Test Oracles
http://academiccommons.columbia.edu/catalog/ac:181974
Authors: Bell, Jonathan Schaffer; Murphy, Christian; Kaiser, Gail
Identifier: http://dx.doi.org/10.7916/D8J9655P
Date: Fri, 30 Jan 2015
Abstract: For some applications, it is impossible or impractical to know what the correct output should be for an arbitrary input, making testing difficult. Many machine learning applications for "big data", bioinformatics, and cyber-physical systems fall in this scope: they do not have a test oracle. Metamorphic testing, a simple testing technique that does not require a test oracle, has been shown to be effective for testing such applications. We present Metamorphic Runtime Checking, a novel approach that conducts metamorphic testing of both the entire application and individual functions during a program's execution. We have applied Metamorphic Runtime Checking to nine machine learning applications, finding it to be, on average, 170% more effective than traditional metamorphic testing conducted at the full application level only.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

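The function-level checking described in the entry above can be illustrated with a small, self-contained sketch. The harness below is hypothetical (it is not the authors' framework); it checks one common metamorphic property, that permuting the input of a sum must not change the result, alongside a normal execution:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of a function-level metamorphic check. */
public class MetamorphicCheckSketch {

    // Function under test: no oracle is needed to check its metamorphic property.
    static int sum(int[] a) {
        int s = 0;
        for (int x : a) s += x;
        return s;
    }

    // Property: permuting the input must leave the output unchanged.
    static void checkPermutationProperty(int[] input, int originalOutput) {
        List<Integer> boxed = new ArrayList<>(
                Arrays.stream(input).boxed().collect(Collectors.toList()));
        Collections.shuffle(boxed);
        int[] permuted = boxed.stream().mapToInt(Integer::intValue).toArray();
        if (sum(permuted) != originalOutput) {
            System.err.println("Metamorphic violation: defect in sum()");
        }
    }

    public static void main(String[] args) {
        int[] input = {3, 1, 4, 1, 5, 9, 2, 6};
        int output = sum(input);                  // normal execution...
        checkPermutationProperty(input, output);  // ...plus an in-line metamorphic test
    }
}
```

A violation signals a defect without ever stating what the correct sum is, which is the point of testing without an oracle.
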
Metamorphic Runtime Checking of Applications without Test Oracles
http://academiccommons.columbia.edu/catalog/ac:165342
Authors: Murphy, Christian; Kaiser, Gail E.; Bell, Jonathan Schaffer; Su, Fang-hsiang
Identifier: http://hdl.handle.net/10022/AC:P:21695
Date: Thu, 19 Sep 2013
Abstract: Challenges arise in testing applications that do not have test oracles, i.e., for which it is impossible or impractical to know what the correct output should be for general input. Metamorphic testing, introduced by Chen et al., has been shown to be a simple yet effective technique for testing these types of applications: test inputs are transformed in such a way that it is possible to predict the expected change to the output, and if the output resulting from this transformation is not as expected, then a fault must exist. Here, we improve upon previous work by presenting a new technique called Metamorphic Runtime Checking, which automatically conducts metamorphic testing of both the entire application and individual functions during a program's execution. This new approach improves the scope, scale, and sensitivity of metamorphic testing by allowing for the identification of more properties and the execution of more tests, increasing the likelihood of detecting faults that would not be found by application-level properties alone. We also discuss a technique for automatically discovering functions' metamorphic properties, and present the results of new studies that demonstrate that Metamorphic Runtime Checking advances the state of the art in testing applications without oracles.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Automatic Detection of Defects in Applications without Test Oracles
http://academiccommons.columbia.edu/catalog/ac:133587
Authors: Murphy, Christian; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:10518
Date: Thu, 09 Jun 2011
Abstract: In application domains that do not have a test oracle, such as machine learning and scientific computing, quality assurance is a challenge because it is difficult or impossible to know in advance what the correct output should be for general input. Previously, metamorphic testing has been shown to be a simple yet effective technique in detecting defects, even without an oracle. In metamorphic testing, the application's "metamorphic properties" are used to modify existing test case input to produce new test cases in such a manner that, when given the new input, the new output can easily be computed based on the original output. If the new output is not as expected, then a defect must exist. In practice, however, metamorphic testing can be a manually intensive technique for all but the simplest cases. The transformation of input data can be laborious for large data sets, and errors can occur in comparing the outputs when they are very complex. In this paper, we present a tool called Amsterdam that automates metamorphic testing by allowing the tester to easily set up and conduct metamorphic tests with little manual intervention, merely by specifying the properties to check, configuring the framework, and running the software. Additionally, we describe an approach called Heuristic Metamorphic Testing, which addresses issues related to false positives and non-determinism, and we present the results of new empirical studies that demonstrate the effectiveness of metamorphic testing techniques at detecting defects in real-world programs without test oracles.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

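Heuristic Metamorphic Testing, as described above, targets false positives that arise when outputs legitimately vary between runs. The sketch below shows the general idea of a tolerance-based output comparison; the epsilon policy is an assumption for illustration, not Amsterdam's actual behavior:

```java
/**
 * Sketch of a tolerance-based output comparison in the spirit of Heuristic
 * Metamorphic Testing; the epsilon policy is an assumption, not Amsterdam's API.
 */
public class HeuristicCompareSketch {

    // Treat small numeric drift as equal rather than as a failed test,
    // scaling the tolerance by the magnitude of the expected value.
    static boolean heuristicallyEqual(double expected, double actual, double epsilon) {
        return Math.abs(expected - actual) <= epsilon * Math.max(1.0, Math.abs(expected));
    }

    public static void main(String[] args) {
        double original = 0.4999999;  // f(x) from the original run
        double followUp = 0.5000001;  // f(t(x)) from the transformed run
        // A strict equality check would report a (false-positive) violation:
        System.out.println("strict:    " + (original == followUp));                       // false
        System.out.println("heuristic: " + heuristicallyEqual(original, followUp, 1e-6)); // true
    }
}
```
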
On Effective Testing of Health Care Simulation Software
http://academiccommons.columbia.edu/catalog/ac:133635
Authors: Murphy, Christian; Raunak, M. S.; King, Andrew; Chen, Sanjien; Imbriano, Christopher; Kaiser, Gail E.; Lee, Insup; Sokolsky, Oleg; Clarke, Lori; Osterweil, Leon
Identifier: http://hdl.handle.net/10022/AC:P:10532
Date: Thu, 09 Jun 2011
Abstract: Health care professionals rely on software to simulate anatomical and physiological elements of the human body for purposes of training, prototyping, and decision making. Software can also be used to simulate medical processes and protocols to measure cost effectiveness and resource utilization. Whereas much of the software engineering research into simulation software focuses on validation (determining that the simulation accurately models real-world activity), to date there has been little investigation into the testing of simulation software itself, that is, the ability to effectively search for errors in the implementation. This is particularly challenging because often there is no test oracle to indicate whether the results of the simulation are correct. In this paper, we present an approach to systematically testing simulation software in the absence of test oracles, and evaluate the effectiveness of the technique.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Empirical Evaluation of Approaches to Testing Applications without Test Oracles
http://academiccommons.columbia.edu/catalog/ac:133608
Authors: Murphy, Christian; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:10525
Date: Thu, 09 Jun 2011
Abstract: Software testing of applications in fields such as scientific computing, simulation, and machine learning is particularly challenging because many applications in these domains have no reliable "test oracle" to indicate whether the program's output is correct when given arbitrary input. A common approach to testing such applications has been to use a "pseudo-oracle", in which multiple independently developed implementations of an algorithm process an input and the results are compared. Other approaches include the use of program invariants, formal specification languages, trace and log file analysis, and metamorphic testing. In this paper, we present the results of two empirical studies in which we compare the effectiveness of some of these approaches, including metamorphic testing, pseudo-oracles, and runtime assertion checking. We also analyze the results in terms of the software development process, and discuss suggestions for practitioners and researchers who need to test software without a test oracle.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

The weHelp Reference Architecture for Community-Driven Recommender Systems
http://academiccommons.columbia.edu/catalog/ac:133550
Authors: Sheth, Swapneel Kalpesh; Arora, Nipun; Murphy, Christian; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:10509
Date: Wed, 08 Jun 2011
Abstract: Recommender systems have become increasingly popular. Most research on recommender systems has focused on recommendation algorithms. There has been relatively little research, however, in the area of generalized system architectures for recommendation systems. In this paper, we introduce weHelp, a reference architecture for social recommender systems. Our architecture is designed to be application and domain agnostic, but we briefly discuss how it applies to recommender systems for software engineering.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

CONFU: Configuration Fuzzing Testing Framework for Software Vulnerability Detection
http://academiccommons.columbia.edu/catalog/ac:133544
Authors: Dai, Huning; Murphy, Christian; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:10507
Date: Wed, 08 Jun 2011
Abstract: Many software security vulnerabilities only reveal themselves under certain conditions, i.e., particular configurations and inputs together with a certain runtime environment. One approach to detecting these vulnerabilities is fuzz testing. However, typical fuzz testing makes no guarantees regarding the syntactic and semantic validity of the input, or of how much of the input space will be explored. To address these problems, we present a new testing methodology called Configuration Fuzzing, a technique whereby the configuration of the running application is mutated at certain execution points in order to check for vulnerabilities that only arise in certain conditions. As the application runs in the deployment environment, this testing technique continuously fuzzes the configuration and checks "security invariants" that, if violated, indicate a vulnerability. We discuss the approach and introduce a prototype framework called ConFu (CONfiguration FUzzing testing framework) for implementation. We also present the results of case studies that demonstrate the approach's feasibility and evaluate its performance.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Metamorphic Testing Techniques to Detect Defects in Applications without Test Oracles
http://academiccommons.columbia.edu/catalog/ac:133541
Authors: Murphy, Christian
Identifier: http://hdl.handle.net/10022/AC:P:10505
Date: Tue, 07 Jun 2011
Abstract: Applications in the fields of scientific computing, simulation, optimization, machine learning, etc. are sometimes said to be "non-testable programs" because there is no reliable test oracle to indicate what the correct output should be for arbitrary input. In some cases, it may be impossible to know the program's correct output a priori; in other cases, the creation of an oracle may simply be too hard. These applications typically fall into a category of software that Weyuker describes as "Programs which were written in order to determine the answer in the first place. There would be no need to write such programs, if the correct answer were known." The absence of a test oracle clearly presents a challenge when it comes to detecting subtle errors, faults, defects, or anomalies in software in these domains. Without a test oracle, it is impossible to know in general what the expected output should be for a given input, but it may be possible to predict how changes to the input should effect changes in the output, and thus identify expected relations among a set of inputs and among the set of their respective outputs. This approach, introduced by Chen et al., is known as "metamorphic testing". In metamorphic testing, if test case input x produces an output f(x), the function's so-called "metamorphic properties" can be used to guide the creation of a transformation function t, which can then be applied to the input to produce t(x); this transformation allows us to predict the expected output f(t(x)), based on the (already known) value of f(x). If the new output is as expected, it is not necessarily right, but any violation of the property indicates a defect. That is, though it may not be possible to know whether an output is correct, we can at least tell whether an output is incorrect. This thesis investigates three hypotheses. First, I claim that an automated approach to metamorphic testing will advance the state of the art in detecting defects in programs without test oracles, particularly in the domains of machine learning, simulation, and optimization. To demonstrate this, I describe a tool for test automation, and present the results of new empirical studies comparing the effectiveness of metamorphic testing to that of other techniques for testing applications that do not have an oracle. Second, I suggest that conducting function-level metamorphic testing in the context of a running application will reveal defects not found by metamorphic testing using system-level properties alone, and introduce and evaluate a new testing technique called Metamorphic Runtime Checking. Third, I hypothesize that it is feasible to continue this type of testing in the deployment environment (i.e., after the software is released) with minimal impact on the user, and describe a generalized approach called In Vivo Testing. Additionally, this thesis presents guidelines for identifying metamorphic properties, explains how metamorphic testing fits into the software development process, and discusses suggestions for both practitioners and researchers who need to test software without the help of a test oracle.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

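The f / t / f(t(x)) formulation in the thesis abstract can be made concrete with a worked example (a sketch, not code from the thesis): take f to be the mean of an array and t the transformation that doubles every element, so the predicted relation is f(t(x)) == 2 * f(x):

```java
/**
 * Worked instance of the f / t / f(t(x)) formulation: f computes a mean,
 * t doubles every element, and the predicted relation is f(t(x)) == 2 * f(x).
 */
public class MeanScalingProperty {

    static double mean(double[] x) {
        double s = 0;
        for (double v : x) s += v;
        return s / x.length;
    }

    public static void main(String[] args) {
        double[] x = {2.0, 4.0, 6.0, 8.0};
        double fx = mean(x);                    // f(x) = 5.0

        double[] tx = new double[x.length];     // t(x): scale every element by 2
        for (int i = 0; i < x.length; i++) tx[i] = 2 * x[i];
        double ftx = mean(tx);                  // expected: 2 * f(x) = 10.0

        // A violation indicates a defect even though the "correct" mean of an
        // arbitrary data set was never specified.
        if (Math.abs(ftx - 2 * fx) > 1e-9)
            System.err.println("Metamorphic property violated");
        else
            System.out.println("Property holds: f(x)=" + fx + ", f(t(x))=" + ftx);
    }
}
```
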
Testing and Validating Machine Learning Classifiers by Metamorphic Testing
http://academiccommons.columbia.edu/catalog/ac:133529
Authors: Xie, Xiaoyuan; Ho, Joshua W. K.; Murphy, Christian; Kaiser, Gail E.; Xu, Baowen; Chen, Tsong Yueh
Identifier: http://hdl.handle.net/10022/AC:P:10501
Date: Tue, 07 Jun 2011
Abstract: Machine learning algorithms have provided important core functionality to support solutions in many scientific computing applications, such as computational biology, computational linguistics, and others. However, it is difficult to test such applications because often there is no "test oracle" to indicate what the correct output should be for arbitrary input. To help address the quality of scientific computing software, in this paper we present a technique for testing the implementations of machine learning classification algorithms on which such software depends. Our technique is based on an approach called "metamorphic testing", which has been shown to be effective in such cases. Also presented is a case study on a real-world machine learning application framework, and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method is highly effective at killing mutants, and that observing an expected cross-validation result alone is not sufficient to test for the correctness of a supervised classification program; metamorphic testing is strongly recommended as a complementary approach. Finally, we discuss how our findings can be used in other areas of computational science and engineering.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

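One property of the kind applied to classifiers here is permutation invariance: reordering the training examples must not change a prediction. The tiny 1-NN below is illustrative only, not the real framework studied in the paper; note that equidistant neighbors would require a deterministic tie-breaking rule, exactly the sort of pitfall such testing exposes:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Illustrative-only check of a classifier's permutation property using a
 * tiny 1-NN; not the framework studied in the paper.
 */
public class KnnPermutationProperty {

    record Example(double[] features, int label) {}

    // Caveat: if two training points are equidistant, "first seen wins", so a
    // permutation could legitimately change the answer; real metamorphic
    // properties must account for such ties.
    static int predict1NN(List<Example> train, double[] query) {
        Example best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Example e : train) {
            double d = 0;
            for (int i = 0; i < query.length; i++) {
                double diff = e.features()[i] - query[i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = e; }
        }
        return best.label();
    }

    public static void main(String[] args) {
        List<Example> train = new ArrayList<>(List.of(
                new Example(new double[]{0, 0}, 0),
                new Example(new double[]{1, 1}, 1),
                new Example(new double[]{0, 1}, 0)));
        double[] query = {0.9, 0.8};

        int original = predict1NN(train, query);
        Collections.shuffle(train);              // metamorphic transformation
        int followUp = predict1NN(train, query);

        if (original != followUp)
            System.err.println("Violation: prediction changed under permutation");
    }
}
```
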
Automatic Detection of Previously-Unseen Application States for Deployment Environment Testing and Analysis
http://academiccommons.columbia.edu/catalog/ac:133532
Authors: Murphy, Christian; Vaughan, Moses; Ilahi, Waseem; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:10502
Date: Tue, 07 Jun 2011
Abstract: For large, complex software systems, it is typically impossible in terms of time and cost to reliably test the application in all possible execution states and configurations before releasing it into production. One proposed way of addressing this problem has been to continue testing and analysis of the application in the field, after it has been deployed. The theory behind this "perpetual testing" approach is that over time, defects will reveal themselves given that multiple instances of the same application may be run globally with different configurations, in different environments, under different patterns of usage, and in different system states. A practical limitation of many automated approaches to deployment environment testing and analysis is the potentially high performance overhead incurred by the necessary instrumentation. However, it may be possible to reduce this overhead by selecting test cases and performing analysis only in previously-unseen application states, thus reducing the number of redundant tests and analyses that are run. Solutions for fault detection, model checking, security testing, and fault localization in deployed software may all benefit from a technique that ignores application states that have already been tested or explored. In this paper, we apply such a technique to a testing methodology called "In Vivo Testing", which conducts tests in deployed applications, and present a solution that ensures that tests are only executed in states that the application has not previously encountered. In addition to discussing our implementation, we present the results of an empirical study that demonstrates its effectiveness, and explain how the new approach can be generalized to assist other automated testing and analysis techniques.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

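The core mechanism proposed above, running a test only when the abstracted application state has not been seen before, can be sketched with a simple hash-set gate; the state abstraction used here is hypothetical:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/**
 * Sketch of the "test only in previously-unseen states" idea: abstract the
 * application state to a hash and skip the test when the hash has been seen.
 * The state abstraction below is hypothetical.
 */
public class UnseenStateGate {
    private final Set<Integer> seenStates = new HashSet<>();

    // In practice the abstraction covers whatever state the test depends on.
    int abstractState(int cacheSize, boolean loggedIn, String mode) {
        return Objects.hash(cacheSize, loggedIn, mode);
    }

    boolean shouldRunTest(int stateHash) {
        // add() returns false if the hash was already present.
        return seenStates.add(stateHash);
    }

    public static void main(String[] args) {
        UnseenStateGate gate = new UnseenStateGate();
        System.out.println(gate.shouldRunTest(gate.abstractState(10, true, "fast"))); // true
        System.out.println(gate.shouldRunTest(gate.abstractState(10, true, "fast"))); // false: redundant
        System.out.println(gate.shouldRunTest(gate.abstractState(11, true, "fast"))); // true: new state
    }
}
```
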
An Approach to Software Testing of Machine Learning Applications
http://academiccommons.columbia.edu/catalog/ac:110692
Authors: Murphy, Christian; Kaiser, Gail E.; Arias, Marta
Identifier: http://hdl.handle.net/10022/AC:P:29502
Date: Thu, 28 Apr 2011
Abstract: Some machine learning applications are intended to learn properties of data sets where the correct answers are not already known to human users. It is challenging to test such ML software, because there is no reliable test oracle. We describe a software testing approach aimed at addressing this problem. We present our findings from testing implementations of two different ML ranking algorithms: Support Vector Machines and MartiRank.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Properties of Machine Learning Applications for Use in Metamorphic Testing
http://academiccommons.columbia.edu/catalog/ac:110846
Authors: Murphy, Christian; Kaiser, Gail E.; Hu, Lifeng
Identifier: http://hdl.handle.net/10022/AC:P:29550
Date: Wed, 27 Apr 2011
Abstract: It is challenging to test machine learning (ML) applications, which are intended to learn properties of data sets where the correct answers are not already known. In the absence of a test oracle, one approach to testing these applications is to use metamorphic testing, in which properties of the application are exploited to define transformation functions on the input, such that the new output will be unchanged or can easily be predicted based on the original output; if the output is not as expected, then a defect must exist in the application. Here, we seek to enumerate and classify the metamorphic properties of some machine learning algorithms, and demonstrate how these can be applied to reveal defects in the applications of interest. In addition to the results of our testing, we present a set of properties that can be used to define these metamorphic relationships so that metamorphic testing can be used as a general approach to testing machine learning applications.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Backstop: A Tool for Debugging Runtime Errors
http://academiccommons.columbia.edu/catalog/ac:110742
Authors: Murphy, Christian; Kim, Eunhee; Kaiser, Gail E.; Cannon, Adam
Identifier: http://hdl.handle.net/10022/AC:P:29518
Date: Wed, 27 Apr 2011
Abstract: The errors that Java programmers are likely to encounter can roughly be categorized into three groups: compile-time (semantic and syntactic), logical, and runtime (exceptions). While much work has focused on the first two, very few tools exist for interpreting the sometimes cryptic messages that result from runtime errors. Novice programmers in particular have difficulty dealing with uncaught exceptions in their code and the resulting stack traces, which are by no means easy to understand. We present Backstop, a tool for debugging runtime errors in Java applications. This tool provides more user-friendly error messages when an uncaught exception occurs, and also provides debugging support by allowing users to watch the execution of the program and the changes to the values of variables. We also present the results of two studies conducted on introductory-level programmers using the two different features of the tool.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Parameterizing Random Test Data According to Equivalence Classes
http://academiccommons.columbia.edu/catalog/ac:110733
Authors: Murphy, Christian; Kaiser, Gail E.; Arias, Marta
Identifier: http://hdl.handle.net/10022/AC:P:29515
Date: Wed, 27 Apr 2011
Abstract: We are concerned with the problem of detecting bugs in machine learning applications. In the absence of sufficient real-world data, creating suitably large data sets for testing can be a difficult task. Random testing is one solution, but may have limited effectiveness in cases in which a reliable test oracle does not exist, as is the case for the machine learning applications of interest. To address this problem, we have developed an approach to creating data sets called "parameterized random data generation". Our data generation framework allows us to isolate or combine different equivalence classes as desired, and then randomly generate large data sets using the properties of those equivalence classes as parameters. This allows us to take advantage of randomness but still have control over test case selection at the system testing level. We present our findings from using the approach to test two different machine learning ranking applications.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

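A minimal sketch of the idea, assuming invented parameter names: equivalence-class properties (value range, sparsity) become parameters of an otherwise random generator, so a test can target one class at a time or combine classes:

```java
import java.util.Random;

/**
 * Sketch of parameterized random data generation: randomness is kept, but
 * equivalence-class properties are generator parameters. Parameter names
 * are illustrative, not the paper's framework.
 */
public class ParameterizedGenerator {
    private final Random rng = new Random(42); // fixed seed for repeatability

    /** Generate a data set in the equivalence class described by the parameters. */
    double[] generate(int size, double min, double max, double sparsity) {
        double[] data = new double[size];
        for (int i = 0; i < size; i++) {
            // With probability `sparsity`, emit a zero (sparse-data class);
            // otherwise draw uniformly from [min, max).
            data[i] = rng.nextDouble() < sparsity
                    ? 0.0
                    : min + rng.nextDouble() * (max - min);
        }
        return data;
    }

    public static void main(String[] args) {
        ParameterizedGenerator g = new ParameterizedGenerator();
        double[] denseNegatives = g.generate(1_000, -100.0, 0.0, 0.0); // one class
        double[] sparseMixed    = g.generate(1_000, -1.0, 1.0, 0.9);  // another class
        System.out.println(denseNegatives.length + " " + sparseMixed.length);
    }
}
```
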
Distributed In Vivo Testing of Software Applications
http://academiccommons.columbia.edu/catalog/ac:110800
Authors: Chu, Matt; Murphy, Christian; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:29536
Date: Wed, 27 Apr 2011
Abstract: The in vivo software testing approach focuses on testing live applications by executing unit tests throughout the lifecycle, including after deployment. The motivation is that the "known state" approach of traditional unit testing is unrealistic; deployed applications rarely operate under such conditions, and it may be more informative to perform the testing in live environments. One of the limitations of this approach is the high performance cost it incurs, as the unit tests are executed in parallel with the application. Here we present distributed in vivo testing, which focuses on easing the burden by sharing the load across multiple instances of the application of interest. That is, we elevate the scope of in vivo testing from a single instance to a community of instances, all participating in the testing process. Our approach is different from prior work in that we are actively testing during execution, as opposed to passively monitoring the application or conducting tests in the user environment prior to execution. We discuss new extensions to the existing in vivo testing framework (called Invite) and present empirical results that show the performance overhead improves linearly with the number of clients.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Experiences in Teaching eXtreme Programming in a Distance Learning Program
http://academiccommons.columbia.edu/catalog/ac:110774
Authors: Murphy, Christian; Phung, Dan; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:29528
Date: Wed, 27 Apr 2011
Abstract: As university-level distance learning programs become more and more popular, and software engineering courses incorporate eXtreme Programming (XP) into their curricula, certain challenges arise when teaching XP to students who are not physically co-located. In this paper, we present our experiences and observations from managing such an online software engineering course, and describe some of the specific challenges we faced, such as students' aversion to using XP and difficulties in scheduling. We also present some suggestions to other educators who may face similar situations.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Towards In Vivo Testing of Software Applications
http://academiccommons.columbia.edu/catalog/ac:110777
Authors: Murphy, Christian; Kaiser, Gail E.; Chu, Matt
Identifier: http://hdl.handle.net/10022/AC:P:29529
Date: Wed, 27 Apr 2011
Abstract: Software products released into the field typically have some number of residual bugs that either were not detected or could not have been detected during testing. This may be the result of flaws in the test cases themselves, assumptions made during the creation of test cases, or the infeasibility of testing the sheer number of possible configurations for a complex system. Testing approaches such as perpetual testing or continuous testing seek to continue to test these applications even after deployment, in hopes of finding any remaining flaws. In this paper, we present our initial work towards a testing methodology we call "in vivo testing", in which unit tests are continuously executed inside a running application in the deployment environment. In this novel approach, unit tests execute within the current state of the program (rather than by creating a clean slate) without affecting or altering that state. Our approach has been shown to reveal defects both in the applications of interest and in the unit tests themselves. It can also be used for detecting concurrency or robustness issues that may not have appeared in a testing lab. Here we describe the approach, the testing framework we have developed for Java applications, classes of bugs our approach can discover, and the results of experiments to measure the added overhead.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

A Framework for Quality Assurance of Machine Learning Applications
http://academiccommons.columbia.edu/catalog/ac:110590
Authors: Murphy, Christian; Kaiser, Gail E.; Arias, Marta
Identifier: http://hdl.handle.net/10022/AC:P:29471
Date: Wed, 27 Apr 2011
Abstract: Some machine learning applications are intended to learn properties of data sets where the correct answers are not already known to human users. It is challenging to test and debug such ML software, because there is no reliable test oracle. We describe a framework and collection of tools aimed to assist with this problem. We present our findings from using the testing framework with three implementations of an ML ranking algorithm (all of which had bugs).
Subjects: Computer science. Department: Computer Science, Center for Computational Learning Systems. Type: Technical reports.

The In Vivo Approach to Testing Software Applications
http://academiccommons.columbia.edu/catalog/ac:110832
Authors: Murphy, Christian; Kaiser, Gail E.; Chu, Matt
Identifier: http://hdl.handle.net/10022/AC:P:29546
Date: Wed, 27 Apr 2011
Abstract: Software products released into the field typically have some number of residual bugs that either were not detected or could not have been detected during testing. This may be the result of flaws in the test cases themselves, assumptions made during the creation of test cases, or the infeasibility of testing the sheer number of possible configurations for a complex system. Testing approaches such as perpetual testing or continuous testing seek to continue to test these applications even after deployment, in hopes of finding any remaining flaws. In this paper, we present our initial work towards a testing methodology we call in vivo testing, in which unit tests are continuously executed inside a running application in the deployment environment. These tests execute within the current state of the program (rather than by creating a clean slate) without affecting or altering that state. Our approach can reveal defects both in the applications of interest and in the unit tests themselves. It can also be used for detecting concurrency or robustness issues that may not have appeared in a testing lab. Here we describe the approach and the testing framework called Invite that we have developed for Java applications. We also enumerate the classes of bugs our approach can discover, and provide the results of a case study on a publicly-available application, as well as the results of experiments to measure the added overhead.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

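A heavily simplified sketch of the in vivo idea follows: with some small probability, an instrumented operation copies the state its test needs and runs the test against the copy in the background, leaving user-visible state untouched. Invite's real mechanisms (instrumentation, process-level isolation, overhead controls) are more sophisticated; all names below are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Heavily simplified, hypothetical sketch of an in vivo test; not Invite's
 * actual mechanism.
 */
public class InVivoSketch {
    private final List<String> cache = new ArrayList<>();
    private final ExecutorService testPool = Executors.newSingleThreadExecutor();
    private final double testProbability = 0.01;

    void put(String item) {
        cache.add(item);                      // normal behavior
        if (Math.random() < testProbability) {
            List<String> snapshot = new ArrayList<>(cache);  // isolate state
            testPool.submit(() -> testPutGrowsCacheOnCopy(snapshot));
        }
    }

    // An in vivo unit test: runs in the *current* state, not a clean slate.
    private void testPutGrowsCacheOnCopy(List<String> copy) {
        int before = copy.size();
        copy.add("probe");
        if (copy.size() != before + 1)
            System.err.println("In vivo test failed in deployed state");
    }

    public static void main(String[] args) {
        InVivoSketch app = new InVivoSketch();
        for (int i = 0; i < 1_000; i++) app.put("item" + i);
        app.testPool.shutdown();
    }
}
```
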
A Distance Learning Approach to Teaching eXtreme Programming
http://academiccommons.columbia.edu/catalog/ac:110823
Authors: Murphy, Christian; Phung, Dan; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:29543
Date: Wed, 27 Apr 2011
Abstract: As university-level distance learning programs become more and more popular, and software engineering courses incorporate eXtreme Programming (XP) into their curricula, certain challenges arise when teaching XP to students who are not physically co-located. In this paper, we present the results of a three-year study of such an online software engineering course targeted to graduate students, and describe some of the specific challenges faced, such as students' aversion to aspects of XP and difficulties in scheduling. We discuss our findings in terms of the course's educational objectives, and present suggestions to other educators who may face similar situations.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Quality Assurance of Software Applications Using the In Vivo Testing Approach
http://academiccommons.columbia.edu/catalog/ac:111009
Authors: Murphy, Christian; Kaiser, Gail E.; Vo, Ian; Chu, Matt
Identifier: http://hdl.handle.net/10022/AC:P:29600
Date: Tue, 26 Apr 2011
Abstract: Software products released into the field typically have some number of residual defects that either were not detected or could not have been detected during testing. This may be the result of flaws in the test cases themselves, incorrect assumptions made during the creation of test cases, or the infeasibility of testing the sheer number of possible configurations for a complex system; these defects may also be due to application states that were not considered during lab testing, or corrupted states that could arise due to a security violation. One approach to this problem is to continue to test these applications even after deployment, in hopes of finding any remaining flaws. In this paper, we present a testing methodology we call in vivo testing, in which tests are continuously executed in the deployment environment. We also describe a type of test we call in vivo tests that are specifically designed for use with such an approach: these tests execute within the current state of the program (rather than by creating a clean slate) without affecting or altering that state from the perspective of the end-user. We discuss the approach and the prototype testing framework for Java applications called Invite. We also provide the results of case studies that demonstrate Invite's effectiveness and efficiency.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Improving the Dependability of Machine Learning Applications
http://academiccommons.columbia.edu/catalog/ac:111024
Authors: Murphy, Christian; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:29605
Date: Tue, 26 Apr 2011
Abstract: As machine learning (ML) applications become prevalent in various aspects of everyday life, their dependability takes on increasing importance. It is challenging to test such applications, however, because they are intended to learn properties of data sets where the correct answers are not already known. Our work is not concerned with testing how well an ML algorithm learns, but rather seeks to ensure that an application using the algorithm implements the specification correctly and fulfills the users' expectations. These are critical to ensuring the application's dependability. This paper presents three approaches to testing these types of applications. In the first, we create a set of limited test cases for which it is, in fact, possible to predict what the correct output should be. In the second approach, we use random testing to generate large data sets according to parameterization based on the application's equivalence classes. Our third approach is based on metamorphic testing, in which properties of the application are exploited to define transformation functions on the input, such that the new output can easily be predicted based on the original output. Here we discuss these approaches, and our findings from testing the dependability of three real-world ML applications.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Using JML Runtime Assertion Checking to Automate Metamorphic Testing in Applications without Test Oracles
http://academiccommons.columbia.edu/catalog/ac:111012
Authors: Murphy, Christian; Shen, Kuang; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:29601
Date: Tue, 26 Apr 2011
Abstract: It is challenging to test applications and functions for which the correct output for arbitrary input cannot be known in advance, e.g., some computational science or machine learning applications. In the absence of a test oracle, one approach to testing these applications is to use metamorphic testing: existing test case input is modified to produce new test cases in such a manner that, when given the new input, the application should produce an output that can easily be computed based on the original output. That is, if input x produces output f(x), then we create input x' such that we can predict f(x') based on f(x); if the application or function does not produce the expected output, then a defect must exist, and either f(x) or f(x') (or both) is wrong. By using metamorphic testing, we are able to provide built-in "pseudo-oracles" for these so-called "non-testable programs" that have no test oracles. In this paper, we describe an approach in which a function's metamorphic properties are specified using an extension to the Java Modeling Language (JML), a behavioral interface specification language that is used to support the "design by contract" paradigm in Java applications. Our implementation, called Corduroy, pre-processes these specifications and generates test code that can be executed using JML runtime assertion checking, ensuring that the specifications hold during program execution. In addition to presenting our approach and implementation, we also describe our findings from case studies in which we apply our technique to applications without test oracles.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

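Standard JML supports pre- and postconditions that a runtime-assertion-checking compiler verifies on every call. The sketch below shows such annotations on a trivial function; because the abstract does not show Corduroy's actual extension syntax, the metamorphic property is written as a plain comment and is an assumption here:

```java
/**
 * JML-style runtime-checkable specification. The requires/ensures clauses
 * are standard JML; the metamorphic clause is a hypothetical rendering,
 * not Corduroy's real syntax.
 */
public class SineSpecSketch {

    //@ requires !Double.isNaN(x);
    //@ ensures -1.0 <= \result && \result <= 1.0;
    // Hypothetical metamorphic property, Corduroy-style:
    //   sin(Math.PI - x) == \result
    public static /*@ pure @*/ double sin(double x) {
        return Math.sin(x);
    }
}
```

A JML compiler with runtime assertion checking verifies the requires/ensures clauses on each call; Corduroy-style tooling would additionally generate the follow-up invocation (here, sin(Math.PI - x)) and compare it against the original result during execution.
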
Using Metamorphic Testing at Runtime to Detect Defects in Applications without Test Oracles
http://academiccommons.columbia.edu/catalog/ac:111047
Authors: Murphy, Christian
Identifier: http://hdl.handle.net/10022/AC:P:29612
Date: Tue, 26 Apr 2011
Abstract: First, we will present an approach called Automated Metamorphic System Testing. This will involve automating system-level metamorphic testing by treating the application as a black box and checking that the metamorphic properties of the entire application hold after execution. This will allow for metamorphic testing to be conducted in the production environment without affecting the user, and will not require the tester to have access to the source code. The tests do not require an oracle upon their creation; rather, the metamorphic properties act as built-in test oracles. We will also introduce an implementation framework called Amsterdam. Second, we will present a new type of testing called Metamorphic Runtime Checking. This involves the execution of metamorphic tests from within the application, i.e., the application launches its own tests within its current context. The tests execute within the application's current state, and in particular check a function's metamorphic properties. We will also present a system called Columbus that supports the execution of Metamorphic Runtime Checking from within the context of the running application. Like Amsterdam, it will conduct the tests with acceptable performance overhead, and will ensure that the execution of the tests does not affect the state of the original application process from the users' perspective; however, the implementation of Columbus will be more challenging in that it will require more sophisticated mechanisms for conducting the tests without pre-empting the rest of the application, and for comparing results that may conceivably reside in different processes or environments. Third, we will describe a set of metamorphic testing guidelines that can be followed to assist in the formulation and specification of metamorphic properties that can be used with the above approaches. These will categorize the different types of properties exhibited by many applications, particularly in the domains of machine learning and data mining (as a result of the types of applications we will investigate), but we will demonstrate that they are also generalizable to other domains. This set of guidelines will also correlate to the different types of defects that we expect the approaches will be able to find.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Improving the Quality of Computational Science Software by Using Metamorphic Relations to Test Machine Learning Applications
http://academiccommons.columbia.edu/catalog/ac:111064
Authors: Xie, Xiaoyuan; Ho, Joshua; Murphy, Christian; Kaiser, Gail E.; Xu, Baowen; Chen, T. Y.
Identifier: http://hdl.handle.net/10022/AC:P:29618
Date: Tue, 26 Apr 2011
Abstract: Many applications in the field of scientific computing, such as computational biology, computational linguistics, and others, depend on machine learning algorithms to provide important core functionality to support solutions in the particular problem domains. However, it is difficult to test such applications because often there is no "test oracle" to indicate what the correct output should be for arbitrary input. To help address the quality of scientific computing software, in this paper we present a technique for testing the implementations of machine learning classification algorithms on which such scientific computing software depends. Our technique is based on an approach called "metamorphic testing", which has been shown to be effective in such cases. In addition to presenting our technique, we describe a case study we performed on a real-world machine learning application framework, and discuss how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also discuss how our findings can be of use to other areas of computational science and engineering.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Deux: Autonomic Testing System for Operating System Upgrades
http://academiccommons.columbia.edu/catalog/ac:110986
Authors: Wu, Leon L.; Kaiser, Gail E.; Nieh, Jason; Murphy, Christian
Identifier: http://hdl.handle.net/10022/AC:P:29593
Date: Tue, 26 Apr 2011
Abstract: Operating system upgrades and patches sometimes break applications that worked fine on the older version. We present an autonomic approach to testing of OS updates while minimizing downtime, usable without local regression suites or IT expertise. Deux utilizes a dual-layer virtual machine architecture, with lightweight application process checkpoint and resume across OS versions, enabling simultaneous execution of the same applications on both OS versions in different VMs. Inputs provided by ordinary users to the production old version are also fed to the new version. The old OS acts as a pseudo-oracle for the update, and application state is automatically re-cloned to continue testing after any output discrepancies (intercepted at system call level), all transparently to users. If all differences are deemed inconsequential, then the VM roles are switched with the application state already in place. Our empirical evaluation with both LAMP and standalone applications demonstrates Deux's efficiency and effectiveness.
Subjects: Computer science. Department: Computer Science, Center for Computational Learning Systems. Type: Technical reports.

Retina: Helping Students and Instructors Based on Observed Programming Activities
http://academiccommons.columbia.edu/catalog/ac:110989
Authors: Murphy, Christian; Kaiser, Gail E.; Loveland, Kristin; Hasan, Sahar
Identifier: http://hdl.handle.net/10022/AC:P:29595
Date: Tue, 26 Apr 2011
Abstract: It is difficult for instructors of CS1 and CS2 courses to get accurate answers to such critical questions as "how long are students spending on programming assignments?", or "what sorts of errors are they making?" At the same time, students often have no idea of where they stand with respect to the rest of the class in terms of time spent on an assignment or the number or types of errors that they encounter. In this paper, we present a tool called Retina, which collects information about students' programming activities, and then provides useful and informative reports to both students and instructors based on the aggregation of that data. Retina can also make real-time recommendations to students, in order to help them quickly address some of the errors they make. In addition to describing Retina and its features, we also present some of our initial findings during two trials of the tool in a real classroom setting.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

genSpace: Exploring Social Networking Metaphors for Knowledge Sharing and Scientific Collaborative Work
http://academiccommons.columbia.edu/catalog/ac:110969
Authors: Murphy, Christian; Sheth, Swapneel Kalpesh; Kaiser, Gail E.; Wilcox, Lauren
Identifier: http://hdl.handle.net/10022/AC:P:29589
Date: Tue, 26 Apr 2011
Abstract: Many collaborative applications, especially in scientific research, focus only on the sharing of tools or the sharing of data. We seek to introduce an approach to scientific collaboration that is based on knowledge sharing. We do this by automatically building organizational memory and enabling knowledge sharing by observing what users do with a particular tool or set of tools in the domain, through the addition of activity and usage monitoring facilities to standalone applications. Once this knowledge has been gathered, we apply social networking models to provide collaborative features to users, such as suggestions on tools to use, and automatically-generated sequences of actions based on past usage amongst the members of a social network or the entire community. In this work, we investigate social networking models as an approach to scientific knowledge sharing, and present an implementation called genSpace, which is built as an extension to the geWorkbench platform for computational biologists. Last, we discuss the approach from the viewpoint of social software engineering.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Configuration Fuzzing for Software Vulnerability Detection
http://academiccommons.columbia.edu/catalog/ac:127710
Authors: Dai, Huning; Murphy, Christian; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:9314
Date: Fri, 16 Jul 2010
Abstract: Many software security vulnerabilities only reveal themselves under certain conditions, i.e., particular configurations of the software together with its particular runtime environment. One approach to detecting these vulnerabilities is fuzz testing, which feeds a range of randomly modified inputs to a software application while monitoring it for failures. However, fuzz testing makes no guarantees regarding the syntactic and semantic validity of the input, or of how much of the input space will be explored. To address these problems, in this paper we present a new testing methodology called configuration fuzzing, a technique whereby the configuration of the running application is randomly modified at certain execution points, in order to check for vulnerabilities that only arise in certain conditions. As the application runs in the deployment environment, this testing technique continuously fuzzes the configuration and checks "security invariants" that, if violated, indicate a vulnerability; however, the fuzzing is performed in a duplicated copy of the original process, so that it does not affect the state of the running application. In addition to discussing the approach and describing a prototype framework for implementation, we also present the results of a case study to demonstrate the approach's efficiency.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

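The fuzz-then-check loop described above can be sketched as follows; the configuration options and the security invariant are invented for illustration, and ConFu's duplicated-process mechanism is not modeled:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/**
 * Sketch of configuration fuzzing: clone the current configuration, flip one
 * option at random, and check a security invariant against the fuzzed copy.
 * Option and invariant names are illustrative.
 */
public class ConfigFuzzSketch {
    private static final Random RNG = new Random();

    static Map<String, Boolean> fuzz(Map<String, Boolean> config) {
        Map<String, Boolean> copy = new HashMap<>(config); // leave live config alone
        String[] keys = copy.keySet().toArray(new String[0]);
        String victim = keys[RNG.nextInt(keys.length)];
        copy.put(victim, !copy.get(victim));               // mutate one option
        return copy;
    }

    // Security invariant: anonymous access must never coexist with admin rights.
    static boolean invariantHolds(Map<String, Boolean> config) {
        return !(config.get("allowAnonymous") && config.get("grantAdmin"));
    }

    public static void main(String[] args) {
        Map<String, Boolean> live = new HashMap<>();
        live.put("allowAnonymous", true);
        live.put("grantAdmin", false);

        for (int i = 0; i < 100; i++) {                    // fuzz at an execution point
            Map<String, Boolean> fuzzed = fuzz(live);
            if (!invariantHolds(fuzzed))
                System.err.println("Potential vulnerability under config: " + fuzzed);
        }
    }
}
```
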
Metamorphic Runtime Checking of Non-Testable Programs
http://academiccommons.columbia.edu/catalog/ac:127713
Authors: Murphy, Christian; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:9315
Date: Fri, 16 Jul 2010
Abstract: Challenges arise in assuring the quality of applications that do not have test oracles, i.e., for which it is impossible to know what the correct output should be for arbitrary input. Metamorphic testing has been shown to be a simple yet effective technique in addressing the quality assurance of these "non-testable programs". In metamorphic testing, if test input x produces output f(x), specified "metamorphic properties" are used to create a transformation function t, which can be applied to the input to produce t(x); this transformation then allows the output f(t(x)) to be predicted based on the already-known value of f(x). If the output is not as expected, then a defect must exist. Previously we investigated the effectiveness of testing based on metamorphic properties of the entire application. Here, we improve upon that work by presenting a new technique called Metamorphic Runtime Checking, a testing approach that automatically conducts metamorphic testing of individual functions during the program's execution. We also describe an implementation framework called Columbus, and discuss the results of empirical studies that demonstrate that checking the metamorphic properties of individual functions increases the effectiveness of the approach in detecting defects, with minimal performance impact.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Automatic System Testing of Programs without Test Oracles
http://academiccommons.columbia.edu/catalog/ac:127640
Authors: Murphy, Christian; Shen, Kuang; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:9291
Date: Thu, 15 Jul 2010
Abstract: Metamorphic testing has been shown to be a simple yet effective technique in addressing the quality assurance of applications that do not have test oracles, i.e., for which it is difficult or impossible to know what the correct output should be for arbitrary input. In metamorphic testing, existing test case input is modified to produce new test cases in such a manner that, when given the new input, the application should produce an output that can easily be computed based on the original output. That is, if input x produces output f(x), then we create input x' such that we can predict f(x') based on f(x); if the application does not produce the expected output, then a defect must exist, and either f(x) or f(x') (or both) is wrong. In practice, however, metamorphic testing can be a manually intensive technique for all but the simplest cases. The transformation of input data can be laborious for large data sets, or practically impossible for input that is not in human-readable format. Similarly, comparing the outputs can be error-prone for large result sets, especially when slight variations in the results are not actually indicative of errors (i.e., are false positives), for instance when there is non-determinism in the application and multiple outputs can be considered correct. In this paper, we present an approach called Automated Metamorphic System Testing. This involves the automation of metamorphic testing at the system level by checking that the metamorphic properties of the entire application hold after its execution. The tester is able to easily set up and conduct metamorphic tests with little manual intervention, and testing can continue in the field with minimal impact on the user. Additionally, we present an approach called Heuristic Metamorphic Testing which seeks to reduce false positives and address some cases of non-determinism. We also describe an implementation framework called Amsterdam, and present the results of empirical studies in which we demonstrate the effectiveness of the technique on real-world programs without test oracles.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

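At the system level, metamorphic testing treats the program as a black box: run it twice, once on the original input and once on a transformed input with a predictable effect, then compare the outputs. The sketch below assumes a hypothetical command-line application whose output should be unchanged when every input record is duplicated; nothing here is Amsterdam's actual interface:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch of system-level (black-box) metamorphic testing. The command name,
 * file paths, and the duplication property are placeholders.
 */
public class SystemLevelMetamorphicSketch {

    static String runApp(Path input) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("./app-under-test", input.toString())
                .redirectErrorStream(true).start();
        String output = new String(p.getInputStream().readAllBytes());
        p.waitFor();
        return output;
    }

    public static void main(String[] args) throws Exception {
        Path original = Path.of("input.txt");
        // Transformation with a predictable effect: duplicating every record
        // should leave this (hypothetical) application's output unchanged.
        Path transformed = Path.of("input-doubled.txt");
        Files.writeString(transformed,
                Files.readString(original) + Files.readString(original));

        if (!runApp(original).equals(runApp(transformed)))
            System.err.println("System-level metamorphic property violated");
    }
}
```
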
weHelp: A Reference Architecture for Social Recommender Systems
http://academiccommons.columbia.edu/catalog/ac:127665
Authors: Sheth, Swapneel Kalpesh; Arora, Nipun; Murphy, Christian; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:9299
Date: Thu, 15 Jul 2010
Abstract: Recommender systems have become increasingly popular. Most of the research on recommender systems has focused on recommendation algorithms. There has been relatively little research, however, in the area of generalized system architectures for recommendation systems. In this paper, we introduce weHelp, a reference architecture for social recommender systems, i.e., systems where recommendations are derived automatically from the aggregate of logged activities conducted by the system's users. Our architecture is designed to be application and domain agnostic. We feel that a good reference architecture will make designing a recommendation system easier; in particular, weHelp aims to provide a practical design template to help developers design their own well-modularized systems.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.

Metamorphic Runtime Checking of Non-Testable Programs
http://academiccommons.columbia.edu/catalog/ac:127646
Authors: Murphy, Christian; Kaiser, Gail E.
Identifier: http://hdl.handle.net/10022/AC:P:9293
Date: Thu, 15 Jul 2010
Abstract: Challenges arise in assuring the quality of applications that do not have test oracles, i.e., for which it is difficult or impossible to know what the correct output should be for arbitrary input. Recently, metamorphic testing has been shown to be a simple yet effective technique in addressing the quality assurance of these so-called "non-testable programs". In metamorphic testing, existing test case input is modified to produce new test cases in such a manner that, when given the new input, the function should produce an output that can easily be computed based on the original output. That is, if input x produces output f(x), then we create input x' such that we can predict f(x') based on f(x); if the application does not produce the expected output, then a defect must exist, and either f(x) or f(x') (or both) is wrong. Previously we have presented an approach called "Automated Metamorphic System Testing", in which metamorphic testing is conducted automatically as the program executes. In that approach, metamorphic properties of the entire application are specified, and then checked after execution is complete. Here, we improve upon that work by presenting a technique in which the metamorphic properties of individual functions are used, allowing for the specification of more complex properties and enabling finer-grained runtime checking. Our goal is to demonstrate that such an approach will be more effective than one based on specifying metamorphic properties at the system level, and is also feasible for use in the deployment environment. This technique, called Metamorphic Runtime Checking, is a system testing approach in which the metamorphic properties of individual functions are automatically checked during the program's execution. The tester is able to easily specify the functions' properties so that metamorphic testing can be conducted in a running application, allowing the tests to execute using real input data and in the context of real system states, without affecting those states. We also describe an implementation framework called Columbus, and present the results of empirical studies that demonstrate that checking the metamorphic properties of individual functions increases the effectiveness of the approach in detecting defects, with minimal performance impact.
Subjects: Computer science. Department: Computer Science. Type: Technical reports.