Accepted Industry Track Papers

Abstract: Achieving true service agility requires development teams to be able to continuously integrate and deliver software every few weeks. This in turn requires capabilities for test automation and for the modelling and prediction of software reliability. In this paper, we report on our recent experiences applying a simple and novel curve-shifting technique to software defect prediction for a continuous integration, continuous delivery project. The technique transforms the defect arrival curve from a given previous release using the user story (or feature) development plan so as to predict defect arrival for the required release. We also discuss the different views on software defects from a quality vis-a-vis project management perspective and how the proposed technique applies to either.
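The shifting idea can be illustrated with a small sketch. The data, the function name, and the nearest-fraction alignment below are our own simplifications for illustration, not the exact technique from the paper.

```python
# Illustrative sketch (not the authors' exact method): shift a previous
# release's weekly defect-arrival curve according to the new release's
# user-story development plan.

def shift_defect_curve(prev_arrivals, prev_plan, new_plan):
    """Map each week of the new release onto the previous release's
    timeline by aligning cumulative story-completion fractions.

    prev_arrivals : defects detected per week in the previous release
    prev_plan     : cumulative fraction of stories completed per week (previous)
    new_plan      : cumulative fraction of stories completed per week (new)
    Returns predicted defects per week for the new release.
    """
    predicted = []
    for frac in new_plan:
        # pick the previous-release week with the closest completion fraction
        week = min(range(len(prev_plan)), key=lambda w: abs(prev_plan[w] - frac))
        predicted.append(prev_arrivals[week])
    return predicted

prev_arrivals = [2, 5, 9, 6, 3]             # defects/week, previous release
prev_plan     = [0.1, 0.3, 0.6, 0.85, 1.0]  # cumulative story completion
new_plan      = [0.25, 0.5, 0.8, 1.0]       # compressed plan for the new release

print(shift_defect_curve(prev_arrivals, prev_plan, new_plan))  # → [5, 9, 6, 3]
```

The compressed plan pulls the defect-arrival peak earlier, which is the qualitative effect the curve-shifting technique captures.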

Abstract: In this paper, we outline a general automated testing approach to be applied for the verification and validation of automated and autonomous driving functions. The approach makes use of ontologies of the environment the system under test interacts with. Ontologies are automatically converted into input models for combinatorial testing, which are used to generate test cases. The obtained abstract test cases are used to generate concrete test scenarios that provide the basis for the simulation used to verify the functionality of the system under test. We discuss the general approach, including its potential for automation, in the automotive domain, where there is a growing need for sophisticated simulation-based verification of automated and autonomous vehicles.
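The ontology-to-input-model step can be sketched as follows. The ontology contents and names are invented for illustration, and the exhaustive enumeration stands in for the t-way covering arrays a real combinatorial testing tool would compute.

```python
# Minimal sketch (hypothetical ontology): derive a combinatorial input model
# from an environment ontology and enumerate abstract test cases.
import itertools

# Toy ontology: concepts of the driving environment and their instances.
ontology = {
    "Weather": ["clear", "rain", "fog"],
    "RoadType": ["highway", "urban"],
    "Obstacle": ["none", "pedestrian", "vehicle"],
}

# Each concept becomes a parameter of the combinatorial input model; the
# full product enumerates all abstract test cases (a real tool would instead
# compute a t-way covering array to reduce their number).
parameters = sorted(ontology)
abstract_tests = [dict(zip(parameters, values))
                  for values in itertools.product(*(ontology[p] for p in parameters))]

print(len(abstract_tests))   # 3 * 2 * 3 = 18 abstract test cases
print(abstract_tests[0])
```

Each abstract test case would then be concretized into a simulation scenario (e.g., a rainy urban scene with a pedestrian) for the system under test.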

3. Kumi Jinzenji and Daisuke Hamuro. A concept of QCD prediction and control in agile software development for portfolio management

Abstract: The adoption of agile software development has been increasing in recent years, yet productivity metrics designed for existing methodologies such as plan-driven development are not always suitable for agile development. In addition, existing metrics such as velocity are effective for developers because they measure micro-level status, but they do not always indicate macro-level status such as QCD (quality, cost, and delivery) for management bodies such as a PMO (project management office). We propose a new QCD concept and prediction scheme for agile software development featuring the "value" delivery notion that comes from LEAN. Experimental results show that measured and predicted project quality and development period can be utilized for personnel and budget assignment.

Abstract: Recently, the use of embedded devices such as WiFi APs, IP cameras, and drones in Internet of Things (IoT) applications has become more widespread. These embedded devices are connected to networks and are often used for critical services. Thus, they receive significant attention from hackers who attempt to find a major intrusion vector into IoT applications. Hackers focus on identifying hidden backdoors in embedded devices to gain full remote access; if they succeed, they can cause significant damage to critical infrastructures. Therefore, to improve embedded device security, this study introduces the Universal Firmware vulnerability Observer (UFO), a firmware vulnerability discovery system that can automatically perform tasks such as reversing the firmware's embedded filesystem, identifying vulnerabilities, and exploring password leaks, to meet IoT firmware security verification standards including OWASP, UL-2900, and ICSA Labs. In addition, we design a Shell Script Dependency algorithm that helps identify hidden backdoor problems by discovering suspicious shell-script execution paths in the extracted firmware filesystem. We use 237 real-world embedded device firmware files to evaluate UFO. The results indicate that the effectiveness of reversing firmware binaries is 96%, which is significantly higher than that of open-source tools. We also find that 73% of firmware files contain Common Vulnerabilities and Exposures in their embedded Linux kernel, 22% of firmware files can leak login passwords, and 6% of firmware files contain hidden backdoors. Moreover, we reported hidden backdoor problems to two IoT device vendors in Taiwan and received their confirmation. UFO can be successfully used for verifying firmware security and discovering hidden backdoor threats in commercial IoT devices.
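The shell-script dependency idea can be sketched as a small traversal. The filesystem contents, paths, and suspicious-token list below are hypothetical examples, not UFO's actual implementation.

```python
# Hedged sketch (illustrative names, not UFO's algorithm): follow which shell
# scripts a firmware init script executes, and flag suspicious commands such
# as an unauthenticated telnet daemon reachable from boot.
import re

# Extracted firmware filesystem, modeled here as path -> script text.
filesystem = {
    "/etc/init.d/rcS": "#!/bin/sh\n/bin/setup.sh\n",
    "/bin/setup.sh": "#!/bin/sh\n/usr/sbin/telnetd -l /bin/sh\n",
}

SUSPICIOUS = ("telnetd", "nc -l")

def execution_paths(path, fs, chain=()):
    """Yield (call chain, line) pairs for suspicious commands reachable from path."""
    chain = chain + (path,)
    for line in fs.get(path, "").splitlines():
        if any(tok in line for tok in SUSPICIOUS):
            yield chain, line.strip()
        for callee in re.findall(r"\S+\.sh", line):  # follow script-to-script calls
            if callee in fs and callee not in chain:
                yield from execution_paths(callee, fs, chain)

for chain, line in execution_paths("/etc/init.d/rcS", filesystem):
    print(" -> ".join(chain), ":", line)
```

Here the traversal reports that the boot script transitively starts a telnet daemon with a root shell, the kind of execution path that indicates a hidden backdoor.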

Abstract: In software development, there is great demand for online information and resources. The traditional way for developers to access online resources is to formulate keywords and search in a web browser. The search results are limited by the keywords, and the web browser ignores the developers' working and search context. Tools that integrate information retrieval into the IDE are available, but they fail to perceive the developers' dynamic working context and use it in the online search process. In this paper, we present a context-aware program assistant called amAssist. amAssist monitors the developers' development events and models their working context dynamically, then integrates this context into the entire online search process (e.g., keyword formulation, customized searching, search-result annotation). We integrate amAssist into the Eclipse IDE. Our preliminary user study showed that, by using our program assistant, developers can formulate keywords more accurately and acquire online information and resources more rapidly. Demo video: https://youtu.be/X4Tkjhc6wfU

Abstract: Software-Defined Networking (SDN) is a fundamental paradigm shift in communication networks, separating the network control and data planes. This separation enables dynamic reconfiguration of the data plane at run time through control plane software, bringing virtualization from the cloud to networks. The logically centralized control plane - the network "brain" - is typically realized in a distributed fashion to avoid a single point of failure and to provide redundancy of key control plane functions vis-a-vis the data plane. As SDN begins to be adopted as the underlying paradigm and platform for carrier-grade networks through the advent of open-source SDN controllers, a deep understanding of the reliability of SDNs is essential to satisfying carrier-grade requirements and fulfilling service-level agreements. To this end, we present a model of SDN reliability under control and data plane failures that encompasses the distributed nature of the SDN control plane.

Abstract: Central Processing Units (CPUs) that satisfy the throughput demands of highly automated driving trade reliability off for performance. Such CPUs often do not include extensive hardware-implemented reliability measures, e.g., lockstep. At the same time, POSIX-compliant (including Linux-like) operating systems (OSes) are becoming increasingly popular for such complex automotive systems, as specified, e.g., by the AUTOSAR Adaptive Platform. In such systems, fault analysis of critical software components such as the OS becomes an important dependability concern. We determine the robustness of the OS by injecting random CPU faults and measuring the extent to which these faults propagate through the OS and manifest as application-level side effects. In this paper, we present our QEMU-based fault injection framework, which simulates bit flips in x86 registers during the execution of the system calls of Linux 4.10 and classifies their effects at the application level. Our results show that for the clone, futex, mmap, mprotect, and pipe syscalls, on average 76.3% of the 4.48 million injected faults are benign. Our experiments also show that the program counter and stack pointer (in the case of memory operations) are the most susceptible registers. Our measurements help to guide the appropriate deployment of software-implemented hardware fault tolerance (SIHFT) measures. Re-evaluation of the implemented SIHFT measures can potentially be used as an argument for safety.
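The injection-and-classification loop can be illustrated with a toy model. The "syscall" below is our own invention (it reads only the low three bits of a register), and the classification labels are the common fault-injection categories, not the paper's exact scheme.

```python
# Stripped-down illustration of register bit-flip injection (a toy model,
# not the paper's QEMU framework): flip each bit of a register value and
# classify the effect on a computation that only uses the low 3 bits.
def flip_bit(value, bit):
    return value ^ (1 << bit)

def classify(reg, bit):
    faulty = flip_bit(reg, bit)
    golden = reg & 0b111       # the toy "syscall" reads only the low 3 bits,
    observed = faulty & 0b111  # so flips in unused bits are masked
    return "benign" if observed == golden else "silent data corruption"

reg = 0b10110
outcomes = {bit: classify(reg, bit) for bit in range(8)}
benign_rate = sum(v == "benign" for v in outcomes.values()) / len(outcomes)
print(outcomes, benign_rate)  # flips in bits 3-7 are masked → 62.5% benign
```

As in the paper's results, a large fraction of injected flips is benign simply because much of the register state is never consumed by the code under test.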

8. Nicolas Brousse and Oleksii Mykhailov. Use of Self-Healing Techniques to Improve the Reliability of a Dynamic and Geo-Distributed Ad Delivery Service

Abstract: The advertising industry faces numerous challenges in achieving its goal of targeting a given audience dynamically and accurately in order to deliver a meaningful brand message. Near real-time, low-latency delivery of dynamic content, the sheer volume of information processed, and the sparse geographic distribution of the intended eyeball traffic all drive the complexity of building a successful experience for the end user and the brand. Additionally, the competitiveness of the industry makes it critical to preserve low operational expenses while delivering reliably at scale. In attempting to address the above, we have found that a distributed infrastructure that leverages public cloud providers and a private cloud built on open infrastructure technologies can deliver dynamic advertising content with low latency while preserving high availability. However, network and physical utility infrastructures alone cannot be relied on to ensure service dependability. We show that the complexity of the networks, the sparse geographic distribution of eyeballs, the risk of data center failures, and the increase in encrypted transactions call for thoughtful architectures. The introduction of modern practices, failure injection, and self-healing mechanisms allowed us to improve the service's fault tolerance while optimizing for latency, and significantly improved our service reliability.

Abstract: Supercomputers, high performance computers, and clusters are composed of very large numbers of independent operating systems, each generating its own system logs. Messages are generated locally on each host and usually transferred to a central logging infrastructure, which keeps a master record of the system as a whole. At Los Alamos National Laboratory (LANL), a collection of open source cloud tools is used to log over a hundred million system log messages per day from over a dozen such systems. Understanding what source code created those messages can be extremely useful to system administrators when they are troubleshooting these complex systems, as it can give insight into a subsystem (disk, network, etc.) or even the line numbers of the source code. Oftentimes, debugging supercomputers is done in environments where open access cannot be provided to all individuals due to security concerns. As such, providing a means for relating system log messages to source code lines enables communication between system administrators and source developers or supercomputer vendors.

In this work, we demonstrate a prototype tool which aims to provide such an expert system. We leverage capabilities from ElasticSearch, one of the open source cloud tools deployed at LANL, and with our own metrics develop a means for correctly matching source code lines as well as files with high confidence. We discuss confidence metrics and show that in our experiments 92% of syslog lines were correctly matched. For any future samples, we predict with 95% confidence that the correct file will be detected between 88.2% and 95.8% of the time. Finally, we discuss enhancements that are underway to improve the tool and study it on a larger dataset.
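The core matching problem can be sketched in a few lines. The corpus of format strings and file locations below is hypothetical and the regex conversion is a simplification of the confidence metrics the tool actually uses.

```python
# Simplified sketch (hypothetical corpus, not the LANL tool): match a syslog
# line back to the source format string most likely to have produced it.
import re

# Candidate logging statements extracted from source: file:line -> format string.
format_strings = {
    "net.c:120": "link %s is down",
    "disk.c:88": "write error on %s: code %d",
    "net.c:204": "link %s restored after %d ms",
}

def to_regex(fmt):
    """Turn a printf-style format string into an anchored regex."""
    parts = re.split(r"(%s|%d)", fmt)  # keep placeholders as separate tokens
    pattern = "".join(
        r"\S+" if p == "%s" else r"\d+" if p == "%d" else re.escape(p)
        for p in parts)
    return re.compile("^" + pattern + "$")

def match(line):
    return [loc for loc, fmt in format_strings.items() if to_regex(fmt).match(line)]

print(match("write error on sda1: code 5"))  # → ['disk.c:88']
print(match("link eth0 is down"))            # → ['net.c:120']
```

A real system must also rank multiple candidate matches, which is where the paper's confidence metrics come in.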

Abstract: This paper reports an industrial study that was conducted to evaluate whether human error training procedures and instrumentation created by the authors can be used to train industry software practitioners on the human errors that occur during the requirements engineering process. Industry practitioners were trained (using an online audio-visual package) to analyze requirements faults and map them to the underlying human errors (i.e., the root causes of faults). Results of the study show that even though our training helped practitioners gain knowledge about requirements-phase human errors, parts of the training procedures need to be improved. Additionally, practitioners also reported mechanisms for preventing human errors during the requirements engineering process. These mechanisms can help organizations create interventions (such as checklists) that help software developers avoid committing human errors, thereby preventing the faults caused by these errors.

Abstract: Android provides a security system based on permission control, but vulnerabilities arise from apps holding excessive permissions and from the large number of permission-related APIs. To address these vulnerabilities, permission control studies have been conducted on APIs that risk compromising user privacy. However, it is impossible to add a new security function to an already-built insecure app, and existing approaches incur overhead during app execution because the user is required to grant permissions in real time, which degrades convenience. In this paper, we propose the AppWrapper toolkit, which can add security functions at the user's or administrator's desired locations (methods in an activity) using an app-wrapping technique. Using dynamic policy management, secure policies can then be applied without adding the security function again. In addition, by providing a real-time app log function that considers user convenience, it is possible to identify the locations where a security function is required by following the execution flow of the compromised application, and to create a policy file accordingly. Experiments on commercial apps showed a 100% success rate, except for apps with built-in security and Android apps. On average, it took 1.86 seconds to add a security function through the proposed technique, and the file size increased by about 2.11%, indicating that security functions can be added in a short time with minimal growth in file size.

12. Maninder Singh, Gursimran Walia and Anurag Goswami. Using Supervised Learning to Guide the Selection of Software Inspectors in Industry

Abstract: Software development is a multi-phase process that starts with requirements engineering. Requirements elicited from different stakeholders are documented in a natural language (NL) software requirements specification (SRS) document. Due to the inherent ambiguity of NL, the SRS is prone to faults (e.g., ambiguity, incorrectness, inconsistency). To find and fix faults early (when they are cheapest to fix), companies routinely employ inspections, in which skilled inspectors are selected to review the SRS and log faults. While other researchers have attempted to understand the factors (experience and learning styles) that can guide the selection of effective inspectors, they have met with limited success. This study analyzes the reading patterns (RPs) of inspectors recorded by eye-tracking equipment and evaluates their abilities to find various fault types. The inspectors' characteristics are selected by employing ML algorithms to find the most generalizable RPs w.r.t. each fault type. Our results show that our approach can guide inspector selection with an accuracy ranging between 79.3% and 94% for various fault types.
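The supervised-learning step can be sketched with a toy nearest-neighbour classifier. The reading-pattern features, training values, and labels below are invented for illustration and are not the paper's eye-tracking pipeline; features are also left unscaled for brevity.

```python
# Hypothetical sketch (toy features, not the paper's ML pipeline): a 1-nearest-
# neighbour classifier predicting whether an inspector's reading pattern is
# effective for a given fault type.
import math

# (fixation_rate, regression_rate, avg_dwell_ms) -> observed effectiveness
training = [
    ((0.9, 0.6, 310), "effective"),
    ((0.8, 0.5, 290), "effective"),
    ((0.3, 0.1, 120), "ineffective"),
    ((0.4, 0.2, 150), "ineffective"),
]

def predict(features):
    # label of the closest training example in (unscaled) feature space
    nearest = min(training, key=lambda item: math.dist(item[0], features))
    return nearest[1]

print(predict((0.85, 0.55, 300)))  # close to the first two → "effective"
print(predict((0.35, 0.15, 130)))  # close to the last two  → "ineffective"
```

A production pipeline would normalize features and cross-validate over several algorithms, which is how accuracy figures such as those reported would be obtained.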

Abstract: Reviewing is a key technology in software reliability engineering. One of the most important purposes of reviewing is to detect faults. Various reviewing techniques have been proposed, such as Defect-based Reading and Orthogonal Defect Classification, which focus on faults. In this research, we focus on the "trap" in a developer's cognitive process. A trap is a part of a deliverable, or a pattern within it, that induces an engineer to make an error and embed a fault. Firstly, this paper models this cognitive process in the Software Trap model and proposes Trap-based Review (TBR). Secondly, we introduce three traps that were extracted from a commercial software development. The steps of TBR are also explained. Finally, we present case studies evaluating the effectiveness of TBR on a commercial software product in the financial domain. The results show that TBR succeeded in detecting faults that had not been detected during commercial software development. This paper also discusses how the trap-based approach can be applied to dynamic testing.

Abstract: This paper presents the design, development and evaluation of a software tool to assist the localisation of root causes of test case failures in distributed embedded systems, specifically vehicle systems controlled by a network of electronic control units (ECUs). We use data visualisation to extract sensible information from the large number of test execution logs produced by large-scale software integration testing under a continuous integration process. Our goal is to allow more efficient root-cause identification of failures and to foster a continuous feedback loop in the fault localisation process. We evaluate our solution in situ at the Research and Development division of Volvo Car Corporation (VCC). Our prototype supports failure debugging procedures by presenting clear and concise data and by allowing stakeholders to filter and control which information is displayed. Moreover, it encourages systematic and continuous analysis of the current state of testing by aggregating and categorising historical data from test harnesses to identify patterns and trends in test results.

Abstract: Monitoring the results of software reliability growth models (SRGMs) helps evaluate a project's situation. SRGMs measure the reliability of software by analyzing the relation between the number of detected bugs and their detection times to predict the number of bugs remaining in the software. Sometimes SRGM results lead managers to make incorrect decisions because the results are temporary snapshots that change over time. In our previous study, we proposed a method to help evaluate a project's quality by monitoring the results of SRGM applications. We collected the number of detected bugs and their detection times in the test phases of cloud services provided by e-Seikatsu to real estate businesses. The datasets cover about 34 cloud service features. Our method provides correct answers for 29 features and incorrect answers for 5 features. In this paper, we classify the monitoring results of unstable features into four types based on the tendencies of the results, to aid developers and managers in making appropriate decisions about the development status.
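For readers unfamiliar with SRGMs, here is a minimal sketch of one common model, Goel-Okumoto, with m(t) = a(1 - e^(-bt)). The bug counts are invented and the coarse grid search stands in for the maximum-likelihood fitting that real tools perform; this is not the paper's method.

```python
# Minimal SRGM sketch: fit the Goel-Okumoto model m(t) = a*(1 - exp(-b*t))
# to cumulative bug counts by a coarse grid search (illustrative data; real
# tools use maximum-likelihood estimation).
import math

weeks = [1, 2, 3, 4, 5, 6]
cumulative_bugs = [12, 21, 27, 31, 34, 36]

def go_model(a, b, t):
    return a * (1 - math.exp(-b * t))

def fit(weeks, bugs):
    best = None  # (sum of squared errors, a, b)
    for a in range(30, 61):                         # candidate total-defect counts
        for b in (i / 100 for i in range(1, 100)):  # candidate detection rates
            err = sum((go_model(a, b, t) - y) ** 2 for t, y in zip(weeks, bugs))
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1], best[2]

a, b = fit(weeks, cumulative_bugs)
remaining = a - cumulative_bugs[-1]
print(f"estimated total defects a={a}, rate b={b:.2f}, predicted remaining={remaining}")
```

The snapshot nature the abstract warns about is visible here: refitting after each new week of data can move both a and b, so a single fit should not drive a release decision.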

Abstract: Infrastructure as Code (IaC), which specifies system configurations in an imperative or declarative way, automates environment set-up, system deployment and configuration. Despite wide adoption, developing and maintaining high-quality IaC artifacts is still challenging. This paper proposes an approach to handling the fine-grained and frequently occurring errors in IaC code. The approach extracts code changes from historical commits and clusters them into groups by constructing a feature model of code changes and employing an unsupervised machine learning algorithm. It identifies error patterns from the clusters and proposes a set of inspection rules to check for potential IaC code errors. In practice, we take Puppet code artifacts as subjects and perform a comprehensive study on 14 popular Puppet artifacts. In our experiment, we obtain 41 cross-artifact error patterns, covering 42% of the crawled code changes. Based on these patterns, 30 rules are proposed, covering 60% of the identified error patterns, to proactively check IaC artifacts. The approach should be helpful in improving the code quality of IaC artifacts.
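The change-clustering step can be sketched as follows. The feature vectors are hypothetical, and grouping identical vectors stands in for the unsupervised clustering algorithm the approach actually employs.

```python
# Rough sketch (hypothetical features, not the paper's feature model): encode
# each historical IaC code change as a feature vector and group recurring
# vectors, a stand-in for unsupervised clustering of change features.
from collections import defaultdict

# Each change: (changed resource type, attribute touched, kind of edit)
changes = [
    ("file", "mode", "value-fix"),
    ("file", "mode", "value-fix"),
    ("service", "ensure", "value-fix"),
    ("file", "mode", "value-fix"),
    ("package", "name", "rename"),
]

clusters = defaultdict(list)
for i, feats in enumerate(changes):
    clusters[feats].append(i)

# Frequent clusters suggest recurring error patterns worth an inspection rule,
# e.g. "check file mode values in new Puppet resources".
patterns = {feats: ids for feats, ids in clusters.items() if len(ids) >= 2}
print(patterns)
```

In the paper's terms, each surviving cluster becomes a candidate error pattern from which an inspection rule can be derived.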

Abstract: When systems fail, log data is often the most important information source for fault diagnosis. However, the performance of automatic fault diagnosis is limited by the ad-hoc nature of logs. The key problem is that existing developer-written logs are designed for humans rather than for machines to automatically detect system anomalies. To improve the quality of logs for fault diagnosis, we propose a novel log enhancement approach which automatically identifies logging points that reflect anomalous behavior during system faults. We evaluate our approach on three popular software systems: AcmeAir, HDFS and TensorFlow. Results show that it can significantly improve fault diagnosis accuracy, by 50% on average, compared to the developers' manually placed logging points.
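One simple intuition behind such approaches can be sketched with a heuristic of our own (this is not the paper's analysis): error-handling blocks that swallow a fault without logging are natural candidate logging points.

```python
# Toy heuristic (ours, not the paper's approach): flag exception handlers
# that contain no logging statement as candidate logging points.
import re

source = """\
try:
    send(packet)
except Timeout:
    retry()
except IOError:
    log.error("io failure")
    raise
"""

def missing_log_points(code):
    lines = code.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        if line.lstrip().startswith("except"):
            # collect the handler body (here: lines until the next 'except')
            body = []
            for nxt in lines[i + 1:]:
                if nxt.lstrip().startswith("except"):
                    break
                body.append(nxt)
            if not any(re.search(r"\blog\.", b) for b in body):
                flagged.append(line.strip())
    return flagged

print(missing_log_points(source))  # → ['except Timeout:']
```

The silent Timeout handler is flagged; the IOError handler already logs and is left alone. The paper's contribution is identifying such points automatically from anomalous run-time behavior rather than from a syntactic rule like this one.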

Abstract: The standards family IEC-62443 represents an international agreement on best practices for securing Industrial Automation Control Systems (IACS).
Engineering projects have limits on cost and resources, which makes it particularly challenging to cover all security topics adequately, as prescribed by the standards.
We propose a framework of artefacts that supports projects in addressing security completely, and at the same time enables exchange and reuse of inter- and cross-domain security designs and concepts. The framework also resolves ambiguities in the interpretation of the standards for an organisational unit, and is a driver for certification-readiness.
We describe our experiences in applying the framework in large-scale industry projects.

Abstract: Since its release in the mid-1990s, the Microsoft Windows-based software reliability modeling tool CASRE has been downloaded over 3000 times from the Open Channel Foundation's website. It was also included on the CD-ROM distributed with the Handbook of Software Reliability Engineering (M. Lyu, ed.). In the years since it was first released, however, CASRE has become more difficult to use, mainly because there have been no updates since 2000. The last version of Windows on which CASRE would reliably execute was Windows XP, and since it was developed explicitly for Windows, it is not feasible to run it on other platforms.
Software development and acquisition organizations continued to be interested in using tools of the same type as CASRE. In 2013, the U.S. Naval Air Systems Command (NAVAIR) contacted the authors at the Jet Propulsion Laboratory and the University of Massachusetts to determine whether a) CASRE could be modified to run in contemporary environments, or b) whether a new tool with the same functionality as CASRE could be developed with modern programming languages and techniques.
After weighing the alternatives, a decision was made to develop a new tool rather than update CASRE. This decision was made for several reasons. First of all, CASRE had been developed with the programming languages available at the time, in this case C and Fortran. Developing a new tool would allow the use of modern, expressive languages and development environments particularly well suited to the statistical modeling domain (e.g., R, RStudio). Developing a new tool would also allow us to implement it so that it would run in modern operating environments, specifically Windows, Mac OS, Unix, and Linux. This last design decision would make it possible for users who would not have been able to run CASRE to use the new tool.
We wanted to address the difficulty of adding new models to CASRE, since it was not architected for ease of modification. We wanted to develop a tool for which it would be easy to add new models and model evaluation techniques (e.g., prequential likelihood ratio, Akaike Information Criterion). This would enable organizations already using software reliability modeling to manage their testing to place the models they were using into a common framework, making it easier to evaluate multiple sets of results to gain additional insight into their testing process. Researchers would also gain an advantage by having a common framework in which to work with multiple models, analyze their results, and identify relationships among them. Finally, this would make it practical to distribute as open-source software, to which contributors could add new models and evaluation techniques as they were developed by the research community and validated by practitioners.
We have used these ideas to develop “Software Failure and Reliability Assessment Tool” (SFRAT). It is implemented in R, uses the Shiny user interface package, and will run in any environment in which R and RStudio can run. We intend for it to be an open-source tool with a mechanism for contributors to add new capabilities. Our hope is that distributing SFRAT as open-source software will allow it to retain currency in the software reliability practice and research communities.

Abstract: Expertise on distributed systems is critical for system maintenance and improvement. However, it is challenging to keep knowledge about distributed systems up to date due to their complexity and continuous evolution. Hence, computing platform providers study how to extract knowledge directly from system behavior. In this paper, we propose a methodology called KEREP to automatically extract knowledge of distributed system behavior through request execution paths. Techniques are devised to construct component structures, to depict in-depth dynamic behavior, and to identify the heartbeat mechanisms of target distributed systems. Experiments on two real-world distributed systems show that the KEREP methodology extracts accurate knowledge of request processing and discovers undocumented features with good execution performance.

Abstract: Security Information and Event Management (SIEM) is the state of the practice in handling heterogeneous data sources for security analysis. This paper presents challenges and directions in SIEM in the context of a real-life mission-critical system developed by a top leading company in the Air Traffic Control domain. The system emits massive volumes of highly unstructured logs. We present the challenges in addressing such logs, ongoing work with an open source SIEM, and directions in modeling system behavioral baselines for inferring compromise indicators. Our explorative analysis paves the way for data discovery approaches aiming to complement current SIEM practice.

Abstract: Distributed systems play an increasingly important role in leading-edge networks with high availability requirements, including software-defined networks (SDN), where replicating essential network data is critical to ensure resilience under failures. Strong consistency algorithms based on distributed consensus, such as Raft, are often used to ensure that all components of the distributed system agree on their view of the replicated data, even when a minority of the distributed components crash. Another critical requirement for highly available networks is to gracefully handle overload conditions, where the demands on the network exceed expected levels for a transient period of time. Hence, the strong consistency algorithms used in such networks must also behave gracefully under transient overload.

We show that, in fact, strong consistency algorithms such as Raft exhibit pathological behaviors under overload conditions and can significantly affect SDN network availability. We demonstrate that the open-source ONOS SDN controller, which uses the Java-based Atomix implementation of Raft, exhibits such pathological behavior under intent overload, resulting in the loss of requests to the network, and with the entire SDN network eventually crashing. We further demonstrate similar behaviors of the Python-based pysyncobj implementation of Raft. We then propose a dynamic extension of the Raft algorithm (DynRaft) that continues to ensure the formally proven strong consistency properties of Raft, and demonstrate the effectiveness of DynRaft in handling transient overload conditions on our extension of the pysyncobj implementation.
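The overload pathology can be seen even in a toy queueing sketch (ours, unrelated to Atomix, pysyncobj, or DynRaft): when the request arrival rate exceeds the leader's commit rate, queue depth grows without bound, so requests must eventually be dropped or time out.

```python
# Toy discrete-time queue: a "leader" that commits a fixed number of requests
# per tick falls ever further behind when arrivals exceed its commit rate.
def simulate(arrival_per_tick, commit_per_tick, ticks):
    queue = 0
    depths = []
    for _ in range(ticks):
        queue += arrival_per_tick
        queue -= min(queue, commit_per_tick)
        depths.append(queue)
    return depths

print(simulate(arrival_per_tick=12, commit_per_tick=10, ticks=5))  # → [2, 4, 6, 8, 10]
print(simulate(arrival_per_tick=8, commit_per_tick=10, ticks=5))   # → [0, 0, 0, 0, 0]
```

A real consensus implementation behaves worse than this linear backlog suggests, since a growing queue also delays heartbeats and elections, which is the kind of cascading failure the paper demonstrates.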