ICSE 2013 – Proceedings

The International Conference on Software Engineering (ICSE) provides an environment where ideas are created, exchanged, and synthesized as researchers, practitioners, and educators present and discuss the most recent innovations, trends, experiences, and challenges in the field of software engineering. ICSE 2013 brings ICSE back to San Francisco for the first time in 37 years. Leading edge results from ICSE 1976 laid the foundation for work in areas ranging from software correctness and robustness, to language design and analysis, and more. Similarly, leading-edge results at ICSE 2013 will lay the foundation for the future in areas known and emerging as is vital for a world where software will continue to become increasingly intertwined with our environment, our economies, our health, and our political and educational systems.

Composition

Ubiquitous and pervasive computing promotes the creation of an environment where Networked Systems (NSs) eternally provide connectivity and services without requiring explicit awareness of the underlying communications and computing technologies. In this context, achieving interoperability among heterogeneous NSs represents an important issue. In order to mediate the NSs interaction protocol and solve possible mismatches, connectors are often built. However, connector development is a never-ending and error-prone task and prevents the eternality of NSs. For this reason, in the literature, many approaches propose the automatic synthesis of connectors. However, solving the connector synthesis problem in general is hard and, when possible, it results in a monolithic connector hence preventing its evolution. In this paper, we define a method for the automatic synthesis of modular connectors, each of them expressed as the composition of independent mediators. A modular connector, as synthesized by our method, supports connector evolution and performs correct mediation.

In this paper, we propose a reconfiguration protocol that
can handle any number of failures during a reconfiguration,
always producing an architecturally-consistent assembly
of components that can be safely introspected and further
reconfigured. Our protocol is based on the concept
of Incrementally Consistent Sequences (ICS),
ensuring that any reconfiguration incrementally respects
the reconfiguration contract given to component
developers: reconfiguration grammar and architectural
invariants. We also propose two recovery policies,
one rolls back the failed reconfiguration and
the other rolls it forward, both going as far as
possible, failure permitting.
We specified and proved the reconfiguration contract,
the protocol, and recovery policies in Coq.

Refactoring is a disciplined technique for restructuring code to improve its readability and maintainability. Almost all modern integrated development environments (IDEs) offer built-in support for automated refactoring tools. However, the user interface for refactoring tools has remained largely unchanged from the menu and dialog approach introduced in the Smalltalk Refactoring Browser, the first automated refactoring tool, more than a decade ago. As the number of supported refactorings and their options increase, invoking and configuring these tools through the traditional methods have become increasingly unintuitive and inefficient. The contribution of this paper is a novel approach that eliminates the use of menus and dialogs altogether. We streamline the invocation and configuration process through direct manipulation of program elements via drag-and-drop. We implemented and evaluated this approach in our tool, Drag-and-Drop Refactoring (DNDRefactoring), which supports up to 12 of 23 refactorings in the Eclipse IDE. Empirical evaluation through surveys and controlled user studies demonstrates that our approach is intuitive, more efficient, and less error-prone compared to traditional methods available in IDEs today. Our results bolster the need for researchers and tool developers to rethink the design of future refactoring tools.

Modern software systems are often characterized by uncertainty and changes in the environment in which they are embedded. Hence, they must be designed as adaptive systems. We propose a framework that supports adaptation to non-functional manifestations of uncertainty. Our framework allows engineers to derive, from an initial model of the system, a finite state automaton augmented with probabilities. The system is then executed by an interpreter that navigates the automaton and invokes the component implementations associated to the states it traverses. The interpreter adapts the execution by choosing among alternative possible paths of the automaton in order to maximize the system's ability to meet its non-functional requirements. To demonstrate the adaptation capabilities of the proposed approach we implemented an adaptive application inspired by an existing worldwide distributed mobile application and we discussed several adaptation scenarios.

A system's early architectural decisions impact its properties (e.g., scalability, dependability) as well as stakeholder concerns (e.g., cost, time to delivery). Choices made early on are both difficult and costly to change, and thus it is paramount that the engineer gets them "right". This leads to a paradox, as in early design, the engineer is often forced to make these decisions under uncertainty, i.e., not knowing the precise impact of those decisions on the various concerns. How could the engineer make the "right" choices in such circumstances? This is precisely the question we have tackled in this paper. We present GuideArch, a framework aimed at quantitative exploration of the architectural solution space under uncertainty. It provides techniques founded on fuzzy math that help the engineer with making informed decisions.

The emergence of socio-technical systems characterized by significant user collaboration poses a new challenge for system adaptation. People are no longer just the ``users'' of a system but an integral part. Traditional self-adaptation mechanisms, however, consider only the software system and remain unaware of the ramifications arising from collaboration interdependencies. By neglecting collective user behavior, an adaptation mechanism is unfit to appropriately adapt to evolution of user activities, consider side-effects on collaborations during the adaptation process, or anticipate negative consequence upon reconfiguration completion. Inspired by existing architecture-centric system adaptation approaches, we propose linking the runtime software architecture to the human collaboration topology. We introduce a mapping mechanism and corresponding framework that enables a system adaptation manager to reason upon the effect of software-level changes on human interactions and vice versa. We outline the integration of the human architecture in the adaptation process and demonstrate the benefit of our approach in a case study.

Environment domain models are a key part of the information used by adaptive systems to determine their behaviour. These models can be incomplete or inaccurate. In addition, since adaptive systems generally operate in environments which are subject to change, these models are often also out of date. To update and correct these models, the system should observe how the environment responds to its actions, and compare these responses to those predicted by the model. In this paper, we use a probabilistic rule learning approach, NoMPRoL, to update models using feedback from the running system in the form of execution traces. NoMPRoL is a technique for non-monotonic probabilistic rule learning based on a transformation of an inductive logic programming task into an equivalent abductive one. In essence, it exploits consistent observations by finding general rules which explain observations in terms of the conditions under which they occur. The updated models are then used to generate new behaviour with a greater chance of success in the actual environment encountered.

Apps

Touchscreen-based devices such as smartphones and tablets are gaining popularity but their rich input capabilities pose new development and testing complications. To alleviate this problem, we present an approach and tool named RERAN that permits record-and-replay for the Android smartphone platform. Existing GUI-level record-and-replay approaches are inadequate due to the expressiveness of the smartphone domain, in which applications support sophisticated GUI gestures, depend on inputs from a variety of sensors on the device, and have precise timing requirements among the various input events. We address these challenges by directly capturing the low-level event stream on the phone, which includes both GUI events and sensor events, and replaying it with microsecond accuracy. Moreover, RERAN does not require access to app source code, perform any app rewriting, or perform any modifications to the virtual machine or Android platform. We demonstrate RERAN’s applicability in a variety of scenarios, including (a) replaying 86 out of the Top-100 Android apps on Google Play; (b) reproducing bugs in popular apps, e.g., Firefox, Facebook, Quickoffice; and (c) fast-forwarding executions. We believe that our versatile approach can help both Android developers and researchers.

Software developers often need to port applications written for a source platform to a target platform. In doing so, a key task is to replace an application's use of methods from the source platform API with corresponding methods from the target platform API. However, this task is challenging because developers must manually identify mappings between methods in the source and target APIs, e.g., using API documentation.
We develop a novel approach to the problem of inferring mappings between the APIs of a source and target platform. Our approach is tailored to the case where the source and target platform each have independently-developed applications that implement similar functionality. We observe that in building these applications, developers exercised knowledge of the corresponding APIs. We develop a technique to systematically harvest this knowledge and infer likely mappings between the APIs of the source and target platform. The output of our approach is a ranked list of target API methods or method sequences that likely map to each source API method or method sequence. We have implemented this approach in a prototype tool called Rosetta, and have applied it to infer likely mappings between the Java2 Mobile Edition (JavaME) and Android graphics APIs.

Optimizing the energy efficiency of mobile applications can greatly increase user satisfaction. However, developers lack viable techniques for estimating the energy consumption of their applications. This paper proposes a new approach that is both lightweight in terms of its developer requirements and provides fine-grained estimates of energy consumption at the code level. It achieves this using a novel combination of program analysis and per-instruction energy modeling. In evaluation, our approach is able to estimate energy consumption to within 10% of the ground truth for a set of mobile applications from the Google Play store. Additionally, it provides useful and meaningful feedback to developers that helps them to understand application energy consumption behavior.

Testing

In many critical systems domains, test suite adequacy is currently measured using structural coverage metrics over the source code. Of particular interest is the modified condition/decision coverage (MC/DC) criterion required for, e.g., critical avionics systems. In previous investigations we have found that the efficacy of such test suites is highly dependent on the structure of the program under test and the choice of variables monitored by the oracle. MC/DC adequate tests would frequently exercise faulty code, but the effects of the faults would not propagate to the monitored oracle variables.
In this report, we combine the MC/DC coverage metric with a notion of observability that helps ensure that the result of a fault encountered when covering a structural obligation propagates to a monitored variable; we term this new coverage criterion Observable MC/DC (OMC/DC). We hypothesize this path requirement will make structural coverage metrics 1.) more effective at revealing faults, 2.) more robust to changes in program structure, and 3.) more robust to the choice of variables monitored. We assess the efficacy and sensitivity to program structure of OMC/DC as compared to masking MC/DC using four subsystems from the civil avionics domain and the control logic of a microwave. We have found that test suites satisfying OMC/DC are significantly more effective than test suites satisfying MC/DC, revealing up to 88% more faults, and are less sensitive to program structure and the choice of monitored variables.

Many software development projects struggle with creating and communicating a testing culture that is appropriate for the project's needs. This may degrade software quality by leaving defects undiscovered. Previous research suggests that social coding sites such as GitHub provide a collaborative environment with a high degree of social transparency. This makes developers' actions and interactions more visible and traceable.
We conducted interviews with 33 active users of GitHub to investigate how the increased transparency found on GitHub influences developers' testing behaviors. Subsequently, we validated our findings with an online questionnaire that was answered by 569 members of GitHub. We found several strategies that software developers and managers can use to positively influence the testing behavior in their projects. However, project owners on GitHub may not be aware of them. We report on the challenges and risks caused by this and suggest guidelines for promoting a sustainable testing culture in software development projects.

We report experiences with constraint-based whitebox fuzz testing in production across hundreds of large Windows applications and over 500 machine years of computation from 2007 to 2013. Whitebox fuzzing leverages symbolic execution on binary traces and constraint solving to construct new inputs to a program. These inputs execute previously uncovered paths or trigger security vulnerabilities. Whitebox fuzzing has found one-third of all file fuzzing bugs during the development of Windows 7, saving millions of dollars in potential security vulnerabilities. The technique is in use today across multiple products at Microsoft. We describe key challenges with running whitebox fuzzing in production. We give principles for addressing these challenges and describe two new systems built from these principles: SAGAN, which collects data from every fuzzing run for further analysis, and JobCenter, which controls deployment of our whitebox fuzzing infrastructure across commodity virtual machines. Since June 2010, SAGAN has logged over 3.4 billion constraints solved, millions of symbolic executions, and tens of millions of test cases generated. Our work represents the largest scale deployment of whitebox fuzzing to date, including the largest usage ever for a Satisfiability Modulo Theories (SMT) solver. We present specific data analyses that improved our production use of whitebox fuzzing. Finally we report data on the performance of constraint solving and dynamic test generation that points toward future research problems.

In industry, software testing and coverage-based metrics are the predominant
techniques to check correctness of software. This paper addresses
automatic unit test generation for programs written in C/C++.
The main idea is to improve the coverage obtained by feedback-directed random
test generation methods, by utilizing concolic execution on the generated test drivers.
Furthermore, for programs with numeric computations, we
employ non-linear solvers in a lazy manner to generate new test inputs.
These techniques significantly improve the coverage provided by a feedback-directed
random unit testing framework, while retaining the benefits of full automation.
We have implemented these techniques in a prototype platform, and describe promising experimental results on a number of C/C++ open source benchmarks.

This work presents a method to combine testing techniques adaptively during the testing process. It intends to mitigate the sources of uncertainty of software testing processes, by learning from past experience and, at the same time, adapting the technique selection to the current testing session.
The method is based on machine learning strategies. It uses offline strategies to take historical information into account about the techniques performance collected in past testing sessions; then, online strategies are used to adapt the selection of test cases to the data observed as the testing proceeds. Experimental results show that techniques performance can be accurately characterized from features of the past testing sessions, by means of machine learning algorithms, and that integrating this result into the online algorithm allows improving the fault detection effectiveness with respect to single testing techniques, as well as to their random combination.

As software systems evolve, new interface features such as keyboard shortcuts and toolbars are introduced. While it is common to regression test the new features for functional correctness, there has been less focus on systematic regression testing for usability, due to the effort and time involved in human studies. Cognitive modeling tools such as CogTool provide some help by computing predictions of user performance, but they still require manual effort to describe the user interface and tasks, limiting regression testing efforts. In recent work, we developed CogTool Helper to reduce the effort required to generate human performance models of existing systems. We build on this work by providing task specific test case generation and present our vision for human performance regression testing (HPRT) that generates large numbers of test cases and evaluates a range of human performance predictions for the same task. We examine the feasibility of HPRT on four tasks in LibreOffice, find several regressions, and then discuss how a project team could use this information. We also illustrate that we can increase efficiency with sampling by leveraging an inference algorithm. Samples that take approximately 50% of the runtime lose at most 10% of the performance predictions.

We focus on functional testing of enterprise applications with the
goal of exercising an application's interesting behaviors by driving
it from its user interface. The difficulty in doing this is focusing
on the interesting behaviors among an unbounded number of behaviors.
We present a new technique for automatically generating tests that
drive a web-based application along interesting behaviors, where the
interesting behavior is specified in the form of "business rules."
Business rules are a general mechanism for describing business logic,
access control, or even navigational properties of an application's
GUI. Our technique is black box, in that it does not analyze the
application's server-side implementation, but relies on directed
crawling via the application's GUI. To handle the unbounded number of
GUI states, the technique includes two phases. Phase 1 creates an
abstract state-transition diagram using a relaxed notion of
equivalence of GUI states without considering rules. Next, Phase 2
identifies rule-relevant abstract paths and refines those paths using
a stricter notion of state equivalence. Our technique can be much
more effective at covering business rules than an undirected
technique, developed as an enhancement of an existing test-generation
technique. Our experiments showed that the former was able to cover
92% of the rules, compared to 52% of the rules covered by the
latter.

Test-Case Selection

We introduce a family of coverage criteria, called Multi-Point Stride
Coverage (MPSC). MPSC generalizes branch coverage to coverage of tuples
of branches taken from the execution sequence of a program.
We investigate its potential as a replacement for dataflow coverage,
such as def-use coverage. We find that programs can be instrumented
for MPSC easily, that the instrumentation usually incurs less overhead than
that for def-use coverage, and that MPSC is comparable in usefulness
to def-use in predicting test suite effectiveness. We also find that
the space required to collect MPSC can be predicted from the number
of branches in the program.

Combinatorial Test Design (CTD) is an effective test planning
technique that reveals faults resulting from feature interactions in a
system. The standard application of CTD requires manual modeling of
the test space, including a precise definition of restrictions between
the test space parameters, and produces a test suite that corresponds
to new test cases to be implemented from scratch.
In this work, we propose to use Interaction-based Test-Suite
Minimization (ITSM) as a complementary approach to standard CTD. ITSM
reduces a given test suite without impacting its coverage of feature
interactions. ITSM requires much less modeling effort, and does not
require a definition of restrictions. It is appealing where there has
been a significant investment in an existing test suite, where
creating new tests is expensive, and where restrictions are very
complex. We discuss the tradeoffs between standard CTD and ITSM, and
suggest an efficient algorithm for solving the latter. We also discuss
the challenges and additional requirements that arise when applying
ITSM to real-life test suites. We introduce solutions to these
challenges and demonstrate them through two real-life case studies.

In recent years, researchers have intensively investigated
various topics in test-case prioritization, which aims to
re-order test cases to increase the rate of fault detection during
regression testing. The total and additional prioritization strategies,
which prioritize based on total numbers of elements covered
per test, and numbers of additional (not-yet-covered) elements
covered per test, are two widely-adopted generic strategies used
for such prioritization. This paper proposes a basic model and
an extended model that unify the total strategy and the additional
strategy. Our models yield a spectrum of generic strategies
ranging between the total and additional strategies, depending
on a parameter referred to as the p value. We also propose
four heuristics to obtain differentiated p values for different
methods under test. We performed an empirical study on 19
versions of four Java programs to explore our results. Our
results demonstrate that wide ranges of strategies in our basic
and extended models with uniform p values can significantly
outperform both the total and additional strategies. In addition,
our results also demonstrate that using differentiated p values for
both the basic and extended models with method coverage can
even outperform the additional strategy using statement coverage.

Formal Analysis

Abstraction is one of the most important strategies for dealing with the state space explosion problem in model checking. With an abstract model, the state space is largely reduced, however, a counterexample found in such a model that does not satisfy the desired property may not exist in the concrete model. Therefore, how to check whether a reported counterexample is spurious is a key problem in the abstraction-refinement loop. Particularly, there are often thousands of millions of states in systems of industrial scale, how to check spurious counterexamples in these systems practically is a significant problem. In this paper, by re-analyzing spurious counterexamples, a new formal definition of spurious path is given. Based on it, efficient algorithms for detecting spurious counterexamples are presented. By the new algorithms, when dealing with infinite counterexamples, the finite prefix to be analyzed will be polynomially shorter than the one dealt by the existing algorithm. Moreover, in practical terms, the new algorithms can naturally be parallelized that makes multi-core processors contributes more in spurious counterexample checking. In addition, by the new algorithms, the state resulting in a spurious path (false state) that is hidden shallower will be reported earlier. Hence, as long as a false state is detected, lots of iterations for detecting all the false states will be avoided. Experimental results show that the new algorithms perform well along with the growth of system scale.

Symbolic analysis is indispensable for software tools that require program semantic information at compile time. However, determining symbolic values for program variables related to loops and library calls is challenging, as the computation and data related to loops can have statically unknown bounds, and the library sources are typically not available at compile time. In this paper, we propose segmented symbolic analysis, a hybrid technique that enables fully automatic symbolic analysis even for the traditionally challenging code of library calls and loops. The novelties of this work are threefold: 1) we flexibly weave symbolic and concrete executions on the selected parts of the program based on demand; 2) dynamic executions are performed on the unit tests constructed from the code segments to infer program semantics needed by static analysis; and 3) the dynamic information from multiple runs is aggregated via regression analysis. We developed the Helium framework, consisting of a static component that performs symbolic analysis and partitions a program, a dynamic analysis that synthesizes unit tests and automatically infers symbolic values for program variables, and a protocol that enables static and dynamic analyses to be run interactively and concurrently. Our experimental results show that by handling loops and library calls that a traditional symbolic analysis cannot process, segmented symbolic analysis detects 5 times more buffer overflows. The technique is scalable for real-world programs such as putty, tightvnc and snort.

Previous applications of symbolic execution (SymExe) have focused on bug-finding and test-case generation. However, SymExe has the potential to significantly improve usability and automation when applied to verification of software contracts in safety-critical systems. Due to the lack of support for processing software contracts and ad hoc approaches for introducing a variety of over/under-approximations and optimizations, most SymExe implementations cannot precisely characterize the verification status of contracts. Moreover, these tools do not provide explicit justifications for their conclusions, and thus they are not aligned with trends toward evidence-based verification and certification. We introduce the concept of "explicating symbolic execution" (xSymExe) that builds on a strong semantic foundation, supports full verification of rich software contracts, explicitly tracks where over/under-approximations are introduced or avoided, precisely characterizes the verification status of each contractual claim, and associates each claim with "explications" for its reported verification status. We report on case studies in the use of Bakar Kiasan, our open source xSymExe tool for SPARK Ada.

Scenario-finding tools such as Alloy are widely used to understand the consequences of specifications, with applications to software modeling, security analysis, and verification. This paper focuses on the exploration of scenarios: which scenarios are presented first, and how to traverse them in a well-defined way. We present Aluminum, a modification of Alloy that presents only minimal scenarios: those that contain no more than is necessary. Aluminum lets users explore the scenario space by adding to scenarios and backtracking. It also provides the ability to find what can consistently be used to extend each scenario. We describe the semantic basis of Aluminum in terms of minimal models of first-order logic formulas. We show how this theory can be implemented atop existing SAT-solvers and quantify both the benefits of minimality and its small computational overhead. Finally, we offer some qualitative observations about scenario exploration in Aluminum.

The scenario-based approach to the specification and simulation of reactive systems has attracted much research efforts in recent years. While the problem of synthesizing a controller or a transition system from a scenario-based specification has been studied extensively, no work has yet effectively addressed the case where the specification is unrealizable and a controller cannot be synthesized. This has limited the effectiveness of using scenario-based specifications in requirements analysis and simulation.
In this paper we present counter play-out, an interactive debugging method for unrealizable scenario-based specifications. When we identify an unrealizable specification, we generate a controller that plays the role of the environment and lets the engineer play the role of the system. During execution, the former chooses environment's moves such that the latter is forced to eventually fail in satisfying the system's requirements. This results in an interactive, guided execution, leading to the root causes of unrealizability. The generated controller constitutes a proof that the specification is conflicting and cannot be realized.
Counter play-out is based on a counter strategy, which we compute by solving a Rabin game using a symbolic, BDD-based algorithm. The work is implemented and integrated with PlayGo, an IDE for scenario-based programming developed at the Weizmann Institute of Science. Case studies show the contribution of our work to the state-of-the-art in the scenario-based approach to specification and simulation.

Logging system behavior is a staple development practice. Numerous powerful model inference algorithms have been proposed to aid developers in log analysis and system understanding. Unfortunately, existing algorithms are difficult to understand, extend, and compare. This paper presents InvariMint, an approach to specify model inference algorithms declaratively. We applied InvariMint to two model inference algorithms and present evaluation results to illustrate that InvariMint (1) leads to new fundamental insights and better understanding of existing algorithms, (2) simplifies creation of new algorithms, including hybrids that extend existing algorithms, and (3) makes it easy to compare and contrast previously published algorithms. Finally, algorithms specified with InvariMint can outperform their procedural versions.

Experience with lightweight formal methods suggests that programmers are willing to write specification if it brings tangible benefits to their usual development activities. This paper considers stronger specifications and studies whether they can be deployed as an incremental practice that brings additional benefits without being unacceptably expensive. We introduce a methodology that extends Design by Contract to write strong specifications of functional properties in the form of preconditions, postconditions, and invariants. The methodology aims at being palatable to developers who are not fluent in formal techniques but are comfortable with writing simple specifications. We evaluate the cost and the benefits of using strong specifications by applying the methodology to testing data structure implementations written in Eiffel and C#. In our extensive experiments, testing against strong specifications detects twice as many bugs as standard contracts, with a reasonable overhead in terms of annotation burden and run-time performance while testing. In the wide spectrum of formal techniques for software quality, testing against strong specifications lies in a "sweet spot" with a favorable benefit to effort ratio.

Analysis

We propose a novel fine-grained causal inference technique.
Given two executions and some observed differences between them,
the technique reasons about the causes of such differences.
The technique does so by state replacement, i.e. replacing part of the program
state at an earlier point to observe whether the target differences can be
induced.
It makes a number of key advances: it features a novel execution model that
avoids undesirable entangling of the replaced state and the original state;
it properly handles differences of omission by symmetrically analyzing both
executions;
it also leverages a recently developed slicing technique to limit
the scope of causality testing while ensuring that no relevant state causes can
be missed.
The application of the technique on automated debugging shows
that it substantially improves the precision and efficiency of causal
inference compared to state of the art techniques.

Languages with inheritance and polymorphism assume that a subclass instance can substitute a superclass instance without causing behavioral differences for clients of the superclass. However, programmers may accidentally create subclasses that are semantically incompatible with their superclasses. Such subclasses lead to bugs, because a programmer may assign a subclass instance to a superclass reference. This paper presents an automatic testing technique to reveal subclasses that cannot safely substitute their superclasses. The key idea is to generate generic tests that analyze the behavior of both the subclass and its superclass. If using the subclass leads to behavior that cannot occur with the superclass, the analysis reports a warning. We find a high percentage of widely used Java classes, including classes from JBoss, Eclipse, and Apache Commons Collections, to be unsafe substitutes for their superclasses: 30% of these classes lead to crashes, and even more have other behavioral differences.

Spreadsheets are widely used in industry: it is estimated that end-user programmers outnumber programmers by a factor 5. However, spreadsheets are error-prone, numerous companies have lost money because of spreadsheet errors. One of the causes for spreadsheet problems is the prevalence of copy-pasting.
In this paper, we study this cloning in spreadsheets. Based on existing text-based clone detection algorithms, we have developed an algorithm to detect data clones in spreadsheets: formulas whose values are copied as plain text in a different location.
To evaluate the usefulness of the proposed approach, we conducted two evaluations. A quantitative evaluation in which we analyzed the EUSES corpus and a qualitative evaluation consisting of two case studies. The results of the evaluation clearly indicate that 1) data clones are common, 2) data clones pose threats to spreadsheet quality and 3) our approach supports users in finding and resolving data clones.

Code Analysis

Regression verification (RV) seeks to guarantee the absence of regression errors in a changed program version. This paper presents Partition-based Regression Verification (PRV): an approach to RV based on the gradual exploration of differential input partitions. A differential input partition is a subset of the common input space of two program versions that serves as a unit of verification. Instead of proving the absence of regression for the complete input space at once, PRV verifies differential partitions in a gradual manner. If the exploration is interrupted, PRV retains partial verification guarantees at least for the explored differential partitions. This is crucial in practice as verifying the complete input space can be prohibitively expensive.
Experiments show that PRV provides a useful alternative to state-of-the-art regression test generation techniques. During the exploration, PRV generates test cases which can expose different behaviour across two program versions. However, while test cases are generally single points in the common input space, PRV can verify entire partitions and moreover give feedback that allows programmers to relate a behavioral difference to those syntactic changes that contribute to this difference.

The behavior of a software system often depends on how that system is configured. Small configuration errors can lead to hard-to-diagnose undesired behaviors. We present a technique (and its tool implementation, called ConfDiagnoser) to identify the root cause of a configuration error a single configuration option that can be changed to produce desired behavior. Our technique uses static analysis, dynamic profiling, and statistical analysis to link the undesired behavior to specific configuration options. It differs from existing approaches in two key aspects: it does not require users to provide a testing oracle (to check whether the software functions correctly) and thus is fully-automated; and it can diagnose both crashing and non-crashing errors. We evaluated ConfDiagnoser on 5 non-crashing configuration errors and 9 crashing configuration errors from 5 configurable software systems written in Java. On average, the root cause was ConfDiagnosers fifth-ranked suggestion; in 10 out of 14 errors, the root cause was one of the top 3 suggestions; and more than half of the time, the root cause was the first suggestion.

Previously, we developed a data-centric approach to concurrency control in which programmers specify synchronization constraints declaratively, by grouping shared locations into atomic sets. We implemented our ideas in a Java extension called AJ, using Java locks to implement synchronization. We proved that atomicity violations are prevented by construction, and demonstrated that realistic Java programs can be refactored into AJ without significant loss of performance. This paper presents an algorithm for detecting possible dead- lock in AJ programs by ordering the locks associated with atomic sets. In our approach, a type-based static analysis is extended to handle recursive data structures by considering programmer- supplied, compiler-verified lock ordering annotations. In an eval- uation of the algorithm, all 10 AJ programs under consideration were shown to be deadlock-free. One program needed 4 ordering annotations and 2 others required minor refactorings. For the remaining 7 programs, no programmer intervention of any kind was required.

Debugging

When software engineers fix bugs, they may have several options as to how to fix those bugs. Which fix they choose has many implications, both for practitioners and researchers: What is the risk of introducing other bugs during the fix? Is the bug fix in the same code that caused the bug? Is the change fixing the cause or just covering a symptom? In this paper, we investigate alternative fixes to bugs and present an empirical study of how engineers make design choices about how to fix bugs. Based on qualitative interviews with 40 engineers working on a variety of products, data from 6 bug triage meetings, and a survey filled out by 326 engineers, we found a number of factors, many of them non-technical, that influence how bugs are fixed, such as how close to release the software is. We also discuss several implications for research and practice, including ways to make bug prediction and localization more accurate.

Bug triaging is an important activity in any software development project. It involves developers working through the set of unassigned bugs, determining for each of the bugs whether it represents a new issue that should receive attention, and, if so, assigning it to a developer and a milestone. Current tools provide only minimal support for bug triaging and especially break down when developers must triage a large number of bug reports, since those reports can only be viewed one-by-one. This paper presents PorchLight, a novel tool that uses tags, attached to individual bug reports by queries expressed in a specialized bug query language, to organize bug reports into sets so developers can explore, work with, and ultimately assign bugs effectively in meaningful groups. We describe the challenges in supporting bug triaging, the design decisions upon which PorchLight rests, and the technical aspects of the implementation. We conclude with an early evaluation that involved six professional developers who assessed PorchLight and its potential for their day-to-day triaging duties.

We present Expositor, a new debugging environment that combines scripting and time-travel debugging to allow programmers to automate complex debugging tasks. The fundamental abstraction provided by Expositor is the execution trace, which is a time-indexed sequence of program state snapshots. Programmers can manipulate traces as if they were simple lists with operations such as map and filter. Under the hood, Expositor efficiently implements traces as lazy, sparse interval trees, whose contents are materialized on demand. Expositor also provides a novel data structure, the edit hash array mapped trie, which is a lazy implementation of sets, maps, multisets, and multimaps that enables programmers to maximize the efficiency of their debugging scripts. We have used Expositor to debug a stack overflow and to unravel a subtle data race in Firefox. We believe that Expositor represents an important step forward in improving the technology for diagnosing complex, hard-to-understand bugs.

When programs fail in the field, developers are often left with limited information to diagnose the failure. Automated error reporting tools can assist in bug report generation but without precise steps from the end user it is often difficult for developers to recreate the failure. Advanced remote debugging tools aim to capture sufficient information from field executions to recreate failures in the lab but often have too much overhead to practically deploy. We present CHRONICLER, an approach to remote debugging that captures non-deterministic inputs to applications in a lightweight manner, assuring faithful reproduction of client executions. We evaluated CHRONICLER by creating a Java implementation, CHRONICLERJ, and then by using a set of benchmarks mimicking real world applications and workloads, showing its runtime overhead to be under 10% in most cases (worst case 86%), while an existing tool showed overhead over 100% in the same cases (worst case 2,322%).

While many bug prediction algorithms have been developed by academia, they're often only tested and verified in the lab using automated means. We do not have a strong idea about whether such algorithms are useful to guide human developers. We deployed a bug prediction algorithm across Google, and found no identifiable change in developer behavior. Using our experience, we provide several characteristics that bug prediction algorithms need to meet in order to be accepted by human developers and truly change how developers evaluate their code.

Many software defect prediction approaches have been proposed and most are effective in within-project prediction settings. However, for new projects or projects with limited training data, it is desirable to learn a prediction model by using sufficient training data from existing source projects and then apply the model to some target projects (cross-project defect prediction). Unfortunately, the performance of cross-project defect prediction is generally poor, largely because of feature distribution differences between the source and target projects.
In this paper, we apply a state-of-the-art transfer learning approach, TCA, to make feature distributions in source and target projects similar. In addition, we propose a novel transfer defect learning approach, TCA+, by extending TCA. Our experimental results for eight open-source projects show that TCA+ significantly improves cross-project prediction performance.

In a manual examination of more than 7,000 issue reports from the bug databases of five open-source projects, we found 33.8% of all bug reports to be misclassified---that is, rather than referring to a code fix, they resulted in a new feature, an update to documentation, or an internal refactoring. This misclassification introduces bias in bug prediction models, confusing bugs and features: On average, 39% of files marked as defective actually never had a bug. We discuss the impact of this misclassification on earlier studies and recommend manual data validation for future studies.

Big data analytics is the process of examining large amounts of data (big data) in an effort to uncover hidden patterns or unknown correlations. Big Data Analytics Applications (BDA Apps) are a new type of software applications, which analyze big data using massive parallel processing frameworks (e.g., Hadoop). Developers of such applications typically develop them using a small sample of data in a pseudo-cloud environment. Afterwards, they deploy the applications in a large-scale cloud environment with considerably more processing power and larger input data (reminiscent of the mainframe days). Working with BDA App developers in industry over the past three years, we noticed that the runtime analysis and debugging of such applications in the deployment phase cannot be easily addressed by traditional monitoring and debugging approaches. In this paper, as a first step in assisting developers of BDA Apps for cloud deployments, we propose a lightweight approach for uncovering differences between pseudo and large-scale cloud deployments. Our approach makes use of the readily-available yet rarely used execution logs from these platforms. Our approach abstracts the execution logs, recovers the execution sequences, and compares the sequences between the pseudo and cloud deployments. Through a case study on three representative Hadoop-based BDA Apps, we show that our approach can rapidly direct the attention of BDA App developers to the major differences between the two deployments. Knowledge of such differences is essential in verifying BDA Apps when analyzing big data in the cloud. Using injected deployment faults, we show that our approach not only significantly reduces the deployment verification effort, but also provides very few false positives when identifying deployment failures.

Modern software systems are built by composing components drawn from large repositories, whose size and complexity increase at a fast pace. Software systems built with components from a release of a repository should be seamlessly upgradeable using components from the next release. Unfortunately, users are often confronted with sets of components that were installed together, but cannot be upgraded together to the latest version from the new repository. Identifying these broken sets can be of great help for a quality assurance team, that could examine and fix these issues well before they reach the end user. Building on previous work on component co-installability, we show that it is possible to find these broken sets for any two releases of a component repository, computing extremely efficiently a concise representation of these upgrade issues, together with informative graphical explanations. A tool implementing the algorithm presented in this paper is available as free software, and is able to process the evolution between two major releases of the Debian GNU/Linux distribution in just a few seconds. These results make it possible to integrate seamlessly this analysis in a repository development process.

In today's software-centric world, ultra-large-scale software repositories, e.g. SourceForge (350,000+ projects), GitHub (250,000+ projects), and Google Code (250,000+ projects) are the new library of Alexandria. They contain an enormous corpus of software and information about software. Scientists and engineers alike are interested in analyzing this wealth of information both for curiosity as well as for testing important hypotheses. However, systematic extraction of relevant data from these repositories and analysis of such data for testing hypotheses is hard, and best left for mining software repository (MSR) experts! The goal of Boa, a domain-specific language and infrastructure described here, is to ease testing MSR-related hypotheses. We have implemented Boa and provide a web-based interface to Boa's infrastructure. Our evaluation demonstrates that Boa substantially reduces programming efforts, thus lowering the barrier to entry. We also see drastic improvements in scalability. Last but not least, reproducing an experiment conducted using Boa is just a matter of re-running small Boa programs provided by previous researchers.

Process

Defect prediction techniques could potentially help us to focus quality-assurance efforts on the most defect-prone files. Modern statistical tools make it very easy to quickly build and deploy prediction models. Software metrics are at the heart of prediction models; understanding how and especially why different types of metrics are effective is very important for successful model deployment. In this paper we analyze the applicability and efficacy of process and code metrics from several different perspectives. We build many prediction models across 85 releases of 12 large open source projects to address the performance, stability, portability and stasis of different sets of metrics. Our results suggest that code metrics, despite widespread use in the defect prediction literature, are generally less useful than process metrics for prediction. Second, we find that code metrics have high stasis; they dont change very much from release to release. This leads to stagnation in the prediction models, leading to the same files being repeatedly predicted as defective; unfortunately, these recurringly defective files turn out to be comparatively less defect-dense.

Software projects involve diverse roles and artifacts that have dependencies to requirements. Project team members in different roles need to coordinate but their coordination is affected by the availability of domain knowledge, which is distributed among different project members, and organizational structures that control cross-functional communication. Our study examines how information flowed between different roles in two software projects that had contrasting distributions of domain knowledge and different communication structures. Using observations, interviews, and surveys, we examined how diverse roles working on requirements and their related artifacts coordinated along task dependencies. We found that communication only partially matched task dependencies and that team members that are boundary spanners have extensive domain knowledge and hold key positions in the control structure. These findings have implications on how organizational structures interfere with task assignments and influence communication in the project, suggesting how practitioners can adjust team configuration and communication structures.

Work practices vary among software developers. Some are highly focused on a few artifacts; others make wide-ranging contributions. Similarly, some artifacts are mostly authored, or owned, by one or few developers; others have very wide ownership. Focus and ownership are related but different phenomena, both with strong effect on software quality. Prior studies have mostly targeted ownership; the measures of ownership used have generally been based on either simple counts, information-theoretic views of ownership, or social-network views of contribution patterns. We argue for a more general conceptual view that unifies developer focus and artifact ownership. We analogize the developer-artifact contribution network to a predator-prey food web, and draw upon ideas from ecology to produce a novel, and conceptually unified view of measuring focus and ownership. These measures relate to both cross-entropy and Kullback-Liebler divergence, and simultaneously provide two normalized measures of focus from both the developer and artifact perspectives. We argue that these measures are theoretically well-founded, and yield novel predictive, conceptual, and actionable value in software projects. We find that more focused developers introduce fewer defects than defocused developers. In contrast, files that receive narrowly focused activity are more likely to contain defects than other files.

Software Engineering and development is well- known to suffer from unplanned overtime, which causes stress and illness in engineers and can lead to poor quality software with higher defects. In this paper, we introduce a multi-objective decision support approach to help balance project risks and duration against overtime, so that software engineers can better plan overtime. We evaluate our approach on 6 real world software projects, drawn from 3 organisations using 3 standard evaluation measures and 3 different approaches to risk assessment. Our results show that our approach was significantly better (p < 0.05) than standard multi-objective search in 76% of experiments (with high Cohen effect size in 85% of these) and was significantly better than currently used overtime planning strategies in 100% of experiments (with high effect size in all). We also show how our approach provides actionable overtime planning results and inves- tigate the impact of the three different forms of risk assessment.

Model checking techniques for software product lines (SPL) are actively researched. A major limitation they currently have is the inability to deal efficiently with non-Boolean features and multi-features. An example of a non-Boolean feature is a numeric attribute such as maximum number of users which can take different numeric values across the range of SPL products. Multi-features are features that can appear several times in the same product, such as processing units which number is variable from one product to another and which can be configured independently. Both constructs are extensively used in practice but currently not supported by existing SPL model checking techniques. To overcome this limitation, we formally define a language that integrates these constructs with SPL behavioural specifications. We generalize SPL model checking algorithms correspondingly and evaluate their applicability. Our results show that the algorithms remain efficient despite the generalization.

Product-line technology is increasingly used in mission-critical and safety-critical applications. Hence, researchers are developing verification approaches that follow different strategies to cope with the specific properties of product lines. While the research community is discussing the mutual strengths and weaknesses of the different strategies—mostly at a conceptual level—there is a lack of evidence in terms of case studies, tool implementations, and experiments. We have collected and prepared six product lines as subject systems for experimentation. Furthermore, we have developed a model-checking tool chain for C-based and Java-based product lines, called SPLVERIFIER, which we use to compare sample-based and family-based strategies with regard to verification performance and the ability to find defects. Based on the experimental results and an analytical model, we revisit the discussion of the strengths and weaknesses of product-line–verification strategies.

Software design is a process of trading off competing objectives. If the user objective space is rich, then we should use optimizers that can fully exploit that richness. For example, this study configures software product lines (expressed as feature maps) using various search-based software engineering methods. As we increase the number of optimization objectives, we find that methods in widespread use (e.g. NSGA-II, SPEA2) perform much worse than IBEA (Indicator-Based Evolutionary Algorithm). IBEA works best since it makes most use of user preference knowledge. Hence it does better on the standard measures (hypervolume and spread) but it also generates far more products with 0% violations of domain constraints. Our conclusion is that we need to change our methods for search-based software engineering, particularly when studying complex decision spaces.

Search-Based SE

Adding features and fixing bugs often require sys- tematic edits that make similar, but not identical, changes to many code locations. Finding all the relevant locations and making the correct edits is a tedious and error-prone process for developers. This paper addresses both problems using edit scripts learned from multiple examples. We design and implement a tool called LASE that (1) creates a context-aware edit script from two or more examples, and uses the script to (2) automatically identify edit locations and to (3) transform the code. We evaluate LASE on an oracle test suite of systematic edits from Eclipse JDT and SWT. LASE finds edit locations with 99% precision and 89% recall, and transforms them with 91% accuracy. We also evaluate LASE on 37 example systematic edits from other open source programs and find LASE is accurate and effective. Furthermore, we confirmed with developers that LASE found edit locations which they missed. Our novel algorithm that learns from multiple examples is critical to achieving high precision and recall; edit scripts created from only one example produce too many false positives, false negatives, or both. Our results indicate that LASE should help developers in automating systematic editing. Whereas most prior work either suggests edit locations or performs simple edits, LASE is the first to do both for nontrivial program edits.

Migrating existing enterprise software to cloud platforms involves the comparison of competing cloud deployment options (CDOs). A CDO comprises a combination of a specific cloud environment, deployment architecture, and runtime reconfiguration rules for dynamic resource scaling. Our simulator CDOSim can evaluate CDOs, e.g., regarding response times and costs. However, the design space to be searched for well-suited solutions is extremely huge. In this paper, we approach this optimization problem with the novel genetic algorithm CDOXplorer. It uses techniques of the search-based software engineering field and CDOSim to assess the fitness of CDOs. An experimental evaluation that employs, among others, the cloud environments Amazon EC2 and Microsoft Windows Azure, shows that CDOXplorer can find solutions that surpass those of other state-of-the-art techniques by up to 60%. Our experiment code and data and an implementation of CDOXplorer are available as open source software.

Information Retrieval (IR) methods, and in particular topic models, have recently been used to support essential software engineering (SE) tasks, by enabling software textual retrieval and analysis. In all these approaches, topic models have been used on software artifacts in a similar manner as they were used on natural language documents (e.g., using the same settings and parameters) because the underlying assumption was that source code and natural language documents are similar. However, applying topic models on software data using the same settings as for natural language text did not always produce the expected results.
Recent research investigated this assumption and showed that source code is much more repetitive and predictable as compared to the natural language text. Our paper builds on this new fundamental finding and proposes a novel solution to adapt, configure and effectively use a topic modeling technique, namely Latent Dirichlet Allocation (LDA), to achieve better (acceptable) performance across various SE tasks. Our paper introduces a novel solution called LDA-GA, which uses Genetic Algorithms (GA) to determine a near-optimal configuration for LDA in the context of three different SE tasks: (1) traceability link recovery, (2) feature location, and (3) software artifact labeling. The results of our empirical studies demonstrate that LDA-GA is ableto identify robust LDA configurations, which lead to a higher accuracy on all the datasets for these SE tasks as compared to previously published results, heuristics, and the results of a combinatorial search.

Performance

This paper introduces GREEN STREAMS, a novel solution to address a critical but often overlooked property of data-intensive software: energy efficiency. GREEN STREAMS is built around two key insights into data-intensive software. First, energy consumption of data-intensive software is strongly correlated to data volume and data processing, both of which are naturally abstracted in the stream programming paradigm; Second, energy efficiency can be improved if the data processing components of a stream program coordinate in a “balanced” way, much like an assembly line that runs most efficiently when participating workers coordinate their pace. GREEN STREAMS adopts a standard stream programming model, and applies Dynamic Voltage and Frequency Scaling (DVFS) to coordinate
the pace of data processing among components, ultimately achieving energy efficiency without degrading performance in a parallel processing environment. At the core of GREEN STREAMS is a novel constraint-based inference to abstract the intrinsic relationships of data flow rates inside a stream program, that uses linear programming to minimize the frequencies – hence the energy consumption – for processing components while still maintaining the maximum output data flow rate. The core algorithm of GREEN STREAMS is formalized, and its optimality is established. The effectiveness of GREEN STREAMS is evaluated on top of the StreamIt framework, and preliminary results show the approach can save CPU energy by an average of 28% with a 7% performance improvement.

Service composition makes use of existing service-based applications as components to achieve a business goal. In time critical business environments, the response time of a service is crucial, which is also reflected as a clause in service level agreements (SLAs) between service providers and service users. To allow the composite service to fulfill the response time requirement as promised, it is important to find a feasible set of component services, such that their response time could collectively allow the satisfaction of the response time of the composite service. In this work, we propose a fully automated approach to synthesize the response time requirement of component services, in the form of a constraint on the local response times, that guarantees the global response time requirement. Our approach is based on parameter synthesis techniques for real-time systems. It has been implemented and evaluated with real-world case studies.

Performance problems pose a significant risk to software vendors. If left
undetected, they can lead to lost customers, increased operational costs, and
damaged reputation.
Despite all efforts, software engineers cannot fully prevent performance
problems being introduced into an application.
Detecting and resolving such problems as early as possible with minimal effort
is still an open challenge in software performance engineering.
In this paper, we present a novel approach for Performance Problem Diagnostics
(PPD) that systematically searches for well-known performance problems (also
called performance antipatterns) within an application.
PPD automatically isolates the problem's root cause, hence facilitating problem
solving.
We applied PPD to a well established transactional web e-Commerce
benchmark (TPC-W) in two deployment scenarios. PPD automatically identified
four performance problems in the benchmark implementation and its deployment
environment.
By fixing the problems, we increased the maximum throughput of the benchmark
from 1800 requests per second to more than 3500.

Performance bugs are programming errors that create significant performance degradation. While developers often use automated oracles for detecting functional bugs, detecting performance bugs usually requires time-consuming, manual analysis of execution profiles. The human effort for performance analysis limits the number of performance tests analyzed and enables performance bugs to easily escape to production. Unfortunately, while profilers can successfully localize slow executing code, profilers cannot be effectively used as automated oracles. This paper presents TODDLER, a novel automated oracle for performance bugs, which enables testing for performance bugs to use the well established and automated process of testing for functional bugs. TODDLER reports code loops whose computation has repetitive and partially similar memory-access patterns across loop iterations. Such repetitive work is likely unnecessary and can be done faster. We implement TODDLER for Java and evaluate it on 9 popular Java codebases. Our experiments with 11 previously known, real-world performance bugs show that TODDLER finds these bugs with a higher accuracy than the standard Java profiler. Using TODDLER, we also found 42 new bugs in six Java projects: Ant, Google Core Libraries, JUnit, Apache Collections, JDK, and JFreeChart. Based on our bug reports, developers so far fixed 10 bugs and confirmed 6 more as real bugs.

Studying human analyst's behavior in automated tracing is a new research thrust. Building on a growing body of work in this area, we offer a novel approach to understanding requirements analyst's information seeking and gathering. We model analysts as predators in pursuit of prey --- the relevant traceability information, and leverage the optimality models to characterize a rational decision process. The behavior of real analysts with that of the optimal information forager is then compared and contrasted. The results show that the analysts' information diets are much wider than the theory's predictions, and their residing in low-profitability information patches is much longer than the optimal residence time. These uncovered discrepancies not only offer concrete insights into the obstacles faced by analysts, but also lead to principled ways to increase practical tool support for overcoming the obstacles.

User feedback is imperative in improving software quality. In this paper, we explore the rich set of user feedback available for third party mobile applications as a way to extract new/changed requirements for next versions. A potential problem using this data is its volume and the time commitment involved in extracting new/changed requirements. Our goal is to alleviate part of the process through automatic topic extraction. We process user comments to extract the main topics mentioned as well as some sentences representative of those topics. This information can be useful for requirements engineers to revise the requirements for next releases. Our approach relies on adapting information retrieval techniques including topic modeling and evaluating them on different publicly available data sets. Results show that the automatically extracted topics match the manually extracted ones, while also significantly decreasing the manual effort.

Requirements modelling helps software engineers understand a system’s required behaviour and explore alternative system designs. It also generates a formal software specification that can be used for testing, verification, and debugging. However, elaborating such models requires expertise and significant human effort. The paper aims at reducing this effort by automating an essential activity of requirements modelling which consists in deriving a machine specification satisfying a set of goals in a domain. It introduces deontic input-output automata —an extension of input-output automata with permissions and obligations— and an automated synthesis technique over this formalism to support such derivation. This technique helps modellers identifying early when a goal is not realizable in a domain and can guide the exploration of alternative models to make goals realizable. Synthesis techniques for input-output or interface automata are not adequate for requirements modelling.

Reliability

Model-based reliability estimation of software systems can provide useful insights early in the development process. However, computational complexity of estimating reliability metrics such as mean time to first failure (MTTF) can be prohibitive both in time, space and precision. In this paper we present an alternative to exhaustive model exploration-as in probabilistic model checking-and partial random exploration--as in statistical model checking. Our hypothesis is that a (carefully crafted) partial systematic exploration of a system model can provide better bounds for reliability metrics at lower computation cost. We present a novel automated technique for reliability estimation that combines simulation, invariant inference and probabilistic model checking. Simulation produces a probabilistically relevant set of traces from which a state invariant is inferred. The invariant characterises a partial model which is then exhaustively explored using probabilistic model checking. We report on experiments that suggest that reliability estimation using this technique can be more effective than (full model) probabilistic and statistical model checking for system models with rare failures.

Software systems are constantly evolving, with new versions and patches being released on a continuous basis. Unfortunately, software updates present a high risk, with many releases introducing new bugs and security vulnerabilities.
We tackle this problem using a simple but effective multi-version based approach. Whenever a new update becomes available, instead of upgrading the software to the new version, we run the new version in parallel with the old one; by carefully coordinating their executions and selecting the behaviour of the more reliable version when they diverge, we create a more secure and dependable multi-version application.
We implemented this technique in Mx, a system targeting Linux applications running on multi-core processors, and show that it can be applied successfully to several real applications such as Coreutils, a set of user-level UNIX applications; Lighttpd, a popular web server used by several high-traffic websites such as Wikipedia and YouTube; and Redis, an advanced key-value data structure server used by many well-known services such as GitHub and Flickr.

Software reliability analysis tackles the problem of predicting the failure probability of software. Most of the current approaches base reliability analysis on architectural abstractions useful at early stages of design, but not directly applicable to source code. In this paper we propose a general methodology that exploit symbolic execution of source code for extracting failure and success paths to be used for probabilistic reliability assessment against relevant usage scenarios. Under the assumption of finite and countable input domains, we provide an efficient implementation based on Symbolic PathFinder that supports the analysis of sequential and parallel programs, even with structured data types, at the desired level of confidence. The tool has been validated on both NASA prototypes and other test cases showing a promising applicability scope.

Applications that continuously gather and disclose personal information about users are increasingly common. While disclosing this information may be essential for these applications to function, it may also raise privacy concerns. Partly, this is due to frequently changing context that introduces new privacy threats, and makes it difficult to continuously satisfy privacy requirements. To address this problem, applications may need to adapt in order to manage changing privacy concerns. Thus, we propose a framework that exploits the notion of privacy awareness requirements to identify runtime privacy properties to satisfy. These properties are used to support disclosure decision making by applications. Our evaluations suggest that applications that fail to satisfy privacy awareness requirements cannot regulate users information disclosure. We also observe that the satisfaction of privacy awareness requirements is useful to users aiming to minimise exposure to privacy threats, and to users aiming to maximise functional benefits amidst increasing threat severity.

In previous work, we proposed a set of static attributes that characterize input validation and input sanitization code patterns. We showed that some of the proposed static attributes are significant predictors of SQL injection and cross site scripting vulnerabilities. Static attributes have the advantage of reflecting general properties of a program. Yet, dynamic attributes collected from execution traces may reflect more specific code characteristics that are complementary to static attributes. Hence, to improve our initial work, in this paper, we propose the use of dynamic attributes to complement static attributes in vulnerability prediction. Furthermore, since existing work relies on supervised learning, it is dependent on the availability of training data labeled with known vulnerabilities. This paper presents prediction models that are based on both classification and clustering in order to predict vulnerabilities, working in the presence or absence of labeled training data, respectively. In our experiments across six applications, our new supervised vulnerability predictors based on hybrid (static and dynamic) attributes achieved, on average, 90% recall and 85% precision, that is a sharp increase in recall when compared to static analysis-based predictions. Though not nearly as accurate, our unsupervised predictors based on clustering achieved, on average, 76% recall and 39% precision, thus suggesting they can be useful in the absence of labeled training data.

Remote code execution (RCE) attacks are one of the most prominent security threats for web applications. It is a special kind of cross-site-scripting (XSS) attack that allows client inputs to be stored and executed as server side scripts. RCE attacks often require coordination of multiple requests and manipulation of string and non-string inputs from the client side to nullify the access control protocol and induce unusual execution paths on the server side. We propose a path- and context-sensitive interprocedural analysis to detect RCE vulnerabilities. The analysis features a novel way of analyzing both the string and non-string behavior of a web application in a path sensitive fashion. It thoroughly handles the practical challenges entailed by modeling RCE attacks. We develop a prototype system and evaluate it on ten real-world PHP applications. We have identified 21 true RCE vulnerabilities, with 8 unreported before.

Reviewing software system architecture to pinpoint potential security flaws before proceeding with system development is a critical milestone in secure software development lifecycles. This includes identifying possible attacks or threat scenarios that target the system and may result in breaching of system security. Additionally we may also assess the strength of the system and its security architecture using well-known security metrics such as system attack surface, Compartmentalization, least-privilege, etc. However, existing efforts are limited to specific, predefined security properties or scenarios that are checked either manually or using limited toolsets. We introduce a new approach to support architecture security analysis using security scenarios and metrics. Our approach is based on formalizing attack scenarios and security metrics signature specification using the Object Constraint Language (OCL). Using formal signatures we analyse a target system to locate signature matches (for attack scenarios), or to take measurements (for security metrics). New scenarios and metrics can be incorporated and calculated provided that a formal signature can be specified. Our approach supports defining security metrics and scenarios at architecture, design, and code levels. We have developed a prototype software system architecture security analysis tool. To the best of our knowledge this is the first extensible architecture security risk analysis tool that supports both metric-based and scenario-based architecture security analysis. We have validated our approach by using it to capture and evaluate signatures from the NIST security principals and attack scenarios defined in the CAPEC database.

Analysis Studies

Using static analysis tools for automating code inspections can be beneficial for software engineers. Such tools can make finding bugs, or software defects, faster and cheaper than manual inspections. Despite the benefits of using static analysis tools to find bugs, research suggests that these tools are underused. In this paper, we investigate why developers are not widely using static analysis tools and how current tools could potentially be improved. We conducted interviews with 20 developers and found that although all of our participants felt that use is beneficial, false positives and the way in which the warnings are presented, among other things, are barriers to use. We discuss several implications of these results, such as the need for an interactive mechanism to help developers fix defects.

Code smells are indicators of issues with source code quality that may hinder evolution. While previous studies mainly focused on the effects of individual code smells on maintainability, we conjecture that not only the individual code smells but also the interactions between code smells affect maintenance. We empirically investigate the interactions amongst 12 code smells and analyze how those interactions relate to maintenance problems. Professional developers were hired for a period of four weeks to implement change requests on four medium-sized Java systems with known smells. On a daily basis, we recorded what specific problems they faced and which artifacts were associated with them. Code smells were automatically detected in the pre-maintenance versions of the systems and analyzed using Principal Component Analysis (PCA) to identify patterns of co-located code smells. Analysis of these factors with the observed maintenance problems revealed how smells that were co-located in the same artifact interacted with each other, and affected maintainability. Moreover, we found that code smell interactions occurred across coupled artifacts, with comparable negative effects as same-artifact co-location. We argue that future studies into the effects of code smells on maintainability should integrate dependency analysis in their process so that they can obtain a more complete understanding by including such coupled interactions.

Coupling is a fundamental property of software systems, and numerous coupling measures have been proposed to support various development and maintenance activities. However, little is known about how developers actually perceive coupling, what mechanisms constitute coupling, and if existing measures align with this perception.
In this paper we bridge this gap, by empirically investigating how class coupling---as captured by structural, dynamic, semantic, and logical coupling measures---aligns with developers' perception of coupling. The study has been conducted on three Java open-source systems---namely ArgoUML, JHotDraw and jEdit---and involved 64 students, academics, and industrial practitioners from around the world, as well as 12 active developers of these three systems.
We asked participants to assess the coupling between the given pairs of classes and provide their ratings and some rationale. The results indicate that the peculiarity of the semantic coupling measure allows it to better estimate the mental model of developers than the other coupling measures. This is because, in several cases, the interactions between classes are encapsulated in the source code vocabulary, and cannot be easily derived by only looking at structural relationships, such as method calls.

Empirical Studies

Due to the increasing popularity of web applications, and the number of browsers and platforms on which such applications can be executed, cross-browser incompatibilities (XBIs) are becoming a serious concern for organizations that develop web-based software. Most of the techniques for XBI detection developed to date are either manual, and thus costly and error-prone, or partial and imprecise, and thus prone to generating both false positives and false negatives. To address these limitations of existing techniques, we developed X-PERT, a new automated, precise, and comprehensive approach for XBI detection. X-PERT combines several new and existing differencing techniques and is based on our findings from an extensive study of XBIs in real-world web applications. The key strength of our approach is that it handles each aspects of a web application using the differencing technique that is best suited to accurately detect XBIs related to that aspect. Our empirical evaluation shows that X-PERT is effective in detecting real-world XBIs, improves on the state of the art, and can provide useful support to developers for the diagnosis and (eventually) elimination of XBIs.

Code review is a common software engineering practice employed both in open source and industrial contexts. Review today is less formal and more lightweight than the code inspections performed and studied in the 70s and 80s. We empirically explore the motivations, challenges, and outcomes of tool-based code reviews. We observed, interviewed, and surveyed developers and managers and manually classified hundreds of review comments across diverse teams at Microsoft. Our study reveals that while finding defects remains the main motivation for review, reviews are less about defects than expected and instead provide additional benefits such as knowledge transfer, increased team awareness, and creation of alternative solutions to problems. Moreover, we find that code and change understanding is the key aspect of code reviewing and that developers employ a wide range of mechanisms to meet their understanding needs, most of which are not met by current tools. We provide recommendations for practitioners and researchers.

UML has been described by some as "the lingua franca" of software engineering. Evidence from industry does not necessarily support such endorsements. How exactly is UML being used in industry if it is? This paper presents a corpus of interviews with 50 professional software engineers in 50 companies and identifies 5 patterns of UML use.

Software conflicts arising because of conflicting changes are a regular occurrence and delay projects. The main precept of workspace awareness tools has been to identify potential conflicts early, while changes are still small and easier to resolve. However, in this approach conflicts still occur and require developer time and effort to resolve. We present a novel conflict minimization technique that proactively identifies potential conflicts, encodes them as constraints, and solves the constraint space to recommend a set of conflict-minimal development paths for the team. Here we present a study of four open source projects to characterize the distribution of conflicts and their resolution efforts. We then explain our conflict minimization technique and the design and implementation of this technique in our prototype, Cassandra. We show that Cassandra would have successfully avoided a majority of conflicts in the four open source test subjects. We demonstrate the efficiency of our approach by applying the technique to a simulated set of scenarios with higher than normal incidence of conflicts.

Programming Support

Object ownership enforces encapsulation within object-oriented programs by forbidding incoming aliases into objects' representations. Many common data structures, such as collections with iterators, require incoming aliases, so there has been much work on relaxing ownership's encapsulation to permit multiple incoming aliases. This research asks the opposite question: Are your aliases really necessary?
In this paper, we count the cost of programming with strong object encapsulation. We refactored the JDK 5.0 collection classes so that they did not use incoming aliases, following either the owner-as-dominator or the owner-as-accessor encapsulation discipline. We measured the performance time overhead the refactored collections impose on a set of microbenchmarks and on the DaCapo, SPECjbb and SPECjvm benchmark suites. While the microbenchmarks show that individual operations and iterations can be significantly slower on encapsulated collection (especially for owner-as-dominator), we found less than 3% slowdown for owner-as-accessor across the large scale benchmarks.
As a result, we propose that well-known design patterns such as Iterator commonly used by software engineers around the world need to be adjusted to take ownership into account. As most design patterns are used as a building block in constructing larger pieces of software, a small adjustment to respect ownership will not have any impact on the productivity of programmers but will have a huge impact on the quality of the resulting code with respect to aliasing.

The rapid rise of JavaScript as one of the most popular programming languages of the present day has led to a demand for sophisticated IDE support similar to what is available for Java or C#. However, advanced tooling is hampered by the dynamic nature of the language, which makes any form of static analysis very difficult. We single out efficient call graph construction as a key problem to be solved in order to improve development tools for JavaScript. To address this problem, we present a scalable field-based flow analysis for constructing call graphs. Our evaluation on large real-world programs shows that the analysis, while in principle unsound, produces highly accurate call graphs in practice. Previous analyses do not scale to these programs, but our analysis handles them in a matter of seconds, thus proving its suitability for use in an interactive setting.

Feature location is a human-oriented and information-intensive process. When performing feature location tasks with existing tools, developers often feel it difficult to formulate an accurate feature query (e.g., keywords) and determine the relevance of returned results. In this paper, we propose a feature location approach that supports multi-faceted interactive program exploration. Our approach automatically extracts and mines multiple syntactic and semantic facets from candidate program elements. Furthermore, it allows developers to interactively group, sort, and filter feature location results in a centralized, multi-faceted, and intelligent search User Interface (UI). We have implemented our approach as a web-based tool MFIE and conducted an experimental study. The results show that the developers using MFIE can accomplish their feature location tasks 32% faster and the quality of their feature location results (in terms of F-measure) is 51% higher than that of the developers using regular Eclipse IDE.

Program Repair

Debugging consumes significant time and effort in any major software development project. Moreover, even after the root cause of a bug is identified, fixing the bug is non-trivial. Given this situation, automated program repair methods are of value. In this paper, we present an automated repair method based on symbolic execution, constraint solving and program synthesis. In our approach, the requirement on the repaired code to pass a given set of tests is formulated as a constraint. Such a constraint is then solved by iterating over a layered space of repair expressions, layered by the complexity of the repair code. We compare our method with recently proposed genetic programming based repair on SIR programs with seeded bugs, as well as fragments of GNU Coreutils with real bugs. On these subjects, our approach reports a higher success-rate than genetic programming based repair, and produces a repair faster.

We present a technique to make applications resilient to failures. This technique is intended to maintain a faulty application functional in the field while the developers work on permanent and radical fixes. We target field failures in applications built on reusable components. In particular, the technique exploits the intrinsic redundancy of those components by identifying workarounds consisting of alternative uses of the faulty components that avoid the failure. The technique is currently implemented for Java applications but makes little or no assumptions about the nature of the application, and works without interrupting the execution flow of the application and without restarting its components. We demonstrate and evaluate this technique on four mid-size applications and two popular libraries of reusable components affected by real and seeded faults. In these cases the technique is effective, maintaining the application fully functional with between 19% and 48% of the failure-causing faults, depending on the application. The experiments also show that the technique incurs an acceptable runtime overhead in all cases.

C makes it easy to misuse integer types; even mature programs harbor many badly-written integer code. Traditional approaches at best detect these problems; they cannot guide developers to write correct code. We describe three program transformations that fix integer problems---one explicitly introduces casts to disambiguate type mismatch, another adds runtime checks to arithmetic operations, and the third one changes the type of a wrongly-declared integer. Together, these transformations fixed all variants of integer problems featured in 7,147 programs of NIST's SAMATE reference dataset, making the changes automatically on over 15 million lines of code. We also applied the transformations automatically on 5 open source software. The transformations made hundreds of changes on over 700,000 lines of code, but did not break the programs. Being integrated with source code and development process, these program transformations can fix integer problems, along with developers' misconceptions about integer usage.

Patch generation is an essential software maintenance task because most software systems inevitably have bugs that need to be fixed. Unfortunately, human resources are often insufficient to fix all reported and known bugs. To address this issue, several automated patch generation techniques have been proposed. In particular, a genetic-programming-based patch generation technique, GenProg, proposed by Weimer et al., has shown promising results. However, these techniques can generate nonsensical patches due to the randomness of their mutation operations. To address this limitation, we propose a novel patch generation approach, Pattern-based Automatic program Repair (PAR), using fix patterns learned from existing human-written patches. We manually inspected more than 60,000 human-written patches and found there are several common fix patterns. Our approach leverages these fix patterns to generate program patches automatically. We experimentally evaluated PAR on 119 real bugs. In addition, a user study involving 89 students and 164 developers confirmed that patches generated by our approach are more acceptable than those generated by GenProg. PAR successfully generated patches for 27 out of 119 bugs, while GenProg was successful for only 16 bugs.

Tools

The web is an important source of development-related resources, such as code examples, tutorials, and API documentation. Yet existing development environments are largely disconnected from these resources. In this work, we explore how to provide useful web page recommendations to developers by focusing on the problem of refinding web pages that a developer has previously used. We present the results of a study about developer browsing activity in which we found that 13.7% of developers visits to code-related pages are revisits and that only a small fraction (7.4%) of these were initiated through a low-cost mechanism, such as a bookmark. To assist with code-related revisits, we introduce Reverb, a tool which recommends previously visited web pages that pertain to the code visible in the developer's editor. Through a field study, we found that, on average, Reverb can recommend a useful web page in 51% of revisitation cases.

Software Engineering in general is a very creative
process, especially in the early stages of development like requirements
engineering or architectural design where sketching
techniques are used to manifest ideas and share thoughts. On
the one hand, a lot of diagram tools with sophisticated editing
features exist, aiming to support the engineers for this task.
On the other hand, research has shown that most formal tools
limit designer’s creativity by restricting input to valid data.
This raises the need for combining the flexibility of sketchbased
input with the power of formal tools. With an increasing
amount of available touch-enabled input devices, plenty of tools
supporting these and similar features were created but either they
require the developer to use a special diagram editor generation
framework or have very limited extension capabilities. In this
paper we propose Scribble: A generic, extensible framework
which brings sketching functionality to any new or existing GEF
based diagram editor in the Eclipse ecosystem. Sketch features
can be dynamically injected and used without writing a single
line of code. We designed Scribble to be open for new shape
recognition algorithms and to provide a great degree of user
control. We successfully tested Scribble in three diagram tools,
each having a different level of complexity.

To access the knowledge contained in developer communication, such as forum posts, it is useful to determine automatically the code elements referred to in the discussions. We propose a novel traceability recovery approach to extract the code elements contained in various documents. As opposed to previous work, our approach does not require an index of code elements to find links, which makes it particularly well-suited for the analysis of informal documentation. When evaluated on 188 StackOverflow answer posts containing 993 code elements, the technique performs with average 0.92 precision and 0.90 recall. As a major refinement on traditional traceability approaches, we also propose to detect which of the code elements in a document are salient, or germane, to the topic of the post. To this end we developed a three-feature decision tree classifier that performs with a precision of 0.65-0.74 and recall of 0.30-0.65, depending on the subject of the document.

There are more than twenty distinct software engineering tasks addressed with text retrieval (TR) techniques, such as, traceability link recovery, feature location, refactoring, reuse, etc. A common issue with all TR applications is that the results of the retrieval depend largely on the quality of the query. When a query performs poorly, it has to be reformulated and this is a difficult task for someone who had trouble writing a good query in the first place.
We propose a recommender (called Refoqus) based on machine learning, which is trained with a sample of queries and relevant results. Then, for a given query, it automatically recommends a reformulation strategy that should improve its performance, based on the properties of the query. We evaluated Refoqus empirically against four baseline approaches that are used in natural language document retrieval. The data used for the evaluation corresponds to changes from five open source systems in Java and C++ and it is used in the context of TR-based concept location in source code. Refoqus outperformed the baselines and its recommendations lead to query performance improvement or preservation in 84% of the cases (in average).

Pamela Samuelson is recognized as a pioneer in digital copyright law, intellectual property, cyberlaw and information policy. She has written and spoken extensively about the challenges that new information technologies are posing for public policy and traditional legal regimes. Since 1996, she has held a joint appointment with the Berkeley Law School and the School of Information. She is the director of the Berkeley Center for Law and Technology, serves on the board of directors of the Electronic Frontier Foundation and the Electronic Privacy Information Center, and on advisory boards for the Public Knowledge, and the Berkeley Center for New Media. She is also an advisor for the Samuelson Law, Technology, and Public Policy Clinic. Since 2002, she has also been an honorary professor at the University of Amsterdam.

Tony DeRose is currently a Senior Scientist and lead of the Research Group at Pixar Animation Studios. He received a BS in Physics in from the University of California, Davis, and a Ph.D. in Computer Science from the University of California, Berkeley. From 1986 to 1995 Dr. DeRose was a Professor of Computer Science and Engineering at the University of Washington. In 1998, he was a major contributor to the Oscar (c) winning short film "Geri's game", in 1999 he received the ACM SIGGRAPH Computer Graphics Achievement Award, and in 2006 he received a Scientific and Technical Academy Award (c) for his work on surface representations. In addition to his research interests, Tony is also involved in a number of initiatives to help make math, science, and engineering education more inspiring and relevant for middle and high school students. One such initiative is the Young Makers Program (youngmakers.org) that supports youth in building ambitious hands-on projects of their own choosing.

In 2006, Ultra-Large-Scale Systems: The Software Challenge of the Future (ISBN 0-9786956-0-7) documented the results of a year-long study on ultra-large, complex, distributed systems. Ultra-large-scale (ULS) systems are socio-technical ecosystems of ultra-large size on one or many dimensions number of lines of code; number of people employing the system for different purposes; amount of data stored, accessed, manipulated, and refined; number of connections and interdependencies among software components; number of hardware elements to which they interface. The characteristics of such systems require changes in traditional software development and management practices, which in turn require a new multi-disciplinary perspective and research. A carefully prescribed research agenda was suggested. What has happened since the study results were published? This talk shares a perspective on the post study reality --- a perspective based on research motivated by the study and direct experiences with ULS systems. Linda Northrop is director of the Research, Technology, and Systems Solution Program at the Software Engineering Institute (SEI) where she leads the work in architecture-centric engineering, software product lines, cyber-physical systems, advanced mobile systems, and ultra-large-scale systems. Linda is coauthor of the book Software Product Lines: Practices and Patterns and led the research group on ultra-large-scale systems that resulted in the book, Ultra-Large-Scale Systems: The Software Challenge of the Future. Before joining the SEI, she was associated with both the United States Air Force Academy and the State University of New York as professor of computer science, and with both Eastman Kodak and IBM as a software engineer. She is an SEI Fellow and an ACM Distinguished Member.

The term Technical Debt was coined over 20 years ago by Ward Cunningham in a 1992 OOPSLA experience report to describe the trade-offs between delivering the most appropriate albeit likely immature product, in the shortest time possible. Since then the repercussions of going into technical debt have become more visible, yet not necessarily more broadly understood. This panel will bring together practitioners to discuss and debate strategies for debt relief.

Agile development methods are growing in popularity with a recent survey reporting that more than 80% of organizations now following an agile approach. Agile methods were seen initially as best suited to small, co-located teams developing non-critical systems. The first two constraining characteristics (small and co-located teams) have been addressed as research has emerged describing successful agile adoption involving large teams and distributed contexts. However, the applicability of agile methods for developing safety-critical systems in regulated environments has not yet been demonstrated unequivocally, and very little rigorous research exists in this area. Some of the essential characteristics of agile approaches appear to be incompatible with the constraints imposed by regulated environments. In this study we identify these tension points and illustrate through a detailed case study how an agile approach was implemented successfully in a regulated environment. Among the interesting concepts to emerge from the research are the notions of continuous compliance and living traceability.

Agility without discipline cannot scale, and discipline without agility cannot compete. Agile methods are now mainstream. Software enterprises are adopting these practices in broad, comprehensive delivery contexts. There have been many successes, and there have been disappointments. IBMs framework for achieving agility at scale is based on hundreds of successful deployments and dozens of disappointing experiences in accelerating software delivery cycles within large-scale organizations. Our collective know-how points to three key principles to deliver measured improvements in agility with high confidence: Steer using economic governance, measure incremental improvements honestly, and empower teams with disciplined agile delivery This paper elaborates these three principles and presents practical recommendations for achieving improved agility in large-scale software delivery enterprises.

We offer a case study illustrating three rules for reporting research to industrial practitioners. Firstly, report relevant results; e.g. this paper explores the effects of dis- tributed development on software products. Second: recheck old results if new results call them into question. Many papers say distributed development can be harmful to software quality. Previous work by Bird et al. allayed that concern but a recent paper by Posnett et al. suggests that the Bird result was biased by the kinds of files it explored. Hence, this paper rechecks that result and finds significant differences in Microsoft products (Office 2010) between software built by distributed or collocated teams. At first glance, this recheck calls into question the widespread practice of distributed development. Our third rule is to reflect on results to avoid confusing practitioners with an arcane mathematical analysis. For example, on reflection, we found that the effect size of the differences seen in the collocated and distributed software was so small that it need not concern industrial practitioners. Our conclusion is that at least for Microsoft products, dis- tributed development is not considered harmful.

Software Architecture

This case study combines known software structure and revision history analysis techniques, in known and new ways, to predict bug-related change frequency, and uncover architecture-related risks in an agile industrial software development project. We applied a suite of structure and history measures and statistically analyzed the correlations between them. We detected architecture issues by identifying outliers in the distributions of measured values and investigating the architectural significance of the associated classes. We used a clustering method to identify sets of files that often change together without being structurally close together, investigating whether architecture issues were among the root causes. The development team confirmed that the identified clusters reflected significant architectural violations, unstable key interfaces, and important undocumented assumptions shared between modules. The combined structure diagrams and history data justified a refactoring proposal that was accepted by the project manager and implemented.

Undocumented evolution of a software system and its underlying architecture drives the need for the architectures recovery from the systems implementation-level artifacts. While a number of recovery techniques have been proposed, they suffer from known inaccuracies. Furthermore, these techniques are difficult to evaluate due to a lack of ground-truth architectures that are known to be accurate. To address this problem, we argue for establishing a suite of ground-truth architectures, using a recovery framework proposed in our recent work. This framework considers domain-, application-, and contextspecific information about a system, and addresses an inherent obstacle in establishing a ground-truth architecture the limited availability of engineers who are closely familiar with the system in question. In this paper, we present our experience in recovering the ground-truth architectures of four open-source systems. We discuss the primary insights gained in the process, analyze the characteristics of the obtained ground-truth architectures, and reflect on the involvement of the systems engineers in a limited but critical fashion. Our findings suggest the practical feasibility of obtaining ground-truth architectures for large systems and encourage future efforts directed at establishing a large scale repository of such architectures.

Siemens Corporate Development Center Asia Australia (CT DC AA) develops and maintains software applications for the Industry, Energy, Healthcare, and Infrastructure & Cities sectors of Siemens. The critical nature of these applications necessitates a high level of software design quality. A survey of software architects indicated a low level of satisfaction with existing design assessment practices in CT DC AA and highlighted several shortcomings of existing practices. To address this, we have developed a design assessment method called MIDAS (Method for Intensive Design ASsessments). MIDAS is an expert-based method wherein manual assessment of design quality by experts is directed by the systematic application of design analysis tools through the use of a three view-model consisting of design principles, project-specific constraints, and an ility-based quality model. In this paper, we describe the motivation for MIDAS, its design, and its application to three projects in CT DC AA. We believe that the insights from our MIDAS experience not only provide useful pointers to other organizations and practitioners looking to assess and improve software design quality but also suggest research questions for the software engineering community to explore.

A wide range of software metrics targeting various abstraction levels and quality attributes have been proposed by the research community. For many of these metrics the evaluation consists of verifying the mathematical properties of the metric, investigating the behavior of the metric for a number of open-source systems or comparing the value of the metric against other metrics quantifying related quality attributes. Unfortunately, a structural analysis of the usefulness of metrics in a real-world evaluation setting is often missing. Such an evaluation is important to understand the situations in which a metric can be applied, to identify areas of possible improvements, to explore general problems detected by the metrics and to define generally applicable solution strategies. In this paper we execute such an analysis for two architecture level metrics, Component Balance and Dependency Profiles, by analyzing the challenges involved in applying these metrics in an industrial setting. In addition, we explore the usefulness of the metrics by conducting semi-structured interviews with experienced assessors. We document the lessons learned both for the application of these specific metrics, as well as for the method of evaluating metrics in practice.

Peer code review is a cost-effective software defect detection technique. Tool assisted code review is a form of peer code review, which can improve both quality and quantity of reviews. However, there is a significant amount of human effort involved even in tool based code reviews. Using static analysis tools, it is possible to reduce the human effort by automating the checks for coding standard violations and common defect patterns. Towards this goal, we propose a tool called Review Bot for the integration of automatic static analysis with the code review process. Review Bot uses output of multiple static analysis tools to publish reviews automatically. Through a user study, we show that integrating static analysis tools with code review process can improve the quality of code review. The developer feedback for a subset of comments from automatic reviews shows that the developers agree to fix 93% of all the automatically generated comments. There is only 14.71% of all the accepted comments which need improvements in terms of priority, comment message, etc. Another problem with tool assisted code review is the assignment of appropriate reviewers. Review Bot solves this problem by generating reviewer recommendations based on change history of source code lines. Our experimental results show that the recommendation accuracy is in the range of 60%-92%, which is significantly better than a comparable method based on file change history.

This paper describes a software estimation technique that can be used in situations where there is no reliable historical data available to develop the initial effort estimate of a software development project. The technique described incorporates a set of key estimation principles and three estimation methods that are utilized in tandem to deliver the estimation results needed to have a robust initial estimation. An important contribution of this paper is bringing together into ONe Software Estimation Tool-kit (ONSET) multiple concepts, principles, and methods in the software estimation field, which are typically discussed separately in the estimation literature and can be employed when an organization does not have reliable historical data. The paper shows how these principles and methods are applied to derive estimates without the need of using complex or expensive tools. A case study is presented using ONSET which was carried out as an estimation pilot study conducted in one of the software development Business Units of ABB. The results of this pilot project provided insights on how to implement ONSET across ABB software development business units. Practical guidance is offered in this paper on how an organization that does not have reliable historical data can begin to collect data to use in future projects using ONSET. In contrast to many papers that describe estimation approaches, this paper explains how to use a combination of judgment-based and model-based methods such as the Planning Poker, Modified Wideband Delphi, and Monte Carlo simulation to derive the initial estimates. Once an organization begins collecting reliable historical data, ONSET will provide even more accurate estimation results and a smoother transition to the use of model-based estimation methods and tools can be achieved.

Mini-Tutorial

Producing industrial impact has often been one of the important goals of academic or industrial researchers when conducting research. However, it is generally challenging to transfer research results into industrial practices. There are some common challenges faced when pursuing technology transfer and adoption while particular challenges for some particular research areas. At the same time, various opportunities also exist for technology transfer and adoption. This mini-tutorial presents achievements and challenges of technology transfer and adoption in various areas in software engineering, with examples drawn from research areas such as software analytics along with software testing and analysis. This mini-tutorial highlights success stories in industry, research achievements that are transferred to industrial practice, and challenges and lessons learned in technology transfer and adoption.

Case Studies

User involvement in software engineering has been researched over the last three decades. However, existing studies concentrate mainly on early phases of user-centered design projects, while little is known about how professionals work with post-deployment end-user feedback. In this paper we report on an empirical case study that explores the current practice of user involvement during software evolution.
We found that user feedback contains important information for developers, helps to improve software quality and to identify missing features. In order to assess its relevance and potential impact, developers need to analyze the gathered feedback, which is mostly accomplished manually and consequently requires high effort. Overall, our results show the need for tool support to consolidate, structure, analyze, and track user feedback, particularly when feedback volume is high. Our findings call for a hypothesis-driven analysis of user feedback to establish the foundations for future user feedback tools.

SCOPE is adopted by thousands of developers from tens of different product teams in Microsoft Bing for daily web-scale data processing, including index building, search ranking and advertisement display. A SCOPE job is composed of declarative SQL-like queries and imperative C# user-defined functions (UDFs), which are executed in pipeline by thousands of machines. There are tens of thousands of SCOPE jobs executed on Microsoft clusters per day, while some of them fail after a long execution time and thus waste tremendous resources. Reducing SCOPE failures would save significant resources. This paper presents a comprehensive characteristic study on 200 SCOPE failures/fixes and 50 SCOPE failures with debugging statistics from Microsoft Bing, investigating not only major failure types, failure sources, and fixes, but also current debugging practice. Our major findings include (1) most of the failures (84.5%) are caused by defects in data processing rather than defects in code logic; (2) table-level failures (22.5%) are mainly caused by programmers mistakes and frequent data schema changes while row-level failures (62%) are mainly caused by exceptional data; (3) 93.0% fixes do not change data processing logic; (4) there are 8.0% failures with root cause not at the failure-exposing stage, making current debugging practice insufficient in this case. Our study results provide valuable guidelines for future development of data-parallel programs. We believe that these guidelines are not limited to SCOPE, but can also be generalized to other similar data-parallel platforms.

Brazil has been emerging as a destination for IT software and services. The country already had a strong domestic base of IT clients to global companies. One of the competitive factors is time zone location. Brazil has positioned itself as easy for collaboration because of time zone overlap with its primary partners in North America and Europe. In this paper we examine whether time zone proximity is an advantage for software development by conducting a country-level field study of the Brazilian IT industry using a cross section of firms. The results provide some support for the claims of proximity benefits. The Brazil-North dyads use moderate timeshifting that is perceived as comfortable for both sides. The voice coordination that the time overlap permits helps address coordination challenges and foster relationships. One company, in particular, practiced such intense time zone aligned collaboration using agile methods that we labeled this Real-time Simulated Co-location

Agile projects are showing greater promise in rapid fielding as compared to waterfall projects. However, there is a lack of clarity regarding what really constitutes and contributes to success. We interviewed project teams with incremental development lifecycles, from five government and commercial organizations, to gain a better understanding of success and failure factors for rapid fielding on their projects. A key area we explored involves how Agile projects deal with the pressure to rapidly deliver high-value capability, while maintaining project speed (delivering functionality to the users quickly) and product stability (providing reliable and flexible product architecture). For example, due to schedule pressure we often see a pattern of high initial velocity for weeks or months, followed by a slowing of velocity due to stability issues. Business stakeholders find this to be disruptive as the rate of capability delivery slows while the team addresses stability problems. We found that experienced practitioners, when faced with these challenges, do not apply Agile practices alone. Instead they combine practicesAgile, architecture, or otherin creative ways to respond quickly to unanticipated stability problems. In this paper, we summarize the practices practitioners we interviewed from Agile projects found most valuable and provide an overarching scenario that provides insight into how and why these practices emerge.

In this paper we present JST, a tool that automatically generates a high coverage test suite for industrial strength Java applications. This tool uses a numeric-string hybrid symbolic execution engine at its core which is based on the Symbolic Java PathFinder platform. However, in order to make the tool applicable to industrial applications the existing generic platform had to be enhanced in numerous ways that we describe in this paper. The JST tool consists of newly supported essential Java library components and widely used data structures; novel solving techniques for string constraints, regular expressions, and their interactions with integer and floating point numbers; and key optimizations that make the tool more efficient. We present a methodology to seamlessly integrate the features mentioned above to make the tool scalable to industrial applications that are beyond the reach of the original platform in terms of both applicability and performance. We also present extensive experimental data to illustrate the effectiveness of our tool.

Test automation, which involves the conversion of manual test cases to executable test scripts, is necessary to carry out efficient regression testing of GUI-based applications. However, test automation takes significant investment of time and skilled effort. Moreover, it is not a one-time investment: as the application or its environment evolves, test scripts demand continuous patching. Thus, it is challenging to perform test automation in a cost-effective manner.
At IBM, we developed a tool, called ATA, to meet this challenge. ATA has novel features that are designed to lower the cost of initial test automation significantly. Moreover, ATA has the ability to patch scripts automatically for certain types of application or environment changes.
How well does ATA meet its objectives in the real world? In this paper, we present a detailed case study in the context of a challenging production environment: an enterprise web application that has over 6500 manual test cases, comes in two variants, evolves frequently, and needs to be tested on multiple browsers in time-constrained and resource-constrained regression cycles. We measured how well ATA improved the efficiency in initial automation. We also evaluated the effectiveness of ATA's
change-resilience along multiple dimensions: application versions, browsers, and browser versions. Our study highlights several lessons for test-automation practitioners as well as open research problems in test automation.

Load testing is one of the means for evaluating the performance of Large Scale Systems (LSS). At the end of a load test, performance analysts must analyze thousands of performance counters from hundreds of machines under test. These performance counters are measures of run-time system properties such as CPU utilization, Disk I/O, memory consumption, and network traffic. Analysts observe counters to find out if the system is meeting its Service Level Agreements (SLAs). In this paper, we present and evaluate one supervised and three unsupervised approaches to help performance analysts to 1) more effectively compare load tests in order to detect performance deviations which may lead to SLA violations, and 2) to provide them with a smaller and manageable set of important performance counters to assist in root-cause analysis of the detected deviations. Our case study is based on load test data obtained from both a large scale industrial system and an open source benchmark application. The case study shows, that our wrapper-based supervised approach, which uses a search-based technique to find the best subset of performance counters and a logistic regression model for deviation prediction, can provide up to 89% reduction in the set of performance counters while detecting performance deviations with few false positives (i.e., 95% average precision). The study also shows that the supervised approach is more stable and effective than the unsupervised approaches but it has more overhead due to its semi-automated training phase.

Exchangeability between software components such as operating systems, middleware, databases, and hardware components is a common requirement in many software systems. One way to enable exchangeability is to promote indirect use through a common interface and an implementation for each component that wraps the original component. As developers use the interface instead of the underlying component, they assume that the software system will behave in a specific way independently of the actual component in use. However, differences in the implementations of the wrappers may lead to different behavior when one component is changed for another, which might lead to failures in the field. This work reports on a simple, yet effective approach to detect these differences. The approach is based on tool-supported reviews leveraging lightweight static analysis and machine learning. The approach is evaluated in a case study that analyzes NASAs Operating System Abstraction Layer (OSAL), which is used in various space missions. We detected 84 corner-case issues of which 57 turned out to be bugs that could have resulted in runtime failures.

Efficient bug triaging procedures are an important precondition for successful collaborative software engineering projects. Triaging bugs can become a laborious task particularly in open source software (OSS) projects with a large base of comparably inexperienced part-time contributors. In this paper, we propose an efficient and practical method to identify valid bug reports which a) refer to an actual software bug, b) are not duplicates and c) contain enough information to be processed right away. Our classification is based on nine measures to quantify the social embeddedness of bug reporters in the collaboration network. We demonstrate its applicability in a case study, using a comprehensive data set of more than 700,000 bug reports obtained from the Bugzilla installation of four major OSS communities, for a period of more than ten years. For those projects that exhibit the lowest fraction of valid bug reports, we find that the bug reporters' position in the collaboration network is a strong indicator for the quality of bug reports. Based on this finding, we develop an automated classification scheme that can easily be integrated into bug tracking platforms and analyze its performance in the considered OSS communities. A support vector machine (SVM) to identify valid bug reports based on the nine measures yields a precision of up to 90.3% with an associated recall of 38.9%. With this, we significantly improve the results obtained in previous case studies for an automated early identification of bugs that are eventually fixed. Furthermore, our study highlights the potential of using quantitative measures of social organization in collaborative software engineering. It also opens a broad perspective for the integration of social awareness in the design of support infrastructures.

For a large and evolving software system, the project team could receive many bug reports over a long period of time. It is important to achieve a quantitative understanding of bug-fixing time. The ability to predict bug-fixing time can help a project team better estimate software maintenance efforts and better manage software projects. In this paper, we perform an empirical study of bug-fixing time for three CA Technologies projects. We propose a Markov-based method for predicting the number of bugs that will be fixed in future. For a given number of defects, we propose a method for estimating the total amount of time required to fix them based on the empirical distribution of bug-fixing time derived from historical data. For a given bug report, we can also construct a classification model to predict slow or quick fix (e.g., below or above a time threshold). We evaluate our methods using real maintenance data from three CA Technologies projects. The results show that the proposed methods are effective.

The continuous growth of the use of Information and Communication Technology in different sectors of the market calls out for software professionals with the qualifications needed to solve complex and diverse problems. Innovative teaching methodologies, such as the "Software Internship" model and PBL teaching approaches that are learner-centered and focus on bringing market reality to the learning environment, have been developed and implemented with a view to meeting this demand. However, the effectiveness of these methods cannot always be satisfactorily proved. Prompted by this, this paper proposes a model for assessing students based on real market practices while preserving the authenticity of the learning environment. To evaluate this model, a case study on skills training for software specialists for the Telecom market is discussed, and presents important results that show the applicability of the proposed model for teaching Software Engineering.

Studio-based teaching is a method commonly used in arts and design that emphasizes a physical "home" for students, problem-based and peer-based learning, and mentoring by academic staff rather than formal lectures. There have been some attempts to transfer studio-based teaching to software engineering education. In many ways, this is natural as software engineering has significant practical elements. However, attempts at software studios have usually ignored experiences and theory from arts and design studio teaching. There is therefore a lack of understanding of what "studio" really means, how well the concepts transfer to software engineering, and how effective studios are in practice. Without a clear definition of "studio", software studios cannot be properly evaluated for their impact on student learning nor can best and worst practices be shared between those who run studios. In this paper, we address this problem head-on by conducting a qualitative analysis of what "studio" really means in both arts and design. We carried out 15 interviews with a range of people with studio experiences and present an analysis and model for evaluation here. Our results suggest that there are many intertwined aspects that define studio education, but it is primarily the people and the culture that make a studio. Digital technology on the other hand can have an adverse effect on studios, unless properly recognised.

The use of a studio approacha hands-on teaching method that emphasizes in-class discussion and activitiesis becoming an increasingly accepted method of teaching within software engineering. In such studios, emphasis is placed not only on the artifacts to be produced, but also on the process used to arrive at those artifacts. In this paper, we introduce Calico, a sketch-based collaborative software design tool, and discuss how it supports the delivery of a studio approach to software design education. We particularly describe our experiences with Calico in Software Design I, a course aimed at introducing students to the early, creative phases of software design. Our results show that Calico enabled students to work effectively in teams on their design problems, quickly developing, refining, and evaluating their designs.

There are hundreds of general contests targeting undergraduate and graduate students. The prizes vary from cash, trip, fame, conference participation and others. Contests could be class competitions, school, national, regional or global. In this paper, we compare between existing student contests that can be integrated with software engineering courses. We classify the contests and propose a framework to choose which one to suit curriculum. We also include best practices and samples of our practices in integrating software engineering course with class, regional, national and global contests.

Teaching Introductory Software Engineering

WebIDE is a framework that enables instructors to develop and deliver online lab content with interactive feedback. The ability to create lock-step labs enables the instructor to guide students through learning experiences, demonstrating mastery as they proceed. Feedback is provided through automated evaluators that vary from simple regular expression evaluation to syntactic parsers to applications that compile and run programs and unit tests. This paper describes WebIDE and its use in a CS0 course that taught introductory Java and Android programming using a test-driven learning approach. We report results from a controlled experiment that compared the use of dynamic WebIDE labs with more traditional static programming labs. Despite weaker performance on pre-study assessments, students who used WebIDE performed two to twelve percent better on all assessments than the students who used traditional labs. In addition, WebIDE students were consistently more positive about their experience in CS0.

There is a growing interest of the Computer Science education community for including testing concepts on introductory programming courses. Aiming at contributing to this issue, we introduce POPT, a Problem-Oriented Programming and Testing approach for Introductory Programming Courses. POPT main goal is to improve the traditional method of teaching introductory programming that concentrates mainly on implementation and neglects testing. According to POPT, students skills must be developed by dealing with ill-defined problems, from which students are stimulated to develop test cases in a table-like manner in order to enlighten the problems requirements and also to improve the quality of generated code. This paper presents POPT and a case study performed in an Introductory Programming course of a Computer Science program at the Federal University of Rio Grande do Norte, Brazil. The study results have shown that, when compared to a Blind Testing approach, POPT stimulates the implementation of programs of better external quality - the first program version submitted by POPT students passed in twice the number of test cases (professor-defined ones) when compared to non-POPT students. Moreover, POPT students submitted fewer program versions and spent more time to submit the first version to the automatic evaluation system, which lead us to think that POPT students are stimulated to think better about the solution they are implementing.

Both employers and graduate schools expect computer science graduates to be able to work as developers on software projects. Software engineering courses present the opportunity in the curriculum to learn the relevant skills. This paper presents our experience from Wayne State University and reviews challenges and constraints that we faced while trying to teach these skills. In our first software engineering course, we teach the iterative software development that includes practices of software change, summarized in the phased model of software change. The required resources for our software engineering course are comparable to the other computer science courses. The students - while working in teams - are graded based on their individual contribution to the team effort rather than on the work of the other team members, which improves the fairness of the grading and considerably lessens the stress for the best students in the course. Our students have expressed a high level of satisfaction, and in a survey, they indicated that the skills that they learned in the course are highly applicable to their careers.

Massive Open Online Courses (MOOCs) have recently gained high popularity among various universities and even in global societies. A critical factor for their success in teaching and learning effectiveness is assignment grading. Traditional ways of assignment grading are not scalable and do not give timely or interactive feedback to students. To address these issues, we present an interactive-gaming-based teaching and learning platform called Pex4Fun. Pex4Fun is a browser-based teaching and learning environment targeting teachers and students for introductory to advanced programming or software engineering courses. At the core of the platform is an automated grading engine based on symbolic execution. In Pex4Fun, teachers can create virtual classrooms, customize existing courses, and publish new learning material including learning games. Pex4Fun was released to the public in June 2010 and since then the number of attempts made by users to solve games has reached over one million. Our work on Pex4Fun illustrates that a sophisticated software engineering technique -- automated test generation -- can be successfully used to underpin automatic grading in an online programming system that can scale to hundreds of thousands of users.

This panel will engage participants in a discussion of recent changes in software engineering practice that should be reflected in curriculum guidelines for undergraduate software engineering programs. Current progress in revising the guidelines will be presented, including suggestions to update coverage of agile methods, security and service-oriented computing.

In this paper we describe distributed Scrum augmented with best practices in global software engineering (GSE) as an important paradigm for teaching critical competencies in GSE. We report on a globally distributed project course between the University of Victoria, Canada and Aalto University, Finland. The project-driven course involved 16 students in Canada and 9 students in Finland, divided into three cross-site Scrum teams working on a single large project. To assess learning of GSE competencies we employed a mixed-method approach including 13 post-course interviews, pre-, post-course and iteration questionnaires, observations, recordings of Daily Scrums as well as collection of project asynchronous communication data. Our analysis indicates that the Scrum method, along with supporting collaboration practices and tools, supports the learning of important GSE competencies, such as distributed communication and teamwork, building and maintaining trust, using appropriate collaboration tools, and inter-cultural collaboration.

Most university curricula consider software pro- cesses to be on the fringes of software engineering (SE). Students are told there exists a plethora of software processes ranging from RUP over V-shaped processes to agile methods. Furthermore, the usual students programming tasks are of a size that either one student or a small group of students can manage the work. Comprehensive processes being essential for large companies in terms of reflecting the organization struc- ture, coordinating teams, or interfaces to business processes such as contracting or sales, are complex and hard to teach in a lecture, and, therefore, often out of scope. We experienced tutorials on using Java or C#, or on developing applications for the iPhone to gather more attention by students, simply speaking, as these are more fun for them. So, why should students spend their time in software processes? From our experiences and the discussions with a variety of industrial partners, we learned that students often face trouble when taking their first real jobs, even if the company is organized in a lean or agile shape. Therefore, we propose to include software processes more explicitly into the SE curricula. We designed and implemented a course at Masters level in which students learn why software processes are necessary, and how they can be analyzed, designed, implemented, and continuously improved. In this paper, we present our courses structure, its goals, and corresponding teaching methods. We evaluate the course and further discuss our experiences so that lecturers and researchers can directly use our lessons learned in their own curricula.

Stakeholder consultation during course accreditation is now a requirement of new Australian government regulations as well as the Australian ICT professional society accreditation. Despite these requirements there remains some differences between universities and industry regarding the purpose, nature and extent of industry involvement in the curriculum. Surveys of industry and university leaders in ICT were undertaken to provide a representative set of views on these issues. The results provided insights into the perceptions of universities and industry regarding industry involvement into the curriculum. The results also confirmed previous research that identified a tension between industrys desire for relevant skills and the role of universities in providing a broader education for lifelong learning

Software security is a tough reality that affects the many facets of our modern, digital world. The pressure to produce secure software is felt particularly strongly by software engineers. Todays software engineering students will need to deal with software security in their profession. However, these students will also not be security experts, rather, they need to balance security concerns with the myriad of other draws of their attention, such as reliability, performance, and delivering the product on-time and on-budget. At the Department of Software Engineering at the Rochester Institute of Technology, we developed a course called Engineering Secure Software, designed for applying security principles to each stage of the software development lifecycle. As a part of this course, we developed a component called Vulnerability of the Day, which is a set of selected example software vulnerabilities. We selected these vulnerabilities to be simple, demonstrable, and relevant so that the vulnerability could be demonstrated in the first 10 minutes of each class session. For each vulnerability demonstration, we provide historical examples, realistic scenarios, and mitigations. With student reaction being overwhelmingly positive, we have created an open source project for our Vulnerabilities of the Day, and have defined guiding principles for developing and contributing effective examples.

Dependability Perspectives

Assurance cases provide a structured method of explaining why a system has some desired property, e.g., that the system is safe. But there is no agreed approach for explaining what degree of confidence one should have in the conclusions of such a case. In this paper, we use the principle of eliminative induction to provide a justified basis for assessing how much confidence one should have in an assurance case argument.

In this paper, we present a promising approach to systematically testing graphical user interfaces (GUI) in a platform independent manner. Our framework uses standard computer vision techniques through a python-based scripting language (Sikuli script) to identify key graphical elements in the screen and automatically interact with these elements by simulating keypresses and pointer clicks. The sequence of inputs and outputs resulting from the interaction is analyzed using grammatical inference techniques that can infer the likely internal states and transitions of the GUI based on the observations. Our framework handles a wide variety of user interfaces ranging from traditional pull down menus to interfaces built for mobile platforms such as Android and iOS. Furthermore, the automaton inferred by our approach can be used to check for potentially harmful patterns in the interface's internal state machine such as design inconsistencies (eg,. a keypress does not have the intended effect) and mode confusion that can make the interface hard to use. We describe an implementation of the framework and demonstrate its working on a variety of interfaces including the user-interface of a safety critical insulin infusion pump that is commonly used by type-1 diabetic patients.

Access control models implement mechanisms to restrict access to sensitive data from unprivileged users. Access controls typically check privileges that capture the semantics of the operations they protect. Semantic smells and errors in access control models stem from privileges that are partially or totally unrelated to the action they protect. This paper presents a novel approach, partly based on static analysis and information retrieval techniques, for the automatic detection of semantic smells and errors in access control models. Investigation of the case study application revealed 31 smells and 2 errors. Errors were reported to developers who quickly confirmed their relevance and took actions to correct them. Based on the obtained results, we also propose three categories of semantic smells and errors to lay the foundations for further research on access control smells in other systems and domains.

We present a technique that simplifies tests at the semantic level. We first formalize the semantic test simplification problem, and prove it is NP-hard. Then, we propose a heuristic algorithm, SimpleTest, that automatically transforms a test into a simpler test, while still preserving a given property. The key insight of SimpleTest is to reconstruct an executable and simpler test that exhibits the given property from the original one. Our preliminary study on 7 real-world programs showed the usefulness of SimpleTest.

Debugging and isolating changes responsible for regression test failures are some of the most challenging aspects of modern software development. Automatic bug localization techniques reduce the manual effort developers spend examining code, for example, by focusing attention on the minimal subset of recent changes that results in the test failure, or on changes to components with most dependencies or highest churn. We observe that another subset of changes is worth the developers' attention: the complement of the maximal set of changes that does not produce the failure. While for simple, independent source-code changes, existing techniques localize the failure cause to a small subset of those changes, we find that when changes interact, the failure cause is often in our proposed subset and not in the subset existing techniques identify. In studying 45 regression failures in a large, open-source project, we find that for 87% of those failures, the complement of the maximal passing set of changes is different from the minimal failing set of changes, and that for 78% of the failures, our technique identifies relevant changes ignored by existing work. These preliminary results suggest that combining our ideas with existing techniques, as opposed to using either in isolation, can improve the effectiveness of bug localization tools.

Supporting Tomorrow's Developer

Modern IDEs make many software engineering tasks easier by automating functionality such as code completion and navigation. However, this functionality operates on one version of the code at a time. We envision a new approach that makes code completion and navigation aware of code evolution and enables them to operate on multiple versions at a time, without having to manually switch across these versions. We illustrate our approach on several example scenarios. We also describe a prototype Eclipse plugin that embodies our approach for code completion and navigation for Java code. We believe our approach opens a new line of research that adds a novel, temporal dimension for treating code in IDEs in the context of tasks that previously required manual switching across different code versions.

Issue tracking systems play a central role in ongoing software development; they are used by developers to support collaborative bug fixing and the implementation of new features, but they are also used by other stakeholders including managers, QA, and end-users for tasks such as project management, communication and discussion, code reviews, and history tracking. Most such systems are designed around the central metaphor of the "issue" (bug, defect, ticket, feature, etc.), yet increasingly this model seems ill fitted to the practical needs of growing software projects; for example, our analysis of interviews with 20 Mozilla developers who use Bugzilla heavily revealed that developers face challenges maintaining a global understanding of the issues they are involved with, and that they desire improved support for situational awareness that is difficult to achieve with current issue management systems.
In this paper we motivate the need for personalized issue tracking that is centered around the information needs of individual developers together with improved logistical support for the tasks they perform. We also describe an initial approach to implement such a system — extending Bugzilla — that enhances a developer's situational awareness of their working context by providing views that are tailored to specific tasks they frequently perform; we are actively improving this prototype with input from Mozilla developers.

Debugging mobile phone applications is hard, as current debugging techniques either require multiple computing devices or do not support graphical debugging. To address this problem we present GROPG, the first graphical on-phone debugger. We implement GROPG for Android and perform a preliminary evaluation on third-party applications. Our experiments suggest that GROPG can lower the overall debugging time of a comparable text-based on-phone debugger by up to 2/3.

When a developer works on code that is shared with other developers, she needs to know why the code has been changed in particular ways to avoid reintroducing bugs. A developer looking at a code change may have access to a short commit message or a link to a bug report which may provide detailed information about how the code changed but which often lacks information about what motivated the change. This motivational information can sometimes be found by piecing together information from a set of relevant project documents, but few developers have the time to find and read the right documentation. We propose the use of multi-document summarization techniques to generate a concise natural language description of why code changed so that a developer can choose the right course of action.

Software teams record their work progress in task repositories which often require them to encode their activities in a set of edits to field values in a form-based user interface. When others read the tasks, they must decode the schema used to write the activities down. We interviewed four software teams and found out how they used the task repository fields to record their work activities. However, we also found that they had trouble interpreting task revisions that encoded for multiple activities at the same time. To assist engineers in decoding tasks, we developed a scalable method based on frequent pattern mining to identify patterns of frequently co-edited fields that each represent a conceptual work activity. We applied our method to our two years of our interviewee's task repositories and were able to abstract 83,000 field changes into just 27 patterns that cover 95% of the task revisions. We used the 27 patterns to render the teams' tasks in web-based English newsfeeds and evaluated them with the product teams. The team agreed with most of our patterns and English interpretations, but outlined a number of improvements that we will incorporate into future work.

Collaborative Development

The classical definition of pair programming (PP) describes it via two obvious roles: driver (the person currently having the keyboard) and observer (the other, alternatively called navigator). Although prior research has found some assumptions regarding these roles to be false, so far no alternative PP role model took hold. Instead, most PP research tacitly assumes the classical model to be true and thus PP to be no more difficult than solo programming. We perform qualitative research (using Grounded Theory Methodology) to find a more realistic role model, and have uncovered a suprising complexity: There are more than two roles, they are assumed and unassumed gradually, multiple roles can be held by one person at the same time, and some of their facets are subtle. Mastering this complexity requires specific PP skills beyond mere programming and communication skills. By ignoring such skills, previous PP studies (in particular the controlled experi- ments) have investigated a rather mixed bag of situations, which explains their heterogeneous results. The emerging result is that qualitative research on the PP process will lead to constructive behavioral advice (process patterns) for pair members and to more meaningful designs for quantitative PP research.

Many organisations have turned to crowdsource their software development projects. This raises important pricing questions, a problem that has not previously been addressed for the emerging crowdsourcing development paradigm. We address this problem by introducing 16 cost drivers for crowdsourced development activities and evaluate 12 predictive pricing models using 4 popular performance measures. We evaluate our predictive models on TopCoder, the largest current crowdsourcing platform for software development. We analyse all 5,910 software development tasks (for which partial data is available), using these to extract our proposed cost drivers. We evaluate our predictive models using the 490 completed projects (for which full details are available). Our results provide evidence to support our primary finding that useful prediction quality is achievable (Pred(30)>0.8). We also show that simple actionable advice can be extracted from our models to assist the 430,000 developers who are members of the TopCoder software development market.

GitHub projects attract contributions from a community of users with varying coding and quality assurance skills. Developers on GitHub feel a need for automated tests and rely on test suites for regression testing and continuous integration. However, project owners report to often struggle with implementing an exhaustive test suite. Convincing contributors to provide automated test cases remains a challenge. The absence of an adequate test suite or using tests of low quality can degrade the quality of the software product. We present an approach for reducing the effort required by project owners for extending their test suites. We aim to utilize the phenomenon of drive-by commits: capable users quickly and easily solve problems in others' projects---even though they are not particularly involved in that project---and move on. By analyzing and directing the drive-by commit phenomenon, we hope to use crowdsourcing to improve projects' quality assurance efforts. Valuable test cases and maintenance tasks would be completed by capable users, giving core developers more resources to work on the more complicated issues.

To facilitate software development for multiple, federated cloud systems, abstraction layers have been introduced to mask the differences in the offerings, APIs, and terminology of various cloud providers. Such layers rely on a common ontology, which a) is difficult to create, and b) requires developers to understand both the common ontology and how various providers deviate from it. In this paper we propose and describe a structured query language for the cloud, Cloud SQL, along with a system and methodology for acquiring and organizing information from cloud providers and other entities in the cloud ecosystem such that it can be queried. It allows developers to run queries on data organized based on their semantic understanding of the cloud. Like the original SQL, we believe the use of a declarative query language will reduce development costs and make the multi-cloud accessible to a broader set of developers.

Tests are central artifacts of software systems and play a crucial role for software quality. In system testing, a lot of test execution is performed manually using tests in natural
language. However, those test cases are often poorly written without best practices in mind. This leads to tests which are not maintainable, hard to understand and inefficient to execute.
For source code and unit tests, so called code smells and test smells have been established as indicators to identify poorly written code. We apply the idea of smells to natural language
tests by defining a set of common Natural Language Test Smells (NLTS). Furthermore, we report on an empirical study analyzing the extent in more than 2800 tests of seven industrial test suites.

Alternative Modeling

Prominent researchers and leading practitioners are questioning the long-term viability of model-driven development (MDD). Finkelstein recently ranked MDD as a bottom-ten research area, arguing that an approach based entirely on development and refinement of abstract representations is untenable. His view is that working with concrete artifacts is necessary for learning what to build and how to build it. What if this view is correct? Could MDD be rescued from such a critique? We suggest the answer is yes, but that it requires an inversion of traditional views of transformational MDD. Rather than develop complete, abstract system models, in ad-hoc modeling languages, followed by top-down synthesis of hidden concrete artifacts, we envision that engineers will continue to develop concrete artifacts, but over time will recognize patterns and concerns that can profitably be lifted, from the bottom-up, to the level of partial models, in general-purpose specification languages, from which visible concrete artifacts are generated, becoming part of the base of both concrete and abstract artifacts for subsequent rounds of development. This paper reports on recent work that suggests this approach is viable, and explores ramifications of such a rethinking of MDD. Early validation flows from experience applying these ideas to a healthcare-related experimental system in our lab.

Software engineers have successfully used Natural Language Processing for refactoring source code. Conversely, in this paper we investigate the possibility to apply software refactoring techniques to textual content. As a procedural program is composed of functions calling each other, a document can be modeled as content fragments connected each other through links. Inspired by software engineering refactoring strategies, we propose an approach for refactoring wiki content. The approach has been applied to the EMF category of Eclipsepedia with encouraging results.

Human-Computer Interaction (HCI) plays a critical role in software systems, especially when targeting vulnerable individuals (e.g., assistive technologies). However, there exists a gap between well-tooled software development methodologies and HCI techniques, which are generally isolated from the development toolchain and require specific expertise. In this paper, we propose a human-driven software development methodology making User Interface (UI) a full-fledged dimension of software design. To make this methodology useful in practice, a UI design lan- guage and a user modeling language are integrated into a tool suite that guides the stakeholders during the development process, while ensuring the conformance between the UI design and its implementation.

We focus on the problem of managing a collection of related software products realized via cloning. We contribute a framework that explicates operators required for developing and maintaining such products, and demonstrate their usage on two concrete scenarios observed in industrial settings: sharing of features between cloned variants and re-engineering the variants into "single-copy" representations advocated by software product line engineering approaches. We discuss possible implementations of the operators, including synergies with existing work developed in seemingly unrelated contexts, with the goal of helping understand and structure existing work and identify opportunities for future research.

This paper argues that understanding how professional software developers use diagrams and sketches in their work is an underexplored terrain. We illustrate this by summarizing a number of studies on sketching and diagramming across a variety of domains, and arguing for their limited generalizability. In order to develop further insight, we describe the design of a research project we are embarking upon and its grounding theoretical assumptions.

Posters

Software engineering methodologies, such as unit testing, propose that any effort made to ensuring that programs run correctly should be captured in repeatable and automated artifacts. However, when looking at developer activities on a spectrum from exploratory testing to scripted testing we find that many engineering activities include bursts of exploratory testing. In this paper we propose to leverage these exploratory testing bursts by automatically extracting scripted tests from a recording of live programming sessions. In order to do so, we wiretap the development environment so we can record all program input, all user-issued functions calls, and all program output of an exploratory testing session. We propose to then use clustering to extract scripted test cases from these recordings. We outline two early-stage prototypes, one for a static and one for a dynamic language. And we outline how this idea fits into the bigger research direction of live programming.

Mass customization of software products requires their efficient tailoring performed through combination of features. Such features and the constraints linking them can be represented by Feature Models (FMs), allowing formal analysis, derivation of specific variants and interactive configuration. Since they are seldom present in existing systems, techniques to re-engineer FMs have been proposed. There are nevertheless error-prone and require human intervention. This paper introduces an automated search-based process to test and fix FMs so that they adequately represent actual products. Preliminary evaluation on the Linux kernel FM exhibit erroneous FM constraints and significant reduction of the inconsistencies.

The purpose of requirements validation is to determine whether a large requirements set will lead to the achievement of system-related goals under different conditions – a task that needs automation if it is to be performed quickly and accurately. One reason for the current lack of software tools to undertake such validation is the absence of the computational mechanisms needed to associate scenario, system specification and goal analysis tools. Therefore, in this paper, we report first research experiments in developing these new capabilities, and demonstrate them with a non-trivial example associated with a Rolls Royce aircraft engine software component.

ommunities of developers have rapidly become global, encompassing multiple timezones and cultures alike. In previous work we investigated the possible shapes of communities for software development. In addition, we explored mechanisms to uncover communities emerging during development. However, we barely scratched the surface. We found that development communities yield properties of dynamic change and organic evolution. Much work is still needed to support such commu- nities with mechanisms able to proactively react to community dynamism. We argue that service-networks can be used to deliver this support. Service-networks are sets of people and information brought together by the internet. This paper is a first attempt at studying this research area by means of a real-life case-study in a large global software development organisation.

Size and effort estimation is a significant challenge for the management of large-scale formal verification projects. We report on an initial study of relationships between the sizes of artefacts from the development of seL4, a formally-verified embedded systems microkernel. For each API function we first determined its COSMIC Function Point (CFP) count (based on the seL4 user manual), then sliced the formal specifications and source code, and performed a normalised line count on these artefact slices. We found strong and significant relationships between the sizes of the artefact slices, but no significant relationships between them and the CFP counts. Our finding that CFP is poorly correlated with lines of code is based on just one system, but is largely consistent with prior literature. We find CFP is also poorly correlated with the size of formal specifications. Nonetheless, lines of formal specification correlate with lines of source code, and this may provide a basis for size prediction in future formal verification projects. In future work we will investigate proof sizing.

Model-clone detection is a relatively new area and there are a number of different approaches in the literature. As the area continues to mature, it becomes necessary to evaluate and compare these approaches and validate new ones that are introduced. We present a mutation-analysis based model-clone detection framework that attempts to automate and standardize the process of comparing multiple Simulink model-clone detection tools or variations of the same tool. By having such a framework, new research directions in the area of model-clone detection can be facilitated as the framework can be used to validate new techniques as they arise. We begin by presenting challenges unique to model-clone tool comparison including recall calculation, the nature of the clones, and the clone report representation. We propose our framework, which we believe addresses these challenges. This is followed by a presentation of the mutation operators that we plan to inject into our Simulink models that will introduce variations of all the different model clone types that can then be searched for by each respective model-clone detector.

Knowledge of similar code fragments, also known as code clones, is important to many software maintenance activities including bug fixing, refactoring, impact analysis and program comprehension. While a great deal of research has been conducted for finding techniques and implementing tools to identify code clones, little research has been done to analyze the relationships between code clones and other aspects of software. In this paper, we attempt to uncover the relationships between code clones and coupling among domain-level components. We report on a case study of a large-scale open source enterprise system, where we demonstrate that the probability of finding code clones among components with domain-based coupling is more than 90%. While such a probabilistic view does not replace a clone detection tool per se, it certainly has the potential to complement the existing tools by providing the probability of having code clones between software components. For example, it can both reduce the clone search space and provide a flexible and language independent way of focusing only on a specific part of the system. It can also provide a higher level of abstraction to look at the cloning relationships among software components.

Program slicing is a popular but imprecise technique for identifying which parts of a program affect or are affected by a particular value. A major reason for this imprecision is that slicing reports all program statements possibly affected by a value, regardless of how relevant to that value they really are. In this paper, we introduce quantitative slicing (q-slicing), a novel approach that quantifies the relevance of each statement in a slice. Q-slicing helps users and tools focus their attention first on the parts of slices that matter the most. We present two methods for quantifying slices and we show the promise of q-slicing for a particular application: predicting the impacts of changes.

We propose Example-Driven Modeling (EDM), an approach that systematically uses explicit examples for eliciting, modeling, verifying, and validating complex business knowledge. It emphasizes the use of explicit examples together with abstractions, both for presenting information and when exchanging models. We formulate hypotheses as to why modeling should include explicit examples, discuss how to use the examples, and the required tool support. Building upon results from cognitive psychology and software engineering, we challenge mainstream practices in structural modeling and suggest future directions.

Software engineering researchers develop great tech- niques consisting of practices and tools that improve efciency and quality of software development. Prior work evaluates developers use of techniques such as Test-Driven-Development and refactoring by measuring actions in the development environ- ment. What we still lack is a method to communicate effectively and motivate developers to adopt best practices and tools. This work proposes a game-like system to motivate adoption while continuously measuring developers use of more efcient development techniques.

Nowadays, most business processes are running in a parallel, distributed and time-constrained manner. How to guarantee their on-time completion is a challenging issue. In the past few years, temporal checkpoint selection which selects a subset of workflow activities for verification of temporal consistency has been proved to be very successful in monitoring single, complex and large size scientific workflows. An intuitive approach is to apply those strategies to individual business processes. However, in such a case, the total number of checkpoints will be enormous, namely the cost for system monitoring and exception handling could be excessive. To address such an issue, we propose a brand new idea which selects time points along the workflow execution time line as checkpoints to monitor a batch of parallel business processes simultaneously instead of individually. Based on such an idea, a set of new definitions as well as a time-point based checkpoint selection strategy are presented in this paper. Our preliminary results demonstrate that it can achieve an order of magnitude reduction in the number of checkpoints while maintaining satisfactory on-time completion rates compared with the state-of-the-art activity-point based checkpoint selection strategy.

Java 8 introduces two functional features: lambda expressions and functional operations like map or filter that apply a lambda expression over the elements of a Collection. Refactoring existing code to use these new features enables explicit but unobtrusive parallelism and makes the code more succinct. However, refactoring is tedious (it requires changing many lines of code) and error-prone (the programmer must reason about the control-flow, data-flow, and side-effects). Fortunately, these refactorings can be automated. We present LAMBDAFICATOR, a tool which automates two refactorings. The first refactoring converts anonymous inner classes to lambda expressions. The second refactoring converts for loops that iterate over Collections to functional operations that use lambda expressions. In 9 open-source projects we have applied these two refactorings 1263 and 1595 times, respectively. The results show that LAMBDAFICATOR is useful. A video highlighting the main features can be found at: http://www.youtube.com/watch?v=EIyAflgHVpU

Architectural drift is a widely cited problem in software engineering, where the implementation of a software system diverges from the designed architecture over time causing architecture inconsistencies. Previous work suggests that this architectural drift is, in part, due to programmers lack of architecture awareness as they develop code. JITTAC is a tool that uses a real-time Reflexion Modeling approach to inform programmers of the architectural consequences of their programming actions as, and often just before, they perform them. Thus, it provides developers with Just-In-Time architectural awareness towards promoting consistency between the as-designed architecture and the as-implemented system. JITTAC also allows programmers to give real-time feedback on introduced inconsistencies to the architect. This facilitates programmer-driven architectural change, when validated by the architect, and allows for more timely team-awareness of the actual architectural consistency of the system. Thus, it is anticipated that the tool will decrease architectural inconsistency over time and improve both developers and architect's knowledge of their software's architecture. The JITTAC demo is available at: http://www.youtube.com/watch?v=BNqhp40PDD4

Services, such as Stack Overflow, offer a web platform to programmers for discussing technical issues, in form of Question and Answers (Q&A). Since Q&A services store the discussions, the generated crowd knowledge can be accessed and consumed by a large audience for a long time. Nevertheless, Q&A services are detached from the development environments used by programmers: Developers have to tap into this crowd knowledge through web browsers and cannot smoothly integrate it into their workflow. This situation hinders part of the benefits of Q&A services. To better leverage the crowd knowledge of Q&A services, we created Seahawk, an Eclipse plugin that supports an integrated and largely automated approach to assist programmers using Stack Overflow. Seahawk formulates queries automatically from the active context in the IDE, presents a ranked and interactive list of results, lets users import code samples in discussions through drag & drop and link Stack Overflow discussions and source code persistently as a support for team work. Video Demo URL: http://youtu.be/DkqhiU9FYPI

PHP is a server-side language that is widely used for creating dynamic Web applications. However, as a dynamic language, PHP may induce certain programming errors that reveal themselves only at run time. A common type of error is dangling references, which occur if the referred program entities have not been declared in the current program execution. To prevent the run-time errors caused by such dangling references, we introduce Dangling Reference Checker (DRC), a novel tool to statically detect those references in the source code of PHP-based Web applications. DRC first identifies the path constraints of the program executions in which a program entity appears and then matches the path constraints of the entity's declarations and references to detect dangling ones. DRC is able to detect dangling reference errors in several real-world PHP systems with high accuracy. The video demonstration for DRC is available at http://www.youtube.com/watch?v=y_AKZYhLlU4.

Test suites, just like the applications they are testing, evolve throughout their lifetime. One of the main reasons for test-suite evolution is test obsolescence: test cases cease to work because of changes in the code and must be suitably repaired. There are several reasons why it is important to achieve a thorough understanding of how test cases evolve in practice. In particular, researchers who investigate automated test repair--an increasingly active research area--can use such understanding to develop more effective repair techniques that can be successfully applied in real-world scenarios. More generally, analyzing test-suite evolution can help testers better understand how test cases are modified during maintenance and improve the test evolution process, an extremely time consuming activity for any non-trivial test suite. Unfortunately, there are no existing tools that facilitate investigation of test evolution. To tackle this problem, we developed TestEvol, a tool that enables the systematic study of test-suite evolution for Java programs and JUnit test cases. This demonstration presents TestEvol and illustrates its usefulness and practical applicability by showing how TestEvol can be successfully used on real-world software and test suites. Demo video at http://www.cc.gatech.edu/~orso/software/testevol/

Developers search source code frequently during their daily tasks, to find pieces of code to reuse, to find where to implement changes, etc. Code search based on text retrieval (TR) techniques has been widely used in the software engineering community during the past decade. The accuracy of the TR-based search results depends largely on the quality of the query used. We introduce Refoqus, an Eclipse plugin which is able to automatically detect the quality of a text retrieval query and to propose reformulations for it, when needed, in order to improve the results of TR-based code search. A video of Refoqus is found online at http://www.youtube.com/watch?v=UQlWGiauyk4.

Many software maintenance tasks require locating code units that implement a certain feature (termed as feature location). Feature location has been an active research area for more than two decades. However, there is lack of publicly available, large scale benchmarks for evaluating and comparing feature location approaches. In this paper, we present a Linux-Kernel based benchmark for feature location research (video: http://www.youtube.com/watch?feature=player_embedded&v=_HihwRNeK3I). This benchmark is large scale and extensible. By providing rich feature and program information and accurate ground-truth links between features and code units, it supports the evaluation of a wide range of feature location approaches. It allows researchers to gain deeper insights into existing approaches and how they can be improved. It also enables communication and collaboration among different researchers.

Recently, several graphical tools have been proposed
to help developers avoid becoming disoriented when working with
large software projects. These tools visualize the locations that
developers have visited, allowing them to quickly recall where
they have already visited. However, developers also spend a significant
amount of time exploring source locations to visit, which
is a task that is not currently supported by existing tools. In this
work, we propose a graphical code recommender NavClus, which
helps developers find relevant, unexplored source locations to
visit. NavClus operates by mining a developer’s daily interaction
traces, comparing the developer’s current working context with
previously seen contexts, and then predicting relevant source
locations to visit. These locations are displayed graphically along
with the already explored locations in a class diagram. As a
result, with NavClus developers can quickly find, reach, and
focus on source locations relevant to their working contexts.
http://www.youtube.com/watch?v=rbrc5ERyWjQ

Formal Demonstrations 2

Adding features and fixing bugs in software often require systematic edits which are similar, but not identical, changes to many code locations. Finding all edit locations and editing them correctly is tedious and error-prone. In this paper, we demonstrate an Eclipse plug-in called LASE that (1) creates context-aware edit scripts from two or more examples, and uses these scripts to (2) automatically identify edit locations and (3) transform the code. In LASE, users can view syntactic edit operations and corresponding context for each input example. They can also choose a different subset of the examples to adjust the abstraction level of inferred edits. When LASE locates target methods matching the inferred edit context and suggests customized edits, users can review and correct LASEs edit suggestion. These features can reduce developers burden in repetitively applying similar edits to different methods. The tools video demonstration is available at https://www.youtube.com/ watch?v=npDqMVP2e9Q.

The design of object-oriented systems starts with modeling, a process to identify core concepts and their relations. Mainstream modeling techniques can be either informal (white board, CRC cards, etc.) or formal (e.g., UML editors). The former support well the creative modeling process, but their output is difficult to store, process and maintain. The latter reduce these problems, at the expense of creativity and productivity because they are tedious and not trivial to use.
We present CEL, a touch- and gesture-based iPad application to rapidly create, manipulate, and store language agnostic object- oriented software models, based on a minimal set of constructs.
Demo video URL: http://youtu.be/icQVS6w0jTE.

Mentoring project newcomers is a crucial activity in software projects, and requires to identify people having good communication and teaching skills, other than high expertise on specific technical topics. In this demo we present Yoda (Young and newcOmer Developer Assistant), an Eclipse plugin that identifies and recommends mentors for newcomers joining a software project. Yoda mines developers' communication (e.g., mailing lists) and project versioning systems to identify mentors using an approach inspired to what ArnetMiner does when mining advisor/student relations. Then, it recommends appropriate mentors based on the specific expertise required by the newcomer. The demo shows Yoda in action, illustrating how the tool is able to identify and visualize mentoring relations in a project, and suggest appropriate mentors for a developer who is going to work on certain source code files, or on a given topic.

Multiple tools can assist developers when debugging programs, but only a few solutions specifically target the common case of regression failures, to provide a more focused and effective support to debugging. In this paper we present RADAR, a tool that combines change identification and dynamic analysis to automatically explain regression problems with a list of suspicious differences in the behavior of the base and upgraded version of a program. The output produced by the tool is particularly beneficial to understand why an application failed. A demo video is available at http://www.youtube.com/watch?v=DMGUgALG-yE

Program comments have always been the key to understanding code. However, typical text comments can easily become verbose or evasive. Thus sometimes code reviewers find an audio or video code narration quite helpful. In this paper, we present our tool, called MCT (Multimedia Commenting Tool), which is an integrated development environment-based tool that enables programmers to easily explain their code by voice, video and mouse movement in the form of comments. With this tool, programmers can replay the audio or video when they feel like. A demonstration video can be accessed at: http://www.youtube.com/watch?v=tHEHqZme4VE

This tool paper presents a tool for performing memoized symbolic execution (Memoise), an approach we developed in previous work for more efficient application of symbolic execution. The key idea in Memoise is to allow re-use of symbolic execution results across different runs of symbolic execution without having to re-compute previously computed results as done in earlier approaches. Specifically, Memoise builds a trie-based data structure to record path exploration information during a run of symbolic execution, optimizes the trie for the next run, and re-uses the resulting trie during the next run. Our tool optimizes symbolic execution in three standard scenarios where it is commonly applied: iterative deepening, regression analysis, and heuristic search. Our tool Memoise builds on the Symbolic PathFinder framework to provide more efficient symbolic execution of Java programs and is available online for download. The tool demonstration video is available at http://www.youtube.com/watch?v=ppfYOB0Z2vY.

Controller synthesis provides an automated means to produce architecture-level behaviour models that are enacted by a composition of lower-level software components, ensuring correct behaviour. Such controllers ensure that goals are satisfied for any model-consistent environment behaviour. This paper presents a tool for developing environment models, synthesising controllers efficiently, and enacting those controllers using a composition of existing third-party components. Video: www.youtube.com/watch?v=RnetgVihpV4

Short Papers

Configurable software systems allow users to customize them according to their needs. Supporting such variability is commonly divided into three parts: configuration space, build space, and code space. In this research abstract, we describe our work in exploring what information these spaces contain in practice, and if this information is consistent. This involves investigating how these spaces work together to ensure that variability is correctly implemented, and to avoid any inconsistencies or anomalies. Our work identifies how variability is implemented in several configurable systems, and initially focuses on less studied parts such as the build system. Our goals include: 1) investigating what information each space provides, 2) quantifying the variability in the build system, 3) studying the effect of build system constraints on variability anomalies, and 4) analyzing how variability anomalies are introduced and fixed. Achieving these goals would help developers make informed decisions when designing variable software, and improve maintainability of existing configurable systems.

Although software can and does implement access control at the application layer, failure to enforce data access at the data layer often allows uncontrolled data access when individuals bypass application controls. The goal of this research is to improve security and compliance by ensuring access controls rules explicitly and implicitly defined within unconstrained natural language texts are appropriately enforced within a systems relational database. Access control implemented in both the application and data layers strongly supports a defense in depth strategy. We propose a tool-based process to 1) parse existing, unaltered natural language documents; 2) classify whether or not a statement implies access control and whether or not the statement implies database design; and, as appropriate, 3) extract policy elements; 4) extract database design; 5) map data objects found in the text to a database schema; and 6) automatically generate the necessary SQL commands to enable the database to enforce access control. Our initial studies of the first three steps indicate that we can effectively identify access control sentences and extract the relevant policy elements.

Maintenance costs can be substantial for large organizations (several hundreds of programmers) with very large and complex software systems. By large we mean lines of code in the range of hundreds of thousands or millions. Our research objective is to improve the process of handling anomaly reports for large organizations. Specifically, we are addressing the problem of the manual, laborious and time consuming process of assigning anomaly reports to the correct design teams and the related issue of localizing faults in the system architecture. In large organizations, with complex systems, this is particularly problematic because the receiver of an anomaly report may not have detailed knowledge of the whole system. As a consequence, anomaly reports may be assigned to the wrong team in the organization, causing delays and unnecessary work. We have so far developed two machine learning prototypes to validate our approach. The latest, a re-implementation and extension, of the first is being evaluated on four large systems at Ericsson AB. Our main goal is to investigate how large software development organizations can significantly improve development efficiency by replacing manual anomaly report assignment and fault localization with machine learning techniques. Our approach focuses on training machine learning systems on anomaly report databases; this is in contrast to many other approaches that are based on test case execution combined with program sampling and/or source code analysis.

Antipatterns and code smells have been widely proved to affect the change-proneness of software components. However, there is a lack of studies that propose indicators of changes for service-oriented systems. Like any other software systems, such systems evolve to address functional and non func- tional requirements. In this research, we investigate the change- proneness of service-oriented systems from the perspective of software engineers. Based on the feedback from our industrial partners we investigate which indicators can be used to highlight change-prone application programming interfaces (APIs) and service interfaces in order to improve their reusability and response time. The output of this PhD research will assist software engineers in designing stable APIs and reusable services with adequate response time.

At the core of model-driven software development, model-transformation compositions enable automatic generation of executable artifacts from models. Although the advantages of transformational software development have been explored by numerous academics and industry practitioners, adoption of the paradigm continues to be slow, and limited to specific domains. The main challenge to adoption is the fact that maintenance tasks, such as analysis and management of model-transformation compositions and reflecting code changes to model transformations, are still largely unsupported by tools. My dissertation aims at enhancing the field's understanding around the maintenance issues in transformational software development, and at supporting the tasks involved in the synchronization of evolving system features with their generation environments. This paper discusses the three main aspects of the envisioned thesis: (a) complexity analysis of model-transformation compositions, (b) system feature localization and tracking in model-transformation compositions, and (c) refactoring of transformation compositions to improve their qualities.

Software architecture is considered as a set of architectural design decisions (ADDs). Capturing and representing ADDs during the architecting process is necessary for reducing architectural knowledge evaporation. Moreover, managing the evolution of ADDs helps to maintain consistency between requirements and the deployed system. In this work, we create the Triple View Model (TVM) as a general architecture framework for documenting ADDs. The TVM clarifies the notion of ADDs in three different views and covers key features of the architecting process. Based on the TVM, we propose a scenario-based method (SceMethod) to manage the documentation and the evolution of ADDs. Furthermore, we also develop a UML metamodel that incorporates evolution-centered characteristics to manage evolutionary architectural knowledge. We conduct a case study to validate the applicability and the effectiveness of our model and method. In our future work, we plan to investigate how to support ADD documentation and evolution in geographically separated software development (GSD).

Modern computer systems are prone to various classes of runtime faults due to their reliance on features such as concurrency and peripheral devices such as sensors. Testing remains a common method for uncovering faults in these systems. However, commonly used testing techniques that execute the program with test inputs and inspect program outputs to detect failures are often ineffective. To test for concurrency and temporal faults, test engineers need to be able to observe faults as they occur instead of relying on observable incorrect outputs. Furthermore, they need to be able to control thread or process interleavings so that they are deterministic. This research will provide a framework that allows engineers to effectively test for subtle and intermittent faults in modern systems by providing them with greater observability and controllability.

One expected characteristic in modern systems is self-adaptation, the capability of monitoring and reacting to changes into the environment. A particular case of self-adaptation is affective-driven self-adaptation. Affective-driven self-adaptation is about having consciousness of user’s affects (emotions) and drive self-adaptation reacting to changes in those affects. Most of the previous work around self-adaptive systems deals with performance, resources, and error recovery as variables that trigger a system reaction. Moreover, most effort around affect recognition has been put towards offline analysis of affect, and to date only few applications exist that are able to infer user’s affect in real-time and trigger self-adaptation mechanisms. In response to this deficit, this work proposes a software product line approach to jump-start the development of affect-driven self-adaptive systems by offering the definition of a domain-specific architecture, a set of components (organized as a framework), and guidelines to tailor those components. Study cases with systems for learning and gaming will confirm the capability of the software product line to provide desired functionalities and qualities.

The literature reports that source code lexicon plays a paramount role in program comprehension, especially when software documentation is scarce, outdated or simply not available. In source code, a significant proportion of vocabulary can be either acronyms and-or abbreviations or concatenation of terms that can not be identified using consistent mechanisms such as naming conventions. It is, therefore, essential to disambiguate concepts conveyed by identifiers to support program comprehension and reap the full benefit of Information Retrieval-based techniques (e.g., feature location and traceability) whose linguistic information (i.e., source code identifiers and comments) used across all software artifacts (e.g., requirements, design, change requests, tests, and source code) must be consistent. To this aim, we propose source code vocabulary normalization approaches that exploit contextual information to align the vocabulary found in the source code with that found in other software artifacts. We were inspired in the choice of context levels by prior works and by our findings. Normalization consists of two tasks: splitting and expansion of source code identifiers. We also investigate the effect of source code vocabulary normalization approaches on software maintenance tasks. Results of our evaluation show that our contextual-aware techniques are accurate and efficient in terms of computation time than state of the art alternatives. In addition, our findings reveal that feature location techniques can benefit from vocabulary normalization approaches when no dynamic information is available.

Modern integrated development environments (IDEs) support one live codebase at a given moment, which imposes limitations to software development. For example, with only one codebase, the developer must pause development while running tests, or a static analysis, as any edit could invalidate the ongoing computation. Were the IDEs supported a copy of developers codebase, the analyses could have run on this copy, in parallel with the development process. In this paper, we propose techniques and tools that integrate multiple live codebases support to the software development process. Our hypothesis is that IDE support for multiple live codebases can provide a richer development process and aid developers.

Posters

Software quality assessment shall monitor and guide the evolution of a system based on quality measurements. This continuous process should ideally involve multiple stakeholders and provide adequate information for each of them to use. We want to support an effective selection of quality measurements based on the type of software and individual information needs of the involved stakeholders.
We propose an approach that brings together quality measurements and individual information needs for a context-sensitive tailoring of information related to a software quality assessment.
We address the following research question: How can we better support different stakeholders in the quality assessment of a software system?
For that we will devise theories, models, and prototypes to capture their individual information needs, tailor information from software repositories to these needs, and enable a contextual analysis of the quality aspects.
Such a context-sensitive tailoring will provide a effective and individual view on the latest development trends in a project. We outline the milestones as well as evaluation approaches in this paper.

I present an approach to avoid functional failures at runtime in component-based application systems. The approach exploits the intrinsic redundancy of components to find workarounds as alternative sequences of operations to avoid a failure. A first Java prototype is presented, and an evaluation plan, as some preliminary results, are discussed.

Building high assurance secure applications requires the proper use of security mechanisms and assurances provided by the underlying secure platform. However, applications are often built using security patterns and best practices that are agnostic with respect to the intricate specifics of the different underlying platforms. This independence from the underlying platform leaves a gap between security patterns and underlying secure platforms. In this PhD research abstract, we propose a novel approach to bridge this gap. Specifically, we propose reusable capability-specific design fragments for security patterns, which are specialization for patterns in a capability-based system. The focus is on systems that adhere to a capability-based security model, which we consider as the underlying platforms, to provide desired application-wide security properties. We also discuss assumptions and levels of assurance for these reusable designs and their use in the verification of application designs.

Opportunistic reuse, a need based sourcing of software modules without any prior plan is a common practice in software development. It is popular due to rapid productivity improvement and fewer impediments while undertaking reuse task. However, developers use informal criteria to select an
external module for reuse. The composition of such a module may introduce undesirable emergent behavior due to new or unknown design decisions. Hence, we propose to systematize selection of an external module by defining selection criteria based on extracted design decisions from source code. This would help developers in making informed selection of external modules there by avoiding or being aware of design mismatches when reusing opportunistically.

Software engineers generate vast quantities of development artifacts such as source code, bug reports, test cases, usage logs, etc., as they create and maintain their projects. The information contained in these artifacts could provide valuable insights into the software quality and adoption, as well as development process. However, very little of it is available in the way that is immediately useful to various stakeholders. This research aims to extract and analyze data from software repositories to provide software practitioners with up-to-date and insightful information that can support informed decisions related to the business, management, design, or development of software systems. This data-centric decision-making is known as analytics. In particular, we demonstrate that by employing software development analytics, we can help developers make informed decisions around user adoption of a software project, code review process, as well as improve developers' awareness of their working context.

Simulations have been used in various areas, yielding good results, but their application to software evolution is still limited. Simulations of software evolution can help people understand the driving forces that shape software evolution, and predict future evolutionary paths. To move towards simulation of software evolution, this research tries to explore possible models to simulate software evolution, and the applicability of different data to parameterize the models. The simulations will both be based on fine-grained code changes obtained by comparing the abstract syntax trees of source code. The use of fine-grain code changes could reveal information about software evolution that is unavailable by other means.

Programmers are able to understand source code because they are able to relate program elements (e.g. modules, objects, or functions), with the real world concepts these elements are addressing.
The main goal of this work is to enhance current program comprehension by systematically creating bidirectional mappings between domain concepts and source code. To achieve this, semantic bridges are required between natural language terms used in the problem domain and program elements written using formal programming languages. These bridges are created by an inference engine over a multi-ontology environment, including an ontological representation of the program, the problem domain, and
the real world effects program execution produces. These ontologies are populated with data collected from both domains, and enriched using available Natural Language Processing and Information Retrieval techniques.

Forensic analysis of software log files is used to extract user behavior profiles, detect fraud, and check compliance with policies and regulations. Software systems maintain several types of log files for different purposes. For example, a system may maintain logs for debugging, monitoring application performance, and/or tracking user access to system resources. The objective of my research is to develop and validate a minimum set of log file attributes and software security metrics for user nonrepudiation by measuring the degree to which a given audit log file captures the data necessary to allow for meaningful forensic analysis of user behavior within the software system. For a log to enable user nonrepudiation, the log file must record certain data fields, such as a unique user identifier. The log must also record relevant user activity, such as creating, viewing, updating, and deleting system resources, as well as software security events, such as the addition or revocation of user privileges. Using a grounded theory method, I propose a methodology for observing the current state of activity logging mechanisms in healthcare, education, and finance, then I quantify differences between activity logs and logs not specifically intended to capture user activity. I will then propose software security metrics for quantifying the forensic-ability of log files. I will evaluate my work with empirical analysis by comparing the performance of my metrics on several types of log files, including both activity logs and logs not directly intended to record user activity. My research will help software developers strengthen user activity logs for facilitating forensic analysis for user nonrepudiation.

This paper sketches a research path that seeks to examine the search for suitable code problem, based on the observation that when code retargeting is included within a code search activity, developers can justify the suitability of these results upfront and thus reduce their searching efforts looking for suitable code. To support this observation, this paper introduces the Snippet Retargeting Approach, or simply SNIPR. SNIPR complements code search with code retargeting capabilities. These capabilities' intent is to help expedite the process of determining if a found example is a best fit. They do that by allowing developers to explore code modification ideas in place, without requiring to leave the search interface. With SNIPR, developers engage in a virtuous loop where they find code, retarget code, and select only code choices they can justify as suitable. This assures immediate feedback on retargeted examples and thus saves valuable time searching for appropriate code.

Program Analysis

Best practices in programming typically imply coding using classes and interfaces that are not (fully) defined yet. However, integrated development environments (IDEs) do not support such incremental programming seamlessly. Instead, they get in the way by reporting ineffective error messages. Ignoring these messages altogether prevents the programmer from getting useful feedback regarding actual inconsistencies and type errors. But attending to these error messages repeatedly breaks the programming workflow. In order to smoothly support incremental programming, we propose to extend IDEs with support of undefined entities, called Ghosts. Ghosts are implicitly reified in the IDE through their usages. Programmers can explicitly identify ghosts, get appropriate type feedback, interact with them, and bust them when ready, yielding actual code.

Program analysis tools are available to make developers' jobs easier by
automating tasks that would otherwise be performed manually or not at all.
To communicate with the developer, these tools use notifications which may be
textual, visual, or a combination of both. Research has shown that these notifications need improvement in two areas:
expressiveness and scalability. In the research described here, I begin an
investigation into the expressiveness and scalability of existing program
analysis tools and potential improvements in expressiveness and scalability in
and across these tools for novice and expert developers. I begin with novices
because I have conducted research with expert developers which found that both
expressiveness and scalability play a part in an expert's ability to effectively
use a subset of program analysis tools.

The increasing proliferation of mobile handsets, and the migration of the information access paradigm to mobile platforms, leads researchers to study the energy consumption of this class of devices. The literature still lacks metrics and tools that allow software developers to easily measure and optimize the energy efficiency of their code. Energy efficiency can definitely improve user experience increasing battery life. This paper aims to describe a technique to adapt the execution of a mobile application, based on the actual energy consumption of the device, without using external equipment.

Debugging

This paper presents ConfDiagnoser, an automated configuration error diagnosis tool for Java software. ConfDiagnoser identifies the root cause of a configuration error a single configuration option that can be changed to produce desired behavior. It uses static analysis, dynamic profiling, and statistical analysis to link the undesired behavior to specific configuration options. ConfDiagnoser differs from existing approaches in two key aspects: it does not require users to provide a testing oracle (to check whether the software functions correctly) and thus is fully-automated; and it can diagnose both crashing and non-crashing errors. We demonstrated ConfDiagnosers accuracy and speed on 5 non-crashing configuration errors and 9 crashing configuration errors from 5 configurable software systems.

As confirmed by a recent survey among developers of the Apache, Eclipse, and Mozilla projects, failures of the software that occur in the field, after deployment, are difficult to reproduce and investigate in house. To address this problem, we propose an approach for in-house reproducing and debugging failures observed in the field. This approach can synthesize several executions similar to an observed field execution to help reproduce the observed field behaviors, and use these executions, in conjunction with several debugging techniques, to identify causes of the field failure. Our initial results are promising and provide evidence that our approach is able to reproduce failures using limited field execution information and help debugging.

Concurrency bugs are difficult to find because they occur with specific memory-access orderings between threads. Traditional bug-finding techniques for concurrent programs have focused on detecting raw-memory accesses representing the bugs, and they do not identify memory accesses that are responsible for the same bug. To address these limitations, we present an approach that uses memory-access patterns and their suspiciousness scores, which indicate how likely they are to be buggy, and clusters the patterns responsible for the same bug. The evaluation on our prototype shows that our approach is effective in handling multiple concurrency bugs and in clustering patterns for the same bugs, which improves understanding of the bugs.

Process and Maintenance

This paper proposes an extension of the Earned Value Management EVM technique through the integration of historical cost performance data of processes under statistical control as a means to improve the predictability of the cost of projects. The proposed technique was evaluated through a case-study in industry, which evaluated the implementation of the proposed technique in 22 software development projects Hypotheses tests with 95% significance level were performed, and the proposed technique was more accurate and more precise than the traditional technique for calculating the Cost Performance Index - CPI and Estimates at Completion - EAC.

Software change history plays an important role in measuring software quality and predicting defects. Co-change metrics such as number of files changed together has been used as a predictor of bugs. In this study, we further investigate the impact of specific characteristics of co-change dispersion on software quality. Using statistical regression models we show that co-changes that include files from different subsystems result in more bugs than co-changes that include files only from the same subsystem. This can be used to improve bug prediction models based on co-changes.

Object-Oriented Programming (OOP) is one of the most used programming paradigms. Thus, researches dedicated in improvement of software quality that adhere to this paradigm are demanded. Complementarily, maintainability is considered a software attribute that plays an important role in its quality level. In this context, Object-Oriented Software Maintainability (OOSM) has been studied through years and several researchers proposed a high number of metrics to measure it. Nevertheless, there is no standardization or a catalogue to summarize all the information about these metrics, helping the researchers to make decision about which metrics can be adopted to perform their experiments in OOSM. Actually, distinct areas in both academic and industrial environment, such as Software Development, Project Management, and Software Research can adopt them to support decision-making processes. Thus, this work researched about the usage of OOSM metrics in academia and industry in order to help researchers in making decision about the metrics suite to be adopted. We found 570 OOSM metrics. Additionally, as a preliminary result we proposed a catalog with 36 metrics that were most used in academic works/experiments, trying to guide researchers with their decision-make about which metrics are more indicated to be adopted in their experiments.

Models and Requirements

To produce an optimal component-based software system for a given application, it is necessary to consider both the required functionality of the system and its stakeholders' preferences over various non-functional properties. We propose a new modular end-to-end framework for component-based system development that combines formal specification and verification of functional requirements with a novel method for representing and reasoning with stakeholders' qualitative preferences over properties of the system. This framework will facilitate the use of formal verification to ensure system correctness while making it easier to identify truly optimal component-based system designs.

Ever increasing complexity of modern software systems demands new powerful development mechanisms. Model-driven engineering (MDE) can ease the development process through problem abstraction and automated code generation from models. In order for MDE solutions to be trusted, such generation should preserve the system's properties defined at modelling level, both functional and extra-functional, all the way down to the target code. The outcome of our research is an approach that aids the preservation of system's properties in MDE of embedded systems. More specifically, we provide generation of full source code from design models defined using the CHESS-ML, monitoring of selected extra-functional properties at code level, and back-propagation of observed values to design models. The approach is validated against industrial case-studies in the telecommunications applicative domain.

Service-based systems (SBS) must be able to adapt their architectural configurations during runtime in order to keep satisfied their specification models. These models are the result of design time derivation of requirements into precise and verifiable specifications by using the knowledge about the current service offerings. Unfortunately, the design time knowledge may be no longer valid during runtime. Then, non- functional constraints may have different numerical meanings at different time even for the same observers. Thus, specification models become obsolete affecting the SBS’ capability of detecting requirement violations during runtime and therefore they trigger reconfigurations when appropriated. In order to mitigate the obsolescence of specification models, we propose to specify and verify them using the computing with words (CWW) methodology. First, non-functional properties (NFPs) of functionally-equivalent services are modeled as linguistic variables, whose domains are concepts or linguistic values instead of precise numbers. Second, architects specify at design time their requirements as linguistic decision models (LDMs) using these concepts. Third, during runtime, the CWW engine monitors the requirements satisfaction by the current chosen architectural configuration. And fourth, each time a global concept drift is detected in the NFPs of the services market, the numerical meanings are updated. Our initial results are encouraging, where our approach mitigates effectively and efficiently the obsolescence of the specification models used by SBS to drive their reconfigurations.

Budget and schedule constraints limit the number of requirements that can be worked on for a software system and is thus necessary to select the most valuable requirements for implementation. However, selecting from a large number of requirements is a decision problem that requires negotiating with multiple stakeholders and satisficing their value propositions. In this paper I present a two-step value-based requirements prioritization approach based on TOPSIS, a decision analysis framework that tightly integrates decision theory with the process of requirements prioritization. In this two-step approach the software system is initially decomposed into high-level Minimal Marketable Features (MMFs) which the business stakeholders prioritize against business goals. Each individual MMF is further decomposed into low-level requirements/features that are primarily prioritized by the technical stakeholders. The priorities of the low-level requirements are influenced by the MMFs they belong to. This approach has been integrated into Winbook, a social-networking influenced collaborative requirements management framework and deployed for use by 10 real-client project teams for the Software Engineering project course at the University of Southern California in Fall 2012. This model allowed the clients and project teams to effectively gauge the importance of each MMF and low-level requirement and perform various sensitivity analyses and take value-informed decisions when selecting requirements for implementation.

Developers and Users

As software systems get more complex, the companies developing them consist of larger teams and therefore results in more complex communication artifacts. As these software systems grow, so does the impact of every action to the product. To prevent software failure created by this growth and complexity, companies need to find more efficient and effective ways to communicate. The method used in this paper presents developer communication in the form of social networks of which have properties that can be used to detect software failures.

Software systems have not only become larger over time, but the amount of technical contributors and dependencies have also increased. With these expansions also comes the increasing risk of introducing a software failure into a pre-existing system. Software failures are a multi-billion dollar problem in the industry today and while integration and other forms of testing are helping to ensure a minimal number of failures, research to understand full impacts of code changes and their social implications is still a major concern. This paper describes how analysis of code changes and the technical relationships they infer can be used to detect pairs of developers whose technical dependencies may induce software failures. These developer pairs may also be used to predict future software failures as well as provide recommendations to contributors to solve these failures caused by source code changes.

In the past few years, the number of smartphone apps has grown at a tremendous rate. To compete in this market, both independent developers and large companies seek to improve the ratings of their apps. Therefore, understanding the user's perspective of mobile apps is extremely important. In this paper, we study the user's perspective of iOS apps by qualitatively analyzing app reviews. In total, we manually tag 6,390 reviews for 20 iOS apps. We find that there are 12 types of user complaints. Functional errors, requests for additional features, and app crashes are examples of the most common complaints. In addition, we find that almost 11% of the studied complaints were reported after a recent update. This highlights the importance of regression testing before updating apps. This study contributes a listing of the most frequent complaints about iOS apps to aid developers and researchers in better understanding the user's perspective of apps.

System testing of applications with graphical user interfaces (GUIs) such as web browsers, desktop or mobile apps, is more complex than testing from the command line. Specialized tools are needed to generate and run test cases, models are needed to quantify behavioral coverage, and changes in the environment, such as the operating system, virtual machine or system load, as well as starting states of the executions, impact the repeatability of the outcome of tests making tests appear flaky. In this tutorial, we present an overview of the state of the art in GUI testing, consisting of both lectures and demonstrations on various platforms (desktop, web and mobile applications), using an open source testing tool, GUITAR. We show how to setup a system under test, how to extract models without source code, and how to then use those models to generate and replay test cases. We then present a lecture on the various factors that may cause flakiness in the execution of GUI-centric software, and hence impact the results of analyses and experiments based on such software. We end with a demonstration of a community resource for sharing GUI testing artifacts aimed at controlling these factors. This tutorial targets both researchers who develop techniques for testing GUI software, and practitioners from industry who want to learn more about model-based GUI testing or who run and rerun GUI tests and often find their runs are flaky.

Model checking has established as an effective method for automatic system analysis and verification. It is making its way into many domains and methodologies. Applying model checking techniques to a new domain (which probably has its own dedicated modeling language) is, however, far from trivial. Translation-based approach works by translating domain specific languages into input languages of a model checker. Because the model checker is not designed for the domain (or equivalently, the language), translation-based approach is often ad hoc. Ideally, it is desirable to have an optimized model checker for each application domain. Implementing one with reasonable efficiency, however, requires years of dedicated efforts. In this tutorial, we will briefly survey a variety of model checking techniques. Then we will show how to develop a model checker for a language combining real-time and probabilistic features using the PAT (Process Analysis Toolkit) step-by-step, and show that it could take as short as a few weeks to develop your own model checker with reasonable efficiency. The PAT system is designed to facilitate development of customized model checkers. It has an extensible and modularized architecture to support new languages (and their operational semantics), new state reduction or abstraction techniques, new model checking algorithms, etc. Since its introduction 5 years ago, PAT has attracted more than 2500 registered users (from 500+ organisations in 60 countries) and has been applied to develop model checkers for 20 different languages.

Target audience: Software practitioners and researchers wanting to understand the state of the art in using data science for software engineering (SE). Content: In the age of big data, data science (the knowledge of deriving meaningful outcomes from data) is an essential skill that should be equipped by software engineers. It can be used to predict useful information on new projects based on completed projects. This tutorial offers core insights about the state-of-the-art in this important field. What participants will learn: Before data science: this tutorial discusses the tasks needed to deploy machine-learning algorithms to organizations (Part1: Organization Issues). During data science: from discretization to clustering to dichotomization and statistical analysis. And the rest: When local data is scarce, we show how to adapt data from other organizations to local problems. When privacy concerns block access, we show how to privatize data while still being able to mine it. When working with data of dubious quality, we show how to prune spurious information. When data or models seem too complex, we show how to simplify data mining results. When data is too scarce to support intricate models, we show methods for generating predictions. When the world changes, and old models need to be updated, we show how to handle those updates. When the effect is too complex for one model, we show how to reason across ensembles of models. Pre-requisites: This tutorial makes minimal use of maths of advanced algorithms and would be understandable by developers and technical managers.

A huge wealth of various data exist in the practice of software development. Further rich data are produced by modern software and services in operation, many of which tend to be data-driven and/or data-producing in nature. Hidden in the data is information about the quality of software and services or the dynamics of software development. Software analytics is to utilize a data-driven approach to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information; such information is used for completing various tasks around software systems, software users, and software development process. This tutorial presents achievements and challenges of research and practice on principles, techniques, and applications of software analytics, highlighting success stories in industry, research achievements that are transferred to industrial practice, and future research and practice directions in software analytics.

Dafny is a programming language and program verifier. The language includes specification constructs and the verifier checks that the program lives up to its specifications. These tutorial notes give some Dafny programs used as examples in the tutorial.

Using software metrics to keep track of the progress and quality of products and processes is a common practice in industry. Additionally, designing, validating and improving metrics is an important research area. Although using software metrics can help in reaching goals, the effects of using metrics incorrectly can be devastating. In this tutorial we leverage 10 years of metrics-based risk assessment experience to illustrate the benefits of software metrics, discuss different types of metrics and explain typical usage scenarios. Additionally, we explore various ways in which metrics can be interpreted using examples solicited from participants and practical assignments based on industry cases. During this process we will present the four common pitfalls of using software metrics. In particular, we explain why metrics should be placed in a context in order to maximize their benefits. A methodology based on benchmarking to provide such a context is discussed and illustrated by a model designed to quantify the technical quality of a software system. Examples of applying this model in industry are given and challenges involved in interpreting such a model are discussed. This tutorial provides an in-depth overview of the benefits and challenges involved in applying software metrics. At the end you will have all the information you need to use, develop and evaluate metrics constructively.

Java Pathfinder (JPF) is an open source analysis system that automatically verifies Java programs. The JPF tutorial provides an opportunity to software engineering researchers and practitioners to learn about JPF, be able to install and run JPF, and understand the concepts required to extend JPF. The hands-on tutorial will expose the attendees to the basic architecture framework of JPF, demonstrate the ways to use it for analyzing their artifacts, and illustrate how they can extend JPF to implement their own analyses. One of the defining qualities of JPF is its extensibility. JPF has been extended to support symbolic execution, directed automated random testing, different choice generation, configurable state abstractions, various heuristics for enabling bug detection, configurable search strategies, checking temporal properties and many more. JPF supports these extensions at the design level through a set of stable well defined interfaces. The interfaces are designed to not require changes to the core, yet enable the development of various JPF extensions. In this tutorial we provide attendees a hands on experience of developing different interfaces in order to extend JPF. The tutorial is targeted toward a general software engineering audiencesoftware engineering researchers and practitioners. The attendees need to have a good understanding of the Java programming language and be fairly comfortable with Java program development. The attendees are not required to have any background in Java Pathfinder, software model checking or any other formal verification techniques. The tutorial will be self-contained.

Variability is becoming an increasingly important concern in software development but techniques to cost-effectively verify and validate software in the presence of variability have yet to become widespread. This half-day tutorial offers an overview of the state of the art in an emerging discipline at the crossroads of formal methods and software engineering: quality assurance of variability-intensive systems. We will present the most significant results obtained during the last four years or so, ranging from conceptual foundations to readily usable tools. Among the various quality assurance techniques, we focus on model checking, but also extend the discussion to other techniques. With its lightweight usage of mathematics and balance between theory and practice, this tutorial is designed to be accessible to a broad audience. Researchers working in the area, willing to join it, or simply curious, will get a comprehensive picture of the recent developments. Practitioners developing variability-intensive systems are invited to discover the capabilities of our techniques and tools, and to consider integrating them in their processes.

Software requirements reuse becomes a fundamental activity for those IT organizations that conduct requirements engineering processes in similar settings. One strategy to implement this reuse is by exploiting a catalogue of software requirement patterns (SRPs). In this tutorial, we provide an introduction to the concept of SRP, summarise several existing approaches, and reflect on the consequences on several requirements engineering processes and activities. We take one of these approaches, the PABRE framework, as exemplar for the tutorial and analyse in more depth the catalogue of SRP that is proposed. We apply the concepts given on a practical exercise.

Software plays a key role in high-risk systems, i.e., safety- and security-critical systems. Several certification standards and guidelines, e.g., in the defense, transportation (aviation, automotive, rail), and healthcare domains, now recommend and/or mandate the development of assurance cases for software-intensive systems. As such, there is a need to understand and evaluate (a) the application of assurance cases to software, and (b) the relationship between the development and assessment of assurance cases, and software engineering concepts, processes and techniques. The ICSE 2013 Workshop on Assurance Cases for Software-intensive Systems (ASSURE) aims to provide an international forum for high-quality contributions (research, practice, and position papers) on the application of assurance case principles and techniques for software assurance, and on the treatment of assurance cases as artifacts to which the full range of software engineering techniques can be applied.

This paper is a report on The 8th IEEE/ACM International Workshop on Automation of Software Test (AST 2013) at the 35th Interna¬tional Conference on Software Engineering (ICSE 2013). It sets a special theme on testing-as-a-service (TaaS). Keynote speech and charette discussions are organized around this special theme. Eighteen full research papers and six short papers will be presented in the two-day workshop. The report will give the background of the workshop and the selection of the special theme, and report on the organization of the workshop. The provisional program will be presented with a list of the sessions and papers to be presented at the workshop.

The quality of empirical studies is critical for the success of the Software Engineering (SE) discipline. More and more SE researchers are conducting empirical studies involving the software industry. While there are established empirical procedures, relatively little is known about the dynamics of conducting empirical studies in the complex industrial environments. What are the impediments and how to best handle them? This was the primary driver for organising CESI 2013. The goals of this workshop include having a dialogue amongst the participating practitioners and academics on the theme of this workshop with the aim to produce tangible output that will be summarised in a post-workshop report.

Software is created by people for people working in a range of environments and under various conditions. Understanding the cooperative and human aspects of software development is crucial in order to comprehend how methods and tools are used, and thereby improve the creation and maintenance of software. Both researchers and practitioners have recognized the need to investigate these aspects, but the results of such investigations are dispersed in different conferences and communities. The goal of this workshop is to provide a forum for discussing high quality research on human and cooperative aspects of software engineering. We aim to provide both a meeting place for the community and the possibility for researchers interested in joining the field to present and discuss their work in progress and to get an overview over the field.

Modelling plays a vital role in software engineering, enabling the creation of larger, more complex systems. Search-based software engineering (SBSE) offers a productive and proven approach to software engineering through automated discovery of near-optimal solutions to problems, and has proven itself to be effective on a wide variety of software engineering problems. The aim of this workshop is to highlight that SBSE and modelling have substantial conceptual and technical synergies, and to discuss and present opportunities and novel ways in which they can be combined, whilst fostering the growing community of researchers working in this area.

Software engineering project courses where student teams are geographically distributed can effectively simulate the problems of globally distributed software development. (DSD) However, this pedagogical model has proven difficult to adopt or sustain. It requires significant pedagogical resources and collaboration infrastructure. Institutionalizing such courses also requires compatible and reliable teaching partners.The purpose of this workshop is to continue building on our outreach efforts to foster a community of international faculty and institutions committed to developing, teaching and researching DSD. Foundational materials presented will include pedagogical materials and infrastructure developed and used in teaching DSD courses along with results and lessons learned. The third CTGDSD workshop will also focus on publishing workshop results and collaborating with the larger DSD community. Long-range goals include: lowering adoption barriers by providing common pedagogical materials, collaboration infrastructure, and a pool of potential teaching partners from around the globe.

Data scientists in software engineering seek insight in data collected from software projects to improve software development. The demand for data scientists with domain knowledge in software development is growing rapidly and there is already a shortage of such data scientists. Data science is a skilled art with a steep learning curve. To shorten that learning curve, this workshop will collect best practices in form of data analysis patterns, that is, analyses of data that leads to meaningful conclusions and can be reused for comparable data. In the workshop we compiled a catalog of such patterns that will help experienced data scientists to better communicate about data analysis. The workshop was targeted at experienced data scientists and researchers and anyone interested in how to analyze data correctly and efficiently in a community accepted way.

After decades of research, and despite significant advancement, formal methods are still not widely used in industrial software development. This may be due to the fact that the formal methods community has not enough focused its attention to software engineering needs, and kits specific role in the software process. At the same time, from a software engineering perspective, there could be a number of fundamental principles that might help to guide the design of formal methods in order to make them more easily applicable in the development of software applications. The main goal of FormaliSE 2013, the FME (Formal Methods Europe; www.fmeurope.org) Workshop on Formal Methods in Software Engineering is to foster integration between the formal methods and the software engineering communities with the purpose to examine the link between the two more carefully than is currently the case.

We present a summary of the 3rd ICSE Workshop on Games and Software Engineering: Engineering Computer Games to Enable Positive, Progressive Change in this article. The full day workshop is planned to include a keynote speaker, panel discussion, and paper presentations on game software engineering topics related to requirements specification and verification, software engineering education, re-use, and infrastructure. An overview of the accepted papers is included in this summary.

Software can become greener by being more energy efficient, hence using less resources; or by making its supported processes more sustainable, hence decreasing the environmental impact of governments, companies and individuals using software applications and services. While research results exist in measuring and controlling the level of greenness of hardware components, major research is needed to relate energy consumption of hardware to energy consumption of its executing software. Measuring the level of greenness of software and reporting it back to the users is the focus of GREENS 2013 with special theme Leveraging Energy Efficiency to Software Users. GREENS brings together software engineering researchers and practitioners to discuss the state-of-the-art and state-of-the-practice in green software, as well as research challenges, novel ideas, methods, experiences, and tools to support the engineering of sustainable and energy efficient software systems.

Most academic disciplines emphasize the importance of their general theories. Examples of well-known general theories include the Big Bang theory, Maxwell’s equations, the theory of the cell, the theory of evolution, and the theory of demand and supply. Less known to the wider audience, but established within their respective fields, are theories with names such as the general theory of crime and the theory of marriage. Few general theories of software engineering have, however, been proposed, and none have achieved significant recognition. This workshop, organized by the SEMAT initiative, aims to provide a forum for discussing the concept of a general theory of software engineering. The topics considered include the benefits, the desired qualities, the core components and the form of a such a theory.

Software Clones are identical or similar pieces of code, models or designs. In this, the 7th International Workshop on Software Clones (IWSC, we will discuss issues in software clone detection, analysis and management, as well as applications to software engineering contexts that can benefit from knowledge of clones. These are important emerging topics in software engineering research and practice. Special emphasis will be given this time to clone management in practice, emphasizing use cases and experiences. We will also discuss broader topics on software clones, such as clone detection methods, clone classification, management, and evolution, the role of clones in software system architecture, quality and evolution, clones in plagiarism, licensing, and copyright, and other topics related to similarity in software systems. The format of this workshop will give enough time for intense discussions.

Live programming is an idea espoused by programming environments from the earliest days of computing (such as Lisp machines and SmallTalk) but have since lain dormant. Recently, the prevalence of asynchronous feedback in programming languages such as Javascript and advances in visualizations and user interfaces have lead to a resurgence of live programming in online education communities (such as Khan Academy) and in experimental IDEs (such as LightTable). The LIVE 2013 workshop includes 12 papers describing visions, implementations, mashups, and new directions of live programming environments. The participants include both practitioners of live coding and researchers in programming languages and software engineering. Finally, several demos curated on the live workshop page are presented.

Models are an important tool in conquering the increasing complexity of modern software systems. Key industries are strategically directing their development environments towards more extensive use of modeling techniques. This workshop sought to understand, through critical analysis, the current and future uses of models in the engineering of software-intensive systems. The MISE-workshop series has proven to be an effective forum for discussing modeling techniques from the MDD and the software engineering perspectives. An important goal of this workshop was to foster exchange between these two communities. The 2013 Modeling in Software Engineering (MiSE) workshop was held at ICSE 2013 in San Francisco, California, during May 18-19, 2013. The focus this year was analysis of successful applications of modeling techniques in specific application domains to determine how experiences can be carried over to other domains. Details about the workshop are at: https://sselab.de/lab2/public/wiki/MiSE/index.php

Mobile-enabled systems make use of mobile devices, RFID tags, sensor nodes, and other computing-enabled mobile devices to gather contextual data from users and the surrounding changing environment. Such systems produce computational data that can be stored and used in the field, shared between mobile and resident devices, and potentially uploaded to local servers or the cloud a distributed, heterogeneous, context-aware, data production and consumption paradigm. Mobile-enabled systems have characteristics that make them different from traditional systems, such as limited resources, increased vulnerability, performance and reliability variability, and a finite energy source. There is significantly higher unpredictability in the execution environment of mobile apps. This workshop brings together experts from the software engineering and mobile computing communities with notable participation from researchers and practitioners in the field of distributed systems, enterprise systems, cloud systems, ubiquitous computing, wireless sensor networks, and pervasive computing to share results and open issues in the area of software engineering of mobile-enabled systems.

Although now 20 years old, only recently has the concept of technical debt gained some momentum and credibility in the software engineering community. The goal of this fourth workshop on managing technical debt is to engage researchers and practitioners in exchanging ideas on viable research directions and on how to put the concept to actual use, beyond its usage as a rhetorical instrument to discuss the fate and ailments of software development projects. The workshop participants presented and discussed approaches to detect, analyze, visualize, and manage technical debt, in its various forms, on large software-intensive system developments.

Software engineers produce code that has formal syntax and semantics, which establishes its formal meaning. However it also includes significant natural language found in identifier names and comments. Additionally, programmers not only work with source code but also with a variety of software artifacts, predominantly written in natural language. Examples include documentation, requirements, test plans, bug reports, and peer-to-peer communications. It is increasingly evident that natural language information can play a key role in improving a variety of software engineering tools used during the design, development, debugging, and testing of software.The focus of the NaturaLiSE workshop is on natural language analysis of software artifacts. This workshop will bring together researchers and practitioners interested in exploiting natural language informationfound in software artifacts to create improved software engineering tools. Relevant topics include (but are not limited to) natural language analysis applied to software artifacts, combining natural language and traditional program analysis, integration of natural language analyses into client tools, mining natural language data, and empirical studies focused on evaluating the usefulness of natural language analysis.

PESOS 2013 is a forum that brings together software engineering researchers from academia and industry, as well as practitioners working in the areas of service-oriented systems to discuss research challenges, recent developments, novel application scenarios, as well as methods, techniques, experiences, and tools to support engineering, evolution and adaptation of service-oriented systems. The special theme of the 5th edition of PESOS is Service Engineering for the Cloud The goal is to explore approaches to better engineer service-oriented systems, to either take advantage of the qualities offered by cloud infrastructures or to account for lack of full control over important quality attributes. PESOS 2013 also continues to be the key forum for collecting case studies and artifacts for educators and researchers in this area.

This paper summarizes PLEASE 2013, the Fourth International Workshop on Product LinE Approaches in Software Engineering. The main goal of PLEASE is to encourage and promote the adoption of Software Product Line Engineering. To this end, we aim at bringing together researchers and industrial practitioners involved in developing families of related products in order to (1) facilitate a dialogue between these two groups and (2) initiate and foster long-term collaborations.

The RAISE13 workshop brought together researchers from the AI and software engineering disciplines to build on the interdisciplinary synergies which exist and to stimulate research across these disciplines. The first part of the workshop was devoted to current results and consisted of presentations and discussion of the state of the art. This was followed by a second part which looked over the horizon to seek future directions, inspired by a number of selected vision statements concerning the AI-and-SE crossover. The goal of the RAISE workshop was to strengthen the AI-and-SE community and also develop a roadmap of strategic research directions for AI and software engineering.

Release engineering deals with all activities in between regular development and actual usage of a software product by the end user, i.e., integration, build, test execution, packaging and delivery of software. Although research on this topic goes back for decades, the increasing heterogeneity and variability of software products along with the recent trend to reduce the release cycle to days or even hours starts to question some of the common beliefs and practices of the field. In this context, the International Workshop on Release Engineering (RELENG) aims to provide a highly interactive forum for researchers and practitioners to address the challenges of, find solutions for and share experiences with release engineering, and to build connections between the various communities.

Computational Science and Engineering (CSE) software supports a wide variety of domains including nuclear physics, crash simulation, satellite data processing, fluid dynamics, climate modeling, bioinformatics, and vehicle development. The increases importance of CSE software motivates the need to identify and understand appropriate software engineering (SE) practices for CSE. Because of the uniqueness of the CSE domain, existing SE tools and techniques developed for the business/IT community are often not efficient or effective. Appropriate SE solutions must account for the salient characteristics of the CSE development environment. SE community members must interact with CSE community members to understand this domain and to identify effective SE practices tailored to CSEs needs. This workshop facilitates that collaboration by bringing together members of the CSE and SE communities to share perspectives and present findings from research and practice relevant to CSE software and CSE SE education. A significant portion of the workshop is devoted to focused interaction among the participants with the goal of generating a research agenda to improve tools, techniques, and experimental methods for CSE software engineering.

Our ability to deliver timely, effective and cost efficient healthcare services remains one of the worlds foremost challenges. The challenge has numerous dimensions including: (a) the need to develop a highly functional yet secure electronic health record system that integrates a multitude of incompatible existing systems, (b) in-home patient support systems to reduce demand on professional health-care facilities, and (c) innovative technical devices such as advanced pacemakers that support other healthcare procedures. Responding to this challenge will necessitate increased development and usage of software-intensive systems in all aspects of healthcare services. However the increased digitization of healthcare has identified extensive requirements related to the development, use, evolution, and integration of health software in areas such as the volume and dependability of software required, and the safety and security of the associated devices. The goal of the fifth workshop on Software Engineering for Health Care (SEHC) is to discuss recent research innovations and to continue developing an interdisciplinary community to develop a research, educational and industrial agenda for supporting software engineering in the health care sector.

By acting as the interface between digital and physical worlds, wireless sensor networks (WSNs) represent a fundamental building block of the upcoming Internet of Things and a key enabler for Cyber-Physical and Pervasive Computing Systems. Despite the interest raised by this decade-old research topic, the development of WSN software is still carried out in a rather primitive fashion, by building software directly atop the operating system and by relying on an individual's hard-earned programming skills. WSN developers must face not only the functional application requirements but also a number of challenging, non-functional requirements and constraints resulting from scarce resources. The heterogeneity of network nodes, the unpredictable environmental influences, and the large size of the network further add to the difficulties. In the WSN community, there is a growing awareness of the need for methodologies, techniques, and abstractions that simplify development tasks and increase the confidence in the correctness and performance of the resulting software. Software engineering (SE) support is therefore sought, not only to ease the development task but also to make it more reliable, dependable, and repeatable. Nevertheless, this topic has received so far very little attention by the SE community.

The 2nd International Workshop on Software Engineering Challenges for the Smart Grid focuses on understanding and identifying the unique challenges and opportunities for SE to contribute to and enhance the design and development of the smart grid. In smart grids, the geographical scale, requirements on real-time performance and reliability, and diversity of application functionality all combine to produce a unique, highly demanding problem domain for SE to address. The objective of this workshop is to bring together members of the SE community and the power engineering community to understand these requirements and determine the most appropriate SE tools, methods and techniques.

TOPI (http://se.inf.ethz.ch/events/topi2013/) is a
workshop started in 2011 to address research questions involving
plug-ins: software components designed and written to execute
within an extensible platform. Most such software components
are tools meant to be used within a development environment
for constructing software. Other environments are middle-ware
platforms and web browsers. Research on plug-ins encompasses
the characteristics that differentiate them from other types of
software, their interactions with each other, and the platforms
they extend.

The disciplines of requirements engineering (RE) and software architecture (SA) are fundamental to the success of software projects. Even though RE and SA are often considered separately, it has been argued that drawing a line between RE and SA is neither feasible nor reasonable as requirements and architectural design processes impact each other. Requirements are constrained by what is feasible technically and also by time and budget restrictions. On the other hand, feedback from the architecture leads to renegotiating architecturally significant requirements with stakeholders. The topic of bridging RE and SA has been discussed in both the RE and SA communities, but mostly independently. Therefore, the motivation for this ICSE workshop is to bring both communities together in order to identify key issues, explore the state of the art in research and practice, identify emerging trends, and define challenges related to the transition and the relationship between RE and SA.

We have met many software engineering researchers who would like to evaluate a tool or system they developed with real users, but do not know how to begin. In this second iteration of the USER workshop, attendees will collaboratively design, develop, and pilot plans for conducting user evaluations of their own tools and/or software engineering research projects. Attendees will gain practical experience with various user evaluation methods through scaffolded group exercises, panel discussions, and mentoring by a panel of user-focused software engineering researchers. Together, we will establish a community of like-minded researchers and developers to help one another improve our research and practice through user evaluation.

The International Workshop on Emerging Trends in Software Metrics, aims at gathering together researchers and practitioners to discuss the progress of software metrics. The motivation for this workshop is the low impact that software metrics has on current software development. The goals of this workshop includes critically examining the evidence for the effectiveness of existing metrics and identifying new directions for metrics. Evidence for existing metrics includes how the metrics have been used in practice and studies showing their effectiveness. Identifying new directions includes use of new theories, such as complex network theory, on which to base metrics.