Detailed Sessions and Talks

Wednesday

09:30 - 10:30

Over the last decade I have supervised a significant number of PhD students, served on numerous PhD committees, and reviewed countless papers. On these occasions I always did my very best to give constructive remarks, and many young (sometimes even not so young) researchers have acknowledged that it did indeed help them become better researchers. However, over the years I got this nagging feeling that I was often repeating myself, sometimes wondering whether I sounded like a grumpy old grandfather instead of an active, enthusiastic researcher. Worse, I was often giving tidbits of advice but noticed that the overall picture, the reference framework so to speak, was lacking. My advice was sometimes misunderstood, neglected or otherwise ignored, and I often wondered whether I could nail down this reference framework.

During my sabbatical leave at the University of Zürich (August 2009 - January 2010) in the research group SEAL, I was finally presented with the opportunity to develop this reference framework. I did an awful lot of reading, held several brainstorming sessions, and in the end created a tutorial entitled "Research methods in Computer Science". This tutorial explores the role of research methods in computer science, drawing upon practical examples from empirical approaches in software engineering.

11:00 - 12:30

Session 1: Human in the Middle

Recent research has provided evidence that software developers experience a wide range of emotions. We argue that among those emotions anger deserves special attention, as it can serve as an onset for tools supporting collaborative software development. This, however, requires a fine-grained model of the anger emotion, able to distinguish between anger directed towards self, others, and objects. Detecting anger towards self could be useful to support developers experiencing difficulties; detecting anger towards others might be helpful for community management; detecting anger towards objects might be helpful to recommend and prioritize improvements. As a first step towards automatic identification of anger direction, we built a classifier for anger direction, based on a manually annotated gold standard of 723 sentences obtained from Apache issue reports.

We carry out a quantitative empirical comparison of the macro-level evolution of software packaging ecosystems for a multitude of different programming languages. We report on the most important observed differences and commonalities in the evolution of their package dependency networks.
We hypothesise that the observed commonalities are emerging properties due to the ecosystem scale and complexity.
Inspired by Lehman's laws of software evolution, we postulate that they may ultimately lead to a new series of empirically observable "laws of software ecosystem evolution".

Socio-Technical Analysis of Developer Retention in the RubyGems Software Ecosystem
Eleni Constantinou and Tom Mens
Paper: Upon request from the authors

Software ecosystems can be viewed as socio-technical networks consisting of software packages developed and maintained by communities of contributors. Ecosystems evolve over time through changes in the code and the social structure, and some of these changes may have an important impact on the sustainability of the ecosystem. Some social changes may lead to a technical degradation of the ecosystem, e.g., by resulting in abandoned software packages that are still being used by many other packages in the ecosystem. To avoid this, it is important to identify the factors leading to an increased probability of developer abandonment. Using the statistical technique of survival analysis, we empirically analyse such factors for the RubyGems software ecosystem. To achieve our goal, we analysed the development activity of gems in GitHub, as well as the social interactions between gem developers in both GitHub and developer mailing lists. Our findings showed that: the more intensive and frequent the communication, the higher the probability of remaining active longer; developers with a large number of commits and multi-project activity stay longer in the ecosystem; and developers that frequently abandon projects in the ecosystem will abandon the ecosystem altogether sooner.
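The survival-analysis technique named above can be illustrated with a minimal Kaplan-Meier estimator; the data below are hypothetical and do not come from the RubyGems study.

```python
# Minimal Kaplan-Meier estimator: probability that a developer is still
# active after t months. Each sample is (duration, observed): duration is
# months of activity, observed is True if the developer actually abandoned
# the ecosystem (False means still active at the end, i.e. censored).
def kaplan_meier(samples):
    survival, curve = 1.0, []
    for t in sorted({d for d, observed in samples if observed}):
        deaths = sum(1 for d, o in samples if o and d == t)   # abandonments at t
        at_risk = sum(1 for d, _ in samples if d >= t)        # still around at t
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

developers = [(3, True), (5, True), (5, False), (8, True), (12, False)]
for month, prob in kaplan_meier(developers):
    print(f"after {month:2d} months: {prob:.2f} still active")
```

Libraries such as lifelines offer a production-quality version of this estimator, along with the regression models needed to test which socio-technical factors actually influence retention.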

Static analysis tools may produce false positive results, which negatively impact the overall usability of these tools. However, even a correct static analysis report is sometimes classified as a false positive if a developer does not understand it or does not agree with it. Lately, developers' classifications of false positives have been treated on a par with actual static analysis performance, which may distort knowledge about the real state of static analysis.
In this paper we discuss various use cases where a false positive report is not actually false and the issue is caused by other aspects of static analysis. We provide an in-depth explanation of the issue for each use case, followed by recommendations on how to solve it, thus exemplifying the importance of careful false positive classification.

14:00 - 17:30 with a coffee break at 15:30

Session 2: Dr. Scratch + Hackathon Introduction + Tutorial

Dr. Scratch is a web tool that analyzes Scratch projects to assess the development of computational thinking skills. This paper presents the current state of the validation process of the tool. The process involves several investigations to test the validity of Dr. Scratch from different perspectives, such as the extent to which learners improve their coding skills while using the tool in real-life scenarios; the relationships of the score provided by the tool with other, similar measurements; the capacity of the tool to discriminate between different types of Scratch projects; as well as the vision and feelings of educators who are using the tool in their lessons. The paper also highlights the actions that are still pending to complete the formal validation of Dr. Scratch.

Scratch is increasingly popular, both as an introductory programming language and as a research target in the computing education research field. In this paper, we present a dataset of 250K recent Scratch projects from 100K different authors scraped from the Scratch project repository. We processed the projects’ source code and metadata to encode them into a database that facilitates querying and further analysis. We further evaluated the projects in terms of programming skills and mastery, and included the project scoring results. The dataset enables the analysis of the source code of Scratch projects, of their quality characteristics, and of the programming skills that their authors exhibit. The dataset can be used for empirical research in software engineering and computing education.

Tutorial
Felipe Ortega

Python has long been great for data munging and preparation, but less so for data analysis and modeling. pandas (the Python Data Analysis Library) helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain specific language like R. Combined with the excellent Jupyter toolkit (including IPython) and other libraries, the environment for doing data analysis in Python excels in performance, productivity, and the ability to collaborate. In this tutorial, we will provide an overview of pandas using the dataset of Scratch programs released by the TU Delft, and used in the hackathon of SATToSE 2017.
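As a taste of the workflow the tutorial covers, here is a minimal pandas sketch over a toy table of Scratch projects; the column names and values are invented for illustration and do not reflect the actual schema of the TU Delft dataset.

```python
import pandas as pd

# Toy stand-in for the Scratch dataset; column names are invented.
projects = pd.DataFrame({
    "author": ["alice", "alice", "bob", "carol"],
    "blocks": [120, 45, 300, 80],
    "score":  [14, 9, 18, 11],   # e.g. a Dr. Scratch-style mastery score
})

# Typical analysis steps: filter rows, group, aggregate.
large = projects[projects["blocks"] > 50]
by_author = large.groupby("author")["score"].mean()
print(by_author)
```

The same filter-group-aggregate pattern scales from this four-row toy to the 250K-project dataset used in the hackathon.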

Thursday

09:00 - 10:30

Session 3: Source Code Analysis

Understanding API usage is important for API developers to manage the evolution of their projects.
They need to know who their clients are and which API versions and features they use.
Many API usage studies analyze either a few hand-selected projects or whole package repositories.
Those approaches do not suit the needs of an API developer, who is only interested in the ecosystem formed around their API.
We present an approach to find clients of a specific API by exploiting dependency management systems.
As a validation we perform a case study on the ecosystem around Apache Lucene, an open-source full-text search engine.
We inspect who uses Lucene and how it is used.
We implement our approach in KOWALSKI for Maven, but the same approach could also be applied to other systems, for example NPM.

This research investigates idioms in the Rust programming language to improve both readability and code quality by refactoring suboptimal code. This is done by enabling the Rascal Metaprogramming Language to parse, analyse, and process the grammar of Rust.

In industry, the development of similar products is often addressed by cloning and modifying existing artifacts. This so-called "clone-and-own" approach is often considered a bad practice, but is still perceived as a favorable and natural software reuse approach by many practitioners. In this paper, we present the research direction, context and related literature of an exploratory study on the analysis of an industry system developed using the clone-and-own approach. The main objective of our study is to understand and quantify the benefits and drawbacks of this approach by analyzing the evolution and characteristics of this system.

Dynamically-typed languages allow faster software development by not imposing type constraints. Static type information, however, facilitates program comprehension and software maintenance. Type inference algorithms attempt to reconstruct type information from the code, yet they suffer from false positives or false negatives. The use of complex type inference algorithms is questionable during the development phase due to their slowness. Instead, we propose lightweight heuristics that improve simple type inference algorithms while preserving their swiftness.
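A lightweight heuristic of the kind proposed here might, for instance, guess a variable's type from literal assignments and common naming conventions. The sketch below is our own illustration of that idea, not the authors' actual algorithm; the rules and type names are invented.

```python
import re

# Hypothetical lightweight type-inference heuristics for a dynamically-typed
# language: cheap pattern checks instead of a full (and slow) inference pass.
LITERAL_RULES = [
    (re.compile(r"^\d+$"), "Integer"),
    (re.compile(r"^\d+\.\d+$"), "Float"),
    (re.compile(r"^['\"].*['\"]$"), "String"),
    (re.compile(r"^\[.*\]$"), "List"),
]
NAME_HINTS = {"count": "Integer", "name": "String", "flag": "Boolean"}

def guess_type(var_name, rhs):
    # First try the right-hand side: literals give the type away directly.
    for pattern, type_name in LITERAL_RULES:
        if pattern.match(rhs.strip()):
            return type_name
    # Otherwise fall back on a naming-convention hint, if any.
    for hint, type_name in NAME_HINTS.items():
        if hint in var_name.lower():
            return type_name
    return "Unknown"

print(guess_type("total", "42"))        # Integer
print(guess_type("userName", "other"))  # String (from the name hint)
```

Such checks run in essentially constant time per assignment, which is what makes them usable interactively during development.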

11:00 - 12:30

Session 4: Testing

The test suite is essential for fault detection during software evolution. First-order mutation coverage is an accurate metric to quantify the quality of the test suite. However, it is computationally expensive. Hence, the adoption of this metric is limited. In our study, we addressed this issue by proposing a realistic model able to estimate first-order mutation coverage using only higher-order mutation coverage. Our study shows how the estimation evolves along with the order of mutation. We validated the model with an empirical study based on 17 open-source projects.
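First-order mutation coverage, the metric being estimated above, can be computed from a kill matrix as follows; this is a generic illustration of the metric itself, not the authors' estimation model.

```python
# First-order mutation coverage: the fraction of mutants killed by the suite.
# kill_matrix[m][t] is True if test t kills (detects) mutant m.
def mutation_coverage(kill_matrix):
    killed = sum(1 for row in kill_matrix if any(row))
    return killed / len(kill_matrix)

# Hypothetical 4-mutant, 3-test example.
kill_matrix = [
    [True, False, False],   # killed by test 0
    [False, False, True],   # killed by test 2
    [False, False, False],  # survives: equivalent mutant or weak suite
    [True, True, False],    # killed by two tests
]
print(f"mutation coverage: {mutation_coverage(kill_matrix):.0%}")  # 75%
```

The cost the abstract refers to comes from filling in this matrix: every mutant potentially requires a full test-suite run, which is why cheaper estimates from higher-order mutants are attractive.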

Research on software testing generally focusses on the effectiveness of test suites in detecting bugs. The quality of the test code in terms of maintainability remains mostly ignored. However, just like production code, test code can suffer from code smells that imply refactoring opportunities. In this paper, we summarize the state of the art in the field of test refactoring. We show that there is a gap in tool support, and propose future work aimed at filling this gap.

Many software development projects rely on testing-related libraries to test the functionality of the software product automatically and efficiently. To obtain insights into the nature of the evolution of testing library usage, we empirically analyzed the usage of eight testing-related libraries in 6,424 open source Java projects hosted on GitHub.
We observed how frequently specific (pairs of) versions of libraries are used over time and how intensively they are used within a project, and we identified the delay before upgrading to a new version. We also identified the most used packages of these libraries over time and analyzed whether groups of packages are usually used together. We studied the evolution of the number of test Java files, as well as how often developers use testing libraries to test classes that provide a particular functionality. We found that some versions of certain libraries are adopted more quickly than others, and some of them are quickly upgraded. We observed that most packages of some libraries tend to be used in only a small number of Java files. These findings may pave the way for recommendation tools that allow project developers to choose the most appropriate library, and library developers to better maintain their library.

Software testing is an important part of the software engineering process, widely used in industry.
In practice, testing is partly covered by test suites comprising unit tests written by developers.
As projects grow, the size of the test suites grows along with them.
Monitoring the quality of these test suites is important as they often influence the cost of maintenance.
Part of this monitoring process is to measure the effectiveness of test suites in detecting faults.
Unfortunately, this is computationally expensive and requires the ability to run the tests, which often have dependencies on other systems or require non-transferable licenses.
To mitigate these issues, we investigate whether metrics obtained from static analysis could predict test suite effectiveness, as measured with mutation testing.
Preliminary results show that, when size is ignored, there is a correlation between statically estimated code coverage and effectiveness.
However, when suites of equal sizes are compared the correlation drops significantly.
Our current focus is investigating the reasons for this behavior.

14:00 - 15:30

Session 5: Quality

Spreadsheets are used extensively for calculations in many domains. Their easy-to-use and intuitive interface allows users to build calculations of varying complexity. This often leads to maintainability issues, including performance problems. Despite the number of resources listing possible performance anti-patterns in spreadsheet formulas, little research has been done to validate them. In this paper, we analyze 40,122 spreadsheets from four different data sources to investigate the effect of 20 spreadsheet performance-related metrics. These metrics are chosen following a smell-driven analysis. Thereafter, our analysis constructs a linear regression model with the 12 metrics found to significantly explain spreadsheet performance. We further identify each metric's contribution to the performance model. Initial results show that the three most significant spreadsheet metrics are: the repetition of a formula over a large range (35.89%), the conditional formatting of cells (19.53%), and calls to special Excel functions, such as lookup functions (18.84%).

Code smells indicate the presence of quality issues in a software system. For a thorough large-scale smell mining study, researchers require tools that not only allow them to detect a wide range of smells in a large number of repositories automatically, but also offer mechanisms to customize the analysis. In this paper, we present Designite, a tool that detects 19 design and 11 implementation smells in source code written in the C# programming language. Designite provides a command line tool, in addition to an interactive user interface, to support the automation required for a large-scale mining study. Furthermore, the tool allows customization of quality analysis parameters, such as metric thresholds, to serve a wider range of users.

While some university-level programming courses focus on software quality, in introductory courses code quality is often little touched upon due to time constraints. When students work on a programming assignment, they usually get feedback on the code quality after the assignment has been graded. This feedback cannot be used on that same assignment before the grade is determined. Better Code Hub is a service that checks code quality according to ten guidelines. We employ Better Code Hub as a formative assessment and feedback tool that students can use to monitor their progress on code quality. Our aim is to improve students' code quality skills during the evolution of a student's programming assignment, while keeping the overhead low for teaching staff as well as for students. Preliminary findings indicate that there is an improvement in the code quality of the students' assignments over the period Better Code Hub was used.

New Approach to Understand the Origins of a Bug
Gema Rodriguez-Perez
Paper: pdf

This paper presents our ongoing research work. The core of our study focuses on the bug fixing and bug seeding process; concretely, on the assumption made in the literature that a given bug was introduced by the lines of code that were modified to fix it. After carefully exploring how a number of bugs were introduced in two different projects, we have developed an approach for determining how bugs are introduced. Furthermore, based on the above assumption, we are carrying out a systematic review of the use of this assumption in previous research works.

Friday

09:30 - 10:30

Sustainable business requires maintainable software. This is the premise of the book “Building Maintainable Software — ten guidelines for future-proof code”, by Joost Visser and colleagues from SIG. In this presentation, Joost will explain the importance of maintainable software and he will discuss how software teams can use the 10 guidelines to more effectively work together in creating great software products. Joost will also highlight pitfalls and success factors for using code analysis tools that help to measure maintainability of software.

Keeping a trace of the evolution of software architectures is an important issue for architects. This paper summarizes a submitted paper which argues that current versioning approaches do not offer solutions adapted to software architectures, especially when dealing with the co-evolution of the different architecture representations that may be produced during the development process.
This work is based on a three-level architecture description language (ADL) named Dedal, which gives a representation of architectures at the main stages of a software life-cycle. Dedal also performs co-evolution through change propagation within these representations.
Another advantage of Dedal is that it has been formalized and thus provides a formal basis for studying version propagation. We based our study on the substitutability relations that exist when a component is replaced at any of the three architecture levels. We aim to be able to predict the compatibility of versioned artifacts in terms of their impact on the different architecture levels.

Incremental builds are commonly used to speed up the edit-compile-test loop during program development. By updating only the required parts of a system, build systems can shorten the compilation phase by orders of magnitude. While this technique is commonly used for local builds, it is seldom enabled during continuous integration. Current build systems do not offer strong correctness properties for incremental builds. In continuous integration setups, it is therefore difficult to achieve both correctness and efficiency simultaneously. Facing this choice, release engineers tend to favor correct builds over optimized builds.
In this article, we show that it is possible to obtain builds that are both incremental and correct. We start by showing that incremental builds are a desirable optimization in continuous integration environments. We then discuss the different reasons that prevent release engineers from enabling incremental builds in practice. From these, we derive requirements to be met by future build systems to support incremental continuous integration. Whenever possible, we illustrate the shortcomings of build systems with insights from current research and industry efforts in new build systems. We also list existing projects that could be combined into a complete tool supporting efficient and correct incremental compilation in continuous integration environments. Ultimately, this paper defines a new research direction at the intersection of build systems and continuous integration.

Integrating code from different sources can be an error-prone and effort-intensive process. While an integration may appear sound, unexpected errors may still surface at run time. What exactly constitutes these errors is not always clear, nor is it known which are most common. We want to create a hierarchical categorisation of the errors that break merge commits. We will do this in two phases: first, a manual categorisation to define categories based on real-world examples; second, a declarative specification of every category to automate categorisation and enable an empirical study.

How long does it take to fix the code: A case study of OpenStack.
Dorealda Dalipaj and Jesus M. Gonzalez-Barahona
Paper: pdf

Code review is an excellent source of metrics that can be used to improve the software development process. The benefits of these metrics range from measuring the progress of a development team to investigating software development policies and guidelines. In this paper, we analyse some of the absolute metrics, specifically review process metrics. Our case study is the large open source cloud computing project OpenStack. We bring evidence of code review process response time by quantifying the time developers spend identifying bug reports in the issue tracking system (bug triaging) and the time they spend carrying out the reviewing process (time to review) in the code review system. Last, we contrast our findings with the results of similar analyses from traditional software inspection conducted on the Lucent project, and from open source code review on six projects, including AMD, Microsoft, and Google-led projects. This analysis is one of the intermediate phases of a current research line. Conducted and funded under SENECA, an EU project, the scope of the research is to analyse the process quality characterising key performance indicators of software development processes.

14:00 - 15:30

Session 7: MSR2

GitHub is the most used online code platform in the world, with more than 35 million repositories. Mining information from these millions of projects and analysing that data is very useful for both researchers and companies. We present a methodology for extracting information from Free/Libre/Open Source Software repositories stored on GitHub, applied to a case study: the search for UML models in order to quantify and analyse their use in this type of project (Michel R. V. Chaudron, Gregorio Robles et al.). We start from a database provided by the GHTorrent project (which creates a scalable, queriable, offline mirror of data from the GitHub REST API), and use a series of scripts to extract metadata from the repositories in order to look for patterns and/or specific file extensions. Once the interesting projects have been identified through an external process, they are analysed with Perceval, a program which extracts metrics from them.
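The extension-based filtering step might look like the sketch below; the extensions and repository data are illustrative, and the actual pipeline works over GHTorrent metadata and Perceval rather than in-memory dictionaries.

```python
# Sketch of extension-based filtering: flag repositories whose file lists
# contain extensions that suggest UML models. Extensions and repository
# contents here are invented for illustration.
UML_EXTENSIONS = {".uml", ".xmi", ".ecore"}

def has_uml_files(file_paths):
    return any(path[path.rfind("."):].lower() in UML_EXTENSIONS
               for path in file_paths if "." in path)

repos = {
    "org/app": ["src/Main.java", "docs/design.xmi"],
    "org/lib": ["lib.py", "README"],
}
candidates = [name for name, files in repos.items() if has_uml_files(files)]
print(candidates)  # ['org/app']
```

A cheap metadata scan like this narrows millions of repositories down to a candidate set small enough for the heavier per-project analysis.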

How do Apps Evolve in Their Permission Requests? A Preliminary Study
Paolo Calciati and Alessandra Gorla
Paper: pdf

In this preliminary study we try to understand how apps evolve in their permission requests across different releases. We analyze over 14K releases of 227 Android apps, and we examine how permission requests change and how they are used. We find that apps tend to request more permissions as they evolve, and many of the newly requested permissions are initially overprivileged. Our qualitative analysis, however, shows that the results that popular tools report on overprivileged apps may be biased by incomplete information or by other factors. Finally, we observe that when apps no longer request a permission, it does not necessarily mean that the new release offers less functionality.

Developers often document their code with semi-structured comments such as Javadoc. Such comments are a form of specification, and often document the intended behavior of a code unit as well as its preconditions. The goal of our project is to analyze Javadoc comments to generate assertions aiming to verify that a software unit indeed behaves as expected. Existing works with this goal mainly rely on syntax-based techniques to match natural language terms in comments to elements in the code under test. In this paper we show the limitations of syntax-based techniques, and we present our roadmap to semantically analyze Javadoc comments.
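To make the syntax-based baseline concrete, here is a toy sketch that matches `@param` clauses against a fixed keyword pattern to emit a precondition check. It is our illustration of the kind of syntactic matching the paper criticises, not the authors' tool, and the pattern it recognises is deliberately narrow.

```python
import re

# Toy syntax-based matcher: turn "@param x must not be null" clauses into
# null-check assertions. Real comments are far messier, which is exactly
# why purely syntactic techniques fall short.
PARAM_RE = re.compile(r"@param\s+(\w+)\s+(.*)")

def generate_assertions(javadoc):
    assertions = []
    for line in javadoc.splitlines():
        match = PARAM_RE.search(line)
        if match and "not be null" in match.group(2):
            assertions.append(f"assert {match.group(1)} != null;")
    return assertions

comment = """/**
 * @param name must not be null
 * @param size the buffer size
 */"""
print(generate_assertions(comment))  # ['assert name != null;']
```

A semantically equivalent phrasing such as "a null name is rejected" would silently slip past this matcher, which is the gap the semantic analysis aims to close.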

Our study focuses on creating a web tool that can help both beginners and advanced programmers make their Python code more readable through the use of Pythonic idioms, that is, typical ways to accomplish common tasks.
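For example, the kind of rewriting such a tool might suggest could look as follows (the example is ours, not taken from the tool itself):

```python
# Non-idiomatic: index-based loop with a manual accumulator.
def even_squares_plain(numbers):
    result = []
    for i in range(len(numbers)):
        if numbers[i] % 2 == 0:
            result.append(numbers[i] ** 2)
    return result

# Pythonic: a list comprehension expresses the same computation directly.
def even_squares_pythonic(numbers):
    return [n ** 2 for n in numbers if n % 2 == 0]

print(even_squares_pythonic([1, 2, 3, 4]))  # [4, 16]
```

Both functions behave identically; the idiomatic version simply states the intent (filter, then map) without bookkeeping.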

Hackathon Results & Open MIC

Session Chair: Vadim Zaytsev
In this session, everybody is welcome to present the results of their hackathon. Also whoever wants to share something with the audience, this is the right platform. One can present some results, demo a tool, share a nice paper, give an opinion on some issue to trigger discussions, etc. We will run this session in an ad-hoc manner.