Monday, July 24, 2017

Software engineers face constantly growing pressure to build software that is secure by design, where systems have to be designed from the ground up to be secure and resistant to attacks. To achieve this goal, security architects work with various stakeholders to identify security requirements and adopt appropriate architectural solutions to address these requirements. These architectural solutions are often based on security tactics. Security tactics are reusable solutions for satisfying security quality attributes: resisting attacks (e.g., the tactic “Authenticate Actors”), detecting attacks (e.g., “Detect Intrusion”), reacting to attacks (e.g., “Revoke Access”), and recovering from attacks (e.g., “Audit”). Despite the significant effort that goes into designing secure systems, security can slowly erode because of ongoing maintenance activities. Incorrect implementation of security tactics, or their deterioration during coding and maintenance, can introduce vulnerabilities into the security architecture of the system, thus compromising key security requirements. We refer to these vulnerabilities as tactical vulnerabilities.

The code snippet in Listing 1, from a J2EE web application, shows an example of such a tactical vulnerability: an incorrect implementation of the “Manage User Sessions” tactic. A correct implementation of this tactic in a web application allows the system to keep track of currently authenticated users (including the permissions they hold). In the given code snippet, however, the application authenticates users with LoginContext.login() without first calling HttpSession.invalidate() to invalidate any existing session. This enables attackers to fixate (i.e., find or set) another user’s session identifier, for example by inducing a user to initiate a session with a session identifier the attacker provides. Once the user authenticates with this forged session identifier, the attacker can hijack the authenticated session. Although architects used the “Manage User Sessions” tactic in the architecture design of the web application, the developers failed to implement it correctly, resulting in a tactical vulnerability that can be exploited for session fixation attacks.
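Listing 1 itself is not reproduced in this post; the following minimal sketch shows the vulnerable pattern and its fix (the class name, the "AppLogin" JAAS configuration entry, and the callback handler are illustrative assumptions, not code from the studied application):

import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

public class SessionAwareLogin {

    // VULNERABLE: authenticates the user while keeping whatever session id
    // the request already carries, so an attacker-chosen (fixated) id
    // becomes an authenticated session.
    public void loginVulnerable(HttpServletRequest request, CallbackHandler handler)
            throws LoginException {
        LoginContext context = new LoginContext("AppLogin", handler);
        context.login();
    }

    // FIXED: invalidate any pre-existing session and issue a fresh session
    // id before authenticating, so a fixated id can never be promoted to
    // an authenticated session.
    public void loginFixed(HttpServletRequest request, CallbackHandler handler)
            throws LoginException {
        HttpSession oldSession = request.getSession(false);
        if (oldSession != null) {
            oldSession.invalidate();
        }
        request.getSession(true); // fresh session id for the new login
        LoginContext context = new LoginContext("AppLogin", handler);
        context.login();
    }
}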

Recent empirical studies of security vulnerabilities have neglected the architectural context, including design decisions such as tactics and patterns. They mostly focus on studying and understanding coding issues related to the management of data structures and variables (e.g., buffer overflow/over-read).

Goal of This Study

Here, I’d like to report on an in-depth case study of software vulnerabilities associated with architectural security tactics across three large-scale open-source systems (Chromium, PHP, and Thunderbird). In this blog post, I present only the results; the scientific process and systematic approach behind these conclusions can be found in our research article here.

Common Tactical Vulnerabilities

Table I lists the root causes (i.e., vulnerability types) of tactical vulnerabilities in each of the three studied systems, the related architectural tactics, and the total number of CVEs caused by each vulnerability type.

Key Findings

While Chromium, PHP, and Thunderbird have adopted a wide range of architectural tactics to secure the systems by design, a remarkable number of vulnerabilities discovered in these systems are due to incorrect implementations of these tactics.

Improper Input Validation (CWE-20) and Improper Access Control (CWE-284) are the most common root causes of security vulnerabilities in Chromium, PHP, and Thunderbird.

Vulnerabilities in the three studied systems are mostly related to the “Validate Inputs” and “Authorize Actors” tactics for resisting attacks.

The security of the studied projects was also compromised by reusing or importing vulnerable versions of third-party libraries. In Chromium, such vulnerabilities occurred 106 times; in Thunderbird and PHP, 7 and 8 times, respectively.

Tactical and non-tactical vulnerabilities have a similar distribution over time and releases, even though the absolute numbers of tactical and non-tactical vulnerabilities differ.

When fixing tactical vulnerabilities, code churn is not statistically significantly higher or lower than when fixing non-tactical vulnerabilities.

When fixing tactical vulnerabilities, the number of affected files is not statistically significantly higher or lower compared to fixing non-tactical vulnerabilities.

Monday, July 17, 2017

From a functional perspective, the quality of open source software (OSS) is on par with comparable closed-source software [1]. However, in terms of nonfunctional attributes, such as reliability, scalability, or performance, the quality is less well-understood. For example, Heger et al. [2] stated that performance bugs in OSS go undiscovered for a longer time than functional bugs, and that fixing them takes longer. As many OSS libraries (such as apache/log4j) are used almost ubiquitously across a large number of other OSS and industrial applications, a performance bug in such a library can lead to widespread slowdowns. Hence, it is of utmost importance that the performance of OSS is well-tested.

We studied 111 Java-based open source projects from GitHub to explore to what extent and how OSS developers conduct performance tests. First, we searched for projects that included at least one of the keywords 'bench' or 'perf' in the 'src/test' directory. Second, we manually identified the performance and functional tests inside each project. Third, we identified performance-sensitive projects, which mention in the description of their GitHub repository that they are the 'fastest', 'most efficient', etc. For a more thorough description of our data collection process, see our ICPE 2017 paper [3]. In the remainder of this blog post, the most significant findings of our study are highlighted.
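As a toy illustration of the keyword-based filtering step, a sketch along these lines flags candidate files in a local checkout (this class and its directory-layout assumptions are ours; the study's actual data collection is described in the paper [3]):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class PerfTestFinder {
    public static void main(String[] args) throws IOException {
        // Walk a local checkout's src/test directory and flag candidate
        // performance test files using the same keyword heuristic.
        Path testDir = Paths.get(args[0], "src", "test");
        try (Stream<Path> files = Files.walk(testDir)) {
            files.filter(Files::isRegularFile)
                 .filter(path -> {
                     String name = path.toString().toLowerCase();
                     return name.contains("bench") || name.contains("perf");
                 })
                 .forEach(System.out::println);
        }
    }
}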

Finding # 1 - Performance tests are maintained by a single developer or a small group of developers. In 50% of the projects, all performance test developers are one or two core developers of the project. In addition, only 44% of the test developers also worked on the performance tests.

Finding # 2 - Compared to the functional tests, performance tests are small in most projects. The median SLOC (source lines of code) in performance tests in the studied projects was 246, while the median SLOC of functional tests was 3980. Interestingly, performance-sensitive projects do not seem to have more or larger performance tests than non-performance-sensitive projects.

Finding # 3 - There is no standard for the organization of performance tests. In 52% of the projects, the performance tests are scattered throughout the functional test suite. In 9% of the projects, code comments are used to communicate how a performance test should be executed. For example, the RangeCheckMicroBenchmark.java file from the nbronson/snaptree project contains the following comment:

/*
 * This is not a regression test, but a micro-benchmark.
 *
 * I have run this as follows:
 *
 * repeat 5 for f in -client -server;
 *   do mergeBench dolphin . jr -dsa\
 *      -da f RangeCheckMicroBenchmark.java;
 * done
 */
public class RangeCheckMicroBenchmark {
    ...
}

In four projects, we even observed that code comments were used to communicate the results of a previous performance test run.

Finding # 4 - Most projects have performance smoke tests.

We identified the following five types of performance tests in the studied projects:

Performance smoke tests: These tests (50% of the projects) typically measure the end-to-end execution time of important functionality of the project.

Microbenchmarks: 32% of the projects use microbenchmarks, which can be considered performance unit tests. Stefan et al. [4] studied microbenchmarks in depth in their ICPE 2017 paper.

One-shot performance tests: These tests (15% of the projects) were meant to be executed once, e.g., to test the fix for a performance bug.

Performance assertions: 5% of the projects try to integrate performance tests into their unit-testing framework, which results in performance assertions. For example, the TransformerTest.java file from the anthonyu/Kept-Collections project asserts that one bytecode serialization method is at least four times as fast as the alternative (a sketch of this style of assertion follows the list).

Implicit performance tests: 5% of the projects do not have performance tests, but simply yield a performance metric (e.g., the execution time of the unit test suite).
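A minimal sketch of such a performance assertion, written against JUnit, could look as follows (the serialization routines, iteration count, and 4x threshold are illustrative assumptions, not the actual Kept-Collections code):

import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class SerializationPerformanceTest {

    private static final int ITERATIONS = 10_000;

    @Test
    public void optimizedSerializerIsAtLeastFourTimesFaster() {
        long baselineMs  = time(this::baselineSerialize);
        long optimizedMs = time(this::optimizedSerialize);
        // The performance assertion: the test fails when the speedup drops
        // below 4x, so a performance regression breaks the unit test suite.
        assertTrue("expected >= 4x speedup, baseline=" + baselineMs
                        + "ms, optimized=" + optimizedMs + "ms",
                optimizedMs * 4 <= baselineMs);
    }

    private long time(Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Hypothetical stand-ins for the two bytecode serialization
    // methods being compared.
    private void baselineSerialize()  { /* slower codec */ }
    private void optimizedSerialize() { /* faster codec */ }
}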

The different types of tests show that there is a need for performance tests at different levels, ranging from low-level microbenchmarks to higher-level smoke tests.

Finding # 5 - Dedicated performance test frameworks are rarely used.

Only 16% of the studied projects used a dedicated performance test framework, such as JMH or Google Caliper. Most projects use a unit test framework to conduct their performance tests. One possible explanation is that developers are trying to integrate their performance tests into their existing continuous integration processes.
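For comparison, a minimal JMH benchmark looks roughly as follows (the array-sorting workload is an illustrative stand-in, not a benchmark taken from the studied projects):

import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class SortBenchmark {

    private int[] data;

    @Setup
    public void setUp() {
        data = ThreadLocalRandom.current().ints(10_000).toArray();
    }

    @Benchmark
    public int[] sortCopy() {
        // JMH invokes this method repeatedly, handles warmup, and
        // reports the average time per call.
        int[] copy = Arrays.copyOf(data, data.length);
        Arrays.sort(copy);
        return copy; // returning the result prevents dead-code elimination
    }
}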

The main takeaway of our study

Our observations imply that developers are currently missing a “killer app” for performance testing, which would likely standardize how performance tests are conducted, in the same way that JUnit standardized unit testing for Java. A ubiquitous performance testing tool will need to support performance tests at different levels of abstraction (smoke tests versus detailed microbenchmarking), provide strong integration into existing build and CI tools, and support both extensive testing with rigorous methods and quick-and-dirty tests that pair reasonable expressiveness with being fast to write and maintain, even by developers who are not experts in software performance engineering.

[3] P. Leitner and C.-P. Bezemer. An exploratory study of the state of practice of performance testing in Java-based open source projects. In Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering (ICPE), 2017.

[4] P. Stefan, V. Horky, L. Bulej, and P. Tuma. Unit testing performance in Java projects: Are we there yet? In Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering (ICPE), 2017.

Monday, July 10, 2017

Software products have an undeniable impact on people's daily lives. However, software can only help if it matches users’ needs. Often, up to 80% of software features are never or almost never used. To have a real impact on society, understanding the specific needs of users is critical. Social media provide such an opportunity to a good extent.

This post summarizes the main idea of an ICSE 2017 SEIS track paper titled "Crowdsourced exploration of mobile app features: a case study of the Fort McMurray wildfire". Two accompanying interviews complement the description and highlight the results.

We gathered the online communications of Albertans about the Fort McMurray fire at the time of the crisis. People formed unofficial online support groups on Facebook and Twitter to identify and respond to the needs of evacuees. For example, to share a car, fuel, or baby clothes, to get information about road traffic and gas station lineups, or to report incidents or criminal activity, people posted on Twitter or Facebook with hashtags such as #ymmfire or #FortMcMurray. Other members following these hashtags then offered help. In emergency situations (such as natural disasters or man-made attacks), a cell phone may suddenly become a victim’s only resource.

We developed a method called MAPFEAT to gather and analyze social media posts. With MAPFEAT, we elicit requirements from this unstructured communication and automatically map them to app features that already exist in apps across the whole app store. By evaluating these features through crowdsourcing, MAPFEAT ranks and prioritizes the app features that best match user needs and thus should be included in a software application.

In the case of the Fort McMurray fire, we analyzed almost 70,000 tweets and mapped them to app features using MAPFEAT. We compared the features we obtained with the features already existing in 26 emergency apps. The results showed that none of the top 10 most essential features for victims is available in any of the 26 apps. Among the top 40 essential features we gathered, only six were provided by some of the existing wildfire apps. In total, we mined 144 features, and 85% of them were evaluated as essential and worthwhile by the general public.

The mismatch between users’ requirements and software design is a well-known software engineering problem. With lightweight, high-capacity mobile devices, software engineering is now, more than ever, a way to help people solve problems. This will only be possible, however, if we find ways to involve the general public in the decision process.