Software process simulation is a complex task. To conduct a simulation project, practitioners require support through a process for software process simulation modelling (SPSM), including what steps to take and what guidelines to follow in each step. This paper provides a literature-based consolidated process for SPSM, where the steps and the guidelines for each step are identified through a literature review and complemented by experience from applying these recommendations in an action research study at a large telecommunication vendor. We found five simulation processes in the SPSM literature and consolidated them into a seven-step process. The consolidated process was successfully applied at the studied company, and the experiences from doing so are reported.

Value stream mapping (VSM) has been successfully applied in the context of software process improvement. However, its current adaptations from Lean manufacturing focus mostly on the flow of artifacts and take no account of the essential information flows in software development. A solution specifically targeted toward information flow elicitation and modeling is FLOW. This paper aims to propose and evaluate the combination of VSM and FLOW to identify and alleviate information- and communication-related challenges in large-scale software development. Using case study research, FLOW-assisted VSM was applied to a large product at Ericsson AB, Sweden. Both the process and the outcome of FLOW-assisted VSM were evaluated from the practitioners’ perspective. FLOW helped to systematically identify challenges and improvements related to information flow. Practitioners responded favorably to the use of VSM and FLOW, acknowledged the realistic nature of the improvements and their impact on software quality, and found the overview of the entire process using the FLOW notation very useful. The combination of FLOW and VSM presented in this study was successful in systematically uncovering issues and characterizing their solutions, indicating its practical usefulness for waste removal with a focus on information-flow-related issues.

Systems of systems (SoS) are highly complex and are integrated on multiple levels (unit, component, system, system of systems). Many of the characteristics of SoS (such as operational and managerial independence, integration of systems into a system of systems, and SoS being comprised of complex systems) make their development and testing challenging. Contribution: This paper provides an understanding of SoS testing in large-scale industry settings with respect to challenges and how to address them. Method: The research method used is case study research. As data collection methods we used interviews, documentation, and fault slippage data. Results: We identified challenges related to SoS with respect to fault slippage, test turn-around time, and test maintainability. We also classified the testing challenges into general testing challenges, challenges amplified by SoS, and challenges that are SoS specific. Interestingly, the interviewees agreed on the challenges even though we sampled them with diversity in mind, which suggests that the number of interviews conducted was sufficient to answer our research questions. We also identified solution proposals for the challenges, which were categorized into four classes: developer quality assurance, function test, testing on all levels, and requirements engineering and communication. Conclusion: We conclude that although over half of the challenges we identified can be categorized as general testing challenges, SoS still have unique and amplified challenges stemming from SoS characteristics. Furthermore, the interviews and the fault slippage data pointed to different areas of the software process that should be improved, which indicates that using only one of these methods would have led to an incomplete picture of the challenges in the case company.

Context: The study selection process is critical to improve the reliability of secondary studies. Goal: To evaluate the selection strategies commonly employed in secondary studies in software engineering. Method: Building on these strategies, a study selection process was formulated and evaluated in a systematic review. Results: The selection process used a more inclusive strategy than the one typically used in secondary studies, which led to additional relevant articles. Conclusions: The results indicate that a good-enough sample could be obtained by following a less inclusive but more efficient strategy, if the articles identified as relevant for the study are a representative sample of the population and the results and quality of the articles are homogeneous.

Software security can be improved by identifying and correcting vulnerabilities. In order to reduce the cost of rework, vulnerabilities should be detected as early and efficiently as possible. Static automated code analysis is an approach for early detection. So far, only a few empirical studies have been conducted in an industrial context to evaluate static automated code analysis. A case study was conducted to evaluate static code analysis in industry, focusing on the defect detection capability, deployment, and usage of static automated code analysis with an emphasis on software security. We identified that the tool was capable of detecting memory-related vulnerabilities, but few vulnerabilities of other types. The deployment of the tool played an important role in its success as an early vulnerability detector, as did the developers' perception of the tool's merit. Classifying the warnings from the tool was harder for the developers than correcting them. The correction of false positives in some cases created new vulnerabilities in previously safe code. With regard to defect detection ability, we conclude that static code analysis is able to identify vulnerabilities in different categories. In terms of deployment, we conclude that the tool should be integrated with bug reporting systems, and developers need to share the responsibility for classifying and reporting warnings. With regard to tool usage by developers, we propose involving multiple persons (at least two) in classifying a warning. The same goes for deciding how to act on the warning.

Software security risk analysis is an important part of improving software quality. In previous research we proposed countermeasure graphs (CGs), an approach to conduct risk analysis that combines the ideas of different risk analysis approaches. The approach was designed for reuse and easy evolvability to support agile software development. CGs have not yet been evaluated in industry practice in agile software development. In this research we evaluate the ability of CGs to support practitioners in identifying the most critical threats and countermeasures. The research method used is participatory action research, where CGs were evaluated in a series of risk analyses on four different telecom products. With Peltier (used at the company prior to CGs), the practitioners identified attacks with a low to medium risk level. CGs allowed practitioners to identify more serious risks (in the first iteration, 1 serious threat, 5 high-risk threats, and 11 medium threats). The need for tool support was identified very early. Tool support allowed the practitioners to play through scenarios of which countermeasures to implement, and supported reuse. The results indicate that CGs support practitioners in identifying high-risk security threats, work well in an agile software development context, and are cost-effective.

Software security is an important quality aspect of a software system. Therefore, it is important to integrate software security touch points throughout the development life-cycle. So far, the focus of touch points in the early phases has been on the identification of threats and attacks. In this paper we propose a novel method focusing on the end product by prioritizing countermeasures. The method provides an extension to attack trees and a process for identification and prioritization of countermeasures. The approach has been applied on an open-source application and showed that countermeasures could be identified. Furthermore, an analysis of the effectiveness and cost-efficiency of the countermeasures could be provided.

Code reviews with static analysis tools are today recommended by several security development processes. Developers are expected to use the tools' output to detect the security threats they themselves have introduced in the source code. This approach assumes that all developers can correctly identify a warning from a static analysis tool (SAT) as a security threat that needs to be corrected. We have conducted an industry experiment with a state of the art static analysis tool and real vulnerabilities. We have found that average developers do not correctly identify the security warnings and only developers with specific experiences are better than chance in detecting the security vulnerabilities. Specific SAT experience more than doubled the number of correct answers and a combination of security experience and SAT experience almost tripled the number of correct security answers.

Component-based software systems require decisions on component origins for acquiring components. A component origin is an alternative of where to get a component from. Objective: To identify, in the literature, factors that could influence the decision to choose among different component origins, as well as solutions for decision-making (e.g., optimization). Method: A systematic review of peer-reviewed literature has been conducted. Results: In total we included 24 primary studies. The component origins compared were mainly in-house vs. COTS and COTS vs. OSS. We identified 11 factors affecting or influencing the decision to select a component origin. When component origins were compared, there was little evidence on the relative (either positive or negative) effect of a component origin on the factor. Most of the solutions were proposed for in-house vs. COTS selection, and time, cost, and reliability were the most considered factors in the solutions. Optimization models were the most commonly proposed technique in the solutions. Conclusion: The topic of choosing component origins is a green field for research, and is in great need of empirical comparisons between the component origins, as well as of how to decide between different combinations of them.

Background: Systematic literature studies are commonly used in software engineering. There are two main ways of conducting the searches for these types of studies: snowballing and database searches. In snowballing, the reference lists (backward snowballing, BSB) and citations (forward snowballing, FSB) of relevant papers are reviewed to identify new papers, whereas in a database search, different databases are searched using predefined search strings to identify new papers. Objective: Snowballing has not been used as extensively as database search. Hence, it is important to evaluate its efficiency and reliability when used as a search strategy in literature studies. Moreover, it is important to compare it to database searches. Method: In this paper, we applied snowballing in a literature study and reflected on the outcome. We also compared database search with backward and forward snowballing. Database search and snowballing were conducted independently by different researchers. The searches of our literature study were compared with respect to the efficiency and reliability of the findings. Results: Out of the total number of papers found, snowballing identified 83% of the papers, compared to 46% for the database search. Snowballing failed to identify a few relevant papers, which potentially could have been addressed by identifying a more comprehensive start set. Conclusion: The efficiency of snowballing is comparable to that of database search. It can potentially be more reliable than a database search; however, the reliability is highly dependent on the creation of a suitable start set.
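
The snowballing procedure described above can be illustrated with a minimal sketch. It is not the procedure used in the study: the function name, the dictionary-based citation data, and the relevance predicate are hypothetical stand-ins for what reviewers do manually, and the toy papers "A" to "G" are invented for illustration.

```python
from collections import deque


def snowball(start_set, references, citations, is_relevant):
    """Repeat backward and forward snowballing until no new relevant paper is found.

    references[p] lists the papers that p cites (backward snowballing).
    citations[p] lists the papers that cite p (forward snowballing).
    is_relevant is a stand-in for the manual inclusion decision.
    """
    included = set(start_set)      # the start set is assumed to be relevant
    examined = set(start_set)      # every paper that has been screened
    queue = deque(start_set)
    while queue:
        paper = queue.popleft()
        candidates = list(references.get(paper, ())) + list(citations.get(paper, ()))
        for candidate in candidates:
            if candidate in examined:
                continue
            examined.add(candidate)
            if is_relevant(candidate):
                included.add(candidate)
                queue.append(candidate)  # relevant papers are snowballed in turn
    return included


# Hypothetical toy data: "C" is irrelevant, and "G" is reachable only through "C".
references = {"A": ["B", "C"], "B": ["D"], "C": ["G"]}
citations = {"A": ["E"], "D": ["F"]}
relevant = {"A", "B", "D", "E", "F", "G"}

found_small_start = snowball(["A"], references, citations, relevant.__contains__)
found_larger_start = snowball(["A", "G"], references, citations, relevant.__contains__)
```

In this toy example the relevant paper "G" is missed when starting from "A" alone, because it is only linked through an excluded paper; adding it to the start set recovers it. This mirrors the dependence on a suitable start set noted in the conclusion.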

Rubrics and oral feedback are approaches to help students improve performance and meet learning outcomes. However, their effect on the actual improvement achieved is inconclusive. This paper evaluates the effect of rubrics and oral feedback on student learning outcomes. An experiment was conducted in a software engineering course on requirements engineering, using the two approaches in course assignments. Both approaches led to statistically significant improvements, though no material improvement (i.e., a change by more than one grade) was achieved. The rubrics led to a significant decrease in the number of complaints and questions regarding grades.

Background: Software quality is complex, with over-investment, under-investment, and the interplay between aspects often being overlooked as many researchers aim to advance individual aspects of software quality. Aim: This paper aims to provide a consolidated overview of the literature that addresses trade-offs between aspects of software product quality. Method: A systematic literature map is employed to provide an overview of software quality trade-off literature in general. A specific analysis is also made of the empirical literature addressing the topic. Results: The results show a wide range of solution proposals being considered. However, there is insufficient empirical evidence to adequately evaluate and compare these proposals. Further, a very large vocabulary was found to be used to describe software quality. Conclusion: More empirical research is required to sufficiently evaluate and compare the wide range of solution proposals. This will allow researchers to focus on the proposals showing greater signs of success and to better support industrial practitioners.

Context: Value stream mapping (VSM) as a tool for lean development has led to significant improvements in different industries. In a few studies, it has been successfully applied in a software engineering context. However, some shortcomings have been observed, in particular the failure to capture the dynamic nature of the software process when evaluating improvements, i.e., improvements and target values are based on idealistic situations. Objective: To overcome the shortcomings of VSM by combining it with software process simulation modeling, and to provide reflections on the process of conducting VSM with simulation. Method: Using case study research, VSM was used for two products at Ericsson AB, Sweden. Ten workshops were conducted in this regard. Simulation in this study was used as a tool to support discussions instead of as a prediction tool. The results have been evaluated from the perspective of the participating practitioners, an external observer, and the reflections of the researchers conducting the simulation, which were elicited by the external observer. Results: Significant constraints hindering the product development from reaching the stated improvement goals for shorter lead time were identified. The use of simulation was particularly helpful in having more insightful discussions and in challenging assumptions about the likely impact of improvements. However, simulation results alone were found insufficient to emphasize the importance of reducing waiting times and variations in the process. Conclusion: The framework to assist VSM with simulation presented in this study was successfully applied in two cases. The involvement of various stakeholders, the consensus-building steps, the emphasis on flow (through waiting time and variance analysis), and the use of simulation proposed in the framework led to realistic improvements with a high likelihood of implementation.

Context: Software process simulation modelling (SPSM) captures the dynamic behaviour and uncertainty in the software process. Existing literature has conflicting claims about its practical usefulness: SPSM is useful and has an industrial impact; SPSM is useful and has no industrial impact yet; SPSM is not useful and has little potential for industry. Objective: To assess the conflicting standpoints on the usefulness of SPSM. Method: A systematic literature review was performed to identify, assess and aggregate empirical evidence on the usefulness of SPSM. Results: In the primary studies, to date, the persistent trend is that of proof-of-concept applications of software process simulation for various purposes (e.g. estimation, training, process improvement, etc.). They score poorly on the stated quality criteria. Also, only a few studies report some initial evaluation of the simulation models for the intended purposes. Conclusion: There is a lack of conclusive evidence to substantiate the claimed usefulness of SPSM for any of the intended purposes. A few studies that report the cost of applying simulation do not support the claim that it is an inexpensive method. Furthermore, there is a paramount need for improvement in conducting and reporting simulation studies with an emphasis on evaluation against the intended purpose.

Developing efficient and effective decision-making support includes identifying means to reduce repeated manual work and providing possibilities to take advantage of the experience gained in previous decision situations. For this to be possible, there is a need to explicitly model the context of a decision case, for example to determine how much the evidence from one decision case can be trusted in another, similar context. In earlier work, context has been recognized as important when transferring and understanding outcomes between cases. The contribution of this paper is threefold. First, we describe different ways of utilizing context in an envisioned decision support system. In doing so, we distinguish between internal and external context usage, possibilities of context representation, and context inheritance. Second, we present a systematically developed context model comprising five types of context information, namely organization, product, stakeholder, development method & technology, and market & business. Third, we illustrate by example the relation of the context information to architectural decision making using existing literature.

Background: In large-scale corporations in the software engineering context, information overload problems occur as stakeholders continuously produce useful information on process life-cycle issues, matters related to specific products under development, etc. Information overload makes finding relevant information (e.g., how did the company apply the requirements process for product X?) challenging, which is the primary focus of this paper. Contribution: In this study the authors aimed at evaluating the ease of implementing a semantic knowledge management system at Ericsson, including the essential components of such systems (such as text processing, ontologies, semantic annotation and semantic search). Thereafter, feedback on the usefulness of the system was collected from practitioners. Method: A single case study was conducted at a development site of Ericsson AB in Sweden. Results: It was found that semantic knowledge management systems are challenging to implement; this applies in particular to the implementation and integration of ontologies. Specific ontologies for structuring and filtering are essential, such as domain ontologies and ontologies distinct to the organization. Conclusion: To be readily adopted and transferable to practice, desired ontologies need to be implemented and integrated into semantic knowledge management frameworks with ease, given that the desired ontologies are dependent on organizations and domains.

This paper presents the construction and evaluation of SERP-test, a taxonomy aimed to improve communication between researchers and practitioners in the area of software testing. SERP-test can be utilized for direct communication in industry-academia collaborations. It may also facilitate indirect communication between practitioners adopting software engineering research and researchers who are striving for industry relevance. SERP-test was constructed through a systematic and goal-oriented approach which included literature reviews and interviews with practitioners and researchers. SERP-test was evaluated through an online survey and by utilizing it in an industry-academia collaboration project. SERP-test comprises four facets along which both research contributions and practical challenges may be classified: Intervention, Scope, Effect target and Context constraints. This paper explains the available categories for each of these facets (i.e., their definitions and rationales) and presents examples of categorized entities. Several tasks may benefit from SERP-test, such as formulating research goals from a problem perspective, describing practical challenges in a researchable fashion, analyzing primary studies in a literature review, or identifying relevant points of comparison and generalization of research.

Background: Lean Software Development (LSD) aims for improvement, yet this improvement requires measures to identify whether a difference has been achieved and to provide decision support for further improvement. Objective: This study identifies measures and indicators proposed in the literature on LSD, then structures them according to ISO/IEC 15939, allowing for comparability through the use of a standard. Method: Systematic mapping is the research methodology. Result: The published literature on LSD measures has significantly increased since 2010. The two predominant study types are evaluation research and experience reports. 22 base measures, 13 derived measures, and 14 indicators were identified. Conclusion: Gaps exist with respect to LSD principles, in particular deferring commitment, respecting people, and knowledge creation. The principle of delivering fast is well supported.

Context: The global software industry and the software engineering (SE) academia are two large communities. Unfortunately, the level of joint industry-academia collaboration in SE is still very low compared to the amount of activity in each of the two communities. It seems that the two "camps" show only limited interest or motivation to collaborate with one another. Many researchers and practitioners have written about the challenges, success patterns (what to do, i.e., how to collaborate) and anti-patterns (what not to do) for industry-academia collaborations. Objective: To identify (a) the challenges, so that risks to the collaboration can be avoided by being aware of them, and (b) the best practices, to provide an inventory of practices (patterns) that allows for an informed choice of practices when planning and conducting collaborative projects. Method: A systematic review has been conducted. Synthesis has been done using grounded-theory based coding procedures. Results: Through thematic analysis we identified 10 challenge themes and 17 best practice themes. A key outcome was the inventory of best practices. The most common ones recommended in different contexts were holding regular workshops and seminars with industry, assuring continuous learning on both the industry and academic sides, ensuring management engagement, having a champion, basing research on real-world problems, showing explicit benefits to the industry partner, being agile during the collaboration, and co-locating the researcher on the industry side. Conclusion: Given the importance of industry-academia collaboration for conducting research of high practical relevance, we provide a synthesis of challenges and best practices, which can be used by researchers and practitioners to make informed decisions on how to structure their collaborations.

Software organizations face challenges in managing and sustaining their measurement programs over time. The complexity of measurement programs increases with the exploding number of goals and metrics to collect. At the same time, organizations usually have limited budget and resources for metrics collection. It has been recognized for quite a while that there is a need for prioritizing goals, which then ought to drive the selection of metrics. Moreover, the dynamic nature of organizations requires measurement programs to adapt to changes in the stakeholders, their goals, information needs and priorities. Therefore, it is crucial for organizations to use structured approaches that provide transparency, traceability and guidance in choosing an optimum set of metrics that addresses the highest-priority information needs given limited resources. This paper proposes a decision support framework for metrics selection (DSFMS), which is built upon the widely used Goal Question Metric (GQM) approach. The core of the framework includes an iterative goal-based metrics selection process incorporating decision-making mechanisms in metrics selection, a pre-defined Attributes/Metrics Repository, and a Traceability Model among GQM elements. We also discuss alternative prioritization and optimization techniques for organizations to tailor the framework according to their needs. The GQM-DSFMS framework was evaluated through a case study in a CMMI Level 3 software company.
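
To make the idea of choosing metrics for the highest-priority goals under limited resources concrete, the following is a minimal sketch of one possible prioritization heuristic. It is not the DSFMS process itself: the greedy value-per-cost rule, the data structure, and all goal names, metric names, priorities, and costs are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    cost: float    # hypothetical collection cost, assumed to be greater than zero
    goals: tuple   # names of the goals this metric traces back to


def select_metrics(metrics, goal_priority, budget):
    """Greedy heuristic: pick metrics by goal priority served per unit of cost."""
    def value(metric):
        return sum(goal_priority.get(goal, 0) for goal in metric.goals)

    ranked = sorted(metrics, key=lambda m: value(m) / m.cost, reverse=True)
    chosen, spent = [], 0.0
    for metric in ranked:
        # Skip metrics that serve no prioritized goal or would exceed the budget.
        if value(metric) > 0 and spent + metric.cost <= budget:
            chosen.append(metric)
            spent += metric.cost
    return chosen, spent


# Hypothetical goals, priorities, metrics, and costs.
goal_priority = {"reduce defects": 5, "shorten lead time": 3}
metrics = [
    Metric("defect density", 2.0, ("reduce defects",)),
    Metric("cycle time", 1.0, ("shorten lead time",)),
    Metric("review coverage", 4.0, ("reduce defects",)),
    Metric("code churn", 3.0, ("reduce defects", "shorten lead time")),
]

chosen, spent = select_metrics(metrics, goal_priority, budget=6.0)
chosen_names = [metric.name for metric in chosen]
```

A greedy rule like this is easy to explain to stakeholders but does not guarantee an optimal set; an organization could substitute an exact optimization technique while keeping the same goal-to-metric traceability.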

Context: Testing techniques proposed in the literature rely on various sources of information for test case selection (e.g., requirements, source code, system structure, etc.). The challenge of test selection is amplified in the context of heterogeneous systems, where it is unknown which information/data sources are most important. Contribution: (1) Achieve an in-depth understanding of test processes in heterogeneous systems; (2) elicit information sources for test selection in the context of heterogeneous systems; (3) capture the relative importance of the identified information sources. Method: Case study research is used for the elicitation and understanding of which information sources are relevant for test case prioritization, followed by an exploratory survey capturing the relative importance of information sources for testing heterogeneous systems. Results: We classified different information sources that play a vital role in the test selection process, and found that their importance differs largely for the different test levels observed in heterogeneous testing. However, overall, all sources were considered essential in test selection for heterogeneous systems. Conclusion: Heterogeneous system testing requires solutions that take all information sources into account when suggesting test cases for selection. Such approaches need to be developed and compared with existing solutions.

During exploratory testing sessions the tester simultaneously learns, designs and executes tests. The activity is iterative, utilizes the skills of the tester, and provides flexibility and room for creativity. Test charters are used as a vehicle to support the testers during the testing. The aim of this study is to support practitioners in the design of test charters through checklists. We aimed to identify factors that allow practitioners to critically reflect on the design and contents of their test charters, and thereby to make informed decisions about what to include in them. The factors and contents have been elicited through interviews. Overall, 30 factors and 35 content elements have been elicited.

Exploratory testing (ET) is a powerful and efficient way of testing software by integrating design, execution, and analysis of tests during a testing session. ET is often contrasted with scripted testing, and seen as a choice of either exploratory testing or not. In contrast, we posit that exploratory testing can involve varying degrees of exploration, from fully exploratory to fully scripted. In line with this, we propose a scale for the degree of exploration and define five levels. In our classification, these levels of exploration correspond to the way test charters are defined. We have evaluated this classification through focus groups at four companies and identified factors that influence the choice of exploration level. The results show that the choice among the proposed levels of exploration is influenced by different factors such as ease of reproducing defects, better learning, verification of requirements, etc., and that the levels can be used as a guide to structure test charters. Our study also indicates that applying a combination of exploration levels can be beneficial in achieving effective testing.

Heterogeneous systems comprising sets of inherent subsystems are challenging to integrate. In particular, testing for interoperability and conformance is a challenge. Furthermore, the complexities of such systems amplify traditional testing challenges. We explore (1) which of the techniques frequently discussed in the literature on heterogeneous system testing practitioners use to test their heterogeneous systems; (2) the perception of the practitioners on the usefulness of the techniques with respect to a defined set of outcome variables. For that, we conducted an exploratory survey. A total of 27 complete survey answers were received. Search-based testing has been used by 14 out of 27 respondents, indicating the practical relevance of the approach for testing heterogeneous systems, which itself is relatively new and has only recently been studied extensively. The most frequently used technique is exploratory manual testing, followed by combinatorial testing. With respect to the perceived performance of the testing techniques, the practitioners were undecided regarding many of the studied variables. Manual exploratory testing received very positive ratings across outcome variables.

Background: The need for empirical investigations in software engineering is growing. Many researchers nowadays validate their solutions using empirical research. The survey is an empirical method that enables researchers to collect data from a large population. The main aim of a survey is to generalize the findings.

Aims: In this study, we aim to identify the problems researchers face during survey design and the corresponding mitigation strategies.

Method: A literature review, as well as semi-structured interviews with nine software engineering researchers, were conducted to elicit their views on problems and mitigation strategies. The researchers are all focused on empirical software engineering.

Results: We identified 24 problems and 65 strategies, structured according to the survey research process. The most commonly discussed problem was sampling, in particular the ability to obtain a sufficiently large sample. To improve survey instrument design, evaluation, and execution, recommendations for question formulation and survey pre-testing were given. The importance of involving multiple researchers in the analysis of survey results was stressed.

Conclusions: The elicited problems and strategies may serve researchers during the design of their studies. However, it was observed that some strategies were conflicting. This shows that it is important to conduct a trade-off analysis between strategies.

Background: Cost avoidance through reuse shows the benefits gained by software organisations when reusing an artefact. Cost avoidance captures benefits that are not captured by cost savings, e.g., spending that would have increased in the absence of the cost avoidance activity. This type of benefit can be combined with quality aspects of the product, e.g., costs avoided because of defect prevention. Cost avoidance is a key driver for software reuse. Objectives: The main objectives of this study are: (1) to assess the status of capturing cost avoidance through reuse in academia; (2) based on the first objective, to propose improvements in capturing reuse cost avoidance, integrate these into an instrument, and evaluate the instrument in the software industry. Method: The study starts with a systematic literature review (SLR) on capturing cost avoidance through reuse. Subsequently, a solution is proposed and evaluated in industry to address the shortcomings identified during the systematic literature review. Results: The results of the systematic literature review describe three previous studies on reuse cost avoidance and show that no solution to capture reuse cost avoidance had been validated in industry. An instrument and a data collection form are then proposed that can be used to capture the cost avoided by reusing any type of reuse artefact. The instrument and data collection form (describing guidelines) were demonstrated to a focus group as part of a static evaluation. Based on the feedback, the instrument was updated and evaluated in industry at 6 development sites, in 3 different countries, covering 24 projects in total. Conclusion: The proposed solution performed well in the industrial evaluation. With this solution, practitioners were able to calculate reuse cost avoidance and use the results as decision support for identifying potential artefacts to reuse.

Studies report on the negative effect on quality in global software development (GSD) due to communication and coordination-related challenges. However, empirical studies reporting on the magnitude of the effect are scarce. This paper presents findings from an embedded explanatory case study on the change in quality over time, across multiple releases, for products that were developed in a GSD setting. The GSD setting involved periods of distributed development between geographically dispersed sites as well as a handover of project management responsibilities between the involved sites. Investigations were performed on two medium-sized products from a company that is part of a large multinational corporation. Quality is investigated quantitatively using defect data and measures that quantify two source code properties, size and complexity. Observations were triangulated with subjective views from company representatives. There were no observable indications that the distribution of work or the handover of project management responsibilities had an impact on quality for either product. Among the product-, process- and people-related success factors, we identified well-designed product architectures, early handover planning, support from the sending site to the receiving site after the handover, and skilled employees at the involved sites. Overall, these results can be useful input for decision-makers who are considering distributing development work between globally dispersed sites or handing over project management responsibilities from one site to another. Moreover, our study shows that analyzing the evolution of size and complexity properties of a product’s source code can provide valuable information to support decision-making during similar projects. Finally, the strategy used by the company to relocate responsibilities can also be considered as an alternative to software transfers, which have been linked with a decline in efficiency, productivity and quality.

The link between maintenance and product quality, as well as the high cost of software maintenance, highlights the importance of efficient maintenance processes. Sustaining maintenance work efficiency in a global software development setting that involves a transfer is a challenging endeavor. Studies report on the negative effect of transfers on efficiency. However, empirical evidence on the magnitude of the change in efficiency is scarce. In this study, we used a lean indicator to visualize variances in defect resolution cycles for two large products during their evolution: before, during and after a transfer. Focus group meetings were also held for each product. Study results show that during and immediately after the transfer the defect inflow is higher, bottlenecks are more visible, and defect resolution cycles are longer than before the transfer. Furthermore, we highlight the factors that influenced the change in defect resolution cycles before, during, and after the transfer.

Evidence-based software engineering (EBSE) provides a process for solving practical problems based on a rigorous research approach. The primary focus so far has been on mapping and aggregating evidence through systematic reviews. Objectives: We extend existing work on evidence-based software engineering by using the EBSE process in an industrial case to help an organization improve its automotive testing process. With this we contribute by (1) providing experiences on using evidence-based processes to analyze a real-world automotive test process and (2) providing evidence of challenges and related solutions for automotive software testing processes. Methods: In this study we perform an in-depth investigation of an automotive test process using an extended EBSE process including case study research (to gain an understanding of practical questions and define a research scope), systematic literature review (to identify solutions in the literature), and value stream mapping (to map out an improved automotive test process based on the current situation and the improvement suggestions identified). These are followed by reflections on the EBSE process used. Results: In the first step of the EBSE process we identified 10 challenge areas with a total of 26 individual challenges. For 15 of those 26 challenges our domain-specific systematic literature review identified solutions. Based on the input from the challenges and the solutions, we created a value stream map of the current and future process. Conclusions: Overall, we found that the evidence-based process as presented in this study helps in technology transfer of research results to industry, but at the same time some challenges lie ahead (e.g. scoping systematic reviews to focus more on concrete industry problems, and understanding strategies of conducting EBSE with respect to effort and quality of the evidence).

Value Stream Mapping is one of several Lean practices that have recently attracted interest in the software engineering community. In other contexts (such as military, health, production), Value Stream Mapping has achieved considerable improvements in processes and products. The goal is to leverage these benefits in the software-intensive product development context as well. The primary contribution is that we extend the definition of waste to fit the software-intensive product development context. Since in Value Stream Mapping everything that is not considered valuable is traditionally waste, we do this in practice by looking at value beyond the customer perspective and using the Software Value Map. A detailed illustration, via application in an industrial case at Ericsson AB, demonstrates the usability and usefulness of the proposed extension. The case study results consist of two parts. First, the instantiation and the motivations for selecting certain strategies are provided. Second, the outcome of the value stream map is described in detail. Overall, the case study indicates that Value Stream Mapping, integrated with the Software Value Map, is useful in a software-intensive product development context. In a retrospective, the value stream approach was perceived positively by the practitioners with respect to process and outcome.

Agile methodologies are often not used "out of the box" by practitioners; instead, they select the practices that best fit their needs. However, little is known about which agile practices the practitioners choose. This study investigates agile practice adoption by asking practitioners which practices they are using at the project and organizational level. We investigated how commonly individual agile practices are used, combinations of practices and their frequency of usage, the degree of compliance with agile methodologies (Scrum and XP), and how successful practitioners perceive the adoption to be. The research method used is a survey. The survey was sent to over 600 respondents and was posted on LinkedIn, Yahoo, and Google groups. In total, 109 answers were received. Practitioners can use the knowledge of the commonality of individual practices and combinations of practices as support in focusing future research efforts, and as decision support in selecting agile practices.

Context: Search-based software testing promises to provide users with the ability to generate high quality test cases, and hence increase product quality, with a minimal increase in the time and effort required. The development of the Interactive Search-Based Software Testing (ISBST) system was motivated by a previous study to investigate the application of search-based software testing (SBST) in an industrial setting. ISBST allows users to interact with the underlying SBST system, guiding the search and assessing the results. An industrial evaluation indicated that the ISBST system could find test cases that are not created by testers employing manual techniques. The validity of the evaluation was threatened, however, by the low number of participants. Objective: This paper presents a follow-up study, to provide a more rigorous evaluation of the ISBST system. Method: To assess the ISBST system a two-way crossover controlled experiment was conducted with 58 students taking a Verification and Validation course. The NASA Task Load Index (NASA-TLX) is used to assess the workload experienced by the participants in the experiment. Results: The experimental results validated the hypothesis that the ISBST system generates test cases that are not found by the same participants employing manual testing techniques. A follow-up laboratory experiment also investigates the importance of interaction in obtaining the results. In addition to this main result, the subjective workload was assessed for each participant by means of the NASA-TLX tool. The evaluation showed that, while the ISBST system required more effort from the participants, they achieved the same performance. Conclusions: The paper provides evidence that the ISBST system develops test cases that are not found by manual techniques, and that interaction plays an important role in achieving that result. (C) 2016 Elsevier B.V. All rights reserved.

Testing plays a vital role in assuring software quality. Among the activities performed during the testing process, test case generation is a challenging and labor-intensive task. Test case generation techniques based on UML models are getting the attention of researchers and practitioners. This study provides a systematic mapping of test case generation techniques based on interaction diagrams. The study compares the test case generation techniques regarding their capabilities and limitations, and it also assesses the reporting quality of the primary studies. The mapping revealed that techniques based on UML interaction diagrams are mainly used for integration testing. The majority of the techniques use sequence diagrams as input models, while some use collaboration diagrams. A notable number of techniques use interaction diagrams along with other UML diagrams for test case generation. These techniques mainly focus on interaction, scenario, operational, concurrency, synchronization and deadlock-related faults.

From the results of this study, we can conclude that the studies presenting test case generation techniques using UML interaction diagrams failed to demonstrate the use of a rigorous methodology, and that these techniques were not empirically evaluated in an industrial context. Our study revealed the need for tool support to facilitate the transfer of solutions to industry.

Context: Regression testing is a well-researched area. However, the majority of regression testing techniques proposed by researchers are not getting the attention of practitioners. Communication gaps between industry and academia and disparity in the regression testing goals are the main reasons. Close collaboration can help in bridging the communication gaps and resolving the disparities. Objective: The study aims at exploring the views of academics and practitioners about the goals of regression testing. The purpose is to investigate the commonalities and differences in their viewpoints and to define some common goals for the success of regression testing. Method: We conducted a focus group study with 7 testing experts from industry and academia: 4 testing practitioners from 2 companies and 3 researchers from 2 universities. We followed the GQM approach to elicit the regression testing goals, information needs, and measures. Results: The participants identified 43 regression testing goals, which were reduced to 10 on the basis of similarity. Later, during the priority assignment process, 5 goals were discarded because the priority assigned to them was very low. Participants identified 47 information needs/questions required to evaluate the success of regression testing with reference to goal G5 (confidence), which were then reduced to 10 on the basis of similarity. Finally, we identified measures to gauge the information needs/questions corresponding to goal G5. Conclusions: We observed that the participation level of practitioners and researchers during the elicitation of goals and questions was the same. We found a certain level of agreement between the participants regarding the regression testing definitions and goals, but there was some level of disagreement regarding the priorities of the goals. We also identified the need to implement a regression testing evaluation framework in the participating companies.
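The goal, question, and measure hierarchy that GQM produces can be sketched as a small data model. This is only an illustration: the goal identifier G5 (confidence) comes from the abstract, while the class names, the question texts, and the measure texts are hypothetical placeholders and not results of the study.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """An information need linked to a goal (text is hypothetical)."""
    text: str
    measures: list[str] = field(default_factory=list)


@dataclass
class Goal:
    """A regression testing goal with its questions."""
    goal_id: str
    name: str
    questions: list[Question] = field(default_factory=list)

    def unmeasured_questions(self) -> list[Question]:
        # Questions that still lack at least one measure.
        return [q for q in self.questions if not q.measures]


# G5 (confidence) is named in the abstract; everything below it is made up.
confidence = Goal(
    goal_id="G5",
    name="confidence",
    questions=[
        Question(
            "Hypothetical: are the changed areas exercised by the suite?",
            ["Hypothetical: share of changed files covered by executed tests"],
        ),
        Question("Hypothetical: are critical features still working?"),
    ],
)

for q in confidence.unmeasured_questions():
    print(f"{confidence.goal_id}: no measure yet for '{q.text}'")
```

Keeping measures attached to questions, and questions attached to goals, makes it easy to spot information needs that have no measure yet.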

Context: A majority of the regression testing techniques proposed by researchers have not been adopted in industry. To increase adoption rates, we need to better understand the practitioners' perspectives on regression testing.

Objective: This study aims at exploring the regression testing state of practice in large-scale embedded software development. The study has two objectives: 1) to highlight the potential challenges in practice, and 2) to identify the industry-relevant research areas regarding regression testing.

Method: We conducted a qualitative study in two large-scale embedded software development companies, where we carried out semi-structured interviews with representatives from five software testing teams. We also conducted a detailed review of the companies' process documentation to complement and validate the findings of the interviews.

Results: Mostly, the practitioners run regression testing with a selected scope; the selection of scope depends upon the size, complexity, and location of the change. Test cases are prioritized on the basis of risk and critical functionality. The practitioners rely on their knowledge and experience when making decisions about the selection and prioritization of test cases. The companies are using both automated and manual regression testing, and they mainly rely on in-house developed tools for test automation. The challenges identified in the companies include time to test, information management, test suite maintenance, lack of communication, test selection/prioritization, and lack of assessment. The proposed improvements are in line with the identified challenges. Regression testing goals identified in this study are customer satisfaction, critical defect detection, confidence, effectiveness, efficiency, and controlled slip-through of faults.

Conclusions: Considering the current state of practice and the identified challenges, we conclude that there is a need to reconsider the regression test strategy in the companies. Researchers need to analyze the industry perspective while proposing new regression testing techniques. Industry-academia collaboration projects would be a good platform in this regard.

Context: Research quality is intended to assess the design and reporting of studies. It comprises a series of concepts such as methodological rigor, practical relevance, and conformance to ethical standards. Depending on the perspective, different views of importance are given to the conceptual dimensions of research quality.

Objective: We aim to better understand what constitutes research quality from the perspective of the empirical software engineering community. In particular, we intend to assess the level of alignment between researchers with regard to a conceptual model of research quality.

Method: We followed a mixed-methods approach comprising an internal case study and a complementary focus group. We carried out a hierarchical voting prioritization based on the conceptual model to collect relative values for importance. In the focus group, we also moderated discussions with experts to address potential misalignment.
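As an illustration of how a hierarchical voting prioritization can yield relative importance values, the sketch below assumes a two-level hierarchy in which one participant distributes points among top-level dimensions and then among the items inside each dimension. The abstract does not state the exact voting scheme, so this scheme, the function names, the dimension and item names, and all point values are assumptions made for the example. No compensation for dimensions with different numbers of items is applied.

```python
def normalize(points):
    """Turn raw points into shares that sum to 1."""
    total = sum(points.values())
    if total == 0:
        raise ValueError("at least one point must be assigned")
    return {name: value / total for name, value in points.items()}


def global_priorities(dimension_points, item_points):
    """Multiply each item's share by the share of its parent dimension."""
    dimension_share = normalize(dimension_points)
    result = {}
    for dimension, items in item_points.items():
        for item, share in normalize(items).items():
            result[item] = dimension_share[dimension] * share
    return result


# Hypothetical votes of a single participant.
dimension_points = {"rigor": 60, "relevance": 40}
item_points = {
    "rigor": {"validity": 70, "reproducibility": 30},
    "relevance": {"applicability": 50, "novelty": 50},
}

priorities = global_priorities(dimension_points, item_points)
for item, weight in sorted(priorities.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {weight:.2f}")
```

Comparing such per-participant priority vectors is one possible way to look at alignment between participants.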

Results: We provide levels of alignment with regard to the importance of quality dimensions in the view of the participants. Moreover, the conceptual model fairly expresses the quality of research but has limitations with regard to the structure and description of its components.

Conclusion: Based on the results, we revised the conceptual model and provided an updated version adjusted to the context of empirical software engineering research. We also discussed how to assess quality alignment in research using our approach, and how to use the revised model of quality to characterize an assessment instrument.

Context: Research quality is intended to assess the design and reporting of studies. It comprises a series of concepts such as methodological rigor, practical relevance, and conformance to ethical standards. Depending on the perspective, different views of importance are given to the conceptual dimensions of research quality.

Objective: We intend to assess the level of alignment between researchers with regard to a conceptual model of research quality. This includes aligning the definition of research quality and reasoning on the relative importance of quality characteristics.

Method: We followed a mixed-methods approach comprising an internal case study and a complementary focus group. We carried out a hierarchical voting prioritization based on the conceptual model to collect relative values for importance. In the focus group, we also moderated discussions with experts to address potential misalignment.

Results: The alignment at the research group level was higher than at the community level. Moreover, the interdisciplinary conceptual quality model was seen to express the quality of research fairly, but it showed limitations regarding its structure and the description of its components, which resulted in an updated model.

Conclusion: The interdisciplinary model used was suitable for the software engineering context. The process used for reflecting on the alignment of quality with respect to definitions and priorities worked well.

Context: Over the past decade Software Engineering research has seen a steady increase in survey-based studies, and there are several guidelines providing support for those willing to carry out surveys. The need for auditing survey research has been raised in the literature. Checklists have been used to assess different types of empirical studies, such as experiments and case studies.

Objective: This paper proposes a checklist to support the design and assessment of survey-based research in software engineering grounded in existing guidelines for survey research. We further evaluated the checklist in the research practice context.

Method: To construct the checklist, we systematically aggregated knowledge from 12 methodological studies supporting survey-based research in software engineering. We identified the key stages of the survey process and its recommended practices through thematic analysis and vote counting. To improve our initially designed checklist we evaluated it using a mixed evaluation approach involving experienced researchers.

Results: The evaluation provided insights regarding the limitations of the checklist in relation to its understanding and objectivity. In particular, 19 of the 38 checklist items were improved according to the feedback received from its evaluation. Finally, a discussion on how to use the checklist and what its implications are for research practice is also provided.

Conclusion: The proposed checklist is an instrument suitable for auditing survey reports as well as a support tool to guide ongoing research with regard to the survey design process.

Context: Given the current state of the art in research, practitioners are faced with the challenge of choosing between scripted testing (ST) and exploratory testing (ET). Objective: This study aims at systematically incorporating the strengths of ET and ST in a hybrid testing process to overcome the weaknesses of each. Method: We utilized a systematic review and practitioner interviews to identify strengths and weaknesses of ET and ST. Strengths of ET were mapped to weaknesses of ST, and vice versa. Noblit and Hare’s Lines of Argument method was used for data analysis. The results of the mapping were used as input to co-design a hybrid process with experienced practitioners. Results: We found a clear need to create a hybrid process as: 1) both ST and ET have strengths and weaknesses that depend on particular conditions, which prevents a general preference for one approach over the other, and 2) the mapping showed that it is possible to address the weaknesses of one process with the strengths of the other in a hybrid form. With the input from literature and industry experts, a flexible and iterative hybrid process was designed. Conclusions: Practitioners can clearly benefit from using a hybrid process given the mapping of advantages and disadvantages.