There is no denying that large-scale software systems evolve. Software developers are routinely faced with new features and bugs that drive essential changes. Oftentimes, they are not even the original authors of the code they need to change. They need to piece together many aspects to realize these changes while juggling project parameters such as quality assurance and deadlines. The knowledge space for this essential activity typically spans several artifacts from the problem (e.g., requirements) and solution (design and implementation) domains. The explicit connectivity among these artifacts is often missing in practice [1, 2], which further adds to the developer's plight. Questions left to be answered include "Where is the code relevant to this feature/bug?" and "What would be the design impact of a change request?". The promise of a software traceability tool is to establish (recover) and maintain (evolve) links among artifacts as software evolves [3]. Traceability benefits several key software development tasks, such as program comprehension [4, 5] and impact analysis [6, 7], which address the aforementioned questions.

Over the years, software engineering researchers have proposed techniques for automatically recovering and maintaining (explicit) traceability links among software artifacts. The application of Information Retrieval (IR) techniques is a popular and heavily experimented-with choice. It is an artifact-centric approach built on an underlying model of textual similarity among software artifacts. Its expressiveness, effectiveness, and usefulness depend on the artifacts and on their availability and quality throughout the software's lifecycle.

What We Did

Our research direction was to investigate a human-centric approach to traceability. The underlying premise of our approach is based on what humans look at while they are performing software-engineering tasks, including those of bug fixes and implementing new features. The prerequisite to such an approach is that it should be unobtrusive to developers and blend into the background. We use an eye tracker to collect developers' gazes on software artifacts while they work on their tasks within the IDE (we use Eclipse; however, the same concept applies to others). Our eye tracking infrastructure is called iTrace [8, 9]. It seamlessly works within the IDE to map eye gazes to source code elements on large files in the presence of scrolling and context switching between files. A preliminary version of iTrace is available at https://github.com/Sereslab/iTrace-Archive. An enhanced release with additional support for IDEs and Web browsers is planned for the near future.

We first ran a pilot study [10] to determine the feasibility of using eye gaze for traceability link recovery. After seeing promising results, we conducted a larger, realistic study [11] with thirteen software developers who were asked to perform bug-localization tasks for eight submitted bug reports in JabRef (an open-source reference management system). The gaze-link algorithm processes the gaze data collected during a session using a set of heuristics and weights. In a bug-fixing task, for example, the main premise is that as you get closer to fixing a bug, you focus on the part(s) of the code most related to the bug report, so those gazes carry more weight. We compared the trace links produced by our gaze-link algorithm with those from IR methods such as Latent Semantic Indexing (LSI) and the Vector Space Model (VSM).
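The paper [11] does not spell out the exact heuristics and weights, but the core idea, that gazes later in a session count more toward a trace link, can be sketched as follows. The linear recency weight, the session data, and the file names are our own illustrative assumptions, not the actual gaze-link algorithm:

```python
# Illustrative sketch of a gaze-based link-recovery heuristic.
# Assumption: the real gaze-link algorithm uses different heuristics/weights;
# here each fixation is simply weighted by how late in the session it occurs.
from collections import defaultdict

def rank_entities(fixations, session_length):
    """fixations: list of (timestamp_seconds, code_entity) pairs."""
    scores = defaultdict(float)
    for t, entity in fixations:
        # Linear recency weight: gazes near the end of the session, when the
        # developer has presumably homed in on the fix, count more.
        scores[entity] += t / session_length
    # Rank candidate source code entities by accumulated weight.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical session for a bug report (file names are invented).
fixations = [
    (10, "BasePanel.java"),
    (30, "AcrobatLauncher.java"),
    (250, "AcrobatLauncher.java"),
    (280, "AcrobatLauncher.java"),
    (290, "BasePanel.java"),
]
print(rank_entities(fixations, session_length=300)[0][0])
```

Even though BasePanel.java receives a late glance, the sustained late-session attention on AcrobatLauncher.java dominates the ranking, which is the intuition behind weighting by recency.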

What We Found

The gaze-link algorithm outperforms both LSI and VSM in terms of precision and recall with respect to the commit oracle (how the JabRef developers fixed the bug). We recorded an average precision of 55% and recall of 67% across all tasks, with the gaze-link algorithm coming out ahead in 6 of the 8 tasks. Another set of developers found the links generated with iTrace to be significantly more useful than the IR links in a majority of the tasks. The gaze-link algorithm underperforms when the developer prematurely attempts to fix the bug without an adequate understanding of the bug or its solution. Interestingly, and perhaps surprisingly, the gazes captured from a developer who did not quite fix the bug, but came close, ended up helping another developer as a starting point towards eventually fixing it.

See Table 1 for an example of links generated for a bug report across the different approaches, at both class and method level granularity. The gaze-link algorithm is crucial for weeding out irrelevant and stray glances (shown in ETraw). Results from the gaze-link algorithm (ETweighted) are more specific than the rankings from current IR methods (LSI and VSM).

Table 1. An example of class and method level source code elements found linked to Bug ID: 1489454 titled “Acrobat Launch fails on Win 98”. Variables are shown in square brackets. ETraw represents raw eye tracking data before the algorithm is run, ETweighted is the result of running gaze-link on the gazes. The Commit is our oracle representing the fix by the JabRef developer.

Concluding Remarks

The eye tracker captures the gaze data without any additional effort on the part of the developers. This property makes it about as effortless a way as we can get to provide traceability under the hood while developers work. An added benefit of using the eye tracker is that we also learn about hidden links that information retrieval methods have a hard time finding, as they are related to tacit developer knowledge of various related code entities. Imagine a software development world where your gazes could inform the things you do. The transparency and minimal effort required of developers make gaze tracking an attractive possibility. It is reasonable to imagine a future in which eye trackers capture information while we work to help us with many more tasks than just software traceability.

Several thousand developers head to Stack Overflow (SO) every day to ask technical questions, hoping to receive swift help with the issues they face. To increase the chances of getting help from others, the SO community provides members with detailed guidelines on how to write more effective questions (e.g., see [9]). These official recommendations also include those provided by Jon Skeet, the highest-reputation member, whose guidelines have over time become a de facto standard for the community [10].

For example, SO states that the site is “all about getting answers. It's not a discussion forum. There's no chit-chat.” Thus, askers are recommended to avoid “greetings and sign-offs […], as they’re basically a distraction,” which are also supposed to be edited out by other users [10]. Still, many askers finish their questions showing gratitude in advance towards potential helpers. Why do they go against this explicit recommendation? Are they just unaware of it or do they feel that having a positive attitude may attract more potential solutions?

In our work [5], we provide an evidence-based netiquette for writing effective questions by empirically validating several SO guidelines, retrieved from both the community and previous empirical studies on Q&A sites. Specifically, we analyzed a dataset of 87K questions by combining a logistic regression analysis with a user survey, first, to estimate the effect of these guidelines on the probability of receiving a successful answer and, then, to compare their actual effectiveness to that perceived by SO users.
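The way a logistic regression like ours turns a guideline into an estimated effect can be illustrated with the standard odds-ratio interpretation of a fitted coefficient. The coefficient value below is invented for illustration; the actual estimates fitted on the 87K-question dataset are in the paper [5]:

```python
# For a logistic regression on question success, exp(beta) is the
# multiplicative change in the odds of receiving an accepted answer
# for a one-unit change in a metric (e.g., presence of a code snippet).
import math

def odds_ratio(beta):
    return math.exp(beta)

beta_snippet = 0.25  # hypothetical fitted coefficient, not from the paper
print(round(odds_ratio(beta_snippet), 2))  # ~1.28: odds increase by ~28%
```

This is why a regression analysis can quantify "actual effectiveness" of a guideline in a way a survey of perceived effectiveness cannot.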

Actionable Factors for Good Questions

Figure 1. Our conceptual framework of success factors for writing good questions in Stack Overflow. The success of a question is defined as the probability of receiving an answer that is accepted as solution. Affect, Presentation Quality, and Time are the three actionable success factors of interest, while Reputation (non-actionable) serves as a control factor. Each success factor is related to several metrics used in turn to inform guidelines.

We developed a conceptual framework (Figure 1) for analyzing the aspects that may influence the success of a question. We focused on 9 actionable metrics that developers can act upon when writing a question and that are therefore useful for informing guidelines. These metrics are grouped into 3 success factors, concerning affect (i.e., the positive/negative tone conveyed by a question), presentation quality (i.e., the readability and comprehensibility of its text), and time (i.e., when to post it). As for the asker's reputation, it is included in our model, but only as a non-actionable control factor. While reputation has already been found to help people receive more and faster answers in other communities such as Reddit [1], one cannot really do anything to increase one's score at the moment of posting a question.

Findings: Evidence vs. User Perception

Table 1. The evidence-based netiquette for effective question writing on Stack Overflow. Guidelines are shown in bold when supported by evidence.

Table 1 reports the main findings of our study, from which some interesting observations emerge. For 4 out of the 8 guidelines studied, we found that the perceived and actual effectiveness match. In particular, SO users are correct to think that minding their tone (#1), providing snippets with examples (#2), and avoiding the inappropriate use of capital letters (#3) all increase the probability of receiving a successful answer to their questions. Instead, the use of short titles is neither perceived nor found to be an effective guideline (#5).

Perhaps even more interesting considerations arise from the remaining cases of mismatch. First, SO users seem to be unaware that writing questions concisely (#4) and posting them during GMT evening hours (#8) increase the chances of a question being resolved. Regarding effective posting times, Bosu et al. [4] have speculated that these time slices are the most successful because they correspond to working hours in the USA. Second, contrary to users' perception, we found that providing context for questions by adding extra tags (#6) and including URLs to external resources in the text (#7) have no positive effect on the chance of getting a successful answer.

Final Remarks

One of the greatest issues with Stack Overflow is the sheer number of unresolved questions. Currently, the site hosts almost 15 million questions, of which about 7 million are still unresolved. Helping users ask "better" questions can increase the number of those resolved. One way to do so is to increase awareness among community members of the existence of effective question-writing guidelines, while also trimming down the list to only those supported by evidence.

For more information about our study, please refer to our paper “How to Ask for Technical Help? Evidence-based Guidelines for Writing Questions on Stack Overflow” [5].

Monday, November 13, 2017

The September/October issue of IEEE Software, top magazine for all things software, again delivers a range of interesting topics for thought and discussion in the SE community. The topics discussed in this issue include software requirements and testing, DevOps, gamification, and software architecture.

The feature topic in this issue of IEEE Software was software testing. This issue featured the following articles related to software testing:

For those interested in getting a quick overview of how testing is used and viewed in the software community, "Software Testing: The State of the Practice" and "Worlds Apart: Industrial and Academic Focus Areas in Software Testing" are great articles to read.

In "Software Testing: The State of the Practice", Kassab and colleagues conducted a comprehensive survey of software practitioners to gain a much-needed understanding of the state of the practice in software testing and quality assurance.

Their ultimate goal was to gather best practices based on their findings, such as practicing test-driven development, and disseminate them to the community.

In "Worlds Apart: Industrial and Academic Focus Areas in Software Testing", Garousi and Felderer discuss the often disparate efforts of industry and academia when it comes to academia doing research that matters and industry applying this research in its work.

Their focus was on software testing and how we can improve industry-academic collaborations.

They analyzed the titles of presentations given at industrial and academic conferences and, using Wordle to create word clouds, found that these communities often focus on different aspects of testing.

For example, industry conferences most often talk about automation while academic conferences most often talk about models. The authors make suggestions, such as inviting practitioners to research-intensive SE conferences to gain their perspective on the research we're doing.

Some of the articles in this issue focus on a specific subset of testing, such as GUI testing.

In "Adaptive Virtual Gestures for GUI Testing on Smartphones", Hsu and colleagues propose an approach to testing mobile software called adaptive GUI testing (AGT). AGT allows for faster cross-device testing and is based on touch events known as visually oriented gestures (VOG).

Along the same lines, in "Replicating Rare Software Failures with Exploratory Visual GUI Testing", Alégroth and colleagues discuss the benefits of visual GUI testing (VGT) based on their own experiences. They speak about how VGT can be used to replicate failures and push forward analysis of infrequent or nondeterministic failures.

For aspiring software engineers out there, Evgeny Shadchnev discussed code schools, or programs available that prepare students to become software developers in a few months.

For those in management positions, or aspiring to be a manager one day, Ron Lichty joined SE Radio to discuss difficulties and suggestions for managing programmers. He also provided a link to his blog on the topic. Also useful is the information provided by Harsh Sinha, VP of TransferWise, on another type of management known as product management.

Monday, November 6, 2017

Forty years after Requirements Engineering (RE) was first acknowledged as an independent discipline in an issue of the IEEE Transactions on Software Engineering, it has received much attention in research and practice due to its importance to software project success. The importance of RE cannot be refuted, as many decisions in software projects are rooted therein; the same holds for the problems. As Nancy Leveson (MIT) was quoted in an article in The Atlantic:

The serious problems that have happened with software have to do with requirements, not coding errors.

In fact, it has become conventional wisdom that many problems emerge from "insufficient RE" and that the later these problems are discovered, the harder (and, thus, more expensive) they become to fix. Yet it remains difficult to obtain reliable empirical figures that describe what "insufficient RE" exactly means, how it manifests in the processes and artefacts created, and what root causes and effects it has. Such figures are, however, critical determinants for problem-driven research, i.e., for contributions that are in tune with the problems they intend to solve. As a matter of fact, the state of empirical evidence in RE is still weak, and much of everyday industrial practice, as well as research, is dominated by wisdom and beliefs rather than governed by empirical evidence. This results in research contributions with a potentially low practical impact and, in the end, a continuously increasing disconnect between research and practice.

The Naming the Pain in Requirements Engineering Initiative

Motivated by this situation, in which we need a stronger body of knowledge about the state of industrial practice in RE, we initiated the Naming the Pain in Requirements Engineering (short: NaPiRE) initiative in 2012. The initiative constitutes a globally distributed family of practitioner surveys on RE with the overall objective of building a holistic theory of industrial practices, trends, and problems. It is run by the RE research community with the purpose of serving researchers and practitioners alike and is, in fact, the first of its kind. In a nutshell, each survey replication aims at distilling

the status quo in company practices and industrial experiences,

problems and how those problems manifest themselves in the process, and

what potential success factors for RE are.

To achieve the long-term objective of increasing the practical impact of research, the community behind NaPiRE has committed itself to open science principles. All publications, but also the results obtained from the studies, are open to the public, including the anonymised raw data, codebooks, and analysis scripts. This supports other researchers in running independent data analyses, interpretations, and replications, and it helps practitioners assess their own current situation in the context of a broad picture of overall industrial practices.

Current State of NaPiRE

While our first survey round focused on German companies and had a first replication in the Netherlands, the second replication was run in 2014/15 and took place in 10 countries with the support of 23 researchers. That run yielded fruitful insights into the contemporary problems practitioners experience, as well as into the criticality of those problems, as shown in the following chart.

Top Problems in Requirements Engineering, as reported in "Naming the Pain in Requirements Engineering: Contemporary Problems, Causes, and Effects in Practice" (Empirical Software Engineering Journal)

The colour coding visualises the criticality of the problems, i.e., the extent to which each problem is seen as a main reason for project failure. We can see, for example, that incomplete requirements constitute the most frequently stated problem. At the same time, we can also see that moving targets, although ranked as the fourth most frequent problem, become the top-priority problem when considering the project failure ratios alone. An overview of further results, including more fine-grained analyses of the root causes and of effects going beyond a simple notion of "project failure", can be found in the publications on the project website www.re-survey.org. Motivated by our past success in revealing insights into the status quo, but also in being able to transfer those analytical results into first constructive methods, e.g., in the context of risk management, we have initiated the third replication. By now, the NaPiRE community has grown into an international alliance of nearly 60 researchers who share the vision of contributing their part to increasing the practical impact of research contributions to RE.

Call for Participation

We are reaching out to you, the IEEE Software readers, as a highly relevant community of software practitioners. Please volunteer 20 minutes of your valuable time to contribute your insights and experience by participating in the current run of the survey. To foster problem-driven research with high practical impact, we depend on your input. The link to the survey is: participate.re-survey.org (open until the end of December).

We develop software in teams. We write code in separate environments and attempt to integrate our changes, along with everyone else's, into a coherent shared version of the software. Most of the time this process works: version control systems can easily integrate different versions of the code into one. But approximately 19% of the time there are problems [1][2]. These merge conflicts require human intervention to resolve conflicting changes to the code, and these interventions take time away from regular development.

Prior research efforts have focused on developing smarter merging algorithms [3][4], systems for proactive conflict detection [2][5][6], and discussing the merits of syntax- and semantics-aware merges [7][8]. However, these efforts have not considered the core component of collaborative software development: the practitioners writing the conflicting code. As a research community, we need to remember that software engineering involves software practitioners, and their perspectives are critical for tuning any proposed solutions to make real-world impact. This fundamental rationale is why we reached out to practitioners: to obtain their perspectives on merge conflicts.

We found that practitioners rely on their own expertise in the conflicting code to understand and assess merge conflicts. Rather than using precise metrics calculated by tools, practitioners tend to rely on their own simple estimations. We additionally found that the complexity and size of merge conflicts factor heavily into how practitioners assess them.

Fig 1. Perceptions of merge toolset support along dimensions of conflict complexity and conflict size

To get a better understanding of these two factors, we asked practitioners in the survey to rate their tools along these two dimensions. The combination of four bubble plots in Fig. 1 represents the results of these four scenario questions and shows the number of responses for a given effect level, broken out into experience-level groups. The size and depth of color of each bubble also convey the number of responses for any given combination.

To better illustrate how to read and interpret these plots, consider an example participant with 1-5 years of experience. If she indicates that her merge toolset is Extremely Effective for small, simple merge conflicts, she would be represented in the bubble containing 15 in quadrant A1. She would also be represented in the bubble containing 9 in the bottom-right plot (quadrant A4) if she indicated that her merge toolset was Moderately Effective for large, complex merge conflicts.

Fig 2. Mean scores for each A1-A4 plot from Fig. 1 and the differences across dimensions

The greater picture across these plots is that the move from plot A1 to A2 (the vertical axis) represents a change only in the dimension of merge conflict size: from small to large. Similarly, the move from A1 to A3 (the horizontal axis) represents a change only in the dimension of merge conflict complexity: from simple to complex. Fig. 2 is an annotated version of Fig. 1 with mean scores for each plot listed in the corners. These scores represent the mean where a response of Extremely Effective is scored as 5 and Not at all as 1. Examining the impact of both dimensions on the mean score, we found that the shift from small to large merge conflicts (A1 to A2) results in a difference in mean response of 0.496, whereas the shift from simple to complex merge conflicts (A1 to A3) results in a difference of 0.930. The change in mean score from a shift in complexity is thus almost double that from a shift in size.
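The mean-score computation above can be reproduced from raw response counts. The counts below are invented for illustration (the paper's real data yielded the 0.496 and 0.930 differences); only the scoring scheme, Extremely Effective = 5 down to Not at all = 1, follows the text:

```python
def mean_score(counts):
    """counts: dict mapping Likert score (1 = Not at all .. 5 = Extremely
    Effective) to the number of responses at that level."""
    total = sum(counts.values())
    return sum(score * n for score, n in counts.items()) / total

# Hypothetical response counts for two plots (not the survey's actual data).
a1 = {5: 15, 4: 20, 3: 8, 2: 2, 1: 0}   # small, simple conflicts
a3 = {5: 3, 4: 12, 3: 18, 2: 9, 1: 3}   # small, complex conflicts

# Difference in mean response along the complexity axis (A1 -> A3).
print(round(mean_score(a1) - mean_score(a3), 3))
```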

These results suggest that merge tools are currently equipped to handle increases in the size of merge conflicts, but not as well equipped for increases in complexity. With the increasing amount of code being developed in distributed environments, researchers and tool developers need to scale their solutions in both dimensions if they are going to continue to be relevant and useful to practitioners.

Further examination and discussions on merge conflicts can be found in our paper:

Participation of Women in Free/Libre/Open Source Software. What is the current status and how much has changed in the last 10 years?

It is well known that women are generally underrepresented in the IT sector. However, in FLOSS (free, libre, and open source) projects the reported share of women is even lower (from 2% to 5% according to several surveys, such as FLOSS 2002 [1] or FLOSSPols [2]). The FLOSS community is aware of this situation, and some projects, such as GNOME and Debian, have made efforts to attract female participants.

As the previous surveys date back to the early 2000s, we designed a new web survey in 2013, which was answered by more than 2,000 FLOSS contributors, of whom more than 200 were women. By analyzing the answers provided by men and women, mainly those concerning their involvement with FLOSS, their educational level and background, and their personal status, we wanted to shed some light on the current status of female participation in FLOSS. This blog post shows a glimpse of this. The survey responses are publicly available and documented for further analysis by third parties [3].

We have found that women begin to collaborate at a later age than men. Interestingly enough, as can be seen in Figure 1, while both distributions peak at the age of 21, the tail for women is not as abrupt as the one for men. The number of women who start in their thirties is 70% of those who start in their twenties, while for men this figure decreases to 30%.

Figure 1 - Age of first contribution to FLOSS. Source [4].

Women perform more diverse tasks than men. While the latter are mostly concentrated on coding tasks (only slightly above 20% of men perform other tasks), "other types of contributions" is the main task chosen by women, at almost 45%. The percentage of women who mainly code is 31%, as can be seen in Figure 2.

Figure 2 - Type of contributions to FLOSS projects. Source [4]

Figure 3 shows that while a majority of FLOSS participants do not have children, the share of those who do varies largely by gender: the share of women with children (19%) is almost half that of men with children (34%).

Figure 3 - Answers to the question "Do you have children?".

Graphs have different scales. Source [4]

If we look at how much time FLOSS contributors devote to FLOSS, we obtain Figure 4. At first sight the distributions might seem very similar, but a closer look shows that the share of men devoting less than 5 hours/week (50%) is lower than that of women (54%), as is the share of men working 40 or more hours per week (12% for men versus 15% for women). So contributions to FLOSS projects by women are over-represented both among the least active contributors and among full-time professional contributors, the latter probably being hired by industrial software companies.

Figure 4 - Number of hours per week devoted to contributing to FLOSS projects.

Graphs have different scales. Source [4]

All in all, our study confirms (and extends) the results of FLOSS [1] and FLOSSPols [2], even though almost 10 years have passed between those surveys and ours. Even if the share of women is now slightly higher, many contextual patterns have remained the same. For instance, a study of GitHub developers from 2015 found that only around 6% were women [5]; this increase could be due to the greater involvement of the software industry in FLOSS. The current situation is far from what was set as a goal ten years ago, so we may speak of a "lost decade" in the inclusion of women in FLOSS.

Monday, October 2, 2017

Quite often we are very familiar with a set of data, but looking at it from a different viewpoint reveals a completely unexpected reality.

I am part of the keyring-maint group in Debian: the developers who manage the cryptographic keyring through which developers identify themselves for most of our actions, most importantly voting in project decisions and uploading software.

There are two main models for establishing trust via cryptographic signatures: the centralized, hierarchical model based on Certification Authorities (CAs), from which all trust and authority stems and flows only downwards, and a decentralized one, the Web of Trust, where each participant in the network can sign other participants' public keys. The first is most often used in HTTPS (the encrypted Web); our project uses a specific mode of the second, which we have termed the Curated Web of Trust [1].

Cryptographic signatures are far more secure as a means of identification than the omnipresent but very weak username/password scheme. However, technological advances must be factored in. At 24 years old, Debian is a very long-lived free software project, and it sports great retention: many of its developers have been active for over ten years. In the late 1990s, the recommended key size was 1024 bits; public key cryptography is computationally very expensive, and that key size was perceived as secure enough for the foreseeable future. However, according to a 2012 study on algorithms and key sizes [2], this key size is today good only for "short-term protection against medium organizations, medium-term protection against small organizations", clearly not up to our required standards.

Our group had started pushing for a migration to stronger keys back in 2009. By 2014, as Figure 1 shows, we had achieved the migration of half of the developers to stronger keys, but the pace of migration was really insufficient. At the Debian Developers' Conference that year (DebConf14), we announced that by January 1st, 2015, we would remove all keys shorter than 2048 bits.

Figure 1: Number of keys by length in the Debian Developers keyring, between mid 2008 and late 2015

The migration process was hard and intensive. Given the curated Web of Trust model we follow, our policy is that, for a key replacement, the new key must be signed by two keys currently trusted in our keyrings; Debian being a globally distributed project, many people are simply unable to meet other developers. Although this migration process resulted in us losing close to a fourth of all keys (that is, a fourth of all Debian Developers could no longer perform their work without asking somebody to "sponsor" their actions), we felt it to be quite successful. It also prompted a deeper analysis of what the keyrings could tell us about the developers themselves.
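The two-signature replacement policy described above can be sketched as a simple check. The key IDs and the helper function are our own illustration; Debian's actual keyring tooling is of course more involved:

```python
def replacement_ok(new_key_signers, trusted_keyring, min_signatures=2):
    """Accept a replacement key only if at least `min_signatures` of its
    signers are currently in the trusted keyring (curated Web of Trust)."""
    return len(set(new_key_signers) & set(trusted_keyring)) >= min_signatures

# Fictional key IDs for illustration.
trusted = {"0xAAAA1111", "0xBBBB2222", "0xCCCC3333"}
print(replacement_ok({"0xAAAA1111", "0xBBBB2222"}, trusted))  # True
print(replacement_ok({"0xAAAA1111", "0xDDDD4444"}, trusted))  # False: one trusted signer
```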

Trying to find emergent properties, we graphed the signatures in the keyring at different points in time. Although most of the time we got just a useless huge blob, we observed a very odd division starting around 2011; Figure 2 shows the graph close to the point where this split was at its maximum: January 2012.

Figure 2: Trust relationships in the Debian keyring near the maximum split, January 2012

Then, trying to find the meaning of this split, we colored the edges according to their relative age, i.e., how long before each snapshot each signature was made. This is shown for the above keyring in Figure 3.

Figure 3: Trust relationships in the Debian keyring near the maximum split, January 2012, graphed by signature age (blue: less than one year old; green: one to two years old; yellow: two to three years old; orange: three to four years old; red: over four years old)
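The age bands used for coloring the edges can be computed straightforwardly. The band boundaries follow the figure's legend; the dates and the 365-day year are illustrative simplifications:

```python
from datetime import date

# Legend order from the figure: <1y, 1-2y, 2-3y, 3-4y, >=4y.
BANDS = ["blue", "green", "yellow", "orange", "red"]

def edge_color(signed_on, snapshot):
    # Integer age in years (365-day approximation), capped at the last band.
    years = (snapshot - signed_on).days // 365
    return BANDS[min(years, 4)]

snapshot = date(2012, 1, 1)
print(edge_color(date(2011, 6, 1), snapshot))  # blue: under a year old
print(edge_color(date(2006, 1, 1), snapshot))  # red: over four years old
```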

Our hypothesis was that the red blob mostly represents an earlier generation of Debian Developers who have largely faded from project activity. We presented our findings last May at the 13th International Conference on Open Source Systems [3], together with a first foray into a statistical analysis of key aging and survival.

This rich data set can still yield much more information on the Debian
project's social interactions. It's basically just a matter of finding other
angles from which to read it!

Several control systems used for factory and industrial automation of electromechanical processes are developed using Programmable Logic Controllers (PLCs). PLCs are specifically adapted to the control of manufacturing processes and are programmed using domain-specific languages. The International Electrotechnical Commission (IEC), a non-profit international standards organization, developed a standard called IEC 61131-3, which defines the basic programming elements and the syntactic and semantic rules for text-based and graphical (visual) languages for programming PLCs [IEC]. Structured Text (ST) is one of the text-based PLC programming languages defined by the IEC 61131-3 standard and is widely used in industrial automation engineering application development. Similarly, the Ladder Diagram (LD) language is a visual programming language and another of the five languages defined in the standard. Because the primary purpose of a PLC is to control an industrial process, and because ST was defined for solving problems in the specific domain of factory and industrial process automation, several of its language features and programming constructs differ from those of general-purpose programming languages such as Java, C++, C#, and Python [Roos2008].

Much work has been done on
predicting change-prone and faulty components or modules of software using
source code metrics for general-purpose programming languages. However,
defining software metrics and investigating their impact on important software
engineering prediction problems, such as change-proneness and faulty-module
identification, remains relatively unexplored for applications developed using
IEC 61131-3 languages [Kumar2016] [Kumar2017]. We believe that one of the
primary reasons for the lack of empirical studies on source code analysis for
PLC programming languages, and its impact on software maintainability and
quality, is the lack of publicly available data for researchers and scientists
in academia. Through this blog, our objective is to share our work in this
relatively unexplored but promising research direction [Kumar2016] [Kumar2017].
In [Kumar2016], we proposed source-code-level metrics to measure the size,
vocabulary, cognitive complexity, and testing complexity of a visual
Programmable Logic Controller (PLC) programming language, and applied
Weyuker's properties to validate the metrics and evaluate how many properties
each proposed metric satisfies. In [Kumar2017], we studied the correlation
among ten ST source code metrics and their relationship with change-proneness,
and built predictive models based on Artificial Neural Networks (ANNs) to
predict the change-proneness of the software. Similarly, we are working
towards examining whether source code metrics can be used to identify
defective and faulty components.
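To give a flavour of what such source-code-level measurements look like, here is a toy Python sketch that computes a few Halstead-style size and vocabulary counts over a small Structured Text fragment. The tokenizer, the operator set, and the metric names are simplifications invented for this illustration; the actual metric definitions are those given in [Kumar2016] and [Kumar2017]:

```python
import re

# Toy IEC 61131-3 Structured Text fragment; illustrative only.
st_source = """
IF Temp > 75.0 THEN
    Fan := TRUE;
ELSE
    Fan := FALSE;
END_IF;
"""

# Simplified operator set: keywords and symbols treated as operators here.
OPERATORS = {":=", ">", "<", "+", "-", "*", "/", "IF", "THEN", "ELSE", "END_IF"}

def st_metrics(src: str) -> dict:
    """Compute naive size, vocabulary, and length counts for an ST snippet."""
    tokens = re.findall(r":=|[A-Za-z_]\w*|\d+\.\d+|\d+|[<>+\-*/]", src)
    operators = [t for t in tokens if t in OPERATORS]
    operands = [t for t in tokens if t not in OPERATORS]
    return {
        # Non-blank lines of code.
        "loc": sum(1 for line in src.splitlines() if line.strip()),
        # Distinct operators plus distinct operands (Halstead vocabulary).
        "vocabulary": len(set(operators)) + len(set(operands)),
        # Total operator and operand occurrences (Halstead length).
        "length": len(operators) + len(operands),
    }

print(st_metrics(st_source))
```

Real metric suites for ST would, of course, use a proper parser rather than a regular-expression tokenizer, but the flavour of measurement is the same.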

Bringing Both Relevance and Rigor through Industry-Academia Collaboration

Our experimental results are
encouraging and provide evidence of a correlation between source code metrics
and software quality and maintainability in the domain of industrial
automation and PLC applications. However, more research studies from industry,
through industry-academia collaboration, are needed to further add to the body
of knowledge on this topic. It is hard for companies working in the industrial
automation engineering domain to make their source code publicly available to
researchers in academia, which creates a natural barrier to entry for
academics wanting to conduct research in this unexplored area. Industrial
research labs do not have all the competencies, resources, and time within a
single organization to address all the research and technical challenges in
this area, and we believe that collaboration with universities is the only
practical way to advance and disseminate knowledge on this topic. Our own work
is a collaboration between industry and academia, and it yielded positive
results because of that successful partnership. One possible way to build
bridges is to conduct focused workshops on software engineering issues and
challenges in industrial and factory automation. Some encouraging progress has
been made in this direction, and more needs to be done.

FLOSS, or Free/Libre Open Source Software, is becoming an increasingly important, some may even say dominant, factor in the modern software economy [2]. As opposed to the traditional methods of software development, FLOSS projects function and receive high-quality code submissions often despite the lack of financial compensation and the lack of any formalized management or governance structure [1, 2, 5]. While FLOSS projects are typically guided or managed by a small number of core developers, they survive and, indeed, thrive by attracting contributions from new, talented software developers who join the project. There are many types of contributors, ranging from those who have contributed a number of patches and are on their way to becoming part of the core, to those who lack adequate coding skills and contribute primarily by reporting bugs and editing documentation [3, 4, 6]. To be successful, FLOSS projects must constantly recruit new code contributors to replace those who leave (many do so within a year of joining) [7].

In our work [8], we define a special type of FLOSS contributor, the “One-Time code Contributor” (OTC): a contributor who has successfully contributed one, and only one, code patch that the project merged. This successful contribution indicates that the OTC (1) has an appropriate level of coding skill and knowledge to make a valuable contribution to the project and (2) has the determination necessary to write the code, submit the patch, and participate in the review process to get the patch accepted. Because FLOSS projects could greatly benefit from attracting these technically competent participants to submit more code, two questions arise: (1) why do OTCs not contribute additional patches? and (2) is there any way FLOSS projects can attract and retain these valuable contributors?
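As a concrete illustration of the definition, a minimal sketch of flagging OTCs from merged-commit history might look like the following. The input format (one author per merged commit, e.g. from `git log --format=%ae` on the project's main branch) and the example addresses are assumptions for illustration, not our actual mining pipeline:

```python
from collections import Counter

# Hypothetical author list, one entry per merged commit on the main branch
# (e.g. the output of `git log --format=%ae`). Addresses are invented.
merged_commit_authors = [
    "alice@example.org",
    "bob@example.org",
    "alice@example.org",
    "carol@example.org",
]

def one_time_contributors(authors: list) -> list:
    """Return authors with exactly one merged commit: the OTC definition."""
    counts = Counter(authors)
    return sorted(a for a, n in counts.items() if n == 1)

print(one_time_contributors(merged_commit_authors))
# bob and carol each have exactly one merged commit
```

In practice, identifying OTCs also requires de-duplicating author identities (people often commit under several email addresses), which this sketch omits.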

To answer these questions, and understand how OTCs could better contribute to FLOSS projects in the future, we conducted a survey of 184 OTCs from 23 popular FLOSS projects. The remainder of this post summarizes some of our key findings.

Initial Impression of Project Members

When asked about their initial impression of other project members, a large percentage of the respondents indicated they had positive or very positive impressions of their fellow project members (Figure 1). This result is surprising because previous research indicated that peripheral developers may be neglected, so we expected them to have more negative impressions [6]. Some of the most common positive responses included: project members are skilled, helpful, or responsive. These responses indicate that OTCs appreciate the assistance that other project members can provide. However, some OTCs did report negative impressions of fellow project members. These negative impressions were just as strong as the positive impressions. The most common negative impressions included: project members are busy, unresponsive, or otherwise unhelpful.

Figure 1 - Overall Impressions

Tradeoff Between Skill and Busyness

There was an interesting interaction between the most common positive impression, skilled, and the most common negative impression, busy. OTCs expected those project members who were more skilled to be less approachable and less available, in other words, busier. Conversely, they expected less skilled project members to be less busy and more approachable. OTCs did not seem displeased with this trade-off; in fact, they seemed to expect it. Even so, the OTCs whose experience was that skilled members were not too busy to pay attention to them reported more positive impressions, while others who experienced the busier skilled members expressed more neutral or negative impressions.

OTCs’ Motivations for Contributing

Previous research found that peripheral contributors (of which OTCs are a subgroup) tend to be motivated more extrinsically (that is, by external factors like fixing a bug) than intrinsically (that is, by internal factors such as enjoying coding as a hobby) [4, 7]. This observation would suggest that a large portion of the OTC contributions would be in the form of “drive-by commits,” that is, fixing a flaw without any real desire to join the project. When asked about their motivation to contribute their patch, the respondents listed a variety of motivations, as Figure 2 illuminates. While respondents indicated the desire to fix a bug as the most common motivation, they gave a more intrinsic motivation, share with the community, as the second most common motivation. Third, respondents indicated they were motivated by an employer’s need. These respondents were not hired to contribute to the FLOSS project, but rather added to or fixed the FLOSS project in support of other employer goals. Fourth, reflecting another intrinsic motivation, respondents wanted to add a new feature and have it maintained by the project. Perhaps unsurprisingly, this motivation often overlapped with the intrinsic motivation, share with the community. Other, less common motivations included ‘scratching an itch,’ personal reputation, or curiosity about the project. This list of varied motivations suggests that while some OTCs may not be interested in long-term project participation, others have deeper motivations and could ultimately be attracted to join the project and contribute more than a single patch.

Figure 2 - OTC’s Motivations

Barriers Faced by OTCs

When asked whether there were any barriers that prevented them from continuing to contribute to the project, only half of the respondents indicated that they faced barriers, as seen in Figure 3. The most commonly reported barrier was time. Either it took the respondent too long to make the contribution, or the respondent was too busy with other work. The next most common barriers were patch submission difficulties and entry difficulties. While FLOSS projects and tools oriented to help project newcomers cannot address the time barrier, they can encourage OTCs to continue contributing by reducing the difficulties associated with project entry and with patch submission.

Figure 3 - OTC Barriers

OTCs Who Stopped Contributing Despite Facing No Barriers

Of the respondents who did not face barriers, having nothing else to contribute was the most common reason for stopping. This response is interesting because it suggests that if these OTCs did find something else to contribute, they might return to the project. In other words, these OTCs could be motivated to continue contributing to the project, because they exhibited no particularly strong desire to leave it. Other reasons for leaving a project included that their employer no longer used the project or that they never had any intention of becoming a project member. These results suggest that while not all OTCs can be motivated to make additional contributions, some very likely could be, under the right circumstances.

Conclusion

Though OTCs have only successfully contributed one code patch, FLOSS projects may be able to attract some of them to make further contributions. While some OTCs truly are “drive-by committers,” whose only interest in the project was to have their patch included, others are more community-minded and more invested in the project. This second group of OTCs may have contributed additional code patches had they not encountered barriers or had they identified another interesting patch to contribute. By lowering the entry barriers and making patch submission easier, FLOSS projects can likely retain more OTCs, thereby increasing the size and skill of the contributor base, which leads to more successful FLOSS projects.