The main goal of this visit is to work on finding an automated method
to evaluate the bias in bug datasets. This bias is introduced when
bug-fix reports are linked with commits in the version control
system. When a developer accepts and/or fixes a bug report, she
decides on a severity level and marks the report accordingly. In
Bugzilla, one of the most widely used bug tracking systems, a
developer can mark severity using a seven-level scale. In a previous
paper (PDF available), I showed that not all developers use the same
criteria to select the severity, and that three levels would be
enough. This difference in the developers' criteria to mark and
classify bug reports is one of the sources of bias in bug-fix
datasets (PDF of the paper available). Another source of bias is
developer confidence: not all developers annotate commits or bug
reports with the corresponding ids when they are new to a project,
because they are afraid of exposing themselves. However, those
commits do correspond to bug fixes, and should be accounted for in a
bug-fix dataset.

Clearly, reusing datasets for empirical software engineering is a
good idea: it fosters reproducibility and verifiability, essential
properties of any empirical research discipline. However, if we
cannot ensure the quality of reusable datasets, they can cause more
harm than good.

My goal with this visit is to apply statistical methods to evaluate
the bias in a bug-fix dataset. The two papers about the distribution
of bugs in Eclipse are an example of the kind of work I want to do.
If we can be sure of the quality and lack of bias of a dataset,
carefully built to act as a "canonical" dataset, we can compare
other datasets against it to find out whether they are biased. The
two papers about Eclipse mentioned above show that the distribution
of bugs can vary in the presence of bias: the first paper used a
biased dataset, and the second repeated the data gathering process
from scratch, avoiding the biased dataset. They found different
distributions for software bugs, although the difference could also
be due to methodological differences.
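
As a first concrete sketch of such a comparison, a two-sample
Kolmogorov-Smirnov test can check whether the bugs-per-file counts
in a candidate dataset follow the same distribution as those in the
canonical one. The Python snippet below is only a sketch: the CSV
file names and the bug_counts loader are hypothetical placeholders,
and ks_2samp from scipy is the only real library call. Since bug
counts are discrete and the KS test assumes continuous data, its
p-value should be read as approximate.

    import csv
    from scipy.stats import ks_2samp

    def bug_counts(path):
        # Hypothetical loader: expects a CSV with columns "file,bugs",
        # one row per source file, and returns the list of bug counts.
        with open(path, newline="") as f:
            return [int(row["bugs"]) for row in csv.DictReader(f)]

    # Hypothetical file names, for illustration only.
    canonical = bug_counts("canonical-dataset.csv")
    candidate = bug_counts("candidate-dataset.csv")

    # Null hypothesis: both samples come from the same distribution.
    stat, p_value = ks_2samp(canonical, candidate)
    print("KS statistic = %.3f, p-value = %.4f" % (stat, p_value))
    if p_value < 0.05:
        print("Distributions differ: the candidate dataset may be biased.")
    else:
        print("No evidence of a distribution shift at the 5% level.")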

So my goal is to measure this difference in the distributions using
a statistical technique, to detect the presence of bias, and to
develop a statistical test that finds bias in reusable datasets. I
am assuming here that the distribution cannot change due to other
factors (and we already know that there are other sources of bias in
bug reports), and that the shape of the distribution is unique. The
second assumption is quite fair, but the first one is more
complicated, and it will require finding more than one dataset that
is known to be unbiased.
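
Once several datasets believed to be unbiased are available, one way
to probe these assumptions is a k-sample Anderson-Darling test,
which checks whether all the samples share a common underlying
distribution. Again a hedged sketch under the same assumptions as
above: the file names are placeholders, bug_counts is the same
hypothetical loader, and anderson_ksamp from scipy is the only real
library call (scipy clips its reported significance level to the
range 0.1%-25%, so it is approximate).

    import csv
    import numpy as np
    from scipy.stats import anderson_ksamp

    def bug_counts(path):
        # Same hypothetical CSV loader as in the previous sketch.
        with open(path, newline="") as f:
            return [int(row["bugs"]) for row in csv.DictReader(f)]

    # Hypothetical: per-file bug counts from datasets believed unbiased.
    names = ("unbiased-a.csv", "unbiased-b.csv", "unbiased-c.csv")
    samples = [np.asarray(bug_counts(n)) for n in names]

    # Null hypothesis: all samples are drawn from one common
    # (unspecified) distribution, i.e. the shape is unique.
    result = anderson_ksamp(samples)
    print("A-D statistic = %.3f, approx. significance = %.3f"
          % (result.statistic, result.significance_level))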

I hope this work will provide a tool to assess the quality of a
bug-fix dataset and to avoid the problems of bias, which threaten
the validity of all the empirical studies that use these datasets.