Sunday, November 6, 2016

When and Why Your Code Starts to Smell Bad

Have you ever tried to modify a large class with too many methods? And what about those unnecessarily complicated multi-level nested loops? How did you feel about that? These are code smells, and they can make the evolution of your system a nightmare.

The question is: when and why are these code smells introduced? Common wisdom suggests that
they are introduced during maintenance and evolution activities on software
artifacts. However, this conjecture has never been empirically verified. In
this work, we empirically answer these questions by analyzing the complete
change history of 200 Java software systems belonging to three ecosystems:
Apache, Android, and Eclipse. We considered five types of code smells: Blob,
Complex Class, Class Data Should Be Private, Functional Decomposition, and
Spaghetti Code [1].

Design

When? - To answer this question we
checked out every single commit of the analyzed systems and ran a code smell
detector (i.e., DECOR [5]) on the Java classes introduced/modified in the
commit. We also computed the value of quality metrics on such classes in order
to obtain evolutionary metric trends. These steps allowed us to (i) understand
after how many modifications on a software artifact code smells are usually
introduced, and (ii) compare the metric trends of clean and smelly software
artifacts, looking for significant differences in how fast their metrics’
values increase or decrease.
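
As a rough illustration of step (ii), one way to quantify "how fast" a metric grows is the slope of a least-squares line fitted to its values across commits. The sketch below assumes that measure, and the function name `trend_slope` and the two metric histories are invented for illustration; none of it is taken from the study's tooling or data.

```python
from statistics import mean


def trend_slope(values):
    """Least-squares slope of a metric over commit positions 0, 1, 2, ...

    Hypothetical helper: `values` holds one metric reading per commit,
    in chronological order.
    """
    n = len(values)
    if n < 2:
        return 0.0  # a single reading has no trend
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Made-up metric histories for one clean and one eventually smelly class.
clean_history = [10, 10, 11, 11, 12]
smelly_history = [10, 14, 19, 25, 30]

clean_slope = trend_slope(clean_history)
smelly_slope = trend_slope(smelly_history)

# How many times faster the smelly class grows than the clean one.
print(round(smelly_slope / clean_slope, 1))
```

Comparing the two slopes as a ratio gives a "times faster" figure of the kind reported in the results.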

Curiosity - How long did it take? Eight weeks on a Linux server with 7 quad-core 2.67 GHz CPUs (28 cores) and 24 GB of RAM.

Why? - Here we wanted to understand
why developers introduce code smells. In particular, does their workload
influence the probability of introducing a code smell? What about the deadline
pressure for releases? Which tasks (implementation of new features, bug fixing,
refactoring, etc.) are developers performing when they introduce code smells?
To answer these questions, we tagged the commits that introduced the smells.

To perform this analysis, we first needed to identify the commits responsible
for the introduction of a code smell. When the code smell is introduced during
the creation of the software artifact, the choice is trivial: we just analyzed
the first commit. But what about code smell instances that appear after several
commits? Which commits should we analyze? If we analyzed only the one in which
the code smell is identified, we would discard all the change history that led
to a smelly artifact! For this reason we defined smell-introducing
commits as commits which might have pushed a software artifact in a
smelly direction, judged by the trends of discriminating metrics. For example, in
the following figure, commits c3, c5 and c7 are identified as smell-introducing
commits and tagged as such.
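
The idea can be sketched in a few lines. This is a simplified reading, assuming that "pushing an artifact in a smelly direction" means the discriminating metric got worse (here, increased) relative to the previous commit. The function name and the metric values are hypothetical; the values were chosen only so that the result matches the commits named above, and they are not the values from the figure.

```python
def smell_introducing_commits(history):
    """Return the commits where the metric increased over the previous commit.

    `history` is a chronological list of (commit_id, metric_value) pairs for
    one file, ending at the commit where the smell is detected.
    Assumption: a higher metric value means "closer to smelly".
    """
    tagged = []
    for (_, previous), (commit_id, value) in zip(history, history[1:]):
        if value > previous:
            tagged.append(commit_id)
    return tagged


# Made-up size history of a single file across eight commits.
loc_history = [
    ("c1", 120), ("c2", 118), ("c3", 190), ("c4", 190),
    ("c5", 260), ("c6", 255), ("c7", 340), ("c8", 340),
]

print(smell_introducing_commits(loc_history))  # ['c3', 'c5', 'c7']
```

Commits that leave the metric unchanged or reduce it (c2, c4, c6, c8 in this made-up history) are not tagged, so only the changes that contributed to the smell are analyzed.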

Results

When? - While common wisdom suggests
that smells are introduced after several changes have been made to a code component,
we found instead that such a component has generally been affected by the smell since its creation. Thus, developers
often introduce smells when they work on a code component for the very first time.

However, there are also cases where
the smells manifest themselves after several changes have been made to the code
component. In these cases, files that will become smelly exhibit specific
trends for some quality metric values that are significantly different from
those of clean (non-smelly) files.

For example, the Weighted Method
Complexity (WMC) of classes that eventually become Blobs increases more than
230 times faster than that of clean classes over the same initial
development time.

Why? - Smells are generally
introduced by developers when enhancing existing features or implementing new
ones. As expected, smells are generally introduced in the last month before a
deadline; a considerable number of instances are also introduced in the
first year after project startup. Finally, developers who introduce smells
are generally the owners of the file (i.e., they are responsible for at least
75% of the changes made to the file), and they are more prone to introducing
smells when they have higher workloads.
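
The ownership criterion is easy to express in code. The sketch below assumes that ownership is computed from the share of changes per author for a single file; the function name `file_owners` and the author names are hypothetical, and only the 75% threshold comes from the text.

```python
from collections import Counter


def file_owners(change_authors, threshold=0.75):
    """Return the authors responsible for at least `threshold` of the changes.

    `change_authors` lists the author of each change to one file,
    one entry per change.
    """
    total = len(change_authors)
    if total == 0:
        return set()
    counts = Counter(change_authors)
    return {author for author, n in counts.items() if n / total >= threshold}


# Made-up change log: 8 of 10 changes by one developer.
authors = ["alice"] * 8 + ["bob"] * 2
print(file_owners(authors))  # {'alice'}
```

With a threshold above 50%, a file can have at most one owner, and a file whose changes are spread evenly across developers has none.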