What is the right interplay between testing code (unit test and the like) and static code analysis (lint, Reek, and their ilk)?

Kevin Tools such as Reek[1], lint, flay are intended to be helpers for the "refactor" step of TDD. Not everyone is good at finding the smells in their code, and I certainly find that I detect fewer smells in code that I'm familiar with; so having a tool to help my sense of smell can be quite beneficial.

But there are a couple of clear problems with using tools here, and so I'll start off our discussion by trying to phrase them clearly enough for group discussion...

Firstly, tools such as Reek attempt to make a subjective concept (code maintainability) into an objective one. In any codebase there will be code smells we're happy to tolerate — at least in the short term — and yet the tool will continue to nag us about them. (This is why Reek provides a config mechanism that allow certain code elements to be ignored.) On the plus side, Reek sees smells that I miss; and fixing them usually improves my code. But on the minus side, Reek can be more pedantic than I would be, and sometimes I have to spend time doing refactoring just to keep Reek happy. So, is it worth the effort to keep code quality tools quiet?

Andy Not everyone is familiar with these tools, in the Perl world we use perlcritic. Reek and perlcritic are very similar, and yet so very different. Here's what perlcritic tells me about some code (PBP stands for Perl Best Practices):

Pat Let's go back to Kevin's question, it's a tough one. I like seeing all of the warnings from a tool like Reek. It makes me question how or why I've written something. On the other hand, having seen it once and having made a decision about it, continuing to see it becomes annoying. I hadn't known about configuring it to ignore code elements, thanks for pointing it out. Maybe it would be less annoying if tools like this that kept some metadata and separated new warnings from the ones you've seen before.

Andy It can be harder to manage the results of static analysis, because they're rarely binary, unlike unit tests which are pass/fail. With static analysis results, I like to track trends over time, since I rarely ever have a clean run of Perl::Critic or splint.

Static analysis tools are also hard to manage becasue their results can often lead to arguments, if only with yourself! Which automated exhortations do we need to follow, and which can we ignore? Is it OK to have this long loop in just this one case? And when you decide to ignore a setting, do you annotate the source code? Source code annotations are subject to the same bit rot as source code itself.

All that said, static analysis is very different with C, where lint, splint and gcc warnings may tell you about actual bugs where memory may get corrupted, not just stylistic improvements.

Russ The thing that you have to keep in mind is that the code analysis tools are optional helpers: They look into your code and tell you things that might be wrong. They might help, they might not and you can take them or leave them. Tests, on the other hand, are life and death: If you don't have tests or good tests or if your code isn't passing the tests that you do have, then you have no idea whether your system works or not. This is true no matter if you are using Java or C#, but it is particularly true when you are working with dynamically typed languages like Ruby.

Having said that, I think that one good thing that static analysis of code can do that is a real help is to enforce a uniform style on your code, so that everyone is coding more or less the same way. In the long term that has real value.

There is also an organizational aspect to this issue. I've worked for a lot of big, bureaucratic organizations, and the mindset that develops in those kind of situations is that anything that is not required should be forbidden. So you tend to get mandates: Your code shall have test coverage; You shall run this or that code analysis tool. Now it's hard to argue with requirements for test coverage, but mandating the use of this or that secondary tool almost always becomes counter productive. Of course, the fault here lies not in our tools but in ourselves - or at least in our managers - but it is a factor anyway.

Andy Static analysis is the fun part that lets you refactor. It's the testing code that lets you refactor with confidence.

Kay I find that code written "test-driven" comes out pretty clean. However, I don't always live up to my discipline goals, and of course there's always legacy code to deal with. So I find static analysis tools very helpful in identifying "cleanup" areas.

Kevin My second problem concerns process. I want to remember to run Reek frequently, because whenever I let it slide for a while I do find that my code deteriorates. And so I create a test that runs Reek and fails if there are any smells. But now the existence of this test pushes Reek into the 'Green' step of TDD, and that breaks the TDD micro-process. Using the tool within a test pushes refactoring upstream; no longer can I "do the simplest thing" to get to green, because there's a failing test requiring me to refactor right now. What to do?

Pat It almost seems like we need a different tool here. Testing belongs on the Red/Green border, while analysis belongs on the Green/Refactor border. (Maybe we can talk about Cucumber[2] and similar tools being the Refactor/Red border in other post.) What would happen if there were an umbrella tool that kept us in the specify-test-analylize groove?

Andy What do you mean when you talk about the Red/Green border?

Pat Well, when I think of Red/Green/Refactor cycle, I see testing as the process that helps us move from "Red" to "Green" by letting us know what code to write and that it makes our existing tests pass. Cucumber is similar in that it lets us know which tests to write so that we can move from refactoring to to a red state before we start writing new code.

Kay I prefer to run the static analysis tools on the continuous integration server, freeing my coding time up to focus on the test-code-refactor microcycle. Although to keep me on track, I'll often run a code coverage on my new code before I check in.

I think consistent style and formatting is important too. I do a lot of Java programming, where IntelliJ does a good job of keeping formatting consistent across the project.

13 comments:

Interesting comments. I think both refactoring and testing benefit from an understanding of code complexity, but no one mentioned it. It sort of ties everything together since it allows you to know when to refactor something as well as gives an indication of testing completeness. Sadly the coverage tools to provide branch/path coverage don't seem to exist in the dynamic language world. (I'm more concerned with Python, but last time I checked they were lacking in perl/ruby).

Re: where linting tools should reside. I've got emacs running pyflakes (a linting tool) using flymake-mode and outlining issues. I'm not sure a pass/fail test is the best place to put this because as you mentioned some of it is subjective.

Interesting discussion. I played with code analysis for quite some time now (and for the most part with c# where static analysis is completely different of course since it's a static language) and build excellent in the last couple of weeks. It combines concepts and checks of roodi, reek and flog and also adds Rails specific checks. I think especially those Rails specific checks can be really helpful since they don't check general language patterns but concrete framework specific rules.

@pate: Excellent does not directly relate to Rails at all. It just performs the Rails related checks if the Code looks like Rails Code (class derives from ActiveRecord::Base or file is named like a Rails partial). In general, Excellent checks any Ruby Code you might have. Check the documentation (especially that for SexpContext and Warnings::Base) and the source for help on how to build your own checks.

chorny,I haven't been too involved in the Perl world since about 2000. I'm always interested in learning about the Devel tools that exist there though. So may of them bring great insights that can be pulled into other languages. Thanks for the pointers.