Have you ever listened to someone describe a process that they follow at work and thought “that’s completely insane!”? Maybe part of their build process involves manually editing sixty different files. Maybe their computer crashes every twenty minutes, so they only ever do anything for about fifteen minutes at a time. Or worse, maybe they use Rational Clear Case. A common element in situations where there’s an expression of disbelief when comparing modus operandi is that the person who calmly describes the absurdity is usually in boiled frog kind of situation. Often, they respond with, “yeah, I guess that isn’t normal.”

But just as often, a curious phenomenon ensues from there. The disbelieving, non-boiled person says, “well, you can easily fix that by better build/new computer/anything but Clear Case,” to which the boiled frog replies, “yeah… that’d be nice,” as if the two were fantasizing about winning the lottery and retiring to Costa Rica. In other words, the boiled frog is unable to conceive of a world where things aren’t nuts, except as a remote fantasy.

I believe there is a relatively simple reason for this apparent breaking of the spirit. Specifically, the bad situation causes them to think all alternative situations within practical reach are equally bad. Have you ever noticed the way during economic downturns people predict gloom lasting decades, and during economic boom cycles pundits write about how we’ve moved beyond–nay transcended–bad economic times? It’s the same kind of cognitive bias–assuming that what you’re witnessing must be the norm.

But the phenomenon runs deeper than simply assuming that one’s situation must be normal. It causes the people subject to a bad paradigm to assume that other paradigms share the bad one’s problems. To illustrate, imagine someone with a twelve mile commute to work. Assuming an average walking speed of three miles per hour, imagine that this person spends each day walking four hours to work and four hours home from work. When he explains his daily routine to you and you’ve had a moment to bug out your eyes and stammer for a second, you ask him why on earth he doesn’t drive or take a bus or…or something!

He ruefully replies that he already spends eight hours per day getting to and from work, so he’s not going to add learning how to operate a car or looking up a bus schedule to his already-busy life. Besides, if eight hours of winter walking are cold, just imagine how cold he’ll be if he spends those eight hours sitting still in a car. No, better just to go with what works now.

Absurd as it may seem, I’ve seen rationale like this from other developers, groups, etc. when it comes to tooling and processes. A proposed switch or improvement is rejected because of a fundamental failure to understand the problem being solved. The lesson to take away from this is to step outside of your cognitive biases as frequently as possible by remaining open to the idea of not just tweaks, but game changers. Allow yourself to understand and imagine completely different ways of doing things so that you’re not stuck walking in an age of motorized transport. And if you’re trying to sell a walking commuter on a new technology, remember that it might require a little of bit of extra prodding, nudging, and explaining to break the trance caused by the natural cognitive bias. Whether breaking through your own or someone else’s, it’s worth it.

Today, I’d like to highlight a couple of features of Moq that I didn’t know about until relatively recently (thanks to a recent google+ hangout with Moq author, Daniel Cazzulino). Since learning about these features, I’ve been getting a lot of mileage out of them. But, in order to explain these two features and the different paradigm they represent, let me reference my normal use of Moq.

Let’s say we have some class PencilSharpner that takes an IPencil and sharpens it, and we want to verify that this is accomplished by setting the Pencil’s length and sharpness properties:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

[TestMethod,Owner("ebd"),TestCategory("Proven"),TestCategory("Unit")]

publicvoidSharpen_Sets_IsSharp_To_True()

{

varmyPencilDouble=newMock<IPencil>();

myPencilDouble.SetupProperty(pencil=>pencil.IsSharp);

myPencilDouble.Object.IsSharp=false;

myPencilDouble.SetupProperty(pencil=>pencil.Length);

myPencilDouble.Object.Length=12;

varmySharpener=newPencilSharpener();

mySharpener.Sharpen(myPencilDouble.Object);

Assert.IsTrue(myPencilDouble.Object.IsSharp);

}

So, I create a test double for the pencil, and I do some setup on it, and then I pass it into my sharpener, after which I verify that the sharpener mutates it in an expected way. Fairly straight forward. I create the double and then I manipulate its setup, before passing its object in to my class under test. (Incidentally, I realize that I could call “SetupAllProperties()”, but I’m not doing that for illustrative purposes).

But, sometimes I’d rather not think of the test double as a double, but just some object that I’m passing in. That is, perhaps I don’t need to invoke any setup on it, and I just want to reason about the actual proxy implementation, rather than stub.object. Well, that’s where Mock.Of<>() comes in:

1

2

3

4

5

6

7

8

9

10

11

[TestMethod,Owner("ebd"),TestCategory("Proven"),TestCategory("Unit")]

publicvoidSharpen_Sets_IsSharp_To_True()

{

varmyPencil=Mock.Of<IPencil>();

myPencil.IsSharp=false;

varmySharpener=newPencilSharpener();

mySharpener.Sharpen(myPencil);

Assert.IsTrue(myPencil.IsSharp);

}

Much cleaner, eh? I never knew I could do this, and I love it. In many tests now, I can reason about the object not as a Mock, but as a T, which is an enormous boost to readability when extensive setup is not required.

Ah, but Erik, what if you get buyer’s remorse? What if you have some test that starts off simple and then over time and some production cycles, you find that you need to verify it, or do some setup. What if we have the test above, but the Sharpen() method of PencilSharpener suddenly makes a call to a new CanBeSharpened() method on IPencil that must suddenly return true… do we need to scrap this approach and go back to the old way? Well, no, as it turns out:

Notice the third line in this test. Mock.Get() takes some T and grabs the Mock containing it for you, if applicable (you’ll get runtime exceptions if you try this on something that isn’t a Mock’s object). So, if you want to stay in the context of creating a T, but you need to “cheat”, this gives you that ability.

The reason I find this so helpful is that I tend to pick one of these modes of thinking and stick with it for the duration of the test. If I’m creating a true mock with the framework — an elaborate test double with lots of customized returns and callbacks and events — I prefer to instantiate a new Mock(). If, on the other hand, the test double is relatively lightweight, I prefer to think of it simply as a T, even if I do need to “cheat” and make the odd setup or verify call on it. I find that this distinction aids a lot in readability, and I’m taking full advantage. I realize that one could simply retain a reference to the Mock and another to the T, but I’m not really much of a fan (though I’m sure I do it now and again). The problem with that, as I see it, is that you’re maintaining two levels of abstraction simultaneously, which is awkward and tends to be confusing for maintainers (or you, later).

Back Story

Some time back, I setup Team Foundation Server (TFS) on a server machine more or less dedicated to the cause. This was to test drive it to consider it as a replacement for legacy source control, requirements management, deployment, etc. Since it was a trial, run, I opted for keeping setup simpler initially, reasoning that I could expand later if I so chose. As a result, I didn’t bother with Sharepoint setup initially, and I allowed the default installation of a database, which was SQLExpress.

Once I got used to the features of the basic installation, I wanted to test out the more advanced ones, but this has proven annoyingly difficult. Setting up Sharepoint and trying to retrofit it on existing projects was an enormous hassle, and I wound up having to delete my old projects and ‘recrate’ them with Sharepoint. Granted, these were playpen sorts of projects, but there was actual work in them and they were useful — just not primetime. So, losing them would be a hassle. And besides, it’s kind of hard to fully test a system without using it to do something useful.

After letting the dust settle a bit on that annoyance, I decided I’d switch from SQLExpress to SQL Standard to get the full benefit of TFS reporting services (via SQL reporting services). This was another huge pain point, and I’m going to document below what I had to do. Basically, it involved backing up all the SQL Express databases, installing SQL Server 2008 standard, and importing those backups. This guide is by no means comprehensive and there are a lot of moving parts, so this isn’t exactly a walk through, but hopefully it will help save someone at least some annoyance in this battle and maybe shave a little time off. Read More

Code Reviews

One thing that I tend to contemplate from time to time and have yet to post about is the idea of code reviews and what constitutes a good one. I’ve worked on projects where there was no code review process, a half-hearted review process, an annoying or counter-productive code review process, and a well organized and efficient code review process. It’s really run the gamut. So, I’d like to put pen to paper (finger to keyboard) and flesh out my ideas for what an optimal code review would entail. Before I can do that, however, I think I need to define some terms up front, and then identify some things about code reviews that I view as pitfalls, counter-productive activities or non-useful activities.

What is a Code Review?

According to the wikipedia entry for a code review, there are three basic types: the formal review, pair programming, and the lightweight review. I’ll add a fourth to this for the sake of pure definition, and call that the “automated review”. The automated review would be use of one or more static analysis tools like FX Cop or Style Cop (the wiki article includes someone using tools like this on the code of another developer, but I’m talking strictly automated steps like “you fail the build if you generate Style Cop warnings”). Pair programming is self explanatory for anyone familiar with the concept. Lightweight reviews are more relaxed and informal in the sense that they tend to be asynchronous and probably more of a courtesy. An example of this is where you email someone a source code file and ask for a sanity check.

The formal review is the one with which people are probably most familiar if code review is officially part of the SDLC on the project. This is a review where the developer sits in a room with one or more other people and presents written code. The reviewers go through in detail, looking for potential bugs, mistakes, failures to comply with convention, opportunities for improvement, design issues, etc. In a nutshell, copy-editing of code while the author is present.

What I’ll be discussing here mainly pertains to the formal review.

What I Don’t Like

Here are some things that I think ought to be avoided in a code review process.

1. Failing to have code reviews

This is important in the same way that QC is important. We’re all human and we could all use fresh eyes and sanity checks.

2. Procedural Code Review: The Nitpick

“You made that camel case instead of Pascal case.” “You could be dereferencing null there.” In general, anything that a static analysis tool could tell someone a lot faster and more accurately than a room full of people inspecting code manually. Suresh addresses this idea in a blog post:

I happen to see only “superficial” reviews happening. By superficial, I mean the types where you get review comments like, “You know the documentation for that method doesn’t have the version number”, or “this variable is unused”, etc.

“Superficial” is an excellent description as these things are generally trivial to identify and correct. It is not an effective use of the time of the developer or reviewers anymore than turning off spell check and doing it manually is an effective use of authors’ and editors’ time. There are tools for this — use them!

3. Paranoid Code Review

This is what happens when reviewer(s) go into the review with the notion that the only thing standing between a functioning application and disaster is their keen eye for the mistakes of others. This leads to a drawn out activity in which reviewers try to identify any possible, conceivable mistake that might exist in the code. It’s often easily identified by reviewers scrunching their noses, concentrating heavily, pointing, or tracing code through control flow statements with their fingers on the big screen while the developer twiddles his thumbs.

Again, there’s a tool for this. It’s called the unit test. It’s not flawless and it assumes a decent amount of coverage and competence from the unit tests, but if executed properly, the unit tests will express and prove corner case behavior far better than people on too little sleep staring at a screen and trying to step through the call stack mentally. This mental execution is probably the least reliable possible way of examining code.

4. Pure Gatekeeper Code Review

This is less of an individual property of code review, but more a property of the review process. It’s where you have a person or committee in the department that acts as the Caesar of code, giving thumbs up or thumbs down to anything that anyone submits. Don’t get me wrong — the aim of this makes sense. You want somebody that’s doing a sanity check on the code and someone who more or less has his or her finger on the pulse of everything that’s happening.

The issue that occurs here is a subtle one that results from having the same person or people reviewing all code. Specifically, those people become the gatekeepers and the submitters become less concerned about writing good code and innovating and more principally concerned with figuring out how to please the reviewer(s). Now, if the reviewers are sharp and savvy, this may not be a big deal, though discouraging new ideas and personal growth is always going to have some negative impact. However, if the reviewers are not as sophisticated, this is downright problematic.

This is relatively easily addressed by rotating who performs code reviews or else distinguishing between “suggestion” and “order”. I’ve participated in review processes where both of these mitigating actions were applied, and it’s a help.

5. Tyrant Gatekeeper

In the previous example, I mentioned a hypothetical gatekeeper reviewer or committee with ultimate authority, and this is a subset of that. In this case, the reviewer(s) have ultimate yes/no authority and are very heavy (and possibly even combative or derisive) with the “no” option. Where the previous example might stifle innovation or developer growth, this creates bottlenecks. Not only is it hard to get things approved, but developers will naturally start asking the reviewer(s) what to do at every turn rather than attempting to think for themselves.

In essence, this creates a state of learned helplessness. Developers are more concerned with avoiding negative feedback at the code reviews than learning, doing a good job, or becoming able to make good decisions based on their own experience. As a result, the developers don’t really make any decisions and ask the reviewer(s) what to do at every step, waiting until they have time, if necessary. The review(s) become a bottleneck in the process.

I have not personally witnessed or been subject to this form of code reviews, but I have heard of such a thing and it isn’t difficult to imagine this happening.

6. Discussion Drift

This occurs during a code review when a discussion of the specific implementation gets sidetracked by a more general discussion of the way things ought to be. Perhaps the code is instantiating an object in the constructor, and a reviewer recommends using dependency injection instead. From here, the participants in the review being to discuss how nice it would be if the architecture relied on a IOC framework instead of whatever it is at the moment.

That’s both a valid and an interesting discussion, but it has nothing to do with some developer checking in implementation code within the framework of the existing architecture. Discussions can have merit and still not be germane to the task at hand.

And so on and so forth. Code reviews can easily devolve into this sort of thing over purely or nearly-purely subjective matters. People get very entrenched about their own subjective preferences and feel the need to defend them. We see this from political affiliation to rooting for sports teams. Subjective matters of preference in code are no different. Neither side is likely to convince the other during the scope of the review, but what is quite likely, and probably certain, is that time will be wasted.

If matters like that are part of the coding standards policy on the project, than it’s an open and shut case. If they’re not, they’re better left alone.

So, How Does a Good Code Review Go?

Having explained what I don’t like in a code review, I’ve provided some context for what I do find helpful. I’m going to outline a procedure that is simply an idea. This idea is subject to suggestions for improvement by others, and ongoing refinement from me. Also, this idea is geared toward the gatekeeper scenario. I may make another post on what I perceive to be the most effective method for voluntary(ish), peer-conducted code reviews where the reviewer(s) are not approval gate-keepers.

Developer finishes the pre-defined unit of code and is ready to have it reviewed for promote/commit.

Once no static-check violations are present, developer notifies reviewer(s) of desire for code review.

Reviewers run static analysis tools asynchronously and reject the request if any rules are violated.

Reviewers examine the code for obvious mistakes not caught by static analysis and/or developer unit tests, and write unit tests that expose the deficiency. (Alternatively, they can run something like Pex)

Developer appeals changes with which he doesn’t agree. This could involve the reviewer(s) providing “proof”, for example if they say that the developer shouldn’t do something because of X negative consequence, they should demonstrate that consequence somehow. This appeal could be resolved via proof/demonstration or it could go to a third party where the developers and reviewers each state their cases.

Any suggested changes and the results of any appeals are then promoted/committed.

This process naturally addresses (1) and (2) of the things that I don’t like in that you’re having a code review and getting the procedural, easy stuff out of the way offline, prior to meeting. (3) is made more difficult by the fact that the reviewer(s) are given the opportunity to write unit tests that expose the badness about which they might be paranoid. (4) and (5) are addressed by the appeal process and the general concept that changes are suggestions rather than decrees. (6) and (7) are addressed by the mediator who has no skin in the game and will probably have a natural tendency to want to keep things short and sweet.

One drawback I can see to what I’m proposing here is that you could potentially undercut the authority of the reviewer if the person doing the reviews is, say, the most senior or high ranking person. Perhaps people would want that role a bit less if they could be officially second guessed by anyone. However, I think that creates a nice environment where good ideas are valued above all else. If I were in that role myself, I’d welcome the challenge of having to demonstrate/prove ideas that I think are good ones before telling others to adopt them. In the long run, being steered toward that kind of rigor makes you better at your craft. I also think that it might be something of a non-issue, given that people who wind up in these types of leadership and senior roles are often there based on the merit of their work and on successfully “proving” things over and over throughout the course of a career.

My Chase of Code Coverage

Perhaps it’s because fall is upon us and this is the first year in a while that I haven’t been enrolled in a Master’s of CS program (I graduated in May), I’m feeling a little academic. As I mentioned in my last post, I’ve been plowing through following TDD by the letter, and if nothing else, I’m pleased that my code coverage is more effortlessly at 100%. I try to keep my code coverage around 100% whether or not I do TDD, so the main difference I’ve noticed is that TDD versus retrofitted tests seems to hit my use cases a lot harder, instead of just going through the code at least once.

Now, it’s important to me to get close to or hit that 100% mark, because I know that I’m at least touching everything going into production, meaning that I don’t have anything that would blow up if the stack pointer ever got to it, and I’m saved only by another bug preventing it from executing. But, there is a difference between covering code and exercising it.

More than 100% Code Coverage?

As I was contemplating this last night, I realized that some lines of my TDD code, especially control flow statements, were really getting pounded. There are lines in there that are covered by dozens of tests. So, the first flicker of an idea popped into my head — what if there were two factors at play when contemplating coverage: LOC Covered/Total LOC (i.e. our current code coverage metric) and Covering tests/LOC (I’ll call this coverage density).

High coverage is a breadth-oriented thing, while high density is depth — casting a wide net versus a narrow one deeply. And so, the ultimate solution would be to cast a wide net, deeply (assuming unlimited development time and lack of design constraints).

Are We Just Shifting the Goalposts?

So, Code Density sounded like sort of a heady concept, and I thought I might be onto something until I realized that this suffered the same potential for false positive feedback as code coverage. Specifically, I could achieve an extremely high density by making 50 copies of all of my unit tests. All of my LOC would get hit a lot more but my test suite would be no better (in fact, it’d be worse since it’s now clearly less efficient). So code coverage is weaker as a metric when you cheat by having weak asserts, and density is weaker when you cheat by hitting the same code with identical (or near identical) asserts.

Is there a way to use these two metrics in combination without the potential for cheating? It’s an interesting question and it’s easy enough to see that “higher is better” for both is generally, but not always true, and can be perverted by developers working under some kind of management edict demanding X coverage or, now, Y density.

Stepping Back a Bit

Well, it seems that Density is really no better than Code Coverage, and it’s arguably more obtuse, or at least it has the potential to be more obtuse, so maybe that’s not the route to go. After all, what we’re really after here is how many times a line of code is hit in a different scenario. For instance, hitting the line double result = x/y is only interesting when y is zero. If I hit it 45,000 times and achieve high density, I might as well just hit it once unless I try y at zero.

Now, we have something interesting. This isn’t a control flow statement, so code coverage doesn’t tell the whole story. You can cover that line easily without generating the problematic condition. Density is a slightly (but not much) better metric. We’re really driving after program correctness here, but since that’s a bit of a difficult problem, what we’ll generally settle for is notable, or interesting scenarios.

A Look at Pex

Microsoft Research made a utility called Pex (which I’ve blogged about here). Pex is an automated test generation utility that “finds interesting input-output values of your methods”. What this means, in practice, is that Pex pokes through your code looking for edge cases and anything that might be considered ‘interesting’. Often, this means conditions that causes control flow branching, but it also means things like finding our “y” div by zero exception from earlier.

What Pex does when it finds these interesting paths is it auto-generates unit tests that you can add to your suite. Since it finds hard-to-find edge cases and specializes in branching through your code, it boasts a high degree of coverage. But, what I’d really be interested in seeing is the stats on how many interesting paths your test suite cover versus how many there are or may be (we’d likely need a good approximation as this problem quickly becomes computationally unfeasible to know for certain).

I’m thinking that this has the makings of an excellent metric. Forget code coverage or my erstwhile “Density” metric. At this point, you’re no longer hoping that your metric reflects something good — you’re relatively confident that it must. While this isn’t as good as some kind of formal method that proves your code, you can at least be confident that critical things are being exercised by your test suite – manual, automated or both. And, while you can achieve this to some degree by regularly using Pex, I don’t know that you can quantify it other than to say, “well, I ran Pex a whole bunch of times and it stopped finding new issues, so I think we’re good.” I’d like a real, numerical metric.

Anyway, perhaps that’s in the offing at some point. It’d certainly be nice to see, and I think it would be an advancement in the field of static analysis.