~ using my evil powers for good

Monthly Archives: August 2011

Programmers are an obvious choice for members of a software team. However, various points of view attribute different value to the other potential roles.

Matt Heusser’s “How to Speak to an Agilista (if you absolutely must)” Lightning Talk from CAST 2011 referred to Agilista programmers who rejected the notion that testers are necessary. Matt elaborates that extreme programming began back in 1999, when testing as a field was not as mature, so these developers spoke of only two primary roles, customer and programmer, allowing that the “team may include testers, who help the Customer define the customer acceptance tests. … The best teams have no specialists, only general contributors with special skills.” In general, Agile approaches teams as composed of members who “normally take responsibility for tasks that deliver the functionality an iteration requires. They decide individually how to meet an iteration’s requirements.” The Agile variation Scrum suggests teams of “people with cross-functional skills who do the actual work (analyse, design, develop, test, technical communication, document, etc.). It is recommended that the Team be self-organizing and self-led, but often work with some form of project or team management.” At the time, this prevailing view that the whole team should own quality diminished the significance of a distinct testing role. Now that testing has matured, Matt encourages us to revisit this relationship and find ways to engage world-changers and their methodologies to achieve a better result. Privately questioning the value that testers’ activities contribute is more constructive than head-on public confrontation. One path forward is for testers to acknowledge Agile programmer awesomeness and their specific skills, which produces a more favorable environment for discussing the need for testing, a need Agilists do recognize, though they prefer to automate it.
Matt advocates observing what works in cross-functional project teams to determine patterns that work in a particular context, engaging our critical thinking skills to develop processes that support better testing. As a process naturalist, Matt reminds us that “[software teams in the wild] may not fit a particular ideology well, but they are important, because by noticing them we can design a software system to have less loss and more self-correction.”

Not having experienced this rejection by Agile zealots myself, despite working only in Agile-friendly software shops, I have struggled to comment on his suggestions. I decided to dig a bit deeper for some classic perspectives on roles within software teams so that I could put the XP and Agile assertions into perspective.

One such analogy for the software team is Harlan Mills’ Chief Programmer Team as “a surgical team … one does the cutting and the others give him every support that will enhance his effectiveness and productivity.” In this model, “Few minds are involved in design and construction, yet many hands are brought to bear. … the system is the product of one mind – or at most two, acting uno animo,” referring to the surgeon and copilot roles, which are respectively the primary and secondary programmer. However, this model recognizes a tester as necessary, stating “the specialization of the remainder of the team is the key to its efficiency, for it permits a radically simpler communication pattern among the members,” referring to centralized communication between the surgeon and all other team members, including the tester, administrator, editor, secretaries, program clerk, toolsmith, and language lawyer for a team size of up to ten. This large support staff presupposes that “the Chief Programmer has to be more productive than everyone else on the team put together … in the rare case in which you have a near genius on your staff–one that is dramatically more productive than the average programmer on your staff.”

Acknowledging the earlier paradigm of the CPT, the Pragmatic Programmers describe another well-known approach to team composition: “The [project’s] technical head sets the development philosophy and style, assigns responsibilities to teams, and arbitrates the inevitable ‘discussions’ between people. The technical head also looks constantly at the bigger picture, trying to find any unnecessary commonality between teams that could reduce the orthogonality of the overall effort. The administrative head, or project manager, schedules the resources that the teams need, monitors and reports on progress, and helps decide priorities in terms of business needs. The administrative head might also act as a team’s ambassador when communicating with the outside world.” While these authors do not directly address the role of tester, they state that “Most developers hate testing. They tend to test gently, subconsciously knowing where the code will break and avoiding the weak spots. Pragmatic Programmers are different. We are driven to find our bugs now, so we don’t have to endure the shame of others finding our bugs later.” In contrast, traditional waterfall software teams have individuals “assigned roles based on their job function. You’ll find business analysts, architects, designers, programmers, testers, documenters, and the like. There is an implicit hierarchy here – the closer to the user you’re allowed, the more senior you are.” This juxtaposition sets up a competitive relationship between the roles rather than presenting them as striving toward the same goal.

A healthier model of cross-functional teams communicates that all of the necessary skills “are not easily found combined in one person: Test Know How, UX/Design CSS, Programming, and Product Development / Management.” This view advocates reducing communication overhead by involving all of the relevant perspectives within the team environment rather than segregating them by job function. Here, a tester role “works with product manager to determine acceptance tests, writes automatic acceptance tests, [executes] exploratory testing, helps developers with their tests, and keeps the testing focus.”

Finally, we arrive at my favorite description of a collaborative software team, found in Peopleware: “the musical ensemble would have been a happier metaphor for what we are trying to do in well-jelled work groups” since “a choir or glee club makes an almost perfect linkage between the success or failure of the individual and that of the group. (You’ll never have people congratulating you on singing your part perfectly while the choir as a whole sings off-key.)” When we think of our team as producing music together, we see that this group composed of disparate parts must together be responsible for the quality of the result, allowing for a separate testing role but not reserving a testing perspective to that one individual. All team members must pursue a quality outcome, rather than only the customer and the programmer, as Agile purists would have it. One aspect of this committed team is willingness to contend respectfully with one another, for we would readily ignore others whose perspectives had no value. Yet, when we see that all of our striving contributes to the good of the whole, the struggle toward understanding and consensus encourages us to embrace even the brief discomfort of disagreement.

At CAST this year, Michael Larsen gave a talk about testing team development lessons learned from the Boy Scouts of America. I have some familiarity with the organization since my kid brother was once a boy scout, my college boyfriend was an Eagle scout, a close family friend is heavily involved in scouts, and I anticipate my husband and both of my sons will “join up” as soon as the boys are old enough. I just might be a future Den Mother.

However, when I was growing up, I joined the Girl Scouts of America through my church. We didn’t have the same models of team development, but we had some guiding principles underpinning our troop:

The Girl Scout Promise
On my honor, I will try:
To serve God and my country,
To help people at all times,
And to live by the Girl Scout Law.

The Girl Scout Law
I will do my best to be
honest and fair,
friendly and helpful,
considerate and caring,
courageous and strong, and
responsible for what I say and do,
and to
respect myself and others,
respect authority,
use resources wisely,
make the world a better place, and
be a sister to every Girl Scout.

If we as testers live up to these principles of serving, helping, and living out honesty, fairness, and respect in our professional relationships, we can become the talented leaders that Michael encourages us to be:

CAST 2011 Emerging Topics: Michael Larsen “Beyond Be Prepared: What can scouting teach testing?”
From the Boy Scouts: watching how people learn and how they form into groups
A 1960s model of team development (Tuckman’s stages: Forming, Storming, Norming, Performing)

Movies such as Remember the Titans and October Sky demonstrate this process.
Failure to “pivot” can prevent someone from moving through the continuum!
Without daily practice, skills can be lost or forgotten, so may need to drop to a lower stage for review.

After Michael’s lightning talk, members of the audience brought these questions for his consideration:

Q: Is this a model? Or does every team actually go through these steps?
Duration of the steps varies, some may be very brief.
Unlikely to immediately hit the ground running.

Q: What about getting stuck in the Storming mode?
Figure out who can demonstrate. If you don’t like the demonstrated idea, toss it. Just get it out there!

Q: How does this model work when one person leaves and another comes in?
Definitely affects the group, relative to the experience of the person who joins.
May not revert the group back to the very beginning of the process.
Team rallies and works to bring the new person up to the team’s level.
New member may bring totally fresh insights, so that person doesn’t necessarily fall in line.

Q: What happens when a group that is Norming gets a new leader?
Can bring the group down unless you demonstrate why you’re a good leader.
Get involved! Get your hands dirty!
Build the trust, then you can guide. Team will accept your guidance.

If this works on young squirrely kids, imagine how well this works on young squirrely developers … testers. – Michael Larsen

Paul Holland’s interstitial Lightning Talk at CAST 2011 was a combination of gripe session, comic relief, and metrics wisdom. The audience in the Emerging Topics track proffered various metrics from their own testing careers for the assembled testers to informally evaluate.

Although I attended CAST remotely via the UStream link, I live-tweeted the Emerging Topics track sessions and was able to contribute my own metric for inclusion in the following list, thanks to the person monitoring Twitter for @AST_News:

number of bugs estimated to be found next week
ratio of bugs in production vs. number of releases
number of test cases onshore vs. offshore
percent of automated test cases
number of defects not linked to a test case
total number of test cases per feature
number of bug reports per tester
code coverage
path coverage
requirements coverage
time to reproduce bugs found in the field
number of people testing
equipment usage
percentage of pass/fail tests
number of open bugs
amount of money spent
number of test steps
number of hours testing
number of test cases executed
number of bugs found
number of important bugs
number of bugs found in the field
number of showstoppers
critical bugs per tester as proportion of time spent testing

“Counting test cases is stupid … in every context I have come across” – Paul Holland

Paul mentioned that per tester or per feature metrics create animosity among testers on the same team or within the same organization. When confronted with a metric, I ask myself, “What would I do to optimize this measure?” If the metric motivates behavior that is counter-productive (e.g. intra-team competition) or misleading (e.g. measuring something irrelevant), then that metric has no value because it does not contribute to the goal of delivering value to users. Bad metrics lead to people in positions of power saying, “That’s not the behavior I was looking for!” To be valid, a metric must improve the way you test.

In one salient example, exceeding the number of showstopper bugs permitted in a release invokes stopping or exit criteria, halting the release process. Often, as Paul pointed out, this number is an arbitrary selection made long before, perhaps by someone no longer on staff, and yet it stands in the way of the greater goal of shipping the product. Would one critical bug above the limit warrant arresting a rollout months in the making?

Paul’s argument against these metrics resonated with my own experience and with the insight I gathered from attending Pat O’Toole’s Metrics that Motivate Behavior! [pdf] webinar back in June of this year:

“A good measurement system is not just a set of fancy tools that generate spiffy charts and reports. It should motivate a way of thinking and, more importantly, a way of behaving. It is also the basis of predicting and heightening the probability of achieving desired results, often by first predicting undesirable results thereby motivating actions to change predicted outcomes.”

Pat’s example of a metric that had no historical value and that instead focused completely on behavior modification introduced me to a different way of thinking about measurement. Do we care about the historical performance of a metric or do we care more about the behavior that metric motivates?

Another point of departure from today’s discussion is Pat’s prioritizing behavior over thinking. I think the context-driven people who spoke in the keynotes and in the Emerging Topics sessions would take issue with that.

My experience with metrics tells me that numbers accumulated over time are not necessarily evaluated at a high level but are more likely used as the basis for judging individual performance, becoming a rod of discipline rather than the protective rod of a shepherd defending his flock.

Paul did offer some suggestions for bringing metrics back to their productive role:

valid coverage metric that is not counting test cases
number of bugs found/open
expected coverage = progress vs. coverage

He also reinforced the perspective that the metric “100% of test cases that should be automated are automated” is acceptable as long as the overall percentage automated is low.

Metrics have recently become a particular interest of mine, but I have so much to learn about testing software that I do not expect to specialize in this topic. I welcome any suggestions for sources on the topic of helpful metrics in software testing.

1. Calendar year?
The Gregorian calendar is only one of many that have been used over time.
“There are only 14 different calendars when Easter Sunday is not involved. Each calendar is determined by the day of the week January 1 falls on and whether or not the year is a leap year. However, when Easter Sunday is included, there are 70 different calendars (two for each date of Easter).” – Wikipedia article

2. Fiscal year = financial year = budget year
This is a period used for calculating annual (“yearly”) financial statements in businesses and other organizations that “fixes each month at a specific number of weeks to facilitate comparisons from month to month and year to year” – Wikipedia article

3. Astronomical year numbering system = proleptic Gregorian calendar
This standard includes the year “0” and eliminates the need for any prefixes or suffixes by attributing the arithmetic sign to the date. This definition is used by MySQL, NASA, and non-Maya historians.

4. Billing year
Companies often expect you to sign a contract for service that may encompass the period of a year (e.g. signing a 2-year cell phone contract).

5. Year of your life/age
Count starts at zero and increments on your birthday.

6. Years of working for a company
Count starts at zero and increments on the anniversary of your hire date. This definition is often used to award benefits based on longevity (e.g. more vacation after “leveling up” having completed a given number of work years).

7. Religious year
For example, the Roman Catholic Church starts its liturgical year with the four weeks of Advent that precede Christmas. Other religious calendars include the Julian, Revised Julian, Hebrew (a.k.a. Jewish), Islamic (a.k.a. Muslim, Hijri), Hindu, Buddhist, and Bahá’í calendars.

8. National year
Some countries use nation-based calendars for internally organizing time (e.g. Chinese, Indian, Iranian/Persian, Ethiopian, Thai solar).
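Even the simplest of these definitions hides arithmetic worth writing down. For item 2, here is a minimal sketch of labeling a date with its fiscal year, assuming a hypothetical organization whose fiscal year starts October 1 and is named for the calendar year in which it ends (both the start month and the naming convention vary by organization):

```python
import datetime


def fiscal_year(d, start_month=10):
    """Return the fiscal year label for date d.

    Assumes the fiscal year begins on the first day of start_month
    and is named for the calendar year in which it ends. Both are
    assumptions for illustration; real organizations differ.
    """
    return d.year + 1 if d.month >= start_month else d.year
```

Under these assumptions, August 15, 2011 falls in FY2011, while October 1, 2011 opens FY2012, so two people saying "this year's numbers" can legitimately mean different date ranges.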
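The astronomical numbering in item 3 is just an offset: inserting a year 0 shifts every BC year by one, so the year n BC becomes astronomical year 1 - n. A sketch of the conversion (the function names are mine, for illustration):

```python
def to_astronomical(year, era):
    """Map a historical year to astronomical year numbering.

    Astronomical numbering includes year 0: 1 BC -> 0, 2 BC -> -1,
    and AD years map to themselves.
    """
    return 1 - year if era == "BC" else year


def from_astronomical(n):
    """Inverse mapping: astronomical year back to (year, era)."""
    return (1 - n, "BC") if n <= 0 else (n, "AD")
```

So 1 BC is astronomical year 0 and 45 BC is -44, which is why date arithmetic across the BC/AD boundary is simpler in this system.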
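Items 5 and 6 share the same anniversary rule, a classic off-by-one trap when implemented naively as a subtraction of year numbers. A minimal sketch, assuming a February 29 start simply observes its anniversary on March 1 in non-leap years:

```python
import datetime


def completed_years(start, today):
    """Completed whole years from start to today.

    The count starts at zero and increments on each anniversary of
    start; subtract one if this year's anniversary has not yet
    occurred. A Feb 29 start is treated (by assumption) as having
    its anniversary on Mar 1 in non-leap years.
    """
    years = today.year - start.year
    if (today.month, today.day) < (start.month, start.day):
        years -= 1  # this year's anniversary is still ahead of us
    return years
```

For a hire date of August 20, 2005, this returns 5 on August 19, 2011 and 6 one day later, which is exactly the distinction that longevity-based benefits depend on.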

When we cannot even speak clearly about a familiar term like “year,” it should be no surprise that we have difficulty communicating on computing projects with cross-functional teams composed of individuals with different professional backgrounds. Each of us masters the jargon of our field as we encounter it, leaving us with gaping holes in our knowledge about domain-specific concepts that are essential when implementing a project.

Ubiquitous Language is “a language structured around the domain model and used by all team members to connect all the activities of the team with the software.”
– Domain-Driven Design by Eric Evans

In order to succeed in designing and implementing good software, we must be willing to revisit our assumptions about terminology, avoiding the “inconceivable” situation in which two team members in a discussion are using the same word to represent different ideas. In practical terms, that means asking dumb questions like “What do you mean by that?” even when the answer appears to be obvious, or we risk replaying the old story of the blind men and an elephant.

Once we have a firm grounding in a context-specific set of words to use when speaking about the work, we can proceed with confidence that we will find ourselves in the same position of confusion later in the project, as we iterate through modeling parts of the system again and again. Thus, we must remain vigilant for statements with multiple interpretations. In addition, Evans reminds us that a domain expert must understand this ubiquitous language so that it guides us to design a result that ultimately satisfies a business need.

Testers must consciously use the agreed upon expressions in our test ideas, test plans, test cases, and any other record of testing, whether planned or executed. Consistent usage is key in both explaining the testing approach to a new recruit and in maintaining the history of the project, including the testing effort.