We met in a local coffee shop that has limited space and is quite noisy. As a result, our ability to communicate verbally was compromised, as was our ability to visually hide the activities of one group from the other. The German chocolate cake was pretty good, though, and the general atmosphere of the place was characteristically energetic.

There were seven of us in attendance. We split into three groups: two Stakeholders, three Developers, and two QA. A latecomer joined the Developer group to make four. We decided we wanted to do something other than the "better mousetrap" project and discussed several alternatives. We settled on a "robotic house cleaner" which turned out to be too ambitious for the time we had - a fact that led us to some interesting insights. We propped two white boards up against the wall on top of a long bench which put them at eye level. The QA group used a third white board which they kept at their table. We did most of our work on the white boards except for our ArchitecturalSpike, initial time estimates and calculations for LoadFactor.

Stories and Spike

When doing the initial stories and architectural spike we were unsure about the level of interaction allowed between the Stakeholders and the Developers. Also, ExtremeHour doesn't assign any role to the QA group during this time. In retrospect I think we concluded that a great deal of communication is desirable during this phase. QA should ensure that the stories developed are testable at the very least. The Developers clearly need the stories from the Stakeholders in order to do the spike. It seemed like it would be beneficial for the Developers to give feedback to the Stakeholders as to the clarity and granularity of the stories and for the Stakeholders to advise the Developers about the viability and scope of the initial spike. We found that the initial spike embodied architectural decisions that were virtually impossible to change later in the game. The initial stories and spike took us twenty minutes to complete instead of ten. This was our first indicator that the problem was too big for the time budget we had.

Write stories

The Stakeholders wrote stories about cleaning floors, automated operation and future expandability (does anyone from SVP have a copy of the actual stories?). They seemed to be driven by a desire to make a product that was both practical and marketable. The "marketability" requirements tended to be very vague (web, voice, and PalmPilot interfaces).

Architectural Spike

The Design team chose a pool cleaner as the system metaphor and concentrated on designing a self-propelled vacuum cleaner. The idea was to create a machine that would randomly roll through the entire house vacuuming all the different floor surfaces as it went. ExtremeHour specifies that we were supposed to model the "build environment" as well but we didn't understand what that meant. Perhaps PeterMerel can provide some insight.

Priority and Scope

It took us fifteen minutes to establish priorities and estimate times. The estimations took the longest by far. Some of us felt that risk was an important factor that should be included in this part of the process but ExtremeHour didn't specify how this was to be done and none of us could remember the XP procedure for dealing with risk during the PlanningGame.

Estimate Stories

We found during estimation that we needed several stories clarified and some broken down. We had to have an initial design for every story before we could estimate it. We had one story where the machine would have to be able to navigate from room to room. Initially we thought it would be extremely expensive until one of our developers came up with the idea of placing radio beacons throughout the house. We determined that this new design would be relatively easy to implement. We also added an engineering story for a battery pack to the list of stories. We felt that we needed this as a feature in order to implement many of the other major features and told the Stakeholders that it was mandatory. I don't know that this is allowed in XP, but I don't know how XP handles this situation if it isn't. The practice of breaking up stories that had long estimated times worked very well. It limited the amount of design work that went into each estimate which in turn limited the size of the designed artifact. It did, however, cause some interdependencies between stories and their estimates.

Prioritize Stories

I was in the developer group and don't have much insight into how the stories were prioritized other than that they seemed to have the idea of testing feasibility as a criterion for selecting stories. Perhaps one of the Stakeholders can provide more details on this section.

Commitment Schedule

We found that the developers wanted to suggest stories at this time in order to get better times. We also discovered that we wanted to re-estimate stories as they came up in different combinations - different combinations suggested different engineering approaches which resulted in markedly different time estimates. This was especially true when one feature depended on another. For instance there was a story that specified that the platform was to use a vacuum cleaner as its means of cleaning and another that specified that it was to clean a variety of different surfaces. We estimated the time to clean different surfaces based on the assumption that the vacuum cleaning function would already be there. The Stakeholders eliminated the vacuum requirement and we had to scramble to re-estimate the multi-surface function based on the new set of stories. It seems to me that this sort of interdependence among stories is quite common and I wonder how real XP projects address it. I expect it has something to do with the risk estimations that we tried but failed to use.

First Iteration

The Developers finished thirteen ideal minutes of work in 36 minutes for a nominal LoadFactor of 2. We built a vacuum cleaner on a triangular platform with two driver wheels and one steering wheel. Steering was controlled by a microprocessor that gathered input from three sensors on the bumper of the unit. It was powered by a rechargeable battery pack so we didn't have to figure out how to keep it from unplugging itself. We worked in pairs and built the parts that had to be tightly integrated first (we didn't plan it that way, this is just something that I noticed we did). When we did integration we did it as a group. The code for the microprocessor was written as a sequence of if-then statements.
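
As a rough illustration of what "a sequence of if-then statements" driven by three bumper sensors might look like, here is a minimal sketch. The sensor positions (left, center, right), the action names, and the choice of which way to turn are all assumptions made for this example; the session only drew the design on a white board and no real controller code was written.

```python
def steer(left_hit, center_hit, right_hit):
    """Pick a steering action from three bumper sensor readings.

    Hypothetical sketch: sensor layout and action names are assumed,
    not taken from the session's white board drawing.
    """
    # A head-on bump, or bumps on both sides at once, means back off.
    if center_hit or (left_hit and right_hit):
        return "reverse_and_turn"
    # Bumped on the left: steer away to the right.
    if left_hit:
        return "turn_right"
    # Bumped on the right: steer away to the left.
    if right_hit:
        return "turn_left"
    # Nothing touched: keep rolling.
    return "forward"


print(steer(False, False, False))  # forward
print(steer(True, False, False))   # turn_right
```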

At first testing was done entirely by the QA group: they made up tests and invented the results and started reporting them back to us. We decided to change the process so that QA was responsible only for describing the tests. The Developers then described how the product would respond to the tests and the Stakeholders would decide whether the tests passed. This worked to eliminate false requirements that were injected into the tests by QA and to ferret out bogus explanations by the Development team. Testing also went much faster when we did this largely because there was much less arguing.

There is a subtle pressure on the Stakeholders to pass tests. This comes from the fact that failed tests increase the LoadFactor and therefore reduce the budget of ideal minutes for the next iteration. This creates a tension between scope and quality for the Stakeholders. This is good because Stakeholders are ideally suited to resolve this tension.

ExtremeHour says that "Any story with bugs is incomplete and affects subsequent LoadFactor". We took that to mean that we should subtract the ideal minutes allocated to the story from the total ideal minutes credited to the Developers for the iteration. We failed one story worth three minutes so we ended up with only ten ideal minutes of work completed in thirty-six minutes for a real LoadFactor of 3.6. Dividing the forty minutes of real developer minutes allocated for the next iteration by 3.6 yielded about eleven ideal minutes of developer time to be allocated among stories.
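
The arithmetic in this paragraph can be written out as a short sketch. The function names are ours, and the rule of dropping a failed story's whole estimate is our group's reading of ExtremeHour, not something ExtremeHour spells out.

```python
def load_factor(real_minutes, ideal_minutes_completed):
    """Real elapsed minutes per ideal minute of accepted work."""
    if ideal_minutes_completed <= 0:
        raise ValueError("no completed work to base a LoadFactor on")
    return real_minutes / ideal_minutes_completed


def ideal_budget(real_minutes_available, factor):
    """Ideal minutes that can be scheduled for the next iteration."""
    return real_minutes_available / factor


# Figures from the first iteration described above.
estimated_ideal = 13  # ideal minutes the Developers worked on
failed_ideal = 3      # the one failed story was worth three ideal minutes
real_elapsed = 36     # real minutes the iteration took

credited = estimated_ideal - failed_ideal     # 10 ideal minutes
factor = load_factor(real_elapsed, credited)  # 3.6
budget = ideal_budget(40, factor)             # about 11.1
print(f"LoadFactor {factor:.1f}, next budget {budget:.1f} ideal minutes")
```

Because the failed story's estimate is subtracted before dividing, every failed test directly shrinks the budget the Stakeholders get to spend in the next iteration.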

Reschedule

The story that failed in the first iteration failed largely because the Developers didn't understand it. Since the failure cost the Stakeholders two ideal minutes worth of new stories in the second iteration they were eager to ensure that the Developers understood all of the new and revised stories so that the next iteration would go more smoothly and ensure larger future time budgets. The result was much more communication between the two groups during the second planning session.

We found that the Developers wanted to suggest stories to the Stakeholders that fit better with the existing architecture than did the stories that the Stakeholders were coming up with. I think that to some extent the Stakeholders' stories contained design assumptions that were in conflict with the actual design. I wonder if there isn't some way to separate these design assumptions from the stories and thereby eliminate the desire of the Developers to change them while leaving intact the wishes of the Stakeholders. This is something that JimSawyer is interested in. I wish he had been there to set us straight.

When we started estimating the new stories we found that some of them were very difficult to estimate. We also realized that some stories would take so long to complete that they were beyond the scope of the project. We developed a rule, belatedly, that we would reject any story that took longer than one minute to estimate. We didn't come up with a way to fix the story, we just decided that we'd wasted too much time on some of the stories and decided that the next time we did this we wouldn't make that mistake again.

Early Finish & Post-Mortem

Since we were running low on time we decided not to complete the second round of scheduling and development and to discuss our results instead. I think that overall we were a little frustrated that the estimation tasks took so long and that the QA group had so little to do. Limiting the number of stories or prioritizing the stories before giving them to Development for estimation might help with the first problem. Involving QA in the process of writing stories would help some with the second.

We talked about the estimation problems we had in the second iteration and decided that the problem space was just too big for the two hour session we had planned. In particular our original architectural spike couldn't support the scope of the stories that the Stakeholders were coming up with. We found that some of the stories would require a complete redesign to implement. Others opened up such a large array of design alternatives (all of them fairly complex) that we couldn't assess them effectively enough to come up with a time estimate we felt comfortable with. One requirement, to clean under a dining room table with the chairs pushed in, seemed very simple, but couldn't be implemented on the platform we'd originally built (the unit was too big to fit between the chairs and we didn't have any mechanism to support an attachment that would). It seems that certain design decisions have implications that touch every part of the system and cannot easily be refactored. I think that we could respond to this information by making an effort to identify these decisions early on and to either accept the limitations that they impose for the lifetime of the project or to make a separate project out of changing those limitations. Doesn't XP have a kind of iteration that concerns itself with this issue? -- PhilGoodwin

Thanks to PeterMerel for his suggestions on improving our process:

Timekeeping Trouble?

PM: Thanks for taking such wonderful notes on this Phil. In general I think letting activities run to completion seems to have tripped you up here. Letting the details of designing blow out the time allotted is more BigDesignUpFront, no offense intended, and by not doing the second iteration it sounds like you had almost a waterfall kind of experience.

PG: We did do a lot of design work. It seemed to be necessary for estimating times. SteveMcConnell devotes almost an entire chapter to this subject in _Rapid Development_. That said, I think that you're right that we got sucked into doing too much early design. We all felt it, I think, but we didn't know how to get away from it. Maybe a firm timebox on estimating each story would be the way to go.

PM: In my ExtremeHour I was very strict about time - if they didn't get an activity "finished" in the allotted time, too bad, go with what you have so far and move on - mop it up in subsequent iterations. It may be that we were more focussed on playing the process than on the quality of the result, but to make an ExtremeHour happen you need, as coach/dungeonmaster, to keep your players from getting bogged down in details. Giving warnings at 5 minutes left and 1 minute left really helped supply my group with the appropriate sense of urgency.

PG: Right. We'll do it this way next time. We had a shortage of people so I played double duty as coach/developer. I think that the resulting conflict of interest hurt us. Someone else (on OTUG) suggested limiting the number of initial stories (to fewer than five!). I think that would help to make the time limit more palatable.
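
The strict timeboxing PM describes, with warnings at five minutes and one minute remaining, could be tracked with something as small as the sketch below. The function name and the rule for skipping warnings that don't fit in a short activity are assumptions made for illustration.

```python
def warning_times(activity_minutes, warn_before=(5, 1)):
    """Elapsed minutes at which the coach calls out a warning.

    `warn_before` lists how many minutes before the deadline each
    warning is given. A warning that would fall at or before the start
    of the activity is skipped.
    """
    return [
        activity_minutes - remaining
        for remaining in sorted(warn_before, reverse=True)
        if activity_minutes - remaining > 0
    ]


print(warning_times(10))  # [5, 9]
print(warning_times(5))   # [4]
```

When the last warning has passed and the timebox runs out, the activity stops with whatever is on the board, and anything unfinished is mopped up in a later iteration.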

PM: This might sound unrealistic, but it's important to remember that no matter how "complete" the design gets, all you're doing is drawing. The only reality test for your drawing is what the QA folk come up with as a functional test - other details that come up in the developers' minds aren't really relevant except as they represent UnitTest(s). This is why doing at least two iterations is important, so that the developers get a chance to correct their drawings when functional tests say they're not adequate.

PG: I think that we got a lot of the flavor of two iterations because we spent so much time in our second planning session. I think, though, that the next time we try this (and I think we will), we'll focus on pushing through to getting the full experience of that second iteration.

PM: On the incomplete stories affecting LoadFactor, what we did was to estimate how incomplete they were on the fly - say the developers had something that satisfied most but not all of the FunctionalTest(s) for a large story, you wouldn't want to count out their entire estimate, just a small percentage of it.

PG: What we did worked out surprisingly well. It was quick, there were no arguments and it helped drive the process forward. If we wanted more accuracy we could have just subtracted out the estimated times for rework stories.
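
The difference between the two ways of crediting incomplete stories can be seen in a small sketch. The split of the thirteen ideal minutes into ten passed and three failed comes from our first iteration; the 50% completion figure is invented purely to show PM's on-the-fly approach.

```python
def credited_minutes(stories):
    """Sum each story's estimate scaled by the fraction judged complete.

    `stories` is a list of (estimate_in_ideal_minutes, fraction_complete)
    pairs. The fraction is a judgment call made on the fly.
    """
    total = 0.0
    for estimate, fraction_complete in stories:
        if not 0.0 <= fraction_complete <= 1.0:
            raise ValueError("fraction_complete must be between 0 and 1")
        total += estimate * fraction_complete
    return total


real_elapsed = 36

# All-or-nothing (what we did): the failed 3-minute story earns no credit.
all_or_nothing = credited_minutes([(10, 1.0), (3, 0.0)])
# Partial credit (PM's suggestion): assume the failed story was half done.
partial = credited_minutes([(10, 1.0), (3, 0.5)])

print(real_elapsed / all_or_nothing)     # LoadFactor 3.6
print(round(real_elapsed / partial, 2))  # LoadFactor 3.13
```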

Build Environment

PM: On the build environment, my ExtremeHour group didn't worry about it either. What I had in mind was that they might think about the manufacturing facilities required for the product too. A mousetrap factory. In practice they didn't have stories about the factory so they didn't trouble with it.

Story Interdependence

PM: On story interdependence, what we generally do is note dependencies on story cards and then make sure they're satisfied when doing the CommitmentSchedule. During our XH the issue didn't really come up.

PG: That sounds completely practical. We could also pull out engineering stories based on anticipated design and assign dependencies to them as well. This sounds like a good argument for putting stories on cards.

PM: Yah, cards just have a perfect form factor for stories. During our XH, though, we didn't use 'em because we wanted to be able to present to a reasonably large room. Anyway DanRuskin at WebSense is trying to use an access database for his group instead, and I'm observing closely to see if that might make things more manageable. I'll ask him about the dependency issue and see what he says ...
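
Whether the stories live on cards or in a database, the check PM describes (note dependencies, then confirm they are satisfied when building the CommitmentSchedule) might look like this sketch. The function name and data layout are assumptions. The multi-surface story's reliance on the vacuum story comes from the session; the battery pack link is an illustrative guess at one of the features that needed it.

```python
def unmet_dependencies(schedule, depends_on):
    """Return {story: [missing prerequisites]} for a proposed schedule.

    `schedule` is the ordered list of stories chosen for the commitment
    schedule; `depends_on` maps a story to the stories it relies on.
    A prerequisite counts as met only if it is scheduled earlier.
    """
    problems = {}
    seen = set()
    for story in schedule:
        missing = [dep for dep in depends_on.get(story, []) if dep not in seen]
        if missing:
            problems[story] = missing
        seen.add(story)
    return problems


depends_on = {
    "clean multiple surfaces": ["vacuum cleaning"],
    "navigate room to room": ["battery pack"],  # assumed for illustration
}

# The Stakeholders drop the vacuum story but keep multi-surface cleaning.
schedule = ["battery pack", "navigate room to room", "clean multiple surfaces"]
print(unmet_dependencies(schedule, depends_on))
# {'clean multiple surfaces': ['vacuum cleaning']}
```

A non-empty result is the signal to re-estimate the dependent story or pull its prerequisite back into the schedule before committing.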

Idle QA

PM: On your QA group having little to do, I believe this was because they weren't well separated from the developers. If they can't see what the developers are doing they can really get creative in their tests. Also the timekeeping issue would have had an impact here too - with only 10 minutes to think up and write down tests my QA team was kept very busy for the last 40 minutes of the hour.

PG: Lack of creativity was definitely not a problem for our QA team. The problem arose during the initial stories/architectural spike and later during prioritization/estimation. I think that giving them a role in the story writing is definitely the right thing to do. The prioritization/estimation timeslot probably wouldn't have been as much of a problem if we'd stuck to the schedule.

PM: Next time we do an XH I'll be careful to see whether the QA folk don't get enough load. I think we're looking at different styles of presentation - my problem was to get as many people off their butts and on the stage as I could, so the last thing I wanted to do was combine roles. As noted on ExtremeHour I had a separate tracker for timekeeping too.

Inadequate architecture

PM: On cleaning under the dining room table with the chairs in, why didn't you just draw a big claw on your robot to pull the chairs out? Or a long vacuum proboscis to snake around the legs? Or miniaturize and pluralize your robot - make it into a whole team of tiny cleaners? Hmm, that last one actually sounds frighteningly doable - think of a whole bunch of hot-wheels cars with brushes and scrapers mounted on 'em ... I think we have another MillionDollarIdea here ...

PG: We designed a fairly simple thing and then drew it in great detail. If we'd drawn any of your suggestions above to the same level of detail it would have taken much more time than we had. Well, except for the tiny cleaners. That really is a good idea...

This is one area where our group has continually questioned XP: it seems clear to us that there are some things, refactoring notwithstanding, that are far more expensive to change than they are to build initially. Our experience with ExtremeHour seemed to demonstrate that not designing these things in up front can cause them to be prohibitively expensive later.

PM: This gets back to the IronGeek idea. I know where I'd place my bet, but I'd really like to try to present an IronGeek session, maybe at XP2K, and see what happens. Hmm. Okay, I guess we're just gonna have to do that now, right?

On your qualm, though, remember that the XH 10 minute architectural spike is supposed to define the level of detail. Sounds to me like you got way more detailed than a ten minute spike would have suggested. In real XP, I think the spike has to take enough time to make everyone comfortable with the things that are too expensive to change easily. But maybe one of our X-boffins will straighten this out.

Disappointing experience?

PM: In general, it sounds like the experience you had was kind of disappointing. I wonder whether, if you have a chance to try it being firmer about time-keeping, it might turn out to be more fun. Anyway, I think that'd be worth a try. But thanks again for investing so much effort in it!

PG: Hmmm, maybe I should edit the story so that it doesn't give that impression. In general we had a lot of fun. We did run into some difficulties but we found them to be some of the most interesting parts of what we did (second only to experiencing the synergy of the things that did work - and that can't be adequately described by merely writing).

Amen on that last. -- PM