Embedded Systems

Embedded Agile: A Case Study In Numbers

Agile/XP advocate Nancy Van Schooenderwoert describes a developer-led conversion to agile programming methods, using a simple home-made unit test framework for development in the C language.

Although Agile methodologies have been in use since the late 1990s, it
is still rare to find more than anecdotal evidence for how well they
really work. In evaluating whether to use agile methods, engineers have
very little on which to base a judgment unless they happen to have some
direct knowledge of an agile software project.

This paper describes a developer-led conversion to agile methods where the
software team themselves recorded detailed data throughout the project.
They used a very simple home-made unit test framework for development
in C. Since the close of that project the senior members of the
software team built a better unit test framework intended for doing
agile software development in C. This paper gives a brief overview of
the Catsrunner framework (CATS = C Automated Test System).

More analysis has been done on the data collected during the
project, and some additional work has been completed to compare the
team's results with the software industry in general. The purpose of
this paper is two-fold:

1) Close the gap in quantitative understanding of Agile methods for embedded software development (at least, as much as can be done given that the scope covers only one project).
2) Describe a test framework for embedded C that was developed based on our agile experience.

Project Context
The Grain Monitor System (GMS) project entailed building a ruggedized,
mobile spectrometer initially for farming applications. Using
spectroscopy principles the technology could quantify the components of
a material, e.g. how much protein is in a wheat sample.

The team size varied between 4 and 6 members, and development went
on for three years. The initial field units were ready about 6 months
into the project but so much was learned in the process of deploying
them with a partner farm equipment company, that the team continued on
to support further work and implement many more new features.

At the start of the project there were many unknowns and technology
risks that made it impossible to use waterfall techniques for this
work. They include:

1) New scientific algorithm to decode near infra-red signals from grain samples
2) Early customer for the new MPC555 microprocessor
3) First use of this operating system by the team
4) First customer of the operating system port to the 555
5) New prototype near infra-red sensor hardware
6) Early algorithms used too many MIPS for any known microprocessor
7) Must handle extremes of temperature and vibration
8) Very low-noise circuitry required
9) No experience with the CAN bus protocol
10) CAN bus protocol standard not in finalized form
11) Difficulty getting early MPC555 chips
12) Team lacked experience in multitasked applications

The team used generic agile practices at first (strong unit tests,
iterations, common ownership of the code) and transitioned to full
Extreme Programming methods during the project. As with the vast
majority of agile teams, this one didn't implement every practice fully
or flawlessly.

This was a "green field" project. Using the practices described here
for legacy software would not be easy but might be worthwhile,
especially if it is a safety-critical application. The advantages of
agile methods for safety critical applications are covered in another
paper [2]. Whether to back-fit
agile unit tests to a legacy code base is a question that can only be
answered case by case, and is outside the scope of this paper.

Data Gathering Methods
My role in the project was as Software Technical Lead. As such I
compiled a list of all the defects that were found in integration test
or later stages, including any found after delivery to our customer
(the partner company that was conducting field trials of the GMS units
on real farm machinery). For each defect I wrote up a root cause
analysis at the time the defect was resolved.

We had independent testers engaged for the later part of the
project, but for most of it, the software team delivered their code
directly to internal users and to the partner company. Labor data was
reported weekly by the team members themselves, and tracked by the
team. The company's official time records were not available to us, and
weren't broken out in categories useful to us. Data on source code size
and cyclomatic complexity was obtained using C-Metric
v. 1.0 from Software Blacksmiths.

The Results
At the end of three years of development the product was fully ready
for manufacturing. There had been a grand total of 51 software defects
since the start of the project (see
Figure 1, below). There were never more than two open defects at
any one time throughout the project. The team had produced 29,500 non-comment lines of tested, working embedded code, plus several sets of related utility software that are outside the scope of this paper.

Figure 1. Defects and software releases over three years

The embedded GMS C code was equivalent to 230 function points (per the conversion discussed under Comparison With Software Industry, below).
The team's productivity in the first iteration was just under three
times the industry norm for embedded software teams. The team became
increasingly adept at delivering code on time according to the
commitments made at the start of each iteration. Early iteration
lengths varied from two to eight weeks but two weeks became typical,
and toward the end of the project, a new release could be turned around
in one day.

Labor
Team size varied during the project from 4 to 6 people; the staffing profile is shown in Figure 2, below.

Figure 2. Staffing profile for the GMS project

The team used the set of categories shown in Figure 3, below, to track labor through the first two years of the project. Year three's labor was not tracked, but there is no reason to think it would differ much from the rest.

Figure 3. Labor tracking categories used for GMS software

The labor distribution charts below in Figure 4, Figure 5, and Figure 6 give a view into the
activities of the first iteration, first full year (including first
iteration), and the second full year. Note that the labor in iteration
1 reflects the activities of a new team that has not worked together
before, e.g. much time spent working out team processes.

Figure 4. Team's labor distribution for first iteration

Figure 5. Team's labor distribution for first year of the project

Figure 6. Team's labor distribution for second year of project

Code Size
The code base for the GMS embedded software grew from zero to a raw
lines count of 60,638 (see Figure 7,
below). C-Metric does a count that omits blank lines, comment
lines, and lines with a single brace "}" on them. That filtered count
of "effective source lines of code" (ESLOC) was 29,500 for the software
at the end of the project. Short header files with long preambles, and lengthy change-history blocks in all files, are the main causes of the high percentage of non-code lines.

Figure 7. Growth of the code base over three years

It isn't possible to directly compute a figure for labor per line of
code for two reasons: Much of the coding was change activity, not net
additional code; and the team worked on utility applications to let
users create and load calibration tables, exercise the hardware for
test, or import new algorithm test data into our test harness. The
labor for those utility code bases was not broken out separately.

Schedule
Early in the project, before changing to Extreme Programming methods, the
team had difficulty delivering by a target date. There aren't any
figures to illustrate this. One of the reasons that Extreme Programming
seemed appealing is its practice called "the Planning Game", which
brings developers and management into partnership to negotiate the
deliverables for each iteration.

The early use of the Planning Game gave us some difficulty. That
experience is described in detail in an earlier paper [1]. Once the team mastered the
Planning Game technique, their releases were never more than a couple
days late unless there was some drastic unforeseen circumstance (which happened only once).

Defects
The defect rate remained fairly constant over the development period,
despite the growing size of the code base. The team averaged about 1.5
defects per month. The open bug list never held more than two items all
through development. In Figure 8, below, defects are grouped according to the quarter in which they were reported.

Figure 8. Absolute number of defects per quarter

Because the defect rate stayed low, independent of the code size, I
conclude that the team's techniques of software development were
effective at handling complexity. C-Metric was used to take a look at
cyclomatic complexity. Four of the later releases were analyzed and the
result was an average cyclomatic complexity of 6 or 7 for each of the
releases. For more detail on this metric in our code, refer to [3].

Although we used agile development, the software still had phases such as detailed design, coding, test, and so on. In agile development, requirements, design, and coding happen in a tight loop of short increments, so that unit tests can be re-run roughly every 10 to 30 minutes.

Figure 9, below, shows the phase in which each bug was inserted and the phase in which it was found. This information comes
from the root cause analysis of each defect. More discussion on the
nature of the defects found is given in [3].

Figure 9. Defect life span, year 1 of project

It should be mentioned that the numerous software releases shown
toward the end of the project (in Figure
1) do not represent panicky bug fix activity. Rather this was
the software team creating custom releases to help electrical and
optics engineers to isolate difficult system-level problems that only
appeared when the whole system was running. The software was very
stable and the team could deliver well-tested releases on a 1-day
turnaround.

Comparison With Software Industry
I was able to make use of three industry sources of data for comparison
of this team's performance. The first two are covered briefly since
they will not be generally available to readers for measuring their own
team's capability. The third (the data from Capers Jones) is something
that anyone can make use of if their code can be characterized in terms
of function points. This paper will therefore discuss that in some
detail.

SEER SEM Estimation Data.
Before the start of the project, our management considered an
estimating tool called SEER SEM from Galorath.
Consultants from that company did an estimate as part of demonstrating
the tool. It gave a breakdown of staffers needed for each waterfall
style phase and the hours that would be used by each, all based on a
figure for lines of code at completion, which they got from me.

The one input the estimating tool could not foresee was the completed size of the application; that figure came from me. The point is that with this data I could work out the value for ESLOC per developer-hour that their database uses for this type of project: 1.2 ESLOC/hour, for fully tested, working embedded code in C. When iteration 1 was complete, the numbers showed the team had delivered 3.5 ESLOC/hour, or 292% of the industry norm as given by Galorath's database.

QSM Industry Data. QSM
Associates Inc. also supplies software planning tools, and
used to offer a free service via their website to compare your team's
project data with their database of thousands of projects. I took the
opportunity to input data for our iteration 1, such as the number of
people on the team, duration, lines of code delivered, defects found,
etc. The result was that the "Productivity Index" they calculated for
the GMS Iteration 1 ranked us in the 90th percentile! This index, as
they compute it, covers code complexity (based on size), schedule,
efficiency, effort, and reliability.

The only thing necessary for anyone to compare their team's data
with the information from Capers Jones is to be able to state their
defects per function point. We did not count function points in our
project. Knowing the ESLOC, you can simply look up a conversion to
function points on the SPR website. See
http://www.theadvisors.com/langcomparison.htm

The data in Figure 12 can
be expressed in terms of defects delivered to the customer. The "Best
In Class" software teams had 2.0 defects per function point (FP), and a
defect removal efficiency of 95%. Defects to customer = Total FP *
defects per FP * (1.0 - defect removal efficiency).

Table 1. Defects delivered to customer per Capers Jones, tabular form

Let's look at how the "Best In Class" teams would perform if their
code was the same size as GMS, that is, 230 function points. Their
total
number of defects would be 230 * 2, or 460. Then they'd remove 95% of
those: 460 * (1.0 - 0.95) = 23, per Table 1, above. They would deliver 23 defects to the customer. The GMS
embedded team delivered 21 bugs to their customer, according to Figures 9 - 11.

How to Achieve These Results for Your Team
Lean Thinking is the fundamental concept underlying Agile software
development practices [4]. The
two essentials you must have in place to succeed with this approach
are:

1) You must match the amount of work undertaken to your capacity.
2) You must mistake-proof the steps you use to produce the work.

The first item is satisfied by using agile iteration planning
techniques, and is outside the scope of this paper. For a developer-led
agile conversion, regulation of the work stream is often very difficult
to achieve, because management must support it, or at least tolerate it. The second item is covered by a previous paper on agile test
techniques for embedded software [5].

The remaining sections of this paper discuss the most powerful way of mistake-proofing your software: the use of an appropriate test harness to catch bugs early and efficiently.

Dual-platform Unit Testing as Key
For embedded software the hardware represents an extra dimension that
must be addressed in the testing strategy. The GMS team built all the
code as "dual target" software. It could run on a desktop PC as well as
on the target MPC555 microprocessor, through the use of compile-time
switches.

This strategy allowed the software to be tested first on the PC, where the platform was stable. Timing would be incorrect, but the logic could be fully exercised. Other compile-time switches would bypass sensor hardware and inject dummy grain data to drive the computations.
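
To make the dual-target idea concrete, here is a minimal sketch, not taken from the GMS sources: the macro name (TARGET_MPC555) and the functions shown are invented for illustration, but they show how a compile-time switch lets the same logic run against real hardware on the target and against canned data on the PC.

#include <stdint.h>

#ifdef TARGET_MPC555
/* Target build: read the real near-infrared sensor hardware
   (implemented elsewhere, in the hardware-specific code). */
extern uint16_t nir_read_channel(int channel);
#else
/* PC build: substitute canned grain data so the downstream
   computations can be exercised without any hardware present. */
static uint16_t nir_read_channel(int channel)
{
    static const uint16_t dummy_scan[4] = { 1021, 987, 1330, 1275 };
    return dummy_scan[channel % 4];
}
#endif

/* The calculation itself is identical on both platforms. */
uint16_t nir_sample_average(void)
{
    uint32_t sum = 0;
    int i;
    for (i = 0; i < 4; i++)
        sum += nir_read_channel(i);
    return (uint16_t)(sum / 4);
}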

The team's unit tests consisted of a conditionally-compiled "main()"
within each file that held a set of related functions. This 'tester
main' had calls to each function in the module, often multiple calls to
the same function but with parameters intended to test boundary cases.
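
As a sketch of that pattern (the function and macro names here are invented for illustration; this is not the GMS code), a module might look like this:

#include <stdio.h>

/* Example production function for this module. */
int clamp_temperature(int celsius)
{
    if (celsius < -40) return -40;
    if (celsius > 85)  return 85;
    return celsius;
}

#ifdef UNIT_TEST   /* 'tester main' compiled only into test builds */
static int check(int got, int expected, const char *label)
{
    if (got == expected) return 0;
    printf("FAIL %s: got %d, expected %d\n", label, got, expected);
    return 1;
}

int main(void)
{
    int failures = 0;
    failures += check(clamp_temperature(20),    20, "nominal value");
    failures += check(clamp_temperature(-100), -40, "low boundary");
    failures += check(clamp_temperature(200),   85, "high boundary");
    printf("%s\n", failures ? "FAIL" : "PASS");
    return failures;
}
#endif

Building with the test macro defined (here, -DUNIT_TEST, a name chosen only for illustration) turns each module into its own small test executable; without it the tester code disappears from the production build.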

There were perl scripts to execute the 'tester main' routines of all
the modules and report the pass-fail status of all the tests. This
simple test framework had tests designed to run on both platforms, and
was used throughout the duration of the project.

Catsrunner: A Better Technique
The experience gained via the simple unit test framework of GMS led, a
few years later, to the development of Catsrunner and CATS (C Automated
Test System) by the partners at Agile Rules, some of whom were on the
GMS project. Catsrunner has a more consistent way of inputting test
parameters, and its output is easier to interpret. It allows separation
of test code from production code. Also it behaves exactly the same on
the PC and the target platform.

In short, it's the test framework we wish we'd had time to write
during the GMS project. Catsrunner is a C software unit and acceptance
testing suite based on CATS (see
Figure 11, below). CATS is a cross-platform testing framework
for C, especially designed to work well in embedded and multi-platform
environments. Catsrunner provides the wrapper that calls the test and
reports the results. Catsrunner is open source software released under
GPL 2.0. See [6] to download the Catsrunner software.

Catsrunner does three basic things:

1) Reads, from the host PC, a list of unit tests to be run
2) Runs each unit test, in turn
3) Sends the results of each test back to the host PC

The middle step, running each unit test, can occur on either platform. The platform is determined by environment variable settings when building the Catsrunner executable. The present version of Catsrunner runs on a PC and on an ARM7 core.

Catsrunner calls CATS, which looks up the name of the test in a
table holding pointers to the testing functions. At the heart of the
CATS unit testing framework is an array of structures associating the
names of functions with pointers to those functions.
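
The real declarations ship with the Catsrunner download [6]; purely as an illustration of that lookup idea (the type names, test names, and functions below are invented, not the CATS source), such a registry might look like this:

#include <stdio.h>
#include <string.h>

typedef int (*test_fn)(const char *params);   /* 0 = pass */

struct test_entry {
    const char *name;   /* name the host sends */
    test_fn     fn;     /* function that runs the test */
};

static int test_crc_empty_buffer(const char *params) { (void)params; return 0; }
static int test_crc_known_value(const char *params)  { (void)params; return 0; }

static const struct test_entry test_table[] = {
    { "crc_empty_buffer", test_crc_empty_buffer },
    { "crc_known_value",  test_crc_known_value  },
};

/* Look a test up by name and run it; -1 means the name was not found. */
int run_named_test(const char *name, const char *params)
{
    size_t i;
    for (i = 0; i < sizeof(test_table) / sizeof(test_table[0]); i++) {
        if (strcmp(test_table[i].name, name) == 0)
            return test_table[i].fn(params);
    }
    printf("unknown test: %s\n", name);
    return -1;
}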

When the name of a test function and its input parameters are passed
to CATS, it looks up the function name in this array. When Catsrunner
executes on the target hardware, it must communicate with the host to
know which test to run next, and then to store the result of the test.

A module named "Hostops" is part of Catsrunner, and in the case of
the ARM7 target, hostops makes use of the Angel background debug
monitor to accomplish the data transfer to the host. A user wishing to
port Catsrunner to a new target will have to create a version of hostops that uses the target's I/O capabilities to do the equivalent data transfers.
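
As a rough sketch of what such a port amounts to (these prototypes are invented for illustration and are not the actual hostops interface; consult the Catsrunner package [6] for the real one):

/* Hypothetical shape of a hostops port, not the actual Catsrunner API.
   On the ARM7 target these transfers go through the Angel debug monitor;
   another target would route them over whatever host link it has
   (serial port, JTAG/debug channel, Ethernet, ...). */

/* Fetch the name and parameters of the next test the host wants run.
   Returns 0 when a test was received, nonzero when the list is done. */
int hostops_get_next_test(char *name, int name_len,
                          char *params, int params_len);

/* Send one test's pass/fail result back to the host PC. */
void hostops_report_result(const char *name, int passed);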

A Catsrunner Test Examined
Catsrunner's approach to testing divides all the software into two
categories: software that is inherently platform-independent, and
software that "touches hardware". Platform-independent code can easily
be run in an automated fashion, but when software drives a motor or turns on an LED, the result of that test cannot be captured without special test hardware (which was out of the question for us).

When testing hardware-related code on the target platform, we used
manual tests. That is, the test code is contained in the unit test
file, but when testing actual hardware we'd step through it by hand to
watch the behavior of the hardware. Catsrunner uses this philosophy, as
illustrated in Figure 12 below
("pure software" indicates platform-independent code).

Figure
12. Unit test concept for software that drives hardware

When testing hardware-related software on the PC, we'd capture
outputs that would otherwise go to hardware, and the tester code could
validate their correctness. For sensor input data, we'd just bring in
dummy data in order to let the software continue on.

These practices are reflected in the code by giving some modules layered directories. For an LED module, the main directory would contain the platform-independent parts of the code and be called "led". Below that are directories for each platform, in this case ARM and PC, which contain functions with the same names but different implementations for each platform.

The linker will bring in the platform-independent code from the "led" directory, and only one of the code sets from the lower directories, either "ARCH_ARM" or "ARCH_PC". The prefix "ARCH" indicates architecture-specific software. The directory layout is illustrated in Figure 13, below.

Figure 13. Directory of LED driver
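
As an illustration only (the function names, array size, and register address below are invented, not the GMS or Catsrunner sources), the shared header declares a single interface and each ARCH directory supplies its own definition; the linker pulls in exactly one of them, and on the PC the tester code can inspect the captured state:

/* led/led.h -- platform-independent interface, shared by both builds */
void led_set(int led_id, int on);

/* led/ARCH_PC/led_io.c -- PC build: remember the state so tester code
   can validate what would have been driven onto the hardware. */
static int pc_led_state[8];
void led_set(int led_id, int on)
{
    pc_led_state[led_id & 7] = on;
}
int led_get_for_test(int led_id)   /* used only by PC-side tests */
{
    return pc_led_state[led_id & 7];
}

/* led/ARCH_ARM/led_io.c -- target build: drive the real output pin.
   The register address below is a made-up placeholder. */
#define LED_OUTPUT_REG (*(volatile unsigned int *)0x40020000u)
void led_set(int led_id, int on)
{
    if (on)
        LED_OUTPUT_REG |=  (1u << led_id);
    else
        LED_OUTPUT_REG &= ~(1u << led_id);
}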

It would seem that manually stepping through hardware-related code
would slow development unacceptably. In practice, the GMS team found it
to be no problem because those parts of the code changed little once
they were written, and they were well encapsulated. (The team used a
more primitive test framework that had this same philosophy for testing
hardware-related code.)

This has been a brief introduction to the Catsrunner agile test
framework. A complete user manual with much more detail is available
with the open source download package
[6].

Conclusion
The GMS team was a group of ordinary developers who achieved extraordinary results through the power of an idea. The team did not
work excessive hours. Most needed to learn some significant skill on
the job. They didn't follow the agile practices 100%, and didn't have
any outside coaching or mentoring in how to use agile development
practices.

It has been said that in order to do Extreme Programming you need a
team of hand-picked gurus. Not so. All you need is people empowered to
govern their work. The powerful idea is simply this: If you make it
easier to find bugs than it is to create new ones, you have the
possibility of producing bug-free software.

Bug-free software lets you build trust with your sponsors and
customers, spend more of your time productively (troubleshooting is
waste!), and stay in control of your project. These results are within
reach for every software team whose management will support sufficient
empowerment.

Nancy Van Schooenderwoert of Agile Rules / XP
Embedded, has extensive
experience in building large-scale, real-time systems for flight
simulation and ship sonars, as well as software development for
safety-critical applications such as factory machine control and
medical devices.
