Fit's standard interpretation tells us how well a program does against a set of test cases. We can design new semantics for reporters (that give us interesting information) and for rewriters (that make interesting transformations of our tests).

The Standard Interpretation

A Fit test looks like a table (or a series of tables) in an HTML document:

Calculator
x | y | plus() | times()
2 | 1 | 3 | 2
0 | 3 | 3 | 0
2 | -1 | 1 | -2

fit.ActionFixture
start | ScientificCalculator
enter | value | 2
press | plus
enter | value | 1
press | equals
check | value | 3

But what does a Fit test mean?

This is almost a trick question. One answer is: whatever people agree it means. Another answer is: whatever the Fit framework, the fixtures, and the application make it mean. These two things don't always line up, unfortunately.

Even when we think we're talking about the same interpretation, there can be differences in what a test means. One version of the Calculator fixture could work with the business objects; another could work at a presentation layer; another could work by manipulating the user interface. These fixtures might find different types of defects; they might be easier or harder to write; and so on.

These all generally assume a standard interpretation: use Fit to run the fixture on the table. The good side of this is that it gives us something we care about: an answer to the question, "How well does our program work?"

I'd like to move to a different question:

What can we know about our tests if we don't understand the fixtures?

Alternative Semantics

A different interpretation of the directory containing my test above might yield this table as a result:

Index
Fixture | File | Table Number
Calculator | MyTest.html | 1
Calculator | OtherTest.html | 3
fit.ActionFixture | MyTest.html | 2

This is a cross-reference chart, showing where every fixture is used. I can get this information with no reference to the fixture implementation at all: it's just an index of the first cell of each table.
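Because the index needs only the first cell of each table, it can be built without loading any fixture code. Here is a minimal sketch, assuming the test files are plain HTML already read into memory (keyed by file name) and that tables are not nested. The class name `FixtureIndex` is made up for illustration, and the regular expressions are a shortcut that a real tool would probably replace with an HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Index reporter: lists the first cell of every table, per file.
class FixtureIndex {
    // Non-greedy match of one whole table; assumes tables are not nested.
    private static final Pattern TABLE =
            Pattern.compile("<table\\b.*?</table>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // The first <td> cell inside a table.
    private static final Pattern FIRST_CELL =
            Pattern.compile("<td\\b[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns "fixture | file | table number" rows, sorted so each fixture's uses are grouped.
    static List<String> index(Map<String, String> htmlByFile) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> file : htmlByFile.entrySet()) {
            Matcher table = TABLE.matcher(file.getValue());
            int number = 0;
            while (table.find()) {
                number++;
                Matcher cell = FIRST_CELL.matcher(table.group());
                if (cell.find()) {
                    // Editors such as Word may wrap cell text in <p> or <span>; strip inner tags.
                    String fixture = cell.group(1).replaceAll("<[^>]*>", "").trim();
                    rows.add(fixture + " | " + file.getKey() + " | " + number);
                }
            }
        }
        rows.sort(null); // natural (string) order
        return rows;
    }
}
```

Sorting the rows as plain strings groups each fixture's uses together, which is what makes the result read as a cross-reference.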

We can find out other interesting things with just a little knowledge of fixtures. For example, if we know that "Calculator" above is a ColumnFixture, then we know that the row after the fixture name lists its inputs and outputs. Even without knowing anything about how the fixture is implemented, we can create another interesting table:

Vocabulary
Fixture | Field | Value
Calculator | plus() | 1
Calculator | plus() | 3
Calculator | times() | -2
Calculator | times() | 0
Calculator | times() | 2
Calculator | x | 0
Calculator | x | 2
Calculator | y | -1
Calculator | y | 1
Calculator | y | 3

This table gives you a picture of the input and output domains. A good tester could look at this and notice all kinds of things that weren't tested:

no large or small numbers (in either the input or output)

no non-numbers

no sums that resulted in 0

no sums that were negative

no sums that were even

only even numbers for x, odd numbers for y

no minus() or divide()

etc.
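A vocabulary reporter needs just one piece of fixture knowledge: that a table is ColumnFixture-shaped (fixture name, then field names, then data rows). The sketch below assumes exactly that, with tables already parsed into lists of cell strings; `VocabularyReport` is a hypothetical name. The `TreeSet` sorts entries as plain strings, which happens to give the ordering shown above but would put "10" before "2".

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical Vocabulary reporter for ColumnFixture-shaped tables.
class VocabularyReport {
    // Adds one "fixture | field | value" entry per distinct cell.
    // Assumed layout: row 0 = fixture name, row 1 = field names, remaining rows = data.
    static void addTable(List<List<String>> table, TreeSet<String> vocabulary) {
        String fixture = table.get(0).get(0);
        List<String> fields = table.get(1);
        for (List<String> row : table.subList(2, table.size())) {
            for (int col = 0; col < fields.size() && col < row.size(); col++) {
                vocabulary.add(fixture + " | " + fields.get(col) + " | " + row.get(col));
            }
        }
    }
}
```

Feeding every ColumnFixture table in a directory through `addTable` with one shared set would give the vocabulary for the whole suite; the set quietly removes duplicates such as the repeated sum 3.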

We could create a tool that would give us a domain analysis of our test data:

Test Data for Calculator
Field | Max Neg | Neg | Zero | Pos | Even | Odd | Max Pos
plus() | no | no | no | yes | no | yes | no
times() | no | yes | yes | yes | yes | no | no
x | no | no | yes | yes | yes | no | no
y | no | yes | no | yes | no | yes | no
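Each row of that table can be computed from one column of the vocabulary. A sketch, assuming the values are integers and reading "Max Neg" and "Max Pos" as Java's `Integer.MIN_VALUE` and `Integer.MAX_VALUE` (the table itself doesn't define them). `DomainReport` is a hypothetical name, and non-numbers are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain-analysis reporter for one column of test data.
class DomainReport {
    static final List<String> CATEGORIES =
            List.of("Max Neg", "Neg", "Zero", "Pos", "Even", "Odd", "Max Pos");

    // Returns "yes"/"no" for each category, in CATEGORIES order, over the values seen in a column.
    static List<String> classify(List<String> values) {
        boolean[] seen = new boolean[CATEGORIES.size()];
        for (String text : values) {
            long v;
            try {
                v = Long.parseLong(text.trim());
            } catch (NumberFormatException e) {
                continue; // non-numbers fall outside this report
            }
            seen[0] |= v == Integer.MIN_VALUE; // assumed meaning of "Max Neg"
            seen[1] |= v < 0;
            seen[2] |= v == 0;
            seen[3] |= v > 0;
            seen[4] |= v % 2 == 0;
            seen[5] |= v % 2 != 0;
            seen[6] |= v == Integer.MAX_VALUE; // assumed meaning of "Max Pos"
        }
        List<String> row = new ArrayList<>();
        for (boolean b : seen) row.add(b ? "yes" : "no");
        return row;
    }
}
```

For example, the times() column holds -2, 0, and 2, so `classify` reports yes for Neg, Zero, Pos, and Even, matching the table. Parsing as `long` keeps values just outside the `int` range from being mistaken for non-numbers.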

Reporters and Rewriters

Two types of interpretations stand out for me:

Reporters – tell you something about a set of tests

Rewriters – change a set of tests

The Index and Vocabulary examples above are reporters. The standard Fit interpretation is close to being a rewriter: it produces a version of the input file, modified to show the results of the tests. (The only thing that keeps it from being a true rewriter is that it leaves the original file in place, so you can run it again.)

Here are some more ideas for useful tools:

Count test cases – A reporter telling the total number of test cases for a fixture

Count fixture – A reporter telling how many times a fixture occurs

Operators – A reporter telling the column names in RowFixtures and ColumnFixtures (or the second column of ActionFixtures)

Column changer – A rewriter that can rename, delete, or insert a column

Cell rewriter – A rewriter that can change cell values (for a specific fixture in a specific column or row)

(I've created a simple version of the Index example above using the AllFiles fixture; the others are speculation.)
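As an illustration of the rewriter side, here is a speculative sketch of the "Column changer" idea limited to renaming. It assumes tables are already parsed into lists of cell strings, with the fixture name in row 0 and column names in row 1; `ColumnRenamer` is a hypothetical name. It returns a changed copy and leaves its input alone, much as Fit's standard interpretation leaves the original file in place.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "Column changer" rewriter: renames one column header in tables that use a given fixture.
// Assumed layout: row 0 holds the fixture name, row 1 the column names (ColumnFixture/RowFixture style).
class ColumnRenamer {
    static List<List<String>> rename(List<List<String>> table, String fixture, String from, String to) {
        // Work on a copy so the original table is left in place.
        List<List<String>> copy = new ArrayList<>();
        for (List<String> row : table) copy.add(new ArrayList<>(row));
        boolean usesFixture = copy.size() > 1 && !copy.get(0).isEmpty()
                && copy.get(0).get(0).equals(fixture);
        if (usesFixture) {
            List<String> header = copy.get(1);
            for (int i = 0; i < header.size(); i++) {
                if (header.get(i).equals(from)) header.set(i, to);
            }
        }
        return copy;
    }
}
```

Deleting or inserting a column would follow the same pattern, except that every data row, not just the header, would need the matching change.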

Reflection is OK

There are useful semantics that don't try to interpret a fixture, but it can help to "peek" a little. For example, knowing that something is a ColumnFixture tells us that it's likely that the row after the fixture name consists of input and output fields. We can use this information fruitfully. The Vocabulary example above made use of this knowledge.

Furthermore, there is nothing wrong with getting help. If someone had a new type of fixture that subclassed Fixture, but still had ColumnFixture-like semantics, they could provide a helper analysis class that would let us know this.

The goal is not to avoid fixture-aware code; it's to avoid the quagmire of trying to interpret another program.

Call to Action

We've had a few years to work with Fit. People are creating test suites large enough to be interesting, and large enough that they need help managing them.

It's time to experiment with new interpretations of Fit tests. (We may still use Fit to help with this task.) Real people doing real work need this now.

Starting Fit

To use fit, you create a web page that has tables in it; the tables specify tests. (There are other options, but that is the easiest.) In this case, I'm using Microsoft Word™ and saving the file in HTML format.

The fit FileRunner acts as a filter: given a web page, it copies text outside of tables as is, and runs your program on the table entries. Some table entries represent tests that can pass or fail; fit colors them green or red respectively. The output is another HTML file.

Fit will also put a summary in the file if you put in a table like this:

fit.Summary

counts

0 right, 0 wrong, 0 ignored, 0 exceptions

input file

C:\P4\FirstFit\fit\FirstFit-in.htm

input update

Thu May 01 10:51:42 EDT 2003

output file

C:\P4\FirstFit\fit\FirstFit-out.htm

run date

Thu May 01 10:58:28 EDT 2003

run elapsed time

0:00.05

With this tool, you don’t manipulate screen elements directly. Instead, you work with an abstraction of them. To me, it feels like talking to somebody over the phone, trying to tell them how to use an application. (“In cell cee seventeen, put equals a one; then go to a one and type ‘fish’.”)

This version of the article shows the output from fit; the input is here.

Programming and Configuration Notes

Fit is a tool for customers and testers, but programmers will use it as well, and will have to write some of the fixtures the team uses. In this paper, I’ve tried to use the framework mostly straight out of the box.

The CLASSPATH needs to include fit.jar (both in the DOS window and the IDE). The runner command I’m using is:

java fit.FileRunner FirstFit-in.htm FirstFit-out.htm

When I do this on the file I have so far, it creates the output file and writes this to the console:

0 right, 0 wrong, 0 ignored, 0 exceptions

Fixtures

Tables in the input file have the name of a fixture in the first row. A fixture is a class that knows how to process the table. Fit comes with several fixtures built in, and programmers can create others.

One simple fixture is the ColumnFixture. In this fixture, the first row is the fixture name, and the second row has the names of data. If a name ends without parentheses, it is regarded as a field to fill in; with parentheses, it’s treated as a method (function) call. The fixture fills in all the data fields, and then calls the methods to verify that they return the expected results.

Another standard fixture is the ActionFixture. This one consists of a series of commands. These include:

start classname: Creates an object of the specified class

enter field value: Sets the field to the value

press button-name: Calls the method corresponding to the button

check method value: Checks that the method returns the expected value

The ActionFixture ignores anything past the first three columns; we’ll use the fourth column for comments.

So, we’re finally ready to start our application.

fit.ActionFixture

start

Spreadsheet

Create a new spreadsheet.

This test doesn’t ask for much, but of course it fails. (There isn’t any code yet!)

0 right, 0 wrong, 0 ignored, 1 exceptions

Programmer Notes

The exception is thrown because the Spreadsheet object doesn’t exist. To create it as simply as possible, make it extend Fixture:

import fit.Fixture;

public class Spreadsheet extends Fixture {}

This gets us back to

0 right, 0 wrong, 0 ignored, 0 exceptions

I’ve put together stubs for the fixtures used in this article: Spreadsheet.java, SpreadsheetFormula.java, and Address.java; here’s a zip file containing all three.

A Few Stories

We have several things we want our spreadsheet to do:

Track the contents of cells

Distinguish data from formulas

Provide both data and formula views of cells

Support “+” for appending strings, “’” for reversing strings, “()” for grouping, and “>” for string containment.

Cells

The spreadsheet has a number of cells, each of which has an address. Cells contain string data or formulas.

We’ll assume several screen elements:

address – the address we’re working with; something like “B19”

cell – the cell contents we enter (to the last “address”)

formula – the cell contents as entered (for the last “address”)

display – the cell contents as seen when the formulas are applied (for the last “address”)

We’ll start with a simple data cell.

fit.ActionFixture

Comments

start

Spreadsheet

enter

a1

abc

check

a1

abc

Text in cell

check

formula

abc

Formula is same. (Looks in last-mentioned cell.)

Now let’s add in a formula cell. (Note that this table omits the “start” line; this means it’s working on the same object as before. This lets us avoid repeating the setup, but it also makes the tests less independent.)

fit.ActionFixture

Comments

enter

a1

abc

enter

b1

=A1

Simple copying formula

check

formula

=A1

Formula is there

check

a1

abc

Original text in A1

check

b1

abc

Text was copied to B1

The essence of a spreadsheet is the automatic updates. Let’s change A1 and see it happen.

fit.ActionFixture

Comments

enter

a1

abc

enter

b1

=A1

Simple copying formula

check

b1

abc

Copied value

enter

a1

revised

Update A1

check

b1

revised

Automatically updates B1

We already have quite a few elements in use, though we haven’t specified exactly what is valid. Let’s just note the “specification debt” and move on.

What can a cell hold? Empty string, other string, formula starts with “=”

What’s a valid formula? So far, we’ve just used a simple cell reference, but we want operators too.

What happens when a cell has an invalid formula?

What happens when a cell refers to a cell containing a formula?

What happens when formulas form a loop?

We’ll pursue all these, but let’s start with formulas.

Formulas

Formulas can reference formulas.

SpreadsheetFormula

a1

b1

c1

d1

a1()

b1()

c1()

d1()

data

=A1

=B1

=C1

data

data

data

data

Formulas get more interesting when there are operators available. The reverse operator (') is probably a good one to start with.

SpreadsheetFormula

a1

b1

b1()

abc

=A1'

cba

abc

=A1''''

abc

The most useful string operator is probably append (+):

SpreadsheetFormula

a1

b1

c1

b1()

c1()

abc

=A1+A1

blank

abcabc

blank

abc

def

=A1+B1+B1+A1

def

abcdefdefabc

We have enough features that we can demonstrate an identity: (XY)’=Y’X’. We don’t have parentheses yet, but we can simulate this by putting the parts in separate cells.

SpreadsheetFormula

a1

b1

c1

d1

e1

d1()

e1()

abc

xyz

ignored

=A1+B1

=D1'

abcxyz

zyxcba

abc

xyz

=B1'

=A1'

=C1+D1

cba

zyxcba

Parentheses can be used to group operators. Let’s re-do the previous test, allowing parentheses:

SpreadsheetFormula

a1

b1

c1

c1()

abc

xyz

=(A1+B1)'

zyxcba

abc

xyz

=B1'+A1'

zyxcba

The operator “>” tells whether one string contains another one. If the first string contains the second, the result is the second. If the first string doesn’t contain the second, the result is an empty string.

SpreadsheetFormula

a1

b1

c1

c1()

banana

ana

=A1>B1

ana

banana

bab

=A1>B1

blank

We haven’t talked about precedence yet. The ' and () operators have the highest precedence, then +, then >. A1+B1+C1 is a legal expression, but A1>B1>C1 is not.

SpreadsheetFormula

a1

b1

c1

c1()

abc

xyz

=A1+B1'

abczyx

abc

xyz

=(A1+B1)'

zyxcba

SpreadsheetFormula

a1

b1

c1

d1

e1

e1()

abcdef

ghijkl

e

hgf

=A1+B1>C1+D1'

efgh

Backfill

We have several questions left open:

What can a cell hold? Empty string, other string, formula starts with “=”

The previous tests made a quick pass through the system. I think of them as generative: they help define the essence of the system. But questions like the one above require us to fill in the gaps. I think of tests that check “corner cases,” error cases, and feature interactions as elaborative: they fill in what we already have. They might find problems, but they may well pass already, depending on how the system was built.

What a cell holds

We already have test cases where a cell holds a string, and where a cell holds a formula, but it would be prudent to check that the operators work correctly on empty strings. If e is the empty string and x is a non-empty string, we expect:

e’ = e

e+e=e

e+x=x

x+e=x

e>e=e

e>x=e

x>e=e

As I go to write the test, I realize that we never specified what a cell starts with. The answer, of course, is the empty string. So we’ll rely on that: A1 will be empty.

fit.ActionFixture

Comments

start

Spreadsheet

check

a1

Verify that cell starts empty.

Then we can verify those rules about working with the empty string:

SpreadsheetFormula

a1

b1

c1

c1()

Comment

blank

blank

=A1'

blank

e’=e

blank

blank

=A1+A1

blank

e+e=e

blank

blank

=A1>A1

blank

e>e=e

blank

abc

=A1+B1

abc

e+x=x

blank

abc

=B1+A1

abc

x+e=x

blank

abc

=A1>B1

blank

e>x=e

blank

abc

=B1>A1

blank

x>e=e

Valid Addresses

There are two places we use addresses: in the address field and in the cells with formulas. When we get a “real” (graphical) interface, the address will mostly be implicit. But even so, we’ll test it here just to be safe.

Let’s use a ColumnFixture for this: we’ll put address in one column, valid() in another, and standardized() in another. (A programmer will have to write the new fixture for us.)

The rules are: a valid address is a letter (A-Z, a-z) followed by one or more digits (0-9). Case is ignored. Leading 0s are ignored. “0” is not a valid row number.

Address

address

valid()

standardized()

A1

true

A1

a1

true

A1

A9874

true

A9874

Z1

true

Z1

z1

true

Z1

Z3992

true

Z3992

z3992

true

Z3992

AA393

false

zX202

false

é17

false

1

false

~1

false

~D1

false

y&1

false

^

false

X392%

false

H001

true

H1

j00010

true

J10

e000

false

A0

false

z0

false

Let’s make sure that case-insensitivity works in formulas:

SpreadsheetFormula

a1

b1

b1()

abc

=A1+a1

abcabc

Formula Errors

If a formula contains an error, we’d like it to display as “#error.” We’ll put all the invalid names from the previous table into formulas, and verify that formulas behave correctly. Then we’ll try various improper combinations of operators.

fit.ActionFixture

start

Spreadsheet

Create a new spreadsheet.

enter

a1

=AA393

Bad address

check

a1

#error

Marked as error

check

formula

=AA393

Formula as written

enter

a1

=A2

Change to valid address

check

a1

Make sure #error is cleared

SpreadsheetFormula

a1

a1()

Comment

=zX202

#error

Two letters

=é17

#error

expected

? actual

Non-ASCII

=1

#error

No letters

=~1

#error

No letters

=~D1

#error

Unacceptable character

=y&1

#error

Extra character

=^

#error

No letters/digits

=e000

#error

expected

? actual

Invalid row # (000 is 0)

=A0

#error

expected

? actual

Invalid row #

=z0

#error

expected

? actual

Invalid row #

=

#error

Missing formula

Then we’ll get to some operators:

SpreadsheetFormula

a1

a1()

Comment

='A2

#error

' should be postfix

='A2'

#error

Can’t be before and after

=A2+

#error

Need other term

=A3+A4+

#error

Need other term

=A2++A3

#error

Missing term

=A2+'+A3

#error

' isn’t a term

=A2'''+A3

blank

OK to mix things

=A2)

#error

Missing (

=(A2

#error

Missing )

=((((((((((((A2))))))))))))

blank

OK – big expression

=((((((A2+(A3))))+A4)

#error

Unbalanced – too few )

=(((A2>A3

#error

Unbalanced – too few )

=(A2>A3)))

#error

Unbalanced – too many )

=A2>A3>

#error

Can’t trail >

=A2>A3>A4

#error

Can’t repeat >

Loops

If a formula uses itself (directly or indirectly), we don’t want it to loop forever trying to figure it out. Instead, we’d like the display to be “#loop.”

SpreadsheetFormula

a1

b1

c1

d1

e1

a1()

e1()

=A1

blank

blank

blank

blank

#loop

blank

=B1

=C1

=F1+D1

=E1

no-loop

no-loop

no-loop

=B1

=C1

=F1+D1

=E1

=A1

#loop

#loop

Conclusions

This paper has demonstrated a set of tests using the fit acceptance testing framework. Some things to note:

The tests here have been written as if a customer specified them, without much demonstration of the programming cycle. But programmers can work with these tests in much the way they would with JUnit.

The tests are written without benefit of the feedback of a working system. (I wrote just enough code to make the first test not throw an exception.)

The tests look at only part of the system: the core functionality. There are other aspects of a real application that we aren’t testing. (For example, it may be non-trivial to connect a screen to the core code.)

Even a small application such as this requires a fairly large set of tests. With more programming work on the fixtures, we might be able to reduce some of the noise. Real applications will organize tests into multiple files, and will have to pay more attention to the challenges of consistency, test independence, and feature interaction.

I’ve heard that many teams use xUnit for unit testing, but still struggle to get customer tests before or even after stories are implemented. I hope frameworks such as fit can help lower the barriers to doing this crucial task.

fit.Summary

counts

94 right, 4 wrong, 0 ignored, 0 exceptions

input file

C:\P4\FirstFit\fit\FirstFit-in.htm

input update

Thu May 01 10:51:42 EDT 2003

output file

C:\P4\FirstFit\fit\FirstFit-out.htm

run date

Thu May 01 10:58:28 EDT 2003

run elapsed time

0:00.14

Resources and Related Articles

[Written April 20, 2003; revised April 26, 2003, to correct mis-stated identity & in response to Ward Cunningham's great suggestions about improving the fixtures; 2012 – the WordPress version is designed to simulate the original look.]

Ward Cunningham has created an acceptance testing framework known as fit. (See http://fit.c2.com for more details.) In this brief experiment, we'll use tests to help specify a simple spreadsheet for strings.

Starting Fit

To use fit, you create a web page that has tables in it; the tables specify tests. (There are other options, but that is the easiest.) In this case, I'm using Microsoft Word(tm) and saving the file in HTML format.

The fit FileRunner acts as a filter: given a web page, it copies text outside of tables as is, and runs your program on the table entries. Some table entries represent tests that can pass or fail; fit colors them green or red respectively. The output is another HTML file.

Fit will also put a summary in the file if you put in a table like this:

fit.Summary

With this tool, you don't manipulate screen elements directly. Instead, you work with an abstraction of them. To me, it feels like talking to somebody over the phone, trying to tell them how to use an application. ("In cell cee seventeen, put equals a one; then go to a one and type 'fish'.")

This article shows the input to fit; the result of running it is here.

Programming and Configuration Notes

Fit is a tool for customers and testers, but programmers will use it as well, and will have to write some of the fixtures the team uses. In this paper, I've tried to use the framework mostly straight out of the box.

The CLASSPATH needs to include fit.jar (both in the DOS window and the IDE). The runner command I'm using is:

java fit.FileRunner FirstFit-in.htm FirstFit-out.htm

When I do this on the file I have so far, it creates the output file and writes this to the console:

0 right, 0 wrong, 0 ignored, 0 exceptions

Fixtures

Tables in the input file have the name of a fixture in the first row. A fixture is a class that knows how to process the table. Fit comes with several fixtures built in, and programmers can create others.

One simple fixture is the ColumnFixture. In this fixture, the first row is the fixture name, and the second row has the names of data. If a name ends without parentheses, it is regarded as a field to fill in; with parentheses, it's treated as a method (function) call. The fixture fills in all the data fields, and then calls the methods to verify that they return the expected results.
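To make the two-pass behavior concrete (fill in every field, then call every method), here is a toy imitation using Java reflection. It is not Fit's implementation; the real fit.ColumnFixture does much more, such as coloring cells and converting many value types. `MiniColumnFixture`, `runRow`, and the `Calculator` class are hypothetical, and only `int` and `String` fields are converted.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.List;

// A toy, reflection-based imitation of the ColumnFixture idea; not Fit's implementation.
class MiniColumnFixture {
    // Runs one data row against the header row and returns {right, wrong}.
    static int[] runRow(Object target, List<String> header, List<String> row) {
        int right = 0, wrong = 0;
        try {
            // Pass 1: names without parentheses are fields; fill them all in first.
            for (int i = 0; i < header.size(); i++) {
                String name = header.get(i);
                if (!name.endsWith("()")) {
                    Field field = target.getClass().getField(name);
                    field.set(target, convert(field.getType(), row.get(i)));
                }
            }
            // Pass 2: names ending in "()" are methods; call each and compare with the cell text.
            for (int i = 0; i < header.size(); i++) {
                String name = header.get(i);
                if (name.endsWith("()")) {
                    Method method = target.getClass().getMethod(name.substring(0, name.length() - 2));
                    if (String.valueOf(method.invoke(target)).equals(row.get(i))) right++;
                    else wrong++;
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("fixture does not match header", e);
        }
        return new int[] {right, wrong};
    }

    // Only int and String fields are supported in this sketch.
    private static Object convert(Class<?> type, String text) {
        return type == int.class ? (Object) Integer.parseInt(text) : text;
    }
}

// Hypothetical fixture class for a Calculator table with columns x, y, plus(), times().
class Calculator {
    public int x, y;
    public int plus() { return x + y; }
    public int times() { return x * y; }
}
```

Filling in all fields before calling any method matters: plus() and times() each depend on both x and y, regardless of column order.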

Another standard fixture is the ActionFixture. This one consists of a series of commands. These include:

start classname: Creates an object of the specified class

enter field value: Sets the field to the value

press button-name: Calls the method corresponding to the button

check method value: Checks that the method returns the expected value

The ActionFixture ignores anything past the first three columns; we'll use the fourth column for comments.
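The four commands can likewise be sketched as a toy interpreter. It is not Fit's ActionFixture: `MiniActionFixture` and the `Tally` class it drives are hypothetical, and this sketch implements "enter" by calling a one-argument method named in the second cell (an assumption), "press" by calling a no-argument method, and "check" by comparing a no-argument method's result with the third cell.

```java
import java.util.List;

// Toy interpreter for the ActionFixture commands listed above; not Fit's implementation.
class MiniActionFixture {
    private Object actor; // object created by the most recent "start"; kept across tables
    int right, wrong;

    void run(List<List<String>> rows) {
        try {
            for (List<String> row : rows) {
                String name = row.size() > 1 ? row.get(1) : "";
                switch (row.get(0)) {
                    case "start": // start classname
                        actor = Class.forName(name).getDeclaredConstructor().newInstance();
                        break;
                    case "enter": // enter field value: call name(String)
                        actor.getClass().getMethod(name, String.class).invoke(actor, row.get(2));
                        break;
                    case "press": // press button-name: call name()
                        actor.getClass().getMethod(name).invoke(actor);
                        break;
                    case "check": // check method value: compare name() with the expected text
                        Object actual = actor.getClass().getMethod(name).invoke(actor);
                        if (String.valueOf(actual).equals(row.get(2))) right++; else wrong++;
                        break;
                    default: // anything else is ignored, as are columns past the third
                        break;
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot run action row", e);
        }
    }
}

// Hypothetical class to drive: it adds up the amounts entered.
class Tally {
    private int amount;
    private int total;
    public void amount(String text) { amount = Integer.parseInt(text); }
    public void add() { total += amount; }
    public String total() { return Integer.toString(total); }
}
```

Because `actor` survives between calls to `run`, a later table can omit its own "start" row and keep working on the same object, which is the trade-off between shared setup and test independence discussed below.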

So, we're finally ready to start our application.

fit.ActionFixture

start

Spreadsheet

Create a new spreadsheet.

This test doesn't ask for much, but of course it fails. (There isn't any code yet!)

0 right, 0 wrong, 0 ignored, 1 exceptions

Programmer Notes

The exception is thrown because the Spreadsheet object doesn't exist. To create it as simply as possible, make it extend Fixture:

import fit.Fixture;

public class Spreadsheet extends Fixture {}

This gets us back to

0 right, 0 wrong, 0 ignored, 0 exceptions

I've put together stubs for the fixtures used in this article: Spreadsheet.java, SpreadsheetFormula.java, and Address.java; here's a zip file containing all three.

A Few Stories

We have several things we want our spreadsheet to do:

Track the contents of cells

Distinguish data from formulas

Provide both data and formula views of cells

Support "+" for appending strings, "'" for reversing strings, "()" for grouping, and ">" for string containment.

Cells

The spreadsheet has a number of cells, each of which has an address. Cells contain string data or formulas.

We'll assume several screen elements:

a1 – the cell "A1". For "enter," we'll put something in the cell; for "check," we'll get its displayed value.

b1 – the same for cell "B1".

formula – the formula of the last-mentioned cell.
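These screen elements map naturally onto fixture methods: an "enter" sets a cell, a "check" reads its displayed value, and both update the last-mentioned cell that "formula" reports on. Here is a hypothetical sketch, `SheetSketch`, that does not extend fit.Fixture so it can run without fit.jar. It understands only a bare reference such as "=A1"; a formula that refers to itself would recurse without end, which is the loop question deferred until later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Spreadsheet fixture's screen elements (a1, b1, formula).
// A real Fit fixture would extend fit.Fixture; this stands alone so it runs without fit.jar.
class SheetSketch {
    private final Map<String, String> contents = new HashMap<>(); // address -> text as entered
    private String lastAddress = "A1";                             // the "last-mentioned" cell

    public void a1(String text) { enter("A1", text); } // "enter a1 ..."
    public String a1() { return display("A1"); }      // "check a1 ..."
    public void b1(String text) { enter("B1", text); }
    public String b1() { return display("B1"); }

    // "check formula ...": the contents of the last-mentioned cell, as entered.
    public String formula() { return contents.getOrDefault(lastAddress, ""); }

    private void enter(String address, String text) {
        contents.put(address, text);
        lastAddress = address;
    }

    private String display(String address) {
        lastAddress = address;
        return valueOf(address);
    }

    // Computed on every read, so B1 follows changes to A1 automatically.
    // Only a bare cell reference such as "=A1" is understood; operators and
    // loop detection are left out of this sketch.
    private String valueOf(String address) {
        String text = contents.getOrDefault(address, "");
        return text.startsWith("=") ? valueOf(text.substring(1).toUpperCase()) : text;
    }
}
```

Recomputing display values on every read is the simplest way to get the automatic updates checked in the tables below; a real spreadsheet might cache values and invalidate them instead.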

We'll start with a simple data cell.

fit.ActionFixture

Comments

start

Spreadsheet

enter

a1

abc

check

a1

abc

Text in cell

check

formula

abc

Formula is same. (Looks in last-mentioned cell.)

Now let's add in a formula cell. (Note that this table omits the "start" line; this means it's working on the same object as before. This lets us avoid repeating the setup, but it also makes the tests less independent.)

fit.ActionFixture

Comments

enter

a1

abc

enter

b1

=A1

Simple copying formula

check

formula

=A1

Formula is there

check

a1

abc

Original text in A1

check

b1

abc

Text was copied to B1

The essence of a spreadsheet is the automatic updates. Let's change A1 and see it happen.

fit.ActionFixture

Comments

enter

a1

abc

enter

b1

=A1

Simple copying formula

check

b1

abc

Copied value

enter

a1

revised

Update A1

check

b1

revised

Automatically updates B1

We already have quite a few elements in use, though we haven't specified exactly what is valid. Let's just note the "specification debt" and move on.

What can a cell hold? Empty string, other string, formula starts with "="

What's a valid formula? So far, we've just used a simple cell reference, but we want operators too.

What happens when a cell has an invalid formula?

What happens when a cell refers to a cell containing a formula?

What happens when formulas form a loop?

We'll pursue all these, but let's start with formulas.

Formulas

Formulas can reference formulas. We'll use a new ColumnFixture, SpreadsheetFormula, that lets us specify the inputs and expected outputs of cells. This fixture should access the same type of spreadsheet as the one used by Spreadsheet.

SpreadsheetFormula

a1

b1

c1

d1

a1()

b1()

c1()

d1()

data

=A1

=B1

=C1

data

data

data

data

Formulas get more interesting when there are operators available. The reverse operator (') is probably a good one to start with.

SpreadsheetFormula

a1

b1

b1()

abc

=A1'

cba

abc

=A1''''

abc

The most useful string operator is probably append (+). Fit ignores input cells that are left blank, so we'll explicitly use the word "blank" when we want an empty cell. The fixture will have to take this into account.

SpreadsheetFormula

a1

b1

c1

b1()

c1()

abc

=A1+A1

blank

abcabc

abc

def

=A1+B1+B1+A1

def

abcdefdefabc
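The "blank" convention has to live somewhere in the SpreadsheetFormula fixture: inputs are translated on the way in, and empty display values on the way out, so an expected "blank" can match. One hypothetical way to do this (the class `BlankCells` is illustrative, not part of Fit):

```java
// Hypothetical helpers for the "blank" convention: Fit skips empty input cells,
// so the table spells out "blank" and the fixture translates it.
class BlankCells {
    static final String BLANK = "blank";

    // Table text -> cell contents (used when filling in a1, b1, ...).
    static String fromTable(String text) {
        return BLANK.equals(text) ? "" : text;
    }

    // Displayed value -> table text (used by a1(), b1(), ... so an expected "blank" matches).
    static String toTable(String value) {
        return value.isEmpty() ? BLANK : value;
    }
}
```

The design cost is that a cell whose real contents are the word "blank" can no longer be written in a test; a team might choose a less likely marker if that ever matters.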

We have enough features that we can demonstrate an identity: (XY)'=Y'X'. We don't have parentheses yet, but we can simulate this by putting the parts in separate cells.

SpreadsheetFormula

a1

b1

c1

d1

e1

d1()

e1()

abc

xyz

ignored

=A1+B1

=D1'

abcxyz

zyxcba

abc

xyz

=B1'

=A1'

=C1+D1

cba

zyxcba
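The identity can also be checked directly on Java strings, independent of the spreadsheet. The small helper below (`ReverseIdentity`, a hypothetical name) confirms it for the values in the table, and shows that the tempting form (XY)' = X'Y' does not hold.

```java
// Checks the identity (XY)' = Y'X' on ordinary strings.
class ReverseIdentity {
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Reversing a concatenation reverses each part and swaps their order.
    static boolean holds(String x, String y) {
        return reverse(x + y).equals(reverse(y) + reverse(x));
    }
}
```

With "abc" and "xyz", (XY)' is "zyxcba" while X'Y' is "cbazyx", so the order swap is essential.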

Parentheses can be used to group operators. Let's re-do the previous test, allowing parentheses:

SpreadsheetFormula

a1

b1

c1

c1()

abc

xyz

=(A1+B1)'

zyxcba

abc

xyz

=B1'+A1'

zyxcba

The operator ">" tells whether one string contains another one. If the first string contains the second, the result is the second. If the first string doesn't contain the second, the result is an empty string.

SpreadsheetFormula

a1

b1

c1

c1()

banana

ana

=A1>B1

ana

banana

bab

=A1>B1

We haven't talked about precedence yet. The ' and () operators have the highest precedence, then +, then >. A1+B1+C1 is a legal expression, but A1>B1>C1 is not.

SpreadsheetFormula

a1

b1

c1

c1()

abc

xyz

=A1+B1'

abczyx

abc

xyz

=(A1+B1)'

zyxcba

SpreadsheetFormula

a1

b1

c1

d1

e1

e1()

abcdef

ghijkl

e

hgf

=A1+B1>C1+D1'

efgh
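With precedence settled, the formula language is small enough to sketch as a recursive-descent evaluator, with one method per precedence level. The sketch is hypothetical (`FormulaSketch`, `enter`, and `display` are made-up names, not the stub classes mentioned earlier) and it makes assumptions the tests don't settle: formulas contain no spaces, and an error or loop in a referenced cell propagates to every cell that uses it. It also produces the "#error" and "#loop" displays specified in later sections.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical evaluator for this article's string formulas, written as a recursive-descent parser.
// Precedence, highest first: ' and (), then +, then > (which may appear at most once).
class FormulaSketch {
    private static class FormulaError extends RuntimeException {} // displayed as "#error"
    private static class LoopError extends RuntimeException {}    // displayed as "#loop"

    private final Map<String, String> cells = new HashMap<>();    // standardized address -> contents

    void enter(String address, String contents) {
        cells.put(standardize(address), contents);
    }

    // What the cell displays: its data, or its formula's value.
    String display(String address) {
        try {
            return valueOf(standardize(address), new HashSet<>());
        } catch (LoopError e) {
            return "#loop";
        } catch (FormulaError e) {
            return "#error";
        }
    }

    // A letter, then digits; case and leading zeros are ignored; row 0 is rejected.
    private static String standardize(String address) {
        if (!address.matches("[A-Za-z][0-9]+")) throw new FormulaError();
        String row = address.substring(1).replaceFirst("^0+", "");
        if (row.isEmpty()) throw new FormulaError();
        return Character.toUpperCase(address.charAt(0)) + row;
    }

    private String valueOf(String address, Set<String> inProgress) {
        String contents = cells.getOrDefault(address, "");   // cells start out empty
        if (!contents.startsWith("=")) return contents;       // plain data
        if (!inProgress.add(address)) throw new LoopError();  // cell is already being evaluated
        Parser parser = new Parser(contents.substring(1), inProgress);
        String result = parser.expr();
        if (parser.pos != parser.text.length()) throw new FormulaError(); // e.g. "A2)" or "A2>A3>A4"
        inProgress.remove(address);
        return result;
    }

    private class Parser {
        final String text;
        final Set<String> inProgress;
        int pos = 0;

        Parser(String text, Set<String> inProgress) {
            this.text = text;
            this.inProgress = inProgress;
        }

        boolean accept(char c) {
            if (pos < text.length() && text.charAt(pos) == c) {
                pos++;
                return true;
            }
            return false;
        }

        // expr := sum ('>' sum)?   the right side if the left contains it, else ""
        String expr() {
            String left = sum();
            if (!accept('>')) return left;
            String right = sum();
            return left.contains(right) ? right : "";
        }

        // sum := postfix ('+' postfix)*   append
        String sum() {
            StringBuilder result = new StringBuilder(postfix());
            while (accept('+')) result.append(postfix());
            return result.toString();
        }

        // postfix := primary ('\'')*   each ' reverses the string
        String postfix() {
            String value = primary();
            while (accept('\'')) value = new StringBuilder(value).reverse().toString();
            return value;
        }

        // primary := '(' expr ')' | address
        String primary() {
            if (accept('(')) {
                String value = expr();
                if (!accept(')')) throw new FormulaError();
                return value;
            }
            int start = pos;
            while (pos < text.length() && Character.isLetterOrDigit(text.charAt(pos))) pos++;
            return valueOf(standardize(text.substring(start, pos)), inProgress); // "" fails here
        }
    }
}
```

Because ">" is parsed at most once at the top level, "A1>B1>C1" leaves input unconsumed and is reported as an error, while "A1+B1+C1" loops happily inside `sum`. For the table above, `display("e1")` computes (abcdef + ghijkl) > (e + fgh), which is "efgh".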

Filling in the Gaps

We have several questions left open:

What can a cell hold? Empty string, other string, formula starts with "="

The previous tests made a quick pass through the system. I think of them as generative: they help define the essence of the system. But questions like the one above require us to fill in the gaps. I think of tests that check "corner cases," error cases, and feature interactions as elaborative: they fill in what we already have. They might find problems, but they may well pass already, depending on how the system was built.

What a cell holds

We already have test cases where a cell holds a string, and where a cell holds a formula, but it would be prudent to check that the operators work correctly on empty strings. If e is the empty string and x is a non-empty string, we expect:

e' = e

e+e=e

e+x=x

x+e=x

e>e=e

e>x=e

x>e=e
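These rules agree with what plain Java string operations do, which suggests a straightforward implementation will satisfy them. The sketch below restates the three operators with `java.lang.String` (the class `EmptyStringRules` is hypothetical). The one subtle case is x>e: `String.contains("")` is true, so the result is the second string, which is itself empty.

```java
// The empty-string rules above, restated with plain java.lang.String operations.
class EmptyStringRules {
    static String reverse(String s) { return new StringBuilder(s).reverse().toString(); } // '
    static String append(String a, String b) { return a + b; }                           // +
    // ">": the second string if the first contains it, otherwise the empty string.
    static String containment(String a, String b) { return a.contains(b) ? b : ""; }
}
```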

As I go to write the test, I realize that we never specified what a cell starts with. The answer, of course, is the empty string. So we'll rely on that: A1 will be empty.

fit.ActionFixture

Comments

start

Spreadsheet

check

a1

Verify that cell starts empty.

Then we can verify those rules about working with the empty string:

SpreadsheetFormula

a1

b1

c1

c1()

Comment

blank

blank

=A1'

blank

e'=e

blank

blank

=A1+A1

blank

e+e=e

blank

blank

=A1>A1

blank

e>e=e

blank

abc

=A1+B1

abc

e+x=x

blank

abc

=B1+A1

abc

x+e=x

blank

abc

=A1>B1

blank

e>x=e

blank

abc

=B1>A1

blank

x>e=e

Valid Addresses

There are two places we use addresses: in the address field and in the cells with formulas. When we get a "real" (graphical) interface, the address will mostly be implicit. But even so, we'll test it here just to be safe.

Let's introduce a new fixture, Address. It will be a ColumnFixture: we'll put address in one column, valid() in another, and standardized() in another. (A programmer will have to write the new fixture for us.)

The rules are: a valid address is a letter (A-Z, a-z) followed by one or more digits (0-9). Case is ignored. Leading 0s are ignored. "0" is not a valid row number.

Address

address

valid()

standardized()

A1

true

A1

a1

true

A1

A9874

true

A9874

Z1

true

Z1

z1

true

Z1

Z3992

true

Z3992

z3992

true

Z3992

AA393

false

zX202

false

é17

false

1

false

~1

false

~D1

false

y&1

false

^

false

X392%

false

H001

true

H1

j00010

true

J10

e000

false

A0

false

z0

false
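The address rules are compact enough to express as one regular expression plus a row-zero check. Here is a hypothetical sketch of the logic an Address fixture's valid() and standardized() might delegate to (`AddressRules` is a made-up name). For invalid addresses it returns an empty standard form, since the table leaves that column empty for those rows.

```java
// Hypothetical Address fixture logic: a letter, then one or more digits;
// case and leading zeros are ignored, and row 0 is invalid.
class AddressRules {
    static boolean valid(String address) {
        return address.matches("[A-Za-z][0-9]+") && !address.substring(1).matches("0+");
    }

    // Standard form: upper-case letter, row number without leading zeros ("" if invalid).
    static String standardized(String address) {
        if (!valid(address)) return "";
        return Character.toUpperCase(address.charAt(0)) + address.substring(1).replaceFirst("^0+", "");
    }
}
```

Note that `[A-Za-z]` deliberately excludes letters such as "é", matching the "é17" row above.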

Let's make sure that case-insensitivity works in formulas:

SpreadsheetFormula

a1

b1

b1()

abc

=A1+a1

abcabc

Formula Errors

If a formula contains an error, we'd like it to display as "#error." We'll put all the invalid names from the previous table into formulas, and verify that formulas behave correctly. Then we'll try various improper combinations of operators.

fit.ActionFixture

start

Spreadsheet

Create a new spreadsheet.

enter

a1

=AA393

Bad address

check

a1

#error

Marked as error

check

formula

=AA393

Formula as written

enter

a1

=A2

Change to valid address

check

a1

Make sure #error is cleared

SpreadsheetFormula

a1

a1()

Comment

=zX202

#error

Two letters

=é17

#error

Non-ASCII

=1

#error

No letters

=~1

#error

No letters

=~D1

#error

Unacceptable character

=y&1

#error

Extra character

=^

#error

No letters/digits

=e000

#error

Invalid row # (000 is 0)

=A0

#error

Invalid row #

=z0

#error

Invalid row #

=

#error

Missing formula

Then we'll get to some operators:

SpreadsheetFormula

a1

a1()

Comment

='A2

#error

' should be postfix

='A2'

#error

Can't be before and after

=A2+

#error

Need other term

=A3+A4+

#error

Need other term

=A2++A3

#error

Missing term

=A2+'+A3

#error

' isn't a term

=A2'''+A3

blank

OK to mix things

=A2)

#error

Missing (

=(A2

#error

Missing )

=((((((((((((A2))))))))))))

blank

OK – big expression

=((((((A2+(A3))))+A4)

#error

Unbalanced – too few )

=(((A2>A3

#error

Unbalanced – too few )

=(A2>A3)))

#error

Unbalanced – too many )

=A2>A3>

#error

Can't trail >

=A2>A3>A4

#error

Can't repeat >

Loops

If a formula uses itself (directly or indirectly), we don't want it to loop forever trying to figure it out. Instead, we'd like the display to be "#loop."

SpreadsheetFormula

a1

b1

c1

d1

e1

a1()

e1()

=A1

blank

blank

blank

blank

#loop

blank

=B1

=C1

=F1+D1

=E1

no-loop

no-loop

no-loop

=B1

=C1

=F1+D1

=E1

=A1

#loop

#loop

Conclusions

This paper has demonstrated a set of tests using the fit acceptance testing framework. Some things to note:

The tests here have been written as if a customer specified them, without much demonstration of the programming cycle. But programmers can work with these tests in much the way they would with JUnit.

The tests are written without benefit of the feedback of a working system. (I wrote just enough code to make the first test not throw an exception.) When I went back to implement the system, I found a number of bugs in the tests.

The tests look at only part of the system: the core functionality. There are other aspects of a real application that we aren't testing. (For example, it may be non-trivial to connect a screen to the core code.)

Even a small application such as this requires a fairly large set of tests. With more programming work on the fixtures, we might be able to reduce some of the noise. Real applications will organize tests into multiple files, and will have to pay more attention to the challenges of consistency, test independence, and feature interaction.

I've heard that many teams use xUnit for unit testing, but still struggle to get customer tests before or even after stories are implemented. I hope frameworks such as fit can help lower the barriers to doing this crucial task.

fit.Summary

Resources and Related Articles

[Written April 20, 2003; revised April 26, 2003, to correct mis-stated identity & in response to Ward Cunningham's great suggestions about improving the fixtures. Revised May 1, 2003 to fix some test problems. 2012 – the WordPress version is designed to simulate the original look.]

Background

Suppose we're testing a library search system. The library has a database of book descriptions, and we give it queries. Each query produces a result set of the matching items.

When we consider this system, we might think of the database as a collection of records, and think of the query as a string. The query string represents a boolean expression, so we want to test its logical nature. Finally, the query string conforms to a grammar, so it might be well- or ill-formed.

Test Strategy

To get ideas on testing, we might go to a site like http://www.testing.com, and see Brian Marick's suggestions. Summarizing:

How will we define the tests? These tests work well with a spreadsheet. We can have one table for the database contents, and another for queries.

Data
| Author | Title | Year |

Queries
| Query | Test | Value | Comment |

The semantics will be: load each line of the data table (after the first two), then run each query. Apply the test (with its optional value) to the result. The tests I have in mind are:

count – verifies the number of items in the result

error – verifies that the query is rejected as an error (the "Value" column is unused)

contains – verifies that a particular phrase is in the result

We might find we need to extend this list later, but this is a good starting point. We won't even use "contains" in these examples.
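A fixture for this table mostly has to turn the Test and Value cells into a yes/no verdict. The sketch below is hypothetical (QueryChecks and QueryOutcome are made-up names, not part of Fit). It uses the name error, which is what the query tables actually put in the Test column, and it assumes contains is a case-insensitive substring check on each result item.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the checks named in the Test column.
final class QueryChecks {
    // What the system under test returned for one query.
    static final class QueryOutcome {
        final List<String> items; // matching records (duplicates included)
        final boolean failed;     // true if the query was rejected as an error

        QueryOutcome(List<String> items, boolean failed) {
            this.items = items;
            this.failed = failed;
        }
    }

    // Returns true if the row's expectation holds for the outcome.
    static boolean check(String test, String value, QueryOutcome outcome) {
        switch (test.trim().toLowerCase(Locale.ROOT)) {
            case "count":
                return !outcome.failed
                        && outcome.items.size() == Integer.parseInt(value.trim());
            case "error":
                return outcome.failed; // the Value column is not used
            case "contains": {
                // Assumption: the phrase check ignores case, like term matching.
                String phrase = value.trim().toLowerCase(Locale.ROOT);
                return !outcome.failed && outcome.items.stream()
                        .anyMatch(item -> item.toLowerCase(Locale.ROOT).contains(phrase));
            }
            default:
                throw new IllegalArgumentException("unknown test: " + test);
        }
    }
}
```

Note that a count check fails when the query was rejected: a row that expects "count 0" should not pass just because the system reported an error.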

Queries

Queries have a simple structure:

query = term
query = query op term
op = AND | OR | NOT

Operators have equal precedence and are applied left to right, so "fish or beef and steak" is interpreted as "(fish or beef) and steak" (though parentheses are not allowed in queries). Matching should not be case sensitive, and neither are the operator names. Extra blanks are ignored.

The NOT operator may need a little explanation. In classical logic, not is a unary operator. In the search system, "a NOT b" is interpreted "a AND NOT b" (but the latter is not a legal expression in our query language). The NOT operator is used to eliminate unwanted terms. Thus, "extreme AND programming NOT sports" would match "Extreme Programming Explored" but not "Is Extreme Programming an Extreme Sport?"
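These rules are small enough to sketch directly. This is an illustrative evaluator, not the library system's code, and the class LibraryQuery is hypothetical. Several details the text leaves open are assumptions here: a term matches a record when it equals one of the record's words (ignoring case); any word may be a term, so "AND AND OR" parses as the term "and", the operator AND, and the term "or"; a blank query returns an empty result; and duplicate records are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the query language: term (op term)*, left to right.
final class LibraryQuery {
    // Thrown for queries that don't fit "term (op term)*".
    static final class QueryException extends RuntimeException {
        QueryException(String message) {
            super(message);
        }
    }

    static List<String> run(String query, List<String> records) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // blank query: empty result, not an error
        }
        String[] tokens = trimmed.split("\\s+"); // extra blanks are ignored
        if (tokens.length % 2 == 0) {
            throw new QueryException("missing term or operator: " + query);
        }
        for (int i = 1; i < tokens.length; i += 2) { // odd positions must be operators
            if (!isOperator(tokens[i])) {
                throw new QueryException("expected AND, OR, or NOT but found " + tokens[i]);
            }
        }
        List<String> result = new ArrayList<>();
        for (String record : records) {
            if (matches(tokens, record)) {
                result.add(record); // duplicate records stay in the result
            }
        }
        return result;
    }

    private static boolean isOperator(String token) {
        String op = token.toUpperCase(Locale.ROOT);
        return op.equals("AND") || op.equals("OR") || op.equals("NOT");
    }

    // Strictly left to right: "a OR b AND c" is evaluated as "(a OR b) AND c".
    private static boolean matches(String[] tokens, String record) {
        Set<String> words = new HashSet<>(
                Arrays.asList(record.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        boolean value = words.contains(tokens[0].toLowerCase(Locale.ROOT));
        for (int i = 1; i < tokens.length; i += 2) {
            boolean term = words.contains(tokens[i + 1].toLowerCase(Locale.ROOT));
            switch (tokens[i].toUpperCase(Locale.ROOT)) {
                case "AND": value = value && term; break;
                case "OR": value = value || term; break;
                default: value = value && !term; break; // "a NOT b" means "a AND NOT b"
            }
        }
        return value;
    }
}
```

With the single Wake record as data, this sketch returns one item for "dive OR wake" and none for "Extreme OR Sports AND Agile", and it rejects "fish AND" and "fish sticks" because they have an even number of tokens.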

Test Strategy Revisited

We'll start with the collection rules, and make four separate test databases: empty, one record, many records, and duplicates. For each of these, we'll try a variety of queries.

For the integer values, there are really only two places that use integers: the number of records in the collection (which we covered), and the size of the result. Results can have anywhere from zero to the number of items in the database; only 0 and "max" apply from the testing rules. But we'll certainly hit a result set of size "1" as well as some intermediate values.

The Simplest Test: An Empty Collection

What can we learn from an empty collection? It might not seem worth trying (who would have a library with no books?), but it's the simplest case. So: the data set is empty. (We may not even create the spreadsheet page for it.) Let's cover the test cases for a blank and a non-blank query string:

Queries
| Query | Test | Value | Comment |
|  | count | 0 | Blank query |
| fish | count | 0 | Non-blank query |

Install this test and run it. That will show you the first reason to start with the simplest test: setting up the environment and running the test is non-trivial. You'd rather start with a small test set. In our example, there's a second reason it pays to start with an empty collection: we decided that a blank query returns a result with 0 items. We could have decided that it was an error. Murphy's Law tells you that if you didn't say, the programmers would have done the opposite of what you ended up wanting.

We can also test non-grammatical queries with an empty collection. It seems reasonable that error checking would happen before the query actually does its work, so it won't matter if the collection is empty or not. (And a quick chat with the programmers confirms our intuition.)

Queries
| Query | Test | Value | Comment |
| fish AND | error |  | Nothing after AND |
| fish OR | error |  | Nothing after OR |
| fish NOT | error |  | Nothing after NOT |
| AND fish | error |  | Nothing before AND |
| OR fish | error |  | Nothing before OR |
| NOT fish | error |  | Nothing before NOT |
| fish sticks | error |  | No operator |

We can certainly find other non-grammatical queries to try. Finally, we might toss in one legal but complicated and tricky one:

Queries
| Query | Test | Value | Comment |
| AND AND OR OR NOT NOT NOT | count | 0 | Ugly but legal |

One-Element Collection

We need a table with one record:

Data
| Author | Title | Year |
| William C. Wake | Extreme Programming Explored | 2001 |

(What else would I use?)

You might start by repeating the earlier queries; they'll get the same answer. We can check that a record is found as expected for a simple query:

Queries
| Query | Test | Value | Comment |
| Explored | count | 1 | Simple query |
| ExPLoRed | count | 1 | Verify case insensitive |

Brian Marick (www.testing.com) has suggested some good rules for testing booleans:

AND: try once with all conditions true, and once with each condition false and the others true

OR: try once with all conditions false, and once with each condition true and the others false

In the comments, TF/FT/etc. indicates which term is true and which is false.

Queries
| Query | Test | Value | Comment |
| Extreme AND Explored | count | 1 | AND TT |
| Extreme AND sports | count | 0 | AND TF |
| sports ANd programming | count | 0 | AND FT |
| skiing Or skipping | count | 0 | OR FF |
| Wake OR skateboard | count | 1 | OR TF |
| dive OR wake | count | 1 | OR FT |
| 2001 NOT dancing | count | 1 | NOT TF |
| dancing NOT singing | count | 0 | NOT FF |
| 2001 NoT extreme | count | 0 | NOT TT |

Note that we've checked a couple of other things by having variety in the test cases: we've matched terms in all three columns, and we've mixed case in both terms and operators. This can be beneficial or not. On the good side, it cuts down the number of test cases. On the bad side, it confounds two things: if "dive OR wake" fails, is it because matching is case-sensitive, or because there's a problem in "OR"?

You'll have to come to your own balance on this. If you find yourself unable to figure out what the problem is, it can be a sign to simplify your tests. You can partially mitigate the problem by keeping early tests simple and focused on one aspect, and letting later tests build up complexity.

We can explore a few other aspects of booleans. For one thing, there are some basic identities: "a OR a = a"; "a AND a = a"; "a NOT a = false". They show up in our queries like this:

Queries
| Query | Test | Value | Comment |
| Extreme AND Extreme | count | 1 | a AND a |
| Programming OR Programming | count | 1 | a OR a |
| Explored NOT Explored | count | 0 | a NOT a |

We might want to verify that the term order doesn't matter. (We want "Programming AND Extreme" to have the same result as "Extreme AND Programming".)

Queries
| Query | Test | Value | Comment |
| Extreme AND Programming | count | 1 | a AND b |
| Programming AND Extreme | count | 1 | b AND a |

Finally, we'd like to verify that operator precedence is right. Remember that "a OR b AND c" should be interpreted "(a OR b) AND c" rather than "a OR (b AND c)". (Some grammars for logical operators, and most programming languages, give AND higher precedence than OR; we'd like to make sure that convention doesn't leak into the query language.)

A few minutes' thought (or a half hour of logical derivation) will convince you that if "(a OR b) AND c" is true, then "a OR (b AND c)" is also true. To show that the wrong interpretation is in use, we have to find a case where the second one matches but the first one doesn't. Such a case exists when a matches but b and c do not.
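The claim is easy to check mechanically. This short program (PrecedenceCheck is a made-up name) enumerates all eight truth assignments for a, b, and c and compares the two readings. It also checks the implication stated above: it never finds an assignment where the left-to-right reading is true and the AND-first reading is false.

```java
import java.util.ArrayList;
import java.util.List;

// Compares "(a OR b) AND c" (left to right) with "a OR (b AND c)" (AND first)
// over every truth assignment, listing the assignments where they differ.
final class PrecedenceCheck {
    static List<String> disagreements() {
        List<String> cases = new ArrayList<>();
        for (int mask = 0; mask < 8; mask++) {
            boolean a = (mask & 4) != 0, b = (mask & 2) != 0, c = (mask & 1) != 0;
            boolean leftToRight = (a || b) && c;
            boolean andFirst = a || (b && c);
            if (leftToRight && !andFirst) {
                // Would contradict the implication claimed in the text.
                throw new IllegalStateException("left-to-right true but AND-first false");
            }
            if (leftToRight != andFirst) {
                cases.add("a=" + tf(a) + " b=" + tf(b) + " c=" + tf(c));
            }
        }
        return cases;
    }

    private static String tf(boolean v) {
        return v ? "T" : "F";
    }

    public static void main(String[] args) {
        System.out.println(disagreements()); // [a=T b=F c=F, a=T b=T c=F]
    }
}
```

The readings differ exactly when a is true and c is false, whatever b is, so the "a matches, b and c don't" case is one of two assignments that would expose the wrong precedence.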

Queries
| Query | Test | Value | Comment |
| Extreme OR Sports AND Agile | count | 0 | precedence |

We're definitely hitting some subtleties now. What happens if we don't test this case? Are we doomed? What about the many other ways queries could be wrong?

You always have to draw a line somewhere. It took me about an hour to convince myself that the precedence problem could exist, to come up with an example, and to prove that the example could be logically derived. If I hadn't done this, there's a chance this bug would have surfaced. The biggest danger is that the programmers would misunderstand the specification and assume "normal" precedence. But as the customer, I'm in the room with them talking about it. And even if we had missed this subtlety, it is an obscure case in the grand scheme of things: I've tested one- and two-term queries more thoroughly, and those are by far the most common ones users will form.

That doesn't completely excuse the potential problem. If I were worried enough, I could get someone to write a small program that generated "all" queries with 3 or 4 terms. ("All" meaning all combinations of operators and terms present or not, not "all" possible terms.) "a op b op c op d": Each term might be present or not (2^4=16), and each operator could have one of 3 values (3*3*3), so there are 16*27=432 combinations I might look at. This is an uncomfortable number, but not unbearable. In normal use, I'd probably just pick a handful of problematic-looking ones, and trust the result.
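The generator is only a few lines. In this sketch (QueryCombinations is a hypothetical name), a, b, c, and d stand in for real terms, an absent term is simply left out of the query string (so many of the outputs are deliberately ungrammatical), and each of the three operator slots ranges over AND, OR, and NOT.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for "a op b op c op d": every term present or
// absent (2^4 = 16 patterns) times every operator choice (3^3 = 27).
final class QueryCombinations {
    static final String[] TERMS = {"a", "b", "c", "d"}; // placeholders for real words
    static final String[] OPS = {"AND", "OR", "NOT"};

    static List<String> generate() {
        List<String> queries = new ArrayList<>();
        for (int mask = 0; mask < 16; mask++) {           // which terms are present
            for (int opCode = 0; opCode < 27; opCode++) { // base-3 digits pick operators
                List<String> tokens = new ArrayList<>();
                int code = opCode;
                for (int i = 0; i < TERMS.length; i++) {
                    if (i > 0) {
                        tokens.add(OPS[code % 3]); // operator between term i-1 and term i
                        code /= 3;
                    }
                    if ((mask & (1 << i)) != 0) {
                        tokens.add(TERMS[i]);
                    }
                }
                queries.add(String.join(" ", tokens));
            }
        }
        return queries;
    }
}
```

generate() produces 16 × 27 = 432 distinct strings, from "AND AND AND" (no terms) to "a AND b AND c AND d"; a real test would substitute words that do or don't appear in the test data for a through d.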

Collection with Many Elements

We'll define a bigger collection, and repeat our queries. (They'll have slightly different result counts, but none will go from 0 to a positive number or vice versa.)

Data
| Author | Title | Year |
| Kent Beck | Extreme Programming Explained: Embrace Change | 1999 |
| Kent Beck and Martin Fowler | Planning Extreme Programming | 2000 |
| Martin Fowler et al. | Refactoring | 1999 |
| Ron Jeffries et al. | Extreme Programming Installed | 2000 |
| William C. Wake | Extreme Programming Explored | 2001 |

Queries
| Query | Test | Value | Comment |
| Explored | count | 1 | Simple query |
| ExPLoRed | count | 1 | Verify case insensitive |
| Extreme AND Explored | count | 1 | AND TT |
| Extreme AND sports | count | 0 | AND TF |
| sports ANd programming | count | 0 | AND FT |
| skiing Or skipping | count | 0 | OR FF |
| Wake OR skateboard | count | 1 | OR TF |
| dive OR wake | count | 1 | OR FT |
| 2001 NOT dancing | count | 1 | NOT TF |
| dancing NOT singing | count | 0 | NOT FF |
| 2001 NoT extreme | count | 0 | NOT TT |
| Extreme AND Extreme | count | 4 | a AND a |
| Programming OR Programming | count | 4 | a OR a |
| Explored NOT Explored | count | 0 | a NOT a |
| Extreme AND Programming | count | 4 | a AND b |
| Programming AND Extreme | count | 4 | b AND a |
| Extreme OR Sports AND Agile | count | 0 | precedence |

We'll also add a new query to cover the case "all records are in the result set." We had this for the one-record case, but we'd feel better seeing it on a bigger set as well.

Queries
| Query | Test | Value | Comment |
| Programming OR Fowler | count | 5 | Find all |

Collection with Duplicates

Verify that duplicate entries are handled consistently. We decide we won't "de-dup" the results, but rather just return every match we find.

Data
| Author | Title | Year |
| Kent Beck | Extreme Programming Explained: Embrace Change | 1999 |
| Kent Beck | Extreme Programming Explained: Embrace Change | 1999 |
| Kent Beck and Martin Fowler | Planning Extreme Programming | 2000 |
| William C. Wake | Extreme Programming Explored | 2001 |
| Martin Fowler et al. | Refactoring | 1999 |
| Ron Jeffries et al. | Extreme Programming Installed | 2000 |
| William C. Wake | Extreme Programming Explored | 2001 |
| Kent Beck | Extreme Programming Explained: Embrace Change | 1999 |

Queries
| Query | Test | Value | Comment |
| Extreme | count | 7 | Retain dups |
| Extreme OR Refactoring | count | 8 | Find all with dups |
| Refactoring | count | 1 | Find 1 with dups |

Conclusion

We've created an initial set of tests for a simple query system that might show up in library software. Several things stand out:

Many tests are straightforward, but some of them bring out real issues in the specification.

Sidebar: Running the Tests

A spreadsheet provides the test creator with a comfortable, well-understood interface, and it can write easily processed files.

For the example we used, the tester and the programmer have to agree on the columns, their meanings, and the file format. When the tester is done writing a file, they can save it as a tab-delimited or bar-delimited file (whatever they agree on with the programmers). The programmers can write a simple program that reads the file and performs the appropriate function for each row. The programmers might generate code, or just have the program read and execute functions directly.
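Reading such a file takes little code. This sketch assumes a layout the tester and programmers would have to agree on (it is not a standard spreadsheet export): each table starts with a line holding only its name, such as Data or Queries, followed by a header line and then one line per row, with a blank line between tables. TestFileReader is a hypothetical name. The limit of -1 passed to split matters: without it, Java drops trailing empty cells, so a row whose last cell is empty would come back short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical reader for a tab- or bar-delimited test file, split into tables.
final class TestFileReader {
    // Maps each table name to its data rows (header lines are skipped).
    static Map<String, List<List<String>>> read(String fileText, String delimiter) {
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        Map<String, List<List<String>>> tables = new LinkedHashMap<>();
        List<List<String>> current = null;
        boolean expectHeader = false;
        for (String line : fileText.split("\\R")) {
            if (line.isBlank()) {          // a blank line ends the current table
                current = null;
            } else if (current == null) {  // first line of a table: its name
                current = tables.computeIfAbsent(line.trim(), name -> new ArrayList<>());
                expectHeader = true;
            } else if (expectHeader) {     // column names; not needed here
                expectHeader = false;
            } else {
                List<String> cells = new ArrayList<>();
                for (String cell : splitter.split(line, -1)) { // -1 keeps empty cells
                    cells.add(cell.trim());
                }
                current.add(cells);
            }
        }
        return tables;
    }
}
```

A leading empty cell, such as the blank query in the empty-collection test, survives as "" either way; the same reader handles tab-delimited files by passing "\t" as the delimiter.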

I think of the GUI (graphical user interface) as being lopped off or set aside, and the spreadsheet as providing an alternative interface. It's often best to focus on feature testing first, without worrying about the GUI until the features are right. A spreadsheet is nice for this approach: the tester still gets a GUI, just not one customized to the problem at hand.

Resources and Related Articles

An acceptance test is a test that the user defines, to tell whether the system as a whole works the way the user expects. Ideally, the acceptance tests are defined before the code that implements the feature.

Acceptance tests are run frequently in an XP project: usually daily or more often, and certainly every week. The tests give a picture of how much of the desired functionality works. Because the tests are run so often, there is a payoff if they can be automated.

Progress on acceptance tests is one of the crucial metrics that XP projects often collect.

There are several mechanisms that can be used to run acceptance tests:

Manual Test

GUI Testing Tools

Code

Script

Spreadsheet

Template

Manual Test

The simplest mechanism for running a test is to just make a plan on paper, and then do the steps manually.

GUI Testing Tools

There are commercial tools that let you run a system, recording your activities and the system's responses. You run it once to establish a "golden" copy, and then automatically after that to verify that things haven't changed.

This might be useful for some parts of testing, but it's not as good as you might think.

If the interface changes, the tests are invalidated. (The tools' exact approach determines how significant an interface change will trigger this problem.)

The interface is often (usually?) not the core of the problem being solved, so it may not be finished until toward the end of the project. Yet all of the functionality should be tested along the way.

Code

In the code approach, you get a programmer to write the code that will run the tests you specify. This code is usually straightforward. Some teams use frameworks such as JUnit to manage and run the acceptance test code.

Script

"Script" is a simplified form of code. Programming languages typically have several ways to repeat actions, to do something if a condition is or isn't true, and so on, but most test code doesn't need these more complicated parts of programming.

So, programmers might create a scripting language. Tests are usually stylized, simple code, and the programmers will provide a way to run them. The language might be a subset of the one the programmers are using (e.g., Java), or a more "scripting" language like Perl, Ruby, or Python.

Downsides: Script languages still require a lot of attention to details, and may have unwanted complications. Even in this simple example, there are several issues that the test writer has to be careful about:

Don't forget the semicolon at the end of each line.

How do you put quotes inside quotes?

Numbers vs. strings

Benefits: It's not all bad news:

It's easy to get started (by adapting an existing language)

Gives the user direct ownership of the tests

The solution is flexible.

Providing a scripting language is often the easiest way to get a user writing tests.

Spreadsheet

When you get to their essence, many tests can be represented in a spreadsheet. This gives the user a well-understood way to create, edit, and manage tests.

Example:

| Title | Author | Year |
| Extreme Programming Explored | William C. Wake | 2001 |
| Planning Extreme Programming | Kent Beck and Martin Fowler | 2000 |

| Query | Expected | Comment |
| Extreme and Programming | 2 | AND found |
| Explored | 1 | No booleans |
| Fish | 0 | Term not present |
| Explored or Planning | 2 | OR found |

Spreadsheet formats are not hard for a program to handle. Spreadsheets almost always have a way to save files in a format where commas, tabs, or "|" bars separate the fields. Then the program just reads one row per line and does what is needed. Different types of tests might use different programs, but still work with the spreadsheet approach.
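For the query table in the example, such a program could look like the sketch below. It assumes the Query/Expected/Comment rows were saved bar-delimited with the header row first, and it takes the count as a function supplied by the programmers; countMatches is a hypothetical hook into the system under test, and QueryTableRunner is likewise a made-up name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical runner: one table row per line, "Query|Expected|Comment".
final class QueryTableRunner {
    // Returns one message per row whose actual count differs from Expected.
    static List<String> failures(List<String> lines, ToIntFunction<String> countMatches) {
        List<String> failures = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // line 0 is the header row
            String line = lines.get(i);
            if (line.isBlank()) {
                continue;
            }
            String[] cells = line.split("\\|", -1);
            String query = cells[0].trim();
            int expected = Integer.parseInt(cells[1].trim());
            int actual = countMatches.applyAsInt(query);
            if (actual != expected) {
                String comment = cells.length > 2 ? cells[2].trim() : "";
                failures.add(query + ": expected " + expected + ", got " + actual
                        + " (" + comment + ")");
            }
        }
        return failures;
    }
}
```

An empty list means every row passed; the Comment column is echoed in failure messages so the tester can see which case broke.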

Template

As you grow a bunch of different tests, it becomes more of a pain to write a new small program for each test type. You can use a "template" approach that combines scripting and spreadsheets.

The user writes the script that would test one row, and then "templatizes" it by replacing data values with a generic name. Then one program can create the actual script by merging the template and the spreadsheet.

Example:

*{
test($Comment);
expect($Query, #Expected);
}*

In this made-up example template, the part contained in the "*{ }*" brackets is repeated once per spreadsheet row.
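The merge step can be sketched in a few lines. Since the template syntax above is made up, so are the rules this sketch (hypothetical class TemplateMerger) applies: text between "*{" and "}*" is repeated once per spreadsheet row, "$Name" is replaced by that row's Name column as a quoted string, "#Name" is replaced by the raw value (intended for numbers), and anything outside the brackets is copied once.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical template merger for the made-up "*{ ... }*" syntax.
final class TemplateMerger {
    private static final Pattern REPEAT = Pattern.compile("\\*\\{(.*?)\\}\\*", Pattern.DOTALL);
    private static final Pattern FIELD = Pattern.compile("([$#])(\\w+)");

    static String merge(String template, List<Map<String, String>> rows) {
        Matcher block = REPEAT.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (block.find()) {
            out.append(template, last, block.start()); // text before the block, once
            for (Map<String, String> row : rows) {
                out.append(fill(block.group(1), row)); // block body, once per row
            }
            last = block.end();
        }
        out.append(template.substring(last));
        return out.toString();
    }

    // Replaces $Name with a quoted, escaped string and #Name with the raw value.
    private static String fill(String body, Map<String, String> row) {
        Matcher field = FIELD.matcher(body);
        StringBuilder out = new StringBuilder();
        while (field.find()) {
            String value = row.getOrDefault(field.group(2), "");
            String text = field.group(1).equals("$")
                    ? "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
                    : value;
            field.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        field.appendTail(out);
        return out.toString();
    }
}
```

For a row with Query "Explored", Expected "1", and Comment "No booleans", the repeated part becomes test("No booleans"); followed by expect("Explored", 1);. Quoting "$" fields also answers one of the scripting pitfalls listed earlier: the tester never has to put quotes inside quotes by hand.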

Conclusion

There are several options for creating and running acceptance tests. Each team will have to decide which option (or combination) will work best for it.