New Paper: Improving spreadsheet test practices

My first solo paper (without my PhD supervisor) ‘Improving spreadsheet test practices’ will be published at CASCON ’13 in Toronto. This paper explores the idea of applying testing to spreadsheets.

Core idea

The core observation I did for this paper is the fact that spreadsheet users do exhibit some basic testing. Many spreadsheet creator use formulas like IF(A1=100,”OK”,”Results should sum up to 100″) to validate input and calculations. If you think about such a formula, you can clearly see it is a different beast than a formula like SUM(A1:A6). This last one is a formula to calculate, the first one is a formula to validate.

Distinguishing between formula types

The idea of this paper, and the corresponding new tool Expector, is to scan a spreadsheet and decide on each formula whether it is a test formula. We define a test formula with the following.

A test formula has the operation IF as root function

A test formula contains at least one branch that can result in a string

A test formula’s result is not used in other formulas, unless these formulas are test formulas too

Do spreadsheet users test?

They do! With our method, we found that 8.8% of all spreadsheets in the EUSES corpus (a well known open spreadsheet test set) contains such test formulas.

On average, the spreadsheets with tests had a ‘coverage’ of 35%. This means the test formulas referred to 35% of the other formulas, on average.

So what can be improved?

First of all, the coverage can be improved. From software engineering, we know that a higher coverage tends to improve the chance an error is spotted. This is what our tool Expector helps users with. Expector is an Excel add-on which visualized to the user what cells are currently tested, as follows: