End-User Software Engineering - Empirical Results to Date

We have conducted about two dozen empirical studies, some formative to inform our end-user software engineering design work and some summative to evaluate it. Here is a brief synopsis of the more interesting ones.

WYSIWYT testing study, led by Karen Rothermel, Fall 1999. This was the study written up in our ICSE 2000 paper.

Time: Fall 2000, end-user participants, led by Miguel Arredondo-Castro. The purpose was to compare end users' comprehension and performance using a graphical approach (termed Model SNF) vs. the traditional textual approach of the "earlier" and "fby" operators. SNF was much better. The tie to end-user software engineering is the role of time-based patterns and comprehension in helping with the "oracle" problem. Written up in this JVLC paper: "End-User Programming of Time as an 'Ordinary' Dimension in Grid-Oriented Visual Programming Languages."

WYSIWYT testing given recursion: Winter 2001 study on testing recursive spreadsheets, led by Bing Ren and Andy Ko. The study's purpose was to inform our design about which of two possible alternatives we should choose; as a result of this study, we chose the "Copy Representative" approach. Here is the paper (appeared in IEEE VL/HCC'01): "Visually Testing Recursive Programs in Spreadsheet Languages."

Time and the oracle problem: Winter 2001, end-user participants, again led by Miguel Arredondo-Castro. The purpose was to investigate whether participants using the temporal window could perform better as oracles than participants who did not use it. The answer was yes.

Assertions: Spring 2001 Assertions Think-Aloud Study, led by Christine Wallace. This used end users as subjects. One group had assertions, the other group did not (a total of 10 subjects). The study was used to determine whether end users could understand and use assertions; the findings showed that they could. There were also some interesting additional findings about how people want to do testing. Written up in a short paper (at IEEE VL/HCC'02).

Assertions: Winter 2002 -- Grid Assertions, led by Laura Beckwith. Think-aloud study with 5 subjects, conducted on paper, used to help design how guards on grids should be handled. The subsequent paper is in the proceedings of IEEE HCC'02.

Help-Me-Test: Winter 2002, led by Prashant Shah. HMT = "Help Me Test". Think-aloud study with 12 subjects (6 with HMT and 6 without) doing a maintenance task. The HMT subjects were much better at the task. We also learned that people first do as much testing as they can with their own values, and turn to HMT when the going gets tough. Once they do turn to HMT, they seem to really like it, and they turn to it often.

Assertions: Spring 2002 Assertions Study, led by Omkar Pendse. Statistical study, intended as a follow-up to the think-aloud version done by Christine Wallace (see the Spring 2001 assertions think-aloud study). One group had assertions (pre-planted by us computer scientists), and one group had no assertions. Both had WYSIWYT. The task was debugging. The assertions subjects were significantly better at fixing the bugs. The paper containing this study appeared in ICSE 2003: "End-User Software Engineering with Assertions in the Spreadsheet Paradigm."
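In the spreadsheet paradigm, an assertion of this kind is essentially a guard on a cell's value, such as a range the value is expected to fall within. As a rough illustration only (the cell name, formula, and range below are invented for this sketch, not taken from the study), a value-range assertion could work like this:

```python
# Minimal sketch of a spreadsheet-style value-range assertion.
# Cell names, formulas, and ranges here are invented for illustration.

class Cell:
    def __init__(self, name, formula, assertion=None):
        self.name = name            # cell identifier, e.g. "ExamAvg"
        self.formula = formula      # zero-argument callable producing the value
        self.assertion = assertion  # optional (low, high) expected range

    def evaluate(self):
        """Compute the cell's value and check it against its assertion."""
        value = self.formula()
        violated = (self.assertion is not None
                    and not (self.assertion[0] <= value <= self.assertion[1]))
        return value, violated

# A buggy formula (meant to average three exam scores, but divides by 2)
# trips the 0-100 assertion, drawing the user's attention to the fault.
avg = Cell("ExamAvg", lambda: (80 + 90 + 70) / 2, assertion=(0, 100))
value, violated = avg.evaluate()
print(value, violated)  # 120.0 True -> the assertion flags the bad value
```

The point of the sketch is the interaction style: the user states an expected range once, and every recalculation silently re-checks it.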

Surprise-Explain-Reward + Assertions: Summer 2002 study, by Ledah Casburn, Aaron Wilson, Orion Granatir, and Laura Beckwith. The purpose was to build on Omkar's study by asking, given our Surprise-Explain-Reward strategy, "will people actually put any assertions in?" The task was debugging, largely replicating Omkar's experiment. The answer was "yes." Written up in our description of Surprise-Explain-Reward that appeared at CHI'03: "Harnessing Curiosity to Increase Correctness in End-User Programming."

Fault localization: Fall 2002, performed by Joey Ruthruff, Rogan Creswick, Shreenivasarao Prabhakararao, Marc Fisher, and Martin Main. The purpose of the experiment was to formatively evaluate three fault localization techniques: a "Blocking" technique, developed by Dusty Reichwein; a "Test-Count" technique, developed by Marc Fisher as a simpler substitute for Dusty's technique that fits in well with our test-reuse capabilities; and a "Nearest-Consumers" technique, developed by Joey Ruthruff, Margaret Burnett, and Rogan Creswick as an inexpensive technique that tries to mimic the blocking capabilities of Dusty's technique. We used the transcripts from the Spring 2001 Help-Me-Test experiment as end-user test suites to evaluate the effectiveness and robustness of each technique. The results appeared at ACM SoftVis (2003): "End-User Software Visualizations for Fault Localization."
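The paper has the techniques' actual definitions; as a generic illustration only of the test-count idea (the suspiciousness calculation and cell names below are simplified assumptions, not the formula from the paper), a cell's fault likelihood can be based on how many failed versus passed tests exercised it:

```python
# Generic sketch of a test-count style fault localization heuristic.
# The real Test-Count technique's details differ; this simplified version
# ranks cells by the fraction of tests exercising each cell that failed.

def fault_likelihood(test_results):
    """test_results: list of (cells_exercised, passed) pairs, where
    cells_exercised is a set of cell names and passed is a bool.
    Returns {cell: suspiciousness in [0, 1]}."""
    fail_count, total_count = {}, {}
    for cells, passed in test_results:
        for cell in cells:
            total_count[cell] = total_count.get(cell, 0) + 1
            if not passed:
                fail_count[cell] = fail_count.get(cell, 0) + 1
    return {cell: fail_count.get(cell, 0) / total_count[cell]
            for cell in total_count}

# Hypothetical test suite: each entry is (cells the test exercised, passed?).
suite = [
    ({"A1", "B1", "C1"}, True),
    ({"A1", "B1"},       False),
    ({"B1", "C1"},       False),
]
scores = fault_likelihood(suite)
# B1 is exercised by both failing tests, so it ranks most suspicious.
print(max(scores, key=scores.get))  # B1
```

In a spreadsheet setting the "tests" are the user's WYSIWYT checkmarks and X-marks on cell values, so suspiciousness can be recomputed cheaply every time the user marks a value.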

Fault localization: Winter 2003. Performed by Shrinu Prabhakararao, Joey Ruthruff, Orion Granatir, Rogan Creswick, Martin Main, and Mike Durham, along with Margaret Burnett and Curtis Cook. This was a think-aloud study, investigating how visual fault localization techniques affect and interact with the debugging efforts of end-user programmers. The subjects treated fault localization techniques as a resource to be called upon when they had exhausted their own debugging abilities, and it often helped when they turned to fault localization. One key way the fault localization technique helped was to lead them into a suitable strategy. The paper about this study appeared in IEEE VL/HCC'03: "Strategies and Behaviors of End-User Programmers with Interactive Fault Localization."

Surprise-Explain-Reward + attention: Summer 2003. Performed by Shrinu Prabhakararao, T.J. Robertson, Joey Ruthruff, Laura Beckwith, and Amit Phalgune, along with Margaret Burnett and Curtis Cook. This was a controlled experiment, investigating the impact of two interruption styles on end-user debugging. We found several reasons to use negotiated-style interruptions for informing the user about the "surprise," and no reason to use immediate-style interruptions. This study was written up in our CHI'04 paper.