We have spent a lot of time thinking about how to assess the impact that Software Carpentry is having.
We've done
some small studies
and collected a few testimonials,
but it's been small potatoes compared to the 5000 people we taught last year alone.

After some back and forth with a colleague whose work I have admired for years,
though,
I've realized that I've been trying to do this the wrong way.
My training as an engineer taught me that
only controlled, quantitative experiments were "real" science—that
as Ernest Rutherford said,
it's either physics or stamp collecting.
I now understand that there are other rigorous ways to generate actionable insights,
some of which are better suited to our needs than randomized controlled trials.
More than that,
I finally understand what one of my first teachers told me:

Teaching only works well when the teacher is also learning.

Let's start with assessment.
I used to think that we needed something like a medical trial:
give some people training,
don't give it to others,
then measure how much science the treatment group does compared to the control group.
The problems with that are:

1. we don't know how to measure the productivity of scientists, and

2. we don't want them to do the old thing faster:
   we want them to do new things.

There is value in measuring learning outcomes,
but we already know the answer:
some people are learning some of what we teach,
and some of them are using it in ways that change how they do science.
Quantifying the first is possible, but largely pointless:
what we really want to know is not
"how much Git or SQL do participants remember a month later?"
but
"how does knowing about version control and structured data change how scientists think?"
Randomized trials aren't going to show this:
evaluating the adoption of practices by scientists and scientific teams requires a rigorous qualitative investigation.

An example of this is Marian Petre's award-winning paper
"UML in Practice".
UML (the Unified Modeling Language) is a graphical notation for describing software.
It was created in the mid-1990s by amalgamating three earlier design notations,
and while it has been widely adopted in undergraduate teaching,
there has been much less uptake in industry.
As part of her long-running research into how people actually build software,
Petre interviewed 50 professional software developers over two years
to find out which bits of UML they were using, how, and most importantly, why.

The results are fascinating (in the usual academic sense of that word).
Thirty-five of her interviewees don't use UML at all,
but their reasons for not using it are varied.
Those who do use it only use parts,
with varying degrees of formality,
and in idiosyncratic ways.
Questionnaires and performance metrics wouldn't have revealed this,
but it's exactly what we need to know if we want to understand adoption into practice,
knowledge exchange in a new community,
and everything else that actually matters.

I quoted Ernest Rutherford at the outset of this article,
but in checking that quote,
I discovered that I'd been getting it wrong for years.
I thought he'd said,
"All science is either physics or butterfly collecting."
Those of us trained in quantitative methods have also been indoctrinated to look down on their qualitative kin,
but the fact is,
the greatest scientific theory of all time—evolution by natural selection—didn't come out of
double-blind controlled trials.
It came out of butterfly collecting—that,
and careful thinking about why those butterflies were different from each other.

I also quoted my father at the start of this article:
"Teaching only works well when the teacher is also learning."
If whoever is at the front of the classroom isn't learning something about or from their students,
they're probably not actually teaching:
they're just reciting.
I've learned a lot over the last five years about what to teach
from running workshops for ecologists and marine biologists
as well as physicists and astronomers.
I hope to learn even more now that we're starting to teach social scientists,
partly because I need to know more about their methods in order to do my job well,
but also because I believe that they probably won't learn much from me
if I'm not learning something from them.

Which brings us back to assessment.
The real goal is to discover phenomena that we don't yet know to look for.
I could not have predicted that Software Carpentry's innovative teaching practices
would do as much as the content of its lessons to convince scientists to take those lessons seriously;
nor could I have predicted the importance of having people sign up for the class with their labmates,
how compelling seemingly trivial things like tab completion and the 'history' command are to novices,
or that the impedance mismatch between Microsoft Office file formats and version control systems
is probably the single biggest barrier to wider uptake of the latter in the life sciences.
Given how little funding there is for assessment,
we need to spend our dollars and hours where they are most likely to produce those game-changing insights.
And if that means stepping out of our comfort zone and learning something new,
well,
it's no more than we ask of our students every time we say,
"Good morning, and welcome to Software Carpentry."