Teaching basic lab skills for research computing

Links for Summer Interns

Our summer interns started today; our first job is to define exactly what they'll be working on this summer, so it seems like a good time to round up a few links on interesting topics. My apologies for those hidden behind paywalls...

If I said, "I just got a really interesting result in the lab, but I didn't record the steps I took or the settings on the machine," no reputable journal would publish my paper. If I said, "I just got a really interesting computational result," most reviewers and editors wouldn't even ask if I'd archived my code and the parameters I used, or whether that code would run on someone else's machine. Reproducible research (RR) is the idea of making computational science as trustworthy as experimental science by creating tools and working practices that will allow scientists to re-create past results.
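At minimum, re-creating a computational result means knowing the parameters, the exact code, and the environment it ran in. The sketch below is a minimal illustration of that idea, assuming the parameters fit in a dict and the analysis code is available as text. The names `run_record` and `save_result` are hypothetical, not part of any particular tool. Hashing the source is just one simple way to pin the code; a version-control commit ID serves the same purpose.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def run_record(params, source_text):
    """Collect what someone would need to re-create a result.

    `params` holds the settings the analysis ran with; `source_text` is the
    analysis code itself, hashed so a later run can confirm it is identical.
    """
    return {
        "params": params,
        "code_sha256": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def save_result(result, params, source_text, path):
    """Write a result and its run record side by side in one JSON file."""
    record = run_record(params, source_text)
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"result": result, "run": record}, f, indent=2, sort_keys=True)
    return record
```

Keeping the record in the same file as the result means the two cannot drift apart the way a separate lab-notebook entry might.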

Special issue of Computing in Science & Engineering on reproducibility

Data Provenance

The "provenance" of an object is the history of where it came from and how it got here. The provenance of a piece of data is similar: what raw values is it derived from, and what processing was done to create it? Ideally, every piece of scientific software should track this automatically; in practice, very few do, and most scientists don't take advantage of the capability when it's there. That's changing, though, particularly as the emphasis on reproducibility grows.
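To make "tracking this automatically" concrete, here is a toy sketch in which every value carries a record of the raw inputs and processing steps that produced it. `Tracked`, `raw`, `derive`, `lineage`, and the file names are all hypothetical, chosen for illustration; real provenance tools record far more than this (software versions, parameters, timestamps, and so on).

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Tracked:
    """A value together with a record of how it was produced."""
    value: Any
    origin: str  # name of the raw source, or of the step that produced it
    parents: Tuple["Tracked", ...] = ()  # inputs it was derived from


def raw(value, source):
    """Wrap a raw input value, recording where it came from."""
    return Tracked(value, source)


def derive(step: Callable, *inputs: Tracked) -> Tracked:
    """Run a processing step and record which inputs it consumed."""
    result = step(*(t.value for t in inputs))
    return Tracked(result, step.__name__, tuple(inputs))


def lineage(t: Tracked, depth: int = 0) -> str:
    """Render the provenance tree of `t` as indented text."""
    lines = ["  " * depth + f"{t.origin}: {t.value!r}"]
    for parent in t.parents:
        lines.append(lineage(parent, depth + 1))
    return "\n".join(lines)


def mean(xs):
    return sum(xs) / len(xs)


def scale(x, factor):
    return x * factor


if __name__ == "__main__":
    readings = raw([1.0, 2.0, 3.0], "sensor_a.csv")
    factor = raw(10.0, "calibration.txt")
    print(lineage(derive(scale, derive(mean, readings), factor)))
    # scale: 20.0
    #   mean: 2.0
    #     sensor_a.csv: [1.0, 2.0, 3.0]
    #   calibration.txt: 10.0
```

Because the history travels with the value, nobody has to remember to write it down; that is the property that makes automatic tracking more reliable than manual notes.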

The Provenance Challenge: a series of competitions to benchmark provenance tools against one another.

Special issue of Concurrency and Computation: Practice & Experience reporting the results of the first challenge

Science 2.0

Also called "computer-supported collaborative science", this is the idea of leveraging modern web-based collaboration tools to better connect scientists, their experiments, and their results. It encompasses a broad range of ideas, but "social networking for scientists" (connecting researchers according to their interests) is near the core, as is "open science" (the idea of making scientific results public in the same way as open source software or Creative Commons publications).

Compared to professional software developers, most scientists use fairly primitive programming environments, in part because they've been too busy learning quantum chemistry to learn distributed version control, and in part because software developers seem to go out of their way to make tools hard to set up and learn. Lots of people have tackled this from a variety of angles. Unfortunately, a lot of work to date has focused on supercomputing, which is sort of like studying modern medicine by focusing on heart surgeons...