Teaching basic lab skills for research computing

Comments on Course Reorganization

I'm grateful to Lorin Hochstein for sending detailed feedback on my proposal to reorganize the course. His comments are below, with my replies and his counter-replies interspersed; more comments would be very welcome.

Content I think you could drop if you wanted to save time:

Read Data Directly From Hardware. I suspect that this would be relevant to only a small minority of your audience. Especially if you're teaching the course mostly in Python, because this is the sort of thing you should really do in C.

Greg: Agreed; it's mostly to motivate a discussion of binary data handling, which I guess isn't that important to most people either.

Vectorization: I think you could drop this, especially since you have the general "Make a Program Go Faster" section. (Then again, I don't know that much about vectorization...).

Greg: Would a title change make it clearer? This is where I wanted to introduce whole-array manipulations (MATLAB-style operations), which I think many scientists do care about.

Lorin: Ah, I didn't realize this was about MATLAB vectorization (I thought it was related to using an optimizing compiler to take advantage of SIMD instructions). You're right, this is worth teaching. Back when I was a grad student, I was amazed at the orders of magnitude performance improvement you can get in MATLAB by getting rid of loops and recasting your problems as linear algebra operations. There was a grad student I knew at Boston University who was amazing at turning loops into matrix multiplications.
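To make the loop-versus-linear-algebra point concrete, here is a minimal sketch in Python. It assumes NumPy is installed and uses it as a stand-in for MATLAB; the function names and sample data are invented for illustration and do not come from the course material.

```python
import numpy as np


def weighted_sum_loop(weights, values):
    """Compute each row's weighted sum with explicit Python loops."""
    n_rows, n_cols = values.shape
    result = np.zeros(n_rows)
    for i in range(n_rows):
        for j in range(n_cols):
            result[i] += values[i, j] * weights[j]
    return result


def weighted_sum_vectorized(weights, values):
    """Same computation recast as a single matrix-vector product."""
    return values @ weights


# Hypothetical data: 3 samples with 4 measurements each.
values = np.arange(12, dtype=float).reshape(3, 4)
weights = np.array([0.1, 0.2, 0.3, 0.4])

loop_result = weighted_sum_loop(weights, values)
vector_result = weighted_sum_vectorized(weights, values)
```

Both functions should return the same numbers; the vectorized one hands the inner loops to compiled code, which is where the speed-up on large arrays would be expected to come from.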

Content-specific comments:

Clean Up This Code. Great idea for a topic. I'm not sure "cyclomatic complexity" is really that important. I vaguely recall a paper that demonstrated that all complexity metrics correlated very closely with function size, so that "size" is really the most important complexity metric there is.
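For readers who have not met the two metrics being compared, the following is a rough sketch of how each could be measured for Python functions using the standard `ast` module. The branch count is only an approximation of cyclomatic complexity, and `function_metrics`, `BRANCH_NODES`, and the sample source are made up for this illustration. It shows how the numbers are obtained and says nothing about how well they correlate.

```python
import ast

# Node types treated as decision points (an approximation, not a full definition).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.BoolOp)


def function_metrics(source):
    """Return {name: (line_count, approx_complexity)} for each function in source."""
    metrics = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # Count decision points anywhere inside the function body.
            branches = sum(isinstance(child, BRANCH_NODES) for child in ast.walk(node))
            # end_lineno is available from Python 3.8 onward.
            line_count = node.end_lineno - node.lineno + 1
            metrics[node.name] = (line_count, branches + 1)
    return metrics


SAMPLE = '''
def short(x):
    return x + 1

def longer(values):
    total = 0
    for v in values:
        if v > 0 and v % 2 == 0:
            total += v
        elif v < 0:
            total -= v
    return total
'''

sample_metrics = function_metrics(SAMPLE)
```

Running the same function over a real code base and plotting size against the approximate complexity would be one way to check the claimed correlation for yourself.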

Test Some Software. I was surprised to see this so late in the curriculum. One of the hardest things I've found about unit testing is writing code so that it's testable. I would have put it up earlier and used unit tests throughout the problems, which would also illustrate how to use unit tests in the different contexts (e.g., unit testing with image analysis). It would also be nice to see some SE testing concepts like category partition testing, code coverage, and fuzz testing.

Greg: I've tried that, but given most people's instinctive aversion to testing, I found that I had to move it later so that I'd built up enough credibility that they'd listen to me :-) You're right, though, I should move it earlier.

Lorin: I think that if you could do nothing else but reduce people's aversion to testing, the course would still be worth it. ;) An astounding development (to me, anyways) is how "cool" testing has become in the (agile) software engineering community, unit testing in particular. There are all sorts of testing tools and frameworks everywhere, and many TDD advocates. I don't have a clue how to transfer this interest to the scientific community, though.
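As an illustration of two points raised in this thread, writing code so that it is testable and choosing cases by category partition, here is a small sketch using Python's standard `unittest` module. The function `mean_above` and its test cases are hypothetical and not taken from the course; the function is kept pure (no file or hardware access) so that it can be tested in isolation.

```python
import unittest


def mean_above(values, threshold):
    """Return the mean of the values strictly above threshold, or None if there are none."""
    selected = [v for v in values if v > threshold]
    if not selected:
        return None
    return sum(selected) / len(selected)


class TestMeanAbove(unittest.TestCase):
    """One test per input category: empty, none selected, some selected, boundary."""

    def test_empty_input(self):
        self.assertIsNone(mean_above([], 0.0))

    def test_all_values_below_threshold(self):
        self.assertIsNone(mean_above([1.0, 2.0], 5.0))

    def test_some_values_above_threshold(self):
        self.assertAlmostEqual(mean_above([1.0, 6.0, 8.0], 5.0), 7.0)

    def test_value_equal_to_threshold_is_excluded(self):
        self.assertIsNone(mean_above([5.0], 5.0))
```

Saved to a file, the tests can be run with `python -m unittest`. Each test covers one partition of the input space, so a failure points directly at the category that broke.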

Share Work With Colleagues. In the version control lecture, you note that "this lecture will use a GUI like SmartSVN so that students don't need to know how to use a shell in order to use version control." But, don't the students really need to learn how to use the shell to use many of their tools effectively? You have "Using the Unix Shell" as a topic in the course announcement, but I don't see it show up as its own topic.

Greg: I'm planning to take the shell out—while I use it all the time, and think most power users do likewise, it didn't make the cut when the number of lectures was restricted. (And it's hard to convince someone who's used to GUIs that the shell is worth learning: the payoff takes a long time to arrive...) If I cut binary data handling and/or vectorization, this is a strong candidate to go back in.

Lorin: That makes sense... It does take a long time before you're more productive in the shell than the GUI. It's a shame, though.

XML. You could probably drop XHTML safely. I don't think it's that popular in practice, and since most HTML out there is not valid XML, if they tried to use XML-based approaches to do HTML scraping, it would fail pretty quickly. (You really need something like Beautiful Soup to do HTML parsing, but I wouldn't use that to teach XML!).

Greg: Agreed.
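The failure mode described above is easy to reproduce. The sketch below feeds a typical scrap of non-XML HTML to Python's standard XML parser and then to the lenient `html.parser` module, which is used here as a standard-library stand-in and not as a replacement for Beautiful Soup. The sample markup and helper names are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Common in real pages: <br> is never closed, so this is not well-formed XML.
SLOPPY_HTML = "<p>First line<br>second line</p>"
STRICT_XHTML = "<p>First line<br/>second line</p>"


def parses_as_xml(text):
    """Return True if text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Collect the text between tags, tolerating unclosed elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(text):
    """Return the list of text chunks found in an HTML fragment."""
    collector = TextCollector()
    collector.feed(text)
    collector.close()
    return collector.chunks
```

Here `parses_as_xml(SLOPPY_HTML)` is false while `parses_as_xml(STRICT_XHTML)` is true, and `extract_text(SLOPPY_HTML)` still recovers both pieces of text.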

Some of the topics are what I would call "paradigms", and these are going to be hard to fit into a single lecture, such as:

Object-Oriented Programming. I'm torn about this. It's hard for me to imagine teaching the OOP concepts in a single lecture. I think the Liskov Substitution Principle could probably be dropped (how often does it really come up in practice?). I'm also a little fearful because inheritance tends to be overused in practice. I'd also drop the design patterns (I don't think they'll understand OO well enough to appreciate them at this point), and possibly even operator overloading.

Greg: I agree that it's impossible, but everyone asks for it every time the course is taught.

Represent Information. This is a lot of concepts to squeeze into a lecture. If you were to prioritize this, I think database design (and ERD) are more important in practice than some of the UML stuff. RDF can be safely dropped.

Greg: I was going to use Tkinter—yes, it's broken, but if the main goal is to teach event-driven programming, it'll get the idea across without students having to install anything else.

Lorin: Yeah, that sounds reasonable. Tkinter is nice and simple, and it's a great example of the application of first-class functions. It's too bad Python doesn't come with a drag-and-drop GUI builder. When you're starting out with GUI building, it's hard to see the advantage of programmatically defining a GUI layout.
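The remark about first-class functions can be shown with a very small Tkinter program. In this sketch the button is handed a function object to call later, which is the core of event-driven programming. The names `make_incrementer` and `run_gui` are made up for the example, and `run_gui` is deliberately not called here because it needs a display.

```python
def make_incrementer(state, on_change):
    """Return a callback that bumps state['count'] and reports the new value."""
    def increment():
        state["count"] += 1
        on_change(state["count"])
    return increment


def run_gui():
    """Build a one-button window; call this yourself on a machine with a display."""
    import tkinter as tk

    root = tk.Tk()
    label_text = tk.StringVar(value="Clicks: 0")
    tk.Label(root, textvariable=label_text).pack()

    state = {"count": 0}
    # The function object itself is passed as command=; Tkinter calls it on each click.
    callback = make_incrementer(state, lambda n: label_text.set(f"Clicks: {n}"))
    tk.Button(root, text="Click me", command=callback).pack()

    root.mainloop()
```

Keeping the callback logic in `make_incrementer`, separate from the widget code, also means it can be exercised without opening a window.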

Other comments:

Maybe have some content about online resources: where to go to ask a question when you try to apply these and get stuck. StackOverflow, IRC channels, "How to ask questions the smart way", pastebin.com/pastie.com, showmedo.com, etc. (This really wouldn't be a full lecture, maybe just a web page on this?)

Personally, I'm bored to tears sitting in a lecture when there's source code in the slides. I think your ultimate idea of having a self-paced web-based course is a good one. There's lots of reference material out there on these concepts, but worked-out examples are rarer. I think the biggest challenge for someone trying these things will be when their personal problem diverges from the example problem in some way and they don't know how to proceed.

Final question: Have you followed up on previous SC students to see what techniques/practices they adopt after attending the course?

Greg: I did once, but can't use the data (long story); I'll be following up with the students from this past July at Christmas to see what's stuck and what hasn't. Wish I'd been more systematic in the past, but 20/20 hindsight...