Archive for the ‘Community’ Category

Our apologies for the flood of re-posts that some of you may have seen over the weekend: apparently, adding a category to a post, or changing its existing category, makes some blog readers believe the whole post is new. We’re sorry for any confusion or inconvenience the clutter may have caused.

I mentioned yesterday that I maintain a list of books that haven’t been written yet. Partly it’s an exercise in sympathetic magic—if the reviews exist, maybe the books will follow—but it’s also useful for organizing my thoughts about what a programmer’s education should look like. Looking at the books I’ve matched to various topics in the Software Carpentry course outline, there are some distressing gaps:

Given that programmers spend upwards of 40% of their time debugging, there are surprisingly few books about it, and only one collection of exercises (Barr’s Find the Bug).

Material on systems programming (manipulating files and directories, running sub-processes, etc.) is equally scattered. The Art of Unix Programming includes all the right topics, but covers too much, in too much detail, at too low a level. Gift & Jones’ Python for Unix and Linux System Administration shares the first two of those faults (from Software Carpentry’s point of view; I think both are excellent books in general), but uses a scripting language for examples, so it made the list.

Mark Guzdial and others have done excellent research showing the benefits of teaching programming using multimedia, i.e., showing students how to manipulate images, sound, and video as a way of explaining loops and conditionals. That’s half of why the revised course outline includes image processing early on (the other halves being “it’s fun” and “it’s useful”). Once again, most of what I’m familiar with is either documentation for specific libraries, or textbooks on the theory of computer vision, but there are some promising titles in the MATLAB world that I need to explore further.

Performance. It’s been 15 years since I first grumbled about this, and the situation hasn’t improved. Most books on computer systems performance are really textbooks on queueing theory; of that family, Jain’s Art of Computer Systems Performance Analysis is still head and shoulders above the crowd. Souders’ High Performance Web Sites is the closest modern equivalent I’ve found to Bentley’s classic Writing Efficient Programs, but neither is really appropriate for scientists, who need to think about disk I/O (biologists and their databases), pipelining and caching (climatologists with their differential equations), and garbage collection (everybody using a VM-based language). I had hoped that High Performance Python would fill this gap, but it seems to have been delayed indefinitely. (And yes, I’ve looked at Writing Efficient Ruby Code; it has some of what our students want, but not nearly enough.)

There are lots of books about data modeling, but almost all the ones I know focus exclusively on either the relational approach or object-oriented design; only a smattering talk about XML, RDF, and so on. I haven’t yet found something that compares and contrasts the three approaches; pointers would be welcome.

Web programming. There are (literally) thousands of books on the subject, but that’s the problem: almost all treatments are book-length, and this course only has room for one or two lectures. It is possible to build a simple web service in that time, but only by (a) using a cookbook approach, rather than teaching students how things actually work, and (b) ignoring security issues completely. I’m not comfortable with the first, and flat-out refuse to do the second: if this course shows people how to write a simple CGI script that’s vulnerable to SQL injection and cross-site scripting, then it’s our fault when the students’ machines are hacked. This gap is as much in the available libraries as in the books, but that doesn’t make it any less pressing.
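To make the two vulnerabilities named above concrete, here is a minimal sketch in Python. It is an illustration only, not material from the course: the table, the function names, and the sample data are all hypothetical, and it uses only the standard library's sqlite3 and html modules. The unsafe lookup builds its query by pasting user input into the SQL text; the safe one passes the input as a bound parameter. The last function escapes user input before putting it into a page, which is the basic defence against cross-site scripting.

```python
import html
import sqlite3


def make_db():
    """Build a throwaway in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def lookup_unsafe(conn, name):
    """Vulnerable: user input is concatenated straight into the SQL text."""
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_safe(conn, name):
    """Safer: the input is bound as a parameter and never parsed as SQL."""
    query = "SELECT secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


def greeting(name):
    """Escape user input before embedding it in HTML."""
    return "<p>Hello, " + html.escape(name) + "</p>"


if __name__ == "__main__":
    conn = make_db()
    attack = "x' OR '1'='1"
    # The unsafe version returns every row; the safe one returns none.
    print(len(lookup_unsafe(conn, attack)), len(lookup_safe(conn, attack)))
    print(greeting("<script>alert(1)</script>"))
```

Even this much takes a fair amount of explaining, which is the point: a one-lecture treatment that skips it leaves students with the unsafe version.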

Given these gaps, I may drop one or two topics (such as performance and web programming) and either swap in one of the discarded topics or spend more time on some of the core material. I’m hoping neither will be necessary; as I said above, pointers to books in any language that are at the right level, and cover the right areas, would be very welcome.

Most research effort does not produce what is thought of as a traditionally publishable result. That doesn’t mean, however, that nothing was gained by conducting the research. These results, whether they are failures or merely perplexing, can provide valuable insights into open problems and prevent other researchers from duplicating work. We started a journal that focuses on serendipitous (I have no idea why this worked) and unexpected (it seems like this technique should work on this problem but it doesn’t) results. The goal of the journal is to provide a venue where ideas can flow and be debated.

The Journal of Serendipitous and Unexpected Results (JSUR) is an open-access forum for researchers seeking to further scientific discovery by sharing surprising or unexpected results. These results should provide guidance toward the verification (or negation) of extant hypotheses. JSUR has two branches, one focusing on Computational Sciences and the other on the Life Sciences. JSUR submissions include, but are not limited to, short communications of recent research results, full-length papers, review articles, and opinion pieces.

Recently, we launched the beta version of the journal site at http://jsur.org. We would love to get your feedback and, even better, a submission for the first issue.

To get the journal started, we’re looking to collect a large number of short (2-4 page) reports. I know you have something to publish. Please help us spread the word and forward this information to interested colleagues.

Mutation Sensitivity Testing, by Daniel Hook (Engineering Seismology Group Solutions) and Diane Kelly (Royal Military College of Canada)

Automated Software Testing for MATLAB, by Steve Eddins (The MathWorks)

The libflame Library for Dense Matrix Computations, by Field G. Van Zee, Ernie Chan, and Robert A. van de Geijn (University of Texas at Austin), and Enrique S. Quintana-Ortí and Gregorio Quintana-Ortí (Universidad Jaime I de Castellón)

Engineering the Software for Understanding Climate Change, by Steve Easterbrook (University of Toronto) and Timothy Johns (Hadley Centre for Climate Prediction and Research)

“Publish or perish” is the central credo of academic life: despite all the hoopla about the blogosphere and online what-not, the reality for most of us is that if our work doesn’t get into a respected journal or conference, it doesn’t count.

But what do you do if there isn’t a home for your kind of work? People working in scientific computing have been struggling with this for at least a quarter century: while there are many places to submit the results of programs, there are very few places where you can publish a description of the program itself, even if building it took years and required one intellectual breakthrough after another. In contrast, if you design a new telescope, there are at least half a dozen places you could turn.

(This isn’t just a problem in scientific computing, by the way: Software: Practice & Experience and The Journal of Systems & Software are the only academic venues I know for descriptions of real systems, which may be one of the reasons why so much of the software written in academia is crap—there’s just no payoff for doing it right.)

I don’t know if this situation is going to change, but one hopeful sign is a new journal called Geoscientific Model Development (which I found via Jon Pipitone). It’s still early days, but I hope that giving people some kind of credit for talking about how they do things will encourage them to do those things better, and allow newcomers (like us) to get up to speed more quickly.