at OpenHelix

Software testing in bioinformatics

This post at Bioinformatics Zen (Why data testing is important in computational research) got me thinking about the software testing I have done in the past for various databases. I don't actually write code, but I have worked closely with programmers in various situations. Bringing knowledge of the biology to a software development team has been really fun in some cases: trying to explain why the data should or shouldn't be represented a certain way challenges your own understanding of the data.

I'm not going to share all of my software testing secrets (I think you should hire me for that; testing is one of my favorite things to do), but here's one that I have used more than once:

Giant lists of stuff are great for finding odd characters and constructions that break software. One of my favorite sources is MGI Data and Statistical Reports. The Mouse Genome Informatics team has been building databases for decades and has a wide range of data types available.
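One way to turn a giant list into test cases is to scan it for every character that isn't a plain letter or digit and note which entries contain it. The sketch below assumes a tab-delimited file with the symbol in a column you choose; the function name `collect_odd_characters` and the sample rows are hypothetical, written in the style of MGI nomenclature rather than copied from a specific report.

```python
import string
from collections import defaultdict


def collect_odd_characters(lines, column, allowed=None):
    """Map each unusual character in one column to the values that contain it.

    Hypothetical helper: assumes tab-delimited rows and treats anything
    outside ASCII letters and digits as "unusual" unless told otherwise.
    """
    if allowed is None:
        allowed = set(string.ascii_letters + string.digits)
    found = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            continue  # short or malformed row; a real harness might log it
        value = fields[column]
        for ch in sorted(set(value) - allowed):
            found[ch].append(value)
    return dict(found)


# Illustrative rows (made-up IDs, symbols in MGI style), not a real report.
rows = [
    "1\ta\n",
    "2\tApoe<tm1Unc>\n",
    "3\tTg(ACTB-EGFP)1Osb\n",
]

odd = collect_odd_characters(rows, column=1)
for ch, examples in odd.items():
    print(repr(ch), examples)
```

Each character that turns up becomes a candidate test input, and the example values give you a real-looking string to paste into a search box or form.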

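Let's say, for example, your database has genes in it. Not that hard to imagine in bioinformatics tools, of course. But what happens when you search for the nonagouti gene? What is the symbol for nonagouti? It's a. Just the single letter a. Sounds simple. But it can be remarkably hard to find! A one-letter query matches nearly every record in a plain substring search, and search tools that drop common English words may throw "a" away altogether.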
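To make the nonagouti problem concrete, here is a small sketch comparing a few search strategies against a toy gene table. Everything in it is hypothetical: the three records, the stopword list, and the function names are illustrations, not how any particular database implements search. The idea is that an exact, case-sensitive symbol match should be checked first and ranked above looser matches.

```python
# Illustrative records (symbol, name); not an export from any real database.
GENES = [
    ("a", "nonagouti"),
    ("Pax6", "paired box 6"),
    ("Fgf8", "fibroblast growth factor 8"),
]

# Hypothetical stopword list of the kind many text-search tools apply.
STOPWORDS = {"a", "an", "the", "of"}


def substring_search(query):
    """Case-insensitive substring match on symbol or name."""
    q = query.lower()
    return [sym for sym, name in GENES if q in sym.lower() or q in name.lower()]


def stopword_search(query):
    """Word-based search that silently discards stopwords first."""
    terms = [t for t in query.lower().split() if t not in STOPWORDS]
    if not terms:
        return []  # the query "a" vanishes entirely
    return [
        sym for sym, name in GENES
        if any(t == sym.lower() or t in name.lower().split() for t in terms)
    ]


def exact_symbol_search(query):
    """Exact, case-sensitive match on the official symbol only."""
    return [sym for sym, _ in GENES if sym == query]


def ranked_search(query):
    """Exact symbol hits first, then any remaining substring hits."""
    exact = exact_symbol_search(query)
    rest = [sym for sym in substring_search(query) if sym not in exact]
    return exact + rest


print(substring_search("a"))   # every record contains an "a" somewhere
print(stopword_search("a"))    # nothing at all
print(ranked_search("a"))      # nonagouti first
```

A test like this is cheap to keep in a regression suite: if someone later adds stopword filtering or changes the ranking, the single-letter symbol is the first thing to break.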

That's just one example of the things I think about when testing software. I have also had to make sure that superscripts are represented correctly (knockout mouse strains have rather tricky official nomenclature). Check out the phenotypic alleles list. Can your software deal with the dashes, the slashes, the parentheses, the superscripts, and the length of those terms?
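A quick way to check superscript handling is to render plain-text allele symbols to HTML and compare against what you expect. This sketch assumes the plain-text convention of putting the superscripted part in angle brackets (as in `Apoe<tm1Unc>`); `render_allele_html` is a hypothetical helper, and the symbols are illustrative examples in MGI style rather than a verified extract. It escapes everything else, so stray brackets, ampersands, and quotes can't leak into the page as markup.

```python
import html
import re

# Assumption: text between < and > is the superscripted part of the symbol.
SUPERSCRIPT = re.compile(r"<([^<>]*)>")


def render_allele_html(symbol):
    """Convert a plain-text allele symbol to HTML with <sup> elements.

    re.split with one capturing group alternates plain text (even indexes)
    and captured superscript text (odd indexes). Every piece is escaped.
    """
    parts = SUPERSCRIPT.split(symbol)
    out = []
    for i, part in enumerate(parts):
        escaped = html.escape(part)
        out.append(f"<sup>{escaped}</sup>" if i % 2 else escaped)
    return "".join(out)


# Illustrative inputs: superscript, genotype slash, transgene with
# parentheses and dashes, and an unbalanced bracket.
for symbol in [
    "Apoe<tm1Unc>",
    "Apoe<tm1Unc>/Apoe<tm1Unc>",
    "Tg(ACTB-EGFP)1Osb",
    "x<y",
]:
    print(symbol, "->", render_allele_html(symbol))
```

Escaping every piece, including the unbalanced case, was a deliberate choice: a malformed symbol should show up visibly wrong on the page rather than silently swallow the rest of the line as an unclosed tag.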

Another favorite trick of mine is to use the huge lists as input to try to break the software. Or huge genes. Or huge exons. Or teeny ones. I keep a collection of biological oddities in my back pocket for testing situations. I figure if the software can handle the extreme cases, it can probably handle the average stuff as well. But handling the average situation does not mean it can handle the uncommon things.
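The "extreme cases" habit can be automated with a small harness that feeds boundary inputs to a function and records which ones blow up. In this sketch `naive_gc_fraction` is a deliberately buggy, hypothetical function under test, and the edge-case lengths are arbitrary examples of "teeny" and "huge" rather than real gene or exon sizes.

```python
def find_failures(func, labeled_inputs):
    """Run func on each input and collect (label, exception name) for crashes."""
    failures = []
    for label, value in labeled_inputs:
        try:
            func(value)
        except Exception as exc:  # broad on purpose: we want to see anything
            failures.append((label, type(exc).__name__))
    return failures


def naive_gc_fraction(seq):
    """Hypothetical function under test: fraction of G/C bases.

    Bug left in on purpose: it divides by zero on an empty sequence.
    """
    return (seq.count("G") + seq.count("C")) / len(seq)


def extreme_sequences():
    """Boundary inputs; labels keep the report readable for huge strings."""
    yield "empty", ""
    yield "single base", "A"
    yield "short repeat", "GC" * 5
    yield "two million Ns", "N" * 2_000_000


print(find_failures(naive_gc_fraction, extreme_sequences()))
```

The same harness works for any function that takes one input, so a back-pocket collection of oddities can be reused across projects by swapping in a different generator.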