Sunday, June 15, 2008

Connection between Video Games and Bioinformatics?

The Scientist Magazine has a nice piece on one of my favorite people in all of Science - Sean Eddy. In the article, they discuss how Sean is one of those bioinformatics folks who does not just hack together some code to do something but actually writes really good code for his programs. For those of you who do not know, Sean has made a whole collection of software tools for biologists (see his web site here). Perhaps the most widely used is HMMER, which is designed for building and using hidden Markov models. But there are some other good ones he has put out. My favorite is Forester, which was made by Christian Zmasek in his lab and is supposed to be available here, although the site is not working right now (NOTE - Christian has posted a new link for it in the comments). I like this one because, well, it is software for "phylogenomic" analysis.

Anyway - it is a nice article about Sean, especially the parts talking about how his background in video games contributed to his success in bioinformatics. Back to something I said above, Sean is without a doubt one of my favorite people in science. There are many reasons for this but here are a few.

He is very open with ideas.

Once, at a conference, I gave a talk on this bizarre new pattern we had found when we were comparing the genomes of E. coli and V. cholerae. We had found that when we did genome-level alignments of these species there was an X-like pattern (see our paper on this here). Anyway, in the talk I said something to the effect of "we have no friggin idea how these X-like alignments could be generated." And Sean, I think in the question session, pointed out that in another paper of ours we had seen what appeared to be symmetric inversions occurring around the origin of replication, and that these could create the X-alignment. And lo and behold he was right. We got the paper, but in large part it was his push that got us looking at the inversions sooner than we would have.
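For the curious, here is a toy sketch of why inversions that are symmetric around the origin would give an X. To be clear, this is not the analysis from our paper. The circular genome of numbered genes, the origin sitting at index 0, and the function names are all assumptions made up for illustration. The idea is that an inversion spanning k genes on each side of the origin moves a gene at distance d from the origin to distance d on the other side, so every gene ends up either on the diagonal or on the anti-diagonal of a dot plot.

```python
import random


def symmetric_inversion(genome, k):
    """Reverse the segment covering k genes on each side of the origin (index 0).

    The genome is treated as circular, so the segment covers positions -k..k.
    """
    n = len(genome)
    idx = [i % n for i in range(-k, k + 1)]
    segment = [genome[i] for i in idx]
    out = list(genome)
    for i, gene in zip(idx, reversed(segment)):
        out[i] = gene
    return out


def simulate(n=1000, n_inversions=50, seed=0):
    """Apply a series of random origin-centered inversions to genes 0..n-1."""
    rng = random.Random(seed)
    genome = list(range(n))
    for _ in range(n_inversions):
        k = rng.randint(1, n // 2 - 1)  # keep the segment shorter than the genome
        genome = symmetric_inversion(genome, k)
    return genome


def on_x_pattern(genome):
    """True if every gene lies on the diagonal or anti-diagonal of the dot plot."""
    n = len(genome)
    return all(gene == pos or gene == (n - pos) % n
               for pos, gene in enumerate(genome))


if __name__ == "__main__":
    print(symmetric_inversion(list(range(8)), 2))  # [0, 7, 6, 3, 4, 5, 2, 1]
    print(on_x_pattern(simulate()))                # True
```

Plotting position in the original genome against position in the rearranged one gives the X. An inversion that is not centered on the origin, by contrast, puts genes off both diagonals.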

He is very open with science.

Most of Sean's work is on the open side of science. Open Source software. Open Access publications. Open everything. And I should point out that it was a talk by Sean that catalyzed my conversion into an Open Science supporter. I was attending a meeting in Ft. Lauderdale to discuss data release policies for genome projects. This meeting led to the "Ft Lauderdale Agreement" on data release, by the way. At the meeting there were many genomics players, like Eric Lander and Francis Collins, who were pushing for data release policies that were not completely open: genome centers could release data, but there would be constraints on the use of the data so that the genome centers would be the first to publish genome-scale analyses of an organism's genome sequence.

At the time I was working at TIGR and I supported this notion of basically letting people search for a few genes of interest but preventing them from doing genome analyses. And then Sean got up and gave a talk and, well, blew my mind. I am sure I have notes somewhere from the meeting but basically what he said was this: the whole point of the genome projects is to generate genome data for people to do genome-level analysis. So how on earth can we justify preventing exactly the type of analysis that the projects were designed to enable? He was not saying that we should not somehow protect the genome centers. What he was saying was that, for the benefit of science, we need to find a way to allow people to do genome-level analyses immediately on the data. And he also said that the risks of releasing one's data with no restrictions are much smaller than everyone claims. I think he convinced many people that genome centers needed to open up their data release policies a bit more. And he convinced me.

And so I went home from that meeting and decided to release the data from as many of my genome projects as I could, with NO restrictions (e.g., this is what we did with Tetrahymena). And this newfound belief in openness also helped pave the way for my conversion to being an Open Access publishing supporter.

Anyway, glad to see Sean getting positive press. It is well deserved. Now off to play some video games.

As a rather minor player in the "Empire" circles as an undergraduate, I remember my surprise when I later discovered an "Empire" reference on Sean Eddy's web site and made the realization that the Sean Eddy of HMMs and of "Empire" were one and the same.

Last Wednesday I had an exam in a bioinformatics course. Part of the syllabus in that course was a set of nice two-page primers on things like Bayesian statistics and dynamic programming. The primers were published in 2004 in Nature Biotechnology. Too bad there were only a few of them, because they were great for getting some insight into the methods they described. Today I imported those primers into Papers and then I saw that he has done a lot of other interesting stuff as well.