Notes from the life of a computational biologist

Monthly Archives: September 2006

Conferences are good in that they get you thinking about research. Today, I was dwelling on a phrase that’s been going around a lot lately: “data-driven science versus hypothesis-driven science”. Read the rest…

The comment spam has started and some of it is slipping through the excellent Akismet filter. I fail to understand the mentality of someone who believes that posting “calabash, gourd. (5) Autophytes A spinach stew. (18)” is a worthwhile use of their time, but that’s the human race for you.

Anyway, your comments are now accepted automatically only if you already have a previously approved comment, so don’t be concerned if yours takes some time to appear.

…but it’s not going to be relaxing. The last weekend in September means footie finals, and tomorrow it’s the Grand Final rematch between the men in red and white and the men from the west. Their last four matches, including last year’s final, have been decided by a total of 11 points. It’s going to be one of the best, hardest games that you’ll ever see.
Go the Swannies!

Sometimes, I wonder if this blog should be more focused on fewer topics. The major topic is computational biology, but I like to post about other aspects of science and sometimes, non-science stuff too.
On reflection, I believe in empowering the user. So, don’t neglect that section named “categories” on the right of the page. Only want to see the bioinformatics posts? Just bookmark the bioinformatics link. Just want to subscribe to a feed of bioinformatics posts? Just add /feed to the end. In general: wordpressURL/tag/tagname, wordpressURL/tag/tagname/feed.
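If you’d rather generate those links than type them, here is a minimal sketch of the pattern in Python. The function name `tag_urls` and the example address are made up for illustration, and it assumes the tag is already in the form it takes in the URL (e.g. `bioinformatics`).

```python
from typing import Tuple


def tag_urls(wordpress_url: str, tagname: str) -> Tuple[str, str]:
    """Return (tag page URL, tag feed URL) following the pattern
    wordpressURL/tag/tagname and wordpressURL/tag/tagname/feed.

    Assumption: `tagname` is already the form that appears in the URL;
    no escaping or slug conversion is attempted here.
    """
    base = wordpress_url.rstrip("/")  # avoid a doubled slash
    page = f"{base}/tag/{tagname}"
    feed = f"{page}/feed"
    return page, feed


if __name__ == "__main__":
    # Hypothetical blog address, used only as an example.
    page, feed = tag_urls("https://example.org/blog/", "bioinformatics")
    print(page)
    print(feed)
```

Bookmark the first URL to see only the posts with that tag; give the second to your feed reader to subscribe to them.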

A quick reminder – 5 days to send in your posts to Bio::Blogs #4. The October edition will be hosted over at Discovering Biology in a Digital World. Email relevant posts (yours or others’) to bioblogs<at>gmail<dot>com. Same address if you’d like to host a future edition – November is spoken for.

There’s now an enormous number of genome projects – the Genomes Online Database lists 2,175 as of today, of which 429 are complete and published. Yet 10 years after the first completed genome, there are still no standards for storing and annotating genome data. What seems to have happened is that the major sequencing centres and databases have created their own pipelines for genome annotation. These centres are well-funded, possess large-scale computational infrastructure and are known and trusted by the community. Perhaps because of the sheer volume of data and the inability of small institutes to process it, we have come to rely on the output of large centres and assume that by and large, their data are “correct”. I thought I’d highlight a couple of examples which illustrate the danger of these assumptions. Read the rest…