Notes from the life of a computational biologist

Monthly Archives: September 2007

When my mind wanders during a conference talk, I often find that short sentences summarising how I feel about work come into my head. Here’s what I scribbled in my notepad during the ComBio meeting this week:

Information relevant to me is communal, not owned by individuals
I wrote that down when thinking about how biologists interact (or not) at meetings. As a computational biologist, most of my day-to-day problems are programming and software issues. If I need information, I go straight to the Web. However, wet-lab biologists seem to get much more of their information by talking to other biologists. If you’re interested in an organism, a model system or a laboratory technique, or if you just want to get your hands on a plasmid, you talk to someone who works on it. It strikes me that a lot of wet-lab group leaders claim some sort of ownership over the information that their lab generates, resulting in the “so-and-so is the world expert in system X, you should talk to him” mentality. On the other hand, the idea of schmoozing with “the Perl expert” is a tad silly.
That, at least, is my excuse for not networking much at biological conferences ;)

Bioinformaticians need to be free
We (or at least, I) are happiest when working on a range of problems. A main project plus a bunch of fun side projects, with plenty of variety, is the key to a happy bioinformatician. Conversely, getting bogged down for months or years on a single project, particularly one on which you work largely alone with little external input, makes for a sad bioinformatician.
Much has been written about Google’s 20% time, where employees are encouraged to spend 20% of their time on projects that they think are fun, cool and interesting. I think this would be a great policy to implement for bioinformaticians, computational biologists and other researchers in academia.

On the whole, I found ComBio rather disappointing this year. It seemed much smaller than last year, with far less of interest to me. There were perhaps four really great plenary talks, and many of the sessions bore little relation to their titles. ComBio is supposed to be wide-ranging, but I felt that the balance between range and depth was wrong. Hopefully I can attend a more relevant bioinformatics/computational biology meeting in the near future.
On the plus side, I got to spend a week in Sydney.

Heading off to Sydney today for ComBio 2007. It’s the annual meeting of the Australian Society for Biochemistry and Molecular Biology and usually features a wide-ranging program including some bioinformatics, “-omics” and structural biology. I’m looking forward to a couple of talks from overseas guest Peer Bork this year. On a personal note, Sydney was home for 6 years, so I’m looking forward to spending time there.

Previous experience has shown that freely available internet access of any kind is unlikely at ComBio (despite the meeting being held in a convention centre with great wireless facilities), so blogging may be sporadic. Furthermore, as the meeting is in the heart of the Sydney CBD, we are informed that “accordingly, lunches do not need to be provided”. No wonder tourists complain that Sydney is an expensive place. I’m beginning to wonder exactly what’s covered by the registration fee!

What’s N? It’s the fraction of time that bioinformaticians spend obtaining, formatting and getting raw data ready to use, as opposed to analysing it.

There’ll be a longer post on this topic soon. Suffice to say, I’ve spent the last month evaluating the performance of 5 predictive tools that are available on the web. To do this, a test dataset of 200 or so sequences had to be submitted to each one. Each tool generates a score for particular residues in the sequence. The final output, which is what I require to do some statistical analysis, looks something like this:

P08153 114 method 61.74 0
P08153 522 method 82.10 1

where we have a sequence UniProt accession number, a sequence position, the name of the tool used (method), a score and either 1 (a positive instance) or 0 (a negative instance).
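
For anyone wanting to read a file in this format, here is a minimal Python sketch. It assumes the five fields are separated by whitespace and appear in the order just described; the names Prediction and parse_line are made up for illustration and are not part of any tool mentioned here.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """One record of the five-column format (illustrative name)."""
    accession: str   # UniProt accession number, e.g. P08153
    position: int    # residue position in the sequence
    method: str      # name of the predictive tool
    score: float     # score the tool assigned to the residue
    label: int       # 1 = positive instance, 0 = negative instance


def parse_line(line: str) -> Prediction:
    """Split one whitespace-delimited record into typed fields."""
    accession, position, method, score, label = line.split()
    return Prediction(accession, int(position), method, float(score), int(label))


# The two example records shown above.
records = [
    parse_line(text)
    for text in (
        "P08153 114 method 61.74 0",
        "P08153 522 method 82.10 1",
    )
]
```

With the records typed like this, grouping by method or separating positives from negatives for the statistics becomes a one-liner.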

Doesn’t look too hard, does it? Except that:

None of the web servers provide web services or APIs

None of them provide standalone software for download

Most of them don’t generate easily-parsed output (delimited plain text)

Most of them have limited batch upload and processing capabilities

The solution, as always, is to hack together several hundred lines of poorly-written Perl (using HTML::Form in my case) to send each sequence to the server, grab the HTML that comes back, parse it and write out text files in the format shown above.
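
The parse-and-write step can be sketched as follows, in Python rather than Perl and using only the standard library. This is not the code I actually wrote: the HTML layout (a table with one position/score row per residue), the names CellCollector and html_to_records, and the idea of passing in a set of known positive positions are all assumptions for illustration. Every real server returns something different, which is exactly the problem. The submission step is left out, since it depends entirely on each server's form.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows, each a list of cell strings
        self._row = None      # row currently being read, if any
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:     # header rows contain only <th>, so they are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def html_to_records(html, accession, method, positives):
    """Turn a (hypothetical) results table into five-column text lines.

    positives is an assumed set of positions known to be positive instances.
    """
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    lines = []
    for position, score in parser.rows:
        label = 1 if int(position) in positives else 0
        lines.append(f"{accession} {position} {method} {float(score):.2f} {label}")
    return lines


# Made-up server response, standing in for the HTML that comes back.
SAMPLE_HTML = """
<table>
  <tr><th>Position</th><th>Score</th></tr>
  <tr><td>114</td><td>61.74</td></tr>
  <tr><td>522</td><td>82.10</td></tr>
</table>
"""

lines = html_to_records(SAMPLE_HTML, "P08153", "method", positives={522})
```

Multiply that by five servers, each with its own markup and its own quirks, and the line count adds up quickly.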

That’s 3-4 weeks and 500 lines of throwaway code just to get the raw data in the right state for analysis.

When I started out in bioinformatics, I used to joke that at least 50% of my time was spent just obtaining raw data and formatting files. Over the years, I’ve revised my estimate. It’s currently at around 80-90% and I’m not sure that it’s still a joke.

Why is this trend in the wrong direction? When does it become untenable? I’m starting to think that my job title should be “data munger”, not “research officer”. I wouldn’t mind if data munging was perceived as a skill in academia but when funding is results-based, it will only ever be seen as the means to an end. Which it is, of course.

The title of the post says it all, really. The news is slowly making its way through the tech blogs.

I just tried uploading a file saved as PowerPoint (ugh) from OpenOffice. First attempt – server error. Second attempt – success. On the whole, the import preserved formatting pretty well, except for some text formatting (spacing changes, tab stops vanish). Two attempts to display the online slideshow have resulted in a black window with no slides. Oh, and you can only save as zipped HTML.

Early days – let’s hope they iron out the bugs and introduce OpenOffice import/export soon.

No time for bioinformatics blogging just now.
Thought I’d share this time-lapse video with you instead (click “see video” or follow the link). I’m a sucker for time-lapse.
Via BoingBoing | original source