You still have to know how the tools work

(A little indulgence: ruminating on an aspect of the development of science since the late 1980s/early 1990s.)

Steve Caplan has contrasted the experimental biology of his Ph.D. student days with present-day kit-driven science, comparing his early-1990s efforts, manually labouring over a relatively small number of techniques, with today’s students using a much wider range of techniques via kits. He first frets that this might result in them not understanding what the kits really do, but then suggests it might lead to:

‘[…] more time thinking, more time reading, more time figuring out which new assays will be applicable to the research; how to best spend the money to get ‘the most research for the buck.’ There will be less work at the bench, and more thought given to which kits to order and what work to outsource and to whom.’

He concludes that this might be OK, provided students (and researchers!) ‘understand the technical concepts of the science that they carry out.’

When I first read his article, it struck me that a similar thing to the development of kits had happened to computational biology over the same time frame.

A commenter, Boel, has beaten me to raising the essence of the point over there but allow me the luxury of putting my own more extended take on this, explaining the parallel I was seeing and adding a few further thoughts.

Today students and researchers can punch in a URL–or pull up the bookmark–to bring up a (hopefully!) relevant web service, drop their sequence into the textbox, then push the button and just read the results – all with little thought to the conceptual[1] nature of the algorithms analysing their data, should they choose not to.

Once, web service tools didn’t exist. It wasn’t that long ago, either. I’ve seen that transition hands-on, implementing a few web services myself.

During my Ph.D. studies you had to locate the software–sometimes a small mission in itself in the late 1980s–then transfer a copy over to your machine, install it, read the notes that came with the software and the paper describing the method, then experiment with the parameters they described to get what you wanted. Remember, this wasn’t just for computational biologists, but anyone who wanted to analyse their data.[2]

It struck me that perhaps the change mediated via web-interfaced bioinformatics servers is similar to what experimental kits have done for experimental biology. Both take much of the mechanics of doing the work off the researcher’s hands.

Steve goes on to extend this to out-sourcing, using DNA sequencing as his example. Out-sourcing computational biology analysis to people like me[3] might be an obvious parallel.

It’s not difficult to argue that there is a parallel set of good and bad points to these computational web services as to the experimental kits Steve talks about. Both can let you ‘get away’ without knowing the underlying details, should you choose to.

The argument over there is that the real science lies in the decisions made (what methods to use, etc.) and that these ought to be informed by a knowledge of what the methods achieve.

Even though you don’t have to put as much effort into making the things work using services, you still have to understand what the things are doing to your data.

The same applies to all the other tools: the various machinery and instrumentation used.

You need to know at least at a conceptual level what the things are doing.

I would add that this implies a need for good documentation. I personally don’t consider software projects done until the documentation is done. For data analysis methods it has to cover more than the mechanics of how to run the things: what they do with the data – the algorithm (in conceptual terms), the parameters, and so on. All fairly obvious to those delivering the tools, but it must be read and used by those relying on them, too.

I would add, also, that there is a balancing act here: how to best spend the money and time. Time is money, in many ways. If a particular task involves considerable background knowledge, is it really best to spend valuable time learning the background in order to decide what method to choose or how to perform it, or should that be out-sourced, with a specialist taking over?[4]

Steve refers to people working in a wider range of techniques, sometimes going (a little) outside their comfort zone. There is a point, I think, at which the thing gets too ‘wide’ – where part of the decision-making is when to locate a collaborator or service. A standard research decision, nothing new there. But is the ease of the tools encouraging people to push wider than they ought to?

Your thoughts are welcome. Right now I’m thinking that people should invest in talking to specialists in the planning stages to check that the plan is one that they can realistically cover themselves, or even if they are taking the right approach at all, to ensure that ‘gotchas’ don’t catch them out.[5]

Footnotes

[1] I’m not suggesting they need to know the finer points of how they’re implemented, just how they conceptually work.

[2] As many of my readers will know, it’s still somewhat like that behind the scenes for those that make the services, and for those working at the cutting edge – most web services are of the more established methods.

[3] I’m a freelance computational biologist, working as a consultant. I have to admit I prefer to be more deeply involved with the project, given a choice.

[4] A problem I run into sometimes is biologists who have already determined what they consider the appropriate data analysis to be. Sometimes it’s fine, but I find myself asking them to explain the biological problem that they wish to have addressed, so that I might see whether the method would in fact give them what they wish, or if there are better approaches than what they have suggested.

[5] I’ve written before that an impression I get of (many) grant applications is that they are written such that ‘the project will ‘hire someone with appropriate expertise when the time comes’, which assumes that the data analysis portion of the plan is sound.’ A related issue is that this can backfire, with a ‘rescue effort’ needed.

Yes! This is why I’m doing a systems biology masters, and why I’m going to hopefully teach myself more about stats this summer. I want to understand vaguely what I’m doing and be able to justify my choices for data analysis… er, whenever I get to the point of having data.

In addition to stats I’d keep an eye on biophysics, if it’s interesting to you. I keep meaning to write about it: it’s a long story but I’d still like to see it be the next ‘big thing’ for systems biology.

Perhaps I should have put it up at the top, but I wanted to extend your thoughts with some inter-related thoughts of my own (recapping a little):

Previously research was perhaps* somewhat self-restricting, in that you didn’t usually get to explore too far from ‘your patch’, if nothing else because of time constraints. It’s still true, of course (we’re all far too busy), but tools make it easier to dabble with the more established techniques of adjacent fields.

To me this means that research plans need more scrutiny from specialists in the areas outside of the researcher’s immediate speciality, before the research gets underway (before applying for the grants, really).

This, in turn, suggests that a closer eye is needed to when to turn to outsourcing or collaboration. The tools encourage people to just do it themselves, but there’s that wonderful catch-22 that a non-specialist will likely be unaware of ‘gotchas’ or missed opportunities. This, in turn, relates to whether it’s actually a good investment of a researcher’s time to learn enough to make sure this has been covered, which brings us back to needing to talk with specialists…! 🙂

(* For most of us; some exceptional people did all sorts of things…)


Code for Life is the blog of Dr Grant Jacobs, who has wide-ranging interests in science-related subjects, especially genetics, bioinformatics and science communication. To learn more about Code for Life (topics, copyright, comments, writing), see the introductory page. Twitter: @BioinfoTools
