Should biologists study computer science?

Science has published a pair of articles in which it's argued that biology …

As in just about every other field, computers have become an essential part of biological research. Complicated algorithms and analyses that once took months of work by specialists are now available as Web services, and whole areas of study, such as genomics, can be pursued entirely in silico. But, even though most biologists know how to plug in their data and act on the output of computational tools, precious few understand the math that's going on behind the scenes, as most bioscience degree programs don't require computer science or any math more advanced than calculus.

Two papers in the latest issue of Science argue that that's a bad thing. One focuses on the ability to represent the behavior of biological systems through algebraic notation, an area that's badly neglected in both science and math education. The second focuses generally on the incorporation of biology-specific math and computer science into the education system. Both assume that the lack of a math background is a serious problem.

As someone who has done a small bit of bioinformatics and a lot of biology, I'm the perfect target audience for this argument. But in reading the papers, I came away with the sense that the authors have lumped different arguments together in a way that confuses the real issues, so what follows is my attempt to separate them and evaluate each on its own. The first problem arises in the paper from Pevzner and Shamir, which treats the terms computational biology and bioinformatics as two names for the same discipline. That may be how things are commonly understood but, to me at least, these are two separate endeavors.

Bioinformatics, as its name suggests, is primarily focused on the computer-aided analysis of data generated in biological systems, such as genome and gene expression array analysis. We'll get back to that later. Computational biology involves the attempt to model biological systems in silico. These models are informed by the biology, but they don't necessarily require any biological data to be fed to them in order to run.

Obviously, anyone performing computational biology had better have a really good grip on both biology and math/computer science, or they won't be able to tell whether the models are valid, much less fix them if they're not. The same really doesn't apply to bioinformatics. Since there's always real, underlying biological data there, the computation and analysis can be separated: a bioinformatician can simply turn to a biologist and have them sanity-check the results.

Fundamental, tool, or service

So, if we accept that everyone doing computational biology better know both math and biology, that's still not evidence that regular biologists need math. Most regular biologists will end up using bioinformatics tools to align DNA sequences, pick primers, etc. So do they need to know the math behind the tools? I think to answer that, you have to understand where bioinformatics sits on what I'd call the fundamental/tool/service spectrum.

For biologists, fundamentals are things like organic chemistry. All of biology ultimately depends on it, and every biologist should really know something about it—even field biologists, who will have to consider things like how diet and environmental chemicals affect the organisms they study. Bioinformatics really isn't a fundamental; knowing how certain calculations are performed won't necessarily tell you anything about biology.

In fact, it's somewhere between a tool and a service. A tool is something that an average biologist will wind up using that has some biology behind it. So, for example, it's possible to use PCR to amplify DNA samples without knowing anything about what's going into the tubes used for the reactions. But it's much better if a biologist does know; the reactions behind PCR illustrate biological principles, and are essential knowledge for troubleshooting the procedure when it goes wrong (as it inevitably does). In contrast, DNA sequencing, which used to be a tool, has become a service. You put your DNA sample in the mail, and download the sequence data from an FTP account a few days later. The precise details of the actual sequencing reaction that was performed don't really matter.

For the most part, bioinformatics software such as sequence search and alignment tools is analogous to a service: the computer spits out a useful result, and you really don't care how it got there. If you can't get a decent result, your first response isn't to look for someone who knows math; it's to look for someone who's more proficient with the service and knows how to tweak the input parameters. Knowing the math behind things might help with the tweaking, or with appreciating the underlying biology, but it just as well might not; empirical experience can be more useful in many cases.
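
To make that concrete, here's the kind of math running behind an alignment service: global sequence alignment is, at bottom, dynamic programming. A minimal sketch of Needleman-Wunsch scoring follows; the score values (match +1, mismatch -1, gap -2) are illustrative choices of mine, not any particular tool's defaults.

```python
# Minimal Needleman-Wunsch global alignment score: the dynamic
# programming that runs behind sequence-alignment services.
# Scores (match=+1, mismatch=-1, gap=-2) are illustrative, not any
# specific tool's defaults.

def align_score(a, b, match=1, mismatch=-1, gap=-2):
    # dp[i][j] = best score for aligning a[:i] against b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = i * gap          # a aligned against nothing but gaps
    for j in range(1, len(b) + 1):
        dp[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # match/substitute
                           dp[i - 1][j] + gap,      # gap in b
                           dp[i][j - 1] + gap)      # gap in a
    return dp[len(a)][len(b)]

print(align_score("GATTACA", "GATTACA"))  # identical sequences: 7
print(align_score("GATTACA", "GATC"))
```

Tweaking a real tool's input parameters amounts to changing scores like these; the algorithm itself stays hidden behind the service.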

In a worst case scenario, of course, biologists can always resort to contacting someone who has training in bioinformatics, in much the same way as a biochemist might contact an immunologist if they needed to know more about that field.

That's supposed to be helpful?

If bioinformatics is a service, why isn't knowing how to use it as a service good enough? The authors simply state that it isn't, without providing an explanation. "For example, biologists sometimes use bioinformatics tools in the same way that an uninformed mathematician might use a polymerase chain reaction (PCR) kit," they write, "without knowing how PCR works and without any background in biology." Presumably, we're supposed to view that as problematic, although the authors never explain why.

The second paper, from Robeva and Laubenbacher, isn't brilliant about supporting its position, either. It's a sort of plea for education in algebraic modeling, which can apparently be used to represent biological systems. The authors make their argument using a textbook case: the Lac operon, a gene regulation system that appears multiple times in a typical biologist's education, probably starting with AP bio in high school. In modeling terms, however, the Lac operon takes three equations to describe, one of which has the form:

dL/dt = kL βL(Le) βG(Ge) Q − 2 αM(L) B − γL L

They point out that presenting it in Boolean terms leads to a simplified diagram that still captures the essential features of the system. Even when simplified, however, it's not obvious that the model is any more informative than the standard textbook description, which refers directly to the biology. And I'm skeptical that knowing the model would actually improve a biologist's ability to perform biological research.
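
For readers who want to see what a Boolean version looks like in practice, here is a toy model. The variables and update rules below are a simplified sketch of my own, not the exact system from the paper: M is lac mRNA, B is beta-galactosidase, L is intracellular lactose, and external lactose (Le) and glucose (Ge) are fixed inputs.

```python
# Toy Boolean model of lac operon regulation. The update rules are a
# hypothetical simplification for illustration, not the authors' system.

def step(state, Le, Ge):
    M, B, L = state
    return (
        L and not Ge,   # mRNA made if lactose present and glucose absent
        M,              # enzyme follows mRNA with a one-step delay
        Le and not Ge,  # lactose imported if available and no glucose
    )

def steady_state(Le, Ge, state=(False, False, False), steps=10):
    # Iterate the synchronous update until it settles (or give up).
    for _ in range(steps):
        nxt = step(state, Le, Ge)
        if nxt == state:
            return state
        state = nxt
    return state

print(steady_state(Le=True, Ge=False))  # (True, True, True): operon on
print(steady_state(Le=True, Ge=True))   # (False, False, False): repressed
```

The appeal of the Boolean simplification is visible here: with lactose present and glucose absent, the system settles into the "on" state; add glucose and it stays off, with no differential equations required.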

This probably comes across as overly harsh. To a certain extent, the authors have a valid point: the more biologists know about the tools and services they rely on, the better off biology as a whole will be. Informed researchers are more likely to notice anomalous results and to squeeze more information out of their data by better deploying existing tools. And the authors' suggestion that we design mathematics courses to prepare biologists for the problems they'll ultimately face would undoubtedly produce a more appealing math education.

But the same sorts of things can be said about biostatistics and physical chemistry, and it's rare to see either of those made a requirement for undergraduate degrees or doctoral programs. (The former would have been very useful at several points in my research career, and even more useful now.)

If the argument is going to be made that biologists should learn more math and computer science, then those advancing it need to do a better job of explaining what, precisely, biologists need to understand about the computational tools, and why simply knowing how to use the tool isn't good enough. There's also a practical issue at play; the authors argue that these additional computation courses be added to educational programs that are already loaded with required courses. That's pretty difficult to justify, especially given the other deserving topics that are already omitted from most program requirements.

In the end, these papers avoid the key questions: what, specifically, do biologists need to learn, and how will it help them perform their primary function, namely biological research? Without that information, it's going to be impossible to actually design a course that might improve anything.

58 Reader Comments

An excellent article that makes me wistful for my undergrad days in a molecular research lab! When does PCR ever work correctly the first time? Hah!

I think Mr. Timmer really pegs the essence of the fallacy described by the article. I was taught to use bioinformatics tools to develop PCR primers. My professor spent considerable time teaching me the fundamentals of primers and the common characteristics of a good primer. My first, in fact, I designed by hand as a thought exercise. When I did use the bioinformatics software, as Mr. Timmer describes, I had no idea what algorithms were behind it. I did, however, understand the biological principles behind good primer selection, and thus I was able to "sanity check" the machine's output.

I compare this to the paragraph on the lac operon. I knew the biology, but I didn't understand the mathematics behind the model. It was irrelevant, though. Had the program been broken, any competent biologist would immediately recognize bad output. And if the output was good, and the primer simply slightly less than optimal due to an imperfect algorithm, no one would care in the first place.

If my wrench breaks while removing a bolt, I do not need a degree in metallurgy to see that it is broken and go find one that works.

I'm a physicist and work with biologists quite often (actually, I consider myself a biophysicist). In my experience, most biologists blindly trust their tools while being totally ignorant of how the tools work. This works in most cases, but it can also lead to fatal errors (if the tool doesn't work correctly, or is used outside its specification). By tools I mean everything a biologist uses, from classical microscopes to modern fluorescence or Raman microscopes, chemical tools like PCR, and bioinformatic tools (especially statistical analysis). I also consider simulation a tool, in that it helps to interpret or predict results.

I have seen some weird errors in biological work, whether from misinterpreting statistical results "because the program said it is so" or from misreading fluorescence microscopy images because the user did not understand how the marker worked.

What I think biologists need today is a broader education in other sciences, but they will also need a higher degree of distrust of their tools. You can always ask an expert if necessary, but before you can ask, you have to realize there may be a problem.

On the other hand, I see a lot of space in modern biology classes to broaden the education. Just cut some of the classical biology classes; no one needs those huge zoological and botanical classes. All the biology students I know are cursing about the one thousand flowers they have to learn ...

If the argument is going to be made that biologists should learn more math and computer science, then those advancing it need to do a better job of explaining what, precisely, biologists need to understand about the computational tools, and why simply knowing how to use the tool isn't good enough.

I agree with the Science authors on this one.

I work in a bio lab, and having knowledge of how programming works is a must for my day-to-day activities. To process the data we have, you need to know how to write and edit computer code for Matlab as well as some specialty software for the specific instruments we use (both Cambridge and LiCOR). You can't just call tech support and have them do it. They are only good for pointing you in the right direction.

Honestly, I can see CS being much more useful than organic chemistry to biologists. Biologists can treat most chemical processes as black boxes and don't really need to know the details. You can't really hand-wave collecting and processing your data, however.

I'm going to agree with mmnw on this one. As a Ph.D. biophysicist with an undergrad degree in chemistry, I've done much of my work at the interface of biochemistry, chemistry, and spectroscopy. If you're specializing in biology (let's say molecular), not understanding your tool set can lead to some really bad results. Do you have to be an expert in everything? No. But you need to be exposed and savvy enough to realize when you've got an anomalous result; you need to have the ability to either figure it out on your own or know who to look to for help.

Any good scientist working on interfacial projects that span multiple disciplines needs to have a clue about these areas (or a really good team around them). To paraphrase UltraDEC's comments: if my box-end wrench breaks, I need to know whether I need another box-end wrench. Or if I can use an adjustable wrench instead. Or pliers. And what the heck did I do to the wrench that caused it to break in the first place...

And organic chemistry is quite useful to a large population of biologists...

Looking at the issue from a more practical perspective, the better some biologists know and understand how their computational tools work, the more valuable their input can be in refining them or creating new ones. Most surgical tool innovations and designs come from surgeons themselves, who develop new techniques or procedures; practical application drives design.

Knowing the functionality and process of computational tools can also allow biologists to more easily frame the scope and process of a study, identify the most appropriate tool, and refine data collection and input methodologies.

What would a Computer Science for Biology course look like? Would it just teach a programming language? What type of computer science and math topics would it include?

Will that allow biologists to do stuff like

quote:

BuckG said: To process the data we have, you need to know how to write and edit computer code for Matlab as well as some specialty software for the specific instruments we use (both Cambridge and LiCOR).

One of the article's points is that biologists should bone up on their math. I agree with this, but what that has to do with computer "science" I have no idea, because all the programmers I've ever worked with sucked at math. The college requirements--at least where I went to college--for the CompSci degree were less stringent than for any other subject in the sciences.

My field is structural biology (X-ray crystallography). When it began in the '60s, the researchers built their own instruments, developed the theory, and wrote their own algorithms and computer programs. Some of the better-written and less user-belligerent programs began to be shared among other laboratories, and were eventually codified as 'best practices'. When I came through (in the '90s), there were some very sophisticated command-line tools that still required a deep knowledge of the process to use and adjust properly. Now, training graduate students in 2009, the system has (d)evolved to a point-and-click interface with a cookbook series of steps for data reduction, model building, and refinement.

What is the result of this? The good part is that a lot more folks can focus on the hard parts of the field (protein production and crystallization) and not so much the physics/math. That means a lot more researchers can solve structures and so we as a community are seeing a lot more of structural 'space'. Just as Linnaeus was able to discern taxonomic lines by studying a variety of animals, we see distinct trends in structural evolution by sampling widely.

The downside is that a lot of deep knowledge in the field is getting retired along with the senior professors. This depth is essential when the automatic algorithms fail, and a variety of problems can trip up the automated solutions. Unless we can tap (or train) this expertise, another structure isn't solved and that bit of data is lost to the community. And of course these difficult structures tend to be the most interesting ones (complexes and membrane-embedded proteins) at the cutting edge of cell biology.

Bottom line - does everyone need to know the math/physics/computer science behind the pretty face of the GUI? No, but someone better know so the mouse and click folks have someone to ask when it doesn't work.

Great post! As Jay points out (btw it's Dr Timmer, UltraDEC) it's arguably far more important for bioscientists to have a working knowledge of statistics than computer science. It was horrifying to me in my career as a researcher (from graduate school through my postdocs) that I was often the person in the lab with the best knowledge of stats* and what tests to use when. Considering that showing statistical significance is the cornerstone of scientific research, it should be second nature to know that stuff.

*My stats abilities are meagre at best.

Anyway, this is the kind of great science writing that Ars should be proud of. Not like that tripe that Gitlin person used to write.

most bioscience degree programs don't require computer science or any math more advanced than calculus.

It is actually much worse than that. During the years I taught Calculus, I had biology faculty tell me to my face that they did not care whether their majors actually learned any Calculus. The entire point of Calculus was to act as a filter to weed out weaker pre-meds before they reached the advanced bio courses.

Originally posted by monoceros4: One of the article's points is that biologists should bone up on their math. I agree with this, but what that has to do with computer "science" I have no idea, because all the programmers I've ever worked with sucked at math. The college requirements--at least where I went to college--for the CompSci degree were less stringent than for any other subject in the sciences.

Might be where you went, then. My school required 6 classes under the math department and at least 2 CS courses (depending on what you took for project courses) that were all about math for an undergraduate degree (not to mention a pair of physics courses that were nothing more than a whole bunch of calculus word problems).

There were also courses that other science majors had to take that our major barred us from getting credit in because the subject matter was deemed to be beneath us - even as electives.

As for the article, there is a bigger issue here: in what department does computational biology belong?

A number of computer science departments in the US have decided that it belongs in their department, not bio. But then you hire computational biologists and many of them feel out-of-place in a CS department, and would rather be in a bio department (this is why my department has been unable to hold on to any of the comp bio faculty we have ever hired).

This has major ramifications on how the undergraduate program is formed. Which department controls the prerequisites? What building is it housed in (e.g. which students -- CS or bio -- do comp bio students spend much of their time with)?

From my experience as a bio(physics) PhD student, I don't think more CS is that essential, but more of a math background certainly is. For a lot of bioinformatics tools, having a bit of a sense of how they work is certainly a good thing, but it's not something I see being applied much, if at all, except in labs that are advancing the techniques, and those usually have bioinformatics experts. However, I think the bigger issue is with statistical/probability/math knowledge, in that a lot of the newer techniques, especially in fluorescence microscopy, are or can be very quantitative. This amount of quantification leads to new problems of how to interpret the data, how to analyze it, and what the appropriate tests are. Often there is a lack of mathematical/quantitative training here, which leads to missed or bad conclusions. And as Dr JonboyG said, bad statistics is a very prevalent issue.

I'd say that learning numerical analysis should be useful for biologists. Basically, anyone whose work relies on a lot of calculations should understand how those calculations work and how they can be controlled (with respect to error margins and such). I'm pretty sure that anyone doing mechanical engineering had to take those classes, and it seems like bio/chem engineering is becoming more and more dependent on numerical calculations, much like mech engineering.

Computer science concepts, like algorithm analysis, are very useful when you want to make things "go faster". And other concepts like logic analysis are useful for any scientist. (It's usually learned in some way if you're a scientist. Even if you don't take a class in it.)

You could also consider the question in a different way. What more could you accomplish if you could do 10, 100 or 1000 times more calculations in the same time? Unless you understand what your calculations are doing and how the computer is calculating them you'll have a hard time adjusting your work so you do it as efficiently as possible.

monoceros4: When I started CS/CE, we had the most mandatory math of any (engineering) faculty, granted that most theoretical physicists elected to take plenty of extra math classes. But besides them, I'd say CS had more math than any other field, closely followed by electrical engineering. So basically, I'd say it varies from school to school. ;-) (This is in Sweden, though; I really don't know how the curriculum I took compares to an American university.)

Originally posted by Dr JonboyG: it's arguably far more important for bioscientists to have a working knowledge of statistics than computer science. It was horrifying to me in my career as a researcher (from graduate school through my postdocs) that I was often the person in the lab with the best knowledge of stats* and what tests to use when. Considering that showing statistical significance is the cornerstone of scientific research, it should be second nature to know that stuff.

Why must this be mutually exclusive? Why not stats and some basic computer programming?

And to add on to my original post: I did my undergraduate and master's work in chemistry, and my undergraduate degree required one semester of CS. Mind you, C++ never came in handy, but the logic and familiarity behind it have come in handy time and time again in many facets of both the chemistry and biology I've done since.

I think it really depends on context. Some labs and research groups will have a bioinformaticist and a computational biologist on staff (or multiple numbers of each). Others may not only lack them but may not have the funding to go for the more reliable bioinformatics services out there.

There are some cases where it may be entirely necessary for a biologist to know how her or his tools work and know it well. It is probably a good idea in general (even when not necessary) to know too.

But we do have separate bioinformatics programs out there and separate computational biology programs out there too, so in a lot of cases it really isn't necessary to train the biologist in those areas when having a dedicated bioinformaticist on the team helps in general.

Of course, I could be a little biased, since I wouldn't mind being that dedicated bioinformaticist, but I think the point still stands. XD

Originally posted by WalkerWhite:As for the article, there is a bigger issue here: in what department does computational biology belong?

No one has solved this problem.

My undergrad uni has a combined "Bio-X program" which is supposed to help cross-discipline interaction involving bio topics. Really, what is needed is to promote interaction between departments (and even schools), and the logistics sort of work themselves out as professors bounce back and forth between two departments.

That being said, I think the comp bio folks should be in bio - CS seems to be more easily farmable elsewhere than the traditional science disciplines.

From the article:

quote:

There's also a practical issue at play; the authors argue that these additional computation courses be added to educational programs that are already loaded with required courses. That's pretty difficult to justify, especially given the other deserving topics that are already omitted from most program requirements.

I think a number of these topics (traditional bio subjects or CS) could be pushed off to grad school rather than the more crowded undergrad, as they're not as relevant to someone not involved with the research direction. On the other hand, mathematics probably needs more emphasis at the undergrad level.

Originally posted by Dr JonboyG: it's arguably far more important for bioscientists to have a working knowledge of statistics than computer science. It was horrifying to me in my career as a researcher (from graduate school through my postdocs) that I was often the person in the lab with the best knowledge of stats* and what tests to use when. Considering that showing statistical significance is the cornerstone of scientific research, it should be second nature to know that stuff.

Why must this be mutually exclusive? Why not stats and some basic computer programming?

Because there's only so much time in the day, and there are a hell of a lot of other, more relevant things that bioscientists need to focus on first? Coming from a UK perspective, doing a biomedical degree (in my case pharmacology, but most related fields were similar), it wasn't like doing a humanities degree where you only have 8 hours of lectures a week; there really wasn't much more time to fit anything else in. And if you have to pick and choose, I'd say stats are WAY more important than knowing how C# works.

If I'd been great at maths I'd have done a maths or physics degree (or electrophysiology). I went into bio because I don't like spending my days staring at equations.

Originally posted by kdavis: Want a Nobel prize? Write an application where you can enter the genotype of an organism and some info about where it grows up, and have it give you detail on the phenotype.

There are reasonable questions just about what computer science actually is and how much value it has even for software engineers. The big reality with biology is that computer simulation is likely to be a requirement for understanding the system properties of even single cells. I don't know where biology as a whole is in recognizing that need. But Henry Markram is a good example of someone who is exploiting this kind of technology in his ambitious attempt to simulate the operation of the human brain apparently at the level of simulating the activity of individual protein molecules. I can imagine this kind of approach having enough impact to at least justify offering an undergraduate class centered on the technology. But the class would clearly have to revolve around important results arrived at through the techniques. The first step is to have results that are important enough to justify a class. The second step is to worry about general education in the techniques required to obtain and understand the results.

Simply:

- Statistics is important. Learn a lot of it.
- Higher math (i.e., post-calculus work) isn't.
- Don't learn how to write a program like PHYLIP or BLAST, but know the principles behind them. (My grad school program does this with a required six-week course taught by a long-suffering computational biologist.)
- If you're working with big datasets, scripting becomes important. I recommend Python or Perl, as well as possibly R.

And because, as much as you learn, no one has time to be an expert in three separate fields:

- Make a good friend who is a statistician.
- Make another who is a hardcore programmer.

The fact that you can sound like you know what you're talking about when it comes to statistics and programming should help with this.

Originally posted by kdavis: Want a Nobel prize? Write an application where you can enter the genotype of an organism and some info about where it grows up, and have it give you detail on the phenotype.

Sure, like that I guess, but very comprehensive. I hate to go cheap on the logic, but think Jurassic Park in simulation, but choose a realistic target, like taking one cell from a 10 day old human fetus and figure out whether the adult will be intelligent. OK, that's sort of spooky and will get you shot by a right-to-lifer. Still, you can use it to simulate which of the various chemo agents will best attack a cancer cell and minimize side effects. The limit case is to start with a proposed DNA strand for a GM of any species and get the characteristics of the phenotype without waiting several years.

Originally posted by dnjake:There are reasonable questions just about what computer science actually is and how much value it has even for software engineers. The big reality with biology is that computer simulation is likely to be a requirement for understanding the system properties of even single cells. I don't know where biology as a whole is in recognizing that need. But Henry Markram is a good example of someone who is exploiting this kind of technology in his ambitious attempt to simulate the operation of the human brain apparently at the level of simulating the activity of individual protein molecules. I can imagine this kind of approach having enough impact to at least justify offering an undergraduate class centered on the technology. But the class would clearly have to revolve around important results arrived at through the techniques. The first step is to have results that are important enough to justify a class. The second step is to worry about general education in the techniques required to obtain and understand the results.

We're not even slightly close to accurately modeling what goes on in a single cell.

The second focuses generally on the incorporation of biology-specific math and computer science into the education system. Both assume that the lack of a math background is a serious problem.

Bingo.

There's a huge lack of comp sci requirements (even just fundamentals) for a lot of majors out there. Computers are ubiquitous; every person needs a fundamental computer education, and should even go so far as to learn rudimentary programming in order to automate repetitive tasks.

As for the math training, the bio sciences are catching up, but it's coming as a shock to students when they start taking a class and the first thing the teacher reviews is math. I took an Environmental Science course, and all the yuppie/hippie kids thought it was going to be about "reduce, reuse, recycle". Then the first labs we had were about trudging around doing slope calculations, percent changes in before/after product, ratio analysis, etc. Half the class dropped, including the "know it all" hippie kid who kept interrupting the teacher to talk about the perils of the rainforest.

The bio sciences are still open to a lot of theory and exploration, but they need more math to help test theories, and more computer work to do it more efficiently.

I have an undergraduate background in Neuroscience, but recently went in a different direction where I'm in contact with various technically oriented people (Infosec). Based on this and various readings from individuals with a technical rather than bio background, I often feel that the reverse is true as well. Individuals with a technical background often display a stunning lack of comprehension of the basics of neuroscience, even when deeply intertwined with the field.

There seems to be a disproportionate number of biophysics people replying, so obviously they are going to overstate their case that mathematics and computer science is going to be needed by biologists.

While I don't deny that a better understanding of computers and math in general is a good thing (I had a PI E-mail me once asking for my E-mail address, I kid you not), I agree with the author that computers are tools, and having a deep knowledge of how they work is not essential. Once complex biochemical pathways can be properly modeled, then I might want to know why the model works, but we are a long way off from that, and in the meantime I frankly have better things to do in the lab.

I think your comments actually reinforce some of the arguments they are making, and I don't think you really read the Robeva and Laubenbacher article very carefully. The Lac operon model equation you critique is the differential equation version; part of their point was that Boolean models are simpler and often capture the essential features of a system. If biologists want to understand how such operons are linked together on a larger systems scale, they will have to learn some modeling - drawing a picture just doesn't cut it at some point.
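For readers who haven't seen one, here is a minimal sketch of what a Boolean model in this spirit might look like. To be clear, this is a two-variable toy of my own, not the model from the Robeva and Laubenbacher article; the update rules just encode "the operon is ON when glucose is absent and lactose is available":

```python
# Toy Boolean sketch of the lac operon (NOT the article's model).
def step(state, glucose_ext, lactose_ext):
    """One synchronous update of a 2-variable Boolean network.

    state = (M, L): M = lac genes expressed, L = intracellular lactose.
    """
    M, L = state
    M_next = (not glucose_ext) and (L or lactose_ext)  # induction, catabolite repression
    L_next = lactose_ext and (M or L)                  # uptake requires permease (M)
    return (M_next, L_next)

def run_to_fixed_point(state, glucose_ext, lactose_ext, max_steps=10):
    """Iterate the update rule until the state stops changing."""
    for _ in range(max_steps):
        nxt = step(state, glucose_ext, lactose_ext)
        if nxt == state:
            return state
        state = nxt
    return state

# Lactose present, no glucose: the operon settles into the ON state.
print(run_to_fixed_point((False, False), glucose_ext=False, lactose_ext=True))
```

Even at this cartoon level, iterating the rules reproduces the qualitative behavior (induction without glucose, repression with it), and linking several such modules together is exactly the systems-scale exercise a picture alone can't support.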

So, for starters, I have to say that this is one of the best discussions I've ever seen attached to one of my stories. Thanks, all, for participating.

quote:

Originally posted by hamptonio:I think your comments actually reinforce some of the arguments they are making, and I don't think you really read the Robeva and Laubenbacher article very carefully. The Lac operon model equation you critique is the differential equation version; part of their point was that Boolean models are simpler and often capture the essential features of a system. If biologists want to understand how such operons are linked together on a larger systems scale, they will have to learn some modeling - drawing a picture just doesn't cut it at some point.

Ugh, so the caption presents them in reversed order from how they appear in the figure? That makes sense now that you point it out, but man, that's terrible presentation.

Anyway, I'd say your larger argument is no better defined than that of the authors. Biologists understand things by understanding the biology; a picture is just a means to that end. A model may be a different means to that end.

But you appear to be arguing that we've reached the point where it's fundamentally impossible to understand the biology without a model - which is a rather radical claim, and one that really needs to be supported. Neither you nor the authors of these papers do anything of the sort.

My field is structural biology (x-ray crystallography). When it began in the '60s, the researchers built their own instruments, developed the theory, and wrote their own algorithms/computer programs. Some of the better-written and less user-belligerent programs began to be shared among other laboratories, and were eventually codified as 'best practices'. When I came through (the '90s) there were some very sophisticated command-line tools that still required a deep knowledge of the process to use and adjust properly. Now, training graduate students in 2009, the system has (d)evolved to a point-and-click interface with a cookbook series of steps for data reduction, model building, and refinement.

"point and click" indeed. Kind of like having an automatic transmission on the car. It still works better when you know WHAT it's doing inside.

quote:

deesee - Tools like BLAST are useless unless you know what the right questions are. And there are only so many hours in the day...

That's what half of learning is about: what the right questions are. And no, although I've worked with computers for the last 40 years or so, I don't consider myself to be one of those "condescending nerdy cheeto-stained computer nerds"; there really wasn't anything in CS in college back in the dino-days.

I agree with the sentiment here- math and CS are hugely helpful, particularly if you work with some newer technologies. I am a PhD student in pharmacology, but my undergrad background was math & computer science, not biology. This has been a great advantage because they teach you the biology, but not the math/CS you may need. However, the biology classes are clearly the first priority, and I would not want to spend an extra few years in grad school to learn linear algebra and C++.

Having worked in brain imaging, I have found that even some experts in the field are fuzzy on how their tools work, since the statistical theories behind them are quite complicated. I've had similar experiences with GeneChip analysis.

What is shocking is that statistics is not required in any way for some PhD programs (even as a prerequisite). I think stats should be added to curricula well before math or CS. These papers do raise some important points, but what they suggest cannot replace more urgent needs (stats).
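The GeneChip example is a good one, because it hides one of the most basic statistical traps: thousands of simultaneous tests. A toy simulation (all numbers invented; under the null hypothesis p-values are uniform on [0, 1]) shows why an uncorrected p < 0.05 threshold is meaningless at that scale:

```python
# Toy multiple-comparisons demo: 5,000 "genes" that are pure noise,
# tested at p < 0.05 with and without a Bonferroni correction.
import random

random.seed(0)
n_tests = 5000
alpha = 0.05

# Under the null hypothesis, every p-value is uniform on [0, 1].
pvals = [random.random() for _ in range(n_tests)]

uncorrected = sum(p < alpha for p in pvals)        # roughly alpha * n_tests hits
bonferroni = sum(p < alpha / n_tests for p in pvals)

print(f"'significant' genes, uncorrected: {uncorrected}")
print(f"'significant' genes, Bonferroni: {bonferroni}")
```

With no real signal at all, the uncorrected threshold still flags a couple hundred "significant" genes; the corrected one flags essentially none. That is the sort of thing a required stats course would make second nature.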

This is one of those situations that I am torn on. At heart, I believe that the better you understand an instrument and the system you are measuring with it, the less likely you are to fuck up the measurement. On the other hand, it is not necessarily a good use of limited time.

Then I consider some instruments I commonly use: digital oscilloscopes, optical spectrum analyzers, wave meters, auto-correlators, etc. You know, I understand these instruments only at their most basic level, so I shouldn't throw stones. So, you might ask, how do I know when and where I can trust the results these instruments report? The answer is: by being bloody careful. I measure the same thing in multiple different ways, so that I can be reasonably certain any artifacts will be exposed. Even so, I still worry.

But the point is, I am probably in a much better position than most biologists to understand my instruments. I know advanced mathematics, I can program, I know some electronics, DSP, etc. Yet I cannot in all honesty say that I *know* how a Tektronix scope processes its input and displays a waveform.

I think the comments stating that one must first understand what the relevant biological questions are, are absolutely correct. However, I still feel that a better mathematical understanding would be beneficial. (I'll readily admit that I am biased, coming from an engineering background to biology and working in a very quantitative lab.)

My impression, from the slice of biology around me, is that a lot of the new tools allow measurements to be made from individual units where previously only an ensemble average could be measured. (In my case it is individual trafficking events rather than an ensemble, but this also comes up in taking measurements from individual cells where previously there were only averages over huge numbers of cells.) What I want to argue is that with all of these individual measurements we have an entire distribution, and while people often grasp at the mean and ignore the rest, there is a lot of information to be gained from these distributions if one can analyze them correctly. This is where I feel there needs to be a better understanding of probability and statistics, so that people can better extract information from these measurements.
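A quick illustration of the distribution-versus-mean point (the numbers here are invented): two cell populations can share the same ensemble mean while one is homogeneous and the other splits into low and high subpopulations, so averaging throws away exactly the interesting part.

```python
import random
import statistics

random.seed(1)

# Hypothetical single-cell measurements: same ensemble mean (~10),
# but one population is homogeneous and the other is bimodal.
homogeneous = [random.gauss(10, 1) for _ in range(1000)]
bimodal = ([random.gauss(5, 1) for _ in range(500)] +
           [random.gauss(15, 1) for _ in range(500)])

for name, data in [("homogeneous", homogeneous), ("bimodal", bimodal)]:
    print(f"{name}: mean={statistics.mean(data):.1f} sd={statistics.stdev(data):.1f}")
```

The means are indistinguishable; only the spread (or better, a histogram) reveals that the second "average cell" doesn't actually exist.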

quote:

Originally posted by Dr. Jay:But you appear to be arguing that we've reached the point where it's fundamentally impossible to understand the biology without a model - which is a rather radical claim, and one that really needs to be supported. Neither you nor the authors of these papers do anything of the sort.

This is a fascinating question that I need to think more about, but I feel this may already be true. Almost all results are interpreted through a model. For example, in genetics, when looking at the effects of two mutations, one looks at whether the effects are additive or not, based upon a model, to determine if the two proteins are in the same pathway or in parallel pathways. I think there are heuristic models like this for many, if not most, things in biology. The question then becomes: when is it worth it to move to a more complex computational/mathematical model?
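That additivity heuristic is itself already a tiny quantitative model. One common way to formalize it (the fitness values below are invented; I'm using the multiplicative convention, where independent mutations multiply relative fitness):

```python
# Toy version of the additivity heuristic for double mutants.
# Under a multiplicative (independence) model, the expected
# double-mutant fitness is the product of the single-mutant
# fitnesses, relative to wild type = 1.0.
def epistasis(fit_a, fit_b, fit_ab):
    """Deviation of the double mutant from the multiplicative expectation."""
    return fit_ab - fit_a * fit_b

# Independent (parallel) pathways: double mutant is as bad as predicted.
print(epistasis(0.8, 0.5, 0.40))  # -> 0.0
# Same pathway: double mutant no worse than the worse single mutant.
print(epistasis(0.8, 0.5, 0.50))
```

A deviation near zero is read as independence; a positive deviation like the second case suggests the genes act in the same pathway. The jump from this heuristic to a full computational model is a difference of degree, not kind.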

And if you have to pick and choose, I'd say stats are WAY more important than knowing how C# works.

I think, in general, the papers make a good argument for understanding some of the underlying math. I think you make an excellent argument for understanding the statistics (which, really, are also math).

So far as the CS goes... well... to be honest, the only portions of it that I feel anyone needs to grasp are the mathematically oriented ones. In other words, a basic understanding of how algorithms (language-neutral) are formulated and applied to solve problems. A CS degree, for instance, delves far more deeply into specific problems and platform issues, not to mention various languages as tools of expression, as well as HMI and other fields that have no relevance to a biologist.

I don't see why a compressed "how to think programmatically" course couldn't be developed. Isn't this all that would really be needed? Just a general study on the very, very basic roots underpinning CS, essentially the thought patterns needed in approaching and in turn expressing a problem you will be asking a computerized system to help you solve?

Whether we're talking Matlab, raw C# or Java or some other language, custom tools, or anything else... unless the software uses a very heavy natural language engine, this is the part that doesn't change. You don't even need to teach a language per se to teach this, and it might even be better not to, other than just as a rudimentary poke at the idea of syntax being important to a computer.
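As one example of the kind of language-neutral exercise such a course could use: state a problem in words ("report the fraction of G and C bases in a DNA sequence"), then express the scan-count-divide algorithm in whatever notation is handy. It happens to be Python below, but nothing about the algorithm is Python-specific (the sequence is made up):

```python
# "Given a DNA sequence, report the fraction of G and C bases."
# The algorithm (scan, count, divide) is the same in any language.
def gc_fraction(sequence):
    gc = sum(1 for base in sequence.upper() if base in "GC")
    return gc / len(sequence)

print(gc_fraction("ATGCGCATTA"))  # 4 of 10 bases are G or C -> 0.4
```

The point of the exercise isn't the syntax; it's the habit of decomposing a question into steps a machine can carry out.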

Part of the problem whenever you involve most computer scientists in a discussion about our field is that many are enamored with the intricacies of the applied field, or even of the deeper science. With good reason (sometimes)! Languages, garbage collection, HMI, data types, stack, pointers, parallelism, machine code, big O, compilers, regular expressions, overflow, mutable algorithms, trees, etc etc...

None of this matters, not really.

It's a fundamental method of thinking and in turn an understanding of how to formulate those thoughts into something to be expressed in a structured language a machine can interpret.

Everything else is just particulars. If you can grasp the fundamentals, any of the rest can be learned and applied for the situation you need it in. You'll never teach every single possible tool someone might or might not be put in a position to use, but you certainly can teach them so that they can easily adapt to and grasp the particulars of that tool.

Personally, rather than the normal entry-level Java/C#/C++/Basic "CS for non-CS majors" course (basically: here's a variable, here's an operand, here's a function, now write some throwaway code; that was a waste of time, and aren't you glad you didn't go into CS?), I'd really think a higher-level, theory- and thinking-focused course would be the way to go. Something that didn't try to teach the basics of programming in a particular language, but instead tried to teach the basics of "what it's all about," possibly using some light examples from various languages. Specific algorithms can be taught in language-neutral notation, without getting tied up in the particulars and issues of an actual compilable programming language.

In essence, more a philosophy of programming/computer science and programmatic thinking course.

The best part of any course of this nature is that its applications are so wide-ranging, and its usefulness extends beyond computer languages or tools.