An evo-devo geek's scientific meanderings

the nature of science

Textbooks may portray science as a codification of facts, but it is really a disciplined way of asking about the unknown. — Andrew Knoll, Life on a Young Planet

Some books change your life. When I was 12 or 13 or thereabouts, SJ Gould and others’ Book of Life rekindled my interest in prehistoric life, introduced me to the Cambrian explosion, and opened my eyes to a whole new worldview. It’s one of the reasons I hold a degree in evolutionary biology.

Life on a Young Planet was not a life-changer, precisely. That’s not why I love it to pieces. By the time I read it, I’d gained an appreciation of just how complex and full of uncertainty natural science was, and the book was permeated by an awareness of this complexity. Also, it was simply beautiful writing.

(I can’t emphasise the importance of good writing enough. I’ve read too many papers and books [Crucible of Creation and The Plausibility of Life, I’m looking at you] that had good information but were so atrociously written that I nearly put them down despite being fascinated by their subject.)

Last month, the author of Life on a Young Planet, Harvard professor Andy Knoll, came to visit my university. I was practically bouncing with excitement from the moment I saw his name on a newsletter. He gave four lectures in total; until the very last one, I actually contemplated getting my copy of the book signed. Or, to be a fangirl and a nerd, my printout of his lovely biomineralisation review. (I still can’t decide if I made a mistake. Damn, I didn’t even ask a stupid question. Four lectures, and I just sat there and drooled over my notebook.)

Knoll is nearly as good a speaker as he is a writer. He doesn’t have the liveliest voice and speaks quite slowly, but if you can get past that, his lectures are really good. (I’m glad of that; I really don’t like losing my illusions!) They are solidly built, and their logic is easy to follow.

Let me put it this way – Andy Knoll is an excellent storyteller.

That got me worrying, because I’m a sceptic and (truth be told) a little bit of a cynic at heart, and because over the years I’ve done a lot of navel-gazing about belief and knowledge and conviction. I have a tendency to grow suspicious when I feel too certain about something.

Am I – are we – too often blinded by good storytelling? How often do we get so enamoured of good ideas that we try to force them on situations they don’t fit? And how often do we doubt something just because it sounds too neat?

Here’s the specific example from the Knoll lectures that made me think of this. Knoll is a champion of the oxygen + predation explanation of the Cambrian explosion. (I didn’t realise he was involved in that paper until it came up in the lectures…) He is also an advocate of a similar explanation for the diversification of single-celled eukaryotes 250 million years before the Cambrian. He convinced me well enough, but then I immediately thought – really? Is it really that simple? Does one size really fit both events?

I often take note of these “pet ideas” as I read scientific literature. A group of phylogeneticists uses microRNAs to tackle every tough problem ever. A palaeontologist interprets every squishy-looking Cambrian weirdo as a mollusc. Researchers in the biomineral field look for slushy amorphous precursors to crystalline hard parts everywhere. (Remember, all generalisations are false ;))

Just to be clear: I’m not at all saying that being a “pet idea” automatically makes something wrong or suspicious. For instance, the hunters of amorphous biominerals have some good theoretical reasons to look, and they often do find what they’re looking for. Likewise, I’m impressed enough with Andy Knoll’s pet hypothesis about the Cambrian that I’ve rethought my own pet ideas about the subject.

I’m also not accusing these people of being closed-minded. Going back to Knoll, IMO he demonstrated ample healthy scepticism about his pets during his post-lecture Q&A sessions. (Which makes me a bit less nervous about the neatness of his stories.)

Someone better versed in the philosophy and sociology of science could probably write a long treatise involving paradigms and confirmation bias and contrariness here. I’m even less of a philosopher than I am a geologist, so I think I’ll leave the deeper insights to those who have them.

Meanwhile, I’ll continue to be a fan of Andy Knoll and appreciate a good scientific story. So long as I remember to look beneath the surface – both of good stories and of my own suspicion of them…

When I was little, I wanted to know everything. At age six or seven, I could whip out an explanation of why the sun shines, nuclear fusion and all, and by the time I hit my teens, I’d memorised the basic properties of a couple of hundred dinosaur genera, everything cetacean, and every planet in the solar system (back when that still included Pluto :-P) My family members are still a bit surprised if a science question comes up over the dinner table and I answer “I don’t know”.

During my undergrad years, specialisation was my nightmare. While I could, I took classes in maths, programming, geology and something vaguely philosophy of science-ish in addition to my compulsory credits in biology. My BSc is called evolutionary biology, but the actual subjects I studied for it range all the way from biochemistry to ecology.

But you know what?

After 2+ years of working on a single part of a single animal, I finally feel like I know something.

The real world was bound to give an obsessive learner and insufferable know-it-all like me some big shocks. The first was venturing onto the internet, and getting a near-infinite pile of information dumped on me by Google. That experience might have been why I lost most of my interest in dinosaurs – there just seemed to be too much to learn. That’s a hard pill to swallow for a young know-it-all!

And then I went to university, and met the scientific literature. Even more than first googling dinosaurs, that made me realise that I knew nothing. Ever since then, I’ve never quite felt secure about my grasp of any field. There were always papers I hadn’t read, ideas I didn’t really understand, facts I hadn’t included in my reckoning. I often feel like I can’t form an opinion on anything, because there’s a part of a discussion I’ve simply missed or didn’t pay enough attention to.

No, I’m nowhere near satisfied with my current knowledge of my own area (now that I have an area I can call my own). I don’t think I’ll ever be, and if it happens it’s probably a good sign that I should read more. But when I look at my animals, when I have to tell others about my work, I feel… comfortable. This is my stuff, and while I may not know everything, I know some things in an intimate way only close study can give you. It is an immensely satisfying feeling. And it makes me think that perhaps, specialisation isn’t such a bad thing after all.

Remember how I complained that people often seem to forget the scientific method when it comes to transcriptomics? Well, I’m glad to say some scientists still remember those all-important steps between data and conclusion. When looking at the predicted functions of the genes active in these cute little baby worms* during the first three days of their lives, Kenny and Shimeld (2012) not only compared their data to a “background” dataset from a well-studied animal, but also did statistical tests to confirm that the differences they saw were real, and discussed several possible causes for them, including those that weren’t biologically interesting at all, like limitations of their methods.

In the end, they couldn’t really draw strong conclusions from this particular part of their analysis, but the best thing is they sound perfectly aware of the difficulty and careful not to go too far in interpretation.

Folks, this is how you should write a transcriptomics paper. Not look at a few out-of-context numbers and concoct a story around them.
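To make "compare against a background and test the difference" a bit more concrete, here is a minimal sketch of one way to do it. It uses a two-proportion z-test, which is my pick for illustration and not necessarily the test Kenny and Shimeld used. The function name and every count in the example call are hypothetical, made up purely to show the mechanics.

```python
import math


def two_proportion_z_test(hits_a, total_a, hits_b, total_b):
    """Two-sided z-test for a difference between two proportions.

    hits_a / total_a is the share of sequences with some annotation in
    dataset A; hits_b / total_b is the same share in the background B.
    Returns (z, p_value), with p_value from the normal approximation.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("totals must be positive")
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    # Pooled proportion under the null hypothesis of "no real difference".
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("proportions are all 0 or all 1; test is undefined")
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL counts, for illustration only: 120 of 1,000 sample sequences
# carry the annotation, versus 900 of 10,000 in the background dataset.
z, p = two_proportion_z_test(120, 1_000, 900, 10_000)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A small p-value only says the two shares probably differ. It says nothing about why, which is where discussing boring causes like method limitations comes in.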
***

The paper is a pretty standard specimen of the kind of next-generation sequencing study that’s been proliferating in recent years. It looked for genes that make the shell of an oyster by sequencing all the RNA expressed in the tissues that build the shell.

This sort of study generates an awful lot of data, which makes it unfeasible to analyse by old-fashioned human brainpower and maybe a little statistics alone. Anyone happy to manually trawl through nearly 77 thousand sequences to find the interesting ones is a lot crazier than anyone I know… and anyone willing to study each of them to figure out what they do is probably a god with all of eternity on their hands.

One of the tools researchers use to quickly and automatically find interesting patterns in such large datasets is the Gene Ontology (GO) database. GO uses a standard vocabulary to tag protein sequences with various attributes such as where they are found within the cell, what molecules they might bind to, and what biological processes they might participate in. Such tags may be derived from experiments, or often from features of the sequence itself. For example, proteins often contain specific sequence motifs that tell the cell where to send them. Assuming that related proteins have similar functions in different organisms, GO annotations derived from one creature can be used to make sense of data produced in another.

So, Joubert et al. took their 77 thousand sequences, translated them into protein, and ran them through a program that finds similar sequences with existing annotations from GO or similar databases. They got some numbers. X per cent of the proteins are predicted to have some metabolic function, Y per cent of them bind to something, and so on. Then they went on to pull hypotheses out of these numbers. And that’s where they really should have remembered Scientific Method 101.

The part where I started looking askance at the paper is where they get excited about the percentage of predicted ion-binding proteins in the dataset. Of course, oyster shells are largely made of ions (calcium and carbonate ions, to be precise). But, importantly, “ions” are an awfully broad category, and they are essential for many of the everyday workings of any cell. Even calcium ions, which make up one half of the mineral component of the shell, play many other roles. Muscle contraction, cell adhesion, conduction of nerve impulses – all involve calcium ions in one way or another, and lots and lots of proteins either regulate or are regulated by calcium signals via binding the ions. So the question arose in my head: is it really unusual for 17% of “binding” sequences in a sample to bind ions?

This is not the first time I’ve seen people who publish high-throughput sequencing data fail to ask that sort of question. They just report the pattern they saw, and try to interpret it. But patterns can only be interpreted in context. To tell whether a pattern is unusual, you must first know what the usual pattern looks like!

In fact, finding this many ion binding proteins doesn’t seem extraordinary at all. SwissProt, the awesome hand-curated protein sequence database, has a really handy browsing tool where you can get a breakdown of a selected set of sequences by GO categories. Curious, I had a quick look at all the 20 244 human proteins in SwissProt. (I picked humans because I’d probably be hard-pressed to find organisms with more GO-annotated proteins.) Out of the 11 674 that GO classifies under “binding”, 3934 are thought to bind ions. That’s almost 34%. This almost certainly has nothing to do with us having bones made of ions – SwissProt includes proteins from all tissues.
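The back-of-the-envelope check above is easy to reproduce. This little sketch just redoes the arithmetic with the counts quoted in the text and sets the result next to the 17% reported for the oyster; the variable and function names are mine, and nothing is queried from SwissProt itself.

```python
# Counts quoted in the text for human proteins in SwissProt.
HUMAN_BINDING = 11_674      # proteins GO classifies under "binding"
HUMAN_ION_BINDING = 3_934   # of those, proteins thought to bind ions

# Share of "binding" sequences that bind ions in the oyster dataset,
# as reported in the paper under discussion.
OYSTER_ION_SHARE = 0.17


def share(part: int, whole: int) -> float:
    """Return part/whole as a fraction, refusing an empty denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


human_ion_share = share(HUMAN_ION_BINDING, HUMAN_BINDING)

print(f"human background: {human_ion_share:.1%} of binding proteins bind ions")
print(f"oyster dataset:   {OYSTER_ION_SHARE:.0%} of binding sequences bind ions")
print(f"background / oyster: {human_ion_share / OYSTER_ION_SHARE:.1f}x")
```

The human share comes out just under 34%, roughly twice the oyster figure, so the oyster number looks low against this background and not high.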

Even when you compare ion-binding sequences with ones that are predicted to bind something else within the same dataset, the “background” wins: while the human collection I looked at has about 1.6 times as many protein-binding sequences as ion-binders, in the oyster dataset the protein-binders outnumber the ion-binders two to one. Caveats apply as usual – datasets are incomplete, GO annotations are probably both incomplete AND a bit suspect, etc.; but I can’t see how that justifies the implication that the pattern this study found in the oyster is somehow special and interesting from a shell-building perspective. The whole thing smells like looking for faces in the clouds to me. Science is not simply about pattern-finding – it’s about finding meaningful patterns. I hope that this crucial distinction doesn’t get completely washed away by the current flood of data, data, data.

He’s the editor-in-chief of BioEssays, a journal dedicated to publishing reviews and “ideas papers” in biology. Last year, I was very happy about his editorial concerning anthropomorphic language in evolutionary biology. Now he’s written another one that makes me want to hug him. It’s about recognising scientists who don’t produce bucketloads of data.

He argues that both in education and funding, there is far too much emphasis on data generation, and far too little on data integration, that is, taking others’ results and making some sort of overarching sense of them. He writes:

The problem starts early: as undergraduates, students learn the foundations of the subject; they then passage to learning how to do research – the emphasis being on generating results. Why the overwhelming preoccupation with generating more results? Aren’t there enough being produced? Arguably there are so many results around that we need more dedicated people who explicitly don’t produce new results, but rather distill out higher level insights. Naturals at this kind of science can also be spotted in the lab: supervisors should be mindful not to automatically denigrate diffuse interest or lack of single-mindedness: perhaps they are the signs of an “integrator”. And an “integrator” is every bit as much a scientist as a “producer”.

As a person for whom generating results is usually a chore while thinking about them is a joy, all I can say is: WORD.

A while back, I discussed the interpretation of the enigmatic Cambrian creature Nectocaris in this space. I just discovered that the same guy (or, well, one of the guys) who described the new Nectocaris fossils as the remains of a primitive cephalopod had also been part of a publication “molluscifying” another enigmatic Cambrian creature. In this somewhat earlier case, Caron et al. (2006) interpret Odontogriphus as a soft-bodied primitive mollusc, something of a grand-uncle to everything molluscan that lives today. (Unlike Nectocaris, Odontogriphus did, apparently, have a radula.)

Needless to say, this interpretation was immediately contested by another Cambrian expert, Nick Butterfield (Butterfield, 2006). The radulae of Odontogriphus (and of the more popular “spiny slug” Wiwaxia) aren’t necessarily true radulae, the serial gills of Odontogriphus need not be the specific kind of gills that molluscs have, etc. (This then triggered a response from Team Mollusc [Caron et al., 2007], but I digress :))

I’m beginning to see a pattern here, something much broader than J-B Caron vs. everyone else. It basically reminds me of the contrast the whole of Wonderful Life (Gould, 1991) was built on. To those who haven’t read the book, one of the central themes of Wonderful Life is the (re-)interpretation of Burgess Shale fossils. Initially, the fossils were all shoehorned into already known groups. Decades later, palaeontologists began to examine them more closely, and found that few of them truly fit into those groups. Out of these surprises grew Stephen Jay Gould’s brave new Cambrian world, the festival of freaks that later dwindled to the pathetic little remnant of its full diversity that populates today’s seas. (We’ll leave the discussion of how right or wrong either view is for another time ;))

Another parallel that comes to mind is the extreme range of interpretations of the earlier Ediacaran organisms, which researchers have flagged as everything from early members of living animal groups to a totally new form of life.

Also, somewhat, the lumper/splitter division that seems to exist in vertebrate palaeontology. There are the “lumpers” who want to group everything vaguely similar into the same taxon, and there are the “splitters” who want to split everything vaguely unique into its own group. (It should go without saying, but there are also opinions in between. I don’t want you to come away thinking that palaeontology and taxonomy are just armed camps of lumpers and splitters shouting obscenities at each other across a barricade ;))

I get the impression that hardcore lumpers tend to consistently be lumpers and hardcore splitters tend to remain splitters.

Are there just some people who want to connect every new observation to something we’ve already seen? Are there just people with a natural tendency to emphasise the uniqueness of new observations? Or prefer to take the middle ground, as the case may be? Why? And more generally, what makes scientists pick one side – or refuse to pick sides – in controversial issues?

I was reading this news article about a quantum physics experiment. It all seemed suitably exciting and mysterious to my lay eyes, and then the article hit me with this:

“Experiments are only relevant in science when they are crucial tests between at least two good explanatory theories,” [David] Deutsch says. “Here, there was only one, namely that the equations of quantum mechanics really do describe reality.”

I’m sorry, but that’s just… wrong. A lot of real-life experimental work involves testing a single idea against a null hypothesis (e.g. “this mutation causes disease A” vs. “this mutation doesn’t cause disease A”). And you don’t need two competing explanations for one explanation to turn out wrong (e.g. a particular mutation is the only cause you can think of for disease A, but then you find lots of people having the mutation but not the disease). Sure, QM has passed many, many tests – but remember, science can only ever say “this is right to the best of our knowledge”, not “this is Right”. As far as I understood the article, this experiment also looked at QM in a new way, so it’s not like it was a boring replication of something we’d seen a thousand times. Maybe Deutsch is right to question its importance, but I think he chose a really poor reason for doing so.

(Disclaimer: the quotes that appear in articles like this are not necessarily the words that came out of the interviewee’s mouth/keyboard. So like a good scientist, I’ll leave some reservations in my judgement there.)