About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship as a post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases.
To contact Derek, email him directly: derekb.lowe@gmail.com
Twitter: Dereklowe

April 10, 2012

Biomarker Caution

After that news of the Stanford professor who underwent just about every "omics" test known, I wrote that I didn't expect this sort of full-body monitoring to become routine in my own lifetime:

It's a safe bet, though, that as this sort of thing is repeated, we'll find all sorts of unsuspected connections. Some of these connections, I should add, will turn out to be spurious nonsense, noise and artifacts, but we won't know which are which until a lot of people have been studied for a long time. By "lot" I really mean "many, many thousands" - think of how many people we need to establish significance in a clinical trial for something subtle. Now, what if you're looking at a thousand subtle things all at once? The statistics on this stuff will eat you (and your budget) alive.
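To put rough numbers on the "thousand subtle things" problem, here is a minimal Python sketch. It assumes, purely for illustration, 1,000 independent tests at the conventional 0.05 threshold with no real effects at all, and it relies on the fact that a valid p-value from a continuous test statistic is uniformly distributed when the null hypothesis is true. The function names are hypothetical, and real omics measurements are correlated with each other, so treat this as a back-of-the-envelope illustration rather than an analysis method.

```python
import random


def prob_any_false_positive(alpha: float, n_tests: int) -> float:
    """Family-wise error rate: chance of at least one false positive
    among n_tests independent tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_null_hits(n_tests: int, alpha: float, seed: int = 0) -> int:
    """Count 'significant' results when no effect is real.

    Under a true null, a valid p-value is uniform on [0, 1], so we draw
    p-values directly instead of simulating raw measurements.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n, alpha = 1000, 0.05
    print(f"Expected false positives: {n * alpha:.0f}")
    print(f"P(at least one false positive): {prob_any_false_positive(alpha, n):.6f}")
    print(f"Simulated 'hits' with no real effects: {simulate_null_hits(n, alpha)}")
    cutoff = bonferroni_threshold(alpha, n)
    print(f"Bonferroni per-test cutoff: {cutoff:.0e}")
    print(f"FWER at that cutoff: {prob_any_false_positive(cutoff, n):.4f}")
```

Under these assumptions you should expect about 50 spurious "hits", and the chance of seeing at least one is essentially 1. Tightening the per-test cutoff to 0.05/1000 pulls the family-wise error rate back under 0.05, but a stricter cutoff is exactly what forces the sample sizes up.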

I can now adduce some evidence for that point of view. The Institute of Medicine has warned that a lot of biomarker work is spurious. The recent Duke University scandal has brought these problems into higher relief, but there are plenty of less egregious (and not even deliberate) examples that are still a problem:

The request for the IOM report stemmed in part from a series of events at Duke University in which researchers claimed that their genomics-based tests were reliable predictors of which chemotherapy would be most effective for specific cancer patients. Failure by many parties to detect or act on problems with key data and computational methods underlying the tests led to the inappropriate enrollment of patients in clinical trials, premature launch of companies, and retraction of dozens of research papers. Five years after they were first made public, the tests were acknowledged to be invalid.

Lack of clearly defined development and evaluation processes has caused several problems, noted the committee that wrote the report. Omics-based tests involve large data sets and complex algorithms, and investigators do not routinely make their data and computational procedures accessible to others who could independently verify them. The regulatory steps that investigators and research institutions should follow may be ignored or misunderstood. As a result, flaws and missteps can go unchecked.

So (Duke aside) the problem isn't fraud, so much as it is wishful thinking. And that's what statistical analysis is supposed to keep in check, but we've got to make sure that that's really happening. But to keep everyone honest, we also have to keep everything out there where multiple sets of eyes can check things over, and this isn't always happening:

Investigators should be required to make the data, computer codes, and computational procedures used to develop their tests publicly accessible for independent review and ensure that their data and steps are presented comprehensibly, the report says. Agencies and companies that fund omics research should require this disclosure and support the cost of independently managed databases to hold the information. Journals also should require researchers to disclose their data and codes at the time of a paper's submission. The computational procedures of candidate tests should be recorded and "locked down" before the start of analytical validation studies designed to assess their accuracy, the report adds.

This is (and has been for some years) a potentially huge field of medical research, with huge implications. But it hasn't been moving forward as quickly as everyone thought it would. We have to resist the temptation to speed things up by cutting corners, consciously or unconsciously.

Get used to stories like this. Take a look at the 6000+ abstracts from the 2012 AACR national meeting last week. Wishful thinking is a mild description for that carnival sideshow. Maybe 3% of the work was worthwhile. Four darts and a genome map would provide a better chance of hitting a cancer response biomarker.

Crony capitalism is the term that comes to my mind. Tenure and grant funding have become heavily dependent on publications in "elite" journals, where review by one's "peers" too often simply means getting a mentor or former lab-mate to accept whatever experimental dross has come out of the previous two years in the lab.

IMO, there needs to be a complete overhaul of the rewards system in academic science. Tenure and grants need to somehow be linked to sustained support for one's ideas and publications. The alternative is more wasted money and lost lives.

None of this would happen if these academic medical researchers were still primarily concerned about doing academic medical research. Since it seems the first priority out there in the ivory towers is figuring out how to monetize whatever you're doing (patenting everything that moves, founding companies as soon as you have an embryo of an idea), we'll just get access to more unregulated, unverified garbage (sorry - Innovation).

Yes, that's what it's supposed to do, but I would argue that it often does the opposite. Statistical analysis often misleads us into seeing causation where none exists. Statistical models often predict without explaining. In fact, the heavy reliance on statistics in modern biomedical research is creating casualties whose impact will become apparent only when there's long-term damage. Perhaps it's time to heed Sydney Brenner's words and go back to studying the cell as the operational unit of life.

johnnyboy wrote (#3): "None of this would happen if these academic medical researchers were still primarily concerned about doing academic medical research."

Sorry, I was doing research back when "academic medical researchers were still primarily concerned about doing academic medical research", back when any hint of commercial application was almost always regarded as career-ending selling out, and "unregulated, unverified garbage" was just as routine then as it is now. From my first day in grad school, I was appalled at how little the faculty at my top tier university medical school knew or cared about statistics. Find the primary author of a biology paper - any biology paper - and ask them why they chose a t-test instead of a chi-square test, and prepare for a look of befuddlement in at least half of the cases - and I'm being charitable there. Likewise, how many faculty authors are close enough to the laboratory or the point of data collection that they really know what their grad students or postdocs are doing to obtain the "right" result? How many "scientists" truly think deeply about experimental design and all the places where systematic error can creep in?

When grants are the primary/exclusive source of a scientist's funding, is there more or less pressure to cook the books in order to secure the next grant than there is to try to cook the books to launch a startup? Mental exercise: how often are experiments repeated after they are published in the academic world, and how critical is it for things to be reduced to repeatable practice in order to have a chance of commercial success? Arguably, hopes for commercialization impose *more* discipline on science than anything seen in a purely academic pursuit.

"But to keep everyone honest, we also have to keep everything out there where multiple sets of eyes can check things over, and this isn't always happening."

I agree in principle, and the first "eyes" should be the eyes of the reviewer(s) assessing the paper. But practically speaking, even if you publish your algorithm, provide a mathematical proof (where appropriate), and make the raw data and computer code available, few people, it seems to me, are going to take the time necessary to verify that it's all correct.

I work in the area of biomarker discovery using integration of multiple -omics data. The bar we strive to reach with our analysis is cross-validation on independent data sets. I've also argued (www.neoproteomics.net/blog) that the publication of novel computational methods should be accompanied by positive and negative controls, just as researchers publishing in the area of, say, cell biology (wet-bench) have been required to do for years.

Otherwise, we run the risk described in "Most Random Gene Expression Signatures Are Significantly Associated With Breast Cancer Outcome" (www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002240).
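One way to make the "negative controls" idea concrete, in the spirit of the random-signature paper cited above, is to ask whether a candidate signature is more strongly associated with outcome than signatures built from the same number of randomly chosen genes. The Python sketch below does that on synthetic data. The scoring rule (mean expression of the signature genes, compared between outcome groups), the data layout, and all function names are assumptions chosen for illustration; this is not the method from the paper or from the commenter's blog.

```python
import random


def _avg(values):
    values = list(values)
    return sum(values) / len(values)


def association(expr, genes, outcome):
    """Absolute difference in mean signature score between outcome groups.

    expr[g][s] is the expression of gene g in sample s (hypothetical layout);
    a sample's signature score is the average expression of the chosen genes.
    """
    scores = [_avg(expr[g][s] for g in genes) for s in range(len(outcome))]
    cases = _avg(sc for sc, y in zip(scores, outcome) if y)
    controls = _avg(sc for sc, y in zip(scores, outcome) if not y)
    return abs(cases - controls)


def random_signature_pvalue(expr, genes, outcome, n_random=1000, seed=0):
    """Empirical p-value of a signature against same-sized random signatures.

    The add-one correction keeps the p-value from ever being exactly zero.
    """
    rng = random.Random(seed)
    observed = association(expr, genes, outcome)
    all_genes = range(len(expr))
    as_good = sum(
        association(expr, rng.sample(all_genes, len(genes)), outcome) >= observed
        for _ in range(n_random)
    )
    return (as_good + 1) / (n_random + 1)


def make_synthetic_data(n_genes=200, n_samples=60, n_signal=10, shift=1.0, seed=42):
    """Hypothetical data: genes 0..n_signal-1 are shifted upward in cases."""
    rng = random.Random(seed)
    outcome = [s < n_samples // 2 for s in range(n_samples)]
    expr = [
        [rng.gauss(0.0, 1.0) + (shift if g < n_signal and outcome[s] else 0.0)
         for s in range(n_samples)]
        for g in range(n_genes)
    ]
    return expr, outcome


if __name__ == "__main__":
    expr, outcome = make_synthetic_data()
    real = list(range(10))         # genes that truly differ between groups
    noise = list(range(100, 110))  # genes with no real effect
    print("real signature p  =", random_signature_pvalue(expr, real, outcome))
    print("noise signature p =", random_signature_pvalue(expr, noise, outcome))
```

On synthetic data like this, the real signature should land near the smallest possible p-value, 1/(n_random + 1), while the noise signature should not. The title of the cited paper makes the larger point: on real expression data, random signatures are themselves often "significant", so beating them is a much higher bar than beating a plain p < 0.05.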

What I wonder is whether there's a way to start collecting as much "omics" (love that "word") and longitudinal data as possible so that, down the line, someone can crunch the data based on various future hypotheses.

I'm thinking of a kind of broad-spectrum, mass-scale Framingham study where we just keep collecting as much data as we can (and as technology develops) in the hope that it will be valuable in future. It's hard work, and it has to be done rigorously, but since it will only be valuable to some unknown people in the future, what's the incentive to start now? In fact, as johnnyboy says, the incentives unfortunately _discourage_ collecting these data.

The more biomarkers (or anything else) you study, the larger the sample size must be to establish that a possible connection is real. The weaker the connection, the larger the sample size must be. Eventually you will get to the point where proving that a real connection actually is real is impossible, because it would require studying more people than the total human population. Welcome to the genetic version of the Heisenberg uncertainty principle.
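That scaling argument can be made concrete with the textbook normal-approximation sample-size formula for comparing two group means, n per group = 2 (z for 1 - alpha/(2m) + z for the desired power)^2 / d^2, where m is the number of biomarkers tested (Bonferroni-corrected) and d is the effect size in standard deviations. This is a minimal sketch under textbook assumptions (two-sided test, independent tests, 80% power by default); the function name is hypothetical, and a real study design would need more careful methods.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, n_tests: int = 1,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate subjects per group for a two-sided, two-sample comparison.

    effect_size is the difference in means divided by the standard deviation.
    alpha is split across n_tests (Bonferroni), so each test runs at alpha / n_tests.
    Uses n = 2 * (z_{1 - a/2} + z_{power})**2 / d**2 (normal approximation).
    """
    z = NormalDist()  # standard normal distribution
    per_test_alpha = alpha / n_tests
    z_alpha = z.inv_cdf(1 - per_test_alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    for d in (0.5, 0.2, 0.05):
        row = [n_per_group(d, m) for m in (1, 1_000, 1_000_000)]
        print(f"effect size {d}: per-group n for 1 / 1e3 / 1e6 tests = {row}")
```

Both factors named above show up, but they do not push equally. Because n scales with 1/d^2, halving the effect size quadruples the sample, while multiplying the number of tests by a thousand raises it by a much smaller factor (the Bonferroni term grows roughly with the logarithm of the number of tests).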

With regard to #10, I think it's feasible with existing samples left over from tons of different studies. You'd be surprised what's available in the right facilities if you know who to call. While the logistics and the paperwork would be a nightmare, it's definitely something that's doable from a pure science perspective.

If you do that, you could get a lot of good quality information along with existing outcomes information. It would definitely be fun to watch.

Kinemed is about to have an IPO; they are monetizing their biomarker tests. They are somehow detailing pathways with heavy water, etc. It sounds very novel, but I have no idea whether intensive statistical processing or simulation is required on the back end to finalize any given patient's test results.