Opinion: Confounded Cancer Markers

Ethical guidelines drastically limit experiments on human subjects. Hence, the fundamental mechanisms of human diseases are mostly studied in vitro or in animal models. These are only substitutes for understanding human physiology and disease. Proving that a mechanism responsible for disease progression in a model system is also relevant to human disease—not to mention then translating it into a new therapeutic—is a major bottleneck in biomedicine. In the end, only clinical interventions on humans will bridge models and human disease.

One approach is to look for correlations. If you can show that patients whose tumors express, for example, stem cell markers have a much worse prognosis than those without them, that would suggest that stem cells are involved in human disease progression. This line of thinking has long been popular in oncology because you need only access to surgical specimens, an mRNA or protein marker, and patient follow-up. And with the recent advent of efficient microarray screens, this approach has become all the rage, reducing the discovery of signatures, i.e. multi-gene markers, to a nearly automatic procedure.

The signatures’ prognostic potential can then be tested instantly in genome-wide compendia of expression profiles for hundreds of human tumors, all freely available in the public domain. Besides stem cell markers, signatures linked to all sorts of biological mechanisms or states have been shown to be associated with human cancer outcome. Indeed, several new signatures are published every month in prominent journals.

But such correlations are not all that they seem. The accumulation of signatures with all sorts of biological meanings, but nearly identical prognostic values, already looked suspicious to us and others back in 2007. It seemed that every newly discovered signature was prognostic. We collected from the literature some signatures with as little connection to cancer as possible. We found, for example, a signature of the blood cells of Japanese patients who were told jokes after lunch, and a signature derived from the microarray analysis of the brains of mice that suffered social defeat. Both of these signatures were associated with breast cancer outcome by any statistical standard.

We then went back to published cancer signatures and found that 60 percent were no more prognostic than signatures made by picking genes at random from among the 21,000 human genes. The problem occurred with single-gene markers, but became dramatic with multi-gene signatures. A gene chosen at random already has roughly a one-in-five chance of being prognostic; for signatures made of more than 100 genes, 90 percent are prognostic. How is this possible? We showed that in breast cancer the expression of a large fraction of the genome correlates with the proliferation rate, which is prognostic in this disease.
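The mechanism described above is easy to reproduce on synthetic data. The sketch below is an illustrative reconstruction, not the paper's actual code: all parameters (cohort size, fraction of proliferation-driven genes, effect sizes) are invented. A latent proliferation rate drives a large fraction of genes as well as outcome, so the average expression of a random gene set inherits the proliferation signal and comes out "prognostic".

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_genes = 200, 21000

# A latent proliferation rate drives expression of ~30% of the
# genome AND the clinical outcome (all numbers illustrative).
prolif = rng.normal(size=n_patients)
loadings = np.where(rng.random(n_genes) < 0.3,
                    rng.normal(1.0, 0.3, n_genes), 0.0)
expr = np.outer(prolif, loadings) + rng.normal(size=(n_patients, n_genes))
outcome = prolif + rng.normal(size=n_patients)  # worse outcome tracks proliferation

def is_prognostic(genes, n_perm=500):
    """Permutation p-value for the association of a signature score with outcome."""
    score = expr[:, genes].mean(axis=1)         # naive signature score
    obs = abs(np.corrcoef(score, outcome)[0, 1])
    null = [abs(np.corrcoef(score, rng.permutation(outcome))[0, 1])
            for _ in range(n_perm)]
    return (1 + sum(v >= obs for v in null)) / (n_perm + 1)

# How many 100-gene RANDOM signatures come out "prognostic"?
hits = sum(is_prognostic(rng.choice(n_genes, 100, replace=False)) < 0.05
           for _ in range(50))
print(f"{hits}/50 random 100-gene signatures prognostic at p < 0.05")
```

In this toy setup essentially every random signature is significantly associated with outcome, mirroring the 90 percent figure reported for large signatures.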

It took us four years and six rejections to finally get this work published in a computational biology journal (PLoS Comput Biol, 2011)—not the most efficient venue for reaching the oncology community. Meanwhile, a steady stream of studies confounded by proliferation rates has appeared. This has to be said: one can no longer stay silent about the rather limited self-correction capability of the top-tier publishing system (Cell, Nature Genetics, PNAS, etc.), which promoted these studies in the first place.

The oncogenomics-based literature has forgotten the pitfalls of non-specific effects and the value of negative controls. It is not enough to show that a signature is prognostic; biological conclusions may be drawn only if its prognostic value is specifically driven by the mechanism/state under investigation. Importantly, we question prognostic signatures as specific research tools, not as clinical guides: smoke does not drive fire, yet it is a powerful indicator of when and where a fire is burning.

Vincent Detours is a researcher at the Université Libre de Bruxelles in Belgium.

Comments

This is really shocking -- not only that medical research has been misled by sloppy statistical interpretation, but that leading scientific journals have attempted to suppress the discovery of this error.

There may be a few good reasons why the author's article was rejected so many times.

First, this article does a disservice by portraying those of us who have been involved in biomarker development studies as semi-morons who are not capable or willing to use the scientific method to validate our findings.

Second, how many random genes should be expected to be associated with cancer outcome is surely not sufficiently addressed by these studies. But it is also not sufficiently addressed by this article. Everyone knows that when you do multiple hypothesis testing with microarrays, SOMETHING will appear to be significant, so how many should be expected? That does not mean such studies are useless, as is clearly implied by the author of this article.

Third, proper phased biomarker development study designs exist to rule out findings that do not validate. The routine is Discovery, Evaluation, and Validation. A signature that fails to generalize to samples from patients collected and run at different times in the same medical center is unlikely to be pushed forward; this is built into phased biomarker development study designs. A signature that fails to validate across different centers is unlikely to proceed to clinical use; this, too, is built into phased study designs. Most people doing discovery phase studies know they are doing discovery phase research.

Granted, the discovery phase is messy, in part due to a lack of deep understanding that FDR control methods such as B-H do not accurately control FDR, and that reporting an estimate of the FDR is preferred to direct attempts at control. It's also messy due to a profound lack of effort to standardize methods for data representation (normalization and transformation) and the continued use of silly measures like fold-change to measure differences. So why not study methods iteratively within each study to see if they provide internally consistent results (e.g., Jordan et al., 2008)? Still, if a clinical research team has a panel of putatively prognostic biomarkers, and an associated algorithm, they should be given the chance to further evaluate their generalizability with additional data from their own centers, and if that goes well, validate them in a multi-center trial. Then they can do a VDS (Voluntary Data Submission) to the FDA (probably including data from all phases). The re-analysis of individual data sets at the discovery phase alone, from studies like this, sheds little light on the value of the entire, mature process of biomarker development.

BTW, and fourth, the article uses the term 'confounded' incorrectly. The problem the author is studying is Type I error rate inflation due to multiple hypothesis testing, which occurs with or without true confounding. The jokes told to Japanese patients during lunch cannot possibly be a true confounding variable unless they somehow influence the gene expression profiles of breast cancer patients. Age (old cancer patients vs. young normal controls) would be a possible confounding variable because it introduces bias. Some genes appearing different because they are randomly selected could in fact reflect massive genomic shifts associated with cancer.

While we're busy promoting our own literature, I might hasten to add that, in my opinion, people doing -omics-based prognostic modeling should avoid using the log-rank test to test the 'significance' of their Kaplan-Meier curves. The log-rank test is not a sufficiently critical test in this application, and does not track with model accuracy as well as another test called the F* test (Berty et al., 2010). Professional critics have had fun from the sidelines and have sidetracked a lot of good effort. Rather than write a scathing critique against straw men, perhaps some effort should go into identifying weaknesses in a paradigm and making a contribution with a more viable, robust alternative.

Anyone doing biomarker development should choose random sets of gene/protein measurements and feed them into the model optimization via cross-validation to generate a distribution of performance evaluation measures to see if, for their study, a problem exists with random gene selection.
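The random-set baseline suggested above could look something like the following sketch. It uses synthetic data and a deliberately simple nearest-centroid classifier; the cohort sizes, loadings, and the classifier choice are all invented for illustration, not taken from any published pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 120, 5000

# Synthetic cohort: a latent factor shifts ~25% of genes and the
# binary outcome (all numbers illustrative).
latent = rng.normal(size=n)
y = (latent + rng.normal(size=n) > 0).astype(int)
load = np.where(rng.random(p) < 0.25, 0.8, 0.0)
X = np.outer(latent, load) + rng.normal(size=(n, p))

def cv_accuracy(genes, k=5):
    """k-fold cross-validated accuracy of a nearest-centroid classifier."""
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    correct = 0
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        Xtr, Xte = X[np.ix_(train, genes)], X[np.ix_(fold, genes)]
        c0 = Xtr[y[train] == 0].mean(axis=0)
        c1 = Xtr[y[train] == 1].mean(axis=0)
        pred = (np.linalg.norm(Xte - c1, axis=1) <
                np.linalg.norm(Xte - c0, axis=1)).astype(int)
        correct += int((pred == y[fold]).sum())
    return correct / n

# Null distribution of CV performance from purely random gene sets
null_acc = [cv_accuracy(rng.choice(p, 50, replace=False)) for _ in range(30)]
print(f"random-set CV accuracy: median {np.median(null_acc):.2f}")
```

If the median of this null distribution sits well above chance, as it does here, a "validated" signature must beat the random-set baseline, not just the 0.5 coin flip, before any biology is read into it.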

For additional information on how far we've come in understanding the complexities of biomarker research, I still recommend David Ransohoff's occasional articles on the topic.

Thank you for creating awareness of this issue. Gene signatures for anticancer drug response are very popular now as well. As a pharmacologist, I have been skeptical of how such signatures could predict drug ADMET, especially in solid tumors with impaired vasculature. Yet the work appears in top-tier journals and platform presentations at major meetings. Meanwhile, standard but informative drug PK/PD studies are relegated to the poster sessions.

I never liked this "signature" business and its association with disease states, especially as prognostic markers of cancer progression/metastasis and therefore prognosis. As correctly mentioned in this short essay, the sequential silencing of tumor suppressor genes (TSGs) will not be in the "signature". Once again, the "omics"-driven "new" concepts and applications are just tools in the war for monies. And no sign of a "scientific revolution". Michael Lerman, M.D., Ph.D.

The problem is not "omics". The problem is their simplistic use by investigators who forget that (a) it's a network, stupid, not pathways, and therefore those 40,000 measurements are not independent and (b) there are many biological variables other than those directly associated with survival encapsulated in those measurements that will make large gene signatures not robust. These failures were noted at least as far back as 2005 in a Bioinformatics paper by an Israeli group (L. Ein-Dor, et al. PMID:15308542).

I'm not sure that this is as big a deal breaker as it appears on the surface. I suspect that what is going on is the following. Suppose that there was a part of biology that affected a very large number of genes in breast cancer patients, and which was also associated with survival, say for example proliferation. If you took a large random selection of genes that were not coordinately expressed, and formed a principal component, it is likely that this would focus on the little bit that the genes have in common and result in a version of the proliferation signature, although most of the genes in the signature would have low correlation with this signature summary. If you then tested this you would find that it is associated with survival. So the signature's association with survival is real, it's just not what you thought it was. The key is to not take the signature as given but to make sure that the genes in the signature are correlated with each other and so represent a specific biology rather than a random collection of genes, and then to further check external validation to make sure that the biology fits the signature identification.
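This intuition checks out on synthetic data (an illustrative sketch; the factor strength and dimensions are invented): the first principal component of a random, non-coordinately-expressed gene set locks onto whatever weak shared factor is present.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 300, 5000

# A weak "proliferation" factor loads on ~30% of genes (illustrative).
prolif = rng.normal(size=n)
load = np.where(rng.random(p) < 0.3, 0.7, 0.0)
expr = np.outer(prolif, load) + rng.normal(size=(n, p))

# A random gene set: most genes share nothing but the weak factor
genes = rng.choice(p, 200, replace=False)
X = expr[:, genes]
X = X - X.mean(axis=0)

# First principal component of the random gene set, via SVD
_, _, vt = np.linalg.svd(X, full_matrices=False)
pc1 = X @ vt[0]

r = abs(np.corrcoef(pc1, prolif)[0, 1])
print(f"|corr(PC1 of random genes, proliferation)| = {r:.2f}")
```

Even though each individual gene is only weakly tied to the factor, PC1 of the random set correlates strongly with it, which is exactly the "version of the proliferation signature" described above.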

It is perhaps useful to insist that we are not questioning the prognostic value of published signatures. On the contrary, Fig. 6 of our article confirms that most of them are associated with outcome and that these associations are reproducible across cohorts. Our paper investigates *why* signatures work. This matters little to patients, but it has consequences for basic cancer research. It is commonplace to suggest that because a marker for a given biological process is prognostic in cancer, then this process must be involved in breast cancer progression. Our study reduces this argument ad absurdum by showing that random signatures are prognostic too (Fig. 1 in the paper).

Contrary to what lifebiomedguru says in a previous thread, the term 'confounded signatures' is appropriate. There are thousands of genes correlated with outcome, as shown by Ein-Dor et al., PMID:15308542. We showed in our study that the overwhelming majority of these genes are correlated with a proliferation metagene. In addition, the vast majority of the published signatures lose their prognostic value if you adjust the data for proliferation. Thus, 1) genes picked at random are likely to be prognostic; 2) if, for example, a hypoxia signature is prognostic but loses its prognostic value after adjustment for proliferation, can you claim that its prognostic value supports the role of hypoxia in cancer progression? No, because you just cannot determine from such evidence whether hypoxia, or any of the many processes statistically associated with proliferation in a complex tissue, causes the correlation with disease progression.
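The adjustment described here can be illustrated on synthetic data. This is a hypothetical sketch, not the analysis in the paper: regress both the signature score and the outcome on a proliferation metagene, and the apparent prognostic value of a random signature collapses.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 2000

# Synthetic breast-cancer-like data: a latent proliferation rate
# loads on 40% of genes and on outcome (all numbers illustrative).
prolif = rng.normal(size=n)
load = np.where(rng.random(p) < 0.4, 1.0, 0.0)
expr = np.outer(prolif, load) + rng.normal(size=(n, p))
outcome = prolif + rng.normal(size=n)

# Proliferation metagene: mean of 50 known proliferation-tracking genes
meta = expr[:, load > 0][:, :50].mean(axis=1)

def residualize(v, on):
    """Remove the component of v linearly explained by `on`."""
    v_c, on_c = v - v.mean(), on - on.mean()
    return v_c - (v_c @ on_c) / (on_c @ on_c) * on_c

sig = rng.choice(p, 100, replace=False)          # a "random signature"
score = expr[:, sig].mean(axis=1)

before = abs(np.corrcoef(score, outcome)[0, 1])
after = abs(np.corrcoef(residualize(score, meta),
                        residualize(outcome, meta))[0, 1])
print(f"|corr| with outcome before adjustment: {before:.2f}, after: {after:.2f}")
```

The random signature's strong outcome correlation is carried entirely by proliferation; once that is regressed out of both sides, little association remains.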

Note that our study addresses breast cancer, in which proliferation is prognostic; it does not necessarily apply to all cancers.

lifebiomedguru raises the issue of multiple testing. Our study is not about multiple-testing situations. We are estimating the probability that a paper presenting a **single** signature finds a significant association with outcome. Consider, for example, this typical situation: Dr. Smith derives a stem cell signature and finds it to be associated with outcome. The question we asked is: what is the probability that Dr. Smith would find the same result if he completely messed up his stem cell experiment and his signature is in fact a random set of genes? I.e., we are interested in the proportion of random gene sets that are significantly associated with outcome. It is estimated in Figs. 1 and 2 of our paper: if his signature is >100 genes, Dr. Smith has a 90% chance of finding a significant association even if his signature is made of random genes. Therefore, Dr. Smith cannot draw any biological conclusion from the fact that his stem cell signature is associated with outcome. lifebiomedguru's comment is relevant to studies evaluating several markers. The q-value calculation for single-gene markers in our paper addresses it: 26% of the genes are associated with outcome at p<0.05, and 17% at multiple testing-adjusted q<0.05. The percentages will only be higher for multi-gene markers (relevant to studies investigating a collection of multi-gene signatures).

We agree with the thread of George Wight, except that having the genes correlated with one another is not enough. The supplementary information of the paper presents PCA for individual signatures. Several have a PC1 explaining most of the variance, but it is highly correlated with proliferation. Thus one also needs to rule out the confounding effect of proliferation. This confounder has been recognized by several authors, but the method they used to rule it out does not work (Fig. 5 of our paper).

Finally, lifebiomedguru wrote "this article does a disservice by portraying those of us who have been involved in biomarker development studies as semi-morons who are not capable or willing to use the scientific method to validate our findings." I think being wrong is a risk inherent to innovative science. But the community has to recognize errors; it is no disservice to point them out. As noted by Robert Hurst in this discussion, the major conclusions of our study are really logical consequences of the findings of Ein-Dor et al. 2005, PMID:15308542, Wirapati et al. 2008, PMID:18662380, and others. A number of us have been aware of the issue. But, judging from the biological conclusions that continue to be drawn from prognostic breast cancer markers on a monthly basis in top journals, many researchers have not taken the measure of the implications of these earlier papers. Someone had to pin them down more explicitly. I commented about our difficulty in publishing this study because I realized others had similar experiences. Read, for example, this story about the statisticians who scrutinized the Potti/Nevins Duke data: http://www.sciencemag.org/cont...

It is perhaps useful to insist we are not questioning theprognostic value of published signatures. On the contrary, Fig. 6or our article confirms that most of them are associated withoutcome and that these associations are reproducible acrosscohorts. Our paper investigate *why* signatures work.Â Thismatters little to patients, but it as consequences for basiccancer research.Â It's common place to suggest that because amarker for a given biological process is prognostic in cancer,then this process must be involved in breast cancer progression.Our study reduces this argument ad absurdum by showing thatrandom signatures are prognostic too (Fig. 1 in the paper).

Contrary to what lifebiomedguru says in a previous thread, theterm 'confounded signatures' is appropriate. There are thousandsgenes correlated with outcome, as shown by Ein-Dor et al.,PMID:15308542.Â We showed in our study that the overwhelmingmajority of these genes are correlated with a proliferationmetagene. In addition, the vast majority of the publishedsignatures loose their prognostic value if you adjust the datafor proliferation.Â Thus, 1- genes picked up at random are likelyto be prognostic. 2- if, for example, an hypoxia signature isprognostic, but loose its prognostic value after adjustment forproliferation, can you claim that its prognostic value supportsthe role of hypoxia in cancer progression? No, because you justcan't determine from such evidence wether hypoxia or any of themany processes statistically associated with proliferation in acomplex tissue cause the correlation with disease progression.

Note that our study addresses breast cancer in whichproliferation is prognostic, it does not necessarily apply to allcancers.

lifebiomedguru raises the issue of multiple testing. Our study is not about multiple-testing situations. We are estimating the probability that a paper presenting a **single** signature finds a significant association with outcome. Consider this typical situation: Dr. Smith derives a stem cell signature and finds it to be associated with outcome. The question we asked is: what is the probability that Dr. Smith would find the same result if he completely messed up his stem cell experiment and his signature were in fact a random set of genes? That is, we are interested in the proportion of random gene sets that are significantly associated with outcome. It is estimated in Figs. 1 and 2 of our paper: if his signature contains more than 100 genes, Dr. Smith has a 90% chance of finding a significant association even if his signature is made of random genes. Therefore, Dr. Smith cannot draw any biological conclusion from the fact that his stem cell signature is associated with outcome. lifebiomedguru's comment is relevant to studies evaluating several markers. The q-value calculation for single-gene markers in our paper addresses it: 26% of the genes are associated with outcome at p<0.05, and 17% at a multiple testing-adjusted q<0.05. The percentages will only be higher for multi-gene markers (relevant to studies investigating a collection of multi-gene signatures).
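
The Dr. Smith scenario can be mimicked with a small Monte Carlo sketch (the gene counts, loadings and sample size below are hypothetical, not the paper's data): when a sizeable fraction of genes tracks a prognostic proliferation factor, the mean expression of a random 100-gene set is significantly associated with outcome far more often than the nominal 5%:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n_patients, n_genes, sig_size, n_trials = 200, 1000, 100, 200

# Latent proliferation factor drives outcome and ~30% of the genes.
factor = rng.normal(size=n_patients)
outcome = factor + rng.normal(size=n_patients)
loadings = np.where(rng.random(n_genes) < 0.3, 0.5, 0.0)
expr = np.outer(factor, loadings) + rng.normal(size=(n_patients, n_genes))

def corr_pvalue(a, b):
    r = float(np.corrcoef(a, b)[0, 1])
    z = math.atanh(r) * math.sqrt(len(a) - 3)  # Fisher transform
    return math.erfc(abs(z) / math.sqrt(2))    # two-sided normal approximation

hits = 0
for _ in range(n_trials):
    genes = rng.choice(n_genes, size=sig_size, replace=False)  # a "random signature"
    score = expr[:, genes].mean(axis=1)
    if corr_pvalue(score, outcome) < 0.05:
        hits += 1

print(hits / n_trials)  # far above the nominal 5%
```

This is not a multiple-testing artifact: each trial stands in for a separate paper reporting one signature, and most of those single tests come out significant.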

We agree with the thread of George Wight, except that having the genes correlated with one another is not enough. The supplementary information of the paper presents PCA for individual signatures. Several have a PC1 explaining most of the variance, but it is highly correlated with proliferation. Thus one also needs to rule out the confounding effect of proliferation. This confounder has been recognized by several authors, but the method they used to rule it out does not work (Fig. 5 of our paper).
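
The PCA point can be illustrated with a toy example (sizes and loadings assumed for illustration): when every gene in a signature partly tracks proliferation, PC1 explains most of the variance yet is essentially a proliferation readout, so a dominant PC1 alone proves nothing about the signature's nominal biology:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 50  # hypothetical patients x signature genes

prolif = rng.normal(size=n)
# Each signature gene partly tracks proliferation plus private noise.
expr = np.outer(prolif, rng.uniform(0.5, 1.0, size=p)) + 0.5 * rng.normal(size=(n, p))

X = expr - expr.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
var_explained = s[0] ** 2 / np.sum(s ** 2)   # fraction of variance on PC1
pc1_scores = U[:, 0] * s[0]
r = abs(np.corrcoef(pc1_scores, prolif)[0, 1])  # PC1 vs. proliferation
print(var_explained, r)
```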

Finally, lifebiomedguru wrote: "There may be a few good reasons why the author's article was rejected so many times. First, this article does a disservice by portraying those of us who have been involved in biomarker development studies as semi-morons who are not capable or willing to use the scientific method to validate our findings." I think being wrong is a risk inherent to innovative science. But the community has to recognize errors; it is no disservice to point them out. As noted by Robert Hurst in this discussion, the major conclusions of our study are really logical consequences of the findings of Ein-Dor et al. 2005, PMID:15308542, Wirapati et al. 2008, PMID:18662380, and others. A number of us have been aware of the issue. But, judging from the biological conclusions that continue to be drawn from prognostic breast cancer markers on a monthly basis in top journals, many researchers have not taken in the implications of these earlier papers. Someone had to pin them down more explicitly. I commented on our difficulty publishing this study because I realized others had similar experiences. Read, for example, this story about the statisticians who scrutinized the Potti/Nevins Duke data: http://www.sciencemag.org/cont...
