Research Article

Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome

  • David Venet,

    Affiliation: IRIDIA-CoDE, Université Libre de Bruxelles (U.L.B.), Brussels, Belgium

  • Jacques E. Dumont,

    Affiliation: IRIBHM, Université Libre de Bruxelles (U.L.B.), Campus Erasme, Brussels, Belgium

  • Vincent Detours

    vdetours@ulb.ac.be

    Affiliations: IRIBHM, Université Libre de Bruxelles (U.L.B.), Campus Erasme, Brussels, Belgium, WELBIO, Université Libre de Bruxelles (U.L.B.), Campus Erasme, Brussels, Belgium

  • Published: October 20, 2011
  • DOI: 10.1371/journal.pcbi.1002240

Reader Comments (7)

The number of individual survival associated genes in breast cancer

Posted by sampsa on 08 Mar 2012 at 13:05 GMT

We read the article by David Venet and colleagues with great interest, and applaud the authors for excellent work. We were, however, a bit surprised to read that as many as 26% of genes were reported as survival-associated (p<0.05, non-corrected) in breast cancer ("Claims similar to those concerning signatures have been made, that single genes, important in a model system, are relevant for human cancer progression based on differential expression between short- and long-survival groups. As 26% of the genes are related to survival at p<0.05 (17% at q<0.05), much tighter p-values than commonly used should be imposed to demonstrate such a relation.").

There are two major issues with the evidence supporting the conclusion. First, the value of 26% (17% at q<0.05) is calculated from a single study only (NKI arrays, Van de Vijver et al. NEJM 2002, doi:10.1056/NEJMoa021967). The cited article by Ein-Dor et al. (doi:10.1093/bioinformatics/bth469) uses the same NKI data and therefore does not provide independent support. For such a general statement, more data sets from different platforms and breast cancer cohorts would be needed. Second, the high number of survival-associated genes may be due to the analysis framework used to calculate survival associations.

To check whether the result by Venet et al. holds in another breast cancer cohort analyzed with a different approach, we analyzed the TCGA breast cancer cohort (524 patients, 59 controls, Agilent Whole Genome G4502A) for survival association. We first identified differentially expressed genes (DEGs) between breast cancer samples and controls using the t-test (tumor versus normal breast tissue, q<0.001) and a sample-wise absolute fold-change threshold of four. Survival association of each DEG was then estimated with Kaplan-Meier and log-rank analysis.

With this approach we found that 1) 29% of the genes were differentially expressed (8,236 out of 28,654) and 2) 3% of all genes on the array (789 out of 28,654) were survival-associated at p<0.05 (uncorrected). This result indicates that the conclusion that 26% of the genes on (any) gene expression microarray have an independent survival association does not hold in general.
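As an illustration only (this is not the code used for the TCGA analysis above), the per-gene survival screen can be sketched in a few lines of standard-library Python. The sketch assumes that patients are split into high and low expression groups at the median, which the comment does not specify, and it omits the differential-expression filter. The names `logrank_p`, `median_split` and `fraction_significant` are hypothetical, and the demo data are synthetic noise rather than real expression values.

```python
import math
import random
import statistics


def logrank_p(times, events, groups):
    """Two-group log-rank test; returns the chi-square (1 df) p-value.

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if censored
    groups : 1 or 0 per patient (e.g. high vs. low expression)
    """
    obs_minus_exp = 0.0
    variance = 0.0
    # Walk through the distinct times at which at least one event occurred.
    for t in sorted({x for x, e in zip(times, events) if e}):
        n = n1 = d = d1 = 0  # at risk (all, group 1), events (all, group 1)
        for x, e, g in zip(times, events, groups):
            if x >= t:
                n += 1
                n1 += g
                if x == t and e:
                    d += 1
                    d1 += g
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of the group-1 event count
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def median_split(values):
    """Assumed grouping rule: 1 if expression is above the median, else 0."""
    cutoff = statistics.median(values)
    return [1 if v > cutoff else 0 for v in values]


def fraction_significant(expr_by_gene, times, events, alpha=0.05):
    """Fraction of genes whose median split gives a log-rank p below alpha."""
    hits = sum(
        1 for expr in expr_by_gene
        if logrank_p(times, events, median_split(expr)) < alpha
    )
    return hits / len(expr_by_gene)


if __name__ == "__main__":
    rng = random.Random(0)
    n_patients, n_genes = 100, 200
    # Synthetic cohort: random follow-up times, roughly 60% observed events.
    times = [rng.expovariate(0.1) for _ in range(n_patients)]
    events = [1 if rng.random() < 0.6 else 0 for _ in range(n_patients)]
    # Pure-noise "genes" with no relation to outcome.
    noise = [[rng.gauss(0, 1) for _ in range(n_patients)] for _ in range(n_genes)]
    print(fraction_significant(noise, times, events))
```

For genes unrelated to outcome, the reported fraction should land near the chosen alpha. The percentages quoted above correspond to the same ratio computed on real data, e.g. 789 / 28,654 is roughly 3%.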

Riku Louhimo, Marko Laakso, Sampsa Hautaniemi

No competing interests declared.

RE: The number of individual survival associated genes in breast cancer

vdetours replied to sampsa on 08 Mar 2012 at 15:17 GMT

First, I'd like to thank the authors of this post for their interest in our work and for taking the time to check our results.

Table 1 (line #3) in the paper does provide an independent assessment. The percentage of individual genes associated with outcome at q<0.05 is 8% in the Loi et al. cohort for overall survival and 5% for RFS. Thus, we are beyond 5% in both cases for uncorrected p-values, i.e. the confidence metric actually relevant for a study that investigates a single gene. Not only was the entire analysis rerun on the data of Loi et al. JCO (2007), but early submissions of our work also included the data of Calza et al. Breast Cancer Res. (2006). We dropped it for simplicity, but the same conclusions hold for this cohort as well.

We trust the estimate of Riku Louhimo, Marko Laakso and Sampsa Hautaniemi for the TCGA cohort. We expect the exact fraction of significant genes to vary from study to study, as well as the fraction of significant multigene markers. It all depends on the quality of the data, the size of the cohort and its demographics. For example, the prognostic transcriptional signal is much weaker in ER- tumors, hence having more of these tumors may lead to a lower fraction of significant genes. The cohort of Van de Vijver, NEJM 2002 beats all those we've ever analyzed in terms of prognostic signal. Yet, it is the one that is typically used in the studies we reviewed...

In summary, it is questionable to draw biological conclusions from a single-gene marker associated with outcome at p<0.05 in the three cohorts we looked at, whereas p<0.05 holds for 3% of the genes in the TCGA cohort and perhaps other cohorts. The bottom line is that few, if any, of the pre- or post-genomic era studies that draw biological conclusions from single-gene association with outcome actually check how likely it is that a random gene is similarly associated with outcome.
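One way to make the check described above concrete is to compare a candidate gene's p-value with the p-values of all other genes in the same cohort. The following is a minimal sketch under that assumption, not the procedure used in the paper. The function name `empirical_p` and the toy background values are hypothetical.

```python
def empirical_p(candidate_p, background_ps):
    """Share of background genes at least as strongly associated with outcome.

    candidate_p   : nominal p-value of the gene of interest
    background_ps : nominal p-values of random (or all other) genes in the
                    same cohort, computed with the same survival test
    The +1 terms count the candidate itself, so the result is never zero.
    """
    as_good = sum(1 for p in background_ps if p <= candidate_p)
    return (as_good + 1) / (len(background_ps) + 1)


if __name__ == "__main__":
    # Toy background in which 26 of 100 genes reach a small nominal p-value.
    background = [0.01] * 26 + [0.5] * 74
    print(empirical_p(0.04, background))
```

In a cohort where about a quarter of the genes reach nominal significance, a candidate with p=0.04 is matched or beaten by roughly a quarter of random genes, which is why a nominal p<0.05 alone says little in such a cohort.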

Vincent Detours

Competing interests declared: Author of the study.