Research Article

Selection upon Genome Architecture: Conservation of Functional Neighborhoods with Changing Genes

  • Fátima Al-Shahrour,

    Affiliation: Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain

    Current address: Broad Institute, Cambridge, Massachusetts, United States of America

  • Pablo Minguez,

    Affiliation: Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain

    Current address: Structural and Computational Biology Unit, EMBL Heidelberg, Heidelberg, Germany

  • Tomás Marqués-Bonet,

    Affiliations: Institut de Biologia Evolutiva, Universitat Pompeu Fabra (UPF) and Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain, Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America, Howard Hughes Medical Institute, University of Washington, Seattle, Washington, United States of America

  • Elodie Gazave,

    Affiliation: Institut de Biologia Evolutiva, Universitat Pompeu Fabra (UPF) and Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain

  • Arcadi Navarro,

    Affiliations: Institut de Biologia Evolutiva, Universitat Pompeu Fabra (UPF) and Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain, Population Genomics Node (National Institute for Bioinformatics, INB), Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

  • Joaquín Dopazo

    jdopazo@cipf.es

    Affiliations: Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain, CIBER de Enfermedades Raras (CIBERER), Valencia, Spain, Functional Genomics Node (National Institute for Bioinformatics, INB), CIPF, Valencia, Spain

  • Published: October 07, 2010
  • DOI: 10.1371/journal.pcbi.1000953

Reader Comments (1)


functional neighbourhoods identified in Table S3

Posted by seb951 on 04 Jul 2012 at 16:52 GMT

I was quite surprised by the results in table S3, which lists all the functional neighbourhoods (FN) identified in the present study.

First, the authors claim that, on average, FN windows contain about 50 genes (table S2). But in table S3, almost none of the significant windows have 50 genes. In fact (for Arabidopsis at least), the average number of genes in the significant windows is 7, which suggests that the statistically significant tests are strongly biased towards small sample sizes.

Second, I was surprised that windows with as few as 2 genes could show significant over-representation of GO categories. Take Arabidopsis as an example. It has about 25000 genes in total. Imagine a FN with 2 genes (there are many in table S3), each annotated with one GO term. As an extreme case, say those two GO terms are unique in the whole dataset.

The contingency table for GO term #1 would then read:
1 1
2 25000

which gives a Fisher exact test p-value of 0.00024. Depending on exactly how the correction for multiple testing is done, this may remain significant. But is it biologically significant? Under such a scheme, any random subset of a few genes will sometimes come out as significant. I therefore don't think the Fisher exact test is appropriate here.
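For anyone who wants to check the figure above, here is a minimal sketch that recomputes the p-value from the hypergeometric distribution using only the Python standard library. It assumes the table is read row by row exactly as written (1, 1 in the first row and 2, 25000 in the second) and that the test is one-sided (over-representation). The function name `fisher_one_sided` is my own for illustration and is not taken from the paper.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric with the row and
    column totals of the table held fixed (over-representation tail).
    """
    row1 = a + b          # size of the first row (the window)
    col1 = a + c          # size of the first column (genes with the term)
    n = a + b + c + d     # grand total
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)

# Table from the comment, read row by row (an assumed layout).
p = fisher_one_sided(1, 1, 2, 25000)
print(f"{p:.5f}")
```

With these assumptions the result is roughly 6/25004, so a single shared annotation in a two-gene window is already enough to produce a small p-value before any correction.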

Of course, I agree that not all FN fit this example. So what about the ones that contain many genes which all have similar functions? There the Fisher exact test is much more appropriate. However, although the paper claims that FN do not mainly result from tandem duplication, table S3 does not support this. Take Arabidopsis as an example again. I gathered the protein sequences from a FN (table S3) and aligned them using ClustalW. I repeated this for about 10-15 randomly chosen FN. All of these FN contained many closely related paralogs. This left me puzzled about the true impact of tandem duplicated genes sharing GO terms through homology.

Given the ease with which FN are identified (due both to the nature of the Fisher exact test and to tandem duplication), it is not surprising that such general GO terms as “Response to biotic stimulus, Response to stress and Localization” are conserved across the phylogeny.

No competing interests declared.