Selection upon Genome Architecture: Conservation of Functional Neighborhoods with Changing Genes

Fátima Al-Shahrour; Pablo Minguez; Tomás Marqués-Bonet; Elodie Gazave; Arcadi Navarro; Joaquín Dopazo

doi:10.1371/journal.pcbi.1000953

Abstract

An increasing number of evidences show that genes are not distributed randomly across eukaryotic chromosomes, but rather in functional neighborhoods. Nevertheless, the driving force that originated and maintains such neighborhoods is still a matter of controversy. We present the first detailed multispecies cartography of genome regions enriched in genes with related functions and study the evolutionary implications of such clustering. Our results indicate that the chromosomes of higher eukaryotic genomes contain up to 12% of genes arranged in functional neighborhoods, with a high level of gene co-expression, which are consistently distributed in phylogenies. Unexpectedly, neighborhoods with homologous functions are formed by different (non-orthologous) genes in different species. Actually, instead of being conserved, functional neighborhoods present a higher degree of synteny breaks than the genome average. This scenario is compatible with the existence of selective pressures optimizing the coordinated transcription of blocks of functionally related genes. If these neighborhoods were broken by chromosomal rearrangements, selection would favor further rearrangements reconstructing other neighborhoods of similar function. The picture arising from this study is a dynamic genomic landscape with a high level of functional organization.

Author Summary

We describe here the most extensive functional cartography of the genomes of multiple species carried out to date. Our study shows, for the first time, how neighborhoods of functionally related genes arise and how they are maintained through evolution following a pattern that is fully consistent with the evolutionary trees of the analyzed species. Contrary to what would be expected, such neighborhoods are not composed of the same genes in different species but rather by genes unrelated, annotated, however, with the same function. Our analysis also reveals that such neighborhoods are dynamically rebuilt in a way that, while the particular genes often change, it is the function of the genes present in the neighborhood, as the ultimate target of selection, that is preserved.

Citation: Al-Shahrour F, Minguez P, Marqués-Bonet T, Gazave E, Navarro A, Dopazo J (2010) Selection upon Genome Architecture: Conservation of Functional Neighborhoods with Changing Genes. PLoS Comput Biol 6(10): e1000953. https://doi.org/10.1371/journal.pcbi.1000953

Editor: Jonathan A. Eisen, University of California Davis, United States of America

Received: December 6, 2009; Accepted: September 8, 2010; Published: October 7, 2010

Copyright: © 2010 Al-Shahrour et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work is supported by grants BIO2008-04212, BFU2009-13409-C02-02, and BFU2006-15413-C02-01 from the Spanish Ministry of Science and Innovation, and PROMETEO/2010/001 from the GVA-FEDER. The National Institute of Bioinformatics (www.inab.org) and The CIBER de Enfermedades Raras are initiatives of the ISCIII. This work is also partially supported by a grant (RD06/0020/1019) from Red Tematica de Investigacion Cooperativa en Cancer (RTICC), Instituto de Salud Carlos III (ISCIII), and the Spanish Ministry of Science and Innovation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Gene activity, in terms of both intensity [1] and coexpression [2]–[5], does not occur randomly across eukaryotic chromosomes, but in many cases it clusters in certain genomic regions. Nevertheless, the driving force that originated and maintains co-expression neighborhoods is still a matter of controversy [3], [4], [6]–[10]. Several hypotheses have been put forward in order to explain the co-expression of neighboring genes which include the selection for co-regulation of genes with similar functional roles [9], [11], the reduction of gene expression noise in co-localized (but not necessarily functionally related) genes [6], [12] or the formation of clusters of paralogous genes with related functions and expression patterns by tandem duplication [2], [4], [13]. Co-regulation seems to be behind a significant part of the observed coexpression [14],[15] and other features, such as protein interactions seem also be correlated to coexpression [16]–[18]. The emerging portrait from different studies suggests that coexpression in clusters of genes might have both a functional and a neutral (non-functional) component [19].

In order to understand the real extent of this phenomenon we have produced a detailed functional cartography of the genomes of eight eukaryotic model species: Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, Gallus gallus, Danio rerio, Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana. A sliding window (see Materials and Methods section) was moved along all chromosomes and the enrichment in Gene Ontology [20] (GO) functional terms within each window was analyzed [21].

Results

Functional neighborhoods in eukaryotic genomes

One of the most remarkable results of our analysis is the rich functional landscape that it unveils. When the distribution of the functional annotations of the genes is analyzed by a sliding window (see Materials and Methods) it becomes apparent that genomes are formed by a large amount of functional neighborhoods. These range in the well annotated species from a 3% (Arabidopsis thaliana) to a 12% (Mus musculus) of the genes (Table 1). For example, in Homo sapiens, chromosomes 11 and 19 show a high number of genes in functional clusters (17.6% and 25.6%, respectively), which agrees with previous observations about the special properties of these chromosomes [22]. In mammals, sex chromosomes present a significant deviation in percentages when compared to autosomes (e.g. below 2% in X chromosomes or an extreme value of 35% in the Y chromosome of Mus musculus). See also Figure S1 that depicts the distribution of functional neighborhoods across the chromosomes of the species studied and Figure S2 with more details on the functions found in the neighborhoods. Table S3 list the genes contained in the functional neighborhoods found. Differences between human and chimpanzee in the mean gene density and percentage of genes in functional neighborhoods seem to be greater than it might be expected from their phylogenetic proximity. However, the considerable amount of chromosomal rearrangements between the genomes of humans and chimpanzees, most of which happened in the chimpanzee lineage [23] and were caused by repetitive elements [24] and lineage-specific segmental duplications [25], can provide an explanation for the observed differences. These differences strongly suggest the existence of selective pressures acting differentially on the respective functional neighborhoods. Our results are also in agreement with indirect evidences from inbred strains of mice based on linkage disequilibrium, which indicate that a quarter or more of the mammalian genome could consist of chromosome regions containing clusters of functionally related genes [26].

Download:

Table 1. Characteristics of functional neighborhoods.

https://doi.org/10.1371/journal.pcbi.1000953.t001

Functional neighborhoods are conserved across the phylogeny

Some of the functional categories found in functional neighborhoods are unique, and, as previously suggested [9], probably account for lineage or species-specific characteristics. Nevertheless, many GO term clusters were consistently shared by different species. When the GO terms found within the functional clusters are mapped over the eukaryotic phylogeny the distribution across species of the vast majority of them is fully compatible with the tree topology. Table 2 shows how different functional modules are distributed across species, arranged according their relative positions in the phylogeny. Figure 1 shows the most parsimonious phylogenetic positions of the functions consistently found in neighborhoods. Most trends are clear, despite some discrepancies in G. gallus or D. rerio, probably due to the preliminary stage of the annotations in these organisms. Thus, for example, Response to biotic stimulus, Response to stress and Localization seem to define functional neighborhoods common to all the eukaryotes. Actually, clustering of stress-related genes was described to occur during evolution of the S. cerevisiae genome [27]. Other terms, such as Organismal physiological process, Regulation of physiological process, Regulation of cellular process and Sensory perception, are characteristic to all animals. In plants (at least in its unique representative, A. thaliana) we found different terms, such as Cell growth, Viral infectious cycle, Regulation of gene expression epigenetics, as apomorphisms. Shared by all vertebrates are GO terms such as Coagulation, Response to external stimulus, Response to abiotic stimulus, Cell adhesion, Organ development and Sex differentiation (with the exception of chicken, as already mentioned). Invertebrates share clusters with the GO term Embryonic development. Finally, mammals share functional neighborhoods with GO terms such as Reproductive physiological process, Physiological interaction between organism and Behaviour, most of them making reference to more complex, social or interactive behaviors displayed by these animals. Human and chimpanzee are the closest species and share almost all the GO terms in functional neighborhoods.

Download:

Figure 1. Distribution of functions present in functional neighborhoods along the phylogeny.

The point at which a function makes up a functional neighborhood has been deduced from the species sharing functional clusters with this particular GO term. Boxes in yellow contain GO terms unique to taxa, boxes in blue contain GO terms common to clades and boxes in pink contain GO terms lost in these lineages. In the figure, terms labeled with P were not found in ape, with G: were not found in chicken, with R were not found in rat and with F were not found in fish.

https://doi.org/10.1371/journal.pcbi.1000953.g001

Download:

Table 2. Number of genes in functional neighborhoods.

https://doi.org/10.1371/journal.pcbi.1000953.t002

Relationship between gene coexpression and common annotations in functional neighborhoods

Our results support a strong causal relationship between local coexpression and local co-functionality. For each GO term, the correlations among gene expression profiles of genes located within the functional neighborhood were compared to the corresponding correlations among the rest of genes belonging to the same GO term located outside the neighborhood (see Materials and Methods section). Table 1 shows that there is a significantly higher degree of coexpression in genes belonging to a given functional class when they are packed within a functional neighborhood than when they are elsewhere in the genome. This result, along with the lack of a significant relative enrichment of tandem duplications (see below), points out to coexpression as the most plausible driving force for the existence of functional neighborhoods.

Functional neighborhoods do not mainly result from duplication

If functional neighborhoods are originated as a simple result of tandem duplications of genes, different copies may or may not acquire different functions, but, in any case, they might share GO terms. A simple analysis of the number of paralogous contained in the regions shows that the percentages range from 14% (human) to 28% (rat), which corresponds almost exactly to the percentage of paralogous in the corresponding genomes. The number of paralogous among the GO genes is higher (around the 40%) which, again, corresponds to the percentage of paralogous within the GO categories. Given that functional annotations could be transferred by similarity [20], an artifactual accumulation of identical GO terms could be observed in this scenario. To discard this possibility we performed two different analyses. Firstly, we examined whether functional neighborhoods are enriched with segmental duplications. For every species for which appropriate information is available (see Materials and Methods section), a binomial test was used to determine if the number of segmental duplications within functional clusters is larger than what be expected according to their size. Our results allow rejecting the hypothesis that recent segmental duplications are contributing to the formation of functional clusters (Table 3: p-values always n.s.). Still, the possibility remains that the clusters we observe are the result of ancient duplication events that have diverged too much to be detected as such. In that scenario, different paralogous copies of a gene may still be similar at the protein level and form functionally related gene families. Our second analysis accounts for the effect of ancient duplications by examining the distributions of the average number of BLASTP [28] hits within the regions containing functional clusters and comparing them to the genome-wide background (that is the rest of equivalent chromosomal regions). When compared, both distributions of BLASTP hits were indistinguishable for all the studied organisms (See Figure S3 and Table S1). Thus, our general multispecies analysis demonstrates that the emergence of new genes by tandem duplications cannot be the general explanation behind the origin of functional neighborhoods.

Download:

Table 3. Segmental Duplication (SD) analysis.

https://doi.org/10.1371/journal.pcbi.1000953.t003

Functional neighborhoods shared by clades are not composed of ortholog genes

Surprisingly, the genes found in functional neighborhoods shared by different organisms are not necessarily orthologous (see Table 2). That is, when two species share functional neighborhoods, the genes forming these clusters may be different in each species. One might expect that if such functional neighborhoods emerged in a particular period of the evolution and apparently were maintained since then (given that they are shared by all the descendant species), these clusters were essentially composed by ortholog genes. Nevertheless, this is not the case.

Table 2 shows three columns for each organism. The first one is the number of genes annotated with the GO functional categories shared by the different relevant clades (mammalians, vertebrates and animals), the second one is the proportion of such genes that were found in functional clusters where this particular GO category was significantly over-represented and the third column is the proportion of such genes with a human ortholog. The two most important observations that can be made from Table 2 are: i) the proportion of genes in functional neighborhoods in each functional category tend to be approximately constant across taxa (with a few exceptions). For example, a large proportion of genes belonging to sensory perception cluster in neighborhoods (over 20%, except in the case of Drosophila) across the species in Table 2 while behavior genes keep their proportions approximately between 10 and 20%. The results found in Gallus gallus and Danio rerio are less conclusive probably because of the preliminary of the functional annotation. And ii) the genes found in the shared functional neighborhoods in different organisms do not have a relationship of orthology. That is, the proportion of ortholog genes with respect to their human counterparts is significantly lower than expected from an evolutionary event in which groups of functionally related genes gathered in the genome and were subsequently maintained along evolution. The presence in the functional clusters of mammals of a significantly high number of repetitive elements (SINE), which are known to be involved in rearrangement processes [29], [30], suggest that such regions may be undergoing a continuous process of rearrangement and selection is ultimately favoring the presence of genes belonging to the functional categories required by the organisms. In fact we observed a significant enrichment in SINEs in the functional regions of human (p<0.0001), mouse (p = 0.0057) and rat (p = 0.0002). From this point of view, a number of functional categories would require to have a minimum number of genes clustered together in the genome for optimal transcriptional activity, but not necessarily the same set of ortholog genes. Our findings actually suggest that it is the fraction of genes of a given function, and not the particular genes, which is relevant from the point of view of the transcriptional efficiency. This is in agreement with previous suggestions of other authors about the existence of a functional component reflected in the physical proximity of the genes that would be favoring their simultaneous co-expression [3], [4], [7]. This observation is also compatible with a dynamic scenario in which function, understood as a system of genes at work, rather than a particular static set of orthologous genes, is the target of natural selection [31].

Functional neighborhoods shared between species are significantly enriched with breaks of synteny

The comparative study of synteny conservation can throw some light on a scenario in which phylogenetically consistent functional neighborhoods composed by non orthologous genes occur. Synteny data are available for a number of species, but since the highest quality information has been obtained for the human-chimpanzee synteny relations [23], [32], [33], we have focused in functional neighborhoods shared between these two species. Humans and chimpanzees are separated by 10 major chromosomal rearrangements [34] and many small ones that imply many breaks of synteny between the two species. We observed that, as an average, functional neighborhoods shared between these species are significantly enriched with such breaks of synteny (Table 4). This is another surprising result: not only functional clusters are not particularly conserved, but they seem to be highly reorganized. These clusters appear to be enriched with rearrangement breakpoints relative to the rest of the genome (for example, using the synteny information from the Newman et al. Dataset [23] this means ∼0.15 Breakpoints/Mb in neighborhoods vs. ∼0.09 Bkp/Mb in the rest of the genome, Chi-square test, p-value = 4.2×10⁻⁶, see Table 4). This renders further support to the idea that there are strong selective pressures that maintain a minim number of genes with certain functions within clusters and is consistent with the observation reported above of clusters shared between species that, in spite of having the same functions, do not share the same ortholog genes.

Download:

Table 4. Functional neighborhoods shared between humans and chimpanzees.

https://doi.org/10.1371/journal.pcbi.1000953.t004

Moreover, when functional neighborhoods are classified according to the percentage of orthologous genes they contain, highly orthologous neighborhoods present significantly less synteny breaks than low-orthology neighborhoods (∼0.1 vs. ∼0.2 Bkps/Mb, p-value = 0.000231 in the Newman's Dataset [23]) and a synteny conservation that is similar, or even stronger, than the genome average. The situation is the opposite in low orthology neighborhoods (see Table 5 and Table 3). In both cases functional neighborhoods present an internal level of coexpression that is significantly higher than the level observed in genes belonging to the same functional categories when dispersed across the genome (high orthology p-value = 1.197×10⁻²¹ low orthology p-value = 1.696×10⁻¹⁷; see also Table 1), defining in this way functional, coexpression neighborhoods. The fact that the degree of coexpression is lower in low orthology neighborhoods than in the case of high orthology ones would be compatible with a dynamic scenario of continuous reconstitution of low orthology domains where the expression process was not fully optimized yet. Without entering in the detail on where the conservation of the neighborhoods came from, the observation that genes with altered neighborhood are more likely to undergo expression divergence than genes with conserved neighborhood was already made [35]. This scenario has also some similitude to the one proposed by Poyatos and Hurst for yeast [16], in which selection for high levels of co-expression would correlate with high levels of recombination rates, which, in turn mean high levels of chromosomal rearrangement and increase the probability of breakage of the co-expressed cluster. It has also been observed that co-expression between adjacent genes is positively correlated with the probability that those genes would be apart in the genome of a different species [36]. The fact that, in our case, highly orthologous clusters present the highest co-expression levels and lowest rearrangement rates suggests, however, a different cycle: cluster would contain many rearrangement breakpoints because natural selection would favor the reconstruction of clusters via chromosomal reorganization. In addition, this is consistent with the fact that the rearrangement breakpoints tend to reduce and not to increase recombination while segregating in a population [37].

Download:

Table 5. Functional neighborhoods shared between humans and chimpanzees.

https://doi.org/10.1371/journal.pcbi.1000953.t005

Discussion

Results presented here demonstrate that a large fraction of the genome is arranged in neighborhoods of functionally related genes that are not the result of tandem duplications but of reorganization. Coexpression has systematically observed to occur within functionally related genes defining the functional neighborhoods. The fact that functions shared across species analyzed is compatible with the evolutionary pattern of speciation constitutes strong evidence in favor of the existence of a selective force that produced and maintained the observed functional neighborhoods, even if different sets of genes make them up in different species. Moreover, in an apparent paradox, functional neighborhoods, which in one hand are conserved across evolution, appear to be enriched with rearrangement breakpoints when compared to the rest of the genome. Both observations suggest that selection is operating at the level of functional neighborhoods, no matter their particular genic composition. In this scenario, when a functional neighborhood is broken by a chromosomal rearrangement, selection would favor new rearrangements that tended to reconstitute a neighborhood with the same function, although the gene composition may differ from the ancestral one.

Actually, the number of functional neighborhoods found constitutes, most probably, an underestimation of its real number because of two facts: i) the testing scheme used is conservative and ii) this study considers only neighborhoods collinear in the chromosomes but no spatial neighborhoods formed by the tridimensional conformation of the nucleus. The real spatial conformation of the nucleus is still unknown but new data are continuously arising [38], [39] and the relationship of physical proximity with gene expression [40] and their possible functional implications [41] are becoming increasingly clear. As new information is available this extreme will be studied in more detail.

Materials and Methods

Data

The genomes of Homo sapiens, Mus musculus, Pan troglodytes, Rattus norvegicus, Gallus gallus, Danio rerio, Drosophila melanogaster, Caenorhabditis elegans, were taken from Ensembl [42] and the genome of Arabidopsis thaliana was obtained from AtEnsembl, (release 29, http://atensembl.arabidopsis.info). All the microarrays were chosen to represent conditions as normal and as non-pathological as possible. The following datasets, taken from the ArrayExpress database (http://www.ebi.ac.uk/microarray-as/ae/), were used: Human: E-AFMX-5; Mouse: E-AFMX-4; Fish: E-TABM-33; Fly E-MEXP-127, E-MEXP-152, E-MEXP-202, E-MEXP-493, E-MEXP-88; Worm:E-SMDB-1398, E-SMDB-3540, E-SMDB-3539, E-SMDB-3592 and Plant: E-TABM-17. No comparable data were found for chicken and, thus, it was excluded from the analysis of expression data. More information on the data used and the results obtained can be found at: http://bioinfo.cipf.es/publications/additionaldata/functional_clusters.

Synteny analysis

The analysis of density of breakpoints in windows of functional enrichments was performed using the breaks of synteny between Humans and Chimpanzees from Newman et al [43] and Feuk et al [32]. For the later, only the set of rearrangements >25Kb was used. The first dataset (Newman's dataset), was built blasting Fosmid pair-end sequences into the human genome, so it does not depend on the quality of the chimpanzee assembly. The second dataset (Feuk's dataset), was constructed comparing the order of genes between assemblies of the two species, and thus, it is likely to be affected by the lower quality of the assembly of the chimpanzee genome available at the time of the publication of the paper by Feuk et al [32].

Sliding windows approach

All the chromosomes of the studied species were scanned by means of a sliding window. In order to be compliant with previous studies [1] window size was adjusted in each species to contain, on average, approximately 50 genes (see Table S2). The windows are moved along all chromosomes in steps of half a window. A conventional method of functional enrichment implemented in the FatiGO program [21], which is part of the Babelomics (http://www.babelomics.org) suite for functional analysis [44], [45], was used to study the significant over-representation of GO terms in each window. Briefly, the method builds a 2×2 contingency table for each functional term checked for each window and applies an exact Fisher's test. The p-values obtained for all the windows were FDR-corrected [46] taking into account all the tests conducted in all the organisms. Figure S4 shows a schema of the procedure followed to detect functional neighborhoods.

Testing for duplication events

Available data on segmental duplications were downloaded from the Eichler Lab databases (http://eichlerlab.gs.washington.edu/database.html). To avoid coordinate translation biases, only species for which the segmental duplications and the gene annotation assemblies were concordant were used. Thus, the segmental duplication analysis involves only human, chimpanzee, chicken and mouse. The proportion of segmental duplications contained in windows containing functional neighborhoods was compared to the proportion of segmental duplications in windows without clusters (i.e. in the rest of the genome) after removing from analysis ambiguously located segmental duplications. A binomial test was used to determine whether the number of segmental duplications inside windows with functional clusters was larger than expected under the null hypothesis of random distribution of segmental duplications with respect to functional neighborhoods.

A further way to infer the number of recent and ancient duplication events in a window is using the number of BLASTP [28] hits that any of the genes contained in it produces when searched against a all the genes within the region. A region including only single-copy genes not belonging to a gene family and no ancient or new tandem duplications will theoretically produce only one hit per gene (the gene against itself). A region consisting of a group of genes amplified N times will produce N BLASTP hits per gene. Situations in between these extremes will produce more than one BLASTP hit for some genes. For each window we constructed a BLASTP database with the corresponding proteins. Then, all the proteins corresponding to the genes in the window were blasted (using BLASTP) against the corresponding database, and the total number of BLASTP hits with a percentage of similarity over a threshold of T% was normalized and recorded. Values of 98% and 95% were used as thresholds. Again, for each organism, windows containing functional neighborhoods were compared to a background consisting of the rest of windows without significant functional neighborhoods inside.

Coexpression analysis

For each GO term the pairwise Pearson correlations among genes located within the functional neighborhood is compared to the corresponding correlations among the rest of genes not located in the neighborhood, by means of a Kolmogorov-Smirnov test.

Supporting Information

Figure S1.

Multi-species cartography of genomes enriched in genes with related functions. Functional neighborhoods are represented by arrows at their corresponding chromosomal coordinates. See text for the versions of the databases used for the coordinate mapping. The species analyzed appear in the pages below and are: a) Homo sapiens b) Pan troglodytes c) Mus musculus d) Rattus norvegicus e) Gallus gallus f) Danio rerio g) Drosophila melanogaster h) Caernohabditis elegans i) Arabidopsis Thaliana.

https://doi.org/10.1371/journal.pcbi.1000953.s001

(0.08 MB DOC)

Figure S2.

Distribution of significant GO biological processes terms present in functional neighborhoods in the different genomes analyzed.

https://doi.org/10.1371/journal.pcbi.1000953.s002

(0.11 MB DOC)

Figure S3.

Distribution of BLASTP hits with an identity over the 98% and 95% in the different genomes studied for the functional neighborhoods (red) and for the rest of the genome (black).

https://doi.org/10.1371/journal.pcbi.1000953.s003

(0.57 MB DOC)

Figure S4.

Schema of the procedure followed. See Materials and Methods section.

https://doi.org/10.1371/journal.pcbi.1000953.s004

(0.20 MB DOC)

Table S1.

Duplication events t-test comparing the distribution of BLASTP hits for functional neighborhoods versus BLASTP hits for the rest of the genome.

https://doi.org/10.1371/journal.pcbi.1000953.s005

(0.04 MB DOC)

Table S2.

Sliding window sizes used for scanning all the chromosomes of the studied species. Window size was adjusted in each species to contain, on average, approximately 50 genes.

https://doi.org/10.1371/journal.pcbi.1000953.s006

(0.04 MB DOC)

Table S3.

List of the genes contained in the functional neighborhoods found.

https://doi.org/10.1371/journal.pcbi.1000953.s007

(2.06 MB XLS)

Acknowledgments

The authors are greatly indebted to Dr. Pawel Michalak for his critical reading and comments to the manuscript.

Author Contributions

Conceived and designed the experiments: JD. Analyzed the data: FAS PM TMB EG AN. Wrote the paper: AN JD.

References

1. Caron H, van Schaik B, van der Mee M, Baas F, Riggins G, et al. (2001) The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291: 1289–1292.
- View Article
- Google Scholar
2. Hurst LD, Pal C, Lercher MJ (2004) The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet 5: 299–310.
- View Article
- Google Scholar
3. Lercher MJ, Urrutia AO, Hurst LD (2002) Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet 31: 180–183.
- View Article
- Google Scholar
4. Singer GA, Lloyd AT, Huminiecki LB, Wolfe KH (2005) Clusters of co-expressed genes in mammalian genomes are conserved by natural selection. Mol Biol Evol 22: 767–775.
- View Article
- Google Scholar
5. Fukuoka Y, Inaoka H, Kohane IS (2004) Inter-species differences of co-expression of neighboring genes in eukaryotic genomes. BMC Genomics 5: 4.
- View Article
- Google Scholar
6. Batada NN, Hurst LD (2007) Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet 39: 945–949.
- View Article
- Google Scholar
7. Hurst LD, Lercher MJ (2005) Unusual linkage patterns of ligands and their cognate receptors indicate a novel reason for non-random gene order in the human genome. BMC Evol Biol 5: 62.
- View Article
- Google Scholar
8. Lercher MJ, Blumenthal T, Hurst LD (2003) Coexpression of neighboring genes in Caenorhabditis elegans is mostly due to operons and duplicate genes. Genome Res 13: 238–243.
- View Article
- Google Scholar
9. Lee JM, Sonnhammer EL (2003) Genomic gene clustering analysis of pathways in eukaryotes. Genome Res 13: 875–882.
- View Article
- Google Scholar
10. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S (2006) Stochastic mRNA synthesis in mammalian cells. PLoS Biol 4: e309.
- View Article
- Google Scholar
11. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, et al. (2003) Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 13: 2363–2371.
- View Article
- Google Scholar
12. Kovacs K, Hurst LD, Papp B (2009) Stochasticity in protein levels drives colinearity of gene order in metabolic operons of Escherichia coli. PLoS Biol 7: e1000115.
- View Article
- Google Scholar
13. Spellman PT, Rubin GM (2002) Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol 1: 5.
- View Article
- Google Scholar
14. Allocco DJ, Kohane IS, Butte AJ (2004) Quantifying the relationship between co-expression, co-regulation and gene function. BMC Bioinformatics 5: 18.
- View Article
- Google Scholar
15. Vogel JH, von Heydebreck A, Purmann A, Sperling S (2005) Chromosomal clustering of a human transcriptome reveals regulatory background. BMC Bioinformatics 6: 230.
- View Article
- Google Scholar
16. Poyatos JF, Hurst LD (2006) Is optimal gene order impossible? Trends Genet 22: 420–423.
- View Article
- Google Scholar
17. Ge H, Liu Z, Church GM, Vidal M (2001) Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat Genet 29: 482–486.
- View Article
- Google Scholar
18. Teichmann SA, Veitia RA (2004) Genes encoding subunits of stable complexes are clustered on the yeast chromosomes: an interpretation from a dosage balance perspective. Genetics 167: 2121–2125.
- View Article
- Google Scholar
19. Michalak P (2008) Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics 91: 243–248.
- View Article
- Google Scholar
20. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29.
- View Article
- Google Scholar
21. Al-Shahrour F, Diaz-Uriarte R, Dopazo J (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578–580.
- View Article
- Google Scholar
22. Castresana J, Guigo R, Alba MM (2004) Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome. J Mol Evol 59: 72–79.
- View Article
- Google Scholar
23. Newman TL, Tuzun E, Morrison VA, Hayden KE, Ventura M, et al. (2005) A genome-wide survey of structural variation between human and chimpanzee. Genome Res 15: 1344–1356.
- View Article
- Google Scholar
24. Lee J, Han K, Meyer TJ, Kim HS, Batzer MA (2008) Chromosomal inversions between human and chimpanzee lineages caused by retrotransposons. PLoS One 3: e4047.
- View Article
- Google Scholar
25. Marques-Bonet T, Kidd JM, Ventura M, Graves TA, Cheng Z, et al. (2009) A burst of segmental duplications in the genome of the African great ape ancestor. Nature 457: 877–881.
- View Article
- Google Scholar
26. Petkov PM, Graber JH, Churchill GA, DiPetrillo K, King BL, et al. (2005) Evidence of a large-scale functional organization of mammalian chromosomes. PLoS Genet 1: e33.
- View Article
- Google Scholar
27. Burhans DT, Ramachandran L, Wang J, Liang P, Patterton HG, et al. (2006) Non-random clustering of stress-related genes during evolution of the S. cerevisiae genome. BMC Evol Biol 6: 58.
- View Article
- Google Scholar
28. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
- View Article
- Google Scholar
29. Eichler EE, Sankoff D (2003) Structural dynamics of eukaryotic chromosome evolution. Science 301: 793–797.
- View Article
- Google Scholar
30. Zhao S, Shetty J, Hou L, Delcher A, Zhu B, et al. (2004) Human, mouse, and rat genome large-scale rearrangements: stability versus speciation. Genome Res 14: 1851–1860.
- View Article
- Google Scholar
31. Al-Shahrour F, Arbiza L, Dopazo H, Huerta-Cepas J, Minguez P, et al. (2007) From genes to functional classes in the study of biological systems. BMC Bioinformatics 8: 114.
- View Article
- Google Scholar
32. Feuk L, MacDonald JR, Tang T, Carson AR, Li M, et al. (2005) Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies. PLoS Genet 1: e56.
- View Article
- Google Scholar
33. Kehrer-Sawatzki H, Cooper DN (2007) Structural divergence between the human and chimpanzee genomes. Hum Genet 120: 759–778.
- View Article
- Google Scholar
34. Yunis JJ, Prakash O (1982) The origin of man: a chromosomal pictorial legacy. Science 215: 1525–1530.
- View Article
- Google Scholar
35. De S, Teichmann SA, Babu MM (2009) The impact of genomic neighborhood on the evolution of human and chimpanzee transcriptome. Genome Res 19: 785–794.
- View Article
- Google Scholar
36. Liao BY, Zhang J (2008) Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol 25: 1555–1565.
- View Article
- Google Scholar
37. Navarro A, Betran E, Barbadilla A, Ruiz A (1997) Recombination and gene flux caused by gene conversion and crossing over in inversion heterokaryotypes. Genetics 146: 695–709.
- View Article
- Google Scholar
38. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, et al. (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326: 289–293.
- View Article
- Google Scholar
39. Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, et al. A three-dimensional model of the yeast genome. Nature 465: 363–367.
- View Article
- Google Scholar
40. Fraser P, Bickmore W (2007) Nuclear organization of the genome and the potential for gene regulation. Nature 447: 413–417.
- View Article
- Google Scholar
41. De S, Babu MM (2010) Genomic neighbourhood and the regulation of gene expression. Curr Opin Cell Biol 22: 326–333.
- View Article
- Google Scholar
42. Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, et al. (2009) Ensembl 2009. Nucleic Acids Res 37: D690–697.
- View Article
- Google Scholar
43. Newman JC, Weiner AM (2005) L2L: a simple tool for discovering the hidden significance in microarray expression data. Genome Biol 6: R81.
- View Article
- Google Scholar
44. Al-Shahrour F, Carbonell J, Minguez P, Goetz S, Conesa A, et al. (2008) Babelomics: advanced functional profiling of transcriptomics, proteomics and genomics experiments. Nucleic Acids Res 36: W341–346.
- View Article
- Google Scholar
45. Medina I, Carbonell J, Pulido L, Madeira SC, Goetz S, et al. (2010) Babelomics: an integrative platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling. Nucleic Acids Res 38: W210–213.
- View Article
- Google Scholar
46. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B 57: 289–300.
- View Article
- Google Scholar

[ref1] 1. Caron H, van Schaik B, van der Mee M, Baas F, Riggins G, et al. (2001) The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291: 1289–1292.
View Article
Google Scholar

[2] View Article

[3] Google Scholar

[ref2] 2. Hurst LD, Pal C, Lercher MJ (2004) The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet 5: 299–310.
View Article
Google Scholar

[5] View Article

[6] Google Scholar

[ref3] 3. Lercher MJ, Urrutia AO, Hurst LD (2002) Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet 31: 180–183.
View Article
Google Scholar

[8] View Article

[9] Google Scholar

[ref4] 4. Singer GA, Lloyd AT, Huminiecki LB, Wolfe KH (2005) Clusters of co-expressed genes in mammalian genomes are conserved by natural selection. Mol Biol Evol 22: 767–775.
View Article
Google Scholar

[11] View Article

[12] Google Scholar

[ref5] 5. Fukuoka Y, Inaoka H, Kohane IS (2004) Inter-species differences of co-expression of neighboring genes in eukaryotic genomes. BMC Genomics 5: 4.
View Article
Google Scholar

[14] View Article

[15] Google Scholar

[ref6] 6. Batada NN, Hurst LD (2007) Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet 39: 945–949.
View Article
Google Scholar

[17] View Article

[18] Google Scholar

[ref7] 7. Hurst LD, Lercher MJ (2005) Unusual linkage patterns of ligands and their cognate receptors indicate a novel reason for non-random gene order in the human genome. BMC Evol Biol 5: 62.
View Article
Google Scholar

[20] View Article

[21] Google Scholar

[ref8] 8. Lercher MJ, Blumenthal T, Hurst LD (2003) Coexpression of neighboring genes in Caenorhabditis elegans is mostly due to operons and duplicate genes. Genome Res 13: 238–243.
View Article
Google Scholar

[23] View Article

[24] Google Scholar

[ref9] 9. Lee JM, Sonnhammer EL (2003) Genomic gene clustering analysis of pathways in eukaryotes. Genome Res 13: 875–882.
View Article
Google Scholar

[26] View Article

[27] Google Scholar

[ref10] 10. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S (2006) Stochastic mRNA synthesis in mammalian cells. PLoS Biol 4: e309.
View Article
Google Scholar

[29] View Article

[30] Google Scholar

[ref11] 11. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, et al. (2003) Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 13: 2363–2371.
View Article
Google Scholar

[32] View Article

[33] Google Scholar

[ref12] 12. Kovacs K, Hurst LD, Papp B (2009) Stochasticity in protein levels drives colinearity of gene order in metabolic operons of Escherichia coli. PLoS Biol 7: e1000115.
View Article
Google Scholar

[35] View Article

[36] Google Scholar

[ref13] 13. Spellman PT, Rubin GM (2002) Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol 1: 5.
View Article
Google Scholar

[38] View Article

[39] Google Scholar

[ref14] 14. Allocco DJ, Kohane IS, Butte AJ (2004) Quantifying the relationship between co-expression, co-regulation and gene function. BMC Bioinformatics 5: 18.
View Article
Google Scholar

[41] View Article

[42] Google Scholar

[ref15] 15. Vogel JH, von Heydebreck A, Purmann A, Sperling S (2005) Chromosomal clustering of a human transcriptome reveals regulatory background. BMC Bioinformatics 6: 230.
View Article
Google Scholar

[44] View Article

[45] Google Scholar

[ref16] 16. Poyatos JF, Hurst LD (2006) Is optimal gene order impossible? Trends Genet 22: 420–423.
View Article
Google Scholar

[47] View Article

[48] Google Scholar

[ref17] 17. Ge H, Liu Z, Church GM, Vidal M (2001) Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat Genet 29: 482–486.
View Article
Google Scholar

[50] View Article

[51] Google Scholar

[ref18] 18. Teichmann SA, Veitia RA (2004) Genes encoding subunits of stable complexes are clustered on the yeast chromosomes: an interpretation from a dosage balance perspective. Genetics 167: 2121–2125.
View Article
Google Scholar

[53] View Article

[54] Google Scholar

[ref19] 19. Michalak P (2008) Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics 91: 243–248.
View Article
Google Scholar

[56] View Article

[57] Google Scholar

[ref20] 20. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29.
View Article
Google Scholar

[59] View Article

[60] Google Scholar

[ref21] 21. Al-Shahrour F, Diaz-Uriarte R, Dopazo J (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578–580.
View Article
Google Scholar

[62] View Article

[63] Google Scholar

[ref22] 22. Castresana J, Guigo R, Alba MM (2004) Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome. J Mol Evol 59: 72–79.
View Article
Google Scholar

[65] View Article

[66] Google Scholar

[ref23] 23. Newman TL, Tuzun E, Morrison VA, Hayden KE, Ventura M, et al. (2005) A genome-wide survey of structural variation between human and chimpanzee. Genome Res 15: 1344–1356.
View Article
Google Scholar

[68] View Article

[69] Google Scholar

[ref24] 24. Lee J, Han K, Meyer TJ, Kim HS, Batzer MA (2008) Chromosomal inversions between human and chimpanzee lineages caused by retrotransposons. PLoS One 3: e4047.
View Article
Google Scholar

[71] View Article

[72] Google Scholar

[ref25] 25. Marques-Bonet T, Kidd JM, Ventura M, Graves TA, Cheng Z, et al. (2009) A burst of segmental duplications in the genome of the African great ape ancestor. Nature 457: 877–881.
View Article
Google Scholar

[74] View Article

[75] Google Scholar

[ref26] 26. Petkov PM, Graber JH, Churchill GA, DiPetrillo K, King BL, et al. (2005) Evidence of a large-scale functional organization of mammalian chromosomes. PLoS Genet 1: e33.
View Article
Google Scholar

[77] View Article

[78] Google Scholar

[ref27] 27. Burhans DT, Ramachandran L, Wang J, Liang P, Patterton HG, et al. (2006) Non-random clustering of stress-related genes during evolution of the S. cerevisiae genome. BMC Evol Biol 6: 58.
View Article
Google Scholar

[80] View Article

[81] Google Scholar

[ref28] 28. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
View Article
Google Scholar

[83] View Article

[84] Google Scholar

[ref29] 29. Eichler EE, Sankoff D (2003) Structural dynamics of eukaryotic chromosome evolution. Science 301: 793–797.
View Article
Google Scholar

[86] View Article

[87] Google Scholar

[ref30] 30. Zhao S, Shetty J, Hou L, Delcher A, Zhu B, et al. (2004) Human, mouse, and rat genome large-scale rearrangements: stability versus speciation. Genome Res 14: 1851–1860.
View Article
Google Scholar

[89] View Article

[90] Google Scholar

[ref31] 31. Al-Shahrour F, Arbiza L, Dopazo H, Huerta-Cepas J, Minguez P, et al. (2007) From genes to functional classes in the study of biological systems. BMC Bioinformatics 8: 114.
View Article
Google Scholar

[92] View Article

[93] Google Scholar

[ref32] 32. Feuk L, MacDonald JR, Tang T, Carson AR, Li M, et al. (2005) Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies. PLoS Genet 1: e56.
View Article
Google Scholar

[95] View Article

[96] Google Scholar

[ref33] 33. Kehrer-Sawatzki H, Cooper DN (2007) Structural divergence between the human and chimpanzee genomes. Hum Genet 120: 759–778.
View Article
Google Scholar

[98] View Article

[99] Google Scholar

[ref34] 34. Yunis JJ, Prakash O (1982) The origin of man: a chromosomal pictorial legacy. Science 215: 1525–1530.
View Article
Google Scholar

[101] View Article

[102] Google Scholar

[ref35] 35. De S, Teichmann SA, Babu MM (2009) The impact of genomic neighborhood on the evolution of human and chimpanzee transcriptome. Genome Res 19: 785–794.
View Article
Google Scholar

[104] View Article

[105] Google Scholar

[ref36] 36. Liao BY, Zhang J (2008) Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol 25: 1555–1565.
View Article
Google Scholar

[107] View Article

[108] Google Scholar

[ref37] 37. Navarro A, Betran E, Barbadilla A, Ruiz A (1997) Recombination and gene flux caused by gene conversion and crossing over in inversion heterokaryotypes. Genetics 146: 695–709.
View Article
Google Scholar

[110] View Article

[111] Google Scholar

[ref38] 38. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, et al. (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326: 289–293.
View Article
Google Scholar

[113] View Article

[114] Google Scholar

[ref39] 39. Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, et al. A three-dimensional model of the yeast genome. Nature 465: 363–367.
View Article
Google Scholar

[116] View Article

[117] Google Scholar

[ref40] 40. Fraser P, Bickmore W (2007) Nuclear organization of the genome and the potential for gene regulation. Nature 447: 413–417.
View Article
Google Scholar

[119] View Article

[120] Google Scholar

[ref41] 41. De S, Babu MM (2010) Genomic neighbourhood and the regulation of gene expression. Curr Opin Cell Biol 22: 326–333.
View Article
Google Scholar

[122] View Article

[123] Google Scholar

[ref42] 42. Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, et al. (2009) Ensembl 2009. Nucleic Acids Res 37: D690–697.
View Article
Google Scholar

[125] View Article

[126] Google Scholar

[ref43] 43. Newman JC, Weiner AM (2005) L2L: a simple tool for discovering the hidden significance in microarray expression data. Genome Biol 6: R81.
View Article
Google Scholar

[128] View Article

[129] Google Scholar

[ref44] 44. Al-Shahrour F, Carbonell J, Minguez P, Goetz S, Conesa A, et al. (2008) Babelomics: advanced functional profiling of transcriptomics, proteomics and genomics experiments. Nucleic Acids Res 36: W341–346.
View Article
Google Scholar

[131] View Article

[132] Google Scholar

[ref45] 45. Medina I, Carbonell J, Pulido L, Madeira SC, Goetz S, et al. (2010) Babelomics: an integrative platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling. Nucleic Acids Res 38: W210–213.
View Article
Google Scholar

[134] View Article

[135] Google Scholar

[ref46] 46. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B 57: 289–300.
View Article
Google Scholar

[137] View Article

[138] Google Scholar

Figures

Abstract

Author Summary

Introduction

Results

Functional neighborhoods in eukaryotic genomes

Functional neighborhoods are conserved across the phylogeny

Relationship between gene coexpression and common annotations in functional neighborhoods

Functional neighborhoods do not mainly result from duplication

Functional neighborhoods shared by clades are not composed of ortholog genes

Functional neighborhoods shared between species are significantly enriched with breaks of synteny

Discussion

Materials and Methods

Data

Synteny analysis

Sliding windows approach

Testing for duplication events

Coexpression analysis

Supporting Information

Acknowledgments

Author Contributions

References