Research Article

Codon Usage Domains over Bacterial Chromosomes

  • Marc Bailly-Bechet,

    Affiliation: CNRS URA 2171, Institute Pasteur, Unité Génétique in silico, Paris, France

  • Antoine Danchin,

    Affiliation: CNRS URA 2171, Institute Pasteur, Unité Génétique des Génomes Bactériens, Paris, France

  • Mudassar Iqbal,

    Affiliations: Abdus Salam International Center Theoretical Physics, Trieste, Italy; Computing Laboratory, University of Kent, Canterbury, Kent, United Kingdom

  • Matteo Marsili,

    Affiliation: Abdus Salam International Center Theoretical Physics, Trieste, Italy

  • Massimo Vergassola

    To whom correspondence should be addressed. E-mail: massimo@pasteur.fr

    Affiliation: CNRS URA 2171, Institute Pasteur, Unité Génétique in silico, Paris, France

  • Published: April 21, 2006
  • DOI: 10.1371/journal.pcbi.0020037

Reader Comments (1)


Addendum to PLoS Computational Biology Vol. 2, No. 4, e37

Posted by PLoS_CompBiol on 20 Feb 2008 at 11:10 GMT

Originally posted as a Reader Response on 27th October, 2006

One issue left unexplained in our article [1] is the quantitative difference between E. coli and B. subtilis visible in Figure 6. Although the probability that two genes belong to the same codon usage cluster decays over distances longer than operons in both cases, the B. subtilis curve features much longer correlations. This observation, together with discussions with Dr. Morten Kloster (Princeton University), spurred us to reconsider the issue. This addendum describes the further analysis, which also allows us to point out an incorrect statement in our article [1]. The conclusion is that the clusters of B. subtilis and E. coli that are not biased in GC content display the same behavior, with codon usage correlations of the same order, roughly three times the length of the average operon.

Contrary to what was stated in our article [1], the average GC content is not homogeneous across the clusters. The correct GC contents, expressed as fractions, are, respectively, 0.527, 0.443, 0.541, and 0.522 for clusters 1 to 4 of E. coli; and 0.439, 0.358, 0.450, 0.470, and 0.436 for clusters 1 to 5 of B. subtilis.
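As a minimal sketch of how such values can be obtained and compared, the Python snippet below computes the GC fraction of a sequence and ranks the per-cluster values quoted above. The function names `gc_fraction` and `extreme_clusters` are illustrative assumptions, not part of the original analysis, and the way the authors averaged GC content over the genes of a cluster is not specified here.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / acgt


def extreme_clusters(values):
    """Return the 1-based indices of the lowest-GC and highest-GC clusters."""
    indices = range(len(values))
    lowest = min(indices, key=values.__getitem__) + 1
    highest = max(indices, key=values.__getitem__) + 1
    return lowest, highest


# Per-cluster GC fractions as quoted in the addendum.
cluster_gc = {
    "E. coli": [0.527, 0.443, 0.541, 0.522],
    "B. subtilis": [0.439, 0.358, 0.450, 0.470, 0.436],
}

for organism, values in cluster_gc.items():
    lowest, highest = extreme_clusters(values)
    print(f"{organism}: lowest GC in cluster {lowest}, highest GC in cluster {highest}")
```

With the quoted values, cluster 2 has the lowest GC fraction in both organisms, and cluster 4 has the highest in B. subtilis.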

The demonstration given in our article [1] that the clusters are biologically significant still holds and does not depend on the GC content. In particular, the third cluster of B. subtilis, which we showed to feature an over-representation of anabolic genes and of lagging-strand transcriptional orientation, does not show any particular deviation from the average genomic GC content. The clusters deviating most significantly from the average are cluster 2 of E. coli and cluster 2 of B. subtilis. Both clusters are enriched in AT and were shown to be enriched in horizontally transferred genes. Their higher AT content agrees with the observation that horizontally transferred genes tend to be AT rich [2].

The GC content accounts for the aforementioned difference between the correlation lengths of E. coli and B. subtilis in Figure 6 of our article [1]. To demonstrate this, we considered the same correlation functions plotted in Figure 6, but now computed them for each individual cluster. Specifically, we measured the histogram of the distances between genes belonging to the same cluster. The resulting curves for the various clusters and further details can be found at http://www.pasteur.fr/rec...
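The per-cluster histogram of distances can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each gene is represented by a single coordinate, distances are taken along a circular chromosome, and the function names, bin width, and toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def circular_distance(a, b, genome_length):
    """Shortest distance between two positions on a circular chromosome."""
    d = abs(a - b) % genome_length
    return min(d, genome_length - d)


def same_cluster_distance_histogram(positions, labels, cluster, genome_length, bin_width):
    """Count pairs of genes in `cluster`, binned by their pairwise distance.

    Returns a dict mapping bin index (distance // bin_width) to pair count.
    """
    members = [p for p, c in zip(positions, labels) if c == cluster]
    hist = Counter()
    for a, b in combinations(members, 2):
        hist[circular_distance(a, b, genome_length) // bin_width] += 1
    return dict(sorted(hist.items()))


# Toy data (hypothetical, not taken from the article).
positions = [100, 1200, 2500, 9000, 9800]  # gene coordinates in base pairs
labels = [1, 1, 2, 1, 2]                   # codon usage cluster of each gene
hist = same_cluster_distance_histogram(
    positions, labels, cluster=1, genome_length=10000, bin_width=1000
)
print(hist)
```

The pairwise loop is quadratic in the number of genes per cluster, which is acceptable for a few thousand genes. Comparing the counts against those expected for randomly placed genes would be needed to turn the histogram into a correlation function.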

The curves are noisier than those in Figure 6 [1]. This is expected, as each group contains fewer genes, which was our reason for pooling all the clusters together to produce Figure 6. Some statistically robust behaviors are nevertheless clearly discernible. In particular, the B. subtilis cluster with the longest correlations is the fourth one. The correlation length of this cluster dominates over all the others and is comparable to the decay length observed in Figure 6 [1]. The fourth cluster is GC enriched with respect to the B. subtilis genomic average, which suggests that the dominant contribution to its anomalous decay length comes from correlations in the GC content of the B. subtilis genome. It is important to remark, however, that groups whose GC content is not biased relative to the average also feature extended correlations, longer than what could be accounted for by operons. Furthermore, for these groups the effects are comparable in E. coli and B. subtilis. A contribution to those correlations might be driven by the advantage of recycling rare tRNAs to tame stalling in the translation process and to ensure a coordinated expression of a set of neighboring genes, as discussed in the conclusions of our paper [1]. The large number of tmRNAs typically present in the cell also highlights the importance of pauses in translation.

1. Bailly-Bechet M, et al. (2006) PLoS Comput Biol 2(4):e37.
2. Rocha EPC, Danchin A (2002) Trends in Genetics 18(6):291-294.

Submitted by: Marc Bailly-Bechet
E-mail: mbailly@pasteur.fr
Occupation: PhD student
Institut Pasteur, Paris
Additional authors: On behalf of all authors