Conceived and designed the experiments: F-JS GC-A. Performed the experiments: F-JS GC-A. Analyzed the data: F-JS GC-A. Wrote the paper: F-JS GC-A.
The authors have declared that no competing interests exist.
Transfer RNAs (tRNAs) are ancient molecules that are central to translation. Since they probably carry evolutionary signatures that were left behind when the living world diversified, we reconstructed phylogenies directly from the sequence and structure of tRNA using well-established phylogenetic methods. The trees placed tRNAs with long variable arms charging Sec, Tyr, Ser, and Leu consistently at the base of the rooted phylogenies, but failed to reveal groupings that would indicate clear evolutionary links to organismal origin or molecular functions. In order to uncover evolutionary patterns in the trees, we forced tRNAs into monophyletic groups using constraint analyses to generate timelines of organismal diversification and test competing evolutionary hypotheses. Remarkably, organismal timelines showed Archaea was the most ancestral superkingdom, followed by viruses, then superkingdoms Eukarya and Bacteria, in that order, supporting conclusions from recent phylogenomic studies of protein architecture. Strikingly, constraint analyses showed that the origin of viruses was not only ancient, but was linked to Archaea. Our findings have important implications. They support the notion that the archaeal lineage was very ancient, resulted in the first organismal divide, and predated diversification of tRNA function and specificity. Results are also consistent with the concept that viruses contributed to the development of the DNA replication machinery during the early diversification of the living world.
The origins of the three major cellular lineages of life (Archaea, Bacteria, and Eukarya) and of viruses have been shrouded in mystery. In this study, we focus on transfer RNA, an ancient nucleic acid molecule that takes center stage in the process of protein biosynthesis and can be found everywhere in life. In a process that reconstructs history from molecular sequence and structure and at the same time forces molecules belonging to lineages into groups, we tested alternative hypotheses of origin and established when major organismal lineages appeared in evolution. Remarkably, timelines showed that Archaea was the most ancient lineage on Earth and that viruses originated early in the archaeal lineage. Our findings unroot the universal tree of life, and, for the first time, provide evidence for an evolutionary origin of viruses.
Transfer RNA (tRNA) molecules are central to the entire translation process. They interact with the ribosomal RNA (rRNA) subunits as they are being ratcheted through the center of the ribosome
The hierarchical branching patterns of the universal tree of life portray the natural history of the living world. The current accepted universal tree proposes a tripartite world ruled by three superkingdoms, Archaea, Bacteria, and Eukarya
In order to explore whether similar phylogenetic signatures were present in tRNA, we applied a well-established cladistic method
Phylogenetic analyses of the combined dataset of sequence and structure of 571 tRNAs produced most parsimonious trees that were 10,083 steps in length and were intrinsically rooted (
MP analyses of data from 571 tRNA molecules resulted in the preset limit of 20,000 minimal length trees, each of 10,083 steps. Consistency index (CI) = 0.069 and 0.069, with and without uninformative characters, respectively; Retention index (RI) = 0.681; Rescaled consistency index (RC) = 0.047; g1 = −0.107. Terminal leaves are not labeled since they would not be legible. Nodes labeled with closed circles have BS values >50%. tRNA molecules belonging to different superkingdoms and viruses and coding for Sec, Ser, Tyr, and Leu are labeled with colors. Note several of these tRNAs have short variable arms and are derived in the tree.
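As a quick consistency check on the tree statistics reported above, the rescaled consistency index is conventionally defined as the product of the consistency and retention indices. The minimal sketch below recomputes it from the rounded CI and RI values given in the caption; the variable names are illustrative and not part of the original analysis.

```python
# Ensemble indices as reported in the caption (rounded to three decimals).
consistency_index = 0.069
retention_index = 0.681

# The rescaled consistency index is conventionally RC = CI * RI.
rescaled_ci = consistency_index * retention_index

print(f"RC = {rescaled_ci:.3f}")
```

Within rounding, the product agrees with the reported RC of 0.047.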
In order to uncover deep phylogenetic signals and test alternative evolutionary hypotheses, we forced groups of tRNAs that shared the same organismal origin (molecules from each superkingdom of life or viruses) into monophyly using constraint analyses. We then recorded the length of the most parsimonious trees that were obtained and the number of additional steps (
Constraints related to the diversification of the organismal world (
Constraints representing non-competing hypotheses of organismal relationship (white circles) are used to define a timeline for the appearance of lineages in a universal tree derived from the sequence and structure of tRNA. Blue circles represent constraints representing competing hypotheses. They illustrate both the most parsimonious lineage relationship and their coalescence. Areas colored in light green, salmon, and light yellow are delimited by lineage coalescence and describe three evolutionary epochs. The timeline is given in a scale of additional steps (
tRNA class | Organismal constraints | Test | Additional steps |
Unconstrained | ((A), B, E, V) | H | 132 |
Unconstrained | ((V), A, B, E) | H | 336 |
Unconstrained | ((E), A, B, V) | H | 967 |
Unconstrained | ((B), A, E, V) | H | 1039 |
Unconstrained | ((A, B, E), V) | H | 339 |
Unconstrained | ((B, E), A, V) | CH1 | 345 |
Unconstrained | ((A, E), B, V) | CH1 | 971 |
Unconstrained | ((A, B), E, V) | CH1 | 1038 |
Unconstrained | (((A)(E)), B, V) | CH2 | 966 |
Unconstrained | (((A)(B)), E, V) | CH2 | 1042 |
Unconstrained | (((B)(E)), A, V) | CH2 | 1164 |
Unconstrained | ((A), (B), (E), V) | CH3 | 1171 |
Unconstrained | (((A)(B)(E)), V) | CH3 | 1171 |
Unconstrained | (((A)(B))(E), V) | CH3 | 1171 |
Unconstrained | (((A)(E))(B), V) | CH3 | 1178 |
Unconstrained | (((B)(E))(A), V) | CH3 | 1179 |
Unconstrained | ((A), (B), (E), (V)) | H | 1190 |
Constrained | ((Class I: A, B, E, V), (Class II: A, B, E, V)) | H | 232 |
Constrained | ((Class I: A, B, E, V), (Class II: (A), B, E, V)) | H | 136 |
Constrained | ((Class I: A, B, E, V), (Class II: (V), A, B, E)) | H | 168 |
Constrained | ((Class I: A, B, E, V), (Class II: (B), A, E, V)) | H | 318 |
Constrained | ((Class I: A, B, E, V), (Class II: (E), A, B, V)) | H | 276 |
Constrained | ((Class I: A, B, E, V), (Class II: (B, E), A, V)) | CH1 | 174 |
Constrained | ((Class I: A, B, E, V), (Class II: (A, E), B, V)) | CH1 | 296 |
Constrained | ((Class I: A, B, E, V), (Class II: (A, B), E, V)) | CH1 | 309 |
Constrained | ((Class I: A, B, E, V), (Class II: ((A)(E)), B, V)) | CH2 | 294 |
Constrained | ((Class I: A, B, E, V), (Class II: ((A)(B)), E, V)) | CH2 | 316 |
Constrained | ((Class I: A, B, E, V), (Class II: ((B)(E)), A, V)) | CH2 | 325 |
Constrained | ((Class I: (A), B, E, V), (Class II: A, B, E, V)) | H | 143 |
Constrained | ((Class I: (V), A, B, E), (Class II: A, B, E, V)) | H | 291 |
Constrained | ((Class I: (E), A, B, V), (Class II: A, B, E, V)) | H | 814 |
Constrained | ((Class I: (B), A, E, V), (Class II: A, B, E, V)) | H | 852 |
Constrained | ((Class I: (B, E), A, V), (Class II: A, B, E, V)) | CH1 | 297 |
Constrained | ((Class I: (A, E), B, V), (Class II: A, B, E, V)) | CH1 | 825 |
Constrained | ((Class I: (A, B), E, V), (Class II: A, B, E, V)) | CH1 | 851 |
Constrained | ((Class I: ((A)(E)), B, V), (Class II: A, B, E, V)) | CH2 | 843 |
Constrained | ((Class I: ((A)(B)), E, V), (Class II: A, B, E, V)) | CH2 | 870 |
Constrained | ((Class I: ((B)(E)), A, V), (Class II: A, B, E, V)) | CH2 | 961 |
The numbers of additional steps (
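The timeline logic can be illustrated with the four single-lineage constraints listed in the table above for trees without tRNA class constraints. This is a minimal sketch, assuming that a constraint requiring fewer additional steps corresponds to an earlier-diverging lineage; the dictionary and variable names are illustrative and not part of the original analysis.

```python
# Length of the most parsimonious unconstrained trees (from the text).
UNCONSTRAINED_LENGTH = 10_083

# Additional steps needed to force each lineage into monophyly
# (non-competing hypotheses, no tRNA class constraint; values from the table).
additional_steps = {
    "Archaea": 132,
    "Viruses": 336,
    "Eukarya": 967,
    "Bacteria": 1039,
}

# ASSUMPTION: fewer additional steps is read here as an earlier origin.
timeline = sorted(additional_steps, key=additional_steps.get)

for lineage in timeline:
    steps = additional_steps[lineage]
    constrained_length = UNCONSTRAINED_LENGTH + steps
    print(f"{lineage:<9} +{steps:>4} steps (constrained tree length {constrained_length})")
```

Sorting in this way gives Archaea, viruses, Eukarya, and Bacteria, in that order.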
We also explored the origins of viruses by constraining molecules from each individual superkingdom or viruses into monophyletic groups, together [e.g., (AV)] or separately [e.g., ((A)(V))] (
tRNA category | Organismal constraints | Test | Additional steps |
Unconstrained | ((A, V), B, E) | CH1 | 342 |
Unconstrained | ((B, V), A, E) | CH1 | 979 |
Unconstrained | ((E, V), A, B) | CH1 | 1034 |
Unconstrained | (((A)(V)), B, E) | CH2 | 333 |
Unconstrained | (((B)(V)), A, E) | CH2 | 1164 |
Unconstrained | (((E)(V)), A, B) | CH2 | 1162 |
Unconstrained | ((A, VB), VE, B, E) | CH3 | 249 |
Unconstrained | ((B, VB), VE, A, E) | CH3 | 959 |
Unconstrained | ((E, VB), VE, A, B) | CH3 | 1015 |
Unconstrained | ((A, VE), VB, B, E) | CH4 | 198 |
Unconstrained | ((B, VE), VB, A, E) | CH4 | 926 |
Unconstrained | ((E, VE), VB, A, B) | CH4 | 955 |
Unconstrained | (((A)(VB)), VE, B, E) | CH5 | 246 |
Unconstrained | (((B)(VB)), VE, A, E) | CH5 | 1100 |
Unconstrained | (((E)(VB)), VE, A, B) | CH5 | 1088 |
Unconstrained | (((A)(VE)), VB, B, E) | CH6 | 192 |
Unconstrained | (((B)(VE)), VB, A, E) | CH6 | 1078 |
Unconstrained | (((E)(VE)), VB, A, B) | CH6 | 1018 |
Constrained | ((Class I: A, B, E, V), (Class II: (A, V), B, E)) | CH7 | 182 |
Constrained | ((Class I: A, B, E, V), (Class II: (B, V), A, E)) | CH7 | 289 |
Constrained | ((Class I: A, B, E, V), (Class II: (E, V), A, B)) | CH7 | 312 |
Constrained | ((Class I: A, B, E, V), (Class II: ((A)(V)), B, E)) | CH8 | 189 |
Constrained | ((Class I: A, B, E, V), (Class II: ((B)(V)), A, E)) | CH8 | 324 |
Constrained | ((Class I: A, B, E, V), (Class II: ((E)(V)), A, B)) | CH8 | 328 |
Constrained | ((Class I: (A, V), B, E), (Class II: A, B, E, V)) | CH9 | 292 |
Constrained | ((Class I: (B, V), A, E), (Class II: A, B, E, V)) | CH9 | 817 |
Constrained | ((Class I: (E, V), A, B), (Class II: A, B, E, V)) | CH9 | 855 |
Constrained | ((Class I: ((A)(V)), B, E), (Class II: A, B, E, V)) | CH10 | 301 |
Constrained | ((Class I: ((B)(V)), A, E), (Class II: A, B, E, V)) | CH10 | 971 |
Constrained | ((Class I: ((E)(V)), A, B), (Class II: A, B, E, V)) | CH10 | 961 |
The numbers of additional steps (
Finally, we constrained trees according to isoacceptor group and then according to organismal group, or vice versa, with or without constraining tRNA categories (
tRNA category | Constraints | Additional steps |
Unconstrained | Superkingdom diversification prior to functional divergence: ((A: (Ala), (Arg), …, (Sec)), (B: (Ala), (Arg), …, (Sec)), (E: (Ala), (Arg), …, (Sec)), (V: (Ala), (Arg), …, (Sec))) | 2481 |
Unconstrained | Functional divergence prior to superkingdom diversification: ((Ala: (A)(B)(E)(V)), (Arg: (A)(B)(E)(V)), …, (Val: (A)(B)(E)(V))) | 2534 |
Constrained | Superkingdom diversification prior to functional divergence: ((Class II: (A: (Ser)(Sec)(Leu)(Tyr)), (B: (Ser)(Sec)(Leu)(Tyr)), (E: (Ser)(Sec)(Leu)(Tyr)), (V: (Ser)(Sec)(Leu)(Tyr))), (Class I: (A: (Ala), (Arg), …, (Sec)), (B: (Ala), (Arg), …, (Sec)), (E: (Ala), (Arg), …, (Sec)), (V: (Ala), (Arg), …, (Sec)))) | 2338 |
Constrained | Functional divergence prior to superkingdom diversification: ((Class II: (Ser: (A)(B)(E)(V)), (Sec: (A)(B)(E)(V)), (Leu: (A)(B)(E)(V)), (Tyr: (A)(B)(E)(V))), (Class I: (Ala: (A)(B)(E)(V)), (Arg: (A)(B)(E)(V)), …, (Val: (A)(B)(E)(V)))) | 2415 |
The length of the most parsimonious trees derived from the combined data set was 10,083 steps. Each constrained group is given in parentheses. Both chloroplast and mitochondria tRNAs were included in Bacteria. A = Archaea, B = Bacteria, E = Eukarya, V = viruses. Amino acids are indicated by the International Union of Pure and Applied Chemistry (IUPAC) 3-letter nomenclature.
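Competing hypotheses (the CH groups in the tables above) are compared within each group, with the constraint requiring the fewest additional steps taken as the most parsimonious. The sketch below illustrates that comparison for the first two groups of the virus-origin table; the data structure and names are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict

# (test group, constraint, additional steps), copied from the virus-origin table.
rows = [
    ("CH1", "((A, V), B, E)", 342),
    ("CH1", "((B, V), A, E)", 979),
    ("CH1", "((E, V), A, B)", 1034),
    ("CH2", "(((A)(V)), B, E)", 333),
    ("CH2", "(((B)(V)), A, E)", 1164),
    ("CH2", "(((E)(V)), A, B)", 1162),
]

# Collect the candidate constraints belonging to each competing-hypothesis group.
by_group = defaultdict(list)
for group, constraint, steps in rows:
    by_group[group].append((steps, constraint))

# Within each group, keep the hypothesis that costs the fewest additional steps.
best = {group: min(candidates) for group, candidates in by_group.items()}

for group, (steps, constraint) in sorted(best.items()):
    print(group, constraint, steps)
```

In both groups the constraint pairing viruses with Archaea is the least costly, which is the pattern the text interprets as an archaeal link for viral tRNAs.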
Since constraint analyses could be biased by unequal rates of evolution, we calculated the average number of character changes per branch in consensus trees generated from partitioned data matrices (
Group | Assigned branch length | Minimum length | Maximum length |
Archaea (59 leaves) | 5.25±6.44 (425) | 4.78±5.80 (387) | 5.74±6.57 (465) |
Bacteria (275 leaves) | 4.20±5.34 (1,776) | 3.97±5.20 (1,679) | 4.45±5.44 (1,881) |
Eukarya (220 leaves) | 5.19±6.16 (1,667) | 4.81±6.02 (1,544) | 5.60±6.34 (1,796) |
Viruses (17 leaves) | 5.85±9.87 (193) | 5.42±9.43 (179) | 6.27±9.89 (207) |
The average numbers of character changes per branch (± standard deviations) are listed for assigned, minimum, and maximum values. The total numbers of character changes in the trees are given in parentheses. ANOVA showed that average branch lengths were not significantly different among the superkingdoms and viruses (assigned branch lengths, df: 3, 854; F = 2.
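A one-way ANOVA of this kind can be approximated from the summary statistics in the table alone. The sketch below is not the authors' calculation: it assumes that the number of branches per group equals the total number of changes divided by the mean (rounded), and that the reported standard deviations are sample standard deviations. Under those assumptions the inferred branch counts reproduce the reported degrees of freedom (3, 854). Because it works from rounded summaries, the resulting F statistic should be treated as approximate.

```python
# Summary statistics for assigned branch lengths, from the table:
# (mean changes per branch, standard deviation, total character changes).
groups = {
    "Archaea": (5.25, 6.44, 425),
    "Bacteria": (4.20, 5.34, 1776),
    "Eukarya": (5.19, 6.16, 1667),
    "Viruses": (5.85, 9.87, 193),
}

# ASSUMPTION: number of branches = total changes / mean, rounded to an integer.
counts = {name: round(total / mean) for name, (mean, sd, total) in groups.items()}
n_total = sum(counts.values())

# Grand mean as the branch-weighted average of the group means.
grand_mean = sum(counts[g] * groups[g][0] for g in groups) / n_total

# Between-group and within-group sums of squares from the summaries.
ss_between = sum(counts[g] * (groups[g][0] - grand_mean) ** 2 for g in groups)
ss_within = sum((counts[g] - 1) * groups[g][1] ** 2 for g in groups)

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_statistic = (ss_between / df_between) / (ss_within / df_within)
print(f"df = {df_between}, {df_within}; F = {f_statistic:.2f}")
```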
In order to uncover evolutionary patterns related to organismal diversification, we first generated rooted phylogenetic trees using information embedded in the structure and sequence of tRNA (
Two fundamental assumptions support the analysis. First, we assume tRNA structures acquired new identities and functions as the genetic code expanded, and that different structures were co-opted for the task in different lineages and different functional contexts. This assumption seems reasonable. Recruitment processes are common in evolution of macromolecules. In cellular metabolism, for example, enzymes are often recruited into different pathways to perform new enzymatic functions
We also assume phylogenies are free from systematic errors and the confounding effects of mutational saturation, long branch attraction artifacts, and unequal rates of evolution along branches of the trees
We constrained tRNA groups according to organismal origin using different schemes and tested possible competing and non-competing hypotheses describing timelines of organismal diversification and possible topologies of the universal tree of life (
The timeline of organismal diversification provides evidence that the lineage of Archaea segregated from an ancient community of ancestral organisms and established the first organismal divide. The scenario of organismal diversification described above is congruent with our recent phylogenomic analyses of protein structure
Our evolutionary timeline is also remarkable in that it identifies three epochs in the evolution of the organismal world that were analogous to those proposed earlier
The evolutionary patterns observed in timelines appeared consistently in the absence or presence of class I or class II tRNA structural constraints (
The organismal timeline inferred from tRNA sequence and structure showed Archaea was the most ancient superkingdom but established that viruses were also ancient. Viruses are relatively simple living entities and in many cases maintain a regular structure. They have long been considered fragments of cellular genomes and not living organisms and were generally excluded from consideration in evolutionary scenarios of the tripartite world, despite being important components of the biosphere. The importance of viruses and their potential roles in early cellular evolution were recently reevaluated
In order to establish if the origin of the viruses was linked to one or more of the three superkingdoms of life we constrained viral and individual superkingdom tRNAs into competing monophyletic relationships (
The origin of viruses is generally complex and may involve more than one mechanism
We end by noting that, because of the small number of viral sequences sampled in our study, the conclusions drawn here should be taken with caution. However, a separate ongoing study analyzing a comprehensive dataset of tRNA sequences and structures, but lacking information on base modifications, supports the evolutionary patterns presented in this study (Ospina, Sun, and Caetano-Anollés, unpublished).
Part 2 (compilation of tRNA sequences) of the Bayreuth tRNA Database (
We treated structural features in molecules as phylogenetic multi-state characters with character states transforming according to linearly ordered and reversible pathways. Character state transformations were polarized by assuming an evolutionary tendency towards molecular order. Characters were analyzed using maximum parsimony (MP), a popular phylogenetic optimization method that searches for solutions that require the least amount of change. It is appropriate to treat geometrical features as linearly ordered characters because RNA structures change in a discrete manner by addition or removal of nucleotide units. This causes gradual extension or contraction of geometrical features. Although insertion and deletion are also possible, they are more costly. The validity of character argumentation has been discussed in detail elsewhere
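To make the idea of a linearly ordered, reversible character concrete, the sketch below counts the minimum number of steps for a single ordered character on a fixed binary tree, using interval-based (Wagner) optimization in which a change between states i and j costs |i − j|. This is an illustrative toy under stated assumptions, not the PAUP* implementation used in the study: it ignores character polarization and tree search, and the function name and tree encoding are hypothetical.

```python
def wagner_steps(tree):
    """Return (interval, steps) for one linearly ordered character.

    `tree` is either an int (the state observed at a leaf) or a
    2-tuple of subtrees. The interval is the set of optimal states
    at the node after the downward (post-order) pass.
    """
    if isinstance(tree, int):
        return (tree, tree), 0

    left, right = tree
    (lo1, hi1), steps1 = wagner_steps(left)
    (lo2, hi2), steps2 = wagner_steps(right)

    lo, hi = max(lo1, lo2), min(hi1, hi2)
    if lo <= hi:
        # Child intervals overlap: no change is needed at this node.
        return (lo, hi), steps1 + steps2
    # Disjoint intervals: pay the size of the gap between them.
    return (hi, lo), steps1 + steps2 + (lo - hi)


# Toy tree with five leaves; leaf values are ordered character states
# (for example, the length of a structural feature in nucleotides).
toy_tree = ((0, 1), (3, (3, 5)))
interval, steps = wagner_steps(toy_tree)
print(steps)  # 5
```

Under this ordered cost scheme, a leaf in state 5 next to a sister in state 3 costs two steps rather than one, which is what distinguishes ordered characters from unordered ones.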
All data matrices were analyzed using equally weighted MP as the optimality criterion in PAUP* v. 4.0
Constraint analysis restricts the search of optimal trees to pre-specified tree topologies defining specific monophyletic groups, and was used here to test alternative or compare non-mutually exclusive hypotheses. The number of additional steps (
Taxonomic distributions of the 571 tRNA molecules examined in the phylogenetic study.
(0.04 MB DOC)
Structural characters and their statistics (range and mean ± standard deviation) used in the phylogenetic analyses of 571 tRNA molecules.
(0.10 MB DOC)
We thank Hee Shin Kim, Ajith Harish, Minglei Wang, and Jay E. Mittenthal for helpful discussions. Any opinions, findings, and conclusions and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.