plosPLoS Comput BiolploscompPLoS Computational Biology1553-734X1553-7358Public Library of ScienceSan Francisco, USAPCOMPBIOL-D-11-0166410.1371/journal.pcbi.1002495EducationBiologyComputational biologyEvolutionary modelingPopulation geneticsEvolutionary biologyEvolutionary systematicsPhylogeneticsGeneticsGenetics and GenomicsComputational BiologyEvolutionary BiologySimulation of Molecular Data under Diverse Evolutionary ScenariosArenasMiguel12*Computational and Molecular Population Genetics Lab (CMPG), Institute of Ecology and Evolution, University of Bern, Bern, SwitzerlandSwiss Institute of Bioinformatics, Lausanne, SwitzerlandLewitterFranEditorWhitehead Institute, United States of America* E-mail: miguel.arenasbusto@iee.unibe.ch
The author has declared that no competing interests exist.
52012315201285e10024952012Miguel ArenasThis is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.The author received no specific funding for this article.
This is an original PLoS Computational Biology tutorial.
Introduction
This study is intended for evolutionary biologists interested in strategies for the simulation of molecular data under diverse evolutionary scenarios. It begins with a brief background on simulation approaches and describes some of the most important simulators developed to date. Then, several practical examples for simulating particular scenarios are presented, and finally, details are given on a variety of relevant applications of simulated data. Overall, this study provides a practical guide for applying simulation techniques to real world problems in molecular evolution.
The Importance of Computer Simulations in Molecular Evolution
A commonly used methodology to mimic the processes that occur in the real world is to perform computer simulations [1]. Computer simulations allow us to understand which patterns may dramatically alter a particular system and can be used to study complex processes, including those that are analytically intractable. Furthermore, the simulation of multiple replicates with stochasticity may provide the variability required to study numerous processes, such as those often found in evolution. In molecular evolution, the simulation of genetic data has been commonly used for hypothesis testing (e.g., [2]), to compare and verify analytical methods or tools (e.g., [3]–[5]), to analyze interactions among evolutionary processes (e.g., [6]), and even to estimate evolutionary parameters (e.g., [7]). Consequently, a wide variety of tools have been developed to simulate sequence data under different substitution models of evolution, but also under different evolutionary processes such as selection, recombination, demographics, population structure, and migration. In recent years, new programs have been developed to handle very complex scenarios (e.g., [8], [9]) and efficient algorithms have been incorporated in order to accommodate large datasets in response to the increasing amount of genome-wide data (e.g., [10]). Thus, the importance of simulations continues to grow in order to deal with these new challenges.
Approaches for the Simulation of Molecular Data
After the simulation of evolutionary histories (see Box 1), or when just a rooted tree or network is given, a sequence assigned to the most recent common ancestor (MRCA, or grand MRCA [GMRCA] in the case of networks) can be evolved along branches according to a substitution model of evolution, in order to simulate sequences for all internal and terminal nodes (see an example in Figure 1). A common procedure consists of applying continuous-time Markov models defined by 4×4, 20×20, and 61×61 matrices of substitution rates for nucleotide, amino acid, and codon (note that stop codons are ignored) data, respectively (details in [11]). This methodology is very flexible and allows for heterogeneous evolution where different sites and branches can be evolved under different substitution models (e.g., [12]). These aspects suggest in practice two important considerations. Firstly, simulations of nucleotide sequences are much faster than simulations of coding or amino acid sequences due to the dimension of the substitution matrices. Secondly, a large number of branches (derived from a large number of taxa or recombination events) leads to slower simulations due to the need to re-calculate the matrix for each branch.
10.1371/journal.pcbi.1002495.g001
Example of nucleotide evolution on the ancestral recombination graph.
Note that this ARG contains a recombination event with breakpoint at position 6. Starting from a sequence assigned to the GMRCA, substitutions (marked with black circles) occur forward in time. Non-ancestral material (material that does not reach the sample) and its substitution events are shown in grey.
Box 1. Simulation of Evolutionary Histories
There are two main approaches commonly used to simulate evolutionary histories in population genetics: the forward in time (forward-time) and the coalescent (backward-time). Here I describe the main particularities of these approaches, considering goals and limitations for the simulation of diverse evolutionary scenarios.
The forward-time approach simulates the evolutionary history of an entire population from the past to the present and allows the success of a lineage to be a function of the genotype (see reviews, [13], [14], [80]). Thus, these simulations consider all ancestral information and therefore can be useful to fully study the subsequent evolutionary process of the population, including gene–gene interactions, mating systems, complex migration models (such as sex biased dispersal or long-distance dispersal), or complex selection (e.g., [42], [81], [82]); beginners may explore these basic concepts using educational simulations [83], [84]. Unfortunately, because the whole population history is simulated, forward simulations require generally extensive computational cost, although recently significant improvements have been achieved in this concern (e.g., [85]).
On the other hand, the coalescent approach describes a backwards in time genealogical process of a sample of genes to a single ancestral copy (see reviews [86], [87]). The coalescent allows the simulation of a limited set of scenarios, namely population size changes (e.g., [88]), population structure and migration (e.g., [89]), recombination (e.g., [90]), and selection (e.g., [91]). A key aspect of the coalescent is that the history of the whole population is not required (so it is not actually simulated) and, consequently, it is generally computationally faster than the forward-time approach. It is important to remember, however, that the efficiency of forward-time simulations is irrespective of the amount of recombination or selection, in contrast to coalescent simulations that are highly sensitive to such processes.
Coalescent and forward-time approaches can be considered complementary [13]. In fact, recently two new methods have incorporated both approaches for fast simulations of complex scenarios [9], [33]. In conclusion, one should keep in mind that the choice of the simulation approach may depend on the complexity of the target scenario, as well as on the required computational cost for the simulation.
Main Software Implementations
A number of programs have been developed to simulate nucleotide, codon, and amino acid sequences evolution. Although several studies have already reviewed these software tools (e.g., [13]–[17]), such revisions quickly become obsolete due to the emergence of new simulators, as noted in [14]. Table 1 shows an updated list of user-friendly and commonly used programs available to date. Next, the most interesting software from a practical perspective is briefly described.
10.1371/journal.pcbi.1002495.t001
The main software used to simulate genetic sequences under nucleotide, codon, and amino acid substitution models.
“Class” includes phylogenetic (where a genealogy is already given from the user), forward, birth-death, and coalescent approaches. “Process” shows the implemented evolutionary scenarios: “D”, “Pm”, “R”, and “S” indicate demographics, population structure and migration, recombination, and selection, respectively. “Substitution model” refers to substitution models based on nucleotide “Nt”, codon “Cod”, and amino acid “Aa” sequences; indeed, “Nt: All” indicates all nucleotide substitution models developed so far (JC, …, GTR) and “EM” indicates empirical model. “Rate variation” indicates whether different sites can be evolved under different rates (G: gamma distribution; I: proportion of invariable sites). “Indels” indicates the consideration of insertion and deletion events. “OS” shows the availability of executable files and/or source code “SC” for different operative systems (“All” means that Macintosh, Windows, and Linux executables are available), and “R” means the R language for statistical computing. “Ref” indicates the reference of publication. Although many more software packages exist, here I have selected, from my point of view, those programs most commonly used, most user-friendly, and which implement the most diverse range of evolutionary scenarios.
Intracodon recombination is also allowed in NetRecodon and GenomePop.
The ARG can be exported from NetRecodon and can be then visualized and analyzed using NetTest[78].
Under codon models, ω can change across codons.
Coding sequences are simulated by nucleotide substitution models, avoiding stop codons.
EvolSimulator simulates phylogenetic histories under the birth-death model of speciation and extinction [79].
Under codon models, ω can change across codons and branches.
When attempting to simulate a complex evolutionary scenario, several programs developed under the forward-time approach may be useful (see Table 1). GenomePop[18] and SFS_CODE[19] seem the most comprehensive tools with implementations of population structure, demographic particularities, recombination, and selection, but they do not allow simulations under amino acid substitution models. The programs SPLATCHE2[9] and AQUASPLATCHE[20] are able to simulate nucleotide data under spatially (using land or freshwater maps, respectively) and temporally explicit demographic models. A disadvantage of these programs is that only two DNA substitution models are available, note that other programs such as SFS_CODE or SimuPop[21] implement all DNA substitution models (see Table 1), which may be problematic when trying to mimic genome-wide data (see [22]).
If our target scenario can be represented by the coalescent, a variety of coalescent-based programs are able to simulate nucleotide data (see Table 1). Nevertheless, only CodonRecSim[23], Recodon[24], and NetRecodon[8] can simulate coding sequences in the presence of recombination. The first two of these programs force recombination breakpoints to occur between codons while NetRecodon does not (see [8]). On the other hand, fastsimcoal[10], Recodon, and NetRecodon allow simulations with sampling at different times, which can be very interesting for the joint analysis of ancient and modern DNA [25].
When a phylogenetic history (one or several trees) is given, numerous programs exist to directly simulate sequences along such history (see Table 1, phylogenetic class). One of the most applied programs is Seq-Gen[26], which implements several nucleotide and amino acid substitution models. The program indel-Seq-Gen 2.0[27] extended Seq-Gen to include diverse indel (insertion and deletion) models. Almost at the same time as Seq-Gen, the program EVOLVER (from the PAML package [28]) was released, which additionally allowed the simulation of coding data. Recently, INDELible[12] and PhyloSim[29] implemented all those capabilities, and in addition they included codon models where dN/dS (nonsynonymous/synonymous rate ratio) may vary across sites and/or branches. INDELible is very user-friendly but PhyloSim was implemented in R (language for statistical computing, [30]) and requires some programming knowledge.
Practical Examples
In this section I outline five hypothetical practical examples, of the fast simulation of genetic sequences under particular evolutionary scenarios, which will be of general interest. The reader may notice that some scenarios can be solved using more than one approach, but I base my suggestions here on how appropriate, flexible, and user-friendly I think the simulators are.
I) Nucleotide Data under Natural Selection
This scenario is commonly applied to identifying targets of positive selection in real datasets (e.g., [31], [32]). To my knowledge, there is no coalescent framework available to simulate data under natural selection whilst using Markov DNA substitution models, which may bring realistic information because not necessarily every mutation occurs at a different site in the sequence. However, two programs can be combined to quickly perform this task. First, we can simulate coalescent trees using the programs msms[33] or SelSim[34], although both tools are limited to simulation of a single locus under selection. Then, nucleotide sequences can be evolved along those trees using Seq-Gen. Another possibility is to apply a forward-time simulator that implements complex selection and all DNA substitution models (e.g., SFS_CODE).
II) Coding Data with Intracodon Recombination
Simulations with recombination breakpoints that occur within codons are more realistic since these particular events occur 2/3 of the time that a recombination happens, assuming a spatially uniform distribution. Therefore, these events might exert undue influence on other parameter estimates since current analytical phylogenetic methods using codon models and recombination assume intercodon recombination. However, such effects have not been observed; in particular, dN/dS estimations were not altered (see [8]), so this should be studied further. The fastest procedure for the simulation of intracodon recombination is to directly apply the program NetRecodon. Alternatively, GenomePop can also perform this simulation under the forward approach. This scenario was applied in [35].
III) Amino Acid Data with Indels and Under Recombination
This is a very specific scenario, but one that can also be very interesting for readers due to its complexity and the multiple possible options for its simulation. For instance, this scenario could be useful for testing phylogenetic tree reconstruction (or recombination detection) methods from amino acid datasets that evolved under recombination (e.g., [36]). As far as I know, there is no single tool available that can simulate this scenario. My suggestion is to first simulate coalescent trees (a tree for each recombinant fragment) by the program ms, and then amino acid sequences with indels can be evolved on the respective trees using INDELible.
IV) Long Genomic DNA Regions under Recombination
The amount of genomic data available increases rapidly and as a consequence, plenty of genetic studies focusing on large genomic regions have appeared (e.g., [37]). As expected, such studies require robust and memory-efficient simulators [10], [38]. One of them is fastsimcoal, which allows for efficient simulations because it is based on a simplification of the standard coalescent with recombination (the sequential Markovian coalescent [SMC] algorithm [39]). Therefore, it seems to be an appropriate framework to simulate this scenario.
V) Coding Data under a Spatial and Temporal Range Expansion
Spatial and temporal range expansions have occurred repeatedly in the history of most species and promote genetic consequences that are different than those produced by pure demographic expansions [40]. In addition, other spatiotemporal processes, such as range contractions and range shifts (usually produced during climate changes) or long-distance dispersal events, can also affect molecular diversity [41], [42]. Using SPLATCHE2, trees can be simulated under spatial and temporal range expansion in a straightforward manner. Then, coding data can be simulated over those trees by INDELible.
Applications of Simulated Genetic Data
Computer simulation is a powerful tool in population genetics with a rich variety of applications. Here I show some interesting published applications.
I. Hypothesis Testing
The effect of recombination on ancestral sequence reconstruction.
Recently, Arenas and Posada [35] tested if recombination can affect ancestral sequence reconstruction (ASR). They simulated nucleotide, codon, and amino acid data with NetRecodon and they observed that recombination biases the reconstruction of ancestral sequences, regardless of the method or software used. This effect was shown as a consequence of incorrect phylogenetic tree reconstructions when recombination is ignored [43]. Note that this effect is crucial for numerous ASR-based studies (e.g., [44]).
The effect of recombination on selection tests.
Tests for identifying selection (based on dN/dS) are frequently used in different species, including highly recombining viruses and bacteria (e.g., [45]). There is, however, an important pitfall of such tests in the presence of recombination. In the studies [8], [23] authors simulated coding data under several heterogeneous codon models [46] and different levels of recombination. Then, they applied likelihood ratio tests (LRTs) for model choice. Results showed a weak impact of recombination on the estimation of global dN/dS but a strong effect at the local level by inflating the number of positively selected sites. Simulations were carried out using CodonRecSim and NetRecodon.
Testing criteria for substitution model selection.
A common step in phylogenetics consists of the statistical selection of a DNA substitution model that best fits the data [47], [48]. Currently, this model selection can be performed using several criteria, namely hierarchical and dynamic LRTs, Akaike and Bayesian information criterion (AIC and BIC, respectively), and the decision-theoretic approach (DT). Although AIC and BIC showed advantages over LRTs [47], the best criterion among all other options remained unclear. Recently, Luo et al. [49] addressed this point by extensive simulations of nucleotide data (using PAML[28] to simulate four tree topologies and Seq-Gen to evolve DNA sequences under a wide set of substitution models) and coding data (using Recodon). Then, by statistical analysis they concluded that BIC and DT approaches favor accurate model selection.
II. Verification of Analytical Methods
Validation of a method for large phylogenetic tree reconstruction.
One of the most well-established programs for phylogenetic tree reconstruction is PHYML[50]. As with most analytical tools, PHYML required thorough validation through computer simulations. In particular, 5,000 random phylogenies were simulated according to the standard speciation process (see [51]), and then DNA sequences were evolved on those phylogenies using Seq-Gen. The program showed a topological accuracy similar to that from other maximum likelihood programs, but it strongly reduced computing time.
Validation of a method for the detection of recombinant breakpoints.
Recombination detection methods are fundamental for the analysis of genome dynamics, genetic mapping, and phylogenetic methods. As a result, a variety of methods for recombination detection exist (see [52]). One of them was recently developed by Westesson and Holmes [5] for the analysis of whole-genome alignments. For its validation, ancestral recombination graphs (ARGs) were simulated using Recodon, then marginal trees with identical topologies were excluded and DNA sequences were simulated on the remaining trees using Seq-Gen. The method accurately detected recombinant breakpoints even for genome-size datasets.
III. Study of Complex Evolutionary Processes
Principal component analysis of human genetic diversity across Europe.
A controversial topic that sparked debate in recent years was the interpretation of gradients of population genetic variation across Europe derived from principal component analysis (PCA) [53]–[56]. Briefly, while initially Cavalli-Sforza et al. [56] interpreted principal component (PC) gradients only as a consequence of human ancestral expansions, Novembre and Stephens [53] showed that similar PC gradients may arise from diverse spatial genetic patterns under equilibrium isolation-by-distance models. Recently, François et al. [55] carried out simulations of DNA data using SPLATCHE2 in order to mimic the Neolithic farmer expansion across Europe taking into account various levels of interbreeding between farmer and resident hunter-gatherer populations (see Figure 2). They concluded that demographic and spatial population expansions often lead to PC gradients that are perpendicular to the direction of the expansion as a consequence of the allele surfing phenomenon [57].
10.1371/journal.pcbi.1002495.g002
Example of a simulated modern human range expansion over Europe.
(A) Snapshots of the program SPLATCHE2 for an example of simulation of a Neolithic farmer expansion over Europe. Settings (demographic parameter values) used for this example are similar to those used in François et al. [55]. Note that the range expansion starts from the bottom-right corner of Europe. Snapshots are taken every 40 generations. White demes are empty and dark colors indicate low population densities (in particular at the front of the expansion). (B) Scheme of sampling locations used for this simulation. (C) Spatial distribution of coalescent events during the range expansion.
IV. Estimation of Evolutionary Parameters
Coestimation of evolutionary parameters using approximate Bayesian computation.
Approximate Bayesian computation (ABC) is a recent and useful approach for the inference in evolutionary genetics (see [58]), based on computer simulations. It provides a robust alternative for those analyses where the likelihood function cannot be evaluated or is computationally too expensive. An interesting example studied by Wilson et al. [59] applied ABC to coestimate several evolutionary parameters (such as mutation, dN/dS, and recombination rates) from coding data of the bacteria Campylobacter jejuni. Although the simulator used was not published, such a scenario could be simulated using e.g., Recodon. In addition, Laval et al. [60] also applied an ABC-based approach to coestimate, assuming a particular model of human evolution, important historical and demographic parameters like the onset of the African expansions and the out-of-Africa migration, as well as the current and ancestral effective population sizes of Africans and non-Africans. Here the simulation of DNA data was performed using SIMCOAL2.
The Future of Computer Simulations
Although current software available can simulate a wide set of evolutionary scenarios, some limitations still remain concerning computational costs and particular complex models. In some cases the computational time is crucial (e.g., ABC studies that require millions of simulations to cover a wide range of parameter space), and running simulations in parallel on a cluster can help alleviate the computational time. On the other hand, several complex scenarios that interest evolutionary biologists are still difficult to simulate. An example is the simulation of molecular evolution with dependence among sites (coevolving sites, e.g., [61]). Here, although some models were already developed (see [62]), they could not be extensively applied in simulations due to intractable computational costs derived from the calculation of diverse structural energies (like those used in [63]). Another challenging scenario is the simulation of coding data under natural selection, but where the signatures of natural selection directly influence the synonymous and nonsynonymous substitutions (see [64]).
There is a permanent need of software for the simulation of molecular data due to the emergence of complex scenarios and the requirement of fast simulations. Thus, I expect a fruitful future for this basic and applied area of research.
I want to thank Vicky Schneider and the Editor of PLoS Computational Biology's Education section for their invitation to contribute with this education article. I also want to thank Isabel Alves, Yang Liu, Rebecca Krebs-Wheaton, and William Fletcher for helpful comments. I thank three anonymous reviewers for their efforts in reviewing this study.
ReferencesPeckSL2004Simulation as experiment: a philosophical reassessment for biological modeling.19530534DeChaineEGMartinAP2006Using coalescent simulations to test the impact of quaternary climate cycles on divergence in an alpine plant-insect association.6010041013Carvajal-RodriguezACrandallKAPosadaD2006Recombination estimation under complex evolutionary models with the coalescent composite-likelihood method.23817827ArenasMValienteGPosadaD2008Characterization of reticulate networks based on the coalescent with recombination.2525172520WestessonOHolmesI2009Accurate detection of recombinant breakpoints in whole-genome alignments.5e1000318doi:10.1371/journal.pcbi.1000318HillWGRobertsonA1966The effect of linkage on limits to artificial selection.8269294BeaumontMAZhangWBaldingDJ2002Approximate Bayesian computation in population genetics.16220252035ArenasMPosadaD2010Coalescent simulation of intracodon recombination.184429437RayNCurratMFollMExcoffierL2010SPLATCHE2: a spatially explicit simulation framework for complex demography, genetic admixture and recombination.2629932994ExcoffierLFollM2011fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios.2713321334YangZ2006Oxford University PressFletcherWYangZ2009INDELible: a flexible simulator of biological sequence evolution.2618791888Carvajal-RodriguezA2008Simulation of genomes: a review.9155159Carvajal-RodriguezA2010Simulation of genes and genomes forward in time.115861LiuYAthanasiadisGWealeME2008A survey of genetic simulation software for population and epidemiological studies.37986HobanSBertorelleGGaggiottiOE2012Computer simulations: tools for population and evolutionary genetics.13110122ArenasMPosadaD2012Simulation of coding sequence evolution.CannarozziGMSchneiderAOxfordOxford University Press126132Carvajal-RodriguezA2008GENOMEPOP: a program to simulate genomes in populations.9223HernandezRD2008A flexible forward simulator for populations subject to selection and demography.2427862787NeuenschwanderS2006AQUASPLATCHE: a program to simulate genetic diversity in populations living in linear habitats.6583585PengBKimmelM2005simuPOP: a forward-time population genetics simulation environment.2136863687ArbizaLPatricioMDopazoHPosadaD2011Genome-wide heterogeneity of nucleotide substitution model fit.3896908AnisimovaMNielsenRYangZ2003Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites.16412291236ArenasMPosadaD2007Recodon: coalescent simulation of coding DNA sequences with recombination, migration and demography.8458NavascuesMDepaulisFEmersonBC2010Combining contemporary and ancient DNA in population genetic and phylogeographical studies.10760772RambautAGrasslyNC1997Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees.13235238StropeCLAbelKScottSDMoriyamaEN2009Biological sequence simulation for testing complex evolutionary hypotheses: indel-Seq-Gen version 2.0.2625812593YangZ1997PAML: a program package for phylogenetic analysis by maximum likelihood.13555556SiposBMassinghamTJordanGEGoldmanN2011PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment.12104IhakaRGentlemanR1996R: a language for data analysis and graphics.169299314BiswasSAkeyJ2006Genomic insights into positive selection.22437446KelleyJLMadeoyJCalhounJCSwansonWAkeyJM2006Genomic signatures of positive selection in humans and the limits of outlier approaches.16980989EwingGHermissonJ2010MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus.2620642065SpencerCCCoopG2004SelSim: a program to simulate population genetic data with natural selection and recombination.2036733675ArenasMPosadaD2010The effect of recombination on the reconstruction of ancestral sequences.18411331139LemeyPLottMMartinDPMoultonV2009Identifying recombinants in human and primate immunodeficiency virus sequence alignments using quartet scanning.10126DurbinRMAltshulerDLAbecasisGRBentleyDRChakravartiA2010A map of human genome variation from population-scale sequencing.46710611073MarjoramPWallJD2006Fast “coalescent” simulation.716McVeanGACardinNJ2005Approximating the coalescent with recombination.36013871393ExcoffierLFollMPetitRJ2009Genetic consequences of range expansions.40481501ArenasMRayNCurratMExcoffierL2012Consequences of range contractions and range shifts on molecular diversity.29207218RayNExcoffierL2010A first step towards inferring levels of long-distance dispersal during past expansions.10902914SchierupMHHeinJ2000Consequences of recombination on traditional phylogenetic analysis.156879891ArenasMPosadaD2010Computational design of centralized HIV-1 genes.8613621BozekKLengauerT2010Positive selection of HIV host factors and the evolution of lentivirus genes.10186YangZNielsenRGoldmanNPedersenA-MK2000Codon-substitution models for heterogeneous selection pressure at amino acid sites.155431449PosadaDBuckleyTR2004Model selection and model averaging in phylogenetics: advantages of Akaike Information Criterion and Bayesian approaches over likelihood ratio tests.53793808SullivanJJoyceP2005Model selection in phylogenetics.36445466LuoAQiaoHZhangYShiWHoSY2010Performance of criteria for selecting evolutionary models in phylogenetics: a comprehensive study based on simulated datasets.10242GuindonSGascuelO2003A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.52696704KuhnerMKFelsensteinJ1994A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates.11459468PosadaD2002Evaluation of methods for detecting recombination from DNA sequences: empirical data.19708717NovembreJStephensM2008Interpreting principal component analyses of spatial population genetic variation.40646649NovembreJStephensM2010Response to Cavalli-Sforza interview [Human Biology 82(3):245–266 (June 2010)].82469470FrançoisOCurratMRayNHanEExcoffierL2010Principal component analysis under population genetic models of range expansion and admixture.2712571268Cavalli-SforzaLLMenozziPPiazzaA1994Princeton, New JerseyPrinceton University PressExcoffierLRayN2008Surfing during population expansions promotes genetic revolutions and structuration.23347351BeaumontMA2010Approximate Bayesian computation in evolution and ecology.41379405WilsonDJGabrielELeatherbarrowAJCheesbroughJGeeS2009Rapid evolution and the importance of recombination to the gastroenteric pathogen Campylobacter jejuni.26385397LavalGPatinEBarreiroLBQuintana-MurciL2010Formulating a historical and demographic model of recent human evolution based on resequencing data from noncoding regions.5e10284doi:10.1371/journal.pone.0010284WangMKapralovMVAnisimovaM2011Coevolution of amino acid residues in the key photosynthetic enzyme Rubisco.11266BastollaUPortoMRomanHEVendruscoloM2007Berlin, HeidelbergSpringerArenasMVillaverdeMCSussmanF2009Prediction and analysis of binding affinities for chemically diverse HIV-1 PR inhibitors by the modified SAFE_p approach.3012291240KryazhimskiySPlotkinJB2008The population genetics of dN/dS.4e1000304doi:10.1371/journal.pgen.1000304ExcoffierLNovembreJSchneiderS2000SIMCOAL: a general coalescent program for the simulation of molecular data in interconnected populations with arbitrary demography.91506509AndersonCNRamakrishnanUChanYLHadlyEA2005Serial SimCoal: a population genetics model for data from multiple populations and points in time.2117331734Ramos-OnsinsSEMitchell-OldsT2007Mlcoalsim: multilocus coalescent simulations.34144GrasslyNCHarveyPHHolmesEC1999Population dynamics of HIV-1 inferred from gene sequences.151427438BeikoRGCharleboisRL2007A simulation test bed for hypotheses of genome evolution.23825831HallBG2008Simulating DNA coding sequence evolution with EvolveAGene 3.25688695CartwrightRA2005DNA assembly with gaps (Dawg): simulating sequence evolution.21Suppl 3iii3138RosenbergMS2005MySSP: Non-stationary evolutionary sequence simulation, including indels.18183GesellTvon HaeselerA2006In silico sequence evolution with site-specific interactions along phylogenetic trees.22716722StoyeJEversDMeyerF1998Rose: generating sequence families.14157163VaradarajanABradleyRKHolmesIH2008Tools for simulating evolution of aligned genomic regions with integrated parameter estimation.9R147DalquenDAAnisimovaMGonnetGHDessimozC2012ALF–a simulation framework for genome evolution.2911151123PangASmithADNuinPATillierER2005SIMPROT: using an empirically determined indel distribution in simulations of protein evolution.6236ArenasMPatricioMPosadaDValienteG2010Characterization of phylogenetic networks with NetTest.11268RaupDMGouldSJSchopfTJMSimberloffDS1973Stochastic models of phylogeny and the evolution of diversity.81525542EppersonBKMcRaeBHScribnerKCushmanSARosenbergMS2010Utility of computer simulations in landscape genetics.1935493564PengBAmosCIKimmelM2007Forward-time simulations of human populations with complex diseases.3e47doi:10.1371/journal.pgen.0030047CalafellFGrigorenkoELChikanianAAKiddKK2001Haplotype evolution and linkage disequilibrium: a simulation study.518596JonesTCLaughlinTF2010PopGen fishbowl: a free online simulation model of microevolutionary processes.72100103CoombsJALetcherBHNislowKH2010Pedagog: software for simulating eco-evolutionary population dynamics.10558563PadhukasahasramBMarjoramPWallJDBustamanteCDNordborgM2008Exploring population genetic models with recombination using efficient forward-time simulations.17824172427NordborgM2007Coalescent theory.BaldingDJBishopMCanningsCChichester, UKJohn Wiley & Sons, Ltd843877WakeleyJ2008Greenwood Village, ColoradoRoberts and Company PublishersSlatkinM2001Simulating genealogies of selected alleles in a population of variable size.784957HudsonRR1998Island models and the coalescent process.7413418HudsonRR1983Properties of a neutral allele model with intragenic recombination.23183201HudsonRRKaplanNL1988The coalescent process in models with selection and recombination.120831840