PLoSWiki
http://topicpages.ploscompbiol.org/wiki/Main_Page
Viral phylodynamics
63
397
2012-02-28T17:38:31Z
Erikvolz
8
Created page with "Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | path..."
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
=Methods=
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
424
2012-03-05T07:08:57Z
Erikvolz
8
Rough draft of an introduction. I've decided to introduce the subject by referring to Fig 3 of Grenfell's original paper.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Virus phylogenies have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time at which a virus originated in a new host population or species, and the rates at which a virus is transmitted between different host populations.
Distinct epidemiological and immunological processes may be recognized by their influence on the virus [[wp:Phylogenetic tree | phylogeny]].
Grenfell et al.,{{Citation needed}} who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see figure {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see figure {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see figure {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see figure {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in phylogeny of [[wp:HIV | HIV]] (see figure {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the virus phylogeny will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
Such a tree is said to be “star-like” {{Citation needed}}; such trees arise because lineages are much more likely to share a common ancestor when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see figure {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see figure {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
Viruses sampled from similar hosts, such as hosts that reside in the same region or that have similar risk factors for infection, are more closely related than viruses sampled from dissimilar hosts.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see figure {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock]] {{Citation needed}}.
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]]. {{Citation needed}}
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
425
2012-03-22T21:27:53Z
Erikvolz
8
Started the section on coalescent models.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Virus phylogenies have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time at which a virus originated in a new host population or species, and the rates at which a virus is transmitted between different host populations.
Distinct epidemiological and immunological processes may be recognized by their influence on the virus [[wp:Phylogenetic tree | phylogeny]].
Grenfell et al.,{{Citation needed}} who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see figure {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see figure {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see figure {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see figure {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in phylogeny of [[wp:HIV | HIV]] (see figure {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the virus phylogeny will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
Such a tree is said to be “star-like” {{Citation needed}}; such trees arise because lineages are much more likely to share a common ancestor when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see figure {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see figure {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
Viruses sampled from similar hosts, such as hosts that reside in the same region or that have similar risk factors for infection, are more closely related than viruses sampled from dissimilar hosts.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see figure {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock]] {{Citation needed}}.
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]]. {{Citation needed}}
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright–Fisher model]]),
the waiting time, measured backward from sample collection, until two of the <math>n</math> sampled gene copies first share a [[wp:MRCA | common ancestor]] is approximately [[wp:Exponential_distribution | exponentially distributed]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before sample collection when two gene copies in the sample ''coalesce'' (find a common ancestor; see figure {{See Figure|4}}), <math> n-1 </math> extant lineages remain.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2</math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align} \mathrm{E}[\mathrm{TMRCA}] &= \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\ &= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\ &= 2N\left(1 - \frac{1}{n}\right) \end{align} </math>.
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate of the time at which the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because <math>1 - 1/74 \approx 99\% </math>.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the virus population may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past the population had size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in figure {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:reproduction number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
426
2012-03-23T02:50:05Z
Erikvolz
8
Added Figure 4.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Virus phylogenies have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time at which a virus originated in a new host population or species, and the rates at which a virus is transmitted between different host populations.
Distinct epidemiological and immunological processes may be recognized by their influence on the virus [[wp:Phylogenetic tree | phylogeny]].
Grenfell et al.,{{Citation needed}} who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see figure {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see figure {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see figure {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see figure {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in phylogeny of [[wp:HIV | HIV]] (see figure {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the virus phylogeny will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
Such a tree is said to be “star-like” {{Citation needed}}; such trees arise because lineages are much more likely to share a common ancestor when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see figure {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see figure {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
Viruses sampled from similar hosts, such as hosts that reside in the same region or that have similar risk factors for infection, are more closely related than viruses sampled from dissimilar hosts.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see figure {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock]] {{Citation needed}}.
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]]. {{Citation needed}}
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright–Fisher model]]),
the waiting time, measured backward from sample collection, until two of the <math>n</math> sampled gene copies first share a [[wp:MRCA | common ancestor]] is approximately [[wp:Exponential_distribution | exponentially distributed]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before sample collection when two gene copies in the sample ''coalesce'' (find a common ancestor; see figure {{See Figure|4}}), <math> n-1 </math> extant lineages remain.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2</math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
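The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names <code>simulate_intervals</code> and <code>simulate_topology</code>, the sample size, and the population size are assumptions chosen for the example, and time is measured in generations.

```python
import random


def simulate_intervals(n, N, rng):
    """Draw internode intervals T_n, ..., T_2 (in generations).

    T_k is exponential with rate lambda_k = C(k, 2) / N, the rate given
    above for a neutral population of constant size N.
    """
    intervals = {}
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2 / N
        intervals[k] = rng.expovariate(rate)
    return intervals


def simulate_topology(n, rng):
    """Join two lineages chosen uniformly at random until one remains.

    Returns the list of merges; each lineage is represented by the tuple
    of sample labels that descend from it.
    """
    lineages = [(i,) for i in range(n)]
    merges = []
    while len(lineages) > 1:
        i, j = sorted(rng.sample(range(len(lineages)), 2))
        right = lineages.pop(j)  # pop the larger index first
        left = lineages.pop(i)
        merges.append((left, right))
        lineages.append(left + right)
    return merges


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed, for reproducibility only
    intervals = simulate_intervals(5, 1000, rng)
    print("simulated TMRCA:", sum(intervals.values()))
    for left, right in simulate_topology(5, rng):
        print(left, "joins", right)
```

The sum of the simulated intervals is one draw of the TMRCA of the sample; averaging over many replicates gives a Monte Carlo estimate of its expectation.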
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align} \mathrm{E}[\mathrm{TMRCA}] &= \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\ &= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\ &= 2N\left(1 - \frac{1}{n}\right) \end{align} </math>.
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate of the time at which the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because <math>1 - 1/74 \approx 99\% </math>.
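The closed form for the expected TMRCA, and the fraction of the upper bound reached by a sample of 74 sequences, can be checked with exact arithmetic. The sketch below is illustrative only; the function names are assumptions, and <code>N</code> is set to 1 so that the result is expressed in units of the population size.

```python
from fractions import Fraction


def expected_tmrca_sum(n, N):
    """Sum 1/lambda_k for k = 2..n, with lambda_k = C(k, 2) / N."""
    return sum(Fraction(2 * N, k * (k - 1)) for k in range(2, n + 1))


def expected_tmrca_closed_form(n, N):
    """Closed form 2N(1 - 1/n) from the equation above."""
    return 2 * N * (1 - Fraction(1, n))


if __name__ == "__main__":
    n, N = 74, 1
    total = expected_tmrca_sum(n, N)
    # The term-by-term sum and the closed form agree exactly.
    assert total == expected_tmrca_closed_form(n, N)
    # Fraction of the limiting value 2N reached with n samples.
    print(float(total / (2 * N)))
```

The term-by-term sum telescopes, which is why it matches the closed form exactly for any sample size.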
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the virus population may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past the population had size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in figure {{See Figure|2}}.
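To illustrate how the coalescent rate behaves under exponential growth, the following sketch evaluates the rate for a population of size <math>N_0 e^{-rt}</math> at time <math>t</math> in the past and draws coalescent times by inverting the cumulative rate. The function names and all parameter values are assumptions made for the example; they are not estimates for any particular virus.

```python
import math
import random


def coalescent_rate(k, t, N0, r):
    """Rate C(k, 2) / N(t) with N(t) = N0 * exp(-r * t), t measured into the past."""
    return k * (k - 1) / 2 / (N0 * math.exp(-r * t))


def simulate_coalescent_times(n, N0, r, rng):
    """Draw successive coalescent times (measured backward from sampling).

    With k lineages at time s, the cumulative rate up to time t is
    C(k, 2) / (N0 * r) * (exp(r * t) - exp(r * s)); setting it equal to a
    unit-rate exponential draw and solving for t gives the next event time.
    """
    t = 0.0
    times = []
    for k in range(n, 1, -1):
        pairs = k * (k - 1) / 2
        e = rng.expovariate(1.0)
        t = math.log(math.exp(r * t) + e * r * N0 / pairs) / r
        times.append(t)
    return times


if __name__ == "__main__":
    rng = random.Random(2)  # fixed seed, for reproducibility only
    # The rate for a pair of lineages rises steeply going back in time.
    for t in (0.0, 5.0, 10.0):
        print(t, coalescent_rate(2, t, N0=1000.0, r=0.5))
    print(simulate_coalescent_times(5, N0=1000.0, r=0.5, rng=rng))
```

Because the rate is lowest at the time of sampling and rises toward the past, the simulated coalescent events tend to cluster near the root, leaving long external branches.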
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
431
2012-03-23T22:52:37Z
Spencer Bliven
1
/* Coalescent theory and phylodynamics */ Testing math & fixing TeX errors
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of virus have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates that a virus is transmitted between different host populations.
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,{{Citation needed}} who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see figure {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see figure {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see figure {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of spatial structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see figure {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see figure {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see figure {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:Hepatitis B virus | HBV]] (see figure {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
Virus within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are also more closely related.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see figure {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock]] {{Citation needed}}.
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]]. {{Citation needed}}
Several analytical methods have been developed to deal specifically with problems in phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of 74 sequences is already a fraction <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
442
2012-03-27T04:11:03Z
Erikvolz
8
Fixed R0 equation.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of virus have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates that a virus is transmitted between different host populations.
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,{{Citation needed}} who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see figure {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see figure {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see figure {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of spatial structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see figure {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see figure {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see figure {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:Hepatitis B virus | HBV]] (see figure {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
Virus within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are also more closely related.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see figure {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock]] {{Citation needed}}.
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]]. {{Citation needed}}
Several analytical methods have been developed to deal specifically with problems in phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of 74 sequences is already a fraction <math>1 - 1/74 \approx 99\% </math> of its upper bound.
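The closed form for the expected TMRCA can be checked numerically by summing the expected internode intervals directly. The following Python sketch is only an illustration and is not taken from any of the cited analyses; the function names <code>expected_tmrca</code> and <code>closed_form</code> are hypothetical, and the population size used in the example is arbitrary.

```python
from math import comb


def expected_tmrca(n, N):
    """Sum of expected internode intervals, 1/lambda_k, for k = n, ..., 2.

    Assumes the constant-size coalescent rate lambda_k = C(k, 2) / N.
    """
    return sum(N / comb(k, 2) for k in range(2, n + 1))


def closed_form(n, N):
    """Closed-form expected TMRCA, 2N(1 - 1/n)."""
    return 2 * N * (1 - 1 / n)


if __name__ == "__main__":
    N = 1000.0  # arbitrary illustrative population size
    for n in (2, 10, 74):
        direct = expected_tmrca(n, N)
        # Fraction of the upper bound 2N reached by a sample of size n
        print(n, direct, closed_form(n, N), direct / (2 * N))
```

For a sample of 74 sequences the last column is about 0.986, which is the fraction quoted above.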
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
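The time rescaling described above gives a simple way to simulate coalescent times under exponential growth. For <math>N(t) = N_0 e^{-rt}</math> the rescaled time is <math>\tau(t) = (e^{rt} - 1)/(r N_0)</math>, so coalescent events can be drawn in rescaled time at rate <math>{k \choose 2}</math> and mapped back with <math>t = \ln(1 + r N_0 \tau)/r</math>. The Python sketch below assumes <math>r > 0</math>; the function name <code>simulate_growth_coalescent</code> and the parameter values in the example are hypothetical.

```python
import math
import random


def simulate_growth_coalescent(n, N0, r, rng=None):
    """Coalescent event times (measured backwards from sampling) for a
    population of size N(t) = N0 * exp(-r * t), with r > 0.

    Events are drawn in rescaled time tau, where the rate with k lineages
    is C(k, 2), and then mapped back to real time.
    """
    rng = rng or random.Random()
    tau = 0.0
    times = []
    for k in range(n, 1, -1):
        tau += rng.expovariate(math.comb(k, 2))
        # Invert tau(t) = (exp(r t) - 1) / (r N0)
        times.append(math.log1p(r * N0 * tau) / r)
    return times


if __name__ == "__main__":
    rng = random.Random(42)
    times = simulate_growth_coalescent(n=20, N0=1000.0, r=0.5, rng=rng)
    # The first events mark the ends of external branches; the last is the TMRCA.
    print("first coalescence:", times[0])
    print("TMRCA:", times[-1])
```

With strong growth the first coalescent events tend to occur late relative to the TMRCA, which matches the long external branches of a star-like tree.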
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
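Rearranging the relation above gives <math> R_0 = 1 + rD </math>. The short Python sketch below applies this rearrangement; it assumes only the linear relation stated above, the function name <code>r0_from_growth_rate</code> is hypothetical, and the numbers in the example are illustrative values, not estimates from Fraser et al.

```python
def r0_from_growth_rate(r, D):
    """Reproduction number implied by r = (R0 - 1) / D.

    r : exponential growth rate per unit time
    D : duration of infection (or serial interval), in the same time units
    """
    if D <= 0:
        raise ValueError("D must be positive")
    return 1.0 + r * D


if __name__ == "__main__":
    # Illustrative values only: growth rate 0.1 per day, D = 5 days
    print(r0_from_growth_rate(0.1, 5.0))
```

The growth rate and <math>D</math> must be expressed in the same time units for the product <math>rD</math> to be meaningful.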
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
443
2012-03-27T04:13:00Z
Erikvolz
8
Corrected figure tags.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of virus have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates that a virus is transmitted between different host populations.
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,{{Citation needed}} who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of spatial structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:Hepatitis B virus | HBV]] (see {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
Virus within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are also more closely related.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock]] {{Citation needed}}.
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]]. {{Citation needed}}
Several analytical methods have been developed to deal specifically with problems in phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of 74 sequences is already <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
445
2012-03-28T13:05:13Z
Tbedford
7
Sketch of phylogeographic methods and simulation methods.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of virus have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates that a virus is transmitted between different host populations.
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,{{Citation needed}} who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
Virus within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are also more closely related.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock]] {{Citation needed}}.
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]]. {{Citation needed}}
Several analytical methods have been developed to deal specifically with problems in phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of 74 sequences is already <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model. This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.
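The label-scrambling test described above can be sketched in a few lines. The following is a minimal illustration, not a published implementation: it assumes a rooted binary tree written as nested tuples, counts transitions with Fitch parsimony (one of the three counting approaches mentioned), and uses hypothetical tip names and region labels. The function names <code>fitch</code>, <code>tips</code> and <code>permutation_p_value</code> are invented for this example.

```python
import random


def fitch(tree, labels):
    """Fitch parsimony on a nested-tuple binary tree.

    Returns (possible_states_at_this_node, minimum_number_of_transitions).
    Leaves are tip names; `labels` maps tip name -> geographic label.
    """
    if not isinstance(tree, tuple):
        return {labels[tree]}, 0
    left_set, left_cost = fitch(tree[0], labels)
    right_set, right_cost = fitch(tree[1], labels)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one transition is needed on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def tips(tree):
    """List the tip names of a nested-tuple tree, left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    return tips(tree[0]) + tips(tree[1])


def permutation_p_value(tree, labels, n_perm, rng):
    """Compare the observed transition count with counts from scrambled tips."""
    names = tips(tree)
    observed = fitch(tree, labels)[1]
    values = [labels[name] for name in names]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        scrambled = dict(zip(names, values))
        # Structure predicts FEWER transitions than under scrambling.
        if fitch(tree, scrambled)[1] <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: two clades, each sampled from a single region.
tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
labels = {name: ("A" if name.startswith("a") else "B") for name in tips(tree)}
observed, p_value = permutation_p_value(tree, labels, 999, random.Random(1))
print(observed, p_value)
```

Adding one to both the numerator and the denominator keeps the estimated p-value away from zero; this is a common convention for permutation tests and is a choice made here, not something specified in the text above.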
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices. The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.
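As a small worked illustration of treating location as a character state, the sketch below computes transition probabilities along a branch for just two regions with asymmetric movement rates. It assumes a two-state continuous-time Markov chain, for which the transition probabilities have a closed form; the rates, branch lengths and the function name <code>two_region_transition_probs</code> are hypothetical and chosen only for the example.

```python
import math


def two_region_transition_probs(rate_ab, rate_ba, t):
    """Transition probability matrix for a two-region Markov model.

    rate_ab: rate at which a lineage in region A moves to region B
    rate_ba: rate at which a lineage in region B moves to region A
    t:       branch length, in the same time units as the rates
    Returns [[P(A->A), P(A->B)], [P(B->A), P(B->B)]].
    """
    total = rate_ab + rate_ba
    decay = math.exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return [[1.0 - p_ab, p_ab], [p_ba, 1.0 - p_ba]]


# Hypothetical asymmetric rates: movement A -> B is four times faster than B -> A.
for branch_length in (0.1, 1.0, 10.0):
    print(branch_length, two_region_transition_probs(2.0, 0.5, branch_length))
```

With more than two regions the same idea applies, but the matrix exponential of the rate matrix generally has to be computed numerically.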
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains, or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population. Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious. In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
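One way such a distance-based rule might look in a simulation is sketched below. The linear relationship between sequence distance and risk of infection, the <code>escape_distance</code> parameter and the function names are assumptions made purely for illustration; the text above only states that cross-immunity is often assumed to be related to sequence distance, not which functional form is used.

```python
def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


def infection_risk(challenge, history, escape_distance):
    """Hypothetical risk that a host is infected by `challenge`.

    history:         sequences of strains the host was previously infected with
    escape_distance: distance at which prior immunity gives no protection
                     (an assumed parameter, not taken from the article)
    Risk rises linearly with the distance to the closest past strain.
    """
    if not history:
        return 1.0  # immunologically naive host
    closest = min(hamming_distance(challenge, past) for past in history)
    return min(1.0, closest / escape_distance)


# A host previously infected by two strains is challenged with a third.
print(infection_risk("ACGT", ["ACGA", "TTTT"], escape_distance=4))
```

In an individual-based model this function would be evaluated at each contact between an infectious host and a previously exposed host.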
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
448
2012-03-28T13:26:38Z
Tbedford
7
Trying introduction as it's own section
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of virus have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates that a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,{{Citation needed}} who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
Virus within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are also more closely related.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock]] {{Citation needed}}.
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]]. {{Citation needed}}
Several analytical methods have been developed to deal specifically with problems in phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
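The simulation procedure just described can be sketched as follows. This is a minimal illustration under the stated assumptions (selective neutrality, constant size <math>N</math>, no population structure); the function name <code>simulate_coalescent</code> and the representation of a lineage as a tuple of sampled tip indices are choices made for this example only.

```python
import random
from math import comb


def simulate_coalescent(n, N, rng):
    """Simulate one genealogy of n gene copies in a population of size N.

    Returns a list of (interval, merged_lineage) pairs, one per coalescent
    event, ordered from the time of sampling back to the MRCA.
    Each lineage is the tuple of sampled tips (0 .. n-1) descending from it.
    """
    lineages = [(i,) for i in range(n)]
    events = []
    k = n
    while k > 1:
        rate = comb(k, 2) / N                  # lambda_k = C(k, 2) / N
        interval = rng.expovariate(rate)       # T_k ~ Exponential(lambda_k)
        i, j = rng.sample(range(k), 2)         # pick two lineages uniformly
        merged = lineages[i] + lineages[j]
        lineages = [lin for idx, lin in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
        events.append((interval, merged))
        k -= 1
    return events


events = simulate_coalescent(n=5, N=1000, rng=random.Random(1))
for interval, merged in events:
    print(round(interval, 1), sorted(merged))
print("TMRCA:", round(sum(interval for interval, _ in events), 1))
```

The sum of the simulated intervals is one draw of the TMRCA of the sample; averaging over many replicates approximates its expected value.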
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
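The last step of this derivation can be checked numerically. The sketch below sums <math>1/\lambda_k</math> for <math>k = 2, \ldots, n</math> using exact rational arithmetic and compares the result with the closed form <math>2N(1 - 1/n)</math>; the function names and the value <math>N = 1000</math> are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def expected_tmrca(n, N):
    """Sum of expected internode intervals, 1/lambda_k with lambda_k = C(k, 2)/N."""
    return sum(Fraction(N, comb(k, 2)) for k in range(2, n + 1))


def closed_form(n, N):
    """Closed-form expectation 2N(1 - 1/n)."""
    return 2 * N * (1 - Fraction(1, n))


for n in (2, 5, 74):
    total = expected_tmrca(n, 1000)
    print(n, float(total), total == closed_form(n, 1000))
```

The agreement follows from the telescoping identity <math>1/{k \choose 2} = 2\left(\frac{1}{k-1} - \frac{1}{k}\right)</math>.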
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of 74 sequences is already <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model. This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices. The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains, or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population. Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious. In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
449
2012-03-28T13:34:17Z
Tbedford
7
Minor typos.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Virus phylogenies have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates at which a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.{{Citation needed}}, who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
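As an illustration of the last statistic in the list above, the following is a minimal sketch of Tajima's D for a small alignment, using the standard constants from Tajima's 1989 formulation. The function name <code>tajimas_d</code> and the toy sequences in the usage note are illustrative assumptions, not data from any study cited here.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences (sketch)."""
    n = len(sequences)
    length = len(sequences[0])
    # S: number of segregating (polymorphic) sites in the alignment
    S = sum(len({seq[i] for seq in sequences}) > 1 for i in range(length))
    if S == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # pi: mean number of pairwise differences between sequences
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants that depend only on the sample size n
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference of two diversity estimates, scaled by its standard deviation
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

For example, <code>tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"])</code> is negative because every variant is a singleton, a pattern often associated with population expansion or a recent selective sweep, whereas <code>tajimas_d(["AA", "AA", "TT", "TT"])</code> is positive because the variants are at intermediate frequency.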
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]] {{Citation needed}}.
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]]. {{Citation needed}}
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the waiting time, measured backwards from sample collection, until the first 2 out of a sample of <math>n</math> gene copies share a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), leaving <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
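The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under the stated assumptions (neutrality, constant size <math>N</math>, no population structure); the function names <code>simulate_coalescent</code> and <code>mean_tmrca</code> are hypothetical and chosen only for this example.

```python
import random
from math import comb

def simulate_coalescent(n, N, rng):
    """Simulate internode intervals and merge order for n sampled lineages.

    Returns (intervals, merges): intervals[j] is the waiting time while
    n - j lineages remain; merges[j] is (child_a, child_b, parent_label).
    """
    lineages = list(range(n))      # labels of the extant lineages
    next_label = n                 # label given to each new ancestor
    intervals, merges = [], []
    while len(lineages) > 1:
        k = len(lineages)
        rate = comb(k, 2) / N      # lambda_k = C(k, 2) / N
        intervals.append(rng.expovariate(rate))
        a, b = rng.sample(lineages, 2)   # pair chosen uniformly at random
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        merges.append((a, b, next_label))
        next_label += 1
    return intervals, merges

def mean_tmrca(n, N, replicates, seed=1):
    """Average the simulated TMRCA (sum of intervals) over many replicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        intervals, _ = simulate_coalescent(n, N, rng)
        total += sum(intervals)
    return total / replicates
```

Averaging over many replicates, for instance <code>mean_tmrca(10, 100, 20000)</code>, gives a Monte Carlo estimate of the expected TMRCA that can be compared with the analytical value for the same <math>n</math> and <math>N</math>.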
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
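The last equality follows from a telescoping sum. Using the rate <math>\lambda_k = {k \choose 2} \frac{1}{N}</math> defined above, each term can be written as a difference of two fractions, so that all intermediate terms cancel:
: <math> \frac{1}{\lambda_k} = \frac{N}{{k \choose 2}} = \frac{2N}{k(k-1)} = 2N\left(\frac{1}{k-1} - \frac{1}{k}\right), \qquad \sum_{k=2}^{n} \frac{1}{\lambda_k} = 2N\left(1 - \frac{1}{n}\right). </math>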
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because with <math>n = 74</math> the expected TMRCA of the sample is <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
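The time rescaling mentioned above gives a simple way to simulate coalescent times under exponential growth. For <math>N(t) = N_0 e^{-rt}</math> the rescaled time is <math>\tau(t) = (e^{rt} - 1)/(r N_0)</math>, which inverts to <math>t = \ln(1 + r N_0 \tau)/r</math>. The sketch below draws waiting times in rescaled time, where the rate with <math>k</math> lineages is <math>{k \choose 2}</math>, and maps them back to real time. The function name <code>coalescent_times_exponential</code> is a hypothetical choice for this example.

```python
import random
from math import comb, log1p

def coalescent_times_exponential(n, N0, r, rng):
    """Coalescent event times, measured backwards from sampling, when the
    population size t time units in the past is N0 * exp(-r * t)."""
    tau = 0.0      # cumulative time on the rescaled axis
    times = []
    for k in range(n, 1, -1):
        # On the rescaled axis k lineages coalesce at rate C(k, 2)
        tau += rng.expovariate(comb(k, 2))
        # Invert tau(t) = (exp(r * t) - 1) / (r * N0) to recover real time
        times.append(log1p(r * N0 * tau) / r)
    return times
```

Because the map back to real time is logarithmic, large waiting times on the rescaled axis are compressed, so the deepest coalescent events fall close together near the root while the tip-most intervals stay comparatively long.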
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
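Rearranging the relationship above gives <math>R_0 = 1 + rD</math>. The short sketch below encodes both directions. It assumes the linear relationship holds, which corresponds to a simple epidemic model with an exponentially distributed infectious period; the function names and the numbers in the usage note are illustrative and are not taken from Fraser et al.

```python
def reproduction_number(r, D):
    """R0 implied by growth rate r and mean duration D, from r = (R0 - 1) / D."""
    return 1.0 + r * D

def growth_rate(R0, D):
    """Exponential growth rate implied by R0 and D (same time units as D)."""
    return (R0 - 1.0) / D
```

With a hypothetical growth rate of 0.1 per day and a duration of 3 days, <code>reproduction_number(0.1, 3.0)</code> is approximately 1.3. The units of <math>r</math> and <math>D</math> must match.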
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model. This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.
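The label-scrambling test can be sketched as follows, using the parsimony option: the number of geographic transitions is counted with Fitch's algorithm on a rooted binary tree. The tree encoding (nested pairs of tip names) and the function names are assumptions made for this example, and the tree and labels in the usage note are invented.

```python
import random

def fitch_changes(tree, labels):
    """Return (state_set, changes) for a rooted binary tree.

    tree is either a tip name (str) or a pair (left, right) of subtrees;
    labels maps each tip name to its geographic label.
    """
    if isinstance(tree, str):
        return {labels[tree]}, 0
    left_set, left_changes = fitch_changes(tree[0], labels)
    right_set, right_changes = fitch_changes(tree[1], labels)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # Disjoint state sets: one more transition is required at this node
    return left_set | right_set, left_changes + right_changes + 1

def tip_names(tree):
    """List the tip names of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])

def permutation_p_value(tree, labels, replicates=1000, seed=1):
    """Observed transition count and the permutation p-value for clustering."""
    rng = random.Random(seed)
    tips = tip_names(tree)
    observed = fitch_changes(tree, labels)[1]
    values = [labels[tip] for tip in tips]
    at_most_observed = 0
    for _ in range(replicates):
        rng.shuffle(values)                      # scramble the tip labels
        shuffled = dict(zip(tips, values))
        if fitch_changes(tree, shuffled)[1] <= observed:
            at_most_observed += 1
    # Add-one correction keeps the estimated p-value above zero
    return observed, (at_most_observed + 1) / (replicates + 1)
```

For a hypothetical eight-tip tree <code>((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))</code> in which the four <code>a</code> tips are labelled with one region and the four <code>b</code> tips with another, a single transition explains the labels, and few scrambled data sets require so few transitions, so the p-value is small.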
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices. The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.
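To make the analogy with substitution models concrete, the sketch below treats movement between two regions 'A' and 'B' as a two-state continuous-time Markov chain with asymmetric rates, and uses the closed-form solution for its transition probabilities along a branch of length <math>t</math>. Restricting to two regions and the function name <code>two_region_transition_probs</code> are simplifying assumptions for illustration; models with more regions generally require a matrix exponential.

```python
from math import exp

def two_region_transition_probs(rate_ab, rate_ba, t):
    """Transition probability matrix after time t for a two-region model.

    rate_ab is the rate at which a lineage in 'A' moves to 'B', rate_ba the
    reverse rate. Row 0 starts in 'A', row 1 starts in 'B'.
    """
    total = rate_ab + rate_ba
    decay = exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)   # start in A, end in B
    p_ba = rate_ba / total * (1.0 - decay)   # start in B, end in A
    return [[1.0 - p_ab, p_ab],
            [p_ba, 1.0 - p_ba]]
```

When the two rates differ, the long-run probability of finding a lineage in 'B' approaches <code>rate_ab / (rate_ab + rate_ba)</code> regardless of where it started, which is how asymmetric connections show up in the reconstructed ancestral locations.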
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains, or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population. Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious. In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
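A toy individual-based simulation along these lines is sketched below. Every modelling choice in it is an assumption made for illustration: time advances in discrete steps, each infected host makes one random contact per step, and the risk of infection equals the fraction of sites at which the challenging strain differs from the most similar strain the host has recovered from. The parameter values are arbitrary.

```python
import random

ALPHABET = "ACGT"

def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def infection_risk(strain, history):
    """Hypothetical cross-immunity rule: risk grows with the distance to the
    most similar strain in the host's infection history."""
    if not history:
        return 1.0
    return min(hamming(strain, past) for past in history) / len(strain)

def simulate(hosts=200, steps=50, length=8, transmission_prob=0.5,
             recovery_prob=0.2, mutation_prob=0.05, seed=1):
    """Return a list of (step, strain) records, one per infected host per step."""
    rng = random.Random(seed)
    infecting = [None] * hosts                 # current strain, None if uninfected
    history = [set() for _ in range(hosts)]    # strains each host recovered from
    infecting[0] = "A" * length                # a single initial infection
    samples = []
    for step in range(steps):
        updated = list(infecting)
        for i, strain in enumerate(infecting):
            if strain is None:
                continue
            if rng.random() < mutation_prob:   # point mutation at a random site
                site = rng.randrange(length)
                base = rng.choice([b for b in ALPHABET if b != strain[site]])
                strain = strain[:site] + base + strain[site + 1:]
                updated[i] = strain
            j = rng.randrange(hosts)           # one random contact per step
            risk = transmission_prob * infection_risk(strain, history[j])
            if updated[j] is None and rng.random() < risk:
                updated[j] = strain
            if rng.random() < recovery_prob:   # recovery adds to immune history
                history[i].add(strain)
                updated[i] = None
            samples.append((step, strain))
        infecting = updated
    return samples
```

The returned records play the role of dated sequence samples: in a fuller analysis they would be subsampled and passed to a tree-building method, which is outside the scope of this sketch.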
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
450
2012-03-28T14:26:10Z
Tbedford
7
Skeleton of applications
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Virus phylogenies have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates at which a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.{{Citation needed}}, who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which can only be as old or older. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009. This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
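One simple way to see how dated sequences yield a rate and an origin time is root-to-tip regression, a quick heuristic that is distinct from the full molecular clock models described above. The genetic distance of each tip from the root is regressed on its sampling date: the slope estimates the substitution rate and the date at which the fitted line crosses zero estimates the date of the most recent common ancestor. The function name and the numbers in the usage note are invented for illustration.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, tmrca_date): the slope in substitutions per site per unit
    time, and the date at which the fitted distance equals zero.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    rate = sxy / sxx                       # slope of the regression line
    intercept = mean_y - rate * mean_x
    tmrca_date = -intercept / rate         # x-intercept of the fitted line
    return rate, tmrca_date
```

For made-up tips sampled yearly from 2000 to 2004 whose distances grow by 0.003 substitutions per site per year, the fit recovers that rate and places the common ancestor where the line extrapolates to zero distance. Real data are noisy and the tips are not independent, so this heuristic does not provide valid confidence intervals.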
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeographic models | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]] {{Citation needed}}.
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]]. {{Citation needed}}
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of 74 sequences is already <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists, then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model. This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices. The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains, or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population. Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious. In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
453
2012-03-29T11:46:51Z
Tbedford
7
References section
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates at which a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.{{Citation needed}}, who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at least as early as the common ancestor of the sample. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]] {{Citation needed}}.
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]]. {{Citation needed}}
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
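The simulation procedure and the expectation above can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the function names, the sample size, the population size and the number of replicates are all illustrative assumptions. It draws the internode intervals with rates <math>\lambda_k = {k \choose 2}/N</math>, builds a random topology by merging two lineages chosen uniformly at random, and compares the simulated mean TMRCA with <math>2N(1 - 1/n)</math>.

```python
import random


def simulate_coalescent_intervals(n, N, rng):
    """Draw the internode intervals T_n, ..., T_2 for a sample of n gene copies."""
    intervals = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2 / N  # lambda_k = C(k, 2) / N
        intervals.append(rng.expovariate(rate))
    return intervals


def simulate_topology(n, rng):
    """Merge two lineages chosen uniformly at random until one remains.

    Returns a nested, Newick-like string without branch lengths.
    """
    lineages = [str(i) for i in range(n)]
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        merged = "(" + lineages[i] + "," + lineages[j] + ")"
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0]


# Hypothetical values chosen only for illustration.
rng = random.Random(1)
n, N, replicates = 10, 1000, 20000

tmrcas = [sum(simulate_coalescent_intervals(n, N, rng)) for _ in range(replicates)]
mean_tmrca = sum(tmrcas) / replicates
expected = 2 * N * (1 - 1 / n)  # E[TMRCA] = 2N(1 - 1/n)

print("simulated mean TMRCA:", round(mean_tmrca, 1))
print("expected TMRCA:", expected)
print("example topology:", simulate_topology(n, rng))
```

With enough replicates the simulated mean should fall close to the analytical value; increasing <code>n</code> moves both toward the bound <math>2N</math>.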
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of 74 sequences is already <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
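The time rescaling described above gives a simple way to simulate coalescent times under exponential growth. For <math>N(t) = N_0 e^{-rt}</math> the rescaled time is <math>\tau = (e^{rt} - 1)/(r N_0)</math>, which inverts to <math>t = \ln(1 + r N_0 \tau)/r</math>. The Python sketch below, with a hypothetical function name and illustrative parameter values, draws intervals in rescaled time with rates <math>{k \choose 2}</math> and maps the cumulative times back to real time.

```python
import math
import random


def coalescent_times_exponential_growth(n, N0, r, rng):
    """Times before sampling of the n-1 coalescent events when N(t) = N0 * exp(-r t).

    Intervals are drawn in rescaled time tau (rate C(k, 2) with k lineages)
    and converted with t = log(1 + r * N0 * tau) / r.
    """
    tau = 0.0  # rescaled time: integral of 1/N(s) ds from 0 to t
    times = []
    for k in range(n, 1, -1):
        tau += rng.expovariate(k * (k - 1) / 2)
        times.append(math.log1p(r * N0 * tau) / r)
    return times


# Hypothetical values chosen only for illustration.
rng = random.Random(2)
n, N0, r = 20, 10000.0, 0.5

times = coalescent_times_exponential_growth(n, N0, r, rng)
first_event = times[0]  # most recent coalescent event
tmrca = times[-1]       # root of the genealogy

print("first coalescent event at t =", round(first_event, 2))
print("TMRCA at t =", round(tmrca, 2))
```

Because the logarithm compresses large values of <math>\tau</math>, the coalescent events tend to bunch together near the root, leaving comparatively long external branches.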
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
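Rearranging the relationship above gives <math>R_0 = 1 + rD</math>. As a small worked example in Python (the function name and the numerical values are hypothetical and are not taken from any of the studies cited here):

```python
def r0_from_growth_rate(r, D):
    """Reproduction number implied by r = (R0 - 1) / D, that is R0 = 1 + r * D."""
    if D <= 0:
        raise ValueError("D must be positive")
    return 1.0 + r * D


# Hypothetical inputs: growth rate of 0.1 per day and D of 3 days.
example_r0 = r0_from_growth_rate(r=0.1, D=3.0)
print("implied R0:", round(example_r0, 2))
```

The growth rate and <math>D</math> must be expressed in the same time units, for example per day and days.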
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists, then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model. This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.
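The label-scrambling test described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the tree is a made-up binary tree written as nested tuples, transitions are counted with Fitch parsimony (one of the counting options mentioned above), and the function names, tip names and number of permutations are hypothetical.

```python
import random


def fitch_transitions(tree, labels):
    """Return (state set, parsimony transition count) for a nested-tuple binary tree."""
    if not isinstance(tree, tuple):  # a tip, identified by its name
        return {labels[tree]}, 0
    left_set, left_changes = fitch_transitions(tree[0], labels)
    right_set, right_changes = fitch_transitions(tree[1], labels)
    changes = left_changes + right_changes
    common = left_set & right_set
    if common:
        return common, changes
    return left_set | right_set, changes + 1  # disjoint sets imply one transition


def permutation_p_value(tree, labels, n_perm, rng):
    """One-sided p-value: are there as few transitions as observed under scrambled labels?"""
    observed = fitch_transitions(tree, labels)[1]
    tips = list(labels)
    states = [labels[t] for t in tips]
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(states)  # scramble location labels across tips
        shuffled = dict(zip(tips, states))
        if fitch_transitions(tree, shuffled)[1] <= observed:
            as_extreme += 1
    return observed, (1 + as_extreme) / (n_perm + 1)


# Hypothetical example: two regions, A and B, perfectly clustered on the tree.
tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
labels = {"a1": "A", "a2": "A", "a3": "A", "a4": "A",
          "b1": "B", "b2": "B", "b3": "B", "b4": "B"}

observed, p_value = permutation_p_value(tree, labels, 999, random.Random(0))
print("observed transitions:", observed)
print("permutation p-value:", p_value)
```

A small p-value indicates fewer geographic transitions than expected under random labelling, which is consistent with population structure.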
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices. The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains, or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population. Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious. In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
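One way a distance-based cross-immunity rule might be written is sketched below in Python. The linear decay of protection with Hamming distance, the <code>full_escape_distance</code> parameter and the rule that a host is protected by its best-matching previous infection are all assumptions made for illustration; published models use a variety of functional forms.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cross_immunity(a, b, full_escape_distance=4):
    """Assumed rule: protection decays linearly from 1 (identical) to 0."""
    return max(0.0, 1.0 - hamming(a, b) / full_escape_distance)


def infection_risk(challenge, history, full_escape_distance=4):
    """Relative risk of infection for a host with the given infection history.

    Assumes the host is protected by its best-matching previous strain.
    """
    if not history:
        return 1.0  # fully susceptible host
    protection = max(
        cross_immunity(challenge, past, full_escape_distance) for past in history
    )
    return 1.0 - protection


# Hypothetical four-site strains.
print(infection_risk("AAAA", []))
print(infection_risk("AAAA", ["AAAT"]))
print(infection_risk("AAAA", ["TTTT", "AAAA"]))
```

In a forward simulation, a risk of this kind would scale the transmission probability each time an infected host contacts a previously infected host.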
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
454
2012-03-29T11:58:23Z
Tbedford
7
Reference
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates at which a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.{{Citation needed}}, who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:Hepatitis B virus | HBV]] (see {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before that time. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>. Additionally, differential transmission between groups, be they geographic, age-related or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] can reveal these otherwise hidden transmission patterns more directly.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the waiting time, measured backwards from sample collection, until any two of a sample of <math>n</math> gene copies share a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At time <math>T_n</math> before the sample was collected, two gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), leaving <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
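The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name <code>simulate_coalescent</code>, the string labels for lineages and the Newick-like nesting of the output are assumptions made for this example only.

```python
import random
from math import comb


def simulate_coalescent(n, N, rng):
    """Simulate one neutral, constant-size coalescent genealogy.

    n   : number of sampled gene copies
    N   : population size (same units as in lambda_k = C(k, 2) / N)
    rng : a random.Random instance

    Returns (intervals, merges, tree) where intervals[j] is the waiting
    time while n - j lineages remain, merges lists the pair joined at
    each event, and tree is a nested-parenthesis string.
    """
    lineages = [str(i) for i in range(n)]
    intervals, merges = [], []
    while len(lineages) > 1:
        k = len(lineages)
        rate = comb(k, 2) / N                 # lambda_k
        intervals.append(rng.expovariate(rate))
        a, b = rng.sample(lineages, 2)        # pair chosen uniformly at random
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(f"({a},{b})")         # the new ancestral lineage
        merges.append((a, b))
    return intervals, merges, lineages[0]


if __name__ == "__main__":
    intervals, merges, tree = simulate_coalescent(5, 1000, random.Random(1))
    print(tree)
    print("TMRCA:", sum(intervals))
```

The sum of the simulated intervals is one realization of the TMRCA of the sample; averaging over many replicates gives a Monte Carlo estimate of its expectation.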
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of 74 sequences is a fraction <math>1 - 1/74 \approx 99\% </math> of its upper bound.
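The closed form for the expected TMRCA and the two corollaries can be checked numerically. The sketch below simply sums the expected internode intervals <math>1/\lambda_k</math>; the function name <code>expected_tmrca</code> is an assumption of this example, and <math>N</math> is set to 1 so that times are in units of the population size.

```python
from math import comb


def expected_tmrca(n, N):
    """Sum of expected internode intervals: sum over k of 1 / lambda_k,
    with lambda_k = C(k, 2) / N."""
    return sum(N / comb(k, 2) for k in range(2, n + 1))


if __name__ == "__main__":
    N = 1.0
    for n in (2, 5, 10, 74, 1000):
        value = expected_tmrca(n, N)
        closed_form = 2 * N * (1 - 1 / n)
        # fraction of the upper bound 2N that a sample of size n reaches
        print(n, value, closed_form, value / (2 * N))
```

The last column approaches 1 quickly as <math>n</math> grows, which is the sense in which few samples are needed.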
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
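The time rescaling described above gives a direct way to simulate coalescent times under exponential growth. In the sketch below, the rescaled time is <math>\Lambda(t) = \int_0^t \mathrm{d}\tau / N(\tau) = (e^{rt} - 1)/(r N_0)</math>, in which lineages coalesce at the constant rate <math>{k \choose 2}</math>. The function name and the parameter values <math>N_0 = 10^4</math> and <math>r = 1</math> are hypothetical choices for illustration.

```python
import random
from math import comb, expm1, log1p


def coalescent_times_exp_growth(n, N0, r, rng):
    """Coalescent event times (measured backwards from sampling, t = 0)
    when N(t) = N0 * exp(-r * t) going back in time, with r > 0.

    Uses the rescaling Lambda(t) = (exp(r t) - 1) / (r N0).
    """
    t = 0.0
    times = []
    for k in range(n, 1, -1):
        s = expm1(r * t) / (r * N0)          # current position in rescaled time
        s += rng.expovariate(comb(k, 2))     # waiting time in rescaled time
        t = log1p(r * N0 * s) / r            # map back to real time
        times.append(t)
    return times


if __name__ == "__main__":
    times = coalescent_times_exp_growth(10, 1e4, 1.0, random.Random(42))
    print("first coalescence:", times[0])
    print("TMRCA:", times[-1])
```

Under these assumed parameters the first coalescence typically occurs a large fraction of the way back to the root, whereas under constant size the first of ten intervals is expected to be only about 1% of the TMRCA.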
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
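As a worked illustration of the relation <math> r = (R_0 - 1)/D </math>, the sketch below converts between the growth rate and the reproduction number. The function names and the numerical inputs are hypothetical and are not estimates for any particular pathogen.

```python
def r0_from_growth_rate(r, D):
    """Rearrangement of r = (R0 - 1) / D, giving R0 = 1 + r * D.

    r : exponential growth rate per unit time (e.g. estimated from a genealogy)
    D : duration of infection or serial interval, in the same time units
    """
    return 1.0 + r * D


def growth_rate_from_r0(R0, D):
    """The relation as stated in the text: r = (R0 - 1) / D."""
    return (R0 - 1.0) / D


if __name__ == "__main__":
    # Hypothetical inputs: growth rate 0.1 per day, serial interval 5 days.
    print(r0_from_growth_rate(0.1, 5.0))
```

Because <math>D</math> multiplies <math>r</math> directly, uncertainty in the assumed duration of infection carries through proportionally to the part of <math>R_0</math> above 1.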
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists, then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model. This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.
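The label-scrambling test described above can be sketched as follows, using Fitch parsimony to count transitions. This is an illustration under stated assumptions: the tree is a rooted binary tree written as nested tuples, the tip names and region labels are invented, and the p-value uses the common <code>(count + 1) / (permutations + 1)</code> convention, which the text does not specify.

```python
import random


def fitch_transitions(tree, labels):
    """Fitch parsimony on a rooted binary tree given as nested tuples.

    Returns (possible_states_at_root, minimum_number_of_label_changes).
    """
    if not isinstance(tree, tuple):              # a tip
        return {labels[tree]}, 0
    left_set, left_changes = fitch_transitions(tree[0], labels)
    right_set, right_changes = fitch_transitions(tree[1], labels)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def tip_names(tree):
    """List the tips of the tree from left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])


def permutation_p_value(tree, labels, n_perm, rng):
    """One-sided test: are there fewer transitions than under random labels?"""
    observed = fitch_transitions(tree, labels)[1]
    tips = tip_names(tree)
    values = [labels[t] for t in tips]
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(values)                      # scramble labels across tips
        shuffled = dict(zip(tips, values))
        if fitch_transitions(tree, shuffled)[1] <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical eight-tip tree with two perfectly separated regions.
    tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
    labels = {t: ("A" if t.startswith("a") else "B") for t in tip_names(tree)}
    print(permutation_p_value(tree, labels, 2000, random.Random(0)))
```

In this toy tree a single transition explains the labels, and few random relabelings do as well, so the test indicates structure. Real analyses would also need to account for uncertainty in the phylogeny itself.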
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices. The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.
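To make the idea of an asymmetric geographic transition model concrete, the sketch below treats two regions, 'A' and 'B', as the states of a continuous-time Markov chain and evaluates the probability that a lineage has changed region after time <math>t</math>. The two-region restriction, the function name and the rate values are assumptions for illustration; real analyses typically involve more regions and estimate the rates from the phylogeny.

```python
from math import exp


def two_region_transition_probs(rate_ab, rate_ba, t):
    """Transition probabilities of a two-state continuous-time Markov chain.

    rate_ab : rate at which a lineage in region A moves to region B
    rate_ba : rate at which a lineage in region B moves to region A
    t       : elapsed time (same units as the rates)

    Returns a 2x2 matrix [[P(A->A), P(A->B)], [P(B->A), P(B->B)]].
    """
    total = rate_ab + rate_ba
    decay = exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return [[1.0 - p_ab, p_ab], [p_ba, 1.0 - p_ba]]


if __name__ == "__main__":
    # Hypothetical asymmetric rates: A exports lineages four times faster than B.
    for t in (0.0, 0.5, 2.0, 50.0):
        print(t, two_region_transition_probs(0.8, 0.2, t))
```

With asymmetric rates, a lineage followed for a long time is found in 'B' with probability <code>rate_ab / (rate_ab + rate_ba)</code> regardless of where it started, which is how asymmetry in movement shapes the reconstructed ancestral locations.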
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains, or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population. Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', and so on. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious. In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
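The link between sequence distance and cross-immunity can be sketched as below. The exponential decay of cross-immunity with Hamming distance, the <code>decay</code> parameter and the rule that a host is protected by its best-matching past strain are all assumptions made for this illustration; published models use a variety of functional forms.

```python
from math import exp


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cross_immunity(strain, past_strain, decay=0.5):
    """ASSUMED form: protection decays exponentially with Hamming distance.

    Returns 1.0 for identical strains and tends to 0 for distant strains.
    """
    return exp(-decay * hamming(strain, past_strain))


def infection_risk(challenge, history, decay=0.5):
    """Relative risk of infection given a host's list of past strains.

    ASSUMPTION: the host is protected by its best-matching past strain.
    """
    if not history:
        return 1.0                      # fully susceptible host
    return 1.0 - max(cross_immunity(challenge, h, decay) for h in history)


if __name__ == "__main__":
    history = ["ACGTAC", "ACGTTT"]      # hypothetical strains
    for challenge in ("ACGTAC", "ACGAAC", "TTTTTT"):
        print(challenge, infection_risk(challenge, history))
```

In a forward simulation, a function like <code>infection_risk</code> would scale the per-contact transmission probability, so that mutant strains distant from those in circulation gain a transmission advantage.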
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
455
2012-03-29T12:06:21Z
Tbedford
7
Reference
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time at which a virus originated in a new host population or species, and the rates at which a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of a virus.
Grenfell et al.,{{Citation needed}} who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:Hepatitis B virus | HBV]] (see {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before that time. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>. Additionally, differential transmission between groups, be they geographic, age-related or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] can reveal these otherwise hidden transmission patterns more directly.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the waiting time, measured backwards from sample collection, until any two of a sample of <math>n</math> gene copies share a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At time <math>T_n</math> before the sample was collected, two gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), leaving <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of 74 sequences is a fraction <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists, then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model. This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref>
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices. The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains, or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population. Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus provided by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious. In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
456
2012-03-29T12:07:19Z
Tbedford
7
Typo
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time at which a virus originated in a new host population or species, and the rates at which a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of a virus.
Grenfell et al.{{Citation needed}}, who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (see [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree, external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, viruses sampled from similar hosts will be more closely related than viruses from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:Hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before the common ancestor of the sample. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeographic models | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time before sample collection at which the first 2 out of a sample of <math>n</math> gene copies find a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
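The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names <code>simulate_coalescent</code> and <code>mean_tmrca</code>, the integer lineage labels and the fixed random seed are assumptions made for the example.

```python
import random

def simulate_coalescent(n, N, rng):
    """Simulate internode intervals and a topology for n sampled gene copies.

    Returns (intervals, merges): intervals[j] is the waiting time while
    k = n - j lineages remain, and merges[j] is the pair of lineage labels
    that coalesce at the end of that interval.
    """
    lineages = list(range(n))      # labels of the extant lineages
    next_label = n                 # label given to the next ancestral node
    intervals, merges = [], []
    while len(lineages) > 1:
        k = len(lineages)
        rate = k * (k - 1) / 2 / N              # lambda_k = C(k, 2) / N
        intervals.append(rng.expovariate(rate))
        a, b = rng.sample(lineages, 2)          # two lineages, uniformly at random
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        merges.append((a, b))
        next_label += 1
    return intervals, merges

def mean_tmrca(n, N, replicates, seed=1):
    """Average the simulated TMRCA (sum of intervals) over many replicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        intervals, _ = simulate_coalescent(n, N, rng)
        total += sum(intervals)
    return total / replicates
```

With many replicates, <code>mean_tmrca(10, 100, 20000)</code> should lie close to the expected TMRCA of this model for <math>n = 10</math> and <math>N = 100</math>.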
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
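The last step follows from a telescoping sum. As a brief check, using the rate <math>\lambda_k = {k \choose 2}/N</math> for <math>k</math> extant lineages:
: <math> \begin{align}
\frac{1}{\lambda_k} &= \frac{2N}{k(k-1)} = 2N\left(\frac{1}{k-1} - \frac{1}{k}\right), \\
\sum_{k=2}^{n} \frac{1}{\lambda_k} &= 2N\left(1 - \frac{1}{n}\right).
\end{align}</math>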
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because with 74 sequences the expected TMRCA of the sample is <math>1 - 1/74 \approx 99\% </math> of its theoretical upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
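The time rescaling mentioned above gives a simple way to simulate coalescence times under exponential growth. For <math>N(t) = N_0 e^{-rt}</math> the rescaled time is <math>\tau = (e^{rt} - 1)/(r N_0)</math>, which can be inverted to recover <math>t</math>. The following Python sketch rests on that identity; the function names and the summary statistic (the first coalescence time as a fraction of the TMRCA) are assumptions chosen for illustration.

```python
import math
import random

def growth_coalescent_times(n, N0, r, rng):
    """Coalescence times, measured backwards from sampling, for n lineages
    when the population size t units in the past is N0 * exp(-r * t)."""
    tau = 0.0                      # rescaled time: integral of 1 / N(s) ds
    times = []
    for k in range(n, 1, -1):
        # in rescaled time, k lineages coalesce at rate C(k, 2)
        tau += rng.expovariate(k * (k - 1) / 2)
        # invert tau = (exp(r t) - 1) / (r N0) to recover real time t
        times.append(math.log1p(r * N0 * tau) / r)
    return times

def mean_first_fraction(n, N0, r, replicates, seed=1):
    """Mean of (first coalescence time) / (TMRCA) over replicates.

    Values near 1 indicate long external branches, i.e. a star-like tree.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        times = growth_coalescent_times(n, N0, r, rng)
        total += times[0] / times[-1]
    return total / replicates
```

For a large product <math>r N_0</math>, the first coalescence happens a substantial fraction of the way back to the root, which matches the long external branches described above.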
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
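Rearranging the relation above gives <math>R_0 = 1 + rD</math>. A small Python sketch of the two directions of this calculation follows; the function names are hypothetical, and the numbers used in the usage note are purely illustrative and are not taken from Fraser et al.

```python
def r0_from_growth_rate(r, D):
    """Rearrange r = (R0 - 1) / D to give R0 = 1 + r * D.

    r and D must use the same time unit (for example per day and days).
    """
    return 1.0 + r * D

def growth_rate_from_r0(R0, D):
    """The relation as stated in the text: r = (R0 - 1) / D."""
    return (R0 - 1.0) / D
```

For instance, a hypothetical growth rate of 0.1 per day combined with a 3-day interval gives <code>r0_from_growth_rate(0.1, 3.0)</code>, which is about 1.3.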
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists, then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model. This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref>
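The label-scrambling test can be sketched as follows, here using parsimony (the Fitch algorithm) to count transitions. This is one possible realization under simplifying assumptions, not necessarily the procedure of the cited study: the tree is a strictly binary nested tuple of tip names, and the function names, the example tree and the location labels are all hypothetical.

```python
import random

def fitch_transitions(tree, labels):
    """Return (state_set, count): the Fitch state set at the root of `tree`
    and the minimum number of location changes within it."""
    if isinstance(tree, str):                  # a tip
        return {labels[tree]}, 0
    left_set, left_n = fitch_transitions(tree[0], labels)
    right_set, right_n = fitch_transitions(tree[1], labels)
    shared = left_set & right_set
    if shared:                                 # children agree: no new change
        return shared, left_n + right_n
    return left_set | right_set, left_n + right_n + 1

def tips(tree):
    """List tip names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return tips(tree[0]) + tips(tree[1])

def permutation_p_value(tree, labels, replicates=1000, seed=1):
    """One-sided p-value: how often scrambled labels need as few
    transitions as, or fewer than, the observed labels."""
    rng = random.Random(seed)
    observed = fitch_transitions(tree, labels)[1]
    names = tips(tree)
    states = [labels[name] for name in names]
    as_extreme = 0
    for _ in range(replicates):
        rng.shuffle(states)                    # scramble tip labels
        shuffled = dict(zip(names, states))
        if fitch_transitions(tree, shuffled)[1] <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (replicates + 1)

# Hypothetical example: two regions that form separate clades.
example_tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
example_labels = {name: name[0].upper() for name in tips(example_tree)}
```

Calling <code>permutation_p_value(example_tree, example_labels)</code> returns the observed transition count together with a p-value; a small p-value suggests that the locations are more clustered on the tree than expected without population structure.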
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices. The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.
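To make the idea of an asymmetric geographic transition model concrete, the following Python sketch treats two regions as the states of a continuous-time Markov chain and evaluates the standard closed-form transition probabilities for the two-state case. The function name, the region labels and the rates in the usage note are assumptions for illustration; real analyses typically involve more regions and estimate the rates jointly with the phylogeny.

```python
import math

def two_region_transition_probs(rate_ab, rate_ba, t):
    """Transition probabilities along a branch of length t for a two-state
    continuous-time Markov chain with rates rate_ab (A -> B) and
    rate_ba (B -> A). At least one rate must be positive."""
    total = rate_ab + rate_ba
    decay = math.exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)     # start in A, end in B
    p_ba = rate_ba / total * (1.0 - decay)     # start in B, end in A
    return {
        ("A", "A"): 1.0 - p_ab, ("A", "B"): p_ab,
        ("B", "A"): p_ba, ("B", "B"): 1.0 - p_ba,
    }
```

With hypothetical rates such as <code>two_region_transition_probs(0.5, 0.1, 2.0)</code>, a lineage that starts in region 'A' is more likely to end in 'B' than a lineage that starts in 'B' is to end in 'A', reflecting the asymmetry described above.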
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains, or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population. Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus provided by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious. In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
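As a rough illustration of how sequence distance might enter a forward simulation, the Python sketch below computes the Hamming distance between two strains and maps it to a level of cross-immunity. The exponential form, the <code>decay</code> parameter and the rule that a host's protection is set by its best-matching past strain are all assumptions made for this example; published models use a variety of other functional forms.

```python
import math

def hamming_distance(seq_a, seq_b):
    """Number of sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b))

def cross_immunity(seq_a, seq_b, decay=0.5):
    """Assumed form: protection falls off exponentially with distance,
    from 1 (identical strains) towards 0 (very different strains)."""
    return math.exp(-decay * hamming_distance(seq_a, seq_b))

def infection_risk(challenge, history, decay=0.5):
    """Risk of infection relative to a naive host, determined here by the
    past strain that gives the strongest protection."""
    if not history:
        return 1.0
    best = max(cross_immunity(challenge, past, decay) for past in history)
    return 1.0 - best
```

Under these assumptions, a host previously infected with an identical strain has zero risk, a naive host has risk 1, and risk rises as the challenge strain accumulates mutations relative to the strains in the host's infection history.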
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
457
2012-03-29T12:10:28Z
Tbedford
7
/* Phylogeography */ References
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time at which a virus originated in a new host population or species, and the rates at which a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of a virus.
Grenfell et al.{{Citation needed}}, who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (see [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree, external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, viruses sampled from similar hosts will be more closely related than viruses from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:Hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before the common ancestor of the sample. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeographic models | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because with <math>n = 74</math> the expected TMRCA of the sample is a fraction <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains, or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population. Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious. In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
458
2012-03-29T12:16:05Z
Tbedford
7
/* Simulation */ References
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of virus have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates that a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,{{Citation needed}} who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before the common ancestor of the sample. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] may reveal these otherwise hidden transmission patterns more directly.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
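The simulation procedure described above can be sketched in Python. This is a minimal illustration and not taken from any phylodynamics package: the function names, the nested-parenthesis output format, and the parameter values in the usage note are all assumptions made for the example.

```python
import random
from math import comb


def simulate_coalescent_intervals(n, N, rng):
    """Draw internode intervals T_n, ..., T_2 for a constant-size coalescent."""
    intervals = []
    for k in range(n, 1, -1):
        rate = comb(k, 2) / N  # lambda_k = C(k, 2) / N
        intervals.append(rng.expovariate(rate))
    return intervals


def simulate_topology(n, rng):
    """Join two lineages chosen uniformly at random until one remains."""
    lineages = [str(i) for i in range(n)]
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        a, b = lineages[i], lineages[j]
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(f"({a},{b})")
    return lineages[0]


def mean_tmrca(n, N, replicates, seed=1):
    """Average simulated TMRCA (sum of the intervals) over many replicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        total += sum(simulate_coalescent_intervals(n, N, rng))
    return total / replicates
```

With hypothetical values such as <code>mean_tmrca(10, 100, 20000)</code>, the average should fall close to <math>2N(1 - 1/n) = 180</math>, up to Monte Carlo error.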
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because with <math>n = 74</math> the expected TMRCA of the sample is a fraction <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
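The time rescaling described above gives a direct way to simulate coalescent intervals under exponential growth. The sketch below assumes <math> N(t) = N_0 e^{-rt}</math>, for which the rescaled time is <math>\tau(t) = (e^{rt} - 1)/(r N_0)</math>; the function names and any parameter values are illustrative assumptions.

```python
import math
import random


def rescaled_to_real_time(tau, N0, r):
    """Invert tau(t) = (exp(r*t) - 1) / (r*N0), the integral of 1/N(s) from 0 to t."""
    return math.log1p(r * N0 * tau) / r


def simulate_growth_intervals(n, N0, r, rng):
    """Internode intervals T_n, ..., T_2 in real time, measured backwards from sampling."""
    tau = 0.0      # cumulative rescaled time
    t_prev = 0.0   # real time of the previous coalescent event
    intervals = []
    for k in range(n, 1, -1):
        # In rescaled time the process is a coalescent with unit population size.
        tau += rng.expovariate(math.comb(k, 2))
        t = rescaled_to_real_time(tau, N0, r)
        intervals.append(t - t_prev)
        t_prev = t
    return intervals
```

Because the mapping from rescaled to real time is logarithmic, events deep in the tree are compressed together, which is the star-like pattern described in the text.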
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
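Rearranging this relation gives <math> R_0 = 1 + rD </math>. A small Python sketch of the conversion follows; the function names are hypothetical and any numbers passed in are purely illustrative, not estimates for a real pathogen.

```python
def reproduction_number(r, D):
    """Solve r = (R0 - 1) / D for R0, given growth rate r and duration D."""
    return 1.0 + r * D


def growth_rate(R0, D):
    """Exponential growth rate implied by R0 and duration D."""
    return (R0 - 1.0) / D
```

Note that <math>r</math> and <math>D</math> must be expressed in the same time units (for example, per day and days).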
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
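The label-scrambling test can be sketched as follows. This example counts transitions with Fitch parsimony on a binary tree written as nested tuples; that representation, the function names, and the add-one p-value convention are assumptions made for illustration, and the cited study may have used a different procedure.

```python
import random


def fitch_transitions(tree, labels):
    """Return (state set, parsimony transition count) for a nested-tuple binary tree."""
    if isinstance(tree, str):  # a tip, identified by its name
        return {labels[tree]}, 0
    left_set, left_n = fitch_transitions(tree[0], labels)
    right_set, right_n = fitch_transitions(tree[1], labels)
    shared = left_set & right_set
    if shared:
        return shared, left_n + right_n
    # No shared state: at least one transition is needed at this node.
    return left_set | right_set, left_n + right_n + 1


def tips(tree):
    """List tip names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return tips(tree[0]) + tips(tree[1])


def permutation_p_value(tree, labels, replicates=1000, seed=0):
    """Fraction of scrambled labelings with as few transitions as observed."""
    rng = random.Random(seed)
    observed = fitch_transitions(tree, labels)[1]
    names = tips(tree)
    states = [labels[name] for name in names]
    as_few = 0
    for _ in range(replicates):
        shuffled = states[:]
        rng.shuffle(shuffled)
        count = fitch_transitions(tree, dict(zip(names, shuffled)))[1]
        if count <= observed:
            as_few += 1
    return observed, (as_few + 1) / (replicates + 1)
```

A small p-value indicates that the observed labels are more clustered on the tree than expected if location were unrelated to ancestry.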
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
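Treating location as a character state means that movement along a branch is governed by a rate matrix <math>Q</math>, with transition probabilities <math>P(t) = e^{Qt}</math> over a branch of length <math>t</math>. The pure-Python sketch below approximates the matrix exponential by repeated squaring; the two-region rates in the usage note are invented for illustration, and the helper names are hypothetical.

```python
def rate_matrix(rates):
    """Copy off-diagonal rates and set each diagonal so that rows sum to zero."""
    n = len(rates)
    Q = [row[:] for row in rates]
    for i in range(n):
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    return Q


def mat_mul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probabilities(Q, t, squarings=20):
    """Approximate P(t) = exp(Q*t) as (I + Q*t/2**s) raised to the power 2**s."""
    n = len(Q)
    step = t / (2 ** squarings)
    P = [[(1.0 if i == j else 0.0) + Q[i][j] * step for j in range(n)]
         for i in range(n)]
    for _ in range(squarings):
        P = mat_mul(P, P)
    return P
```

For example, <code>rate_matrix([[0.0, 0.3], [0.1, 0.0]])</code> encodes an asymmetric connection in which lineages move from region 'A' to 'B' three times as fast as from 'B' to 'A'.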
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious. In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
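One way to make the link between sequence distance and cross-immunity concrete is sketched below. The exponential decay of protection with Hamming distance, the <code>scale</code> parameter, and the rule that the most protective past infection determines risk are all assumptions chosen for illustration; published models use a variety of functional forms.

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cross_immunity(strain, previous, scale=2.0):
    """Hypothetical form: protection decays exponentially with sequence distance."""
    return math.exp(-hamming_distance(strain, previous) / scale)


def infection_risk(challenge, history, scale=2.0):
    """Risk relative to a naive host, set by the most protective past infection."""
    if not history:
        return 1.0
    return 1.0 - max(cross_immunity(challenge, past, scale) for past in history)
```

In an individual-based simulation, a function of this kind would be evaluated at each contact between an infected host and a host with a stored infection history.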
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
459
2012-03-29T12:18:52Z
Tbedford
7
/* Simulation */ References
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of virus have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates that a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,{{Citation needed}} who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the date of the [[wp: most recent common ancestor | most recent common ancestor]] (MRCA) of a set of viral sequences. The age of the MRCA of these isolates is a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before the MRCA of the sample. In April 2009, genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance. For example, assessing the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>. Additionally, differential transmission between groups, whether defined by geography, age, or risk, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] can reveal these otherwise hidden transmission patterns more directly.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems in phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time before sample collection at which the first two of <math>n</math> sampled gene copies find a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | exponentially distributed]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when two gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), <math> n-1 </math> extant lineages remain.
The remaining lineages then coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
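The simulation procedure just described can be sketched in a few lines of Python. This is a minimal illustration under the constant-size neutral model only; the function name <code>simulate_coalescent</code> and the nested-tuple tree representation are assumptions made for this example, not part of any published package.

```python
import random
from math import comb


def simulate_coalescent(n, N, rng=None):
    """Simulate a neutral, constant-size coalescent genealogy.

    n   -- number of sampled gene copies (tips labelled 0..n-1)
    N   -- constant population size
    rng -- optional random.Random instance (hypothetical convenience argument)

    Returns (intervals, tree). intervals[i] is the waiting time while
    n - i lineages remain; tree is a nested tuple over the tip labels.
    """
    rng = rng or random.Random()
    lineages = list(range(n))
    intervals = []
    k = n
    while k > 1:
        rate = comb(k, 2) / N                    # lambda_k = C(k, 2) / N
        intervals.append(rng.expovariate(rate))  # exponential internode interval
        i, j = rng.sample(range(k), 2)           # two lineages, uniformly at random
        merged = (lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
        k -= 1
    return intervals, lineages[0]


if __name__ == "__main__":
    intervals, tree = simulate_coalescent(5, 100.0, random.Random(1))
    print("TMRCA:", sum(intervals))
    print("topology:", tree)
```

The sum of the returned intervals is one draw of the TMRCA; averaging many draws should approach the expectation derived in the text.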
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because a sample of this size is expected to reach <math>1 - 1/74 \approx 99\% </math> of the theoretical upper bound on the TMRCA.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
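Rearranging the relation above gives <math>R_0 = 1 + rD</math>. A small Python sketch of this arithmetic follows; the function name and the numerical inputs are hypothetical and chosen only to illustrate units (<math>r</math> per day, <math>D</math> in days), not taken from any study.

```python
def r0_from_growth_rate(r, D):
    """Solve r = (R0 - 1) / D for R0.

    r -- exponential growth rate estimated from a genealogy (per unit time)
    D -- duration of infection or serial interval (same time unit as 1/r)
    """
    if D <= 0:
        raise ValueError("D must be positive")
    return 1.0 + r * D


if __name__ == "__main__":
    # Hypothetical inputs: growth of 0.1 per day and a 3-day serial interval.
    print(r0_from_growth_rate(0.1, 3.0))  # roughly 1.3
```

The estimate is only as good as the linear relation itself, which is a simplification; other assumptions about the generation-time distribution lead to different formulas.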
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple unstructured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
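The label-scrambling test can be sketched as follows. This is an illustrative sketch, not the cited authors' implementation: it assumes a strictly binary tree stored as nested tuples of tip names, counts transitions with Fitch parsimony, and uses hypothetical function names throughout.

```python
import random


def fitch_transitions(tree, labels):
    """Minimum number of location changes on a binary tree (Fitch parsimony).

    tree   -- nested 2-tuples of tip names, e.g. (("a", "b"), "c")
    labels -- dict mapping tip name -> geographic location
    """
    def visit(node):
        if not isinstance(node, tuple):
            return {labels[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                       # children agree: no extra transition
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def permutation_p_value(tree, labels, n_perm=1000, rng=None):
    """One-sided p-value: are observed transitions fewer than under scrambling?"""
    rng = rng or random.Random()
    observed = fitch_transitions(tree, labels)
    tips = list(labels)
    values = [labels[t] for t in tips]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)              # scramble locations across tips
        if fitch_transitions(tree, dict(zip(tips, values))) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
    labels = dict.fromkeys("abcd", "X") | dict.fromkeys("efgh", "Y")
    print(permutation_p_value(tree, labels, rng=random.Random(0)))
```

A small p-value indicates fewer transitions than expected if locations were assigned to tips at random, which is the signature of population structure described above.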
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected by and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus conferred by prior infection with a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
460
2012-03-29T12:29:11Z
Tbedford
7
/* Sources of phylodynamic variation */ Reference
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of virus have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates that a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of spatial population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the date of the [[wp: most recent common ancestor | most recent common ancestor]] (MRCA) of a set of viral sequences. The age of the MRCA of these isolates is a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before the MRCA of the sample. In April 2009, genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance. For example, assessing the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>. Additionally, differential transmission between groups, whether defined by geography, age, or risk, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] can reveal these otherwise hidden transmission patterns more directly.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems in phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time before sample collection at which the first two of <math>n</math> sampled gene copies find a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | exponentially distributed]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when two gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), <math> n-1 </math> extant lineages remain.
The remaining lineages then coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
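The last step of the derivation uses the identity <math>1/\lambda_k = 2N/(k(k-1)) = 2N\left(\tfrac{1}{k-1} - \tfrac{1}{k}\right)</math>, so the sum telescopes. The following Python sketch checks the closed form against the term-by-term sum with exact rational arithmetic; the function names are hypothetical and introduced only for this check.

```python
from fractions import Fraction
from math import comb


def expected_tmrca_sum(n, N):
    """Sum of 1/lambda_k for k = 2..n, with lambda_k = C(k, 2) / N."""
    return sum(Fraction(N) / comb(k, 2) for k in range(2, n + 1))


def expected_tmrca_closed_form(n, N):
    """Closed form 2N(1 - 1/n) from the derivation above."""
    return 2 * Fraction(N) * (1 - Fraction(1, n))


if __name__ == "__main__":
    for n in (2, 5, 74):
        s = expected_tmrca_sum(n, 1000)
        c = expected_tmrca_closed_form(n, 1000)
        print(n, float(s), s == c)
```

Because the comparison is exact, agreement for a range of <math>n</math> is a useful sanity check on the algebra, although it is not a proof.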
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because a sample of this size is expected to reach <math>1 - 1/74 \approx 99\% </math> of the theoretical upper bound on the TMRCA.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
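For exponential growth the time rescaling has a closed form: <math>\tau(t) = \int_0^t \frac{\mathrm{d}s}{N_0 e^{-rs}} = \frac{e^{rt} - 1}{r N_0}</math>, with inverse <math>t = \ln(1 + r N_0 \tau)/r</math>. The sketch below uses this to draw coalescence times under exponential growth. It assumes <math>r > 0</math>, and all function names are hypothetical choices for this illustration.

```python
import math
import random


def to_rescaled_time(t, N0, r):
    """tau(t) = (exp(r t) - 1) / (r N0) for N(t) = N0 * exp(-r t)."""
    return math.expm1(r * t) / (r * N0)


def to_real_time(tau, N0, r):
    """Inverse of to_rescaled_time: t = ln(1 + r N0 tau) / r."""
    return math.log1p(r * N0 * tau) / r


def coalescent_times_exponential(n, N0, r, rng=None):
    """Coalescence times, measured backwards from sampling, for n lineages."""
    rng = rng or random.Random()
    tau = 0.0
    times = []
    for k in range(n, 1, -1):
        # In rescaled time the process is a coalescent with rate C(k, 2).
        tau += rng.expovariate(math.comb(k, 2))
        times.append(to_real_time(tau, N0, r))
    return times


if __name__ == "__main__":
    print(coalescent_times_exponential(10, 1000.0, 0.5, random.Random(3)))
```

With a large <math>N_0</math> the first coalescence times tend to be pushed far back from the sampling time, which corresponds to the long external branches described above.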
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al. {{Citation needed}} generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple unstructured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
461
2012-03-29T12:32:01Z
Tbedford
7
/* Coalescent theory and phylodynamics */ Reference
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of virus have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates that a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.<ref name=Grenfell04>{{cite pmid| 14726583}}</ref>, who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at least as early. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the waiting time, measured backwards from sample collection, until the first two of a sample of <math>n</math> gene copies share a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before sample collection when the first two gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), <math> n-1 </math> extant lineages remain.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
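The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name <code>simulate_coalescent</code>, the integer lineage labels, and the returned data layout are all assumptions made for this example.

```python
import random


def simulate_coalescent(n, N, rng=None):
    """Simulate one neutral genealogy for n gene copies, constant size N.

    Returns (intervals, merges). intervals[k] is the waiting time T_k spent
    with k extant lineages; merges lists (child_a, child_b, parent) labels
    in order from the most recent coalescence to the MRCA.
    """
    rng = rng or random.Random()
    lineages = list(range(n))          # labels 0..n-1 are the sampled copies
    next_label = n                     # labels >= n are ancestral nodes
    intervals = {}
    merges = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = (k * (k - 1) / 2) / N   # lambda_k = C(k, 2) / N
        intervals[k] = rng.expovariate(rate)
        a, b = rng.sample(lineages, 2) # pick two lineages uniformly at random
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        merges.append((a, b, next_label))
        next_label += 1
    return intervals, merges


# Example: one genealogy for 5 gene copies in a population of size 100.
intervals, merges = simulate_coalescent(5, 100.0, random.Random(1))
tmrca = sum(intervals.values())
```

Summing the simulated intervals gives one draw of the TMRCA; averaging over many replicates gives a Monte Carlo estimate of its expected value.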
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because with 74 sequences the expected TMRCA reaches <math>1 - 1/74 \approx 99\% </math> of its upper bound.
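The closed form for the expected TMRCA can be checked numerically by summing the reciprocals of the coalescent rates directly. The following Python sketch does this; the function name <code>expected_tmrca</code> and the population size used in the loop are illustrative assumptions, not values from any study.

```python
def expected_tmrca(n, N):
    """Sum of 1/lambda_k for k = 2..n, where lambda_k = C(k, 2) / N."""
    return sum(N / (k * (k - 1) / 2) for k in range(2, n + 1))


N = 1000.0  # hypothetical population size, chosen only for illustration
for n in (2, 5, 10, 74, 1000):
    direct = expected_tmrca(n, N)
    closed_form = 2 * N * (1 - 1 / n)
    # Fraction of the upper bound 2N that a sample of size n is expected to reach.
    print(n, direct, closed_form, direct / (2 * N))
```

The last column shows how quickly the expected TMRCA approaches its upper bound as the sample size increases.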
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
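The time rescaling described above gives a simple way to simulate coalescent times under exponential growth. Substituting <math>N(t) = N_0 e^{-rt}</math> into the rescaling integral gives <math>\tau = (e^{rt} - 1)/(r N_0)</math>, which inverts to <math>t = \ln(1 + r N_0 \tau)/r</math>. The Python sketch below draws intervals on the rescaled time scale and maps them back; the function name and return format are assumptions made for this illustration.

```python
import math
import random


def coalescent_times_exp_growth(n, N0, r, rng=None):
    """Coalescence times (before sampling) under N(t) = N0 * exp(-r * t).

    Returns n - 1 times, from the most recent coalescence to the MRCA.
    Requires r > 0.
    """
    rng = rng or random.Random()
    tau = 0.0                                       # rescaled time
    times = []
    for k in range(n, 1, -1):
        tau += rng.expovariate(k * (k - 1) / 2)     # rate C(k, 2) in rescaled time
        times.append(math.log1p(r * N0 * tau) / r)  # invert the rescaling
    return times


# Example with hypothetical parameter values.
times = coalescent_times_exp_growth(10, 1000.0, 0.5, random.Random(1))
```

With a large growth rate, the returned times tend to cluster near the root, so that the external branches are long relative to the internal ones.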
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
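Rearranging the relation above gives <math>R_0 = 1 + rD</math>. A small Python sketch of the conversion in both directions follows; the function names and the numerical values are hypothetical and are not taken from Fraser et al. or any other study.

```python
def reproduction_number(r, D):
    """Rearrange r = (R0 - 1) / D to give R0 = 1 + r * D."""
    return 1.0 + r * D


def growth_rate(R0, D):
    """Exponential growth rate implied by R0 and duration D."""
    return (R0 - 1.0) / D


# Hypothetical inputs: growth rate per day and duration in days.
r_per_day = 0.1
duration_days = 5.0
R0 = reproduction_number(r_per_day, duration_days)
```

The growth rate and the duration must be expressed in the same time units for the result to be meaningful.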
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
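The label-scrambling test can be sketched as follows, using Fitch parsimony to count transitions. Several choices here are assumptions made for the example rather than part of the cited method: the tree is a rooted binary tree written as nested Python tuples, the helper names are invented, and the p-value uses the common "plus one" correction.

```python
import random


def fitch_transitions(tree, labels):
    """Minimum number of label changes on a rooted binary tree (Fitch parsimony).

    tree is a tip name (str) or a 2-tuple of subtrees; labels maps tip -> location.
    """
    def visit(node):
        if isinstance(node, str):
            return {labels[node]}, 0
        (left_set, left_count), (right_set, right_count) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:                       # children agree: no change needed here
            return common, left_count + right_count
        return left_set | right_set, left_count + right_count + 1
    return visit(tree)[1]


def tip_names(tree):
    """List the tips of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])


def permutation_p_value(tree, labels, n_perm=1000, rng=None):
    """One-sided p-value: how often scrambled labels give as few transitions."""
    rng = rng or random.Random()
    observed = fitch_transitions(tree, labels)
    tips = tip_names(tree)
    values = [labels[t] for t in tips]
    as_few = 0
    for _ in range(n_perm):
        rng.shuffle(values)              # scramble locations across the tips
        if fitch_transitions(tree, dict(zip(tips, values))) <= observed:
            as_few += 1
    return observed, (as_few + 1) / (n_perm + 1)


# Toy example: two clades, each sampled from a single hypothetical region.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
labels = {t: "X" for t in "abcd"}
labels.update({t: "Y" for t in "efgh"})
observed, p = permutation_p_value(tree, labels, n_perm=1000, rng=random.Random(0))
```

A small p-value indicates fewer geographic transitions than expected if locations were assigned to tips at random, which is the signature of population structure described above.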
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
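As an illustration of how sequence distance might feed into cross-immunity in such a simulation, the Python sketch below computes the Hamming distance between two strains and converts it to a level of protection. The exponential decay with distance, the <code>decay</code> parameter, and the rule of taking the strongest protection over a host's infection history are assumptions chosen for simplicity; published models use a variety of other forms.

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def cross_immunity(a, b, decay=0.5):
    """Hypothetical protection between strains: exp(-decay * distance)."""
    return math.exp(-decay * hamming_distance(a, b))


def infection_risk(challenge, history, decay=0.5):
    """Hypothetical relative risk of infection given previously seen strains.

    Assumes the most protective past infection determines susceptibility.
    """
    if not history:
        return 1.0                      # fully susceptible host
    protection = max(cross_immunity(challenge, s, decay) for s in history)
    return 1.0 - protection


# A host previously infected with 'AAAA' is challenged by a mutant strain.
risk = infection_risk("AATA", ["AAAA"])
```

In a forward simulation, a quantity like <code>risk</code> would scale the probability that a contact between an infected host and a previously infected host results in transmission.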
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
466
2012-04-02T15:19:47Z
Tbedford
7
References
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of virus have been used to investigate many epidemiological processes, including the effects of selection by the [[wp:immune system | immune system]], growth rates in [[wp:Prevalence | epidemic prevalence]], the time that a virus originated in a new host population or species, and the rates that a virus is transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.<ref name=Grenfell04>{{cite pmid| 14726583}}</ref>, who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at least as early. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref>
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> prior to sample collection, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
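The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under the stated assumptions (neutral, constant size <math>N</math>, no population structure), not a reference implementation; the function names and the nested-tuple tree encoding are hypothetical choices made for this example.

```python
import random
from math import comb


def simulate_coalescent_intervals(n, N, rng):
    """Draw the internode intervals T_n, ..., T_2 for n sampled gene copies.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate lambda_k = C(k, 2) / N.
    """
    return {k: rng.expovariate(comb(k, 2) / N) for k in range(n, 1, -1)}


def simulate_topology(n, rng):
    """Join two extant lineages chosen uniformly at random at each event.

    Returns a nested tuple; the integer leaves 0..n-1 are the sampled copies.
    """
    lineages = list(range(n))
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        merged = (lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0]


def leaf_labels(tree):
    """Collect the leaf labels of a nested-tuple tree."""
    if isinstance(tree, int):
        return [tree]
    return leaf_labels(tree[0]) + leaf_labels(tree[1])


rng = random.Random(1)
n, N = 10, 1000.0
# The TMRCA of one simulated genealogy is the sum of its internode intervals.
tmrca_draws = [
    sum(simulate_coalescent_intervals(n, N, rng).values()) for _ in range(20000)
]
mean_tmrca = sum(tmrca_draws) / len(tmrca_draws)
print(mean_tmrca, 2 * N * (1 - 1 / n))  # simulated mean vs. 2N(1 - 1/n)
```

Averaged over many replicates, the simulated TMRCA should lie close to <math>2N(1 - 1/n)</math>.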
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The TMRCA of a sample is not unbounded in the sample size. <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because with <math>n = 74</math> the expected TMRCA is <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
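As a worked illustration, the relation above can be rearranged to <math> R_0 = 1 + rD </math>. The short Python sketch below applies it; the numerical values are made up for the example and are not estimates for any real pathogen, and the function names are hypothetical.

```python
def reproduction_number(r, D):
    """Solve r = (R0 - 1) / D for R0, given growth rate r and duration D."""
    return 1.0 + r * D


def growth_rate(R0, D):
    """The relation as stated in the text: r = (R0 - 1) / D."""
    return (R0 - 1.0) / D


r_example = 0.1  # illustrative exponential growth rate per day (assumed value)
D_example = 3.0  # illustrative duration of infection in days (assumed value)
R0_example = reproduction_number(r_example, D_example)
print(R0_example)
```

Note that <math>r</math> and <math>D</math> must be expressed in the same time units for the product <math>rD</math> to be dimensionless.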
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus conferred by prior infection with a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
467
2012-04-02T15:30:28Z
Tbedford
7
Wording change
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like{{Citation needed}} the tree is (see {{See Figure|1}}).
* How rapidly a population of virus expanded in the past may be reflected by how star-like{{Citation needed}} the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are{{Citation needed}} (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies showing the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies showing the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies showing the effects of population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor to the entire virus population, since the ancestor of the whole population can be no younger than the ancestor of a sample drawn from it. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref>
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> prior to sample collection, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
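The last equality can be checked with a telescoping sum. The step below is a worked sketch that uses only the definition <math>\lambda_k = {k \choose 2} / N</math> given earlier, so that <math> 1/\lambda_k = 2N / (k(k-1)) </math>:
: <math> \begin{align}
\sum_{k=2}^{n} \frac{1}{\lambda_k} & = \sum_{k=2}^{n} \frac{2N}{k(k-1)} \\
&= 2N \sum_{k=2}^{n} \left( \frac{1}{k-1} - \frac{1}{k} \right) \\
&= 2N \left( 1 - \frac{1}{n} \right).
\end{align}</math>
For instance, with <math>n = 2</math> the sum has the single term <math> 1/\lambda_2 = N </math>, which agrees with <math> 2N(1 - 1/2) = N </math>.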
Two corollaries are:
* The TMRCA of a sample is not unbounded in the sample size. <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because with <math>n = 74</math> the expected TMRCA is <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
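The effect of exponential growth on branch lengths can be illustrated with the time rescaling mentioned above. For <math> N(t) = N_0 e^{-rt} </math> the rescaled time is <math> \tau(t) = (e^{rt} - 1)/(r N_0) </math>, and in rescaled time <math>k</math> lineages coalesce at rate <math>{k \choose 2}</math>. The Python sketch below draws coalescent times this way and compares them with the constant-size case; the function names and parameter values are assumptions chosen for illustration only.

```python
import math
import random


def rescaled_to_real_time(tau, N0, r):
    """Invert tau(t) = (exp(r * t) - 1) / (r * N0), the integral of 1 / N(s)."""
    return math.log1p(r * N0 * tau) / r


def coalescent_times_exponential(n, N0, r, rng):
    """Times before sampling of the n - 1 coalescent events, N(t) = N0 * exp(-r * t)."""
    tau, times = 0.0, []
    for k in range(n, 1, -1):
        tau += rng.expovariate(math.comb(k, 2))  # rate C(k, 2) in rescaled time
        times.append(rescaled_to_real_time(tau, N0, r))
    return times


def coalescent_times_constant(n, N, rng):
    """Same quantity for a population of constant size N."""
    t, times = 0.0, []
    for k in range(n, 1, -1):
        t += rng.expovariate(math.comb(k, 2) / N)
        times.append(t)
    return times


def mean_first_to_last_ratio(draw, reps=2000):
    """Average of (time of first coalescence) / (TMRCA) over replicates."""
    total = 0.0
    for _ in range(reps):
        times = draw()
        total += times[0] / times[-1]
    return total / reps


rng = random.Random(7)
growing = mean_first_to_last_ratio(
    lambda: coalescent_times_exponential(20, 10000.0, 1.0, rng)
)
constant = mean_first_to_last_ratio(
    lambda: coalescent_times_constant(20, 10000.0, rng)
)
print(growing, constant)
```

Under growth, the first coalescent event occurs much deeper in the tree relative to its total height, which corresponds to the long external branches of a star-like genealogy.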
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
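The label-scrambling test can be sketched as follows, using the parsimony option and counting transitions with Fitch's algorithm. This is an illustrative sketch rather than the procedure of any cited study: the nested-tuple tree, the tip names, the two locations 'A' and 'B', and the function names are all hypothetical.

```python
import random


def fitch(tree, labels):
    """Return (possible states at the subtree root, parsimony transition count)."""
    if isinstance(tree, str):  # a tip: its state is the observed location
        return {labels[tree]}, 0
    left_states, left_count = fitch(tree[0], labels)
    right_states, right_count = fitch(tree[1], labels)
    shared = left_states & right_states
    if shared:
        return shared, left_count + right_count
    # No shared state: one geographic transition is needed at this node.
    return left_states | right_states, left_count + right_count + 1


def permutation_p_value(tree, labels, n_perm, rng):
    """One-sided test: are there fewer transitions than under scrambled labels?"""
    tips = sorted(labels)
    states = [labels[tip] for tip in tips]
    observed = fitch(tree, labels)[1]
    as_extreme = 0
    for _ in range(n_perm):
        shuffled = states[:]
        rng.shuffle(shuffled)
        if fitch(tree, dict(zip(tips, shuffled)))[1] <= observed:
            as_extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical tree in which each location forms its own clade.
tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
labels = {tip: tip[0].upper() for tip in ["a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4"]}
observed, p_value = permutation_p_value(tree, labels, 999, random.Random(42))
print(observed, p_value)
```

In this toy tree a single transition explains the tip locations, and few scrambled labelings do as well, so the resulting p-value is small.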
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus conferred by prior infection with a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
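One way to relate cross-immunity to sequence distance is sketched below in Python. The exponential decay of protection with Hamming distance, the <code>decay</code> parameter, and the rule that a host's risk is set by its best-matching prior strain are all assumptions made for this illustration; published models use a variety of functional forms.

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cross_immunity(strain_a, strain_b, decay=0.5):
    """Assumed form: protection falls off as exp(-decay * Hamming distance)."""
    return math.exp(-decay * hamming_distance(strain_a, strain_b))


def infection_risk(challenge, history, decay=0.5):
    """Assumed rule: risk is 1 minus the strongest protection in the host's history."""
    if not history:
        return 1.0  # an immunologically naive host is fully susceptible
    return 1.0 - max(cross_immunity(challenge, past, decay) for past in history)


naive_risk = infection_risk("ACGT", [])
same_strain_risk = infection_risk("ACGT", ["ACGT"])
near_risk = infection_risk("ACGT", ["ACGA"])  # one mismatch
far_risk = infection_risk("ACGT", ["TGCA"])   # four mismatches
print(naive_risk, same_strain_risk, near_risk, far_risk)
```

In a forward simulation, a function of this kind would be evaluated at each potential transmission event to decide whether a previously infected host becomes infected again.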
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
468
2012-04-02T15:33:15Z
Tbedford
7
References
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before that time. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math> R_0 </math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] can reveal these otherwise hidden transmission patterns more directly.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref>
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems in phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> prior to sample collection when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of 74 sequences is a fraction <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
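As a minimal sketch of the last point, the following Python fragment computes the sequence (Hamming) distance between two strains and maps it to a cross-immunity value. The exponential form and the <code>decay</code> parameter are assumptions chosen for illustration only; the models cited above use their own functional forms.

```python
import math


def hamming_distance(a, b):
    """Number of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cross_immunity(a, b, decay=0.5):
    """Hypothetical cross-immunity in (0, 1] that falls off with distance.

    The exponential decay and the default decay value are illustrative
    assumptions, not taken from any particular published model.
    """
    return math.exp(-decay * hamming_distance(a, b))
```

Under this assumed form, identical strains confer full protection against each other, and each additional mutation reduces protection by a constant factor.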
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
469
2012-04-02T15:51:23Z
Tbedford
7
More references
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) illustrates a viral population of roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before that time. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV.<ref name=Volz09/> Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] can reveal these otherwise hidden transmission patterns more directly.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems in phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> prior to sample collection when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
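The two steps just described can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the function names, the nested-parenthesis representation of the topology, and the tip labels <code>0..n-1</code> are assumptions introduced here. Times are measured in generations, as in the rate <math>\lambda_n</math> above.

```python
import random
from math import comb


def simulate_coalescent_intervals(n, N, rng=None):
    """Draw internode intervals T_n, ..., T_2 for a sample of n gene copies.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate lambda_k = C(k, 2) / N.
    """
    rng = rng or random.Random()
    intervals = []
    for k in range(n, 1, -1):
        rate = comb(k, 2) / N
        intervals.append(rng.expovariate(rate))
    return intervals


def simulate_topology(n, rng=None):
    """Join two lineages chosen uniformly at random until one remains.

    Returns the topology as a nested, Newick-like string without branch
    lengths; tips are labelled 0..n-1 (an illustrative choice).
    """
    rng = rng or random.Random()
    lineages = [str(i) for i in range(n)]
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        a, b = lineages[i], lineages[j]
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(f"({a},{b})")
    return lineages[0]
```

Summing the intervals returned by <code>simulate_coalescent_intervals</code> gives one simulated TMRCA for the sample; averaging over many replicates gives a Monte Carlo estimate of its expectation.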
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
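The closed form in the last line can be checked with a partial-fraction step that uses only the rate <math>\lambda_k = {k \choose 2}/N</math> defined above (the summation index <math>k</math> is introduced here for illustration):
: <math> \begin{align}
\frac{1}{\lambda_k} &= \frac{2N}{k(k-1)} = 2N\left(\frac{1}{k-1} - \frac{1}{k}\right), \\
\sum_{k=2}^{n} \frac{1}{\lambda_k} &= 2N\left[\left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n-1} - \frac{1}{n}\right)\right] = 2N\left(1 - \frac{1}{n}\right).
\end{align}</math>
The sum telescopes, so only the first and last terms survive. For a sample of two gene copies the expected TMRCA is <math>N</math>, already half of the limiting value <math>2N</math>.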
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of 74 sequences is a fraction <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
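The time rescaling described above gives one way to draw coalescent times under exponential growth. For <math>N(t) = N_0 e^{-rt}</math> the rescaled time is <math>\tau(t) = (e^{rt} - 1)/(r N_0)</math>, which inverts to <math>t = \ln(1 + r N_0 \tau)/r</math>. The Python sketch below assumes <math>r > 0</math> and that all gene copies are sampled at <math>t = 0</math>; the function name is hypothetical.

```python
import math
import random


def exp_growth_coalescent_times(n, N0, r, rng=None):
    """Coalescent times (measured backwards from sampling) under N(t) = N0*exp(-r*t).

    In rescaled time, k lineages coalesce at rate C(k, 2), exactly as in a
    constant-size population of size 1. Each cumulative rescaled time tau is
    mapped back to real time with t = log(1 + r*N0*tau) / r. Assumes r > 0.
    """
    rng = rng or random.Random()
    tau = 0.0
    times = []
    for k in range(n, 1, -1):
        tau += rng.expovariate(math.comb(k, 2))
        times.append(math.log1p(r * N0 * tau) / r)
    return times
```

The last element of the returned list is the simulated TMRCA. As <math>r</math> approaches zero the mapping reduces to <math>t = N_0 \tau</math>, recovering the constant-size case.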
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
475
2012-04-05T18:38:11Z
Tbedford
7
Stem of Antiviral Resistance section
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, because the common ancestor of the whole population must have existed at or before the common ancestor of the sample. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV.<ref name=Volz09/> Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Because antibiotics are ineffective against viruses, [[wp: antiviral drug | antiviral treatment]] is a common approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright–Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> prior to sample collection when 2 gene copies in the sample ''coalesce'' (find a common ancestor) (see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \cdots, \lambda_2</math> after intervals <math>T_{n-1}, \cdots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
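The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under the stated assumptions (neutrality, constant size <math>N</math>, no population structure); the function name <code>simulate_genealogy</code> and its return format are hypothetical choices made for this example, not part of any published software.

```python
import random
from math import comb

def simulate_genealogy(n, N, seed=None):
    """Simulate one neutral, constant-size coalescent genealogy.

    n: number of sampled gene copies; N: population size.
    Returns a list of (time, merged_tips) coalescent events, with time
    measured backwards from the sample in the same units as N.
    """
    rng = random.Random(seed)
    lineages = [(i,) for i in range(n)]  # each lineage = tuple of tip labels
    t = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence: exponential with rate C(k,2)/N.
        t += rng.expovariate(comb(k, 2) / N)
        # Pick two distinct lineages uniformly at random and merge them.
        a, b = rng.sample(range(k), 2)
        merged = lineages[a] + lineages[b]
        lineages = [l for i, l in enumerate(lineages) if i not in (a, b)]
        lineages.append(merged)
        events.append((t, merged))
    return events

events = simulate_genealogy(n=5, N=100.0, seed=1)
tmrca = events[-1][0]  # time of the final coalescence (the MRCA of the sample)
```

The last event always joins all <math>n</math> tips, and its time is one random draw of the TMRCA of the sample.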
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
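As a quick numerical check of the expression for the expected TMRCA and its two corollaries, the sum of the expected internode intervals can be compared with the closed form <math>2N(1 - 1/n)</math>. The function names below are hypothetical and the sketch assumes the constant-size neutral model.

```python
from math import comb, isclose

def expected_tmrca(n, N):
    """Sum of expected internode intervals, E[T_k] = 1/lambda_k = N / C(k,2)."""
    return sum(N / comb(k, 2) for k in range(2, n + 1))

def closed_form(n, N):
    """Closed-form expectation 2N(1 - 1/n)."""
    return 2 * N * (1 - 1 / n)

N = 1000.0
for n in (2, 10, 74):
    assert isclose(expected_tmrca(n, N), closed_form(n, N))

# Fraction of the upper bound 2N reached by a sample of 74 sequences.
fraction = expected_tmrca(74, N) / (2 * N)
```

For <math>n = 74</math> the fraction equals <math>1 - 1/74</math>, so the expected TMRCA of the sample is already within about 1.4% of the bound.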
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of a sample of 74 sequences is a fraction <math>1 - 1/74 \approx 99\% </math> of the theoretical upper bound <math>2N</math>.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
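The effect of exponential growth on the coalescent rate can be illustrated numerically. The sketch below evaluates the rate of coalescence for <math>N(t) = N_0 e^{-rt}</math>; the function name and the parameter values are hypothetical and chosen only for illustration.

```python
from math import comb, exp

def coalescent_rate_exponential(n, t, N0, r):
    """Coalescent rate for n lineages at time t before sampling,
    when the population size t units in the past is N0 * exp(-r * t)."""
    population_size = N0 * exp(-r * t)
    return comb(n, 2) / population_size

# Hypothetical values: 10 lineages, present-day size 1000, growth rate 0.5.
rate_recent = coalescent_rate_exponential(10, 0.0, 1000.0, 0.5)
rate_past = coalescent_rate_exponential(10, 10.0, 1000.0, 0.5)
```

The rate is lowest at <math>t=0</math> and increases exponentially into the past, which is why coalescent events concentrate near the root of the tree.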
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
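The relationship <math> r = (R_0 - 1)/D </math> given above can be rearranged to <math> R_0 = 1 + rD </math>. A minimal sketch follows; the function name is hypothetical and the numerical values are illustrative only, not estimates for any real pathogen.

```python
def r0_from_growth_rate(r, D):
    """Reproduction number implied by r = (R0 - 1) / D.

    r: exponential growth rate (per unit time)
    D: duration of infection or serial interval (same time unit)
    """
    return 1.0 + r * D

# Illustrative values only: growth rate 0.1 per day, D = 3 days.
r0 = r0_from_growth_rate(0.1, 3.0)
```

Because <math>r</math> and <math>D</math> enter only through their product, uncertainty in the assumed value of <math>D</math> carries directly into the resulting value of <math>R_0</math>.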
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
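The label-scrambling test described above can be sketched as follows. This is an illustrative implementation that assumes a rooted binary tree written as nested tuples of tip names and uses Fitch parsimony to count transitions; the tree, the labels and all function names are hypothetical, and published analyses may count transitions by other methods.

```python
import random

def fitch_transitions(tree, labels):
    """Minimum number of state changes (Fitch parsimony) on a rooted
    binary tree given as nested tuples of tip names."""
    changes = 0

    def visit(node):
        nonlocal changes
        if not isinstance(node, tuple):  # a tip
            return {labels[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        changes += 1  # no shared state: one transition is required here
        return left | right

    visit(tree)
    return changes

def tips(tree):
    """List the tip names of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return [t for child in tree for t in tips(child)]

def permutation_p_value(tree, labels, n_perm=1000, seed=0):
    """Observed transition count and the proportion of scrambled data
    sets with as few or fewer transitions (one-sided p-value)."""
    rng = random.Random(seed)
    names = tips(tree)
    observed = fitch_transitions(tree, labels)
    states = [labels[name] for name in names]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(states)  # scramble locations across the tips
        if fitch_transitions(tree, dict(zip(names, states))) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical example: two clades, each sampled from a single region.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
labels = {t: "X" for t in "abcd"} | {t: "Y" for t in "efgh"}
observed, p = permutation_p_value(tree, labels)
```

A small p-value indicates fewer geographic transitions than expected under random mixing, consistent with population structure.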
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
638
2012-04-19T04:56:09Z
Erikvolz
8
Added results to coalescent section from Volz 2009 and Frost 2010.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV<ref name=Volz09/>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] can reveal these otherwise hidden transmission patterns more directly.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Because antibiotics are ineffective against viruses, [[wp: antiviral drug | antiviral treatment]] is a common approach to the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses use many of the methods of [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems in phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time before sample collection at which the first two of <math>n</math> sampled gene copies share a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At time <math>T_n</math> before the sample was collected, two gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), leaving <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\ldots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
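The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names <code>simulate_coalescent_intervals</code> and <code>simulate_topology</code> are hypothetical, time is measured in generations, and the tree is returned as a Newick-style string without branch lengths.

```python
import random
from math import comb


def simulate_coalescent_intervals(n, N, rng=None):
    """Draw internode intervals T_n, ..., T_2 for n sampled gene copies.

    Each T_k is exponential with rate lambda_k = C(k, 2) / N, as in the
    constant-size neutral coalescent. Returns a dict mapping k -> T_k.
    """
    rng = rng or random.Random()
    intervals = {}
    for k in range(n, 1, -1):
        rate = comb(k, 2) / N
        intervals[k] = rng.expovariate(rate)
    return intervals


def simulate_topology(n, rng=None):
    """Join two lineages chosen uniformly at random until one remains."""
    rng = rng or random.Random()
    lineages = [str(i) for i in range(n)]
    while len(lineages) > 1:
        a, b = rng.sample(range(len(lineages)), 2)
        merged = "(" + lineages[a] + "," + lineages[b] + ")"
        lineages = [x for i, x in enumerate(lineages) if i not in (a, b)]
        lineages.append(merged)
    return lineages[0] + ";"
```

Summing the values returned by <code>simulate_coalescent_intervals</code> gives one realization of the TMRCA of the sample.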
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of a sample of 74 is <math>1 - 1/74 \approx 99\% </math> of its upper bound.
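The closed form for the expected TMRCA follows from the telescoping sum <math>1/\lambda_k = 2N \left( \frac{1}{k-1} - \frac{1}{k} \right)</math>. The short sketch below checks this numerically; the function names are hypothetical and <math>N</math> is set to 1 so that the result is expressed in units of <math>N</math> generations.

```python
from math import comb


def expected_tmrca(n, N):
    """Sum the expected internode intervals 1/lambda_k for k = 2, ..., n."""
    return sum(N / comb(k, 2) for k in range(2, n + 1))


def fraction_of_bound(n):
    """Expected TMRCA of n samples as a fraction of the limit 2N."""
    return 1.0 - 1.0 / n


# The explicit sum agrees with the closed form 2N(1 - 1/n).
assert abs(expected_tmrca(74, 1.0) - 2.0 * fraction_of_bound(74)) < 1e-9
```

For a sample of 74 sequences, <code>fraction_of_bound(74)</code> is roughly 0.986, which is the figure rounded to 99% in the text.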
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and the effective population size of the virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well-studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a form similar to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/(2\beta)</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
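The SIR coalescent rate can be evaluated numerically alongside the epidemic trajectory. The sketch below is illustrative only: it uses a simple forward Euler integrator, holds the number of lineages <math>n</math> fixed for the whole trajectory, and the function name and parameter values are hypothetical rather than taken from any study.

```python
from math import comb


def sir_coalescent_rates(beta, gamma, S0, I0, n, t_end, dt=0.01):
    """Euler-integrate the SIR equations and record the coalescent rate.

    Returns a list of (t, S, I, lambda_n) tuples, where
    lambda_n = C(n, 2) * 2 * f / I**2 and f = beta * S * I is the incidence.
    """
    S, I, t = S0, I0, 0.0
    rows = []
    steps = int(round(t_end / dt))
    for _ in range(steps + 1):
        f = beta * S * I                       # incidence of new infections
        lam = comb(n, 2) * 2.0 * f / (I * I)   # rate of coalescence
        rows.append((t, S, I, lam))
        S, I = S - dt * f, I + dt * (f - gamma * I)
        t += dt
    return rows


# Hypothetical parameters: early in the epidemic S is close to 1, so the rate
# is close to C(n, 2) * 2 * beta / I(0).
rows = sir_coalescent_rates(beta=0.5, gamma=0.2, S0=0.999, I0=0.001, n=10, t_end=1.0)
```

As the fraction infected grows, the recorded rate falls, which mirrors the long external branches expected during epidemic growth.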
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
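The label-scrambling test can be sketched as follows. This is one possible illustration under stated assumptions, not the procedure of any cited study: transitions are counted with Fitch parsimony, the tree is a rooted binary tree written as nested Python tuples with tip names as strings, and the function names and the add-one correction in the p-value are choices made for this example.

```python
import random


def fitch_transitions(tree, labels):
    """Minimum number of state changes on a rooted binary tree (Fitch)."""
    def visit(node):
        if isinstance(node, str):              # a tip: its state is known
            return {labels[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                             # no change needed at this node
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]


def leaves(tree):
    """List the tip names of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def permutation_p_value(tree, labels, n_perm=999, rng=None):
    """Compare the observed transition count with a label-shuffled null."""
    rng = rng or random.Random()
    observed = fitch_transitions(tree, labels)
    names = leaves(tree)
    states = [labels[name] for name in names]
    hits = 0
    for _ in range(n_perm):
        shuffled = states[:]
        rng.shuffle(shuffled)
        null_count = fitch_transitions(tree, dict(zip(names, shuffled)))
        if null_count <= observed:             # as few transitions as observed
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value indicates that the observed tree has fewer geographic transitions than label-shuffled trees typically do, which is the signature of population structure described above.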
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
== Simulation ==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with, and recovered from, different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory-usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
==Phylodynamics of HIV==
== References ==
<references/>
785
2012-05-07T13:50:58Z
Tbedford
7
Introduction to influenza.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of Influenza | Phylodynamics of Influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV<ref name=Volz09/>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] can reveal these otherwise hidden transmission patterns more directly.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Because antibiotics are ineffective against viruses, [[wp: antiviral drug | antiviral treatment]] is a common approach to the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses use many of the methods of [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems in phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time before sample collection at which the first two of <math>n</math> sampled gene copies share a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At time <math>T_n</math> before the sample was collected, two gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), leaving <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\ldots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of a sample of 74 is <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered(SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a similar form to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/2\beta</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists, then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection caused primarily by influenza virus types [[wp: influenza A virus | A]] and [[wp: Influenzavirus B | B]], which can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a specific hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and [[wp: influenza pandemic | a pandemic may follow a host switch event]], as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to the earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus will outcompete existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
===Selective pressures===
===Circulation patterns===
==Phylodynamics of HIV==
==References==
<references/>
786
2012-05-07T16:30:32Z
Tbedford
7
Selective pressures paragraph.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies "... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics".
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of spatial structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV<ref name=Volz09/>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Without recourse to antibiotics, [[wp: antiviral drug | antiviral treatment]] is a popular approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> prior to sample collection when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
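The simulation procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch under the constant-size, neutral, unstructured assumptions stated above; the function names <code>simulate_coalescent</code> and <code>mean_tmrca</code> and the event-tuple layout are hypothetical choices made for this example, not part of any published software.

```python
import random
from math import comb

def simulate_coalescent(n, N, rng):
    """Simulate one genealogy for n sampled lineages in a population of size N.

    Returns (tmrca, events); each event is a tuple
    (time_before_sampling, child_a, child_b, parent_label).
    """
    lineages = list(range(n))   # labels of the extant lineages
    next_label = n              # label assigned to the next ancestral node
    t = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = comb(k, 2) / N               # lambda_k = C(k, 2) / N
        t += rng.expovariate(rate)          # exponential interval T_k
        a, b = rng.sample(lineages, 2)      # pair picked uniformly at random
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        events.append((t, a, b, next_label))
        next_label += 1
    return t, events

def mean_tmrca(n, N, replicates, seed=1):
    """Average the simulated TMRCA over independent replicates."""
    rng = random.Random(seed)
    total = sum(simulate_coalescent(n, N, rng)[0] for _ in range(replicates))
    return total / replicates
```

Averaged over many replicates, the simulated TMRCA can be compared with the analytical expectation <math>2N(1 - 1/n)</math>; for example, <code>mean_tmrca(10, 100.0, 20000)</code> should fall close to 180.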
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are :
* The expected TMRCA of a sample is bounded in the sample size: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because under the constant-size model the expected TMRCA of a sample of 74 is <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
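As a worked instance of the time rescaling described above, and assuming the exponential form <math>N(t) = N_0 e^{-rt}</math> holds over the whole interval (an idealization), the rescaled time, written here as <math>\tau(t)</math> for illustration, is
: <math> \tau(t) = \int_{0}^{t} \frac{\mathrm{d}s}{N_0 e^{-rs}} = \frac{e^{rt} - 1}{r N_0} </math>.
Rescaled time therefore accumulates exponentially quickly going back into the past, so coalescent events are concentrated near the root of the tree.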
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
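The relationship <math>r = (R_0 - 1)/D</math> can be inverted to give <math>R_0 = 1 + rD</math>. The short Python sketch below shows the arithmetic; the function names and the numerical values in the usage note are hypothetical illustrations and are not taken from Fraser et al. or any other study.

```python
def growth_rate(R0, D):
    """Exponential growth rate r implied by R0 and serial interval D."""
    return (R0 - 1.0) / D

def reproduction_number(r, D):
    """Invert r = (R0 - 1) / D to recover R0 = 1 + r * D."""
    return 1.0 + r * D
```

For instance, an assumed growth rate of 0.2 per day combined with an assumed serial interval of 3 days gives <code>reproduction_number(0.2, 3.0)</code>, which is 1.6 up to floating-point rounding.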
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered(SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a similar form to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/2\beta</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
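The SIR equations and the associated coalescent rate can be evaluated numerically. The following Python sketch integrates the two differential equations with simple fixed-step Euler updates (chosen for brevity; it is not the method of any cited paper) and evaluates <math>\lambda_n(t)</math> from the formula above. The function names and the parameter values in the usage note are hypothetical.

```python
from math import comb

def sir_trajectory(beta, gamma, s0, i0, t_max, dt=0.01):
    """Integrate dS/dt = -beta*S*I and dI/dt = beta*S*I - gamma*I.

    Uses fixed-step Euler updates and returns a list of (t, S, I) tuples.
    """
    s, i = s0, i0
    out = [(0.0, s, i)]
    steps = int(round(t_max / dt))
    for k in range(1, steps + 1):
        f = beta * s * i                    # incidence f(t) = beta * S * I
        s, i = s - f * dt, i + (f - gamma * i) * dt
        out.append((k * dt, s, i))
    return out

def coalescent_rate(n, beta, s, i):
    """lambda_n = C(n, 2) * 2 * f / I^2, with f = beta * S * I."""
    f = beta * s * i
    return comb(n, 2) * 2.0 * f / i ** 2
```

With assumed values <code>beta = 2.0</code>, <code>S = 0.999</code> and <code>I = 0.001</code>, the pairwise rate <code>coalescent_rate(2, 2.0, 0.999, 0.001)</code> is approximately 3996, close to the early-epidemic approximation <math>2\beta / I(0) = 4000</math>.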
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists, then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
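The label-scrambling test can be sketched as follows. This Python example counts transitions with Fitch parsimony on a rooted binary tree, which is one of the counting options mentioned above and not necessarily the procedure used in the cited study. The nested-tuple tree encoding, the function names, and the toy tree in the usage note are all assumptions made for illustration.

```python
import random

def fitch_changes(tree, labels):
    """Minimum number of location changes on a rooted binary tree (Fitch).

    tree is a tip name (str) or a (left, right) tuple of subtrees;
    labels maps each tip name to its location.
    """
    changes = 0
    def state_set(node):
        nonlocal changes
        if isinstance(node, str):
            return {labels[node]}
        left, right = state_set(node[0]), state_set(node[1])
        common = left & right
        if common:
            return common
        changes += 1                # disjoint child sets imply one transition
        return left | right
    state_set(tree)
    return changes

def tips(tree):
    """List the tip names of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return tips(tree[0]) + tips(tree[1])

def permutation_p_value(tree, labels, replicates=1000, seed=1):
    """One-sided p-value: how often scrambled labels give as few transitions."""
    rng = random.Random(seed)
    names = tips(tree)
    observed = fitch_changes(tree, labels)
    values = [labels[name] for name in names]
    hits = 0
    for _ in range(replicates):
        rng.shuffle(values)         # scramble locations across the tips
        if fitch_changes(tree, dict(zip(names, values))) <= observed:
            hits += 1
    return observed, (hits + 1) / (replicates + 1)
```

As a usage example, a toy tree <code>((("a1","a2"),("a3","a4")),(("b1","b2"),("b3","b4")))</code> whose "a" tips are labelled with one region and whose "b" tips are labelled with another has a single observed transition, and scrambled labels rarely achieve a count that low, so the resulting p-value is small.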
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection caused primarily by influenza virus types [[wp: influenza A virus | A]] and [[wp: Influenzavirus B | B]], which can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a specific hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and [[wp: influenza pandemic | a pandemic may follow a host switch event]], as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to the earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus will outcompete existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
[[Image:flu.png|thumb|{{Figure|flutree}} Phylogeny of HA gene of influenza A (H3N2) from 1968 to 2002. Branches are colored according to antigenic cluster as determined by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> ]]
===Selective pressures===
Phylodynamic techniques have yielded insight into the relative selective effects of mutations to different sites in different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution at the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that exposed putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that the viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, other putatively non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis across genes shows that both HA and NA undergo substantial positive selection, but that internal genes show less evidence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of the HA gene has shown it to have a very small [[wp: effective population size | effective population size]], as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the genome, there is surprisingly little variation in effective population size; all genes are equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
===Circulation patterns===
==Phylodynamics of HIV==
==References==
<references/>
788
2012-05-08T09:55:49Z
Tbedford
7
Including flu tree figure.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of Influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before the ancestor of the sample. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV.<ref name=Volz09/> Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Because antibiotics are not effective against viruses, [[wp: antiviral drug | antiviral treatment]] is a common approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | Oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before sample collection when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \cdots, \lambda_2</math> after intervals <math>T_{n-1}, \cdots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
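The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name <code>simulate_coalescent</code>, the tuple format of the returned events and the parameter values in the usage lines are all assumptions made for the example.

```python
import random

def simulate_coalescent(n, N, rng):
    """Simulate one neutral, constant-size coalescent genealogy.

    n   -- number of sampled gene copies
    N   -- population size (time is measured in the same units as 1/lambda)
    rng -- a random.Random instance
    Returns (events, tmrca); each event is (time, child_a, child_b, parent).
    """
    lineages = list(range(n))   # tips are labelled 0..n-1
    next_label = n              # internal nodes get fresh labels
    t = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = k * (k - 1) / 2.0 / N         # lambda_k = C(k, 2) / N
        t += rng.expovariate(rate)           # exponential waiting time T_k
        a, b = rng.sample(lineages, 2)       # pair chosen uniformly at random
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        events.append((t, a, b, next_label))
        next_label += 1
    return events, t

# Hypothetical usage: 10 sampled gene copies, N = 1000
events, tmrca = simulate_coalescent(10, 1000.0, random.Random(1))
```

Each call produces <math>n-1</math> coalescent events, and the time of the last event is the TMRCA of the simulated sample.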
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
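The last step above follows from a telescoping sum. As a brief sketch using only the rate <math>\lambda_k</math> defined earlier:
: <math> \frac{1}{\lambda_k} = \frac{2N}{k(k-1)} = 2N\left(\frac{1}{k-1} - \frac{1}{k}\right), \qquad \sum_{k=2}^{n} \frac{1}{\lambda_k} = 2N\left(1 - \frac{1}{n}\right). </math>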
Two corollaries are :
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because <math>1 - 1/74 \approx 99\% </math>.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
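Rearranging the relationship above gives <math>R_0 = 1 + rD</math>. The short Python sketch below applies this rearrangement; the function name and the numerical values are purely illustrative assumptions and are not taken from any of the cited studies.

```python
def reproduction_number(r, D):
    """Return R0 = 1 + r * D, rearranged from r = (R0 - 1) / D.

    r -- exponential growth rate estimated from a gene genealogy (per unit time)
    D -- duration of infection or serial interval, in the same time units
    """
    return 1.0 + r * D

# Hypothetical inputs: growth rate of 0.2 per day, D = 3 days
r0_example = reproduction_number(0.2, 3.0)
```

The estimate is only as good as the assumed value of <math>D</math>, so in practice uncertainty in both <math>r</math> and <math>D</math> would need to be propagated.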
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well-studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a similar form to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/(2\beta)</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
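As a rough numerical illustration of the SIR coalescent rate, the sketch below integrates the SIR equations with a simple forward Euler scheme and evaluates <math>\lambda_n</math> from the resulting epidemic state. The function names, the Euler step and all parameter values are assumptions made for this example, and <math>S</math> and <math>I</math> are used in the formula exactly as written in the text.

```python
def sir_trajectory(beta, gamma, s0, i0, dt, steps):
    """Integrate dS/dt = -beta*S*I and dI/dt = beta*S*I - gamma*I by forward Euler.

    Returns a list of (time, S, I) tuples, including the initial state.
    """
    s, i = s0, i0
    out = [(0.0, s, i)]
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt   # incidence f(t) over one step
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        out.append((k * dt, s, i))
    return out

def sir_coalescent_rate(n, beta, s, i):
    """lambda_n = C(n, 2) * 2 * beta * S / I for the simple SIR model."""
    pairs = n * (n - 1) / 2.0
    return pairs * 2.0 * beta * s / i

# Hypothetical parameters: beta = 0.5, gamma = 0.25, 0.1% initially infected
trajectory = sir_trajectory(0.5, 0.25, 0.999, 0.001, 0.01, 1000)
rates = [sir_coalescent_rate(5, 0.5, s, i) for (_, s, i) in trajectory]
```

With these values the rate is highest at the start of the epidemic, when <math>I</math> is small, and falls as prevalence rises, which matches the qualitative behaviour described above.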
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
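The label-scrambling test can be sketched as follows, using Fitch parsimony to count transitions on a fixed tree. This is an illustrative sketch only: the nested-tuple tree encoding, the function names, the toy tree and the add-one correction in the p-value are assumptions for this example and are not drawn from the cited study.

```python
import random

def fitch_transitions(tree, labels):
    """Return (state_set, transition_count) for a binary tree by Fitch parsimony.

    tree   -- a tip name (str) or a 2-tuple of subtrees
    labels -- dict mapping tip name to its geographic label
    """
    if isinstance(tree, str):
        return {labels[tree]}, 0
    left_set, left_n = fitch_transitions(tree[0], labels)
    right_set, right_n = fitch_transitions(tree[1], labels)
    common = left_set & right_set
    if common:                                   # no change needed at this node
        return common, left_n + right_n
    return left_set | right_set, left_n + right_n + 1

def tip_names(tree):
    """List tip names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])

def permutation_p_value(tree, labels, n_perm, rng):
    """One-sided p-value: how often scrambled labels give as few transitions."""
    observed = fitch_transitions(tree, labels)[1]
    tips = tip_names(tree)
    states = [labels[t] for t in tips]
    hits = 0
    for _ in range(n_perm):
        shuffled = states[:]
        rng.shuffle(shuffled)
        count = fitch_transitions(tree, dict(zip(tips, shuffled)))[1]
        if count <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy example: two clades of four tips, perfectly sorted by region
toy_tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
toy_labels = {t: "Asia" for t in "abcd"}
toy_labels.update({t: "Europe" for t in "efgh"})
observed, p_value = permutation_p_value(toy_tree, toy_labels, 999, random.Random(3))
```

In this toy example the observed tree needs a single transition, and few scrambled labelings do as well, so the estimated p-value is small.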
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
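To make the idea of an asymmetric geographic transition model concrete, the sketch below computes transition probabilities along a branch for the simplest case of two regions, where a closed-form solution exists. The function name and the rate values are hypothetical; analyses with more regions would typically require a matrix exponential, which is not shown here.

```python
from math import exp

def two_region_transition_probs(rate_ab, rate_ba, t):
    """Transition probabilities over time t for a two-state continuous-time Markov chain.

    rate_ab -- rate at which a lineage in region A moves to region B
    rate_ba -- rate at which a lineage in region B moves to region A
    Returns [[P(A->A), P(A->B)], [P(B->A), P(B->B)]].
    """
    total = rate_ab + rate_ba
    decay = exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return [[1.0 - p_ab, p_ab], [p_ba, 1.0 - p_ba]]

# Hypothetical asymmetric rates (per year) along a branch of length 2 years
probs = two_region_transition_probs(0.3, 0.1, 2.0)
```

Because the two rates differ, a lineage starting in A is more likely to be found in B at the end of the branch than a lineage starting in B is to be found in A.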
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
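The building blocks of such a sequence-based simulation can be sketched as below. The linear decline of cross-immunity with Hamming distance, the parameter <code>d_max</code> and all function names are assumptions chosen for illustration; published models use a variety of functional forms.

```python
import random

def hamming_distance(a, b):
    """Number of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def cross_immunity(a, b, d_max):
    """Assumed form: protection falls linearly from 1 to 0 as distance reaches d_max."""
    return max(0.0, 1.0 - hamming_distance(a, b) / d_max)

def infection_risk(challenge, history, d_max):
    """Relative risk of infection given a host's past strains (1.0 if naive)."""
    if not history:
        return 1.0
    return 1.0 - max(cross_immunity(challenge, past, d_max) for past in history)

def mutate(seq, rng, alphabet="ACGT"):
    """Return a copy of seq with one randomly chosen site changed."""
    site = rng.randrange(len(seq))
    choices = [c for c in alphabet if c != seq[site]]
    return seq[:site] + rng.choice(choices) + seq[site + 1:]

# Hypothetical usage: a host previously infected with one strain meets a mutant
rng = random.Random(5)
ancestor = "ACGTACGT"
mutant = mutate(ancestor, rng)
risk = infection_risk(mutant, [ancestor], d_max=4)
```

In a full simulation these functions would be called inside the transmission and mutation events, and the sequences sampled over time would then be used to build a phylogeny.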
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection caused primarily by influenza virus types [[wp: influenza A virus | A]] and [[wp: Influenzavirus B | B]], which can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a specific hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and [[wp: influenza pandemic | a pandemic may follow a host switch event]], as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to the earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus will outcompete existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
[[Image:Flutree.png|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model. ]]
===Selective pressures===
Phylodynamic techniques have yielded insight into the relative selective effects of mutations to different sites in different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution at the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that exposed putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and that viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis across genes shows that both HA and NA undergo substantial positive selection, but that internal genes show less evidence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of the HA gene has shown it to have a very small [[wp: effective population size | effective population size]], as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the genome, there is surprisingly little variation in effective population size; it is similarly low for all genes.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
===Circulation patterns===
==Phylodynamics of HIV==
==References==
<references/>
789
2012-05-08T10:00:42Z
Tbedford
7
Resizing figure.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before the common ancestor of the sample. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV.<ref name=Volz09/> Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Without recourse to antibiotics, [[wp: antiviral drug | antiviral treatment]] is a popular approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time before sample collection at which the first pair among a sample of <math>n</math> gene copies shares a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \cdots, \lambda_2</math> after intervals <math>T_{n-1}, \cdots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
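The simulation procedure described above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (neutrality, constant size <math>N</math>, no population structure); the function names and the example values of <math>n</math> and <math>N</math> are hypothetical and do not come from any particular software package.

```python
import random
from math import comb


def simulate_genealogy(n, N, rng):
    """Simulate one neutral coalescent genealogy for n gene copies.

    Returns the tree as nested 2-tuples of tip names and the list of
    internode intervals T_n, T_{n-1}, ..., T_2 (hypothetical helper).
    """
    lineages = [str(i) for i in range(n)]
    intervals = []
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time with k lineages is exponential with rate C(k, 2) / N.
        intervals.append(rng.expovariate(comb(k, 2) / N))
        # Pick two lineages uniformly at random and merge them.
        i, j = sorted(rng.sample(range(k), 2))
        right = lineages.pop(j)
        left = lineages.pop(i)
        lineages.append((left, right))
    return lineages[0], intervals


def expected_tmrca(n, N):
    """Sum of expected internode intervals, 1/lambda_n + ... + 1/lambda_2."""
    return sum(N / comb(k, 2) for k in range(n, 1, -1))


rng = random.Random(1)
n, N = 10, 1000.0  # illustrative values only
tmrcas = [sum(simulate_genealogy(n, N, rng)[1]) for _ in range(20000)]
mean_tmrca = sum(tmrcas) / len(tmrcas)
print(mean_tmrca, expected_tmrca(n, N))
```

Averaged over many replicates, the simulated time to the most recent common ancestor should be close to the sum of the expected internode intervals.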
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are :
* The expected TMRCA of a sample is bounded in the sample size: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because <math>1 - 1/74 \approx 99\% </math>.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
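As a small worked example, the relation <math> r = (R_0 - 1)/D </math> can be rearranged to <math> R_0 = 1 + rD </math>. The sketch below applies this rearrangement; the function name and the numerical values of <math>r</math> and <math>D</math> are purely hypothetical and are not taken from any study cited here.

```python
def r0_from_growth_rate(r, D):
    """Rearrange r = (R0 - 1) / D to obtain R0 = 1 + r * D.

    r: exponential growth rate estimated from a gene genealogy (per unit time)
    D: duration of infection or serial interval (same time units as 1/r)
    """
    return 1.0 + r * D


# Hypothetical inputs: growth rate of 0.2 per day and D of 3 days.
r0 = r0_from_growth_rate(0.2, 3.0)
print(r0)
```

Note that <math>r</math> and <math>D</math> must be expressed in compatible units of time for the result to be meaningful.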
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well-studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a form similar to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/2\beta</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
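The behaviour of this coalescent rate over the course of an epidemic can be explored numerically. The following sketch integrates the SIR equations with a simple forward Euler scheme and evaluates <math> \lambda_n(t) = {n \choose 2} 2 f(t) / I^2(t) </math> along the trajectory. The parameter values, step size and function names are assumptions chosen only for illustration.

```python
from math import comb


def sir_trajectory(beta, gamma, s0, i0, dt, steps):
    """Forward-Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I.

    Returns a list of (t, S, I) tuples; S and I are population fractions.
    """
    s, i = s0, i0
    out = [(0.0, s, i)]
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        out.append((k * dt, s, i))
    return out


def coalescent_rate(n, beta, s, i):
    """Rate C(n, 2) * 2 f / I^2 with incidence f = beta * S * I."""
    incidence = beta * s * i
    return comb(n, 2) * 2.0 * incidence / (i * i)


# Hypothetical parameters: beta = 0.5, gamma = 0.2, 1% initially infected.
beta, gamma = 0.5, 0.2
trajectory = sir_trajectory(beta, gamma, s0=0.99, i0=0.01, dt=0.01, steps=5000)
rates = [(t, coalescent_rate(2, beta, s, i)) for t, s, i in trajectory]
print(rates[0], rates[-1])
```

For the simple SIR model the computed rate reduces to <math> {n \choose 2} 2 \beta S(t) / I(t) </math>, so it is largest when prevalence is low and most hosts remain susceptible.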
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
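The label-scrambling test can be sketched as follows. This illustration counts transitions with Fitch parsimony on a binary tree written as nested 2-tuples; the toy tree, the region labels and the function names are hypothetical, and parsimony is only one of the counting methods mentioned above.

```python
import random


def fitch_transitions(tree, labels):
    """Return (possible_states, transition_count) for a binary tree.

    tree: a tip name (str) or a 2-tuple of subtrees
    labels: dict mapping tip name -> geographic label
    """
    if isinstance(tree, str):
        return {labels[tree]}, 0
    left_states, left_count = fitch_transitions(tree[0], labels)
    right_states, right_count = fitch_transitions(tree[1], labels)
    shared = left_states & right_states
    if shared:
        return shared, left_count + right_count
    # No shared state: one transition is required at this node.
    return left_states | right_states, left_count + right_count + 1


def permutation_p_value(tree, labels, n_perm, rng):
    """Compare the observed transition count to a label-shuffled null."""
    observed = fitch_transitions(tree, labels)[1]
    names = sorted(labels)
    values = [labels[name] for name in names]
    hits = 0
    for _ in range(n_perm):
        shuffled = values[:]
        rng.shuffle(shuffled)
        count = fitch_transitions(tree, dict(zip(names, shuffled)))[1]
        if count <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: two clades that are perfectly separated by region.
toy_tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
toy_labels = {name: ("A" if name.startswith("a") else "B")
              for name in ["a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4"]}
observed, p_value = permutation_p_value(toy_tree, toy_labels, 999, random.Random(7))
print(observed, p_value)
```

A small p-value indicates fewer transitions than expected when labels are assigned to tips at random, which is consistent with population structure.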
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection caused primarily by influenza virus types [[wp: influenza A virus | A]] and [[wp: Influenzavirus B | B]]. Influenza A viruses can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a specific hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and [[wp: influenza pandemic | a pandemic may follow a host switch event]], as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to the earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus will outcompete existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
[[Image:Flutree.png|300px|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model. ]]
===Selective pressures===
Phylodynamic techniques have yielded insight into the relative selective effects of mutations to different sites in different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution at the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that exposed putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and that viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis across genes shows that both HA and NA undergo substantial positive selection, but that internal genes show less evidence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of the HA gene has shown it to have a very small [[wp: effective population size | effective population size]], as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the genome, there is surprisingly little variation in effective population size; it is similarly low for all genes.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
===Circulation patterns===
==Phylodynamics of HIV==
==References==
<references/>
790
2012-05-08T10:25:53Z
Tbedford
7
Minor revisions to flu section.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of Influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like”{{Citation needed}}; such trees arise because lineages are much more likely to find a common ancestor when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the form of population structure most commonly observed in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host{{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. Because the common ancestor of the entire virus population can be no younger than the most recent common ancestor of the sampled isolates, the date of the latter is an upper bound on the date of the former. In April 2009, genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math>R_0</math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV.<ref name=Volz09/> Additionally, differential transmission between groups, be they geographic, age-related or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Because antibiotics are not effective against viruses, [[wp: antiviral drug | antiviral treatment]] is a common approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains have a transmission advantage over susceptible strains when treatment is present. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses use many of the methods of [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time before sample collection at which the first pair among a sample of <math>n</math> gene copies shares a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), leaving <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
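The simulation procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under the stated constant-size, neutral assumptions; the function names, the Newick-style string used to record the topology, and the parameter values are hypothetical choices and are not taken from any particular software package.

```python
import random
from math import comb

def simulate_coalescent_intervals(n, N, rng):
    """Draw the internode intervals T_n, ..., T_2 for a sample of n gene
    copies from a neutral population of constant size N."""
    intervals = []
    for k in range(n, 1, -1):           # k = number of extant lineages
        rate = comb(k, 2) / N           # lambda_k = C(k, 2) / N
        intervals.append(rng.expovariate(rate))
    return intervals

def simulate_topology(n, rng):
    """Merge two lineages chosen uniformly at random until one remains.
    Returns the topology as a nested, Newick-like string (no branch lengths)."""
    lineages = [str(i) for i in range(n)]
    while len(lineages) > 1:
        a, b = rng.sample(lineages, 2)  # uniform choice of a pair
        lineages.remove(a)
        lineages.remove(b)
        lineages.append("(" + a + "," + b + ")")
    return lineages[0]

rng = random.Random(1)                  # fixed seed, for reproducibility only
n, N = 10, 1000.0                       # assumed sample and population sizes
reps = 5000
mean_tmrca = sum(sum(simulate_coalescent_intervals(n, N, rng))
                 for _ in range(reps)) / reps
expected_tmrca = 2 * N * (1 - 1 / n)    # closed form for E[TMRCA]
topology = simulate_topology(n, rng)
```

With enough replicates, <code>mean_tmrca</code> should lie close to <code>expected_tmrca</code>, which gives a simple check on the exponential rates used.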
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
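The closed form in the last line can be checked by writing each term as a difference of two fractions, so that the sum telescopes. Since <math>1/\lambda_k = N / {k \choose 2} = 2N / (k(k-1))</math>,
: <math> \sum_{k=2}^{n} \frac{1}{\lambda_k} = 2N \sum_{k=2}^{n} \left( \frac{1}{k-1} - \frac{1}{k} \right) = 2N \left( 1 - \frac{1}{n} \right). </math>
For <math>n = 2</math> this reduces to <math>N</math>, the expected time for a single pair of gene copies to coalesce.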
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al.{{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because with <math>n = 74</math> the expected TMRCA of the sample is a fraction <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré{{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
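The time rescaling described above gives a direct way to simulate coalescent times under exponential growth: draw intervals from the constant-size coalescent in rescaled time, then map them back to real time. The sketch below assumes <math>N(t) = N_0 e^{-rt}</math>, for which the rescaled time is <math>(e^{rt} - 1)/(r N_0)</math>; all function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random

def cumulative_intensity(t, N0, r):
    """Rescaled time: integral from 0 to t of d(tau) / N(tau),
    with N(tau) = N0 * exp(-r * tau)."""
    return (math.exp(r * t) - 1.0) / (r * N0)

def inverse_intensity(s, N0, r):
    """Real time t at which the rescaled time equals s."""
    return math.log1p(r * N0 * s) / r

def simulate_growth_coalescent(n, N0, r, rng):
    """Coalescent times (measured backwards from sampling at t = 0)
    for n lineages in an exponentially growing population."""
    times = []
    s = 0.0                                    # accumulated rescaled time
    for k in range(n, 1, -1):
        # In rescaled time the process is the unit-size coalescent.
        s += rng.expovariate(math.comb(k, 2))
        times.append(inverse_intensity(s, N0, r))
    return times

rng = random.Random(42)                        # fixed seed for reproducibility
N0, r, n = 10000.0, 1.0, 10                    # assumed illustrative values
fractions = []
for _ in range(2000):
    t = simulate_growth_coalescent(n, N0, r, rng)
    fractions.append(t[0] / t[-1])             # first coalescence / TMRCA
mean_fraction = sum(fractions) / len(fractions)
```

Here <code>mean_fraction</code> measures how far back the first coalescence occurs relative to the root. Under constant population size with <math>n = 10</math>, the ratio of the corresponding expectations is <math>\mathrm{E}[T_n] / \mathrm{E}[\mathrm{TMRCA}] = 1/81</math>; under rapid growth the first coalescence is pushed much closer to the root, which is the star-like pattern.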
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation{{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
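Rearranging the relation <math>r = (R_0 - 1)/D</math> gives <math>R_0 = 1 + rD</math>. A minimal sketch of the conversion follows; the numerical values are hypothetical and are not the estimates reported by Fraser et al.

```python
def r0_from_growth_rate(r, D):
    """Solve r = (R0 - 1) / D for R0."""
    return 1.0 + r * D

def growth_rate_from_r0(R0, D):
    """The forward relation, r = (R0 - 1) / D."""
    return (R0 - 1.0) / D

# Hypothetical inputs: growth rate per day and infectious duration in days.
r = 0.1
D = 3.0
R0 = r0_from_growth_rate(r, D)   # roughly 1.3 for these inputs
```

Because <math>r</math> and <math>D</math> must be expressed in matching time units, an estimate of <math>r</math> per year from a dated genealogy has to be converted before it is combined with a duration given in days.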
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered(SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a form similar to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/(2\beta)</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic{{Citation needed}}.
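To make the SIR-based coalescent rate concrete, the following sketch integrates the SIR equations with a simple forward-Euler scheme and evaluates <math>\lambda_n(t)</math> along the trajectory. The integration scheme, step size and parameter values are assumptions made for illustration; a production analysis would use a proper ODE solver.

```python
from math import comb

def sir_trajectory(beta, gamma, S0, I0, dt, steps):
    """Forward-Euler integration of dS/dt = -beta*S*I and
    dI/dt = beta*S*I - gamma*I. Returns a list of (t, S, I)."""
    S, I = S0, I0
    trajectory = [(0.0, S, I)]
    for i in range(1, steps + 1):
        incidence = beta * S * I              # f(t), new infections per unit time
        S, I = S - dt * incidence, I + dt * (incidence - gamma * I)
        trajectory.append((i * dt, S, I))
    return trajectory

def coalescent_rate(n, beta, S, I):
    """lambda_n = C(n, 2) * 2 * f / I^2 with f = beta * S * I."""
    incidence = beta * S * I
    return comb(n, 2) * 2.0 * incidence / I ** 2

# Hypothetical parameters; S and I are fractions of the host population.
beta, gamma = 0.5, 0.2
S0, I0 = 0.999, 0.001
n = 5
trajectory = sir_trajectory(beta, gamma, S0, I0, dt=0.01, steps=5000)
rates = [(t, coalescent_rate(n, beta, S, I)) for t, S, I in trajectory]
early_approximation = comb(n, 2) * 2.0 * beta / I0   # valid while S is near 1
```

Note that the trajectory is indexed in forward time, whereas the coalescent runs backwards from the time of sampling; evaluating the rate for a sample taken at a given time means reading this list in reverse from that point.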
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists, then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
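The label-scrambling test can be sketched as follows, using Fitch parsimony to count geographic transitions. The tree encoding (nested 2-tuples of tip names), the function names and the toy data are hypothetical; this is one simple way to implement the idea, not the procedure of any specific study.

```python
import random

def fitch_transitions(tree, labels):
    """Fitch parsimony on a binary tree given as nested 2-tuples of tip names.
    Returns (possible states at this node, minimum number of transitions)."""
    if not isinstance(tree, tuple):
        return {labels[tree]}, 0
    left_states, left_count = fitch_transitions(tree[0], labels)
    right_states, right_count = fitch_transitions(tree[1], labels)
    shared = left_states & right_states
    if shared:
        return shared, left_count + right_count
    return left_states | right_states, left_count + right_count + 1

def tips(tree):
    """List the tip names of the tree in left-to-right order."""
    if not isinstance(tree, tuple):
        return [tree]
    return tips(tree[0]) + tips(tree[1])

def permutation_p_value(tree, labels, n_perm, rng):
    """One-sided p-value: how often do scrambled labels give as few
    transitions as observed (or fewer)?"""
    observed = fitch_transitions(tree, labels)[1]
    names = tips(tree)
    values = [labels[name] for name in names]
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        shuffled = dict(zip(names, values))
        if fitch_transitions(tree, shuffled)[1] <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)

# Toy example: two clades of four tips, each sampled from a single region.
tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
labels = {name: ("A" if name.startswith("a") else "B") for name in tips(tree)}
observed, p_value = permutation_p_value(tree, labels, 999, random.Random(0))
```

In this toy example the observed tree needs a single transition, and few scrambled labelings do as well, so the estimated p-value is small. Adding one to the numerator and the denominator is a common convention that keeps the estimate away from zero.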
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
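As a minimal illustration of treating location as a character state, the sketch below computes transition probabilities for a lineage moving between two regions under a continuous-time Markov chain with asymmetric rates. Restricting to two regions is a simplification made here so that the closed-form solution can be used; the function and parameter names are hypothetical.

```python
import math

def two_region_transition_matrix(rate_ab, rate_ba, t):
    """Transition probabilities over time t for a lineage moving between
    regions A and B. rate_ab is the A-to-B rate, rate_ba the B-to-A rate.
    Row 0 is 'starts in A', row 1 is 'starts in B'."""
    total = rate_ab + rate_ba
    decay = math.exp(-total * t)
    stationary_a = rate_ba / total      # long-run probability of being in A
    stationary_b = rate_ab / total      # long-run probability of being in B
    return [
        [stationary_a + stationary_b * decay, stationary_b * (1.0 - decay)],
        [stationary_a * (1.0 - decay), stationary_b + stationary_a * decay],
    ]

# Hypothetical rates per year: movement from A to B is faster than from B to A.
P = two_region_transition_matrix(rate_ab=0.8, rate_ba=0.2, t=1.0)
```

With more than two regions the same quantity is obtained from the matrix exponential of the rate matrix, which is what general-purpose phylogenetic software evaluates along each branch.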
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', and so on. The level of protection against one strain of virus conferred by prior infection with a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
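A heavily simplified individual-based forward simulation along these lines is sketched below. Every modelling choice in it is an assumption made for illustration: strains are short nucleotide strings, protection is taken to decline linearly with the Hamming distance to the closest strain in a host's infection history, and each infected host makes one random contact per time step. None of these choices is taken from the cited models.

```python
import random

ALPHABET = "ACGT"

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def mutate(seq, mu, rng):
    """Mutate each site independently with probability mu."""
    out = []
    for base in seq:
        if rng.random() < mu:
            base = rng.choice([c for c in ALPHABET if c != base])
        out.append(base)
    return "".join(out)

def infection_risk(strain, history, d_max):
    """Assumed cross-immunity: risk rises linearly with the distance to the
    closest previously seen strain, reaching 1 at distance d_max."""
    if not history:
        return 1.0
    nearest = min(hamming(strain, past) for past in history)
    return min(1.0, nearest / d_max)

def simulate(n_hosts, steps, beta, gamma, mu, d_max, seed_strain, rng):
    """Return a list of (step, strain) records, one per transmission event."""
    current = [None] * n_hosts                  # strain infecting each host
    history = [set() for _ in range(n_hosts)]   # strains each host recovered from
    current[0] = seed_strain
    sampled = []
    for step in range(steps):
        infected = [h for h in range(n_hosts) if current[h] is not None]
        for h in infected:
            target = rng.randrange(n_hosts)     # one random contact per step
            if current[target] is None:
                strain = mutate(current[h], mu, rng)
                risk = infection_risk(strain, history[target], d_max)
                if rng.random() < beta * risk:
                    current[target] = strain
                    sampled.append((step, strain))
            if rng.random() < gamma:            # recovery
                history[h].add(current[h])
                current[h] = None
    return sampled

records = simulate(n_hosts=200, steps=100, beta=0.6, gamma=0.2, mu=0.01,
                   d_max=3, seed_strain="ACGTACGTAC", rng=random.Random(7))
```

The recorded sequences and their sampling times could then be passed to standard tree-building software to produce a phylogeny for analysis.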
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection primarily caused by viruses [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]]. Influenza A can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which encode surface proteins that act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to the earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza, showing a single predominant trunk lineage that persists through time and side branches that persist for only 1–5 years before going extinct (see {{See Figure|flutree}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:Flutree.png|300px|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09>{{cite pmid|PMC2740835}}</ref>]]
===Selective pressures===
Phylodynamic techniques have yielded insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution to the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and that viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes show little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
===Circulation patterns===
==Phylodynamics of HIV==
==References==
<references/>
791
2012-05-08T10:48:41Z
Tbedford
7
Continuing selective pressures.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of host population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of Influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like”{{Citation needed}}; such trees arise because lineages are much more likely to find a common ancestor when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before the ancestor of the sample. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV.<ref name=Volz09/> Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Because antibiotics are ineffective against viruses, [[wp: antiviral drug | antiviral treatment]] is a common approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | Oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses use many of the same methods as [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
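As a rough illustration of the last statistic in the list, the following Python sketch computes Tajima's D from a small alignment using the standard textbook formulas. The function name <code>tajimas_d</code> and the toy alignments are assumptions made for this example only. Gaps and ambiguous bases are not handled, and every polymorphic column is counted as one segregating site.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length, gap-free aligned sequences.

    Returns None when the statistic is undefined (fewer than 4
    sequences, or no segregating sites).
    """
    n = len(seqs)
    # S: number of segregating (polymorphic) alignment columns
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or S == 0:
        return None
    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Normalizing constants that depend only on the sample size n
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Toy alignment in which every mutation is a singleton (excess of rare variants)
singletons = ["TAAA", "ATAA", "AATA", "AAAT"]
print(tajimas_d(singletons))
```

An excess of rare variants, as in the toy alignment, gives a negative value, which is the pattern expected after a selective sweep or during population growth.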
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time before sample collection at which the first two of a sample of <math>n</math> gene copies share a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), leaving <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
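The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name <code>simulate_genealogy</code>, the nested-tuple tree representation, and the parameter values are assumptions made for the example.

```python
import random


def simulate_genealogy(n, N, rng):
    """Simulate a neutral coalescent genealogy for n gene copies.

    Returns (tree, tmrca), where tree is a nested tuple of tip indices
    and tmrca is the time from sampling back to the MRCA, in generations.
    """
    lineages = list(range(n))  # tips are labelled 0 .. n-1
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence: exponential with
        # rate lambda_k = C(k, 2) / N
        t += rng.expovariate(k * (k - 1) / (2 * N))
        # Pick two extant lineages uniformly at random and merge them
        i, j = rng.sample(range(k), 2)
        merged = (lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0], t


rng = random.Random(1)
tree, tmrca = simulate_genealogy(5, 1000, rng)
print(tree, tmrca)
```

Averaging <code>tmrca</code> over many replicates should approach the expected value derived below.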
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of a sample of 74 sequences is <math>1 - 1/74 \approx 99\% </math> of its theoretical upper bound.
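The closed form for the expected TMRCA can be checked numerically by summing the expected internode intervals directly. The helper name <code>expected_tmrca</code> is an assumption of this sketch, and setting <math>N = 1</math> simply expresses time in units of <math>N</math> generations.

```python
def expected_tmrca(n, N):
    """Sum of expected internode intervals, 1/lambda_k for k = n, ..., 2."""
    return sum(N / (k * (k - 1) / 2) for k in range(2, n + 1))


# Fraction of the upper bound 2N reached by samples of increasing size
for n in (2, 10, 74):
    print(n, expected_tmrca(n, 1.0) / 2.0)
```

For a sample of 74 sequences the ratio equals <math>1 - 1/74</math>, which is the figure quoted above.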
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
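The time rescaling described above gives a simple way to simulate coalescent times under exponential growth. For <math>N(t) = N_0 e^{-rt}</math> the rescaled time is <math>s = (e^{rt} - 1)/(r N_0)</math>, which inverts to <math>t = \ln(1 + r N_0 s)/r</math>. The Python sketch below draws coalescent times on the rescaled axis and maps them back to real time; the function name and the parameter values are illustrative assumptions.

```python
import math
import random


def coalescent_times_exponential_growth(n, N0, r, rng):
    """Times before sampling of successive coalescent events when the
    population size t units in the past is N(t) = N0 * exp(-r * t)."""
    s = 0.0  # rescaled time: integral of 1 / N(tau) from 0 to t
    times = []
    for k in range(n, 1, -1):
        # On the rescaled axis the rate is simply C(k, 2)
        s += rng.expovariate(k * (k - 1) / 2)
        # Invert s = (exp(r * t) - 1) / (r * N0) to recover real time
        times.append(math.log1p(r * N0 * s) / r)
    return times


rng = random.Random(7)
times = coalescent_times_exponential_growth(20, 1000.0, 0.5, rng)
# Share of the tree height elapsed before the first (most recent) coalescence
print(times[0] / times[-1])
```

With rapid growth, the first coalescent event tends to occur at a large fraction of the total tree height, which corresponds to long external branches.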
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
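Rearranging the relation above gives <math>R_0 = 1 + rD</math>, so an estimate of the growth rate converts directly into an estimate of the reproduction number. The short Python sketch below shows the arithmetic; the function names and the numerical values are hypothetical and are not taken from any of the studies cited here.

```python
def r0_from_growth_rate(r, D):
    """Reproduction number implied by growth rate r and duration D,
    using r = (R0 - 1) / D."""
    return 1.0 + r * D


def growth_rate_from_r0(R0, D):
    """Inverse relation: exponential growth rate implied by R0 and D."""
    return (R0 - 1.0) / D


# Hypothetical inputs: growth rate of 0.1 per day, duration of 3 days
print(r0_from_growth_rate(0.1, 3.0))
```

The units of <math>r</math> and <math>D</math> must match (for example, per day and days).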
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well-studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a similar form to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/(2\beta)</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
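To see how the coalescent rate changes over the course of an epidemic, the SIR equations can be integrated numerically and the rate evaluated along the trajectory. The Python sketch below uses a simple Euler scheme; the function names, step size, and parameter values are assumptions chosen for illustration, and a production analysis would use a proper ODE solver.

```python
def sir_trajectory(beta, gamma, s0, i0, dt, steps):
    """Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I.

    Returns a list of (time, S, I) tuples, with S and I as fractions
    of the host population."""
    s, i = s0, i0
    out = [(0.0, s, i)]
    for step in range(1, steps + 1):
        new_infections = beta * s * i * dt  # incidence f(t) * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        out.append((step * dt, s, i))
    return out


def sir_coalescent_rate(n, beta, s, i):
    """lambda_n = C(n, 2) * 2 * beta * S / I for the simple SIR model."""
    return n * (n - 1) / 2 * 2 * beta * s / i


# Hypothetical parameters: beta = 0.5, gamma = 0.2, 0.1% initially infected
trajectory = sir_trajectory(0.5, 0.2, 0.999, 0.001, 0.01, 5000)
for t, s, i in trajectory[::1000]:
    print(round(t, 1), sir_coalescent_rate(2, 0.5, s, i))
```

The pairwise rate is highest at the start of the epidemic, when few hosts are infected, and falls as prevalence rises.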
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
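The label-scrambling test can be sketched in Python using Fitch parsimony to count transitions. This is only an illustration under stated assumptions: the tree is a rooted binary tree written as nested tuples of tip names, the function names and the toy data are hypothetical, and the procedure is not claimed to match the exact method of the cited study.

```python
import random


def fitch_transitions(tree, labels):
    """Fitch parsimony on a rooted binary tree given as nested tuples.

    Returns (possible states at this node, minimum number of changes)."""
    if not isinstance(tree, tuple):
        return {labels[tree]}, 0
    left_states, left_changes = fitch_transitions(tree[0], labels)
    right_states, right_changes = fitch_transitions(tree[1], labels)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No shared state: one extra transition is required at this node
    return left_states | right_states, left_changes + right_changes + 1


def permutation_p_value(tree, labels, n_perm, rng):
    """Observed transition count and the p-value for clustering
    (as few or fewer transitions than observed under shuffled tips)."""
    observed = fitch_transitions(tree, labels)[1]
    tips = list(labels)
    states = [labels[tip] for tip in tips]
    as_extreme = 0
    for _ in range(n_perm):
        shuffled = states[:]
        rng.shuffle(shuffled)
        count = fitch_transitions(tree, dict(zip(tips, shuffled)))[1]
        if count <= observed:
            as_extreme += 1
    # Add one to numerator and denominator so that p is never exactly zero
    return observed, (as_extreme + 1) / (n_perm + 1)


# Toy example: two regions that fall into two separate clades
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
labels = {"a": "X", "b": "X", "c": "X", "d": "X",
          "e": "Y", "f": "Y", "g": "Y", "h": "Y"}
print(permutation_p_value(tree, labels, 999, random.Random(0)))
```

In the toy example a single transition explains the tip labels, and few scrambled data sets do as well, so the p-value is small.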
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection primarily caused by viruses [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]], which can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic ]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to the earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03/> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza, showing a single predominant trunk lineage that persists through time and side branches that persist for only 1-5 years before going extinct (see {{See Figure|flutree}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:Flutree.png|300px|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09/>]]
===Selective pressures===
Phylodynamic techniques have yielded insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution to the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99/><ref name=Wolf06/> indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes have little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> suggesting that H1N1 undergoes less adaptive evolution than H3N2. This hypothesis is supported by empirical patterns of antigenic evolution; there have been 9 vaccine updates suggested by the [[wp: World Health Organization | WHO]] for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates suggested for H3N2 during this same time period.<ref name=IRD>{{cite pmid|22260278}}</ref> Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.<ref name=Wolf06>{{cite pmid|17067369}}</ref><ref name=Bhatt11>{{cite pmid|21415025}}</ref> The evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
===Circulation patterns===
==Phylodynamics of HIV==
==References==
<references/>
819
2012-05-11T14:49:30Z
Tbedford
7
Fixing typos.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (see [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies that show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies that show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies that show the effects of population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, as reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}. Star-like trees arise because lineages are much more likely to share a common ancestor when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at or before the ancestor of the sample. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV.<ref name=Volz09/> Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Because antibiotics are ineffective against viruses, [[wp: antiviral drug | antiviral treatment]] is a common approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | Oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time before sample collection at which the first 2 of a sample of <math>n</math> gene copies share a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \cdots, \lambda_2</math> after intervals <math>T_{n-1}, \cdots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
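The simulation procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name <code>simulate_coalescent</code>, the Newick-style labels, and the example values of <math>n</math> and <math>N</math> are assumptions introduced here.

```python
import random
from math import comb


def simulate_coalescent(n, N, rng=None):
    """Simulate a constant-size neutral coalescent for n sampled gene copies.

    Returns (intervals, merges, tree):
      intervals[k] is the internode interval T_k for k = n, ..., 2,
      merges lists the pair of lineages joined at each coalescent event,
      tree is a Newick-style string describing the topology.
    """
    rng = rng or random.Random()
    lineages = [str(i) for i in range(n)]      # labels of extant lineages
    intervals = {}
    merges = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = comb(k, 2) / N                  # lambda_k = C(k, 2) / N
        intervals[k] = rng.expovariate(rate)   # T_k ~ Exponential(lambda_k)
        a, b = rng.sample(lineages, 2)         # pair chosen uniformly at random
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(f"({a},{b})")          # label of the new ancestor
        merges.append((a, b))
    return intervals, merges, lineages[0]


# Hypothetical example: 5 gene copies from a population of size 1000.
intervals, merges, tree = simulate_coalescent(5, 1000, random.Random(1))
tmrca = sum(intervals.values())                # time to the MRCA of the sample
```

Summing the simulated intervals gives one draw of the TMRCA; averaging over many replicates should approach the expected value derived below.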
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
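The last equality can be checked term by term. Because <math>1/\lambda_k = N / {k \choose 2} = 2N / (k(k-1))</math>, each term splits into a difference of reciprocals and the sum telescopes:
: <math> \sum_{k=2}^{n} \frac{1}{\lambda_k} = 2N \sum_{k=2}^{n} \left( \frac{1}{k-1} - \frac{1}{k} \right) = 2N \left( 1 - \frac{1}{n} \right). </math>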
Two corollaries are:
* The expected TMRCA of a sample is bounded in the sample size: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because a sample of 74 sequences is expected to capture <math>1 - 1/74 \approx 99\% </math> of the maximum expected TMRCA.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
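To illustrate why exponential growth gives long external branches, coalescent times under <math>N(t) = N_0 e^{-rt}</math> can be drawn by inverting the cumulative coalescent rate. The sketch below is a hedged illustration: the function name and the parameter values are hypothetical, and time is measured backwards from the sampling time <math>t=0</math>, as in the text.

```python
import math
import random
from math import comb


def coalescent_times_exp_growth(n, N0, r, rng=None):
    """Draw successive coalescent times (backwards from sampling at t = 0)
    when the rate with k lineages is lambda_k(t) = C(k, 2) * exp(r * t) / N0.
    """
    rng = rng or random.Random()
    t = 0.0
    times = []
    for k in range(n, 1, -1):
        e = rng.expovariate(1.0)  # unit-rate exponential draw
        # The integral of lambda_k from t to t_new equals
        # C(k, 2) / (N0 * r) * (exp(r * t_new) - exp(r * t)).
        # Setting it equal to e and solving for t_new gives:
        t = math.log(math.exp(r * t) + e * N0 * r / comb(k, 2)) / r
        times.append(t)
    return times


# Hypothetical values: 10 samples, N0 = 10,000, growth rate r = 1 per unit time.
times = coalescent_times_exp_growth(10, 1e4, 1.0, random.Random(2))
first_interval = times[0]              # waiting time to the most recent event
remaining_depth = times[-1] - times[0]  # time spanned by all older events
```

With a large present-day population, the first coalescent event typically lies far in the past and the remaining events follow in quick succession, which is the star-like shape described above.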
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
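As a small worked example of the relationship <math>r = (R_0 - 1)/D</math>, the helper below rearranges it to <math>R_0 = 1 + rD</math>. The function names and the numerical values are purely illustrative assumptions and are not taken from any of the cited studies.

```python
def reproduction_number(r, D):
    """R_0 implied by r = (R_0 - 1) / D for growth rate r and duration D."""
    return 1.0 + r * D


def growth_rate(R0, D):
    """Inverse relationship: exponential growth rate implied by R_0 and D."""
    return (R0 - 1.0) / D


# Hypothetical values: growth rate 0.2 per day, duration of infection 5 days.
R0 = reproduction_number(0.2, 5.0)
```

The two quantities must be expressed in matching units: if <math>r</math> is per day, <math>D</math> must be in days.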
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well-studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a form similar to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/2\beta</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
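The SIR coalescent rate can be explored numerically. The following sketch integrates the SIR equations with simple Euler steps and evaluates <math>\lambda_n(t) = {n \choose 2} 2 \beta S(t) / I(t)</math> along the trajectory. The integration scheme, the function name, and the parameter values are assumptions chosen for illustration; here <math>t</math> is forward time from the start of the epidemic.

```python
from math import comb


def sir_coalescent_rate(beta, gamma, S0, I0, n, t_end, dt=0.01):
    """Integrate dS/dt = -beta*S*I and dI/dt = beta*S*I - gamma*I with Euler
    steps and return a list of (t, S, I, lambda_n) tuples, where
    lambda_n = C(n, 2) * 2 * beta * S / I.
    """
    S, I, t = S0, I0, 0.0
    out = []
    steps = int(round(t_end / dt))
    for _ in range(steps + 1):
        out.append((t, S, I, comb(n, 2) * 2.0 * beta * S / I))
        dS = -beta * S * I
        dI = beta * S * I - gamma * I
        S += dS * dt
        I += dI * dt
        t += dt
    return out


# Hypothetical parameters: beta = 2, gamma = 1, 0.1% of hosts initially infected.
trajectory = sir_coalescent_rate(beta=2.0, gamma=1.0, S0=0.999, I0=0.001,
                                 n=10, t_end=20.0)
early_rate = trajectory[0][3]                    # rate at the start of the epidemic
kingman_like = comb(10, 2) * 2.0 * 2.0 / 0.001   # C(n,2) * 2*beta / I(0)
```

While <math>S(t) \approx 1</math> the two rates nearly coincide; as susceptibles are depleted, the SIR rate falls below the Kingman-like approximation.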
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
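The label-scrambling test can be sketched as follows, here using parsimony (the Fitch algorithm) to count transitions. This is a minimal illustration under stated assumptions, not the procedure of any particular study: the tree is represented as nested 2-tuples of tip names, and the function names, the example tree, and the location labels are all hypothetical.

```python
import random


def fitch_transitions(tree, labels):
    """Minimum number of location changes on a rooted binary tree (Fitch
    parsimony). tree: nested 2-tuples with tip names (str) at the leaves;
    labels: dict mapping tip name -> location."""
    def visit(node):
        if isinstance(node, str):
            return {labels[node]}, 0
        (left_set, left_n), (right_set, right_n) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_n + right_n
        return left_set | right_set, left_n + right_n + 1
    return visit(tree)[1]


def tips(tree):
    """List the tip names of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return tips(tree[0]) + tips(tree[1])


def permutation_p_value(tree, labels, n_perm=1000, rng=None):
    """One-sided p-value: fraction of label scramblings with as few or fewer
    transitions than observed (with the usual +1 correction)."""
    rng = rng or random.Random()
    names = tips(tree)
    observed = fitch_transitions(tree, labels)
    values = [labels[name] for name in names]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if fitch_transitions(tree, dict(zip(names, values))) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical tree in which each location forms its own clade.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
labels = {t: "X" for t in "abcd"}
labels.update({t: "Y" for t in "efgh"})
observed, p = permutation_p_value(tree, labels, rng=random.Random(0))
```

A small p-value indicates fewer transitions than expected under random labelling, which is consistent with geographic structure.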
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
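Treating location as a discrete character means that movement along a branch of length <math>t</math> follows a continuous-time Markov chain with rate matrix <math>Q</math>, so the transition probabilities are <math>P(t) = e^{Qt}</math>. The sketch below computes this for a hypothetical two-region example with asymmetric rates; the rate values, the function name, and the truncated Taylor series used for the matrix exponential are illustrative assumptions rather than the method of any specific software package.

```python
def transition_probabilities(Q, t, terms=60):
    """P(t) = exp(Q * t) via a truncated Taylor series.

    Adequate for small matrices and moderate Q * t; production code would
    normally use a dedicated matrix-exponential routine instead.
    """
    k = len(Q)
    P = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    term = [row[:] for row in P]               # current series term, starts at I
    for m in range(1, terms):
        # term <- term * (Q * t) / m
        term = [[sum(term[i][x] * Q[x][j] for x in range(k)) * t / m
                 for j in range(k)] for i in range(k)]
        P = [[P[i][j] + term[i][j] for j in range(k)] for i in range(k)]
    return P


# Hypothetical asymmetric rates (per year): A -> B at 0.5, B -> A at 0.1.
Q = [[-0.5, 0.5],
     [0.1, -0.1]]
P = transition_probabilities(Q, t=1.0)
# P[0][1]: probability that a lineage in A is found in B one year later.
# P[1][0]: probability that a lineage in B is found in A one year later.
```

Because the rates are asymmetric, <code>P[0][1]</code> exceeds <code>P[1][0]</code>; each row of <code>P</code> sums to one.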
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected by and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus conferred by previous infection with a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
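One way such a relationship between sequence distance and cross-immunity might be encoded is sketched below. The exponential decay of protection with Hamming distance, the choice to use only the closest previously encountered strain, and the parameter <code>decay</code> are all assumptions made for illustration; published models use a variety of functional forms.

```python
import math


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def infection_risk(challenge, history, decay=0.5):
    """Relative risk that a host with the given infection history is infected
    by the challenge strain.

    Assumed form: protection = exp(-decay * d), where d is the Hamming
    distance to the closest strain in the host's history.
    """
    if not history:
        return 1.0                      # naive host: no protection
    d = min(hamming(challenge, past) for past in history)
    return 1.0 - math.exp(-decay * d)


# Hypothetical four-site strains.
risk_same = infection_risk("ACGT", ["ACGT"])   # identical strain seen before
risk_near = infection_risk("ACGA", ["ACGT"])   # one mutation away
risk_far = infection_risk("TGCA", ["ACGT"])    # four mutations away
```

Under this assumed form, a host is fully protected against a strain it has already seen, and risk rises towards that of a naive host as the challenge strain accumulates mutations.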
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection primarily caused by the viruses [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]]. Influenza A can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza, showing a single predominant trunk lineage that persists through time and side branches that persist for only 1-5 years before going extinct (see {{See Figure|flutree}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:Flutree.png|300px|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09>{{cite pmid|PMC2740835}}</ref>]]
===Selective pressures===
Phylodynamic techniques have provided insight into the relative selective effects of mutations at different sites and in different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution at the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations at these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations at these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes have little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> suggesting that H1N1 undergoes less adaptive evolution than H3N2. This hypothesis is supported by empirical patterns of antigenic evolution; there have been 9 vaccine updates recommended by the [[wp: World Health Organization | WHO]] for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.<ref name=IRD>{{cite pmid|22260278}}</ref> Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.<ref name=Wolf06>{{cite pmid|17067369}}</ref><ref name=Bhatt11>{{cite pmid|21415025}}</ref> The underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
===Circulation patterns===
The extremely rapid turnover of the influenza population means that the rate of geographic spread of influenza lineages must also be, to some extent, rapid.
==Phylodynamics of HIV==
==References==
<references/>
872
2012-05-31T15:29:46Z
Erikvolz
8
Draft of HIV section: global diversity, growth rates, phylogenetic clustering, selection and adaptation
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogeny]] of virus.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term ''phylodynamics'', postulated that virus phylogenies “... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics”.
Grenfell et al. identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. ]]
When a virus population passes through repeated [[wp:Selective sweep | selective sweeps]], genetic diversity is highly constrained, resulting in a phylogeny which is ladder-like; branches of the phylogeny rapidly merge with the trunk of the phylogeny.
Repeated selective sweeps may result from [[wp:Herd immunity | herd immunity]] and a limited supply of susceptible hosts.
The phylogeny of [[wp:influenza | influenza virus]] bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of Influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection and repeated bottlenecks, which is reflected in the phylogeny of [[wp:HIV | HIV]] (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
When a viral population grows very rapidly, the phylogeny of the virus will have external branches that appear very long relative to branches on the interior of the tree (see section [[#Phylodynamics of HIV | Phylodynamics of HIV ]]).
Such a tree is said to be “star-like” {{Citation needed}}, and such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
Population structure of the host population, especially spatial structure, may result in a highly differentiated virus population.
In this case, viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV<ref name=Volz09/>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] may reveal these otherwise hidden transmission patterns more directly.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Because antibiotics are ineffective against viruses, [[wp: antiviral drug | antiviral treatment]] is a common approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | Oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have been used to model immune selection and [[wp:Antigenic shift | antigenic shifts]] {{Citation needed}}.
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time before sample collection at which the first two of a sample of <math>n</math> gene copies share a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected at which two gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
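The simulation procedure just described can be sketched in a few lines of Python. This is a minimal illustration under the stated assumptions (neutrality, constant size <math>N</math>, no population structure); the function name <code>simulate_coalescent</code> and the nested-tuple representation of the topology are choices made for this example only.

```python
import random
from math import comb


def simulate_coalescent(n, N, rng=None):
    """Simulate a neutral, constant-size coalescent for n sampled lineages.

    Returns (tree, intervals): tree is a nested tuple of tip labels and
    intervals holds the waiting times T_n, T_{n-1}, ..., T_2 in that order.
    """
    rng = rng or random.Random()
    lineages = [str(i) for i in range(n)]
    intervals = []
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time with k lineages is exponential with rate C(k,2)/N
        intervals.append(rng.expovariate(comb(k, 2) / N))
        # Pick two lineages uniformly at random and merge them
        i, j = rng.sample(range(k), 2)
        merged = (lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0], intervals
```

Summing the returned intervals gives the TMRCA of one simulated sample; averaging over many replicates approximates its expectation.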
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are :
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because <math>1 - 1/74 \approx 99\% </math>.
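The closed form for the expected TMRCA can be checked numerically by summing the expected internode intervals directly. The helper name <code>expected_tmrca</code> below is hypothetical and is introduced only for this check.

```python
from math import comb


def expected_tmrca(n, N):
    """Sum of expected intervals 1/lambda_k = N / C(k,2) for k = 2..n."""
    return sum(N / comb(k, 2) for k in range(2, n + 1))


# Fraction of the upper bound 2N reached by a sample of 74 sequences
fraction = expected_tmrca(74, 1.0) / 2.0
```

Here <code>fraction</code> equals <math>1 - 1/74</math> up to floating-point error, matching the figure quoted for the subtype-B example.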
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
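Coalescent times under exponential growth can be simulated with the time rescaling given above. For <math>N(t) = N_0 e^{-rt}</math> the rescaled time is <math>\tau(t) = (e^{rt} - 1)/(r N_0)</math>, which inverts to <math>t = \ln(1 + r N_0 \tau)/r</math>. The sketch below draws waiting times on the rescaled axis, where <math>k</math> lineages coalesce at rate <math>{k \choose 2}</math>, and maps them back to real time. The function name is an assumption made for this example.

```python
import random
from math import comb, log


def coalescent_times_exp_growth(n, N0, r, rng=None):
    """Coalescent event times (before sampling) when N(t) = N0 * exp(-r t).

    Times are drawn in rescaled units, where k lineages coalesce at rate
    C(k,2), then mapped back with t = log(1 + r * N0 * tau) / r.
    """
    rng = rng or random.Random()
    tau = 0.0
    times = []
    for k in range(n, 1, -1):
        tau += rng.expovariate(comb(k, 2))
        times.append(log(1.0 + r * N0 * tau) / r)
    return times
```

With a large <math>r N_0</math>, the first coalescent events tend to fall far from the sampling time, which is the star-like signature described in the text.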
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
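Rearranging the relationship above gives <math>R_0 = 1 + rD</math>. A minimal sketch of the conversion follows; the function name and the numbers in the usage line are purely illustrative and are not taken from any study.

```python
def r0_from_growth_rate(r, D):
    """Reproduction number implied by r = (R0 - 1) / D."""
    return 1.0 + r * D


# Hypothetical inputs: growth rate 0.5 per unit time, serial interval 2 units
example_r0 = r0_from_growth_rate(0.5, 2.0)
```

The growth rate and the serial interval must be expressed in the same time units for the result to be meaningful.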
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a similar form to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/(2\beta)</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
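To see how the coalescent rate changes over an SIR epidemic, the equations above can be integrated numerically and the rate <math>{n \choose 2} 2 f(t) / I^2(t)</math> evaluated at each step. The sketch below uses a simple forward-Euler scheme; the function name, the step size and the parameter values in the usage note are assumptions chosen for illustration, and time runs forward from the start of the epidemic rather than backward from sampling.

```python
from math import comb


def sir_coalescent_rates(beta, gamma, S0, I0, n, t_end, dt=0.01):
    """Forward-Euler SIR integration with the coalescent rate at each step.

    Returns a list of (t, S, I, rate) tuples, where
    rate = C(n,2) * 2 * f(t) / I(t)^2 and f(t) = beta * S * I.
    """
    S, I = S0, I0
    rows = []
    for step in range(int(round(t_end / dt)) + 1):
        f = beta * S * I  # incidence of new infections
        rows.append((step * dt, S, I, comb(n, 2) * 2.0 * f / I ** 2))
        # Euler update of dS/dt = -f and dI/dt = f - gamma * I
        S, I = S - f * dt, I + (f - gamma * I) * dt
    return rows
```

For example, <code>sir_coalescent_rates(2.0, 1.0, 0.999, 0.001, 10, 5.0)</code> starts with a rate close to <math>{10 \choose 2} 2\beta / I(0)</math>, as in the early-epidemic approximation.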
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
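The label-scrambling test can be sketched as follows, using Fitch parsimony to count transitions. This is an illustrative implementation, not the procedure of any particular study: the tree is assumed to be a strictly binary nested tuple of tip names, and all function names are hypothetical.

```python
import random


def fitch_transitions(tree, labels):
    """Minimum number of state changes on a binary tree (Fitch parsimony)."""
    def visit(node):
        if isinstance(node, str):
            return {labels[node]}, 0
        (left_set, left_n), (right_set, right_n) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_n + right_n
        # No shared state: one transition is required at this node
        return left_set | right_set, left_n + right_n + 1
    return visit(tree)[1]


def tips(tree):
    """Tip names of the tree, in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return tips(tree[0]) + tips(tree[1])


def permutation_p_value(tree, labels, n_perm=1000, rng=None):
    """Observed transition count and a one-sided permutation p-value."""
    rng = rng or random.Random()
    observed = fitch_transitions(tree, labels)
    names = tips(tree)
    states = [labels[name] for name in names]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(states)
        if fitch_transitions(tree, dict(zip(names, states))) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value indicates fewer transitions than expected when labels are assigned to tips at random, which is the signature of population structure described above.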
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
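For two regions, the asymmetric movement model has a closed-form solution, which makes the idea of a geographic transition matrix concrete. The sketch below treats location as a two-state continuous-time Markov chain; the function and parameter names are assumptions for this example, and real analyses typically involve more than two regions and estimate the rates jointly with the phylogeny.

```python
from math import exp


def two_region_transition_probs(rate_ab, rate_ba, t):
    """Transition probabilities over time t for a two-region Markov chain.

    rate_ab is the rate at which a lineage in A moves to B and rate_ba the
    reverse. Returns [[P(A->A), P(A->B)], [P(B->A), P(B->B)]].
    """
    total = rate_ab + rate_ba
    decay = exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return [[1.0 - p_ab, p_ab], [p_ba, 1.0 - p_ba]]
```

Along a long branch the probabilities approach the stationary distribution, so a lineage is found in region B with probability <code>rate_ab / (rate_ab + rate_ba)</code> regardless of where it started.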
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory-usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
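One way to tie cross-immunity to sequence distance is sketched below. The linear form of the risk function and the <code>per_site_escape</code> parameter are assumptions made purely for illustration; published models use a variety of functional forms, and none of the names here come from a specific model.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def infection_risk(challenge, history, per_site_escape=0.25):
    """Hypothetical risk of infection given a host's infection history.

    Assumed form: risk grows linearly with the Hamming distance to the
    closest previously encountered strain, capped at 1. A host with no
    history is fully susceptible.
    """
    if not history:
        return 1.0
    closest = min(hamming(challenge, past) for past in history)
    return min(1.0, per_site_escape * closest)
```

In an individual-based simulation, a function of this kind would be evaluated at each contact between an infected host and a previously exposed host to decide whether transmission succeeds.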
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection primarily caused by the viruses [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]], which can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza, showing a single predominant trunk lineage that persists through time and side branches that persist for only 1-5 years before going extinct (see {{See Figure|flutree}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:Flutree.png|300px|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09>{{cite pmid|PMC2740835}}</ref>]]
===Selective pressures===
Phylodynamic techniques have provided insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution at the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes have little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> suggesting that H1N1 undergoes less adaptive evolution than H3N2. This hypothesis is supported by empirical patterns of antigenic evolution; there have been 9 vaccine updates recommended by the [[wp: World Health Organization | WHO]] for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.<ref name=IRD>{{cite pmid|22260278}}</ref> Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.<ref name=Wolf06>{{cite pmid|17067369}}</ref><ref name=Bhatt11>{{cite pmid|21415025}}</ref> The underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
===Circulation patterns===
The extremely rapid turnover of the influenza population means that the rate of geographic spread of influenza lineages must be also, to some extent, rapid.
==Phylodynamics of HIV==
The global diversity of HIV-1 group M is shaped by its [[wp:Origin_of_AIDS |origins]] in Central Africa around the turn of the 20th century. The epidemic underwent explosive growth throughout the early 20th century with multiple radiations out of Central Africa. Multiple [[wp:Founder_effect | founder events]] have given rise to distinct [[wp:HIV_subtypes | subtypes]] which predominate in different parts of the world. Subtype B is most prevalent in North America and Western Europe, while A and C, which account for more than half of infections worldwide, are common in Africa. <ref name=osmanov02>{{cite pmid | 11832690}}</ref> Transmissibility of virus, virulence, effectiveness of antiretroviral therapy, and pathogenesis may differ slightly between subtypes. <ref name=taylor08>{{cite pmid | 18971501}}</ref>
The rapid growth of the HIV epidemic is reflected in phylogenies of HIV, which are star-like. Most coalescent events occur in the distant past. Figure 6 shows an example based on 173 HIV-1 sequences from the Democratic Republic of Congo. <ref name=yusim01>{{cite pmid | 11405933}}</ref>
The rate of exponential growth of HIV in Central Africa in the early 20th century, preceding the establishment of modern subtypes, has been estimated:
* Yusim et al.<ref name=yusim01>{{cite pmid | 11405933 }}</ref> estimated <math>r= 0.1676</math> using coalescent methods and a parametric model of exponential growth.
* Similar conclusions were reached using nonparametric estimates of <math>N_e</math>.<ref name=strimmer01>{{cite pmid | 11719579}}</ref>
Different epidemic growth rates have been estimated for different subtypes using coalescent approaches. The early growth of subtype B in North America was quite rapid, with estimates ranging from <math>r=0.48</math> to <math>r=0.834</math> transmissions per infection per year.<ref name=walker>{{cite pmid | 15737910}}</ref><ref name=robbins03>{{cite pmid | 12743293}}</ref> The duration of exponential growth in North America was relatively short, with saturation occurring in the mid- and late-1980s.<ref name=Volz09>{{cite pmid|19797047}}</ref> The early growth rate of the more common subtype C in Africa is lower (approximately 0.27 transmissions per infection per year), although exponential growth has continued for a longer period of time.<ref name=grassly99>{{cite pmid | 9927440}}</ref><ref name=walker>{{cite pmid | 15737910}}</ref> HIV-1 group O, which is relatively rare and is found mainly in Cameroon, has grown at a lower rate (approximately 0.068 transmissions per infection per year).<ref name=lemey04>{{cite pmid | 15280223 }}</ref>
HIV-1 sequences sampled over a span of five decades have been used with relaxed molecular clock phylogenetic methods to estimate the time of zoonosis around the early 20th century <ref name=worobey08>{{cite pmid | 18833279}}</ref>. The estimated TMRCA for HIV-1 coincides with the appearance of the first densely populated large cities in Central Africa. Similar methods have been used to estimate the time that HIV originated in different parts of the world. The origin of subtype B in North America is estimated to be in the 1960s, where it went undetected until the AIDS epidemic in the 1980s. <ref name=robbins03>{{cite pmid | 12743293}}</ref> There is evidence that progenitors of modern subtype B originally colonized the Caribbean before undergoing multiple radiations to North and South America. <ref name=junqueira11>{{cite pmid | 22132104}}</ref> Subtype C originated around the same time in Africa.<ref name=walker>{{cite pmid | 15737910}}</ref>
At smaller geographical and time scales, phylogenies of HIV may reflect epidemiological dynamics related to risk behavior and [[wp:sexual_network | sexual networks]]. Very dense sampling of viral sequences within cities over short periods of time has given a detailed picture of HIV transmission patterns in modern epidemics. Sequencing of virus from newly diagnosed patients is now routine in many countries for surveillance of [[wp:Drug_resistance | drug resistance]] mutations, which has yielded large databases of sequence data in those areas. Lewis et al. <ref name=lewis08>{{cite pmid | 18351795}}</ref> used such sequence data from [[wp:Men_who_have_sex_with_men | men who have sex with men]] in London, UK, and found evidence that transmission is highly concentrated in the brief period of [[wp:Primary_HIV_infection | primary HIV infection]] (PHI), which consists of approximately the first 6 months of the infectious period. Volz et al. {{Citation needed}} found that patients who were recently infected were more likely to harbor virus that is phylogenetically clustered with other samples from recently infected patients, reflecting that transmission had occurred recently and that transmission is more likely to occur during early infection. Such phylogenetic patterns arise from epidemiological dynamics which feature an early period of intensified transmission during PHI, but also depend on sampling a large fraction of extant viral lineages.
Dense sampling of HIV sequences has enabled the estimation of transmission rates between different risk groups. For example, Paraskevis et al. <ref name=paraskevis09>{{cite pmid | 19457244}}</ref> estimated transmission rates between countries in Europe, and Oster et al. <ref name=oster11>{{cite pmid | 21866038}}</ref> showed disparate transmissions within racial and demographic groups within Mississippi, USA.
Purifying immune selection dominates evolution of HIV within hosts, but evolution between hosts is largely decoupled from intra-host evolution <ref name=rambaut04>{{cite pmid | 14708016}}</ref>. Intra-host HIV phylogenies show continual fixation of advantageous mutations, while population-level HIV phylogenies reflect continual diversification. Immune selection has relatively little influence on HIV phylogenies at the population level because
* There is an extreme bottleneck in viral diversity at the time of sexual transmission.<ref name=keele10>{{cite pmid | 20543609}}</ref>
* Transmission tends to occur early in infection before immune selection has had a chance to operate <ref name=cohen11>{{cite pmid | 21591946}}</ref>.
* Replicative fitness, measured in transmissions per host, is largely determined by factors extrinsic to the virus, including heterogeneous sexual and drug-use behaviors in the host population.
There is some evidence of HIV [[wp:Virulence#Evolution |adaptation towards intermediate virulence]] which can maximize transmission potential between hosts. <ref name=fraser07>{{cite pmid | 17954909}}</ref> This hypothesis is predicated on several observations:
* The set-point [[wp:Viral_load | viral load]] (SPVL), which is the quasi-equilibrium titre of viral particles in the blood during [[wp:HIV#Chronic_infection | chronic infection]], is correlated with the time until AIDS. SPVL is therefore a useful proxy for virulence. <ref name=korenromp09>{{cite pmid | 19536329}}</ref>
* SPVL is correlated between HIV donor and recipients in transmission pairs. <ref name=hollingsworth10>{{cite pmid | 20463808}}</ref>
* The transmission probability per sexual act is positively correlated with viral load. Thus, there is a trade-off between the [[wp:Intensity_function | intensity]] of transmission and the lifespan of the host. <ref name=baeten03>{{cite pmid | 15043213}}</ref><ref name=fiore97>{{cite pmid | 9233454}}</ref>
* Recent work <ref name=shirreff11>{{cite pmid | 22022243}}</ref> has shown that given empirical values for transmissibility of HIV and lifespan of hosts as a function of SPVL, adaptation of HIV towards optimum SPVL could be expected over 100-150 years.
==References==
<references/>
943
2012-06-09T22:50:50Z
Katia.koelle
9
Some minor edits; also added first version of flu simulation model section.
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogenies]] of viruses.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term "phylodynamics", postulated that virus phylogenies "... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics".
Grenfell and colleagues identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. Tip labels A and B in this case denote spatial locations from which viral samples were isolated. ]]
The first pattern, addressing the effect of immune selection on the topology of a viral phylogeny, is exemplified by contrasting the trees of [[wp:influenza | influenza virus]] and [[wp:HIV | HIV]]’s antigenic glycoproteins. The phylogeny of influenza virus’s [[wp:Hemagglutinin (influenza) | hemagglutinin]] protein bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]). Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection, which is reflected in the phylogeny of HIV’s envelope protein inferred from sequences isolated from different individuals in a population (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The second pattern, addressing the effect of viral population growth on viral phylogenies, holds that expanding viral populations have “star-like” {{Citation needed}} trees, with long external branches relative to internal branches. Such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size. Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
The third pattern addresses the effect that population structure of the host, especially spatial structure, can have on differentiating the viral population. Viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are expected to be more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math>R_0</math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV<ref name=Volz09/>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Without recourse to antibiotics, [[wp: antiviral drug | antiviral treatment]] is a popular approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have most commonly been used to model immune selection (see section [[#Phylodynamics of Influenza | Phylodynamics of influenza]]).
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> prior to when the sample was collected when 2 gene copies in the sample ''coalesce'' (find a common ancestor) (see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rate <math>\lambda_{n-1}\cdots \lambda_2</math> after intervals <math>T_{n-1}\cdots T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
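The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under the stated assumptions (neutrality, constant size <math>N</math>, no population structure); the function names, the seed, and the values <code>n = 10</code> and <code>N = 1000</code> are hypothetical choices made only for this example.

```python
import random
from math import comb


def simulate_coalescent_times(n, N, rng):
    """Draw the internode intervals T_n, ..., T_2 for a sample of n gene copies.

    Each interval T_k is exponential with rate lambda_k = C(k, 2) / N.
    """
    intervals = []
    for k in range(n, 1, -1):
        rate = comb(k, 2) / N
        intervals.append(rng.expovariate(rate))
    return intervals


def simulate_topology(n, rng):
    """Build a random topology (Newick string without branch lengths).

    At each coalescent event two lineages are picked uniformly at random
    and merged, until a single lineage (the MRCA) remains.
    """
    lineages = [str(i) for i in range(n)]
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        merged = "(" + lineages[i] + "," + lineages[j] + ")"
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0] + ";"


# Hypothetical example values.
rng = random.Random(1)
n, N = 10, 1000
replicates = 20000

# Monte Carlo estimate of the expected TMRCA (sum of the internode intervals).
mean_tmrca = sum(
    sum(simulate_coalescent_times(n, N, rng)) for _ in range(replicates)
) / replicates
newick = simulate_topology(n, rng)
```

Averaged over many replicates, <code>mean_tmrca</code> should lie close to the sum of the reciprocals of the rates <math>\lambda_n, \ldots, \lambda_2</math>.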
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because <math>1 - 1/74 \approx 99\% </math>.
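The arithmetic behind this argument can be checked directly. The sketch below sums the expected internode intervals and compares the result with the closed form <math>2N(1 - 1/n)</math>; the function names are hypothetical, and <math>N</math> is set to 1 so that times are expressed in units of <math>N</math> generations.

```python
from math import comb


def expected_tmrca(n, N):
    """Sum of expected internode intervals, E[T_k] = N / C(k, 2), for k = 2..n."""
    return sum(N / comb(k, 2) for k in range(2, n + 1))


def fraction_of_upper_bound(n):
    """Expected sample TMRCA as a fraction of the large-sample limit 2N."""
    return 1 - 1 / n


# Sample size used in the example above (74 sequences); N = 1 is a unit choice.
tmrca_74 = expected_tmrca(74, 1.0)
fraction_74 = fraction_of_upper_bound(74)
```

For 74 sequences the fraction is 73/74, or roughly 0.986, which is the basis for the approximation quoted above.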
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
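To make this concrete, coalescent times under exponential growth can be sampled by inverting the cumulative coalescent rate, which has a closed form when <math>N(t) = N_0 e^{-rt}</math>. The following is a sketch only; the function name, the seed, and the parameter values (<code>n = 20</code>, <code>N0 = 10000</code>, <code>r = 1</code>) are hypothetical.

```python
import math
import random


def coalescent_times_exponential_growth(n, N0, r, rng):
    """Sample coalescent event times, measured backward from sampling (t = 0),
    for a population of size N(t) = N0 * exp(-r * t) at time t in the past.

    With k lineages the rate is C(k, 2) * exp(r * t) / N0, so the cumulative
    rate from t0 to t1 is C(k, 2) * (exp(r * t1) - exp(r * t0)) / (r * N0).
    Setting this equal to a unit-rate exponential draw and solving gives t1.
    """
    t = 0.0
    times = []
    for k in range(n, 1, -1):
        pairs = math.comb(k, 2)
        e = rng.expovariate(1.0)
        t = math.log(math.exp(r * t) + r * N0 * e / pairs) / r
        times.append(t)
    return times


# Hypothetical example values.
rng = random.Random(7)
n, N0, r = 20, 10000.0, 1.0
runs = [coalescent_times_exponential_growth(n, N0, r, rng) for _ in range(500)]

# Average position of the most recent coalescent event as a fraction of tree depth.
mean_first_fraction = sum(t[0] / t[-1] for t in runs) / len(runs)
```

For comparison, under constant size with <math>n = 20</math> the expected first interval is only <math>1 / \left({20 \choose 2} \cdot 2(1 - 1/20)\right) \approx 0.003</math> of the expected TMRCA, so a much larger value of <code>mean_first_fraction</code> reflects the long external branches of a growing population.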
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
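The relationship between <math>r</math>, <math>D</math> and <math>R_0</math> given above is simple to apply in both directions. The snippet below is a sketch with hypothetical function names and purely illustrative input values; the numbers are not the estimates reported by Fraser et al.

```python
def reproduction_number(r, D):
    """Solve r = (R0 - 1) / D for R0, given growth rate r and duration D."""
    return 1.0 + r * D


def growth_rate(R0, D):
    """Exponential growth rate implied by R0 and duration D."""
    return (R0 - 1.0) / D


# Illustrative values only: r = 0.1 per day, D = 3 days.
example_R0 = reproduction_number(0.1, 3.0)
```

The units of <code>r</code> and <code>D</code> must agree (for instance, per day and days) for the product to be dimensionless.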
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a similar form to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/2\beta</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
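As an illustration, the SIR equations can be integrated numerically and the coalescent rate evaluated along the resulting trajectory. The sketch below uses a hand-written fourth-order Runge-Kutta step to stay self-contained; the function names and the parameter values (<code>beta = 0.5</code>, <code>gamma = 0.2</code>, initial fraction infected 0.001) are hypothetical.

```python
from math import comb


def sir_trajectory(beta, gamma, S0, I0, dt, steps):
    """Integrate dS/dt = -beta*S*I and dI/dt = beta*S*I - gamma*I (RK4).

    Returns a list of (time, S, I) tuples, starting at time 0.
    """
    def deriv(S, I):
        new_infections = beta * S * I
        return -new_infections, new_infections - gamma * I

    S, I = S0, I0
    trajectory = [(0.0, S, I)]
    for step in range(1, steps + 1):
        k1 = deriv(S, I)
        k2 = deriv(S + 0.5 * dt * k1[0], I + 0.5 * dt * k1[1])
        k3 = deriv(S + 0.5 * dt * k2[0], I + 0.5 * dt * k2[1])
        k4 = deriv(S + dt * k3[0], I + dt * k3[1])
        S += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        I += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        trajectory.append((step * dt, S, I))
    return trajectory


def coalescent_rate(n, beta, S, I):
    """lambda_n = C(n, 2) * 2 * f / I^2, with incidence f = beta * S * I."""
    f = beta * S * I
    return comb(n, 2) * 2.0 * f / I ** 2


# Hypothetical example values.
beta, gamma, n = 0.5, 0.2, 10
trajectory = sir_trajectory(beta, gamma, S0=0.999, I0=0.001, dt=0.1, steps=1000)
rates = [coalescent_rate(n, beta, S, I) for _, S, I in trajectory]
```

At the start of the trajectory, where <math>S \approx 1</math>, the computed rate is close to <math>{n \choose 2} 2\beta / I(0)</math>, in line with the early-epidemic approximation above.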
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
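The label-scrambling test can be sketched as follows, here using Fitch parsimony to count transitions. This is one possible implementation, not necessarily the procedure of the cited study; the tree encoding (nested 2-tuples with tip names at the leaves), the function names, and the toy eight-tip tree are assumptions made for illustration.

```python
import random


def fitch_transitions(tree, labels):
    """Fitch parsimony on a rooted binary tree.

    tree: nested 2-tuples with tip names (str) at the leaves.
    labels: dict mapping tip name -> location.
    Returns (possible root states, minimum number of location changes).
    """
    if isinstance(tree, str):
        return {labels[tree]}, 0
    left_set, left_cost = fitch_transitions(tree[0], labels)
    right_set, right_cost = fitch_transitions(tree[1], labels)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def permutation_p_value(tree, labels, replicates, rng):
    """Compare the observed transition count with counts on scrambled tip labels."""
    observed = fitch_transitions(tree, labels)[1]
    tips = list(labels)
    states = [labels[tip] for tip in tips]
    as_few = 0
    for _ in range(replicates):
        shuffled = states[:]
        rng.shuffle(shuffled)
        count = fitch_transitions(tree, dict(zip(tips, shuffled)))[1]
        if count <= observed:
            as_few += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (as_few + 1) / (replicates + 1)


# Toy example: two clades, each sampled entirely from one location.
toy_tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
toy_labels = {name: ("A" if name.startswith("a") else "B")
              for name in ["a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4"]}
observed, p_value = permutation_p_value(toy_tree, toy_labels, 2000, random.Random(3))
```

In this toy tree only 2 of the 70 possible arrangements of four 'A' and four 'B' labels give a single transition, so the estimated p-value should be small, consistent with strong spatial structure.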
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
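For the simplest case of two regions with asymmetric movement rates, the probability that a lineage has changed location after a given time has a closed form, which makes the analogy with nucleotide substitution models explicit. The sketch below is restricted to two states for that reason (more regions would require a matrix exponential); the function name and the example rates are hypothetical.

```python
import math


def two_region_transition_probabilities(rate_ab, rate_ba, t):
    """Transition probabilities of a two-state continuous-time Markov chain.

    rate_ab: rate at which a lineage in region A moves to region B.
    rate_ba: rate at which a lineage in region B moves to region A.
    Returns [[P(A->A), P(A->B)], [P(B->A), P(B->B)]] after elapsed time t.
    """
    total = rate_ab + rate_ba
    decay = math.exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return [[1.0 - p_ab, p_ab], [p_ba, 1.0 - p_ba]]


# Hypothetical asymmetric rates (per year) and a branch of length 2 years.
P = two_region_transition_probabilities(0.3, 0.1, 2.0)
```

With these rates a lineage starting in 'A' is more likely to be found in 'B' after two years than a lineage starting in 'B' is to be found in 'A', reflecting the asymmetry of the transition matrix.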
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus conferred by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
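The ingredients of such a sequence-based forward model can be sketched as follows. The functional form of cross-immunity used here (protection decaying geometrically with Hamming distance, with the most similar past strain dominating) is an assumption chosen for illustration, as are the function names and parameter values; published models differ in these details.

```python
import random


def hamming_distance(a, b):
    """Number of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cross_immunity(strain, history, decay):
    """Protection against `strain` given a host's previously seen strains.

    ASSUMPTION: protection equals decay ** distance to the closest past strain,
    so identical strains give full protection (1.0) and distant strains little.
    """
    if not history:
        return 0.0
    return max(decay ** hamming_distance(strain, past) for past in history)


def mutate(strain, per_site_rate, rng, alphabet="ACGT"):
    """Mutate each site independently with probability per_site_rate."""
    out = []
    for base in strain:
        if rng.random() < per_site_rate:
            out.append(rng.choice([x for x in alphabet if x != base]))
        else:
            out.append(base)
    return "".join(out)


# Hypothetical example: a host previously infected with two strains.
rng = random.Random(11)
protection = cross_immunity("ACGT", ["ACGA", "TTTT"], decay=0.5)
mutant = mutate("ACGTACGT", 0.1, rng)
```

In an individual-based simulation, a value such as <code>protection</code> would scale down the probability that a contact with an infected host results in infection, and sequences produced by <code>mutate</code> over the course of the simulation could then be used to construct phylogenies.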
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection primarily caused by viruses [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]], which can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza, showing a single predominant trunk lineage that persists through time and side branches that persist for only 1-5 years before going extinct (see {{See Figure|flutree}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:Flutree.png|300px|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09>{{cite pmid|PMC2740835}}</ref>]]
===Selective pressures===
Phylodynamic techniques have provided insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution at the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes have little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> suggesting that H1N1 undergoes less adaptive evolution than H3N2. This hypothesis is supported by empirical patterns of antigenic evolution; there have been 9 vaccine updates recommended by the [[wp: World Health Organization | WHO]] for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.<ref name=IRD>{{cite pmid|22260278}}</ref> Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.<ref name=Wolf06>{{cite pmid|17067369}}</ref><ref name=Bhatt11>{{cite pmid|21415025}}</ref> The underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
===Circulation patterns===
The extremely rapid turnover of the influenza population means that the geographic spread of influenza lineages must also, to some extent, be rapid.
==Phylodynamics of HIV==
The global diversity of HIV-1 group M is shaped by its [[wp:Origin_of_AIDS |origins]] in Central Africa around the turn of the 20th century. The epidemic underwent explosive growth throughout the early 20th century with multiple radiations out of Central Africa. Multiple [[wp:Founder_effect | founder events]] have given rise to distinct [[wp:HIV_subtypes | subtypes]] which predominate in different parts of the world. Subtype B is most prevalent in North America and Western Europe, while A and C, which account for more than half of infections worldwide, are common in Africa. <ref name=osmanov02>{{cite pmid | 11832690}}</ref> Transmissibility of virus, virulence, effectiveness of antiretroviral therapy, and pathogenesis may differ slightly between subtypes. <ref name=taylor08>{{cite pmid | 18971501}}</ref>
The rapid growth of the HIV epidemic is reflected in phylogenies of HIV, which are star-like. Most coalescent events occur in the distant past. Figure 6 shows an example based on 173 HIV-1 sequences from the Democratic Republic of Congo. <ref name=yusim01>{{cite pmid | 11405933}}</ref>
The rate of exponential growth of HIV in Central Africa in the early 20th century preceding the establishment of modern subtypes has been estimated:
* Yusim et al.<ref name=yusim01>{{cite pmid | 11405933 }}</ref> estimated <math>r= 0.1676</math> using coalescent methods and a parametric model of exponential growth
* Similar conclusions were reached using nonparametric estimates of <math>N_e</math><ref name=strimmer01>{{cite pmid | 11719579}}</ref>
Different epidemic growth rates have been estimated for different subtypes using coalescent approaches. The early growth of subtype B in North America was quite rapid, with estimates ranging from <math>r=0.48</math> to <math>r=0.834</math> transmissions per infection per year.<ref name=walker>{{cite pmid | 15737910}}</ref><ref name=robbins03>{{cite pmid | 12743293}}</ref> The duration of exponential growth in North America was relatively short, with saturation occurring in the mid- and late-1980s. <ref name=Volz09>{{cite pmid|19797047}}</ref> The early growth rate of the more common subtype C in Africa is lower (approximately 0.27 per infection per year), although exponential growth has continued for a longer period of time.<ref name=grassly99>{{cite pmid | 9927440}}</ref><ref name=walker>{{cite pmid | 15737910}}</ref> HIV-1 group O, which is relatively rare and is found mainly in Cameroon, has grown at a lower rate (approximately 0.068 transmissions per infection per year). <ref name=lemey04> {{cite pmid | 15280223 }}</ref>
HIV-1 sequences sampled over a span of five decades have been used with relaxed molecular clock phylogenetic methods to estimate the time of zoonosis around the early 20th century <ref name=worobey08>{{cite pmid | 18833279}}</ref>. The estimated TMRCA for HIV-1 coincides with the appearance of the first densely populated large cities in Central Africa. Similar methods have been used to estimate the time that HIV originated in different parts of the world. The origin of subtype B in North America is estimated to be in the 1960s, where it went undetected until the AIDS epidemic in the 1980s. <ref name=robbins03>{{cite pmid | 12743293}}</ref> There is evidence that progenitors of modern subtype B originally colonized the Caribbean before undergoing multiple radiations to North and South America. <ref name=junqueira11>{{cite pmid | 22132104}}</ref> Subtype C originated around the same time in Africa.<ref name=walker>{{cite pmid | 15737910}}</ref>
At smaller geographical and time scales, phylogenies of HIV may reflect epidemiological dynamics related to risk behavior and [[wp:sexual_network | sexual networks]]. Very dense sampling of viral sequences within cities over short periods of time has given a detailed picture of HIV transmission patterns in modern epidemics. Sequencing of virus from newly diagnosed patients is now routine in many countries for surveillance of [[wp:Drug_resistance | drug resistance]] mutations, which has yielded large databases of sequence data in those areas. Lewis et al. <ref name=lewis08>{{cite pmid | 18351795}}</ref> used such sequence data from [[wp:Men_who_have_sex_with_men | men who have sex with men]] in London, UK, and found evidence that transmission is highly concentrated in the brief period of [[wp:Primary_HIV_infection | primary HIV infection]] (PHI), which consists of approximately the first 6 months of the infectious period. Volz et al. {{Citation needed}} found that patients who were recently infected were more likely to harbor virus that is phylogenetically clustered with other samples from recently infected patients, reflecting that transmission had occurred recently and that transmission is more likely to occur during early infection. Such phylogenetic patterns arise from epidemiological dynamics which feature an early period of intensified transmission during PHI, but also depend on sampling a large fraction of extant viral lineages.
Dense sampling of HIV sequences has enabled the estimation of transmission rates between different risk groups. For example, Paraskevis et al. <ref name=paraskevis09>{{cite pmid | 19457244}}</ref> estimated transmission rates between countries in Europe, and Oster et al. <ref name=oster11>{{cite pmid | 21866038}}</ref> showed disparate transmissions within racial and demographic groups within Mississippi, USA.
Purifying immune selection dominates evolution of HIV within hosts, but evolution between hosts is largely decoupled from intra-host evolution <ref name=rambaut04>{{cite pmid | 14708016}}</ref>. Intra-host HIV phylogenies show continual fixation of advantageous mutations, while population-level HIV phylogenies reflect continual diversification. Immune selection has relatively little influence on HIV phylogenies at the population level because
* There is an extreme bottleneck in viral diversity at the time of sexual transmission.<ref name=keele10>{{cite pmid | 20543609}}</ref>
* Transmission tends to occur early in infection before immune selection has had a chance to operate <ref name=cohen11>{{cite pmid | 21591946}}</ref>.
* Replicative fitness, measured in transmissions per host, is largely determined by factors extrinsic to the virus, including heterogeneous sexual and drug-use behaviors in the host population.
There is some evidence of HIV [[wp:Virulence#Evolution |adaptation towards intermediate virulence]] which can maximize transmission potential between hosts. <ref name=fraser07>{{cite pmid | 17954909}}</ref> This hypothesis is predicated on several observations:
* The set-point [[wp:Viral_load | viral load]] (SPVL), which is the quasi-equilibrium titre of viral particles in the blood during [[wp:HIV#Chronic_infection | chronic infection]], is correlated with the time until AIDS. SPVL is therefore a useful proxy for virulence. <ref name=korenromp09>{{cite pmid | 19536329}}</ref>
* SPVL is correlated between HIV donor and recipients in transmission pairs. <ref name=hollingsworth10>{{cite pmid | 20463808}}</ref>
* The transmission probability per sexual act is positively correlated with viral load. Thus, there is a trade-off between the [[wp:Intensity_function | intensity]] of transmission and the lifespan of the host. <ref name=baeten03>{{cite pmid | 15043213}}</ref><ref name=fiore97>{{cite pmid | 9233454}}</ref>
* Recent work <ref name=shirreff11>{{cite pmid | 22022243}}</ref> has shown that given empirical values for transmissibility of HIV and lifespan of hosts as a function of SPVL, adaptation of HIV towards optimum SPVL could be expected over 100-150 years.
==References==
<references/>
944
2012-06-09T23:22:47Z
Katia.koelle
9
minor edits, added section of flu forward models
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogenies]] of viruses.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term "phylodynamics", postulated that virus phylogenies "... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics".
Grenfell and colleagues identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. Tip labels A and B in this case denote spatial locations from which viral samples were isolated. ]]
The first pattern, addressing the effect of immune selection on the topology of a viral phylogeny, is exemplified by contrasting the trees of [[wp:influenza | influenza virus]] and [[wp:HIV | HIV]]’s surface proteins. The phylogeny of influenza virus’s [[wp:Hemagglutinin (influenza) | hemagglutinin]] protein bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]). Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection, which is reflected in the phylogeny of HIV’s envelope protein inferred from sequences isolated from different individuals in a population (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The second pattern, addressing the effect of viral population growth on viral phylogenies, holds that expanding viral populations have “star-like” {{Citation needed}} trees, with long external branches relative to internal branches. Such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size. Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
The third pattern addresses the effect that population structure of the host, especially spatial structure, can have on differentiating the viral population. Viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are expected to be more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. Because the common ancestor of the entire virus population can be no younger than the common ancestor of the sampled isolates, the age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population. In April 2009, genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
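As a rough illustration of how a substitution rate in real units of time leads to a date for the common ancestor, the sketch below uses root-to-tip regression. This is a simpler stand-in for the molecular clock models mentioned above and is not the method used by Fraser et al. The function name and the toy data are hypothetical.

```python
def root_to_tip_regression(sampling_dates, root_to_tip_distances):
    """Least-squares fit of genetic distance from the root against sampling date.

    Returns (rate, tmrca_date): the slope, in substitutions per site per year,
    and the date at which the fitted line crosses zero divergence.
    """
    n = len(sampling_dates)
    mean_t = sum(sampling_dates) / n
    mean_d = sum(root_to_tip_distances) / n
    sxx = sum((t - mean_t) ** 2 for t in sampling_dates)
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sampling_dates, root_to_tip_distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    tmrca_date = -intercept / rate  # x-intercept of the regression line
    return rate, tmrca_date


# Hypothetical toy data: four isolates sampled in 2009 (decimal years) whose
# divergence grows at exactly 0.003 substitutions/site/year from a root at 2009.0.
dates = [2009.25, 2009.28, 2009.30, 2009.33]
distances = [0.003 * (t - 2009.0) for t in dates]
rate, tmrca = root_to_tip_regression(dates, distances)
```

With real data the points scatter around the line, and the regression assumes a strict clock and a correctly rooted tree; Bayesian clock models relax both assumptions.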
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV<ref name=Volz09/>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Because antibiotics are ineffective against viruses, [[wp: antiviral drug | antiviral treatment]] is a common approach to the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
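To make the last of these statistics concrete, the following is a minimal sketch of Tajima's D computed directly from an alignment, using the standard formula that contrasts the mean number of pairwise differences with the number of segregating sites. The function name and toy alignment are hypothetical; the sketch assumes at least four aligned sequences of equal length with no gaps or ambiguous characters.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences (n >= 4)."""
    n = len(sequences)
    # S: number of segregating (polymorphic) alignment columns.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        return 0.0  # convention chosen for this sketch; D is undefined here
    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # Constants from Tajima's variance approximation.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Hypothetical alignments: every mutation is a singleton (excess of rare
# variants) versus two equally common haplotypes (excess of common variants).
d_singletons = tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"])
d_balanced = tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"])
```

An excess of rare variants gives a negative value and an excess of intermediate-frequency variants gives a positive value; interpreting either as evidence of selection or of demographic change requires further modeling.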
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have most commonly been used to model immune selection (see section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the waiting time, measured backwards from sample collection, until some pair out of a sample of <math>n</math> gene copies shares a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
After an interval <math>T_n</math>, measured backwards from when the sample was collected, 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), leaving <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
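The simulation procedure just described can be sketched as follows. This is a minimal illustration under the stated assumptions (neutrality, constant size, no structure); the function names and the nested-tuple representation of the tree are hypothetical choices made for this example.

```python
import random


def simulate_coalescent_intervals(n, pop_size, rng=random):
    """Draw the internode intervals T_n, ..., T_2 of the constant-size coalescent."""
    intervals = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0 / pop_size  # lambda_k = C(k, 2) / N
        intervals.append(rng.expovariate(rate))
    return intervals


def simulate_topology(n, rng=random):
    """Merge two lineages chosen uniformly at random until one remains.

    Returns a nested tuple whose leaves are the integers 0..n-1.
    """
    lineages = list(range(n))
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        merged = (lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0]


# Example: 10 sampled gene copies from a population of size N = 1000.
example_rng = random.Random(1)
example_intervals = simulate_coalescent_intervals(10, 1000.0, example_rng)
example_tmrca = sum(example_intervals)  # time to the MRCA of the sample
example_tree = simulate_topology(10, example_rng)
```

The intervals and the topology are drawn independently here, which is valid only because the neutral, unstructured coalescent makes every pair of lineages equally likely to merge.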
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
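The last equality follows from a telescoping sum. Because <math>\lambda_k = {k \choose 2}/N = k(k-1)/(2N)</math>, each expected interval can be split into partial fractions:
: <math> \frac{1}{\lambda_k} = \frac{2N}{k(k-1)} = 2N \left( \frac{1}{k-1} - \frac{1}{k} \right), \qquad \sum_{k=2}^{n} \frac{1}{\lambda_k} = 2N \left( 1 - \frac{1}{n} \right). </math>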
Two corollaries are:
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because <math>1 - 1/74 \approx 99\% </math>.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
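The time rescaling described above gives a simple way to simulate coalescent times under exponential growth. The sketch below is illustrative only; the function name and parameter values are hypothetical. It draws intervals in rescaled time, where the rate is <math>{k \choose 2}</math>, and maps them back through the inverse of <math> s = (e^{rt} - 1)/(r N_0) </math>.

```python
import math
import random


def simulate_growth_coalescent_times(n, n0, r, rng=random):
    """Coalescent times, measured backwards from sampling, when N(t) = n0 * exp(-r * t).

    In rescaled time s = integral_0^t dtau / N(tau) = (exp(r * t) - 1) / (r * n0),
    the process is a coalescent with rate C(k, 2) while k lineages remain.
    """
    s = 0.0
    times = []
    for k in range(n, 1, -1):
        s += rng.expovariate(k * (k - 1) / 2.0)
        times.append(math.log1p(r * n0 * s) / r)  # invert the rescaling
    return times


# Hypothetical example: 20 samples, present-day size 1e6, growth rate 1 per unit time.
growth_times = simulate_growth_coalescent_times(20, 1e6, 1.0, random.Random(3))
```

Because the mapping from rescaled time back to real time is logarithmic, the coalescent events pile up far from the present, which is the star-like shape described in the text.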
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
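Rearranging the relation above gives <math> R_0 = 1 + rD </math>. The helper below is a hypothetical one-line sketch of that conversion; the numerical inputs are made up for illustration and are not the values used by Fraser et al.

```python
def reproduction_number(growth_rate, serial_interval):
    """R_0 implied by r = (R_0 - 1) / D, i.e. R_0 = 1 + r * D.

    growth_rate and serial_interval must use the same unit of time.
    """
    return 1.0 + growth_rate * serial_interval


# Hypothetical inputs: r = 0.2 per day and D = 3 days give R_0 of about 1.6.
r0_example = reproduction_number(0.2, 3.0)
```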
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and in the effective population size of the virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well-studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a form similar to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/(2\beta)</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
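To see how the coalescent rate tracks an epidemic, the SIR equations can be integrated numerically and the pairwise rate <math> 2 \beta S(t) / I(t) </math> recorded along the way. The sketch below uses a plain forward Euler step for brevity; the function name, step size and parameter values are hypothetical, and a production analysis would use a proper ODE solver.

```python
def sir_coalescent_rates(beta, gamma, i0, t_max, dt=0.001):
    """Euler-integrate the SIR equations and record, at each step, the
    pairwise (n = 2) coalescent rate 2 * beta * S / I.

    Returns a list of (t, S, I, pair_rate) tuples.
    """
    s, i = 1.0 - i0, i0
    trajectory = []
    t = 0.0
    while t < t_max:
        pair_rate = 2.0 * beta * s / i
        trajectory.append((t, s, i, pair_rate))
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        s += ds * dt
        i += di * dt
        t += dt
    return trajectory


# Hypothetical parameters: beta = 2, gamma = 1 (so R_0 = 2), 0.01% initially infected.
sir_path = sir_coalescent_rates(beta=2.0, gamma=1.0, i0=1e-4, t_max=5.0)
```

The reciprocal of the recorded pairwise rate, <math> I(t) / (2 \beta S(t)) </math>, plays the role of the effective population size, and it is proportional to prevalence only while <math> S(t) </math> remains close to 1.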
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
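The label-scrambling test described above can be sketched with Fitch parsimony as the transition-counting method. Everything here is an illustrative assumption: the function names, the encoding of a rooted binary tree as nested tuples of tip names, the toy tree, and the add-one correction in the p-value.

```python
import random


def fitch_transitions(tree, labels):
    """Minimum number of label changes on a rooted binary tree (Fitch parsimony).

    `tree` is a nested tuple of tip names; `labels` maps tip name to location.
    Returns (possible_states_at_this_node, transition_count).
    """
    if not isinstance(tree, tuple):
        return {labels[tree]}, 0
    left_set, left_count = fitch_transitions(tree[0], labels)
    right_set, right_count = fitch_transitions(tree[1], labels)
    shared = left_set & right_set
    if shared:
        return shared, left_count + right_count
    return left_set | right_set, left_count + right_count + 1


def tip_names(tree):
    """List the tip names of a nested-tuple tree from left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])


def permutation_p_value(tree, labels, n_permutations=1000, rng=random):
    """Observed transition count and the one-sided p-value for clustering."""
    observed = fitch_transitions(tree, labels)[1]
    tips = tip_names(tree)
    values = [labels[t] for t in tips]
    hits = 0
    for _ in range(n_permutations):
        shuffled = values[:]
        rng.shuffle(shuffled)
        count = fitch_transitions(tree, dict(zip(tips, shuffled)))[1]
        if count <= observed:  # as clustered as, or more clustered than, observed
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical tree in which isolates from regions A and B form separate clades.
toy_tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
toy_labels = {name: name[0].upper() for name in tip_names(toy_tree)}
toy_observed, toy_p = permutation_p_value(toy_tree, toy_labels, rng=random.Random(0))
```

Shuffling the labels keeps the tree and the number of isolates per region fixed, so the null distribution reflects only the loss of association between ancestry and location.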
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
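As a small worked case of treating location as a character state, consider just two regions with asymmetric movement rates. For a two-state continuous-time Markov chain the transition probabilities have a closed form, which the hypothetical helper below evaluates; real analyses involve more regions and estimate the rates jointly with the tree.

```python
import math


def two_region_transition_probs(rate_ab, rate_ba, t):
    """Transition probabilities after time t for a two-state continuous-time
    Markov chain with rates rate_ab (A to B) and rate_ba (B to A).

    Returns a dict keyed by (from_region, to_region).
    """
    total = rate_ab + rate_ba
    decay = math.exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return {("A", "A"): 1.0 - p_ab, ("A", "B"): p_ab,
            ("B", "A"): p_ba, ("B", "B"): 1.0 - p_ba}


# Hypothetical rates per year: lineages leave A five times faster than they leave B.
probs = two_region_transition_probs(rate_ab=0.5, rate_ba=0.1, t=2.0)
```

These branch-wise probabilities are the quantities that ancestral state reconstruction combines across the tree to weigh candidate locations at internal nodes.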
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
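One way such a relationship might be encoded is sketched below. The linear decay of protection with Hamming distance, the default decay value, and the rule that only the closest previous strain matters are all assumptions made for illustration; published models differ in these choices.

```python
def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def cross_immunity(seq_a, seq_b, decay_per_mutation=0.2):
    """Hypothetical linear rule: full protection between identical strains,
    reduced by decay_per_mutation for each difference, never below zero."""
    return max(0.0, 1.0 - decay_per_mutation * hamming_distance(seq_a, seq_b))


def infection_risk(challenge, infection_history, decay_per_mutation=0.2):
    """Relative risk of infection: one minus the strongest cross-immunity
    conferred by any previously encountered strain."""
    if not infection_history:
        return 1.0
    protection = max(cross_immunity(challenge, past, decay_per_mutation)
                     for past in infection_history)
    return 1.0 - protection


# Hypothetical host previously infected with two strains, challenged by a third.
risk = infection_risk("ACGT", ["ACGA", "TTTT"])
```

In a forward simulation, a function like `infection_risk` would scale the per-contact transmission probability each time an infected host meets a previously exposed one.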
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection primarily caused by viruses [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]]. Influenza A viruses can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza A/H3N2, which shows a single predominant trunk lineage that persists through time and side branches that persist for only 1-5 years before going extinct (see {{See Figure|flutree}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:Flutree.png|300px|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09>{{cite pmid|PMC2740835}}</ref>]]
===Selective pressures===
Phylodynamic techniques have provided insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution at the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes have little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> suggesting that H1N1 undergoes less adaptive evolution than H3N2. This hypothesis is supported by empirical patterns of antigenic evolution; there have been 9 vaccine updates recommended by the [[wp: World Health Organization | WHO]] for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.<ref name=IRD>{{cite pmid|22260278}}</ref> Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.<ref name=Wolf06>{{cite pmid|17067369}}</ref><ref name=Bhatt11>{{cite pmid|21415025}}</ref> The underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
===Circulation patterns===
The extremely rapid turnover of the influenza population means that the rate of geographic spread of influenza lineages must be also, to some extent, rapid.
===Simulation-based models===
Forward simulation-based approaches for addressing how immune selection can shape the phylogeny of influenza A/H3N2’s hemagglutinin protein have been actively developed by disease modelers over the last decade. These approaches include both compartmental models and agent-based models.
One of the first compartmental models for influenza was developed by Gog and Grenfell <ref name=Gog02>{{cite pmid| 12481034}}</ref>, who simulated the dynamics of many strains with partial cross-immunity to one another. Under a parameterization of long host lifespan and short infectious period, they found that strains would form self-organized sets that would emerge and replace one another. Although the authors did not reconstruct a phylogeny from their simulated results, the dynamics they found were consistent with a ladder-like viral phylogeny exhibiting low strain diversity and rapid lineage turnover.
Later work by Ferguson and colleagues <ref name=Ferguson03>{{cite pmid| 12660783}}</ref> adopted an agent-based approach to better identify the immunological and ecological determinants of influenza evolution. The authors modeled influenza’s hemagglutinin as 4 epitopes, each consisting of 3 amino acids. They showed that under strain-specific immunity alone (with partial cross-immunity between strains based on their amino acid similarity), the phylogeny of influenza A/H3N2’s HA was expected to exhibit ‘explosive genetic diversity’, a pattern that was not observed in HA trees inferred from empirical influenza A/H3N2 isolates. This led the authors to postulate the existence of a temporary strain-transcending immunity: individuals were immune to reinfection with any other influenza strain for approximately six months following an infection. With this assumption, the agent-based model could reproduce the ladderlike phylogeny of influenza A/H3N2’s HA protein.
Work by Koelle and colleagues <ref name=Koelle06>{{cite pmid| 17185596}}</ref> revisited the dynamics of influenza A/H3N2 evolution following the publication of a seminal paper by Smith and colleagues <ref name=Smith04>{{cite pmid| 15218094}}</ref>, which showed that, while the genetic evolution of influenza A/H3N2’s HA was gradual, the antigenic evolution of the virus occurred in a punctuated manner. The phylodynamic model designed by Koelle and coauthors argued that this pattern reflected a many-to-one genotype-to-phenotype mapping, with the possibility of strains from antigenically distinct clusters of influenza sharing a high degree of genetic similarity. Through incorporating this mapping of viral genotype into viral phenotype (or antigenic cluster) into their model, the authors were able to reproduce the ladderlike phylogeny of influenza’s HA protein without generalized strain-transcending immunity. The reproduction of the ladderlike phylogeny resulted from the viral population passing through repeated selective sweeps. These sweeps were driven by [[wp:Herd immunity | herd immunity]] and acted to constrain viral genetic diversity. A simplification of this model <ref name=Koelle10>{{cite pmid| 20335193}}</ref> was also able to reproduce the phylogenetic trees of equine influenza A/H3N8 and of influenza B, both of which have two co-circulating lineages.
Instead of modeling the genotypes of viral strains, a compartmental simulation model by Gökaydin and colleagues <ref name=Gökaydin07>{{cite pmid| 17015285}}</ref> considered influenza evolution at the scale of antigenic clusters (or phenotypes). This model showed that antigenic emergence and replacement could result under certain epidemiological conditions. These antigenic dynamics would be consistent with a ladderlike phylogeny of influenza exhibiting low genetic diversity and continual strain turn-over.
In recent work, Bedford and colleagues {{Citation needed}} used an agent-based model to show that evolution in a Euclidean antigenic space can account for the phylogenetic pattern of influenza A/H3N2’s HA, as well as the virus’s antigenic, epidemiological, and geographic patterns. The model showed the reproduction of influenza’s ladderlike phylogeny depended critically on the mutation rate of the virus as well as the immunological distance yielded by each mutation.
==Phylodynamics of HIV==
The global diversity of HIV-1 group M is shaped by its [[wp:Origin_of_AIDS |origins]] in Central Africa around the turn of the 20th century. The epidemic underwent explosive growth throughout the early 20th century with multiple radiations out of Central Africa. Multiple [[wp:Founder_effect | founder events]] have given rise to distinct [[wp:HIV_subtypes | subtypes]] which predominate in different parts of the world. Subtype B is most prevalent in North America and Western Europe, while A and C, which account for more than half of infections worldwide, are common in Africa. <ref name=osmanov02>{{cite pmid | 11832690}}</ref> Transmissibility of virus, virulence, effectiveness of antiretroviral therapy, and pathogenesis may differ slightly between subtypes. <ref name=taylor08>{{cite pmid | 18971501}}</ref>
The rapid growth of the HIV epidemic is reflected in phylogenies of HIV, which are star-like: most coalescent events occur in the distant past. Figure 6 shows an example based on 173 HIV-1 sequences from the Democratic Republic of Congo. <ref name=yusim01>{{cite pmid | 11405933}}</ref>
The rate of exponential growth of HIV in Central Africa in the early 20th century, preceding the establishment of modern subtypes, has been estimated as follows:
* Yusim et al.<ref name=yusim01>{{cite pmid | 11405933 }}</ref> estimated <math>r= 0.1676</math> using coalescent methods and a parametric model of exponential growth
* Similar conclusions were reached using nonparametric estimates of <math>N_e</math><ref name=strimmer01>{{cite pmid | 11719579}}</ref>
Different epidemic growth rates have been estimated for different subtypes using coalescent approaches. The early growth of subtype B in North America was rapid, with estimates ranging from <math>r=0.48</math> to <math>r=0.834</math> transmissions per infection per year.<ref name=walker>{{cite pmid | 15737910}}</ref><ref name=robbins03>{{cite pmid | 12743293}}</ref> The duration of exponential growth in North America was relatively short, with saturation occurring in the mid- and late-1980s. <ref name=Volz09>{{cite pmid|19797047}}</ref> The early growth rate of the more common subtype C in Africa is lower (approximately 0.27 transmissions per infection per year), although exponential growth has continued for a longer period of time.<ref name=grassly99>{{cite pmid | 9927440}}</ref><ref name=walker>{{cite pmid | 15737910}}</ref> HIV-1 group O, which is relatively rare and is found mainly in Cameroon, has grown at a lower rate still (approximately 0.068 transmissions per infection per year). <ref name=lemey04> {{cite pmid | 15280223 }}</ref>
HIV-1 sequences sampled over a span of five decades have been used with relaxed molecular clock phylogenetic methods to estimate the time of zoonosis around the early 20th century <ref name=worobey08>{{cite pmid | 18833279}}</ref>. The estimated TMRCA for HIV-1 coincides with the appearance of the first densely populated large cities in Central Africa. Similar methods have been used to estimate the time that HIV originated in different parts of the world. The origin of subtype B in North America is estimated to be in the 1960s, where it went undetected until the AIDS epidemic in the 1980s. <ref name=robbins03>{{cite pmid | 12743293}}</ref> There is evidence that progenitors of modern subtype B originally colonized the Caribbean before undergoing multiple radiations to North and South America. <ref name=junqueira11>{{cite pmid | 22132104}}</ref> Subtype C originated around the same time in Africa.<ref name=walker>{{cite pmid | 15737910}}</ref>
At smaller geographical and time scales, phylogenies of HIV may reflect epidemiological dynamics related to risk behavior and [[wp:sexual_network | sexual networks]]. Very dense sampling of viral sequences within cities over short periods of time has given a detailed picture of HIV transmission patterns in modern epidemics. Sequencing of virus from newly diagnosed patients is now routine in many countries for surveillance of [[wp:Drug_resistance | drug resistance]] mutations, which has yielded large databases of sequence data in those areas. Lewis et al. <ref name=lewis08>{{cite pmid | 18351795}}</ref> used such sequence data from [[wp:Men_who_have_sex_with_men | men who have sex with men]] in London, UK, and found evidence that transmission is highly concentrated in the brief period of [[wp:Primary_HIV_infection | primary HIV infection]] (PHI), which consists of approximately the first 6 months of the infectious period. Volz et al. {{Citation needed}} found that patients who were recently infected were more likely to harbor virus that is phylogenetically clustered with other samples from recently infected patients, reflecting that transmission had occurred recently and that transmission is more likely to occur during early infection. Such phylogenetic patterns arise from epidemiological dynamics that feature an early period of intensified transmission during PHI, but detecting them also depends on sampling a large fraction of extant viral lineages.
Dense sampling of HIV sequences has enabled the estimation of transmission rates between different risk groups. For example, Paraskevis et al. <ref name=paraskevis09>{{cite pmid | 19457244}}</ref> estimated transmission rates between countries in Europe, and Oster et al. <ref name=oster11>{{cite pmid | 21866038}}</ref> showed disparate transmissions within racial and demographic groups within Mississippi, USA.
Purifying immune selection dominates evolution of HIV within hosts, but evolution between hosts is largely decoupled from intra-host evolution <ref name=rambaut04>{{cite pmid | 14708016}}</ref>. Intra-host HIV phylogenies show continual fixation of advantageous mutations, while population-level HIV phylogenies reflect continual diversification. Immune selection has relatively little influence on HIV phylogenies at the population level because
* There is an extreme bottleneck in viral diversity at the time of sexual transmission.<ref name=keele10>{{cite pmid | 20543609}}</ref>
* Transmission tends to occur early in infection before immune selection has had a chance to operate <ref name=cohen11>{{cite pmid | 21591946}}</ref>.
* Replicative fitness, measured in transmissions per host, is determined largely by factors extrinsic to the virus, including heterogeneous sexual and drug-use behaviors in the host population.
There is some evidence of HIV [[wp:Virulence#Evolution |adaptation towards intermediate virulence]] which can maximize transmission potential between hosts. <ref name=fraser07>{{cite pmid | 17954909}}</ref> This hypothesis is predicated on several observations:
* The set-point [[wp:Viral_load | viral load]] (SPVL), which is the quasi-equilibrium titre of viral particles in the blood during [[wp:HIV#Chronic_infection | chronic infection]], is correlated with the time until AIDS. SPVL is therefore a useful proxy for virulence. <ref name=korenromp09>{{cite pmid | 19536329}}</ref>
* SPVL is correlated between HIV donor and recipients in transmission pairs. <ref name=hollingsworth10>{{cite pmid | 20463808}}</ref>
* The transmission probability per sexual act is positively correlated with viral load. Thus, there is a trade-off between the [[wp:Intensity_function | intensity]] of transmission and the lifespan of the host. <ref name=baeten03>{{cite pmid | 15043213}}</ref><ref name=fiore97>{{cite pmid | 9233454}}</ref>
* Recent work <ref name=shirreff11>{{cite pmid | 22022243}}</ref> has shown that given empirical values for transmissibility of HIV and lifespan of hosts as a function of SPVL, adaptation of HIV towards optimum SPVL could be expected over 100-150 years.
==References==
<references/>
945
2012-06-09T23:24:49Z
Katia.koelle
9
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogenies]] of viruses.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term "phylodynamics", postulated that virus phylogenies "... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics".
Grenfell and colleagues identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. Tip labels A and B in this case denote spatial locations from which viral samples were isolated. ]]
The first pattern, addressing the effect of immune selection on the topology of a viral phylogeny, is exemplified by contrasting the trees of [[wp:influenza | influenza virus]] and [[wp:HIV | HIV]]’s surface proteins. The phylogeny of influenza virus’s [[wp:Hemagglutinin (influenza) | hemagglutinin]] protein bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]). Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection, which is reflected in the phylogeny of HIV’s envelope protein inferred from sequences isolated from different individuals in a population (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The second pattern, addressing the effect of viral population growth on viral phylogenies, holds that expanding viral populations have “star-like” {{Citation needed}} trees, with long external branches relative to internal branches. Such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size. Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
The third pattern addresses the effect that population structure of the host, especially spatial structure, can have on differentiating the viral population. Viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are expected to be more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed form of population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, because the ancestor of the whole population must be at least as old as the ancestor of any sample drawn from it. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV<ref name=Volz09/>. Additionally, differential transmission between groups, be they geographic, age-related or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Because antibiotics are ineffective against viruses, [[wp: antiviral drug | antiviral treatment]] is a common approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
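As an illustration of the last statistic in the list above, the following is a minimal sketch of Tajima's D computed directly from an alignment. It assumes the sequences are already aligned, treats every character (including gaps) as an ordinary state, and needs at least four sequences; the function name <code>tajimas_d</code> is chosen here for illustration and does not come from any particular package.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Return Tajima's D for a list of aligned, equal-length sequences."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) alignment columns.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima's variance formula.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)
```

An excess of rare variants (as expected after population expansion or a selective sweep) gives negative values, while an excess of intermediate-frequency variants gives positive values.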
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have most commonly been used to model immune selection (see section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time before sample collection at which the first two out of a sample of <math>n</math> gene copies share a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, two gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), leaving <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
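The simulation procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are invented here, time is measured in the same units as <math>N</math>, and the topology is returned as a nested parenthesised string without branch lengths.

```python
import random


def simulate_coalescent_intervals(n, pop_size, rng=None):
    """Draw the internode intervals T_n, ..., T_2 for a constant population size."""
    rng = rng or random.Random()
    intervals = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0 / pop_size  # lambda_k = C(k, 2) / N
        intervals.append(rng.expovariate(rate))
    return intervals


def simulate_topology(n, rng=None):
    """Join two lineages chosen uniformly at random at each coalescent event."""
    rng = rng or random.Random()
    lineages = [str(i) for i in range(n)]
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        merged = "(" + lineages[i] + "," + lineages[j] + ")"
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0]
```

Summing the intervals returned by <code>simulate_coalescent_intervals</code> gives one simulated TMRCA for the sample.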
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are:
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because <math>1 - 1/74 \approx 99\% </math>.
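The arithmetic behind this argument can be checked numerically. The short sketch below sums the expected internode intervals for the constant-size coalescent and compares the result with the limiting value <math>2N</math>; the function names are illustrative only.

```python
def expected_tmrca(n, pop_size):
    """E[TMRCA] as the sum over k = 2..n of 1 / lambda_k, with lambda_k = C(k, 2) / N."""
    return sum(pop_size / (k * (k - 1) / 2.0) for k in range(2, n + 1))


def fraction_of_upper_bound(n):
    """Ratio of E[TMRCA] for a sample of size n to the limiting value 2N."""
    return expected_tmrca(n, 1.0) / 2.0
```

For a sample of 74 sequences, <code>fraction_of_upper_bound(74)</code> equals <math>1 - 1/74</math> up to floating-point error.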
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
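A minimal sketch of how coalescent times can be drawn under this exponential-growth rate is given below. It uses the time rescaling mentioned earlier: a unit-rate exponential variable is drawn and the integrated coalescent rate is inverted analytically. The function name and arguments are assumptions made for this example, with <code>n0</code> standing for <math>N_0</math> and <code>r</code> for the growth rate.

```python
import math
import random


def simulate_growth_coalescent_times(n, n0, r, rng=None):
    """Return coalescent times (measured back from sampling) when N(t) = n0 * exp(-r * t)."""
    rng = rng or random.Random()
    t = 0.0
    times = []
    for k in range(n, 1, -1):
        pairs = k * (k - 1) / 2.0
        e = rng.expovariate(1.0)
        # The integrated rate from t to t_next is
        #   (pairs / (n0 * r)) * (exp(r * t_next) - exp(r * t)).
        # Setting it equal to e and solving for t_next gives:
        t = math.log(math.exp(r * t) + e * r * n0 / pairs) / r
        times.append(t)
    return times
```

Because the rate grows as <math>e^{rt}</math> going back in time, later coalescent events tend to be packed closely together near the root, while the first events take a long time to occur.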
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
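Rearranging the relationship above gives <math>R_0 = 1 + rD</math>. The sketch below simply encodes this rearrangement and its inverse; the function names are illustrative, and the growth rate and duration must be expressed in matching time units.

```python
def reproduction_number(growth_rate, duration):
    """R0 implied by r = (R0 - 1) / D, that is R0 = 1 + r * D."""
    return 1.0 + growth_rate * duration


def growth_rate(r0, duration):
    """Exponential growth rate r implied by a given R0 and duration D."""
    return (r0 - 1.0) / duration
```

For instance, a hypothetical growth rate of 0.5 per year combined with a hypothetical duration of 2 years would correspond to <math>R_0 = 2</math>.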
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well-studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a form similar to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/(2\beta)</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
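To make the SIR-based rate concrete, the following sketch integrates the two differential equations with a simple forward Euler scheme and evaluates <math>\lambda_n(t)</math> along the trajectory. The step size, function name and argument names are assumptions of this example; a production analysis would normally use a proper ODE solver. The rate is tabulated against forward (calendar) time, whereas the coalescent process itself runs backwards from the sampling time.

```python
def sir_coalescent_rates(beta, gamma, i0, n, t_end, dt=0.001):
    """Euler-integrate the SIR model and return a list of (t, S, I, lambda_n) tuples."""
    s, i = 1.0 - i0, i0
    pairs = n * (n - 1) / 2.0
    steps = int(round(t_end / dt))
    out = []
    for step in range(steps + 1):
        t = step * dt
        incidence = beta * s * i                 # f(t) = beta * S * I
        rate = pairs * 2.0 * incidence / i ** 2  # lambda_n(t) = C(n, 2) * 2 f(t) / I(t)^2
        out.append((t, s, i, rate))
        # Forward Euler update of dS/dt and dI/dt.
        s, i = s - dt * incidence, i + dt * (incidence - gamma * i)
    return out
```

Early in the trajectory, when <math>S \approx 1</math>, the tabulated rate is close to <math>{n \choose 2} 2\beta / I</math>, in line with the approximation above.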
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
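The label-scrambling test described above can be sketched as follows. This example assumes a strictly bifurcating tree stored as nested two-element tuples of tip names, counts transitions with Fitch parsimony, and uses a simple one-sided p-value with a +1 correction; these representation choices and function names are assumptions of the sketch rather than the procedure of any specific study.

```python
import random


def fitch_transitions(tree, location):
    """Return (state set, minimum number of location changes) for a nested-tuple tree."""
    if not isinstance(tree, tuple):  # a tip, identified by its name
        return {location[tree]}, 0
    left_set, left_cost = fitch_transitions(tree[0], location)
    right_set, right_cost = fitch_transitions(tree[1], location)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tip_names(tree):
    """List the tip names of a nested-tuple tree from left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])


def permutation_p_value(tree, location, replicates=1000, rng=None):
    """Compare the observed transition count with counts from scrambled tip labels."""
    rng = rng or random.Random()
    observed = fitch_transitions(tree, location)[1]
    names = tip_names(tree)
    labels = [location[name] for name in names]
    as_extreme = 0
    for _ in range(replicates):
        rng.shuffle(labels)
        shuffled = dict(zip(names, labels))
        if fitch_transitions(tree, shuffled)[1] <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (replicates + 1)
```

A small p-value indicates that the observed tree requires fewer geographic transitions than most scrambled labelings, which is the signature of population structure.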
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
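For the simplest case of two regions, the transition probabilities implied by an asymmetric rate matrix have a closed form, which the sketch below evaluates. The function name and the restriction to two regions are assumptions made to keep the example short; models with more regions require a general matrix exponential.

```python
import math


def two_region_transition_probabilities(rate_ab, rate_ba, t):
    """Return the 2x2 matrix P(t) = exp(Q t) for regions A and B.

    Q = [[-rate_ab, rate_ab], [rate_ba, -rate_ba]], with rows indexed by the
    starting region and columns by the region occupied after time t.
    """
    total = rate_ab + rate_ba
    decay = math.exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return [[1.0 - p_ab, p_ab], [p_ba, 1.0 - p_ba]]
```

When <code>rate_ab</code> exceeds <code>rate_ba</code>, a lineage starting in 'A' is more likely to be found in 'B' after a given time than the reverse, which is the asymmetry described above.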
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
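The relationship between sequence distance and cross-immunity can be sketched as follows. This is only an illustration: the linear decline of protection with Hamming distance, the parameter <code>max_distance</code>, and the rule that a host's risk is set by its most protective previous infection are all assumptions made here, not the form used by any particular published model.

```python
def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

def cross_immunity(seq_a, seq_b, max_distance):
    """Protection in [0, 1] conferred by one strain against another.

    Assumption: protection declines linearly with Hamming distance and
    vanishes once the strains differ at max_distance or more sites.
    """
    d = hamming_distance(seq_a, seq_b)
    return max(0.0, 1.0 - d / max_distance)

def infection_risk(challenge, history, max_distance):
    """Relative risk of infection for a host with a given infection history.

    Assumption: risk is reduced by the strongest cross-immunity that any
    previously encountered strain provides against the challenge strain.
    """
    if not history:
        return 1.0  # fully susceptible host
    protection = max(cross_immunity(challenge, past, max_distance)
                     for past in history)
    return 1.0 - protection

# A host previously infected with two strains is challenged with a third.
print(infection_risk("ACGT", ["ACGA", "TTTT"], max_distance=4))
```

In a forward simulation, a function of this kind would be evaluated at each contact between an infected and a previously exposed host to decide whether transmission occurs.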
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection primarily caused by viruses [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]]. Influenza A viruses can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza A/H3N2, which shows a single predominant trunk lineage that persists through time and side branches that persist for only 1-5 years before going extinct (see {{See Figure|flutree}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:Flutree.png|300px|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09>{{cite pmid|19779555}}</ref>]]
===Selective pressures===
Phylodynamic techniques have provided insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution to the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes have little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> suggesting that H1N1 undergoes less adaptive evolution than H3N2. This hypothesis is supported by empirical patterns of antigenic evolution; there have been 9 vaccine updates recommended by the [[wp: World Health Organization | WHO]] for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.<ref name=IRD>{{cite pmid|22260278}}</ref> Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.<ref name=Wolf06>{{cite pmid|17067369}}</ref><ref name=Bhatt11>{{cite pmid|21415025}}</ref> The underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
===Circulation patterns===
The extremely rapid turnover of the influenza population means that the rate of geographic spread of influenza lineages must also be, to some extent, rapid.
===Simulation-based models===
Forward simulation-based approaches for addressing how immune selection can shape the phylogeny of influenza A/H3N2’s hemagglutinin protein have been actively developed by disease modelers over the last decade. These approaches include both compartmental models and agent-based models.
One of the first compartmental models for influenza was developed by Gog and Grenfell,<ref name=Gog02>{{cite pmid| 12481034}}</ref> who simulated the dynamics of many strains with partial cross-immunity to one another. Under a parameterization of long host lifespan and short infectious period, they found that strains would form self-organized sets that would emerge and replace one another. Although the authors did not reconstruct a phylogeny from their simulated results, the dynamics they found were consistent with a ladder-like viral phylogeny exhibiting low strain diversity and rapid lineage turnover.
Later work by Ferguson and colleagues<ref name=Ferguson03>{{cite pmid| 12660783}}</ref> adopted an agent-based approach to better identify the immunological and ecological determinants of influenza evolution. The authors modeled influenza’s hemagglutinin as 4 epitopes, each consisting of 3 amino acids. They showed that under strain-specific immunity alone (with partial cross-immunity between strains based on their amino acid similarity), the phylogeny of influenza A/H3N2’s HA was expected to exhibit ‘explosive genetic diversity’, a pattern that was not observed in HA trees inferred from empirical influenza A/H3N2 isolates. This led the authors to postulate the existence of a temporary strain-transcending immunity: individuals were immune to reinfection with any other influenza strain for approximately six months following an infection. With this assumption, the agent-based model could reproduce the ladder-like phylogeny of influenza A/H3N2’s HA protein.
Work by Koelle and colleagues<ref name=Koelle06>{{cite pmid| 17185596}}</ref> revisited the dynamics of influenza A/H3N2 evolution following the publication of a seminal paper by Smith and colleagues,<ref name=Smith04>{{cite pmid| 15218094}}</ref> which showed that, while the genetic evolution of influenza A/H3N2’s HA was gradual, the antigenic evolution of the virus occurred in a punctuated manner. The phylodynamic model designed by Koelle and coauthors argued that this pattern reflected a many-to-one genotype-to-phenotype mapping, with the possibility of strains from antigenically distinct clusters of influenza sharing a high degree of genetic similarity. By incorporating this mapping of viral genotype into viral phenotype (or antigenic cluster) into their model, the authors were able to reproduce the ladder-like phylogeny of influenza’s HA protein without generalized strain-transcending immunity. The reproduction of the ladder-like phylogeny resulted from the viral population passing through repeated selective sweeps. These sweeps were driven by [[wp:Herd immunity | herd immunity]] and acted to constrain viral genetic diversity. A simplification of this model<ref name=Koelle10>{{cite pmid| 20335193}}</ref> was also able to reproduce the phylogenetic trees of equine influenza A/H3N8 and of influenza B, both of which have two co-circulating lineages.
Instead of modeling the genotypes of viral strains, a compartmental simulation model by Gökaydin and colleagues<ref name=Gökaydin07>{{cite pmid| 17015285}}</ref> considered influenza evolution at the scale of antigenic clusters (or phenotypes). This model showed that antigenic emergence and replacement could result under certain epidemiological conditions. These antigenic dynamics would be consistent with a ladder-like phylogeny of influenza exhibiting low genetic diversity and continual strain turnover.
In recent work, Bedford and colleagues {{Citation needed}} used an agent-based model to show that evolution in a Euclidean antigenic space can account for the phylogenetic pattern of influenza A/H3N2’s HA, as well as the virus’s antigenic, epidemiological, and geographic patterns. The model showed that the reproduction of influenza’s ladder-like phylogeny depended critically on the mutation rate of the virus as well as the immunological distance yielded by each mutation.
==Phylodynamics of HIV==
The global diversity of HIV-1 group M is shaped by its [[wp:Origin_of_AIDS |origins]] in Central Africa around the turn of the 20th century. The epidemic underwent explosive growth throughout the early 20th century with multiple radiations out of Central Africa. Multiple [[wp:Founder_effect | founder events]] have given rise to distinct [[wp:HIV_subtypes | subtypes]] which predominate in different parts of the world. Subtype B is most prevalent in North America and Western Europe, while A and C, which account for more than half of infections worldwide, are common in Africa. <ref name=osmanov02>{{cite pmid | 11832690}}</ref> Transmissibility of virus, virulence, effectiveness of antiretroviral therapy, and pathogenesis may differ slightly between subtypes. <ref name=taylor08>{{cite pmid | 18971501}}</ref>
The rapid growth of the HIV epidemic is reflected in phylogenies of HIV, which are star-like: most coalescent events occur in the distant past. Figure 6 shows an example based on 173 HIV-1 sequences from the Democratic Republic of Congo.<ref name=yusim01>{{cite pmid | 11405933}}</ref>
The rate of exponential growth of HIV in Central Africa in the early 20th century, preceding the establishment of modern subtypes, has been estimated:
* Yusim et al.<ref name=yusim01>{{cite pmid | 11405933 }}</ref> estimated <math>r= 0.1676</math> using coalescent methods and a parametric model of exponential growth.
* Similar conclusions were reached using nonparametric estimates of <math>N_e</math>.<ref name=strimmer01>{{cite pmid | 11719579}}</ref>
Different epidemic growth rates have been estimated for different subtypes using coalescent approaches. The early growth of subtype B in North America was rapid: estimates range from <math>r=0.48</math> to <math>r=0.834</math> transmissions per infection per year.<ref name=walker>{{cite pmid | 15737910}}</ref><ref name=robbins03>{{cite pmid | 12743293}}</ref> The duration of exponential growth in North America was relatively short, with saturation occurring in the mid- and late-1980s.<ref name=Volz09>{{cite pmid|19797047}}</ref> The early growth rate of the more common subtype C in Africa is lower (approximately 0.27 transmissions per infection per year), although exponential growth has continued for a longer period of time.<ref name=grassly99>{{cite pmid | 9927440}}</ref><ref name=walker>{{cite pmid | 15737910}}</ref> HIV-1 group O, which is relatively rare and is found mainly in Cameroon, has grown at a lower rate (approximately 0.068 transmissions per infection per year).<ref name=lemey04>{{cite pmid | 15280223 }}</ref>
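To give a sense of scale for these growth rates, the following sketch converts each quoted value of <math>r</math> into a doubling time, <math>\ln 2 / r</math>. It assumes that each estimate can be read as a per-year rate of pure exponential growth, which is a simplification; the labels and the helper function are introduced here for illustration only.

```python
import math

def doubling_time(r):
    """Doubling time (in years) under exponential growth at rate r per year."""
    if r <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / r

# Growth rates quoted in the text, interpreted as per-year exponential rates.
growth_rates = {
    "HIV-1 in Central Africa, early 20th century": 0.1676,
    "subtype B in North America (lower estimate)": 0.48,
    "subtype B in North America (upper estimate)": 0.834,
    "subtype C in Africa": 0.27,
    "group O": 0.068,
}

for label, r in growth_rates.items():
    print(f"{label}: r = {r}, doubling time = {doubling_time(r):.1f} years")
```

Under this reading, the fastest estimate for subtype B corresponds to a doubling time of under one year, while group O doubles roughly once per decade.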
HIV-1 sequences sampled over a span of five decades have been used with relaxed molecular clock phylogenetic methods to estimate the time of zoonosis around the early 20th century <ref name=worobey08>{{cite pmid | 18833279}}</ref>. The estimated TMRCA for HIV-1 coincides with the appearance of the first densely populated large cities in Central Africa. Similar methods have been used to estimate the time that HIV originated in different parts of the world. The origin of subtype B in North America is estimated to be in the 1960s, where it went undetected until the AIDS epidemic in the 1980s. <ref name=robbins03>{{cite pmid | 12743293}}</ref> There is evidence that progenitors of modern subtype B originally colonized the Caribbean before undergoing multiple radiations to North and South America. <ref name=junqueira11>{{cite pmid | 22132104}}</ref> Subtype C originated around the same time in Africa.<ref name=walker>{{cite pmid | 15737910}}</ref>
At smaller geographical and time scales, phylogenies of HIV may reflect epidemiological dynamics related to risk behavior and [[wp:sexual_network | sexual networks]]. Very dense sampling of viral sequences within cities over short periods of time has given a detailed picture of HIV transmission patterns in modern epidemics. Sequencing of virus from newly diagnosed patients is now routine in many countries for surveillance of [[wp:Drug_resistance | drug resistance]] mutations, which has yielded large databases of sequence data in those areas. Lewis et al.<ref name=lewis08>{{cite pmid | 18351795}}</ref> used such sequence data from [[wp:Men_who_have_sex_with_men | men who have sex with men]] in London, UK, and found evidence that transmission is highly concentrated in the brief period of [[wp:Primary_HIV_infection | primary HIV infection]] (PHI), which consists of approximately the first 6 months of the infectious period. Volz et al. {{Citation needed}} found that patients who were recently infected were more likely to harbor virus that is phylogenetically clustered with other samples from recently infected patients, reflecting that transmission had occurred recently and that transmission is more likely to occur during early infection. Such phylogenetic patterns arise from epidemiological dynamics that feature an early period of intensified transmission during PHI, but they also depend on sampling a large fraction of extant viral lineages.
Dense sampling of HIV sequences has enabled the estimation of transmission rates between different risk groups. For example, Paraskevis et al. <ref name=paraskevis09>{{cite pmid | 19457244}}</ref> estimated transmission rates between countries in Europe, and Oster et al. <ref name=oster11>{{cite pmid | 21866038}}</ref> showed disparate transmissions within racial and demographic groups within Mississippi, USA.
Purifying immune selection dominates evolution of HIV within hosts, but evolution between hosts is largely decoupled from intra-host evolution.<ref name=rambaut04>{{cite pmid | 14708016}}</ref> Intra-host HIV phylogenies show continual fixation of advantageous mutations, while population-level HIV phylogenies reflect continual diversification. Immune selection has relatively little influence on HIV phylogenies at the population level because:
* There is an extreme bottleneck in viral diversity at the time of sexual transmission.<ref name=keele10>{{cite pmid | 20543609}}</ref>
* Transmission tends to occur early in infection, before immune selection has had a chance to operate.<ref name=cohen11>{{cite pmid | 21591946}}</ref>
* Replicative fitness, measured in transmissions per host, is determined largely by factors extrinsic to the virus, including heterogeneous sexual and drug-use behaviors in the host population.
There is some evidence of HIV [[wp:Virulence#Evolution |adaptation towards intermediate virulence]] which can maximize transmission potential between hosts. <ref name=fraser07>{{cite pmid | 17954909}}</ref> This hypothesis is predicated on several observations:
* The set-point [[wp:Viral_load | viral load]] (SPVL), which is the quasi-equilibrium titre of viral particles in the blood during [[wp:HIV#Chronic_infection | chronic infection]], is correlated with the time until AIDS. SPVL is therefore a useful proxy for virulence. <ref name=korenromp09>{{cite pmid | 19536329}}</ref>
* SPVL is correlated between HIV donor and recipients in transmission pairs. <ref name=hollingsworth10>{{cite pmid | 20463808}}</ref>
* The transmission probability per sexual act is positively correlated with viral load. Thus, there is a trade-off between the [[wp:Intensity_function | intensity]] of transmission and the lifespan of the host. <ref name=baeten03>{{cite pmid | 15043213}}</ref><ref name=fiore97>{{cite pmid | 9233454}}</ref>
* Recent work <ref name=shirreff11>{{cite pmid | 22022243}}</ref> has shown that given empirical values for transmissibility of HIV and lifespan of hosts as a function of SPVL, adaptation of HIV towards optimum SPVL could be expected over 100-150 years.
==References==
<references/>
991
2012-06-22T12:04:10Z
Tbedford
7
Include paragraph on flu circulation patterns
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogenies]] of viruses.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term "phylodynamics", postulated that virus phylogenies "... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics".
Grenfell and colleagues identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (see [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. Tip labels A and B in this case denote spatial locations from which viral samples were isolated. ]]
The first pattern, addressing the effect of immune selection on the topology of a viral phylogeny, is exemplified by contrasting the trees of [[wp:influenza | influenza virus]] and [[wp:HIV | HIV]]’s surface proteins. The phylogeny of influenza virus’s [[wp:Hemagglutinin (influenza) | hemagglutinin]] protein bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]). Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection, which is reflected in the phylogeny of HIV’s envelope protein inferred from sequences isolated from different individuals in a population (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The second pattern, addressing the effect of viral population growth on viral phylogenies, holds that expanding viral populations have “star-like” {{Citation needed}} trees, with long external branches relative to internal branches. Such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size. Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
The third pattern addresses the effect that population structure of the host, especially spatial structure, can have on differentiating the viral population. Viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are expected to be more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, because the common ancestor of the whole population must have existed at or before the common ancestor of the sample. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]], <math> R_0 </math>, from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV.<ref name=Volz09/> Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Without recourse to antibiotics, [[wp: antiviral drug | antiviral treatment]] is a popular approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
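As an illustration of the last of these statistics, the sketch below computes Tajima's D from a small alignment using the standard formulas, which compare the mean number of pairwise differences with the number of segregating sites. It assumes aligned sequences of equal length with no gaps or ambiguous bases; the function name and the toy alignment are invented for this example.

```python
import math
from itertools import combinations

def tajimas_d(sequences):
    """Return (pi, S, D) for aligned, equal-length sequences without gaps.

    pi: mean number of pairwise differences
    S:  number of segregating sites
    D:  Tajima's D, or None if there are no segregating sites
    """
    n = len(sequences)
    if n < 4:
        # For n = 2 or 3 the variance term in the denominator is zero.
        raise ValueError("need at least 4 sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # S: count alignment columns containing more than one distinct base.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)

    # pi: average Hamming distance over all pairs of sequences.
    pair_diffs = [sum(a != b for a, b in zip(s1, s2))
                  for s1, s2 in combinations(sequences, 2)]
    pi = sum(pair_diffs) / len(pair_diffs)

    if seg_sites == 0:
        return pi, seg_sites, None

    # Constants that depend only on the sample size n.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (pi - seg_sites / a1) / math.sqrt(variance)
    return pi, seg_sites, d

# Toy alignment in which every variable site is a singleton.
print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
```

An excess of rare variants, as in the toy alignment, yields a negative D, a pattern often associated with recent population expansion or selective sweeps.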
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].{{Citation needed}}
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to incorporate prior knowledge of demographic processes while fitting complex [[wp:Nucleotide substitution model | models of nucleotide substitution]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have most commonly been used to model immune selection (see section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright–Fisher model]]),
the waiting time, going backward from sample collection, until the first pair among a sample of <math>n</math> gene copies shares a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before the sample was collected, when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\ldots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
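The simulation procedure described above can be sketched as follows. This is a minimal example under the stated assumptions (neutrality, constant size <math>N</math>, no population structure), with time measured in the same units as the coalescent rate; the function names and the nested-parenthesis labels for merged lineages are choices made here for illustration.

```python
import random

def coalescent_rate(k, pop_size):
    """Rate at which some pair among k lineages coalesces: C(k, 2) / N."""
    return k * (k - 1) / 2 / pop_size

def simulate_coalescent(n, pop_size, rng=None):
    """Simulate one genealogy for n sampled gene copies.

    Returns a list of (time_before_sampling, merged_label) events,
    ordered from the most recent coalescence to the MRCA of the sample.
    """
    rng = rng or random.Random()
    lineages = [str(i) for i in range(n)]  # one label per sampled copy
    events = []
    elapsed = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence is exponential with rate lambda_k.
        elapsed += rng.expovariate(coalescent_rate(k, pop_size))
        # Under neutrality every pair of lineages is equally likely to merge.
        i, j = rng.sample(range(k), 2)
        merged = f"({lineages[i]},{lineages[j]})"
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
        events.append((elapsed, merged))
    return events

events = simulate_coalescent(n=5, pop_size=100, rng=random.Random(1))
tmrca, tree = events[-1]
print(tree, tmrca)
```

The last event gives the time to the most recent common ancestor of the sample, and its label records the simulated tree topology.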
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals {{Citation needed}},
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
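The last step follows from a telescoping sum. Since <math>1/\lambda_k = 2N/(k(k-1))</math> and <math>1/(k(k-1)) = 1/(k-1) - 1/k</math>,
: <math> \sum_{k=2}^{n} \frac{1}{\lambda_k} = 2N \sum_{k=2}^{n} \left( \frac{1}{k-1} - \frac{1}{k} \right) = 2N \left( 1 - \frac{1}{n} \right). </math>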
Two corollaries are :
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. {{Citation needed}} estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because <math>1 - 1/74 \approx 99\% </math>.
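The arithmetic behind this argument can be checked numerically under the constant-size model. The following sketch sums the expected internode intervals directly; the function names are illustrative and not taken from any package.

```python
from math import comb


def expected_tmrca(n, N):
    """Sum of the expected internode intervals 1/lambda_k for k = 2..n."""
    return sum(N / comb(k, 2) for k in range(2, n + 1))


def fraction_of_bound(n):
    """Expected TMRCA of a sample of size n as a fraction of the limit 2N."""
    return expected_tmrca(n, 1.0) / 2.0
```

For a sample of 74 sequences, <code>fraction_of_bound(74)</code> equals <math>1 - 1/74</math>, which is the roughly 99% figure quoted above, while a sample of only 2 sequences already reaches half of the upper bound.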
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré {{Citation needed}} derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
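The time rescaling mentioned above gives a simple way to simulate coalescence times under exponential growth. For <math>N(\tau) = N_0 e^{-r\tau}</math> the rescaled time is <math>s = (e^{rt} - 1)/(r N_0)</math>, which can be inverted in closed form. The sketch below draws waiting times for a coalescent of unit population size in rescaled time and maps them back to real time; the function names are assumptions made for illustration.

```python
import math
import random
from math import comb


def rescaled_to_real_time(s, N0, r):
    """Invert s = integral_0^t dtau / N(tau) for N(tau) = N0 * exp(-r * tau).

    The integral equals (exp(r * t) - 1) / (r * N0), so
    t = log(1 + r * N0 * s) / r.
    """
    return math.log1p(r * N0 * s) / r


def coalescent_times_exponential(n, N0, r, rng=None):
    """Coalescence times, measured backwards from sampling, under exponential growth."""
    rng = rng or random.Random()
    s = 0.0
    times = []
    for k in range(n, 1, -1):
        # In rescaled time the process is a constant-size coalescent with N = 1.
        s += rng.expovariate(comb(k, 2))
        times.append(rescaled_to_real_time(s, N0, r))
    return times
```

Because the logarithm compresses large values of <math>s</math>, the later coalescent events (those nearer the root) fall close together in real time, which produces the long external branches described above.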
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
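Rearranging the relationship above gives <math>R_0 = 1 + rD</math>. A minimal sketch of the conversion follows; the function names are illustrative, and the numerical values in the usage note are hypothetical rather than taken from any study.

```python
def reproduction_number(r, D):
    """R0 implied by r = (R0 - 1) / D, that is R0 = 1 + r * D."""
    return 1.0 + r * D


def growth_rate(R0, D):
    """Exponential growth rate implied by R0 and the serial interval D."""
    return (R0 - 1.0) / D
```

For instance, a hypothetical growth rate of 0.2 per day combined with a serial interval of 3 days corresponds to <code>reproduction_number(0.2, 3.0)</code>, or approximately 1.6. The two quantities must be expressed in the same time units.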
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and the effective population size of the virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well-studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a similar form to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/2\beta</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
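To make the SIR coalescent rate concrete, the following sketch integrates the SIR equations with simple Euler steps and evaluates <math>\lambda_n</math> from the incidence <math>f = \beta S I</math>. The Euler scheme, the step size and the function names are assumptions chosen for brevity; a production analysis would normally use a dedicated ODE solver.

```python
from math import comb


def sir_trajectory(beta, gamma, S0, I0, t_end, dt=0.001):
    """Integrate dS/dt = -beta*S*I and dI/dt = beta*S*I - gamma*I by Euler steps.

    Returns a list of (t, S, I) tuples, with S and I as population fractions.
    """
    S, I, t = S0, I0, 0.0
    out = [(t, S, I)]
    steps = int(round(t_end / dt))
    for _ in range(steps):
        new_infections = beta * S * I * dt
        S, I = S - new_infections, I + new_infections - gamma * I * dt
        t += dt
        out.append((t, S, I))
    return out


def coalescent_rate(n, beta, S, I):
    """lambda_n = C(n, 2) * 2 * f / I^2 with incidence f = beta * S * I."""
    f = beta * S * I
    return comb(n, 2) * 2.0 * f / I**2
```

Evaluating <code>coalescent_rate</code> along a trajectory shows the rate falling as prevalence rises early in the epidemic and then being further damped as the susceptible fraction is depleted.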
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
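The label-scrambling test can be sketched as follows, using Fitch parsimony to count transitions. The nested-tuple tree representation, the function names and the add-one form of the p-value are assumptions made for this illustration; they are not the procedure of any particular software package.

```python
import random


def fitch_transitions(tree, labels):
    """Minimum number of state changes on a rooted binary tree (Fitch parsimony).

    tree: a tip name (str) or a 2-tuple of subtrees; labels: tip name -> location.
    """
    def visit(node):
        if isinstance(node, str):
            return {labels[node]}, 0
        left_set, left_n = visit(node[0])
        right_set, right_n = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_n + right_n
        return left_set | right_set, left_n + right_n + 1
    return visit(tree)[1]


def tips(tree):
    """List tip names in left-to-right order."""
    return [tree] if isinstance(tree, str) else tips(tree[0]) + tips(tree[1])


def permutation_p_value(tree, labels, n_perm=1000, rng=None):
    """Observed transition count and the share of scrambled data sets with as few or fewer."""
    rng = rng or random.Random()
    names = tips(tree)
    observed = fitch_transitions(tree, labels)
    states = [labels[name] for name in names]
    as_few = 0
    for _ in range(n_perm):
        rng.shuffle(states)                       # scramble tip labels
        shuffled = dict(zip(names, states))
        if fitch_transitions(tree, shuffled) <= observed:
            as_few += 1
    return observed, (as_few + 1) / (n_perm + 1)
```

A small p-value indicates fewer geographic transitions than expected without structure, which is the signature of population structure described above.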
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
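As a toy illustration of treating location as a character state, consider only two regions, 'A' and 'B', with possibly asymmetric movement rates. The transition probabilities of a two-state continuous-time Markov chain have a closed form, sketched below. The two-region restriction and the function name are simplifying assumptions; the methods cited above use general rate matrices over many locations.

```python
import math


def two_region_transition_probs(rate_ab, rate_ba, t):
    """Transition probabilities over a branch of length t for a two-state chain.

    rate_ab: rate at which a lineage in region A moves to region B.
    rate_ba: rate at which a lineage in region B moves to region A.
    """
    total = rate_ab + rate_ba
    decay = math.exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return {
        ("A", "A"): 1.0 - p_ab,
        ("A", "B"): p_ab,
        ("B", "A"): p_ba,
        ("B", "B"): 1.0 - p_ba,
    }
```

With unequal rates the long-run probability of finding a lineage in 'B' is <code>rate_ab / (rate_ab + rate_ba)</code>, so asymmetric connections shift where lineages tend to reside.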
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory-usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
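One way such a simulation might relate cross-immunity to sequence distance is sketched below. The linear decay of protection with Hamming distance, the <code>max_distance</code> parameter and the rule of taking the strongest protection from any past infection are all hypothetical modelling choices made for this example; published models differ in these details.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cross_immunity(a, b, max_distance):
    """Hypothetical linear decay: full protection at distance 0, none at max_distance or beyond."""
    return max(0.0, 1.0 - hamming_distance(a, b) / max_distance)


def infection_risk(challenge, history, max_distance):
    """Risk relative to a naive host, using the strongest protection from any past strain."""
    if not history:
        return 1.0
    protection = max(cross_immunity(challenge, past, max_distance) for past in history)
    return 1.0 - protection
```

In an individual-based simulation, a function like <code>infection_risk</code> would scale the transmission probability each time an infected host contacts a host with a given infection history.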
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection primarily caused by viruses [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]]. Influenza A viruses can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza A/H3N2, which shows a single predominant trunk lineage that persists through time and side branches that persist for only 1-5 years before going extinct (see {{See Figure|flutree}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:Flutree.png|300px|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09>{{cite pmid|PMC2740835}}</ref>]]
===Selective pressures===
Phylodynamic techniques have provided insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution to the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes have little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> suggesting that H1N1 undergoes less adaptive evolution than H3N2. This hypothesis is supported by empirical patterns of antigenic evolution; there have been 9 vaccine updates recommended by the [[wp: World Health Organization | WHO]] for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.<ref name=IRD>{{cite pmid|22260278}}</ref> Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.<ref name=Wolf06>{{cite pmid|17067369}}</ref><ref name=Bhatt11>{{cite pmid|21415025}}</ref> The underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
===Circulation patterns===
The extremely rapid turnover of the influenza population means that the rate of geographic spread of influenza lineages must also, to some extent, be rapid. Surveillance data clearly shows a pattern of strong seasonal epidemics in temperate regions, and less periodic epidemics in the tropics.<ref name=Finkelman07>{{cite pmid|PMC2117904}}</ref> The geographic origin of seasonal epidemics in the Northern and Southern Hemispheres had been a major open question in the field. However, recent work by Rambaut et al.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> and Russell et al.<ref name=Russell08>{{cite pmid|18420927}}</ref> has shown that temperate epidemics usually emerge from a global reservoir rather than emerging from within the previous season's genetic diversity. This work, and more recent work by Bedford et al.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref> and Bahl et al.,<ref name=Bahl11>{{cite pmid|22084096}}</ref> has suggested that the global persistence of the influenza population is driven by viruses being passed from epidemic to epidemic, with no individual region in the world showing continual persistence. However, there is considerable debate regarding the particular configuration of the global network of influenza, with one hypothesis suggesting a metapopulation in East and Southeast Asia that continually seeds influenza in the rest of the world<ref name=Russell08>{{cite pmid|18420927}}</ref> versus a more global metapopulation in which temperate lineages can return to the tropics at the end of a seasonal epidemic.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref><ref name=Bahl11>{{cite pmid|22084096}}</ref>
All of these phylogeographic studies suffer from limitations in the worldwide sampling of influenza viruses. For example, the relative importance of tropical Africa and India has yet to be uncovered. Additionally, the phylogeographic methods used in these studies (see section on [[#Phylogeography | phylogeographic methods]]) make inferences of the ancestral locations and migration rates only on the phylogeny of the samples at hand, rather than on the population phylogeny in which these samples are embedded. Because of this, study-specific sampling procedures are a concern in extrapolating to population-level inferences. However, through joint epidemiological and evolutionary models, Bedford et al.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref> show that their estimates of migration rates appear robust to a large degree of under-sampling or over-sampling of a particular region. Further methodological progress is required to more fully address these issues.
===Simulation-based models===
Forward simulation-based approaches for addressing how immune selection can shape the phylogeny of influenza A/H3N2’s hemagglutinin protein have been actively developed by disease modelers over the last decade. These approaches include both compartmental models and agent-based models.
One of the first compartmental models for influenza was developed by Gog and Grenfell <ref name=Gog02>{{cite pmid| 12481034}}</ref>, who simulated the dynamics of many strains with partial cross-immunity to one another. Under a parameterization of long host lifespan and short infectious period, they found that strains would form self-organized sets that would emerge and replace one another. Although the authors did not reconstruct a phylogeny from their simulated results, the dynamics they found were consistent with a ladder-like viral phylogeny exhibiting low strain diversity and rapid lineage turnover.
Later work by Ferguson and colleagues <ref name=Ferguson03>{{cite pmid| 12660783}}</ref> adopted an agent-based approach to better identify the immunological and ecological determinants of influenza evolution. The authors modeled influenza’s hemagglutinin as 4 epitopes, each consisting of 3 amino acids. They showed that under strain-specific immunity alone (with partial cross-immunity between strains based on their amino acid similarity), the phylogeny of influenza A/H3N2’s HA was expected to exhibit ‘explosive genetic diversity’, a pattern that was not observed in HA trees inferred from empirical influenza A/H3N2 isolates. This led the authors to postulate the existence of a temporary strain-transcending immunity: individuals were immune to reinfection with any other influenza strain for approximately six months following an infection. With this assumption, the agent-based model could reproduce the ladderlike phylogeny of influenza A/H3N2’s HA protein.
Work by Koelle and colleagues <ref name=Koelle06>{{cite pmid| 17185596}}</ref> revisited the dynamics of influenza A/H3N2 evolution following the publication of a seminal paper by Smith and colleagues <ref name=Smith04>{{cite pmid| 15218094}}</ref>, which showed that, while the genetic evolution of influenza A/H3N2’s HA was gradual, the antigenic evolution of the virus occurred in a punctuated manner. The phylodynamic model designed by Koelle and coauthors argued that this pattern reflected a many-to-one genotype-to-phenotype mapping, with the possibility of strains from antigenically distinct clusters of influenza sharing a high degree of genetic similarity. Through incorporating this mapping of viral genotype into viral phenotype (or antigenic cluster) into their model, the authors were able to reproduce the ladderlike phylogeny of influenza’s HA protein without generalized strain-transcending immunity. The reproduction of the ladderlike phylogeny resulted from the viral population passing through repeated selective sweeps. These sweeps were driven by [[wp:Herd immunity | herd immunity]] and acted to constrain viral genetic diversity. A simplification of this model <ref name=Koelle10>{{cite pmid| 20335193}}</ref> was also able to reproduce the phylogenetic trees of equine influenza A/H3N8 and of influenza B, both of which have two co-circulating lineages.
Instead of modeling the genotypes of viral strains, a compartmental simulation model by Gökaydin and colleagues <ref name=Gökaydin07>{{cite pmid| 17015285}}</ref> considered influenza evolution at the scale of antigenic clusters (or phenotypes). This model showed that antigenic emergence and replacement could result under certain epidemiological conditions. These antigenic dynamics would be consistent with a ladderlike phylogeny of influenza exhibiting low genetic diversity and continual strain turn-over.
In recent work, Bedford and colleagues {{Citation needed}} used an agent-based model to show that evolution in a Euclidean antigenic space can account for the phylogenetic pattern of influenza A/H3N2’s HA, as well as the virus’s antigenic, epidemiological, and geographic patterns. The model showed the reproduction of influenza’s ladderlike phylogeny depended critically on the mutation rate of the virus as well as the immunological distance yielded by each mutation.
==Phylodynamics of HIV==
The global diversity of HIV-1 group M is shaped by its [[wp:Origin_of_AIDS |origins]] in Central Africa around the turn of the 20th century. The epidemic underwent explosive growth throughout the early 20th century with multiple radiations out of Central Africa. Multiple [[wp:Founder_effect | founder events]] have given rise to distinct [[wp:HIV_subtypes | subtypes]] which predominate in different parts of the world. Subtype B is most prevalent in North America and Western Europe, while A and C, which account for more than half of infections worldwide, are common in Africa. <ref name=osmanov02>{{cite pmid | 11832690}}</ref> Transmissibility of virus, virulence, effectiveness of antiretroviral therapy, and pathogenesis may differ slightly between subtypes. <ref name=taylor08>{{cite pmid | 18971501}}</ref>
The rapid growth of the HIV epidemic is reflected in phylogenies of HIV, which are star-like. Most coalescent events occur in the distant past. Figure 6 shows an example based on 173 HIV-1 sequences from the Democratic Republic of Congo. <ref name=yusim01>{{cite pmid | 11405933}}</ref>
The rate of exponential growth of HIV in Central Africa in the early 20th century, preceding the establishment of modern subtypes, has been estimated:
* Yusim et al.<ref name=yusim01>{{cite pmid | 11405933 }}</ref> estimated <math>r= 0.1676</math> using coalescent methods and a parametric model of exponential growth
* Similar conclusions were reached using nonparametric estimates of <math>N_e</math><ref name=strimmer01>{{cite pmid | 11719579}}</ref>
Different epidemic growth rates have been estimated for different subtypes using coalescent approaches. The early growth of subtype B in North America was quite rapid, with estimates ranging from <math>r=0.48</math> to <math>0.834</math> transmissions per infection per year.<ref name=walker>{{cite pmid | 15737910}}</ref><ref name=robbins03>{{cite pmid | 12743293}}</ref> The duration of exponential growth in North America was relatively short, with saturation occurring in the mid- and late-1980s. <ref name=Volz09>{{cite pmid|19797047}}</ref> The early growth rate of the more common subtype C in Africa is lower (approximately 0.27 transmissions per infection per year), although exponential growth has continued for a longer period of time.<ref name=grassly99>{{cite pmid | 9927440}}</ref><ref name=walker>{{cite pmid | 15737910}}</ref> HIV-1 group O, which is relatively rare and is found mainly in Cameroon, has grown at a lower rate (approximately 0.068 transmissions per infection per year). <ref name=lemey04> {{cite pmid | 15280223 }}</ref>
HIV-1 sequences sampled over a span of five decades have been used with relaxed molecular clock phylogenetic methods to estimate the time of zoonosis around the early 20th century <ref name=worobey08>{{cite pmid | 18833279}}</ref>. The estimated TMRCA for HIV-1 coincides with the appearance of the first densely populated large cities in Central Africa. Similar methods have been used to estimate the time that HIV originated in different parts of the world. The origin of subtype B in North America is estimated to be in the 1960s, where it went undetected until the AIDS epidemic in the 1980s. <ref name=robbins03>{{cite pmid | 12743293}}</ref> There is evidence that progenitors of modern subtype B originally colonized the Caribbean before undergoing multiple radiations to North and South America. <ref name=junqueira11>{{cite pmid | 22132104}}</ref> Subtype C originated around the same time in Africa.<ref name=walker>{{cite pmid | 15737910}}</ref>
At smaller geographical and time scales, phylogenies of HIV may reflect epidemiological dynamics related to risk behavior and [[wp:sexual_network | sexual networks]]. Very dense sampling of viral sequences within cities over short periods of time has given a detailed picture of HIV transmission patterns in modern epidemics. Sequencing of virus from newly diagnosed patients is now routine in many countries for surveillance of [[wp:Drug_resistance | drug resistance]] mutations, which has yielded large databases of sequence data in those areas. Lewis et al. <ref name=lewis08>{{cite pmid | 18351795}}</ref> used such sequence data from [[wp:Men_who_have_sex_with_men | men who have sex with men]] in London, UK, and found evidence that transmission is highly concentrated in the brief period of [[wp:Primary_HIV_infection | primary HIV infection]] (PHI), which consists of approximately the first 6 months of the infectious period. Volz et al. {{Citation needed}} found that patients who were recently infected were more likely to harbor virus that is phylogenetically clustered with other samples from recently infected patients, reflecting that transmission had occurred recently and that transmission is more likely to occur during early infection. Such phylogenetic patterns arise from epidemiological dynamics that feature an early period of intensified transmission during PHI, but also depend on sampling a large fraction of extant viral lineages.
Dense sampling of HIV sequences has enabled the estimation of transmission rates between different risk groups. For example, Paraskevis et al. <ref name=paraskevis09>{{cite pmid | 19457244}}</ref> estimated transmission rates between countries in Europe, and Oster et al. <ref name=oster11>{{cite pmid | 21866038}}</ref> showed disparate transmissions within racial and demographic groups within Mississippi, USA.
Purifying immune selection dominates evolution of HIV within hosts, but evolution between hosts is largely decoupled from intra-host evolution <ref name=rambaut04>{{cite pmid | 14708016}}</ref>. Intra-host HIV phylogenies show continual fixation of advantageous mutations, while population-level HIV phylogenies reflect continual diversification. Immune selection has relatively little influence on HIV phylogenies at the population level because
* There is an extreme bottleneck in viral diversity at the time of sexual transmission.<ref name=keele10>{{cite pmid | 20543609}}</ref>
* Transmission tends to occur early in infection before immune selection has had a chance to operate <ref name=cohen11>{{cite pmid | 21591946}}</ref>.
* Replicative fitness, measured in transmissions per host, is largely determined by factors extrinsic to the virus, including heterogeneous sexual and drug-use behaviors in the host population.
There is some evidence of HIV [[wp:Virulence#Evolution |adaptation towards intermediate virulence]] which can maximize transmission potential between hosts. <ref name=fraser07>{{cite pmid | 17954909}}</ref> This hypothesis is predicated on several observations:
* The set-point [[wp:Viral_load | viral load]] (SPVL), which is the quasi-equilibrium titre of viral particles in the blood during [[wp:HIV#Chronic_infection | chronic infection]], is correlated with the time until AIDS. SPVL is therefore a useful proxy for virulence. <ref name=korenromp09>{{cite pmid | 19536329}}</ref>
* SPVL is correlated between HIV donor and recipients in transmission pairs. <ref name=hollingsworth10>{{cite pmid | 20463808}}</ref>
* The transmission probability per sexual act is positively correlated with viral load. Thus, there is a trade-off between the [[wp:Intensity_function | intensity]] of transmission and the lifespan of the host. <ref name=baeten03>{{cite pmid | 15043213}}</ref><ref name=fiore97>{{cite pmid | 9233454}}</ref>
* Recent work <ref name=shirreff11>{{cite pmid | 22022243}}</ref> has shown that given empirical values for transmissibility of HIV and lifespan of hosts as a function of SPVL, adaptation of HIV towards optimum SPVL could be expected over 100-150 years.
==References==
<references/>
992
2012-06-22T12:26:55Z
Tbedford
7
References
Viral phylodynamics is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
= Sources of phylodynamic variation =
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogenies]] of viruses.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term "phylodynamics", postulated that virus phylogenies "... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics".
Grenfell and colleagues identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (see [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. Tip labels A and B in this case denote spatial locations from which viral samples were isolated. ]]
The first pattern, addressing the effect of immune selection on the topology of a viral phylogeny, is exemplified by contrasting the trees of [[wp:influenza | influenza virus]] and [[wp:HIV | HIV]]’s surface proteins. The phylogeny of influenza virus’s [[wp:Hemagglutinin (influenza) | hemagglutinin]] protein bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of Influenza | Phylodynamics of influenza]]). Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection, which is reflected in the phylogeny of HIV’s envelope protein inferred from sequences isolated from different individuals in a population (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The second pattern, addressing the effect of viral population growth on viral phylogenies, holds that expanding viral populations have “star-like” {{Citation needed}} trees, with long external branches relative to internal branches. Such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size. Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
The third pattern addresses the effect that population structure of the host, especially spatial structure, can have on differentiating the viral population. Viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are expected to be more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
= Applications =
== Dating origins ==
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. Because the common ancestor of the entire virus population can be no younger than the common ancestor of the sampled isolates, the age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
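As a rough illustration of how a clock rate and a root date can be read off dated sequences, the sketch below uses root-to-tip regression, a simple stand-in for the full molecular clock models cited here rather than the method of any particular study. The function name and the data are hypothetical; each divergence is assumed to be the genetic distance of a sampled sequence from the root of the tree.

```python
def root_to_tip_regression(dates, divergences):
    """Least-squares fit of divergence = rate * (date - t_root).

    dates: sampling dates in years.
    divergences: root-to-tip distances in substitutions per site.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    rate = sxy / sxx                    # substitutions per site per year
    intercept = mean_y - rate * mean_x
    t_root = -intercept / rate          # date at which fitted divergence is zero
    return rate, t_root

# Hypothetical, noise-free data used only to exercise the function
dates = [2000.0, 2002.0, 2004.0, 2006.0]
divergences = [0.010, 0.014, 0.018, 0.022]
rate, t_root = root_to_tip_regression(dates, divergences)
```

The slope plays the role of the per-year substitution rate, and the date at which the fitted line crosses zero divergence is a crude estimate of when the common ancestor of the sample existed.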
== Epidemiological ==
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number ]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV.<ref name=Volz09/> Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
== Antiviral resistance ==
Without recourse to antibiotics, [[wp: antiviral drug | antiviral treatment]] is a popular approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
=Methods=
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to fit complex demographic scenarios while integrating out uncertainty in phylogenetic inference.<ref name=Drummond05>{{cite pmid| 15703244}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have most commonly been used to model immune selection (see section [[#Phylodynamics of Influenza | Phylodynamics of influenza]]).
==Coalescent theory and phylodynamics==
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> prior to sample collection when 2 gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), there will be <math> n-1 </math> extant lineages.
The remaining lineages will coalesce at rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
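The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under the stated assumptions (neutrality, constant size <math>N</math>, no population structure); the function names and the nested-tuple tree representation are choices made here for illustration and do not come from any particular software package.

```python
import random
from math import comb

def simulate_coalescent_intervals(n, N, rng):
    """Draw the internode intervals T_n, ..., T_2 for a sample of n gene copies."""
    intervals = []
    for k in range(n, 1, -1):
        rate = comb(k, 2) / N              # lambda_k for k extant lineages
        intervals.append(rng.expovariate(rate))
    return intervals

def simulate_topology(n, rng):
    """Merge two lineages chosen uniformly at random at each coalescent event."""
    lineages = [str(i) for i in range(n)]
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        merged = (lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0]                      # nested tuples rooted at the MRCA

rng = random.Random(1)
intervals = simulate_coalescent_intervals(10, 1000, rng)
tmrca = sum(intervals)                      # time to the MRCA of the sample
tree = simulate_topology(10, rng)
```

Because the topology is independent of the interval lengths under these assumptions, the two functions can be called separately and combined afterwards.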
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals,
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
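The last step uses a partial-fraction expansion of each term, which makes the sum telescope. Writing <math>k</math> for the number of extant lineages (an index introduced here for illustration):
: <math> \frac{1}{\lambda_k} = \frac{2N}{k(k-1)} = 2N \left( \frac{1}{k-1} - \frac{1}{k} \right), \qquad \sum_{k=2}^{n} \frac{1}{\lambda_k} = 2N \left( 1 - \frac{1}{n} \right). </math>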
Two corollaries are :
* The expected TMRCA of a sample is bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al.<ref name=Robbins03>{{cite pmid| PMC155028}}</ref> estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because the expected TMRCA of a sample of 74 sequences is <math>1 - 1/74 \approx 99\% </math> of the theoretical upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré<ref name=Donnelly>{{cite pmid| 8825481}}</ref> derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
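The time rescaling given above also suggests a way to simulate coalescent times in an exponentially growing population: draw a waiting time in rescaled units, then map it back to real time. For <math>N(t) = N_0 e^{-rt}</math> the integral of <math>1/N(\tau)</math> has the closed form <math>(e^{rt} - 1)/(r N_0)</math>, which the sketch below inverts. The function names and parameter values are hypothetical.

```python
import math
import random

def cumulative_intensity(t, r, N0):
    """Integral of 1/N(tau) from 0 to t, with N(tau) = N0 * exp(-r * tau)."""
    return (math.exp(r * t) - 1.0) / (r * N0)

def next_coalescent_time(t, k, r, N0, u):
    """Time before sampling of the next coalescence among k lineages present at time t.

    u is a uniform(0, 1) draw, passed in so the function itself is deterministic.
    """
    # Exponential waiting time with rate C(k, 2) in rescaled time
    e = -math.log(1.0 - u) / math.comb(k, 2)
    # Invert the rescaling: exp(r * t_next) = exp(r * t) + r * N0 * e
    return math.log(math.exp(r * t) + r * N0 * e) / r

def simulate_times(n, r, N0, rng):
    """Coalescent times (before sampling) for a sample of n lineages."""
    t, times = 0.0, []
    for k in range(n, 1, -1):
        t = next_coalescent_time(t, k, r, N0, rng.random())
        times.append(t)
    return times

times = simulate_times(20, r=0.5, N0=1000.0, rng=random.Random(0))
```

With a large <math>N_0</math> and positive <math>r</math>, the simulated coalescent events tend to cluster far from the time of sampling, which corresponds to the long external branches described above.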
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
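Rearranging the equation above gives <math>R_0 = 1 + rD</math>. The short sketch below applies this rearrangement; the numerical values are hypothetical and are not taken from the study of Fraser et al.

```python
def reproduction_number(r, D):
    """Solve r = (R0 - 1) / D for R0, with r and D in matching time units."""
    return 1.0 + r * D

# Hypothetical inputs: growth rate of 0.2 per day, serial interval of 3 days
R0 = reproduction_number(r=0.2, D=3.0)
```

With these illustrative inputs, <math>R_0</math> comes out at about 1.6. The estimate is only as reliable as the assumed linear relationship between <math>r</math> and <math>R_0</math>.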
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well-studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a similar form to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/2\beta</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
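To make the relationship between the SIR dynamics and the coalescent rate concrete, the sketch below integrates the SIR equations with a simple forward-Euler scheme and evaluates <math>\lambda_n(t)</math> along the resulting trajectory. The step size, parameter values and function names are assumptions chosen for illustration. A production analysis would use a more accurate ODE solver.

```python
from math import comb

def sir_trajectory(beta, gamma, S0, I0, dt, steps):
    """Forward-Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I."""
    S, I = S0, I0
    path = [(0.0, S, I)]
    for step in range(1, steps + 1):
        incidence = beta * S * I            # f(t), new infections per unit time
        S, I = S - incidence * dt, I + (incidence - gamma * I) * dt
        path.append((step * dt, S, I))
    return path

def coalescent_rate(n, beta, S, I):
    """lambda_n = C(n, 2) * 2 f / I^2 with f = beta * S * I."""
    f = beta * S * I
    return comb(n, 2) * 2.0 * f / I ** 2

# Hypothetical parameters for an epidemic seeded with 0.1% of hosts infected
beta, gamma = 0.5, 0.2
path = sir_trajectory(beta, gamma, S0=0.999, I0=0.001, dt=0.1, steps=100)
rates = [coalescent_rate(10, beta, S, I) for _, S, I in path]
```

During the early growth phase the rate falls as <math>I(t)</math> rises, in line with the approximation <math>N_e = I(t)/2\beta</math> discussed above.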
== Phylogeography ==
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists, then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
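A minimal sketch of this label-scrambling test follows, using Fitch parsimony to count transitions on a fixed binary tree. The tree encoding (nested 2-tuples with tip names at the leaves), the function names and the toy data are assumptions made for illustration. Published analyses typically also account for uncertainty in the tree itself.

```python
import random

def fitch_transitions(tree, labels):
    """Minimum number of location changes (Fitch parsimony) on a binary tree.

    tree: nested 2-tuples with tip names (strings) at the leaves.
    labels: dict mapping tip name -> location.
    """
    def visit(node):
        if isinstance(node, str):
            return {labels[node]}, 0
        (left_set, left_count), (right_set, right_count) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_count + right_count
        return left_set | right_set, left_count + right_count + 1
    return visit(tree)[1]

def tip_names(tree):
    """List the tip names of a nested-tuple tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])

def permutation_p_value(tree, labels, n_perm, rng):
    """One-sided p-value: share of scrambled labelings with as few or fewer transitions."""
    observed = fitch_transitions(tree, labels)
    tips = tip_names(tree)
    states = [labels[t] for t in tips]
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(states)
        if fitch_transitions(tree, dict(zip(tips, states))) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)

# Toy example: locations A and B are perfectly clustered on the tree
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
labels = {"a": "A", "b": "A", "c": "A", "d": "A",
          "e": "B", "f": "B", "g": "B", "h": "B"}
observed, p_value = permutation_p_value(tree, labels, 999, random.Random(0))
```

The p-value is computed with the common "plus one" correction so that it is never exactly zero for a finite number of permutations.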
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
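For the simplest case of two regions with asymmetric movement rates, the transition probabilities implied by such a model have a closed form, which the sketch below evaluates. The function name and rate values are hypothetical. Models with more regions generally require a matrix exponential of the full rate matrix instead of this two-state shortcut.

```python
import math

def two_region_transition_probs(rate_ab, rate_ba, t):
    """Transition probabilities after time t for a lineage moving between regions A and B.

    rate_ab: rate at which a lineage in A moves to B; rate_ba: the reverse rate.
    Returns a nested dict P[start][end].
    """
    total = rate_ab + rate_ba
    decay = math.exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return {"A": {"A": 1.0 - p_ab, "B": p_ab},
            "B": {"A": p_ba, "B": 1.0 - p_ba}}

# Hypothetical asymmetric rates (per year): A exports lineages faster than B
P = two_region_transition_probs(rate_ab=0.3, rate_ba=0.1, t=2.0)
```

These probabilities are the quantities that enter the likelihood calculation along each branch, in the same way that nucleotide transition probabilities do for sequence data.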
==Simulation==
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
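One way such a distance-based cross-immunity rule might look in code is sketched below. The linear decay of protection with Hamming distance, the choice to take the strongest protection across a host's infection history, and all function names are hypothetical. Published models differ in each of these choices.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def cross_immunity(a, b, max_distance):
    """Hypothetical linear decay: 1 for identical strains, 0 at or beyond max_distance."""
    return max(0.0, 1.0 - hamming_distance(a, b) / max_distance)

def infection_risk(challenge, history, max_distance):
    """Relative risk that a host with the given infection history is infected by `challenge`."""
    if not history:
        return 1.0                          # fully susceptible host
    protection = max(cross_immunity(challenge, past, max_distance) for past in history)
    return 1.0 - protection

# Toy strains of length 4; the host was previously infected with two strains
risk = infection_risk("AATT", ["AAAA", "TTTT"], max_distance=4)
```

In a forward simulation, a quantity of this kind would scale the per-contact transmission probability each time an infected host meets a previously infected one.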
=Examples=
==Phylodynamics of Influenza==
Human influenza is an acute respiratory infection primarily caused by viruses [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]]. Influenza A viruses can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic ]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza A/H3N2, which shows a single predominant trunk lineage that persists through time and side branches that persist for only 1-5 years before going extinct (see {{See Figure|flutree}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:Flutree.png|300px|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09>{{cite pmid|PMC2740835}}</ref>]]
===Selective pressures===
Phylodynamic techniques have provided insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution to the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes have little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> suggesting that H1N1 undergoes less adaptive evolution than H3N2. This hypothesis is supported by empirical patterns of antigenic evolution; there have been 9 vaccine updates recommended by the [[wp: World Health Organization | WHO]] for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.<ref name=IRD>{{cite pmid|22260278}}</ref> Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.<ref name=Wolf06>{{cite pmid|17067369}}</ref><ref name=Bhatt11>{{cite pmid|21415025}}</ref> The underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
===Circulation patterns===
The extremely rapid turnover of the influenza population means that the rate of geographic spread of influenza lineages must also, to some extent, be rapid. Surveillance data clearly shows a pattern of strong seasonal epidemics in temperate regions, and less periodic epidemics in the tropics.<ref name=Finkelman07>{{cite pmid|PMC2117904}}</ref> The geographic origin of seasonal epidemics in the Northern and Southern Hemispheres had been a major open question in the field. However, recent work by Rambaut et al.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> and Russell et al.<ref name=Russell08>{{cite pmid|18420927}}</ref> has shown that temperate epidemics usually emerge from a global reservoir rather than emerging from within the previous season's genetic diversity. This work, and more recent work by Bedford et al.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref> and Bahl et al.,<ref name=Bahl11>{{cite pmid|22084096}}</ref> has suggested that the global persistence of the influenza population is driven by viruses being passed from epidemic to epidemic, with no individual region in the world showing continual persistence. However, there is considerable debate regarding the particular configuration of the global network of influenza, with one hypothesis suggesting a metapopulation in East and Southeast Asia that continually seeds influenza in the rest of the world<ref name=Russell08>{{cite pmid|18420927}}</ref> versus a more global metapopulation in which temperate lineages can return to the tropics at the end of a seasonal epidemic.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref><ref name=Bahl11>{{cite pmid|22084096}}</ref>
All of these phylogeographic studies suffer from limitations in the worldwide sampling of influenza viruses. For example, the relative importance of tropical Africa and India has yet to be uncovered. Additionally, the phylogeographic methods used in these studies (see section on [[#Phylogeography | phylogeographic methods]]) make inferences of the ancestral locations and migration rates only on the phylogeny of the samples at hand, rather than on the population phylogeny in which these samples are embedded. Because of this, study-specific sampling procedures are a concern in extrapolating to population-level inferences. However, through joint epidemiological and evolutionary models, Bedford et al.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref> show that their estimates of migration rates appear robust to a large degree of under-sampling or over-sampling of a particular region. Further methodological progress is required to more fully address these issues.
===Simulation-based models===
Forward simulation-based approaches for addressing how immune selection can shape the phylogeny of influenza A/H3N2’s hemagglutinin protein have been actively developed by disease modelers over the last decade. These approaches include both compartmental models and agent-based models.
One of the first compartmental models for influenza was developed by Gog and Grenfell <ref name=Gog02>{{cite pmid| 12481034}}</ref>, who simulated the dynamics of many strains with partial cross-immunity to one another. Under a parameterization of long host lifespan and short infectious period, they found that strains would form self-organized sets that would emerge and replace one another. Although the authors did not reconstruct a phylogeny from their simulated results, the dynamics they found were consistent with a ladder-like viral phylogeny exhibiting low strain diversity and rapid lineage turnover.
Later work by Ferguson and colleagues <ref name=Ferguson03>{{cite pmid| 12660783}}</ref> adopted an agent-based approach to better identify the immunological and ecological determinants of influenza evolution. The authors modeled influenza’s hemagglutinin as 4 epitopes, each consisting of 3 amino acids. They showed that under strain-specific immunity alone (with partial cross-immunity between strains based on their amino acid similarity), the phylogeny of influenza A/H3N2’s HA was expected to exhibit ‘explosive genetic diversity’, a pattern that was not observed in HA trees inferred from empirical influenza A/H3N2 isolates. This led the authors to postulate the existence of a temporary strain-transcending immunity: individuals were immune to reinfection with any other influenza strain for approximately six months following an infection. With this assumption, the agent-based model could reproduce the ladderlike phylogeny of influenza A/H3N2’s HA protein.
Work by Koelle and colleagues <ref name=Koelle06>{{cite pmid| 17185596}}</ref> revisited the dynamics of influenza A/H3N2 evolution following the publication of a seminal paper by Smith and colleagues <ref name=Smith04>{{cite pmid| 15218094}}</ref>, which showed that, while the genetic evolution of influenza A/H3N2’s HA was gradual, the antigenic evolution of the virus occurred in a punctuated manner. The phylodynamic model designed by Koelle and coauthors argued that this pattern reflected a many-to-one genotype-to-phenotype mapping, with the possibility of strains from antigenically distinct clusters of influenza sharing a high degree of genetic similarity. Through incorporating this mapping of viral genotype into viral phenotype (or antigenic cluster) into their model, the authors were able to reproduce the ladderlike phylogeny of influenza’s HA protein without generalized strain-transcending immunity. The reproduction of the ladderlike phylogeny resulted from the viral population passing through repeated selective sweeps. These sweeps were driven by [[wp:Herd immunity | herd immunity]] and acted to constrain viral genetic diversity. A simplification of this model <ref name=Koelle10>{{cite pmid| 20335193}}</ref> was also able to reproduce the phylogenetic trees of equine influenza A/H3N8 and of influenza B, both of which have two co-circulating lineages.
Instead of modeling the genotypes of viral strains, a compartmental simulation model by Gökaydin and colleagues <ref name=Gökaydin07>{{cite pmid| 17015285}}</ref> considered influenza evolution at the scale of antigenic clusters (or phenotypes). This model showed that antigenic emergence and replacement could result under certain epidemiological conditions. These antigenic dynamics would be consistent with a ladderlike phylogeny of influenza exhibiting low genetic diversity and continual strain turn-over.
In recent work, Bedford and colleagues<ref name=Bedford12>{{cite pmid|22546494}}</ref> used an agent-based model to show that evolution in a Euclidean antigenic space can account for the phylogenetic pattern of influenza A/H3N2's HA, as well as the virus’s antigenic, epidemiological, and geographic patterns. The model showed the reproduction of influenza’s ladderlike phylogeny depended critically on the mutation rate of the virus as well as the immunological distance yielded by each mutation.
==Phylodynamics of HIV==
The global diversity of HIV-1 group M is shaped by its [[wp:Origin_of_AIDS |origins]] in Central Africa around the turn of the 20th century. The epidemic underwent explosive growth throughout the early 20th century with multiple radiations out of Central Africa. Multiple [[wp:Founder_effect | founder events]] have given rise to distinct [[wp:HIV_subtypes | subtypes]] which predominate in different parts of the world. Subtype B is most prevalent in North America and Western Europe, while A and C, which account for more than half of infections worldwide, are common in Africa. <ref name=osmanov02>{{cite pmid | 11832690}}</ref> Transmissibility of virus, virulence, effectiveness of antiretroviral therapy, and pathogenesis may differ slightly between subtypes. <ref name=taylor08>{{cite pmid | 18971501}}</ref>
The rapid growth of the HIV epidemic is reflected in phylogenies of HIV virus, which are star-like. Most coalescent events occur in the distant past. Figure 6 shows an example based on 173 HIV-1 sequences from the Democratic Republic of Congo. <ref name=yusim01>{{cite pmid | 11405933}}</ref>
The rate of exponential growth of HIV in Central Africa in the early 20th century, preceding the establishment of modern subtypes, has been estimated:
* Yusim et al.<ref name=yusim01>{{cite pmid | 11405933 }}</ref> estimated <math>r= 0.1676</math> using coalescent methods and a parametric model of exponential growth
* Similar conclusions were reached using nonparametric estimates of <math>N_e</math><ref name=strimmer01>{{cite pmid | 11719579}}</ref>
Different epidemic growth rates have been estimated for different subtypes using coalescent approaches. The early growth of subtype B in North America was quite high. Estimates range from <math>r=0.48-0.834</math> transmissions per infection per year.<ref name=walker>{{cite pmid | 15737910}}</ref><ref name=robbins03>{{cite pmid | 12743293}}</ref> The duration of exponential growth in North America was relatively short, with saturation occurring in the mid- and late-1980s. <ref name=Volz09>{{cite pmid|19797047}}</ref> The early growth rate of the more common subtype C in Africa is lower (approximately 0.27 transmissions per infection per year), although exponential growth has continued for a longer period of time.<ref name=grassly99>{{pmid | 9927440}}</ref><ref name=walker>{{cite pmid | 15737910}}</ref> HIV-1 group O, which is relatively rare and is found mainly in Cameroon, has grown at a lower rate (approximately 0.068 transmissions per infection per year). <ref name=lemey04> {{cite pmid | 15280223 }}</ref>
HIV-1 sequences sampled over a span of five decades have been used with relaxed molecular clock phylogenetic methods to estimate the time of zoonosis around the early 20th century <ref name=worobey08>{{cite pmid | 18833279}}</ref>. The estimated TMRCA for HIV-1 coincides with the appearance of the first densely populated large cities in Central Africa. Similar methods have been used to estimate the time that HIV originated in different parts of the world. The origin of subtype B in North America is estimated to be in the 1960s, where it went undetected until the AIDS epidemic in the 1980s. <ref name=robbins03>{{cite pmid | 12743293}}</ref> There is evidence that progenitors of modern subtype B originally colonized the Caribbean before undergoing multiple radiations to North and South America. <ref name=junqueira11>{{cite pmid | 22132104}}</ref> Subtype C originated around the same time in Africa.<ref name=walker>{{cite pmid | 15737910}}</ref>
At smaller geographical and time scales, phylogenies of HIV virus may reflect epidemiological dynamics related to risk behavior and [[wp:sexual_network | sexual networks]]. Very dense sampling of viral sequences within cities over short periods of time has given a detailed picture of HIV transmission patterns in modern epidemics. Sequencing of virus from newly diagnosed patients is now routine in many countries for surveillance of [[wp:Drug_resistance | drug resistance]] mutations, which has yielded large databases of sequence data in those areas. Lewis et al. <ref name=lewis08>{{cite pmid | 18351795}}</ref> used such sequence data from [[wp:Men_who_have_sex_with_men | men who have sex with men]] in London, UK, and found evidence that transmission is highly concentrated in the brief period of [[wp:Primary_HIV_infection | primary HIV infection]] (PHI), which consists of approximately the first 6 months of the infectious period. Volz et al. {{Citation needed}} found that patients who were recently infected were more likely to harbor virus that is phylogenetically clustered with other samples from recently infected patients, reflecting that transmission had occurred recently and that transmission is more likely to occur during early infection. Such phylogenetic patterns arise from epidemiological dynamics which feature an early period of intensified transmission during PHI, but also depend on sampling a large fraction of extant viral lineages.
Dense sampling of HIV sequences has enabled the estimation of transmission rates between different risk groups. For example, Paraskevis et al. <ref name=paraskevis09>{{cite pmid | 19457244}}</ref> estimated transmission rates between countries in Europe, and Oster et al. <ref name=oster11>{{cite pmid | 21866038}}</ref> showed disparate transmissions within racial and demographic groups within Mississippi, USA.
Purifying immune selection dominates evolution of HIV within hosts, but evolution between hosts is largely decoupled from intra-host evolution <ref name=rambaut04>{{cite pmid | 14708016}}</ref>. Intra-host HIV phylogenies show continual fixation of advantageous mutations, while population-level HIV phylogenies reflect continual diversification. Immune selection has relatively little influence on HIV phylogenies at the population level because
* There is an extreme bottleneck in viral diversity at the time of sexual transmission.<ref name=keele10>{{cite pmid | 20543609}}</ref>
* Transmission tends to occur early in infection before immune selection has had a chance to operate <ref name=cohen11>{{cite pmid | 21591946}}</ref>.
* The replicative fitness of the virus, measured in transmissions per host, is largely determined by factors extrinsic to virology, including heterogeneous sexual and drug-use behaviors in the host population.
There is some evidence of HIV [[wp:Virulence#Evolution |adaptation towards intermediate virulence]] which can maximize transmission potential between hosts. <ref name=fraser07>{{cite pmid | 17954909}}</ref> This hypothesis is predicated on several observations:
* The set-point [[wp:Viral_load | viral load]] (SPVL), which is the quasi-equilibrium titre of viral particles in the blood during [[wp:HIV#Chronic_infection | chronic infection]], is correlated with the time until AIDS. SPVL is therefore a useful proxy for virulence. <ref name=korenromp09>{{cite pmid | 19536329}}</ref>
* SPVL is correlated between HIV donor and recipients in transmission pairs. <ref name=hollingsworth10>{{cite pmid | 20463808}}</ref>
* The transmission probability per sexual act is positively correlated with viral load. Thus, there is a trade-off between the [[wp:Intensity_function | intensity]] of transmission and the lifespan of the host. <ref name=baeten03>{{cite pmid | 15043213}}</ref><ref name=fiore97>{{cite pmid | 9233454}}</ref>
* Recent work <ref name=shirreff11>{{cite pmid | 22022243}}</ref> has shown that given empirical values for transmissibility of HIV and lifespan of hosts as a function of SPVL, adaptation of HIV towards optimum SPVL could be expected over 100-150 years.
==References==
<references/>
997
2012-06-26T21:30:50Z
Daniel Mietchen
5
formatting
'''Viral phylodynamics''' is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation in hosts because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection by the [[wp:immune system | immune system]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
== Sources of phylodynamic variation ==
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogenies]] of viruses.
Grenfell et al.,<ref name=Grenfell04>{{cite pmid| 14726583}}</ref> who coined the term "phylodynamics", postulated that virus phylogenies "... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics".
Grenfell and colleagues identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* The strength of immune [[wp:Directional selection | selection]] may be reflected by how unbalanced and ladder-like the tree is (see {{See Figure|1}}).<ref name=Grenfell04/>
* How rapidly a population of virus expanded in the past may be reflected by how star-like the tree is (see {{See Figure|2}}). In a star-like tree external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected in how clustered the [[wp:taxa | taxa]] of the tree are (see {{See Figure|3}}). If there is strong population structure, virus sampled from similar hosts will be more closely related than virus from different hosts.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. Tip labels A and B in this case denote spatial locations from which viral samples were isolated. ]]
The first pattern, addressing the effect of immune selection on the topology of a viral phylogeny, is exemplified by contrasting the trees of [[wp:influenza | influenza virus]] and [[wp:HIV | HIV]]’s surface proteins. The phylogeny of influenza virus’s [[wp:Hemagglutinin (influenza) | hemagglutinin]] protein bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]). Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection, which is reflected in the phylogeny of HIV’s envelope protein inferred from sequences isolated from different individuals in a population (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The second pattern, addressing the effect of viral population growth on viral phylogenies, holds that expanding viral populations have “star-like” {{Citation needed}} trees, with long external branches relative to internal branches. Such trees arise because a common ancestor is much more likely to occur when the past population was small relative to the present population size. Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population which has roughly constant size.
The third pattern addresses the effect that population structure of the host, especially spatial structure, can have on differentiating the viral population. Viruses within similar hosts, such as hosts that reside in the same region or who have similar risk factors for infection, are expected to be more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure; however, this structure is not observed at all spatial scales.
At small spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
== Applications ==
=== Dating origins ===
Phylodynamic models may aid in dating epidemic and pandemic origins. The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus. With a rate of substitution measured in real units of time, it is possible to infer the time to the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences. The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at least as early as the common ancestor of the sample. In April 2009, the genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref> This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
=== Epidemiological ===
An understanding of the transmission and spread of an infectious disease is important to public health interventions. Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means. For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for factors such as the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref> Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV<ref name=Volz09/>. Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone. [[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref> Phylodynamic approaches have been used to reveal the patterns of geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
=== Antiviral resistance ===
Because antibiotics are ineffective against viruses, [[wp: antiviral drug | antiviral treatment]] is a popular approach for the control of viral pathogens. However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will be at a transmission advantage compared to susceptible strains. Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.{{Citation needed}} Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).{{Citation needed}}
==Methods==
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured from a [[wp: multiple sequence alignment | multiple sequence alignment]] by examination of the [[wp:DN/dS | ratio of nonsynonymous to synonymous substitution rates (dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima’s D]].
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to fit complex demographic scenarios while integrating out uncertainty in phylogenetic inference.<ref name=Drummond05>{{cite pmid| 15703244}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population dynamics such as the historical prevalence of infection.
Simulation methods have most commonly been used to model immune selection (see section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
===Coalescent theory and phylodynamics===
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright Fisher model]]),
the time prior to sample collection that at least 2 out of a sample of <math>n</math> gene copies have a [[wp:MRCA | most recent common ancestor]] is [[wp:Exponential_distribution | distributed exponentially]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
At the time <math>T_n</math> before sample collection when two gene copies in the sample ''coalesce'' (find a common ancestor; see {{See Figure|4}}), <math> n-1 </math> extant lineages remain.
The remaining lineages will coalesce at the rates <math>\lambda_{n-1}\cdots \lambda_2</math> after intervals <math>T_{n-1}\cdots T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected time until the most recent common ancestor of the sample is the sum of the expected values of the internode intervals,
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
Two corollaries are :
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al.<ref name=Robbins03>{{cite pmid| PMC155028}}</ref> estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because a sample of 74 sequences is expected to recover <math>1 - 1/74 \approx 99\% </math> of the limiting value <math>2N</math>.
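The simulation procedure and the expected TMRCA described above can be illustrated with a short Python sketch. It is only an illustration under the constant-size neutral model stated above; the function names, the nested-tuple tree representation, and the values <math>n = 10</math> and <math>N = 1000</math> are assumptions chosen for the example and do not come from any cited study.

```python
import random
from math import comb


def simulate_coalescent_intervals(n, N, rng):
    """Draw internode intervals T_n, ..., T_2 for a sample of n gene copies.

    Each T_k is exponential with rate lambda_k = C(k, 2) / N.
    The list is ordered from T_n (most recent) to T_2 (deepest).
    """
    intervals = []
    for k in range(n, 1, -1):
        rate = comb(k, 2) / N
        intervals.append(rng.expovariate(rate))
    return intervals


def simulate_topology(n, rng):
    """Join two lineages chosen uniformly at random at each coalescent event.

    Tips are the integers 0..n-1; internal nodes are nested 2-tuples.
    """
    lineages = list(range(n))
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        merged = (lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0]


def expected_tmrca(n, N):
    """Closed-form expectation 2N(1 - 1/n) from the derivation above."""
    return 2.0 * N * (1.0 - 1.0 / n)


rng = random.Random(1)       # fixed seed so the run is repeatable
n, N = 10, 1000.0            # hypothetical sample size and population size
replicates = 20000
mean_tmrca = sum(
    sum(simulate_coalescent_intervals(n, N, rng)) for _ in range(replicates)
) / replicates

print("simulated mean TMRCA:", mean_tmrca)
print("expected TMRCA:      ", expected_tmrca(n, N))
print("fraction of 2N recovered by 74 samples:", 1.0 - 1.0 / 74)
print("one random topology:", simulate_topology(5, rng))
```

The simulated mean should lie close to the closed-form value, and the last ratio shows why a sample of 74 sequences already comes close to the limiting value.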
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré<ref name=Donnelly>{{cite pmid| 8825481}}</ref> derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
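A minimal Python sketch can make the effect of exponential growth on branch lengths concrete. It uses the time rescaling described above: with <math>N(t) = N_0 e^{-rt}</math>, the rescaled time is <math>(e^{rt} - 1)/(r N_0)</math>, which can be inverted in closed form. The function name and the parameter values (<math>n = 20</math>, <math>N_0 = 10000</math>, <math>r = 1</math>) are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_exponential_growth_times(n, N0, r, rng):
    """Coalescent times (measured backwards from sampling) for n lineages
    when the population size t units in the past is N0 * exp(-r * t).

    A waiting time e is drawn on the rescaled clock, where k lineages
    coalesce at rate C(k, 2), and mapped back to real time by solving
    (exp(r * t_new) - exp(r * t_old)) / (r * N0) = e for t_new.
    """
    t = 0.0
    times = []
    for k in range(n, 1, -1):
        e = rng.expovariate(math.comb(k, 2))
        t = math.log(math.exp(r * t) + r * N0 * e) / r
        times.append(t)
    return times


rng = random.Random(7)
n, N0, r = 20, 10000.0, 1.0   # hypothetical values
replicates = 2000

# Fraction of the tree height elapsed before the first coalescent event.
fractions = []
for _ in range(replicates):
    times = simulate_exponential_growth_times(n, N0, r, rng)
    fractions.append(times[0] / times[-1])
mean_fraction_growth = sum(fractions) / replicates

# Constant-size comparison: ratio of expectations E[T_n] / E[TMRCA],
# which does not depend on N.
ratio_constant = (1.0 / math.comb(n, 2)) / (2.0 * (1.0 - 1.0 / n))

print("growing population, mean T_n / TMRCA:", mean_fraction_growth)
print("constant size, E[T_n] / E[TMRCA]:    ", ratio_constant)
```

Under growth, a much larger share of the tree height passes before the first coalescent event than under constant size, which corresponds to the long external branches of a star-like tree.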
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
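The relation between <math>r</math>, <math>D</math> and <math>R_0</math> given above can be rearranged to <math>R_0 = 1 + rD</math>. The following Python sketch applies it in both directions. The function names and the numerical inputs are hypothetical and are not the values used by Fraser et al.

```python
def reproduction_number(r, D):
    """R0 implied by growth rate r and duration D, from r = (R0 - 1) / D."""
    return 1.0 + r * D


def growth_rate(R0, D):
    """Exponential growth rate implied by R0 and D under the same relation."""
    return (R0 - 1.0) / D


# Hypothetical inputs: growth rate per day and infectious duration in days.
r_per_day = 0.2
duration_days = 3.0

R0 = reproduction_number(r_per_day, duration_days)
print("implied R0:", R0)
print("round trip back to r:", growth_rate(R0, duration_days))
```

Both quantities must be expressed in the same time unit, since only the product <math>rD</math> enters the formula.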
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection {{Citation needed}}. Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered(SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>I(t)</math> infected and <math>S(t)</math> susceptible as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al. {{Citation needed}} proposed that the rate of coalescence for such an epidemic will be
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This is similar to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/2\beta</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic {{Citation needed}}.
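As an illustration, the SIR equations above can be integrated numerically and the resulting trajectory inserted into the coalescent rate <math>{n \choose 2} 2 \beta S(t) / I(t)</math>. The Python sketch below takes the formulas above at face value, with <math>S</math> and <math>I</math> as defined there. The fourth-order Runge-Kutta integrator, the function names, and all parameter values are assumptions made for this example.

```python
from math import comb


def sir_trajectory(beta, gamma, S0, I0, t_end, dt=0.01):
    """Integrate dS/dt = -beta*S*I and dI/dt = beta*S*I - gamma*I with RK4.

    Returns a list of (t, S, I) tuples, starting at t = 0.
    """
    def deriv(S, I):
        incidence = beta * S * I
        return -incidence, incidence - gamma * I

    steps = int(round(t_end / dt))
    S, I = S0, I0
    out = [(0.0, S, I)]
    for step in range(1, steps + 1):
        k1 = deriv(S, I)
        k2 = deriv(S + 0.5 * dt * k1[0], I + 0.5 * dt * k1[1])
        k3 = deriv(S + 0.5 * dt * k2[0], I + 0.5 * dt * k2[1])
        k4 = deriv(S + dt * k3[0], I + dt * k3[1])
        S += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        I += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        out.append((step * dt, S, I))
    return out


def coalescent_rate(n, beta, S, I):
    """Rate C(n, 2) * 2 * f / I^2 with incidence f = beta * S * I."""
    incidence = beta * S * I
    return comb(n, 2) * 2.0 * incidence / I ** 2


# Hypothetical parameters: beta = 2, gamma = 1, 0.1% initially infected.
beta, gamma = 2.0, 1.0
trajectory = sir_trajectory(beta, gamma, S0=0.999, I0=0.001, t_end=20.0)

for t, S, I in trajectory[::400]:
    print(f"t={t:5.1f}  S={S:.4f}  I={I:.6f}  "
          f"pairwise rate={coalescent_rate(2, beta, S, I):.2f}")
```

The pairwise rate is highest when few hosts are infected and falls as prevalence rises, which matches the damping by <math>S(t)</math> and the division by <math>I(t)</math> in the formula.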
=== Phylogeography ===
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness. A basic question is whether the geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}). This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]]. If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref> This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data. By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] found by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
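The label-scrambling test described above can be sketched in Python. This sketch counts transitions with Fitch parsimony on a rooted binary tree written as nested tuples. The tree, the tip names, the location labels, and the function names are invented for illustration, and the one-sided p-value with a +1 correction is one common convention rather than the procedure of any cited study.

```python
import random


def fitch_transitions(tree, labels):
    """Return (state set, minimum number of state changes) for a rooted
    binary tree given as nested 2-tuples with tip names at the leaves."""
    if not isinstance(tree, tuple):
        return {labels[tree]}, 0
    left_set, left_count = fitch_transitions(tree[0], labels)
    right_set, right_count = fitch_transitions(tree[1], labels)
    common = left_set & right_set
    if common:
        return common, left_count + right_count
    return left_set | right_set, left_count + right_count + 1


def tip_names(tree):
    """List the tip names of a nested-tuple tree from left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])


def permutation_p_value(tree, labels, n_perm, rng):
    """Scramble tip labels n_perm times and report how often the scrambled
    transition count is as small as, or smaller than, the observed one."""
    observed = fitch_transitions(tree, labels)[1]
    names = tip_names(tree)
    states = [labels[name] for name in names]
    hits = 0
    for _ in range(n_perm):
        shuffled = states[:]
        rng.shuffle(shuffled)
        count = fitch_transitions(tree, dict(zip(names, shuffled)))[1]
        if count <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: two clades that match two sampling locations exactly.
tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
labels = {name: ("A" if name.startswith("a") else "B") for name in tip_names(tree)}

observed, p_value = permutation_p_value(tree, labels, 999, random.Random(5))
print("observed transitions:", observed)
print("permutation p-value:", p_value)
```

A small p-value indicates fewer transitions than expected if locations were assigned to tips at random, which is the signature of population structure described above.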
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the ancestral geographic locations of particular lineages. Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]]. The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref> The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, that a lineage in one region moves to another region over the course of the phylogenetic tree. In a geographic transmission network, some regions may mix more readily and other regions may be more isolated. Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'. With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
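To illustrate how asymmetric movement rates translate into location probabilities along a branch, the following Python sketch restricts the model to two regions, for which the transition probabilities of a continuous-time Markov chain have a simple closed form. It is an illustration only and is not the implementation used in the cited work; the function name and the rates are hypothetical.

```python
import math


def transition_probabilities(rate_ab, rate_ba, t):
    """Probability that a lineage starting in one region is found in each
    region after time t, for a two-region model with movement rate rate_ab
    from A to B and rate_ba from B to A."""
    total = rate_ab + rate_ba
    decay = math.exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return {
        ("A", "A"): 1.0 - p_ab,
        ("A", "B"): p_ab,
        ("B", "A"): p_ba,
        ("B", "B"): 1.0 - p_ba,
    }


# Hypothetical asymmetric rates (events per lineage per year).
rate_ab, rate_ba = 0.5, 0.1

for years in (0.5, 2.0, 10.0):
    probs = transition_probabilities(rate_ab, rate_ba, years)
    print(years, {key: round(value, 3) for key, value in probs.items()})
```

With more than two regions the same quantities are obtained from the matrix exponential of the rate matrix, in the same way as for nucleotide substitution models.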
===Simulation===
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis. Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected by and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and viral strain of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref> Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
Here, it is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts. For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc. The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]]. In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref> In addition to cross-immunity between virus strains, a forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by a nucleotide or amino acid sequence. Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another. Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]]. Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
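As a rough sketch of how cross-immunity might be tied to sequence distance in such a simulation, the Python functions below compute the Hamming distance between two strains and convert it into a relative risk of infection. The exponential decay of protection with distance, the <code>decay</code> parameter, and the rule of taking the strongest protection in a host's infection history are all illustrative assumptions, not the forms used in the cited models.

```python
import math

def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b))

def cross_immunity(past_strain, challenge_strain, decay=0.5):
    """Protection in [0, 1] conferred by a past strain (assumed exponential decay)."""
    return math.exp(-decay * hamming_distance(past_strain, challenge_strain))

def infection_risk(challenge_strain, history, decay=0.5):
    """Relative risk of infection given a host's list of previously seen strains."""
    if not history:
        return 1.0                       # fully susceptible host
    strongest = max(cross_immunity(past, challenge_strain, decay)
                    for past in history)
    return 1.0 - strongest

# Hypothetical host previously infected with two strains.
history = ["ACGT", "ACGA"]
print(infection_risk("ACGT", history))   # identical strain seen before
print(infection_risk("TTTA", history))   # more distant strain
```

Under these assumptions a host is fully protected against a strain it has already seen and becomes progressively more susceptible as the challenge strain diverges from every strain in its history.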
==Examples==
===Phylodynamics of Influenza===
Human influenza is an acute respiratory infection primarily caused by viruses [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]]. Influenza A viruses can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]]. Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]]. Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]]. Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population. If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009. After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to earlier forms of the virus. These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics. Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref> The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza A/H3N2, which shows a single predominant trunk lineage that persists through time and side branches that persist for only 1-5 years before going extinct (see {{See Figure|flutree}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:Flutree.png|300px|thumb|{{Figure|flutree}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002. Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09>{{cite pmid|PMC2740835}}</ref>]]
====Selective pressures====
Phylodynamic techniques have provided insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome. The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution to the specific sites on HA that are recognized by antibodies in the human immune system. These sites are referred to as [[wp: epitope | epitope]] sites. Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|flutree}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population. Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population. Thus, analysis of phylodynamic patterns gives insight into underlying selective forces. A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes have little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref> However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> suggesting that H1N1 undergoes less adaptive evolution than H3N2. This hypothesis is supported by empirical patterns of antigenic evolution; there have been 9 vaccine updates recommended by the [[wp: World Health Organization | WHO]] for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.<ref name=IRD>{{cite pmid|22260278}}</ref> Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.<ref name=Wolf06>{{cite pmid|17067369}}</ref><ref name=Bhatt11>{{cite pmid|21415025}}</ref> The underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
====Circulation patterns====
The extremely rapid turnover of the influenza population means that the rate of geographic spread of influenza lineages must also, to some extent, be rapid. Surveillance data clearly shows a pattern of strong seasonal epidemics in temperate regions, and less periodic epidemics in the tropics.<ref name=Finkelman07>{{cite pmid|PMC2117904}}</ref> The geographic origin of seasonal epidemics in the Northern and Southern Hemispheres had been a major open question in the field. However, recent work by Rambaut et al.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> and Russell et al.<ref name=Russell08>{{cite pmid|18420927}}</ref> has shown that temperate epidemics usually emerge from a global reservoir rather than emerging from within the previous season's genetic diversity. This work, and more recent work by Bedford et al.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref> and Bahl et al.,<ref name=Bahl11>{{cite pmid|22084096}}</ref> has suggested that the global persistence of the influenza population is driven by viruses being passed from epidemic to epidemic, with no individual region in the world showing continual persistence. However, there is considerable debate regarding the particular configuration of the global network of influenza, with one hypothesis suggesting a metapopulation in East and Southeast Asia that continually seeds influenza in the rest of the world<ref name=Russell08>{{cite pmid|18420927}}</ref> versus a more global metapopulation in which temperate lineages can return to the tropics at the end of a seasonal epidemic.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref><ref name=Bahl11>{{cite pmid|22084096}}</ref>
All of these phylogeographic studies suffer from limitations in the worldwide sampling of influenza viruses. For example, the relative importance of tropical Africa and India has yet to be uncovered. Additionally, the phylogeographic methods used in these studies (see section on [[#Phylogeography | phylogeographic methods]]) make inferences of the ancestral locations and migration rates only on the phylogeny of the samples at hand, rather than on the population phylogeny in which these samples are embedded. Because of this, study-specific sampling procedures are a concern in extrapolating to population-level inferences. However, through joint epidemiological and evolutionary models, Bedford et al.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref> show that their estimates of migration rates appear robust to a large degree of under-sampling or over-sampling of a particular region. Further methodological progress is required to more fully address these issues.
====Simulation-based models====
Forward simulation-based approaches for addressing how immune selection can shape the phylogeny of influenza A/H3N2’s hemagglutinin protein have been actively developed by disease modelers over the last decade. These approaches include both compartmental models and agent-based models.
One of the first compartmental models for influenza was developed by Gog and Grenfell <ref name=Gog02>{{cite pmid| 12481034}}</ref>, who simulated the dynamics of many strains with partial cross-immunity to one another. Under a parameterization of long host lifespan and short infectious period, they found that strains would form self-organized sets that would emerge and replace one another. Although the authors did not reconstruct a phylogeny from their simulated results, the dynamics they found were consistent with a ladder-like viral phylogeny exhibiting low strain diversity and rapid lineage turnover.
Later work by Ferguson and colleagues <ref name=Ferguson03>{{cite pmid| 12660783}}</ref> adopted an agent-based approach to better identify the immunological and ecological determinants of influenza evolution. The authors modeled influenza’s hemagglutinin as 4 epitopes, each consisting of 3 amino acids. They showed that under strain-specific immunity alone (with partial cross-immunity between strains based on their amino acid similarity), the phylogeny of influenza A/H3N2’s HA was expected to exhibit ‘explosive genetic diversity’, a pattern that was not observed in HA trees inferred from empirical influenza A/H3N2 isolates. This led the authors to postulate the existence of a temporary strain-transcending immunity: individuals were immune to reinfection with any other influenza strain for approximately six months following an infection. With this assumption, the agent-based model could reproduce the ladderlike phylogeny of influenza A/H3N2’s HA protein.
Work by Koelle and colleagues <ref name=Koelle06>{{cite pmid| 17185596}}</ref> revisited the dynamics of influenza A/H3N2 evolution following the publication of a seminal paper by Smith and colleagues <ref name=Smith04>{{cite pmid| 15218094}}</ref>, which showed that, while the genetic evolution of influenza A/H3N2’s HA was gradual, the antigenic evolution of the virus occurred in a punctuated manner. The phylodynamic model designed by Koelle and coauthors argued that this pattern reflected a many-to-one genotype-to-phenotype mapping, with the possibility of strains from antigenically distinct clusters of influenza sharing a high degree of genetic similarity. Through incorporating this mapping of viral genotype into viral phenotype (or antigenic cluster) into their model, the authors were able to reproduce the ladderlike phylogeny of influenza’s HA protein without generalized strain-transcending immunity. The reproduction of the ladderlike phylogeny resulted from the viral population passing through repeated selective sweeps. These sweeps were driven by [[wp:Herd immunity | herd immunity]] and acted to constrain viral genetic diversity. A simplification of this model <ref name=Koelle10>{{cite pmid| 20335193}}</ref> was also able to reproduce the phylogenetic trees of equine influenza A/H3N8 and of influenza B, both of which have two co-circulating lineages.
Instead of modeling the genotypes of viral strains, a compartmental simulation model by Gökaydin and colleagues <ref name=Gökaydin07>{{cite pmid| 17015285}}</ref> considered influenza evolution at the scale of antigenic clusters (or phenotypes). This model showed that antigenic emergence and replacement could result under certain epidemiological conditions. These antigenic dynamics would be consistent with a ladderlike phylogeny of influenza exhibiting low genetic diversity and continual strain turn-over.
In recent work, Bedford and colleagues<ref name=Bedford12>{{cite pmid|22546494}}</ref> used an agent-based model to show that evolution in a Euclidean antigenic space can account for the phylogenetic pattern of influenza A/H3N2's HA, as well as the virus’s antigenic, epidemiological, and geographic patterns. The model showed that the reproduction of influenza’s ladderlike phylogeny depended critically on the mutation rate of the virus as well as the immunological distance yielded by each mutation.
===Phylodynamics of HIV===
The global diversity of HIV-1 group M is shaped by its [[wp:Origin_of_AIDS |origins]] in Central Africa around the turn of the 20th century. The epidemic underwent explosive growth throughout the early 20th century with multiple radiations out of Central Africa. Multiple [[wp:Founder_effect | founder events]] have given rise to distinct [[wp:HIV_subtypes | subtypes]] which predominate in different parts of the world. Subtype B is most prevalent in North America and Western Europe, while A and C, which account for more than half of infections worldwide, are common in Africa. <ref name=osmanov02>{{cite pmid | 11832690}}</ref> Transmissibility of virus, virulence, effectiveness of antiretroviral therapy, and pathogenesis may differ slightly between subtypes. <ref name=taylor08>{{cite pmid | 18971501}}</ref>
The rapid growth of the HIV epidemic is reflected in phylogenies of HIV, which are star-like. Most coalescent events occur in the distant past. Figure 6 shows an example based on 173 HIV-1 sequences from the Democratic Republic of Congo.<ref name=yusim01>{{cite pmid | 11405933}}</ref>
The rate of exponential growth of HIV in Central Africa in the early 20th century, preceding the establishment of modern subtypes, has been estimated:
* Yusim et al.<ref name=yusim01>{{cite pmid | 11405933 }}</ref> estimated <math>r= 0.1676</math> using coalescent methods and a parametric model of exponential growth.
* Similar conclusions were reached using nonparametric estimates of <math>N_e</math>.<ref name=strimmer01>{{cite pmid | 11719579}}</ref>
Different epidemic growth rates have been estimated for different subtypes using coalescent approaches. The early growth of subtype B in North America was quite high. Estimates range from <math>r=0.48-0.834</math> transmissions per infection per year.<ref name=walker>{{cite pmid | 15737910}}</ref><ref name=robbins03>{{cite pmid | 12743293}}</ref> The duration of exponential growth in North America was relatively short, with saturation occurring in the mid- and late-1980s.<ref name=Volz09>{{cite pmid|19797047}}</ref> The early growth rate of the more common subtype C in Africa is lower (approximately 0.27 per infection per year), although exponential growth has continued for a longer period of time.<ref name=grassly99>{{cite pmid | 9927440}}</ref><ref name=walker>{{cite pmid | 15737910}}</ref> HIV-1 group O, which is relatively rare and is found mainly in Cameroon, has grown at a lower rate (approximately 0.068 transmissions per infection per year).<ref name=lemey04> {{cite pmid | 15280223 }}</ref>
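One way to make these growth rates more tangible is to convert them to doubling times. The short Python sketch below does this under the assumption that each estimate is an exponential growth rate per year, so that the number of infections grows as exp(rt) and doubles every ln(2)/r years; the dictionary keys are descriptive labels added here for illustration.

```python
import math

def doubling_time(growth_rate):
    """Doubling time in years for exponential growth at the given rate per year."""
    if growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / growth_rate

# Growth rate estimates quoted in the text (assumed to be per year).
estimates = {
    "subtype B, North America (lower estimate)": 0.48,
    "subtype B, North America (upper estimate)": 0.834,
    "subtype C, Africa": 0.27,
    "group O": 0.068,
}
for label, rate in estimates.items():
    print(f"{label}: doubling time {doubling_time(rate):.2f} years")
```

Under this assumption the estimates correspond to doubling times of roughly 0.8 to 1.4 years for subtype B in North America, about 2.6 years for subtype C in Africa, and about 10 years for group O.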
HIV-1 sequences sampled over a span of five decades have been used with relaxed molecular clock phylogenetic methods to estimate that the zoonosis occurred around the early 20th century.<ref name=worobey08>{{cite pmid | 18833279}}</ref> The estimated TMRCA for HIV-1 coincides with the appearance of the first densely populated large cities in Central Africa. Similar methods have been used to estimate the time that HIV originated in different parts of the world. The origin of subtype B in North America is estimated to be in the 1960s, where it went undetected until the AIDS epidemic in the 1980s.<ref name=robbins03>{{cite pmid | 12743293}}</ref> There is evidence that progenitors of modern subtype B originally colonized the Caribbean before undergoing multiple radiations to North and South America.<ref name=junqueira11>{{cite pmid | 22132104}}</ref> Subtype C originated around the same time in Africa.<ref name=walker>{{cite pmid | 15737910}}</ref>
At smaller geographical and time scales, phylogenies of HIV may reflect epidemiological dynamics related to risk behavior and [[wp:sexual_network | sexual networks]]. Very dense sampling of viral sequences within cities over short periods of time has given a detailed picture of HIV transmission patterns in modern epidemics. Sequencing of virus from newly diagnosed patients is now routine in many countries for surveillance of [[wp:Drug_resistance | drug resistance]] mutations, which has yielded large databases of sequence data in those areas. Lewis et al.<ref name=lewis08>{{cite pmid | 18351795}}</ref> used such sequence data from [[wp:Men_who_have_sex_with_men | men who have sex with men]] in London, UK, and found evidence that transmission is highly concentrated in the brief period of [[wp:Primary_HIV_infection | primary HIV infection]] (PHI), which consists of approximately the first 6 months of the infectious period. Volz et al. {{Citation needed}} found that patients who were recently infected were more likely to harbor virus that is phylogenetically clustered with other samples from recently infected patients, reflecting that transmission had occurred recently and that transmission is more likely to occur during early infection. Such phylogenetic patterns arise from epidemiological dynamics that feature an early period of intensified transmission during PHI, but they also depend on sampling a large fraction of extant viral lineages.
Dense sampling of HIV sequences has enabled the estimation of transmission rates between different risk groups. For example, Paraskevis et al. <ref name=paraskevis09>{{cite pmid | 19457244}}</ref> estimated transmission rates between countries in Europe, and Oster et al. <ref name=oster11>{{cite pmid | 21866038}}</ref> showed disparate transmissions within racial and demographic groups within Mississippi, USA.
Purifying immune selection dominates evolution of HIV within hosts, but evolution between hosts is largely decoupled from intra-host evolution.<ref name=rambaut04>{{cite pmid | 14708016}}</ref> Intra-host HIV phylogenies show continual fixation of advantageous mutations, while population-level HIV phylogenies reflect continual diversification. Immune selection has relatively little influence on HIV phylogenies at the population level because:
* There is an extreme bottleneck in viral diversity at the time of sexual transmission.<ref name=keele10>{{cite pmid | 20543609}}</ref>
* Transmission tends to occur early in infection before immune selection has had a chance to operate <ref name=cohen11>{{cite pmid | 21591946}}</ref>.
* Replicative fitness, measured in transmissions per host, is largely determined by factors extrinsic to the virus, including heterogeneous sexual and drug-use behaviors in the host population.
There is some evidence of HIV [[wp:Virulence#Evolution |adaptation towards intermediate virulence]] which can maximize transmission potential between hosts. <ref name=fraser07>{{cite pmid | 17954909}}</ref> This hypothesis is predicated on several observations:
* The set-point [[wp:Viral_load | viral load]] (SPVL), which is the quasi-equilibrium titre of viral particles in the blood during [[wp:HIV#Chronic_infection | chronic infection]], is correlated with the time until AIDS. SPVL is therefore a useful proxy for virulence. <ref name=korenromp09>{{cite pmid | 19536329}}</ref>
* SPVL is correlated between HIV donor and recipients in transmission pairs. <ref name=hollingsworth10>{{cite pmid | 20463808}}</ref>
* The transmission probability per sexual act is positively correlated with viral load. Thus, there is a trade-off between the [[wp:Intensity_function | intensity]] of transmission and the lifespan of the host.<ref name=baeten03>{{cite pmid | 15043213}}</ref><ref name=fiore97>{{cite pmid | 9233454}}</ref>
* Recent work <ref name=shirreff11>{{cite pmid | 22022243}}</ref> has shown that given empirical values for transmissibility of HIV and lifespan of hosts as a function of SPVL, adaptation of HIV towards optimum SPVL could be expected over 100-150 years.
===References===
<references/>
1007
2012-06-28T14:36:13Z
Tbedford
7
Readability improvements and formatting.
'''Viral phylodynamics''' is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, the evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection to escape [[wp:immune system | host immunity]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
== Sources of phylodynamic variation ==
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogenies]] of viruses.
Grenfell et al.<ref name=Grenfell04>{{cite pmid| 14726583}}</ref>, in coining the term ''phylodynamics'', postulated that virus phylogenies "... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics".
Grenfell and colleagues identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* [[wp:Directional selection | Selection]] driven by host immunity may be reflected by an unbalanced or ladder-like tree (see {{See Figure|1}}).<ref name=Grenfell04/>
* Rapid population expansion of the virus may be reflected by a star-like tree (see {{See Figure|2}}), in which external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected by the clustering of [[wp:taxa | taxa]] on the tree (see {{See Figure|3}}), so that geographically or otherwise similar viruses show increased genetic similarity.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. Tip labels A and B in this case denote spatial locations from which viral samples were isolated. ]]
The first pattern, addressing the effect of immune selection on the topology of a viral phylogeny, is exemplified by contrasting the trees of [[wp:influenza | influenza virus]] and [[wp:HIV | HIV]]'s surface proteins. The phylogeny of influenza virus's [[wp:Hemagglutinin (influenza) | hemagglutinin]] protein bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection, which is reflected in the phylogeny of HIV's envelope protein inferred from sequences isolated from different individuals in a population (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The second pattern, addressing the effect of viral population growth on viral phylogenies, holds that expanding viral populations have "star-like" trees, with long external branches relative to internal branches.
Such trees arise because viruses are more likely to share a common ancestor when the population is small, and a growing population has a smaller population size towards the past.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population that has remained roughly constant in size.
The third pattern addresses the effect that population structure of the host, especially spatial structure, can have on patterns of genetic diversity in the viral population.
Viruses within similar hosts, such as hosts that reside in the same region or that have similar risk factors for infection, are expected to be more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure.
However, this structure is not observed at all spatial scales, and at smaller spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the most commonly observed population structure in phylodynamic analyses, viruses may also have non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host {{Citation needed}}.
== Applications ==
=== Dating origins ===
Phylodynamic models may aid in dating epidemic and pandemic origins.
The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus.
With the rate of evolution measured in real units of time, it is possible to infer the date of the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences.
The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at least as early as the common ancestor of the sample.
In April 2009, genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref>
This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
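The logic of dating with a molecular clock can be illustrated with a root-to-tip regression, a simple strict-clock technique that is swapped in here for illustration and is not the Bayesian method used in the cited study. The Python sketch below regresses each sample's genetic distance from the root on its sampling date; the slope estimates the rate of evolution and the date at which the fitted line reaches zero distance estimates the date of the common ancestor. The sampling dates and distances are hypothetical.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate in substitutions per site per year, estimated date of the root).
    """
    n = len(dates)
    mean_date = sum(dates) / n
    mean_dist = sum(distances) / n
    sxx = sum((x - mean_date) ** 2 for x in dates)
    sxy = sum((x - mean_date) * (y - mean_dist)
              for x, y in zip(dates, distances))
    rate = sxy / sxx                          # slope of the fitted line
    intercept = mean_dist - rate * mean_date
    root_date = -intercept / rate             # date at which distance is zero
    return rate, root_date

# Hypothetical samples: decimal sampling years and root-to-tip distances
# (substitutions per site) that follow a perfect clock.
dates = [2002.0, 2004.0, 2006.0, 2008.0]
distances = [0.008, 0.016, 0.024, 0.032]
rate, root_date = root_to_tip_regression(dates, distances)
print(rate, root_date)
```

With real data the points scatter around the line, and the resulting date applies only to the common ancestor of the sampled sequences.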
=== Epidemiological ===
An understanding of the transmission and spread of an infectious disease is important to public health interventions.
Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means.
For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for variation in the reporting rate and the intensity of surveillance.
Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref>
Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV<ref name=Volz09/>.
Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone.
[[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref>
Phylodynamic approaches have mapped the geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and quantified the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
=== Antiviral resistance ===
Without recourse to antibiotics, [[wp: antiviral drug | antiviral treatment]] is a popular approach for the control of viral pathogens.
However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will have a transmission advantage compared to susceptible strains.
Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.<ref name=Bloom10>{{cite pmid|20522774}}</ref>
Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).<ref name=Chao12>{{cite pmid|21865253}}</ref>
==Methods==
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured by comparing the rate of nonsynonymous substitution to the rate of synonymous substitution [[wp:DN/dS|(dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima's D]].
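As an illustration of the last of these statistics, the following Python sketch computes Tajima's D from a small alignment using the standard variance constants. The function name <code>tajimas_d</code> and the toy alignments are hypothetical, and the sketch assumes aligned, equal-length sequences with no gaps or missing data.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (no gaps or missing data)."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or s == 0:
        # The variance term is zero for n < 4, and D is undefined when S = 0
        raise ValueError("need at least 4 sequences and 1 segregating site")
    # pi: mean number of pairwise differences
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(x != y for x, y in zip(a, b))
             for a, b in combinations(seqs, 2)) / n_pairs
    # Constants for the variance of (pi - S/a1)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Toy alignments (hypothetical data)
only_singletons = ["TAAA", "ATAA", "AATA", "AAAT"]   # excess of rare variants
two_haplotypes = ["AA", "AA", "TT", "TT"]            # intermediate-frequency variants
print(tajimas_d(only_singletons), tajimas_d(two_haplotypes))
```

An excess of rare variants, as expected after population growth or a selective sweep, gives a negative value, while an excess of intermediate-frequency variants gives a positive value.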
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to fit complex demographic scenarios while integrating out phylogenetic uncertainty.<ref name=Drummond05>{{cite pmid| 15703244}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population demographics such as the historical prevalence of infection.
Simulation methods have most commonly been used to model immune selection (see section [[#Phylodynamics of Influenza | Phylodynamics of influenza]]).
===Coalescent theory and phylodynamics===
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In modeling the coalescent process, time is usually considered to flow backwards from the present.
In a selectively neutral population of constant size <math>N</math> with non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the expected time for a sample of 2 gene copies to ''coalesce'', i.e. find a common ancestor, is <math>N</math> generations.
More generally, the waiting time for two members of a sample of <math>n</math> gene copies to share a common ancestor is [[wp:Exponential_distribution | exponentially distributed]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
This interval is labeled <math>T_n</math>, and at its end there are <math> n-1 </math> extant lineages remaining (see {{See Figure|4}}).
These remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
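The simulation procedure described above can be sketched in Python as follows. This is a minimal illustration under the stated assumptions (selective neutrality, constant size <math>N</math>, no population structure); the function name <code>simulate_coalescent</code> and the representation of the genealogy as a list of merge events are choices made for this sketch.

```python
import random

def simulate_coalescent(n, N, rng):
    """Simulate one genealogy of n gene copies in a population of constant size N.

    Returns (intervals, merges): intervals holds T_n, T_{n-1}, ..., T_2 and
    merges holds one (child_a, child_b, parent) triple per coalescent event.
    """
    lineages = list(range(n))   # labels of extant lineages
    next_label = n              # label for the next ancestral node
    intervals = []
    merges = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = k * (k - 1) / 2 / N            # lambda_k = C(k, 2) / N
        intervals.append(rng.expovariate(rate))
        a, b = rng.sample(lineages, 2)        # two lineages chosen uniformly at random
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        merges.append((a, b, next_label))
        next_label += 1
    return intervals, merges

rng = random.Random(1)
intervals, merges = simulate_coalescent(5, 1000.0, rng)
print("TMRCA of this replicate:", sum(intervals))
```

The sum of the simulated intervals is the TMRCA of one replicate; averaging over many replicates approximates its expected value.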
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected waiting time to find the most recent common ancestor of the sample is the sum of the expected values of the internode intervals,
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
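The final equality follows from a telescoping sum. Since <math>\lambda_k = {k \choose 2} \frac{1}{N} = \frac{k(k-1)}{2N}</math>,
: <math> \sum_{k=2}^{n} \frac{1}{\lambda_k} = 2N \sum_{k=2}^{n} \frac{1}{k(k-1)} = 2N \sum_{k=2}^{n} \left( \frac{1}{k-1} - \frac{1}{k} \right) = 2N \left( 1 - \frac{1}{n} \right). </math>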
Two corollaries are:
* The expected TMRCA of a sample is bounded in the sample size: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al.<ref name=Robbins03>{{cite pmid| PMC155028}}</ref> estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because <math>1 - 1/74 \approx 99\% </math>.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré<ref name=Donnelly>{{cite pmid| 8825481}}</ref> derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
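When the coalescent rate changes over time, waiting times can no longer be drawn from an exponential distribution with a fixed rate. One standard approach, sketched below in Python, is to draw a unit-rate exponential variable and invert the integrated rate, which has a closed form for <math> N(t) = N_0 e^{-rt}</math>. The function names are hypothetical and the sketch assumes <math>r > 0</math>.

```python
import math
import random

def waiting_time_exp_growth(k, t0, N0, r, rng):
    """Waiting time to the next coalescence of k lineages, starting t0 time
    units in the past, when N(t) = N0 * exp(-r * t) and r > 0."""
    pairs = k * (k - 1) / 2
    e = rng.expovariate(1.0)
    # Solve  pairs / (N0 * r) * (exp(r * (t0 + w)) - exp(r * t0)) = e  for w
    return math.log1p(e * r * N0 * math.exp(-r * t0) / pairs) / r

def intervals_exp_growth(n, N0, r, rng):
    """Internode intervals T_n, ..., T_2 for a sample of n taken at t = 0."""
    t = 0.0
    intervals = []
    for k in range(n, 1, -1):
        w = waiting_time_exp_growth(k, t, N0, r, rng)
        intervals.append(w)
        t += w
    return intervals

rng = random.Random(7)
print(intervals_exp_growth(10, 1000.0, 0.5, rng))
```

For very small <math>r</math> the waiting times approach those of the constant-size coalescent with <math>N = N_0</math>.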
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
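Rearranging this relationship gives <math>R_0 = 1 + rD</math>. The short Python sketch below applies it; the function name and the numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def r0_from_growth_rate(r, D):
    """Rearrange r = (R0 - 1) / D to give R0 = 1 + r * D.

    r and D must use the same time unit (for example per year and years).
    """
    return 1.0 + r * D

# Hypothetical values: growth rate 0.5 per year, duration of infection 2 years
print(r0_from_growth_rate(0.5, 2.0))  # 1 + 0.5 * 2.0 = 2.0
```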
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection.<ref name=Frost10>{{cite pmid| 20478883}}</ref> Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>S(t)</math> susceptible, <math>I(t)</math> infected and <math>R(t)</math> recovered as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>,
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>,
: <math>\frac{dR}{dt} = \gamma I </math>.
Here, <math>\beta</math> is the per capita rate of transmissions to susceptible hosts, and <math>\gamma</math> is the rate that infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al.<ref name=Volz09>{{cite pmid|19797047}}</ref> proposed that a general formula for the rate of coalescence is
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a similar form to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/(2\beta)</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic.<ref name=Frost10>{{cite pmid| 20478883}}</ref>
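The relationship between the SIR model and the coalescent rate can be explored numerically. The Python sketch below integrates the SIR equations with a simple forward Euler scheme and evaluates <math>\lambda_n(t)</math> for a fixed number of lineages <math>n</math> at each time step. It is an illustration only: the function name, the parameter <code>pop_size</code> (used to convert the infected fraction into a number of infected hosts), the step size and the parameter values are all assumptions of this sketch.

```python
def sir_coalescent_rates(beta, gamma, i0, pop_size, n, t_end, dt=0.01):
    """Integrate the SIR model (forward Euler) and return a list of
    (time, S, number infected, coalescent rate for n lineages)."""
    s, i = 1.0 - i0, i0          # fractions susceptible and infected
    pairs = n * (n - 1) / 2
    rows = []
    for step in range(int(round(t_end / dt)) + 1):
        infected = i * pop_size               # I(t) as a number of hosts
        incidence = beta * s * infected       # f(t): new infections per unit time
        lam = pairs * 2.0 * incidence / infected ** 2
        rows.append((step * dt, s, infected, lam))
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        s += ds * dt
        i += di * dt
    return rows

rows = sir_coalescent_rates(beta=2.0, gamma=1.0, i0=1e-4, pop_size=1e6, n=10, t_end=5.0)
t, s, infected, lam = rows[0]
# Early in the epidemic S is close to 1, so pairs / lam is close to I / (2 * beta)
print(45 / lam, infected / (2 * 2.0))
```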
=== Phylogeography ===
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness.
A basic question is whether geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}).
This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]].
If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref>
This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data.
By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] computed by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
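The label-scrambling test described above can be sketched in Python as follows. The sketch counts transitions by Fitch parsimony on a rooted binary tree written as nested tuples; the tree, the tip labels and the function names are hypothetical, and the p-value uses the common convention of adding one to both the numerator and the denominator.

```python
import random

def tip_names(tree):
    """List the tip names of a tree given as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])

def fitch(tree, labels):
    """Return (state set at the node, minimum number of state changes below it)."""
    if not isinstance(tree, tuple):
        return {labels[tree]}, 0
    left_set, left_changes = fitch(tree[0], labels)
    right_set, right_changes = fitch(tree[1], labels)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1

def permutation_test(tree, labels, n_perm, rng):
    """Observed transition count and one-sided p-value for geographic clustering."""
    names = tip_names(tree)
    observed = fitch(tree, labels)[1]
    states = [labels[name] for name in names]
    as_few = 0
    for _ in range(n_perm):
        shuffled = states[:]
        rng.shuffle(shuffled)
        if fitch(tree, dict(zip(names, shuffled)))[1] <= observed:
            as_few += 1
    return observed, (as_few + 1) / (n_perm + 1)

# Hypothetical example: two clades, each sampled from a single region
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
labels = {"a": "X", "b": "X", "c": "X", "d": "X",
          "e": "Y", "f": "Y", "g": "Y", "h": "Y"}
print(permutation_test(tree, labels, 999, random.Random(0)))
```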
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the geographic locations of ancestral lineages.
Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]].
The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref>
The end result is a rate, measured per unit of time (years) or per unit of genetic distance (nucleotide substitutions per site), at which a lineage in one region moves to another region over the course of the phylogenetic tree.
In a geographic transmission network, some regions may mix more readily and other regions may be more isolated.
Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'.
With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
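For the simplest case of two regions with asymmetric movement rates, the probability that a lineage has changed region after a given time has a closed form, which the Python sketch below evaluates. It treats location as a two-state continuous-time Markov chain; the function name and the rate values are hypothetical, and models with more regions would require a matrix exponential instead.

```python
import math

def two_region_transition_probs(rate_ab, rate_ba, t):
    """Transition probabilities over time t for a lineage moving between
    regions A and B at rates rate_ab (A to B) and rate_ba (B to A).

    Returns [[P(A->A), P(A->B)], [P(B->A), P(B->B)]].
    """
    total = rate_ab + rate_ba
    decay = math.exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return [[1.0 - p_ab, p_ab], [p_ba, 1.0 - p_ba]]

# Hypothetical rates per year: movement from A to B is four times as fast as from B to A
print(two_region_transition_probs(0.4, 0.1, 2.0))
```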
===Simulation===
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straightforward phylogenetic or coalescent analysis.
Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected with and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and immune history of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref>
Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
It is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts.
For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc.
The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]].
In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref>
A forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by its nucleotide or amino acid sequence.
Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another.
Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]].
Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
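The ingredients of such a simulation can be sketched in Python. The functions below are hypothetical building blocks, not the method of any cited study: strains are strings, cross-immunity is assumed to decline linearly with the Hamming distance to the closest strain in a host's immune history, and <code>full_escape_distance</code> is an assumed parameter giving the distance at which protection is lost entirely.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def infection_risk(challenge, immune_history, full_escape_distance):
    """Risk of infection in [0, 1] given the strains a host has already seen.

    Assumption: protection comes from the closest past strain and declines
    linearly with sequence distance.
    """
    if not immune_history:
        return 1.0
    nearest = min(hamming(challenge, past) for past in immune_history)
    return min(1.0, nearest / full_escape_distance)

def mutate(strain, per_site_rate, alphabet, rng):
    """Return a copy of strain in which each site mutates with the given probability."""
    sites = []
    for ch in strain:
        if rng.random() < per_site_rate:
            sites.append(rng.choice([c for c in alphabet if c != ch]))
        else:
            sites.append(ch)
    return "".join(sites)

rng = random.Random(3)
ancestor = "AAAAAAAA"
descendant = mutate(ancestor, 0.2, "ACGT", rng)
print(descendant, infection_risk(descendant, [ancestor], full_escape_distance=4))
```

In a full simulation these functions would be called at each transmission event, and the sequences of sampled viruses would be retained for phylogenetic reconstruction.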
==Examples==
===Phylodynamics of Influenza===
Human influenza is an acute respiratory infection primarily caused by [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]] viruses.
Influenza A viruses can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]].
Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]].
Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]].
Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population.
If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic ]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009.
After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to earlier forms of the virus.
These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics.
Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref>
The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza A/H3N2, which shows a single predominant trunk lineage that persists through time and side branches that persist for only 1-5 years before going extinct (see {{See Figure|5}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:flutree.png|300px|thumb|{{Figure|5}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002.
Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09>{{cite pmid|PMC2740835}}</ref>]]
====Selective pressures====
Phylodynamic techniques have provided insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome.
The exposed location of hemagglutinin (HA) suggests that there should exist strong selective pressure for evolution at the specific sites on HA that are recognized by antibodies in the human immune system.
These sites are referred to as [[wp: epitope | epitope]] sites.
Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|5}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref>
This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population.
Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population.
Thus, analysis of phylodynamic patterns gives insight into underlying selective forces.
A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes have little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref>
However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref>
This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> suggesting that H1N1 undergoes less adaptive evolution than H3N2.
This hypothesis is supported by empirical patterns of antigenic evolution; there have been nine vaccine updates recommended by the [[wp: World Health Organization | WHO]] for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.<ref name=IRD>{{cite pmid|22260278}}</ref>
Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.<ref name=Wolf06>{{cite pmid|17067369}}</ref><ref name=Bhatt11>{{cite pmid|21415025}}</ref>
However, the underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
====Circulation patterns====
The extremely rapid turnover of the influenza population means that the rate of geographic spread of influenza lineages must also, to some extent, be rapid.
Surveillance data shows a clear pattern of strong seasonal epidemics in temperate regions, and less periodic epidemics in the tropics.<ref name=Finkelman07>{{cite pmid|PMC2117904}}</ref> The geographic origin of seasonal epidemics in the Northern and Southern Hemispheres had been a major open question in the field.
However, recent work by Rambaut et al.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> and Russell et al.<ref name=Russell08>{{cite pmid|18420927}}</ref> has shown that temperate epidemics usually emerge from a global reservoir rather than emerging from within the previous season's genetic diversity.
This work, and more recent work by Bedford et al.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref> and Bahl et al.,<ref name=Bahl11>{{cite pmid|22084096}}</ref> has suggested that the global persistence of the influenza population is driven by viruses being passed from epidemic to epidemic, with no individual region in the world showing continual persistence.
However, there is considerable debate regarding the particular configuration of the global network of influenza, with one hypothesis suggesting a metapopulation in East and Southeast Asia that continually seeds influenza in the rest of the world<ref name=Russell08>{{cite pmid|18420927}}</ref>, and another hypothesis advocating a more global metapopulation in which temperate lineages often return to the tropics at the end of a seasonal epidemic.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref><ref name=Bahl11>{{cite pmid|22084096}}</ref>
All of these phylogeographic studies necessarily suffer from limitations in the worldwide sampling of influenza viruses.
For example, the relative importance of tropical Africa and India has yet to be uncovered.
Additionally, the phylogeographic methods used in these studies (see section on [[#Phylogeography | phylogeographic methods]]) make inferences of the ancestral locations and migration rates on only the samples at hand, rather than on the population in which these samples are embedded.
Because of this, study-specific sampling procedures are a concern in extrapolating to population-level inferences.
However, through joint epidemiological and evolutionary simulations, Bedford et al.<ref name=Bedford10>{{cite pmid|PMC2877742}}</ref> show that their estimates of migration rates appear robust to a large degree of under-sampling or over-sampling of a particular region.
Further methodological progress is required to more fully address these issues.
====Simulation-based models====
Forward simulation-based approaches for addressing how immune selection can shape the phylogeny of influenza A/H3N2's hemagglutinin protein have been actively developed by disease modelers over the last decade.
These approaches include both [[wp: compartmental models in epidemiology | compartmental models]] and [[wp: agent-based model | agent-based models]].
One of the first compartmental models for influenza was developed by Gog and Grenfell<ref name=Gog02>{{cite pmid| 12481034}}</ref>, who simulated the dynamics of many strains with partial cross-immunity to one another.
Under a parameterization of long host lifespan and short infectious period, they found that strains would form self-organized sets that would emerge and replace one another.
Although the authors did not reconstruct a phylogeny from their simulated results, the dynamics they found were consistent with a ladder-like viral phylogeny exhibiting low strain diversity and rapid lineage turnover.
Later work by Ferguson and colleagues<ref name=Ferguson03>{{cite pmid| 12660783}}</ref> adopted an agent-based approach to better identify the immunological and ecological determinants of influenza evolution.
The authors modeled influenza's hemagglutinin as four epitopes, each consisting of three amino acids.
They showed that under strain-specific immunity alone (with partial cross-immunity between strains based on their amino acid similarity), the phylogeny of influenza A/H3N2's HA was expected to exhibit 'explosive genetic diversity', a pattern that was not observed in HA trees inferred from empirical influenza A/H3N2 isolates.
This led the authors to postulate the existence of a temporary strain-transcending immunity: individuals were immune to reinfection with any other influenza strain for approximately six months following an infection.
With this assumption, the agent-based model could reproduce the ladder-like phylogeny of influenza A/H3N2's HA protein.
Work by Koelle and colleagues<ref name=Koelle06>{{cite pmid| 17185596}}</ref> revisited the dynamics of influenza A/H3N2 evolution following the publication of a seminal paper by Smith and colleagues<ref name=Smith04>{{cite pmid| 15218094}}</ref>, which showed that, while the genetic evolution of influenza A/H3N2's HA was gradual, the antigenic evolution of the virus occurred in a punctuated manner.
The phylodynamic model designed by Koelle and coauthors argued that this pattern reflected a many-to-one genotype-to-phenotype mapping, with the possibility of strains from antigenically distinct clusters of influenza sharing a high degree of genetic similarity.
By incorporating this mapping from viral genotype to viral phenotype (or antigenic cluster) into their model, the authors were able to reproduce the ladder-like phylogeny of influenza's HA protein without generalized strain-transcending immunity.
The reproduction of the ladder-like phylogeny resulted from the viral population passing through repeated selective sweeps.
These sweeps were driven by [[wp:Herd immunity | herd immunity]] and acted to constrain viral genetic diversity.
A simplification of this model<ref name=Koelle10>{{cite pmid| 20335193}}</ref> was also able to reproduce the phylogenetic trees of [[wp:Influenza A virus subtype H3N8|equine influenza A/H3N8]] and of influenza B, both of which have two co-circulating lineages.
Instead of modeling the genotypes of viral strains, a compartmental simulation model by Gökaydin and colleagues<ref name=Gökaydin07>{{cite pmid| 17015285}}</ref> considered influenza evolution at the scale of antigenic clusters (or phenotypes).
This model showed that antigenic emergence and replacement could result under certain epidemiological conditions.
These antigenic dynamics would be consistent with a ladder-like phylogeny of influenza exhibiting low genetic diversity and continual strain turn-over.
In recent work, Bedford and colleagues<ref name=Bedford12>{{cite pmid|22546494}}</ref> used an agent-based model to show that evolution in a Euclidean antigenic space can account for the phylogenetic pattern of influenza A/H3N2's HA, as well as the virus's antigenic, epidemiological, and geographic patterns.
The model showed that the reproduction of influenza's ladder-like phylogeny depended critically on the mutation rate of the virus as well as the immunological distance yielded by each mutation.
===Phylodynamics of HIV===
====Origin and spread====
The global diversity of HIV-1 group M is shaped by its [[wp:Origin_of_AIDS |origins]] in Central Africa around the turn of the 20th century.
The epidemic underwent explosive growth throughout the early 20th century with multiple radiations out of Central Africa. Multiple [[wp:Founder_effect | founder events]] have given rise to distinct [[wp:HIV_subtypes | subtypes]] which predominate in different parts of the world.
Subtype B is most prevalent in North America and Western Europe, while A and C, which account for more than half of infections worldwide, are common in Africa.<ref name=osmanov02>{{cite pmid | 11832690}}</ref>
Transmissibility of virus, virulence, effectiveness of antiretroviral therapy, and pathogenesis may differ slightly between subtypes.<ref name=taylor08>{{cite pmid | 18971501}}</ref>
The rapid growth of the HIV epidemic is reflected in HIV phylogenies, which are star-like.
Most coalescent events occur in the distant past.
Figure 6 shows an example based on 173 HIV-1 sequences from the Democratic Republic of Congo.<ref name=yusim01>{{cite pmid | 11405933}}</ref>
The rate of exponential growth of HIV in Central Africa in the early 20th century, preceding the establishment of modern subtypes, has been estimated:
* Yusim et al.<ref name=yusim01>{{cite pmid | 11405933 }}</ref> estimated <math>r = 0.17</math> using coalescent methods and a parametric model of exponential growth.
* Similar conclusions were reached using nonparametric estimates of <math>N_e</math>.<ref name=strimmer01>{{cite pmid | 11719579}}</ref>
Different epidemic growth rates have been estimated for different subtypes using coalescent approaches.
The early growth rate of subtype B in North America was quite high.
Estimates range from <math>r=0.48</math> to <math>r=0.83</math> transmissions per infection per year.<ref name=walker>{{cite pmid | 15737910}}</ref><ref name=robbins03>{{cite pmid | 12743293}}</ref>
The duration of exponential growth in North America was relatively short, with saturation occurring in the mid- and late-1980s.<ref name=Volz09>{{cite pmid|19797047}}</ref>
The early growth rate of the more common subtype C in Africa is lower (approximately 0.27 per infection per year), although exponential growth has continued for a longer period of time.<ref name=grassly99>{{cite pmid | 9927440}}</ref><ref name=walker>{{cite pmid | 15737910}}</ref>
HIV-1 group O, which is relatively rare and is found mainly in Cameroon, has grown at a lower rate (approximately 0.068 transmissions per infection per year).<ref name=lemey04> {{cite pmid | 15280223 }}</ref>
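One way to compare these growth rates is to convert them into doubling times. Under exponential growth at rate <math>r</math>, the number of infections doubles every <math>\ln(2)/r</math> time units. The Python sketch below applies this to the per-year estimates quoted above; the function name and the labels are choices made for this sketch.

```python
import math

def doubling_time(r):
    """Doubling time, in the same time unit as 1/r, under exponential growth at rate r."""
    return math.log(2) / r

# Growth rates per year as quoted in the text
estimates = [
    ("subtype B, North America (lower estimate)", 0.48),
    ("subtype B, North America (upper estimate)", 0.83),
    ("subtype C, Africa", 0.27),
    ("group O", 0.068),
]
for label, r in estimates:
    print(f"{label}: doubling time {doubling_time(r):.1f} years")
```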
HIV-1 sequences sampled over a span of five decades have been used with relaxed molecular clock phylogenetic methods to estimate the time of zoonosis around the early 20th century.<ref name=worobey08>{{cite pmid | 18833279}}</ref>
The estimated TMRCA for HIV-1 coincides with the appearance of the first densely populated large cities in Central Africa.
Similar methods have been used to estimate the time that HIV originated in different parts of the world.
The origin of subtype B in North America is estimated to be in the 1960s; it went undetected there until the AIDS epidemic in the 1980s.<ref name=robbins03>{{cite pmid | 12743293}}</ref>
There is evidence that progenitors of modern subtype B originally colonized the Caribbean before undergoing multiple radiations to North and South America.<ref name=junqueira11>{{cite pmid | 22132104}}</ref>
Subtype C originated around the same time in Africa.<ref name=walker>{{cite pmid | 15737910}}</ref>
====Epidemiological dynamics====
At smaller geographical and time scales, HIV phylogenies may reflect epidemiological dynamics related to risk behavior and [[wp:sexual_network | sexual networks]].
Very dense sampling of viral sequences within cities over short periods of time has given a detailed picture of HIV transmission patterns in modern epidemics.
Sequencing of virus from newly diagnosed patients is now routine in many countries for surveillance of [[wp:Drug_resistance | drug resistance]] mutations, which has yielded large databases of sequence data in those areas.
Lewis et al.<ref name=lewis08>{{cite pmid | 18351795}}</ref> used such sequence data from [[wp:Men_who_have_sex_with_men | men who have sex with men]] in London, UK, and found evidence that transmission is highly concentrated in the brief period of [[wp:Primary_HIV_infection | primary HIV infection]] (PHI), which consists of approximately the first 6 months of the infectious period.
Volz et al.{{Citation needed}} found that patients who were recently infected were more likely to harbor virus that is phylogenetically clustered with other samples from recently infected patients, reflecting that transmission had occurred recently and that transmission is more likely to occur during early infection.
Such phylogenetic patterns arise from epidemiological dynamics which feature an early period of intensified transmission during PHI, but they also depend on sampling a large fraction of extant viral lineages.
Dense sampling of HIV sequences has enabled the estimation of transmission rates between different risk groups.
For example, Paraskevis et al.<ref name=paraskevis09>{{cite pmid | 19457244}}</ref> estimated transmission rates between countries in Europe, and Oster et al.<ref name=oster11>{{cite pmid | 21866038}}</ref> showed disparate transmissions within racial and demographic groups within Mississippi, USA.
Purifying immune selection dominates evolution of HIV within hosts, but evolution between hosts is largely decoupled from intra-host evolution.<ref name=rambaut04>{{cite pmid | 14708016}}</ref>
Intra-host HIV phylogenies show continual fixation of advantageous mutations, while population-level HIV phylogenies reflect continual diversification.
Immune selection has relatively little influence on HIV phylogenies at the population level because:
* There is an extreme bottleneck in viral diversity at the time of sexual transmission.<ref name=keele10>{{cite pmid | 20543609}}</ref>
* Transmission tends to occur early in infection before immune selection has had a chance to operate.<ref name=cohen11>{{cite pmid | 21591946}}</ref>
* Replicative fitness, measured in transmissions per host, is largely determined by factors extrinsic to the virus, including heterogeneous sexual and drug-use behaviors in the host population.
There is some evidence of HIV [[wp:Virulence#Evolution |adaptation towards intermediate virulence]] which can maximize transmission potential between hosts.<ref name=fraser07>{{cite pmid | 17954909}}</ref>
This hypothesis is predicated on several observations:
* The set-point [[wp:Viral_load | viral load]] (SPVL), which is the quasi-equilibrium titer of viral particles in the blood during [[wp:HIV#Chronic_infection | chronic infection]], is correlated with the time until AIDS.
SPVL is therefore a useful proxy for virulence.<ref name=korenromp09>{{cite pmid | 19536329}}</ref>
* SPVL is correlated between HIV donor and recipients in transmission pairs.<ref name=hollingsworth10>{{cite pmid | 20463808}}</ref>
* The transmission probability per sexual act is positively correlated with viral load.
Thus, there is a trade-off between the [[wp:Intensity_function | intensity]] of transmission and the lifespan of the host.<ref name=baeten03>{{cite pmid | 15043213}}</ref><ref name=fiore97>{{cite pmid | 9233454}}</ref>
* Recent work has shown that given empirical values for transmissibility of HIV and lifespan of hosts as a function of SPVL, adaptation of HIV towards optimum SPVL could be expected over 100-150 years.<ref name=shirreff11>{{cite pmid | 22022243}}</ref>
===References===
<references/>
1008
2012-06-28T14:38:05Z
Tbedford
7
Header
'''Viral phylodynamics''' is the study of how [[wp:genetic variation | genetic variation]] and [[wp:phylogeny | phylogenies]] of viruses are influenced by host and [[wp:pathogen | pathogen]] [[wp:population dynamics | population dynamics]].
Many viruses, especially [[wp:RNA virus | RNA viruses]], rapidly accumulate genetic variation because of short intracellular [[wp:generation time | generation times]] and high [[wp:Rna_virus#Mutation_rates | mutation rates]].
Because viruses evolve very rapidly relative to rates of [[wp:Epidemic | epidemic]] dispersal, the evolution of viruses is heavily influenced by patterns of transmission between hosts.
Phylogenies of viruses have been used to investigate many epidemiological processes, including the effect of selection to escape [[wp:immune system | host immunity]], the rate of [[wp:epidemic model | epidemic spread]], the time that a virus originated in a new host population or species, and the propensity for a virus to be transmitted between different host populations.
== Sources of phylodynamic variation ==
Distinct epidemiological and immunological processes may be recognized by their influence on the [[wp:Phylogenetic tree | phylogenies]] of viruses.
Grenfell et al.<ref name=Grenfell04>{{cite pmid| 14726583}}</ref>, in coining the term ''phylodynamics'', postulated that virus phylogenies "... are determined by a combination of immune selection, changes in viral population size, and spatial dynamics".
Grenfell and colleagues identified three qualitative phylogenetic patterns which may serve as [[wp:Rule of thumb | rules of thumb]] for identifying important epidemiological and immunological processes influencing virus evolution.
The precise mechanisms that generate phylogenetic patterns of viruses can be complex and are described below (See [[#Methods | Methods]]).
* [[wp:Directional selection | Selection]] driven by host immunity may be reflected by an unbalanced or ladder-like tree (see {{See Figure|1}}).<ref name=Grenfell04/>
* Rapid population expansion of the virus may be reflected by a star-like tree (see {{See Figure|2}}), in which external branches are long relative to internal branches.<ref name=Grenfell04/>
* Host [[wp:population stratification | population structure]] may be reflected by the clustering of [[wp:taxa | taxa]] on the tree (see {{See Figure|3}}), so that geographically or otherwise similar viruses show increased genetic similarity.<ref name=Grenfell04/>
[[Image:Wm_selection_tree_bw.svg|thumb|{{Figure|1}} Idealized caricatures of virus phylogenies which show the effects of immune selection. ]]
[[Image:Wm_population_size_bw.svg|thumb|{{Figure|2}} Idealized caricatures of virus phylogenies which show the effects of population growth. ]]
[[Image:Wm_spatial_structure_bw_hs.svg|thumb|{{Figure|3}} Idealized caricatures of virus phylogenies which show the effects of population structure. Tip labels A and B in this case denote spatial locations from which viral samples were isolated. ]]
The first pattern, addressing the effect of immune selection on the topology of a viral phylogeny, is exemplified by contrasting the trees of the surface proteins of [[wp:influenza | influenza virus]] and [[wp:HIV | HIV]]. The phylogeny of influenza virus's [[wp:Hemagglutinin (influenza) | hemagglutinin]] protein bears the hallmarks of strong immune selection (see {{See Figure|1}} and section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
Conversely, a more balanced phylogeny may occur when a virus is not subject to strong immune selection, which is reflected in the phylogeny of HIV's envelope protein inferred from sequences isolated from different individuals in a population (see {{See Figure|1}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The second pattern, addressing the effect of viral population growth on viral phylogenies, holds that expanding viral populations have "star-like" trees, with long external branches relative to internal branches.
Such trees arise because viruses are more likely to share a common ancestor when the population is small, and a growing population has a smaller population size towards the past.
Conversely, when a population is not growing, external branches will be short relative to branches on the interior of the tree.
The phylogeny of HIV provides a good example of a star-like tree, as the prevalence of HIV infection rose rapidly throughout the 1980s (see {{See Figure|2}} and section [[#Phylodynamics of HIV | Phylodynamics of HIV]]).
The phylogeny of [[wp:hepatitis B virus | HBV]] (see {{See Figure|2}}) is illustrative of a viral population that has remained roughly constant in size.
The third pattern addresses the effect that population structure of the host, especially spatial structure, can have on patterns of genetic diversity in the viral population.
Viruses within similar hosts, such as hosts that reside in the same region or that have similar risk factors for infection, are expected to be more closely related genetically.
The phylogenies of [[wp:Measles | measles]] and [[wp:rabies virus | rabies virus]] (see {{See Figure|3}}) illustrate viruses with strong spatial structure.
However, this structure is not observed at all spatial scales; at smaller spatial scales, the population may appear [[wp:Panmixis | panmictic]].
While spatial structure is the population structure most commonly observed in phylodynamic analyses, viruses may also show non-random admixture by attributes such as the age, race, risk behavior, and stage of infection of the infected host.{{Citation needed}}
== Applications ==
=== Dating origins ===
Phylodynamic models may aid in dating epidemic and pandemic origins.
The rapid rate of evolution in viruses allows [[wp:Molecular clock | molecular clock models]] to be estimated from genetic sequences, thus providing a per-year rate of evolution of the virus.
With the rate of evolution measured in real units of time, it is possible to infer the date of the [[wp: most recent common ancestor | most recent common ancestor]] for a set of viral sequences.
The age of the most recent common ancestor of these isolates represents a lower bound on the age of the common ancestor of the entire virus population, which must have existed at least as early as the common ancestor of the sample.
In April 2009, genetic analysis of 11 sequences of [[wp: 2009 flu pandemic | swine-origin H1N1 influenza]] suggested that the common ancestor existed at or before 12 January 2009.<ref name=Fraser09>{{cite pmid|19433588}}</ref>
This finding aided in making an early estimate of the <math>R_0</math> of the pandemic.
=== Epidemiological ===
An understanding of the transmission and spread of an infectious disease is important to public health interventions.
Phylodynamic models may provide insight into epidemiological parameters that are difficult to assess through traditional surveillance means.
For example, assessment of the [[wp: basic reproduction number | basic reproduction number]] <math> R_0 </math> from surveillance data requires carefully controlling for variation in the reporting rate and the intensity of surveillance.
Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of <math>R_0</math>.<ref name=Volz09>{{cite pmid|19797047}}</ref>
Such approaches have been used to estimate <math>R_0</math> in hepatitis C<ref name=Pybus01>{{cite pmid|11423661}}</ref> and HIV.<ref name=Volz09/>
Additionally, differential transmission between groups, be they geographic, age or risk-related, is very difficult to assess from surveillance data alone.
[[#Phylogeography | Phylogeographic models]] have the possibility of more directly revealing these otherwise hidden transmission patterns.<ref name=Volz11>{{cite pmid|PMC3249372}}</ref>
Phylodynamic approaches have mapped the geographic movement of the human influenza virus<ref name=Bedford10>{{cite pmid|20523898}}</ref> and quantified the epidemic spread of rabies virus in North American raccoons.<ref name=Lemey11>{{cite pmid|PMC2915639}}</ref>
=== Antiviral resistance ===
Without recourse to antibiotics, [[wp: antiviral drug | antiviral treatment]] is a popular approach for the control of viral pathogens.
However, the use of antivirals creates selective pressure for the evolution of [[wp: drug resistance | drug resistance]] in the virus population, as resistant strains will have a transmission advantage compared to susceptible strains.
Commonly, there is a fitness [[wp: trade-off | trade-off]] between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals.<ref name=Bloom10>{{cite pmid|20522774}}</ref>
Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of [[wp: Oseltamivir | oseltamivir]] resistance in influenza A (H1N1).<ref name=Chao12>{{cite pmid|21865253}}</ref>
==Methods==
Phylodynamic analyses utilize many of the same methods from [[wp:Computational phylogenetics | computational phylogenetics]] and [[wp:Population genetics | population genetics]].
For example,
* the magnitude of immune selection can be measured by comparing the rate of nonsynonymous substitution to the rate of synonymous substitution [[wp:DN/dS|(dN/dS)]];
* population structure of the host population may be examined by calculation of [[wp:F-statistics | F-statistics]];
* hypotheses concerning panmixis and selective neutrality of the virus may be tested with statistics such as [[wp:Tajima%27s_D | Tajima's D]].
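As an illustration of the last of these statistics, the following is a minimal Python sketch of Tajima's D computed from a small alignment. It assumes aligned sequences of equal length with no gaps or missing data and at least four sequences; the function name <code>tajimas_d</code> is chosen here for illustration and does not refer to any particular software package.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences (no missing data)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    length = len(sequences[0])
    # S: number of segregating (polymorphic) sites
    S = sum(1 for i in range(length) if len({s[i] for s in sequences}) > 1)
    if S == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of pairwise differences
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants that depend only on the sample size
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two diversity estimates, scaled by its standard deviation
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

Negative values indicate an excess of rare variants, as may occur after population expansion or a selective sweep, while positive values indicate an excess of intermediate-frequency variants, as may occur under population structure.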
Most phylodynamic analyses begin with the estimation of a phylogenetic tree.
Genetic sequences are often sampled at multiple time points, which allows the estimation of [[wp:mutation rate | substitution rates]] and the time of the [[wp:Mrca | most recent common ancestor]] using a [[wp:Molecular clock | molecular clock model]].<ref name=Drummond02>{{cite pmid| PMC1462188}}</ref>
For viruses, [[wp:Bayesian inference in phylogeny | Bayesian phylogenetic]] methods are popular because of the ability to fit complex demographic scenarios while integrating out phylogenetic uncertainty.<ref name=Drummond05>{{cite pmid| 15703244}}</ref>
Several analytical methods have been developed to deal specifically with problems related to phylodynamics.
These methods are based on [[wp:Coalescent theory | coalescent theory]] and [[wp:simulation | simulation]].
Coalescent analyses usually assume selective neutrality and are most often utilized to model population demographics such as the historical prevalence of infection.
Simulation methods have most commonly been used to model immune selection (see section [[#Phylodynamics of influenza | Phylodynamics of influenza]]).
===Coalescent theory and phylodynamics===
The coalescent is a mathematical model which describes the ancestry of a sample of [[wp:Recombination_(biology) | non-recombining]] gene copies.
In modeling the coalescent process, time is usually considered to flow backwards from the present.
In a selectively neutral population of constant size <math>N</math> and non-overlapping generations (the [[wp:Fisher-Wright_population | Wright-Fisher model]]),
the expected time for a sample of 2 gene copies to ''coalesce'', i.e. find a common ancestor, is <math>N</math> generations.
More generally, the waiting time for two members of a sample of <math>n</math> gene copies to share a common ancestor is [[wp:Exponential_distribution | exponentially distributed]], with rate
: <math> \lambda_n = {n \choose 2} \frac{1}{N} </math>.
This interval is labeled <math>T_n</math>, and at its end there are <math> n-1 </math> extant lineages remaining (see {{See Figure|4}}).
These remaining lineages will coalesce at the rates <math>\lambda_{n-1}, \ldots, \lambda_2</math> after intervals <math>T_{n-1}, \ldots, T_2 </math>.
This process can be [[wp:Monte_carlo_simulation | simulated]] by drawing exponential [[wp:Random_variable | random variables]] with rates <math>\{\lambda_{n-i}\}_{i=0,\cdots,n-2}</math> until there is only a single lineage remaining (the MRCA of the sample).
In the absence of selection and population structure, the tree topology may be simulated by picking two lineages uniformly at random after each coalescent interval <math>T_i</math>.
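The simulation procedure described above can be sketched in Python as follows. This is a minimal illustration under the constant-size, neutral assumptions of this section; the function names and the nested-tuple tree representation are choices made here for illustration.

```python
import random
from math import comb

def simulate_coalescent_intervals(n, N, rng=None):
    """Draw internode intervals T_n, ..., T_2 (in generations) for a sample
    of n gene copies from a constant-size population of size N."""
    rng = rng or random.Random()
    intervals = {}
    for k in range(n, 1, -1):
        rate = comb(k, 2) / N          # lambda_k = C(k, 2) / N
        intervals[k] = rng.expovariate(rate)
    return intervals

def simulate_topology(n, rng=None):
    """Build a random topology by merging two lineages chosen uniformly
    at random at each coalescent event; tips are labeled 0..n-1."""
    rng = rng or random.Random()
    lineages = list(range(n))
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        merged = (lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0]

def mean_tmrca(n, N, replicates, seed=1):
    """Average the simulated TMRCA (sum of all intervals) over many replicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        total += sum(simulate_coalescent_intervals(n, N, rng).values())
    return total / replicates
```

Averaging the simulated TMRCA over many replicates gives a simple numerical check on the expected value of the sum of the internode intervals.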
[[Image:Tree_epochs.svg|thumb|{{Figure|4}} A gene genealogy illustrating internode intervals. ]]
The expected waiting time to find the most recent common ancestor of the sample is the sum of the expected values of the internode intervals,
: <math> \begin{align}
\mathrm{E}[\mathrm{TMRCA}] & = \mathrm{E}[ T_n ] + \mathrm{E}[ T_{n-1} ] + \cdots + \mathrm{E}[ T_2 ] \\
&= 1/\lambda_n + 1/\lambda_{n-1} + \cdots + 1/\lambda_2 \\
&= 2N(1 - \frac{1}{n}).
\end{align}</math>
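The closed form in the last line can be checked with a telescoping sum. With the rates <math>\lambda_k</math> defined above, each expected interval can be split into partial fractions,
: <math> \mathrm{E}[T_k] = \frac{1}{\lambda_k} = \frac{2N}{k(k-1)} = 2N \left( \frac{1}{k-1} - \frac{1}{k} \right), </math>
so that all intermediate terms cancel when the intervals are summed:
: <math> \sum_{k=2}^{n} \mathrm{E}[T_k] = 2N \left( 1 - \frac{1}{n} \right). </math>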
Two corollaries are :
* The expected TMRCA of a sample remains bounded as the sample size grows: <math> \lim_{n \rightarrow \infty} \mathrm{E}[\mathrm{TMRCA}] = 2N .</math>
* Few samples are required for the expected TMRCA of the sample to be close to the theoretical upper bound, as the difference is <math> O(1/n) </math>.
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al.<ref name=Robbins03>{{cite pmid| PMC155028}}</ref> estimated the TMRCA to be 1968 for 74 HIV-1 [[wp:HIV_subtype | subtype-B]] genetic sequences collected in North America.
This may be a reasonable estimate of the time HIV-1 began circulating in North America, because with <math>n = 74</math> the expected TMRCA of the sample is a fraction <math>1 - 1/74 \approx 99\% </math> of its upper bound.
If the population size <math>N(t)</math> changes over time, the coalescent rate <math> \lambda_n(t) </math> will also be a function of time.
Donnelly and Tavaré<ref name=Donnelly>{{cite pmid| 8825481}}</ref> derived this rate for a time-varying population size under the assumption of constant birth rates:
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N(t)} </math>.
Because all topologies are equally likely under the neutral coalescent, this model will have the same properties as the constant-size coalescent under a rescaling of the time variable: <math> t \rightarrow \int_{\tau=0}^t \frac{ \mathrm{d} \tau }{N(\tau)} </math>.
Very early in an epidemic, the population of virus may be growing [[wp:Exponential_growth | exponentially]] at rate <math> r </math>, so that <math>t</math> units of time in the past, the population will have size <math> N(t) = N_0 e^{-rt}</math>.
In this case, the rate of coalescence becomes
: <math> \lambda_n(t) = {n \choose 2} \frac{1}{N_0 e^{-rt}}</math>.
This rate is small close to when the sample was collected (<math> t=0</math>), so that external branches (those without descendants) of a gene genealogy will tend to be long relative to those close to the root of the tree.
This is why rapidly growing populations yield trees as depicted in {{See Figure|2}}.
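The effect of exponential growth on coalescent times can be illustrated with a short Python sketch. It uses the time rescaling described above: for <math>N(t) = N_0 e^{-rt}</math>, the integral of <math>1/N(\tau)</math> has a closed form, so each waiting time can be obtained by transforming a unit-rate exponential draw. The function name and the parameter values in the usage note are hypothetical.

```python
import math
import random

def coalescent_times_exponential(n, N0, r, rng=None):
    """Return the times (measured backwards from sampling at t = 0) of the
    n - 1 coalescent events when N(t) = N0 * exp(-r * t)."""
    rng = rng or random.Random()
    t = 0.0
    times = []
    for k in range(n, 1, -1):
        pairs = math.comb(k, 2)
        e = rng.expovariate(1.0)   # unit-rate exponential draw
        # Solve pairs * integral from t to t_next of dtau / N(tau) = e for t_next,
        # using integral = (exp(r * t_next) - exp(r * t)) / (r * N0)
        t = math.log(math.exp(r * t) + r * N0 * e / pairs) / r
        times.append(t)
    return times
```

For example, <code>coalescent_times_exponential(20, 1e4, 1.0)</code> tends to place the most recent coalescent event much further from the tips, relative to the depth of the tree, than the same call with a growth rate close to zero.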
If the rate of exponential growth is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the [[wp:Serial_interval | serial interval]] <math>D</math> for a particular pathogen to estimate the [[wp:Basic_reproduction_number | reproduction number]], <math> R_0 </math>.
The two may be linked by the following equation {{Citation needed}}:
: <math> r = \frac{R_0 - 1}{D} </math>.
For example, Fraser et al.<ref name=Fraser09/> generated one of the first estimates of <math>R_0</math> for pandemic H1N1 influenza in 2009 by using a coalescent-based analysis of 11 [[wp:Influenza_hemagglutinin | hemagglutinin ]] sequences in combination with prior data about the infectious period for influenza.
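Rearranging the relation above gives <math>R_0 = 1 + rD</math>. As a purely illustrative calculation with hypothetical values (not estimates for any particular virus), a growth rate of <math>r = 0.5</math> per year and <math>D = 4</math> years would give
: <math> R_0 = 1 + 0.5 \times 4 = 3 </math>.
The rate <math>r</math> and the interval <math>D</math> must be expressed in the same units of time.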
Infectious disease epidemics are often characterized by highly non-linear and rapid changes in the number of infected individuals and effective population size of virus. In such cases, birth rates are highly variable, which can diminish the correspondence between effective population size and the prevalence of infection.<ref name=Frost10>{{cite pmid| 20478883}}</ref> Many mathematical models have been developed in the field of [[wp:Epidemic_model | mathematical epidemiology]] to describe the nonlinear time series of prevalence of infection and the number of susceptible hosts. A well studied example is the [[wp:Epidemic_model#The_SIR_Model | Susceptible-Infected-Recovered (SIR)]] system of [[wp:Ordinary_differential_equations | differential equations]], which describes the fractions of the population <math>S(t)</math> susceptible, <math>I(t)</math> infected and <math>R(t)</math> recovered as a function of time:
: <math>\frac{dS}{dt} = - \beta S I </math>,
: <math>\frac{dI}{dt} = \beta S I - \gamma I </math>,
: <math>\frac{dR}{dt} = \gamma I </math>.
Here, <math>\beta</math> is the per capita rate of transmission to susceptible hosts, and <math>\gamma</math> is the rate at which infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is <math>f(t) = \beta S I</math>, which is analogous to the birth rate in classical population genetics models. Volz et al.<ref name=Volz09>{{cite pmid|19797047}}</ref> proposed that a general formula for the rate of coalescence is
: <math> \lambda_n(t) = {n \choose 2} \frac{2 f(t)}{I^2(t)}. </math>
For the simple SIR model, this yields
: <math> \lambda_n(t) = {n \choose 2} \frac{2 \beta S(t)}{I(t)}. </math>
This has a form similar to the Kingman coalescent rate, but is damped by the fraction susceptible <math>S(t)</math>.
The ratio <math> 2{n \choose 2} / {I(t)^2} </math> can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of lineages and from the set of all infections: <math>{n \choose 2} / {I(t) \choose 2} \approx 2{n \choose 2} / {I(t)^2}</math>. Coalescent events will occur with this probability at the rate given by the incidence function <math>f(t)</math>.
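The following Python sketch integrates the SIR equations with a simple forward-Euler scheme and evaluates the coalescent rate for a single pair of lineages, <math>2 \beta S(t) / I(t)</math>, along the trajectory. It is a minimal illustration rather than a reference implementation: the step size, function names and parameter values are assumptions made here, and the sketch keeps the notation of the equations above without rescaling by host population size.

```python
def sir_step(S, I, R, beta, gamma, dt):
    """Advance the SIR equations by one forward-Euler step of size dt."""
    new_infections = beta * S * I * dt
    recoveries = gamma * I * dt
    return S - new_infections, I + new_infections - recoveries, R + recoveries

def pairwise_coalescent_rate(S, I, beta):
    """Rate for one pair of lineages: 2 f(t) / I(t)^2 with f(t) = beta * S * I.
    If I is a fraction of hosts, divide the result by the host population
    size to obtain the rate in terms of numbers of infected hosts."""
    return 2.0 * beta * S / I

def simulate(beta, gamma, I0, t_end, dt=0.001):
    """Return a list of (t, S, I, R, pairwise coalescent rate) tuples."""
    S, I, R = 1.0 - I0, I0, 0.0
    trajectory = [(0.0, S, I, R, pairwise_coalescent_rate(S, I, beta))]
    steps = int(round(t_end / dt))
    for step in range(1, steps + 1):
        S, I, R = sir_step(S, I, R, beta, gamma, dt)
        trajectory.append((step * dt, S, I, R, pairwise_coalescent_rate(S, I, beta)))
    return trajectory
```

Multiplying the pairwise rate by <math>{n \choose 2}</math> gives the rate for a sample of <math>n</math> lineages.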
Early in an epidemic, <math>S(0) \approx 1</math>, so for the SIR model,
: <math> \lambda_n(0) \approx {n \choose 2} \frac{2 \beta}{I(0)}.</math>
This has the same mathematical form as the rate in the Kingman coalescent, substituting <math>N_e = I(t)/2\beta</math>. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic.<ref name=Frost10>{{cite pmid| 20478883}}</ref>
=== Phylogeography ===
At the most basic level, the presence of geographic population structure can be revealed by comparing genetic relatedness of viral isolates to geographic relatedness.
A basic question is whether geographic character labels are more clustered on a phylogeny than expected under a simple non-structured model (see {{See Figure|3}}).
This question can be answered by counting the number of geographic transitions on the phylogeny via [[wp: maximum parsimony (phylogenetics) | parsimony]], [[wp: maximum likelihood | maximum likelihood]] or through [[wp: Bayesian inference in phylogeny | Bayesian inference]].
If population structure exists then there will be fewer geographic transitions on the phylogeny than expected in a panmictic model.<ref name=Chen09>{{cite pmid|PMC2633721}}</ref>
This hypothesis can be tested by randomly scrambling the character labels on the tips of the phylogeny and counting the number of geographic transitions present in the scrambled data.
By repeatedly scrambling the data and calculating transition counts, a [[wp:null distribution | null distribution]] can be constructed and a [[wp:p-value | p-value]] computed by comparing the observed transition counts to this null distribution.<ref name=Chen09/>
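The label-scrambling test can be sketched in Python as follows, using parsimony to count transitions. The sketch assumes a rooted, strictly binary tree written as nested tuples of tip names and a single fixed phylogeny; the function names, the tree encoding and the example in the usage note are hypothetical.

```python
import random

def fitch_transitions(tree, labels):
    """Minimum number of state changes on a rooted binary tree (Fitch parsimony).
    `tree` is a nested tuple of tip names; `labels` maps tip name -> state."""
    changes = 0

    def state_set(node):
        nonlocal changes
        if not isinstance(node, tuple):
            return {labels[node]}
        left, right = state_set(node[0]), state_set(node[1])
        shared = left & right
        if shared:
            return shared
        changes += 1               # no shared state: one transition is required
        return left | right

    state_set(tree)
    return changes

def tip_names(tree):
    """List the tip names of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])

def permutation_p_value(tree, labels, replicates=999, seed=0):
    """Compare the observed transition count with a null distribution
    obtained by shuffling the location labels among the tips."""
    rng = random.Random(seed)
    observed = fitch_transitions(tree, labels)
    tips = tip_names(tree)
    states = [labels[t] for t in tips]
    as_extreme = 0
    for _ in range(replicates):
        rng.shuffle(states)
        shuffled = dict(zip(tips, states))
        if fitch_transitions(tree, shuffled) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (replicates + 1)
```

For a tree in which all samples from location 'A' form one clade and all samples from 'B' form another, a single transition is observed, and few scrambled data sets have a count this low. The test is one-sided because population structure is expected to reduce the number of transitions.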
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the geographic locations of ancestral lineages.
Here, geographic location is treated as a phylogenetic character state, similar in spirit to 'A', 'T', 'G', 'C', so that geographic location is encoded as a [[wp:substitution model | substitution model]].
The same phylogenetic machinery that is used to infer [[wp: DNA substitution matrices | models of DNA evolution]] can thus be used to infer geographic transition matrices.<ref name=Lemey09>{{cite pmid|19779555}}</ref>
The end result is a rate, measured in terms of years or in terms of nucleotide substitutions per site, at which a lineage in one region moves to another region over the course of the phylogenetic tree.
In a geographic transmission network, some regions may mix more readily and other regions may be more isolated.
Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'.
With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny.<ref name=Lemey09/>
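For the simplest case of two locations, the transition probabilities implied by such a model have a closed form, which the following Python sketch evaluates. The rates are hypothetical parameters; models with more than two locations require a general matrix exponential, which is not shown here.

```python
import math

def transition_probabilities(rate_ab, rate_ba, t):
    """Transition probability matrix P(t) = exp(Q t) for two locations A and B,
    where lineages move A -> B at rate_ab and B -> A at rate_ba.
    Row 0 is a lineage starting in A, row 1 a lineage starting in B."""
    total = rate_ab + rate_ba
    decay = math.exp(-total * t)
    p_ab = rate_ab / total * (1.0 - decay)
    p_ba = rate_ba / total * (1.0 - decay)
    return [[1.0 - p_ab, p_ab],
            [p_ba, 1.0 - p_ba]]
```

With asymmetric rates, a lineage is more likely to have moved in the direction of the larger rate over any branch length, and over long branches each row approaches the stationary distribution determined by the ratio of the two rates.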
===Simulation===
Using coalescent and phylogenetic models, it is possible to answer questions about historic viral prevalence and patterns of geographic spread, but a full model of the interaction between the evolution of the virus and the development of host immunity is not amenable to straight-forward phylogenetic or coalescent analysis.
Here, it is common to instead use a forward simulation-based approach. Simulation-based models may be [[wp: compartmental models in epidemiology | compartmental]], tracking the numbers of hosts infected by and recovered from different viral strains,<ref name=Gog02>{{cite pmid|PMC139294}}</ref> or may be [[wp: agent-based model | individual-based]], tracking the infection state and immune history of every host in the population.<ref name=Ferguson03>{{cite pmid|12660783}}</ref><ref name=Koelle06>{{cite pmid|17185596}}</ref>
Generally, compartmental models offer significant advantages in terms of speed and memory-usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
It is necessary to specify a transmission model for the process by which infection spreads between infected and susceptible hosts.
For [[wp: antigenic variation | antigenically variable]] viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', etc.
The level of protection against one strain of virus by a second strain is known as [[wp: cross-reactivity | cross-immunity]].
In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.<ref name=Park09>{{cite pmid|19900931}}</ref>
A forward simulation model may account for geographic population structure or age structure by modulating contact rates between host individuals of different geographic or age classes.
Commonly, a viral strain is denoted by its nucleotide or amino acid sequence.
Here, in addition to transmission and recovery events, mutations may occur, converting a virus from one strain to another.
Often, the degree of cross-immunity between virus strains is assumed to be related to their [[wp: hamming distance | sequence distance]].
Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
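The relationship between sequence distance and cross-immunity can be sketched in Python as follows. The linear decline of protection with Hamming distance, the <code>distance_threshold</code> parameter and the rule that a host is protected by its closest previously encountered strain are all assumptions made here for illustration; published models use a variety of functional forms.

```python
def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b))

def cross_immunity(seq_a, seq_b, distance_threshold):
    """Hypothetical linear form: full protection between identical strains,
    declining to none at `distance_threshold` differences."""
    d = hamming_distance(seq_a, seq_b)
    return max(0.0, 1.0 - d / distance_threshold)

def infection_risk(challenge, immune_history, distance_threshold):
    """Relative risk of infection for a host, taken as 1 minus the strongest
    cross-immunity conferred by any previously encountered strain."""
    if not immune_history:
        return 1.0
    strongest = max(cross_immunity(challenge, past, distance_threshold)
                    for past in immune_history)
    return 1.0 - strongest
```

In an individual-based simulation, a function of this kind would be evaluated at each contact between an infected host and a previously exposed host to decide whether transmission occurs.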
==Examples==
===Phylodynamics of Influenza===
Human influenza is an acute respiratory infection primarily caused by viruses [[wp: influenza A virus | influenza A]] and [[wp: Influenzavirus B | influenza B]].
Influenza A viruses can be further classified into subtypes, such as [[wp: influenza A virus subtype H1N1 | A/H1N1]] and [[wp: influenza A virus subtype H3N2 | A/H3N2]].
Here, subtypes are denoted according to their [[wp: hemagglutinin (influenza) | hemagglutinin]] (H or HA) and [[wp: viral neuraminidase | neuraminidase]] (N or NA) genes, which, as surface proteins, act as the primary targets for the [[wp: humoral immunity | humoral immune response]].
Influenza viruses circulate in other species as well, most notably as [[wp: swine influenza | swine influenza]] and [[wp: avian influenza | avian influenza]].
Through [[wp: reassortment | reassortment]], genetic sequences from swine and avian influenza occasionally enter the human population.
If a particular hemagglutinin or neuraminidase has been circulating outside the human population, then humans will lack immunity to this protein and an [[wp: influenza pandemic | influenza pandemic ]] may follow a host switch event, as seen in 1918, 1957, 1968 and 2009.
After introduction into the human population, a lineage of influenza generally persists through [[wp: antigenic drift | antigenic drift]], in which HA and NA continually accumulate mutations allowing viruses to infect hosts immune to earlier forms of the virus.
These lineages of influenza show recurrent seasonal epidemics in temperate regions and less periodic transmission in the tropics.
Generally, at each pandemic event, the new form of the virus outcompetes existing lineages.<ref name=Ferguson03>{{cite pmid|12660783}}</ref>
The study of viral phylodynamics in influenza primarily focuses on the continual circulation and evolution of epidemic influenza, rather than on pandemic emergence.
Of central interest to the study of viral phylodynamics is the distinctive phylogenetic tree of epidemic influenza A/H3N2, which shows a single predominant trunk lineage that persists through time and side branches that persist for only 1-5 years before going extinct (see {{See Figure|5}}).<ref name=Fitch97>{{cite pmid|9223253}}</ref>
[[Image:flutree.png|300px|thumb|{{Figure|5}} Phylogenetic tree of the HA1 region of the HA gene of influenza A (H3N2) from viruses sampled between 1968 and 2002.
Tips are colored according to the antigenic cluster assignment made by Smith et al.<ref name=Smith04>{{cite pmid|15218094}}</ref> and branches are colored based on a phylogenetic discrete traits model.<ref name=Lemey09>{{cite pmid|PMC2740835}}</ref>]]
====Selective pressures====
Phylodynamic techniques have provided insight into the relative selective effects of mutations to different sites and different genes across the influenza virus genome.
The exposed location of hemagglutinin (HA) suggests that there should be strong selective pressure for evolution at the specific sites on HA that are recognized by antibodies of the human immune system.
These sites are referred to as [[wp: epitope | epitope]] sites.
Phylogenetic analysis of H3N2 influenza has shown that putative epitope sites of the HA protein evolve approximately 3.5 times faster on the trunk of the phylogeny than on side branches (see {{See Figure|5}}).<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref>
This suggests that viruses possessing mutations to these exposed sites benefit from positive selection and are more likely than viruses lacking such mutations to take over the influenza population.
Conversely, putative non-epitope sites of the HA protein evolve approximately twice as fast on side branches as on the trunk of the H3 phylogeny,<ref name=Bush99>{{cite pmid|10555276}}</ref><ref name=Wolf06>{{cite pmid|17067369}}</ref> indicating that mutations to these sites are selected against and viruses possessing such mutations are less likely to take over the influenza population.
Thus, analysis of phylodynamic patterns gives insight into underlying selective forces.
A similar analysis combining sites across genes shows that while both HA and NA undergo substantial positive selection, internal genes have little acceleration of trunk substitutions, suggesting an absence of positive selection.<ref name=Bhatt11>{{cite pmid|21415025}}</ref>
Further analysis of HA has shown it to have a very small [[wp: effective population size | effective population size]] relative to the census size of the virus population, as expected for a gene undergoing strong positive selection.<ref name=Bedford11>{{cite pmid|PMC3199772}}</ref>
However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref>
This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that [[wp: genetic hitchhiking | genetic hitchhiking]] causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza H3N2,<ref name=Rambaut08>{{cite pmid|PMC2441973}}</ref> suggesting that H1N1 undergoes less adaptive evolution than H3N2.
This hypothesis is supported by empirical patterns of antigenic evolution; there have been nine vaccine updates recommended by the [[wp: World Health Organization | WHO]] for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.<ref name=IRD>{{cite pmid|22260278}}</ref>
Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2.<ref name=Wolf06>{{cite pmid|17067369}}</ref><ref name=Bhatt11>{{cite pmid|21415025}}</ref>
However, the underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
====Circulation patterns====
The extremely rapid turnover of the influenza population means that the rate of geographic spread of influenza lineages must also, to some extent, be rapid.
Surveillance data shows a clear pattern of strong seasonal epidemics in temperate regions, and less periodic epidemics in the tropics.<ref name=Finkelman07>{{cite pmid|PMC2117904}}</ref&g