SK and ES conceived and designed the experiments and wrote the paper. All authors analyzed the data and contributed reagents/materials/analysis tools.
Weizmann Institute of Science may file a patent based on this work.
Trinucleotide hereditary diseases such as Huntington disease and Friedreich ataxia are incurable diseases associated with inheriting an abnormally large number of DNA trinucleotide repeats in a gene. The genes associated with different diseases are unrelated and harbor a trinucleotide repeat in different functional regions; therefore, it is striking that many of these diseases show similar correlations between their genotype, namely the number of inherited repeats, and their phenotype, namely the age of onset and the rate of progression. These correlations remain unexplained despite more than a decade of research. Although mechanisms have been proposed for several trinucleotide diseases, none of the proposals, being disease-specific, can account for the commonalities among these diseases. Here, we propose a universal mechanism in which length-dependent somatic repeat expansion occurs during the patient's lifetime toward a pathological threshold. To our knowledge, our mechanism is the first to explain uniformly the genotype–phenotype correlations common to trinucleotide diseases, and it is well supported by both experimental and clinical data. In addition, mathematical analysis of the mechanism provides simple explanations for a wide range of phenomena, such as the exponential decrease of the age-of-onset curve, similar onset but faster progression in patients with Huntington disease with homozygous versus heterozygous mutation, and correlation of age of onset with the length of the short allele but not with that of the long allele in Friedreich ataxia. If our proposed universal mechanism proves to be the core component of the actual mechanisms of specific trinucleotide diseases, it would open the search for a uniform treatment for all these diseases, possibly by delaying the somatic expansion process.
Trinucleotide diseases are a broad family of hereditary diseases characterized genetically by an expanded DNA region consisting of a repeated three-letter code. Patients inheriting such an abnormal DNA region experience sudden disease onset at an age that depends inversely on the size of the expanded region, followed by inevitable and highly predictable suffering and death. Despite more than a decade of research, the underlying mechanism of these diseases remains an enigma. Although the genes implicated in the various trinucleotide diseases are unrelated, and the defects in these genes occur in different parts of the DNA coding for the gene, the diseases' shared characteristics suggest that a common mechanism underlies their root cause. We suggest a mechanism that uniformly explains how the inherited DNA repeats genetically encode the time of onset and the rate of progression of trinucleotide diseases. It suggests that the disease manifests and progresses through the further expansion of the inherited, abnormally expanded DNA region. It explains the clinical data of many diseases in this family, including previously unexplained onset-related phenomena. It also predicts that a general therapy for these diseases would be a drug or procedure that successfully interferes with the ongoing expansion of the disease trinucleotide repeat.
Trinucleotide diseases are hereditary disorders in which a gene that harbors a trinucleotide repeat is inherited with a number of repeats that exceeds a disease-specific threshold [
The genes associated with the various diseases are structurally and functionally unrelated. Despite their differences, many of the trinucleotide diseases share intriguing phenotype characteristics [
The mechanism that leads to such a genetically encoded delay in disease onset is still unknown. For polyglutamine diseases, it is currently assumed that the extended polyglutamine gains a toxic function that leads to cumulative damage in the affected cells, possibly in the form of glutamine aggregate formation [
This suggested mechanism of cumulative damage has several shortcomings and is unlikely to explain the strong correlations between onset and repeat length. First, the strong correlations between repeat length and age of onset are also apparent in nonpolyglutamine diseases such as DM1 and FRDA, suggesting a mechanism that is unrelated to the specific gene function or expression level. Second, in the rare case of patients with homozygous mutation (two expanded alleles), the cumulative damage mechanism would predict a significant decrease in age of onset, which contradicts recent clinical findings that homozygosity does not result in earlier onset [
Several previous studies in animal models of trinucleotide diseases, including mouse [
Understanding the mechanism by which the number of inherited repeats affects the onset age and disease progression is highly desirable, as it may open new treatment opportunities. Here, we propose that a universal mechanism of length-dependent somatic mutation underlies trinucleotide diseases and accounts for these striking genotype–phenotype correlations.
Our proposed mechanism specifies that onset and progression of the disease are determined by the rate of expansion of the trinucleotide repeat in certain cells in the patient's body. The disease manifests when the trinucleotide repeat has expanded beyond a certain threshold in a sufficient number of these cells, and progresses as more and more cells do so. For each disease, our universal mechanism, described in
(A) The patient inherits one gene (or two genes in recessive disease) that harbors a trinucleotide repeat that exceeds a disease-specific threshold (green line,
(B) A disease-specific group of cells that determines the disease onset and progression is initially clustered around the inherited value.
(C) During the lifetime of the patient, the number of repeats in these cells increases stochastically, (D) some crossing a pathological threshold (red line, 150 in this example) while the patient is still considered healthy.
(E) Disease commences when in a critical portion of these cells (
(F) The disease progresses toward death as more cells cross the target threshold.
(G) The rate of allele expansion
(H) Equations for the mean and standard deviation of allele size as a function of the patient's age
(I) The mechanism predicts an exponentially decreasing onset curve similar to curves obtained from clinical data for trinucleotide diseases.
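The steps in panels (A) through (F) can be sketched as a small stochastic simulation. This is a minimal illustration, not the authors' implementation: the linear form of the expansion rate, the rate constant `k`, the lower threshold `base`, and the time step `dt` are all assumptions chosen for illustration. Only the pathological threshold of 150 repeats, the critical portion of 20%, and the group size of 1,000 cells are taken from the text.

```python
import random


def simulate_onset_age(inherited, n_cells=1000, k=0.05, base=35,
                       pathological=150, critical=0.2, dt=0.05,
                       max_age=120.0, seed=0):
    """Return the age (in years) at which the critical portion of cells has
    crossed the pathological threshold, or None if this never happens.

    ASSUMPTION: each cell gains one repeat with probability
    k * (n - base) * dt per time step, i.e. a rate that grows linearly with
    the current repeat length n. The values of k, base, and dt are
    illustrative, not fitted to clinical data.
    """
    rng = random.Random(seed)
    cells = [inherited] * n_cells  # (B) all cells start at the inherited value
    age = 0.0
    while age < max_age:
        age += dt
        # (C) independent, length-dependent stochastic expansion in each cell
        for i, n in enumerate(cells):
            if rng.random() < k * (n - base) * dt:
                cells[i] = n + 1
        # (D, E) onset once enough cells exceed the pathological threshold
        crossed = sum(n > pathological for n in cells)
        if crossed / n_cells >= critical:
            return age
    return None
```

Calling `simulate_onset_age(40)` and `simulate_onset_age(70)` under these assumed parameters gives a later onset for the shorter inherited allele, which is the qualitative behavior the mechanism describes. An inherited length at or below `base` never expands in this sketch, so the function returns `None`.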
We have conducted computer simulations and mathematical analysis of our proposed mechanism (see
(A) The mechanism, with parameters fitted to the clinical data [
(B) The somatic expansion of repeats as a function of age in patients with HD with various inherited allele sizes. Onset occurs when enough cells (critical portion of 20%) cross the pathological threshold (red line). The slower expansion of shorter alleles (40–50 repeats) accounts for a larger difference in the age of onset (26 y), whereas the same difference in longer alleles (60–70 repeats) accounts for only 7 y. A single-repeat difference (39–40) in short alleles close to the initial threshold (green line) may lower the age of onset by several years more than a single-repeat difference in longer alleles (49–50).
While most trinucleotide diseases are autosomal dominant, FRDA is the only known autosomal recessive trinucleotide disease. In this disease, the repeat sequence GAA is found in the first intron of the gene coding for Frataxin. A patient with FRDA has inherited two expanded disease alleles, which typically range in size from 200 to more than 1,000 repeats. Previous studies [
Simulation results for patients of (A) dominant and (B) recessive diseases with various combinations of two inherited alleles. The age of onset as a function of short/long allele size and regression line are presented. In a dominant disease (A), only the longer allele size is in strong anticorrelation (
In trinucleotide diseases there is also a correlation between the number of repeats and the rate of symptom progression [
The variability of repeat-size distribution is zero at birth and increases during the patient's lifetime as a result of the independent stochastic expansion process. At the time of onset (20% of the cells exceed the pathological threshold) a wide distribution in the CAG40 patient with late onset (A) accounts for slower progression (only 35% of cells exceed pathological threshold after 4 y) while a narrow distribution in the CAG70 patient with juvenile onset accounts for faster progression (85% after 4 y).
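The link between distribution width at onset and speed of progression can be illustrated without simulation. The sketch below assumes (this is not stated in the text) that each cell gains repeats at a rate `k * (n - base)`; under that assumption the excess `n - base` is a pure-birth (Yule) process whose distribution at any age is a negative binomial, so the fraction of cells beyond the pathological threshold can be computed exactly. The values of `k` and `base` are illustrative; the threshold of 150, the 20% critical portion, and the 4 y window follow the text.

```python
from math import comb, exp


def fraction_above(inherited, age, k=0.05, base=35, pathological=150):
    """Exact fraction of cells above the pathological threshold at a given age.

    ASSUMPTION: per-cell expansion rate k * (n - base), so the excess over
    `base` follows a negative binomial law (Yule process started from x0).
    Requires inherited > base.
    """
    x0 = inherited - base          # excess repeats at birth
    limit = pathological - base    # excess at the pathological threshold
    p = exp(-k * age)
    q = 1.0 - p
    at_or_below = sum(comb(x - 1, x0 - 1) * p**x0 * q**(x - x0)
                      for x in range(x0, limit + 1))
    return max(0.0, 1.0 - at_or_below)


def onset_and_progress(inherited, critical=0.2, years_after=4.0, step=0.1):
    """Return (onset age, fraction of cells above threshold years_after later)."""
    age = 0.0
    while fraction_above(inherited, age) < critical:
        age += step
    return age, fraction_above(inherited, age + years_after)
```

Comparing `onset_and_progress(40)` with `onset_and_progress(70)` under these assumptions shows the pattern described above: the shorter allele reaches onset later with a wider distribution, so a smaller share of its cells crosses the threshold in the following 4 y.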
In rare cases, patients with polyglutamine diseases carry two copies of the disease allele and are considered homozygous for the disease. One would expect that if polyglutamine toxicity damage accumulated from the patient's birth, having two copies of a disease allele would have a tremendous effect on the age of onset. However, recent clinical studies of homozygous patients did not find any reduction in the expected age of onset due to homozygosity [
(A–C) Simulation results show the long allele distribution at onset for heterozygous patients (A) with onset at 60 y and homozygous patients (B) with onset at 56 y, both with 40 inherited repeats. The homozygous patients show a narrower distribution, which is closer to the pathological threshold and leads to faster disease progression.
(C) The difference in age of onset is rather small (homozygote ∼6% earlier) and therefore is undetectable considering other variability factors; however, the difference in progression is significant (homozygote ∼30% faster).
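A two-allele variant of the simulation illustrates how allele combinations could affect onset. The sketch below is an assumption-laden illustration rather than the procedure used for the figures: it treats a cell as affected when any allele exceeds the pathological threshold (`rule=any`, a guess at the dominant case) or when all alleles do (`rule=all`, a guess at the recessive case). The linear rate, `k`, `base`, `dt`, and the normal allele length of 20 repeats used in the usage note are likewise hypothetical.

```python
import random


def onset_age(alleles, rule=any, n_cells=1000, k=0.05, base=35,
              pathological=150, critical=0.2, dt=0.05, max_age=120.0, seed=1):
    """Age at which the critical portion of cells is affected, or None.

    ASSUMPTIONS: each allele expands independently with probability
    k * (n - base) * dt per step; a cell is affected when rule(...) is true
    over its alleles (any = dominant-like, all = recessive-like).
    """
    rng = random.Random(seed)
    cells = [list(alleles) for _ in range(n_cells)]
    age = 0.0
    while age < max_age:
        age += dt
        affected = 0
        for cell in cells:
            # expand each allele of the cell independently
            for i, n in enumerate(cell):
                if rng.random() < k * (n - base) * dt:
                    cell[i] = n + 1
            if rule(n > pathological for n in cell):
                affected += 1
        if affected / n_cells >= critical:
            return age
    return None
```

With these assumed settings, `onset_age((40, 40))` can be compared with `onset_age((40, 20))` to contrast a homozygous patient with a heterozygous one, and `onset_age((50, 70), rule=all)` with `onset_age((50, 70), rule=any)` to contrast the recessive-like and dominant-like rules for the same pair of alleles.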
Mouse models of trinucleotide diseases demonstrate that somatic mutations exist in the disease-associated tissue and that those mutations expand with age [
Clinical studies [
We suggest that a length-dependent somatic expansion mechanism underlies the genetically encoded delayed onset of trinucleotide diseases. According to the mechanism, the inherited disease allele has no toxic implications for the disease-related cells before it expands beyond a disease-specific pathological threshold, leading to cell pathology. Several clinical and experimental findings provide support for this mechanism. First, it provides a simple explanation for the correlation between age of onset and number of inherited repeats, uniformly for both polyglutamine and nonpolyglutamine diseases. In addition, the disease dynamics implied by our mechanism explain the exponential shape of the onset curves, the faster progression associated with juvenile onset, the correlation with the short allele only in the recessive disease FRDA, and the similar onset but faster progression for patients with HD with homozygous mutations. The commonly assumed mechanism of cumulative damage or slow aggregate formation does not seem to be able to explain most of these disease-related phenomena. Our mechanism does not contradict studies in mouse models, which focus on understanding the pathology of the different diseases and show that this pathology occurs only when the number of repeats is sufficiently large. Thus, it provides an explanation for the large repeat number that is required for symptomatic mouse models.
The universal mechanism suggested in this work may apply to many trinucleotide diseases. Nevertheless, it provides several predictions that may be subject to further experimental validation in a disease-specific context, possibly by the use of animal models. One challenge is to identify, for each disease, which group of cells triggers disease onset. Our mechanism predicts that the somatic expansion in this group of cells would be particularly high. Another prediction is that somatic repeat expansion is expected to progress with the age of an affected animal even prior to disease onset. Finally, the model predicts that the rate of repeat expansion increases with time and that, at any time, it is a function of the repeat length at that time. Newly available technologies that facilitate the amplification and measurement of the repeat length at single-cell resolution may characterize and accurately measure the mutation progress rate for various cell populations in the affected organ of mouse models for various diseases.
Our mechanism suggests that the disease gene is not toxic for many years and that the time to onset is counted by a silent expansion of the repeat with no physiological implication for the cell. This may have significant clinical implications for the effort to find therapies for these incurable inherited diseases. Rather than addressing direct causes of pathology such as polyglutamine aggregates, therapeutic effort may focus on delaying the onset by slowing the somatic expansion process, which is known to be mediated by DNA repair mechanisms [
Computer simulations were performed on a group of 1,000 cells in which the number of repeats was initialized to
To simulate recessive disease and compare it with dominant disease patient with two disease alleles (
Duration Difference = 100 × (
Onset Difference = 100 × (
We have derived an analytical model that describes the dynamic behavior of the mean and the standard deviation of allele size distribution that is stochastically expanding under the length-dependent expansion rate assumed by the mechanism we describe. The equations (shown in
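As a worked illustration of how such moment equations can arise, consider the simplest length-dependent rate, in which each cell gains one repeat at rate k(n − n_0). This linear form and the symbols k (rate constant), n_0 (length below which no expansion occurs), and N_0 (inherited length) are assumptions introduced here for illustration; the equations derived in the paper may differ. Under this assumption the mean μ(t) and variance σ²(t) of the allele size obey closed-form expressions:

```latex
% Assumption (not taken from the paper): per-cell expansion rate k (n - n_0),
% with every cell starting at the inherited length N_0 at t = 0.
\begin{align}
\frac{d\mu}{dt} &= k\,(\mu - n_0), &
\mu(t) &= n_0 + (N_0 - n_0)\,e^{kt}, \\
\frac{d\sigma^2}{dt} &= 2k\,\sigma^2 + k\,(\mu - n_0), &
\sigma^2(t) &= (N_0 - n_0)\,e^{kt}\,\bigl(e^{kt} - 1\bigr).
\end{align}
```

In this sketch the variance is zero at birth and grows with age, and the relative spread σ/(μ − n_0) approaches 1/√(N_0 − n_0), so shorter inherited alleles end up with relatively wider distributions.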
The Entrez Gene (
ES is the incumbent of the Harry Weinrebe Professorial Chair of Computer Science and Biology. SK is supported by the Yeshaya Horowitz association through the Center for Complexity Science.
FRDA, Friedreich ataxia
HD, Huntington disease