
An Exploration of the Universe of Polyglutamine Structures

Abstract

Deposits of misfolded proteins in the human brain are associated with the development of many neurodegenerative diseases. Recent studies show that these proteins have common traits even at the monomer level. Among them, a polyglutamine region that is present in huntingtin is known to exhibit a correlation between the length of the chain and the severity as well as the earliness of the onset of Huntington disease. Here, we apply bias exchange molecular dynamics to generate structures of polyglutamine expansions of several lengths and characterize the resulting independent conformations. We compare the properties of these conformations to those of the standard proteins, as well as to other homopolymeric tracts. We find that, similar to the previously studied polyvaline chains, the set of possible transient folds is much broader than the set of known-to-date folds, although the conformations have different structures. We show that the mechanical stability is not related to any simple geometrical characteristics of the structures. We demonstrate that long polyglutamine expansions result in higher mechanical stability than the shorter ones. They also have a longer life span and are substantially more prone to form knotted structures. The knotted region has an average length of 35 residues, similar to the typical threshold for most polyglutamine-related diseases. Similarly, changes in shape and mechanical stability appear once the total length of the peptide exceeds this threshold of 35 glutamine residues. We suggest that knotted conformers may also harm the cellular machinery and thus lead to disease.

Author Summary

Misfolding and aggregation of several proteins are known to be related to neurodegenerative diseases. Among them, polyglutamine expansions are known to be responsible for at least 9 diseases, including Huntington. Nonetheless, the structural properties of these intrinsically disordered proteins are difficult to study using classical techniques because of their rapid fluctuations that result in high conformational polymorphism. Here, we use molecular dynamics simulations to study polyglutamines of different chain lengths, starting with short non-pathogenic ones, and study the independent structures they are able to form. We characterize all structures by their geometrical properties, connectivity, putative mechanical stability and residence time (life span). Similar to the findings of a previous study with polyvalines, only some of the conformers are similar to those found in natural globular proteins. Moreover, we find structures that contain knots in both polyglutamine and polyvaline 60-mers, although the former contains many more knotted conformers than the latter. We suggest that these knotted conformers may impair the cell machinery for degradation and eventually lead to toxicity.

Introduction

Fewer than two thousand protein folds have been identified in nature [1, 2], indicating that similar folds can be adopted by large numbers of sequences. These folds have been characterized and classified in the CATH database [3]. Recently, Cossio et al. [4] considered a single sequence, the polyvaline (polyV) 60-mer (denoted here as V60), and generated, through all-atom simulations, an exhaustive database with 30 063 conformations. Interestingly, only a small fraction of the V60 conformations turned out to be CATH-like in that they had at least one similar structure in the CATH database. The similarity was assessed by a TM-score being higher than 45%. The score is obtained through an algorithm for protein comparison based on secondary structure alignment [5]. Thus they explored, in their own words [4], the universe of protein structures beyond the Protein Data Bank. They argued that there must be an evolutionary principle that favors shorter loops and directs the evolution to a certain spot in the universe of possible conformations.

Long polyV chains do not exist in nature. However, many proteins in eukaryotic cells contain homopolymeric tracts, defined as repetitions of the same residue. In particular, upon inspection of the revised human proteome stored in UniProt Knowledge Database [6], we found that 18.9% of the human proteome involves homopolymeric tracts of size 5 or greater, while the probability of one happening by chance is 6 ⋅ 10⁻⁶. Among these, the longest chains have been found for polyserine (polyS, 58 repeats, in the TNRC18 protein, with a random probability of 4 ⋅ 10⁻⁷⁶) and polyglutamine (polyQ, 40 repeats, in FOXP2 protein, random probability of 9 ⋅ 10⁻⁵³).

PolyQ chains are known to be responsible for several brain disorders, including Huntington disease (HD). HD is caused by a protein in the human brain known as huntingtin (HTT), whose function has not yet been fully elucidated. HTT is known to be highly involved in development [7], and is thought to be related to gene expression regulation [8] and to anchoring or transport of vesicles [9]. An HTT mutant with a polyQ expansion that exceeds the threshold of about 35 residues was linked to the disease [10]. Even though polyQ tracts have been extensively studied [11–13], the molecular physiopathology behind the connection between sequence length and disease remains elusive.

Another example of disease-related homopolymeric tracts is polyalanine (polyA), occurring in transcription factors. Expansions of the polyA tracts beyond certain thresholds (e.g. 19) have been recognized as the cause of congenital malformation syndromes, skeletal dysplasia and nervous system anomalies [14, 15]. The strong evolutionary conservation of the polyA tracts suggests the existence of critical structural or functional constraints [14]. It should be noted that in human proteins the polyA tracts are short compared to those of polyQ [15].

Here, we focus on polyQ chains of various lengths, Qn, where n goes from 16 to 80. The case of n = 62 was the subject of a recent single-molecule force spectroscopy study [16] that revealed a large conformational polymorphism (monitored as a spectrum of different breaking points and characteristic force-peak heights, up to 800 pN). The questions we ask are as follows: 1) can we explain this conformational polymorphism? 2) can polyQ tracts generate non-CATH-like conformers? 3) what are the structural and mechanical properties of the polyQ structures? In order to answer them, we follow a bias exchange molecular dynamics (BEMD) approach [17], one of the metadynamics approaches, which was also used by Cossio et al. [4], and explore the structural and dynamical properties of Qn with a particular focus on Q20 and Q60, representative examples below and above the pathological threshold of HD.

We take two perspectives in our analysis: 1) making comparisons of Q60 to V60 and to the similar-sized proteins from the CATH database; 2) investigating the changes in the physical properties of the conformations corresponding to Qn as one varies n. The dynamical properties can be conveniently captured by their mechanical stability, as characterized by the characteristic force, Fmax, needed to unravel a structure by pulling by its termini at a constant speed, vp. This part of our studies makes use of a structure-based coarse-grained model [18, 19] to access the regime of near-experimental speeds and to deal with the large statistics. It should be noted that typical fluctuation times of these structures are much smaller than those needed to fully unravel them [20]. Thus, the results on Fmax are merely indicators of the putative mechanical stability of each specific conformer that do not take the intrinsic evolution of the disordered protein into account.

We find that relatively large mechanical stability may arise not only from structures with large secondary structure content (measured as the percentage of residues belonging to α-helices, β-strands and hydrogen-bonded turns) but also from those with a content of about 30%. Interestingly, we also find spontaneous generation of knotted structures for n = 60, which tend to be of a size of 36 residues, about the HD threshold. This is a novel feature in neurotoxic proteins that needs further investigation.

Methods

Generation and selection of structures

Our BEMD [17] simulations were carried out using the GROMACS molecular dynamics package [21] and the PLUMED extension [22]. The force field used is AMBER99SB [23] and the implicit solvent model is the generalized Born surface area method [24]. The same force field has been used before in folding simulations with explicit solvent [17, 25], but implicit solvent is preferred in order to efficiently explore the energy landscape [4]. Structures were initialized randomly using the MODELLER software [26]: 10 off-template models were done for each protein; the models that contained knots were discarded and the remaining ones were minimized through up to 1000 steps of the steepest descent method followed by up to 4000 steps of the conjugate gradient algorithm [27]. The system which acquired the smallest potential energy after the two minimization stages was chosen for further studies.

In order to generate a variety of Qn structures, we applied the BEMD method with six replicas, each with a different secondary structure bias: the first one with no bias; the next three with a preference to α-helix in the first, second and last third of the chain sequence; the fifth with a preference to anti-parallel β-strands and the last one with parallel β-strands. The method is explained in detail in the S1 Text.

We first obtained a number of conformations with a varying secondary structure content. To select the structures of interest, we followed the three-sieve protocol used in [4], described in the S1 Text, that yields structures above a secondary structure content threshold which are temporally and structurally independent. From a 2 μs simulation of Q60, 246 independent conformers were obtained from 953 time clusters. For Q20, a 0.66 μs simulation resulted in 491 independent conformers out of 517 time clusters. Interestingly, half the simulation time for the short peptide yielded twice as many independent conformations as the longer one, which indicates higher polymorphism and faster dynamics in Q20. The procedure led to the emergence of some knotted structures even though there were no knots present in the initial homology-derived conformers.

After clustering, all independent structures underwent a minimization process of 10 000 steepest descent steps, or until the maximum force between a pair of atoms was smaller than 0.25 J/(mol nm), so that the structure in the closest energy minimum is obtained. In this process, some of the residues may form or break contacts, thus changing their secondary structure content slightly. Therefore, even though the structures were selected by their secondary structure content before the first clustering, some of the final structures may have a smaller content.

The structures of V60 were taken from ref. [4]. Their 50 μs simulation yielded 30 063 time clusters. We have applied the structural clustering to this set and obtained 7076 independent conformers, as these were not available. All the independent structures generated are provided in the S1 Text.

Three previous works have explored the aggregation properties of Qn by computer simulations. In the first of them, the focus is on the temporal stability of the structures and on the evaluation of their amyloidogenesis and fibrillation capabilities [28]. The second study explores the landscape of possible conformations by simplifying the structure of glutamine and generating a model that efficiently samples many conformations [29]. Finally, the third one applies replica exchange molecular dynamics to explore the dimerization of polyQ [30]. The three works show structures such as steric zippers and a β-helix, which have been found in our sampling among the strongest Q60 and Q20 conformations shown in S1 and S2 Figs. Furthermore, rod-like conformers with close to 100% α-helical content were also described in the aforementioned works and have likewise been found in this sampling. Our simulations did not find the mainly β conformations suggested in Ref. [30] because we consider monomers instead of dimers.

Descriptors of the structures

Our structural analysis deals with several descriptors. One is the radius of gyration, Rg, which characterizes the linear size of the molecule. Another is the w parameter, which describes the shape [31, 32]: it is defined through the diagonalization of the tensor of inertia and by making combinations of the three main radii such that a near-zero w corresponds to a globular shape, a positive w to an elongated one, and a negative w to a flattened object. The third descriptor is the secondary structure content, which is determined by using the DSSP procedure [33]. This parameter is a sum of several ingredients: the α-helical content, the β-content (strands and bridges), and the hydrogen-bonded turn content.
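As an illustration of the first descriptor, the following minimal sketch computes Rg as the root-mean-square distance of a set of points from their centroid. It assumes one equally weighted point per residue (for instance the Cα atoms); the function name and the toy coordinates are hypothetical. The w parameter is not sketched because its exact combination of radii is defined in refs. [31, 32].

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid.

    Assumption: every point carries the same weight (e.g. one C-alpha
    atom per residue); coordinates are (x, y, z) tuples in angstroms.
    """
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                  for x, y, z in coords)
    return math.sqrt(squared / n)

# Toy example: four points on the corners of a square of side 2 angstroms.
# Each corner lies sqrt(2) away from the centroid.
square = [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0)]
rg_square = radius_of_gyration(square)
```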

The next two descriptors are Fmax and ⟨z⟩—the average coordination number. The former relates to the dynamics directly, while the latter relates to it indirectly since z measures the number of residues a given residue interacts with. These interactions are of two kinds: through the peptide bond with the two nearest residues along the sequence and through contact interactions with residues which are not sequential neighbors. The contacts play a dynamical role in coarse-grained structure-based models but they can also be used as descriptors in all-atom models. The specific definition of the contacts we use is based on enlarged van der Waals spheres associated with the heavy atoms [18, 19] and the radii of the spheres are given in ref. [34]: a contact between two residues exists if there is at least one pair of heavy atoms with overlapping spheres.
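The contact criterion and the coordination number described above can be sketched as follows. This is a simplified illustration, not the implementation of refs. [18, 19]: the enlargement factor, the atomic radii in the toy data and the function names are assumptions, and chain neighbours are counted through the peptide bond only.

```python
import math

ENLARGEMENT = 1.24  # hypothetical enlargement factor, not taken from the text

def in_contact(res_a, res_b, scale=ENLARGEMENT):
    """True if at least one pair of heavy atoms has overlapping enlarged spheres.

    Each residue is a list of (x, y, z, radius) tuples, one per heavy atom.
    """
    for xa, ya, za, ra in res_a:
        for xb, yb, zb, rb in res_b:
            distance = math.dist((xa, ya, za), (xb, yb, zb))
            if distance < scale * (ra + rb):
                return True
    return False

def mean_coordination(residues, scale=ENLARGEMENT):
    """Average number of residues each residue interacts with."""
    n = len(residues)
    z = [0] * n
    for i in range(n):
        # Sequential neighbours, linked through the peptide bond.
        z[i] += (i > 0) + (i < n - 1)
        # Contacts with residues that are not sequential neighbours.
        for j in range(i + 2, n):
            if in_contact(residues[i], residues[j], scale):
                z[i] += 1
                z[j] += 1
    return sum(z) / n

# Toy chains with a single heavy atom of radius 1 angstrom per residue.
straight = [[(3.0 * k, 0.0, 0.0, 1.0)] for k in range(4)]
hairpin = [[(0.0, 0.0, 0.0, 1.0)], [(3.0, 0.0, 0.0, 1.0)],
           [(3.0, 2.0, 0.0, 1.0)], [(0.0, 2.0, 0.0, 1.0)]]
z_straight = mean_coordination(straight)  # only chain neighbours
z_hairpin = mean_coordination(hairpin)    # one extra contact between the ends
```

In the toy hairpin the first and last residues are 2 Å apart, so their enlarged spheres overlap and each gains one contact.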

In the structure-based model, we assign Lennard-Jones potentials of depth ɛ to these contacts (the potential minimum is at the distance between the Cα atoms in the reference structure) so that larger values of ⟨z⟩ are expected to correspond to more stable structures. Maxwell demonstrated [35] that large three-dimensional systems of particles with pairwise interactions are stable provided the ⟨z⟩ is bigger than 6. In particular, this finding has been shown to be consistent with the behavior of virus capsids [36]. In our case, the structures are much smaller than the capsids and, therefore, the threshold value of ⟨z⟩ is reduced, as explained in the S1 Text. Furthermore, the systems we study are also endowed with the local backbone stiffness—a four-particle interaction [37]—which favors the local chirality of the reference state. Thus non-zero values of Fmax for structures with ⟨z⟩ smaller than 6 are allowed.

A stretching force (F) vs. displacement (d) curve may include articulated force peaks, which exceed the thermal noise level of about 0.1 ɛ/Å, before F rises indefinitely due to stretching of the peptide bonds. The calculations are done at the temperature of 0.3 ɛ/kB, where kB denotes the Boltzmann constant. The number of peaks is denoted by np. Fmax is defined as the height of the largest peak. If none exists then Fmax is defined to be zero, even though it could take any value below the baseline of the curve (around 0.4 ɛ/Å). We simulate stretching at vp of 5 ⋅ 10⁻³ Å/τ, where τ is of order 1 ns. Experimental values of vp are typically lower, e.g. 4 ⋅ 10⁻⁶ Å/ns in ref. [16]. We have calibrated [19] ɛ to be 110 pN Å (with a 25% error bar) by comparing theoretical and experimental values of Fmax in 38 proteins (the theoretical results involved extrapolation to the experimental speeds). The temperature in the coarse-grained simulations is in the vicinity of room temperature.
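The definition of Fmax and the conversion from reduced units to pN can be illustrated with a short sketch. The peak criterion used here (a local maximum followed by a drop larger than the noise level before the curve climbs above it again) is a simplifying assumption, and the function names and the toy curve are hypothetical; only the noise level and the calibration of ɛ come from the text.

```python
EPS_PN_ANGSTROM = 110.0  # calibrated value of epsilon quoted in the text (pN * angstrom)

def force_to_pn(force_reduced):
    """Convert a force given in epsilon/angstrom to pN."""
    return force_reduced * EPS_PN_ANGSTROM

def articulated_peaks(forces, noise=0.1):
    """Heights of local maxima followed by a drop larger than `noise`.

    The final monotonic rise of the curve has no drop after it,
    so it is never counted as a peak.
    """
    peaks = []
    for i in range(1, len(forces) - 1):
        if forces[i] >= forces[i - 1] and forces[i] > forces[i + 1]:
            lowest = forces[i]
            for value in forces[i + 1:]:
                if value > forces[i]:
                    break  # the curve has climbed back above this maximum
                lowest = min(lowest, value)
            if forces[i] - lowest > noise:
                peaks.append(forces[i])
    return peaks

def f_max(forces, noise=0.1):
    """Height of the largest articulated peak, or zero if there is none."""
    peaks = articulated_peaks(forces, noise)
    return max(peaks) if peaks else 0.0

# Toy force-displacement trace in epsilon/angstrom (not simulation data).
curve = [0.0, 0.5, 1.2, 0.6, 0.9, 2.1, 1.0, 1.5, 3.0, 5.0]
peaks = articulated_peaks(curve)
largest = f_max(curve)
largest_pn = force_to_pn(largest)
```

With ɛ = 110 pN Å, a force of 2.1 ɛ/Å corresponds to about 231 pN.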

Finally, one can consider the contact order (CO) as yet another structural descriptor. It is related to the number of contacts as well as the average distance along the sequence between the contacting residues, as defined in Ref. [38] as $\mathrm{CO} = \frac{1}{L N} \sum_{k=1}^{N} S_k$, where Sk is the distance between the residues that form contact k, L is the number of residues in the protein and N the number of contacts. There is a question whether CO correlates with the folding time or not (see [39] and [40] for the arguments for and against it), so one may also inquire whether Fmax correlates with CO. It seems unlikely that the free energy barrier to mechanical unfolding is of the same nature as the one for folding (or thermal unfolding) [41] but this does not preclude a correlation with CO. However, we do not find the correlation to be valid (see the S1 Text) which is consistent with the fact that the green fluorescent protein (PDB code 1GFL) has a bigger Fmax than the I27 domain of titin (PDB code 1TIT), 2.7 vs. 2.1 ɛ/Å [19], whereas its CO is smaller, 0.22 vs. 0.36.
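A minimal sketch of the contact order follows, assuming the usual relative form in which the summed sequence separations Sk of the N contacts are divided by L·N. The function name and the toy contact list are hypothetical.

```python
def contact_order(contacts, n_residues):
    """Relative contact order of a structure.

    contacts   -- list of (i, j) residue index pairs, one per contact
    n_residues -- number of residues L in the chain
    """
    if not contacts:
        return 0.0
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (n_residues * len(contacts))

# Toy example: three contacts in a 10-residue chain,
# with sequence separations 3, 4 and 7.
toy_contacts = [(0, 3), (1, 5), (2, 9)]
toy_co = contact_order(toy_contacts, 10)
```

Local contacts (small separations) lower CO, while contacts between residues far apart along the sequence raise it.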

Results

Properties of Q60 and Q20

We first consider the Q60 set, so that one can compare with V60 from [4] and with the experimental results on Q62 in [16]. S1 Fig shows structures corresponding to the top five values of Fmax. Similar figures for other sets studied are shown in S2 and S3 Figs. The values range between 2.1 and 2.3 ɛ/Å (approximately between 230 and 250 pN) which is of the order of what has been found, about 200 pN, for the I27 domain of titin [42, 43] at smaller pulling speeds. The fact that these values are much smaller than the ones found experimentally in [16] can be attributed to limited statistics, since the experimental systems yielded high force only with low probability (p(Fmax > 200 pN) = 7 ± 6%). The figure also shows the corresponding F–d patterns together with distances at which particular contacts break down (the distance in the contact exceeds the reference distance by 50%) for the last time. The contacts are labeled by the sequential distances |i − j| between residues i and j. The number of force peaks varies between 1 and 4, corresponding to several substructures forming in each conformer. The third column of panels in S1 Fig provides the values of the relevant descriptors. Rg is seen to range between 11.05 and 14.20 Å and the values of w indicate that the fifth structure is elongated while the other four are nearly globular. The most stable structure of the five shown (the top row of panels) corresponds to the largest ⟨z⟩ (7.67) and secondary structure content (66.7%); the secondary structure is, in this case, exclusively of the β type.

Interestingly, the second most robust structure, as judged by the value of Fmax, has very low ⟨z⟩ (5.33) and secondary structure content (28.3%).

This system was compared to Q20, which is unrelated to disease and is close to Q19, which was studied experimentally [16]. The two left columns of Fig 1 refer to all structures in sets Q20 and Q60. In particular, the first row represents the geometries obtained on the Rg–w plane. The convention we use is that we represent the data corresponding to structures with secondary structure content of at least 50% and those with lower content by filled and open symbols, respectively. The structures from Q60 are seen to overlap with the region taken by Q20 but they also extend to much larger Rg and to bigger w. The largest value of Rg corresponds to a low secondary structure content, while high w can be achieved with any value of it.

Fig 1. Scatter plot relating the specified variables for four different sets, from left to right, Q20, Q60, V60 and CATH.

The empty black points represent the conformers with less than 50% secondary structure content, while the filled red dots represent the more structured conformers. The vertical dotted lines in the middle panels mark the simply stiff limits of stability for each case (see the S1 Text). The conformers to the left of this line are more volatile. The horizontal dashed lines in the middle and bottom panels mark off the top five conformers with respect to the value of Fmax.

https://doi.org/10.1371/journal.pcbi.1004541.g001

The two bottom rows of Fig 1 show scatter plots that compare the values of Fmax in Q60 to those in Q20 when represented as a function of ⟨z⟩ and secondary structure content. S4 Fig provides a continuation in which Fmax is plotted against the α-, β-, and turns (τ) content. It is clear that, for a given n, the mechanical stability is not related in any simple manner to either ⟨z⟩, CO or secondary structure content. This is because typical high-force motifs include β-structured regions, while high ⟨z⟩ and secondary structure content can be achieved with α-structure and hydrogen-bonded turns [43]. S4 Fig shows that mechanical stability has no direct correlation to α-content or hydrogen-bonded turns (τ), and while most of the high β-content conformers lead to high forces, these can also be observed in cases with no β-content. Similar results are shown in the top panels of S5 Fig for the CO.

Furthermore, in the top left panel of Fig 2 there is a comparison of the distributions of Fmax for Q60 and Q20. We observe that although our BEMD simulations bias the chain towards the acquisition of secondary structure, many conformers do not produce any articulated force peaks above the 10% noise level (Fmax = 0). In particular, (79 ± 2)% of the Q20 conformers are of this kind, while only (34 ± 3)% of the Q60 conformers are. This result is consistent with the experimental data [16], where no force peaks were detected in Q19, while some were found in Q62. Remarkably, although the diversity in mechanical stability for Q20 is smaller than for Q60, the frequency of independent structure generation is greater in the former (see S6 Fig), so its conformational polymorphism should be higher. The volatility of each conformer, as assessed by ⟨z⟩ lower than its threshold, also agrees with this result, with (49 ± 2)% volatile conformers in Q20 vs. (13 ± 2)% in Q60.

Fig 2. Distributions of Fmax for the studied species.

The top left panel shows the distribution for Q20 in a thick line. The conformations with no force peaks are not plotted in the histograms but contribute to normalization. The amount of such non-mechanostable conformers is (79 ± 2)% for Q20, (34 ± 3)% for Q60, (16.5 ± 0.2)% for V60, and (47 ± 3)% and (20.2 ± 0.5)% for CATH60 and CATH, respectively. The errors were computed using a bootstrapping method and the size of the error bar indicates the standard deviation.

https://doi.org/10.1371/journal.pcbi.1004541.g002
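The bootstrap error estimate mentioned in the caption can be sketched as below for the fraction of conformers without force peaks. This is an illustration under stated assumptions rather than the authors' procedure: the number of resamples, the seed and the function name are hypothetical, and the toy data (84 of 246 conformers flagged) is reconstructed only to mimic the roughly 34% quoted for Q60.

```python
import random
import statistics

def bootstrap_fraction(flags, n_resamples=1000, seed=0):
    """Fraction of True entries and its bootstrap standard deviation.

    flags -- list of booleans, True when a conformer shows no force peak
    """
    rng = random.Random(seed)
    n = len(flags)
    estimates = []
    for _ in range(n_resamples):
        # Resample the conformers with replacement and recompute the fraction.
        sample = rng.choices(flags, k=n)
        estimates.append(sum(sample) / n)
    return sum(flags) / n, statistics.stdev(estimates)

# Illustrative data only: 246 conformers, 84 of them without force peaks.
toy_flags = [True] * 84 + [False] * 162
fraction, error = bootstrap_fraction(toy_flags)
```

For a sample of this size the bootstrap spread is expected to be close to the binomial estimate sqrt(p(1 − p)/n), i.e. about 0.03.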

Taken together, these results show that Fmax is inherently different in the Q20 and Q60 sets, even when the secondary structure content is similar. This further points to Fmax not being related to the content of hydrogen-bonded turns, α-helices or even β-strands, to the total secondary structure content, to CO or to ⟨z⟩.

Comparisons of Q60 to the remaining sets of structures

Fig 2 shows the normalized distributions of Fmax within the Q60, Q20, V60, CATH60, and CATH sets. The distributions do not show the peak at Fmax = 0, but its value is shown in the caption and contributes to the normalization. CATH60 is defined as structures containing between 57 and 63 residues and it contains 256 proteins. In order to exclude short peptides and most multidomain proteins we take CATH to represent all those proteins in the CATH database that are 40 to 250 residues long. This set comprises 5403 structures.

The characteristic forces grow on moving from Q20 to Q60. However, in all sets but Q20, the most probable Fmax is about the same, 1.2 ɛ/Å, but the shapes of the distributions differ. The distributions are comparably broad for CATH and CATH60 and comparably narrower for Q60 and V60, indicating the role of the stronger compositional homogeneity in the latter two sets. The rougher look of the distribution for Q60 is likely due to the one order of magnitude smaller statistics. Furthermore, it should be noted that, among the systems of about 60 residues, CATH60 leads to the biggest number of situations with no force peaks, (47 ± 3)%; and V60 to the smallest, (16.5 ± 0.2)%.

Despite the similarity of the distribution of the forces between Q60 and V60, the geometrical character of the structures in the two sets is distinct. Fig 1 shows that V60 conformers are more compact and less elongated than Q60 or CATH60. Furthermore, this figure indicates that size and shape of a chain need not be correlated.

Fig 1 further shows that most of the structures in the 60-sized sets, and also for CATH, come with ⟨z⟩ between 5.5 and 8.5. However, the largest values of Fmax arise for ⟨z⟩ between 6.3 and 8.1 and many large ⟨z⟩ structures come with average or even small forces, including Fmax of 0. Similarly, Fig 1 also demonstrates that large secondary structure content may come with low or zero forces and large Fmax may arise when the content is at its lower range. Furthermore, the scattering of the data points in the Fmax–CO plane shown in S5 Fig also points in the direction of statistical independence. This observation further proves that there is no correlation between Fmax and ⟨z⟩, CO or secondary structure content. Interestingly, none of the independent V60 structures obtained from [4] have ⟨z⟩ below the volatility threshold, while (5.5 ± 1.4)% of the ones in CATH60 and (2.3 ± 0.2)% of CATH do. Our comparison reinforces the remarkable conclusion that Fmax is unrelated to CO, secondary structure content or ⟨z⟩ and extends it to general globular proteins instead of being a property specific to polyQ. This is further discussed in the S1 Text, where the independence of Fmax from the rest of the parameters is proved.

Life span of the structures

In order to test whether the coordination number is actually related to temporal stability, we performed 10 ns free-dynamics simulations on 100 structures chosen randomly from each set: Q20, Q60 and V60. We have studied the time dependence of RMSD relative to the initial structure and the last time that it fluctuated below 2 Å was recorded for each conformer as its time of residence (tR). Similarly, we define the escape probability (Pe(t)) as the probability of leaving the initial conformation before time t.
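The two quantities defined above can be sketched as follows. The reading of Pe(t) as the fraction of conformers whose residence time is shorter than t is an interpretation, and the function names and toy numbers are hypothetical.

```python
def residence_time(times, rmsd, cutoff=2.0):
    """Last time at which the RMSD to the initial structure is below the cutoff.

    times -- sampling times (e.g. in ns); rmsd -- RMSD values in angstroms.
    """
    t_r = 0.0
    for t, value in zip(times, rmsd):
        if value < cutoff:
            t_r = t
    return t_r

def escape_probability(residence_times, t):
    """Fraction of conformers that have left their initial state before time t."""
    escaped = sum(1 for t_r in residence_times if t_r < t)
    return escaped / len(residence_times)

# Toy trajectory: the RMSD dips back below 2 angstroms at t = 3 and then leaves.
toy_times = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_rmsd = [0.0, 1.5, 2.5, 1.8, 3.0]
toy_tr = residence_time(toy_times, toy_rmsd)

# Toy set of residence times (ns) for four conformers.
toy_pe = escape_probability([3.0, 1.0, 8.0, 10.0], 5.0)
```

Note that a brief excursion above the cutoff (t = 2 in the toy trajectory) does not end the residence, because the RMSD later returns below 2 Å.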

Fig 3 shows the results of this study. The top panel shows that Q60 conformers last longer than Q20 in a specific state, while the average escape probability of V60 initially is lower but soon rises much faster than the other two sets. For completeness, we ran the same study on three regular proteins: Trp-cage (PDB code 1L2Y, 20 residues long), an immunoglobulin binding domain of protein G (1GB1, 56 residues) and the 27th immunoglobulin domain of human cardiac titin (1TIT, 89 residues). All of them remained in the same conformation for longer than 10 ns: their RMSD was never higher than 2 Å.

Fig 3. Time evolution of the studied structures.

For each of the Q60, Q20 and V60 sets, 100 randomly chosen structures have been placed under a free-dynamics evolution for 10 ns. After that, the RMSD has been studied and the last time when it fluctuates below 2 Å is recorded as the residence time (tR). The top graph shows the escape probability (Pe(t)), defined as the probability of having left the initial state of a conformer at time t. We can see how Q20 fluctuates out of the initial structure much faster than Q60, while V60 starts more slowly but rapidly outruns both Q60 and Q20. The inset shows the average evolution of the RMSD for the three sets compared to an example of a similar-sized globular protein, an immunoglobulin binding domain of protein G (PDB code 1GB1, 56 residues). The latter lasts for longer than 10 ns fluctuating around 2 Å, while the other three rapidly evolve out of the initial structure. The bottom graphs show scatter plots of ⟨z⟩ vs. tR. No simple relation can be established between these two quantities above the stiff limit (dashed vertical lines), while below it residence times never exceed 1 ns.

https://doi.org/10.1371/journal.pcbi.1004541.g003

The bottom panels of Fig 3 show scatter plots of tR vs. ⟨z⟩. This figure shows that ⟨z⟩ is not only unrelated to Fmax, but also to the temporal stability of the conformers (as measured by tR) in the cases where ⟨z⟩ is above the simply stiff limit. In the case of conformers with ⟨z⟩ below this limit, however, tR is always below 1 ns, reinforcing Maxwell’s theory on frame stiffness [35].

Interestingly, both theoretical and especially experimental pulling experiments are typically done at speeds such that the time the protein is being pulled is far longer than 10 ns. In particular, the pulling simulations performed in this work take ≈ 50 μs to completely extend a protein with 60 residues, while experiments such as the ones performed in [16] take around 60 ms to accomplish the same task. This raises the question of whether the force peaks present in the experimental traces really relate to the initial conformers or have actually been formed while the molecule was being pulled.
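The orders of magnitude quoted above can be checked with a back-of-the-envelope estimate. The sketch assumes a fully extended chain with about 3.8 Å between consecutive Cα atoms, takes τ as 1 ns, and ignores the initial end-to-end distance; these are assumptions, as are the function and constant names.

```python
CA_CA_DISTANCE = 3.8  # angstroms between consecutive C-alpha atoms (assumed typical value)

def pulling_time_ns(n_residues, speed_angstrom_per_ns):
    """Time needed to pull the termini apart by the full contour length."""
    contour_length = (n_residues - 1) * CA_CA_DISTANCE
    return contour_length / speed_angstrom_per_ns

# Simulation speed of 5e-3 angstrom per tau, with tau taken as 1 ns.
simulation_ns = pulling_time_ns(60, 5e-3)
# Experimental speed of 4e-6 angstrom per ns, as quoted for ref. [16].
experiment_ns = pulling_time_ns(60, 4e-6)

simulation_us = simulation_ns / 1e3  # microseconds
experiment_ms = experiment_ns / 1e6  # milliseconds
```

Under these assumptions the estimate gives roughly 45 μs for the simulations and roughly 56 ms for the experiments, in line with the ≈ 50 μs and ≈ 60 ms quoted, and three to six orders of magnitude longer than the 10 ns free-dynamics runs.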

Therefore, one must look at Fmax carefully since it has a different meaning in this kind of simulation than in experiments such as those in [16]. Here, mechanical stability is associated directly with a conformer, since simulations are based on the initial contact map. On the other hand, in experiments, molecules are subjected to fluctuations with a characteristic time of 1 ns and the F–d curves carry information not only about the initial conformer but also about the stretching-unrelated intrinsic shape transformations that the protein may undergo. All in all, we observe that disordered proteins such as polyglutamines are not long lasting when compared to structured globular ones, and that mechanical stabilities need to be looked at in the context of how they were measured, either referred to the initial conformer if done through structure-based modelling, or including bond formation during the stretching if performed experimentally.

Structures with knots

Even though the starting structures were not knotted, our BEMD simulation yielded some knotted conformers. In particular, (9.3 ± 1.8)% of the independent Q60 conformers have a knot, while Q20 includes no knotted conformers. Moreover, only (3.6 ± 0.5)% of the V60 structures contain a knot, and none of the CATH structures have one. All knots generated in V60 are trefoil (3₁), while Q60 also contains one three-twist (5₂). Upon stretching, only (13 ± 7)% of the Q60 knotted structures untie, while (45 ± 6)% of the V60 ones do. As shown in ref. [44], tightening of knots may be associated with force peaks. Both for Q60 and V60, knot tightening yields Fmax from 0.9 to 2.4 ɛ/Å.

Fig 4 shows an example of a 3₁ knotted conformer found in Q60 plus a histogram of the ends of the knots (k−, k+) and their extension (Δk, measured as the number of residues contained inside the knot) in the structures formed by sets Q60 and V60. An example of a 5₂ one from Q60 and a 3₁ from V60 are shown in S7 Fig. Significantly, not only does V60 form fewer and less stable knots than Q60, but also the extension of the V60 knots is typically larger than that of the Q60 ones. Furthermore, the average extension of the knotted Q60 conformers is 36 (with a 0.12% error), which corresponds to the median threshold value for most polyglutamine-expansion-related diseases such as HD. We note that knotted structures would have been found experimentally as putative events in [16], since the final length would be reduced and thus they would render molecules with lower total contour length increase. Furthermore, knotted proteins have previously been found in nature especially in enzymes such as methyltransferases and carbonic anhydrases [45–47]. Nonetheless, the only hypothesized function of the knot itself (as opposed to the whole protein) is to prevent the unfolding of the protein in a case where the proteasome were to try to degrade it [46].

Fig 4. Knots in the studied conformers.

The top left panel shows an example of a Q60 conformation containing a trefoil (3₁) knot with the knot ends highlighted with yellow spheres. To its right, the same conformation has been partially stretched, and the region inside the knot is highlighted in red and zoomed in. The middle panels represent histograms of the knot end positions, k±, for Q60 (left) and V60 (right). The bottom panel shows their corresponding extension, Δk. The percentage of knotted structures relative to the total number of independent conformers found for Q60 and V60 are (9.3 ± 1.8)% and (3.6 ± 0.5)%, respectively. Shallow knots have an extension closer to 60 (the system size). Protein representations have been done with VMD [48].

https://doi.org/10.1371/journal.pcbi.1004541.g004

The presence of knots on its own is not indicative of their relevance: they need to last long enough to have any effect. To that end, we performed 200 ns all-atom simulations with explicit TIP3P water of three randomly chosen knotted conformers to follow the behaviour of these knots over time. The knot ends fluctuate along the protein, as does the knot size, and in some cases the knot unties only to form again some time later, with the protein remaining in the untied conformation for as long as 200 ps. Also, in two of the three cases, preferred places for the left and right ends can be seen, the right end being the same for both of them. The results of these simulations can be seen in S8 Fig.

Other lengths in polyQ chains

Given that the average extension of the knots corresponds with the median of the threshold of the polyQ diseases, we applied the same methodology to other Qn tracts, with n = 16, 25, 33, 38, 40 and 80. As expected, no knots were found for n < 35; but there were no knots in sets Q38, Q40 or Q80 either. This may be attributed to a low probability of knot formation combined with small statistics, which would imply that BEMD took Q60 through a knot-forming path while taking the rest of Qn studied through non-forming ones. This is reinforced by the fact that the greater statistics of V60 do find knotted conformers. Therefore, an increase in the sampling may catch these knotted structures in Q80 and Q40, while their formation is fairly improbable for n below 35 since the typical knot size is about this length.

Fig 5 shows the evolution of the mechanical stability and shape with the chain length. In particular, the fraction of conformers with Fmax > 0, which we name χF, follows a logarithmic law, while the maximum Fmax for each set behaves like an avalanche system: it has a constant value until n = 33, and then starts growing as a power law with exponent 0.562. The average Rg appears to be saturating as n approaches 40, but it suddenly jumps for n = 60 and 80. Judging by w, the shapes of the conformers change around n = 35 from elongated to more globular. Interestingly, V60 behaves differently from Q60 except that the average w (lower right panel) is similar, suggesting a similarity of shapes. We also conclude that the fraction of mechanically stable conformers increases uniformly with n, while the maximum Fmax presents an avalanche behaviour for n > 30, once again close to HD’s threshold of 35.

Fig 5. Variability of the specified parameters with the length, n, of the polyQ chain (circles).

The values for V60 are indicated by a square. χF represents the fraction of conformers with at least one force peak for that particular length. The dotted fits correspond to a logarithmic function (top left, y = 0.352 ln(x/8.115)) and a power-law behavior (top right, y = 0.236 x^0.562), which is typical for avalanches. The bottom panels show the averages of Rg and w over the structures. ⟨Rg⟩ has a saturating behavior up to n = 40, but jumps for higher values. ⟨w⟩ presents a transition around n = 35 from slightly elongated to more globular proteins.

https://doi.org/10.1371/journal.pcbi.1004541.g005
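
As a rough illustration of the two fitted curves quoted in the Fig 5 caption, the sketch below evaluates them at the chain lengths studied. It assumes the logarithmic prefactor reads 0.352 (this is an interpretation of the caption text), and the function names are hypothetical. The units of the power-law curve are simply those of the figure, which are not restated here, and according to the text the power law describes the growth of the maximum Fmax only beyond n ≈ 33.

```python
import math

# Fit parameters as read from the Fig 5 caption (the 0.352 prefactor is an
# assumption about how the caption should be read).
LOG_PREFACTOR = 0.352
LOG_SCALE = 8.115
POW_PREFACTOR = 0.236
POW_EXPONENT = 0.562


def chi_f_fit(n):
    """Logarithmic fit for the fraction of conformers with a force peak."""
    return LOG_PREFACTOR * math.log(n / LOG_SCALE)


def fmax_envelope_fit(n):
    """Power-law fit for the largest Fmax of a set (units as in Fig 5)."""
    return POW_PREFACTOR * n ** POW_EXPONENT


if __name__ == "__main__":
    # Chain lengths listed in the text for the additional polyQ sets, plus 60.
    for n in (16, 25, 33, 38, 40, 60, 80):
        print(n, round(chi_f_fit(n), 3), round(fmax_envelope_fit(n), 3))
```

With these parameters the logarithmic curve crosses zero at n = 8.115 and stays below one over the whole range studied, as expected for a fraction.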

Discussion

In this study, we have generated an ensemble of structurally independent conformers for glutamine expansions with n residues. We have focused on n = 60 which, if present in huntingtin protein, would result in Huntington disease, and on n = 20, which would not.

We have then expanded the study to n = 16, 25, 30, 33, 38, 40 and 80 in order to further explore the structural nature of the n ≈ 35 threshold in most polyQ-related diseases.

We find that proteins related to the disease exhibit less conformational polymorphism than the ones unrelated to it in terms of independent structures and transition kinetics, even though the former show much more mechanical variability (in terms of Fmax and np) as well as structural variability (measured by and ⟨z⟩). We also conclude that, contrary to intuition, ⟨z⟩, CO, and β-content are not good predictors of either temporal or mechanical stability. This conclusion holds not only for polyQ but also, more generally, for all proteins in CATH.

Finally, we prove the presence of knots of length 35 at least in Q60. The sequential size of these knots suggests a relationship to HD. One of the possible mechanisms for the relevance of the knots in pathology is impairing the process of proteasomal degradation, as suggested in [46] and [16]. Moreover, there is evidence for the toxicity of the monomeric polyQ species [49]. Even if the toxicity were due mainly to the oligomers (see e.g. Ref. [50]), the blockade of the degradation machinery by a knotted monomer would increase the concentration of aggregating protein, so the monomers may cause toxicity even if they are not toxic themselves.

Supporting Information

S1 Text. The details of structure generation and selection are explained here, together with the stability associated with the coordination number.

Furthermore, a statistical analysis of the lack of relation between Fmax and the rest of the descriptors used in this work is also presented.

https://doi.org/10.1371/journal.pcbi.1004541.s001

(PDF)

S1 Fig. Five conformers with highest mechanical stability in set Q60.

The structure with the biggest Fmax is at the top. The left column shows snapshots of the structures. The red ribbons represent β strands and the red lines correspond to β bridges. The black lines indicate hydrogen-bonded turns. The orange spheres mark the termini, from which the molecule is pulled. The center column displays the unfolding Fd curve (left axis) together with the unfolding scenario diagram (right axis), i.e. the time a contact is broken vs. the distance between the residues that are in contact. The column on the right shows the values of the relevant descriptors. All molecule cartoons were generated using VMD [48].

https://doi.org/10.1371/journal.pcbi.1004541.s002

(TIF)

S2 Fig. Five conformers with highest mechanical stability in set Q20.

The structure with the biggest Fmax is at the top. The left column shows snapshots of the structures. The red ribbons represent β strands and the red lines correspond to β bridges, while blue helices are α helices. The black lines indicate hydrogen-bonded turns. The center column displays the unfolding Fd curve (left axis) together with the unfolding scenario diagram (right axis). The column on the right shows the values of the relevant descriptors.

https://doi.org/10.1371/journal.pcbi.1004541.s003

(TIF)

S3 Fig. Five conformers with highest mechanical stability in set V60.

The structure with the biggest Fmax is at the top. The left column shows snapshots of the structures. The red ribbons represent β strands and the red lines correspond to β bridges. The black lines indicate hydrogen-bonded turns and α-helices are depicted in blue. The center column displays the unfolding Fd curve (left axis) together with the unfolding scenario diagram (right axis). The column on the right shows the values of the relevant descriptors.

https://doi.org/10.1371/journal.pcbi.1004541.s004

(TIF)

S4 Fig. Scatter plot of Fmax vs. α, β and hydrogen-bonded turns (τ) content for polyQ chains.

The horizontal dashed lines mark off the top five values of Fmax.

https://doi.org/10.1371/journal.pcbi.1004541.s005

(TIF)

S5 Fig. Scatter plot of Fmax vs. CO for the specified sets.

The horizontal dashed lines mark off the top five values of Fmax.

https://doi.org/10.1371/journal.pcbi.1004541.s006

(TIF)

S6 Fig. Kinetics of independent structure formation for Q60 (circles), V60 (triangles) and Q20 (diamonds).

Although more complete plots should be fit with a double exponential function [4], short trajectories correspond to a linear behavior. The fitted slopes are .28, .62 and .98 respectively. Data for V60 were taken from [4].

https://doi.org/10.1371/journal.pcbi.1004541.s007

(TIF)

S7 Fig. Examples of knotted structures.

The top structure corresponds to a three-twist (5₂) knot in Q60, while the lower panels are for a trefoil knot from V60, where no other knots were found. The left column shows a representation of the molecule before stretching, with the knot ends highlighted with yellow spheres. The right panels show the molecules partially stretched, with the region inside the knot highlighted in red and zoomed in.

https://doi.org/10.1371/journal.pcbi.1004541.s008

(TIF)

S8 Fig. Time evolution of the knots.

Three randomly chosen knotted conformers were simulated at all-atom resolution with explicit solvent. One of them is shown in Fig 4. The top panel shows the evolution of the knot size with time for one of the simulations. The middle panel shows a histogram of the knot sizes over this time for the three simulations, each with a different color. The bottom panel shows a histogram of the respective knot ends, the left end (k−, inverted) and the right end (k+).

https://doi.org/10.1371/journal.pcbi.1004541.s009

(TIF)

S9 Fig. An example of the sieve and time clustering stages.

The gray line in the top panel shows evolution of with time for one of the replicas. Structures with > 30% (the thin horizontal line) are taken for clustering. A cluster ends whenever the gap between successive structured conformers becomes greater than 50 ps. The black dots correspond to structures that represent clusters: these are the structures with the highest in the cluster. The red box in the top panel is shown zoomed in the middle panel, where clusters are represented by red lines. The bottom panel shows the RMSD of each cluster representative relative to the previous one. All of these RMSD’s are greater than 2 Å so the clusters can be considered to be uncorrelated in time.

https://doi.org/10.1371/journal.pcbi.1004541.s010

(TIF)
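
The sieve and time-clustering stages described in the S9 Fig caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the `scores` argument (standing in for the per-frame descriptor that is thresholded at 30%) and the default values are assumptions taken from the caption, and frames are assumed to be sorted by time. The subsequent RMSD check between consecutive representatives is not included.

```python
def cluster_in_time(times_ps, scores, threshold=0.30, max_gap_ps=50.0):
    """Sieve frames by score, then group survivors that are close in time.

    Returns one (time, score) representative per cluster: the frame with
    the highest score in that cluster. Assumes times_ps is ascending.
    """
    # Sieve stage: keep only frames whose score exceeds the threshold.
    kept = [(t, s) for t, s in zip(times_ps, scores) if s > threshold]

    clusters = []
    for t, s in kept:
        # A cluster ends when the gap to the previous kept frame is too large.
        if not clusters or t - clusters[-1][-1][0] > max_gap_ps:
            clusters.append([])
        clusters[-1].append((t, s))

    # Representative: the highest-scoring frame of each cluster.
    return [max(c, key=lambda frame: frame[1]) for c in clusters]


if __name__ == "__main__":
    times = [0, 10, 20, 100, 110, 300]          # toy data, in ps
    scores = [0.1, 0.4, 0.5, 0.35, 0.2, 0.9]    # toy descriptor values
    print(cluster_in_time(times, scores))
```

In this toy input, the frames at 10 and 20 ps form one cluster, while the frames at 100 and 300 ps each start a new one because the preceding gap exceeds 50 ps.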

S10 Fig. Color-map plots of the difference between the joint CDF and the product of the independent CDFs of Fmax and the specified descriptor.

Differences are always below 0.1, and below 0.05 in three of the descriptors (, CO and τ). Therefore, Fmax is statistically independent of the descriptors studied.

https://doi.org/10.1371/journal.pcbi.1004541.s011

(TIF)
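
The independence check summarized in the S10 Fig caption compares the joint cumulative distribution of two quantities with the product of their marginal distributions. The sketch below computes the largest such difference from empirical CDFs. It is an illustrative assumption about how such a comparison could be done, not the procedure used for the figure, and the function name is hypothetical. The brute-force loops are cubic in the sample size, which is acceptable only for small samples.

```python
def max_cdf_gap(xs, ys):
    """Largest |F_XY(x, y) - F_X(x) * F_Y(y)| over the sample grid.

    xs and ys are paired samples of equal length. Values near zero are
    compatible with statistical independence of the two quantities.
    """
    n = len(xs)
    worst = 0.0
    for x0 in xs:
        # Empirical marginal CDF of X at x0.
        fx = sum(1 for x in xs if x <= x0) / n
        for y0 in ys:
            # Empirical marginal CDF of Y at y0.
            fy = sum(1 for y in ys if y <= y0) / n
            # Empirical joint CDF at (x0, y0).
            fxy = sum(1 for x, y in zip(xs, ys) if x <= x0 and y <= y0) / n
            worst = max(worst, abs(fxy - fx * fy))
    return worst


if __name__ == "__main__":
    # Fully dependent toy sample versus a balanced, independent one.
    print(max_cdf_gap([1, 2, 3, 4], [1, 2, 3, 4]))
    print(max_cdf_gap([0, 0, 1, 1], [0, 1, 0, 1]))
```

A result could then be compared against the 0.1 and 0.05 levels quoted in the caption.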

S1 Table. Parameters of a linear regression for the dependence of Fmax on various structural descriptors.

The top panel lists the values of the Pearson R² coefficients with a 95% confidence interval. The lower panel lists the slopes of the linear fits together with the error bars. The number in parentheses is the corresponding p-value. Even though the slope for each correlation is significantly different from zero, R² is never close to one, so no correlation can be established between the descriptors and Fmax. This is also assessed in S10 Fig.

https://doi.org/10.1371/journal.pcbi.1004541.s012

(PDF)

Acknowledgments

We thank P. Cossio, R. Hervás, A. Rodríguez and J.M. Ortiz for many critical discussions. We also acknowledge the contribution of TeideHPC and SVG High-Performance Computing facilities. TeideHPC facilities are provided by the Instituto Tecnológico y de Energías Renovables (ITER), S.A., www.iter.es. SVG HPC facilities are provided by the Galician Supercomputing Center (CESGA), www.cesga.es.

Author Contributions

Conceived and designed the experiments: ÀGS MC MCV. Performed the experiments: ÀGS MS. Analyzed the data: ÀGS MC. Contributed reagents/materials/analysis tools: MS MC. Wrote the paper: ÀGS MC MCV MS.

References

1. Chothia C, Finkelstein AV. The classification and origins of protein folding patterns. Annual Review of Biochemistry. 1990;59(1):1007–1035. pmid:2197975
2. Chothia C. One thousand families for the molecular biologist. Nature. 1992 Jun;357(6379):543–544. pmid:1608464
3. Sillitoe I, Cuff AL, Dessailly BH, Dawson NL, Furnham N, Lee D, et al. New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. Nucleic Acids Research. 2013;41(D1):D490–D498. Available from: http://nar.oxfordjournals.org/content/41/D1/D490.abstract. pmid:23203873
4. Cossio P, Trovato A, Pietrucci F, Seno F, Maritan A, Laio A. Exploring the universe of protein structures beyond the Protein Data Bank. PLoS Comput Biol. 2010;6(11):e1000957. pmid:21079678
5. Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins: Structure, Function, and Bioinformatics. 2004;57(4):702–710.
6. Magrane M, Consortium U. UniProt Knowledgebase: a hub of integrated protein data. Database. 2011;2011. pmid:21447597
7. Nasir J, Floresco SB, O’Kusky JR, Diewert VM, Richman JM, Zeisler J, et al. Targeted disruption of the Huntington’s disease gene results in embryonic lethality and behavioral and morphological changes in heterozygotes. Cell. 1995;81(5):811–823. pmid:7774020
8. Zuccato C, Ciammola A, Rigamonti D, Leavitt BR, Goffredo D, Conti L, et al. Loss of Huntingtin-Mediated BDNF Gene Transcription in Huntington’s Disease. Science. 2001;293(5529):493–498. Available from: http://www.sciencemag.org/content/293/5529/493.abstract. pmid:11408619
9. Velier J, Kim M, Schwarz C, Kim TW, Sapp E, Chase K, et al. Wild-Type and Mutant Huntingtins Function in Vesicle Trafficking in the Secretory and Endocytic Pathways. Experimental Neurology. 1998;152(1):34–40. Available from: http://www.sciencedirect.com/science/article/pii/S0014488698968327. pmid:9682010
10. Petruska J, Hartenstine MJ, Goodman MF. Analysis of Strand Slippage in DNA Polymerase Expansions of CAG/CTG Triplet Repeats Associated with Neurodegenerative Disease. Journal of Biological Chemistry. 1998;273(9):5204–5210. pmid:9478975
11. Ross CA. Polyglutamine Pathogenesis: Emergence of Unifying Mechanisms for Huntington’s Disease and Related Disorders. Neuron. 2002;35(5):819–822. pmid:12372277
12. Pla P, Orvoen S, Saudou F, David DJ, Humbert S. Mood disorders in Huntington’s disease: from behavior to cellular and molecular mechanisms. Frontiers in Behavioral Neuroscience. 2014;8(135). pmid:24795586
13. Fan HC, Ho LI, Chi CS, Chen SJ, Peng GS, Chan TM, et al. Polyglutamine (PolyQ) Diseases: Genetics to Treatments. Cell Transplantation. 2014;23(4–5):441–458. Available from: http://www.ingentaconnect.com/content/cog/ct/2014/00000023/F0020004/art00006. pmid:24816443
14. Albrecht A, Mundlos S. The other trinucleotide repeat: polyalanine expansion disorders. Current Opinion in Genetics & Development. 2005;15(3):285–293. Genetics of disease. Available from: http://www.sciencedirect.com/science/article/pii/S0959437X05000559.
15. Amiel J, Trochet D, Clément-Ziza M, Munnich A, Lyonnet S. Polyalanine expansions in human. Human Molecular Genetics. 2004;13(suppl 2):R235–R243. Available from: http://hmg.oxfordjournals.org/content/13/suppl_2/R235.abstract. pmid:15358730
16. Hervás R, Oroz J, Galera-Prat A, Goñi O, Valbuena A, Vera AM, et al. Common features at the start of the neurodegeneration cascade. PLoS Biol. 2012;10(5):e1001335. pmid:22666178
17. Piana S, Laio A. A bias-exchange approach to protein folding. The Journal of Physical Chemistry B. 2007 May;111(17):4553–4559. pmid:17419610
18. Sułkowska JI, Cieplak M. Mechanical stretching of proteins – a theoretical survey of the Protein Data Bank. Journal of Physics: Condensed Matter. 2007;19(28):283201.
19. Sikora M, Sułkowska JI, Cieplak M. Mechanical strength of 17,134 model proteins and cysteine slipknots. PLoS Comput Biol. 2009 Oct;5(10):e1000547. pmid:19876372
20. Ferreon ACM, Moran CR, Gambin Y, Deniz AA. Single-molecule fluorescence studies of intrinsically disordered proteins. Methods in Enzymology. 2010;472:179–204. pmid:20580965
21. Hess B, Kutzner C, van der Spoel D, Lindahl E. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. Journal of Chemical Theory and Computation. 2008;4(3):435–447.
22. Bonomi M, Branduardi D, Bussi G, Camilloni C, Provasi D, Raiteri P, et al. PLUMED: A portable plugin for free-energy calculations with molecular dynamics. Computer Physics Communications. 2009;180(10):1961–1972. Available from: http://www.sciencedirect.com/science/article/pii/S001046550900157X.
23. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM, Ferguson DM, et al. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. Journal of the American Chemical Society. 1995;117(19):5179–5197.
24. Qiu D, Shenkin PS, Hollinger FP, Still WC. The GB/SA continuum model for solvation. A fast analytical method for the calculation of approximate Born radii. The Journal of Physical Chemistry A. 1997;101(16):3005–3014.
25. Pietrucci F, Laio A. A Collective Variable for the Efficient Exploration of Protein Beta-Sheet Structures: Application to SH3 and GB1. Journal of Chemical Theory and Computation. 2009;5(9):2197–2201.
26. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen My, et al. In: Comparative Protein Structure Modeling Using Modeller. John Wiley & Sons, Inc.; 2002. Available from: http://dx.doi.org/10.1002/0471250953.bi0506s15.
27. Hestenes MR, Stiefel E. Methods of conjugate gradients for solving linear systems. vol. 49. National Bureau of Standards Washington, DC; 1952.
28. Miettinen MS, Knecht V, Monticelli L, Ignatova Z. Assessing polyglutamine conformation in the nucleating event by molecular dynamics simulations. The Journal of Physical Chemistry B. 2012;116(34):10259–10265. pmid:22770401
29. Khare SD, Ding F, Gwanmesia KN, Dokholyan NV. Molecular origin of polyglutamine aggregation in neurodegenerative diseases. PLoS Comput Biol. 2005;1(3):230–235. pmid:16158094
30. Laghaei R, Mousseau N. Spontaneous formation of polyglutamine nanotubes with molecular dynamics simulations. The Journal of Chemical Physics. 2010;132(16):165102. pmid:20441310
31. Cieplak M, Allan DB, Leheny RL, Reich DH. Proteins at Air–Water Interfaces: A Coarse-Grained Model. Langmuir. 2014;30:1288–96. Available from: http://dx.doi.org/10.1021/la502465m.
32. Sikora M, Szymczak P, Thompson D, Cieplak M. Linker-mediated assembly of gold nanoparticles into multimeric motifs. Nanotechnology. 2011;22(44):445601. pmid:21979426
33. Joosten RP, te Beek TAH, Krieger E, Hekkelman ML, Hooft RWW, Schneider R, et al. A series of PDB related databases for everyday needs. Nucleic Acids Res. 2011 Jan;39(Database issue):D411–D419. pmid:21071423
34. Tsai J, Taylor R, Chothia C, Gerstein M. The packing density in proteins: standard radii and volumes. Journal of Molecular Biology. 1999;290(1):253–266. Available from: http://www.sciencedirect.com/science/article/pii/S0022283699928292. pmid:10388571
35. Maxwell JC. L. on the calculation of the equilibrium and stiffness of frames. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Sciences. 1864;27(182):294–299.
36. Cieplak M, Robbins MO. Nanoindentation of 35 virus capsids in a molecular model: relating mechanical properties to structure. PloS One. 2013;8(6):e63640. pmid:23785395
37. Sułkowska JI, Cieplak M. Selection of optimal variants of Gō-like models of proteins through studies of stretching. Biophysical Journal. 2008;95(7):3174–3191. pmid:18567634
38. Plaxco KW, Simons KT, Baker D. Contact order, transition state placement and the refolding rates of single domain proteins. Journal of Molecular Biology. 1998;277(4):985–994. pmid:9545386
39. Kesner BA, Ding F, Temple BR, Dokholyan NV. N-terminal strands of filamin Ig domains act as a conformational switch under biological forces. Proteins: Structure, Function, and Bioinformatics. 2010;78(1):12–24.
40. Cieplak M, Hoang TX. Universality classes in folding times of proteins. Biophysical Journal. 2003;84(1):475–488. pmid:12524300
41. Berkovich R, Garcia-Manyes S, Klafter J, Urbakh M, Fernández JM. Hopping around an entropic barrier created by force. Biochemical and Biophysical Research Communications. 2010;403(1):133–137. pmid:21050839
42. Rief M, Gautel M, Oesterhelt F, Fernandez JM, Gaub HE. Reversible Unfolding of Individual Titin Immunoglobulin Domains by AFM. Science. 1997;276(5315):1109–1112. pmid:9148804
43. Carrion-Vazquez M, Oberhauser AF, Fowler SB, Marszalek PE, Broedel SE, Clarke J, et al. Mechanical and chemical unfolding of a single protein: A comparison. Proceedings of the National Academy of Sciences USA. 1999;96(7):3694–3699.
44. Sułkowska JI, Sułkowski P, Szymczak P, Cieplak M. Tightening of knots in proteins. Physical Review Letters. 2008;100(5):058106. pmid:18352439
45. Taylor WR. A deeply knotted protein structure and how it might fold. Nature. 2000;406(6798):916–919. pmid:10972297
46. Virnau P, Mirny LA, Kardar M. Intricate knots in proteins: Function and evolution. PLoS Computational Biology. 2006;2(9):e122. pmid:16978047
47. Sułkowska JI, Rawdon EJ, Millett KC, Onuchic JN, Stasiak A. Conservation of complex knotting and slipknotting patterns in proteins. Proceedings of the National Academy of Sciences USA. 2012;109(26):E1715–E1723.
48. Humphrey W, Dalke A, Schulten K. VMD: visual molecular dynamics. Journal of Molecular Graphics. 1996;14(1):33–38. pmid:8744570
49. Nagai Y, Inui T, Popiel HA, Fujikake N, Hasegawa K, Urade Y, et al. A toxic monomeric conformer of the polyglutamine protein. Nature Structural & Molecular Biology. 2007;14(4):332–340.
50. Ripaud L, Chumakova V, Antonin M, Hastie AR, Pinkert S, Körner R, et al. Overexpression of Q-rich prion-like proteins suppresses polyQ cytotoxicity and alters the polyQ interactome. Proceedings of the National Academy of Sciences. 2014;111(51):18219–18224.