OVG conceived and designed the experiments. SOG and MYL performed the experiments. OVG, SOG, and MYL analyzed the data. OVG wrote the paper.
The authors have declared that no competing interests exist.
The determination of factors that influence protein conformational changes is very important for the identification of potentially amyloidogenic and disordered regions in polypeptide chains. In our work we introduce a new parameter, mean packing density, to detect both amyloidogenic and disordered regions in a protein sequence. It has been shown that regions with strong expected packing density are responsible for amyloid formation. Our predictions are consistent with known disease-related amyloidogenic regions for eight of 12 amyloid-forming proteins and peptides in which the positions of amyloidogenic regions have been revealed experimentally. Our findings support the concept that the mechanism of amyloid fibril formation is similar for different peptides and proteins. Moreover, we have demonstrated that regions with weak expected packing density are responsible for the appearance of disordered regions. Our method has been tested on datasets of globular proteins and long disordered protein segments, and it shows improved performance over other widely used methods. Thus, we demonstrate that the expected packing density is a useful value with which one can predict both intrinsically disordered and amyloidogenic regions of a protein based on sequence alone. Our results are important for understanding the structural characteristics of protein folding and misfolding.
Protein folding is one of the most challenging issues in biophysical science. During the past few years it has been shown that some diseases are connected with protein misfolding and the formation of insoluble aggregates called amyloid plaques. These processes may be associated with several diseases such as Alzheimer disease, Parkinson disease, Creutzfeldt-Jakob disease, and even certain forms of cancer. It has been shown that proteins with intrinsically disordered regions are involved in protein–protein or protein–nucleic acid interactions. The main objective of this paper is to report insights into the molecular mechanisms of amyloid aggregation. This has been done using the observed number of contacts for each amino acid residue in the globular state, hereafter called the expected packing density. By analysis of sequences alone, the authors have demonstrated that regions that possess strong expected packing density can be responsible for amyloidogenic properties of a protein, while regions with weak expected packing density correspond to disordered regions. A new concept is proposed that could aid in understanding protein folding, misfolding, and amyloidosis. The results suggest that the amyloidogenic propensity of a protein is connected to sequence segments that are able to form a large number of contacts.
Amyloid fibril formation is associated with an increase of β structure content, which leads to fibrillar aggregation [
The experimental observation that not all proteins are amyloidogenic (or at least that some proteins are less amyloidogenic than others) and that specific continuous regions of amyloid-forming proteins are more amyloidogenic than others suggests that there is a sequence propensity for amyloid formation. Moreover, the observation that some short peptides also can form amyloids implies that these segments, which usually are exposed to the environment, can nucleate the transition of native proteins into the amyloid state, and suggests that fibril formation is sequence-specific [
Some intrinsically disordered proteins are involved in amyloid diseases (type II diabetes, Alzheimer disease, and Parkinson disease). This fact may indicate that disorder is a necessary condition for aggregation. It has been shown that a very small change in the environment of such proteins often might cause their partial folding and aggregation [
Uversky and Fink in their review [
The first high-resolution (1 Å) crystal of an amyloid fiber formed by a sequence-designed polypeptide has been obtained [
There are several computational methods for predicting a protein's propensity for amyloid fibril formation. In the work of Fernandez et al. [
A computational algorithm has been suggested that detects the nonnative (hidden) β strand propensity of sequences by consideration of the relationships between protein local sequence and secondary structure in terms of tertiary contacts [
Based on the physico–chemical properties of β aggregation sequences and a computational algorithm, a model was developed for predicting the aggregation rate for a broad range of polypeptide chains [
On the other hand, there is a method for the prediction of amyloidogenic regions from amino acid sequence alone [
Recently, a new method for identifying fibril-forming segments of proteins has been suggested [
It should be added that molecular dynamics can yield valuable information about the structural changes that arise at the atomic level upon the formation of amyloid fibrils [
Another interesting new method (named PASTA) is based on sequence-specific interaction energies between pairs of protein fragments calculated from statistical analysis of the native folds of globular proteins
The formation of a sufficient number of interactions is necessary to compensate for the loss of conformational entropy during the protein folding process. Therefore, the structural uniqueness of native proteins is a result of the balance between the conformational entropy and the energy of residue interactions. It seems that disordered regions in a protein chain do not have a sufficient number of interactions to compensate for the loss of conformational entropy that results from the formation of a globular state. On the other hand, a large increase in the energy of interactions will lead to a loss of the unique structure because the strengthening of contact energy will speed up folding, but it is also likely to lead to erroneous folds (for example, to amyloid fibrils).
It has been suggested that the lack of a rigid globular structure under physiological conditions might represent a considerable functional advantage for intrinsically disordered proteins, as their large plasticity allows them to interact efficiently with several different targets, as compared with a folded protein with limited conformational flexibility [
Despite considerable efforts to understand the mechanism, it is still unclear what is responsible for amyloidogenic and disordered regions. The goal of this work is to test the hypothesis that protein regions with strong expected packing density are responsible for the amyloidogenic properties of proteins, while regions with weak expected packing density are responsible for the appearance of disordered regions. We introduce a new parameter, namely mean packing density (the number of residues within a given distance from the residue considered), which enables the prediction of both amyloidogenic and intrinsically disordered regions from protein sequence. These findings support the concept that the occurrence of amyloidogenic and intrinsically disordered regions has a similar basis in different peptides and proteins.
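The definition of packing density as the number of residues within a given distance of a residue can be illustrated with a minimal sketch. Several details here are assumptions made only for illustration: each residue is represented by a single point (for example, a Cα position), the residue itself is not counted, and sequence neighbors are not excluded. The function name `observed_packing_density` and the toy coordinates are hypothetical; the 8.0 Å radius is the contact radius quoted for the packing density table.

```python
import math

def observed_packing_density(coords, radius=8.0):
    """For each residue, count the other residues lying within `radius` angstroms.

    `coords` holds one (x, y, z) point per residue. Using a single point per
    residue is an assumption of this sketch, not a statement of the authors'
    contact definition.
    """
    counts = []
    for i, a in enumerate(coords):
        n = 0
        for j, b in enumerate(coords):
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts

# Hypothetical toy "structure": five points on a straight line, 3 angstroms apart.
toy_coords = [(3.0 * k, 0.0, 0.0) for k in range(5)]
toy_density = observed_packing_density(toy_coords)
```

Averaging such counts over all occurrences of each of the 20 residue types in a structure database would give a per-residue scale of the kind used in the rest of the paper.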
To calculate the packing density observed in protein structures, we have constructed two databases of protein structures. The first one [
Mean Observed Packing Density for 20 Amino Acid Residues (and Errors in the Determination of the Average) Obtained Using a Contact Radius of 8.0 Å from Database 25%
The expected packing densities were averaged over a sliding window, and a packing density profile was produced (see
To obtain a threshold for our predictions, we took a database of six-residue peptides, some of which were fibril formers and some of which were fibril nonformers [
The symbols correspond to values chosen as thresholds.
We collected a database of all known proteins and peptides that are associated with amyloid diseases and for which the positions of amyloidogenic regions have been determined experimentally (see
Comparison of Prediction of Amyloidogenic Regions Using Contact Density Scale and Varying Size of Sliding Window (Scale Obtained from Database 25%)
Varying the size of the sliding window (three, five, seven, and nine residues), we constructed a packing density profile for each of these proteins and peptides. We predicted a region as amyloidogenic if the expected packing density for the region (whose size is equal to or greater than the window size) is above the considered threshold. Our hypothesis is that regions with strong expected packing density should correspond to aggregation regions, which presumably intersect with amyloidogenic regions of proteins. The numbers of predicted amyloidogenic regions are presented in
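The prediction rule described above can be sketched as a search for contiguous stretches of a profile that stay above a threshold. This is one possible reading of the rule, not the authors' code: a region is reported when at least `min_length` consecutive profile values exceed the threshold, with `min_length` set to the window size. The function name, the profile values, and the threshold of 21.0 in the example are hypothetical.

```python
def regions_above_threshold(profile, threshold, min_length):
    """Return (start, end) pairs, 1-based and inclusive, of contiguous runs
    in `profile` whose values all exceed `threshold` and whose length is at
    least `min_length`. Positions holding None are treated as not above.
    """
    regions = []
    start = None
    # A trailing None acts as a sentinel that closes a run reaching the end.
    for i, value in enumerate(list(profile) + [None]):
        above = value is not None and value > threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    return regions

# Hypothetical profile and threshold, for illustration only.
toy_profile = [None, 22.0, 22.0, 22.0, 19.0, 22.0, 22.0, None, 23.0, 23.0, 23.0]
toy_regions = regions_above_threshold(toy_profile, threshold=21.0, min_length=3)
```

In this toy case the two-residue run at positions 6 and 7 is discarded because it is shorter than the minimum length, while the two three-residue runs are kept.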
We constructed a packing density profile using a sliding window of seven residues for each of the proteins and peptides considered here. The experimentally observed amyloidogenic regions and the predicted ones are presented in
Predicted versus Experimentally Observed Amyloid-Forming Regions in Amyloidogenic Proteins and Peptides
In Alzheimer disease, τ-protein forms neurofibrillary tangles, which are bundles of paired helical filaments. A single region (amino acid residues 306−311), which is shown experimentally to be amyloidogenic [
Despite a large body of experimental data related to the search for amyloidogenic regions in human prion protein, it is difficult to determine which regions these are. It has been shown that helix 1 (residues 144–153) of human prion protein (PrP) plays a critical role in the amyloidogenic process [
Most mutations described in apolipoprotein A (ApoA) are within the N-terminal portion of the protein (residues 1–93), which represents the proteolysis fragment that is incorporated into amyloid deposits [
The experimentally found amyloidogenic fragment of lysozyme (residues 49–64), which has been specifically implicated in amyloidogenic conversion [
The most amyloidogenic peptide fragments from transthyretin (TTR) have been demonstrated in two regions: residues 10–19, which encompass the A strand of the inner β sheet structure that readily forms amyloid fibrils when dissolved in water at low pH [
It has been found experimentally that the following sequences play a dominant role in the amyloidogenesis of β2-microglobulin: residues 20–41 [
Reactive (or secondary) amyloidosis is characterized by the extracellular deposition of amyloid fibrils containing predominantly amyloid A protein (AA), which is a proteolytically derived fragment of serum amyloid A (SAA) protein. The N-terminus of amyloid A protein (residues 1–11 of AA protein) was shown to be the amyloidogenic part of the molecule [
Medin is the main constituent of the aortic medial amyloid. It is derived from a proteolytic fragment of lactadherin, a mammary epithelial cell–expressed glycoprotein that is secreted as part of the milk fat globule membrane. It was previously demonstrated that an octapeptide fragment of medin (residues 42–49, NFGSVQFV) forms typical well-ordered amyloid fibrils [
It has been shown that residues 16–20 in amyloid β (Aβ) peptide are essential for the peptide's polymerization [
It has been shown that a fragment (residues 20–27) from amylin (also called human islet amyloid protein or hIAPP) is amyloidogenic and cytotoxic [
Alpha-synuclein is a major component of Lewy bodies in Parkinson disease and is found to be associated with several other forms of dementia. The central fragment of α-synuclein (35 residues long), which has been isolated from purified amyloid of Alzheimer disease brains [
It has been shown that a peptide consisting of residues 15–19 of the human hormone calcitonin forms highly ordered fibrils, which are similar to those formed by the entire hormone sequence [
Our predicted regions are consistent with known disease-related regions for eight of 12 experimentally well-studied amyloidogenic peptides and proteins (transthyretin, β2-microglobulin, lysozyme, prion protein, and others). This result strongly indicates that the aggregation capability of a protein chain is one of the common properties of amyloid fibrils. Moreover, it should be noted that regions with high packing density are often flanked by regions with weak expected packing density, that is, by amyloid breakers that disrupt their amyloidogenic capability.
Here we also tested the ability of two other scales, hydrophobicity [
Comparison of Prediction of Amyloidogenic Regions Using Different Scales
To test the quality of our predictions of intrinsically disordered regions in proteins, we have used two databases, of which one has 427 intrinsically disordered proteins and regions [
Each ROC curve corresponds to predictions made with the sliding window size specified in the legend. The open circle corresponds to the value of packing density chosen as a threshold: 20.5 for database 25% (A) and 20.4 for database 80% (B).
To test the quality of predictions obtained by our method compared with other methods of prediction of disordered regions such as IUPred [
Performance of Disorder Prediction Methods on Datasets of Globular Proteins (559 Proteins) and Long Disordered Protein Segments (129 Proteins) [
We demonstrate that expected packing density is a useful value for the prediction of both intrinsically disordered and amyloidogenic regions of a protein based only on its sequence. In
Arrows indicate upper and lower thresholds obtained from the ROC curves (see
Structures of peptides such as NNQQNY (derived from Sup35 protein [
If amyloid fibril formation is a generic feature of proteins [
We tried to collect all known amyloidogenic proteins and peptides for which disease-related regions are experimentally localized. By analysis of primary structure alone, we have demonstrated that regions that possess strong expected packing density can be responsible for the amyloidogenic properties of a protein, while regions with weak expected packing density correspond to disordered regions. A new concept is proposed that could aid in the understanding of protein folding, misfolding, and amyloidosis.
Our study provides new insights into the process of amyloid formation. The results suggest that the amyloidogenic propensity of a protein is related to sequence segments that are able to form a large number of contacts. Our results can help locate amyloidogenic regions in amyloid-forming proteins for which the positions of such regions have not yet been determined experimentally.
The set of protein structures used for calculation of the packing density observed in protein structures was obtained by inspection of the SCOP (Structural Classification of Proteins) [
It is worthwhile to emphasize that the order of the residues may play an important role in protein folding and may account for regions with weak and strong packing density in a protein structure. To predict such regions in a protein, we construct a profile of the expected packing density for the protein sequence. The calculations are based on a sliding window-averaging technique. For the prediction of amyloidogenic regions, the sliding window size is varied from three to nine residues for each peptide and protein; for the prediction of intrinsically disordered regions, the window size is 11 (or 41) residues. The packing density profile is calculated as follows. First, the expected packing density is determined for each residue (see
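The sliding window averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the window average is assigned to the central residue, positions too close to the termini for a full window are left undefined, and the window size is odd. The function name and the three-residue scale `toy_scale` are hypothetical placeholders, not the published packing density values.

```python
def packing_density_profile(sequence, scale, window=7):
    """Average per-residue scale values over a sliding window.

    The average is assigned to the central residue of each window (an
    assumption of this sketch). Terminal positions without a full window
    are returned as None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    values = [scale[aa] for aa in sequence]
    profile = [None] * len(values)
    for center in range(half, len(values) - half):
        segment = values[center - half:center + half + 1]
        profile[center] = sum(segment) / window
    return profile

# Hypothetical scale values for three residue types, for illustration only.
toy_scale = {"A": 20.0, "G": 18.0, "W": 24.0}
toy_window_profile = packing_density_profile("AGWGA", toy_scale, window=3)
```

Leaving the terminal positions undefined is only one option; padding or shrinking the window near the termini would be alternative design choices.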
To evaluate the accuracy of, and confidence in, our method of predicting amyloidogenic regions, a database of 67 peptides that are six-residue fibril formers and 91 peptides that are six-residue fibril nonformers was used [
To assess the quality of predictions and to determine thresholds, we calculated true positive and false positive rates and constructed receiver operating characteristic (ROC) curves. In predictions of intrinsically disordered regions, the true positive rate was calculated as the fraction of residues in the intrinsically disordered set that were predicted as intrinsically disordered; the false positive rate was the fraction of residues in the folded set that were predicted as intrinsically disordered. Similarly, for the six-residue peptides, the true positive rate was the fraction of peptides in the fibril-former set that were predicted as fibril formers, while the false positive rate was the fraction of peptides in the fibril-nonformer set that were predicted as fibril formers.
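The rate definitions above can be written out as a short sketch. The function name `roc_points` and the example scores are hypothetical. The `predict_if_below` flag reflects an assumption of this sketch: fibril formers are predicted when the score is above the threshold, whereas disordered residues would be predicted when the expected packing density is below it.

```python
def roc_points(positive_scores, negative_scores, thresholds, predict_if_below=False):
    """Return (threshold, false_positive_rate, true_positive_rate) tuples.

    A score counts as a positive prediction if it is above the threshold,
    or below it when `predict_if_below` is True.
    """
    points = []
    for t in thresholds:
        if predict_if_below:
            tp = sum(s < t for s in positive_scores)
            fp = sum(s < t for s in negative_scores)
        else:
            tp = sum(s > t for s in positive_scores)
            fp = sum(s > t for s in negative_scores)
        points.append((t, fp / len(negative_scores), tp / len(positive_scores)))
    return points

# Hypothetical mean packing densities for four fibril-forming and four
# non-forming peptides; these are not data from the paper.
toy_formers = [22.0, 21.5, 20.0, 23.0]
toy_nonformers = [19.0, 21.2, 20.5, 18.0]
toy_roc = roc_points(toy_formers, toy_nonformers, thresholds=[21.0, 22.5])
```

Plotting the true positive rate against the false positive rate over a fine grid of thresholds gives the ROC curve, from which an operating threshold can then be chosen.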
Using hydrophobicity and β sheet propensity scales, we predicted the amyloidogenic regions of the considered proteins and peptides and evaluated the obtained results in a similar way to how we predicted these regions using packing density scales. The hydrophobicity scale of 20 types of amino acid residues was taken from the work of Fauchere and Pliska [
We are grateful to D. Reifsnyder for assistance in preparation of the paper.
Aβ, amyloid β peptide
NAC, non-Aβ component of Alzheimer disease amyloid
ROC, receiver operating characteristic