SMS, MK, and FE conceived and designed the experiments. SMS, MK, and WB performed the experiments. SMS and FLS analyzed the data. GS contributed reagents/materials/analysis tools. SMS and FE wrote the paper.
The authors have declared that no competing interests exist.
Three different prenyltransferases attach isoprenyl anchors to C-terminal motifs in substrate proteins. These lipid anchors serve for membrane attachment or protein–protein interactions in many pathways. Although well-tolerated selective prenyltransferase inhibitors are clinically available, their mode of action remains unclear since the known substrate sets of the various prenyltransferases are incomplete. The Prenylation Prediction Suite (PrePS) has been applied for large-scale predictions of prenylated proteins. To prioritize targets for experimental verification, we rank the predictions by their functional importance estimated by evolutionary conservation of the prenylation motifs within protein families. The ranked lists of predictions are accessible as PRENbase (
Various cellular functions require reversible membrane localization of proteins. This is often facilitated by attaching lipids to the respective proteins, thus anchoring them to the membrane. For example, addition of prenyl lipid anchors (prenylation) is directed by a motif in the protein sequence that can be predicted using a recently developed method. We describe the prediction of protein prenylation in all currently known proteins. The annotated results are available as an online database: PRENbase. A ranking of the predictions is introduced, assuming that existence of a prenylation sequence motif in related proteins from different species (evolutionary conservation) relates to functional importance of the lipid anchor. We present experimental evidence for high-ranked human proteins predicted to be affected by anticancer drugs inhibiting prenylation.
Protein prenylation is facilitated by three eukaryotic enzymes with partially overlapping substrate specificities [
Based on the refinement of descriptions of sequence motifs recognized by the three enzymes (FT, GGT1, and GGT2) in substrate proteins, we have recently developed amino acid sequence–based predictors for various types of protein prenylation (PrePS [
As previous experience with a similar project (the application of the MyrPS/NMT myristoylation predictor [
Here, we report the results obtained after applying the three prenylation predictors over the National Center for Biotechnology Information's (NCBI) nonredundant protein sequence database (NR). The proteins predicted to be prenylated have been clustered into homologous families and are made available as the annotated database PRENbase. A sophisticated interface can generate target lists with regard to the experimental status of the modification (known, predicted, etc.), exclusive or shared types of modifying enzymes (FT, GGT1, GGT2), as well as for evolutionary conservation by constraining the taxonomic distribution within clusters or for single sequences. We investigate the validity of various hit-ranking schemes relying on sequence homology information and taxonomic distribution. Finally, we use PRENbase to list human proteins that could represent elusive cellular targets of FT inhibitors (lack of alternative prenylation by GGT1 under FT inhibition) [
The three predictors included in PrePS [
To facilitate the selection of targets for experimental validation, we tried to rank the predictions by the importance of the lipid anchor for their function based on the analysis of evolutionary motif conservation within protein families. It would be of special interest to study the conservation of farnesyl, geranylgeranyl, and double geranylgeranyl anchors within protein families, as this can indicate exclusive or overlapping substrate specificity between the three enzymes. Thus, the extent of variation can give additional hints on the importance of the specific anchor size [
We have manually curated protein family annotations for clusters with at least three sequences (201 clusters total). Due to the power law–like behavior of protein family cluster sizes [
In addition to the protein family name and function description, we annotated clusters with respect to verification status. This is not a trivial task because it requires manual lookup of hundreds of literature sources. While the actual number of experimentally verified proteins is small compared with the total number of predictions, many proteins can safely be assumed to be prenylated simply by similarity to known examples. We annotate clusters/families as KNOWN (+) when they include at least one from a list of 113 proteins experimentally verified to be prenylated. In addition, we created the annotation category LIKELY (*) for clusters that do not have an experimentally verified example included directly, but where members of the cluster show a clear similarity (BLAST E-value < 1e−10) to at least one of the verified cases. Finally, clusters without any detectable similarity to any of the 113 proteins experimentally verified to be prenylated are categorized in PRENbase as NEW (?). While the former families (with annotation KNOWN and LIKELY) form a basis to summarize existing knowledge of prenylated proteins, the latter (NEW) are of special interest because their function apparently has not been recognized yet in the context of prenylation.
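The three annotation categories can be summarized as a small decision rule. The following is a minimal sketch, not the curation pipeline used for PRENbase: the function name, the input containers, and the idea of supplying a precomputed best E-value per member are assumptions made for illustration; only the three labels and the 1e−10 threshold come from the text.

```python
E_VALUE_CUTOFF = 1e-10  # "clear similarity" threshold quoted in the text


def annotate_cluster(members, verified, best_evalue_to_verified):
    """Return 'KNOWN', 'LIKELY', or 'NEW' for one cluster.

    members: identifiers of the proteins in the cluster
    verified: set of identifiers experimentally verified to be prenylated
    best_evalue_to_verified: hypothetical mapping from a member to its best
        BLAST E-value against any verified protein (missing = no hit)
    """
    members = list(members)
    # KNOWN: at least one experimentally verified protein is in the cluster.
    if any(m in verified for m in members):
        return "KNOWN"
    # LIKELY: no verified member, but some member is clearly similar to one.
    if any(best_evalue_to_verified.get(m, float("inf")) < E_VALUE_CUTOFF
           for m in members):
        return "LIKELY"
    # NEW: no detectable similarity to any verified protein.
    return "NEW"
```

In this sketch the KNOWN test takes precedence, so a cluster containing a verified protein is never downgraded to LIKELY.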
During the annotation process, we have also encountered a few predictions where conservation of a C-terminal cysteine in CaaX box arrangement can also occur for prenylation-independent functions such as disulfide bridges (e.g., metridin-like ShK toxin family members). Although these do not appear to be prenylation targets in vivo, it cannot be excluded that they become prenylated in a different context when their C-termini would be exposed to the prenylating enzyme. The endothelin-converting enzyme 1 (ECE1) from the neprilysin-like zinc metallopeptidase family is another example with a CaaX box where the capacity for prenylation is apparently not used in vivo (possibly because of a disulfide bond). It is predicted by PrePS to be weakly prenylated and, indeed, its C-terminus has been shown to be weakly prenylatable in vitro [
If a predicted protein feature, such as a prenylated C-terminus, is conserved among a large number of homologues (large cluster size), this feature is likely to be more critical for biological function and to be more reliably predicted. Thus, predictions can be scored by
Instead of ranking based on counting the number of homologues, it is also possible to analyze the taxonomic distribution and score the families according to how widespread (or old) the motif is in the evolution of the protein family. Such phylogenic complexity can simply be estimated as a score function of the number of species (
It should be noted that ranking based on phylogenic complexity does not require the computationally costly determination of the total family size (including members without the motif). Large clusters that consist mainly of sequences of closely related species are downranked in favor of families with a more widespread taxonomic distribution.
To investigate the performance of the different ranking schemes, we plotted the distribution of clusters colored by their annotated modification status (
Values of Cluster Medians from
It can be seen that the simple ranking by cluster size brings the known or likely prenylated proteins (green clusters) to the front of the list. However, the red clusters also appear to be highly ranked. Using the evOluation score [
We previously estimated that PrePS misses about 2% of yet unknown prenylation motifs (cross-validated average sensitivity of PrePS: 98%) while predicting only 0.1% false positives in complete database searches (average specificity of PrePS: 99.9%) [
To estimate the performance gain of adding the evOluation ranking compared with the standard PrePS prediction alone, we apply ROC analysis: we slide an artificial threshold over the cluster ranks and count the true positive (KNOWN) and contextual false positive (OUT-OF-CONTEXT) clusters above or below the given threshold. This allows plotting sensitivity (100% minus the rate of false negatives) versus specificity (100% minus the rate of false positives) for the different methods (
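The threshold-sliding procedure can be illustrated with a short sketch. It assumes that clusters are supplied as a list of labels ordered from best to worst rank, that clusters ranked above the threshold count as predicted positives, and that labels other than KNOWN and OUT-OF-CONTEXT are ignored; the function name and these conventions are illustrative assumptions rather than the authors' implementation.

```python
def roc_points(ranked_labels):
    """Return (sensitivity, specificity) pairs in percent.

    ranked_labels: cluster labels ordered from best to worst rank.
    One point is produced per threshold position, starting with the
    threshold above the whole list (nothing predicted positive).
    """
    positives = sum(1 for label in ranked_labels if label == "KNOWN")
    negatives = sum(1 for label in ranked_labels if label == "OUT-OF-CONTEXT")
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one KNOWN and one OUT-OF-CONTEXT cluster")

    points = [(0.0, 100.0)]
    true_pos = false_pos = 0
    for label in ranked_labels:
        if label == "KNOWN":
            true_pos += 1
        elif label == "OUT-OF-CONTEXT":
            false_pos += 1
        else:
            continue  # other annotation categories do not move the curve
        sensitivity = 100.0 * true_pos / positives
        specificity = 100.0 * (negatives - false_pos) / negatives
        points.append((sensitivity, specificity))
    return points
```

A ranking scheme that pushes OUT-OF-CONTEXT clusters toward the end of the list keeps specificity high while sensitivity rises, which is the behavior the comparison is meant to expose.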
The manually annotated clusters/families of prenylated proteins described above are available as PRENbase. A Web interface (
For biomedical applications, it is of great interest to know which human proteins are particularly affected by prenyltransferase inhibitors that have already passed phase II and III clinical trials [
The most prominent group of prenylated oncogenes comprises members of the Ras superfamily of small GTPases. In PRENbase, these are clustered together in a small number of large families with high
In total, we have collected a list of at least 113 individual proteins experimentally verified to be prenylated that are part of 41 “KNOWN” clusters, and similarity to these justifies the annotation as “LIKELY” for another 106 clusters in PRENbase. Thus, a major strength of this work is the complete proteomic view of prenylation with an added evolutionary perspective. For example, by querying PRENbase for families with conserved prenylation motif in mammals, insects, nematodes, fungi, and plants, we derive a core set of only three clusters of already known prenylated proteins. These are the Rab, the Rho/Rac, and the DnaJ-like heat shock chaperone families which, therefore, could be postulated as being the oldest examples of prenylated proteins due to their most widespread taxonomic distribution. When weakening the conservation requirements and “only” considering conservation in mammals, insects, and nematodes, several other families join this list of presumably important prenylated proteins. These are (in the order of the evOluation ranking): the Ras/Ral/Rap family, the Lamin B cluster (linking also more generally coiled coil proteins), a cluster of mixed serine/threonine kinases, geranylgeranylated G gamma subunits, protein tyrosine phosphatase IVA, protein phosphatase 1 regulatory subunit 16 (in cluster with other Ankyrin domain containing proteins), as well as phosphorylase B kinase α+β subunits. Although spread over multiple clusters due to their sequence diversity, fungal mating factors/pheromones compose another large functionally related group of prenylated proteins.
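The taxonomic queries described here amount to a subset test on the taxonomic groups represented in each cluster. The sketch below uses a hypothetical in-memory mapping from cluster name to taxonomic groups; it does not reproduce the PRENbase Web interface or its data model, and the cluster names are placeholders.

```python
CORE_GROUPS = {"mammals", "insects", "nematodes", "fungi", "plants"}
RELAXED_GROUPS = {"mammals", "insects", "nematodes"}


def conserved_in(clusters, required_groups):
    """Return names of clusters whose motif occurs in every required group.

    clusters: hypothetical mapping {cluster name: set of taxonomic groups
        in which a predicted prenylation motif is present}
    """
    required = set(required_groups)
    return [name for name, groups in clusters.items()
            if required <= set(groups)]


# Toy input with placeholder cluster names (not PRENbase content).
toy_clusters = {
    "cluster_A": {"mammals", "insects", "nematodes", "fungi", "plants"},
    "cluster_B": {"mammals", "insects", "nematodes"},
    "cluster_C": {"fungi"},
}
core = conserved_in(toy_clusters, CORE_GROUPS)
relaxed = conserved_in(toy_clusters, RELAXED_GROUPS)
```

Relaxing the requirement can only add clusters to the result, which mirrors how further families join the core set when conservation in fungi and plants is no longer demanded.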
In contrast to the examples above where the prenylation site is highly conserved among various taxa, there are many cases where the predicted prenylation is specific to taxonomic lineages or even single species. Nevertheless, this posttranslational modification can be an important requirement for function of the respective proteins. Therefore, the smaller clusters that can be found in PRENbase also merit deeper investigation.
It is no surprise that the small GTPase families, well-known for their prenylation, top the evolutionary ranked lists in PRENbase. Apparently, multiple duplication events of common prenylated ancestor genes led to the numerous paralogous proteins in the Ras superfamily of small GTPases, resulting in the observed phylogeny of function [
Although the historical research focus [
In our predictions for prenylated protein families, we find a large group of 88 homologous plant proteins that are annotated to be metal-binding copper chaperones spread over 21 clusters. Surprisingly, the mainstream prenylation-related publications have not mentioned these proteins as prenylated, so far. A thorough search of the literature, however, reveals that a previous work has already shown prenylation for three of these proteins (all in soybean) [
Our approach identifies 979 sequences in 114 clusters that do not share similarity with already known prenylated proteins and whose predicted prenylation, therefore, would expand the possible functional repertoire of prenylated proteins in cells. Surprisingly, we find several proteins that are related to ubiquitin-mediated protein degradation.
One of these groups comprises some ubiquitin-like proteins. In particular, UBL3 and its prenylation motif are not only conserved in organisms from mammals to insects and worms but, apparently, also in some fungi and plants. Fitting into the related functional context of ubiquitin-mediated degradation, it is also interesting to observe predicted prenylation for several ubiquitin hydrolases. For example, ubiquitin specific protease 32 is conserved in mammals, pufferfish, and insects with a domain architecture of an N-terminal EF-hand domain, a central DUF1055 domain, followed by a C-terminal ubiquitin hydrolase domain which finally precedes the conserved prenylation motif. Furthermore, we predict several fungal proteins that have a carboxy-terminal ubiquitin hydrolase domain in addition to a prenylation motif. Interestingly, there also exists an E2 ubiquitin-conjugating enzyme with conserved prenylation motif in
The connection of prenylation and protein degradation continues with the prediction of a prenylation site in F-box and leucine-rich repeat proteins, with FBL2 being conserved in organisms from mammals to insects, worms, and fungi. These proteins typically serve as adaptors targeting substrate proteins of SCF (skip-cullin-F-box) and analogous degradation complexes [
Besides proteins with already known functions, a conserved prenylation motif is also valuable information for proteins with domains of unknown functions. Most prominently in our list, proteins containing a DUF544 domain appear conserved in organisms from mammals to worms, plants, and fungi. In another cluster, integral membrane proteins from mammals, insects, and worms share a DUF1339 domain together with the prenylation motif.
The selection of candidates for experimental verification focuses on predictions related to possible human target proteins for FTIs, because of the implications for important upcoming cancer therapeutics [
The experimental verification of prenylation predictions follows a new, recently described methodology [
Western blot and corresponding scans from TLC linear analyzer of wild-type GST-Rab28-fusion protein translated with [3H]mevalonic acid (lane 1), GST-Rab28 C218A with [3H]mevalonic acid (lane 2), GST-Rab28 with [3H]FPP (lane 3), and GST-Rab28 with [3H]GGPP (lane 4). There is significant incorporation of a product of mevalonic acid as well as FPP, while incorporation of GGPP is not detectable, suggesting that Rab28 is primarily a farnesylation target.
High-Ranked Predicted FTI-Targets (pFs) Sorted by EvOluation Score with Cluster and Taxonomy Statistics
The selective preference of RasD2 (eighth) for farnesyl anchors has been unambiguously shown in our previous work [
In humans, Rab28 exists in at least two isoforms, differing in an insertion at the C-terminus. They are distantly related to the Rab proteins (∼30% sequence identity), which are important in vesicle fusion and targeting. While the short isoform is expressed in most tissues, the long isoform is predominately found in testis [
The in vitro experimental study provides direct evidence that FLJ32421 (motif: -CYIS), a hypothetical human protein, is a preferential farnesylation target (
Western blot and corresponding scans from TLC linear analyzer of wild-type GST-FLJ32421-fusion protein translated with [3H]mevalonic acid (lane 1), GST-FLJ32421 C408A with [3H]mevalonic acid (lane 2), GST-FLJ32421 with [3H]FPP (lane 3) and GST-FLJ32421 with [3H]GGPP (lane 4). There is significant incorporation of a product of mevalonic acid as well as FPP, while incorporation of GGPP is close to the detection limit, suggesting that FLJ32421 (BROFTI) is primarily a farnesylation target.
Prickle1 (motif: -CIIS,
Western blot and corresponding scans from TLC linear analyzer of wild-type GST-ΔPrickle1 fusion protein translated with [3H]mevalonic acid (lane 1), GST-ΔPrickle1 C828A with [3H]mevalonic acid (lane 2), GST-ΔPrickle1 with [3H]FPP (lane 3) and GST-ΔPrickle1 with [3H]GGPP (lane 4). There is significant incorporation of a product of mevalonic acid as well as FPP, while incorporation of GGPP is lower despite a higher total amount of protein in the latter case, suggesting that Prickle1 is primarily a farnesylation target.
Western blot and corresponding scans from TLC linear analyzer of wild-type GST-ΔPrickle2-fusion protein translated with [3H]mevalonic acid (lane 1), GST-ΔPrickle2 C842A with [3H]mevalonic acid (lane 2), GST-ΔPrickle2 with [3H]FPP (lane 3), and GST-ΔPrickle2 with [3H]GGPP (lane 4). There is significant incorporation of a product of mevalonic acid as well as FPP, while incorporation of GGPP is lower despite a higher total amount of protein, suggesting that Prickle2 is primarily a farnesylation target.
HeLa cells were analysed by fluorescence microscopy after transfection with the following constructs: inserts 1, 3, and 4: GFP-Rab28; insert 2: GFP-Rab28 C218A; inserts 5, 7, and 8: GFP-FLJ32421; insert 6: GFP-FLJ32421 C408A; inserts 9, 11, and 12: GFP-Prickle2; insert 10: GFP-Prickle2 C841A; inserts 13, 15, and 16: GFP-RhoA63L (as positive control for a geranylgeranylated target); insert 14: GFP-RhoA63L C190S. The GFP-RhoA plasmids were kindly provided by Channing J. Der (University of North Carolina Chapel Hill, Chapel Hill, North Carolina, United States). Nuclei were co-stained with DAPI (blue color).
(A) GFP-Rab28, GFP-FLJ32421, and GFP-Prickle2 are membrane-localized with (4, 8, 12) or without (1, 5, 9) GGTI-298 treatment. Mutation of the Cys in the CaaX box (2, 6, 10) or treatment with FTI-277 (3, 7, 11) causes mislocalization and accumulation of the fusion proteins in the nucleus.
(B) GFP-RhoA is membrane-localized with (15) or without (13) FTI-277 treatment. Mutation of the Cys in the CaaX box (14) or treatment with GGTI-298 (16) causes mislocalization and accumulation of RhoA in the nucleus.
While we have tested the prenylation status of evolutionarily widely conserved, high-ranking examples in our list, there are in total 128 human proteins that serve as predicted FTI targets. The full list is available online at (
As opposed to pFs, pFGGs are classified due to their ability to be prenylated by either FT or GGT1 (
High-Ranked FT Substrates Predicted To Be Unaffected by FT Inhibition
Previously, the only known examples of prenylation of viral proteins by the eukaryotic host were the Hepatitis Delta large antigen and viral variants of H-Ras and K-Ras, as well as the US2 tegument protein of bovine Herpes viruses.
Surprisingly, our search reveals two candidate proteins from Mimivirus, a giant virus in amoebae that might be a pneumonia-associated human pathogen [
While there are several other predictions of prenylation motifs in viral proteins (170 sequences in 46 clusters), it is difficult to estimate the likelihood of their functionality, given the requirement that eukaryotic host enzymes be available. Hence, we are more confident in predicted prenylation motifs in proteins that are at least homologous to proteins with known prenylation in Eukaryotes. As an additional example to the above Mimivirus proteins, we find an ankyrin domain–containing protein with FT-specific prenylation motif conserved in canarypox and fowlpox virus.
Farnesyl (C15) and geranylgeranyl (C20) anchors differ in length by one isoprene unit (C5). However, this difference does not seem to matter for some proteins, such as the yeast a-factor mating pheromone [
In PRENbase, we observe that protein families differ in the evolutionary exchangeability of farnesyl and geranylgeranyl anchors. While there are several pFGG families where both anchor types are predicted to occur, there are a few pF-only families where farnesyl anchors appear to be the strongly preferred lipid type. From the above list of known examples for length dependency, we find that only G gamma 1 and 2 have a purely conserved farnesyl preference. While for rhodopsin kinase only the chicken orthologue switched to geranylgeranyl, there are several lower eukaryotes with an H-Ras orthologue ending in a geranylgeranylation motif. R-Ras and RhoB end with a -CXXL motif that by itself already can be substrate of either FT or GGT1.
At the same time, the a-factor mating pheromones, where anchor length should be less important, also appear in pF-only families. This, however, could be an artifact of clusters that contain only very closely related species, which have lacked the evolutionary time to diverge. The same probably applies to the many almost identical large subunits of Hepatitis delta virus, which are clustered into a pF-only family. On the other hand, the FT restriction also represents a possible vulnerability to FT inhibitors.
Given the above listed ambiguities, one cannot conclude with certainty whether a specific prenyl anchor length is important for a protein family based on the evolutionary variability of substrate preferences. However, in a taxonomically widely conserved family, a clear preference for farnesylation could still indicate a length dependency and, consequently, a requirement of farnesyl for specific protein–protein interactions. In HumanPRENbase, besides the above mentioned G gamma 1 and 2, the following families fall under these criteria: nucleosome assembly protein 1-like 1, prickle-like 1, phosphorylase kinase β, FLJ32421/BROFTI, RasD2/Rhes, RhoH, Rab28 long isoform, RhoQ, EH domain binding protein 1, DnaJ-homolog A4, 72kDa inositol polyphosphate-5-phosphatase E, and WD+tetratricopeptide repeats protein 1.
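The distinction between pF and pFGG proteins, and the notion of a pF-only family, can be written down compactly. The sketch below assumes that each protein comes with boolean predictions for FT, GGT1, and GGT2; the function names and the labels "GGT1-only" and "none" are illustrative choices, whereas pF and pFGG follow the definitions used in this work.

```python
def substrate_class(ft, ggt1):
    """Classify one protein from its FT and GGT1 predictions."""
    if ft and ggt1:
        return "pFGG"       # prenylated by FT and GGT1
    if ft:
        return "pF"         # prenylated by FT but not GGT1
    if ggt1:
        return "GGT1-only"  # illustrative label for GGT1 substrates
    return "none"


def is_pf_only_family(member_predictions):
    """True if every member is an FT substrate and none is a GGT1/GGT2 substrate.

    member_predictions: iterable of (ft, ggt1, ggt2) booleans, one per member.
    """
    predictions = list(member_predictions)
    return bool(predictions) and all(
        ft and not ggt1 and not ggt2 for ft, ggt1, ggt2 in predictions
    )
```

A single member with a predicted geranylgeranylation motif is enough to remove a family from the pF-only set, which is why clusters of very closely related sequences can appear pF-only without implying a length dependency.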
PRENbase provides (1) a review of previous knowledge of known and likely prenylated proteins, resulting in the rediscovery of the large group of prenylated metal-binding chaperones in plants; (2) target lists for experimental validation of newly predicted prenylation, ranked by evolutionary conservation, which lead to the notion that several proteins involved in ubiquitin-mediated protein degradation could be prenylated; (3) lists of possible targets for FT inhibition (human proteins that are unique substrates of FT and not GGT1 or GGT2), with experimental evidence for Prickle1, Prickle2, the BRO1-domain-containing FLJ32421 (termed BROFTI), and Rab28 (short isoform); (4) lists of dual FT/GGT substrates that are essentially not affected by FT inhibition or that can receive an altered anchor type under FT inhibition; (5) a list of viral proteins possibly processed by eukaryotic host enzymes, most notably two proteins from Mimivirus; and (6) examples of the importance of specific farnesyl anchor length (clusters that only include FT but not GGT1 or GGT2 substrates) that could be indicative of involvement in protein–protein interactions.
In MYRbase, sequences with higher than 40% sequence identity have been clustered into protein families. This rather conservative threshold is reasonable to infer similarity of biological function [
To cluster predicted proteins into their natural families independent of the existence/prediction of a lipid anchor, we have executed BLAST searches (E-value 0.005) starting with the 5,410 predicted proteins against the same complete database from which the predictions were derived (NCBI's NR with 2,179,151 entries, based on GenBank/GenPept version 144). Using the measured BLAST similarity as input for MCL [
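As an illustration of how BLAST similarities might be handed to MCL, the sketch below converts tabular BLAST hits into label triples (query, subject, weight). It is a sketch under several assumptions, not the procedure used here: it assumes the common 12-column tabular BLAST output with the E-value in column 11, it uses the negative base-10 logarithm of the E-value as edge weight, and it caps that weight at an arbitrary value; only the E-value cutoff of 0.005 is taken from the text.

```python
import math

E_VALUE_CUTOFF = 0.005  # cutoff quoted in the text
MAX_WEIGHT = 200.0      # arbitrary cap, also used for E-values reported as 0


def blast_to_abc(lines, cutoff=E_VALUE_CUTOFF):
    """Convert tabular BLAST hits into 'query<TAB>subject<TAB>weight' strings."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query == subject or evalue > cutoff:
            continue  # drop self-hits and hits weaker than the cutoff
        if evalue == 0.0:
            weight = MAX_WEIGHT
        else:
            weight = min(MAX_WEIGHT, -math.log10(evalue))
        edges.append(f"{query}\t{subject}\t{weight:.2f}")
    return edges
```

Triples of this form correspond to the label input that MCL reads with its `--abc` option; the weighting scheme and the inflation parameter would have to be chosen to match the actual analysis.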
We first generated a list of human proteins that are predicted to be prenylated by at least one of the three enzymes FT, GGT1, and GGT2 by running PrePS over NCBI's NR. Then, we determined the orthologues in other organisms with the condition of best reciprocal BLAST hits. The algorithm employed here follows in the steps of earlier methods to detect orthology and paralogy relationships [
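The best-reciprocal-hit condition can be sketched as follows. The example assumes that BLAST hits between two proteomes are available as (query, subject, bit score) tuples and that the hit with the highest bit score is taken as the best hit; function names and the tie-breaking behavior (first hit seen wins) are illustrative assumptions and do not reproduce the full orthology procedure referred to above.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

A pair is kept only when the relationship holds in both search directions, so a human protein whose best hit in another organism points back to a different human paralogue is not reported as an orthologue pair.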
We generated plasmids containing GST and pEGFP fusions of all genes studied in this work. The cDNAs of Rab28 short isoform and FLJ32421/BROFTI were cloned into the pGEX5X1-vector, thereby creating N-terminal GST-fusion proteins. Since the cDNAs received for Prickle1 and Prickle2 did not match or only partially matched the desired sequence, we used oligonucleotides representing the last 15 residues at the C-terminus instead. The Stratagene QuikChange XL Site-Directed Mutagenesis Kit was used to introduce a cysteine-to-alanine mutation in the CaaX motif. Since this residue is the site of covalent thioether linkage of the isoprenoid modification, the ability to become modified should be abolished. Both wild-type and mutant cDNA of Rab28 short isoform and FLJ32421/BROFTI were also cloned into the pEGFP C2 vector. For Prickle2, we used a C-terminal fragment representing the last 338 residues at the C-terminus, which is the longest matching sequence we had available. The N-terminal GFP-fusion proteins were used to investigate the subcellular localization in transiently transfected HeLa cells. No GFP-construct of Prickle1 was cloned, since the localization of the last 15 amino acids would not have been representative at all.
The cDNA of the GST fusion proteins was amplified by PCR and transcribed and translated in vitro using the Promega TNT Quick Coupled Transcription/Translation Kit in the presence of the radioactive label of choice (typically, 20 μCi [3H]mevalonic acid, 10 μCi [3H]FPP, or [3H]GGPP, all purchased from American Radiolabeled Chemicals,
HeLa cells were transfected with the GFP-expression vector constructs for Rab28 short isoform, FLJ32421/BROFTI and Prickle2 using Lipofectamine and Plus Reagent in serum-free medium (Life Technologies,
Accession numbers (IMAGE clone ID) of cDNA clones from the RZPD clone libraries (
Accession numbers from GenBank (
Accession numbers of clusters from PRENbase (
Accession numbers of clusters from HumanPRENbase (
Accession numbers (GI numbers) from GenBank (
Additional accession numbers of clusters from HumanPRENbase (
The authors are grateful for generous financial support from Boehringer Ingelheim. The computational facilities have been supported by Sun Microsystems through their academic Center of Excellence sponsorship program. Since November 2005, SMS has been the recipient of a Marie Curie Intra-European Fellowship.
farnesyltransferase
FT inhibitor
geranylgeranyltransferase type I
geranylgeranyltransferase type II
Markov chain clustering
National Center for Biotechnology Information
nonredundant protein sequence database
protein prenylated by FT but not GGT1
prenyltransferases
protein prenylated by FT and GGT1
protein prenylated by GGT1
Prenylation Prediction Suite
Rab escort protein
thin layer chromatography