Conceived and designed the experiments: NS MM BRC. Performed the experiments: NS BN LT. Analyzed the data: NS KV. Contributed reagents/materials/analysis tools: NS BN ARP KH AK BRC. Wrote the paper: NS. Was the lead investigator: MM.
† Deceased.
The authors have declared that no competing interests exist.
The role of alternative splicing in self-renewal, pluripotency and tissue lineage specification of human embryonic stem cells (hESCs) is largely unknown. To better define these regulatory cues, we modified the H9 hESC line to allow selection of pluripotent hESCs by neomycin resistance and cardiac progenitors by puromycin resistance. Exon-level microarray expression data from undifferentiated hESCs and cardiac and neural precursors were used to identify splice isoforms with cardiac-restricted or common cardiac/neural differentiation expression patterns. Splice events for these groups corresponded to the pathways of cytoskeletal remodeling, RNA splicing, muscle specification, and cell cycle checkpoint control as well as genes with serine/threonine kinase and helicase activity. Using a new program named AltAnalyze (
Reprogramming adult cells into pluripotent stem cells is a crucial step toward producing patient-specific cells for transplant therapy. Critical to this goal is the ability to reproducibly drive the differentiation of these cells to specific fates, such as cardiac and neural cells. While gene expression is important in tissue-specific differentiation, the impact of alternative splicing on the biology of differentiating cells has not been fully explored. To identify specific splicing events that may determine cell-type-specific differentiation, we compared splicing profiles of human embryonic stem cells (ESCs) and derived cardiac and neural precursors using Affymetrix exon tiling arrays. Segregation of splicing profiles into cardiac-restricted and common cardiac/neural differentiation pattern groups revealed unique groups of genes with clear implications for the biology of cardiomyocyte function and the maintenance of pluripotent ESCs. Alternative splicing of many of these genes, notably regulators of cell death and proliferation, was often predicted to affect the inclusion of protein domains or microRNA binding sites, suggesting that the function or expression of these proteins is altered during differentiation. These results provide further evidence that alternative splicing is important in shaping the functional repertoire of ESCs and differentiated cells.
The differentiation of embryonic stem cells (ESCs)
Up to 80% of all human genes undergo AS to produce multiple mRNA transcripts that differ in their inclusion of exons and introns
Since ESCs can differentiate into all cell lineages, characterizing isoform expression along specific lineage paths requires efficient methods to obtain pure cell populations. To this end, hESCs have been differentiated into neural progenitors (NPs), isolated with an effective neural differentiation protocol, and profiled with whole-genome exon-arrays
While these methods were an important step toward delineating the role of AS in differentiation, profiling of other progenitor cell types and comparisons between cell types is required to identify and understand common processes in differentiation and processes that are specific to different paths of differentiation. Determining the consequences of AS on a genome-wide basis will require tools to predict the effects of AS on protein sequence, domain inclusion, and protein expression.
In this study, we sought to identify AS during differentiation into different progenitor populations by exon-level genome profiling of homogenous populations of undifferentiated hESCs and derived cardiac progenitors (CPs) using a new selectable marker strategy. By comparing CP differentiation to a reported dataset of neural differentiation
The genetically engineered hESC lines and electrophysiology of derived CPs have been described in detail in
Human exon array CEL files for the Cythera NP differentiation datasets (Cy-ESCs and Cy-NPs), HUES6 ESCs, HUES6 NPs, and fetal human central nervous system stem cells were provided by the Gage laboratory (
The methods and program components of AltAnalyze are described in detail in
To identify alternative exons, AltAnalyze was run with default parameters. This analysis consists of (A) selecting microarrays for expression summarization with RMA, (B) defining biological groups for each array and pairs of groups for alternative exon analysis (e.g., hESCs and CPs), (C) downloading/loading appropriate library and annotation files for the microarray, (D) defining thresholds for probe set filtering, (E) defining thresholds for alternative exon analysis statistics (splicing-index and MiDAS
After RMA expression values and gene expression statistics (e.g., hESC and CP group gene expression averages, fold changes, and
Human Affymetrix exon array data were compared for REX+ hESCs and derived CPs; Cythera, HUES6 hESCs and derived NPs; fetal human central nervous system stem cells (hCNS-scns); and 11 adult tissues, processed by RMA together. (A) Relative changes in gene expression (log2 fold, relative to the global expression mean) for all samples were clustered by array (rather than genes) for any Ensembl gene with a relative change in gene expression >2. Biological triplicates are indicated for each tissue or cell line. (B) Gene expression profiles for this combined dataset and for specific markers of CP-specification (columns 1 and 2) and for pluripotency (column 3).
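The preprocessing behind panel A (gene-level log2 values expressed relative to a mean, then filtered for variability before clustering arrays) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the study's code: rows are genes and columns are arrays, the "global expression mean" is taken to be each gene's mean across all arrays, and the ">2" cutoff is read as a 2-fold deviation (a log2 difference of 1). The function names and the toy matrix are hypothetical.

```python
import numpy as np

def center_on_gene_mean(log2_expr: np.ndarray) -> np.ndarray:
    # Rows are genes, columns are arrays; subtract each gene's mean across all arrays.
    return log2_expr - log2_expr.mean(axis=1, keepdims=True)

def filter_variable_genes(log2_expr: np.ndarray, min_fold: float = 2.0):
    # Assumed reading of the cutoff: keep genes deviating from their mean by more
    # than min_fold (i.e. log2(min_fold) on the log2 scale) in at least one array.
    rel = center_on_gene_mean(log2_expr)
    keep = np.abs(rel).max(axis=1) > np.log2(min_fold)
    return rel[keep], keep

def array_correlation(rel: np.ndarray) -> np.ndarray:
    # Pearson correlation between arrays (columns): a possible input for clustering
    # arrays rather than genes.
    return np.corrcoef(rel.T)

# Hypothetical toy data: 3 genes x 4 arrays (log2 values).
expr = np.array([
    [5.0, 5.0, 9.0, 9.0],  # deviates 4-fold from its mean -> kept
    [7.0, 7.2, 7.1, 6.9],  # nearly flat -> dropped
    [3.0, 8.0, 3.0, 8.0],  # strongly variable -> kept
])
rel, keep = filter_variable_genes(expr)
print(keep)  # [ True False  True]
```

If the cutoff was instead meant on the log2 scale, passing `min_fold=4.0` gives that behavior. The distance metric and linkage actually used for the published clustering are not specified here.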
The primary filters for identifying alternative exons were a conservative SI fold change >1 (equivalent to a 2-fold difference in expression relative to constitutive expression levels), an SI
To visualize alternative exons in the context of genes, we wrote a prototype plugin, currently in development, for the network visualization software Cytoscape
To identify protein domains and motifs potentially modified/disrupted by AS, a series of databases is built with each build of AltAnalyze (stored as distributed text files). These databases consist of an aligning and nonaligning protein (competitive isoforms) for all probe sets (
Theoretical transcripts with distinct exon compositions are shown. (A) Distinct alternative (Alt.) exon annotations for five mRNA transcripts, where the filled boxes are sequences retained in the processed mRNA transcript. Black filled boxes are exons common to all isoforms (constitutive). AltAnalyze considers all alternative exon annotations as AS except for alternative-N-terminal exons (expressed through alternative promoter selection). (B) All pairs of mRNA transcripts, one that aligns and one that does not align to an exon array probe set, are compared to identify a single pair of competitive isoforms that minimally differ in exon composition. Curved arrows indicate all possible competitive transcript comparisons. The top selected competitive isoforms (dashed box) have the fewest exon differences and the most exons in common. AltAnalyze selects this transcript pair for analysis of downstream protein domain/motif composition, after corresponding protein sequences are selected. (C) Protein domains and motifs differing between competitive isoforms. Exons for the two transcripts are labeled in order, 5′ to 3′, with protein sequence and Uniprot features (UPF) or InterPro regions (IPR) corresponding to each exon displayed above or below them. Yellow filled boxes indicate domains and motifs differing between the competitive isoforms. (D) Domains and motifs directly aligning to a probe set's genomic position. A theoretical probe set aligning to the intron of a gene is shown. InterPro domains/motifs whose genomic position (genomic exon start and exon end position) overlaps with a given probe set (genomic start and end position) are shown with a yellow filled box. Rather than comparing two protein sequences, as in the competitive isoform analysis, the direct genomic alignment method requires only a single protein sequence.
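The selection rule in panel B can be made concrete with set arithmetic on exon identifiers. The Python sketch below is hypothetical and is not AltAnalyze's code: each transcript is modeled as a frozenset of exon IDs, transcripts containing the probed exon form the aligning group and the rest form the nonaligning group, and the chosen pair minimizes the number of differing exons, breaking ties by the larger number of shared exons. Protein translation and the alternative-N-terminal exclusion are deliberately omitted.

```python
from itertools import product

def competitive_isoforms(transcripts: dict[str, frozenset[str]], probed_exon: str):
    """Return (aligning_id, nonaligning_id) for the pair with the fewest exon differences.

    Ties on the number of differing exons go to the pair sharing the most exons.
    Returns None if every transcript contains, or every transcript lacks, the probed exon.
    """
    aligning = [(tid, exons) for tid, exons in transcripts.items() if probed_exon in exons]
    nonaligning = [(tid, exons) for tid, exons in transcripts.items() if probed_exon not in exons]
    if not aligning or not nonaligning:
        return None

    def score(pair):
        (_, a), (_, b) = pair
        return (len(a ^ b), -len(a & b))  # fewer differences first, then more shared exons

    (best_a, _), (best_b, _) = min(product(aligning, nonaligning), key=score)
    return best_a, best_b

# Hypothetical transcripts of one gene, each modeled as its set of exon IDs.
transcripts = {
    "T1": frozenset({"E1", "E2", "E3", "E4", "E5"}),
    "T2": frozenset({"E1", "E2", "E4", "E5"}),  # skips E3 only
    "T3": frozenset({"E1", "E4"}),
    "T4": frozenset({"E1", "E2", "E3", "E6"}),
}
print(competitive_isoforms(transcripts, "E3"))  # ('T1', 'T2')
```

Because `min` keeps the first of several equally scored pairs, exact ties fall back to dictionary insertion order; a production tool would need an explicit, documented tie-break.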
In addition to these protein annotations, putative miRNA binding sites from PicTar
To segregate transcriptionally regulated genes and AS events into CP-specific and common CP-NP differentiation patterns, we used a two-way ANOVA strategy in which the LIMMA package in Bioconductor
Gene Ontology (GO) and pathway over-representation were evaluated with the program GO-Elite (
Alternative exons were selected for confirmation with RT-PCR using the following criteria: (1) prior evidence of AS or presence of predicted miRNA binding sites, (2) a small number of alternative exons per gene, and (3) predicted domain/motif changes. The second criterion was applied to favor splice events where both isoforms could be amplified in a single reaction and where domain/motif-level changes could be attributed to the splicing event examined. Fifty alternative exon sequences were selected for confirmation. Optimal flanking, isoform-specific, or constitutive primers were designed with a custom implementation of Primer3 called AltPrimer (
To identify genes with common CP-NP differentiation or CP-specific AS patterns, we isolated homogenous populations of hESCs and cardiac precursors and compared them to a dataset of neural precursor differentiation
Homogenous populations of undifferentiated hESCs and CPs were isolated by modifying the H9 ESC line to stably express neomycin- and puromycin-resistance genes, driven by the pluripotent-cell-specific REX-1 and CP-specific myosin heavy chain alpha (MHCα or MYH6) promoters, respectively (see
Gene expression values from these exon arrays were determined for constitutive aligning or all probe sets for each Ensembl gene (
To identify alternative exons in day 40 CPs versus REX+ hESCs and link the results to predicted sequence changes that might alter protein expression/function, we created a free, open-source application called AltAnalyze (
Of the 13,583 genes with evidence of expression, 16% (2,106) were predicted to have at least one alternative exon in the differentiation to CPs, as compared to 3,044 genes with up- or downregulated gene expression (
Category | Number of genes |
Ensembl genes with gene expression values | 29,151 |
Genes examined for AS | 13,583 |
Alternative splicing (AS) | 876 |
Alternative promoter (AP) | 170 |
AS and AP | 152 |
No evidence of AS or AP | 908 |
Number of genes differentially expressed and alternatively spliced. Gene expression values were calculated for 29,151 Ensembl gene identifiers, of which only 13,583 were examined for AS. Genes examined for AS were required to have constitutive annotated probe sets expressed in both undifferentiated H9 ESCs and derived CPs. Transcriptional activity of genes was determined by using either constitutive probe sets, if present, or all probe sets, when not present. Genes with alternative exons are unique Ensembl genes reported by AltAnalyze with alternatively expressed probe sets. Genes with alternative exons only aligning to AS annotations or only to AP annotations are reported along with genes that associate with both AS and AP (multiple exons or one exon with multiple annotations) and no evidence of AS or AP.
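The gene-level expression rule in this caption, combined with the primary SI filter described in the Methods (SI fold change >1, equivalent to a 2-fold change relative to constitutive expression), can be sketched as follows. This is a minimal Python illustration, not AltAnalyze's implementation: inputs are assumed to be RMA-summarized log2 group means, gene expression is assumed to be the simple mean of constitutive probe sets (falling back to all probe sets), and the sign is chosen to match the convention of the table of confirmed events (negative SI means higher relative exon expression in CPs). All names and numbers are hypothetical.

```python
from statistics import mean

def gene_level_log2(probesets: list[tuple[float, bool]]) -> float:
    """Mean log2 signal of constitutive probe sets, or of all probe sets if none are constitutive."""
    constitutive = [value for value, is_constitutive in probesets if is_constitutive]
    values = constitutive if constitutive else [value for value, _ in probesets]
    return mean(values)

def splicing_index(exon_hesc: float, gene_hesc: float, exon_cp: float, gene_cp: float) -> float:
    """log2 splicing index; negative means the exon is relatively higher in CPs."""
    normalized_hesc = exon_hesc - gene_hesc  # log2(exon / gene) in hESCs
    normalized_cp = exon_cp - gene_cp        # log2(exon / gene) in CPs
    return normalized_hesc - normalized_cp

def passes_si_filter(si: float, min_abs_si: float = 1.0) -> bool:
    # |SI| > 1 on the log2 scale is a 2-fold change relative to gene-level expression.
    return abs(si) > min_abs_si

# Hypothetical group means (log2): two constitutive probe sets plus the alternative exon.
gene_hesc = gene_level_log2([(8.0, True), (8.2, True), (5.0, False)])
gene_cp = gene_level_log2([(7.9, True), (8.1, True), (6.6, False)])
si = splicing_index(exon_hesc=5.0, gene_hesc=gene_hesc, exon_cp=6.6, gene_cp=gene_cp)
print(round(si, 2), passes_si_filter(si))  # -1.7 True
```

Normalizing each exon to its own gene's level is what lets the SI separate splicing from transcription: a uniform rise in every probe set of a gene leaves the SI at zero.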
As in earlier studies of NP differentiation
Although analysis of protein composition determined by AS is not entirely novel
For alternative exons regulated during CP differentiation, predicted changes in domain/motif and miRNA binding site composition were examined with AltAnalyze. The majority of alternative exons during CP differentiation (79%) corresponded to competitive mRNA isoforms (sharing some exons but not the probed exon) (
To determine whether certain domains or motifs were over-represented by both methods, we examined the associated over-representation z scores and permutation-based p values for all CP differentiation ARGs (
In addition to protein domains/motifs, 12.5% of ARGs associated with CP differentiation (264 of 2,106) resulted in the predicted gain or loss of at least one miRNA binding site (
216 |
11,085 |
Upregulated in hESCs | 202 |
Downregulated in hESCs | 60 |
Up- and downregulated in hESCs | 2 |
Transcriptionally regulated genes annotated as miRNAs and genes containing alternative exons overlapping with predicted miRNA binding sites. Analysis of gene transcription data from the Affymetrix exon array highlights 26 Ensembl annotated miRNA genes differentially expressed with CP differentiation (up- or downregulated >2 with a
Interestingly, a recent study also observed alternative expression of exon regions containing miRNA binding sites, when these cells began to actively divide
Several predicted splicing events identified in CPs were verified in previous analyses of hESC differentiation. These included genes that underwent AS in the differentiation to NPs (SLK, SORBS1
(A) Expression of splice isoforms confirmed by RT-PCR of genes with prior evidence of AS. ANXA7, SLK, NF1, and VCL were confirmed with flanking primers, and PKM2 and ATP2A2 with isoform-specific primers. Agarose gel images show REX+ hESC RNA on the left side of each gel and CP RNA on the right. (B–C) Exon structure (top graphic) and expression profiles (bottom graphic) for ANXA7 and ATP2A2. (B) SI fold changes are shown for probe sets aligning to exons and introns in the prototype Cytoscape plugin SubgeneViewer. Light red boxes indicate upregulation for CP versus hESC; blue boxes, downregulation; gray boxes, no significant change; white boxes, no probe set detected above expression thresholds. Probe set expression values (log2) are displayed for both CP (top graphs) and NP differentiation (bottom graph), ranked in order of genomic position on the
At least three of these verified AS events correspond to modified protein function/expression, producing differences in cell metabolism (PKM2)
For vinculin (VCL), the gain of a vinculin/alpha-catenin sequence by AS is associated with altered ligand binding properties of the muscle form of the protein
The combination of cardiac and neural differentiation data provides a unique opportunity to define molecular profiles unique to, or shared by, specific differentiation paradigms. To identify AS events during CP differentiation that correspond to CP-specification or inhibition/promotion of differentiation, we used two-way ANOVA to compare alternative isoform expression between cardiac and neural differentiation (
When applied to alternative exons regulated during CP differentiation, this ANOVA method identified 565 genes with a common CP-NP differentiation expression pattern and 414 genes with a CP-specific expression pattern (
AS predictions with evidence of (A) a common CP-NP differentiation or (B) a CP-specific expression pattern, relative to undifferentiated hESCs. Adjacent to each heatmap are alternative exons, ranked according to the ANOVA false-discovery rate (FDR) p value. Next to this p value are the SI fold changes reported by AltAnalyze (negative values indicate increased alternative exon expression in CPs and vice versa). Gene names in blue have prior evidence of AS during hESC differentiation; genes in red have prior evidence of AS during cardiac differentiation. GO terms and WikiPathways over-represented among genes with a (C) common CP-NP or (D) CP-specific AS pattern are graphed.
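The segregation into common CP-NP and CP-specific patterns can be pictured as a 2×2 two-way ANOVA on per-array normalized exon expression (log2 exon minus log2 gene), with lineage (cardiac, neural) and state (undifferentiated, differentiated) as factors. The Python sketch below uses an ordinary fixed-effects F test on a balanced design, not the moderated LIMMA statistics or FDR correction used in the study. The classification rule (a significant interaction is read as CP-specific; otherwise a significant state effect is read as common CP-NP) and the hard-coded critical value F(1, 8) ≈ 5.32 for triplicates at α = 0.05 are assumptions for illustration, and all names are hypothetical.

```python
import numpy as np

F_CRIT_1_8 = 5.32  # approx. 95th percentile of F(1, 8): 2x2 design with triplicate arrays

def two_way_anova_2x2(y: np.ndarray) -> dict[str, float]:
    """F statistics for a balanced 2x2 design.

    y has shape (2, 2, n): axis 0 = lineage (cardiac, neural),
    axis 1 = state (undifferentiated, differentiated), axis 2 = replicate arrays.
    """
    n = y.shape[2]
    grand = y.mean()
    cell = y.mean(axis=2)            # 2x2 cell means
    lineage = y.mean(axis=(1, 2))    # marginal mean per lineage
    state = y.mean(axis=(0, 2))      # marginal mean per state
    ss_lineage = 2 * n * np.sum((lineage - grand) ** 2)
    ss_state = 2 * n * np.sum((state - grand) ** 2)
    interaction = cell - lineage[:, None] - state[None, :] + grand
    ss_interaction = n * np.sum(interaction ** 2)
    ss_error = np.sum((y - cell[:, :, None]) ** 2)
    ms_error = ss_error / (4 * (n - 1))  # each effect has 1 df in a 2x2 design
    return {
        "lineage": ss_lineage / ms_error,
        "state": ss_state / ms_error,
        "interaction": ss_interaction / ms_error,
    }

def classify_pattern(f: dict[str, float], f_crit: float = F_CRIT_1_8) -> str:
    if f["interaction"] > f_crit:
        return "CP-specific"    # differentiation effect differs between lineages
    if f["state"] > f_crit:
        return "common CP-NP"   # same differentiation effect in both lineages
    return "unclassified"

# Hypothetical data: identical small replicate offsets added to each 2x2 cell mean.
noise = np.array([0.0, 0.1, -0.1])

def simulate(cell_means) -> np.ndarray:
    return np.asarray(cell_means, dtype=float)[:, :, None] + noise

common = simulate([[-1, 1], [-1, 1]])    # exon shifts in both lineages
cp_only = simulate([[-1, 1], [-1, -1]])  # exon shifts only in cardiac differentiation
print(classify_pattern(two_way_anova_2x2(common)))   # common CP-NP
print(classify_pattern(two_way_anova_2x2(cp_only)))  # CP-specific
```

Reading a significant interaction as CP-specific presumes, as in this analysis, that the exons were first selected as regulated during CP differentiation; by itself, an interaction only says that the two lineages differ.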
When pathway analysis was applied to AS genes with a common CP-NP differentiation splicing pattern, the most enriched ontology categories/pathways were water binding, RNA and chromatin binding, integrin-mediated signaling, microtubule binding, extracellular matrix, and lipid transport (
Fifty alternative exons with a CP-specific or common CP-NP differentiation pattern were selected for further confirmation and in-depth analysis of domain/miRNA binding sites. When applied to a previously described dataset with comprehensive validation (knockdown of the splicing factor PTB)
RT-PCR results for a panel of predicted CP differentiation-splicing events with both a common CP-NP differentiation and CP-specific ANOVA pattern. Genes are categorized based on predicted domain/motif changes: truncation, disruption, modification, exchange or no associated predictions. The higher band in each gel image is the longer isoform with exon inclusion (in); the lower band is the shorter isoform with exon exclusion (ex), unless indicated as a constitutive (cs), mutually exclusive (mx), or miRNA (miR)-containing exon. Additional confirmed genes are shown in
Genes with some of the most pronounced confirmed changes and a common CP-NP differentiation AS pattern included those encoding serine/threonine kinases (SLK, FER, FYN, MARK3), spectrin-actin binding (SPTBN1, ADD3), cell cycle (MADD, PCBP4, SEPT6), and cell-cell communication (TJP1) proteins. Similarly regulated genes with a CP-specific AS pattern included those encoding calcium signaling (ASPH, ANXA7, ATP2A2), cell metabolism (PKM2, OGDH), cell cycle (NUMB, UBE4B, CSDE1, NF1, ANXA7), and double-stranded RNA binding (STAU1) proteins. Several of these confirmed AS events appeared to have cardiac/muscle-specific and common CP-NP differentiation patterns when examined with the entire adult tissue/cell line exon-array panel. This was the case for the genes KIF13A and CSDE1, each of which showed the highest alternative exon expression for hESCs or cardiac/muscle cells, respectively, when compared to all other tissues and cells (
(A) For the genes KIF13A and CAPZB, log2 expression values for exon aligning probe sets are shown; probe sets are ranked in order of genomic position on the
The possible effects of AS on protein function are diverse and therefore challenging to predict bioinformatically. Since AltAnalyze identified confirmed domain/motif changes that correlate with changes in protein function (e.g., PKM2
Of the 44 confirmed splicing events for CP differentiation, 34 were initially predicted to alter protein domain or motif composition (
Gene symbol | ANOVA pattern | SI | Exon ID | GE fold Δ | Exon annotations | Δ protein length | Primary functional change |
CSDE1 | cardiac | −4.01 | E3 | 0.05 | cassette exon | 767->767 | |
MADD | diff | −2.04 | E28 | 0.24 | cassette exon | 1608->1581 | |
NF1 | cardiac | −2.28 | E57 | 1.07 | cassette exon | 2127->2145 | |
NUMB | diff | 1.59 | E14 | −0.1 | cassette exon | 651->603 | |
SAPS2 | diff | −2.27 | E3|E4 | 0.17 | cassette exon | 932->966 | |
CDC42 | diff | −1.94 | E9-1 | 0.25 | alt-C-term | 116->191 | miRNA (gain), GTPase Rab/Ras/Rho (gain) |
CDC42BPA | cardiac | −1.8 | E37 | 0.28 | cassette exon | 1719->1045 | Kinase (loss), PAK box Rho BIND (gain/loss) |
CLK1 | cardiac | −2.83 | I7 | 0.4 | intron-retention | 454->134 | Kinase, H+ acceptor (loss) |
EWSR1 | cardiac | −2.03 | E16-2 | −0.3 | intron-retention | 600->146 | DNA BIND, RRM (loss) |
HDAC9 | cardiac | −3.12 | E4 | 0.78 | cassette exon | 1066->21 | Interaction w/MEF2, HDAC region (loss) |
HIF3A | diff | −1.67 | E8-3 | 0.06 | Alt-5′ | 237->363 | DNA BIND, HLH, PAS, Nuc_translocat (gain) |
LRRFIP1 | cardiac | −3.34 | E5 to E9 | 0.68 | cassette exon | 752->640 | DNA BIND, PT, PS (loss) |
NAV2 | diff | −2.96 | E21|E22 | 0.77 | cassette exon | 1493->2429 | Calponin_act_bd, Na_channel4 (gain) |
OGDH | cardiac | −3.14 | E6 | −0.04 | cassette exon | 1023->567 | 2 oxoglutarate_DH_E1, Transketo_Cen_R (loss) |
WNK2 | diff | −2.16 | E28 | −0.28 | cassette exon | 45->1004 | Kinase, H+ acceptor, PS (gain) |
FER | diff | 1.47 | E4 | 1.54 | cassette exon | 163->822 | Kinase, H+ acceptor, SH2 (gain) |
ABI2 | cardiac | −1.8 | E9 | −0.19 | cassette exon | 401->513 | Neu_cyt_fact_2 (gain) |
ANXA7 | cardiac | −4.04 | E6 | −0.32 | cassette exon | 466->488 | Annexin (gain/loss), Pro-rich (loss) |
ASPH | cardiac | −1.66 | E7|E8 | 0.96 | cassette exon | 313->225 | miRNA, Asp-b-hydro N-term (gain/loss), Cytoplasmic/Lumenal Topo, PS (loss) |
ATP2A2 | cardiac | 1.42 | E20|E22 | 1.38 | bleedingExon | 1042->997 | miRNA, Cytoplasmic Topo (loss) |
KIF13A | diff | 2.03 | E41 | 1 | cassette exon | 1805->1770 | PS (loss) |
NEDD4 | diff | 1.12 | E7 | −0.02 | cassette exon | 1000->1247 | C2 Domain (loss) |
PCBP4 | diff | 1.35 | E6-3 | 0.39 | intron-retention | 369->397 | KH 1 (loss) |
SPTBN1 | diff | −1.96 | E34-2 | 0.45 | bleedingExon | 2377->2155 | miRNA (gain), Carbohyd-O-linked, Spectrin, PH (loss) |
STAU1 | diff | −1.52 | E7 | −0.04 | cassette exon | 496->577 | dsRNA BIND (gain) |
UBE4B | cardiac | −1.97 | E9 | −0.41 | cassette exon | 1173->1302 | Phosphopantetheine attachment (loss) |
FYN | diff | 1.57 | E12 | −0.1 | cassette exon | 534->482 | Kinase (gain/loss) |
PKM2 | diff | −2.64 | E12 | 0.32 | cassette exon | 531->531 | Kinase (gain/loss), ISC, FBP, PT (loss) |
TCF3 | diff | −1.33 | E18 | −0.75 | cassette exon | 654->651 | AnnexinVII (loss), bHLH (gain/loss) |
ADD3 | diff | 2 | E16 | 0.26 | cassette exon | 706->674 | Oxred_Ald_Fedxn_C-term (gain/loss) |
CAPZB | cardiac | −3.77 | E12 | −0.21 | cassette exon | 272->277 | Factin_cap_beta (gain/loss) |
DNM1L | cardiac | −2.18 | E3 | −0.1 | cassette exon | 736->751 | Dynamin GTPase (gain/loss) |
HISPPD2A | diff | 1.68 | E49-1 | −0.17 | alt-5′|cassette exon | 1433->1412 | HisAc_phsphtse (gain/loss) |
MARK3 | diff | −1.52 | E19 | 0.04 | cassette exon | 729->744 | Kinase (gain/loss) |
SLK | diff | 2.44 | E15 | 0.64 | cassette exon | 1235->1204 | Kinase like (gain/loss) |
TJP1 | diff | −2.27 | E24 | −0.03 | cassette exon | 1676->1748 | ZU5 Domain (gain/loss) |
VCL | diff | −1.08 | E23 | −0.1 | cassette exon | 1066->1134 | Vinculin/catenin (gain/loss) |
VCL | diff | −1.64 | E10-2 | −0.1 | alt-5′ | 1066->222 | Vinculin/catenin, PS, PT (loss) |
VPS39 | diff | −1.53 | E3 | 0.08 | cassette exon | 875->886 | Citron homology, WD40 (gain/loss) |
C6orf134 | cardiac | −1.37 | E11 | 0.19 | alt-C-term | 398->300 | miRNA (gain) |
DERP6 | diff | −1.68 | I8 | −0.45 | intron-retention | 316->279 | miRNA (gain) |
LEFTY1 | diff | 1.19 | E4 | −0.68 | | 366->366 | miRNA (loss) |
MAFB | diff | −1.01 | E1-5 | 0.41 | | 323->323 | miRNA (gain) |
SEPT6 | diff | 2.4 | E11 | 0.59 | cassette exon | 427->429 | miRNA (loss) |
Splicing, protein, and miRNA binding site annotations are shown for alternative exons confirmed by RT-PCR. For each alternative exon, the table lists the corresponding gene name (Gene symbol), ANOVA AS differentiation pattern (ANOVA pattern: diff = common CP-NP differentiation, cardiac = CP-specific), splicing index (SI) fold change, relative AltAnalyze exon/intron position in the gene structure (Exon ID), gene-expression (GE) fold change (Δ) for the gene, AS annotations that correspond to the Exon ID, change in predicted protein length (length of the competitive protein isoforms in hESC->CP), and top corresponding domain/motif or miRNA binding site annotations (Primary functional change). Negative SI fold changes indicate increased alternative exon expression in CPs and vice versa. For Primary functional change annotations, gain indicates increased expression, in CPs versus hESCs, of an alternative exon overlapping with that domain; loss indicates a relative decrease in expression; and gain/loss indicates that the domain/motif is present in both protein isoforms but with different sequence. PS = phosphoserine modified residue, PT = phosphotyrosine modified residue, miRNA = miRNA binding site. Complete annotations can be found in
Twenty-two of the 34 alternative exons were predicted to have domain/motif changes with both direct genomic alignment and competitive isoform analysis. These alternative exons should directly change the sequence or disrupt a domain/motif and thus represent higher-confidence predictions. Only one gene, LEFTY1, was predicted to alter the sequence of a domain (transforming growth factor β) with direct genomic alignment and not the competitive isoform analysis. In all but four of these 34 alternative exons, changes in domain/motif composition were also predicted by the exhaustive comparison method. Three of these four alternative exons were present in both untranslated and coding regions of the different possible isoforms. Since the exhaustive method is biased towards selection of competitive isoforms that produce no change in domain/motif composition, only competitive isoforms where the alternative exon was present in an untranslated region were selected. Of the remaining 30 alternative exons, 17 had identical domain/motif predictions with the exhaustive and the original competitive isoform analysis and 13 had almost identical predictions (largely the same but sometimes fewer domain/motif changes) with the exhaustive method (
Predicted changes in protein domain/motif composition for confirmed splice events could be classified into four groups: truncation, disruption, exchange, and modification (
Nine of the confirmed AS events (CDC42, CLK1, EWSR1, FER, HDAC9, LRRFIP1, OGDH, VCL (exon 10-2), and WNK2) were predicted to introduce a premature stop-codon into the transcript, causing either protein truncation (>30% decrease in protein length) or absence of translation (e.g., nonsense-mediated decay)
Since these splicing events are predicted to significantly reduce protein size and domain/motif composition, there is a much higher likelihood that these changes would disrupt protein function or prevent protein translation. For example, in cardiomyocytes, the large upregulation (∼8-fold by AltAnalyze) of a cassette exon in the histone deacetylase HDAC9 protein is predicted to truncate the reference isoform from 1066 to 21 aa. HDAC9 typically represses expression of myocyte enhancer factor 2 (MEF2), a potent cardiac inducing transcription factor
In addition to protein truncation, removal of protein sequence was also predicted to disrupt domains and motifs in 10 of the confirmed AS events (ABI2, ANXA7, ASPH, ATP2A2, KIF13A, NEDD4, PCBP4, SPTBN1, STAU1, and UBE4B). In CPs, these predictions include disruption of the C2 calcium-dependent membrane-targeting domain in NEDD4 by exclusion of a 72-aa block of exons; intron retention in PCBP4, which produces a shorter N-terminus that disrupts a KH domain; and disruption of a phosphopantetheine attachment site in UBE4B by insertion of a cassette exon encoding 129 aa. In hESCs, disruption of presumptive domains was observed with exclusion of a 61-aa-encoding exon in ABI2, which eliminated the predicted neutrophil cytosol factor domain, and with removal of the first 9 aa of the double-stranded RNA binding domain in STAU1.
Since these domains are crucial for the annotated functions of these genes, the predicted sequence loss or disruption could affect their function considerably. An example is PCBP4, an RNA-binding protein and regulator of apoptosis characterized by the presence of a KH domain. The PCBP4 isoform with an intact KH domain, which can suppress cell proliferation by inducing apoptosis, is largely absent in hESCs. Since PCBP4 has a common CP-NP differentiation-splicing pattern, AS of this gene may be important in maintaining pluripotency in hESCs.
Two other genes, aspartyl beta-hydroxylase (ASPH) and spectrin, beta, non-erythrocytic 1 (SPTBN1), both had prior evidence of functionally distinct splice variants, linked in this case to the regulation of cardiac physiology. In the case of ASPH, the cardiac-enriched form specifically complexes with cardiac contractile components (calsequestrin, triadin, and the ryanodine receptor)
Unlike the disruption of a critical protein domain, the functional impact of an altered domain sequence is less clear. As shown for PKM2, mutually exclusive splicing can alter the presence of functionally important residues without significantly changing the overall protein sequence. This was also the case for the E2A immunoglobulin enhancer-binding factor TCF3, in which a DNA-binding domain is altered, and for the serine/threonine and protein-tyrosine kinase FYN, in which a kinase domain is altered; in both genes, the change results from the mutually exclusive exchange of exons of similar length. Interestingly, the mutually exclusive isoforms of TCF3 and FYN have different biochemical properties
Although some confirmed AS events significantly changed a domain sequence, the domain was still called present in both alternative isoforms. This was the case for nine genes with confirmed alternative exons (ADD3, CAPZB, DNM1L, HISPPD2A, MARK3, SLK, TJP1, VCL, and VPS39). Specific examples include the removal of 32 aa in the C-terminal aldehyde ferredoxin oxidoreductase domain of the ADD3 protein, insertion of 13 aa into the dynamin GTPase region of DNM1L, modification of the C-terminal end of the F-actin capping beta subunit region of CAPZB, and removal of 11 aa from the N-terminal Citron homology domain (CNH) of VPS39. In each case except VPS39, altering the sequence has unknown consequences for protein function. VPS39 is a putative adaptor protein that displays downregulation of a cassette exon in hESCs relative to CPs. The CNH domain in this protein is required for the clustering and fusion of late endosomes and lysosomes
At least two confirmed AS events (NUMB and MADD) had no difference in domain-level predictions, but did have known functional isoform differences associated with the AS events
A number of recent studies demonstrated a critical connection between miRNA expression and the maintenance of pluripotency or the differentiation of cardiac cells from hESCs. In our exon-array gene expression analysis, genes for 26 miRNAs were up- or downregulated during differentiation to CPs and NPs, including previously implicated pluripotency (mir-302a, 302b)
(A) Expression profiles of two previously characterized miRNAs, mir-302a and mir-133-1, from combined tissue/cell-line gene expression data. (B) RT-PCR isoform expression of genes with putative miRNA binding sites within the regulated probe set. The presence of one or more putative miRNA binding sites is indicated by the notation miR. (C–E) The 3′ regions of three genes are shown, with the regulated isoforms displayed from the UCSC genome browser along with regulated probe sets and putative miRNA binding site locations. Exons are indicated by boxes, UTR regions by thinner boxes, and introns by lines with overlapping arrows. Each gene (MAFB, SEPT6, and CDC42) represents a distinct mode of exon regulation that leads to altered miRNA binding site inclusion: a shorter 3′UTR, alternative cassette exon inclusion, and an alternative C-terminal exon, respectively. Both MAFB and SEPT6 are on the reverse genomic strand, where orientation is 3′ to 5′. The term “multiple algorithms” indicates that two or more miRNA binding site prediction algorithms (PicTar, miRanda, miRbase, or TargetScan) predicted a binding site in aligning probe sets.
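The probe-set-level miRNA annotation shown in panels C–E reduces to an interval-overlap test plus a count of supporting prediction algorithms. The minimal Python sketch below assumes inclusive genomic coordinates and predictions supplied as (algorithm, miRNA, interval) tuples; the function names, miRNA labels, and coordinates are hypothetical, and only the "two or more algorithms" wording is taken from the caption.

```python
from collections import defaultdict
from typing import NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive genomic start
    end: int    # inclusive genomic end

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive genomic intervals share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end

def mirna_sites_in_probeset(probeset: Interval, predictions) -> dict[str, set[str]]:
    """Map each miRNA with a site overlapping the probe set to the algorithms predicting it."""
    hits: dict[str, set[str]] = defaultdict(set)
    for algorithm, mirna, site in predictions:
        if overlaps(probeset, site):
            hits[mirna].add(algorithm)
    return dict(hits)

def support_label(algorithms: set[str]) -> str:
    # Mirrors the figure's wording: two or more algorithms -> "multiple algorithms".
    return "multiple algorithms" if len(algorithms) >= 2 else next(iter(algorithms))

# Hypothetical probe set and predictions (coordinates are illustrative only).
probeset = Interval("chr1", 1000, 1100)
predictions = [
    ("PicTar", "miR-x", Interval("chr1", 1050, 1057)),
    ("TargetScan", "miR-x", Interval("chr1", 1049, 1056)),
    ("miRanda", "miR-y", Interval("chr1", 1095, 1102)),  # partial overlap still counts
    ("PicTar", "miR-z", Interval("chr1", 2000, 2007)),   # outside the probe set
]
for mirna, algorithms in sorted(mirna_sites_in_probeset(probeset, predictions).items()):
    print(mirna, support_label(algorithms))
# miR-x multiple algorithms
# miR-y miRanda
```

Because the overlap test compares genomic coordinates only, reverse-strand genes such as MAFB and SEPT6 need no special handling here; strand matters only when displaying the 5′ to 3′ orientation.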
Although much effort has been devoted to defining the expression patterns and novel targets of miRNAs, little is known about the role of AS in miRNA binding site inclusion in processed mRNA transcripts. Traditional gene expression microarrays focus on the coding regions of transcripts and ignore the noncoding exons, which can be alternatively spliced to produce different C-terminal exons or 3′UTRs of different lengths. However, exon-tiling arrays allow us to assess mRNA features in tandem with existing predictions for miRNA binding site position on a global basis.
Our analysis identified 264 genes with putative miRNA binding sites that overlap with alternative exons, including those undergoing AS. We tested 10 of these alternative exons by RT-PCR and confirmed nine, including the SPTBN1 and ASPH variants described earlier. Putative miRNA binding sites were included or excluded as a result of alternative cassette exons (ASPH, SEPT6), alternative C-terminal exons (CDC42, C6orf134), bleeding exons (SPTBN1), intron retention (ATP2A2, DERP6), or 3′UTRs with a longer or shorter sequence (LEFTY1, MAFB) (
Examination of miRNAs with previously established hESC or cardiac differentiation expression patterns identified binding sites for mir-302a, 302c (ESC), and mir-26a (cardiac) in the alternative bleeding exon of SPTBN1 and the aforementioned binding sites in the 3′UTR of ATP2A2. These data suggest a new, largely AS-dependent mechanism for miRNA regulation of such genes. Since miRNAs can promote and inhibit the translation of targets depending on cell cycle stage
Using high-density expression profiling, a new method for isolating cardiomyocytes, and novel bioinformatics methods (AltAnalyze), we characterized AS along distinct developmental pathways that influence both pluripotency and the commitment to cardiac and neural lineages. In addition to new insights into these processes, these results offer novel targets, at the level of specific splice isoforms, for driving the differentiation of pluripotent cells into distinct lineages and for inducing pluripotency from adult cells.
We identified genes that undergo AS during differentiation and observed several global trends suggesting that functional elements, such as protein domains and miRNA binding sites, are coordinately regulated by AS. Many alternative exons highlighted in our analysis were predicted to disrupt or modify functionally important sequences, such as DNA-binding and protein kinase domains, and are therefore likely to impact protein function. Several of our domain-level predictions also correlated with known changes in protein isoform function or expression as a result of AS
We identified and confirmed many splicing events that occurred along pathways of apoptosis and proliferation. Two genes confirmed by RT-PCR encode the apoptosis activators PCBP4 and MADD. The apoptosis-inducing isoforms of both genes were downregulated in hESCs but not in CPs. Conversely, RT-PCR showed that the proliferation-promoting isoform of NUMB is expressed in hESCs but is undetectable in CPs, while the anti-proliferation isoform is upregulated in CPs. These results suggest the intriguing possibility that splicing may coordinately alter the functional repertoire of distinct members of the same pathway to elicit a biological effect, in this case self-renewal in hESCs. We also observed AS of the apoptotic regulators CSDE1 and UBE4B along with the previously described tumor suppressor genes ANXA7
Although only one confirmed CP-specific AS event (ASPH) was clearly linked to the regulation of cardiac physiology, several other novel CP-specific AS events were predicted to alter the composition of critical protein domains (CAPZB, UBE4B, HIF3A, HDAC9). One of the most intriguing was AS of the cardiac inhibitor HDAC9, which results in a highly truncated or nonexpressed form specifically in CPs. These data further support a role for AS in the direct specification of cardiac precursors.
Finally, analysis of the overlap between predicted miRNA binding sites and alternative exons revealed a potential mechanism by which specific cell types may regulate miRNA activity independently of miRNA expression. Such regulation involves AS of exons and differential expression of distal terminal exons, although the mechanism regulating exon length is unclear. Two recent analyses have further demonstrated the interaction between miRNAs and alternatively spliced isoforms
While we present several new analyses in this study, it will be essential to experimentally validate these protein and miRNA-level predictions. Additional computational analyses, such as comparative genomic sequence analysis, will also be important for delineating common and distinct cis-regulatory sequences that may regulate cardiac and neuronal splicing. Further refinement of our algorithm to decrease false negatives, similar to other approaches
Segregation of transcriptional profiles by comparison of neural and cardiac differentiation. Patterns of gene expression are shown for two analyzed pattern groups, (A) common to neural and cardiac differentiation or (B) specific to CPs. Adjacent to each heatmap are the top-ranked genes based on ANOVA p values for each specific pattern; genes highlighted in blue are associated with ESCs or self-renewal, and genes in red with cardiac specification. Gene Ontology (GO) terms and pathways enriched in the (C) common or (D) cardiac-specific differentiation pattern groups are displayed alongside the number of associated gene changes in each of the two pattern groups. Asterisks indicate significant GO-Elite scores (permute p<0.01) in the alternate pattern group.
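The "permute p" reported for GO-Elite refers to a permutation-based significance estimate for term over-representation. The sketch below shows a generic version of that idea: it counts how often a random gene set of the same size, drawn from the same gene universe, contains at least as many genes annotated to a term as the observed pattern group. The function name, the default of 2000 permutations and the add-one correction are assumptions for illustration; this is not GO-Elite's exact scoring procedure, which also reports z-scores.

```python
import random


def permutation_p(term_genes, regulated, universe, n_perm=2000, seed=0):
    """Generic permutation p-value for over-representation of one term.

    Returns (observed overlap, p), where p is the fraction of random gene
    sets of the same size whose overlap with the term is >= the observed
    overlap, using the add-one correction (count + 1) / (n_perm + 1).
    """
    universe = set(universe)
    term_genes = set(term_genes) & universe
    regulated = set(regulated) & universe
    observed = len(term_genes & regulated)

    rng = random.Random(seed)   # fixed seed for reproducible results
    pool = sorted(universe)     # sorted list so sampling order is deterministic
    k = len(regulated)
    at_least = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, k)
        if sum(gene in term_genes for gene in sample) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)
```

The add-one correction keeps the estimate from ever reaching exactly zero, so the smallest attainable p value is bounded by the number of permutations performed.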
(0.55 MB EPS)
The first column in each gel is for RNA from REX+ hESCs and the second is for CPs. The numbers listed under these columns are the predicted amplicon lengths for those reactions. Tick marks to the left of each gel indicate predicted amplicon positions. mx-mx = mutually exclusive splicing, bleeding = exon bleeding, miR = miRNA binding site (predicted), ex = exon exclusion isoform, in = exon inclusion isoform, cs = constitutive mRNA region.
(3.97 MB EPS)
Primer sequences for confirmed and non-confirmed AS events.
(0.02 MB PDF)
Supplemental methods file. Includes detailed descriptions of algorithms, expression filtering and database architecture of AltAnalyze.
(0.35 MB DOC)
Analysis of sensitivity and specificity of AltAnalyze predictions with a publicly available alternative splicing validation dataset (PTB knockdown).
(0.07 MB DOC)
Gene expression results from the human exon array analysis for all conditions examined. Gene annotations, statistics, ANOVA patterns and log2 expression values provided for all Ensembl genes.
(9.48 MB ZIP)
Alternative exon results for CP differentiation. Multiple spreadsheets are included. Complete probeset- and gene-level results along with microRNA binding site and protein domain/motif over-representation analysis results from AltAnalyze are provided. Additional ANOVA pattern, splicing calls and cross-tissue comparison information is included.
(5.20 MB ZIP)
Alternative exon results for NP differentiation. Multiple spreadsheets are included. Complete probeset- and gene-level results along with microRNA binding site and protein domain/motif over-representation analysis results from AltAnalyze are provided.
(5.18 MB XLS)
Gene Ontology and pathway over-representation analyses. The top-scoring terms from GO-Elite are provided for all comparisons.
(0.17 MB XLS)
We thank Stephen Ordway and Gary Howard for editorial review of the manuscript and Chris Barker from the Gladstone Genomics Core. In loving memory of Karen Vranizan and Milo Salomonis.