Conceived and designed the experiments: CNH GM AR. Performed the experiments: CNH. Analyzed the data: CNH GC. Contributed reagents/materials/analysis tools: GC MTZ CF SB GM AR. Wrote the paper: CNH GC SB GM AR.
¶ The laboratories of SB, GM and AR contributed equally to this work.
The authors have declared that no competing interests exist.
The genetic dissection of the phenotypes associated with Williams-Beuren Syndrome (WBS) is advancing thanks to the study of individuals carrying typical or atypical structural rearrangements, as well as
A fundamental question in current biomedical research is how to link genomic variation to phenotypic differences, encompassing both seemingly neutral diversity and the pathological variation that causes or predisposes to disease. Once the primary genetic cause(s) of a disease or phenotype have been identified, we need to understand the biochemical consequences of such variants that eventually lead to increased disease risk. Such phenotypic effects of genetic differences are thought to be brought about by changes in expression levels, either of the genes affected by the genetic change or indirectly through position effects. Thus, transcriptome analyses seem appropriate proxies to study the consequences of structural variation, such as the 7q11.23 deletion present in individuals with Williams-Beuren syndrome (WBS). Here, we present an approach that takes experimental data into account instead of relying solely on functional annotation, following the rationale that coherently regulated genes are likely to play a role in the same biological process. While our algorithm can be applied to expression data from any source, our study provides a resource for the identification of additional candidate genes and pathways to explain the WBS phenotype, as well as a basis for uncovering novel functional interactions between sets of genes.
Williams-Beuren Syndrome (WBS; OMIM #194050) is a
WBS is associated with a microdeletion within the 7q11.23 chromosomal band, which encompasses 28 genes
While the primary cause of WBS is well-understood, we still know little about the molecular basis of the phenotype. Only very recently, strains of mice were engineered to carry complementary half-deletions of the region syntenic to the WBS region, which replicate several features of WBS, including abnormal social interaction phenotypes
We showed in previous work that the vast majority of the genes hemizygous due to the 7q11.23 deletion are underexpressed in lymphoblastoid cell lines and fibroblasts derived from patients
To assess the effect of the WBS microdeletion on genome-wide expression, we first profiled the transcriptome of primary skin fibroblasts of eight WBS patients and nine sex- and age-matched control individuals using Affymetrix expression arrays (see
Genes are ordered according to their chromosomal position. Shaded areas represent the LCRs flanking the deletion. Gene names are indicated at the bottom and corresponding differential expression
We used these 868 differentially expressed genes (DEG) to perform gene enrichment analyses. A hypergeometric test on Gene Ontology (GO) categories uncovered a significant overrepresentation of extracellular matrix genes (
GO ID | BH-adjusted P-value | Direction | Odds Ratio | Expected Count | Count | Category Size | Term |
GO:0005576 | 2.64E-06 | ind/sup | 2.12 | 48.61 | 89 | 592 | |
GO:0031226 | 2.29E-05 | ind/sup | 2.26 | 32.19 | 63 | 392 | |
GO:0031012 | 3.59E-05 | ind/sup | 3.25 | 11.58 | 31 | 141 | |
GO:0005887 | 3.70E-05 | ind/sup | 2.2 | 31.78 | 61 | 387 | |
GO:0005578 | 6.27E-05 | ind/sup | 3.21 | 10.92 | 29 | 133 | |
GO:0044421 | 1.01E-04 | ind/sup | 2.32 | 23.73 | 48 | 289 | |
GO:0044459 | 3.60E-04 | ind/sup | 1.75 | 57.32 | 90 | 698 | |
GO:0005886 | 1.59E-03 | ind/sup | 1.52 | 101.58 | 139 | 1237 | |
GO:0042612 | 2.17E-03 | ind | 11.28 | 1.15 | 7 | 14 | |
GO:0005581 | 8.08E-03 | ind/sup | 6.45 | 1.81 | 8 | 22 | |
GO:0042611 | 8.08E-03 | ind | 7.9 | 1.4 | 7 | 17 | |
GO:0044420 | 1.27E-02 | ind/sup | 3.51 | 4.52 | 13 | 55 | |
GO:0032393 | 1.82E-02 | ind | 22.11 | 0.75 | 6 | 9 | |
GO:0005201 | 1.82E-02 | ind/sup | 5.55 | 2.76 | 11 | 33 | |
GO:0045211 | 1.87E-02 | ind/sup | 4.62 | 2.55 | 9 | 31 | |
GO:0002474 | 2.00E-02 | ind | 10.86 | 1.36 | 8 | 16 | |
GO:0048002 | 2.00E-02 | ind | 10.86 | 1.36 | 8 | 16 | |
GO:0004888 | 3.74E-02 | ind/sup | 2.16 | 17.64 | 34 | 211 | |
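The columns of the table above (count, expected count, odds ratio, P-value) can be related to one another with a minimal sketch of an upper-tail hypergeometric test. The gene universe size, the toy numbers, and the use of the simple sample odds ratio are assumptions made for illustration; the enrichment software used for the table may compute the odds ratio differently.

```python
from math import comb


def hypergeom_enrichment(universe, selected, category, overlap):
    """Over-representation test for one annotation category.

    universe: number of annotated genes that were tested (assumed known)
    selected: number of differentially expressed genes in the universe
    category: number of universe genes annotated to the category
    overlap:  number of selected genes annotated to the category
    Returns (upper-tail P-value, expected count, sample odds ratio).
    """
    upper = min(selected, category)
    favourable = sum(
        comb(category, i) * comb(universe - category, selected - i)
        for i in range(overlap, upper + 1)
    )
    p_value = favourable / comb(universe, selected)
    expected = selected * category / universe
    # 2x2 table: selected/not selected versus in category/not in category
    a = overlap
    b = selected - overlap
    c = category - overlap
    d = universe - selected - category + overlap
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return p_value, expected, odds_ratio


# Toy numbers, not taken from the study
p_value, expected, odds_ratio = hypergeom_enrichment(
    universe=20, selected=5, category=4, overlap=2
)
```

The expected count is simply the category size scaled by the fraction of the universe that was selected, which is why it grows in proportion to the category size within a single table.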
Instead of considering the expression levels of single genes, a more robust approach is to work with gene sets. One such method is gene set enrichment analysis
In our first modular study (to which we refer as
To test whether some of the identified modules are differentially expressed in WBS patients compared to controls, we calculated the weighted average expression of the genes of each module, using the ISA gene scores as weights. This was done separately for each WBS and control sample, after which the two groups were compared using a
GO ID | BH-adjusted p-value | Count | Category size | Best module (size) | GO term |
GO:0005576 | 2.92E-07 | 44 | 445 | 958 (294) | |
GO:0031012 | 5.20E-05 | 18 | 114 | 957 (323) | |
GO:0045449 | 1.23E-04 | 90 | 1084 | 1012 (542) | |
GO:0010468 | 1.23E-04 | 97 | 1213 | 1012 (542) | |
GO:0005125 | 1.25E-04 | 7 | 34 | 349 (75) | |
GO:0003677 | 1.25E-04 | 85 | 971 | 1012 (542) | |
GO:0032501 | 1.29E-04 | 30 | 1167 | 349 (75) | |
GO:0009887 | 1.29E-04 | 14 | 224 | 349 (75) | |
GO:0002376 | 1.58E-04 | 16 | 327 | 349 (75) | |
GO:0042127 | 2.17E-04 | 15 | 291 | 349 (75) | |
GO:0005057 | 3.02E-04 | 8 | 88 | 753 (120) | |
GO:0009611 | 3.88E-04 | 11 | 156 | 349 (75) | |
GO:0042612 | 4.51E-04 | 6 | 13 | 1037 (341) | |
GO:0008283 | 5.55E-04 | 17 | 437 | 349 (75) | |
GO:0006954 | 6.16E-04 | 10 | 98 | 747 (151) | |
GO:0009605 | 6.69E-04 | 13 | 252 | 349 (75) | |
GO:0007165 | 8.14E-04 | 30 | 1342 | 349 (75) | |
Next, we searched specifically for coherent perturbations in gene expression driven by the WBS deletion. To this end, we performed a second modular study (to which we refer as
This module contains 149 genes (one per line) and 9 samples (columns). Seven samples are from WBS patients (denoted with “W”), C-5290 is a control sample from our dataset, while HPGS-9 belongs to a publicly available dataset. Gene scores are plotted on the left and sample scores at the top. The 59 genes with positive gene scores (bottom lines) are downregulated (green) in the seven WBS samples and upregulated (red) in the other two. The remaining 90 genes show the opposite pattern: they are upregulated in the WBS samples and downregulated in the remaining two samples. Hemizygous gene names are emphasized in red and the names of genes mapping to HSA7 in boldface. Red asterisks indicate genes belonging to the GO category “extracellular region” while black asterisks denote genes from the “intrinsic to membrane” category.
Several smaller modules are completely included in larger ones, forming a hierarchical structure. We organized the 72 and 23 dysregulated modules identified in M1 and M2, respectively, into a directed graph based on their subset relationships, i.e., two modules are connected by a directed edge if all the genes in the first module are included in the second (see
Directed edges indicate direct subset relationships, and they always point upwards. The number of genes in a module is shown at the top left corner of the module box. Modules annotated with a red star on their top right corner contain at least one hemizygous (or flanking) gene; the ones with green stars on their bottom right corner were replicated in lymphoblastoid cell lines; blue stars on the bottom left corner indicate modules that show significant enrichment for extracellular region genes. An interactive version of this figure is available in the online supporting material at
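The construction of such a subset graph can be sketched as follows. This is an illustrative reading of "direct subset relationships", in which an edge is kept only when no third module lies strictly between the two; the module names and gene symbols are hypothetical.

```python
def direct_subset_edges(modules):
    """Return (smaller, larger) edges for direct subset relationships.

    modules: dict mapping a module name to its set of genes.
    An edge a -> b is reported when a is a proper subset of b and no
    other module c satisfies a < c < b. Modules with identical gene
    sets are not connected in this sketch.
    """
    names = list(modules)
    strict = {
        (a, b)
        for a in names
        for b in names
        if a != b and modules[a] < modules[b]
    }
    edges = [
        (a, b)
        for (a, b) in strict
        if not any((a, c) in strict and (c, b) in strict for c in names)
    ]
    return sorted(edges)


# Hypothetical modules with placeholder gene symbols
toy_modules = {
    "m1": {"A", "B"},
    "m2": {"A", "B", "C"},
    "m3": {"A", "B", "C", "D"},
    "m4": {"X", "Y"},
}
edges = direct_subset_edges(toy_modules)
```

In this toy case m1 is contained in m3 as well, but only the edges m1 to m2 and m2 to m3 are kept, because m2 sits between them.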
We found that the dysregulated M1 modules include only two hemizygous genes (i.e.
ID | BH-adjusted p-value | Count | Category size | Best module (size) | Term/name |
GO:0005576 | 1.78E-12 | 73 | 700 | 991 (373) | |
GO:0006955 | 1.62E-06 | 17 | 295 | 503 (73) | |
GO:0031224 | 2.09E-05 | 89 | 2051 | 806 (239) | |
GO:0009605 | 2.78E-05 | 17 | 370 | 503 (73) | |
GO:0005102 | 1.02E-04 | 16 | 408 | 503 (73) | |
GO:0007165 | 3.95E-04 | 35 | 1800 | 503 (73) | |
GO:0042824 | 4.27E-04 | 4 | 6 | 702 (163) | |
GO:0007154 | 7.32E-04 | 36 | 1947 | 503 (73) | |
GO:0008083 | 8.75E-04 | 8 | 94 | 503 (73) | |
GO:0042330 | 2.88E-03 | 7 | 69 | 503 (73) | |
GO:0009887 | 2.88E-03 | 13 | 312 | 503 (73) | |
GO:0007626 | 2.98E-03 | 8 | 101 | 503 (73) | |
GO:0005578 | 3.21E-03 | 18 | 156 | 991 (373) | |
GO:0001525 | 3.47E-03 | 8 | 104 | 503 (73) | |
KEGG: 4060 | 7.51E-04 | 10 | 122 | 503 (73) | |
The severity of a phenotype correlates with the connectivity and thus centrality of the associated gene within the functional network
Only genes that appear at least ten times in the dysregulated modules are considered. (
Interestingly, the function of some of these frequently occurring genes may be relevant to the pathophysiology of some WBS features, such as metabolic phenotypes (
Gene expression in fibroblasts can only provide a partial picture of the gene dysregulation that gives rise to the WBS clinical phenotypes. Thus, data from other cell types or tissues may provide additional clues as to dysregulated pathways, as well as confirm some of our findings in fibroblasts. Indeed, comparison with the recently published transcriptome of lymphoblastoid, i.e. EBV-transformed, cell lines from WBS patients
Of the 72 M1 modules whose average gene expression is altered in WBS fibroblasts, seven are also changed in the lymphoblastoid cell lines: four are altered in the same direction and three in opposite directions in the two studies. Moreover, 19 of the 23 dysregulated M2 modules are also perturbed in the lymphoblastoid samples, 18 in the same direction (
We have profiled the transcriptomes of skin fibroblasts from eight WBS patients and nine sex- and age-matched control individuals, and identified a number of transcription modules dysregulated in WBS patient cells. One caveat of this study lies in the use of isolated cells
Both our single-gene and modular analyses provide a resource to enable a deeper exploration of the pathophysiology of WBS, which may lead to the discovery of novel functional interactions between gene products. Our study further exemplifies how integration of transcription data unrelated to the studied condition can be used to complement annotation-dependent analyses. Indeed, the modular approach reduces the complexity of the expression data, allowing a more targeted assignment of functional categories to specific sets of co-regulated genes. Consistently, Turcan
We have obtained the approval of the ethics committees of the University of Lausanne (reference number Protocol 123/06) and of the “Hospices Civils de Lyon” for this project. All patients provided written informed consent for the collection of samples and subsequent analysis.
Skin fibroblasts of 8 classical WBS and 9 control Caucasian female individuals aged between 3 and 8 years (see
Human skin fibroblasts were grown in HAM F-10, supplemented with 10% fetal bovine serum and 1% antibiotics (all Invitrogen). Total RNA was prepared using TRIzol Reagent (Invitrogen) and RNeasy Mini Columns (Qiagen) according to the manufacturers' instructions. The quality of all RNAs was assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies), and the RNA was used as a template for complementary DNA (cDNA) synthesis and biotinylated antisense cRNA preparation. The synthesis of cDNA and cRNA, labeling, hybridization and scanning of the samples were performed as described by Affymetrix (
The data of the 17 expression arrays produced for this report have been deposited in NCBI's Gene Expression Omnibus (GEO,
Expression data analyses were performed using GNU R (version 2.9.2)
A transcription module comprises a subset of genes that are co-expressed in a subset of conditions
In the first ISA run, we used skin fibroblast samples from seven experiments from public repositories, as well as from collaborators of the AnEUploidy consortium (the latter can be obtained by contacting the consortium at
We applied the ComBat batch correction algorithm
For the identification of the dysregulated modules, we used the GCRMA-normalized WBS data set. Probesets that were called “Present” in fewer than six samples were omitted from the analysis. We only considered the 7,447 probesets that were included both in this filtered WBS data set and in the modular study.
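The filtering step can be illustrated with a short sketch. It assumes that detection calls are available per probeset and sample as the letters "P", "M" and "A"; the probeset identifiers and the data layout are hypothetical.

```python
def filter_present(calls, min_present=6):
    """Keep probesets called Present ("P") in at least min_present samples.

    calls: dict mapping a probeset ID to its list of detection calls,
    one per sample (assumed encoding: "P", "M" or "A").
    """
    return {
        probeset
        for probeset, sample_calls in calls.items()
        if sum(call == "P" for call in sample_calls) >= min_present
    }


# Hypothetical detection calls for three probesets across eight samples
calls = {
    "probe_a": ["P"] * 8,
    "probe_b": ["P"] * 5 + ["A"] * 3,  # Present in only five samples
    "probe_c": ["P"] * 6 + ["M"] * 2,
}
# Hypothetical set of probesets covered by the modular study
modular_probesets = {"probe_a", "probe_b", "probe_d"}

kept = filter_present(calls)
shared = sorted(kept & modular_probesets)
```

Only probesets that pass the filter and are also part of the modular study remain in `shared`, which mirrors the intersection described above.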
In total, 732 modules that contained at least ten genes were tested for dysregulation. For the dysregulation test, we standardized the WBS expression data for every gene separately. Standardization is an important step, since the test for dysregulation involves the average expression of the module genes. Specifically, to test a module, we calculated the weighted average expression of its genes, separately for each WBS sample. The weights were defined by the gene scores of the module. Then a
To check the significance of finding 72 dysregulated modules, we permuted the WBS case/control labels 1,000 times and tested for dysregulation as before. These permutations serve as a null model to estimate how many dysregulated modules could have arisen by chance. Only 14 permutations yielded at least one dysregulated module. Within these 14 cases, the mean number of dysregulated modules was 12.1, the median 1.5. The highest number of dysregulated modules found for a permutation was 58. We note that the three permutations that yielded multiple (false positive) WBS dysregulated modules had almost correct WBS case/control labels: only one pair was swapped.
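The scoring and permutation procedure can be sketched as below. Several details are assumptions made only for illustration: the weighted average is normalized by the sum of absolute gene scores, a Welch t statistic with an arbitrary cutoff stands in for the actual test and significance threshold, and all data are toy values.

```python
import random
from math import sqrt
from statistics import mean, pstdev, variance


def standardize(values):
    """Z-score one gene across samples (assumes the values are not constant)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def module_scores(expr, gene_scores):
    """Weighted average of standardized expression, one value per sample.

    expr: gene -> standardized expression values (one per sample)
    gene_scores: gene -> ISA gene score, used as the weight
    Dividing by the sum of absolute weights is an assumption of this sketch.
    """
    norm = sum(abs(w) for w in gene_scores.values())
    n_samples = len(next(iter(expr.values())))
    return [
        sum(gene_scores[g] * expr[g][j] for g in gene_scores) / norm
        for j in range(n_samples)
    ]


def welch_t(a, b):
    """Stand-in two-group statistic (assumption, not necessarily the study's test)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def count_dysregulated(scores_by_module, is_case, cutoff=4.0):
    """Number of modules whose statistic exceeds a hypothetical cutoff."""
    count = 0
    for scores in scores_by_module.values():
        cases = [s for s, c in zip(scores, is_case) if c]
        controls = [s for s, c in zip(scores, is_case) if not c]
        if abs(welch_t(cases, controls)) > cutoff:
            count += 1
    return count


def permutation_counts(scores_by_module, is_case, n_perm=1000, seed=0):
    """Re-run the module test on shuffled case/control labels."""
    rng = random.Random(seed)
    labels = list(is_case)
    counts = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        counts.append(count_dysregulated(scores_by_module, labels))
    return counts


# Toy per-sample module scores: four cases followed by four controls
toy_scores = {
    "module_up": [2.0, 2.1, 1.9, 2.2, 0.0, 0.1, -0.1, 0.2],
    "module_flat": [0.1, -0.1, 0.0, 0.2, 0.0, 0.1, -0.1, 0.2],
}
is_case = [True] * 4 + [False] * 4
observed = count_dysregulated(toy_scores, is_case)
null_counts = permutation_counts(toy_scores, is_case, n_perm=200)
```

Comparing `observed` with the distribution in `null_counts` gives an empirical sense of how often a given number of dysregulated modules arises from label assignment alone.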
Hypergeometric tests were used to calculate the functional enrichment of the 72 dysregulated modules, with Benjamini-Hochberg correction for the number of categories and the number of modules tested. The significance threshold was chosen as 0.05.
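A minimal sketch of the Benjamini-Hochberg adjustment is given below. It assumes that the P-values of all module and category combinations are pooled into a single list before adjustment, which is one possible reading of the correction described above; the example values are arbitrary.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value and enforce monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Arbitrary raw P-values, e.g. one per (module, category) pair
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
significant = [p <= 0.05 for p in adjusted]
```

Each raw P-value is scaled by the number of tests divided by its rank, so pooling more modules and categories makes the adjustment stricter.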
The second modular study (
We used version 8.3 of the STRING database to interrogate the genes that frequently appear in the dysregulated modules. All network measures were calculated using the igraph R package
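The two network measures referred to in this work, degree and PageRank centrality, can be illustrated with a small self-contained sketch. It is not the igraph implementation; the damping factor, the convergence tolerance, and the toy network are assumptions.

```python
def degree(adjacency):
    """Number of neighbours of each node in an undirected network."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def pagerank(adjacency, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration on an undirected network.

    adjacency: dict mapping each node to the set of its neighbours
    (every neighbour must also appear as a key).
    """
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by isolated nodes is spread evenly over all nodes
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for u in adjacency[v]:
                new_rank[u] += damping * rank[v] / len(adjacency[v])
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Toy star-shaped network with placeholder node names
toy_network = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
degrees = degree(toy_network)
ranks = pagerank(toy_network)
```

In the toy network the central node has both the highest degree and the highest PageRank, which is the kind of agreement between the two measures that a hub gene would show.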
The enrichment calculations for the extracellular region genes (
To identify genes commonly dysregulated in cells from WBS patients identified in this study and in
The modules and related details are available at
The expression array annotation data were taken from the hgu133a2.db (version 2.2.11) and hgu133plus2.db (version 2.2.11) Bioconductor packages. The GO.db package (version 2.2.11) was used for the Gene Ontology and the KEGG.db package (version 2.2.11) for the KEGG pathway data.
Software packages are listed in
Over- and under-representation of GO biological process and molecular function terms among “extracellular compartment” annotated genes of the DEG list and each set of dysregulated modules. Dark coloured bars denote significant enrichment/depletion.
(2.61 MB EPS)
(A) Relationship between the number of times genes appear in transcription modules (M1, M2, or their union) and their number of connections in the STRING database. First row: genes were binned according to their frequency in modules, and the mean STRING degree of each bin is plotted. The line is the fit from the linear regression of STRING degree on frequency; the slope is always significant, with a p-value less than 10⁻⁹. Second row: the mean (black) and median (blue) degrees are plotted for the genes that appear at least a given number of times in the modules. In other words, the first point is the mean/median degree of all genes, the second data point is the mean/median degree of all genes that appear at least once in a module, etc. There is a clear correlation between the frequency in the modules and STRING degree. (B) Relationship between the number of times genes appear in modules and their PageRank centrality in the STRING network. The plots are essentially the same as in (A), but the PageRank centrality is plotted instead of degree. There is a clear correlation between the frequency in the modules and the centrality of the genes in the STRING network.
(1.14 MB EPS)
Cell line information.
(0.02 MB XLS)
Differentially expressed genes in WBS samples compared to controls.
(0.23 MB XLS)
Differential expression of the WBS hemizygous and flanking genes.
(0.02 MB XLS)
Datasets used for modular analysis.
(0.02 MB XLS)
Dysregulated modules, M1.
(0.04 MB XLS)
GO/KEGG term enrichment in dysregulated M1 modules.
(0.07 MB XLS)
Dysregulated modules, M2.
(0.03 MB XLS)
GO/KEGG term enrichment in dysregulated M2 modules.
(0.05 MB XLS)
Most frequently occurring genes among dysregulated M1 and M2 modules.
(0.08 MB XLS)
GO/KEGG term enrichment among genes common to both sets of dysregulated modules.
(0.04 MB XLS)
Dysregulated single genes and modules common to fibroblasts and lymphoblastoid cell lines.
(0.03 MB XLS)
Software packages used for the analysis.
(0.03 MB XLS)
We thank the members of the “Frontiers in Genetics” Genomics Platform in Geneva for technical assistance, and Samuel Deutsch, Stylianos E. Antonarakis, Anna Antonell, Luis A. Pérez-Jurado and the members of the AnEUploidy consortium (