Conceived and designed the experiments: NJH AR BPD. Analyzed the data: NJH AR BPD. Wrote the paper: NJH AR BPD.
The authors have declared that no competing interests exist.
Transcription factor (TF) regulation is often post-translational. TF modifications such as reversible phosphorylation and missense mutations, which can act independently of TF expression level, are overlooked by differential expression analysis. Using bovine Piedmontese myostatin mutants as proof-of-concept, we propose a new algorithm that correctly identifies the gene containing the causal mutation from microarray data alone. The myostatin mutation releases the brakes on Piedmontese muscle growth by translating a dysfunctional protein. Compared to a less muscular non-mutant breed, we find that myostatin is not differentially expressed at any of ten developmental time points. Despite this challenge, the algorithm identifies the myostatin ‘smoking gun’ through a coordinated, simultaneous, weighted integration of three sources of microarray information: transcript abundance, differential expression, and differential wiring. By asking the novel question “which regulator is cumulatively most differentially wired to the abundant most differentially expressed genes?” it yields the correct answer, “myostatin”. Our new approach identifies causal regulatory changes by globally contrasting co-expression network dynamics. The entirely data-driven ‘weighting’ procedure emphasises regulatory movement relative to the phenotypically relevant part of the network. In contrast to other published methods that compare co-expression networks, significance testing is not used to eliminate connections.
Evolution, development, and cancer are governed by regulatory circuits where the central nodes are transcription factors. Consequently, there is great interest in methods that can identify the causal mutation/perturbation responsible for any circuit rewiring. The most widely available high-throughput technology, the microarray, assays the transcriptome. However, many regulatory perturbations are post-transcriptional. This means that they are overlooked by traditional differential gene expression analysis. We hypothesised that by viewing biological systems as networks one could identify causal mutations and perturbations by examining those regulators whose position in the network changes the most. Using muscular myostatin mutant cattle as a proof-of-concept, we propose an analysis that succeeds based solely on microarray expression data from just 27 animals. Our analysis differs from competing network approaches in that we do not use significance testing to eliminate connections. All connections are contrasted, no matter how weak. Further, the identity of target genes is maintained throughout the analysis. Finally, the analysis is ‘weighted’ such that movement relative to the phenotypically most relevant part of the network is emphasised. By identifying the question to which myostatin is the answer, we present a comparison of network connectivity that is potentially generalisable.
Evolution, normal development, immune responses and aberrant processes such as
diseases and cancer all involve at least some rewiring of regulatory circuits.
Identifying regulatory change solely through contrasts in gene expression data has
been elusive because TFs tend to be stably expressed at baseline levels.
We hypothesised that a system-wide network approach might have utility, on the
grounds that while a differentially-regulated TF might not be DE between two
systems, its new position in the network of the perturbed system might allow
detection of the ‘smoking gun.’ To allow reliable evaluation of
such a hypothesis a well-defined experimental model system is required. Piedmontese
cattle are double-muscled because they possess a genomic DNA mutation in the
myostatin (GDF8) mRNA transcript
Thus we have a system in which we know the identity of the gene containing the causal mutation, myostatin (MSTN), but we cannot identify it by DE of the mRNA in muscle samples. By contrasting the muscle transcriptomes of the Piedmontese and Wagyu crosses across 10 developmental time points, our aim was to establish the question to which myostatin is the answer. In other words, what question do we need to ask of the gene expression data for it to reveal the identity of the transcriptional regulator containing the causal mutation?
We found that 11,057 genes gave valid expression signal: noise data across the 10
developmental time points for the 2 crosses (
Within each cross symbols with the same number indicate samples derived from the same individual animal.
Genes expressed more highly in the Wagyu cross are on the bottom, and genes expressed more highly in the Piedmontese cross are on the top. Regulators are denoted by triangles. MYL2 (slow twitch muscle structural protein) is the most differentially expressed gene. CSRP3 is the most differentially expressed regulator. Myostatin is neither abundant nor differentially expressed.
Next, we examined the difference in the specific behaviour or co-expression of
targeted pairs of genes between the two crosses, by subtracting the correlation
coefficient in Wagyu from that in Piedmontese. This approach has a very recent
precedent
In circumstances where we do analyse changes in total numbers of
‘significant’ connections, we elected to use the term
differential hubbing (DH) on the grounds that the total number of connections
determines the extent to which a gene can be considered a hub. The PCIT
algorithm was used to establish significance in these cases
Our Nomenclature | Existing Literature Nomenclature | Formal Definition | Purpose and Further Notes |
Differential Expression (DE) | Differential Expression (DE) | The difference between the expression level of a given gene in state 1 minus its expression in state 2 | Compares the transcriptional status of a given gene to itself in two states. In longitudinal experiments where DE is averaged across the developmental time points, it will yield a conservative measure of true DE. |
Differential Hubbing (DH) | Differential Connectivity (DC) | The difference in the number of significant connections a gene has in two different states e.g. a gene that has 5 significant connections in state 1 and 3 significant connections in state 2 yields a DH of 5−3 = 2 | In order to compute differences in the number of significant connections, one first computes which of the co-expression arrangements are significant in the two states. Typically, most connections will be deemed non-significant. The difference between the two states can be computed by subtracting the significant connections a gene has in state 2 from state 1. In the present data this approach fails to identify myostatin as being differentially behaved in Piedmontese versus Wagyu muscle. |
Differential Wiring (DW) | Differentially Correlated | The difference in co-expression between a specified pair of genes in two different states. For example, GDF8 and MYL2 have a co-expression of +0.761 in the Piedmontese and −0.342 in the Wagyu, giving a DW of +0.761 − (−0.342) = 1.103 | This approach forms the basis of our RIF analysis (in conjunction with PIF, see below). In contrast to conventional analyses, no significance testing is used to establish connections. |
Phenotypic Impact Factor (PIF) | None, no precedent for the method | The average expression (state 1 and state 2 combined) multiplied by the DE (see above for definition), computed for all DE genes. | A mathematical abstraction quantifying the contribution the various DE genes make to the difference in the molecular anatomy of the two systems. Abundant highly DE genes are emphasised. In the present dataset this enriches for slow muscle structural proteins, correctly reflecting the fibre type shift observed at the gross anatomical level. |
Regulatory Impact Factor (RIF) | None, no precedent for the method | The cumulative DW of each regulator relative to the target DE genes, weighted for PIF. | Regulators that are highly DW to the high PIF (i.e., abundant highly DE genes) score highly. In our data, the regulator awarded the highest RIF was myostatin, the causal Piedmontese mutation. |
The most DE gene in our dataset is MYL2, and myostatin is the third most DW
regulator to it, with a value of 1.103. The derivation of DW is illustrated for
the myostatin-MYL2 connection in
Myostatin is not differentially expressed, but it is highly differentially wired to the highly DE MYL2.
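The DW calculation for a single gene pair can be sketched as follows. This is a minimal illustration in Python, assuming Pearson correlation across time points; the function name `differential_wiring` and the toy profiles are hypothetical and are not the study data. Only the values +0.761 and −0.342 come from the text.

```python
import numpy as np

def differential_wiring(x_p, y_p, x_w, y_w):
    """DW for one gene pair: correlation in Piedmontese minus correlation in Wagyu.

    Assumes Pearson correlation; each argument is a 1-D expression profile.
    """
    r_p = np.corrcoef(x_p, y_p)[0, 1]
    r_w = np.corrcoef(x_w, y_w)[0, 1]
    return r_p - r_w

# Worked arithmetic from the text: myostatin-MYL2 co-expression is
# +0.761 in Piedmontese and -0.342 in Wagyu.
dw_mstn_myl2 = 0.761 - (-0.342)
print(round(dw_mstn_myl2, 3))

# Toy profiles (hypothetical): the pair is perfectly co-expressed in one
# cross and perfectly anti-correlated in the other, the largest possible DW.
regulator_p = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
target_p = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
regulator_w = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
target_w = np.array([10.0, 8.0, 6.0, 4.0, 2.0])
dw_toy = differential_wiring(regulator_p, target_p, regulator_w, target_w)
print(round(dw_toy, 3))
```

Because a correlation lies between −1 and +1, DW is bounded by −2 and +2, and no expression-level difference in the regulator is needed for DW to be large.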
In an attempt to assess the importance of each DE gene to the change in
phenotype, we propose a new metric: the “phenotypic impact factor
(PIF).” PIF is a mathematical abstraction designed to
‘weight’ for the contribution the various DE genes make to
the difference in the molecular anatomy of the two systems, based purely on
their numerical properties. The values were generated by combining the amount of
DE between the crosses, coupled with the average abundance calculated for both
crosses at all time points for each of the 85 DE genes. Abundant transcripts
that were highly DE scored highly, whereas scarce transcripts that were only
slightly DE scored poorly. The high phenotypic impact genes enriched for slow
twitch muscle structural genes (MYL2, MYL3, TNNT1, MYH7, ACTN2 and MYOZ2)
correctly highlighting the observed phenotype change between the breed crosses,
namely the gross muscle fibre transition. The coherence of the output is very
consistent with an expectation based on the observed gross anatomical fibre
change
We formalised this observation using the GOrilla tool
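As a rough illustration of the PIF idea, the sketch below multiplies each gene's average abundance (both crosses, all time points) by its DE. It is a simplified reading of the description above: DE is taken here as a plain difference of means, whereas the study derives DE from a mixed-model contrast, and the function name and toy values are hypothetical.

```python
import numpy as np

def phenotypic_impact_factor(expr_p, expr_w):
    """PIF per gene = average abundance x differential expression.

    expr_p, expr_w: arrays of shape (genes, time points) holding normalised
    expression for the Piedmontese and Wagyu crosses.
    Assumption: DE is the difference of the per-cross means.
    """
    mean_p = expr_p.mean(axis=1)
    mean_w = expr_w.mean(axis=1)
    abundance = (mean_p + mean_w) / 2.0   # averaged over both crosses
    de = mean_p - mean_w                  # Piedmontese minus Wagyu
    return abundance * de

# Toy data (hypothetical): gene 0 is abundant and strongly DE,
# gene 1 is scarce and only slightly DE.
expr_p = np.array([[12.0, 12.0], [5.0, 5.0]])
expr_w = np.array([[10.0, 10.0], [4.8, 4.8]])
pif = phenotypic_impact_factor(expr_p, expr_w)
print(pif)
```

The abundant, strongly DE gene receives a much larger PIF than the scarce, slightly DE gene, which is the weighting behaviour the text describes.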
On the other hand, the PIF metric is not particularly well suited to regulators, although they were included in the analysis. Regulators are often stably expressed at close to baseline levels making detection of isolated changes in expression level challenging and possibly misleading. To account for this, we ascribed “regulatory impact factors” (RIFs) to each of the 920 regulators based on their cumulative, simultaneous, DW to the DE genes, accounting for the PIF of the DE genes. This metric was intended as a mathematical abstraction to represent the relative importance of the regulators in driving the phenotypically relevant part of the network described above, based on differences in their correlations.
Those regulators that were highly DW to many of the high PIF genes received
strong scores, whereas those that were DW to a few, low PIF genes scored poorly.
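The scoring logic can be sketched in Python as a PIF-weighted accumulation of DW over the DE genes. This is an illustrative sketch, not the published Eq4 or Eq5: squaring DW (so that gains and losses of co-expression both accumulate) and using the absolute value of PIF as the weight are assumptions of this example, as are the function names and toy data.

```python
import numpy as np

def corr_rows(a, b):
    """Pearson correlation of every row of a (R x T) with every row of b (G x T).

    Returns an R x G matrix. Rows must not be constant.
    """
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    a_n = a_c / np.linalg.norm(a_c, axis=1, keepdims=True)
    b_n = b_c / np.linalg.norm(b_c, axis=1, keepdims=True)
    return a_n @ b_n.T

def regulatory_impact(reg_p, reg_w, tgt_p, tgt_w, pif):
    """Cumulative, PIF-weighted DW of each regulator to the DE genes.

    Assumptions: DW is squared and |PIF| is the weight.
    """
    dw = corr_rows(reg_p, tgt_p) - corr_rows(reg_w, tgt_w)
    return (dw ** 2 * np.abs(pif)).mean(axis=1)

# Toy data (hypothetical): two DE target genes with identical profiles in
# both crosses, and two regulators over five time points.
tgt_p = np.array([[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0]])
tgt_w = tgt_p.copy()
reg_p = np.array([[1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0, 5.0]])
reg_w = np.array([[5.0, 4.0, 3.0, 2.0, 1.0], [1.0, 2.0, 3.0, 4.0, 5.0]])
pif = np.array([22.0, 0.98])

rif = regulatory_impact(reg_p, reg_w, tgt_p, tgt_w, pif)
print(rif)
```

Regulator 0 reverses its relationship to both targets between crosses and scores highly, while regulator 1 keeps the same relationships and scores zero, even though neither regulator's own expression level enters the calculation.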
The red circles represent the co-expression relationships of myostatin to the 85 DE genes, with circle size corresponding to the PIF of the DE gene represented at that particular co-expression intersection (DW). Myostatin is highly DW (as represented by long perpendicular distances from the diagonal) to the highest PIF genes (largest red circles). This dynamic underpins myostatin's exceptional RIF. The density of all points is highest at the extreme co-expression range (i.e., +1, +1 and −1, −1) and lowest for a complete reversal (i.e., +1, −1 and −1, +1).
It is important to note that
We explored 2 alternative methods to compute RIF scores (Eq4 and Eq5
Regulator | DE | Gene Function | RIF | Rank Eq4 | Rank Eq5 |
MSTN | no | Causal mutation in double-muscled Piedmontese cattle, negative regulator of muscle mass. TGF-β signalling. | 3.49 | 4 | 2 |
MEF2C | no | Muscle transcription factor | 3.21 | 37 | 1 |
SUV39H2 | no | Histone methyltransferase. Cooperates with SMADS to repress promoter activity. TGF-β signalling. | 3.13 | 3 | 4 |
ACTL6B | no | Regulation of genes in the brain | 3.02 | 14 | 5 |
HNRNPD | −0.41 | Pre mRNA processing | 3.01 | 10 | 6 |
MYOD1 | −0.41 | Master regulator of muscle cell differentiation | 2.94 | 58 | 3 |
ATRX | no | Chromatin remodelling | 2.85 | 106 | 7 |
IRF9 | no | Interferon regulatory factor | 2.82 | 67 | 9 |
CCNK | no | Regulation of transcription | 2.79 | 160 | 8 |
HAT1 | no | Histone acetyl transferase | 2.79 | 13 | 11 |
In the absence of evidence favouring one approach over the other we decided to
follow the original thread of defining the question to which myostatin was the
answer. When we calculated the mean of the two different RIF values, myostatin
received the highest score out of the 920 regulators with a RIF of 3.49 (
The DEs of the 920 regulators are plotted against their respective RIFs (mean of Eq4 and Eq5). Myostatin, indicated by a red dot, is awarded the highest RIF despite not being DE.
Regulator | DE | Gene Function | RIF | Rank Eq4 | Rank Eq5 |
HOXB13 | no | Body patterning along main axis, suppressor of cell growth | −2.46 | 906 | 920 |
IFRD1 | no | Interferon-related development regulator 1 | −2.42 | 882 | 919 |
CDK7 | no | Link between regulation of transcription and cell cycle | −2.39 | 885 | 918 |
FOSL2 | no | Regulator of cell proliferation, differentiation and transformation | −2.30 | 898 | 917 |
MYT1 | no | Myelin transcription factor | −2.28 | 914 | 915 |
MAFK | no | Erythroid transcription factor | −2.28 | 870 | 916 |
PADI4 | no | Possible role in granulocyte and macrophage development | −2.12 | 853 | 914 |
LMCD1 | −0.34 | Negative regulator of muscle cell differentiation | −2.11 | 874 | 913 |
CTNND2 | no | Catenin delta 2 | −2.10 | 897 | 910 |
KLF15 | no | Kruppel-like factor 15 | −2.05 | 857 | 912 |
Regulator | DE | Gene Function | RIF | Rank Eq4 | Rank Eq5 |
HOXB6 | 0.37 | Regulation of development | −0.80 | 869 | 718 |
TFDP2 | 0.35 | Transcription factor, E2F dimerization partner 2 | −0.88 | 476 | 764 |
HOXB5 | 0.34 | Regulation of development | 0.13 | 286 | 352 |
BHLHB5 | 0.33 | Brain transcription factor | 2.03 | 91 | 40 |
FOXO1 | 0.32 | May play a role in myogenic growth and differentiation | −0.54 | 481 | 637 |
SCAND1 | 0.30 | Peroxisome proliferative activated receptor, gamma, coactivator 1, role in lipid metabolism | −0.62 | 664 | 666 |
BACH2 | 0.30 | B-cell leucine zipper transcription factor | 0.45 | 245 | 254 |
MLLT10 | 0.28 | Remodelling histones/nucleosomes | 2.40 | 101 | 17 |
FOXQ1 | 0.26 | TGFB2 pathway | −1.82 | 1 | 39 |
MAX | 0.26 | Role in cell proliferation and differentiation | −1.39 | 162 | 50 |
Regulator | DE | Gene Function | RIF | Rank Eq4 | Rank Eq5 |
CSRP3 | −0.83 | Positive regulator of myogenesis | −1.36 | 883 | 862 |
BTG2 | −0.63 | Cell cycle regulator, anti-proliferative | −0.93 | 701 | 795 |
ATF3 | −0.60 | Negative regulator of Toll-like receptor 4 | −1.52 | 860 | 881 |
ANKRD1 | −0.53 | Positive regulator of myogenesis | −1.04 | 429 | 817 |
CDK9 | −0.49 | Cell cycle regulator | 1.28 | 531 | 97 |
FOS | −0.44 | Cell differentiation and proliferation in bone, cartilage and blood. TGF-β signalling | −1.37 | 826 | 867 |
CILP | −0.44 | Negatively regulates TGF-β signalling | 1.30 | 7 | 123 |
FST | −0.42 | Positive regulator of muscle mass. TGF-β signalling | −0.47 | 575 | 602 |
HOMER2 | −0.42 | Negative regulator T cell activation | −0.80 | 339 | 729 |
FRZB | −0.41 | Negative regulation of Wnt signalling | 2.56 | 826 | 907 |
To highlight which cluster of DE genes are being ‘perturbed’
by which cluster of regulators, the DW values for the 920 regulators (in rows)
and the 85 DE genes (in columns) (
In
We applied PermutMatrix's hierarchical clustering algorithm to both rows (920 regulators) and columns (85 DE genes). A subset of the full matrix is shown, including the high phenotypic impact slow twitch module (blue line) and the major high impact transcriptional regulator circuit (red line). The scale is −1.53 (bright green) to +1.53 (bright red), with 0 being black.
These biologically-sensible clusters imply that co-differential wiring can be
used as an explicit criterion to form an edge in a regulatory perturbation
network. We used a hard 0.9 threshold to establish network edges between those
regulators that were highly co-differentially wired to the 85 DE genes. We
visualised the deduced network in Cytoscape
Myostatin, MyoD1 and IFRD1 are highly co-differentially wired across the 85 DE genes (correlation coefficients >0.9 or <−0.9). Here their respective relationships are visualised against only those 18 DE genes that cluster into the slow twitch Permut module, but the relationship holds for all 85 DE genes.
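The edge rule described above (regulators linked when their DW profiles across the DE genes correlate beyond a hard 0.9 cutoff) can be sketched as follows. This is a minimal Python illustration assuming Pearson correlation between rows of the DW matrix; the function name, regulator labels and toy DW values are hypothetical.

```python
import numpy as np

def co_dw_edges(dw, names, threshold=0.9):
    """Edges between regulators whose DW profiles are highly correlated.

    dw: array of shape (regulators, DE genes) holding DW values.
    An edge is kept when the correlation is > threshold or < -threshold.
    """
    r = np.corrcoef(dw)                    # regulator x regulator correlations
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                edges.append((names[i], names[j], float(r[i, j])))
    return edges

# Toy DW matrix (hypothetical): REG_A and REG_B are rewired in nearly the
# same way across five DE genes; REG_C follows a different pattern.
names = ["REG_A", "REG_B", "REG_C"]
dw = np.array([
    [1.0, -0.5, 0.8, -1.2, 0.3],
    [0.9, -0.4, 0.7, -1.1, 0.3],
    [0.2, 0.9, -0.1, 0.1, -0.6],
])
edges = co_dw_edges(dw, names)
print(edges)
```

Note that the correlation is taken over DW values, not over expression, so two regulators can be linked here without being co-expressed with each other.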
However, positive correlation of DW of regulators does not necessarily imply
positive correlation, or indeed any significant correlation, of expression of
the regulators themselves and vice versa. In other words, neither the clustered
regulators on the y axis of the perturbation matrix nor the clustered DE genes
on the x axis are actually significantly co-expressed with each other in any
combination, based on a PCIT analysis (unpublished data). Furthermore,
Myostatin, MyoD1 and IFRD1 are not significantly co-expressed with any of the
other 11,057 genes in the system, let alone the subset in the matrix. The same
applies to ACTN2, MYH6, CSRP3, ANKRD1, MYL3 and MYOZ2 (unpublished data).
Rather, it is the coordinated manner in which two genes
We tested the distributional and numerical properties of RIF1 and RIF2 (Eq4 and Eq5) on simulated data to assess the extent to which our real output could be ascribed to chance. The simulated data comprised 5,000 genes surveyed across 10 experimental conditions (in line with the 10 time points) in two treatments (in line with the two breed crosses). In accordance with the real data, expression values were simulated from a normal distribution with a mean of 8.6 and a standard deviation of 2.8 and truncated at 4 and 16. Also, for each gene, its expression profile across the two treatments was simulated to have a correlation of 0.95.
Simulations were performed under the null hypothesis of no differential expression between treatments, no correlation between genes across conditions, and no regulator-target relationships. Therefore, in these settings any observed association could be attributed to chance alone.
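A null dataset of this kind might be generated along the following lines. This Python sketch makes several assumptions that the text does not specify: values are clipped to the range 4 to 16 rather than drawn from a formally truncated distribution, the 0.95 correlation is imposed on the underlying normal deviates before clipping, and the function name and seed are arbitrary.

```python
import numpy as np

def simulate_null(n_genes=5000, n_cond=10, mean=8.6, sd=2.8,
                  lo=4.0, hi=16.0, rho=0.95, seed=0):
    """Simulate two treatments with no DE and no gene-gene correlation.

    Each gene's profile in treatment 2 is correlated (rho) with its
    profile in treatment 1; genes are independent of one another.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal((n_genes, n_cond))
    noise = rng.standard_normal((n_genes, n_cond))
    z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * noise
    t1 = np.clip(mean + sd * z1, lo, hi)   # clipping stands in for truncation
    t2 = np.clip(mean + sd * z2, lo, hi)
    return t1, t2

t1, t2 = simulate_null()

# Per-gene correlation between the two treatments across the 10 conditions.
c1 = t1 - t1.mean(axis=1, keepdims=True)
c2 = t2 - t2.mean(axis=1, keepdims=True)
per_gene_r = (c1 * c2).sum(axis=1) / np.sqrt(
    (c1 ** 2).sum(axis=1) * (c2 ** 2).sum(axis=1))
print(t1.shape, round(float(per_gene_r.mean()), 2))
```

With only 10 conditions per gene, the sample correlation of any single gene scatters around the target, so the 0.95 value holds on average rather than gene by gene.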
For the computations of RIF1 and RIF2, a random 920 genes were selected and
treated as potential regulators and their regulatory impact factor computed
against the 85 genes showing the most extreme measure of differential expression
across the two conditions. Based on this approach a simulated version of
We used the PCIT algorithm
While the extremes of the DH axis enriched for transcriptional regulators in general, myostatin is neither DH nor DE.
In the introduction we posed a computational challenge: identify the question in P×H versus W×H muscle development to which myostatin is the answer. The subsequent analysis suggests the following: “Which transcriptional regulator is cumulatively most differentially wired to the abundant most differentially expressed genes?” This question is clearly very different to the conventional “which transcriptional regulator is the most differentially expressed?” and unsurprisingly the latter gives quite different answers, including the notable failure to identify myostatin out of the 920 candidates.
This result suggests that traditional microarray approaches generating lists of
DE regulators may be committing type III statistical error, the error committed
when giving the right answers to the wrong questions
The positive identification of myostatin as the major regulatory perturbation in this specific set of experimental contrasts is noteworthy, despite the stated aims of the approach. The Piedmontese causative mutation exists at the first level of organisation (genomic DNA), and manifests its effect at the third (protein) and higher levels (phenotype). Despite this, we can identify it using only data at the second level of biological organisation – the transcriptome. In addition, all animals were Hereford hybrids so 50% of the protein translated by the P×H animals was as functional as the myostatin protein translated by the W×H; in line with this, the increase in muscle mass was correspondingly subtle (∼9%) (unpublished data).
The new algorithm works, in effect, by firstly establishing a Phenotypic Impact Factor (PIF) for each of the DE genes. Thus, genes that are both highly abundant and highly DE between the crosses derive a correspondingly high PIF, or discrimination factor. Taken together, this weighting provides an abstract molecular description of the phenotype perturbation specific to the treatments under consideration. In the P×H versus W×H comparison, the genes with the highest PIF (i.e., those that are abundant and highly DE) tend to be slow twitch muscle structural genes (MYL2, MYL3, TNNT1, MYH7, ACTN2 and MYOZ2). This correctly reflects the most pervasive phenotypic change in Piedmontese myostatin mutants (along with the increase in muscle mass) namely the gross fibre type transition. We therefore conclude that DE, in the context of transcript abundance, is a powerful measure of phenotypic / anatomical change (but not necessarily, as we have already argued, regulatory change).
RIF is based on the cumulative, simultaneous, differential wiring (DW) of each
regulator to the DE genes, ‘weighted’ for the PIF of each DE
gene. Satisfactorily, the regulator awarded the highest RIF by this approach is
myostatin, the gene that bears the known causal mutation (SNP) in Piedmontese
genomic DNA
The highest impact regulators are documented in
During the conceptual development of the algorithm we tried several permutations. The best performer, as described above and in the results section, incorporates the average abundance and differential expression of the DE genes (which tend not to be transcriptional regulators), and the cumulative DW of the regulators to those weighted DE genes. Surprisingly, inclusion of either the average abundance or DE of the regulators themselves actually impairs the ability of the algorithm to identify myostatin (data not shown).
While we assessed several versions of the algorithm, there is no evidence that the data has been over-fitted because (1) the model is relatively simple compared to the data it analyses, (2) as in any other expression experiment, only the normalized gene expression levels for each gene in each of the samples (or experimental conditions) are needed, (3) it is built on sound mathematical principles (mixed-ANOVA models and model-based clustering), and (4) those mathematical principles mesh well with our biological understanding of the behaviour of both structural proteins (where DE and abundance are always important) and transcriptional regulators (where DE and abundance are not necessarily important, but transcriptional connectivity is important) in a range of living systems.
The two versions of the algorithm provided (Eq4 and Eq5) are alternatives in the
sense that they are built on the same set of concepts. However, at this stage,
it is not clear whether one can be considered superior to the other.
Consequently, we have derived our impact factor discussion from the combined,
averaged output of both equations (
Our observations imply caution when assessing isolated DE lists of TF. That TF
can behave differently in two systems without being strongly DE, has been
discussed before
Assigning impact factors to the regulators (based on the behaviour of their co-expression with respect to the phenotypically most relevant part of the network) forms step 1 in a 2-step process, and it yields biologically valid results. The next step is to computationally wire up the high impact regulators into coherent transcriptional modules, whose coordinated behaviour drives the phenotype change. We attempted to do this by establishing relationships between regulators that were ‘similarly’ or co-ordinately differentially-wired between the two crosses. To our knowledge this is the first time co-differential wiring has been used for reverse-engineering regulatory circuitry. The resultant output captures the phenotypic and regulatory differences between the two crosses and so we view it as a ‘perturbation matrix.’
The building and clustering of the perturbation matrix satisfactorily resolves both axes into biologically sensible modules. For example the DE axis generates a very tight module of high phenotypic impact slow twitch muscle fibres (ACTN2, MYH7, TNNT1, MYL3 and MYOZ2). Equally, the regulator axis resolves a high impact regulatory module comprising myostatin and MYOD1, among others. Myostatin is embedded in the middle of this high impact module. We interpret these clusters of regulatory disturbance as representing ‘hot spots’ of circuit rewiring that account for the major phenotypic changes between the crosses.
The exceptionally tight coupling of myostatin and MYOD1 on the y axis is the
product of a near perfect matching of co-differential wiring across all 85 DE
genes (r = 0.917).
With specific regard to the myostatin and MyoD1 clustering, the high co-DW
congruence makes a clear prediction that the myostatin SNP in Piedmontese exerts
its effect on skeletal muscle via circuit rewiring with MyoD1. MyoD1 has not
only been shown to drive the expression of a set of genes necessary for fast
muscle differentiation
When we next used the co-DW patterns to generate edges in a network, myostatin
was linked to 2 other high impact regulators, MYOD1 and IFRD1. It is highly
noteworthy that IFRD1, which is required for myoblast differentiation, forms a
known, experimentally-verified regulatory circuit with MYOD1
The utility of this algorithm clearly relies on appropriate data selection.
Presumably, the microarray data must be assayed on the right tissue and at
biologically important times. However, the dataset that we analysed was not
designed to address the specific question of identifying the gene containing the
causal mutation; rather, it was designed to study the impact of nutritional
restriction of the mother on the subsequent performance of the calves.
Finally, during the development of the algorithm we initially attempted to
determine regulatory changes via a simpler version of connectivity, i.e.,
describing changes in the
While it was true that high DH (coupled with low DE) proved diagnostic of
regulators in general (
Our definition of RIF does not require computation of the number of connections of a given regulator in each of the two networks. Therefore, algorithms for network re-construction (weighted or otherwise) are of no relevance. Instead, the difference between the connection weight of a given gene with each of the DE genes, accounting for PIF, appears to be sufficient. In other words, RIF has a set of refinements which make it highly sensitive. These refinements include recognising the specific identity of target genes, recognising the possible importance of ‘weak’ edges that would be deemed non-significant by other methods and recognising the phenotypic importance of the target genes.
This principle is well illustrated by the DW of myostatin to MYL2. The
co-expression relationship significantly changes from +0.761 in the
P×H system to −0.342 in the W×H system. The
−0.342 Myostatin-MYL2 ‘edge’ in the Wagyu network
would be unequivocally discarded by all statistical methods as being
insignificant (whether by ARACNE, PCIT or some other approach) whilst the
+0.761 Myostatin-MYL2 ‘edge’ in the Piedmontese
would be borderline insignificant depending on the exact analysis used.
Therefore, comparisons between these arrangements (which underpin the success of
our present analysis) cannot be sensitively quantified by DH. Further, the fact
that MYL2 is highly abundant and highly DE (and therefore of great phenotypic
importance) would be overlooked by DH, unless the PIF metric was applied. It is
a telling observation that myostatin is neither DE nor DH (
We have argued that the algorithm's success is built on controlling type III
error, i.e., it gives the right answer because it asks the right question. The
approach should be generalisable to other ‘omics data because its
mathematical approaches mesh well with the known biology of regulatory and
non-regulatory molecules. Unlike other causal mutation finding computational
approaches
Use of animals and the procedures performed in this study were approved by the New South Wales North Coast Animal Care and Ethics Committee (Approval No. G2000/05).
Hereford cows were artificially inseminated or mated to one of 5 different Wagyu
sires or one of 6 different Piedmontese sires. All Piedmontese sires were
homozygous for the MSTN (GDF8) missense mutation in exon 3 and none of the Wagyu
sires carried the mutation. We sequenced the myostatin transcript from cDNA and
found it to be heterozygous for the SNP mutation in all Piedmontese samples with
approximately equal peak heights for both alleles. Muscle tissue from these
animals has been contrasted previously across both pre-
We used a bovine oligonucleotide microarray, developed in 2006 by ViaLactia
Bioscience in collaboration with Agilent, containing 21,475 unique 60-mer
probes, representing approximately 19,500 distinct bovine genes. Four
microarrays are present on each Agilent chip. Issues considered in the
experimental design included the availability of biological replicates as well
as the quality of the extracted mRNA. The experimental layout was designed to
allow a focus on the cross comparison, but to also permit a developmental aspect
to be carried out (
We used a number of approaches to establish a reasonably definitive list of genes
encoding proteins that directly or indirectly modify gene expression, including
chromatin remodelers. We made use of a comprehensive list of TF previously
published in humans
Gene expression intensity signals were subjected to a series of data acquisition
criteria based on signal to noise ratio and mean to median correlation as
detailed previously
Data normalization was carried out using a linear mixed ANOVA model as described
in
For the random effects in model (1), standard stochastic assumptions are:
To determine which genes are DE between the two crosses, the following
t-statistic was computed for each gene in
Finally, the DE measurement contrasts in (2) were processed by fitting a
two-component normal mixture model and posterior probabilities of belonging to
the non-null component were used to identify DE genes with an estimated
experiment-wise false discovery rate of <1% as described by
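The general idea of calling DE genes from a two-component normal mixture can be sketched as below. This is a simplified stand-in, not the published procedure: it assumes both components are centred on zero (a narrow null and a wide non-null), fits them by a basic EM loop, and estimates the false discovery rate of a selected set as the mean of one minus the posterior. All function names and the toy contrasts are hypothetical.

```python
import numpy as np

def normal_pdf(x, sd):
    """Density of a zero-mean normal with standard deviation sd."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def fit_two_component_mixture(d, n_iter=200):
    """EM for a zero-mean mixture: narrow null plus wide non-null component.

    Returns (pi1, sd0, sd1, posterior probability of the non-null component).
    """
    d = np.asarray(d, dtype=float)
    pi1, sd0, sd1 = 0.1, 0.5 * d.std(), 2.0 * d.std()
    for _ in range(n_iter):
        p0 = (1.0 - pi1) * normal_pdf(d, sd0)
        p1 = pi1 * normal_pdf(d, sd1)
        post = p1 / (p0 + p1)                        # E-step
        pi1 = post.mean()                            # M-step
        sd0 = np.sqrt(np.sum((1.0 - post) * d ** 2) / np.sum(1.0 - post))
        sd1 = np.sqrt(np.sum(post * d ** 2) / np.sum(post))
    p0 = (1.0 - pi1) * normal_pdf(d, sd0)
    p1 = pi1 * normal_pdf(d, sd1)
    return pi1, sd0, sd1, p1 / (p0 + p1)

def select_de(post, fdr=0.01):
    """Largest top-ranked set whose mean (1 - posterior) stays below fdr."""
    order = np.argsort(-post)
    running_fdr = np.cumsum(1.0 - post[order]) / np.arange(1, post.size + 1)
    return order[:int(np.sum(running_fdr < fdr))]

# Toy contrasts (hypothetical): 900 null genes followed by 100 non-null genes.
rng = np.random.default_rng(1)
d = np.concatenate([rng.normal(0.0, 0.1, 900), rng.normal(0.0, 1.0, 100)])
pi1, sd0, sd1, post = fit_two_component_mixture(d)
selected = select_de(post)
print(len(selected), round(float(pi1), 2))
```

Ranking by posterior probability and cutting where the running average of (1 − posterior) crosses the target gives a set-level error estimate, rather than a per-gene significance test.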
We introduce the term differential wiring (DW) which, defined for every pair of genes, is computed from the difference between the co-expression correlation observed between these two genes in the Piedmontese network minus the co-expression correlation between the same pair of genes in the Wagyu network.
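In algebraic terms, DW is computed as follows:
$$\mathrm{DW}(i,j) = r_P(i,j) - r_W(i,j)$$
where $r_P(i,j)$ and $r_W(i,j)$ are the co-expression correlations between genes $i$ and $j$ in the Piedmontese and Wagyu networks, respectively.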
For every regulator in our dataset, we introduce a new term, namely Regulatory
Impact Factor (RIF) which simultaneously combines the DW between the TF and each
of the DE genes, weighted for the PIF of the DE genes, i.e., their expression
averaged across the two crosses (denoted as A
In algebraical terms, the RIF associated with the
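PIF is implicit in the Equation 4 representation of RIF and is defined as the
product of the average and the differential expression of a gene, computed as follows:
$$\mathrm{PIF}_j = A_j \times \mathrm{DE}_j$$
where $A_j$ is the expression of gene $j$ averaged across the two crosses and $\mathrm{DE}_j$ is its differential expression between the crosses.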
Differential hubbing was calculated in two ways. Firstly, by subtracting the
number of significant connections a gene has in Wagyu from the number of
significant connections it has in Piedmontese where significance was established
using the PCIT algorithm
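The first of these calculations can be sketched as a difference in connection counts. In this Python illustration a hard correlation cutoff stands in for the PCIT significance test, which is not reproduced here; the cutoff value, function name and toy correlation matrices are assumptions.

```python
import numpy as np

def differential_hubbing(corr_p, corr_w, threshold=0.9):
    """DH per gene: significant connections in Piedmontese minus Wagyu.

    Assumption: |r| >= threshold marks a significant connection
    (a stand-in for the PCIT test used in the study).
    """
    def degree(corr):
        sig = np.abs(corr) >= threshold
        np.fill_diagonal(sig, False)       # ignore self-correlation
        return sig.sum(axis=1)
    return degree(corr_p) - degree(corr_w)

# Toy correlation matrices (hypothetical) for three genes.
corr_p = np.array([[1.00, 0.95, 0.92],
                   [0.95, 1.00, 0.10],
                   [0.92, 0.10, 1.00]])
corr_w = np.array([[1.00, 0.95, 0.20],
                   [0.95, 1.00, 0.10],
                   [0.20, 0.10, 1.00]])
dh = differential_hubbing(corr_p, corr_w)
print(dh.tolist())
```

Under a cutoff of this kind, a connection that moves from +0.761 to −0.342 contributes nothing to DH if neither value passes the cutoff, which is the limitation of DH relative to DW discussed in the text.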
The DE and DiffK for all 11,057 genes. Myostatin is not DiffK.
(0.28 MB TIF)
The normalised mean expression for the 11,057 genes across the ten developmental time points for the two breed crosses.
(4.30 MB XLS)
The list of the transcriptional regulators (column 1) with their DE (column 2) and their combined, averaged RIF scores from Eq4 and Eq5 (column 3).
(0.09 MB XLS)
The differential wiring arrangements (Piedmontese coexpression minus Wagyu coexpression) for the 920 regulators versus the 85 DE genes.
(1.39 MB XLS)
We wish to acknowledge everyone involved in the development of the Grafton bovine development dataset in the Co-operative Research Centre for Cattle and Beef Quality, in particular Sigrid Lehnert, YongHong Wang, Paul Greenwood, and Hutton Oddy.