
Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models

Abstract

Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene–gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.

Introduction

Modern high-throughput techniques allow for the quantitative analysis of various components of the cell. This ability opens the door to analyzing and understanding complex interaction patterns of cellular regulation, organization, and evolution. In the last few years, undirected pairwise maximum-entropy probability models have been introduced to analyze biological data and have performed well, disentangling direct interactions from artifacts introduced by intermediates or spurious coupling effects. Their performance has been studied for diverse problems, such as gene network inference [1,2], analysis of neural populations [3,4], protein contact prediction [5–8], analysis of a text corpus [9], modeling of animal flocks [10], and prediction of multidrug effects [11]. Statistical inference methods using partial correlations in the context of graphical Gaussian models (GGMs) have led to similar results and provide a more intuitive understanding of direct versus indirect interactions by employing the concept of conditional independence [12,13].

Our goal here is to derive a unified framework for pairwise maximum-entropy probability models for continuous and categorical variables and to discuss some of the recent inference approaches presented in the field of protein contact prediction. The structure of the manuscript is as follows: (1) introduction and statement of the problem, (2) deriving the probabilistic model, (3) inference of interactions, (4) scoring functions for the pairwise interaction strengths, and (5) discussion of results, improvements and applications.

Better knowledge of these methods, along with links to existing implementations in terms of software packages, may be helpful to improve the quality of biological data analysis compared to standard correlation-based methods and increase our ability to make predictions of interactions that define the properties of a biological system. In the following, we highlight the power of inference methods based on the maximum-entropy assumption using two examples of biological problems: inferring networks from gene expression data and residue contacts in proteins from multiple sequence alignments. We compare solutions obtained using (1) correlation-based inference and (2) inference based on pairwise maximum-entropy probability models (or their incarnation in the continuous case, the multivariate Gaussian distribution).

Gene association networks

Pairwise associations between genes and proteins can be determined from a variety of data types, such as gene expression or protein abundance. Association between entities in these data types is commonly estimated by the sample Pearson correlation coefficient computed for each pair of variables $x_i$ and $x_j$ from the set of random variables $x_1,\dots,x_L$. In particular, for $M$ given samples in $L$ measured variables, $\{x^1,\dots,x^M\}\subset\mathbb{R}^L$, it is defined as

$r_{ij} = \frac{\hat{C}_{ij}}{\sqrt{\hat{C}_{ii}\,\hat{C}_{jj}}},$

where $\hat{C}_{ij} = \langle x_i x_j \rangle_M - \langle x_i \rangle_M \langle x_j \rangle_M$ denotes the $(i,j)$-element of the empirical covariance matrix $\hat{C}$. The sample mean operator $\langle\cdot\rangle_M$ provides the empirical mean from the measured data and is defined as $\langle x_i \rangle_M := \frac{1}{M}\sum_{m=1}^{M} x_i^m$. A simple way to characterize dependencies in data is to classify two variables as being dependent if the absolute value of their correlation coefficient is above a certain threshold (and independent otherwise) and then use those pairs to draw a so-called relevance network [14]. However, the Pearson correlation is a misleading measure for direct dependence as it only reflects the association between two variables while ignoring the influence of the remaining ones. Therefore, the relevance network approach is not suitable to deduce direct interactions from a dataset [15–18]. The partial correlation between two variables removes the variational effect due to the influence of the remaining variables (Cramér [19], p. 306). To illustrate this, let us take a simplified example with three random variables $x_A$, $x_B$, $x_C$. Without loss of generality, we can scale each of these variables to zero mean and unit standard deviation by $x_i \mapsto (x_i - \langle x_i \rangle_M)/\sqrt{\hat{C}_{ii}}$, which simplifies the correlation coefficient to $r_{ij} = \langle x_i x_j \rangle_M$. The sample partial correlation coefficient of a three-variable system between $x_A$ and $x_B$ given $x_C$ is then defined as [19,20]

$r_{AB\cdot C} = \frac{r_{AB} - r_{AC}\,r_{BC}}{\sqrt{\left(1-r_{AC}^2\right)\left(1-r_{BC}^2\right)}} = -\frac{\left(\hat{C}^{-1}\right)_{AB}}{\sqrt{\left(\hat{C}^{-1}\right)_{AA}\left(\hat{C}^{-1}\right)_{BB}}}.$

The latter equivalence by Cramér's rule holds if the empirical covariance matrix, $\hat{C}$, is invertible. Krumsiek et al. [21] studied the Pearson correlations and partial correlations in data generated by an in silico reaction system consisting of three components A, B, C with reactions between A and B, and B and C (Fig 1A). A graphical comparison of Pearson's correlations, $r_{AB}$, $r_{AC}$, $r_{BC}$, versus the corresponding partial correlations, $r_{AB\cdot C}$, $r_{AC\cdot B}$, $r_{BC\cdot A}$, shows that variables A and C appear to be correlated when using Pearson's correlation as a dependency measure since both are highly correlated with variable B, which results in a falsely inferred reaction $r_{AC}$. The strength of the incorrectly inferred interaction can be numerically large and therefore particularly misleading if there are multiple intermediate variables B [22]. The partial correlation analysis removes the effect of the mediating variable(s) B and correctly recovers the underlying interaction structure. This is always true for variables following a multivariate Gaussian distribution, but it also seems to work empirically on realistic systems, as Krumsiek et al. [21] have shown for more complex reaction structures than the example presented here.
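To make the distinction concrete, the following minimal sketch contrasts the two measures on simulated data; it assumes only NumPy, and the chain topology (A–B, B–C) as well as all names are illustrative choices, not the reaction system of [21]. It uses the identity $r_{ij\cdot\mathrm{rest}} = -(C^{-1})_{ij}/\sqrt{(C^{-1})_{ii}(C^{-1})_{jj}}$ quoted above.

```python
import numpy as np

def partial_correlations(X):
    """Partial correlations r_ij = -(C^-1)_ij / sqrt((C^-1)_ii (C^-1)_jj).

    Assumes the empirical covariance of X (M samples x L variables) is
    invertible, i.e., M > L and no exactly collinear variables.
    """
    P = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R

# Toy chain A - B - C: A and C interact only through B.
rng = np.random.default_rng(0)
b = rng.normal(size=10000)
a = b + 0.5 * rng.normal(size=10000)
c = b + 0.5 * rng.normal(size=10000)
X = np.column_stack([a, b, c])

print(np.corrcoef(X, rowvar=False).round(2))   # Pearson: A-C strongly coupled
print(partial_correlations(X).round(2))        # partial: A-C approximately 0
```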

Fig 1. Reaction system reconstruction and protein contact prediction.

Association results of correlation-based and maximum-entropy methods on biological data from an in silico reaction system (A) and protein contacts (B). (A) Analysis by Pearson's correlation yields interactions associating all three compounds A, B, and C, in contrast to the partial correlation approach, which omits the "false" link between A and C. (Fig 1A based on [21].) (B) Protein contact prediction for the human RAS protein using the correlation-based mutual information, MI, and the maximum-entropy based direct information, DI (blue and red, respectively). The 150 highest-scoring contacts from both methods are plotted on top of the contacts from the experimentally determined structure (gray). (Fig 1B based on [6].)

https://doi.org/10.1371/journal.pcbi.1004182.g001

Protein contact prediction

The idea that protein contacts can be extracted from the evolutionary family record was formulated and tested some time ago [23–26]. The principle used here is that slightly deleterious mutations are compensated during evolution by mutations of residues in contact, in order to maintain the function and, by implication, the shape of the protein. Protein residues that are close in space in the folded protein are therefore often mutated in a correlated manner. The main problem is that one has to disentangle the directly co-evolving residues, and remove transitive correlations, from the large number of other co-variations in protein sequences that arise due to statistical noise or phylogenetic sampling bias in the sequence family. Interactions not internal to the protein are, for example, evolutionary constraints on residues involved in oligomerization and protein–protein or protein–substrate interactions [6,27,28]. In particular, the empirical single-site and pair frequency counts in residue i and in residues i and j for elements σ, ω of the 20-element amino acid alphabet plus gap, $f_i(\sigma)$ and $f_{ij}(\sigma,\omega)$, are extracted from a representative multiple sequence alignment with reweighting applied to account for biases due to undersampling. Correlated evolution at these positions was analyzed, e.g., by [29], using the mutual information between residues i and j,

$MI_{ij} = \sum_{\sigma,\omega\in\Omega} f_{ij}(\sigma,\omega)\,\ln\frac{f_{ij}(\sigma,\omega)}{f_i(\sigma)\,f_j(\omega)}.$

Although these results showed promise, an important improvement was made years later by applying a maximum-entropy approach to the same setup [5–7,30]. In this framework, the direct information of residues i and j was introduced by replacing $f_{ij}$ in the mutual information by $P^{dir}_{ij}$,

$DI_{ij} = \sum_{\sigma,\omega\in\Omega} P^{dir}_{ij}(\sigma,\omega)\,\ln\frac{P^{dir}_{ij}(\sigma,\omega)}{f_i(\sigma)\,f_j(\omega)}, \quad (1)$

where $P^{dir}_{ij}(\sigma,\omega) = \frac{1}{Z_{ij}}\exp\left(e_{ij}(\sigma,\omega) + \tilde{h}_i(\sigma) + \tilde{h}_j(\omega)\right)$, and $\tilde{h}_i$, $\tilde{h}_j$, and $Z_{ij}$ are chosen such that $P^{dir}_{ij}$, which is based on a pairwise probability model of an amino acid sequence compatible with the iso-structural sequence family, is consistent with the single-site frequency counts. In an approximative solution, [6,7] determined the contact strength between the amino acids σ and ω in positions i and j, respectively, by

$e_{ij}(\sigma,\omega) = -\left(C^{-1}\right)_{ij}(\sigma,\omega). \quad (2)$

Here, $\left(C^{-1}\right)_{ij}(\sigma,\omega)$ denotes the element of the inverse corresponding to $C_{ij}(\sigma,\omega) \equiv f_{ij}(\sigma,\omega) - f_i(\sigma)\,f_j(\omega)$ for amino acids σ, ω from a subset of 20 out of the 21 different states (the so-called gauge fixing, see below). The comparison of contact predictions based on the MI- and DI-scores for the human RAS protein on top of the actual crystal structure shows a much more accurate prediction when using the direct information instead of the mutual information (Fig 1B).
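As a rough illustration of the correlation-based baseline, the sketch below computes the MI score from an integer-encoded alignment; it is a minimal implementation assuming a NumPy array `msa` of shape (M, L), without the reweighting or corrections used in practice.

```python
import numpy as np

def mutual_information(msa, q=21, pseudocount=1.0):
    """MI score for every column pair of an integer-encoded alignment.

    msa: NumPy array of shape (M, L) with values in 0..q-1. The small
    pseudocount keeps the logarithm finite for never-observed pairs.
    """
    M, L = msa.shape
    MI = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            fij = np.full((q, q), pseudocount / q**2)
            np.add.at(fij, (msa[:, i], msa[:, j]), 1.0)
            fij /= fij.sum()                           # joint frequencies f_ij
            fi, fj = fij.sum(axis=1), fij.sum(axis=0)  # marginals f_i, f_j
            MI[i, j] = MI[j, i] = (fij * np.log(fij / np.outer(fi, fj))).sum()
    return MI
```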

The next section lays the foundation to deriving maximum-entropy models for the two data types: continuous, as used in the first example, and categorical, as used in the second one. Subsequently, we will present inference techniques to solve for their interaction parameters.

Deriving the Probabilistic Model

Ideally, one would like to use a probabilistic model that is, on the one hand, able to capture interactions of all orders among the observables at play and, on the other hand, correctly reproduces the observed and to-be-predicted frequencies. However, fitting such a model would require a prohibitively large number of observed data points. For this reason, we restrict ourselves to probabilistic models with terms up to second order, which we derive first for continuous, real-valued variables; in the subsequent section, we extend this framework to models with categorical variables that are suitable, for example, for treating sequence information.

Model formulation for continuous random variables

We model the occurrence of sets of events in a particular biological system by a multivariate probability distribution P(x) of L random variables $x = (x_1,\dots,x_L)^T\in\mathbb{R}^L$ that is, on the one hand, consistent with the mean and covariance obtained from M observed data vectors $x^1,\dots,x^M$ and, on the other hand, maximizes the information entropy, S, to obtain the simplest possible probability model consistent with the data. At this point, each of the data's variables $x_i$ is continuously distributed on the real values. In a biological example, these data originate from gene expression studies, and each variable $x_i$ corresponds to the normalized mRNA level of a gene measured in M samples. As an example, a recent pan-cancer study of The Cancer Genome Atlas (TCGA) provided mRNA levels from M = 3,299 patient tumor samples from 12 cancer types [31]. The problem can be large; e.g., in the case of a gene–gene association study, one has L ≈ 20,000 human genes.

The first constraint on the unknown probability distribution, $P:\mathbb{R}^L\to\mathbb{R}_{\geq 0}$, is that its integral normalizes to 1,

$\int_{\mathbb{R}^L} P(x)\,dx = 1, \quad (3)$

which is a natural requirement on any probability distribution. Additionally, the first moment of variable $x_i$ is supposed to match the value of the corresponding sample mean over M measurements in each i = 1,…, L,

$\langle x_i \rangle = \langle x_i \rangle_M, \quad (4)$

where we define the n-th moment of the random variable $x_i$ distributed by the multivariate probability distribution P as $\langle x_i^n \rangle := \int_{\mathbb{R}^L} x_i^n\, P(x)\,dx$. Analogously, the second moment of the variables $x_i$ and $x_j$ and its corresponding empirical expectation are supposed to be equal,

$\langle x_i x_j \rangle = \langle x_i x_j \rangle_M, \quad (5)$

for i, j = 1,…, L. Taken together, Eqs 4 and 5 constrain the distribution's covariance matrix to be coherent with the empirical covariance matrix. Finally, the probability distribution should maximize the information entropy,

$S[P] = -\int_{\mathbb{R}^L} P(x)\,\ln P(x)\,dx, \quad (6)$

with the natural logarithm ln. A well-known analytical strategy to find functional extrema under equality constraints is the method of Lagrange multipliers [32], which converts a constrained optimization problem into an unconstrained one by means of the Lagrangian $\mathcal{L}$. In our case, the probability distribution maximizing the entropy (Eq 6) subject to Eqs 3–5 is found as the stationary point of the Lagrangian [33,34],

$\mathcal{L}[P] = S[P] + \alpha\left(\int_{\mathbb{R}^L} P(x)\,dx - 1\right) + \sum_{i=1}^{L}\beta_i\left(\langle x_i\rangle - \langle x_i\rangle_M\right) + \sum_{i,j=1}^{L}\gamma_{ij}\left(\langle x_i x_j\rangle - \langle x_i x_j\rangle_M\right). \quad (7)$

The real-valued Lagrange multipliers α, $\beta = (\beta_i)_{i=1,\dots,L}$, and $\gamma = (\gamma_{ij})_{i,j=1,\dots,L}$ correspond to the constraints Eqs 3, 4, and 5, respectively. The maximizing probability distribution is then found by setting the functional derivative of $\mathcal{L}$ with respect to the unknown density P(x) to zero [33,35],

$\frac{\delta\mathcal{L}}{\delta P(x)} = -\ln P(x) - 1 + \alpha + \sum_{i=1}^{L}\beta_i x_i + \sum_{i,j=1}^{L}\gamma_{ij}\,x_i x_j = 0.$

Its solution is the pairwise maximum-entropy probability distribution,

$P(x) = \exp\left(\alpha - 1 + \sum_{i=1}^{L}\beta_i x_i + \sum_{i,j=1}^{L}\gamma_{ij}\,x_i x_j\right) = \frac{1}{Z}\,e^{-\mathcal{H}(x)}, \quad (8)$

which is contained in the family of exponential probability distributions and assigns a non-negative probability to any system configuration $x = (x_1,\dots,x_L)^T\in\mathbb{R}^L$. For the second identity, we introduced the partition function $Z \equiv \exp(1-\alpha)$ as normalization constant, with the Hamiltonian $\mathcal{H}(x) := -\sum_{i=1}^{L}\beta_i x_i - \sum_{i,j=1}^{L}\gamma_{ij}\,x_i x_j$. It can be shown by means of the information inequality that Eq 8 is the unique maximum-entropy distribution satisfying the constraints Eqs 3–5 (Cover and Thomas [35], p. 410). Note that α is fully determined for given $\beta = (\beta_i)$ and $\gamma = (\gamma_{ij})$ by the normalization constraint Eq 3 and is therefore not a free parameter. The right-hand representation of Eq 8 is also referred to as a Boltzmann distribution. The matrix of Lagrange multipliers $\gamma = (\gamma_{ij})$ has to have full rank in order to ensure a unique parametrization of P(x); otherwise, one can eliminate dependent constraints [33,36]. In addition, for the integrals in Eqs 3–6 to converge with respect to the L-dimensional Lebesgue measure, we require γ to be negative definite, i.e., all of its eigenvalues to be negative or, equivalently, $x^T\gamma\,x < 0$ for all $x \neq 0$.

Concept of entropy maximization

Shannon states in his seminal work that information and (information) entropy are linked: the more information is encoded in the system, the lower its entropy [37]. Jaynes introduced the entropy maximization principle, which selects the probability distribution that (1) agrees with the measured constraints and (2) contains the least information beyond them [38–40]. In particular, any unnecessary information would lower the entropy and, thus, introduce biases and allow overfitting. As demonstrated in the section above, the assumption of entropy maximization under first and second moment constraints results in an exponential model or Markov random field (in log-linear form), and many of the properties shown here can be generalized to this model class [41]. On the other hand, there is some analogy between entropy as introduced by Shannon and the thermodynamic notion of entropy. Here, the second law of thermodynamics states that each isolated system monotonically evolves in time towards a state of maximum entropy, the equilibrium. A thorough discussion of this analogy and its limitations in non-equilibrium systems is beyond the scope of this review but can be found in [42,43]. Here, we exclusively use the notion of entropy maximization as the principle of minimal information content in the probability model consistent with the data.

Categorical random variables

In the following section, we derive the pairwise maximum-entropy probability distribution for categorical variables. For jointly distributed categorical variables $x = (x_1,\dots,x_L)^T\in\Omega^L$, each variable $x_i$ is defined on the finite set $\Omega = \{\sigma_1,\dots,\sigma_q\}$ consisting of q elements. In the concrete example of modeling protein co-evolution, this set contains the 20 amino acids represented by a 20-letter alphabet, from A standing for alanine to Y for tyrosine, plus one gap element; then Ω = {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, −} and q = 21. Our goal is to extract co-evolving residue pairs from the evolutionary record of a given protein family. As input data, we use a so-called multiple sequence alignment, $\{x^1,\dots,x^M\}\subset\Omega^L$, a collection of closely homologous protein sequences formatted such that it allows comparison of evolution across each residue position [44]. These alignments may stem from different hidden Markov model-derived resources, such as PFAM [45], hhblits [46], and Jackhmmer [47].

To formalize the derivation of the pairwise maximum-entropy probability distribution on categorical variables, we use the approach of [8,30,48] and replace, as depicted in Fig 2, each categorical variable $x_i$ by the indicator functions of the amino acids σ ∈ Ω, $\mathbb{1}:\Omega\to\{0,1\}^q$,

$x_i \mapsto x_i(\sigma) := \left(\mathbb{1}_{\sigma_1}(x_i),\dots,\mathbb{1}_{\sigma_q}(x_i)\right)^T, \qquad \mathbb{1}_\sigma(x_i) = \begin{cases}1 & \text{if } x_i = \sigma,\\ 0 & \text{otherwise.}\end{cases}$

Fig 2. Illustration of binary embedding.

The binary embedding $\mathbb{1}:\Omega^L\to\{0,1\}^{Lq}$ maps each vector of categorical random variables, $x\in\Omega^L$, here represented by a sequence of amino acids from the amino acid alphabet (containing the 20 amino acids and one gap element), Ω = {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, −}, onto a unique binary representation, $x(\sigma)\in\{0,1\}^{Lq}$.

https://doi.org/10.1371/journal.pcbi.1004182.g002

This embedding specifies a unique representation of any L-vector of categorical random variables, x, as a binary Lq-vector, $x(\sigma) = \left(x_1(\sigma)^T,\dots,x_L(\sigma)^T\right)^T\in\{0,1\}^{Lq}$, with a single non-zero entry in each binary q-subvector $x_i(\sigma) = (x_i(\sigma_1),\dots,x_i(\sigma_q))^T\in\{0,1\}^q$,

$\sum_{k=1}^{q} x_i(\sigma_k) = 1 \quad \text{for each } i = 1,\dots,L.$

Inserting this embedding into the first and second moment constraints, corresponding to Eqs 4 and 5 in the continuous-variable case, we find their embedded analogues, the single and pairwise marginal probabilities at positions i and j for amino acids σ, ω ∈ Ω,

$P_i(\sigma) = \langle x_i(\sigma) \rangle \quad \text{and} \quad P_{ij}(\sigma,\omega) = \langle x_i(\sigma)\,x_j(\omega) \rangle,$

including $P_{ii}(\sigma,\omega) = P_i(\sigma)\,\mathbb{1}_\sigma(\omega)$, with the distribution's first moment in each random variable, $\langle y_k \rangle$, taken over the embedded vectors $y = (y_1,\dots,y_{Lq})^T = x(\sigma)$. The analogue of the covariance matrix then becomes a symmetric Lq × Lq matrix of connected correlations whose entries $C_{ij}(\sigma,\omega) = P_{ij}(\sigma,\omega) - P_i(\sigma)\,P_j(\omega)$ characterize the dependencies between pairs of variables. In the same way, the sample means translate to the single-site and pair frequency counts over the m = 1,…, M data vectors $x^m$,

$f_i(\sigma) = \frac{1}{M}\sum_{m=1}^{M}\mathbb{1}_\sigma\left(x_i^m\right) \quad \text{and} \quad f_{ij}(\sigma,\omega) = \frac{1}{M}\sum_{m=1}^{M}\mathbb{1}_\sigma\left(x_i^m\right)\mathbb{1}_\omega\left(x_j^m\right).$
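A minimal sketch of this embedding and of the resulting frequency counts, assuming an integer-encoded alignment in NumPy (the toy alignment and q = 3 are illustrative only):

```python
import numpy as np

def one_hot(msa, q):
    """Binary embedding: (M, L) integer alignment -> (M, L*q) matrix of
    0/1 entries with exactly one 1 per length-q subvector."""
    M, L = msa.shape
    X = np.zeros((M, L * q))
    X[np.arange(M)[:, None], np.arange(L) * q + msa] = 1.0
    return X

msa = np.array([[0, 2, 1],     # two toy "sequences", L = 3 positions,
                [0, 1, 1]])    # q = 3 states instead of 21
X = one_hot(msa, q=3)
f = X.mean(axis=0)                       # single-site counts f_i(sigma)
C = X.T @ X / len(X) - np.outer(f, f)    # connected correlations C_ij(sigma, omega)
```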

The pairwise maximum-entropy probability distribution in categorical variables has to fulfill the normalization constraint,

$\sum_{x\in\Omega^L} P(x) = 1. \quad (9)$

Furthermore, the single and pair constraints, the analogues of Eqs 4 and 5, enforce the resulting probability distribution to be compatible with the measured single and pair frequency counts,

$P_i(\sigma) = f_i(\sigma) \quad \text{and} \quad P_{ij}(\sigma,\omega) = f_{ij}(\sigma,\omega), \quad (10)$

for each i, j = 1,…, L and amino acids σ, ω ∈ Ω. As before, we require the probability distribution to maximize the information entropy,

$S[P] = -\sum_{x\in\Omega^L} P(x)\,\ln P(x). \quad (11)$

The corresponding Lagrangian, $\mathcal{L}[P]$, has the functional form

$\mathcal{L}[P] = S[P] + \alpha\left(\sum_{x\in\Omega^L} P(x) - 1\right) + \sum_{i=1}^{L}\sum_{\sigma\in\Omega}\beta_i(\sigma)\left(\langle x_i(\sigma)\rangle - f_i(\sigma)\right) + \sum_{i,j=1}^{L}\sum_{\sigma,\omega\in\Omega}\gamma_{ij}(\sigma,\omega)\left(\langle x_i(\sigma)\,x_j(\omega)\rangle - f_{ij}(\sigma,\omega)\right).$

For notational convenience, the Lagrange multipliers $\beta_i(\sigma)$ and $\gamma_{ij}(\sigma,\omega)$ are grouped into the Lq-vector $\beta = (\beta_i(\sigma))$ and the Lq × Lq matrix $\gamma = (\gamma_{ij}(\sigma,\omega))$, respectively. The Lagrangian's stationary point, found as the solution of $\delta\mathcal{L}/\delta P = 0$, determines the pairwise maximum-entropy probability distribution in categorical variables [30,49],

$P(x(\sigma)) = \frac{1}{Z}\exp\left(\sum_{i=1}^{L}\sum_{\sigma\in\Omega}\beta_i(\sigma)\,x_i(\sigma) + \sum_{i,j=1}^{L}\sum_{\sigma,\omega\in\Omega}\gamma_{ij}(\sigma,\omega)\,x_i(\sigma)\,x_j(\omega)\right), \quad (12)$

with normalization by the partition function, Z ≡ exp(1−α). Note that distribution Eq 12 is of the same functional form as Eq 8 but with binary random variables $x(\sigma)\in\{0,1\}^{Lq}$ instead of continuous ones $x\in\mathbb{R}^L$. At this point, we introduce the reduced parameter set, $h_i(\sigma) := \beta_i(\sigma) + \gamma_{ii}(\sigma,\sigma)$ and $e_{ij}(\sigma,\omega) := 2\gamma_{ij}(\sigma,\omega)$ for i < j, using the symmetry of the Lagrange multipliers, $\gamma_{ij}(\sigma,\omega) = \gamma_{ji}(\omega,\sigma)$, and the fact that $x_i(\sigma)\,x_i(\omega) = 1$ if and only if σ = ω. For a given sequence $(z_1,\dots,z_L)\in\Omega^L$, summing over all non-zero elements, $(x_1(z_1) = 1,\dots,x_L(z_L) = 1)$ or, equivalently, $(x_1 = z_1,\dots,x_L = z_L)$, then yields the probability assigned to the sequence of interest,

$P(x_1 = z_1,\dots,x_L = z_L) = \frac{1}{Z}\exp\left(\sum_{1\leq i<j\leq L} e_{ij}(z_i,z_j) + \sum_{i=1}^{L} h_i(z_i)\right). \quad (13)$

This is the 21-state maximum-entropy probability distribution as presented by [5–7].

Gauge fixing

In contrast to the continuous-variable case, in which the number of constraints naturally matches the number of unknown parameters, the case of categorical variables has dependencies due to $\sum_{\sigma\in\Omega} f_i(\sigma) = 1$ for each i = 1,…, L and $\sum_{\omega\in\Omega} f_{ij}(\sigma,\omega) = f_i(\sigma)$ for each i, j = 1,…, L and σ ∈ Ω. This results in at most $L(q-1) + \binom{L}{2}(q-1)^2$ independent constraints compared to $Lq + \binom{L}{2}q^2$ free parameters to be estimated. To ensure the uniqueness of the inferred parameters in defining the Hamiltonian, $\mathcal{H}$, and, by implication, the probability distribution, one has to reduce the number of independent parameters such that these match the number of independent constraints. For this purpose, so-called gauge fixing [5] has been proposed, which can be realized in different ways. For example, the authors of [6,7] set the parameters corresponding to the last amino acid in the alphabet, σq, to zero, i.e., $e_{ij}(\sigma_q,\cdot) = e_{ij}(\cdot,\sigma_q) = 0$ and $h_i(\sigma_q) = 0$ for 1 ≤ i < j ≤ L, resulting in rows and columns of zeros at the end of each q × q block of the Lq × Lq coupling matrix. Alternatively, the authors of [5] introduce a zero-sum gauge, $\sum_{\sigma\in\Omega} e_{ij}(\sigma,\omega) = \sum_{\sigma\in\Omega} e_{ij}(\omega,\sigma) = 0$ and $\sum_{\sigma\in\Omega} h_i(\sigma) = 0$, for each 1 ≤ i < j ≤ L and ω ∈ Ω. However, different gauge fixings are not equally efficient for the purpose of protein contact prediction. The zero-sum gauge is the parameter fixing that minimizes the sum of squares of the pairwise parameters in the Hamiltonian ℋ, $\sum_{i<j}\sum_{\sigma,\omega\in\Omega} e_{ij}(\sigma,\omega)^2$, which makes it the suitable choice when using non-gauge-invariant scoring functions, such as the (average product-corrected) Frobenius norm [5,50] (see section "Scoring Functions"). Moreover, no gauge fixing is required when combining the strictly convex ℓ1- or ℓ2-regularizer with negative loglikelihood minimization; here the regularizer selects a unique representation among all parametrizations of the optimal distribution [32,51]. However, to additionally minimize the Frobenius norm of the pairwise interactions, [51] shifted the full parameter set obtained from regularized inference with plmDCA into the zero-sum gauge by

$e'_{ij}(\sigma,\omega) = e_{ij}(\sigma,\omega) - e_{ij}(\cdot,\omega) - e_{ij}(\sigma,\cdot) + e_{ij}(\cdot,\cdot),$

where · denotes the average over the q letters of the alphabet.
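The zero-sum shift itself is a one-line centering operation; the sketch below applies it to a single q × q coupling block and checks the resulting row and column sums (function name and test data are our own):

```python
import numpy as np

def zero_sum_gauge(e_block):
    """Shift one q x q coupling block into the zero-sum gauge by row/column
    centering; among all equivalent parametrizations this minimizes the
    block's Frobenius norm."""
    return (e_block
            - e_block.mean(axis=1, keepdims=True)   # e_ij(sigma, .)
            - e_block.mean(axis=0, keepdims=True)   # e_ij(., omega)
            + e_block.mean())                       # e_ij(., .)

e = np.random.default_rng(0).normal(size=(21, 21))
e_zs = zero_sum_gauge(e)
print(np.allclose(e_zs.sum(axis=0), 0),
      np.allclose(e_zs.sum(axis=1), 0))            # True True
```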

Network interpretation

The derived pairwise maximum-entropy distributions in Eqs 13 or 12 and 8 specify an undirected graphical model or Markov random field [34,41]. In particular, a graphical model represents a probability distribution in terms of a graph that consists of a node and an edge set. Edges characterize the dependence structure between nodes and a missing edge then corresponds to conditional independence given the remaining random variables. For continuous, real-valued variables, the maximum-entropy distribution with first and second moment constraints is multivariate Gaussian, which will be demonstrated in the next section. Its dependency structure is represented by a graphical Gaussian model (GGM) in which a missing edge, γij = 0, corresponds to conditional independence between the random variables xi and xj (given the remaining ones), and is further specified by a zero entry in the corresponding inverse covariance matrix, (C−1)ij = 0.

In the next section, we describe how the dependency structure of the graph is inferred.

Inference of Interactions

Up to this point, the functional form of the maximum-entropy probability distribution is specified, but not its determining parameters. For categorical variables with dimension L > 1, there is typically no closed-form solution. In the following section, we present several inference methods for estimating these parameters that have recently been used in the context of protein contact prediction. These are (1) for continuous variables, the exact closed-form solution, whose application to binarized categorical variables yields the mean-field approximation; and (2) three inference methods for categorical variables based on the maximum-likelihood methodology: stochastic maximum likelihood, the approximation by pseudo-likelihood maximization, and, finally, the sparse maximum-likelihood solution.

Closed-Form Solution for Continuous Variables

The simplest approach to extract the unknown Lagrange multipliers α, $\beta = (\beta_i)$, and $\gamma = (\gamma_{ij})$ from P(x) exactly is to use basic integration properties of the continuous random variables $x_i$ in the constraints Eqs 3–5. For this purpose, we rewrite the exponent of the pairwise maximum-entropy probability distribution Eq 8,

$\sum_{i=1}^{L}\beta_i x_i + \sum_{i,j=1}^{L}\gamma_{ij}\,x_i x_j = -\frac{1}{2}\left(x - \Gamma^{-1}\beta\right)^T\Gamma\left(x - \Gamma^{-1}\beta\right) + \frac{1}{2}\beta^T\Gamma^{-1}\beta,$

where we use the replacement $\Gamma := -2\gamma$ and require Γ to be positive definite (which is equivalent to γ being negative definite), i.e., $x^T\Gamma\,x > 0$ for any $x\neq 0$, which makes its inverse well defined. As already discussed, this is a sufficient condition for the integrals in Eqs 3–6 to be finite. For notational convenience, we define the shifted variable $z := x - \mu$ with $\mu := \Gamma^{-1}\beta$, and accordingly, the maximum-entropy probability distribution becomes

$P(z) = \frac{1}{Z'}\exp\left(-\frac{1}{2}z^T\Gamma z\right), \quad (14)$

with the normalization constant $Z'$. The normalization condition Eq 3 in the new variable is

$\frac{1}{Z'}\int_{\mathbb{R}^L}\exp\left(-\frac{1}{2}z^T\Gamma z\right)dz = 1, \quad (15)$

and the linear shift does not affect the integral when integrated over $\mathbb{R}^L$, yielding for the normalization constant $Z' = \sqrt{(2\pi)^L/\det\Gamma}$. Furthermore, the first-order constraint Eq 4 becomes, for each i = 1,…, L, $\langle x_i\rangle = \mu_i + \langle z_i\rangle$, and, using the point symmetry of the integrand, $\langle z_i\rangle = 0$; then $\langle x_i\rangle = \mu_i = \langle x_i\rangle_M$ in each i = 1,…, L. Analogously, we find for the second moment, determining the correlations for each index pair i, j = 1,…, L, $\langle x_i x_j\rangle = \langle z_i z_j\rangle + \mu_i\mu_j$, where we use again the point symmetry and the result on the normalization constraint. Based on this, the covariance is found as

$C_{ij} := \langle x_i x_j\rangle - \langle x_i\rangle\langle x_j\rangle = \langle z_i z_j\rangle.$

Finally, the term $\langle z_i z_j\rangle$ is solved using a spectral decomposition of the symmetric and positive-definite matrix Γ as a sum over products of its eigenvectors $v_1,\dots,v_L$ and real-valued, positive eigenvalues $\lambda_1,\dots,\lambda_L$,

$\Gamma = \sum_{k=1}^{L}\lambda_k\,v_k\,v_k^T.$

The eigenvectors form a basis of $\mathbb{R}^L$ and assign new coordinates, $y_1,\dots,y_L$, to $z = \sum_{k=1}^{L} y_k v_k$, which allows writing the exponent as $-\frac{1}{2}z^T\Gamma z = -\frac{1}{2}\sum_{k=1}^{L}\lambda_k y_k^2$. The covariance between $x_i$ and $x_j$ then reads as (Bishop [52], p. 83)

$C_{ij} = \langle z_i z_j\rangle = \sum_{k=1}^{L}\frac{1}{\lambda_k}\left(v_k\right)_i\left(v_k\right)_j = \left(\Gamma^{-1}\right)_{ij},$

with solution $C = \Gamma^{-1}$ or $\gamma = -\frac{1}{2}C^{-1}$. Taken together, the Lagrange multipliers β and γ are specified in terms of the mean, ⟨x⟩, and the inverse covariance matrix (also known as the precision or concentration matrix), $C^{-1}$,

$\beta = C^{-1}\langle x\rangle \quad \text{and} \quad \gamma = -\frac{1}{2}C^{-1}. \quad (16)$

As a consequence, the real-valued maximum-entropy distribution Eq 14 for given first and second moments is found to be the multivariate Gaussian distribution, determined by the mean ⟨x⟩ and the covariance matrix C,

$P(x) = \frac{1}{\sqrt{(2\pi)^L\det C}}\exp\left(-\frac{1}{2}\left(x - \langle x\rangle\right)^T C^{-1}\left(x - \langle x\rangle\right)\right), \quad (17)$

and we refer to [52] for the derivation of the normalization factor. The initial requirement that Γ be positive definite results in a positive-definite covariance matrix C, a necessary condition for the Gaussian density to be well defined. In summary, the multivariate Gaussian distribution maximizes the entropy among all probability distributions of continuous variables with specified first and second moments. The pair interaction strength is now evaluated by the already introduced partial correlation coefficient between $x_i$ and $x_j$ given the remaining variables $\{x_r\}_{r\in\{1,\dots,L\}\setminus\{i,j\}}$,

$r_{ij\cdot\{1,\dots,L\}\setminus\{i,j\}} = -\frac{\left(C^{-1}\right)_{ij}}{\sqrt{\left(C^{-1}\right)_{ii}\left(C^{-1}\right)_{jj}}}. \quad (18)$
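As a quick round-trip check of the parameter mapping in Eq 16, the following sketch recovers the mean and covariance of simulated data from the Lagrange multipliers β and γ (all numbers are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, -2.0, 0.5],
                            cov=[[2.0, 0.8, 0.1],
                                 [0.8, 1.0, 0.3],
                                 [0.1, 0.3, 1.5]], size=5000)
m = X.mean(axis=0)              # empirical first moments
C = np.cov(X, rowvar=False)     # empirical covariance

# Lagrange multipliers of the maximum-entropy model (Eq 16)
beta = np.linalg.inv(C) @ m
gamma = -0.5 * np.linalg.inv(C)

# Inverting the mapping reproduces the moment constraints exactly
Sigma = np.linalg.inv(-2.0 * gamma)                # equals C
mu = Sigma @ beta                                  # equals m
print(np.allclose(Sigma, C), np.allclose(mu, m))   # True True
```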

Data integration

In biological datasets as used to study gene association, the number of measurements, M, is typically smaller than the number of observables, L, i.e., M < L in our terminology. Consequently, the empirical covariance matrix, $\hat{C}$, will in these cases always be rank deficient (and, thus, not invertible) since its rank can exceed neither the number of variables, L, nor the number of measurements, M. Moreover, even in cases when M ≥ L, the empirical covariance matrix may become non-invertible or badly conditioned (i.e., close to singular) due to dependencies in the data. However, for variables following a multivariate Gaussian distribution, one can access the elements of its inverse by maximizing the penalized Gaussian loglikelihood, which results in the following estimate of the inverse covariance matrix, $\hat{\Theta}$,

$\hat{\Theta} = \underset{\Theta\succ 0}{\arg\max}\left[\ln\det\Theta - \operatorname{tr}\left(\hat{C}\,\Theta\right) - \lambda\sum_{i,j}\left|\Theta_{ij}\right|^\delta\right], \quad (19)$

with penalty parameter λ ≥ 0 and δ ∈ {1, 2}. If λ = 0, we obtain the maximum-likelihood estimate; for δ = 1 and λ > 0, the ℓ1-regularized (sparse) maximum-likelihood solution that selects for sparsity [53,54]; and for δ = 2 and λ > 0, the ℓ2-regularized maximum-likelihood solution that favors small absolute values in the entries of the selected inverse covariance matrix [55]. In the corresponding regression setting, the ℓ1 penalty is known as the LASSO and the ℓ2 penalty as ridge regression. Alternatively, regularization can be directly applied to the covariance matrix, e.g., by shrinkage [17,56].
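In practice, both regularization routes in Eq 19 are available in standard libraries; the sketch below, assuming scikit-learn is installed, shows the ℓ1-penalized estimate (graphical lasso) and a shrinkage alternative for an M < L dataset (all data and penalty values are illustrative):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso, LedoitWolf

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))   # M = 50 samples of L = 200 variables

# A plain inverse of np.cov(X, rowvar=False) fails here: rank < L.

# l1-penalized maximum likelihood (delta = 1 in Eq 19, "graphical lasso")
Theta_sparse = GraphicalLasso(alpha=0.2).fit(X).precision_

# Shrinkage alternative: regularize the covariance itself, then invert
Theta_shrunk = np.linalg.inv(LedoitWolf().fit(X).covariance_)
```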

Solution for categorical variables

An ad hoc ansatz to extract the pairwise parameters in the categorical-variable case (Eq 12) is to extend the binary variables to continuous ones, $y = (y_j)_j\in\mathbb{R}^{L(q-1)}$, and replace the sums in the distribution and the moments ⟨·⟩ by integrals. The extended binary maximum-entropy distribution Eq 12 is then approximated by the multivariate Gaussian with the inherited analogues of the mean and the empirical covariance matrix, whose elements $\hat{C}_{ij}(\sigma,\omega) = f_{ij}(\sigma,\omega) - f_i(\sigma)\,f_j(\omega)$ characterize the pairwise dependency structure. The gauge fixing results in setting the preassigned entries referring to the last amino acid in the mean vector and the covariance matrix to zero, which reduces the model's dimension from Lq to L(q−1); otherwise, the unregularized covariance matrix would always be non-invertible. Typically, the single and pair frequency counts are reweighted and regularized by pseudocounts (see section "Sequence data preprocessing") to additionally ensure that $\hat{C}$ is invertible. Final application of the closed-form solution for continuous variables Eq 16 to the extended binary variables yields the so-called mean-field (MF) approximation [48],

$\hat{e}_{ij}(\sigma,\omega) = 2\hat{\gamma}_{ij}(\sigma,\omega) = -\left(\hat{C}^{-1}\right)_{ij}(\sigma,\omega), \quad (20)$

for amino acids σ, ω ∈ Ω and with restriction to residues i < j in the latter identity. The same solution has been obtained by [6,7] using a perturbation ansatz to solve the q-state Potts model, termed (mean-field) Direct Coupling Analysis (DCA or mfDCA). In Ising models, this result is also known as the naïve mean-field approximation [57–59].
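A compact sketch of Eq 20, assuming frequency counts that have already been reweighted and pseudocount-regularized (otherwise the matrix inversion fails); array shapes and the helper name are our own conventions:

```python
import numpy as np

def mean_field_couplings(f_i, f_ij, q=21):
    """Mean-field approximation, Eq 20: e = -(C^-1), a sketch.

    f_i: (L, q) single-site and f_ij: (L, L, q, q) pair frequencies,
    already reweighted and pseudocount-regularized. f_ij[i, i] must hold
    f_i on its diagonal (P_ii(s, w) = P_i(s) if s == w, else 0). The
    gauge drops the last amino acid, leaving d = q - 1 states per site.
    """
    L, d = f_i.shape[0], q - 1
    C = np.zeros((L * d, L * d))
    for i in range(L):
        for j in range(L):
            C[i*d:(i+1)*d, j*d:(j+1)*d] = (f_ij[i, j, :d, :d]
                                           - np.outer(f_i[i, :d], f_i[j, :d]))
    e = -np.linalg.inv(C)                         # couplings e_ij(sigma, omega)
    return e.reshape(L, d, L, d).swapaxes(1, 2)   # (L, L, d, d); use i < j blocks
```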

The following section is dedicated to maximum likelihood-based inference approaches, which have been presented in the field of protein contact prediction.

Maximum-Likelihood Inference

A well-known approach to estimating the parameters of a model is maximum-likelihood inference. The likelihood is a scalar measure of how likely the model parameters are, given the observed data (MacKay [34], p. 29), and the maximum-likelihood solution denotes the parameter set maximizing the likelihood function. For Markov random fields, the maximum-likelihood solution is consistent, i.e., it recovers the true model parameters in the limit of infinite data (Koller and Friedman [32], p. 949). In particular, for a pairwise model with parameters $h = (h_i(\sigma))$ and $e = (e_{ij}(\sigma,\omega))$, we find the likelihood $l(h,e) = l(h,e\,|\,x^1,\dots,x^M)$ given observed data, $x^1,\dots,x^M\in\Omega^L$, which are assumed to be independent and identically distributed (iid), as

$l(h,e) = \prod_{m=1}^{M} P\left(x^m\right). \quad (21)$

The estimates of the model parameters are then obtained as the maximizer of l or, using the monotonicity of the logarithm, the minimizer of −ln l,

$\left(\hat{h},\hat{e}\right) = \underset{h,e}{\arg\max}\; l(h,e) = \underset{h,e}{\arg\min}\left(-\ln l(h,e)\right).$

When we specify the maximum-entropy distribution Eq 13 as the model distribution, the then-concave loglikelihood [32] becomes

$\ln l(h,e) = \sum_{m=1}^{M}\left(\sum_{1\leq i<j\leq L} e_{ij}\left(x_i^m,x_j^m\right) + \sum_{i=1}^{L} h_i\left(x_i^m\right)\right) - M\ln Z. \quad (22)$

The maximum-likelihood solution is found by taking the derivatives of Eq 22 with respect to the model parameters $h_i(\sigma)$ and $e_{ij}(\sigma,\omega)$ and setting them to zero,

$\frac{\partial\ln l}{\partial h_i(\sigma)} = M f_i(\sigma) - M\frac{\partial\ln Z}{\partial h_i(\sigma)} = 0 \quad \text{and} \quad \frac{\partial\ln l}{\partial e_{ij}(\sigma,\omega)} = M f_{ij}(\sigma,\omega) - M\frac{\partial\ln Z}{\partial e_{ij}(\sigma,\omega)} = 0. \quad (23)$

The partial derivatives of the partition function, Z, follow the well-known identities

$\frac{\partial\ln Z}{\partial h_i(\sigma)} = P_i(\sigma) \quad \text{and} \quad \frac{\partial\ln Z}{\partial e_{ij}(\sigma,\omega)} = P_{ij}(\sigma,\omega).$

The maximizing parameters, $\hat{h}_i(\sigma)$ and $\hat{e}_{ij}(\sigma,\omega)$, are those matching the distribution's single and pair marginal probabilities with the empirical single and pair frequency counts,

$P_i(\sigma) = f_i(\sigma) \quad \text{and} \quad P_{ij}(\sigma,\omega) = f_{ij}(\sigma,\omega),$

in residues i = 1,…, L and i, j = 1,…, L, respectively, and for amino acids σ, ω ∈ Ω. In other words, matching the moments of the pairwise maximum-entropy probability distribution to the given data is equivalent to maximum-likelihood fitting of an exponential family [34,60]. Although the maximum-likelihood solution is globally optimal for the pairwise maximum-entropy probability model, based on the concavity of ln l, the maximizing parameter set is not necessarily unique, due to dependencies in the input data (Koller and Friedman [32], p. 948). To remove these equivalent optima and select a unique representation, one needs to introduce further constraints by, for example, gauge fixing or regularization.

Based on the maximum-likelihood principle, we present three solution approaches in the remainder of this section.

Stochastic maximum likelihood

The maximum-likelihood solution is typically inaccessible for models of categorical variables due to the computational complexity of estimating the partition function Z, which involves a sum over all possible states and grows exponentially with the size of the system [3,61]. Lapedes et al. [30] solved Eq 22 by likelihood maximization on sampled subsets using the Metropolis–Hastings algorithm [32,34]. In particular, the likelihood is maximized iteratively by following the steepest ascent of the loglikelihood function ln l using Eq 23. In each maximization step, the parameters $h_i(\sigma)$ and $e_{ij}(\sigma,\omega)$ are changed in proportion to the gradient of ln l, scaled by the constant step size ε > 0, until convergence is reached as the differences $f_i(\sigma) - P_i(\sigma)$, i = 1,…, L, and $f_{ij}(\sigma,\omega) - P_{ij}(\sigma,\omega)$, 1 ≤ i < j ≤ L, go to zero [30]. The computation of the marginals requires summing over $20^L$ states and is, for example, estimated by Monte Carlo sampling. As the likelihood is concave, there are no local maxima, and the maximum-likelihood parameters are obtained in the limit $P_i(\sigma)\to f_i(\sigma)$ for i = 1,…, L and $P_{ij}(\sigma,\omega)\to f_{ij}(\sigma,\omega)$ for 1 ≤ i < j ≤ L and σ, ω ∈ Ω∖{σq}, a subset of Ω containing q−1 elements to account for gauge fixing.
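The sketch below implements this moment-matching ascent for a system small enough that the marginals can be enumerated exactly, in place of the Monte Carlo estimates used in [30]; step size, iteration count, and tolerance are illustrative:

```python
import itertools
import numpy as np

def fit_tiny_potts(f_i, f_ij, L, q, eps=0.5, steps=500, tol=1e-6):
    """Steepest-ascent moment matching (Eq 23) on a tiny system.

    f_i: (L, q) target single-site frequencies; f_ij: (L, L, q, q) target
    pair frequencies, filled only for i < j. Marginals are computed by
    enumerating all q**L states; [30] estimates them by Metropolis-
    Hastings sampling instead, which is what makes large L feasible.
    """
    h = np.zeros((L, q))
    e = np.zeros((L, L, q, q))      # only i < j entries are ever non-zero
    states = np.array(list(itertools.product(range(q), repeat=L)))
    idx = np.arange(L)
    for _ in range(steps):
        logp = np.array([h[idx, s].sum() +
                         sum(e[i, j, s[i], s[j]]
                             for i in range(L) for j in range(i + 1, L))
                         for s in states])
        p = np.exp(logp - logp.max())
        p /= p.sum()                # exact Boltzmann distribution
        P_i = np.zeros((L, q))
        P_ij = np.zeros((L, L, q, q))
        for s, ps in zip(states, p):
            P_i[idx, s] += ps
            for i in range(L):
                for j in range(i + 1, L):
                    P_ij[i, j, s[i], s[j]] += ps
        h += eps * (f_i - P_i)      # gradient ascent on the loglikelihood
        e += eps * (f_ij - P_ij)
        if max(np.abs(f_i - P_i).max(), np.abs(f_ij - P_ij).max()) < tol:
            break
    return h, e
```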

Pseudo-likelihood maximization

Besag [62] introduced the pseudo-likelihood as an approximation to the likelihood function in which the global partition function is replaced by computationally tractable local estimates. The pseudo-likelihood inherits the concavity from the likelihood and yields the exact maximum-likelihood parameters in the limit of infinite data for Gaussian Markov random fields [41,62], but not in general [63]. Applications of this approximation to non-continuous, categorical variables have been studied, for instance, in sparse inference of Ising models [64] but may lead to results that differ from the maximum-likelihood estimate. In this approach, the probability of the m-th observation, $x^m$, is approximated by the product of the conditional probabilities of $x_r^m$ given the observations in the remaining variables [51],

$P\left(x^m\right) \approx \prod_{r=1}^{L} P\left(x_r^m \mid x_1^m,\dots,x_{r-1}^m,x_{r+1}^m,\dots,x_L^m\right).$

Each factor is of the following analytical form,

$P\left(x_r^m \mid \{x_i^m\}_{i\neq r}\right) = \frac{\exp\left(h_r\left(x_r^m\right) + \sum_{i\neq r} e_{ri}\left(x_r^m,x_i^m\right)\right)}{\sum_{z\in\Omega}\exp\left(h_r(z) + \sum_{i\neq r} e_{ri}\left(z,x_i^m\right)\right)},$

which only depends on the unknown parameters $(e_{ij}(\sigma,\omega))_{i=r\text{ or }j=r}$ and $(h_i(\sigma))_{i=r}$ and makes the computation of the pseudo-likelihood tractable. Note that we treat $e_{ij}(\sigma,\omega) = e_{ji}(\omega,\sigma)$ and $e_{ii}(\cdot,\cdot) = 0$. By this approximation, the loglikelihood Eq 21 becomes the pseudo-loglikelihood,

$\ln l_{\mathrm{pseudo}}(h,e) = \sum_{m=1}^{M}\sum_{r=1}^{L}\ln P\left(x_r^m \mid \{x_i^m\}_{i\neq r}\right).$

In the final formulation of the pseudo-likelihood maximization (PLM) problem, an ℓ2-regularizer is added to select for small absolute values of the inferred parameters,

$\left(\hat{h},\hat{e}\right) = \underset{h,e}{\arg\min}\left[-\ln l_{\mathrm{pseudo}}(h,e) + \lambda_h\sum_{i,\sigma} h_i(\sigma)^2 + \lambda_e\sum_{i<j}\sum_{\sigma,\omega} e_{ij}(\sigma,\omega)^2\right],$

where λh, λe > 0 adjust the complexity of the problem and are selected in a consistent manner across different protein families to avoid overfitting. This approach has been presented (with scaling of the pseudo-loglikelihood by the sequence weights $w_m$ to include sequence weighting; see section "Sequence data preprocessing") by [51] under the name plmDCA (PseudoLikelihood Maximization Direct Coupling Analysis) and has shown performance improvements compared to the mean-field approximation Eq 20. Another inference method based on pseudo-likelihood maximization, but including prior knowledge in terms of secondary structure and information on pairs likely to be in contact, is Gremlin (Generative REgularized ModeLs of proteINs) [65–67].
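For illustration, the per-site contribution to the regularized negative pseudo-loglikelihood can be written as follows (a sketch of the objective only, without sequence weights or the optimizer that plmDCA wraps around it; function name and default penalties are our own):

```python
import numpy as np

def site_neg_pseudo_loglik(r, msa, h, e, lam_h=0.01, lam_e=0.01):
    """Contribution of site r to the regularized negative pseudo-loglikelihood.

    msa: (M, L) integer alignment; h: (L, q) fields; e: (L, L, q, q)
    couplings stored symmetrically, e[i, j] = e[j, i].T, zero diagonal.
    A sketch of the plmDCA objective [51]; the published method also
    weights each sequence and minimizes the sum over all sites r.
    """
    M, L = msa.shape
    nll = 0.0
    for m in range(M):
        # conditional "energies" of all q letters at site r, given the rest
        cond = h[r].copy()
        for i in range(L):
            if i != r:
                cond += e[r, i, :, msa[m, i]]
        z = cond.max()              # stable log-sum-exp
        nll -= cond[msa[m, r]] - (z + np.log(np.exp(cond - z).sum()))
    return nll + lam_h * np.sum(h[r] ** 2) + lam_e * np.sum(e[r] ** 2)
```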

Sparse maximum likelihood

Similar to the derivation of the mean-field result (Eq 20), Jones et al. [8] approximated Eq 12 by a multivariate Gaussian and accessed the elements of the inverse covariance matrix by maximum-likelihood inference under a sparsity constraint [54,68,69]. The corresponding method has been called Psicov (Protein Sparse Inverse COVariance). The validity of this approach for solving the sparse maximum-likelihood problem in binary systems such as Ising models has been demonstrated by [69], followed by consistency studies [70]. In particular, the Psicov method infers the sparse maximum-likelihood estimate of the inverse covariance matrix Eq 19 for δ = 1 using the analogue of the empirical covariance matrix derived from the observed amino acid frequencies, $\hat{C}_{ij}(\sigma,\omega) = f_{ij}(\sigma,\omega) - f_i(\sigma)\,f_j(\omega)$. Its elements, the empirical connected correlations, are preprocessed by reweighting and regularized by pseudocounts and shrinkage. Regularized loglikelihood maximization Eq 19 selects a unique representation of the model, i.e., no additional gauge fixing is required. Using identity Eq 16 on the elements of the sparse maximum-likelihood (SML) estimate of the inverse covariance, $\hat{C}^{-1}_{\mathrm{SML}}$, yields the estimates for the Lagrange multipliers,

$\hat{\gamma}_{ij}(\sigma,\omega) = -\frac{1}{2}\left(\hat{C}^{-1}_{\mathrm{SML}}\right)_{ij}(\sigma,\omega) \quad \text{or} \quad \hat{e}_{ij}(\sigma,\omega) = -\left(\hat{C}^{-1}_{\mathrm{SML}}\right)_{ij}(\sigma,\omega),$

for σ, ω ∈ Ω; in the second identity, the symmetric Lagrange multipliers $\gamma_{ij}(\sigma,\omega)$ defined for indices i, j = 1,…, L have been translated to the reduced parameter formulation $e_{ij}(\sigma,\omega)$ for 1 ≤ i < j ≤ L.

Sequence data preprocessing

The study of residue–residue co-evolution is based on data from multiple sequence alignments, which represent a sampling of the evolutionary record of a protein family. Multiple sequence alignments from currently existing sequence databases do not evenly represent the space of evolved sequences, as they are subject to acquisition bias towards available species of interest. To account for uneven representation, sequence reweighting has been introduced to lower the contributions of highly similar sequences and assign higher weight to unique ones (see Durbin et al. [44], p. 124 ff.). In particular, the weight of the m-th sequence, $w_m := 1/k_m$, in the alignment $\{x^1,\dots,x^M\}$ can be chosen as the inverse of

$k_m := \sum_{m'=1}^{M} H\left(\frac{1}{L}\sum_{i=1}^{L}\mathbb{1}\left(x_i^m,x_i^{m'}\right) - \theta\right),$

the number of sequences with which $x^m$ shares more than θ · 100% of its residues. Here, θ denotes a similarity threshold and is typically chosen as 0.7 ≤ θ ≤ 0.9; 1(a,b) = 1 if a = b and 1(a,b) = 0 otherwise; and H is the step function with H(y) = 0 if y < 0 and H(y) = 1 otherwise. This also provides us with an estimate of the effective number of sequences in the alignment, $M_{\mathrm{eff}} := \sum_{m=1}^{M} w_m$. Additionally, pseudocount regularization with a pseudocount λ > 0 is used to deal with finite sampling bias and to account for underrepresentation [5–8,44,48], which would otherwise result in zero entries in $f_{ij}$, for instance, if a certain amino acid pair is never observed. The use of pseudocounts is equivalent to a maximum a posteriori (MAP) estimate under a specific inverse Wishart prior on the covariance matrix [48]. Both preprocessing steps combined yield the reweighted single and pair frequency counts,

$f_i(\sigma) = \frac{1}{\lambda + M_{\mathrm{eff}}}\left(\frac{\lambda}{q} + \sum_{m=1}^{M} w_m\,\mathbb{1}_\sigma\left(x_i^m\right)\right) \quad \text{and} \quad f_{ij}(\sigma,\omega) = \frac{1}{\lambda + M_{\mathrm{eff}}}\left(\frac{\lambda}{q^2} + \sum_{m=1}^{M} w_m\,\mathbb{1}_\sigma\left(x_i^m\right)\mathbb{1}_\omega\left(x_j^m\right)\right),$

in residues i, j = 1,…, L and for amino acids σ, ω ∈ Ω. Ideally, for maximum-likelihood inference, the random variables are assumed to be independent and identically distributed. However, this assumption is typically violated in realistic sequence data due to phylogenetic and sequencing bias, and the reweighting presented here does not necessarily solve this problem.
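Both preprocessing steps are short to implement; the sketch below computes the sequence weights and the reweighted, pseudocount-regularized single-site counts (pair counts follow analogously), with the θ and λ defaults as illustrative choices:

```python
import numpy as np

def sequence_weights(msa, theta=0.8):
    """w_m = 1/k_m, where k_m counts the sequences (including m itself)
    sharing more than theta of their positions with sequence m."""
    sim = np.array([[(a == b).mean() for b in msa] for a in msa])
    return 1.0 / (sim > theta).sum(axis=1)

def reweighted_single_site_counts(msa, q=21, theta=0.8, lam=None):
    """Reweighted, pseudocount-regularized f_i(sigma); pair counts
    f_ij(sigma, omega) follow the same pattern with lam / q**2."""
    M, L = msa.shape
    w = sequence_weights(msa, theta)
    M_eff = w.sum()                       # effective number of sequences
    lam = M_eff if lam is None else lam   # a common heuristic choice
    f = np.full((L, q), lam / q)
    for m in range(M):
        f[np.arange(L), msa[m]] += w[m]
    return f / (lam + M_eff)
```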

Scoring Functions for the Pairwise Interaction Strengths

For pairwise maximum-entropy models of continuous variables, the natural scoring function for the interaction strength between two variables $x_i$ and $x_j$, given the inferred inverse covariance matrix, is the partial correlation Eq 18. However, for categorical variables, the situation is more complicated, and there are several alternative choices of scoring functions. Requirements on the scoring function are that it has to account for the chosen gauge and, in the case of protein contact prediction, evaluate the coupling strength between two residues i and j summarized across all possible q² amino acid pairs. The highest-scoring residue pairs are, for instance, used to predict the 3-D structure of the protein of interest. For this purpose, the direct information Eq 1, defined as the mutual information applied to $P^{dir}_{ij}(\sigma,\omega)$ instead of $f_{ij}(\sigma,\omega)$, has been introduced [5]. In $P^{dir}_{ij}(\sigma,\omega) = \frac{1}{Z_{ij}}\exp\left(e_{ij}(\sigma,\omega) + \tilde{h}_i(\sigma) + \tilde{h}_j(\omega)\right)$, the fields $\tilde{h}_i$ and $\tilde{h}_j$ are chosen to be consistent with the (reweighted and regularized) single-site frequency counts, $f_i(\sigma)$ and $f_j(\omega)$, and $Z_{ij}$ such that the sum over all amino acid pairs (σ, ω) ∈ Ω × Ω is normalized to 1. The direct information is invariant under gauge changes of the Hamiltonian ℋ, which means that any suitable gauge choice results in the same scoring values. As an alternative measure of the interaction strength for a particular pair (i, j), the Frobenius norm of the 21×21 submatrices $(e_{ij}(\sigma,\omega))_{\sigma,\omega}$ has been used,

$\left\|e_{ij}\right\|_F = \sqrt{\sum_{\sigma,\omega\in\Omega} e_{ij}(\sigma,\omega)^2}.$

However, this expression is not gauge invariant [5]. In this context, the notation $e_{ij}(\sigma,\omega)$, which refers to indices restricted to i < j, is extended and treated such that $e_{ij}(\sigma,\omega) = e_{ji}(\omega,\sigma)$ and $e_{ii}(\cdot,\cdot) = 0$; then $\|e_{ij}\|_F = \|e_{ji}\|_F$ and $\|e_{ii}\|_F = 0$. In order to correct for phylogenetic biases in the identification of co-evolved residues, Dunn et al. [27] introduced the average product correction (APC). It was originally used in combination with the mutual information but has recently been combined with the ℓ1-norm [8] and the Frobenius/ℓ2-norm [51], and it is derived from the averages over rows and columns of the corresponding norms of the matrix of the $e_{ij}$ parameters. In this formulation, the pair scoring function is

$F^{\mathrm{APC}}_{ij} = \left\|e_{ij}\right\|_F - \frac{\overline{\left\|e_{i\cdot}\right\|}_F\;\overline{\left\|e_{\cdot j}\right\|}_F}{\overline{\left\|e_{\cdot\cdot}\right\|}_F}, \quad (24)$

for $e_{ij}$ parameters fixed by the zero-sum gauge and with the means over the non-zero elements in row i, in column j, and in the full matrix of Frobenius norms, $\overline{\|e_{i\cdot}\|}_F$, $\overline{\|e_{\cdot j}\|}_F$, and $\overline{\|e_{\cdot\cdot}\|}_F$, respectively. Alternatively, the average product-corrected ℓ1-norm applied to the 20×20 submatrices of the estimated inverse covariance matrix, in which contributions from gaps are ignored, has been introduced by the authors of [8] as the Psicov score. Using the average product correction, the authors of [51] showed, for interaction parameters inferred by the mean-field approximation, that scoring with the average product-corrected Frobenius norm increased the precision of the predicted contacts compared to scoring with the DI score. The practical consequence of the choice of scoring method depends on the dataset and the parameter inference method.
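A sketch of Eq 24 on a coupling tensor in zero-sum gauge (shape conventions are our own):

```python
import numpy as np

def apc_frobenius_scores(e):
    """Average product-corrected Frobenius norm scores (Eq 24).

    e: (L, L, q, q) couplings in zero-sum gauge, stored symmetrically
    with zero diagonal blocks. Returns an L x L matrix of scores.
    """
    L = e.shape[0]
    F = np.sqrt((e ** 2).sum(axis=(2, 3)))   # ||e_ij||_F; diagonal is zero
    row = F.sum(axis=1) / (L - 1)            # mean over off-diagonal entries
    full = F[~np.eye(L, dtype=bool)].mean()  # mean over the full matrix
    return F - np.outer(row, row) / full     # subtract the average product
```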

Discussion of Results, Improvements, and Applications

Maximum entropy-based inference methods can help in estimating the interactions underlying biological data. This class of models, combined with suitable methods for inferring their numerical parameters, has been shown to reveal, to a reasonable approximation, the direct interactions in many biological applications, such as gene expression or protein residue–residue co-evolution studies. In this review, we have presented maximum-entropy models for the continuous and categorical random variable cases. Both approaches can be integrated into one framework, which allows the use of solutions obtained for continuous variables as approximations for the categorical random variable case (Fig 3).

Fig 3. Scheme of pairwise maximum-entropy probability models.

The maximum-entropy probability distribution with pairwise constraints for continuous random variables is the multivariate Gaussian distribution (left column). For the maximum-entropy probability distribution in the categorical variable case (right column), various approximative solutions exist, e.g., the mean-field, the sparse maximum-likelihood, and the pseudolikelihood maximization solution. The mean-field and the sparse maximum-likelihood result can be derived from the Gaussian approximation of binarized categorical variables (thin arrow). Pair scoring functions for the continuous case are the partial correlations (left column). For the categorical variable case, the direct information, the Frobenius norm, and the average product-corrected Frobenius norm are used to score pair couplings from the inferred parameters (right column).

https://doi.org/10.1371/journal.pcbi.1004182.g003

The validity and precision of the available maximum-entropy methods could be improved to yield more biologically insightful results in several ways. Advanced approximation methods derived from Ising model approaches [59,71] are possible extensions for efficient inference. Moreover, additional terms beyond pair interactions can be included in models of continuous and discrete random variables [1,33,59]. However, higher-order models demand more data, which is a major bottleneck for their application to biological problems. In the case of protein contact prediction, this could be resolved by getting more sequences, which are being obtained as the result of extraordinary advances in sequencing technology. The quality of existing methods can be improved by careful refinement of sequence alignments in terms of cutoffs and gaps or by attaching optimized weights to each of the data sequences. Alternatively, one could try to improve the existing model frameworks by accounting for phylogenetic progression [27,49,72] and finite sampling biases.

The advancement of inference methods for biological datasets could help solve many interesting biological problems, such as protein design or the analysis of multi-gene effects in relating variants to phenotypic changes as well as multi-genic traits [73,74]. The methods presented here could help reduce the parameter space of genome-wide association studies to a first approximation. In particular, we envision the following applications: (1) in the disease context, co-evolution studies of oncogenic events, for example, copy number alterations, mutations, fusions, and alternative splicing, can be used to derive direct co-evolution signatures of cancer from available data, such as The Cancer Genome Atlas (TCGA); (2) de novo design of protein sequences as, for example, described in [65,75] for the WW domain, using design rules based on the evolutionary information extracted from the multiple sequence alignment; and (3) the development of quantitative models of protein fitness computed from sequence information.

In general, in a complex biological system, it is often useful for descriptive and predictive purposes to derive the interactions that define the properties of the system. With the methods presented here and available software (Table 1), our goal is not only to describe how to infer these interactions but also to highlight tools for the prediction and redesign of properties of biological systems.

Table 1. Overview of software tools to infer pairwise interactions from datasets in continuous or categorical variables with maximum-entropy/GGM-based methods.

https://doi.org/10.1371/journal.pcbi.1004182.t001

Acknowledgments

We thank Theofanis Karaletsos, Sikander Hayat, Stephanie Hyland, Quaid Morris, Deb Bemis, Linus Schumacher, John Ingraham, Arman Aksoy, Julia Vogt, Thomas Hopf, Andrea Pagnani, and Torsten Groß for insightful discussions.

References

  1. Lezon TR, Banavar JR, Cieplak M, Maritan A, Fedoroff NV. Using the principle of entropy maximization to infer genetic interaction networks from gene expression patterns. Proceedings of the National Academy of Sciences of the United States of America. 2006;103(50):19033–19038. pmid:17138668
  2. Locasale JW, Wolf-Yadlin A. Maximum entropy reconstructions of dynamic signaling networks from quantitative proteomics data. PLoS ONE. 2009;4(8):e6522. pmid:19707567
  3. Schneidman E, Berry MJ II, Segev R, Bialek W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature. 2006;440:1007–1012. pmid:16625187
  4. Tang A, Jackson D, Hobbs J, Chen W, Smith JL, Patel H, et al. A maximum entropy model applied to spatial and temporal correlations from cortical networks in vitro. The Journal of Neuroscience. 2008;28(2):505–518. pmid:18184793
  5. Weigt M, White RA, Szurmant H, Hoch JA, Hwa T. Identification of direct residue contacts in protein–protein interaction by message passing. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(1):67–72. pmid:19116270
  6. Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, et al. Protein 3D structure computed from evolutionary sequence variation. PLoS ONE. 2011;6(12):e28766. pmid:22163331
  7. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks D, Sander C, et al. Direct-coupling analysis of residue co-evolution captures native contacts across many protein families. Proceedings of the National Academy of Sciences of the United States of America. 2011;108:E1293–E1301. pmid:22106262
  8. Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012;28(2):184–190. pmid:22101153
  9. Stephens GJ, Bialek W. Statistical mechanics of letters in words. Physical Review E. 2010;81(6):066119.
  10. Bialek W, Cavagna A, Giardina I, Mora T, Silvestri E, Viale M, et al. Statistical mechanics for natural flocks of birds. Proceedings of the National Academy of Sciences. 2012;109(13):4786–4791.
  11. Wood K, Nishida S, Sontag ED, Cluzel P. Mechanism-independent method for predicting response to multidrug combinations in bacteria. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(30):12254–12259. pmid:22773816
  12. Whittaker J. Graphical models in applied multivariate statistics. Wiley Publishing; 2009.
  13. Lauritzen SL. Graphical models. Oxford: Oxford University Press; 1996.
  14. Butte AJ, Kohane IS. Unsupervised knowledge discovery in medical databases using relevance networks. In: Proceedings of the AMIA Symposium. American Medical Informatics Association; 1999. p. 711–715.
  15. Toh H, Horimoto K. Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics. 2002;18(2):287–297. pmid:11847076
  16. Dobra A, Hans C, Jones B, Nevins JR, Yao G, West M. Sparse graphical models for exploring gene expression data. Journal of Multivariate Analysis. 2004;90(1):196–212.
  17. Schäfer J, Strimmer K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology. 2005;4(1):1–32.
  18. Roudi Y, Nirenberg S, Latham PE. Pairwise maximum entropy models for studying large biological systems: when they can work and when they can't. PLoS Computational Biology. 2009;5(5):e1000380. pmid:19424487
  19. Cramér H. Mathematical methods of statistics. vol. 9. Princeton University Press; 1999.
  20. Guttman L. A note on the derivation of formulae for multiple and partial correlation. The Annals of Mathematical Statistics. 1938;9(4):305–308.
  21. Krumsiek J, Suhre K, Illig T, Adamski J, Theis FJ. Gaussian graphical modeling reconstructs pathway reactions from high-throughput metabolomics data. BMC Systems Biology. 2011;5(1):21.
  22. Giraud BG, Heumann JM, Lapedes AS. Superadditive correlation. Physical Review E. 1999;59(5):4983–4991.
  23. Neher E. How frequent are correlated changes in families of protein sequences? Proceedings of the National Academy of Sciences of the United States of America. 1994;91(1):98–102. pmid:8278414
  24. Göbel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins. 1994;18(4):309–317. pmid:8208723
  25. Taylor WR, Hatrick K. Compensating changes in protein multiple sequence alignments. Protein Engineering. 1994;7(3):341–348. pmid:8177883
  26. Shindyalov IN, Kolchanov NA, Sander C. Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Engineering. 1994;7(3):349–358. pmid:8177884
  27. Dunn SD, Wahl LM, Gloor GB. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics. 2008;24(3):333–340. pmid:18057019
  28. Burger L, van Nimwegen E. Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Computational Biology. 2010;6(1):e1000633. pmid:20052271
  29. Atchley WR, Wollenberg KR, Fitch WM, Terhalle W, Dress AW. Correlations among amino acid sites in bHLH protein domains: an information theoretic analysis. Molecular Biology and Evolution. 2000;17(1):164–178. pmid:10666716
  30. Lapedes A, Giraud B, Jarzynski C. Using sequence alignments to predict protein structure and stability with high accuracy. eprint arXiv:1207.2484. 2002.
  31. Ciriello G, Miller ML, Aksoy BA, Senbabaoglu Y, Schultz N, Sander C. Emerging landscape of oncogenic signatures across human cancers. Nature Genetics. 2013;45(10):1127–1133. pmid:24071851
  32. Koller D, Friedman N. Probabilistic graphical models: principles and techniques. MIT Press; 2009.
  33. Mead LR, Papanicolaou N. Maximum entropy in the problem of moments. Journal of Mathematical Physics. 1984;25:2404–2417.
  34. MacKay DJ. Information theory, inference and learning algorithms. Cambridge University Press; 2003.
  35. Cover TM, Thomas JA. Elements of information theory. John Wiley & Sons; 2012.
  36. Agmon N, Alhassid Y, Levine RD. An algorithm for finding the distribution of maximal entropy. Journal of Computational Physics. 1979;30(2):250–258.
  37. Shannon CE. A mathematical theory of communication. Bell System Technical Journal. 1948;27(3):379–423.
  38. Jaynes ET. Information theory and statistical mechanics. Physical Review. 1957;106(4):620–630.
  39. Jaynes ET. Information theory and statistical mechanics II. Physical Review. 1957;108(2):171–190.
  40. Jaynes ET. Probability theory: the logic of science. Cambridge: Cambridge University Press; 2003.
  41. Murphy KP. Machine learning: a probabilistic perspective. The MIT Press; 2012.
  42. Balescu R. Matter out of equilibrium. World Scientific; 1997.
  43. Goldstein S, Lebowitz JL. On the (Boltzmann) entropy of non-equilibrium systems. Physica D: Nonlinear Phenomena. 2004;193(1):53–66.
  44. Durbin R, Eddy SR, Krogh A, Mitchison G. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press; 1998.
  45. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: the protein families database. Nucleic Acids Research. 2014;42:D222–D230. pmid:24288371
  46. Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nature Methods. 2012;9(2):173–175.
  47. Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Research. 2011; gkr367.
  48. Baldassi C, Zamparo M, Feinauer C, Procaccini A, Zecchina R, Weigt M, et al. Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners. PLoS ONE. 2014;9(3):e92721. pmid:24663061
  49. Lapedes AS, Giraud BG, Liu LC, Stormo GD. A maximum entropy formalism for disentangling chains of correlated sequence positions. In: Proceedings of the IMS/AMS International Conference on Statistics in Molecular Biology and Genetics; 1998. p. 236–256.
  50. Santolini M, Mora T, Hakim V. A general pairwise interaction model provides an accurate description of in vivo transcription factor binding sites. PLoS ONE. 2014;9(6):e99015. pmid:24926895
  51. Ekeberg M, Lövkvist C, Lan Y, Weigt M, Aurell E. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Physical Review E. 2013;87(1):012707.
  52. Bishop CM. Pattern recognition and machine learning. New York: Springer-Verlag; 2006.
  53. Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics. 2006;34(3):1436–1462.
  54. Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9(3):432–441. pmid:18079126
  55. Witten DM, Tibshirani R. Covariance-regularized regression and classification for high dimensional problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2009;71(3):615–636.
  56. Ledoit O, Wolf M. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis. 2004;88(2):365–411.
  57. Kappen HJ, Rodriguez F. Efficient learning in Boltzmann machines using linear response theory. Neural Computation. 1998;10(5):1137–1156.
  58. Tanaka T. Mean-field theory of Boltzmann machine learning. Physical Review E. 1998;58(2):2302–2310.
  59. Roudi Y, Aurell E, Hertz JA. Statistical physics of pairwise probability models. Frontiers in Computational Neuroscience. 2009;3. pmid:19242556
  60. Wainwright MJ, Jordan MI. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning. 2008;1(1–2):1–305.
  61. Broderick T, Dudik M, Tkacik G, Schapire RE, Bialek W. Faster solutions of the inverse pairwise Ising problem. arXiv preprint arXiv:0712.2437. 2007.
  62. Besag J. Statistical analysis of non-lattice data. The Statistician. 1975;24(3):179–195.
  63. Liang P, Jordan MI. An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators. In: Proceedings of the 25th International Conference on Machine Learning. ACM; 2008. p. 584–591.
  64. Höfling H, Tibshirani R. Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. The Journal of Machine Learning Research. 2009;10:883–906.
  65. Balakrishnan S, Kamisetty H, Carbonell JG, Lee SI, Langmead CJ. Learning generative models for protein fold families. Proteins: Structure, Function, and Bioinformatics. 2011;79(4):1061–1078.
  66. Kamisetty H, Ovchinnikov S, Baker D. Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era. Proceedings of the National Academy of Sciences. 2013;110(39):15674–15679.
  67. Ovchinnikov S, Kamisetty H, Baker D. Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information. eLife. 2014;3:e02030. pmid:24842992
  68. Wainwright MJ, Jordan MI. Log-determinant relaxation for approximate inference in discrete Markov random fields. IEEE Transactions on Signal Processing. 2006;54(6):2099–2109.
  69. Banerjee O, El Ghaoui L, d'Aspremont A. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research. 2008;9:485–516.
  70. Ravikumar P, Wainwright MJ, Lafferty JD. High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics. 2010;38(3):1287–1319.
  71. Sessak V, Monasson R. Small-correlation expansions for the inverse Ising problem. Journal of Physics A: Mathematical and Theoretical. 2009;42(5):055001.
  72. Lapedes AS, Giraud BG, Liu L, Stormo GD. Correlated mutations in models of protein sequences: phylogenetic and structural effects. In: Statistics in Molecular Biology/IMS Lecture Notes, Monograph Series. JSTOR; 1999. p. 236–256.
  73. Rockman MV. Reverse engineering the genotype–phenotype map with natural genetic variation. Nature. 2008;456(7223):738–744. pmid:19079051
  74. Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D. Methods of integrating data to uncover genotype–phenotype interactions. Nature Reviews Genetics. 2015;16(2):85–97. pmid:25582081
  75. Russ WP, Lowery DM, Mishra P, Yaffe MB, Ranganathan R. Natural-like function in artificial WW domains. Nature. 2005;437(7058):579–583. pmid:16177795
  76. EVFold. http://evfold.org.
  77. Direct Coupling Analysis. http://dca.rice.edu.
  78. Ekeberg M. Pseudolikelihood maximization Direct-Coupling Analysis. http://plmdca.csc.kth.se.
  79. Pagnani A. Pseudo Likelihood Maximization for proteins in Julia. https://github.com/pagnani/PlmDCA.
  80. CCMpred. https://bitbucket.org/soedinglab/ccmpred.
  81. Gremlin. http://gremlin.bakerlab.org.
  82. Psicov. http://bioinfadmin.cs.ucl.ac.uk/downloads/PSICOV.
  83. Friedman J, Hastie T, Tibshirani R. Graphical lasso in R and Matlab. http://statweb.stanford.edu/~tibs/glasso/.
  84. Witten DM, Tibshirani R. scout: implements the Scout method for covariance-regularized regression. http://cran.r-project.org/web/packages/scout/index.html.
  85. Schäfer J, Opgen-Rhein R, Strimmer K. Modeling and inferring gene networks. http://strimmerlab.org/software/genenet/.
  86. Schäfer J, Opgen-Rhein R, Zuber V, Ahdesmäki M, Silva APD, Strimmer K. Efficient estimation of covariance and (partial) correlation. http://strimmerlab.org/software/corpcor/.