Text S2: The Bayesian score and the maximum likelihood score
Here we present two scoring methods, the maximum likelihood score and the Bayesian score, and explain why we chose the latter. The maximum likelihood score is based on the probability of the data given a tree-CPD with parameters set to their maximum likelihood estimators; formally,

\mathrm{score}_L(T : I) = \log P(I \mid T, \hat{\theta}_T)
where I is the set of data instances, T is a tree-CPD, and \hat{\theta}_T is the set of maximum likelihood estimators of the parameters of T. This score is also proportional to the mutual information of the data and the tree-CPD minus the entropy of the data, under the distribution imposed by the maximum likelihood estimators [1]; formally,

\mathrm{score}_L(T : I) = W \left[\, I_{\hat{P}}(S ; T) - H_{\hat{P}}(S) \,\right]

where S denotes the existence of a chemical synapse and W is the total weight of the data instances.
where \hat{P} is the distribution of chemical synapse formation imposed by the decomposition of I over T. For a complete tree, this score is equivalent to the conditional entropy measure that was used by Varadan et al. [2] to search for a group of genes that minimizes the uncertainty about the existence of synapses. However, this score tends to overfit the data, since almost every addition of nodes to the tree increases the score by further decreasing the uncertainty about the existence of synapses. Thus it is necessary to pre-determine the number of expected interacting genes, as was done by Varadan et al. [2].
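To make the overfitting behavior concrete, the maximum likelihood score of a tree-CPD can be computed directly from the counts of instances at its leaves, since the maximum likelihood estimator of each leaf parameter is simply the empirical frequency. The sketch below is illustrative only (the leaf-count representation and the function name are assumptions, not the implementation used in this work):

```python
import math

def ml_score(leaf_counts):
    """Log-likelihood of the data under a tree-CPD whose leaf parameters
    are set to their maximum likelihood estimators.

    leaf_counts: list of (n_pos, n_neg) pairs, one per leaf of the
    tree-CPD; n_pos / n_neg count the instances at that leaf in which a
    synapse does / does not exist.
    """
    score = 0.0
    for n_pos, n_neg in leaf_counts:
        n = n_pos + n_neg
        for k in (n_pos, n_neg):
            if k > 0:
                # The MLE of the leaf parameter is k / n; each of the k
                # instances contributes log(k / n) to the log-likelihood.
                score += k * math.log(k / n)
    return score

# Splitting a leaf can never decrease this score, which is why the
# maximum likelihood score tends to overfit: the same 10 instances,
# unsplit vs. split into two purer leaves.
one_leaf = ml_score([(6, 4)])
two_leaves = ml_score([(5, 1), (1, 3)])
```

Here the split strictly improves the score even if the partition reflects noise rather than a real interaction, illustrating why the tree depth must be constrained externally when this score is used.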
The second scoring method is the Bayesian score [3]. This score is proportional to the posterior probability of the tree given the data; formally,

\mathrm{score}_B(T : I) = \log P(T) + \log P(I \mid T)
where the first term of the score depends on the prior probability of the structure of the tree-CPD and the second term is equal to

P(I \mid T) = \int P(I \mid T, \theta_T)\, P(\theta_T \mid T)\, d\theta_T
This second term evaluates the fit of the model to the data by averaging the likelihood of the data over all possible parameterizations of the model. In addition, it enables the incorporation of prior knowledge about the parameters of the model. The simplest and most intuitive priors are Dirichlet priors. These priors are specified by a set of hyperparameters corresponding to the number of imaginary examples seen before starting the experiment. When the data are fully observed (as in our case) and the priors are in Dirichlet form, this score has a simple analytic form as a function of the sufficient statistics of the model and the hyperparameters of the Dirichlet priors [3]. In our case, since we have no prior knowledge about either the structure of the tree or its parameters, we used a uniform prior over the space of trees and Dirichlet priors with all hyperparameters equal to one for the parameters of the tree.
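For fully observed counts at a single leaf and a Dirichlet prior, the marginal likelihood has the standard closed form Γ(A)/Γ(A+N) · ∏_k Γ(α_k+n_k)/Γ(α_k), with A = Σα_k and N = Σn_k. The following sketch (function names and the per-leaf count representation are assumptions, not the authors' implementation) computes it with all hyperparameters equal to one, as used here:

```python
import math

def log_marginal_likelihood(counts, alphas):
    """Closed-form log marginal likelihood of the counts at one leaf,
    integrating the multinomial likelihood against a Dirichlet prior:
      log [ Gamma(A)/Gamma(A+N) * prod_k Gamma(a_k+n_k)/Gamma(a_k) ]
    where A = sum(alphas) and N = sum(counts)."""
    A, N = sum(alphas), sum(counts)
    score = math.lgamma(A) - math.lgamma(A + N)
    for n_k, a_k in zip(counts, alphas):
        score += math.lgamma(a_k + n_k) - math.lgamma(a_k)
    return score

def bayesian_fit_term(leaf_counts):
    """log P(I | T): sum of the per-leaf log marginal likelihoods,
    with all Dirichlet hyperparameters set to one."""
    return sum(log_marginal_likelihood(c, [1.0] * len(c)) for c in leaf_counts)
```

For example, a single positive instance at a leaf with a Dirichlet(1, 1) prior yields a marginal likelihood of exactly 1/2, the average of the Bernoulli likelihood over a uniform parameter.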
At the limit W \to \infty the Bayesian score is equal to the Bayesian information criterion (BIC) [4,5]:

\mathrm{score}_{BIC}(T : I) = \log P(I \mid T, \hat{\theta}_T) - \frac{\log W}{2}\, \mathrm{Dim}(T)
where W is the sum of the weights of the instances in I and Dim(T) is proportional to the number of parameters of T. This means that the Bayesian score is proportional to the likelihood score minus a factor that is proportional to the dimensionality of the model; it therefore exhibits a tradeoff between fit to the data and model complexity. As W grows, more emphasis is given to the fit to the data. In this way, the use of the Bayesian score relaxes the demand to pre-determine the number of expected interacting genes.
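The fit/complexity tradeoff in the BIC form can be illustrated with a minimal sketch (function name assumed; Dim(T) is taken here as one free Bernoulli parameter per leaf, an assumption for this example):

```python
import math

def bic_score(leaf_counts, weight_total):
    """BIC approximation to the Bayesian score: the maximum likelihood
    fit term minus a complexity penalty of (log W / 2) * Dim(T)."""
    fit = 0.0
    for n_pos, n_neg in leaf_counts:
        n = n_pos + n_neg
        for k in (n_pos, n_neg):
            if k > 0:
                # Each leaf contributes its MLE log-likelihood.
                fit += k * math.log(k / n)
    dim = len(leaf_counts)  # one free parameter per leaf (assumption)
    return fit - 0.5 * math.log(weight_total) * dim

# An uninformative split leaves the fit term unchanged but pays a
# fixed penalty per extra leaf, so the penalized score rejects it:
shallow = bic_score([(50, 50)], 100)
deep = bic_score([(25, 25), (25, 25)], 100)
```

Unlike the raw maximum likelihood score, the penalized score can prefer the shallower tree, which is the behavior that removes the need to fix the number of interacting genes in advance.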
Figure S3 compares the prediction performance of tree-CPDs learned with the Bayesian scoring method to that of tree-CPDs learned with the maximum likelihood scoring method. For both scoring methods, the performance on the training set of deep tree-CPDs was better than that of tree-CPDs restricted to a shallower maximal leaf depth. However, using the maximum likelihood score, tree-CPDs learned without any restriction on the maximal depth of the leaves performed worse than tree-CPDs learned with this restriction. This was not the case when the Bayesian score was used, demonstrating that a classifier learned with the Bayesian score is less prone to overfitting than a classifier learned with the maximum likelihood score.
References
1. Cover TM, Thomas JA (2006) Elements of information theory. 2nd ed. Hoboken, N.J.: Wiley-Interscience.
2. Varadan V, Miller DM, 3rd, Anastassiou D (2006) Computational inference of the molecular logic for synaptic connectivity in C. elegans. Bioinformatics 22: e497-506.
3. Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Machine Learning 20: 197-243.
4. Haughton DM-A (1988) On the choice of a model to fit data from an exponential family. The Annals of Statistics 16: 342-355.
5. Schwarz G (1978) Estimating the dimension of a model. The Annals of Statistics 6: 461-464.