Reader Comments


Predictive feature extraction

Posted by akundaje on 01 Nov 2008 at 23:31 GMT

The article is a great review of the use of SVMs and kernels in compbio. One problem with SVMs and some kernels, though, is that it is rather difficult to look inside them and work out which features in the original feature space contribute to the classification. This kind of interpretability is very desirable alongside sound classification or learning performance. It would have been great if the authors had reviewed some work or suggested ideas on this aspect.

RE: Predictive feature extraction

dfernandez-reyes replied to akundaje on 12 Mar 2009 at 15:29 GMT

The problem of selecting the most discriminatory variables in the input space is indeed challenging in kernel-based classification, since in the end the classifier uses new (transformed, combined, possibly infinitely many) features in the kernel-induced space. Because the kernel trick lets the classifier use similarity measures without explicitly computing that feature space, one way to select variables is to weight the contribution of each variable while measuring similarity, in other words, while evaluating the kernel. Variables that obtain high weights can be selected as relevant to the classification task, whereas those with low weights can be discarded.

The problem then becomes how to adjust those weights, especially in high-dimensional input spaces such as gene expression or proteomic spectra data. Several approaches have been proposed: in our paper [1] we combine a Gaussian weighted-kernel classifier with an iterative stochastic probability estimation algorithm to discover the relevance distribution over the set of variables; other authors have considered gradient descent, genetic algorithms, evolutionary strategies and Bayesian estimation (see references therein) in an attempt to solve this problem.
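To make the weighting idea concrete, here is a minimal sketch, assuming a per-variable weighted Gaussian kernel of the form k_w(x, z) = exp(-gamma * sum_j w_j (x_j - z_j)^2) with non-negative weights. The function names, `gamma`, and the selection `threshold` are hypothetical illustrations; this is not the estimation algorithm of [1], only the kind of kernel whose weights such algorithms adjust. A weight of zero makes the similarity completely insensitive to that variable, which is why the learned weights can be read as relevance scores.

```python
import math
from typing import List, Sequence


def weighted_gaussian_kernel(x: Sequence[float], z: Sequence[float],
                             w: Sequence[float], gamma: float = 1.0) -> float:
    """Hypothetical weighted Gaussian kernel:
    k_w(x, z) = exp(-gamma * sum_j w_j * (x_j - z_j)**2).

    Each non-negative weight w_j scales how much variable j contributes
    to the similarity; w_j = 0 removes variable j from the comparison.
    """
    if not (len(x) == len(z) == len(w)):
        raise ValueError("x, z and w must have the same length")
    if any(wj < 0 for wj in w):
        raise ValueError("weights must be non-negative")
    # Weighted squared distance between the two input patterns.
    sq = sum(wj * (xj - zj) ** 2 for xj, zj, wj in zip(x, z, w))
    return math.exp(-gamma * sq)


def select_variables(w: Sequence[float], threshold: float) -> List[int]:
    """Keep the indices of variables whose weight exceeds a (hypothetical) threshold."""
    return [j for j, wj in enumerate(w) if wj > threshold]


if __name__ == "__main__":
    x = [1.0, 5.0, 0.0]
    z = [1.0, -3.0, 0.0]      # differs from x only in variable 1
    w_all = [1.0, 1.0, 1.0]   # every variable counts
    w_drop = [1.0, 0.0, 1.0]  # variable 1 is ignored
    print(weighted_gaussian_kernel(x, z, w_all))   # very small: exp(-64)
    print(weighted_gaussian_kernel(x, z, w_drop))  # prints 1.0
    print(select_variables([0.9, 0.01, 0.4], threshold=0.1))  # prints [0, 2]
```

In practice the weights would come from whichever search procedure is used (stochastic estimation, gradient descent, evolutionary search, and so on); the sketch only shows how the weights enter the kernel and how they could then be thresholded to select variables.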

[1] Rojas-Galeano S, Hsieh E, Agranoff D, Krishna S, Fernandez-Reyes D (2008) Estimation of Relevant Variables on High-Dimensional Biological Patterns Using Iterated Weighted Kernel Functions. PLoS ONE 3(3): e1806. doi:10.1371/journal.pone.0001806