Fabrizio

Handling Missing Features with Boosting Algorithms for Protein-Protein Interaction Prediction

F. Smeraldi, M. Defoin-Platel and M. Saqi, in: Proceedings of the 7th International Conference on Data Integration in the Life Sciences, pp. 132-147, Gothenburg (Sweden), Aug 2010, DOI:10.1007/978-3-642-15120-0_11


Combining information from multiple heterogeneous data sources can aid the prediction of protein-protein interactions (PPI). This information can be arranged into a feature vector for classification. However, missing values in the data can reduce prediction accuracy. Boosting has emerged as a powerful tool for feature selection and classification. Bayesian methods have traditionally been used to cope with missing data, with boosting being applied to the output of Bayesian classifiers. We explore a variation of AdaBoost that deals with the missing values at the level of the boosting algorithm itself, without the need for any density estimation step. Experiments on a publicly available PPI dataset suggest that this simpler and mathematically coherent approach may be more accurate.
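
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one well-known way to handle missing values inside boosting itself: confidence-rated AdaBoost with weak learners that abstain (output 0) when their feature is missing, in the style of Schapire and Singer. It is not necessarily the variant used in the paper. The function names (`stump_output`, `fit_abstaining_adaboost`, `score`), the smoothing constant `eps`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def stump_output(X, feature, threshold):
    """Threshold stump: +1 / -1 where the feature is present, 0 (abstain) where it is NaN."""
    col = X[:, feature]
    present = ~np.isnan(col)
    out = np.zeros(X.shape[0])
    out[present] = np.where(col[present] > threshold, 1.0, -1.0)
    return out

def fit_abstaining_adaboost(X, y, n_rounds=2, eps=1e-6):
    """X: (n, d) floats with np.nan for missing values; y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)          # uniform initial example weights
    model = []                       # list of (feature, threshold, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            observed = X[~np.isnan(X[:, j]), j]
            for thr in np.unique(observed):
                h = stump_output(X, j, thr)
                w_plus = w[h * y > 0].sum()    # weight of correct votes
                w_minus = w[h * y < 0].sum()   # weight of wrong votes
                w_zero = w[h == 0].sum()       # weight of abstentions
                # Normaliser minimised by the optimal alpha for this stump
                z = w_zero + 2.0 * np.sqrt(w_plus * w_minus)
                if best is None or z < best[0]:
                    best = (z, j, thr, w_plus, w_minus, h)
        if best is None:             # every feature is entirely missing
            break
        _, j, thr, w_plus, w_minus, h = best
        # Smoothed vote weight; a negative alpha simply flips the stump
        alpha = 0.5 * np.log((w_plus + eps) / (w_minus + eps))
        model.append((j, thr, alpha))
        # Abstentions (h == 0) leave an example's weight unchanged before renormalising
        w = w * np.exp(-alpha * y * h)
        w = w / w.sum()
    return model

def score(model, X):
    """Weighted vote; stumps whose feature is missing contribute nothing."""
    total = np.zeros(X.shape[0])
    for j, thr, alpha in model:
        total += alpha * stump_output(X, j, thr)
    return total

# Invented toy data: each feature is missing for some examples.
X = np.array([[0.1, np.nan], [0.2, 0.0], [np.nan, 0.0],
              [0.9, 1.0], [0.8, np.nan], [np.nan, 1.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
model = fit_abstaining_adaboost(X, y, n_rounds=2)
predictions = np.sign(score(model, X))
```

In this sketch no imputation or density estimate is needed: an example with a missing feature is simply skipped by the stumps that depend on that feature, both during training and at prediction time.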

Full paper (PDF)
