A Self-training Approach for Functional Annotation of UniProtKB Proteins

  M. Abdollahyan, R. Saidi, F. Smeraldi and M.-J. Martin: Poster and Lightning Talk, Function SIG, ISMB/ECCB 2017, in: Recall, vol. 71, pages 87.4, 2017


Abstract


Automatic annotation systems are essential for narrowing the gap between the amount of protein sequence data and the functional information available in public databases such as UniProtKB. These systems rely on manually annotated (also called labelled) data to learn rules for predicting annotations. Manually labelled data are, however, often scarce or time-consuming to obtain, as they have to be reviewed by expert human curators. Unlabelled data, on the other hand, are abundant and comparatively easy to gather. In this work, we present a self-training automatic annotation approach that uses unlabelled data to improve the accuracy of predictions. We evaluated our system on a set of entries in UniProtKB/Swiss-Prot. The results show improvement in several performance metrics when self-training is used. The generated model was then used to predict the metabolic pathway involvement of UniProtKB/TrEMBL proteins. It covered 86% of the proteins currently annotated by UniProt pipelines and, in addition, annotated 6.7 million proteins that previously lacked any pathway annotation.
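The general idea of self-training can be sketched in a few lines of Python. This is an illustrative sketch only, not the system described in the abstract: the abstract does not specify the classifier, the features or the confidence criterion, so the nearest-centroid classifier, the margin-based confidence score, the `threshold` value and all function names below are assumptions made for the example. The loop trains on the labelled data, labels the unlabelled points it is confident about, adds them to the training set and repeats until no further points qualify.

```python
from math import dist


def fit_centroids(X, y):
    """Return {label: centroid} computed from the labelled points."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict_with_confidence(centroids, x):
    """Return (label, confidence) for x; needs at least two classes.

    Confidence is an assumed heuristic: the margin between the two
    nearest centroids, scaled to the range [0, 1].
    """
    ranked = sorted((dist(x, c), lab) for lab, c in centroids.items())
    (d1, best), (d2, _) = ranked[0], ranked[1]
    confidence = (d2 - d1) / (d2 + d1) if (d1 + d2) > 0 else 0.0
    return best, confidence


def self_train(X_lab, y_lab, X_unlab, threshold=0.5, max_rounds=10):
    """Grow the labelled set with confident pseudo-labels.

    Returns the final centroids and the points left unlabelled.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            label, confidence = predict_with_confidence(centroids, x)
            if confidence >= threshold:
                # Confident prediction: treat it as a label from now on.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:
            break
    return fit_centroids(X_lab, y_lab), pool


# Toy data: two labelled points and five unlabelled ones.
model, unresolved = self_train(
    X_lab=[(0, 0), (10, 10)],
    y_lab=["A", "B"],
    X_unlab=[(1, 1), (9, 9), (2, 0), (8, 10), (5, 5)],
)
```

In this toy run the point `(5, 5)` lies midway between the two classes and is never pseudo-labelled, which illustrates why a confidence threshold matters: wrongly labelled points added early would be reinforced in later rounds.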

(full text)

