Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics

Polajnar, T. and Girolami, M. (2009) Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics. Lecture Notes in Computer Science, 5780, pp. 270-281. (doi: 10.1007/978-3-642-04031-3_24)

Publisher's URL: http://dx.doi.org/10.1007/978-3-642-04031-3_24

Abstract

Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automation of this task through classification is one of the key goals of text mining (TM). However, labelled PPI corpora required to train classifiers are generally small. In order to overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations and has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process.
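
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of a semantically re-weighted kernel, not the authors' method. It assumes a simple windowed co-occurrence count as a stand-in for the semantic language model (the paper's keywords name HAL and BEAGLE), cosine similarity between the resulting word vectors, and a kernel of the form K = X S Xᵀ over bag-of-words sentence vectors. The function names, the window size, and the toy data are all hypothetical.

```python
import numpy as np


def cooccurrence_matrix(sentences, vocab, window=2):
    """Count co-occurrences of vocabulary words in an unlabelled corpus.

    Out-of-vocabulary tokens are dropped first, so the window is measured
    over the remaining in-vocabulary tokens (a simplifying assumption).
    """
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for tokens in sentences:
        ids = [index[t] for t in tokens if t in index]
        for pos, i in enumerate(ids):
            # Pair the current word with up to `window` preceding words.
            for j in ids[max(0, pos - window):pos]:
                counts[i, j] += 1.0
                counts[j, i] += 1.0
    return counts


def word_similarity(counts):
    """Cosine similarity between co-occurrence rows.

    The result is a Gram matrix of the normalised rows, so it is symmetric
    and positive semi-definite. Words with no co-occurrences get a zero row.
    """
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid division by zero
    unit = counts / norms
    return unit @ unit.T


def semantic_kernel(X, S, Y=None):
    """Kernel K = X S Y^T: sentence similarity weighted by word similarity S."""
    Y = X if Y is None else Y
    return X @ S @ Y.T


# Toy illustration with made-up data.
vocab = ["protein", "binds", "interacts", "receptor", "kinase", "substrate"]
unlabelled = [
    ["protein", "binds", "receptor"],
    ["protein", "interacts", "receptor"],
    ["kinase", "binds", "substrate"],
]
S = word_similarity(cooccurrence_matrix(unlabelled, vocab))

# Bag-of-words vectors for two labelled sentences (columns follow `vocab`).
X = np.array([
    [1, 1, 0, 1, 0, 0],  # "protein binds receptor"
    [0, 0, 1, 0, 1, 1],  # "kinase interacts substrate"
], dtype=float)
K = semantic_kernel(X, S)
print(K)
```

With a plain linear kernel the two toy sentences share no words and would have zero similarity. Under the semantic kernel they receive a non-zero value because "binds" and "interacts" occur in similar contexts in the unlabelled text. The matrix K could then be passed to any kernel classifier. The keywords indicate that the paper uses Gaussian processes, but that step is not sketched here.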

Item Type: Articles
Keywords: Text mining; Gaussian processes; GPs; kernel; semantic; hyperspace analogue to language; HAL; bound encoding of the aggregate language environment; BEAGLE
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Girolami, Prof Mark
Authors: Polajnar, T., and Girolami, M.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Lecture Notes in Computer Science
Publisher: Springer Berlin / Heidelberg
ISSN: 0302-9743
ISSN (Online): 1611-3349
Published Online: 31 August 2009
Copyright Holders: Copyright © 2009 Springer
First Published: First published in Lecture Notes in Computer Science 5780:270-281
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher.
