Classification of protein interaction sentences via Gaussian processes

Polajnar, T., Rogers, S. and Girolami, M. (2009) Classification of protein interaction sentences via Gaussian processes. Lecture Notes in Computer Science, 5780, pp. 282-292. (doi:10.1007/978-3-642-04031-3_25)


Publisher's URL: http://dx.doi.org/10.1007/978-3-642-04031-3_25

Abstract

The increasing availability of protein interaction studies in textual format, coupled with the demand for easier access to their key results, has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extracting small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier that has rarely been applied to text: the Gaussian process (GP). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the absence of a margin parameter, which requires costly tuning, together with the principled multiclass extensions enabled by the probabilistic framework, makes GPs an appealing alternative worthy of further adoption.
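To make the GP-versus-SVM contrast concrete, here is a minimal sketch of binary GP classification of sentences represented as word-count vectors. It is an illustration only, not the authors' method: it assumes a logistic likelihood with the standard Laplace approximation (Rasmussen and Williams, Gaussian Processes for Machine Learning, Algorithms 3.1 and 3.2) and a plain linear kernel, whereas the paper's inference scheme, features and kernels may differ. The function names, the four-word vocabulary and the toy data are all hypothetical.

```python
import numpy as np


def linear_kernel(A, B):
    # Dot-product kernel between rows of A and rows of B (word-count vectors).
    return A @ B.T


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def laplace_mode(K, y, n_iter=50):
    """Newton iterations for the posterior mode of the latent function f.

    Sketch of Algorithm 3.1 (Rasmussen & Williams) with a logistic
    likelihood; labels y must be in {-1, +1}.
    """
    n = len(y)
    t = (y + 1) / 2                      # labels mapped to {0, 1}
    f = np.zeros(n)
    for _ in range(n_iter):
        p = sigmoid(f)
        W = p * (1 - p)                  # diagonal of -Hessian of log p(y|f)
        sW = np.sqrt(W)
        B = np.eye(n) + sW[:, None] * K * sW[None, :]
        L = np.linalg.cholesky(B)
        b = W * f + (t - p)              # t - p is the gradient of log p(y|f)
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f = K @ a
    return f


def predict_proba(X, y, X_test, kernel=linear_kernel):
    """Approximate P(y* = +1 | x*) for each row of X_test (Algorithm 3.2)."""
    K = kernel(X, X)
    f = laplace_mode(K, y)
    t = (y + 1) / 2
    p = sigmoid(f)
    sW = np.sqrt(p * (1 - p))
    L = np.linalg.cholesky(np.eye(len(y)) + sW[:, None] * K * sW[None, :])
    Ks = kernel(X, X_test)                          # n x m cross-covariances
    mean = Ks.T @ (t - p)                           # latent predictive mean
    v = np.linalg.solve(L, sW[:, None] * Ks)
    var = np.diag(kernel(X_test, X_test)) - np.sum(v ** 2, axis=0)
    var = np.maximum(var, 0.0)                      # guard against round-off
    # Probit-style approximation to the logistic-Gaussian integral.
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var / 8.0)
    return sigmoid(kappa * mean)


# Hypothetical toy vocabulary: ["interacts", "binds", "with", "expressed"]
X = np.array([
    [1, 0, 1, 0],   # "... interacts with ..."
    [0, 1, 1, 0],   # "... binds with ..."
    [1, 1, 0, 0],
    [0, 0, 0, 1],   # "... expressed ..."
    [0, 0, 1, 1],
    [0, 0, 0, 2],
], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])    # +1 = protein interaction sentence
X_test = np.array([[0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)
probs = predict_proba(X, y, X_test)
print(probs)
```

Unlike an SVM, this sketch returns class probabilities and has no margin (cost) parameter to cross-validate; the linear kernel here has unit prior variance and nothing to tune. In practice, kernel hyperparameters would typically be fitted by maximizing the marginal likelihood.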

Item Type:Articles
Keywords:Gaussian processes; support vector machines; protein interaction; text mining; bioinformatics
Status:Published
Refereed:Yes
Glasgow Author(s) Enlighten ID:Rogers, Dr Simon and Girolami, Prof Mark
Authors: Polajnar, T., Rogers, S., and Girolami, M.
Subjects:Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School:College of Science and Engineering > School of Computing Science
Journal Name:Lecture Notes in Computer Science
Publisher:Springer Berlin / Heidelberg
ISSN:0302-9743
ISSN (Online):1611-3349
Published Online:31 August 2009
Copyright Holders:Copyright © 2009 Springer
First Published:First published in Lecture Notes in Computer Science 5780:282-292
Publisher Policy:Reproduced in accordance with the copyright policy of the publisher.
