Probabilistic framework for integration of mass spectrum and retention time information in small molecule identification

Bach, E., Rogers, S., Williamson, J. and Rousu, J. (2021) Probabilistic framework for integration of mass spectrum and retention time information in small molecule identification. Bioinformatics, 37(12), pp. 1724-1731. (doi: 10.1093/bioinformatics/btaa998) (PMID:33244585)

Text: 230634.pdf - Published Version
Available under License Creative Commons Attribution.



Motivation: Identification of small molecules in a biological sample remains a major bottleneck in molecular biology, despite a decade of rapid development of computational approaches for predicting molecular structures using mass spectrometry (MS) data. Recently, there has been increasing interest in utilizing other information sources, such as liquid chromatography (LC) retention time (RT), to improve identifications solely based on MS information, such as precursor mass-per-charge and tandem mass spectrometry (MS2).

Results: We put forward a probabilistic modelling framework to integrate MS and RT data of multiple features in an LC-MS experiment. We model the MS measurements and all pairwise retention order information as a Markov random field and use efficient approximate inference for scoring and ranking potential molecular structures. Our experiments show improved identification accuracy by combining MS2 data and retention orders using our approach, thereby outperforming state-of-the-art methods. Furthermore, we demonstrate the benefit of our model when only a subset of LC-MS features has MS2 measurements available besides MS1.

Availability and implementation: Software and data are freely available at

Supplementary information: Supplementary data are available at Bioinformatics online.
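
The Markov random field formulation described under Results can be illustrated with a toy sketch. This is not the authors' implementation: it replaces their approximate inference with exhaustive enumeration (feasible only for a handful of features), and the sigmoid edge potential, the `weight` parameter, and all function and argument names are assumptions made for illustration. Each LC-MS feature is a node whose states are its candidate structures; node potentials come from MS-based candidate scores, and each pair of features contributes an edge potential that rewards candidates whose hypothetical predicted retention order agrees with the observed retention times.

```python
import math
from itertools import product


def log_sigmoid(x):
    # Log of the logistic function; adequate for the small values used here.
    return -math.log1p(math.exp(-x))


def max_marginals(ms_scores, pred_order, rts, weight=1.0):
    """Exhaustive max-marginals for a fully connected toy MRF.

    ms_scores[i][c]  : MS-based score in (0, 1] of candidate c for feature i
    pred_order[i][c] : hypothetical predicted retention-order value of candidate c
    rts[i]           : observed retention time of feature i
    weight           : assumed trade-off between MS and retention-order terms
    """
    n = len(ms_scores)
    best = [[-math.inf] * len(scores) for scores in ms_scores]
    # Enumerate every joint assignment of one candidate per feature.
    for assignment in product(*(range(len(s)) for s in ms_scores)):
        # Node potentials: log MS score of each chosen candidate.
        logp = sum(math.log(ms_scores[i][c]) for i, c in enumerate(assignment))
        # Edge potentials: one term per pair of features.
        for i in range(n):
            for j in range(i + 1, n):
                observed = rts[j] - rts[i]
                sign = (observed > 0) - (observed < 0)
                predicted = pred_order[j][assignment[j]] - pred_order[i][assignment[i]]
                logp += weight * log_sigmoid(sign * predicted)
        # A candidate's max-marginal is the best joint score that includes it.
        for i, c in enumerate(assignment):
            best[i][c] = max(best[i][c], logp)
    return best


def rank_candidates(marginals):
    # Candidate indices per feature, best max-marginal first.
    return [sorted(range(len(m)), key=lambda c: -m[c]) for m in marginals]


# Toy data: feature 0 has two candidates, feature 1 has one.
ms_scores = [[0.45, 0.55], [1.0]]
pred_order = [[0.0, 10.0], [5.0]]
rts = [1.0, 5.0]

ranking_with_rt = rank_candidates(max_marginals(ms_scores, pred_order, rts))
ranking_ms_only = rank_candidates(max_marginals(ms_scores, pred_order, rts, weight=0.0))
```

In this toy setting, the MS scores alone slightly favour the second candidate of feature 0, while the retention-order term favours the first, because only the first is predicted to elute before feature 1 as observed. Setting `weight=0.0` switches the pairwise term off, which makes it easy to compare the two rankings.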

Item Type: Articles
Glasgow Author(s) Enlighten ID: Bach, Eric and Rogers, Dr Simon and Williamson, Dr John
Authors: Bach, E., Rogers, S., Williamson, J., and Rousu, J.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Bioinformatics
Publisher: Oxford University Press
ISSN (Online): 1367-4811
Published Online: 27 November 2020
Copyright Holders: Copyright © 2020 The Authors
First Published: First published in Bioinformatics 2020
Publisher Policy: Reproduced under a Creative Commons License


Project Code: 300982
Award No:
Project Name: Exploiting Closed-Loop Aspects in Computationally and Data Intensive Analytics
Principal Investigator: Roderick Murray-Smith
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Funder Ref: EP/R018634/1
Lead Dept: Computing Science