Topic modeling for untargeted substructure exploration in metabolomics

van Der Hooft, J. J. J., Wandy, J., Barrett, M. P., Burgess, K. E. V. and Rogers, S. (2016) Topic modeling for untargeted substructure exploration in metabolomics. Proceedings of the National Academy of Sciences of the United States of America, 113(48), pp. 13738-13743. (doi: 10.1073/pnas.1608041113) (PMID: 27856765) (PMCID: PMC5137707)

130100.pdf - Accepted Version



The potential of untargeted metabolomics to answer important questions across the life sciences is hindered by a paucity of computational tools that enable extraction of key biochemically relevant information. Available tools focus on using mass spectrometry fragmentation spectra to identify molecules whose behavior suggests they are relevant to the system under study. Unfortunately, fragmentation spectra cannot identify molecules in isolation, but require authentic standards or databases of known fragmented molecules. Fragmentation spectra are, however, replete with information pertaining to the biochemical processes present, much of which is currently neglected. Here we present an analytical workflow that exploits all fragmentation data from a given experiment to extract biochemically relevant features in an unsupervised manner. We demonstrate that an algorithm originally utilized for text mining, Latent Dirichlet Allocation, can be adapted to handle metabolomics datasets. Our approach extracts biochemically relevant molecular substructures (‘Mass2Motifs’) from spectra as sets of co-occurring molecular fragments and neutral losses. Isolating these substructures allows molecules to be grouped by the substructures they share, regardless of classical spectral similarity. These substructures in turn support putative de novo structural annotation of molecules. Combining this spectral connectivity with orthogonal correlations (e.g. common abundance changes under system perturbation) significantly enhances our ability to provide mechanistic explanations for biological behavior.
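
The text-mining analogy in the abstract can be made concrete with a small sketch: each fragmentation spectrum is treated as a "document", each fragment or neutral loss as a "word", and each topic as a candidate Mass2Motif. The code below is only an illustration under stated assumptions and is not the authors' implementation: the m/z values are invented toy data, peak intensities are ignored, words are formed by rounding to two decimals, and a plain collapsed Gibbs sampler is swapped in as the inference method. The names `spectrum_to_words` and `fit_lda` are hypothetical.

```python
import random

def spectrum_to_words(precursor_mz, fragment_mzs, decimals=2):
    """Turn one spectrum into a bag of 'words': fragments and neutral losses."""
    words = []
    for mz in fragment_mzs:
        words.append(f"fragment_{mz:.{decimals}f}")
        # Neutral loss = mass difference between precursor and fragment.
        words.append(f"loss_{precursor_mz - mz:.{decimals}f}")
    return words

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (assumed settings, not tuned).

    Returns (vocab, topic_word_counts, doc_topic_counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = index[w]
                k = assignments[d][i]
                # Remove the current assignment before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word, doc_topic

# Invented toy spectra: (precursor m/z, fragment m/z values).
toy_spectra = [
    (180.06, [162.05, 120.04, 90.03]),
    (342.12, [324.11, 180.06, 162.05]),
    (166.09, [120.08, 103.05, 77.04]),
    (182.08, [136.08, 119.05, 91.05]),
]
docs = [spectrum_to_words(p, frags) for p, frags in toy_spectra]
vocab, topic_word, doc_topic = fit_lda(docs, n_topics=2)

for k, counts in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda v: counts[v], reverse=True)[:3]
    print(f"topic {k}:", [vocab[v] for v in top])
```

In this sketch, words that repeatedly co-occur across spectra (for example a shared fragment and its associated loss) tend to be assigned to the same topic, which is the sense in which a topic can be read as a candidate substructure. A realistic analysis would need proper peak picking, intensity-weighted word counts, and mass binning suited to the instrument's accuracy.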

Item Type: Articles
Glasgow Author(s) Enlighten ID: Rogers, Dr Simon and Wandy, Dr Joe and Van Der Hooft, Mr Justin and Burgess, Dr Karl and Barrett, Professor Michael
Authors: van Der Hooft, J. J. J., Wandy, J., Barrett, M. P., Burgess, K. E. V., and Rogers, S.
College/School: College of Medical Veterinary and Life Sciences
College of Medical Veterinary and Life Sciences > School of Infection & Immunity
College of Science and Engineering > School of Computing Science
Journal Name: Proceedings of the National Academy of Sciences of the United States of America
Publisher: National Academy of Sciences
ISSN (Online): 1091-6490
Published Online: 16 November 2016
Copyright Holders: Copyright © 2016 National Academy of Sciences
First Published: First published in Proceedings of the National Academy of Sciences of the United States of America 2016
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher
Data DOI: 10.5525/gla.researchdata.313


Project Code | Award No | Project Name | Principal Investigator | Funder's Name | Funder Ref | Lead Dept
632234 | | Funding Schemes | Anna Dominiczak | Wellcome Trust (WELLCOME) | 105614/Z/14/Z | RI CARDIOVASCULAR & MEDICAL SCIENCES
371799 | | The Wellcome Centre for Molecular Parasitology (Core Support) | Andrew Waters | Wellcome Trust (WELLCOME) | 104111/Z/14/Z & | III - PARASITOLOGY
680241 | | Unifying metabolome and proteome informatics | Simon Rogers | Biotechnology and Biological Sciences Research Council (BBSRC) | BB/L018616/1 | COM - COMPUTING SCIENCE