Deciphering complex metabolite mixtures by unsupervised and supervised substructure discovery and semi-automated annotation from MS/MS spectra

Rogers, S., Ong, C. W., Wandy, J., Ernst, M., Ridder, L. and Van Der Hooft, J. J.J. (2019) Deciphering complex metabolite mixtures by unsupervised and supervised substructure discovery and semi-automated annotation from MS/MS spectra. Faraday Discussions, 218, pp. 284-302. (doi: 10.1039/C8FD00235E) (PMID:31120050)

176662.pdf - Published Version
Available under License Creative Commons Attribution.


Abstract

Complex metabolite mixtures are challenging to unravel. Mass spectrometry (MS) is a widely used and sensitive technique to obtain structural information on complex mixtures. However, just knowing the molecular masses of the mixture’s constituents is almost always insufficient for confident assignment of the associated chemical structures. Structural information can be augmented through MS fragmentation experiments whereby detected metabolites are fragmented, giving rise to MS/MS spectra. However, how can we maximize the structural information we gain from fragmentation spectra? We recently proposed a substructure-based strategy to enhance metabolite annotation for complex mixtures by considering metabolites as the sum of (bio)chemically relevant moieties that we can detect through mass spectrometry fragmentation approaches. Our MS2LDA tool allows us to discover, without supervision, groups of mass fragments and/or neutral losses termed Mass2Motifs that often correspond to substructures. After manual annotation, these Mass2Motifs can be used in subsequent MS2LDA analyses of new datasets, thereby providing structural annotations for many molecules that are not present in spectral databases. Here, we describe how additional strategies, taking advantage of i) combinatorial in silico matching of experimental mass features to substructures of candidate molecules, and ii) automated machine learning classification of molecules, can facilitate semi-automated annotation of substructures. We show how our approach accelerates the Mass2Motif annotation process and therefore broadens the chemical space spanned by characterized motifs. Our machine learning model used to classify fragmentation spectra learns the relationships between fragment spectra and chemical features. Classification predictions for these features can be aggregated over all molecules that contribute to a particular Mass2Motif and used to guide Mass2Motif annotations.
To make annotated Mass2Motifs available to the community, we also present motifDB: an open database of Mass2Motifs that can be browsed and accessed programmatically through an Application Programming Interface (API). MotifDB is integrated within ms2lda.org, allowing users to efficiently search for characterized motifs in their own experiments. We expect that with an increasing number of Mass2Motif annotations available through a growing database we can more quickly gain insight into the constituents of complex mixtures. This will allow prioritization towards novel or unexpected chemistries and faster recognition of known biochemical building blocks.

Item Type:Articles
Additional Information:JJJvdH is supported by an ASDI eScience grant (ASDI.2017.030) from the Netherlands eScience Center (NLeSC). SR is supported by a BBSRC grant BB/R022054/1 and a Carnegie Trust for Scotland grant.
Status:Published
Refereed:Yes
Glasgow Author(s) Enlighten ID:Wandy, Dr Joe and Van Der Hooft, Mr Justin and Rogers, Dr Simon
Authors: Rogers, S., Ong, C. W., Wandy, J., Ernst, M., Ridder, L., and Van Der Hooft, J. J.J.
College/School:College of Medical Veterinary and Life Sciences
College of Medical Veterinary and Life Sciences > School of Infection & Immunity
College of Science and Engineering > School of Computing Science
Journal Name:Faraday Discussions
Publisher:Royal Society of Chemistry
ISSN:1359-6640
ISSN (Online):1364-5498
Published Online:29 January 2019
Copyright Holders:Copyright © 2019 The Royal Society of Chemistry
First Published:First published in Faraday Discussions 218:284-302
Publisher Policy:Reproduced in accordance with the copyright policy of the publisher
