Spec2Vec: improved mass spectral similarity scoring through learning of structural relationships

Huber, F., Ridder, L., Verhoeven, S., Spaaks, J. H., Diblen, F., Rogers, S. and van der Hooft, J. J. J. (2021) Spec2Vec: improved mass spectral similarity scoring through learning of structural relationships. PLoS Computational Biology, 17(2), e1008724. (doi: 10.1371/journal.pcbi.1008724) (PMID:33591968) (PMCID:PMC7909622)

Available under License Creative Commons Attribution.

Abstract

Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS)-based metabolomics analyses, such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by a natural language processing algorithm, Word2Vec. Spec2Vec learns fragmental relationships within a large set of spectral data to derive abstract spectral embeddings that can be used to assess spectral similarities. Using data derived from GNPS MS/MS libraries, including spectra for nearly 13,000 unique molecules, we show that Spec2Vec scores correlate better with structural similarity than cosine-based scores. We demonstrate the advantages of Spec2Vec in library matching and molecular networking. Spec2Vec is also computationally more scalable, allowing structural analogue searches in large databases within seconds.
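
The abstract describes the idea only at a high level, so the following is a minimal, illustrative Python sketch of Spec2Vec-style scoring. It is not the authors' implementation or the API of any Spec2Vec package. It assumes that each fragment peak becomes a "word" by rounding its m/z to two decimals (e.g. `peak@91.05`), that a spectrum's embedding is the sum of its word vectors weighted by intensity ** 0.5, and that two spectra are compared by the cosine of their embeddings. The word vectors here are hand-picked toy values standing in for vectors that Word2Vec would learn from a large spectral corpus. For contrast, `binned_cosine` is a deliberately simplified cosine score that matches peaks only when their rounded m/z values are identical, whereas typical cosine-based scores use an m/z tolerance. All function names and data are hypothetical.

```python
import heapq
import math

# Hypothetical toy "word vectors". In Spec2Vec these would be learned by
# Word2Vec from many spectra; here they are hand-picked so that related
# fragments point in similar directions.
TOY_WORD_VECTORS = {
    "peak@91.05": [1.0, 0.1, 0.0],
    "peak@105.07": [0.9, 0.2, 0.0],
    "peak@77.04": [0.0, 1.0, 0.1],
    "peak@79.05": [0.1, 0.9, 0.2],
    "peak@150.00": [0.0, 0.0, 1.0],
}


def peaks_to_words(mzs, n_decimals=2):
    """Turn fragment m/z values into words, e.g. 91.0542 -> 'peak@91.05'."""
    return [f"peak@{mz:.{n_decimals}f}" for mz in mzs]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def binned_cosine(mzs_a, ints_a, mzs_b, ints_b):
    """Simplified cosine score: peaks match only if their rounded m/z are identical."""
    def to_bins(mzs, ints):
        bins = {}
        for word, intensity in zip(peaks_to_words(mzs), ints):
            bins[word] = bins.get(word, 0.0) + intensity
        return bins

    a, b = to_bins(mzs_a, ints_a), to_bins(mzs_b, ints_b)
    keys = sorted(a.keys() | b.keys())
    return cosine([a.get(k, 0.0) for k in keys], [b.get(k, 0.0) for k in keys])


def spectrum_embedding(mzs, intensities, word_vectors, power=0.5):
    """Intensity-weighted sum of word vectors; words without a vector are skipped."""
    dim = len(next(iter(word_vectors.values())))
    embedding = [0.0] * dim
    for word, intensity in zip(peaks_to_words(mzs), intensities):
        vector = word_vectors.get(word)
        if vector is None:  # fragment not seen when the vectors were trained
            continue
        weight = intensity ** power
        for i in range(dim):
            embedding[i] += weight * vector[i]
    return embedding


def top_matches(query_embedding, library_embeddings, k=1):
    """Rank precomputed library embeddings by cosine similarity to the query."""
    scored = ((name, cosine(query_embedding, emb))
              for name, emb in library_embeddings.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy spectra as (m/z values, intensities); a and b share no rounded m/z value.
spec_a = ([91.0542, 77.0386], [1.0, 0.5])
spec_b = ([105.0699, 79.0542], [1.0, 0.5])
spec_c = ([150.0], [1.0])

print(f"{binned_cosine(*spec_a, *spec_b):.3f}")  # prints 0.000
emb = {name: spectrum_embedding(*spec, TOY_WORD_VECTORS)
       for name, spec in {"a": spec_a, "b": spec_b, "c": spec_c}.items()}
print(f"{cosine(emb['a'], emb['b']):.3f}")
print(top_matches(emb["a"], {"b": emb["b"], "c": emb["c"]}, k=1))
```

With these toy vectors, spectra a and b share no rounded fragment m/z, so the simplified cosine score is zero, while the embedding-based score is close to 1 because their fragments have similar (hand-picked) vectors. Because library embeddings can be computed once in advance, an analogue search in this sketch reduces to one vector comparison per library entry, which illustrates why an embedding-based score can scale to large databases.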

Item Type: Articles
Additional Information: Funding: J.J.J.v.d.H. acknowledges funding from an ASDI eScience grant, ASDI.2017.030, from the Netherlands eScience Center (NLeSC), www.esciencecenter.nl.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Rogers, Dr Simon
Creator Roles:
Rogers, S.: Methodology, Software, Validation, Writing – original draft, Writing – review and editing
Authors: Huber, F., Ridder, L., Verhoeven, S., Spaaks, J. H., Diblen, F., Rogers, S., and van der Hooft, J. J. J.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: PLoS Computational Biology
Publisher: Public Library of Science
ISSN: 1553-734X
ISSN (Online): 1553-7358
Published Online: 16 February 2021
Copyright Holders: Copyright © 2021 Huber et al.
First Published: First published in PLoS Computational Biology 17(2):e1008724
Publisher Policy: Reproduced under a Creative Commons license
