Generating Multimodal Augmentations with LLMs from Song Metadata for Music Information Retrieval

Rossetto, F., Dalton, J. and Murray-Smith, R. (2023) Generating Multimodal Augmentations with LLMs from Song Metadata for Music Information Retrieval. In: 1st Workshop on Large Generative Models Meet Multimodal Application (LGM3A), Ottawa, Canada, 2 November 2023, pp. 51-59. ISBN 9798400702839 (doi: 10.1145/3607827.3616842)

Full text not currently available from Enlighten.

Abstract

In this work we propose a set of new automatic text augmentations that use Large Language Models to generate descriptions from song metadata, improving performance on music information retrieval tasks. Compared to recent work, our proposed methods leverage large language models and copyright-free corpora from web sources, enabling us to release the knowledge sources we collected. We show that combining these representations with the audio signal yields a 21% relative improvement on five of six datasets spanning genre classification, emotion recognition and music tagging, achieving state-of-the-art results on three (GTZAN, FMA-Small and Deezer). We demonstrate the benefit of injecting external knowledge sources by comparing them with intrinsic text representation methods that rely only on the sample's own information.
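The full text is not available from this record, so the exact architecture is unspecified. Purely as an illustration of the general idea the abstract describes (an LLM-generated textual augmentation derived from song metadata, fused with audio features for classification), the sketch below shows one plausible late-fusion setup. The prompt template, embedding dimensions, and fusion head are all assumptions for illustration, not the authors' method.

```python
import torch
import torch.nn as nn

def build_prompt(metadata: dict) -> str:
    """Hypothetical prompt builder: turn song metadata into an LLM query
    asking for a rich textual description of the track."""
    return (
        f"Describe the song '{metadata['title']}' by {metadata['artist']}. "
        "Mention its likely genre, mood and instrumentation."
    )

class LateFusionClassifier(nn.Module):
    """Concatenate a text embedding (of the LLM's description) with an
    audio embedding, then classify (e.g. into genres)."""
    def __init__(self, text_dim: int, audio_dim: int, n_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + audio_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, text_emb: torch.Tensor, audio_emb: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([text_emb, audio_emb], dim=-1))

# Dummy embeddings stand in for a real text encoder (applied to the LLM
# output) and a real audio encoder (e.g. a spectrogram model).
text_emb = torch.randn(8, 384)   # batch of 8 text embeddings
audio_emb = torch.randn(8, 128)  # batch of 8 pooled audio embeddings
model = LateFusionClassifier(text_dim=384, audio_dim=128, n_classes=10)
logits = model(text_emb, audio_emb)  # shape (8, 10): per-genre scores
```

Late fusion by concatenation is only one way to combine the modalities; the paper's reported gains could equally come from other fusion strategies.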

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Murray-Smith, Professor Roderick and Rossetto, Federico and Dalton, Dr Jeff
Authors: Rossetto, F., Dalton, J., and Murray-Smith, R.
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9798400702839


Project Code: 310549
Award No:
Project Name: Dalton-UKRI-Turing Fellow
Principal Investigator: Jeff Dalton
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Funder Ref: EP/V025708/1
Lead Dept: Computing Science