A Unified Framework for Learned Sparse Retrieval

Nguyen, T., MacAvaney, S. and Yates, A. (2023) A Unified Framework for Learned Sparse Retrieval. In: 45th European Conference on Information Retrieval (ECIR 2023), Dublin, Ireland, 2-6 April 2023, pp. 101-116. ISBN 9783031282409 (doi: 10.1007/978-3-031-28241-6_7)

287838.pdf - Accepted Version
Restricted to Repository staff only until 16 March 2024.



Learned sparse retrieval (LSR) is a family of first-stage retrieval methods that are trained to generate sparse lexical representations of queries and documents for use with an inverted index. Many LSR methods have recently been introduced, with Splade models achieving state-of-the-art performance on MSMarco. Despite similarities in their model architectures, many LSR methods show substantial differences in effectiveness and efficiency. Differences in the experimental setups and configurations used make it difficult to compare the methods and derive insights. In this work, we analyze existing LSR methods and identify key components to establish an LSR framework that unifies all LSR methods under the same perspective. We then reproduce all prominent methods using a common codebase and re-train them in the same environment, which allows us to quantify how components of the framework affect effectiveness and efficiency. We find that (1) including document term weighting is most important for a method’s effectiveness, (2) including query weighting has a small positive impact, and (3) document expansion and query expansion have a cancellation effect. As a result, we show how removing query expansion from a state-of-the-art model can reduce latency significantly while maintaining effectiveness on the MSMarco and TripClick benchmarks. Our code is publicly available at https://github.com/thongnt99/learned-sparse-retrieval.
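
As a rough illustration of the setting the abstract describes, the sketch below scores documents by the dot product of sparse term-weight vectors looked up through an inverted index. It is not the authors' implementation: the function names, the example terms, and all weights are hypothetical, and in a real LSR system the weights would be produced by trained query and document encoders.

```python
from collections import defaultdict


def build_inverted_index(doc_vectors):
    """Map each term to a list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vector in doc_vectors.items():
        for term, weight in vector.items():
            if weight > 0:  # only non-zero weights are stored
                index[term].append((doc_id, weight))
    return index


def score(query_vector, index):
    """Rank documents by the sparse dot product with the query."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical document vectors (weights are made up for illustration).
docs = {
    "d1": {"sparse": 1.5, "retrieval": 1.0, "index": 0.5},
    "d2": {"dense": 1.0, "retrieval": 2.0},
}
index = build_inverted_index(docs)

# A query with binary weights and no expansion: only the query's own terms.
query = {"sparse": 1.0, "retrieval": 1.0}
ranking = score(query, index)
print(ranking)
```

In this toy setup, a query without expansion touches only the posting lists of its own terms. Each extra expansion term would add another posting list to traverse, which is one plausible reading of why the abstract reports lower latency when query expansion is removed.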

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: MacAvaney, Dr Sean
Authors: Nguyen, T., MacAvaney, S., and Yates, A.
College/School: College of Science and Engineering > School of Computing Science
Copyright Holders: Copyright © 2023 The Authors, under exclusive license to Springer Nature Switzerland AG
First Published: First published in Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol. 13982. Springer, Cham, pp. 101-116
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher