Improving zero-shot retrieval using dense external expansion

Wang, X., Macdonald, C. and Ounis, I. (2022) Improving zero-shot retrieval using dense external expansion. Information Processing and Management, 59(5), 103026. (doi: 10.1016/j.ipm.2022.103026)

274807.pdf - Published Version
Available under License Creative Commons Attribution.



Pseudo-relevance feedback (PRF) is a classical technique for improving search engine retrieval effectiveness by closing the vocabulary gap between users’ query formulations and the relevant documents. While PRF is typically applied on the same target corpus as the final retrieval, external expansion techniques have in the past sometimes been applied to obtain a high-quality pseudo-relevance feedback set from an external corpus. However, such external expansion approaches have only been studied for sparse (bag-of-words, BoW) retrieval methods, and their effectiveness for recent dense retrieval methods remains under-investigated. Indeed, dense retrieval approaches such as ANCE and ColBERT, which conduct similarity search based on encoded contextualised query and document embeddings, are of increasing importance. Moreover, pseudo-relevance feedback mechanisms have been proposed to further enhance dense retrieval effectiveness. In this work, we examine the application of dense external expansion to improve zero-shot retrieval effectiveness, i.e. evaluation on corpora without further training. Zero-shot retrieval experiments on six datasets (two TREC datasets and four BEIR datasets), using the MSMARCO passage collection as the external corpus, indicate that obtaining external feedback documents using ColBERT can significantly improve NDCG@10 for sparse retrieval (by up to 28%) and for dense retrieval (by up to 12%). In addition, using ANCE on the external corpus brings NDCG@10 improvements of up to 30% for sparse retrieval and up to 29% for dense retrieval.
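The core idea of dense external expansion (take the feedback documents from an external corpus, then use the expanded query on the target corpus) can be illustrated with a minimal sketch. It assumes a simple Rocchio-style vector update (average the embeddings of the top-k external feedback passages and interpolate them with the query embedding) as a stand-in; it is not the authors' ColBERT- or ANCE-based implementation. The function names, the `alpha`/`beta` weights and the toy 2-d vectors are hypothetical; a real system would encode text with a dense encoder such as ANCE and search an approximate nearest-neighbour index rather than scanning every vector.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner-product similarity between two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query: Vector, corpus: Dict[str, Vector], k: int) -> List[str]:
    """Exhaustive inner-product search (a stand-in for an ANN index)."""
    ranked = sorted(corpus, key=lambda doc_id: dot(query, corpus[doc_id]), reverse=True)
    return ranked[:k]


def external_rocchio_query(query: Vector, external_corpus: Dict[str, Vector],
                           k: int = 3, alpha: float = 1.0, beta: float = 0.5) -> Vector:
    """Hypothetical Rocchio-style vector PRF using feedback from an external corpus."""
    feedback_ids = top_k(query, external_corpus, k)
    if not feedback_ids:
        return list(query)  # no feedback available: keep the original query
    # Centroid of the top-k external feedback embeddings.
    centroid = [sum(external_corpus[d][i] for d in feedback_ids) / len(feedback_ids)
                for i in range(len(query))]
    # Interpolate the original query embedding with the feedback centroid.
    return [alpha * q + beta * c for q, c in zip(query, centroid)]


def dense_external_expansion(query: Vector, external_corpus: Dict[str, Vector],
                             target_corpus: Dict[str, Vector], k_feedback: int = 3,
                             k_final: int = 10, alpha: float = 1.0,
                             beta: float = 0.5) -> List[str]:
    """Expand the query on the external corpus, then retrieve from the target corpus."""
    expanded = external_rocchio_query(query, external_corpus, k_feedback, alpha, beta)
    return top_k(expanded, target_corpus, k_final)


# Hypothetical toy "embeddings"; the external corpus plays the role of MSMARCO.
query = [1.0, 0.0]
external = {"e1": [0.9, 0.4], "e2": [0.8, 0.6], "e3": [-1.0, 0.0]}
target = {"t1": [0.7, -0.6], "t2": [0.6, 0.8]}

print(top_k(query, target, 2))  # ['t1', 't2']
print(dense_external_expansion(query, external, target, k_feedback=2, k_final=2))  # ['t2', 't1']
```

In this toy run, the two external feedback passages pull the query embedding towards the second dimension, so the target passage `t2` overtakes `t1` even though the original query ranks `t1` first. Because the feedback never touches the target corpus, the same expanded query could be issued against any target collection without retraining, which is the zero-shot setting the abstract describes.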

Item Type: Articles
Glasgow Author(s) Enlighten ID: Macdonald, Professor Craig and Ounis, Professor Iadh and Wang, Ms Xiao
Creator Roles:
Wang, X.: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft
Macdonald, C.: Writing – review and editing, Project administration, Investigation, Validation, Supervision, Methodology, Software, Resources, Conceptualization
Ounis, I.: Writing – review and editing, Project administration, Investigation, Supervision, Methodology, Conceptualization
Authors: Wang, X., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Information Processing and Management
ISSN (Online): 1873-5371
Published Online: 02 August 2022
Copyright Holders: Copyright © 2022 The Authors
First Published: First published in Information Processing and Management 59(5): 103026
Publisher Policy: Reproduced under a Creative Commons License


Project Code | Award No | Project Name | Principal Investigator | Funder's Name | Funder Ref | Lead Dept
300982 |  | Exploiting Closed-Loop Aspects in Computationally and Data Intensive Analytics | Roderick Murray-Smith | Engineering and Physical Sciences Research Council (EPSRC) | EP/R018634/1 | Computing Science