CEQE to SQET: a study of contextualized embeddings for query expansion

Naseri, S., Dalton, J., Yates, A. and Allan, J. (2022) CEQE to SQET: a study of contextualized embeddings for query expansion. Information Retrieval, 25(2), pp. 184-208. (doi: 10.1007/s10791-022-09405-y)

Full text not currently available from Enlighten.

Abstract

In this work, we study recent advances in context-sensitive language models for the task of query expansion. We examine the behavior of existing and new approaches for lexical word-based expansion in both unsupervised and supervised settings. For unsupervised models, we study the Contextualized Embeddings for Query Expansion (CEQE) model. We introduce a new model, Supervised Contextualized Query Expansion with Transformers (SQET), which performs expansion as a supervised classification task and leverages context in pseudo-relevant results. We study these expansion approaches on the tasks of ad-hoc document and passage retrieval, conducting experiments that combine expansion with both probabilistic retrieval models and neural document ranking models. We evaluate expansion effectiveness on three standard TREC collections: Robust, Complex Answer Retrieval, and Deep Learning. We analyze extrinsic retrieval effectiveness and the intrinsic ability to rank expansion terms, and perform a qualitative analysis of the differences between the methods. We find that CEQE statistically significantly outperforms static embeddings across all three datasets on Recall@1000. Moreover, CEQE outperforms static embedding-based expansion methods on multiple collections (by up to 18% on Robust and 31% on Deep Learning in average precision) and also improves over proven probabilistic pseudo-relevance feedback (PRF) models. SQET outperforms CEQE by 6% in P@20 on the intrinsic term-ranking evaluation and is approximately as effective in retrieval performance. Models incorporating both neural and CEQE-based expansion scores achieve gains of up to 5% in P@20 and 2% in AP on Robust over the state-of-the-art transformer-based re-ranking model, Birch.
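
As a rough illustration of the unsupervised approach the abstract describes, the sketch below scores candidate expansion terms drawn from pseudo-relevant feedback documents by the similarity of their contextualized embeddings to the query, then keeps the top-ranked terms. This is a minimal sketch of the general idea, not the authors' CEQE implementation: the `embed` stub, the mean-pooling aggregation, and all identifiers are illustrative assumptions standing in for a contextualized encoder such as BERT.

```python
"""Minimal sketch of unsupervised embedding-based query expansion,
in the spirit of CEQE (illustrative only, not the paper's exact model)."""
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(0)
_cache: dict[str, np.ndarray] = {}

def embed(term: str, context: str) -> np.ndarray:
    """Stand-in for a contextualized encoder such as BERT.

    A real system would return the token's hidden state given its
    surrounding context; here we fabricate a stable random vector per
    (term, context) pair so the sketch runs end to end.
    """
    key = f"{term}|{context}"
    if key not in _cache:
        _cache[key] = rng.standard_normal(16)
    return _cache[key]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand(query: str, feedback_docs: list[str], k: int = 5) -> list[str]:
    """Rank candidate terms from pseudo-relevant documents by the
    similarity of their in-context embeddings to the query embedding."""
    q_vec = embed(query, query)
    scores: defaultdict[str, list[float]] = defaultdict(list)
    for doc in feedback_docs:
        for term in doc.lower().split():
            scores[term].append(cosine(embed(term, doc), q_vec))
    # Aggregate occurrence-level scores per term (mean pooling here;
    # other pooling choices are equally plausible in this sketch).
    ranked = sorted(scores, key=lambda t: float(np.mean(scores[t])), reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    docs = ["neural ranking models rerank retrieved passages",
            "pseudo relevance feedback expands the original query"]
    print(expand("query expansion with embeddings", docs))
```

In a full pipeline, the returned terms would typically be appended to the query or interpolated into a relevance model before a second retrieval pass, which is broadly how expansion is combined with the probabilistic and neural rankers evaluated in the paper.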

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Dalton, Dr Jeff and Naseri, Shahrzad
Authors: Naseri, S., Dalton, J., Yates, A., and Allan, J.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Information Retrieval
Publisher: Springer
ISSN: 1386-4564
ISSN (Online): 1573-7659
Published Online: 22 March 2022

Project Code: 310549
Award No:
Project Name: Dalton-UKRI-Turing Fellow
Principal Investigator: Jeff Dalton
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Funder Ref: EP/V025708/1
Lead Dept: Computing Science