Supervised approaches for explicit search result diversification

Yigit-Sert, S., Altingovde, I., Macdonald, C., Ounis, I. and Ulusoy, Ö. (2020) Supervised approaches for explicit search result diversification. Information Processing and Management, 57(6), 102356. (doi: 10.1016/j.ipm.2020.102356)

219943.pdf - Accepted Version (1MB)
Restricted to Repository staff only until 30 July 2021.
Available under a Creative Commons Attribution-NonCommercial-NoDerivatives licence.

Abstract

Diversification of web search results aims to promote documents with diverse content (i.e., covering different aspects of a query) to the top-ranked positions, to satisfy more users, enhance fairness and reduce bias. In this work, we focus on explicit diversification methods, which assume that the query aspects are known at diversification time. We leverage supervised learning to improve their performance in three frameworks with different features and goals. First, in the LTRDiv framework, we apply typical learning-to-rank (LTR) algorithms to obtain a ranking where each top-ranked document covers as many aspects as possible. We argue that such rankings optimize various diversification metrics (under certain assumptions) and hence are likely to achieve diversity in practice. Second, in the AspectRanker framework, we apply LTR to rank the aspects of a query, with the goal of setting the aspect importance values for diversification more accurately. As features, we exploit several pre- and post-retrieval query performance predictors (QPPs) to estimate how well a given aspect is covered among the candidate documents. Finally, in the LmDiv framework, we cast the diversification problem as an alternative fusion task, namely, the supervised merging of rankings per query aspect. We again use QPPs computed over the candidate set for each aspect, and optimize an objective function that is tailored to the diversification goal. We conduct thorough comparative experiments using both basic systems (based on the well-known BM25 matching function) and the best-performing systems (with more sophisticated retrieval methods) from previous TREC campaigns. Our findings reveal that the proposed frameworks, especially AspectRanker and LmDiv, outperform both non-diversified rankings and two strong diversification baselines (i.e., xQuAD and its variant) in terms of various effectiveness metrics.
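The abstract names xQuAD as a baseline, and it describes AspectRanker as a way of setting aspect importance values. The sketch below shows the standard greedy xQuAD objective so that both ideas are concrete. This is not the authors' code. The function name `xquad_rerank`, the dictionary inputs and the toy probabilities are hypothetical, and λ is a free parameter. Treating `aspect_weight` (the P(q_i|q) term) as the slot where learned aspect importances would go is our assumption about how the pieces connect.

The score a candidate document d receives at each step is (1 - λ) P(d|q) + λ Σ_i P(q_i|q) P(d|q_i) Π_{d_j in S} (1 - P(d_j|q_i)), where S is the set of documents already selected.

```python
from typing import Dict, List


def xquad_rerank(
    rel: Dict[str, float],                      # P(d|q): relevance of each candidate to the query
    aspect_weight: Dict[str, float],            # P(q_i|q): importance of each aspect (assumed input)
    aspect_rel: Dict[str, Dict[str, float]],    # P(d|q_i): relevance of each doc to each aspect
    lam: float = 0.5,                           # trade-off between relevance and diversity
    k: int = 10,                                # number of documents to select
) -> List[str]:
    """Greedy xQuAD-style re-ranking (illustrative sketch, hypothetical inputs)."""
    selected: List[str] = []
    candidates = set(rel)
    # Probability that each aspect is still NOT covered by the documents selected so far.
    not_covered = {a: 1.0 for a in aspect_weight}

    while candidates and len(selected) < k:
        def score(d: str) -> float:
            diversity = sum(
                w * aspect_rel.get(a, {}).get(d, 0.0) * not_covered[a]
                for a, w in aspect_weight.items()
            )
            return (1 - lam) * rel[d] + lam * diversity

        # Sorting first makes tie-breaking deterministic: max() keeps the first maximal item.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Down-weight aspects that the chosen document already covers.
        for a in not_covered:
            not_covered[a] *= 1.0 - aspect_rel.get(a, {}).get(best, 0.0)
    return selected


# Toy example: d1 and d2 both cover aspect a1, and only d3 covers a2.
rel = {"d1": 0.9, "d2": 0.85, "d3": 0.6}
aspect_weight = {"a1": 0.5, "a2": 0.5}
aspect_rel = {"a1": {"d1": 0.9, "d2": 0.9}, "a2": {"d3": 0.9}}

diversified = xquad_rerank(rel, aspect_weight, aspect_rel, lam=0.5, k=3)
relevance_only = xquad_rerank(rel, aspect_weight, aspect_rel, lam=0.0, k=3)
print(diversified)     # ['d1', 'd3', 'd2']
print(relevance_only)  # ['d1', 'd2', 'd3']
```

Once d1 is selected, aspect a1 is largely covered. That lets d3, the only document on a2, move ahead of the more relevant but redundant d2. Changing the values in `aspect_weight` shifts this balance between aspects, which is why accurate aspect importance estimates matter.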

Item Type: Articles
Additional Information: This work is partially funded by the Royal Society Newton International Exchanges Scheme (no. NI140231) and The Scientific and Technological Research Council of Turkey (TÜBİTAK) under grant no. 117E861. S. Yigit-Sert is supported by the TÜBİTAK-BİDEB 2211/A program. I. S. Altingovde is partially supported by the Turkish Academy of Sciences Distinguished Young Scientist Award (TÜBA-GEBIP 2016).
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Ounis, Professor Iadh and Macdonald, Dr Craig
Creator Roles:
Macdonald, C.: Conceptualization, Methodology, Software, Writing – review and editing
Ounis, I.: Conceptualization, Methodology, Supervision, Writing – review and editing
Authors: Yigit-Sert, S., Altingovde, I., Macdonald, C., Ounis, I., and Ulusoy, Ö.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Information Processing and Management
Publisher: Elsevier
ISSN: 0306-4573
ISSN (Online): 1873-5371
Published Online: 30 July 2020
Copyright Holders: Copyright © 2020 Elsevier Ltd.
First Published: First published in Information Processing and Management 57(6): 102356
Publisher Policy: Reproduced in accordance with the publisher copyright policy


Project Code: 171404
Project Name: Newton International Fellowship Exchange
Principal Investigator: Craig Macdonald
Funder's Name: The Royal Society (ROYSOC)
Funder Ref: NI140231
Lead Dept: Computing Science