The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation

Dato, D., MacAvaney, S., Nardini, F. M., Perego, R. and Tonellotto, N. (2022) The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation. In: SIGIR 2022: 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11-15 Jul 2022, pp. 3099-3107. ISBN 9781450387323 (doi: 10.1145/3477495.3531740)

268514.pdf - Accepted Version (986kB)

Abstract

Neural approaches that use pre-trained language models are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their effectiveness compared to feature-based Learning-to-Rank (LtR) methods has not yet been well established. A major reason is that existing LtR benchmarks that contain query-document feature vectors do not provide the raw query and document text needed by neural models. Conversely, the benchmarks often used to evaluate neural models (e.g., MS MARCO and TREC Robust) provide text but not query-document feature vectors. In this paper, we present Istella22, a new dataset that enables such comparisons by providing both query/document text and strong query-document feature vectors used by an industrial search engine. The dataset consists of a comprehensive corpus of 8.4M web documents, a collection of query-document pairs including 220 hand-crafted features, relevance judgments on a 5-graded scale, and a set of 2,198 textual queries used for testing purposes. Istella22 enables a fair evaluation of traditional learning-to-rank and transfer ranking techniques on the same data: LtR models exploit the feature-based representations of training samples, while pre-trained transformer-based neural rankers can be evaluated on the corresponding textual content of queries and documents. Through preliminary experiments on Istella22, we find that neural re-ranking approaches lag behind LtR models in terms of effectiveness. However, LtR models identify the scores produced by neural models as strong signals.
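The 5-graded relevance judgments described above are conventionally evaluated with graded metrics such as NDCG. As a minimal illustrative sketch (not code from the paper), NDCG@10 with the common exponential gain 2^rel − 1 can be computed as follows; the example labels are hypothetical:

```python
import math

def dcg(rels, k=10):
    """Discounted cumulative gain over graded relevance labels (e.g., 0-4)."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg(rels, k=10):
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal ordering."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded labels (0-4) for a ranked list of ten documents.
ranking = [3, 4, 0, 2, 1, 0, 0, 1, 0, 0]
print(round(ndcg(ranking), 4))
```

A perfectly ordered list scores 1.0; any mis-ordering of graded labels (here, the grade-4 document ranked second) yields a score below 1.0.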

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: MacAvaney, Dr Sean and Tonellotto, Dr Nicola
Authors: Dato, D., MacAvaney, S., Nardini, F. M., Perego, R. and Tonellotto, N.
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9781450387323
Copyright Holders: Copyright © 2022 Association for Computing Machinery
First Published: First published in SIGIR 2022: 45th International ACM SIGIR Conference on Research and Development in Information Retrieval: 3099-3107
Publisher Policy: Reproduced in accordance with the publisher copyright policy