The Information Retrieval Experiment Platform

Fröbe, M., Reimer, J. H., MacAvaney, S., Deckers, N., Reich, S., Bevendorff, J., Stein, B., Hagen, M. and Potthast, M. (2023) The Information Retrieval Experiment Platform. In: 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), Taipei, Taiwan, 23-27 July 2023, pp. 2826-2836. ISBN 9781450394086 (doi: 10.1145/3539618.3591888)

Text: 296336.pdf (Accepted Version, 876 kB)

Abstract

We integrate ir_datasets, ir_measures, and PyTerrier with TIRA in the Information Retrieval Experiment Platform (TIREx) to promote more standardized, reproducible, scalable, and even blinded retrieval experiments. Standardization is achieved when a retrieval approach implements PyTerrier's interfaces and the input and output of an experiment are compatible with ir_datasets and ir_measures. However, none of this is required for reproducibility and scalability, as TIRA can run any dockerized software locally or remotely in a cloud-native execution environment. Version control and caching ensure efficient (re)execution. TIRA allows for blind evaluation when an experiment runs on a remote server or cloud not under the control of the experimenter. The test data and ground truth are then hidden from public access, and the retrieval software has to process them in a sandbox that prevents data leaks. We currently host an instance of TIREx with 15 corpora (1.9 billion documents) on which 32 shared retrieval tasks are based. Using Docker images of 50 standard retrieval approaches, we automatically evaluated all approaches on all tasks (50 ⋅ 32 = 1,600 runs) in less than a week on a midsize cluster (1,620 cores and 24 GPUs). This instance of TIREx is open for submissions and will be integrated with the IR Anthology, as well as released open source.
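The claim that version control and caching make (re)execution efficient can be illustrated with a minimal sketch. It assumes, hypothetically, that each run is keyed by the pair (Docker image digest, task id), so an unchanged approach is never executed twice on the same task. `RunCache`, `fake_execute`, and the placeholder digest strings are illustrative names, not TIRA's actual API. The sketch also reproduces the 50 ⋅ 32 = 1,600 run grid from the abstract.

```python
import hashlib
from itertools import product


class RunCache:
    """Hypothetical run cache: stores at most one run per (image digest, task) pair."""

    def __init__(self):
        self._runs = {}       # cache key -> run output
        self.executions = 0   # how many times software was actually executed

    @staticmethod
    def key(image_digest: str, task_id: str) -> str:
        # Identical software version + identical task => identical cache key.
        return hashlib.sha256(f"{image_digest}|{task_id}".encode()).hexdigest()

    def get_or_run(self, image_digest, task_id, execute):
        k = self.key(image_digest, task_id)
        if k not in self._runs:  # execute only on a cache miss
            self._runs[k] = execute(image_digest, task_id)
            self.executions += 1
        return self._runs[k]


def fake_execute(image_digest, task_id):
    # Placeholder for running a dockerized approach on a task in a sandbox.
    return f"run({image_digest}, {task_id})"


# Hypothetical identifiers for 50 approaches and 32 tasks.
approaches = [f"sha256:approach-{i:02d}" for i in range(50)]
tasks = [f"task-{j:02d}" for j in range(32)]

cache = RunCache()
for image, task in product(approaches, tasks):  # first pass: every pair runs
    cache.get_or_run(image, task, fake_execute)
after_first_pass = cache.executions

for image, task in product(approaches, tasks):  # second pass: all cache hits
    cache.get_or_run(image, task, fake_execute)
after_second_pass = cache.executions

# A new image version for one approach invalidates only that approach's runs.
approaches[0] = "sha256:approach-00-v2"
for task in tasks:
    cache.get_or_run(approaches[0], task, fake_execute)
after_update = cache.executions
```

Re-running the full grid adds no executions, while publishing a new image version for a single approach triggers only that approach's 32 runs.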

Item Type: Conference Proceedings
Additional Information: This work has been partially supported by the OpenWebSearch.eu project (funded by the EU; GA 101070014).
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: MacAvaney, Dr Sean
Authors: Fröbe, M., Reimer, J. H., MacAvaney, S., Deckers, N., Reich, S., Bevendorff, J., Stein, B., Hagen, M., and Potthast, M.
College/School: College of Science and Engineering > School of Computing Science
Research Centre: College of Science and Engineering > School of Computing Science > IDA Section > GPU Cluster
ISBN: 9781450394086
Copyright Holders: Copyright: © 2023 ACM
First Published: First published in SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval: 2826–2836
Publisher Policy: Reproduced in accordance with the publisher copyright policy