PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval

Macdonald, C., Tonellotto, N., MacAvaney, S. and Ounis, I. (2021) PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval. In: 30th ACM International Conference on Information and Knowledge Management, Virtual Event Queensland, Australia, 01-05 Nov 2021, pp. 4526-4533. ISBN 9781450384469 (doi: 10.1145/3459637.3482013)

Text: 249268.pdf - Accepted Version



PyTerrier is a Python-based retrieval framework for expressing simple and complex information retrieval (IR) pipelines in a declarative manner. While it relies on the long-established Terrier IR platform for basic text indexing and retrieval, its main utility comes from its expressive Python operators, which allow individual IR operations to be pipelined and combined flexibly, as the search application requires. Each operation applies a transformation to a dataframe, and the operators are defined with clear semantics in relational algebra. Going further, we have recently expanded the PyTerrier framework to support state-of-the-art BERT-based text re-rankers (such as EPIC) and dense retrieval implementations (such as ANCE and ColBERT). Transformer pipelines can be tuned and evaluated in a declarative manner. To increase the reusability of this framework as a resource for the IR community, PyTerrier provides easy access to a variety of standard benchmark datasets, including pre-built indices. Finally, we highlight the advantages of such a framework for information retrieval researchers and educators.
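
The idea of operators over dataframe transformations can be illustrated with a small toy model. The sketch below is not PyTerrier's implementation and does not use its API: it is a plain-Python approximation in which a "dataframe" is a list of dicts. The class names (Transformer, Compose, LinearCombine, RankCutoff, StaticScorer, Rescore), the fixed scores, and the choice to treat a missing score as 0 when summing are all assumptions made for illustration. The three overloaded operators (>> for composition, + for score combination, % for rank cutoff) are modelled on the operators the framework provides.

```python
# Toy model of declarative pipeline operators; NOT PyTerrier's implementation.
# A "dataframe" here is a list of dicts with keys qid, docno and score.

class Transformer:
    """A transformer maps a list of rows to a list of rows."""

    def transform(self, rows):
        raise NotImplementedError

    def __rshift__(self, other):   # a >> b : feed a's output into b
        return Compose(self, other)

    def __add__(self, other):      # a + b : sum scores per (qid, docno)
        return LinearCombine(self, other)

    def __mod__(self, k):          # a % k : keep the top-k rows per query
        return RankCutoff(self, k)


class Compose(Transformer):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def transform(self, rows):
        return self.second.transform(self.first.transform(rows))


class LinearCombine(Transformer):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def transform(self, rows):
        # Assumption of this sketch: a document missing on one side adds 0.
        totals = {}
        for r in self.left.transform(rows) + self.right.transform(rows):
            key = (r["qid"], r["docno"])
            totals[key] = totals.get(key, 0.0) + r["score"]
        return [{"qid": q, "docno": d, "score": s} for (q, d), s in totals.items()]


class RankCutoff(Transformer):
    def __init__(self, inner, k):
        self.inner, self.k = inner, k

    def transform(self, rows):
        kept, count = [], {}
        ranked = sorted(self.inner.transform(rows),
                        key=lambda r: (r["qid"], -r["score"]))
        for r in ranked:
            count[r["qid"]] = count.get(r["qid"], 0) + 1
            if count[r["qid"]] <= self.k:
                kept.append(r)
        return kept


class StaticScorer(Transformer):
    """Hypothetical retriever that returns fixed scores for each query."""

    def __init__(self, scores):
        self.scores = scores       # {qid: {docno: score}}

    def transform(self, rows):
        return [{"qid": q["qid"], "docno": d, "score": s}
                for q in rows
                for d, s in self.scores.get(q["qid"], {}).items()]


class Rescore(Transformer):
    """Hypothetical re-ranker that replaces each row's score with fn(row)."""

    def __init__(self, fn):
        self.fn = fn

    def transform(self, rows):
        return [dict(r, score=self.fn(r)) for r in rows]


# Made-up scores standing in for a lexical and a dense retriever.
bm25 = StaticScorer({"q1": {"d1": 2.0, "d2": 1.0, "d3": 0.5}})
dense = StaticScorer({"q1": {"d2": 3.0, "d3": 0.25}})

# Combine both score lists, then keep the two best documents per query.
result = ((bm25 + dense) % 2).transform([{"qid": "q1"}])
# result: d2 with score 4.0, then d1 with score 2.0

# Retrieve, cut off at rank 2, then re-score the survivors.
reranked = (bm25 % 2 >> Rescore(lambda r: r["score"] * 10)).transform([{"qid": "q1"}])
```

Because each pipeline is itself a Transformer, expressions such as (bm25 + dense) % 2 can be nested and reused like any other value, which is the property a declarative style depends on. Note that Python's precedence rules make % bind more tightly than + and >>, hence the parentheses in the first pipeline.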

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: MacAvaney, Dr Sean and Macdonald, Professor Craig and Ounis, Professor Iadh and Tonellotto, Dr Nicola
Authors: Macdonald, C., Tonellotto, N., MacAvaney, S., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
Copyright Holders: Copyright © 2021 Association for Computing Machinery
First Published: First published in CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
Publisher Policy: Reproduced in accordance with the publisher copyright policy


Project Code: 300982
Award No:
Project Name: Exploiting Closed-Loop Aspects in Computationally and Data Intensive Analytics
Principal Investigator: Roderick Murray-Smith
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Funder Ref: EP/R018634/1
Lead Dept: Computing Science