Lin, J., Mackenzie, J., Kamphuis, C., Macdonald, C., Mallia, A., Siedlaczek, M., Trotman, A. and de Vries, A. (2020) Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format. In: 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), Xi'an, China, 25-30 Jul 2020, pp. 2149-2152. ISBN 9781450380164 (doi: 10.1145/3397271.3401404)
Text: 218387.pdf - Accepted Version (587kB)
Abstract
There exists a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems. To balance these two goals, we examine two approaches to providing interoperability between the inverted indexes of several systems. The first takes advantage of internal abstractions around index structures and builds wrappers that allow one system to directly read the indexes of another. The second involves sharing indexes across systems via a data exchange specification that we have developed, called the Common Index File Format (CIFF). We demonstrate the first approach with the Java systems Anserini and Terrier, and the second approach with Anserini, JASSv2, OldDog, PISA, and Terrier. Together, these systems provide a wide range of implementations and features, with different research goals. Overall, we recommend CIFF as a low-effort approach to support independent innovation while enabling the types of fair evaluations that are critical for driving the field forward.
Item Type: | Conference Proceedings |
---|---|
Additional Information: | This research was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada, Compute Ontario and Compute Canada, the Australian Research Council (ARC) Discovery Grant DP170102231, the US National Science Foundation (IIS-1718680), and research program Commit2Data with project number 628.011.001 financed by the Dutch Research Council (NWO). |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Macdonald, Professor Craig |
Authors: | Lin, J., Mackenzie, J., Kamphuis, C., Macdonald, C., Mallia, A., Siedlaczek, M., Trotman, A., and de Vries, A. |
College/School: | College of Science and Engineering > School of Computing Science |
ISBN: | 9781450380164 |
Published Online: | 25 July 2020 |
Copyright Holders: | Copyright © 2020 Association for Computing Machinery |
First Published: | First published in SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval: 2149-2152 |
Publisher Policy: | Reproduced in accordance with the publisher copyright policy |