CODEC: Complex Document and Entity Collection

Mackie, I., Owoicho, P., Gemmell, C., Fischer, S., MacAvaney, S. and Dalton, J. (2022) CODEC: Complex Document and Entity Collection. In: SIGIR 2022: 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11-15 Jul 2022, pp. 3067-3077. ISBN 9781450387323 (doi: 10.1145/3477495.3531712)

269440.pdf - Accepted Version



CODEC is a document and entity ranking benchmark that focuses on complex research topics. We target essay-style information needs of social science researchers, e.g. "How has the UK's Open Banking Regulation benefited Challenger Banks?". CODEC includes 42 topics developed by researchers and a new focused web corpus with semantic annotations, including entity links. This resource includes expert judgments on 17,509 documents and entities (416.9 per topic) from diverse automatic and interactive manual runs. The manual runs include 387 query reformulations, providing data for query performance prediction and for evaluating automatic query rewriting. CODEC includes an analysis of state-of-the-art systems, including dense retrieval and neural re-ranking. The results show that the topics are challenging, with headroom for improvement in both document and entity ranking. Query expansion with entity information yields significant gains on document ranking, demonstrating the resource's value for evaluating and improving entity-oriented search. We also show that the manual query reformulations significantly improve both document and entity ranking performance. Overall, CODEC provides challenging research topics to support the development and evaluation of entity-centric search methods.
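The per-topic figure above is the judgment total divided by the number of topics (17,509 / 42 ≈ 416.9). The Python sketch below checks that arithmetic and shows how such a per-topic count could be derived from a relevance-judgment file. It assumes the judgments use the standard four-column TREC qrels layout (topic, iteration, document or entity ID, relevance grade); that format, the helper names `judgments_per_topic` and `mean_per_topic`, and the toy IDs are illustrative assumptions, not a description of the actual CODEC release.

```python
from collections import Counter


def judgments_per_topic(qrels_lines):
    """Count judged items per topic from qrels lines.

    Assumption: each line follows the four-column TREC qrels layout
    <topic_id> <iteration> <doc_or_entity_id> <relevance>.
    """
    counts = Counter()
    for line in qrels_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic_id = parts[0]
        counts[topic_id] += 1
    return counts


def mean_per_topic(total_judgments, num_topics):
    """Average number of judgments per topic."""
    return total_judgments / num_topics


# Figures reported in the abstract: 17,509 judgments over 42 topics.
print(round(mean_per_topic(17_509, 42), 1))  # 416.9

# Toy qrels with hypothetical IDs (not real CODEC data).
toy_qrels = [
    "t1 0 doc_a 2",
    "t1 0 doc_b 0",
    "t2 0 ent_x 1",
]
counts = judgments_per_topic(toy_qrels)
print(dict(counts))  # {'t1': 2, 't2': 1}
print(mean_per_topic(sum(counts.values()), len(counts)))  # 1.5
```

Counting lines rather than unique IDs assumes each (topic, item) pair is judged exactly once in the file.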

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Mackie, Iain and Dalton, Dr Jeff and Owoicho, Paul Ogbonoko and Gemmell, Carlos and Fischer, Ms Sophie and MacAvaney, Dr Sean
Authors: Mackie, I., Owoicho, P., Gemmell, C., Fischer, S., MacAvaney, S., and Dalton, J.
College/School: College of Science and Engineering > School of Computing Science
Copyright Holders: Copyright © 2022 The Authors
First Published: SIGIR 2022: 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3067-3077
Publisher Policy: Reproduced in accordance with the publisher copyright policy


Project Code: 310549
Project Name: Dalton-UKRI-Turing Fellow
Principal Investigator: Jeff Dalton
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Funder Ref: EP/V025708/1
Lead Dept: Computing Science