CAsT-19: A Dataset for Research on Conversational Information Seeking

Dalton, J., Xiong, C., Kumar, V. and Callan, J. (2020) CAsT-19: A Dataset for Research on Conversational Information Seeking. In: 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), Xi'an, China, 25-30 Jul 2020, pp. 1985-1988. ISBN 9781450380164 (doi:10.1145/3397271.3401206)

Full text not currently available from Enlighten.


CAsT-19 is a new dataset that supports research on conversational information seeking. The corpus is 38,426,252 passages from the TREC Complex Answer Retrieval (CAR) and Microsoft MAchine Reading COmprehension (MARCO) datasets. Eighty information seeking dialogues (30 train, 50 test) are an average of 9 to 10 questions long. A dialogue may explore a topic broadly or drill down into subtopics. Questions contain ellipsis, implied context, mild topic shifts, and other characteristics of human conversation that may prevent them from being understood in isolation. Relevance assessments are provided for 30 training topics and 20 test topics. CAsT-19 promotes research on conversational information seeking by defining it as a task in which effective passage selection requires understanding a question's context (the dialogue history). It focuses attention on user modeling, analysis of prior retrieval results, transformation of questions into effective queries, and other topics that have been difficult to study with existing datasets.

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Dalton, Dr Jeff
Authors: Dalton, J., Xiong, C., Kumar, V., and Callan, J.
College/School: College of Science and Engineering > School of Computing Science