Retrieval sensitivity under training using different measures

He, B., Macdonald, C. and Ounis, I. (2008) Retrieval sensitivity under training using different measures. In: 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, 20-24 July 2008, p. 67. (doi: 10.1145/1390334.1390348)


Publisher's URL: http://dx.doi.org/10.1145/1390334.1390348

Abstract

Various measures, such as binary preference (bpref), inferred average precision (infAP), and binary normalised discounted cumulative gain (nDCG), have been proposed as alternatives to mean average precision (MAP) because they are less sensitive to the completeness of the relevance judgements. As the primary aim of building any system is to train it to respond to user queries in a more robust and stable manner, in this paper we investigate the importance of the choice of evaluation measure for training under different levels of evaluation incompleteness. We simulate evaluation incompleteness by sampling from the relevance assessments. Through large-scale experiments on two standard TREC test collections, we examine retrieval sensitivity when training, i.e. whether a training process based on any of the four discussed measures has an impact on the final retrieval performance. Experimental results show that training by bpref, infAP and nDCG provides significantly better retrieval performance than training by MAP when the completeness of the relevance judgements is extremely low. As the completeness of the relevance judgements increases, the measures behave more similarly.
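The abstract does not say how the relevance assessments were sampled. As a rough illustration only, the sketch below assumes uniform random sampling of the judged documents per query, and then computes average precision under the usual convention that unjudged documents count as non-relevant. The function names, the `fraction` parameter and the toy data are hypothetical and are not taken from the paper.

```python
import random


def sample_qrels(qrels, fraction, seed=0):
    """Keep a random fraction of the judged documents for each query.

    qrels maps query id -> {doc id: relevance label (0 or 1)}.
    Uniform sampling is an assumption, not the paper's stated procedure.
    """
    rng = random.Random(seed)
    sampled = {}
    for qid, judgements in qrels.items():
        docs = sorted(judgements)
        keep = max(1, round(fraction * len(docs)))
        chosen = rng.sample(docs, keep)
        sampled[qid] = {doc: judgements[doc] for doc in chosen}
    return sampled


def average_precision(ranking, judgements):
    """Average precision; documents missing from judgements count as non-relevant."""
    relevant = {doc for doc, rel in judgements.items() if rel > 0}
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Toy data: one query, four judged documents, two of them relevant.
qrels = {"q1": {"d1": 1, "d2": 0, "d3": 1, "d4": 0}}
ranking = ["d1", "d2", "d3", "d4"]

full_ap = average_precision(ranking, qrels["q1"])
half_qrels = sample_qrels(qrels, fraction=0.5)
half_ap = average_precision(ranking, half_qrels["q1"])
```

Repeating the sampling with different seeds and fractions gives a family of reduced judgement sets against which a measure's stability can be compared. Here `full_ap` uses the complete judgements, while `half_ap` depends on which two judgements survive the sampling.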

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: He, Mr Ben and Macdonald, Professor Craig and Ounis, Professor Iadh
Authors: He, B., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
