He, B., Macdonald, C. and Ounis, I. (2008) Retrieval sensitivity under training using different measures. In: 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, 20-24 July 2008, p. 67. (doi: 10.1145/1390334.1390348)
Full text not currently available from Enlighten.
Publisher's URL: http://dx.doi.org/10.1145/1390334.1390348
Abstract
Various measures, such as binary preference (bpref), inferred average precision (infAP), and normalised discounted cumulative gain (nDCG), have been proposed as alternatives to mean average precision (MAP) because they are less sensitive to the completeness of the relevance judgements. Since the primary aim of building any retrieval system is to make it respond to user queries in a robust and stable manner, in this paper we investigate how important the choice of evaluation measure for training is under different levels of evaluation incompleteness. We simulate evaluation incompleteness by sampling from the relevance assessments. Through large-scale experiments on two standard TREC test collections, we examine retrieval sensitivity under training, i.e. whether a training process based on any of the four discussed measures has an impact on the final retrieval performance. Experimental results show that training by bpref, infAP and nDCG provides significantly better retrieval performance than training by MAP when the completeness of the relevance judgements is extremely low. As the completeness of the relevance judgements increases, the measures behave more similarly.
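To make the contrast concrete, the following is a toy sketch (not the paper's experimental code) of two of the measures discussed, and of simulating incompleteness by sampling the relevance assessments. The document identifiers, the ranking, and the 30% sampling rate are illustrative assumptions; MAP treats unjudged documents as non-relevant, while bpref ignores them, which is why bpref degrades more gracefully as judgements are removed.

```python
import random

def average_precision(ranking, relevant):
    """AP treats every unjudged document as non-relevant."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant)

def bpref(ranking, relevant, nonrelevant):
    """bpref ignores unjudged documents: each judged relevant document
    is penalised only by the judged non-relevant documents ranked
    above it, capped at min(R, N)."""
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    nonrel_above, score = 0, 0.0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_above += 1
        elif doc in relevant:
            if N == 0:
                score += 1.0
            else:
                score += 1.0 - min(nonrel_above, min(R, N)) / min(R, N)
    return score / R

# Toy ranking with complete judgements (illustrative data).
ranking = [f"d{i}" for i in range(1, 21)]
relevant = {"d1", "d3", "d5", "d8", "d13"}
nonrelevant = {d for d in ranking if d not in relevant}

# Simulate incompleteness: keep a random 30% sample of the judgements.
random.seed(7)
pool = sorted(relevant | nonrelevant)
sampled = set(random.sample(pool, k=len(pool) * 3 // 10))

print("full MAP=%.3f bpref=%.3f" % (
    average_precision(ranking, relevant),
    bpref(ranking, relevant, nonrelevant)))
print("30%% MAP=%.3f bpref=%.3f" % (
    average_precision(ranking, relevant & sampled),
    bpref(ranking, relevant & sampled, nonrelevant & sampled)))
```

Under the sampled judgements, AP drops simply because fewer relevant documents are counted, whereas bpref is renormalised over the surviving judged pairs; repeating the sampling over many queries is the kind of simulation the experiments perform at scale.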
| Item Type: | Conference Proceedings |
| --- | --- |
| Status: | Published |
| Refereed: | Yes |
| Glasgow Author(s) Enlighten ID: | He, Mr Ben and Macdonald, Professor Craig and Ounis, Professor Iadh |
| Authors: | He, B., Macdonald, C., and Ounis, I. |
| College/School: | College of Science and Engineering > School of Computing Science |