Hypothesis testing for the risk-sensitive evaluation of retrieval systems

Dinçer, B. T., Macdonald, C. and Ounis, I. (2014) Hypothesis testing for the risk-sensitive evaluation of retrieval systems. In: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, Gold Coast, Qld, Australia, 6-11 Jul 2014, pp. 23-32. (doi:10.1145/2600428.2609625)

Full text not currently available from Enlighten.

Publisher's URL: http://dx.doi.org/10.1145/2600428.2609625

Abstract

The aim of risk-sensitive evaluation is to measure when a given information retrieval (IR) system does not perform worse than a corresponding baseline system for any topic. This paper argues that risk-sensitive evaluation is akin to the underlying methodology of the Student's t test for matched pairs. Hence, we introduce a risk-reward tradeoff measure, TRisk, that generalises the existing URisk measure (as used in the TREC 2013 Web track's risk-sensitive task) while being theoretically grounded in statistical hypothesis testing and easily interpretable. In particular, we show that TRisk is a linear transformation of the t statistic, which is the test statistic used in the Student's t test. This inherent relationship between TRisk and the t statistic turns risk-sensitive evaluation from a descriptive analysis into a fully-fledged inferential analysis. Specifically, we demonstrate, using past TREC data, that the inferential analysis techniques introduced in this paper allow us to (1) decide whether an observed level of risk for an IR system is statistically significant, and thereby infer whether the system exhibits a real risk, and (2) determine the topics that individually lead to a significant level of risk. Indeed, we show that the latter permits a state-of-the-art learning-to-rank algorithm (LambdaMART) to focus on those topics in order to learn effective yet risk-averse ranking systems.
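The abstract gives no formulas, so the following is only an illustrative sketch of the relationship it describes, under stated assumptions. It assumes URisk is the mean per-topic score difference between system and baseline with losses weighted by (1 + alpha), and that TRisk is that mean divided by its estimated standard error. The function names and the per-topic scores are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def risk_adjusted_deltas(system, baseline, alpha=0.0):
    """Per-topic differences, with losses scaled by (1 + alpha) (assumed form)."""
    deltas = [s - b for s, b in zip(system, baseline)]
    return [d if d >= 0 else (1 + alpha) * d for d in deltas]


def urisk(system, baseline, alpha=0.0):
    """Assumed URisk: mean of the risk-adjusted per-topic differences."""
    return mean(risk_adjusted_deltas(system, baseline, alpha))


def trisk(system, baseline, alpha=0.0):
    """Assumed TRisk: URisk divided by its estimated standard error."""
    adjusted = risk_adjusted_deltas(system, baseline, alpha)
    standard_error = stdev(adjusted) / math.sqrt(len(adjusted))
    return mean(adjusted) / standard_error


def paired_t(system, baseline):
    """Student's t statistic for matched pairs."""
    deltas = [s - b for s, b in zip(system, baseline)]
    return mean(deltas) / (stdev(deltas) / math.sqrt(len(deltas)))


# Made-up per-topic effectiveness scores, for illustration only.
system_scores = [0.5, 0.3, 0.6, 0.2, 0.7]
baseline_scores = [0.4, 0.4, 0.5, 0.3, 0.4]

t_neutral = trisk(system_scores, baseline_scores, alpha=0.0)
t_averse = trisk(system_scores, baseline_scores, alpha=1.0)
```

Under these assumptions, setting alpha to 0 makes the sketched TRisk coincide with the matched-pairs t statistic, and a larger alpha lowers the score of a system that loses to the baseline on some topics.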

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Ounis, Professor Iadh and Macdonald, Dr Craig
Authors: Dinçer, B. T., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
