The whens and hows of learning to rank for web search

Macdonald, C., Santos, R. L. T. and Ounis, I. (2013) The whens and hows of learning to rank for web search. Information Retrieval, 16(5), pp. 584-628. (doi:10.1007/s10791-012-9209-9)

Full text not currently available from Enlighten.

Publisher's URL: http://dx.doi.org/10.1007/s10791-012-9209-9

Abstract

Web search engines increasingly deploy many features, combined using learning to rank techniques. However, various practical questions remain concerning how learning to rank should be deployed. For instance, a sample of documents with sufficient recall is used, such that re-ranking of the sample by the learned model brings the relevant documents to the top. However, the properties of the document sample, such as when to stop ranking (i.e. its minimum effective size), remain unstudied. Similarly, effective listwise learning to rank techniques minimise a loss function corresponding to a standard information retrieval evaluation measure. However, the appropriate choice of how to calculate the loss function (i.e. the choice of the learning evaluation measure and the rank depth at which this measure should be calculated) is as yet unclear. In this paper, we address all of these issues by formulating various hypotheses and research questions, before performing exhaustive experiments using multiple learning to rank techniques and different types of information needs on the ClueWeb09 and LETOR corpora. Among many conclusions, we find, for instance, that the smallest effective sample for a given query set depends on the type of information need of the queries, the document representation used during sampling, and the test evaluation measure. As the sample size is varied, the selected features change markedly; for instance, link analysis features are favoured for smaller document samples. Moreover, despite reflecting a more realistic user model, the recently proposed ERR measure is not as effective as the traditional NDCG as a learning loss function. Overall, our comprehensive experiments provide the first empirical derivation of best practices for learning to rank deployments.
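The two learning evaluation measures contrasted in the abstract, NDCG and ERR, can be illustrated with a short sketch. This is not the paper's code; it is a minimal Python illustration of the standard definitions of NDCG@k and ERR@k (the latter following Chapelle et al.'s cascade user model), assuming graded relevance labels where 0 means non-relevant.

```python
import math

def ndcg_at_k(rels, k):
    """NDCG@k over a ranked list of graded relevance labels (0 = non-relevant)."""
    def dcg(labels):
        # Gain 2^rel - 1, discounted by log2 of the (1-based) rank plus one.
        return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(labels[:k]))
    ideal = dcg(sorted(rels, reverse=True))  # DCG of the ideal reordering
    return dcg(rels) / ideal if ideal > 0 else 0.0

def err_at_k(rels, k, g_max=4):
    """ERR@k: expected reciprocal rank under a cascade user model."""
    p_reach = 1.0  # probability the user reaches this rank unsatisfied
    err = 0.0
    for i, r in enumerate(rels[:k]):
        stop = (2**r - 1) / 2**g_max  # probability the user stops here
        err += p_reach * stop / (i + 1)
        p_reach *= 1 - stop
    return err
```

The cascade model behind ERR is what makes it "a more realistic user model" in the abstract's terms: a highly relevant document near the top sharply reduces the credit available to documents below it, whereas NDCG's log discount decays much more gently with rank.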

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Ounis, Professor Iadh and Macdonald, Dr Craig
Authors: Macdonald, C., Santos, R. L. T., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Information Retrieval
Publisher: Springer Verlag
ISSN: 1386-4564
ISSN (Online): 1573-7659
Published Online: 02 September 2012
