Fairly retrieving documents of all lengths

Azzopardi, L. and Losada, D. (2007) Fairly retrieving documents of all lengths. In: 1st International Conference on the Theory of Information Retrieval, Budapest, Hungary, 18-20 Oct 2007, pp. 65-76.

Full text not currently available from Enlighten.

Abstract

Document length normalization is widely recognized as an important factor when tuning retrieval systems. Previous studies have shown that tuning the retrieval model so that the lengths of retrieved documents are similar to the lengths of relevant documents results in substantially better performance. However, the stated goal of Document Length Normalization is to “fairly” retrieve documents of all lengths. In this paper, we weigh this proposition against the previous findings in the context of the Language Modeling approach to ad hoc information retrieval, and study how the smoothing method and its parameter setting affect the lengths of the retrieved documents. Our study confirms that tuning the system to retrieve documents of all lengths fairly results in mediocre performance, whereas tuning it to favor relevant (longer) documents delivers superior performance. While this reconfirms previous findings, we discover that the discrepancy appears to stem from the fact that relevant documents are drawn from a biased sample: the set of assessed documents, which are substantially longer than the documents in the collection as a whole.
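The abstract does not name the smoothing methods studied. As an illustration only, the sketch below assumes the two standard query-likelihood smoothers, Jelinek-Mercer and Dirichlet prior, and measures how a parameter setting shifts the mean length of the top-k retrieved documents relative to the collection mean, which is one simple way to look at whether documents of all lengths are being retrieved "fairly". The function names (`collection_stats`, `dirichlet_score`, `jm_score`, `mean_retrieved_length`) and the toy data are hypothetical; this is not the authors' experimental code.

```python
import math
from collections import Counter
from typing import List, Sequence

Doc = List[str]  # a document as a list of tokens (assumption: pre-tokenized)


def collection_stats(docs: Sequence[Doc]):
    """Return collection term frequencies and the total token count."""
    coll_tf = Counter()
    for d in docs:
        coll_tf.update(d)
    return coll_tf, sum(len(d) for d in docs)


def dirichlet_score(query, doc, coll_tf, coll_len, mu=1000.0):
    """log p(q|d) with Dirichlet-prior smoothing:
    p(w|d) = (tf(w,d) + mu * p(w|C)) / (|d| + mu)."""
    tf = Counter(doc)
    score = 0.0
    for w in query:
        if coll_tf.get(w, 0) == 0:
            continue  # unseen term gives log(0) for every doc; skipping keeps the ranking
        p_c = coll_tf[w] / coll_len
        score += math.log((tf[w] + mu * p_c) / (len(doc) + mu))
    return score


def jm_score(query, doc, coll_tf, coll_len, lam=0.1):
    """log p(q|d) with Jelinek-Mercer smoothing (requires 0 < lam <= 1):
    p(w|d) = (1 - lam) * tf(w,d)/|d| + lam * p(w|C)."""
    tf = Counter(doc)
    score = 0.0
    for w in query:
        if coll_tf.get(w, 0) == 0:
            continue  # same reasoning as in dirichlet_score
        p_ml = tf[w] / len(doc) if doc else 0.0  # maximum-likelihood estimate
        p_c = coll_tf[w] / coll_len
        score += math.log((1 - lam) * p_ml + lam * p_c)
    return score


def mean_retrieved_length(query, docs, scorer, k=10, **params):
    """Rank docs by `scorer` and return the mean length of the top-k."""
    coll_tf, coll_len = collection_stats(docs)
    ranked = sorted(
        range(len(docs)),
        key=lambda i: scorer(query, docs[i], coll_tf, coll_len, **params),
        reverse=True,
    )
    top = ranked[:k]
    return sum(len(docs[i]) for i in top) / len(top)


if __name__ == "__main__":
    # Hypothetical toy collection with a spread of document lengths.
    docs = [
        ["ir", "model"],
        ["ir", "length", "bias", "model", "retrieval"],
        ["smoothing"] * 3 + ["ir"] + ["filler"] * 20,
        ["ir", "ir", "length"] + ["filler"] * 40,
        ["cooking", "recipe"],
    ]
    query = ["ir", "length"]
    print("collection mean length:", sum(map(len, docs)) / len(docs))
    for mu in (10, 100, 1000, 10000):
        print("Dirichlet mu =", mu, mean_retrieved_length(query, docs, dirichlet_score, k=2, mu=mu))
    for lam in (0.1, 0.5, 0.9):
        print("JM lambda =", lam, mean_retrieved_length(query, docs, jm_score, k=2, lam=lam))
```

Comparing the printed means against the collection mean gives a crude view of length bias per setting; a fuller study would compare the whole length distribution of retrieved documents with that of the collection and of the relevant documents.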

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Azzopardi, Dr Leif
Authors: Azzopardi, L., and Losada, D.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science
