Setting per-field normalisation hyper-parameters for the named-page finding search task

He, B. and Ounis, I. (2007) Setting per-field normalisation hyper-parameters for the named-page finding search task. Lecture Notes in Computer Science, 4425, pp. 468-480. (doi:10.1007/978-3-540-71496-5_42)

Publisher's URL: http://dx.doi.org/10.1007/978-3-540-71496-5_42

Abstract

Per-field normalisation has been shown to be effective for Web search tasks, e.g. named-page finding. However, per-field normalisation also suffers from having hyper-parameters to tune on a per-field basis. In this paper, we argue that the purpose of per-field normalisation is to adjust the linear relationship between field length and term frequency. We experiment with standard Web test collections, using three document fields, namely the body of the document, its title, and the anchor text of its incoming links. From our experiments, we find that across different collections, the linear correlation values, given by the optimised hyper-parameter settings, are proportional to the maximum negative linear correlation. Based on this observation, we devise an automatic method for setting the per-field normalisation hyper-parameter values without the use of relevance assessment for tuning. According to the evaluation results, this method is shown to be effective for the body and title fields. In addition, the difficulty in setting the per-field normalisation hyper-parameter for the anchor text field is explained.
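The abstract's central idea, that the normalisation hyper-parameter controls the linear correlation between field length and normalised term frequency, can be illustrated with a small sketch. The abstract does not name the normalisation formula, so the sketch assumes the Divergence From Randomness "Normalisation 2" form, tfn = tf * log2(1 + c * avg_l / l), applied to a single field. The function names and the toy data are hypothetical, and the code is an illustration of the relationship only, not the authors' tuning method.

```python
import math


def normalise_tf(tf, length, avg_length, c):
    """Assumed Normalisation 2 form: tfn = tf * log2(1 + c * avg_l / l)."""
    return tf * math.log2(1.0 + c * avg_length / length)


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def length_tf_correlation(tfs, lengths, c):
    """Correlation between field length and normalised term frequency for a given c."""
    avg_length = sum(lengths) / len(lengths)
    tfn = [normalise_tf(tf, l, avg_length, c) for tf, l in zip(tfs, lengths)]
    return pearson(lengths, tfn)


# Toy data (hypothetical): longer fields tend to contain more occurrences of a term.
lengths = [50, 100, 200, 400, 800]
tfs = [1, 2, 3, 5, 8]

# Correlation before normalisation, then after normalisation for several values of c.
raw_corr = pearson(lengths, tfs)
corr_by_c = {c: length_tf_correlation(tfs, lengths, c) for c in (0.1, 1.0, 10.0)}

for c, corr in corr_by_c.items():
    print(f"c={c}: correlation(length, tfn) = {corr:.3f}")
```

On this toy data, a small c penalises long fields heavily and pushes the correlation negative, while a large c leaves it close to the unnormalised, positive value. Choosing c therefore amounts to choosing where on this range the length and term-frequency relationship should sit.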

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: He, Mr Ben and Ounis, Professor Iadh
Authors: He, B., and Ounis, I.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Lecture Notes in Computer Science
Publisher: Springer
ISSN: 1611-3349
Copyright Holders: Copyright © 2007 Springer
First Published: First published in Lecture Notes in Computer Science 4425:468-480
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher.
