An evaluation of resource description quality measures

Baillie, M., Azzopardi, L. and Crestani, F. (2006) An evaluation of resource description quality measures. In: Symposium on Applied Computing, Dijon, France, 23-27 April, pp. 1110-1111. ISBN 1595931082 (doi:10.1145/1141277.1141538)




An open problem for Distributed Information Retrieval is how to represent large document repositories (known as resources) efficiently. To facilitate resource selection, estimated descriptions of each resource are required, especially in non-cooperative distributed environments. Accurate and efficient resource description estimation is required because it can have an effect on resource selection and, as a consequence, on retrieval quality. Query-Based Sampling (QBS) has been proposed as a novel solution for resource estimation, with further techniques developed thereafter. However, determining whether one QBS technique is better at generating resource descriptions than another is still an unresolved issue. The initial metrics tested and deployed for measuring resource description quality were the Collection Term Frequency ratio (CTF) and the Spearman Rank Correlation Coefficient (SRCC). The former provides an indication of the percentage of terms seen, whilst the latter measures the term ranking order, although neither considers the term frequency, which is important for resource selection. We re-examine this problem and consider measuring the quality of a resource description in the context of resource selection, where an estimate of the probability of a term given the resource is typically required. We believe a natural measure for comparing the estimated resource against the actual resource is the Kullback-Leibler Divergence (KL) measure. KL addresses the concerns put forward previously by not over-representing low-frequency terms and by also considering term order. In this paper, we re-assess the two previous measures alongside KL. Our preliminary investigation revealed that the former metrics display contradictory results, whilst KL suggested that a different QBS technique than the one previously prescribed would provide better estimates.
This is a significant result, because it is now unclear which technique will consistently provide better resource descriptions. The remainder of this paper details the three measures and the experimental analysis of our preliminary study, and outlines our points of concern along with further research directions.

Item Type: Conference Proceedings
Additional Information: © ACM, 2006. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 2006 ACM symposium on Applied computing.
Glasgow Author(s) Enlighten ID: Azzopardi, Dr Leif
Authors: Baillie, M., Azzopardi, L., and Crestani, F.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science
Publisher: ACM Press
Copyright Holders: Copyright © 2006 ACM Press
First Published: First published in Proceedings of the 2006 ACM symposium on Applied computing
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher including additional information statement.
